FOR THE DEGREE OF BACHELOR OF SCIENCE In the School of Audio Engineering MIDDLESEX UNIVERSITY
JUNE 2013
ABSTRACT.
This dissertation adds to the research in post-production practices by using generative audio to digitally reconstruct Foley stages. The rationale for combining generative audio with Foley processes is to analyse how new technology could benefit Foley practices in low-budget films. This research project also intersects sound synthesis, signal analysis and user interaction, and includes the prototyping of a behavioural analysis based on ground reaction forces.
ACKNOWLEDGEMENT.
I would like to dedicate this dissertation to Andy J. Farnell, whose expertise helped me immensely in writing it. To Gillian McIver, Helena Hollis and Philippa Embley, who never ceased helping me right until the very end. To God, for His divine intervention in this academic achievement. To my mother Zoraida Méndez and my grandmother Bertha Daza, for making me a better person. To my aunts Martha and Bellky Méndez, for their unconditional support; without you, none of this would have been possible.
TABLE OF CONTENTS.
CHAPTER 1 INTRODUCTION. 1
1.1 SIGNIFICANCE OF THIS STUDY. 2
1.2 PROBLEM STATEMENT. 3
1.3 LAYOUT OF DISSERTATION. 4
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS. 5
2.1 LITERATURE REVIEW. 5
2.1.1 INTRODUCTION. 5
2.1.2 SOUND TEXTURES. 6
2.1.3 DEFINITIONS AND PRINCIPLES OF GRANULAR SYNTHESIS. 8
2.1.4 STOCHASTIC ANALYSIS. 10
2.1.5 PROCEDURAL AUDIO IN RESPONSE TO FOOTSTEP MODELLING. 12
2.1.6 SUMMARY. 14
2.2 METHODOLOGY. 15
2.2.1 INTRODUCTION. 15
2.2.2 OBJECTIVES. 15
2.2.3 PARAMETERS. 15
2.2.3.1 The Grain Envelope Analysis. 16
2.2.3.2 The Grain Dynamics. 18
2.2.3.3 Footstep-modelling. 19
2.2.3.4 The Ground Reaction Force. 20
2.2.4 PROCEDURES. 23
2.2.4.1 Pure Data. 23
2.2.4.2 Arduino. 25
2.2.5 Architecture. 26
2.2.6 SUMMARY. 28
CHAPTER 3 EVALUATION. 29
3.1 INTRODUCTION. 29
3.2 QUANTITATIVE DATA. 30
3.2.1 THE DATA COLLECTION METHOD. 31
3.2.2 RESEARCH FINDINGS. 33
3.2.2.1 STATISTICAL ANALYSIS. 35
3.2.2.2 T-Test. 38
3.2.2.3 Chi-Square. 39
3.2.3 THE RESULTS AND EVALUATION. 40
3.3 QUALITATIVE DATA. 41
3.3.1 DATA COLLECTION METHOD. 41
3.3.2 RESEARCH FINDINGS. 43
3.3.2.1 One-to-One Interview. 43
3.3.2.2 e-Interviewing. 44
CHAPTER 4 CONCLUSION. 46
APPENDICES. 48
APPENDIX A. 48
APPENDIX B. 49
APPENDIX C. 52
APPENDIX D. 53
APPENDIX E. 53
APPENDIX F. 54
APPENDIX G. 55
REFERENCES. 56
BIBLIOGRAPHY. 59
LIST OF TABLES.
Table 1: Average Quality. 37
Table 2: Chi-Square. 39
Table 3: Expected Values. 40
LIST OF FIGURES.
Figure 1: Sound Texture Extraction. 6
Figure 2: Gaussian Window. 9
Figure 3: Output list. 16
Figure 4: Transient Detector. 17
Figure 5: Grain Dynamics. 18
Figure 6: GRF Exemplified. 21
Figure 7: The Gait Phase. 22
Figure 8: GRF Distribution in Pure Data. 23
Figure 9: PD Environment. 24
Figure 10: The Cloud. 24
Figure 11: Code in Arduino. 25
Figure 12: Architecture. 26
Figure 13: Prototype. 27
Figure 14: Polynomial Curves. 28
Figure 15: Question 2. 33
Figure 16: Question 3. 35
Figure 17: T-Test in Excel. 39
CHAPTER 1 INTRODUCTION.
This project will focus on the use of granular synthesis techniques for dynamically generated audio, with a main emphasis on film post-production. In particular, footstep modelling will be studied extensively. The results will be compared with those obtained from previously recorded content, including Foley and several location recordings. Creating dynamically generated audio, otherwise known as Procedural Audio (PA), is a practice that involves the use of programmable sound structures. This allows the user to manipulate audio by establishing the input, internal and output parameters to ultimately develop a non-repetitive and meaningful sound (Farnell, 2010). Different technologies have called on a number of methods in an attempt to provide a quick and efficient solution for audio, especially in interactive applications such as video games. Many of these sources and methods are discussed below; however, it is beyond the scope of this study to resolve these issues once and for all, and they will undoubtedly cause controversy and debate for many years to come. This work, on the other hand, aims to contribute to the existing evidence and thereby to a better understanding of generated audio. This study will highlight the need to continue the research and development of new technology that will help to encompass generative audio.
could benefit from Foley practices in low-budget films. Throughout the last thirty years, customised libraries have been an essential part of post-production work. Recording assets have become an increasingly valuable commodity; a single library can easily compile over ten thousand individual samples. According to David Lewis Yewdall, it will literally take years to get to know a library (Yewdall, 2007). Having thousands of sounds collected has relatively simplified sound design; however, sound libraries on their own are nothing but an agglomeration of samples. Excellent editors can create very realistic and convincing sounds, but they will never sound as authentic as custom-recorded ones.
Despite new contributions to this concept being largely theoretical, a few implementations, such as the FoleyAutomatic developed by Kees van den Doel, Paul G. Kry and Dinesh K. Pai, have proven to deliver high-quality synthetic sound. The FoleyAutomatic is composed of a dynamics simulator, a graphics renderer and an audio modeller. Interactive audio depends upon world events whose order and timing are not usually pre-determined. According to Farnell, the common principle that makes audio interactive is the need for user input. In an attempt to represent emotional qualities, sounds need to adapt to match the mood of the user (Farnell, 2007). This project is based on Gaver's foundational analysis and synthesis of sounds, which involves an iterative process of analysing recorded material and synthesising a duplicate on the basis of that analysis. As described by Gaver, the criterion for a sound texture is that it conveys information about a given aspect of the event, as opposed to being perceptibly identical to the original sound (Gaver, 1993). Nicolas Saint-Arnaud defined a sound texture in terms of constant long-term characteristics and attention span: a sound texture should exhibit similar characteristics over time. It can have local structure and randomness, but the characteristics of the fine structure must remain constant on the large scale. A sound texture is characterised by its sustain; the attention span is the maximum time between events before they become distinct, and high-level characteristics must be exposed within an attention span of a few seconds (Saint-Arnaud, 1995). Different studies have broadly approached the question of how to perform sound segmentation in order to create a sonic event
that resembles the original. However, no up-to-date applications for producing sound textures are available, and the practice is still based on manually editing recorded sound material. An increasing number of analysis and synthesis methods for sound textures have been formulated in the past few years, at a notable intersection of fields such as signal analysis, sound synthesis modelling, information retrieval and computer graphics (Strobl, Eckel and Rocchesso, 2006). In the context of footstep modelling, granular synthesis presents arguably the best approach; this research therefore studies the principles of granular synthesis in an attempt to collect information that could lead to a better-structured and more concise sound model.
development of time-frequency analysis, and set the starting point for granular synthesis. Roads, who implemented granular sound processing in the digital domain, has also made several contributions. In his book Microsound, he stated that sound can be considered as a succession of frames passing by at a rate too fast to be heard as discrete events; sounds can be broken down into a succession of events on a smaller time scale (Roads, 2001). For the purpose of this research project, the description provided by Gabor, with a slight variation on the pure Gaussian curve (see figure 2), will be adopted (Farnell, 2010). A Tukey envelope, also known as the cosine-tapered window, will be used; this envelope attempts to smoothly set the waveform to zero at the boundaries, evolving from a rectangular window towards a Hann envelope (Harris, 1978). It is useful to briefly consider the principles of granular synthesis and how these affect audio. According to Roads (2001), a micro-acoustic event contains a waveform, typically between one thousandth of a second and one tenth of a second long, shaped by an amplitude envelope. The components of any grain of sound approach the minimum perceivable time for duration, frequency and amplitude, creating both time- and frequency-domain information. By combining grains over time, sonic atmospheres are created. However, granular synthesis requires a large amount of control data, which is usually specified by the user in global terms, leaving the synthesis algorithm to fill in the details.
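To make the cosine-tapered envelope concrete, here is a minimal sketch of a Tukey window in pure Python. The function name and the `alpha` parameter are my own; the formula follows the standard textbook definition rather than any code from this project.

```python
import math

def tukey_window(n, alpha=0.5):
    """Cosine-tapered (Tukey) window of length n.
    alpha=0 gives a rectangular window, alpha=1 a Hann window;
    intermediate values taper only the outer alpha/2 fractions,
    smoothly forcing the grain waveform to zero at its edges."""
    w = []
    for i in range(n):
        x = i / (n - 1)          # normalised position in [0, 1]
        if x < alpha / 2:        # rising taper
            w.append(0.5 * (1 + math.cos(math.pi * (2 * x / alpha - 1))))
        elif x > 1 - alpha / 2:  # falling taper
            w.append(0.5 * (1 + math.cos(math.pi * (2 * x / alpha - 2 / alpha + 1))))
        else:                    # flat middle section
            w.append(1.0)
    return w
```

Multiplying each sampled grain by such a window removes the level mismatch at grain boundaries that would otherwise produce clicking.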
Gabor (1946) observed that any signal could be expanded in terms of elementary acoustical quanta by a process that includes time analysis. Grain envelopes and durations vary in a frequency-dependent manner. However, it is the waveform within the grain which is the most important parameter, as it can vary from grain to grain or be a fixed wave throughout the grain's duration. Early implementations pointed out the biggest flaw of time granulation: a constant level mismatch at the beginning and end of every sampled grain, creating a micro-transient between grains and thus resulting in a periodic clicking sound. More recent work has shown that overlapping grain envelopes creates a seamless cross-fade between them (Jones and Parks, 1988). A great deal of generative audio content has been created and extensively developed using the principles of acoustical quanta, allowing sound designers to easily sample, synthesise and shape audio content, producing complex but controllable sounds with relatively small Central Processing Unit (CPU) usage. According to Roads, a grain generator is a basic digital synthesis instrument, which consists of a wavetable whose amplitude is controlled by a Gaussian envelope. In this project, the global organisation of the grains will follow an asynchronous system, which means that the grains will be encapsulated in regions or clouds controlled by a stochastic or chaotic algorithm.
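The asynchronous cloud organisation described above can be sketched as follows. This is an illustrative toy, not the Pure Data patch itself; the sample rate, parameter ranges and uniform scattering scheme are assumptions chosen only for demonstration.

```python
import math, random

SR = 8000  # assumed sample rate (Hz); kept low so the sketch runs quickly

def grain(freq, dur_ms):
    """One grain: a sine waveform shaped by a Gaussian amplitude envelope."""
    n = int(SR * dur_ms / 1000)
    mid, sigma = (n - 1) / 2, n / 6
    return [math.exp(-0.5 * ((i - mid) / sigma) ** 2) *
            math.sin(2 * math.pi * freq * i / SR) for i in range(n)]

def cloud(duration_ms, density, freq_band=(200, 2000), dur_range=(10, 50), seed=1):
    """Asynchronous cloud: grains scattered at random onsets, overlap-added.
    density is grains per second; frequency and duration are drawn
    uniformly from the given ranges (a simple stochastic scattering)."""
    rng = random.Random(seed)
    out = [0.0] * int(SR * duration_ms / 1000)
    n_grains = int(density * duration_ms / 1000)
    for _ in range(n_grains):
        g = grain(rng.uniform(*freq_band), rng.uniform(*dur_range))
        start = rng.randrange(max(1, len(out) - len(g)))
        for i, s in enumerate(g):  # overlap-add into the output buffer
            out[start + i] += s
    return out
```

The overlap-add of Gaussian-enveloped grains is what avoids the boundary clicks of raw time granulation.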
according to various parameters. Dynamic stochastic synthesis is a concept that has existed for the last fifty years; composers such as Xenakis have speculated about the possibility of synthesising completely new sonic waveforms on the basis of probability (Harley, 2004). Xenakis's proposals for the usual method of sound synthesis take the form of five different strategies (Roads, 1996):
1. The direct use of probability distributions such as Gaussian and exponential.
2. Combining probability functions through multiplication.
3. Combining probability functions through addition (over time).
4. Using random variables of amplitude and time as functions of other variables.
5. Going to and fro between events using variables.
Roads describes how the user could control the grain cloud by adjusting certain parameters; these include (Roads, 2001):
1. The start time and duration of the cloud.
2. The grains' duration.
3. The density of grains per second.
4. The frequency band of the cloud.
5. The amplitude envelope of the cloud.
6. The spatial dispersion of the grains.
All these considerations will be tested and further explained in the coming subheadings, where the effects of different grain durations, densities and irregularities will be examined in more detail.
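As an illustration of the first strategy (the direct use of probability distributions), the sketch below draws grain onset times from an exponential distribution and grain amplitudes from a Gaussian. The function names and parameter values are illustrative assumptions, not taken from this project's patch.

```python
import random

def stochastic_onsets(density, total_ms, seed=7):
    """Draw inter-onset intervals from an exponential distribution whose
    mean is 1000/density ms, i.e. a Poisson stream of grain events."""
    rng = random.Random(seed)
    t, onsets = 0.0, []
    while True:
        t += rng.expovariate(density / 1000.0)  # mean gap = 1000/density ms
        if t >= total_ms:
            return onsets
        onsets.append(t)

def gaussian_amplitudes(onsets, mean=0.5, sd=0.15, seed=7):
    """One Gaussian-distributed amplitude per grain, clamped to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, sd))) for _ in onsets]
```

Swapping the distributions, or combining them by multiplication or addition, implements the other strategies in the list above.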
audio, which suggests that what is of greatest importance in procedural audio is the meaning we give to the input, internal states and output of the systems. Having taken all this into account, if programming is only a means by which one creates meaningful sound, where does the conflict lie? The problem with procedural audio is that there aren't any sets which contain the sound of a specific object, and if there are, there is no way of searching for them. Farnell strongly believes that a better approach to producing sound requires more traditional mathematical approaches based on engineering and physics (Farnell, 2007). However, dynamically generated sound is not the answer to all these problems; there are plenty of areas where it fails to replace recorded sound, such as dialogue and music scores. Even though methods for research and development have been established, practical issues continue to affect the realism of dynamically generated sound. Sound designers, who have adapted their skills and learned new tools, are in the process of finding an equilibrium between data and procedural models, which is neither a fast nor a complete process. Perhaps one of the greatest disadvantages of generated audio is that it still cannot encapsulate the significant sounds of life. Post-production sound effects seem to fall into the psychological rather than the technical category; in most cases, they reveal through sound the acoustic landscape in which we live. Associations of everyday sound play a decisive part in the language of sound imagery, but they can easily be confused. One of the reasons for this is that we often see without hearing (Balázs, 1949). According to Béla Balázs, the Hungarian-Jewish film critic, there is a very considerable difference between our visual and acoustic education. We are far more used to visual forms than sound
forms; this is because we have become accustomed to seeing first and then hearing, making it rather difficult to draw conclusions about a concrete object just by listening to it. The relationship between visuals and sound will be further explained in section 3.2.2.1, Statistical Analysis. Sample-based audio has proven successful because its principle is to represent our acoustic world; however, it is an impractical method, as it fails to change in accordance with the visible source. On the other hand, a single procedural structure could accurately replace an entire sound library; the problem does not lie in its principles but in the fact that it attempts to represent motifs associated with various situations in film rather than our acoustic world. Having generated a great deal of sample-based audio, production companies have drastically changed our perception of sound through film, associating melodies and sounds with specific objects or situations and making it particularly difficult for new content to take over.
2.1.6 Summary.
This literature review has traced the background of dynamically generated audio and analysed its evolution in parallel with developments in technology. It is clear that whilst procedural audio has many obvious advantages, its acceptance has been lower than expected, and various reasons have been suggested to explain why this might be the case. This section has also briefly covered footstep modelling, followed by a critical analysis of the benefits and challenges of implementing procedural audio in post-production. The following section will present the methodology used in this study.
2.2 Methodology.
2.2.1 Introduction.
In this chapter, the objectives, parameters and procedures used in this research project are explained, especially those involved in developing dynamically generated sound; the process for creating a footstep-modelling analysis is described in detail.
2.2.2 Objectives.
The general objectives of this research project are:
1. To review the existing knowledge on sound textures and footstep modelling.
2. To develop a method for the creation of dynamic sound textures.
3. To incorporate this method into footstep sound modelling.
2.2.3 Parameters.
According to Yonathan Bard, models are designed to explain the relationships between quantities that can be measured independently (Bard, 1974). To understand these relationships, a set of parameters needs to be introduced.
The numbers shown in Figure 3 are expressed in milliseconds and are used to mark the cut-off points between events. Significant sub-events can sometimes be found within the events; for this reason the sample is normalised, which makes the peak-to-peak transient recognition much more effective. This process, however, is strictly for event recognition and is not used as part of any playback; thus the signal-to-noise ratio is never raised. Each particle noise event can be pitch-shifted, reversed, stretched and smoothed. In his analysis
of walking sounds, Cook suggested that in order to gel the sonic events, a short, exponentially decaying noise burst should be added, which has proven to be an exceptional addition to this algorithm. According to Roads, time appears to be reversible at the quantum level, meaning that grains or events can be reversed in time. Moreover, if the grain envelope is symmetrical, the reversed event should sound exactly the same. In Pure Data (PD), this was easily achieved by simply reversing the output list, which turned out to be a success as it gave the sound texture a time-reversible feature. However, as the overall amplitudes of the synthesised samples were not symmetric, it was impossible to demonstrate that the waveform of a grain and its reversed form were identical. Figure 4 shows the envelope analysis process.
2.2.3.3 Footstep-modelling.
This section describes how particles are extracted based on Physically Informed Stochastic Event Modelling (PhISEM). According to Cook, who has extensively researched this area, the parameterisation of walking sounds should involve interaction, preferably provoked by friction or pressure from the feet. A stochastic approach, a non-deterministic sequence of random variables, models the probability that particles will make noise; the sound probability is constant at each time step (Cook, 2002). Studies have shown the human ability to perceive source characteristics of a natural auditory event. From various analyses applied to walking sounds, a relationship between auditory events and acoustic structure was found. This study considered sounds of walking and running footstep sequences on different textures. Textures such as gravel, snow and grass were chosen, motivated by the assumption that a noisy and rich sound spectrum will still be perceived by the ear as a natural sound. Studies carried out by Roberto Bresin, who has extensively studied new models for sound control, showed how a double support is created when both feet are on the ground at the same time, suggesting there are no silent intervals between two adjacent steps. However, not specifying a time constraint between two particular events will blend them into a unison texture; therefore an attention span has to be created between steps in order to perceive them as separate events (Saint-Arnaud, 1995). According to Bresin, legato and staccato can be associated with walking and running respectively. Some of his recent work has reported a strong connection between motion and music performance.
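A toy PhISEM-style generator along the lines of Cook's description can be sketched as follows: each time step carries a constant collision probability, and each collision feeds an exponentially decaying noise envelope (the "gel" between sonic events). All parameter values here are illustrative assumptions.

```python
import math, random

def phisem_step_sound(n_samples=4000, sr=8000, p_collision=0.01,
                      decay_ms=8.0, seed=3):
    """PhISEM-style sketch: at every sample each particle system has a
    fixed probability of a collision; collisions excite a noise envelope
    that decays exponentially, yielding gravel-like particle noise."""
    rng = random.Random(seed)
    coeff = math.exp(-1.0 / (sr * decay_ms / 1000.0))  # per-sample decay
    env, out = 0.0, []
    for _ in range(n_samples):
        if rng.random() < p_collision:   # stochastic particle collision
            env += 1.0                   # excite the noise envelope
        env *= coeff                     # exponential decay
        out.append(env * rng.uniform(-1.0, 1.0))  # enveloped noise burst
    return out
```

Raising `p_collision` and shortening `decay_ms` shifts the texture from sparse gravel crunches towards a dense, sustained wash.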
Having stated several parameters that directly influence walking sounds, it is evident that even large libraries of pre-recorded sounds do not contain every possible scenario, which greatly limits the sonic possibilities available to the editor.
The swing phase consists of an initial swing, a mid-swing and a terminal swing (Porter, 2007). All these phases exert different forces, making it incredibly hard to translate all of these micro-movements into sound. Farnell has proposed analysing the gait phases not as individual events but as a distribution of forces. As a result, three phases become apparent, as shown in figure 7 (Farnell, 2010):
1. The Contact Phase: the heel makes contact with the ground and the ankle rotates the foot.
2. The Mid-stance Phase: the body's weight is shifted onto the outer tarsal.
3. The Propulsive Phase: the foot rolls along the ground, ending up on its toes.
(Source: http://naturalrunningcenter.com/2012/06/21/walking-vs-running-gaits/) Ideally, each gait cycle would generate identical GRF distributions; however, they can change significantly as the walking pace and the level of the ground change. If this weren't the case, two complete footsteps would be sufficient to generate a walking pattern. This introduces another variable, the movement of the body, which fluctuates above and below the sum of the left and right feet's GRF. Andy J. Farnell explains in his book Designing Sound the three different modes of movement (Farnell, 2010):
1. Creeping: minimises pressure changes, which diminishes the sound.
2. Walking: maximises locomotion while minimising energy expenditure.
3. Running: accelerates locomotion.
Figure 8 exemplifies the Ground Reaction Force distribution of a gait phase, where the body's weight is transferred onto the heel. Sometimes, before the weight is completely transferred, there is a transient force experienced just before the load response; surprisingly, this force exceeds the normal standing force. The weight's distribution between the heel coming down and the toe pushing off evens out just before the propulsive phase, where the body's weight is entirely on the feet.
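The two-humped GRF distribution described above can be caricatured as two overlapping bumps whose peaks slightly exceed standing weight, with a mid-stance dip between them. The curve below is a toy model for intuition only; the coefficients are made up, not measured gait data.

```python
import math

def grf(t, weight=1.0):
    """Toy vertical GRF over one stance phase, t in [0, 1], expressed as
    a fraction of body weight: a heel-contact hump and a propulsive
    (toe-off) hump with a mid-stance dip -- the familiar M shape. The
    hump peaks slightly exceed standing weight, as the transient load
    response does."""
    heel = 1.15 * math.exp(-((t - 0.25) / 0.12) ** 2)  # contact phase
    toe = 1.10 * math.exp(-((t - 0.75) / 0.12) ** 2)   # propulsive phase
    return weight * (heel + toe)

# Sample the stance phase at 101 points, e.g. to drive a grain density
curve = [grf(i / 100) for i in range(101)]
```

In a footstep synthesiser such a curve could modulate grain density or amplitude across the step, rather than triggering one undifferentiated burst per footfall.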
2.2.4 Procedures.
This section describes the instruments and architecture involved in the creation of this research project. It aims to establish an efficient workflow that could later be applied to future work. This section will also explain how diverse theories and models will be tested and how relevant data will be collected.
Figure 9: PD Environment.
2.2.4.2 Arduino.
In order to establish more interactive communication between the user and the patch, a piezo-resistive force sensor was implemented (see figure 13). The prototyping platform Arduino UNO creates a link between PD and the pressure sensor. When pressure is applied to the sensor, Arduino reads the input of the analogue pin, which ranges from 0 to 1023, and then transmits the value to the comport object (at 9600 baud) in PD. Figure 11 illustrates this process.
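Once the serial values arrive, they still have to be scaled before driving the synthesis. A minimal sketch of mapping the 10-bit analogue reading (0 to 1023) to a normalised force, with a noise floor, is shown below; the threshold value and function name are assumptions for illustration, not the ones used in the actual patch.

```python
def adc_to_force(raw, threshold=40):
    """Map a 10-bit Arduino analogue reading (0-1023) from the
    piezo-resistive sensor to a normalised force in [0, 1].
    Readings below `threshold` are treated as sensor noise (no step)."""
    raw = max(0, min(1023, raw))  # clamp to the valid ADC range
    if raw < threshold:
        return 0.0
    return (raw - threshold) / (1023 - threshold)
```

The same clamping and rescaling can be done in PD with [clip] and arithmetic objects after [comport].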
2.2.5 Architecture.
This footstep model has been inspired by Perry Cook's and Andy J. Farnell's approaches to walking sounds. Their investigations into parametrised synthesis, especially granularity, have been of great help. Figure 12 illustrates the signal flow of this prototype. Based on Roads's idea of user control, this patch routes all the information to a common cloud (see figure 10) where the user can easily modify the dynamics of the grain, as well as the sensitivity of the feet sensors. All the parameters mentioned in section 2.1.4 (see page 11) were taken into account when designing this patch. The sensors define the start time and duration of this process (1). The grain duration is specified by the option smooth, which divides its input into a 100 ms window and adds it to the transient's size (2). Similarly, the density of grains per second (grains/1000 ms) is specified by the option grains (3). Two band-pass filters determine the frequency band of the cloud (4). An amplitude envelope and a freeverb~ (a custom PD reverb) have also been incorporated, giving the user the option of custom-shaping the signal before it
Figure 12: Architecture.
In order to accurately transcribe and digitise the sensors' information, a split-phase and a polynomial curve have been incorporated. The split-phase converts the input given by the sensors into a signal that can later be scanned by the phasor~ object in PD. It combines both feet and creates a time constraint between one and the other, defining an attention span that leads the ear to perceive both inputs as separate events (Saint-Arnaud, 1995). The polynomial curve is defined by the equation (Farnell, 2010):
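Farnell's actual polynomial is not reproduced in this copy of the text, so the sketch below substitutes a generic cubic "smoothstep" curve purely to illustrate the idea of remapping a linear 0-1 phase ramp (as scanned by phasor~) into a smoother force transition. It is explicitly not the dissertation's equation.

```python
def shape_phase(phase):
    """Illustrative only: a cubic smoothstep, 3p^2 - 2p^3, stands in for
    the (unreproduced) polynomial. It eases the 0-1 phase ramp in and
    out, so the mapped force has zero slope at both ends of the step
    instead of jumping linearly."""
    p = max(0.0, min(1.0, phase))  # clamp phase to [0, 1]
    return 3 * p * p - 2 * p * p * p
```

Any polynomial with the same endpoint behaviour could be substituted; the point is that the curve, not the raw ramp, drives the synthesis.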
In order to evaluate the accuracy and precision of these methods, external feedback will be collected; this will be explained further in the following chapter.
2.2.6 Summary.
This methodology has extensively analysed the existing knowledge on sound textures and granular synthesis in order to develop a method for the creation of dynamically generated textures (see page 16). It then integrated this model into a footstep model created from the behavioural analysis conducted in section 2.2.3.3 (see page 19). It has also described the architecture of the prototype designed as part of this research. The following chapter will present the evaluation process used for this project.
CHAPTER 3 EVALUATION.
3.1 Introduction.
The evaluation process presented in this study uses a mixed-methods design. According to John W. Creswell, analysing both quantitative and qualitative data helps to understand the research problem thoroughly (Creswell, 2002). A mixed-methods design is based upon pragmatic statements, which accept truth as a normative argument. Interesting opinions have been given regarding mixed methods; however, the issue of distinguishing between aesthetic assumptions has not yet been addressed (Sale, Lohfeld and Brazil, 2002). This research project will use a sequential explanatory mixed-methods design; according to Creswell (2002), this is the most straightforward of the six major mixed-methods approaches, which is an advantage as it organises data more efficiently. This method collects and analyses quantitative data and then goes on to collect and analyse qualitative data. Kenneth R. Howe, an educational researcher, stated that researchers should forge ahead only with what works. Following this statement, this study introduced three questions in order to structure the design of this research project, concerning Priority, Implementation and Integration (Creswell, Plano Clark, Guttman and Hanson, 2003).
a) Which of these methods, quantitative or qualitative, will be emphasised in this study?
b) Will data collection come in sequence or in chronological stages?
c) How will this data be integrated?
Special priority will be given to quantitative data, leaving the qualitative results to support those obtained in the quantitative stage. For the purposes of efficiency, data was collected and integrated in chronological stages, which offered a more comprehensive and broader view of the gleaned information.
glass objects were used. One of the purposes of this survey was to demonstrate how synchronised sound textures can fool the ear into thinking that real footsteps are being played. In order to achieve this, a total of ten clips containing a mixture of Foley, location recordings and generated sounds was played to an audience. A non-probability sampling approach was used for this research project, as the purpose of this study is not to infer from the sample to the general population but to add to the knowledge of this study.
students (Audio and Film students). In total, thirty individuals were given surveys. Based on Howe's statement (see 3.1), the goal of the surveys was to identify which sound textures participants believed to be real. Two independent variables were introduced, Recorded and Generated Sounds, which were played at random to the participants. As mentioned above, a total of ten short clips containing five different sound textures was prepared for this survey. The first group of participants surveyed were mostly Audio students; a brief explanation of attention span and the layout of the audio was given prior to the survey. At first, participants were asked to listen to just the audio of the short clips. A fifteen-second gap between clips was given, not only for them to draw their own conclusions (as an informal conversational interview) but also to allow their short-term memory to release the sonic information they had gathered. According to George A. Miller, the duration of our short-term memory seems to be between fifteen and thirty seconds (Miller, 1956). This way, the average audio-visual span disappears from one's mind, allowing new data to be processed clearly. The second part of the survey combined both picture and sound. The structure of the survey (see Appendix A) contained three basic questions, which were aimed at investigating the participants' relation to sound libraries. Both questions, 'How would you rate the content?' and 'What would you look for in sound libraries?', were an excellent start, leading to an open debate conducted after the survey. This offered even more data for discussion and research.
Figure 15: Question 2 (40% of participants rated the content of sound libraries as 'poor').
However, 'poor' is a very vague statement. Are the contents of these libraries poor in sonic quality? Or are they poor because they do not meet the users' needs? In order to clarify this concept, a follow-up question was introduced: 'What do you look for in sound libraries?' As shown in figure 16, Audio and Film students look for very different and specific material: 67% of the Audio students surveyed specifically looked for ambience sounds, whereas 57% of the Film students surveyed looked for Foley sounds. Many hypotheses can be drawn from this result. The perception of sound in film goes far beyond the pure physics of the sonic spectrum. Throughout history, film producers have chosen to artificially construct the sound of their films (Gorbman, 1976). Advances in technology have expanded the creative possibilities of filmmakers and sound designers; the difference lies in how these sonic experiences are created. Based on the data collected, one could assume that film students have an internal approach to sound (Chion, 1990). Physiological sounds such as breathing and moans, or more subjective sounds such as a memory or a mental voice, can easily be achieved by using Foley and ADR (Automated Dialogue Replacement) practices, which might explain why their main concern when browsing through a sound library is Foley sounds. On the other hand, audio students seek to describe the soundscape of the picture, either by recreating the sonic characteristics of the environment or by artificially creating a completely new sonic environment.
Figure 16: Question 3 (67% of Audio students looked for ambience sounds; 57% of Film students looked for Foley sounds).
Another question arises from these two hypotheses: how is the quality of such libraries perceived if their contents are listened to as part of a group of sounds? This is a very important question, as it strives to understand our perception of artificially constructed sound. Being able to create generative audio means nothing if it does not work in the context it was designed for. In order to understand this matter, the aforementioned footstep sounds (see 2.2.3.3) were played along with ambience sounds as well as different sound effects and dynamics. The results will be analysed in the next section.
population to be investigated is too large. Participants were therefore selected based on their accessibility and proximity to the researcher. Although convenience sampling does not offer any guarantee of a representative sample, it collects basic data that could later be analysed or used as a pilot study (Gravetter, 2011, p. 151). In order to ensure that each variable was evaluated properly, they were examined one at a time, and a series of visual displays was created to help explain the relationships between the variables examined in this study. A total of ten short clips was presented to the participants to answer the question 'Where do you think unrealistic sounds have been placed?' Participants were given a scale from one to five to rate the clips' realism. The films used for this experiment were: Terminator 2: Judgment Day (1991), mixed by the American sound designer Gary Rydstrom; Pulp Fiction (1994), mixed by David Bartlett; Mon Oncle (1958), produced by Jacques Tati; and Here (2013), produced as part of my portfolio. Clips one, three, four and five were re-mixed in order to introduce the footsteps generated by the patch developed. The purpose of this experiment was to determine which combination of sounds seemed the most realistic to the participants. The results of this experiment are shown in Appendix B. This research study conducted a T-Test and a Chi-square test. The aim was to understand whether there was a significant difference between how participants rated the clips with generated sounds and how they rated the clips with recorded sounds. As noted in section 1.2 (see page 3), this dissertation aims to add to the research in Foley practices by using generative audio; it is not, therefore, a comparative analysis between recorded and generative audio. A combination of generated footsteps was presented to the participants in clips 1,
3, 4 and 5. Table 1 shows the average quality that the participants gave to generated and recorded audio respectively.

Table 1: Average quality ratings per participant.

PARTICIPANT   GENERATED AUDIO   RECORDED AUDIO
1             3.25              2.16
2             4                 2.5
3             2.5               3.5
4             3.75              3.5
5             1                 2.16
6             4.5               3.5
7             3.25              3.16
8             3.75              2.83
9             2.5               2.5
10            2.5               2.3
11            2.5               2.5
12            3                 2
13            3.25              2.5
14            2.75              4
15            4                 2.75
16            2.3               3
17            2.16              4.5
18            3.5               3.25
19            2.16              3.75
20            2.3               2.75
21            3                 2.5
22            2                 2.3
23            3.25              2.16
24            3                 3.5
25            2.16              3
26            3.5               3.25
27            3                 2.75
28            3                 3
29            3.25              2
30            4                 3
AVERAGE       2.969333333       2.885666667
STDEV.        0.75013991        0.619625489
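The per-participant columns of Table 1 appear to be condition means over the ten clips: the generated-audio values are quarters (means of the four re-mixed clips 1, 3, 4 and 5) and the recorded-audio values are sixths (means of the remaining six clips). A minimal sketch of that aggregation follows; the helper name `condition_means` and the example participant's ratings are invented for illustration and do not appear in the dissertation.

```python
# Split a participant's ten clip ratings into the two conditions
# used in Table 1. Clips 1, 3, 4 and 5 were re-mixed with
# generated footsteps; the other six clips kept recorded audio.
GENERATED_CLIPS = {1, 3, 4, 5}

def condition_means(ratings):
    """ratings: dict mapping clip number (1-10) to a 1-5 score."""
    gen = [v for k, v in ratings.items() if k in GENERATED_CLIPS]
    rec = [v for k, v in ratings.items() if k not in GENERATED_CLIPS]
    return sum(gen) / len(gen), sum(rec) / len(rec)

# Hypothetical participant: these scores are illustrative only.
example = {1: 3, 2: 2, 3: 4, 4: 3, 5: 3, 6: 2, 7: 3, 8: 2, 9: 3, 10: 2}
gen_mean, rec_mean = condition_means(example)
print(round(gen_mean, 2), round(rec_mean, 2))  # 3.25 2.33
```

Averaging over four clips explains why the generated column contains values such as 3.25 and 2.75, while averaging over six explains the recurring 2.16 (13/6) and 2.83 (17/6) in the recorded column.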
As seen in Table 1, there is no clear difference between the perceived quality of generated and recorded audio: the standard deviation values show that the average ratings of the two groups overlap. To assess these values critically, a T-Test was conducted, aimed at understanding how likely these differences were to be reliable.
3.2.2.2 T-Test
Null Hypothesis H0 (GA = RA): there is no discernible sonic difference between recorded audio and generated audio. Alternative Hypothesis H1 (GA < RA): recorded audio possesses better sonic qualities; therefore, there is a significant difference between recorded audio and generated audio. Alternative Hypothesis H2 (GA > RA): generated audio possesses better sonic qualities; therefore, there is a significant difference between recorded audio and generated audio. All data was computed using Microsoft Excel (figure 17). Additionally, this set of results was compared with those obtained at www.graphpad.com (see Appendix C), from which this research concluded that the two-tailed probability (p) value of the data equalled 0.639. This probability value does not provide enough evidence to reject the Null Hypothesis (H0), as there is no evidence of a significant difference between recorded and generated audio. However, this does not mean that the Null Hypothesis is true. Two conclusions can be drawn from this test:
First, the population surveyed could not discern between recorded and generated audio. Second, an average of 3 (Good Quality) was given to the clips containing generative audio (see Appendix A).
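The Excel and GraphPad computations are not reproduced in the text, but the reported p value can be approximated directly from the Table 1 columns using only the standard library. The sketch below computes an unequal-variance two-sample t statistic and, since Student's t with roughly 58 degrees of freedom is very close to the standard normal, uses the normal distribution for the two-tailed p value. This is an approximation of the test described above, not the author's exact Excel procedure.

```python
import statistics
from math import sqrt

# Per-participant mean ratings from Table 1.
generated = [3.25, 4, 2.5, 3.75, 1, 4.5, 3.25, 3.75, 2.5, 2.5,
             2.5, 3, 3.25, 2.75, 4, 2.3, 2.16, 3.5, 2.16, 2.3,
             3, 2, 3.25, 3, 2.16, 3.5, 3, 3, 3.25, 4]
recorded = [2.16, 2.5, 3.5, 3.5, 2.16, 3.5, 3.16, 2.83, 2.5, 2.3,
            2.5, 2, 2.5, 4, 2.75, 3, 4.5, 3.25, 3.75, 2.75,
            2.5, 2.3, 2.16, 3.5, 3, 3.25, 2.75, 3, 2, 3]

n1, n2 = len(generated), len(recorded)
m1, m2 = statistics.mean(generated), statistics.mean(recorded)
v1, v2 = statistics.variance(generated), statistics.variance(recorded)

# Unequal-variance (Welch) t statistic.
t = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)

# Two-tailed p value via the standard normal; with ~58 degrees of
# freedom the normal curve is a close stand-in for Student's t.
p = 2 * statistics.NormalDist().cdf(-abs(t))
print(round(t, 2), round(p, 2))  # t ~ 0.47, p ~ 0.64, near the reported 0.639
```

A library routine such as scipy's `ttest_ind` would give the exact t-distribution figure, which the dissertation reports as p = 0.639; either way, p is far above 0.05, so H0 cannot be rejected.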
3.2.2.3 Chi-Square
Null Hypothesis H0: (As = Fs) There is no difference between how Audio and Film students perceive audio quality.
Table 2: Chi-Square.
EXPECTED VALUES:

DEPARTMENT    GENERATED AUDIO   RECORDED AUDIO   GRAND TOTAL
AUDIO         2.987421501       2.903245166      5.890666667
FILM          2.951245166       2.868088168      5.819333333
GRAND TOTAL   5.938666667       5.771333333      11.71
Table 3: Expected Values. The p value obtained from Excel was 0.895, which means that this project cannot reject the Null Hypothesis; there is therefore no evidence of a difference in how audio and film students perceive sound. Moreover, the individual Chi-square contributions for Audio and Film students were 0.008607859 and 0.008713374 respectively. Such small contributions show that the observed ratings were very close to the expected values, which further supports the decision not to reject this hypothesis.
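The chi-square figures quoted above are internally consistent and can be checked with nothing more than the standard library: the two contributions sum to the chi-square statistic, and for one degree of freedom (a 2x2 table) the chi-square survival function has the closed form erfc(sqrt(x/2)). A quick verification sketch, assuming (as the text implies) that these two contributions make up the whole statistic:

```python
from math import erfc, sqrt

# Per-department chi-square contributions reported in section 3.2.2.3.
contributions = [0.008607859, 0.008713374]
chi2_stat = sum(contributions)

# A 2x2 table has one degree of freedom, and the chi-square survival
# function with df = 1 has the closed form erfc(sqrt(x / 2)).
p = erfc(sqrt(chi2_stat / 2))
print(round(chi2_stat, 4), round(p, 3))  # 0.0173 0.895
```

The recovered p value matches the 0.895 reported from Excel, confirming that the quoted contributions and the quoted p value describe the same test.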
had the opportunity to arrange a face-to-face interview with Andy Farnell. This fifteen-minute in-depth interview is available to listen to online at www.juliantellez.com/interactiveaudio/Farnell.wav. The purpose of this interview was to gain further knowledge of the efficiency, design and implementation of generative audio. Five conversational questions were put to Farnell; not only did he give a clear insight into all the aforementioned topics, but he also shared his perspective on the needs of audio and film. To support the results obtained by the survey, a number of interviews were conducted. The structure of these standardised, open-ended interviews included five questions whose content was grounded in the results of the statistical analysis extracted from the survey. The participants, A. J. Farnell and Gillian McIver, a Canadian filmmaker, writer and visual artist, were interviewed using a standardised interview approach to ensure that the same general areas of information were collected from both of them. Additionally, Paul Groom, Alessandro Ugo and Daria Fissoun (film specialists) were also contacted. As described by Sharan B. Merriam (Merriam, 1998), in regards to qualitative data, collection and analysis occurred simultaneously. According to McNamara (McNamara, 2008), there is potentially a lack of consistency in the way questions are posed, meaning that respondents may or may not be answering the same questions. For this reason, the remaining interviews were conducted via e-mail, not only to ensure consistency between them but also to make it easier for the participants to analyse the questions, allowing them to contribute as much detailed information as they desired.
examples from Farnell's textbook implementing SuperCollider instead of PD. Farnell stresses that although the implementation is exchangeable, there is still a huge gap between the design and the user's implementation. Physically controlled implementation, as proposed by Farnell, is the best way to research this issue. In answer to the question "Do you think generative sound could potentially meet the needs of the film industry?", Farnell introduced a very interesting analogy, describing generative sound as the beginning of a more sophisticated approach to audio: "I think in the next ten years you will have CGA (Computer Generated Audio) in Hollywood. CGA is much more powerful than CGI (Computer Generated Imagery) because there is a spectrum where they can be mixed with traditional techniques. Most people won't know the difference between generated and recorded audio" (Farnell, 2013). Personally, I found this interview, and especially the aforementioned analogy, very inspiring. I believe that it is possible to restructure the post-production workflow by analysing and designing the sound of a particular location stage, so that one could use the sounds created by performers at any location.
3.3.2.2 e-Interviewing.
E-mail interviewing turned out to be more flexible, convenient and less obtrusive than a conventional interview. However, as it took much longer than the previous discussions and interviews, only the information provided by Gillian McIver will be analysed (Appendix D). The questions were phrased generically in order to elicit more objective answers. The rationale for this stems from a short discussion I had with some film
students in which they expressed discontent with audio, especially sound libraries. In answer to the question "Why is it that film-audio is secondary in the film industry?", McIver argued that the problem does not lie in the industry but in education, mentioning that there was a clear division between the two departments. If the problem lies in education, how can both parties overcome difficulties such as correct audio replacement and authentic sonic representation? Just as a DSP fills the gap between expertise and interaction, I believe there is a gap where the expertise of signal processing can meet production needs by means of interaction. When asked about the emphasis the film industry puts on the creation of sound technology, McIver replied: "Most do not think about it". Judging by this answer, one could conclude that if any sound technology aimed at the film industry were to be developed in the near future, it would have to be embedded and, more importantly, interactive and user-friendly.
CHAPTER 4 CONCLUSION.
The techniques used for the generation and control of grain signals were studied extensively throughout this research project. A special emphasis was placed on structuring a footstep model that enabled instant interaction between the user and the DSP. It encompassed some of the studies carried out by A. J. Farnell, P. R. Cook and R. Bresin. In adherence to these studies, a process of evaluation and testing was also conducted alongside the footstep method formulated in this research project. It was a compelling effort to promote generative audio in the post-production industry. The analysis of sound synthesis with procedural audio was reviewed in great detail, and different approaches to the creation of sound textures were highlighted. Consequently, it traced the evolution and development of generated audio in response to sound modelling. This was achieved by structuring the associations between everyday sounds and sound imagery. The criteria used for this project convey information that characterises an individual sound by the force that the body exerts upon it (Gaver, 1993). From the evidence given by the aforementioned authors, a study of the background and evolution of dynamically generated audio was compiled, outlining its advantages and drawbacks. A complete separation between contact objects and interaction was achieved. The main findings create an intersection between sound
synthesis (see section 2.1.3, p 8), signal analysis (see section 2.1.5, p 12) and user interaction (see section 2.2.5, p 26) (Strobl, Eckel and Rocchesso, 2006). Additionally, an evaluation phase was introduced, in which several statistical tests were conducted to corroborate the information stated (see section 3.2.2.1, p 35). As noted at the end of sections 3.3.2.1 and 3.3.2.2 (pp 43-44), sound technology has enormous potential, which will most certainly be explored in years to come. Recent advances have placed sound technology in a very prominent position, allowing for efficient interaction and productivity. As far as footstep-modelling goes, there are endless possibilities (in terms of sound textures) where further studies can be conducted. I have emphasised the importance of user interactivity throughout this research project. By adding GRF recognition, this study has addressed that issue, allowing the patch to identify the user's gait characteristics (see section 2.2.3.4, p 20). However, it is still a prototype, and some adjustments will be made in the near future. The principles of GRF apply to every mass on Earth; it would certainly be interesting to recreate any sound by simply extracting sound textures from the environment (Gaver, 1993). This piece of work is intended to promote the use of generated audio in the film industry. As discovered, there are numerous applications for these methods within the post-production sector. However, further research and study are necessary to make generated audio a standard practice.
APPENDICES.
Appendix A.
Survey.
29th May 2013, London, U.K.
Footstep synthesis.
Please take a moment to analyse the clips. When you're done, please answer the following questions: ABOUT YOU.
How would you rate the content of these libraries? Consistent high quality.
No.
ABOUT THE CLIPS. Please rate the clips on a scale from 1 to 5: (1) poor, (2) fair, (3) good, (4) very good, (5) outstanding.
CLIP 1 2 3 4 5 6 7 8 9 10
Appendix C.
Appendix D.
Appendix E.
Transient-representation patch:
REFERENCES.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Chion, M. (1990). Audio Vision. New Jersey: Columbia University Press.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: AK Peters, Ltd.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Gabor, D. (1946). Theory of Communication. Journal of the Institute of Electrical Engineers, 93 (III), 429-457.
Gabor, D. (1947). Acoustical Quanta and the Theory of Hearing. Nature, 591-594.
Gaver, W. (1993). How Do We Hear in the World? Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F. J. & Wallnau, L. B. (2011). Essentials of Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Thomson/Wadsworth.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Javarlainen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, TiK-111080 Seminar on Content Creation.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S. B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in Dichotic Listening: Affective Cues and the Influence of Instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport (2nd ed.). Missouri: Mosby.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound (4th ed.). Oxford: Focal Press.
BIBLIOGRAPHY.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Bresin, R., Friberg, A. & Dahl, S. (2001). Toward a New Model for Sound Control. Proceedings of the COST G-6 Conference on Digital Audio Effects.
Bresin, R. & Fontana, F. (2003). Physics-Based Sound Synthesis and Control: Crushing, Walking and Running by Crumpling Sounds. Proceedings of the XIV Colloquium on Musical Informatics.
Chion, M. (1990). Audio Vision. New Jersey: Columbia University Press.
Cook, P. (1999). Toward Physically-Informed Parametric Synthesis of Sound Effects. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: AK Peters, Ltd.
Cook, P. (2002). Modeling Bill's Gait: Analysis and Parametric Synthesis of Walking Sounds. Audio Engineering Society 22nd Conference. 1-3.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Dahl, S. (2000). The Playing of an Accent: Preliminary Observations from Temporal and Kinematic Analysis of Percussionists. Journal of New Music Research, 29 (3), 225-234.
Dannenberg, R. & Derenyi, I. (1998). Combining Instrument and Performance Music Synthesis. Carnegie Mellon University.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Forrester, M. (2006). Auditory Perception and Sound as Event: Theorising Sound Imagery in Psychology. Available: http://www.kent.ac.uk/arts/sound-journal/index.html. Last accessed 8th May 2013.
Gabor, D. (1946). Theory of Communication. Journal of the Institute of Electrical Engineers, 93 (III), 429-457.
Gabor, D. (1947). Acoustical Quanta and the Theory of Hearing. Nature, 591-594.
Gaver, W. (1993). How Do We Hear in the World? Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F. J. & Wallnau, L. B. (2011). Essentials of Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Thomson/Wadsworth.
Hahn, J., Geigel, J., Gritz, L., Takala, T. & Mishra, S. (1995). An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation, 6 (2), 109-129.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Howe, K.R. (1988). Against the Quantitative-Qualitative Incompatibility Thesis or Dogmas Die Hard. Educational Researcher.
Javarlainen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, TiK-111080 Seminar on Content Creation.
Jenkins, J. & Ellis, C. (2007). Using Ground Reaction Forces from Gait Analysis: Body Mass as a Weak Biometric. Fifth International Conference on Pervasive Computing.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
Lostchocolatelab. (2010). Audio Implementation Greats No 8: Procedural Audio Now. Available: http://designingsound.org/2010/09/audio-implementation-greats-8-procedural-audio-now/. Last accessed 8th May 2013.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S. B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Milicevic, M. (2008). Film Sound Beyond Reality: Subjective Sound in Narrative Cinema. Available: http://filmsound.org/articles/beyond.htm#pet5. Last accessed 8th May 2013.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in Dichotic Listening: Affective Cues and the Influence of Instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Nordahl, R., Serafin, S. & Turchet, L. (2009). Extraction of Ground Reaction Forces for Real-Time Synthesis of Walking Sounds. Proceedings of the Audio Mostly Conference.
Nordahl, R., Serafin, S. & Turchet, L. (2010). Sound Synthesis and Evaluation of Interactive Footsteps for Virtual Reality Applications.
O'Brien, J., Cook, P. & Essl, G. (2001). Synthesising Sounds from Physically Based Motion. Computer Graphics Proceedings, Annual Conference Series.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport (2nd ed.). Missouri: Mosby.
Roads, C. (1988). Introduction to Granular Synthesis. Computer Music Journal, 12 (2), 11-13.
Roads, C. (1996). The Computer Music Tutorial. Massachusetts: MIT Press. 338-342.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Robson, C. (2002). Real World Research: A Resource for Social Scientists and Practitioner-Researchers (2nd ed.). New Jersey: Wiley.
Rowe, R. (1993). Interactive Music Systems: Machine Listening and Composing. Cambridge: MIT Press.
Rowe, R. (1999). The Aesthetics of Interactive Music Systems. Contemporary Music Review, 18 (3), 83-87.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Strobl, G. (2007). Parametric Sound Texture Generator. Graz University, Styria.
Truax, B. (1993). Time-Shifting and Transposition of Sampled Sound with a Real-Time Granulation Technique. Proceedings of the International Computer Music Conference.
Turchet, L. & Serafin, S. (2011). A Preliminary Study on Sound Delivery Methods for Footstep Sounds. Proceedings of the 14th International Conference on Digital Audio Effects.
Turner, D. (2010). Qualitative Interview Design: A Practical Guide for Novice Investigators. The Qualitative Report, 15, 754-760.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound (4th ed.). Oxford: Focal Press.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.