
SCHOOL OF AUDIO ENGINEERING

Diploma in Audio Engineering

Course Notes

TABLE OF CONTENTS

Table of Contents.............................................................................................................................................. 2

AE01 – Sound Theory ...................................................................................................................................... 6

AE02 – Music Theory I .................................................................................................................................. 34

AE03 – Analog Tape Machines..................................................................................................................... 59

AE04 – The Decibel ....................................................................................................................................... 70

AE05 – Basic Electronics............................................................................................................................... 88

AE06 – Audio Lines and Patchbays............................................................................................................ 101

AE07 – Analog Mixing Consoles................................................................................................................ 114

AE08 – Signal Processors ............................................................................................................................ 135

AE09 – Microphones.................................................................................................................................... 160

AE11 – Digital Technology ......................................................................................................................... 168

AE12 – Computer Fundamentals................................................................................................................. 182

AE13 – Music Theory II............................................................................................................................... 197

AE14 – MIDI ................................................................................................................................................ 229

AE15 – Digital Recording Formats ............................................................................................................. 263

AE16 – Digital Audio Workstations ............................................................................................. 293

AE17 – Mastering for Audio and Multimedia ............................................................................................ 302

AE18 – Synchronization and Timecode...................................................................................................... 318

AE19 – Professional Recording Studios ....................................................................................... 332

AE20 – Audio Postproduction for Video .................................................................................................... 335

AE21 – Sound For TV and Film.................................................................................................................. 347

AE22 – Multimedia Overview..................................................................................................................... 366

AE23 – Internet Audio ................................................................................................................................. 379

AE24 – Loudspeakers and Amplifiers......................................................................................................... 405

AE25 – Live Sound Reinforcement............................................................................................................. 434

AE27 – Acoustical Climate.......................................................................................................................... 447

AE28 – 3D Sound ......................................................................................................................................... 487

AE29 – The Broadcast Concept................................................................................................................... 504

AE30 – Music Business................................................................................................................................ 517

AE01 – SOUND THEORY

Fundamentals of Sound

1. The Elements Of Communication

2. The Sound Pressure Wave

3. Speed of sound wave

3.1 Particle velocity:

4. Amplitude/Loudness/ Volume/Gain

5. Frequency

5.1 Frequency Spectrum

5.2 Pitch:

5.3 Wavelength:

5.4 Period

5.5 Phase

5.6 Phase Shift

6. Difference between musical sound and noise

6.1 Harmonic content:

6.2 Timbre:

6.3 Octaves

7. Waveform Types:

8. Wave shape

9. Acoustic Envelope:

THE HUMAN EAR

1. The Outer Ear

2. The Middle Ear

3. The Inner Ear

4. Neural Processing

5. The Ear and Frequency Perception

5.1 Critical Bandwidth and Beats

6. Frequency range and pressure sensitivity of the ear

7. Noise-induced hearing loss

7.1 A loss of hearing sensitivity

7.2 A loss of hearing acuity

8. Protecting your hearing

9. Perception of sound source direction

9.1 Interaural time difference (ITD)

9.2 Interaural intensity difference (IID)

9.3 Pinnae and head movement effects

9.4 The Haas effect

10. Ear Training

10.1 Listening to Music

10.2 Listening with Microphones

10.3 Listening in Foldback and Mixdown


AE01 – SOUND THEORY

Fundamentals of Sound

1. The Elements Of Communication

Communication: transfer of information from a source or stimulus through a medium to a
reception point. The medium through which the information travels can be air, water, space or
solid objects. Information that is carried through all natural media takes the form of waves -
repeating patterns that oscillate back and forth, e.g. light, sound, electricity, radio and TV
waves.

Stimulus: A medium must be stimulated in order for waves of information to be generated in
it. A stimulus produces energy, which radiates outwards from the source in all directions. The
sun and an electric light bulb produce light energy. A speaker, a vibrating guitar string or tuning
fork and the voice are sound sources, which produce sound energy waves.

Medium: A medium is something intermediate or in the middle. In an exchange of
communication the medium lies between the stimulus and the receptor. The medium transmits
the waves generated by the stimulus and delivers these waves to the receptor. In acoustic
sound transmission the primary medium is air; in electronic sound transmission the medium is
an electric circuit. Sound waves will not travel through the vacuum of space, although light will -
in space no-one can hear you scream.

Reception/Perception: A receptor must be capable of responding to the waves being
transmitted through the medium in order for information to be perceived. The receptor must be
physically configured to sympathetically tune in to the types of waves it receives. An ear or a
microphone is tuned in to sound waves. An eye or a camera is tuned in to light waves. Our
senses respond to the properties or characteristics of waves such as frequency, amplitude and
type of waveform.

2. The Sound Pressure Wave

All sound waves are produced by vibrations of material objects. The object could be the
vibrating string of a guitar, which on its own is very weak, but is considerably reinforced by the
vibrating wooden body of the instrument's soundboard.

Any vibrating object can act as a sound source and produce a sound wave, and the greater
the surface area the object presents to the air, the more air it can move, or the more medium it
can displace. All sounds are produced by the mechanical vibration of objects, e.g. rods,
diaphragms, strings, reeds, vocal cords and forced airflow.

The vibrations may have wave shapes that are simple or complex. These wave shapes will be
determined by the shape, size and stiffness of the source and the manner in which the
vibrations are initiated.

They could be initiated by: Hammering (rods), Plucking (string), Bowing (strings), Forced air
flow (vibration of air column - Organ, voice). Vibrations from any of these sources cause a
series of pressure fluctuations of the medium surrounding the object to travel outwards through
the air from the source.


It is important to note that the particles of the medium, in this case molecules of air, do not
travel from the source to the receiver, but vibrate in a direction parallel to the direction of travel
of the sound wave. Thus sound is a longitudinal wave motion which propagates outwards
away from the source. The transmission of sound energy via molecular collision is termed
propagation.

When the air molecules are at their rest positions, i.e. when there is no sound present in the air
medium, normal atmospheric pressure exists. (The standard reference sound pressure, against
which sound pressure deviations are measured, is 0.0002 dynes per cm², 0.00002 Pascal or 20 µPa.)

When a sound source is made to vibrate, it causes the air particles surrounding it to be
alternately compressed and rarefied, thereby making the mean air pressure fluctuate between
a higher-than-normal state and a lower-than-normal state. This fluctuation is determined by the
rate of vibration and the force with which the vibration was initiated at the source.

At the initial forward excursion of the vibrating source, the particle nearest to that sound source
is thrown forward to a point where it comes into contact with an adjacent molecule. After
coming into contact with this adjacent molecule, the first molecule will move back along the path of
its original travel, where its momentum will cause it to bypass its normal rest position and
swing out to its extreme rearward position, from where it will swing back and finally come to
rest at its normal position.

Recap:

• When there is no sound wave present all particles are in a state of equilibrium
and normal air pressure exists throughout the medium.
• Higher pressure occurs where air particles press together, causing a region
of higher than normal pressure called a compression.
• Lower pressure occurs in the medium where adjacent particles are moving
apart to create a partial vacuum, causing a region of lower than normal
pressure called a rarefaction.
• The molecules vibrate at the same rate as the source vibration. Each
air particle vibrates about its position at rest at the same frequency as the
sound source, i.e. there is sympathetic vibration.
• The molecules are displaced from their mean positions by a distance
proportional to the amount of energy in the wave: the higher the energy
from the source, the greater the displacement from the mean position.


• A pressure wave radiates away from the source at a constant speed, i.e. the
speed of sound in air.
• Sound pressure fluctuates between positive and negative amplitudes around
a zero pressure median point, which is in fact the prevailing atmospheric
pressure.
• These areas of compression and rarefaction move away from the vibrating body in
the form of a longitudinal wave motion - each molecule transferring its energy
to another, creating an expanding spherical sound field.

3. Speed of sound wave

The speed of sound (c or S) is the speed with which the compressions and rarefactions move
through the medium, i.e. the velocity of the sound wave or the speed at which it travels through
a medium. Sound waves need a material medium for transmission; any medium that has an
abundance of molecules can transmit sound.

The speed of sound varies with the density and elasticity of the medium through which it is
travelling. Sound is capable of travelling through liquids and solid bodies, through water or steel
and other substances. In air, the velocity is affected by:

Density: The more closely packed the molecules of a medium are, the faster sound will travel in
that medium.

Temperature: The speed of sound increases as temperature rises according to the formula

v = 331 + 0.6t m/s, where t is the temperature in degrees Celsius.

This is approximately a 0.6 m/s rise for every degree increase in temperature.

Humidity: With increase of RH, high frequencies will suffer due to absorption of sound.

At normal room temperature (around 20 degrees C) and standard atmospheric pressure the velocity of
sound is approximately 342 metres per second. Generally sound moves through water about 4 times as fast
as it does through air, and through iron it moves approximately 14 times faster.

The speed of sound can be determined from:

Speed = Distance travelled / Time taken, or Speed = d/t

The speed of sound in another medium is referred to by S. For example, the speed of sound
on magnetic tape is equal to the tape speed at the time of recording, i.e. S = 15 or 30 inches
per second (ips). Velocity refers to speed in a particular direction. As most sound waves move in all
directions unless impeded in some way, the velocity of a sound wave is equivalent to its
speed.

It is worth remembering that at normal temperature and pressure, at sea level, sound travels
approximately 1 metre in 3 milliseconds (more precisely, 1 metre in 2.92 milliseconds).

Example:

How long does it take for sound to travel 1 km?


Time Taken = 1000/342 = 2.92 s
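The two relationships above (v = 331 + 0.6t and speed = distance/time) can be checked with a short script. The sketch below is our own Python illustration; the function names are not part of the course material.

    # Speed of sound and travel-time calculations (illustrative sketch).

    def speed_of_sound(celsius):
        """Approximate speed of sound in air: v = 331 + 0.6t (m/s)."""
        return 331.0 + 0.6 * celsius

    def travel_time(distance_m, speed_m_per_s=342.0):
        """Time in seconds for sound to cover a distance: t = d / v."""
        return distance_m / speed_m_per_s

    print(speed_of_sound(20))       # about 343 m/s at 20 degrees Celsius
    print(travel_time(1000))        # about 2.92 s for 1 km
    print(travel_time(1) * 1000)    # about 2.92 ms per metre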

As mentioned above, any vibrating object can act as a sound source and thus produce a sound
wave. The greater the surface area the object presents to the air, the more air it can move.
The object could be the vibrating string of a guitar, which on its own is very weak, but is considerably
reinforced by the vibrating wooden body of the instrument's soundboard.

The disturbance of air molecules around a sound source is not restricted to a single source;
two or more sources can emit sound waves and the medium around each of the sources
will be disturbed by each of them (the instruments). Air, by virtue of its elasticity, can support
a number of independent sound waves and propagate them simultaneously.

3.1 Particle velocity:

Refers to the velocity at which a particle in the path of a sound wave is moved
(displaced) by the wave as it passes.

It should not be confused with the velocity at which the sound wave travels through the
medium, which is constant unless the sound wave encounters a different medium, in
which case the sound wave will refract.

If the sound wave is sinusoidal (sine wave shape), particle velocity will be zero at the
peaks of displacement and will reach a maximum when the particle passes through its
normal rest position.

4. Amplitude/Loudness/ Volume/Gain

When describing the energy of a sound wave the term amplitude is used. It is the distance
above or below the centre line of a waveform (such as a pure sine wave).

The greater the displacement of the molecules from their centre position, the more intense the
pressure variation, or physical displacement of the particles, within the medium. In the case of
an air medium it represents the pressure change in the air as it deviates from the normal state at
each instant.

Amplitude of a sound wave in air is measured in Pascals or dynes per square centimetre, both units
of air pressure. However, for audio purposes air pressure differences are more
meaningful and these are expressed by the logarithmic power ratio called the Bel or
Decibel (dB).


Waveform amplitudes are measured using various standards.

a. Peak Amplitude refers to the positive and negative maximums of the wave.
b. Root Mean Square Amplitude (RMS) gives a meaningful average of the
peak values and more closely approximates the signal level perceived by
our ears. RMS amplitude is equal to 0.707 times the peak value of the
wave (see the sketch below).
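The 0.707 relationship between RMS and peak amplitude holds for sine waves and can be verified numerically. A minimal Python sketch of our own, assuming a sampled sine wave:

    import math

    # RMS of a sampled sine wave compared with 0.707 x peak (valid for sine waves).
    peak = 1.0
    samples = [peak * math.sin(2 * math.pi * n / 1000) for n in range(1000)]

    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    print(rms)            # approximately 0.707
    print(0.707 * peak)   # the rule-of-thumb value quoted above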

Our perception of loudness is not proportional to the energy of the sound wave; the human
ear does not perceive all frequencies at the same intensity.
We are most sensitive to tones in the middle frequencies (3kHz to 4kHz), with
decreasing sensitivity to those having relatively lower or higher frequencies.

Loudness and Volume are not the same: Hi-fi systems have both a loudness switch
and a volume control. A volume control is used to adjust the overall sound level over
the entire frequency range of the audio spectrum (20Hz to 20kHz). A volume control is
not frequency or tone sensitive; when you advance the volume control, all tones are
increased in level. A loudness switch increases the low frequency and high frequency
ranges of the spectrum while not affecting the mid-range tones.

Fletcher & Munson Curves or equal loudness contours show the response of the
human ear throughout the audio range and reveal that more audio sound power is
required at the low end and high end of the sound spectrum to obtain sounds of equal
loudness.


5. Frequency

The rapidity with which a cycle of vibration repeats itself is called the frequency.

It is measured in cycles per second (c.p.s.) or Hertz (Hz). One complete excursion of a
wave, plotted over the 360 degree axis of a circle, is known as a cycle. The number of cycles that
occur over the period of one second gives the frequency in Hertz.

Frequency = 1/t, where t is the period in seconds

E.g.

1/0.01 (seconds per cycle) = 100Hz

A cycle can begin at any point on the waveform but to be complete (1 cycle) it must pass
through the zero line and end at a point that has the same value as the starting point.

5.1 Frequency Spectrum

The scope of the audible spectrum of frequencies is from 20Hz to 20KHz. This
spectrum is defined by the particular characteristics of human hearing and
corresponds to the pitch or frequency ranges of all commonly used musical
instruments.

5.2 Pitch:

Describes the fundamental or basic tone of a sound. It is determined by the frequency
of the tone. The frequency of the tone is a measure of the number of complete


vibrations generated per second. The greater the number of waves per second the
higher the frequency or higher the pitch of the sound.

5.3 Wavelength:

The wavelength of a wave is the actual physical distance covered by one complete cycle of the
waveform, or the distance between any two corresponding points of adjacent cycles.

Formula: λ = v/f

Where:

λ (lambda) is the wavelength in the medium, in metres (m)

v is the velocity of sound in the medium (m/s)

f is the frequency in Hertz (Hz)

Typical wavelengths encountered in acoustics:

Frequency Wavelength
20Hz 17.1m
1kHz 34cm
8kHz 4.3cm
20kHz 1.7cm
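The wavelengths in the table follow directly from λ = v/f. A short Python sketch of our own, assuming v = 342 m/s as used earlier:

    # Wavelength = speed of sound / frequency (lambda = v / f).
    SPEED_OF_SOUND = 342.0  # m/s, assumed room-temperature value

    for freq_hz in (20, 1_000, 8_000, 20_000):
        wavelength_m = SPEED_OF_SOUND / freq_hz
        print(f"{freq_hz} Hz -> {wavelength_m:.3f} m")
    # 20 Hz -> 17.100 m, 1 kHz -> 0.342 m, 8 kHz -> 0.043 m, 20 kHz -> 0.017 m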

5.4 Period

Is the amount of time required for one complete cycle of the sound wave, that is, the
time required for a complete progression of the wave. E.g. a 30 Hz sound
wave completes 30 cycles each second, or one cycle every 1/30th of a second
(0.033s).

Formula: P (or time taken for one complete oscillation) = 1/f

E.g.

Period (t) = 1/30

= 0.033s to complete one cycle of a 30 Hz frequency

5.5 Phase

The concept of phase is important in describing sound waves. It refers to the relative
displacement in time between waves of the same frequency. The studio engineer
must always contend with two distinct waves:

1. Direct waves

2. Reflected Waves


When the direct sound wave strikes a reflective surface, part of that wave passes
through the surface while that surface material absorbs part of it. The rest is reflected
as a delayed wave.

The direct and reflected wave may be wholly or partially in phase with each other: the
result is that they will either reinforce each other or cancel each other at the point of
the cycle where they converge.

Since a cycle can begin at any point on a waveform, it is possible to have two waves
interacting with each other. If two waveforms which have the same frequency and peak
amplitude are offset in time, they will have different instantaneous amplitudes at each point
in time. These two waves are said to be out of phase with respect to each other.

When two waveforms of the same frequency and peak amplitude that are completely in
phase (0 degrees phase difference) are added, the resulting waveform is of the same
frequency and phase but will have twice the original amplitude.

If two waveforms are completely out of phase (180 degrees phase difference), they will
cancel each other out when added, resulting in zero amplitude.


5.6 Phase Shift

(Ø) is often used to describe the relationship between 2 waves. When two sound
sources produce two waves in close proximity to each other they will interact or
interfere with each other. The type of interference these two waves produce is
dependent upon the phase relationship or Phase Shift between them. Time delays
between waveforms introduce different degrees of phase shift.

Phase relationship characteristics for 2 identical waves:

0 degrees phase shift - The waves are said to be completely in phase, correlated or
100% coherent. They interfere constructively, their amplitudes being added
together.

180 degrees phase shift - The waves are completely out of phase or uncorrelated (zero
coherence). They interfere destructively, their amplitudes
cancelling each other to produce zero signal.

Other degrees of phase shift - The waves are partially in phase. Both additions and
cancellations will occur. E.g.

90/270 degrees - equal addition and cancellation - 50% coherent

< 90 degrees - more constructive interference

> 90 degrees - more destructive interference.

If one wave is offset in time, it will be out of phase with the other. If two waveforms
which have the same frequency and peak amplitude are offset in time, they will
interfere constructively or destructively with each other at certain points of the
waveform. The result is that certain frequencies will be boosted and others will be
attenuated.

In music we deal with complex waveforms, so it is difficult to perceive the actual phase
addition or subtraction. But the result of out-of-phase conditions will be cancellation
of certain frequencies, with bass frequencies being affected the most. An experiment
with a home hi-fi can best demonstrate the phenomenon: changing the + and
- connections of one of the speakers will result in a phase difference of 180 degrees
between the two speakers.

The result will be:

Loss of bass

A change in the mid and high frequency response; this could be a boost or a cut

Loss of stereo image - instrument placement and depth

Overall loss of amplitude


Acoustical phase cancellation is the most destructive condition that can arise during
recording. Care must be taken during recording to position stereo mics so that they
maintain an equal distance from the source.

Phase shift can be calculated by the formula:

Ø = T x Fr x 360

Ø is the phase-shift in degrees

T= time delay in seconds

Fr = Frequency in Hertz

E.g. what will be the phase shift if a 100Hz wave was delayed by 5 milliseconds?

Ø = 0.005 x 100 x 360

= 180 degrees, i.e. a total phase shift.

The second wave will arrive 180 degrees out of phase and, when added to the first,
will result in zero amplitude.

What would be the degree of phase shift if a 100Hz wave was delayed by 2.5
milliseconds?
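The same formula answers the exercise above. A minimal sketch of our own evaluating Ø = T x Fr x 360 for both delays:

    # Phase shift in degrees: phase = delay (s) x frequency (Hz) x 360.
    def phase_shift_degrees(delay_s, frequency_hz):
        return delay_s * frequency_hz * 360.0

    print(phase_shift_degrees(0.005, 100))    # 180 degrees - complete cancellation
    print(phase_shift_degrees(0.0025, 100))   # 90 degrees - partial addition and cancellation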

To recap: when two waveforms of the same frequency and peak amplitude that are completely
in phase (zero phase difference) are added, the resulting waveform is of the same
frequency and phase but has twice the original amplitude. If two waves are completely
out of phase (180 degrees phase difference) they will cancel each other when added,
resulting in zero amplitude and hence no output.

6. Difference between musical sound and noise

Sound carries an implication that it is something we can hear or that is audible. However
sound exists above the threshold of our hewing called ultrasound -20kHz and above and
below our hearing range called infrasound -20Hz and below.

Regular vibrations produce musical tones. The series of tones/frequencies are vibrating in tune
with one another. Because of their orderliness, we find their tones pleasing.

Noise produces irregular vibrations their air pressure vibrations are random and we perceive
them as unpleasant tones.

A musical note consists of a Fundamental wave and a number of overtones called Harmonies.
These harmonies are known as the harmonic series.

6.1 Harmonic content:


Is the tonal quality or timbre of the sound. Sound waves are made up of a
fundamental tone and several different frequencies called harmonics or overtones.
Harmonic modes: when strings are struck (as in a piano), plucked (guitar) or bowed
(violin), they tend to vibrate in quite a complex manner. In the fundamental mode (1st
harmonic) the string vibrates or oscillates as a whole with respect to the two fixed
ends. In the case of middle C the frequency generated will be 261.63 Hz.

But there is also a tendency for the two halves of the string to oscillate, thus producing
a second harmonic at about twice the frequency of the fundamental note.

Harmonics are whole numbered multiples of the fundamental. Overtones may or may
not be harmonically related to the fundamental.

The composite waveform comprising the fundamental and its numerous harmonics
can look quite irregular with many sharp peaks and dips. Yet the fundamental tone
and each harmonic is made up of a very regular shape waveform.

6.2 Timbre:

The first mode of vibration has the fundamental frequency vibrating at its extreme
ends. There is a tendency for the string to vibrate at 1/2, 1/3, 1/4 its length. This is
described as its 2nd mode of vibration, 3rd mode of vibration etc. These subsequent
vibrations, along with the fundamental constitute the timbre or tonal characteristic of a
particular instrument.

The factor that enables us to differentiate the same note being played by several
instruments is the harmonic/overtone relationship between the two instruments
playing the same note.

A violin playing 440Hz has a completely different fundamental/harmonic relationship
to a viola. This is the factor that allows us to recognise the difference between the two
instruments although they may be playing the same note. If the fundamental
frequency is 440 Hz its second harmonic will be 880Hz or twice the fundamental (440
x 2 = 880). The third harmonic will be 1320Hz (440 x 3 = 1320).

No matter how complex a waveform is, it can be shown to be the sum of sine waves
whose frequencies are whole numbered multiples of the fundamental.
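Because harmonics are whole-numbered multiples of the fundamental, the series for any note can be generated directly. A small sketch of our own for A = 440 Hz:

    # Harmonics are whole-numbered multiples of the fundamental frequency.
    fundamental_hz = 440.0  # the note A

    for n in range(1, 6):
        print(f"harmonic {n}: {n * fundamental_hz:.0f} Hz")
    # harmonic 1: 440 Hz, 2: 880 Hz, 3: 1320 Hz, 4: 1760 Hz, 5: 2200 Hz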

6.3 Octaves

The octave is a musical term which refers to the distance between one note and its
recurrence higher or lower in a musical scale. An octave distance is always a doubling
of frequency, so the octave above the A at 440 Hz is 880 Hz, or the second harmonic.
However, the next octave above 880 Hz is 1760 Hz, or four times the fundamental.
Therefore the octave scale and the harmonic scale are different. The octave scale is
said to be logarithmic while the harmonic scale is linear. The octave range relates
directly to the propensity of human hearing to judge relative pitch or frequency on a
2:1 ratio.
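The difference between the logarithmic octave scale and the linear harmonic scale can be seen by listing both from the same fundamental; the sketch below is our own illustration:

    # Octaves double in frequency (logarithmic); harmonics add the fundamental (linear).
    fundamental_hz = 440.0

    octaves = [fundamental_hz * 2 ** n for n in range(4)]     # 440, 880, 1760, 3520 Hz
    harmonics = [fundamental_hz * n for n in range(1, 5)]     # 440, 880, 1320, 1760 Hz

    print("octave scale:  ", octaves)
    print("harmonic scale:", harmonics)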

7. Waveform Types:

Waveforms are the building blocks of sound. By combining raw waveforms one can simulate
any acoustic instrument. This is how synthesisers make sound. They have waveform


generators which create the basic wave types; these are combined to form composite
waveforms that approximate real instruments.

Generally musical waveforms can be divided into two categories.

1. Simple
2. Complex

Simple - The most basic wave is the sine wave, which traces simple
harmonic motion, e.g. tuning forks, a pendulum, a flute. These waveforms are called
simple because they are continuous and repetitive. One cycle looks exactly like the
next and they are perfectly symmetrical around the zero line. A pure sine wave
contains no harmonics, only its fundamental frequency.

Complex: Speech and music depart from the simple sine form. We can break down a
complex waveform into a combination of sine waves. This method is called Fourier
Analysis, named after the 19th century Frenchman who proposed this method. Wave
synthesis combines simple waves into complex waveforms e.g. the synthesiser. The
ear mechanism also distinguishes frequency in complex waves by breaking them
down into sine wave components.

Noise - Noise is a random mixture of sine waves continuously shifting in frequency,
amplitude and phase. There are generally two types of synthetic noise:

i. White noise - equal energy per frequency

ii. Pink noise - equal energy per octave.

8. Wave shape

Wave shape is created by the amplitude and harmonic components in the wave. To create a
Square wave the sine wave fundamental is combined with a number of odd harmonics at
regular intervals.


A triangle wave also contains only odd harmonics, but at amplitudes that fall away much more
rapidly than in a square wave. A sawtooth wave is made of both odd and even harmonics added
to the fundamental frequency.
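These wave shapes can be approximated by additive synthesis, i.e. by summing sine-wave harmonics onto the fundamental. The sketch below is a simplified illustration of our own (a square-wave approximation built from odd harmonics at 1/n amplitudes); real synthesisers use more refined methods.

    import math

    # Additive synthesis: approximate a square wave by summing odd harmonics
    # of the fundamental, each at amplitude 1/n (simplified illustration).
    def square_wave_sample(t, fundamental_hz, highest_harmonic=9):
        value = 0.0
        for n in range(1, highest_harmonic + 1, 2):   # odd harmonics: 1, 3, 5, ...
            value += math.sin(2 * math.pi * n * fundamental_hz * t) / n
        return value

    # One millisecond of a 100 Hz square-wave approximation, sampled at 48 kHz.
    samples = [square_wave_sample(i / 48_000, 100) for i in range(48)]
    print(samples[:5])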


9. Acoustic Envelope:

An important aspect influencing the waveform of a sound is its envelope. Every instrument
produces its own envelope, which works in combination with its timbre to determine the
subjective sound of the instrument.

The envelope of a waveform describes the way its intensity varies over the time that the sound is
produced and dies away. The envelope therefore describes a relationship between time and
amplitude. It can be viewed on a graph by connecting a wave's peak points of the same
polarity over a series of cycles. An acoustic envelope has 4 basic sections - attack, decay,
sustain and release - listed below; a simple sketch follows the list.

1. Attack time is the time it takes for the sound to rise to maximum level.
2. Decay time is the time the sound takes to fall from its maximum level towards its sustain
level; it depends on the internal dynamics of the instrument (e.g. the resonance of a tom
drum, which can ring on for some time).
3. Sustain time is the period over which the sound source is maintained, from maximum
levels to mid levels.
4. Release time is the time it takes for a sound to fall below the noise floor.
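The four stages can be modelled as a simple piecewise amplitude function of time. The sketch below is a generic linear ADSR illustration of our own, with arbitrary example timings, not a description of any particular instrument.

    # A simple linear ADSR envelope: amplitude (0..1) as a function of time (s).
    def adsr_amplitude(t, attack=0.01, decay=0.1, sustain_level=0.6,
                       sustain_time=0.5, release=0.3):
        if t < attack:                              # attack: rise to maximum level
            return t / attack
        if t < attack + decay:                      # decay: fall towards the sustain level
            return 1.0 - (1.0 - sustain_level) * (t - attack) / decay
        if t < attack + decay + sustain_time:       # sustain: held at a steady level
            return sustain_level
        t_rel = t - (attack + decay + sustain_time) # release: die away to silence
        return max(0.0, sustain_level * (1.0 - t_rel / release))

    for t in (0.005, 0.05, 0.3, 0.8):
        print(t, round(adsr_amplitude(t), 3))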

THE HUMAN EAR

The ear, the organ of hearing, operates as a transducer: it translates wave motion
through several media - air pressure variations into mechanical action, then into fluid
motion and finally into electrical/neural impulses.

1. The Outer Ear

Consists of the Pinna and the ear canal (external meatus). Here sound waves are collected
and directed toward the middle ear. Both the pinna and the ear canal increase the loudness of
a sound we hear by concentrating or focusing the sound waves e.g. the old ear trumpet as a
hearing aid.

The ear canal is often compared to an organ pipe in that certain frequencies (around 3KHz)
will resonate within it because of its dimensions (3cm x 0.7cm) i.e. frequencies whose quarter
wavelengths are similar in size to the length of the canal. The ear will perceive this frequency
band as louder, which corresponds to critical bandwidth for speech intelligibility. “Pipe


resonance” amplifies the sound pressure falling on the outer ear by around 10dB by the time it
strikes the eardrum, peaking in the 2-4kHz region.

Wiener and Ross have found that diffraction around the head results in a further amplification
effect adding a further 10dB in the same bandwidth.

2. The Middle Ear

The mechanical movements of the tympanic membrane are transmitted through three small
bones known as ossicles, comprising the malleus, incus and stapes – more commonly known
as the hammer, anvil and stirrup – to the oval window of the cochlea. The oval window forms
the boundary between the middle and inner ears.

The malleus (hammer) is fixed to the middle fibrous layer of the tympanic membrane in such a
way that when the membrane is at rest, it is pulled inwards. Thus the tympanic membrane,
when viewed down the auditory canal from outside, appears concave and conical in shape.
One end of the stapes (stirrup), the stapes footplate, is attached to the oval window of the
cochlea. The malleus and incus (hammer and anvil) are joined quite firmly such that at normal
intensity levels they act as a single unit, rotating together as the tympanic membrane vibrates
to move the stapes, via a ball and socket joint, in a piston-like manner. Thus acoustic vibrations
are transmitted via the tympanic membrane and ossicles as mechanical movements to the
cochlea of the inner ear.

The function of the middle ear is twofold:

• To transmit the movements of the tympanic membrane to the fluid which fills the
cochlea without significant loss of energy; and
• To protect the hearing system to some extent from the effects of loud sounds,
whether from external sources or the individual concerned.

In order to achieve efficient transfer of energy from the tympanic membrane to the oval
window, the effective pressure acting on the oval window is arranged by mechanical means to
be greater than that acting on the tympanic membrane. This is to overcome the higher
resistance to movement of the cochlear fluid compared to that of air at the input to the ear.
Resistance to movement can be thought of as impedance to movement, and the impedance of
fluid to movement is high compared to that of air. The ossicles act as a mechanical impedance
converter or impedance transformer, and this is achieved by two means:

• The lever effect of the malleus (hammer) and incus (anvil)


• The area difference between the tympanic membrane and the stirrup foot plate.

A third aspect of the middle ear which appears relevant to the impedance conversion process
is the buckling movement of the tympanic membrane itself as it moves, resulting in a twofold
increase in the force applied to the malleus.

In humans, the area of the tympanic membrane is approximately 13 times larger than the area
of the stapes footplate, and the malleus is approximately 1.3 times the length of the incus.
The buckling effect of the tympanic membrane provides a force increase by a factor of 2. Thus
the pressure at the stapes footplate is about (13 x 1.3 x 2 = 33.8) times larger than the
pressure at the tympanic membrane.
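For reference, the 33.8:1 pressure ratio can also be expressed in decibels; the conversion below is our own addition, not part of the original notes.

    import math

    # Middle-ear pressure gain: area ratio x lever ratio x buckling factor.
    area_ratio = 13.0      # tympanic membrane area / stapes footplate area
    lever_ratio = 1.3      # malleus length / incus length
    buckling_factor = 2.0  # force increase from membrane buckling

    pressure_gain = area_ratio * lever_ratio * buckling_factor
    print(pressure_gain)                      # about 33.8 times
    print(20 * math.log10(pressure_gain))     # roughly 30.6 dB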

The second function of the middle ear is to provide some protection for the hearing system
from the effects of loud sounds, whether from external sources or the individual concerned.


This occurs as a result of the action of two muscles in the middle ear: The tensor tympani and
the stapedius muscle. These muscles contract automatically in response to sounds with levels
greater than approximately 75dBSPL and they have the effect of increasing the impedance of
the middle ear by stiffening the ossicular chain. This reduces the efficiency with which
vibrations are transmitted from the tympanic membrane to the inner ear and thus protects the
inner ear to some extent from loud sounds. Approximately 12 to 14 dB of attenuation is
provided by this protection mechanism, but this is for frequencies below 1Khz only. The names
of these muscles derive from where they connect with the ossicular chain: the tensor tympani
is attached near the tympanic membrane and the stapedius muscle is attached to the stapes.

This stiffening action of the muscles is known as the acoustic reflex. It takes some 60 ms to 120 ms for
the muscles to contract in response to a loud sound. In the case of a loud impulsive sound, such
as the firing of a large gun, it has been suggested that the acoustic reflex is too slow to protect
the hearing system.

3. The Inner Ear

The inner ear consists of 2 fluid-filled structures:

The vestibular system consisting of 3 semi-circular canals, the utricle and sacculus- these are
concerned with balance and posture.

The Cochlea is the organ of hearing. It is about the size of a pea and encased in solid bone. It
is coiled up like a seashell, filled with fluid and divided into an upper and a lower part by a pair
of membranes (Basilar Membrane and Tectorial Membrane). The Oval Window opens into
the upper part of the cochlea, and the pressure releasing Round Window into the lower part.

The rocking motion of the Oval Window caused by the Ossicles sets up sound waves in the
fluid. Amplitude peaks for different frequencies occur along the Basilar Membrane in different
parts of the cochlea, with lower frequencies (e.g. 50Hz) towards the end and higher freq. (e.g.
1500 Hz) at the beginning. High frequencies cause maximal vibration at the stapes end of the
basilar membrane where it is narrow and thick. Low frequencies cause greater effect at the
apical end where the membrane is thin and wide.

The waves act on hair-like nerve terminals bunched under the basilar membrane in the Organ
of Corti. These nerves convey signals in the form of neuron discharges to the brain. These
potentials are proportional to the sound pressure falling on the ear over an 80dB range. These
so-called “microphonic” potentials were actually picked up and amplified from the cortex of an
anaesthetized cat.

4. Neural Processing

Nerve signals consist of a number of electrochemical impulses, which pass along the fibres at
about 10 m/s. Intensity is conveyed by the mean rate of the impulses. Each fibre in the
cochlear nerve responds most sensitively to its own characteristic frequency (CF), requiring a
minimum SPL at this frequency to stimulate it or raise its response detectably. The CF is directly related to
the part of the basilar membrane from which the stimulus arises.

While the microphonic signals are analog, the neuron discharges are caused by the cochlea
nerves either firing (on) or not firing (off) producing a type of binary code, which the brain
interprets. The loudness of a sound is related to the number of nerve fibres excited (3,000


maximum) and the repetition rates of such excitation. A single fibre firing would represent the
threshold of sensitivity.

In this way component frequencies (partials) of a signal are separated and their amplitudes
measured. The ear interprets this information as a ratio of amplitudes, deciphering it to give a
picture of harmonic richness or Timbre of the sound.

5. The Ear and Frequency Perception

This section considers how well the hearing system can discriminate between individual
frequency components of an input sound. This will provide the basis for understanding the
resolution of the hearing system and it will underpin discussions relating to the
psychoacoustics of how we hear music, speech and other sounds.

Each component of an input sound will give rise to a displacement of the basilar membrane at
a particular place. The displacement due to each individual component is spread to some
extent on either side of the peak. Whether or not two components that are of similar amplitude
and close together in frequency can be discriminated depends on the extent to which the
basilar membrane displacements due to each of the two components are clearly separated or
not.

5.1 Critical Bandwidth and Beats

Suppose two pure tones, or sine waves, with amplitudes A1 and A2 and frequencies F1
and F2 are sounded together. If F1 is fixed and F2 is changed slowly from being equal
to, or in unison with, F1, either upwards or downwards in frequency, the following is
generally heard. When F1 is equal to F2 a single note is heard. As soon as F2 is
moved higher (or lower) than F1, a sound with clearly undulating amplitude variations,
known as beats, is heard. The frequency of the beats is equal to (F2 - F1), or (F1 - F2)
if F1 is greater than F2, and the amplitude varies between (A1 + A2) and (A1 - A2),
or (A1 + A2) and (A2 - A1) if A2 is greater than A1. Note that when the amplitudes
are equal (A1 = A2), the amplitude of the beats varies between 2 x A1 and 0.
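Beats can be demonstrated numerically by summing two sine waves of nearly equal frequency; the sketch below is our own illustration and the chosen frequencies are arbitrary.

    import math

    # Beats: the sum of two close frequencies undulates in amplitude at (f2 - f1) Hz.
    f1, f2 = 440.0, 444.0          # beat frequency = 4 Hz
    a1 = a2 = 1.0

    def combined(t):
        return (a1 * math.sin(2 * math.pi * f1 * t)
                + a2 * math.sin(2 * math.pi * f2 * t))

    # Rough envelope: the largest sample magnitude in a short window around t.
    for t in (0.0, 0.0625, 0.125, 0.1875):
        peak = max(abs(combined(t + k / 44_100)) for k in range(100))
        print(round(t, 4), round(peak, 2))   # swings between about (a1 + a2) and 0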

For the majority of listeners, beats are usually heard when the frequency difference
between the tones is less than about 12.5Hz, and the sensation of beats generally
gives way to one of a fused tone which sounds rough when the frequency difference
is increased above 15Hz. As the frequency difference is increased further there is a
point where the fused tone gives way to two separate tones, but still with the sensation
of roughness, and a further increase in frequency difference is needed for the rough
sensation to become smooth. The smooth separate sensation persists while the two
tones remain within the frequency range of the listener's hearing.

There is no exact frequency difference at which the change from fused to separate
and from beats to rough to smooth occur for every listener. However, the approximate
frequency and order in which they occur is common to all listeners, and in common
with all psychoacoustic effects, average values are quoted which are based on
measurements made for a large number of listeners.

The point where the two tones are heard as separate, as opposed to fused, when the
frequency difference is increased can be thought of as the point where two peak
displacements on the basilar membrane begin to emerge from a single maximum
displacement on the membrane. However, at this point the underlying motion of the
membrane which gives rise to the two peaks causes them to interfere with each other,
giving the rough sensation, and it is only when the rough sensation becomes smooth


that the separation of the places on the membrane is sufficient to fully resolve the two
tones. The frequency difference between the pure tones at the point where a listener's
perception changes from rough and separate to smooth and separate is known as the
critical bandwidth. A more formal definition is given by Scharf (1970): 'the critical
bandwidth is that bandwidth at which subjective responses rather abruptly change.'

The critical bandwidth changes according to frequency. In practice critical bandwidth
is usually measured by an effect known as masking, in which the rather abrupt change
is more clearly perceived by listeners. Masking is when one frequency cannot be
heard as a result of another frequency that is louder and close to it.

6. Frequency range and pressure sensitivity of the ear

The frequency range of the human ear: the human ear is usually quoted as having a
frequency range of 20Hz to 20,000Hz (20kHz), but this is not necessarily the case for every
individual. This range changes as part of the human ageing process, particularly in terms of
the upper limit, which tends to reduce. Healthy young children may have a full hearing
range up to 20kHz, but by the age of 20 the upper limit may have dropped to 16kHz. It
continues to reduce gradually to about 8kHz by retirement age. This is known as presbyacusis
or presbycusis and is a function of the normal ageing process. This reduction in the upper
frequency limit of the hearing range is accompanied by a decline in hearing sensitivity at all
frequencies with age, the decline being less for low frequencies than for high. Hearing losses
can also be induced by prolonged exposure to loud sounds.

The ear's sensitivity to sounds of different frequencies varies over a vast sound pressure level
range. On average, the minimum sound pressure variation which can be detected by the
human hearing system around 4kHz is approximately 10 micropascals, i.e. 10⁻⁵ Pa. This is the
threshold of hearing.

The maximum average sound pressure level which is heard, rather than perceived as being
painful, is 20Pa. This is the threshold of pain.

7. Noise-induced hearing loss

The ear is a sensitive and accurate organ of sound transduction and analysis. However, the
ear can be damaged by exposure to excessive levels of sound or noise. This damage can
manifest itself in two major forms:

7.1 A loss of hearing sensitivity

The effect of noise exposure is to reduce the efficiency with which sound is transduced
into nerve impulses. This is due to damage to the hair cells in the organs
of Corti. Note this is different from the threshold shift due to the acoustic reflex,
which occurs over a much shorter time period and is a form of built-in hearing
protection. This loss of sensitivity manifests itself as a shift in the affected person's threshold
of hearing. The shift in threshold can be temporary for short exposures,
but ultimately it becomes permanent, as the hair cells are permanently
flattened as a result of the damage due to long-term exposure, which does not allow
them time to recover.

7.2 A loss of hearing acuity


This is a more subtle effect but in many ways is more severe than the first. We
have seen that a crucial part of our ability to hear and analyse sounds is our ability to
separate out the sounds into distinct frequency bands, called critical bands. These
bands are very narrow. Their narrowness is due to an active mechanism of positive
feedback in the cochlea which enhances the standing wave effects mentioned earlier.
This enhancement mechanism is very easily damaged; it appears to be more
sensitive to excessive noise than the main transduction system. The effect of the
damage, though, is not just to raise the threshold but also to increase the bandwidth
of our acoustic filters. This has two main effects:

• Firstly, our ability to separate out the different components of the sound is
impaired, and this will reduce our ability to understand speech or separate out
desired sound from competing noise. Interestingly it may well make musical
sounds which were consonant more dissonant because of the presence of
more than one frequency harmonic in a critical band.
• The second effect is a reduction in the hearing sensitivity, because the
enhancement mechanism also increases the amplitude sensitivity of the ear.
This effect is more insidious because the effect is less easy to measure and
perceive; it manifests itself as a difficulty in interpreting sounds rather than a
mere reduction in their perceived level.

Another related effect due to damage to the hair cells is noise-induced tinnitus.
Tinnitus is the name given to a condition in which the cochlea spontaneously
generates noise, which can be tonal, random noises, or a mixture of the two. In
noise-induced tinnitus exposure to loud noise triggers this, and as well as being
disturbing, there is some evidence that people who suffer from this complaint may be
more sensitive to noise induced hearing damage.

Because the damage is caused by excessive noise exposure it is more likely at the
frequencies at which the acoustic level at the ear is enhanced. The ear is most
sensitive at the first resonance of the ear canal, or about 4 kHz, and this is the
frequency at which most hearing damage first shows up. Hearing damage in this
region is usually referred to as an audiometric notch. This distinctive pattern is
evidence that the hearing loss measured is due to noise exposure rather than some
other condition, such as the inevitable high-frequency loss due to ageing.

How much noise exposure is acceptable? There is some evidence that the normal noise in
Western society has some long-term effects because measurements on the hearing of other
cultures have shown that there is a much lower threshold of hearing at a given age compared
with Westerners. However, this may be due to other factors as well; for example, the level of
pollution, etc. There is strong evidence, however, that exposure to noises with amplitudes of
greater than 90 dB(SPL) will cause permanent hearing damage. This fact is recognised by
legislation which requires that the noise exposure of workers be less than this limit. Note that if
the work environment has a noise level greater than this, then hearing protection of a
sufficient standard should be used to bring the noise level at the ear below this limit.

8. Protecting your hearing

Hearing loss is insidious and permanent and by the time it is measurable it is too late.
Therefore, in order to protect hearing sensitivity and acuity, one must be proactive. The first
strategy is to avoid exposure to excess noise. Although 90 dB(SPL) is taken as a damage
threshold, if the noise exposure causes ringing in the ears, especially if the ringing lasts longer
than the length of exposure, damage may be occurring even if the sound level is
less than 90 dB(SPL).


There are a few situations where potential damage is more likely.

• The first is when listening to recorded music over headphones, as even small ones
are capable of producing damaging sound levels.
• The second is when one is playing music, with either acoustic or electric instruments,
as these are also capable of producing damaging sound levels, especially in small
rooms with a ‘live’ acoustic.

In both cases the levels are under your control and so can be reduced. However, the acoustic
reflex reduces the sensitivity of your hearing when loud sounds occur. This effect, combined
with the effects of temporary threshold shifts, can result in a sound level increase spiral, where
there is a tendency to increase the sound level ‘to hear it better’ which results in further dulling,
etc. The only real solution is to avoid the loud sounds in the first place. However, if this
situation does occur then a rest away from the excessive noise will allow some sensitivity to
return.

There are sound sources over which one has no control, such as bands, discos, night clubs,
and power tools. In these situations it is a good idea either to limit the noise dose or, better
still, use some hearing protection. For example, one can keep a reasonable distance away
from the speakers at a concert or disco. It takes a few days, or even weeks in the case of
hearing acuity, to recover from a large noise dose so one should avoid going to a loud concert,
or nightclub, every day of the week! The authors regularly use small ‘in-ear’ hearing protectors
when they know they are going to be exposed to high sound levels, and many professional
sound engineers also do the same. These have the advantage of being unobtrusive and
reduce the sound level by a modest, but useful, amount (15-20 dB) while still allowing
conversation to take place at the speech levels required to compete with the noise! These
devices are also available with a ‘flat’ attenuation characteristic with frequency and so do not
alter the sound balance too much, and cost less than a CD recording. For very loud sounds,
such as power tools, then a more extreme form of hearing protection may be required, such as
headphone style ear defenders.

Your hearing is essential, and irreplaceable, both for the enjoyment of music and for
communicating and socialising with other people, now and in the future. It is worth taking care
of.

9. Perception of sound source direction

How do we perceive the direction that a sound arrives from?

The answer is that we make use of our two ears, but how? Because our two ears are
separated by our head, the head has an acoustic effect which is a function of the direction of the
sound. There are two effects of the separation of our ears on the sound wave: firstly the
sounds arrive at different times, and secondly they have different intensities. These two effects
are quite different, so let us consider them in turn.

9.1 Interaural time difference (ITD)

Because the ears are separated by about 18 cm there will be a time difference
between the sound arriving at the ear nearest the source and the one further away.
So when the sound is off to the left the left ear will receive it first, and when it is off
to the right the right ear will hear it first. If the sound is directly in front, or behind, or
anywhere on the median plane, the sound will arrive at both ears simultaneously. The
time difference between the two ears will depend on the difference in the lengths that


the two sounds have to travel. The maximum ITD occurs at 90° and is 6.73 × 10⁻⁴ s
(673 µs).

Note that there is no difference in the delay between front and back positions at the
same angle. This means that we must use different mechanisms and strategies to
differentiate between front and back sounds. There is also a frequency limit to the
way in which sound direction can be resolved by the ear in this way. This is due to
the fact that the ear appears to use the phase shift in the wave caused by the
interaural time difference to resolve the direction.

When the phase shift is greater than 180° there will be an unresolvable ambiguity in
the direction, because there are two possible angles - one to the left and one to the
right - that could cause such a phase shift. This sets a maximum frequency, at a
particular angle, for this method of sound localization.

Thus for sounds at 90° the maximum frequency that can have its direction determined
by phase is 743 Hz. However, the ambiguous frequency limit would be higher at
smaller angles.
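The 673 µs and 743 Hz figures can be reproduced from the head geometry. The sketch below assumes the common path-difference approximation d = r(θ + sin θ) for a spherical head of radius r - an assumption on our part, since the notes only quote the results.

    import math

    # Maximum interaural time difference, assuming the path-difference
    # approximation d = r * (theta + sin(theta)) for a spherical head.
    HEAD_RADIUS = 0.09       # metres (18 cm ear separation)
    SPEED_OF_SOUND = 344.0   # m/s (value that reproduces the quoted figures)

    def itd_seconds(angle_deg):
        theta = math.radians(angle_deg)
        return HEAD_RADIUS * (theta + math.sin(theta)) / SPEED_OF_SOUND

    max_itd = itd_seconds(90)
    print(max_itd * 1e6)       # about 673 microseconds
    print(1 / (2 * max_itd))   # about 743 Hz - the phase-ambiguity limit at 90 degrees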

9.2 Interaural intensity difference (IID)

The other cue that is used to detect the direction of the sound is the differing levels of
intensity that result at each ear due to the shading effect of the head. The levels at
each ear is equal when the sound source is on the median plane but the level at one
ear progressively reduces, and increases at the other, as the source moves away
from the median plane. The level reduces in the ear that is furthest away from the
source.

This means that there will be a minimum frequency below which the effect of intensity
is less useful for localisation which will correspond to when the head is about one third
of a wavelength in size (1/3λ). For a head the diameter of which is 18 cm, this
corresponds to a minimum frequency of about 637Hz.

Thus the interaural intensity difference is a cue for direction at high frequencies,
whereas the interaural time difference is a cue for direction at low frequencies.
Note that the cross-over between the two techniques starts at about 700 Hz and
would be complete at about four times this frequency, at 2.8 kHz. In between these
two frequencies the ability of our ears to resolve direction is not as good as at other
frequencies.
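The 637 Hz figure quoted above follows from the head being about one third of a wavelength across; the arithmetic below is our own restatement of that calculation.

    # IID becomes a useful cue roughly where the head is one third of a wavelength across.
    HEAD_DIAMETER = 0.18     # metres
    SPEED_OF_SOUND = 344.0   # m/s

    wavelength_m = 3 * HEAD_DIAMETER              # head = 1/3 of a wavelength
    lower_limit_hz = SPEED_OF_SOUND / wavelength_m
    print(lower_limit_hz)                         # about 637 Hz

    # The notes place the ITD/IID cross-over between about 700 Hz and 2.8 kHz.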

9.3 Pinnae and head movement effects

The above models of directional hearing do not explain how we can resolve front to
back ambiguities or the elevation of the source. There are in fact two ways which are
used by the human being to perform these tasks.

The first is to use the effect of our ears on the sounds we receive to resolve the angle
and direction of the sound. This is due to the fact that sounds striking the pinnae are
reflected into the ear canal by the complex set of ridges that exist on the ear. These
pinnae reflections will be delayed, by a very small but significant amount, and so will
form comb filter interference effects on the sound the ear receives. The delay that a
sound wave experiences will be a function of its direction of arrival, in all three
dimensions, and we can use these cues to help resolve the ambiguities in direction


that are not resolved by the main directional hearing mechanism. The delays are very
small and so these effects occur at high audio frequencies, typically above 5kHz. The
effect is also person specific, as we all have differently shaped ears and learn these
cues as we grow up. Thus we get confused for a while when we change our acoustic
head shape radically, by cutting very long hair short for example. We also find that if
we hear sound recorded through other people’s ears that we have a poorer ability to
localise the sound, because the interference patterns are not the same as those for
our ears.

The second, and powerful, means of resolving directional ambiguities is to move our
heads. When we hear a sound that we wish to attend to, or resolve its direction, we
move our head towards the sound and may even attempt to place it in front of us in
the normal direction, where all the delays and intensities will be the same. The act of
moving our head will change the direction of the sound arrival and this change of
direction will depend on the sound source position relative to us. Thus a sound from
the rear will move in different direction compared to a sound in front of or above the
listener. This movement cue is one of the reasons that we perceive the sound from
headphones as being ‘in the head’. Because the sound source tracks our head
movement it cannot be outside and hence must be in the head. There is also an
effect due to the fact that headphones do not model the effect of the head.
Experiments with headphone listening which correctly model the head and keep the
source direction constant as the head moves give a much more convincing illusion.

9.4 The Haas effect

The Haas effect can be summarised as follows:

• The ear will attend to the direction of the sound that arrives first and will not
attend to the reflections providing they arrive within 30 ms of the first sound.
• Reflections arriving within 30 ms are fused into the perception of the first
arrival. However, if they arrive after 30 ms they will be perceived as echoes.

These results have important implications for studios, concert halls and sound
reinforcement systems. In essence it is important to ensure that the first reflections
arrive at the audience within 30 ms of the direct sound to avoid them being perceived as echoes. In
fact it seems that our preference is for a delay gap of less than 20 ms if the sound of
the hall is to be classed as ‘intimate’. In sound reinforcement systems the output of
the speakers will often be delayed with respect to the acoustic sound but, because of
this effect, we perceive the sound as coming from the acoustic source, unless the
level of sound from the speakers is very high.
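
A simplified sketch of how a reinforcement delay might be set using this effect is shown below; the distances, the 344 m/s speed of sound and the 15 ms "Haas margin" are illustrative assumptions, not figures from the text:

```python
# Delay a loudspeaker feed so the acoustic (natural) source arrives first,
# letting the precedence effect keep localisation on the acoustic source.
SPEED_OF_SOUND = 344.0  # m/s (assumed)

def speaker_delay_ms(dist_source_to_listener_m, dist_speaker_to_listener_m,
                     haas_margin_ms=15.0):
    # Time of flight for each path, in milliseconds
    t_source = dist_source_to_listener_m / SPEED_OF_SOUND * 1000.0
    t_speaker = dist_speaker_to_listener_m / SPEED_OF_SOUND * 1000.0
    # Delay the speaker so it arrives just after the acoustic sound,
    # well inside the 30 ms fusion window.
    return max(0.0, t_source - t_speaker) + haas_margin_ms

print(round(speaker_delay_ms(20.0, 5.0), 1))  # ~58.6 ms of delay on the speaker feed
```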

10. Ear Training

The basic requirement of a creative sound engineer is to be able to listen well and analyse
what they hear. There are no golden ears, just educated ears. A person develops his or her
awareness of sound through years of education and practice.

We have to constantly work at training our ears by developing good listening habits. As an
engineer, we can concentrate our ear training around three basic practices - music,
microphones and mixing.

10.1 Listening to Music

Try and dedicate at least half an hour per day to listening to well recorded and mixed
acoustic and electric music. Listen to direct-to-two track mixes and compare with
heavily produced mixes. Listen to different styles of music, including complex musical
forms. Note the basic ensembles used, production clichés and mix set-ups.

Also attend live music concerts. The engineer must learn the true timbral sound of an
instrument and its timbral balances. The engineer must be able to identify the timbral
nuances and the characteristics of particular instruments.

Learn the structuring of orchestral balance. There can be an ensemble chord created
by the string section, the reeds and the brass all working together. Listen to an
orchestra live, stand in front of each section and hear its overall balance and how it
layers with other sections.

For small ensemble work, listen to how a rhythm section works together. How bass,
drums, percussion, guitar and piano interlock. Learn the structure of various song
forms such as verse, chorus, break etc. Learn how lead instrument and lead vocals
interact with this song structure. Notice how instrumentals differ from vocal tracks.

Listen to sound design in a movie or TV show. Notice how the music underscores the
action and the choice of sound effects builds a mood and a soundscape. Notice how
tension is built up and how different characters are supported by the sound design.
Notice the conventions for scoring for different genres of film and different types of TV.

For heavily produced music, listen for production tricks. Identify the use of different
signal processing FX. Listen for panning tricks, doubling of instruments and voices.

Analyse a musical mix into the various components of the sound stage. Notice the
spread of instruments from left to right, front to back, and up and down. Notice how
different stereo systems and listening rooms influence the sound of the same piece of
music.

10.2 Listening with Microphones

Mic placement relative to the instrument can provide totally different timbral colour, e.g.
the proximity boost on closely placed cardioid mics. A mic can be positioned to capture
just a portion of the frequency spectrum of an instrument to suit a
particular “sound” or genre. E.g. a rock acoustic piano may favour the piano’s high end
and require close miking near the hammers to accent percussive attack, a sax may be
miked near the top to accent higher notes, or an acoustic guitar across the sound hole
for more bass.

The way an engineer mics an instrument is influenced by:

• Type of music
• Type of instrument
• Creative Production
• Acoustics of the hall or studio
• Type of mic
• Leakage considerations

Always make A/B comparisons between different mics and different positions. The ear
can only make good judgements by making comparisons.

In the studio, reflections from stands, baffles, walls, floor and ceiling can affect the
timbre of instruments. This can cause timbre changes, which can be problematic or
used to capture an “artistic” modified spectrum. When miking sections, improper
miking can cause timbre changes due to instrument leakage. The minimum 3:1 mic
spacing rule helps control cross-leakage.

Diffuser walls placed around acoustic instruments can provide an openness and a
blend of the direct /reflected sound field. Mic placement and the number of diffusers
and their placement can greatly enhance the “air” of the instrument.

An engineer should be able to recognise the characteristics of the main frequency bands
with their ears.

Hz           Band            Characteristics            Positive               Negative

16–160       Extreme lows    Felt more than heard       Warmth                 Muddiness

160–250      Bass            No stereo information      Fatness                Boominess, boxiness

250–2000     Low Mid-range   Harmonics start to occur   Body                   Horn-like (500–1000 Hz);
                                                                               ear fatigue (1–2 kHz)

2000–4000    High Mid-range  Vocal intelligibility      Gives definition       Tinny, thin

4000–6000    Presence        Loudness and closeness;    Definition, energy,    Brash
                             spatial information        closeness

6000–20000   Highs           Depth of field             Air, crispness;        Noise
                                                        boosting/cutting helps
                                                        create closeness/distance
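
For quick reference, the band edges in the table can also be kept to hand as a simple lookup; the following is a minimal illustrative sketch, not a standard library:

```python
# Map a frequency in Hz to the band names used in the table above.
BANDS = [
    (16, 160, "Extreme lows"),
    (160, 250, "Bass"),
    (250, 2000, "Low Mid-range"),
    (2000, 4000, "High Mid-range"),
    (4000, 6000, "Presence"),
    (6000, 20000, "Highs"),
]

def band_name(freq_hz: float) -> str:
    for low, high, name in BANDS:
        if low <= freq_hz < high:
            return name
    return "outside the table"

print(band_name(440))    # Low Mid-range
print(band_name(5000))   # Presence
```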

10.3 Listening in Foldback and Mixdown

A balanced cue mix captures the natural blend of the musicians. Good foldback
makes musicians play with each other instead of fighting to be heard. If reed and
brass players can’t hear themselves and the rest of the group they tend to overplay. A
singer will back off from the mic if their voice in the headphone mix is too loud, or
swallow the mic if the mix is too soft. They will not stay in tune if they cannot hear
backing instruments. Musicians will aim their instruments at music stands or walls for
added reflection to help them overcome a hearing problem.

Dimensional Mixing: The final 10% of a mix picture is spatial placement and layering
of instruments or sounds. Dimensional mixing encompasses timbral balancing and
layering of spectral content and effects with the basic instrumentation. For this, always
think sound in dimensional space: left/right, front/back, up/down.
Think of a mix in Three Levels:

Level A 0 to 1 meter
Level B 1 to 6 meters
Level C 6 meters and further

Instruments which are tracked in the studio are all recorded at roughly the same level
(SOL) and are often close miked. If an instrument is to stand further back in the mix it
has to change in volume and frequency. Most instruments remain on level B so you
can hear them all the time. Their dynamics must be kept relatively stable so their
position does not change. Level A instruments will be lead and solo instruments. Level
C can be background instruments, loud instruments drifting in the background,
sounds which are felt rather than heard, and reverb.

Control Room Acoustics: The studio control room must be as neutral as possible if we
are to judge the accuracy of what we have miked or what we are mixing. The control
room is an entire system that includes:

1. Room Acoustics (modal behaviour, absorption, diffusion, isolation)
2. Early Reflections: when early, diffused energy fills in the time delay gap and
enlarges the perceived size and depth of the listening environment.
3. Shell stiffness and mass
4. Mixing areas
5. Loudspeakers
6. Loudspeaker decoupling
7. Loudspeaker placement referenced to mix position
8. System gain structure
9. Electronics
10. Grounding
11. Mechanical noise (air con, equipment etc)
12. Equipment and cabinet placement

Every effort should be made during mixdown to listen to the mix on near field and big
monitors.

Work at around 85 dB SPL but listen at many levels for brief periods of time. Listen in
mono. Compare with mixes with the same production style. Take rests and don’t let
your ears get fatigued.

AE02 – Music Theory I

1. Basic concepts of Music Notation

1.1 The Stave (Staff)

1.2 Clefs

Rudiments

1. Tones and semitones

1.1 Sharp

1.2 Flat

1.3 Natural

2. Scales

2.1 Writing an F-Major Scale in the treble clef

3. Key signatures

3.1 Key Signature hints:

3.2 Accidentals

4. Tonic

4.1 Triad:

5. Rhythms and Beat Notation

5.1 Pulse or beat

5.2 Dotted Notes

5.3 Rests

6. Time signatures

6.1 Simple Time Signatures

6.2 Compound Time Signature

7. Bar and bar line

8. Additional notes

8.1 Double-Sharps & Double-Flats

9. Modes


AE02 – MUSIC THEORY I

Introduction

Music theory is fundamental to learning about music. It is a requirement for the diploma in
Audio Engineering but is also useful outside the diploma. It provides a platform from
which sounds can be analysed both from the engineering perspective as well as from the
musical perspective. This study in music theory, though from the western perspective,
provides you with the right knowledge to analyse and appreciate a wide variety of music
styles.

Music theory is a lot of fun when you become more familiar with it.

1. Basic concepts of Music Notation

Music notation is done using certain fixed tools:

1.1 The Stave (Staff)

Music in the “western” culture is written on five lines and four spaces:

This collection of lines and spaces is called a staff. The lines and spaces have been
numbered for easy reference. In normal music writing, these numbers will not be
there. The numbering here also shows that the lines and spaces are counted from the
bottom.

Music is made up of three basic components,

• Pitch,
• Harmony
• Rhythm.

These three components are represented on paper by staff lines, Clefs, notes and
rests (notes and rests will be discussed in later lessons).

The staff lines represent musical pitches. The actual pitch represented by a staff line
or space depends on what clef is at the beginning of the staff and the key signature
(more on key signatures later).

1.2 Clefs

There are various types of clefs; the two most common ones are:

“treble” clef “bass” clef

Practice drawing some on your own:

Start drawing the treble clef from between the 1st and 2nd line and move down to form
a loop.

Keep moving up until you have crossed the 5th line, after which loop back down until you
have cut the 1st line.

Start drawing the bass clef on the 4th line and loop up to form the curve at the 5th line.

Loop back down and stop between the 1st and 2nd line. Usually the two dots are added
at the side of the clef.

A treble clef and a bass clef joined together form what is called a ‘grand staff’.

Note Naming- Staff lines, clefs and the keyboard

Musical note names are taken from the first seven letters of the English alphabet: A, B, C,
D, E, F, G. They then repeat themselves in a cycle. This is as shown below.

[Keyboard diagram: middle C, with the C an octave lower and the C an octave higher marked]

The ‘middle C’ indicated on the keyboard is the ‘C’ that is directly corresponding to the ‘C’ in
the space between the treble and bass clefs. The notes indicated on the grand staff are the
consecutive white notes on the keyboard.

You should be able to see the pattern of repetition now after observing the way the notes are
arranged.

An octave above middle C, for example, is the next C after you have gone
through the seven letter names used for musical notes (C, D, E, F, G, A, B and back to C).
Refer back to the staff lines above. The same applies to every note on the keyboard.

Rudiments

1. Tones and semitones

A tone is the distance between two notes that are exactly one note name apart, e.g. A to B, C to D,
or A down to G. As long as the two notes differ by exactly one note in this way, they are a tone
apart.

As for a semitone, it means that the notes are half a tone higher or lower, e.g. A to A# (A sharp)
or C to C#, or A down to A flat. The special cases are B to C and E to F. You can refer back to
the keyboard below to understand the sequence.

The naming of the black notes requires that you understand what sharps, flats and semitones
are. Sharps and flats are a semitone away from their natural note.

1.1 Sharp

A sharp [#] is a note that is a semitone higher than the original note. That is, A to A# or C to C# or
any other that is on the keyboard. It is usually the black note that is on the right of that
particular note. It can also be noted that E# is also equivalent to F and B# is a C.

1.2 Flat

A flat [b] is a note that is a semitone lower than the original note. That is, A to Ab or C to Cb. In the
case of Cb, we can see from the keyboard that Cb is not a black note. This is one of
the special cases. Cb is also equivalent to B. As for the rest, the flat of that particular
note is the black note on the left. It can be seen that the flat and the sharp are
considered to be shared. G# and Ab are the same note. The same is true for all notes.

1.3 Natural

A natural is the ‘original’, unaltered note: A, B, C … G, A etc.

2. Scales

A scale is a series of notes that proceed up or down by step. (‘Step’ means by tone or by
semitone) A major scale proceeds by following a certain pattern of tones and semitones. But
we’ll get to that in a moment. Make certain that you fully understand the difference between
tones and semitones.

Understanding scales depends on your knowledge of tones and semitones. Please note that
any reference to ‘tone’, means a ‘whole tone’.

We’ll go through the process of writing a major scale step by step (no pun intended), and you’ll
see that writing scales is actually a fairly simple process. I would recommend getting a piece of
staff paper and writing out the steps as you see them demonstrated here for you. It will help
you to clearly visualize the entire process. We are going to write an F-major scale in the treble
clef, ascending, using quarter notes.

2.1 Writing an F-Major Scale in the treble clef

STEP 1:

Draw a treble clef on a staff. Then place an F on the staff, the ‘F’ above middle ‘C’.

STEP 2:

Write a note on each line and space, ascending for one octave. Remember, any note
below the middle line B should point its stem upward; any note above the middle line
‘B’ should point its stem downward. The ‘B’ itself can go either way.

STEP 3:

You’ve now written a scale, but not necessarily a major scale. Major scales follow a
certain pattern of tones and semitones: T T [S.T] T T T [S.T] (T = tone; S.T = semitone).

We now have to examine the intervals between each and every note to see that they
conform to this pattern. If they don’t, we can use accidentals (sharps and flats) to
make them conform. We start by looking at the first two notes, ‘F’ and ‘G’. What is the
distance between these two notes? It is a whole tone. Therefore, the first interval in
the pattern, ‘Tone’, is correct, and we can go on.

Now let us look at the 2nd and 3rd notes, the ‘G’ and ‘A’. The distance between these
two notes is a whole tone, so that conforms to the second interval requirement, tone.
On we go! Our next notes to examine are the 3rd and 4th notes, the A’ and ‘B’. This
forms a whole tone.

But our major-scale pattern says that there should only be a semitone between these
two notes. No problem. We will just lower the B to a B-flat, and now it’s a semitone.
Here’s what we’ve got so far:

We show whole tones with a square bracket and semitones with a slur (curve).

Just keep going, checking each interval between all notes in the scale. You will find
that in this scale, the B-flat is the only accidental that we have to use. Here is the
complete correct F major scale:

An F-major scale, as you can see, has one flat. It is the only major scale that has one
flat. All the different major scales use their own set of accidentals. In the next lesson,
you’ll learn how to make a proper key signature from the accidentals that are used.

A major scale can be represented in Tonic solfa by the common

Do Re Mi Fa So La Ti Do. For the major scale of F just constructed,

F=Do, G=Re, A=Mi, B-flat=Fa etc.

For practice, try writing an A-major scale in the bass clef. Just go back to Step 1 and
start on an ‘A’. Make sure that you write your scale using the process mentioned
above. Start with one octave of notes, and then make your adjustments if necessary.

3. Key signatures

We’ve all seen key signatures - they’re the collection of sharps or flats at the beginning of each
staff. We also know what they mean. When we see the following key signature:

We know that every B, E and A will be flat, unless canceled out temporarily by an accidental.
In the previous lesson’s test, you were asked to write an A-flat major scale. If you did your job
properly, it should have looked like this:

Remember, the square brackets represent whole tones, the rounded ones represent
semitones. Now how do we convert those accidentals to a key signature?

Take a look at the scale and write down all of the accidentals you used. In the case of the A-
flat major scale above, you used: A-flat, E-flat, D-flat, and B-flat.

Now we need to know what order to write them down in a key signature. For that, we have a
nifty little rhyme:

Battle Ends And Down Goes Charles Father

The first letter of each word in this sentence tells us the order that the flats are entered in a key
signature: first the ‘B’, then the ‘E’, the ‘A’, and finally the ‘D’. It looks like this, in both clefs of the
Grand Staff.

A key signature that uses all seven possible flats will look like this:

The neat thing about the “Battle - Ends...” rhyme is that reversing the order of the rhyme gives
us the order of sharps in a key signature:

Father - Charles - Goes - Down - And - Ends — Battle

A key signature that uses all seven possible sharps will look like this:

3.1 Key Signature hints:

There are some little “tricks” that can help you know which major key belongs to which
key signature. Consider this key signature:

You might think this is a rather complicated one to start with, but in fact it’s quite easy
if you remember this rhyme:

When sharps you see, the last is ‘Ti’.

‘Ti’, of course, is the solfa name for the seventh note of the scale, the ‘leading tone’
(You will learn more about these technical names in a later lesson.) The last sharp
indicated above is the B#. If that’s the seventh note, we know that the next note will be
the key-note, and it will be one diatonic semitone higher. Therefore, this key signature
belongs to C# major.

Consider this key signature. Now remember this little rhyme:

When flats there are, the last is ‘fa’.

Fa’ is the solfa name for the fourth note of the scale. The last flat indicated above is
the F-flat. If that’s the fourth note, we know that the key-note will be four notes lower.
Counting down in this key signature four notes, we hit ‘C-flat’. Therefore, this key
signature belongs to C-flat major.

3.2 Accidentals

An accidental is a sharp or flat symbol placed in the music that does not normally
belong to the given key.

By this time you should be familiar with sharps and flats and natural notations. It
would be good to also familiarize yourself with the position of each note in relation
to the keyboard. It would definitely be to your own benefit.

4. Tonic

As you know, every scale degree has a technical name.

When we speak of a note in a scale, we can refer to it by its number (‘G’ is note number 1 of a
G-major scale), or by its technical name (‘G’ is the tonic note in a G-major scale). A technical
name not only identifies a note, but can also give us information as to the function of a note
within a scale. Furthermore, we can build chords on all of the various notes in a scale, and
identify those chords by the technical name (i.e. a tonic chord).

The notes in the major scale are given technical names according to their numerical
position in the scale.

• The First is the Tonic
• The Second is the Supertonic
• The Third is the Mediant
• The Fourth is the Subdominant
• The Fifth is the Dominant
• The Sixth is the Submediant
• The Seventh is the Leading tone

We are only going to deal with tonic and dominant chords. This is because tonic and dominant
chords form the basic backbone of much of what we call ‘tonal music’. First we need to learn a
couple of important definitions:

4.1 Triad:

A three-note chord in which one note is identified as the root, another as the 3rd and
the other as the 5th. A chord is the simultaneous sounding of three or more notes.

A chord can be any three or more notes played together, but a triad has a particular
structure. If we are in the key of A-major, this would be the tonic note:

If we build a triad on top of this note, according to the definition of a triad given above,
it would look like this:

This is a three-note chord in which the bottom note is acting as the root, the middle
one is the 3rd, and the top note is the 5th. Any chord in this structure (root-3rd-5th) is
called a triad, (The numbers 3rd and 5th refer to the intervals above the bottom note.)
We say that this is a tonic triad because it is a triad that has been built on the tonic
note of the key we’re in. It is traditional to indicate the triad by using a Roman
numeral. Since we have just built a triad on the first note of the scale, we place the
Roman numeral for ‘1’ underneath it:

The procedure we just followed to create a tonic triad is the same for any key. Here
are several keys, with tonic triads:

(It is traditional in most schools of theory to indicate major triads with an upper-case
‘I’, and minor triads with a lower-case ‘i’.) These are tonic triads because they are
chords built on the tonic note. They are triads because the structure of the chord is
1-3-5 (root-3rd-5th). Dominant triads are built in similar fashion to tonic triads. In other
words, simply go to the dominant note of the scale and build a 1-3-5 triad. Let’s take
a good look at the structure of a dominant triad. Note this one, in D-major:

We put the number ‘V’ underneath it because it is a triad that has been built on the
fifth note of the scale. Furthermore, it is called a dominant triad, because the fifth note
is the dominant note. In a dominant triad, there is always that leading tone, the middle
note that “wants” to move up to the tonic. That is what gives dominant chords their
important place in traditional harmony: they help define the tonic chord in that manner.

IMPORTANT: Dominant triads must always be major, no matter what key you write
them in. Take a look again at the V-chord above. You will see that the bottom note is
the dominant note of the key. The middle note is the leading tone of the key. (i.e., C# is
the leading tone in D-major.) This is important. Dominant chords must always have
the leading tone present. But look at this V-chord in A-minor:

A leading tone is always a semitone below the tonic, but you can see that the leading tone in this triad
(the middle note) is a whole tone away from ‘A’. So we have to raise the ‘G’ to
become ‘G#’:

The simple way to remember this is to remember this rule: “All dominant chords must
be major, whether you are in a major key or a minor key. If you are in a minor key, you
must raise the third (middle) of the chord to make it major.”

The G# is called an accidental. Here are some more dominant triads, in various keys:

The V-chords in the minor keys above had their middle notes (the 3rd) raised by using
an accidental in order to create a leading tone to the tonic. For example, the 2nd
chord has an E# because E# is a leading tone for the tonic (F#).
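
Continuing the scale sketch from earlier, tonic and dominant triads in a major key can be read straight off the scale degrees. This small example assumes the major_scale() function defined in the previous sketch and covers major keys only (so no accidental needs to be added to the dominant chord):

```python
# Tonic (I) and dominant (V) triads as scale degrees 1-3-5 and 5-7-2,
# reusing major_scale() from the earlier sketch (major keys only).
def tonic_triad(key: str):
    scale = major_scale(key)
    return [scale[0], scale[2], scale[4]]      # degrees 1, 3, 5

def dominant_triad(key: str):
    scale = major_scale(key)
    return [scale[4], scale[6], scale[1]]      # degrees 5, 7, 2

print(tonic_triad("A"))      # ['A', 'C#', 'E']
print(dominant_triad("D"))   # ['A', 'C#', 'E'] - note the C# leading tone
```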

5. Rhythms and Beat Notation

In music notation, the beat or pulse and therefore the rhythm of the music is represented by
fixed kinds of notes and time signatures.

5.1 Pulse or beat

Musical notes are not all held for the same duration. There are long notes and short
ones, and all others in between. Composers need a way of indicating to performers
how long to hold each note. By making each note look a little different, this can be
easily communicated. Here is a whole note, a note you’ve probably seen before,
sitting on a line:

The whole note is not normally found sitting on a line like this, of course, it has been
placed there to help you visualize its length. This diagram is showing that one whole
note takes up the entire line. If we divide the line into two equal parts, a whole note
would be too big to fit in it. We need notes of shorter duration. These are called half
notes:

You can tell with this diagram that it takes two half notes to make a whole note. Let us
keep going. The next smaller note value is called a quarter note:

It takes four quarters to make a whole note. Also, you can tell that it takes two quarter
notes to make one half note. We could keep going, theoretically, forever! However,
let’s just do one more for now. Here are notes of even shorter value, called eighth
notes. They look like quarter notes with flags:

So eight eighths equals one whole. It also equals two halves. It also...

Let’s look at all the diagrams placed together. You can see the relationships between
note lengths very clearly:

Here is an equation that should make sense to you.

It shows that two quarter notes equal one half note in length.

Here’s another one:

This may look a little complicated, but take your time and figure it out: if you add
together the lengths of one half note, two eighth notes and one quarter note, you will
get one whole note. It is just the same as the following arithmetic equation:

5.2 Dotted Notes

You know that in many time signatures a quarter note equals one beat. When you add
a dot to a note, you add half of its value to the note. What is half of one?

If you add that to the quarter, you get a note that is 1 and a 1/2 beats long. A dotted
quarter note looks like this:

The dot makes the note half again as long as a quarter note.

Here is a dotted half note:

It is one half note plus half of a half note (one quarter). A dotted half note, therefore, is
three beats long.

Similarly, adding a flag to a note makes a note half as long. Remember the eighth
note?

Without the flag, it would look like a quarter note - one beat. By adding the flag it
becomes a note of half that value - an eighth note. By adding another flag, it becomes
half as long as an eighth note - a sixteenth note:

It takes two sixteenth notes to equal one eighth note. It takes four sixteenth notes to
equal one quarter note. How many sixteenth notes does it take to make one half
note? Eight. One whole note? Sixteen.
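
The note-length arithmetic above can also be written out as fractions of a whole note; the following is a small illustrative sketch (a dot adds half of a note's value, each flag halves it):

```python
from fractions import Fraction

# Note values as fractions of a whole note.
WHOLE, HALF, QUARTER, EIGHTH, SIXTEENTH = (Fraction(1, d) for d in (1, 2, 4, 8, 16))

def dotted(value: Fraction) -> Fraction:
    return value + value / 2          # a dot adds half the note's value

# One half + two eighths + one quarter = one whole note
print(HALF + EIGHTH + EIGHTH + QUARTER == WHOLE)   # True
print(dotted(HALF) == 3 * QUARTER)                 # True - a dotted half is 3 beats
print(WHOLE / SIXTEENTH)                           # 16 sixteenths in a whole note
```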

Many times when two or more eighth notes are written side-by-side, the flags are
replaced with a beam. These two beamed eighths are exactly the same as if the writer
had written them with separate flags.

Same thing for sixteenths:

Using the beam in place of the flags simply makes it look a little “tidier” and a little
easier for a performer to read. It also indicates beat duration. Concerning the direction
of stems, it is important to know that sometimes stems can point upward as in the
examples above. But stems can also point downward, if the note is above the middle
line of the staff:

If the note is on the middle line, the stem may point either upward or downward.

5.3 Rests

For every note, there is a corresponding rest of the same length. For example, the
whole note

is a note that gets four beats. The whole rest also gets four beats:

As you can see, it looks like a small black rectangle that hangs from the fourth line. It
hangs from that line no matter which clef you use.

The half note gets two beats, and so does the half rest:

Here are the “rest” of the rests!

The quarter rest (1 beat):

The eighth rest (0.5 beat):

The sixteenth rest (0.25 beat):

6. Time signatures

Writers of music have a convenient way of putting music into “sections” or “compartments” that
make it visually easy to follow. These compartments have been discussed before: we call them
“measures” or “bars”. Take a look at most printed music, and you’ll see this very clearly.
Measures are separated from each other by “bar lines”.

You’ll also notice at the beginning of each piece of music a time signature. Simply stated, a
time signature consists of two numbers, one being written above the other, to indicate how
many beats are in each bar.

This is stated directly in the case of simple time signatures, or indirectly in the case of compound
time signatures. Our first task is to discover the differences between simple and
compound time.

6.1 Simple Time Signatures

Simple time signatures tell us two things immediately:

HOW MANY beats are in each bar?

What kind of note gets the beat? Study the following:

The time signature tells us two things:

The ‘2’ tells us that there are 2 beats in every bar, and

The ‘4’ tells us that each beat is one quarter note long. Simple! (Guess that’s why they
call it a simple time signature?) Also, notice in bar 2 that the eighth notes have been
beamed together in groups of two. That’s because two eighth notes together are one
quarter note in length. The writer is showing us that the quarter note gets the beat.

Here’s the same excerpt with the beats shown above the music:

If we were to count along with the excerpt as it is played, we would say “1, 2, 1, 2, 1,
2” etc.

The subdivision or breakdown of a beat is its number of components. In simple time
signatures, each beat can be “subdivided” into two parts. Here is the same excerpt
with the subdivision, or breakdown, shown underneath:

This excerpt shows four things that describe all simple time signatures:

1. The beat is an un-dotted note.
2. Each beat is subdivided into two components.
3. The top number is not divisible by ‘3’ (Except for time signatures with a ‘3’
on top, which are often simple time signatures!)
4. Simple time signatures show the number of beats in every bar.

6.2 Compound Time Signature

Unlike simple time signatures, compound time signatures do not directly show us the
number of beats per bar. Instead, they show us the number of breakdown notes per
bar.

Study the following:

In this excerpt, we can see that the writer has beamed the first three eighth notes
together. The writer is showing that the first three eighths form one beat; that’s why
they were beamed together. We therefore need to take the eighth notes and
“condense” them to discover what the beat is. Condensing the three eighths down to
one note gives us a dotted quarter. (1/8 + 1/8 + 1/8 = 1 dotted quarter note.)

In other words, the beat in a bar of music is the dotted quarter. You can see that
by going through the two bars of the excerpt, it is possible to apply dotted quarter note
beats. Here’s what it looks like:

Just like with simple time signatures, we can break down each beat into beat
subdivisions. However, though simple time beats break down into two parts,
compound time beats break down into three parts:

You can see that each bar has SIX breakdown notes. The breakdown notes are
EIGHTH notes. Therefore, the time signature is six eight (6/8).

So, here are the four things that describe compound time signatures:

1. The beat is a dotted note.
2. Each beat is subdivided into three components.
3. The top number is evenly divisible by ‘3’. (Except for time signatures with a
‘3’ on top.)
4. Compound time signatures show the number of breakdown notes in every
bar.
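
To tie the two sets of rules together, here is a minimal sketch that classifies a time signature and reports its beat unit. It covers only the common cases described above and, as noted, treats a top number of ‘3’ as simple time:

```python
from fractions import Fraction

def describe_time_signature(top: int, bottom: int) -> str:
    if top > 3 and top % 3 == 0:
        # Compound: the numbers count breakdown notes; the beat is a dotted note.
        beats = top // 3
        beat_unit = Fraction(1, bottom) * 3     # e.g. three eighths = a dotted quarter
        return (f"compound time: {beats} beats per bar, "
                f"each a dotted note worth {beat_unit} of a whole note")
    # Simple: the top number is the number of beats per bar.
    return f"simple time: {top} beats per bar, each a 1/{bottom} note"

print(describe_time_signature(2, 4))   # simple time: 2 beats per bar, each a 1/4 note
print(describe_time_signature(6, 8))   # compound time: 2 beats per bar ...
print(describe_time_signature(9, 8))   # compound time: 3 beats per bar ...
```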

Armed with that knowledge, you should be able to say what time signature the
following excerpt is in:

So let’s study it. Look at bar 1. Notice that the eighth notes are beamed together in
groups of two. Each one of those eighth note pairs can “condense down” to form one
quarter note. Looks like the quarter note may be the beat unit in this excerpt. Can we
apply a quarter note beat pattern to the whole excerpt? Absolutely! This is what it
would look like:

Since applying quarter notes as a beat unit seems to work, we can tell that this is a
simple time signature:

The beat is an un-dotted note.

Each beat will divide into two components (one quarter note subdivides into two
eighth notes).

Since we know that it is simple time, the actual time signature should be the same as
the number of beats per bar.

Let us try another one:

Look at how the eighth notes are beamed. Notice in particular, the last group of notes
at the end of the first bar. The dotted eighth, sixteenth, and eighth note have all been
beamed together. If we condense those three notes down, we get one note which is a
dotted quarter in length. It appears that perhaps the dotted quarter will be the beat unit
in this excerpt. Let us see if we can apply a dotted quarter beat to the entire excerpt.
The eighth rest and two eighth notes at the beginning would certainly be explained in
terms of a dotted quarter beat. That leaves the quarter note and the eighth note in the
middle, and that too can fit into the dotted quarter beat pattern.

So how do we assign this excerpt a time signature? The beat is a dotted note, so this
is a compound time. Therefore the numbers of the time signature will reflect the
number of breakdown notes in each bar. As this is compound time, the beat breaks
down into three parts:

How many breakdown notes in each bar? Nine. What kind of notes are the breakdown
notes? Eighth notes. Therefore the time signature is nine eight (9/8).

7. Bar and bar line

Music is often divided up into units called measures or bars. Each measure has a certain
number of beats. The number of beats is determined by the time signature. (Another word for
time signature is meter). For example, some music is written so that every measure has four
beats, and that the quarter note is the unit that ‘gets the beat”.

In such a piece the time signature would be 4/4; we say “four four” when we read this time
signature. In this time signature, the top ‘4’ represents the number of beats per bar: four. The
bottom ‘4’ tells us what kind of note gets the beat. The bottom four means “quarter note”.

Time signatures such as 2/4, 3/4 and 4/4 are some of the commonly used time signatures.

There are things you will eventually need to know about all time signatures. For example, you
will eventually learn that the time signatures listed above are called simple time signatures. But
that’s not necessary right now. All you need to know is that in each of these particular
time signatures:

• The top number tells us how many beats.
• The bottom number tells us what kind of note gets the beat.

Take a look at the following piece of music:

This is a piece of music that has been written in three fours.

That’s obvious, because of the time signature at the beginning of the piece. But let’s say that
the composer forgot to put a time signature at the beginning. How would we be able to know
that the piece was in three fours time?

Well, if you count up the number of beats in each bar, you would find that each bar has three
beats, and that each beat is a quarter note:

Bar1: 3 quarter notes = 3 beats.

Bar2: 4 eighth notes plus 1 quarter note = 3 beats.

Bar3: 1 half note plus 2 eighth notes = 3 beats.

Bar4: 1 dotted half note = 3 beats.

It is necessary, in any given time signature, to make sure that each bar has the same number
of beats, and that the number of beats is the top number of the time signature. If we were to
take the example above and write the count of each bar, it would look like this:

If you play a musical instrument, you are probably already familiar with ‘counting” in this
manner.

What if you were to get a piece of music in which the composer put the time signature at the
beginning, but ‘forgot” to draw in the bar lines?

The time signature is two fours. So count two beats, then draw a bar line; then count another
two beats and draw another bar line. It should work out that every bar gets two beats, because
that is what the time signature means.

Here is what it should look like once you have drawn the lines in:

Bar 1: 2 eighths plus 1 quarter = 2 beats.

Bar 2: 4 sixteenths plus 1 quarter = 2 beats, etc...

You can see that each bar gets 2 beats. The counts have been written in. Notice that each
beat gets a number. In bar 1, the first eighth gets a “1”. The second eighth gets a “+” to indicate
that it’s in-between beats one and two. In bar 2 the first sixteenth gets a “1”. The next sixteenth
gets an “e” (our way of showing a note that is one sixteenth past the beat). The next sixteenth
is a “+” because it is one eighth past the beat. The fourth sixteenth gets an “a” (our way of
showing a note that is the fourth sixteenth past the beat). This way of showing the
counts makes it easy to say them aloud. For example, if you saw a bar of music in this time
signature that had eight sixteenth notes, you would say the count like this: “One - e - and - a, Two - e - and - a”.

Sometimes we have to write the counts into a bar that features syncopation.

Syncopation occurs when the normal rhythmic stresses in a bar are changed.

For example, normally in a piece of music written in a simple time signature one tends to be quite aware of a “strong
- weak - strong - weak” pulsing of the music. If you come across a piece of music in which the
eighth note gets the beat, then each eighth note gets a number, and each sixteenth gets a “+”:

8. Additional notes

Do not forget that within a bar, once an accidental appears, all following notes of that pitch in the
bar are also raised or lowered. In order to get back the original note, you would have to cancel
the accidental with a natural sign.

8.1 Double-Sharps & Double-Flats

As you already know, raising a letter-name by one semitone can be represented by
placing a sharp in front of the note:

Similarly, lowering a letter name by one semitone can be represented by placing a flat
in front of the note:

By placing a sharp in front of a note, we raise that note by one semitone. By placing a
flat in front of a note, we lower that note by one semitone. There are situations that
arise in which we need to raise a note that is already sharp, thus creating a
double-sharp.

You will see that these situations occur most often in the building of certain minor
scales. A double-sharp sign looks like the letter x:

This note, called ‘A-double-sharp’, is two semitones higher than ‘A’. If you were to play
it on your instrument, you would play a B. Therefore, we say that ‘A-double-sharp’ and
‘B’ are enharmonic equivalents.

A double-flat literally looks like two flat signs side-by-side:

This note, called ‘A-double-flat’, is two semitones lower than ‘A’. If you were to play it
on your instrument, you would play a ‘G’. A-double-flat and ‘G’ are said to be
enharmonic equivalents. Two notes are ENHARMONICALLY EQUIVALENT when
they both produce the same pitch.
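
A quick way to confirm enharmonic equivalence is to count semitones (a small illustrative sketch; each sharp raises a note by one semitone and each flat lowers it by one):

```python
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_class(note: str) -> int:
    # e.g. "A##" -> 11, "Abb" -> 7; sharps raise, flats lower, one semitone each
    letter, accidentals = note[0], note[1:]
    return (NATURALS[letter] + accidentals.count("#") - accidentals.count("b")) % 12

print(pitch_class("A##") == pitch_class("B"))   # True  (A-double-sharp sounds as B)
print(pitch_class("Abb") == pitch_class("G"))   # True  (A-double-flat sounds as G)
```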

9. Modes

A mode is a type of scale. You have already learned to write major and minor scales.

Music based on major and minor scales came into common usage in the early 1600s, and of
course we have been using them ever since. Before the 1600s, composers wrote in what were
called modes. There was a resurgence of interest in modes toward the end of the 19th
century, with composers like Debussy.

Modal melodies can be very beautiful, and their study is certainly worthwhile. Such study of
modes can get quite in-depth, and is a fascinating field of study.

However, for our purposes here as a rudimentary music theory course, we shall only delve into
the basic construction of modes so that we can identify and write them.

The first and perhaps most important thing is: A mode is distinguished by the pattern of tones
and semitones, not by the actual pitches used.

Take a look at this C-major scale, starting on a middle C and proceeding upward for one
octave.

The tones and semitones have been indicated, and you can tell by that tone-semitone pattern
that this is indeed a C-major scale. What if you were to take this same C-major scale, but
instead of starting on a ‘C’, started on a D and proceeded upward for one octave. It would look
like this:

It still has the pattern of tones and semitones that belong to C-major; it is just that the scale
now starts and ends on a ‘D’ instead of a ‘C’. We call this scale the Dorian mode.

We say that the note ‘D’ is the key note, or final, of the mode. A scale that runs from what
appears to be the second degree (supertonic) up to the second degree an octave higher is
said to be in the Dorian mode.

We can start a scale on all the different notes of our C-major scale above. For example, if we
write a scale from the mediant to the mediant, we get the Phrygian mode:

(The tone-semitone pattern is still that of the C-major scale.)

Subdominant to subdominant gives us the Lydian mode:

Dominant to dominant produces the Mixolydian mode:

Submediant to submediant produces the Aeolian mode:

And leading tone to leading tone makes the Locrian mode:

Incidentally, when you write a major scale from the tonic note up to the tonic note, you are also
forming a mode, called the Ionian mode. So something in C-major could technically be said to
be in C-Ionian, though we more often than not simply call it ‘C-major’.
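
Because a mode is defined by where you start in the major-scale pattern rather than by the pitches themselves, the seven modes can be generated simply by rotating the C-major scale. The following is a minimal sketch (note-spelling subtleties are ignored):

```python
# Generate the seven modes as rotations of the C-major scale.
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]
MODE_NAMES = ["Ionian", "Dorian", "Phrygian", "Lydian",
              "Mixolydian", "Aeolian", "Locrian"]

def mode(degree: int):
    """degree 1 = Ionian (tonic to tonic), 2 = Dorian (supertonic up), and so on."""
    i = degree - 1
    return C_MAJOR[i:] + C_MAJOR[:i]

for degree, name in enumerate(MODE_NAMES, start=1):
    print(f"{name:<11}{' '.join(mode(degree))}")
# e.g. Dorian     D E F G A B C
```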

AE03 – Analog Tape Machines

1. The Analogue Tape Recorder (ATR)

2. The Tape Transport

3. Capstan Motors

3.1 Hysteresis Motor.

3.2 DC Servo Motor.

4. Tape Transport Systems

a. Open-Loop System

b. Closed Loop

c. Zero Loop System

TAPE AND HEAD CONFIGURATIONS.

1. Configurations

2. Track Width

3. Tape Speed

4. Recording Channels

5. Input Signal Modes

5.1 Input Mode

5.2 Reproduce Mode.

5.3 Sync Mode

PROBLEMS WITH MAGNETIC TAPE

1. Bias Current and Dynamic Range of Magnetic Tape

2. Tape Equalisation

2.1 Record Equalisation

2.2 Playback equalization

2.3 Equalisation standards

3. Head Alignment

3.1 Height

3.2 Azimuth

3.3 Zenith

3.4 Wrap

3.5 Rack

4. Electronic Calibration

TAPE SERVICING

1. Print-Through - Tails out tape storage

2. Cleanliness

3. Degaussing

AE03 – ANALOG TAPE MACHINES

1. The Analogue Tape Recorder (ATR)

Calling a tape recorder analog refers to its ability to transform an electrical input signal into
corresponding magnetic energy that is stored in the form of remanent magnetism on the tape.

2. The Tape Transport

The process of recording the audio bandwidth on magnetic tape, using a professional reel to
reel ATR depends on the machine’s ability to pass tape across a head path at a constant
speed, with a uniform tension. The mechanism which accomplishes these tasks is called the
tape transport system.

Transport Technology is based on the relationship of physical tape length to specific periods
of time. A professional multitrack recorder runs at one of two speeds: 15 ips or 30 ips. During
playback the time spectrum must be kept stable by duplicating the precise speed at which the
tape was recorded, thus preserving the original pitch, rhythm and duration of the recorded
programme.

The elements of the transport deck of a Studer A-840 1/4” mastering machine:

A Supply Reel

B Takeup reel

C Capstan

D Capstan Idler

E Tape guides

F Tension Regulators

G Tape Shuttle Control

H Transport Controls and Tape Timer

The transport controls the movement of the tape across the heads at a constant speed. The
mechanism responds to the various transport control buttons to carry out basic operations
such as:

Play/Record, Stop, Pause, Fast Forward and Rewind. In the past, one had to take care not
to stop a machine that was in fast shuttle as this could stretch the tape. Instead a rocking
procedure was used to slow the tape by alternating between FF and REW.

Newer reel-to-reel ATRs employ electronics called Total Transport Logic (TTL) which allows
the operator to safely push play whilst rewinding or fast forwarding. TTL uses sensors and an
automatic rocking procedure to control tape movement.

The EDIT button frees the tension from the supply and take up reels so that the tape may be
moved manually across the head block to determine an edit point.

3. Capstan Motors

The capstan is the most critical element in the transport system. It is the shaft of a rotational
motor which is kept at a constant rate of speed. There are 2 common types of capstan
motors:

3.1 Hysteresis Motor.

Maintains a constant speed by following the supply voltage frequency from the power
line - a stable reference of 50 or 60 Hz.

3.2 DC Servo Motor.

Uses motion-sensing feedback circuitry. A notched tachometer disk is mounted
directly on the capstan motor shaft. A light is shone through the rotating disk and a
sensor counts the number of disc notches per second, registered as light flashes. A
resolver compares the actual state of rotation with a standard reference to give a
highly accurate and stable capstan speed. This design is now the standard in pro
ATRs.
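
In outline, the servo loop behaves something like the following deliberately simplified sketch; the notch count, reference speed and gain are made-up illustrative values, not the control law of any particular machine:

```python
# Simplified capstan servo: compare the tach count against a reference and
# nudge the motor drive to hold the speed constant.
NOTCHES_PER_REV = 100          # assumed tachometer disc resolution
TARGET_REV_PER_SEC = 12.5      # assumed reference speed
TARGET_COUNT = NOTCHES_PER_REV * TARGET_REV_PER_SEC   # light flashes per second
GAIN = 0.001                   # proportional correction gain (illustrative)

def correct_drive(drive_level: float, measured_count: float) -> float:
    error = TARGET_COUNT - measured_count      # resolver: actual vs reference
    return drive_level + GAIN * error          # speed up if slow, slow down if fast

drive = 0.8
for count in (1200, 1245, 1251):               # simulated tach readings
    drive = correct_drive(drive, count)
    print(round(drive, 3))
```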

4. Tape Transport Systems.

Three methods of transporting tape across a headpath:

a. Open-Loop System

Here the tape is squeezed between the capstan and the capstan idler to move the
tape. A small amount of hold-back torque is applied to the supply reel motor, i.e. in the
direction opposite to that of the tape travel. This provides the right amount of tension
for tape-to-head contact. A small amount of take-up reel torque helps to spool the tape
onto the take-up reel after the headblock.

b. Closed Loop

The tape guide path is isolated from the rest of the transport with unsupported
sections of tape kept to a minimum. This closed loop minimizes distortions associated
with open loop systems. The tape is actually pulled out of the headblock at a faster
rate than it is allowed to enter.

c. Zero Loop System

Takes full advantage of TTL logic and DC servo feedback circuitry in a system that
does not employ a capstan. Instead, the tape is shuttled and kept at the right tension by
the interplay of the supply and take-up reel motors. Motion and tape-tension sensors
on both sides of the headblock are used to continually monitor and adjust speed and
tension.

TAPE AND HEAD CONFIGURATIONS.

1. Configurations

Number of tracks recorded per width of tape. Professional ATRs (Studer, Otari, Sony) are
available in a wide number of track and tape-width configurations.

2 Trk, 1/4” ; 4-trk, 1/2” ; 8-trk, 1” ; 16- and 24-trk 2”

Budget or Semi-pro configurations (Tascam, Fostex) save money by placing more tracks on
smaller width tapes:

4Trk, 1/4” ; 8- and 16-trk 1/2”

2. Track Width

With greater track width, an increased amount of magnetism can be retained by the tape,
resulting in a higher output signal and an improved SNR. A wider track makes the recording
less susceptible to signal dropouts. When the space between tracks (the guard band) is wider,
crosstalk is reduced.

3. Tape Speed

Directly related to signal level and bandwidth. More magnetic domains pass over the heads at high
speeds, making the average magnetisation greater. This allows for increased input
level before saturation, which means you can get more signal on tape before distortion and
so increases the SNR of the recording.

At faster tape speeds the recorded bandwidth is increased. This is because the recorded wavelengths
of high frequencies are longer, i.e. they cover more tape at fast speeds, hence these frequencies will
not be lost in the gap of the repro head (scanning loss).

Common audio production tape speeds are 7 1/2 ips, 15 ips and 30 ips. 30 ips has professional
acceptance for the best frequency response and SNR.
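
The recorded-wavelength argument can be checked with one line of arithmetic: wavelength on tape = tape speed ÷ signal frequency. A short sketch, using the speeds quoted above:

```python
# Recorded wavelength on tape = tape speed / signal frequency.
def recorded_wavelength_um(speed_ips: float, freq_hz: float) -> float:
    speed_um_per_s = speed_ips * 25400.0   # 1 inch = 25,400 micrometres
    return speed_um_per_s / freq_hz

for speed in (7.5, 15.0, 30.0):
    print(speed, "ips ->", round(recorded_wavelength_um(speed, 20000), 1), "um at 20 kHz")
# ~9.5 um at 7.5 ips, ~19.1 um at 15 ips, ~38.1 um at 30 ips:
# the faster the tape, the longer the recorded wavelength and the smaller the gap loss.
```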

4. Recording Channels

The channel circuitry of an ATR is composed of a number of identical channel modules which
correspond to the number of tracks the machine can record and play back. Each channel
must perform the same three functions: Record, Reproduce and Erase. These electronic
modules enable adjustment to input level, output level, sync level and equalisation. Monitoring
of signal levels for each channel is done via a dedicated VU meter.

5. Input Signal Modes

The output signal of each ATR module may be switched between three modes: Input,
Repro and Sync.

5.1 Input Mode

The output signal is derived from the input section of the module.

5.2 Reproduce Mode.

The signal is derived from the playback head.

5.3 Sync Mode

Sync mode is used during overdubbing on a multitrack ATR. Here we wish to record
new material whilst monitoring the recorded material off tape. The signal from the
repro head is always slightly delayed compared to the record head because of the
distance between the 2 heads. This delay interferes with accurate monitoring and
makes overdubbing impossible. A track selected in sync mode will be played back by
the record head and will thus appear to be synchronised to the input signals.

PROBLEMS WITH MAGNETIC TAPE

1. Bias Current and Dynamic Range of Magnetic Tape

The development of magnetic flux under the influence of magnetisation is not linear: it is
slow on the take-up around zero (the inertia point) and it trails off at saturation. When a signal is
recorded, it will be distorted at the crossover area around the zero point where there is only a
very small amount of magnetic flux.

This nonlinearity of tape’s magnetic response leads to a restriction in the dynamic range of
magnetic tape. Compensatory measures to extend tape linearity are taken in the application of
bias current to the record heads.

Bias current is applied by mixing the incoming audio signal with a high frequency signal of
150–250 kHz. This bias signal lifts the input signal to a higher average flux
level. Thus the signal to be recorded is moved away from the nonlinear crossover range and
into a linear portion of the curve.

On playback, the high frequency bias signal is ignored by the repro head and only the input
signal is reproduced.

Setting bias level (amplitude) is crucial to optimising the SNR of the recorded material. Bias
level varies with individual record heads and different types of tapes.

2. Tape Equalisation

Two types of equalisation are applied to the record and playback signals to increase the
linearity of the frequency response of magnetic tape.

2.1 Record Equalisation

Magnetic tape has a nonlinear frequency response: the recorded low- and high-frequency
responses are not flat, which means that the SNR decreases when these frequencies are
recorded. If the signal were recorded at its normal level, the high and low frequencies would
be too low to achieve adequate magnetisation. The audio signal therefore has its highs and
lows boosted (pre-emphasis) before it is recorded by the record head.

Therefore the levels on tape are unnatural and need to be restored on playback. This
is achieved by a complementary Post-emphasis equaliser in the playback circuit
which readjusts the high and low frequencies back to their proper levels.

2.2 Playback equalization

This is the 6dB/octave filter inserted in the playback circuitry to compensate for the
doubling of level per octave response of magnetic tape.

2.3 Equalisation standards

Tape machines have one of three equalisation standard settings. Each EQ setting is
used in a different part of the world.

N.A.B. National Association of Broadcasters (US, Canada, Singapore at 15ips)

IEC International Electrotechnical Commission (Aust, 15ips)

CCIR/DIN International Radio Consultative Commission and Deutsche Industrie
Normen (Europe, 15ips)

AES Audio Engineering Society (30ips)

3. Head Alignment

An important factor affecting recording quality is the magnetic tape head’s physical positioning
or alignment. For optimum recording the head must track the tape exactly. The head has five
dimensions of alignment; adjustment usually performed by screws on the headblock.

3.1 Height

Determines where the signal will be recorded. The height of the record and playback
heads must be aligned in relation to the tape path and each other for the full
reproduction of the recorded signal.

[Diagram: height alignment of the record and playback heads relative to the tape]

3.2 Azimuth

The tilt of the head in the plane parallel to the tape. All head gaps should be at 90
degrees to the tape path so that they are in phase with each other.

[Diagram: azimuth alignment]

3.3 Zenith

The tilt of the head towards or away from the tape. The tape must contact the top and
the bottom of the head with equal force, otherwise the tape will skew (ride up and
down on the head). Uneven zenith adjustment leads to an uneven wear path on the
head.

[Diagram: zenith misalignment and the resulting uneven head wear]

3.4 Wrap

The angle at which the tape bends around the head and the location of the gap in
that angle.

[Diagram: correct and incorrect wrap]

3.5 Rack

How far forward the head is. Determines the pressure of the tape against the head.

[Diagram: rack adjustment]

4. Electronic Calibration

Tape formulations differ from each other to the extent that an ATR must be calibrated to
optimise its performance with a particular tape formulation. ATRs have variable electronic
adjustments for record/playback level, equalisation and bias current. Standard levels must be
adhered to so that tapes may be played on different ATRs. The procedure used to set the
controls to standard levels is called Calibration.

Calibration is carried out to provide a standard reference for levels and equalisation. A
reproduce alignment tape is used which is available in various tape speeds and track width
configurations. The tape contains the following set of recorded materials:

• Standard reference level - 700 Hz or 1 kHz signal recorded at a standard
reference flux level of 185 nWb/m.
• Azimuth adjustment tone - 15 kHz for 30 secs.
• Frequency response tones from high to low.

Procedure for alignment:

i. The playback head is first calibrated for each track using the reference tape
and setting repro level and high frequency playback levels.
ii. Next the record head bias level is adjusted. Bias signal is increased to a
peak and then pulled back 1dB.
iii. Next the record head is calibrated, again on each track, by recording a
reference tone of 700 Hz from the oscillator onto fresh tape at 0 VU.
iv. Record a 10 kHz tone at 0 VU and adjust the record high-frequency EQ.
v. Use a 50 Hz tone to adjust the low-frequency playback EQ.

TAPE SERVICING

1. Print-Through - Tails out tape storage

A form of deterioration of tape quality. Print-through is the transfer of a recorded signal from
one layer of magnetic tape to an adjacent layer by means of magnetic induction. This gives
rise to an audible pre-echo or ghosting effect on playback. Print-through is a tape storage
problem. When the tape is rolled up, print-through is greatest between a layer of tape and the
one immediately above it; this appears as a pre-echo when the tape has been wound up onto
the supply reel with its beginning or “head” out.

Conversely, bad print-through can be avoided if the tape is wound for storage onto the takeup
reel so that it is “Tail Out”. Print-through still occurs, but the ghosting effect is reversed as an
echo which follows and is therefore masked by the original sound.

2. Cleanliness

The ATR heads and transport must be kept clean of oxide particles which are shed due to the
friction of tape running over the heads. Oxide accumulation is most critical on the heads
themselves leading to a loss of signal in record and playback.

Tape heads and metal transport guides are cleaned with denatured (pure) alcohol and a soft
cotton swab. Tape head cleaning should occur before every recording session.

3. Degaussing

Heads do tend to retain small amounts of residual magnetism which can lead to high
frequency deterioration in record and repro. Periodic Degaussing or demagnetising of the
heads is necessary.

A degausser operates like an erase head: it produces a high-level alternating signal which
saturates and randomises the residual magnetic flux. The degaussing procedure requires care or
the heads can be harmed. All tapes should be removed and the ATR switched off. The
degausser is turned on and slowly moved towards the head. It is gently moved across the
head without touching it and then slowly withdrawn. Repeat the same procedure for each head.

Assignment 1 – AE001

AE04 – The Decibel

1. Logarithms

2. What is a Decibel?

3. Sound intensity, power and pressure level

4. Sound intensity level

5. Sound power level

6. Sound pressure level

7. Adding sounds together

7.1 Correlated sound sources

7.2 Uncorrelated sound sources

8. Adding decibels together

9. The inverse square law

9.1 The effect of boundaries

dB in Electronics

1. Power and Voltage

2. Relative Versus Absolute Levels

2.1 dBm

2.2 dBu

2.3 dBV and dBv

2.4 Converting dBV to dBu

2.5 Relating dBV, dBu and dBm to Specifications

2.6 dBW

3. Equal Loudness contours and Weighting networks

4. Other concepts

4.1 Dynamic range of common recording formats


AE04 – THE DECIBEL

1. Logarithms

Logarithms allow smaller numbers to represent much larger values.

Going back to basic algebra:

X^y = Z

X is the base, y is the power to which X is raised and Z is the computed number.

If the base is 10 then y can be called the logarithm (log for short) of Z. This simply means, y is
the number to which 10 must be raised to get Z

Let’s use numbers to make it clearer.

10³ = 1000

This means the log of 1000 is 3.

This means 3 can be used to represent 1000 as long as it is known that 3 is a logarithm.

This computation can be made with the use of a scientific calculator. At this point, learning how
would make the rest of this section easier.

Antilog basically refers to the reverse process. If 3 is the log of 1000, then 1000 is the antilog of
3. Antilog is the number you get when you raise 10 to the log value.
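As a quick illustration, here is a minimal Python sketch of the log/antilog relationship described above (purely illustrative):

    import math

    # log: the power to which 10 must be raised to give the value
    print(math.log10(1000))   # 3.0  -> the log of 1000 is 3

    # antilog: raise 10 to the log value to get the original number back
    print(10 ** 3)            # 1000 -> the antilog of 3 is 1000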

2. What is a Decibel?

Numerous attempts have been made to explain one of the most common, yet confusing terms
in audio, the “dB.”

“dB” is an abbreviation for “decibel,” and it need not be all that difficult to grasp, if properly
presented. If you’re one of the many people who is “a little fuzzy” about decibels, the following
explanations should clear things up for you.

Mathematical Definition of the dB

The dB always describes a ratio of two quantities…quantities that are most often related to
power. The reason that the dB is used is that it is logarithmic, and therefore smaller numbers
can be used to express values that otherwise would require more digits. Also, since our ears’
sensitivity is “logarithmic,” dB values relate to how we hear better than do absolute numbers or
simple ratios. Thus, the dB was intended to simplify things, not to complicate them.


The decibel is actually 1/10 of a Bel (a unit named after Alexander Graham Bell, which is why
the "B" of dB is upper case).

The decibel is more convenient to use in sound systems, primarily because the number
scaling is more natural. Since a decibel (dB) is 1/10 of a Bel, it can be mathematically
expressed by the equation:

dB = 10 log (P1 ÷ P2)

It's important to realise that this describes the ratio of two powers, not the power values
themselves. A ratio is dimensionless, so the decibel value on its own carries no absolute unit.
To demonstrate this, let's plug some real values into the dB equation.

PROBLEM: What is the ratio, in dB, of 2 watts to 1 watt?

dB = 10 • log (P1 ÷ P2)

= 10 • log (2 ÷ 1)

= 10 • log 2

= 10 • .301

= 3.01

=3

so the ratio of 2 watts to 1 watt is 3 dB.

NOTE: If you don’t have a calculator that gives log values, or a book with log tables, then you
need to know that the logarithm of 2 is .301 in order to solve the above equation. A calculator
that can perform log calculations helps a lot.

PROBLEM: What is the ratio, in dB, of 100 watts to 10 watts?

dB = 10 • log (P1 ÷ P2)

= 10 • log (100 ÷ 10)

= 10 • log 10

= 10 • 1

= 10

so the ratio of 100 watts to 10 watts is 10 dB.
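The two worked examples above can be checked with a short Python sketch of the 10 log (P1 ÷ P2) formula (the helper name dB_power is only illustrative):

    import math

    def dB_power(p1, p2):
        # Ratio of two powers expressed in decibels: 10 * log10(P1 / P2)
        return 10 * math.log10(p1 / p2)

    print(round(dB_power(2, 1), 2))   # 3.01 -> 2 watts is about 3 dB above 1 watt
    print(dB_power(100, 10))          # 10.0 -> 100 watts is 10 dB above 10 watts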


The two previous problems point out interesting aspects of using the dB to express power
ratios:

whenever one power is twice another, it is 3 dB greater (or if it is half the power, it is 3 dB less),

whenever one power is ten times another, it is 10 dB greater (or if it is 1/10 the power, it is 10
dB less).

One can begin to see the reason for using dB by expressing a few other power values. For
instance, how much greater than 100 watts is 1,000 watts? That's a 10:1 ratio, so, again, it is
10 dB. What is the relationship of one milliwatt to 1/10 watt? One milliwatt is 1/1000 watt, and
that's 1/100 of 1/10 watt, which means it is 20 dB below 1/10 watt.

Decibels are applied throughout the audio field to represent levels. In audio, decibels for
specific ratios have a fixed reference. This fixed reference allows the decibel value to have an
absolute value. This will become clearer later.

3. Sound intensity, power and pressure level

The energy of a sound wave is a measure of the amount of sound present. However, in
general we are more interested in the rate of energy transfer, instead of the total energy
transferred. Therefore we are interested in the amount of energy transferred per unit of time,
that is the number of joules per second (watts) that propagate. Sound is also a three-
dimensional quantity and so a sound wave will occupy space. Because of this it is helpful to
characterize the rate of energy transfer with respect to area, that is, in terms of watts per unit
area. This gives a quantity known as the sound intensity, which is a measure of the power
density of a sound wave propagating in a particular direction.

4. Sound intensity level

The sound intensity represents the flow of energy through a unit area. In other words it
represents the watts per unit area from a sound source and this means that it can be related to
the sound power level by dividing it by the radiating area of the sound source. Sound intensity
of real sound sources can vary over a range which is greater than one million million (10¹²).
Because of this, and because of the way we perceive the loudness of a sound, the sound
intensity level is usually expressed on a logarithmic scale, using the decibel. This scale is
based on the ratio of the actual power density to a reference intensity of 1 picowatt per square
metre (10⁻¹² W/m²). Thus the sound intensity level (SIL) is defined as:

dBSIL = 10 log10 (Iactual ÷ Iref)


where Iactual = the actual sound power flux density (in W/m²)

and Iref = the reference sound power flux density (10⁻¹² W/m²)

5. Sound power level

The sound power level is a measure of the total power radiated in all directions by a source of
sound and it is often given the abbreviation SWL, or sometimes PWL. The sound power level
is also expressed as the logarithm of a ratio in decibels and can be calculated from the ratio of
the actual power level to a reference level of 1 picowatt (10⁻¹² W) as follows:

dBSWL = 10 log10 (Wactual ÷ Wref)

where Wactual = the actual sound power (in watts)

and Wref = the reference sound power (10⁻¹² W)

The sound power level is useful for comparing the total acoustic power radiated by objects, for
example ones which generate unwanted noises. It has the advantage of not depending on the
acoustic context. Note that, unlike the sound intensity, the sound power has no particular
direction.

6. Sound pressure level

The sound intensity is one way of measuring and describing the amplitude of a sound wave at
a particular point. However, although it is useful theoretically, and can be measured, it is not
the usual quantity used when describing the amplitude of a sound. Other measures could be
either the amplitude of the pressure or the associated velocity component of the sound wave.
Because human ears are sensitive to pressure, and because it is easier to measure, pressure
is used as a measure of the amplitude of the sound wave. This gives a quantity, which is
known as the sound pressure, which is the root mean square (rms.) pressure of a sound wave
at a particular point. The sound pressure for real sound sources can vary from less than 20
micropascals (20 µPa or 20 × 10⁻⁶ Pa) to greater than 20 pascals (20 Pa). Note that 1 Pa
equals a pressure of 1 N/m². These two pressures broadly correspond to the threshold of
hearing (20 µPa) and the threshold of pain (20 Pa) for a human being, at a frequency of 1 kHz,
respectively. Thus real sounds can vary over a range of amplitudes which is greater than a
million. Because of this, and because of the way we perceive sound, the sound pressure level
is also usually expressed on the logarithmic scale. This scale is based on the ratio of the actual
sound pressure to the notional threshold of hearing at 1 kHz of 20 µPa. Thus the sound
pressure level (SPL) is defined as:


dBSPL = 20 log10 (pactual ÷ pref)

where pactual = the actual pressure level (in Pa)

and pref = the reference pressure level (20 µPa)
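As a rough illustration of the SIL and SPL definitions above, the Python sketch below converts an intensity and a pressure to decibels against their standard references (function names are illustrative only):

    import math

    I_REF = 1e-12    # reference intensity: 1 picowatt per square metre
    P_REF = 20e-6    # reference pressure: 20 micropascals

    def sil_dB(intensity):
        # Sound intensity level: 10 * log10(I / Iref)
        return 10 * math.log10(intensity / I_REF)

    def spl_dB(pressure_rms):
        # Sound pressure level: 20 * log10(p / pref)
        return 20 * math.log10(pressure_rms / P_REF)

    print(round(sil_dB(1e-12), 1))   # 0.0   dB SIL at the reference intensity
    print(round(spl_dB(20e-6), 1))   # 0.0   dB SPL at the threshold of hearing
    print(round(spl_dB(20.0), 1))    # 120.0 dB SPL at the threshold of pain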

The multiplier of 20 has a two-fold purpose. The first is to make the result a number in which
an integer change is approximately equal to the smallest change that can be perceived by the
human ear. The second is to provide some equivalence to intensity measures of sound level
as follows.

The intensity of an acoustic wave is given by the product of the volume velocity and pressure
amplitude as:

Iacoustic = V × P

Where P = the pressure component amplitude

And V = the volume velocity component amplitude

Acoustic impedance is analogous to electrical impedance, but for acoustics. The volume velocity
can be calculated from it as:

V = P ÷ Zacoustic

The pressure and velocity component amplitudes of the intensity are linked via the acoustic
impedance so the intensity can be calculated in terms of just the sound pressure and acoustic
impedance by:

Iacoustic = V × P = (P ÷ Zacoustic) × P = P² ÷ Zacoustic

Therefore the sound intensity level could be calculated using the pressure component
amplitude and the acoustic impedance using:

SIL = 10 log10 (Iacoustic ÷ Iref) = 10 log10 (P² ÷ (Zacoustic × Iref))

This shows that the sound intensity is proportional to the square of pressure, in the same way
that electrical power is proportional to the square of voltage. The operation of squaring the
pressure can be converted into multiplication of the logarithm by a factor of two, which gives:

SIL = 20 log10 (P ÷ √(Zacoustic × Iref))


This equation is similar to the previous dB equations except that the reference level is
expressed differently. In fact, this equation shows that if the pressure reference level was
calculated as:

Pref = √(Zacoustic × Iref) = √(416 × 10⁻¹²) = 20.4 × 10⁻⁶ Pa

then the two ratios would be equivalent. The actual pressure reference level of 20 µPa is close
enough to say that the two measures of sound level are broadly equivalent; that is, SIL ≈ SPL
for a single sound wave a reasonable distance from the source and any boundaries. The
sound intensity level is the power density arriving from the sound source at the measurement
point, whereas the sound pressure level is the sum of all the sound pressure waves present at
the measurement point. If there is only a single pressure wave from the sound source at the
measurement point, that is, there are no extra pressure waves due to reflections, the sound
pressure level and the sound intensity level are approximately equivalent, SIL ≈ SPL.

This will be the case for sound waves in the atmosphere well away from any reflecting
surfaces. It will not be true when there are additional pressure waves due to reflections, as
might arise in any room or if the acoustic impedance changes. However, changes in level for
both SIL and SPL will be equivalent because if the sound intensity increases then the sound
pressure at a point will also increase by the same proportion, provided that nothing alters the
number and proportions of the sound pressure waves arriving at the point at which the sound
pressure is measured. Thus a 10 dB change in SIL will result in a 10 dB change in SPL.

These different means of describing and measuring sound amplitudes can be confusing and
one must be careful to ascertain which one is being used in a given context.

In general, a reference to sound level implies that the SPL is being used because the
pressure component can be measured easily and corresponds most closely to what we hear.

7. Adding sounds together

So far we have only considered the amplitude of single sources of sound. However, most
practical situations involve more than one source of sound; these may be other musical
instruments or reflections from surfaces in a room. There are two different situations which
must be considered when adding sound levels together.

7.1 Correlated sound sources

In this situation the sound comes from several sources which are related. In order for
this to happen the extra sources must be derived from a single source. This can
happen in two ways. Firstly, the different sources may be related by a simple
reflection, such as might arise from a nearby surface. If the
delay is short then the delayed sound will be similar to the original and so it will be
correlated with the primary sound source. Secondly, the sound may be derived from
a common electrical source, such as a recording or a microphone, and then may be
reproduced using several loudspeakers. Because the speakers are being fed the

same signal, but are spatially disparate, they act like several related sources and so
are correlated.

7.2 Uncorrelated sound sources

In this situation the sound comes from several sources which are unrelated. For
example, it may come from two different instruments, or from the same source but
with a considerable delay due to reflections. In the first case the different instruments
will be generating different waveforms and at different frequencies. Even when the
same instruments play in unison, these differences will occur. In the second case,
although the additional sound source comes from the primary one and so could be
expected to be related to it, the delay will mean that the waveform from the additional
source will no longer be the same. This is because in the intervening time, due to the
delay, the primary source of the sound will have changed in pitch, amplitude and
waveshape. Because the delayed wave is different it appears to be unrelated to the
original source and so is uncorrelated with it.

8. Adding decibels together

Decibels are a logarithmic scale and this means that adding decibels together is not the same
as adding the sources’ amplitudes together. This is because adding logarithms together is
equivalent to the logarithm of the multiplication of the quantities that the logarithms represent.
Clearly this is not the same as a simple summation!

When decibel quantities are added together it is important to convert them back to their original
ratios before adding them together. If a decibel result of the summation is required then it must
be converted back to decibels after the summation has taken place.
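As a sketch of that procedure, the Python fragment below adds uncorrelated sound levels by converting each level back to a power ratio, summing the ratios, then converting the total back to decibels (the function name is illustrative):

    import math

    def add_uncorrelated_levels(levels_dB):
        # Convert each level back to a power ratio, sum them, return the total in dB
        total_ratio = sum(10 ** (level / 10) for level in levels_dB)
        return 10 * math.log10(total_ratio)

    # Two uncorrelated 60 dB sources give a 3 dB increase, not 120 dB.
    print(round(add_uncorrelated_levels([60, 60]), 1))   # 63.0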

There are some areas of sound level calculation where the fact that the addition of decibels
represents multiplication is an advantage. In these situations the result can be expressed as a
multiplication, and so can be expressed as a summation of decibel values. In other words
decibels can be added when the underlying sound level calculation is a multiplication. In this
context the decibel representation of sound level is very useful, as there are many acoustic
situations in which the effect on the sound wave is multiplicative, for example the attenuation of
sound through walls or their absorption by a surface.

9. The inverse square law

So far we have only considered sound as a disturbance that propagates in one direction.
However, in reality sound propagates in three dimensions. This means that the sound from a
source does not travel on a constant beam, instead it spreads out as it travels away from the
radiating source.

As the sound spreads out from a source it gets weaker. This is not due to it being absorbed
but due to its energy being spread more thinly.

Consider a half blown up spherical balloon, which is coated with honey to a certain thickness.
If the balloon is blown up to double its radius, the surface area of the balloon would have
increased four fold. As the amount of honey has not changed it must therefore have a quarter
of the thickness that it had before. The sound intensity from a source behaves in an
analogous fashion in that every time the distance from a sound source is doubled the intensity
reduces by a factor of four, that is there is an inverse square relationship between sound
intensity and the distance from the sound source.

The sound intensity for a sound wave that spreads out in all directions from a source reduces
as the square of the distance. Furthermore this reduction in intensity is purely a function of
geometry and is not due to any physical absorption process. In practice there are additional
sources of absorption in air, for example impurities and water molecules, or smog and
humidity. These extra sources of absorption have more effect at high frequencies and, as a
result sound not only gets quieter, but also gets duller, as one moves away from a source, due
to the extra attenuation these cause at high frequencies. The amount of excess attenuation is
dependent on the level of impurities and humidity and is therefore variable.

Note that the sound intensity level at the source is, in theory, infinite because the area for a
point source is zero. In practice, all real sources have a finite area so the intensity at the
source is always finite.

The sound intensity level reduces by 6 dB every time we double the distance; this is a direct
consequence of the inverse square law and is a convenient rule of thumb.
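A minimal sketch of that rule of thumb, assuming free-field conditions (names are illustrative):

    import math

    def level_change_dB(d_ref, d_new):
        # Free-field level change when moving from d_ref to d_new (inverse square law)
        return -20 * math.log10(d_new / d_ref)

    print(round(level_change_dB(1, 2), 1))    # -6.0 dB  : double the distance
    print(round(level_change_dB(1, 10), 1))   # -20.0 dB : ten times the distance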

However, this is only possible when the sound source is well away from any surfaces that
might reflect the propagating wave. Sound radiation in this type of propagating environment is
often called the free field radiation, because there are no boundaries to restrict wave
propagation.

9.1 The effect of boundaries

But how does a boundary affect this value? Clearly many acoustic contexts involve
the presence of boundaries near acoustic sources, or even all the way round them in
the case of rooms. However, in many cases a sound source is placed on a boundary,
such as a floor. In these situations the sound is radiating into a restricted space.
However, despite the restriction of the radiating space, the surface area of the sound
wave still increases in proportion to the square of the distance. The effect of the
boundaries is to merely concentrate the sound power of the source into a smaller
range of angles. This concentration can be expressed as an extra multiplication
factor.

The level of sound thus increases as boundaries concentrate the sound power of the
source into smaller angles.

Obviously the presence of boundaries is one means of restriction, but other
techniques can also achieve the same effect; the horn structure of brass
instruments, for example, concentrates the radiated power in the same way. However, it is
important to remember that the sound intensity of a source still reduces in proportion to the
square of the distance, irrespective of the directivity.

dB in Electronics

1. Power and Voltage

The dB can be used to express power ratios. It is simply


dBwatts = 10 • log (P1 ÷ P2)

For example, What is the ratio of 100 watts to 10 watts, in dB?

dBwatts = 10 • log (P1 ÷ P2)

= 10 • log (100 ÷ 10)

= 10 • log 10

= 10 • 1

= 10 dB

The dB can be used for voltage as well. According to Ohm's law:

Power (P) = V² ÷ R

This means V² = P × R.

For a constant resistance, power is proportional to the square of the voltage.

The square of a number can be represented logarithmically by multiplying the log of the
number by 2. For example:

10² = 100

The log of 10 = 1

The log of 100 = 2

Therefore the log of 10² can be seen to be 2 × the log of 10 = 2.

This is why a multiplier of 20, rather than 10, is used for voltage ratios: the decibel relationship
for power ratios is not the same as that for voltage ratios.

dBvolts = 20 log (E1 ÷ E0)

where E0 and E1 are the two voltage values. Consider what this means. While twice the
power is a 3 dB increase, twice the voltage is a 6 dB increase. Similarly while 10 times the
power is a 10 dB increase, 10 times the voltage is a 20 dB increase. The following equations
should clarify this relationship:

What is the ratio of 100 volts to 10 volts, in dB?

dBvolts = 20 • log (E1 ÷ E0)


= 20 • log (100 ÷ 10)

= 20 • log 10

= 20 dB
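A quick Python check of the power and voltage forms side by side (function names are illustrative):

    import math

    def dB_power(p1, p2):
        return 10 * math.log10(p1 / p2)    # power ratios use 10 log

    def dB_voltage(v1, v2):
        return 20 * math.log10(v1 / v2)    # voltage ratios use 20 log

    print(round(dB_power(2, 1), 1))     # 3.0  -> twice the power   = +3 dB
    print(round(dB_voltage(2, 1), 1))   # 6.0  -> twice the voltage = +6 dB
    print(dB_voltage(100, 10))          # 20.0 -> ten times the voltage = +20 dB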

2. Relative Versus Absolute Levels

If we use a reference value of 1 watt for P0, then the dB = 10 log (P1 ÷ P0) equation yields the
following relationships:

Power Value of P1 (Watts)     Level in dB (Relative to 1 Watt P0)

        1                          0
       10                         10
      100                         20
      200                         23
      400                         26
      800                         29
    1,000                         30
    2,000                         33
    4,000                         36
    8,000                         39
   10,000                         40
   20,000                         43
   40,000                         46
   80,000                         49
  100,000                         50

The value of using dB to express relative levels should be apparent here, since a mere 50 dB
denotes a 100,000:1 ratio (one hundred thousand watts in this case). For finding smaller dB
values (i.e., for power ratios between 1:1 and 10:1), the following chart may be helpful:

Power Value of P1 (Watts)     Level in dB (relative to 1 watt P0)

     1.0                           0
     1.25                          1
     1.6                           2
     2.0                           3
     2.5                           4
     3.15                          5
     4.0                           6
     5.0                           7
     6.3                           8
     8.0                           9
    10.0                          10


The key concept is that “dB,” in itself, has no absolute value. However, when a standard
reference value is used for “0 dB,” then any number of dB above or below that implied or
stated zero reference may be used to describe a specific quantity. We’ll give several
examples of “specifications” to illustrate this concept.

Example A : “The console’s maximum output level is +20 dB.”

That statement is meaningless because the zero reference for “dB” is not specified. It’s like
telling a stranger “I can only do 20,” without providing a clue as to what the 20 describes.

Example B : “The console’s maximum output level is 20 dB above 1 milliwatt.”

Example B makes a specific claim. It actually tells us that the console is capable of delivering
100 milliwatts (0.1 watt) into some load. How do we know it can deliver 100 milliwatts ? Of the
20 dB expressed, the first 10 dB represents a tenfold power increase (from 1 mW to 10 mW),
and the next 10 dB another tenfold increase (from 10 mW to 100 mW). Of course, the above
statement is awkward, so more “compact” ways of expressing the same idea have been
developed, as explained in the next subsection.

The absolute Decibel in Electrical Signal Levels

2.1 dBm

The term dBm expresses an electrical power level, and is always referenced to 1
milliwatt. That is, 0 dBm = 1 milliwatt. dBm has no direct relationship to voltage or
impedance.

The dBm was actually set forth as an industry standard in the Proceedings of the
Institute of Radio Engineers, Volume 28, in January 1940, in an article by H.A. Chinn,
D.K. Gannett and R.M. Moris titled “A New Standard Volume Indicator and Reference
Level.”

The typical circuit in which dBm was measured when the term was first devised was a
600 ohm telephone line. In the IRE article, the reference level was 0.001 watts, which
is one milliwatt. It so happens that this amount of power is dissipated when a voltage
of 0.775 Vrms is applied to a 600 ohm line. For this reason, many people mistakenly
believe that 0 dBm means “0.775 volts,” but that is only the case in a 600 ohm circuit.
0 dBm does, however, always mean one milliwatt.

Example C: “The console’s maximum output level is +20 dBm.”

Example C tells us exactly the same thing as Example B above, but in fewer
words. Instead of stating “the maximum output level is 100 milliwatts,” we say it is
“+20 dBm.”

Example D: “The mixer’s maximum output level is +20 dBm into 600 ohms.”


Example D tells us that the output is virtually the same as that expressed in Examples
B and C, but it gives us the additional information that the load is 600 ohms. This
allows us to calculate that the maximum output voltage into that load is 7.75 volts rms,
even though the output voltage is not given in the specification.

2.2 dBu

Most modern audio equipment (consoles, tape decks, signal processors, etc.) is
sensitive to voltage levels. Power output isn’t really a consideration, except in the
case of power amplifiers driving loudspeakers, in which case “watts,” rather than any
“dB” quantity, is the most common term.

The term "dBm" expresses a power ratio, so how does it relate to voltage? Not directly,
although the voltage can be calculated if the impedance is known. That complicates
things, and, as we said earlier, the whole concept of the dB is to simplify the numbers
involved. For that reason, another dB term was devised…dBu.

dBu is a more appropriate term for expressing output or input voltage. This brings up
a major source of confusion with the dB… the dB is often used with different zero
references; dBm implies one zero reference, and dBu implies another. We’ll go on to
explain these and show the relationship between several commonly used “dB” terms.

The voltage represented by dBu is equivalent to that represented by dBm if, and only
if, the dBm figure is derived with a 600-ohm load. However, the dBu value is not
dependent on the load: 0 dBu is always 0.775 volts.

The dBu was specified as a standard unit in order to avoid confusion with another
voltage-related dB unit, the dBV, as explained in Section 2.3 below.

Example E : “The console’s maximum output level is +20 dBu into a 10k ohm or
higher impedance load.”

Example E tells us that the console’s maximum output voltage is 7.75 volts, just as we
calculated for Example D, but there is a significant difference. The output in Example
D would drive 600 ohms, whereas Example E specifies a minimum load impedance of
10,000 ohms; if this console were connected to a 600 ohm termination, its output
would probably drop in voltage, increase in distortion, and might burn out.

How can we make these assumptions? One learns to read between the lines.
Example D refers the output level to power (dBm), so if a given power level is to be
delivered, and the load impedance is higher, then a higher voltage would have to be
delivered to equal that same power output. Conversely, Example E states minimum
specified load impedance, and connection to a lower impedance load would tend to
draw more power from the output. Draining more power from an output circuit that is
not capable of delivering the power (which we imply from the dBu/voltage specification
and the minimum impedance) will result in reduced output voltage and possible
distortion or component failure.

2.3 dBV and dBv


The dBu is a relatively recent voltage-reference term. For many years, dBV denoted
a voltage-reference, with 0dBV = 1 volt rms. During that period, it became common
practice to use a lower case “v,” as adopted by the National Association of
Broadcasters (NAB) and others, to denote the voltage value corresponding to the
power indicated in dBm (that is, dBv was a voltage-related term with 0 dBv = 0.775
volts). “dBv” with the lower case “v” was convenient because the dB values would
tend to be the same as though “dBm” were used provided the “dBm” output was
specified to drive 600 ohm loads, making it easier to compare dBu specs with
products specified in dBm. The convenience factor here only makes sense where a
voltage sensitive (read "high impedance") input is involved, and can lead to serious
errors elsewhere.

Example F:

“The nominal output level is +4 dBv.”

“The nominal output level is +4 dBV.”

The above two statements, (1) and (2), appear to be identical, but upon closer
scrutiny, you will notice the former uses a lower case “v” and the latter an upper case
“V” after the “dB”. This means that the first output specified will deliver a nominal
output of 1.23 volts rms, whereas the second mixer specified will deliver a nominal
output level of 1.6 volts rms.

Unfortunately, people often did not distinguish clearly between “dBv” (a 0.775 volt
zero reference –if one assumes a 600 ohm circuit) and “dBV” (a 1 volt zero reference
without regard to circuit impedance). To avoid confusion, the capital “V” was then
made the 1-volt zero reference standard by the International Electrotechnical
Commission (IEC), while the NAB agreed to use a small “u” to denote the voltage
value that might be obtained when the customary 600 ohm load is used to measure
the dBm (although the load itself must be minimal). The “u” in “dBu” thus stands for
“unloaded,” term engineers use to describe an output which works into no load (an
open circuit) or an insignificant load (such as the typical high impedance inputs of
modern audio equipment).

Example G:

“The nominal output level is + 4dBv.”

“The nominal output level is +4 dBu.”

The two statements, (1) and (2), are identical, although the latter is the preferable
usage today. Both indicate the nominal output level is 1.23 V rms.

To recap, the only difference between dBu (or dBv) and dBV is the actual voltage
chosen as the reference for “0 dB.” 0 dBV is 1 volt, whereas 0 dBu and 0 dBv are
0.775 volts.

2.4 Converting dBV to dBu (or to dBm across 600 ohms)

So long as you’re dealing with voltage (not power), you can convert dBV to dBu (or
dBm across 600 ohms) by adding 2.2 dB to whatever dBV value you have. To
convert dBu (dBm) to dBV, it's just the other way around – you subtract 2.2 dB from
the dBu value.

dBu (or dBv) = dBV + 2.21

dBV = dBu (or dBv) - 2.21
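The conversions can be sketched in Python as follows, assuming the 0 dBu = 0.775 V and 0 dBV = 1 V references given above (function names are illustrative):

    def dBV_to_dBu(level_dBV):
        # 0 dBV (1 V) sits 2.21 dB above 0 dBu (0.775 V)
        return level_dBV + 2.21

    def dBu_to_volts(level_dBu):
        return 0.775 * 10 ** (level_dBu / 20)

    def dBV_to_volts(level_dBV):
        return 1.0 * 10 ** (level_dBV / 20)

    print(round(dBV_to_dBu(-10), 1))    # -7.8 dBu  (the consumer -10 dBV level)
    print(round(dBu_to_volts(4), 2))    # 1.23 V    (+4 dBu nominal level)
    print(round(dBV_to_volts(-10), 3))  # 0.316 V   (-10 dBV nominal level)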

The following table shows the relationship between common values of dBV and
dBu, and the voltages they represent.

Level in dBV               Voltage       Level in dBu or dBm
(0 dBV = 1 V, reference    (RMS)         (0 dBu = 0.775 V unterminated;
impedance usually high)                  0 dBm = 0.775 V across a 600 ohm load)

  + 6.0                     2.0            + 8.2
  + 4.0                     1.6            + 6.2
  + 1.78                    1.23           + 4.0
    0.0                     1.00           + 2.2
  - 2.2                     0.775            0.0
  - 6.0                     0.5            - 3.8
  - 8.2                     0.388          - 6.0
  - 10.0                    0.316          - 7.8
  - 12.0                    0.250          - 9.8
  - 12.2                    0.245          - 10.0
  - 20.0                    0.100          - 17.8

2.5 Relating dBV, dBu and dBm to Specifications

In many products, you may see phono jack inputs and outputs rated in dBV (1 volt
reference) because that is the standard generally applied to such equipment, while
the XLR connector output levels and some phone jack output levels are rated in dBm
(1 milliwatt reference) or dBu (0.775 volt reference).

Typically, line level phono jack inputs and outputs are intended for use with high
impedance equipment, which is basically sensitive to voltage rather than power, so
their nominal levels may be specified as “-10 dBV.”

This standard is the one, which has been used for many years in the consumer audio
equipment business. Typical line level XLR connector inputs and outputs are
intended for use with low or high impedance equipment. Since older low impedance
equipment was sensitive to power, XLR connector nominal levels were often specified
as “+ 4 dBm” or “+8 dBm,” levels characteristic of sound reinforcement and recording,
or of broadcast, respectively. (While dBu values would probably suffice today, old
practices linger and the dBm is still used.) Phone jack inputs and outputs are usually
specified at the higher levels and lower impedance characteristic of XLRs, though
exceptions exist.

A low impedance line output generally may be connected to higher impedance inputs,
without much change in level. Be aware that if a high impedance output is connected
to low impedance input, that output may be somewhat overloaded (which can
increase the distortion and lower the signal level), and the frequency response may be
adversely affected. In some cases, the equipment could be damaged, so check the
specifications carefully.

2.6 dBW

We have explained that the dBm is a measure of electrical power, a ratio referenced
to 1 milliwatt. dBm is handy when dealing with the miniscule power (in the millionths
of a watt) output of microphones, and the modest levels in signal processors (in the
milliwatts). One magazine wished to express larger power numbers without larger dB
values… for example, the multi-hundred watt output of large power amplifiers. For
this reason, that magazine established another dB power reference: dBW.

0 dBW is 1 watt. Therefore, a 100 watt power amplifier is a 20 dBW amplifier (10 log
(100 ÷ 1) = 10 log (100) = 10 • 2 = 20 dB). A 1000 watt amplifier is a 30 dBW amplifier,
and so forth. In fact, if we are referring to amplifier power, the dB values in the two
power-ratio tables given earlier can be considered "dBW" (decibels, referenced to 1 watt of electrical power).

3. Equal Loudness contours and Weighting networks

The concepts of sound pressure level, the dB, and frequency response have been treated in
previous sections. Loudness is related to these items.

Some people use the term “loudness” interchangeably with “SPL” or “volume.” This is incorrect
since “loudness” has a very distinct, and not so simple, meaning.

If you examine the whole set of equal loudness contours, you’ll see that peak hearing
sensitivity comes between 3 and 4 kHz. It turns out that this is the frequency range where the
outer ear’s canal is resonant. If you realize how small the eardrum is, you can also see why it
has difficulty responding to low frequency (long wavelength) sound, which is why the equal
loudness contours sweep upward at lower frequencies. The mass of the eardrum and other
constituents of the ear limit high frequency response, which can be seen in the upward trend of
the contours at higher frequencies, but here we see some anomalies –perhaps due to
physiological limitations in the cochlea (the inner ear) as well as localized resonances. The
fact that all the contours have slightly different curvatures simply tells us our hearing is not
linear… that is, the sensitivity changes with absolute level.

The fact that the ear is not linear guided the makers of sound level meters to use a corrective
filter – the inverse of the typical equal loudness contour –when measuring SPL. The filter has
a so-called "A weighting" characteristic. The "A curve" is down 30 dB at 50 Hz, and over 45 dB
at 20 Hz relative to the 1 kHz reference, then rises a few dB between 1.5 and 3 kHz, and
falls below the 1 kHz sensitivity beyond 6 kHz. This is roughly the inverse of the 40 phon equal
loudness curve.

Given the sensitivity characteristic of the ear, the “A weighted” curve is most suitable for low
level sound measurement. Remember that 40 dB SPL (at 1 kHz) is equivalent to the sound of
a very quiet residence. In the presence of loud sounds, such as rock concerts, the ear has a
“flatter” sensitivity characteristic. This can be seen by comparing the 100 or 110 phon equal
loudness curves (which are the typical loudness at such concerts) to the 40 phon curve. In
order for the measured sound level to more closely correspond to the perceived sound level,
one would want a flatter response from the SPL meter. This is the function of the B and C
weighting scales. In apparent conflict with this common-sense approach, O.S.H.A.
(Occupational Safety & Health Administration) and most government agencies that get
involved with sound level monitoring continue to use the A scale for measuring loud noises.
Since this causes them to obtain lower readings than they otherwise would, the inappropriate
use of the A scale works in favor of those who don’t want to be restricted.

You may see weighting expressed in many ways:

“dB (A)” means the same as

“dB SPL, A weighted.”

“dB (A wtd.)” means the same as above.

“dB SPL (A weighted)” ditto…

4. Other concepts

Dynamic Range – Dynamic range is the difference in decibels between the loudest and quietest
portions of a program. Sometimes the quietest portion is obscured by ambient noise; in that
case the dynamic range is the difference in dB between the loudest part of the program
and the noise floor.

Every sound system has an inherent noise floor, which is the residual electronic noise in the
system. Every system also has a peak output level. The dynamic range of a system is
therefore the difference in dB between the peak level and the noise floor:

Dynamic range = (Peak level) – (Noise floor)

Headroom – Every program has an average level, called the nominal level. The
difference in dB between the nominal level of a signal and the peak level of the piece of
equipment in which the signal exists is called the headroom:

Headroom = (Peak level) – (Nominal level)


Most equipment specifies a nominal level which gives optimum performance with adequate
headroom.

Signal to noise ratio – The difference in dB between the nominal level and the noise floor is the
signal to noise ratio:

S/N = (Nominal level) – (Noise floor)
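These three relationships can be sketched as simple arithmetic (the figures used below are invented for illustration):

    def dynamic_range(peak_dB, noise_floor_dB):
        return peak_dB - noise_floor_dB

    def headroom(peak_dB, nominal_dB):
        return peak_dB - nominal_dB

    def signal_to_noise(nominal_dB, noise_floor_dB):
        return nominal_dB - noise_floor_dB

    # Example: a system with a +24 dBu peak level, a +4 dBu nominal level
    # and a noise floor of -66 dBu.
    print(dynamic_range(24, -66))    # 90 dB
    print(headroom(24, 4))           # 20 dB
    print(signal_to_noise(4, -66))   # 70 dB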

4.1 Dynamic range of common recording formats

Gramophone- 78rpm DR=18dB

LP- DR=65dB

Analog Tape- DR= Less than 90dB

Compact disc- DR = More than 90dB (sampling rate 44.1 kHz)

FM Broadcast- DR= about 45dB

AE05 – Basic Electronics

1. Initial Concepts

1.1 What is an Atom

1.2 Sources and Kind of Electricity (a.c./d.c.)

2. Electricity

3. Circuits

3.1 Practical Applications

4. Electromotive force (EMF)

5. Magnetism and Electricity

6. Alternating Current

6.1 Phase

6.3 Mains Plug

7. Safety devices

7.1 Fuses

8. The AC Circuit

8.1 Voltage in an ac circuit

8.2 Current in an ac Circuit

8.3 Resistance in an AC Circuit

8.4 Resonance (electronics)

9. Introduction to transformers

10. Introduction to transistors


AE05 – BASIC ELECTRONICS

1. Initial Concepts

1.1 What is an Atom

An atom is the smallest particle of which all matter is made. It has a positively charged
nucleus and a field of negatively charged particles called electrons orbiting around the
nucleus.

a. Nucleus – The nucleus consists of two kinds of particles: the neutral neutron and the
positively charged proton.
b. Electrons – The electrons are negatively charged and they orbit the
nucleus of the atom.

1.2 Sources and Kind of Electricity (a.c./d.c.)

Electricity can be thought of simply as electrons in motion. The movement of
electrons as a means of transferring energy from one molecule of a substance to
another is determined by how much potential difference exists to initiate this
movement. Potential difference is analogous to potential energy.

There are various ways in which an electrical potential difference can be created; one
of these is the use of chemicals.

1.2.1 Chemical sources – The most common source of electricity is the cell, which
uses conducting plates in chemical solutions to generate electricity.

Cells – There are different kinds of cells. Some produce a higher voltage than
others and some are rechargeable, but all produce what is known as
Direct Current (DC).

1.2.2 Mechanical sources – Another source of electricity is the conversion of
mechanical energy to electrical energy. One such method uses
electromagnetism.

Electromagnetism – A magnet is a piece of metal that has a field of force
around it, acting along lines of force in a particular direction.
When a conductor is made to move through these lines of force, a
potential difference is created within the conductor which exerts a force on the
electrons. The electrons will then move, causing electricity to flow. The
direction of flow of the electrons reverses when the direction of movement of
the conductor is reversed. This movement and oscillation of electrons will
continue until the conductor stops moving.

This kind of electricity is called Alternating Current (AC); more on AC later.


2. Electricity

a. Potential Difference – The term potential difference refers to the force that
causes electron movement. The unit of potential difference is the volt (V).
b. Current – Current is the movement of electrons. The unit of current is
the ampere (A).
c. Resistance – Resistance describes how difficult it is for the electrons to move. The
unit of resistance is the ohm. Electricity can be likened to water in a pipe: the size of
the pipe is the resistance, the water the electricity and the tap the voltage source.
If you want more water to pass through (more current) you either increase the
water pressure (voltage) or you increase the size of the pipe (reduce the resistance).
d. Power- Electric power is the amount of work electricity can do. It is a measure of
the rate of transfer of energy. Power is the ultimate aim of electrical energy. The
unit of power is Watts (W). Electric power in our model above will be how much
water is transferred in a specific length of time. This means, to get more water
to pass in less time (more power) you either increase the pipe size (reduce the
resistance), or increase the water pressure (Potential Difference).
e. Ohm's Law – These four quantities – voltage, resistance, current and power – are
related by Ohm's law. The law states that

V = I × R     (Eq. 1)

Where V is voltage in Volts

I is Current in Amps

And R is resistance in Ohms.

Power = V × I     (Eq. 2)

Substituting eq. 1 into eq. 2 gives P = V² ÷ R or

P = I² × R
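A small Python sketch of these relationships (the values are invented for illustration):

    def current(voltage, resistance):
        return voltage / resistance         # I = V / R

    def power_from_vi(voltage, amps):
        return voltage * amps               # P = V * I

    def power_from_vr(voltage, resistance):
        return voltage ** 2 / resistance    # P = V^2 / R

    def power_from_ir(amps, resistance):
        return amps ** 2 * resistance       # P = I^2 * R

    # A 12 V supply across an 8 ohm load:
    i = current(12, 8)            # 1.5 A
    print(power_from_vi(12, i))   # 18.0 W
    print(power_from_vr(12, 8))   # 18.0 W
    print(power_from_ir(i, 8))    # 18.0 W (all three forms agree)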

3. Circuits

a. Series Circuits – A circuit is a closed path through which electrical energy flows. In
a circuit you have a voltage source, a load, wires/cables to connect the elements
of the circuit, and a switch or some other means of controlling the flow of electricity. A
series circuit is a circuit where the current has only one path to follow. This means you can
trace the path of the current on the circuit diagram with a pencil without lifting the
pencil. In other words, the same current flows through all the circuit components.

i. Voltage – The voltage drop in a series circuit is different for each component,
except when the components have the same resistance. Each voltage drop is
determined by the current flowing and the resistance of that component.

ii. Current- the current is equal throughout the circuit for a series circuit. All
components receive the same amount of current. Current is expressed
as flow rate of electrons per second and is commonly measured in
Amperes or Amps.

1/1000 amps = 1 milliamp (mA).


1 amp is equivalent to a flow of 1 coulomb of charge per second, where a coulomb = 6.25 × 10¹⁸ electrons.

iii. Total resistance – The total resistance in a series circuit is simply the sum of the
individual resistances:

R Total = R1 + R2 + R3 + …

b. Parallel Circuits – In a parallel circuit the current splits to travel along several
paths through the circuit, and the components do not all receive the same amount
of current. The current divides in relation to how much resistance is on
each path: the greater the resistance, the less the current on that path.

i. Voltage – The voltage in a parallel circuit is constant and equal across all
components.
ii. Current – The current drawn by each component in a parallel circuit
depends on its resistance and the circuit voltage.
iii. Total Resistance – The total resistance in a parallel circuit is a little more involved
(see the sketch below):

1 / R Total = 1 / R1 + 1 / R2 + 1 / R3 + …
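The two total-resistance formulas above can be sketched as follows (the speaker values are invented for illustration):

    def series_resistance(resistances):
        # R_total = R1 + R2 + R3 + ...
        return sum(resistances)

    def parallel_resistance(resistances):
        # 1 / R_total = 1/R1 + 1/R2 + 1/R3 + ...
        return 1 / sum(1 / r for r in resistances)

    print(series_resistance([8, 8]))     # 16  ohms - two 8 ohm speakers in series
    print(parallel_resistance([8, 8]))   # 4.0 ohms - two 8 ohm speakers in parallel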

3.1 Practical Applications

Resistance is encountered throughout audio. Some important instances are:

Cable Resistance and Heat loss – When current flows through a conductor, the
resistance of the conductor creates a load on the current that it has to overcome to
get to the other side. This resistance causes the current to lose some of its ability to
do work. This power loss is in the form of heat and can be calculated as

Power loss due to heat generated = I² × R

This is important with speakers.

Speaker Loading – The resistance of the cable used to hook up a speaker is in series
with the resistance of the speaker; this adds to the load seen by the amplifier's output
stage and degrades the quality of the output signal.

Power Supply considerations – A power source is often limited by the maximum
voltage it can deliver and the maximum amount of current it can sustain. When
hooking up equipment like amplifiers to a source, it is important to note how much
maximum current will be drawn from that source and whether the source can handle it. This
can be easily calculated using the power ratings of all the equipment and the voltage
specification for the source. For example, a 1000 watt amp will draw 4 A from a 250 V
source when driven fully.
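A small sketch of that check, using the example figures from the paragraph above:

    def current_draw(power_watts, supply_volts):
        # Approximate mains current drawn by a load of the given power rating
        return power_watts / supply_volts

    print(current_draw(1000, 250))   # 4.0 A - the 1000 W amplifier example above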

Speaker to amp matching – the total load (resistance) seen by an amp can be
calculated based on the resistance values of the speakers and whether they are
connected in series or parallel.

4. Electromotive force (EMF)


To produce a flow of current in any electrical circuit, a source of electromotive force or potential
difference is necessary. The available sources are as follows:

i. Electrostatic machines, which operate on the principle of inducing electric charges
by mechanical means;
ii. Electromagnetic machines, in which current is generated by mechanically moving
conductors through a magnetic field or a number of fields;
iii. Voltaic cells, which produce an electromotive force through electrochemical action;
iv. Devices that produce electromotive force through the action of heat;
v. Devices that produce electromotive force by the action of light; and
vi. Devices that produce electromotive force by means of physical pressure, for
example, the piezoelectric crystal.

5. Magnetism and Electricity

In the late 18th and early 19th centuries, the theories of electricity and magnetism were
investigated simultaneously. In 1819 an important discovery was made by the Danish physicist
Hans Christian Oersted, who found that a magnetic needle could be deflected by an electric
current flowing through a wire. This discovery, which showed a connection between electricity
and magnetism, was followed up by the French scientist André Marie Ampère, who studied
the forces between wires carrying electric currents, and by the French physicist Dominique
François Jean Arago, who magnetized a piece of iron by placing it near a current-carrying wire.

In 1831, the English scientist Michael Faraday discovered that moving a magnet near a wire
induces an electric current in that wire, the inverse effect to that found by Oersted: Oersted
showed that an electric current creates a magnetic field, while Faraday showed that a
magnetic field can be used to create an electric current.

What is a magnet? The magnetic properties of materials are classified in a number of different
ways. One classification of magnetic materials—into diamagnetic, paramagnetic, and
ferromagnetic—is based on how the material reacts to a magnetic field.

• Diamagnetic materials, when placed in a magnetic field, have a magnetic moment
induced in them that opposes the direction of the magnetic field. This property is
now understood to be a result of electric currents that are induced in individual
atoms and molecules. These currents, according to Ampere's law, produce
magnetic moments in opposition to the applied field. Many materials are
diamagnetic; the strongest ones are metallic bismuth and organic molecules, such
as benzene, that have a structure that enables the easy establishment of electric
currents.
• Paramagnetic behavior results when the applied magnetic field lines up all the
existing magnetic moments of the individual atoms or molecules that make up the
material. This results in an overall magnetic moment that adds to the magnetic
field.
• A ferromagnetic substance is one that, like iron, retains a magnetic moment even
when the external magnetic field is reduced to zero. This effect is a result of a
strong interaction between the magnetic moments of the individual atoms or
electrons in the magnetic substance that causes them to line up parallel to one
another. In ordinary circumstances these ferromagnetic materials are divided into
regions called domains; in each domain, the atomic moments are aligned parallel
to one another. Separate domains have total moments that do not necessarily
point in the same direction. Thus, although an ordinary piece of iron might not
have an overall magnetic moment, magnetization can be induced in it by placing
the iron in a magnetic field, thereby aligning the moments of all the individual
domains. The energy expended in reorienting the domains from the magnetized
back to the demagnetized state manifests itself in a lag in response, known as
hysteresis.

Magnetic materials can also be categorized on the basis of whether they retain the magnetism
after the field is removed.

If they retain the magnetism, the substance is called a magnetically hard material and these
are ferromagnetic substances. These are used in the manufacture of speakers.

If they lose the magnetism after the field is removed, they are called magnetically soft
materials; these are used in the tape heads of tape decks.

The symbol for magnetic flux is φ and, as mentioned earlier, the unit is the weber (Wb). The
weber is defined as the flux that, if reduced to zero in one second, will induce an e.m.f. of
1 volt in a coil of one turn linked to it.

Flux density is the concentration of flux; its unit is the weber per square metre (Wb/m²) and its
symbol is B. The weber is far too large a unit for audio use – for example, to specify the flux
per metre recorded on tape – so nanowebers per metre are used instead (nano = 10⁻⁹). A
typical recorded signal on magnetic tape will have a flux per metre in the region of 300 nWb/m.

6. Alternating Current

Electric Motors and Generators, group of devices used to convert mechanical energy into
electrical energy, or electrical energy into mechanical energy, by electromagnetic means. A
machine that converts mechanical energy into electrical energy is called a generator,
alternator, or dynamo, and a machine that converts electrical energy into mechanical energy is
called a motor. In audio, the device that converts mechanical energy or any other type of
energy into electrical energy is called an input transducer and one that converts electrical
energy into mechanical energy (speaker cone movement) is called an output transducer.
Therefore theories and principles that apply to generators and motors applies to input and
output transducers.

Two related physical principles underlie the operation of generators and motors. The first is the
principle of electromagnetic induction discovered by the British scientist Michael Faraday in
1831. If a conductor is moved through a magnetic field, or if the strength of a stationary,
conducting loop is made to vary, a current is set up or induced in the conductor. The converse
of this principle is that of electromagnetic reaction, first observed by the French physicist André
Marie Ampère in 1820. If a current is passed through a conductor located in a magnetic field,
the field exerts a mechanical force on it.

When a conductor is moved back and forth in a magnetic field, the flow of current in the
conductor will change direction as often as the physical motion of the conductor changes
direction. Several devices generating electricity operate on this principle, producing an
oscillating form of current called alternating current. Alternating current has several valuable
characteristics, as compared to direct current, and is generally used as a source of electric
power, both for industrial installations and in the home.

An alternating current is a sinewave. It is exactly like the audio we studied already, only this
time it represents electricity. When audio is converted to its electrical analogue it exists as AC.


AC current has voltage amplitudes and peak values. It also has rms voltage, frequency and
phase.

For an AC supply, the amount of work the supply can do is related not to its peak amplitude
but to its rms value. Rms is the same here as it was for soundwaves.

6.1 Phase

AC can be in or out of phase. Because AC has a frequency, it has a period and
therefore a phase. Standard UK mains power is a 50 Hz AC line.

In a standard 3 phase set up you would have three phases 120 degrees out of phase
with each other. The sum at any one point of these three would be zero.

A common way of bringing the three phases back together is a star connection which
provides 230V between each arm and 400V 3-phase.

6.3 Mains Plug

Mains wiring has colour codes: the brown wire is live (L), the blue wire goes to
neutral (N) and the green/yellow wire goes to earth.

7. Safety devices

The high levels of current and voltage that exist in AC distribution systems demand that
some safety measures be put in place. Some of these measures are discussed below.

7.1 Fuses

A fuse is a piece of conductor which has a maximum current-carrying capacity;
if this is exceeded it will melt. When a fuse melts, we say it 'blows'. For domestic
ratings the most common fuses are 1, 3, 5 and 13 A. Fuses are rated by the amount of
current they can handle.

For mains supply, there are two types of safety units

i. Residual Current Devices (RCDs) – these devices compare the current
going in from the live side with that returning to the neutral; if they are not
equal a trip operates and the supply is cut.
ii. Earth Leakage Current Devices (ELCDs) – these devices operate by detecting
a flow of current to the earth.

8. The AC Circuit

The AC circuit is similar to the DC circuit, but differs in that AC is oscillating; therefore,
frequency-dependent characteristics come into play. The fact that AC oscillates is one reason
why sound can be converted into electricity.

8.1 Voltage in an ac circuit


In an AC circuit, the voltage can be either positive or negative. The voltage considered
is the rms voltage. Rms is computed in the same way as for soundwaves. The unit is
still Volts.

8.2 Current in an ac Circuit

Current in an AC circuit can be either positive or negative. The unit is still amps.

8.3 Resistance in an AC Circuit

Due to the oscillating nature of alternating current, some other factors related to the
frequency of oscillation arise. Two of these factors resist the flow of current in relation
to the frequency of oscillation; this is reactance. There are two types of reactive
elements, which differ according to how changes in frequency affect them.

It becomes immediately obvious that reactance plays a big part in audio. For one, it is a
major design factor in filter circuits such as crossover networks for passive multi-driver
loudspeaker designs (more on this later in the course).

8.3.1 Capacitance

A capacitor, or electrical condenser, is a device that is capable of storing an electrical charge. In its simplest form a capacitor consists of two metal plates separated by a non-conducting layer called the dielectric. When one plate is charged with electricity from a direct-current or electrostatic source, the other plate will have induced in it a charge of the opposite sign; that is, positive if the original charge is negative and negative if the charge is positive. The electrical size of a capacitor is its capacitance, the amount of electric charge it can hold.

Capacitors are limited in the amount of electric charge they can absorb; they can
conduct direct current for only an instant but function well as conductors in alternating-
current circuits. This property makes them useful when direct current must be
prevented from entering some part of an electric circuit.

Capacitance is the ability of a circuit or system to store electric charge. The capacitance of a capacitor is measured in farads and is determined by the formula

C = q/V

where q is the charge (in coulombs) on one of the conductors and V is the potential difference (in volts) between the conductors.

The capacitance depends only on the thickness, area, and composition of the capacitor's dielectric.

Dielectric, or insulator, is a substance that is a poor conductor of electricity and that will
sustain the force of an electric field passing through it. This property is not exhibited by
conducting substances. Two oppositely charged bodies placed on either side of a piece
of glass (a dielectric) will attract each other, but if a sheet of copper is instead interposed
between the two bodies, the charge will be conducted by the copper.

In most instances the properties of a dielectric are caused by the polarization of the
substance. When the dielectric is placed in an electric field, the electrons and protons of
its constituent atoms reorient themselves, and in some cases molecules become
similarly polarized. As a result of this polarization, the dielectric is under stress, and it
stores energy that becomes available when the electric field is removed. The
polarization of a dielectric resembles the polarization that takes place when a piece of
iron is magnetized. As in the case of a magnet, a certain amount of polarization remains
when the polarizing force is removed. A dielectric composed of a wax disk that has
hardened while under electric stress will retain its polarization for years. Such dielectrics
are known as electrets.

The effectiveness of dielectrics is measured by their relative ability, compared to a vacuum, to store energy, and is expressed in terms of a dielectric constant, with the value for a vacuum taken as unity. The values of this constant for usable dielectrics vary from slightly more than 1 for air up to 100 or more for certain ceramics containing titanium oxide. Glass, mica, porcelain, and mineral oils, often used as dielectrics, have constants ranging from about 2 to 9. The ability of a dielectric to withstand electric fields without losing insulating properties is known as its dielectric strength.

Dielectrics, particularly those with high dielectric constants, are used extensively in all
branches of electrical engineering, where they are employed to increase the efficiency
of capacitors.

Capacitors are produced in a wide variety of forms. Air, mica, ceramics, paper, oil, and
vacuums are used as dielectrics, depending on the purpose for which the device is
intended.

The resistance in an AC circuit depends on the 'resistance' introduced by the capacitance in the circuit. This 'resistance' is called capacitive reactance and it is given by the equation

Xc = 1 / (2πfC)

where f is the frequency in hertz and C is the capacitance in farads. The unit is the ohm.

This equation shows that capacitive reactance is inversely proportional to frequency: the higher the frequency, the lower the reactance.
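
To see how strongly capacitive reactance depends on frequency, the minimal sketch below evaluates Xc = 1 / (2πfC) for a hypothetical 1 µF capacitor at a few audio frequencies.

```python
import math

def capacitive_reactance(frequency_hz: float, capacitance_f: float) -> float:
    """Xc = 1 / (2 * pi * f * C), in ohms."""
    return 1.0 / (2 * math.pi * frequency_hz * capacitance_f)

C = 1e-6   # 1 microfarad (illustrative value)
for f in (50, 1_000, 20_000):
    print(f"{f:>6} Hz: Xc = {capacitive_reactance(f, C):8.1f} ohms")
# 50 Hz ~ 3183 ohms, 1 kHz ~ 159 ohms, 20 kHz ~ 8 ohms: reactance falls as frequency rises.
```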

8.3.2 Inductance

When the current in a conductor varies, the resulting changing magnetic field cuts across the conductor itself and induces a voltage in it. This self-induced voltage opposes the applied voltage and tends to limit or reverse the original current. The effect is more pronounced if different sections of an AC conductor are close together, as in the case of a long cable that is coiled up. The field around each length of wire adds up within the coil to create a high inductive reactance (resistance due to inductance) in the wire, which will affect the signal going through it.

The amount of self-induction of a coil, its inductance, is measured by the electrical unit
called the henry, named after the American physicist Joseph Henry, who discovered
the effect. The inductance is independent of current or voltage; it is determined only by
the geometry of the coil and the magnetic properties of its core.

Due to this characteristic of inductance, there is an associated 'resistance' that an inductance imposes on an AC circuit, and this 'resistance' is called inductive reactance.

The reactance of an inductance in a circuit is likewise dependent on the frequency of the alternating current. Inductive reactance can be found from

XL = 2πfL

where f is the frequency in hertz and L is the inductance in henries. The unit is the ohm.
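
Inductive reactance behaves the opposite way, rising with frequency. A matching sketch for a hypothetical 10 mH inductor:

```python
import math

def inductive_reactance(frequency_hz: float, inductance_h: float) -> float:
    """XL = 2 * pi * f * L, in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h

L = 10e-3   # 10 millihenries (illustrative value)
for f in (50, 1_000, 20_000):
    print(f"{f:>6} Hz: XL = {inductive_reactance(f, L):8.1f} ohms")
# 50 Hz ~ 3 ohms, 1 kHz ~ 63 ohms, 20 kHz ~ 1257 ohms: reactance rises with frequency.
```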

The combination of resistance with capacitive reactance and inductive reactance in an AC circuit is called impedance; that is, the total opposition to current flow in an AC circuit is its impedance.

Having both an inductor and a capacitor in a circuit affects the total impedance of the circuit. Because the two reactances are opposite in nature (as frequency increases, one rises while the other falls), an effect called resonance is created.

8.4 Resonance (electronics)

Resonance is a condition in a circuit in which the impedances of the capacitance and the inductance to alternating current cancel each other out or reinforce each other, producing a minimum or maximum impedance at a specific frequency.

Resonance occurs at a particular frequency, called the resonant frequency, for each circuit, depending on the amounts of inductance and capacitance in the circuit. This is the frequency at which the combined effect of the capacitor and inductor reaches a maximum or a minimum.
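
For a simple LC circuit, the resonant frequency is the point where XL equals Xc, which gives f0 = 1 / (2π√(LC)). The sketch below reuses the same hypothetical component values as above.

```python
import math

def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """f0 = 1 / (2 * pi * sqrt(L * C)), the frequency where XL equals Xc."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))

L = 10e-3   # 10 mH (illustrative)
C = 1e-6    # 1 uF (illustrative)
print(f"Resonant frequency: {resonant_frequency(L, C):.0f} Hz")   # ~1592 Hz
```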

If an alternating voltage of the resonant frequency is applied to a circuit in which capacitance and inductance are connected in series, the impedance of the circuit drops to a minimum and the circuit will conduct a maximum amount of current. When the capacitance and inductance are connected in parallel, the opposite is true: the impedance is extremely high and little current will pass.

Resonant circuits are used in electric equipment, such as filters, to select or reject
currents of specific frequencies. Filters of this type, in which either the capacity or the
inductance of the circuit can be varied, are used to tune radio and television receivers
to the frequency of the transmitting station so that the receiver will accept that
frequency and reject others.

9. Introduction to transformers

When an alternating current passes through a coil of wire, the magnetic field about the coil
expands and collapses and then expands in a field of opposite polarity and again collapses. If
another conductor or coil of wire is placed in the magnetic field of the first coil, but not in direct
electric connection with it, the movement of the magnetic field induces an alternating current in
the second coil. If the second coil has a larger number of turns than the first, the voltage
induced in the second coil will be larger than the voltage in the first, because the field is acting
on a greater number of individual conductors. Conversely, if the number of turns in the second
coil is smaller, the secondary, or induced, voltage will be smaller than the primary voltage.

The action of a transformer makes possible the economical transmission of electric power over
long distances. If 200,000 watts of power is supplied to a power line, it may be equally well
supplied by a potential of 200,000 V and a current of 1 amp or by a potential of 2000 V and a
current of 100 amp, because power is equal to the product of voltage and current. The power
lost in the line through heating is equal to the square of the current times the resistance. Thus,
if the resistance of the line is 10 ohms, the loss on the 200,000 V line will be 10 watts, whereas
the loss on the 2000 V line will be 100,000 watts, or half the available power.

In a transformer, the coil into which the power is fed is called the primary, and the one from which the power is taken is called the secondary. The two coils have different numbers of turns. The ratio

(number of turns in primary) / (number of turns in secondary)

is called the turns ratio and is usually denoted by n.

A transformer in which the secondary voltage is higher than the primary is called a step-up
transformer;

If the secondary voltage is less than the primary, the device is known as a step-down
transformer.

The product of current times voltage is constant in each set of coils, so that in a step-up
transformer, the voltage increase in the secondary is accompanied by a corresponding
decrease in the current.
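
Because voltage scales with the turns ratio while the volt-ampere product stays (ideally) constant, the secondary voltage and current follow directly from n. A minimal sketch, assuming an ideal lossless transformer and hypothetical figures:

```python
def ideal_transformer(primary_v: float, primary_a: float, turns_ratio: float):
    """turns_ratio n = primary turns / secondary turns (ideal transformer).
    Secondary voltage = primary voltage / n; secondary current = primary current * n,
    so the power (V * I) is the same on both sides."""
    secondary_v = primary_v / turns_ratio
    secondary_a = primary_a * turns_ratio
    return secondary_v, secondary_a

# Step-down example: 230 V mains to roughly 11.5 V with a 20:1 turns ratio.
v2, i2 = ideal_transformer(primary_v=230.0, primary_a=0.5, turns_ratio=20.0)
print(f"Secondary: {v2:.1f} V at {i2:.1f} A")   # 11.5 V at 10.0 A, same 115 W
```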

There are four types of transformers

• Voltage step-up
• Voltage step-down
• Current step-up
• Current step-down

Because transformers allow electricity to pass from one point in a circuit to another without any physical contact, they can be used as isolation devices for common-mode signals. Common-mode signals are signals that are in phase (this will be explained further under Audio Lines and Patchbays). Because they are in phase, the net induced current across the transformer is zero; they cancel out.

10. Introduction to transistors

In electronics, 'transistor' is the common name for a group of devices used as amplifiers or oscillators in communications, control and computer systems. Until the advent of the transistor in 1948, developments in the field of electronics were dependent on the use of thermionic vacuum tubes, magnetic amplifiers, specialized rotating machinery and special capacitors as amplifiers. The transistor is a solid-state device consisting of a tiny piece of semiconducting material, usually germanium or silicon, to which three or more electrical connections are made.

Figure 1 below shows an example of a vacuum tube. Before transistors, these devices were used for amplification. Some audio engineers still swear by them for their supposed warmth and acceptable coloration of the sound, especially at lower frequencies.

Figure 2 below shows a circuit board with resistors and capacitors, but also with transistors; the sealed metal containers house the transistors.

As the performance of all electronics is affected by heat, transistors and vacuum tubes have
an effect on audio signals that varies with the design and how hot they are. This effect will be
discussed later.

Figure 1

Figure 2

AE06 – AUDIO LINES AND PATCHBAYS

Introduction

One of the most important principles relating to electrical devices is that of interconnection
between the devices or between components in the same circuit. This interconnection is made
using cables, jacks and plugs. The quality and appropriateness of these cables, jacks and
plugs can affect the electrical signal going through them.

This is why a study of these interconnects is important.

In audio, the integrity (accuracy) of sound in the electrical domain is maintained by conducting,
without change, the electrical signal from one point of the circuit/signal path to the other.
Anything that affects the electrical signal changes the audio characteristics, which distorts the
sound.

With audio lines, the aim is to maintain the integrity of the signal throughout the signal path and to exclude noise from the signal. Noise refers to unwanted sound in the output. This unwanted sound can be induced from an acoustic source and encoded along with the original signal, or it can be induced electrically.

1. Cable and Cable Parameters

A cable is a conductor used to connect the different parts of electrical systems together so that electricity can flow. Metals are the most commonly used materials for conductors (wires). Some metals are better conductors than others, as Table 1 below shows: the lower the number, the better the conductor. But the conductivity of a metal is not the only thing that matters when choosing a metal for a piece of wire (see Table 1).

Other important factors to consider are:

i. Annealing – metals come in different sizes and have to be reduced in thickness for application as the various wire sizes. This process makes the wire brittle (it breaks easily and is not very flexible), and the process that restores some flexibility to the wire is called annealing. Not all metals respond well to it.
ii. Wire gauge – this refers to the thickness of the wire. The standardized sizes are specified by the American Wire Gauge (AWG); Table 2 below shows a few different wire gauges. The gauge of a wire relates to its resistance and to how much current it can handle before melting down (as with fuses).
iii. Solid and stranded wire – a large-gauge wire can be either a solid wire or a stranded wire. Small strands of wire can be combined into a larger gauge simply by wrapping them together (see Table 3).
iv. Corrosion and oxidation – metals have two problems, oxidation and corrosion. Oxidation occurs when the metal combines with oxygen and other chemicals to create rust. Corrosion arises when two dissimilar metals are in direct contact with each other: due to the difference in potential, one metal will 'eat away' at the other. Both processes alter the gauge of the wire and can also affect the integrity of a contact (connection) point.
v. Capacitance in cables – capacitance effects in wires are related to a phenomenon called the 'skin effect'. As the frequency of an alternating current travelling down a wire increases, the signal tends to travel along the surface of the wire rather than through it, whereas low frequencies travel through the whole wire. Capacitance effects in wire therefore occur mostly for high-frequency signals, where the plastic insulation on the wire begins to act like a dielectric. The effect is not very pronounced for frequencies within the audio range but can become severe at higher frequencies, e.g. in the megahertz range.
vi. Inductance in cables – inductance is the electrical property of storing energy in the magnetic field that surrounds a current-carrying wire. The effect of inductance is very small unless the wire is carrying a high current (as is the case with loudspeaker cables) and much of its length is coiled up.

Table 1

Metals      Resistivity (ohms per circular mil-foot @ 20°C)   Flexibility   Annealing    Cost
Silver      9.9                                               Poor          Poor         High
Copper      10.4                                              Good          Good         Medium
Gold        14.7                                              Excellent     Not needed   Very high
Aluminum    17                                                Good          Good         Fair
Nickel      47                                                Poor          Poor         Medium
Steel       74                                                Excellent     Not needed   Very low

Table 2

AWG    Diameter (inches)   Compares to
40     0.0031              Smaller than a human hair
30     0.01                Sewing thread
20     0.032               Diameter of a pin
10     0.102               Knitting needle
1      0.39                Pencil
1/0    0.41                (1-naught) Finger
3/0    0.464               (3-naught) Marking pen
4/0    0.68                (4-naught) Towel rack

Table 3

Stranded AWG   No. of strands   Of gauge
20             7                28
20             19               32
14             7                22
14             19               27
14             42               30

2. Cable Construction

There are several configurations for constructing cables for various applications. The conductors in a cable assembly can take any of these configurations:

i. A pair of parallel conductors – provides a send and a return conductor for a circuit and is often used in AC power wiring or loudspeaker wiring. This is an unbalanced configuration.
ii. A twisted pair of conductors – a twisted pair has the same uses as a parallel pair, with several advantages. Twisting the wires keeps them very close together so that they always have similar electromagnetic properties relative to ground (explained later). This can be used with a balanced circuit.
iii. A shielded single conductor (coaxial) – offers the same features as the parallel configuration but with the advantage of a shield. It is still an unbalanced configuration.
iv. A shielded twisted pair of conductors – provides all the advantages of the twisted pair with the added advantage of the shield. This works best with balanced circuits.

3. Electrical Noise in Audio Systems

In the electronic audio signal chain, keeping electrical noise out of the system is one of the greatest and most difficult tasks. The sum total of the interconnections in an audio system will affect the final signal-to-noise ratio (SNR).

3.1 Electromagnetic Interference (EMI)

The type of noise which can get into audio systems in many different ways is called EMI. EMI arises as a consequence of conductors such as audio cabling being exposed to electromagnetic radiation from other cables, electric motors and the atmosphere. Electromagnetic radiation is the oscillation of electric and magnetic fields operating together, transferring energy back and forth.

Every conductor in a system acts simultaneously as a capacitor, generating a charge from an electric field, and as an inductor, generating a current from a magnetic field. Electromagnetic radiation therefore generates both currents and voltages which, as noise, compete and conflict with the audio signal being transferred down the same conductors. When this interference reaches audible levels and the audio signal is degraded, we call it electromagnetic interference, or electronic noise.

3.2 Sources of EMI

In audio, the most common sources of EMI include:

• line-frequency AC power (50 Hz/60 Hz hum)
• broadband electrical noise on AC power lines caused by power surges from electric motors and dimmers
• radiated electromagnetic waves (radio frequencies, RF)
• inter-cable crosstalk

EMI can manifest itself as hum, buzzes, gurgles, chirps, whistles, or intelligible voice-signal interference.

3.3 Means of Transmission of EMI

Identifying how noise is being transmitted to the receiver is a key factor in determining
how to control it. There are 4 means of transmission of electrical noise:

3.3.1 Conducted Coupling

Occurs whenever there is a shared conductor with an impedance common to both the source and the receiver, i.e. the audio system may be coupled via the AC power wiring to other systems which have electric motors, light dimmers, etc. Proper grounding practices and isolation of the audio system's power, usually by transformer, can help reduce this source of noise. However, interconnection practices within the audio system are also an important deterrent.

3.3.2 Electric Field Coupling

This occurs, for example, between wires which run close and parallel to each other. A noise voltage is produced by the capacitance between source and receiver, and is proportional to the area that source and receiver share with each other.

3.3.3 Magnetic field coupling

A current is produced by mutual inductance between source and receiver and is directly proportional to the loop area of the receiver circuit (which behaves like the windings in an electromagnet) and the permeability of the medium between source and receiver. For example, long wire runs such as long microphone cables, where the send and return are separated, create a large loop which is likely to have noise currents induced in it. Where source and return are kept closer together, less inductance can occur.

3.3.4 Electromagnetic Radiation in the far field

Airborne EMI, whose potential for interference depends on the field strength of the source and the receiver's susceptibility or immunity to the noise; e.g. instrument cables from electric guitars commonly act like aerials and pick up RF.

The effects of each of these four sources of noise are greatly reduced by adherence to standard methods and practices of interconnection when designing, installing and using an audio system.

4. Interconnection Principles

4.1 Balanced lines and Circuits

All interconnecting audio is either balanced or unbalanced; in this way an electrical relationship is physically represented by the interconnecting wiring.

In a balanced interconnect there is both an in-phase and an out-of-phase signal, i.e. two signals or wires in a push-pull arrangement known as differential mode (DM). The signals are of equal level but opposite polarity:

In-phase signal: positive (+), hot, signal, line.

Out-of-phase signal: negative (-), cold, common, return.

The balanced system is universally used in professional audio because of its ability to control noise by passing only differential-mode signals and rejecting in-phase common-mode (CM) signals (common-mode rejection, CMR).

DM signals have a different polarity on each conductor, such as an audio signal from a balanced output. CM signals have the same polarity on each conductor, such as signals picked up from a radiating electromagnetic noise source or ground-reference differences between two devices (CM ground noise).
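
The arithmetic behind common-mode rejection is simple subtraction: the balanced input takes the difference between the hot and cold conductors, so anything identical on both (induced hum, ground offsets) cancels while the wanted push-pull signal is doubled. A minimal numerical sketch with made-up signal and hum values:

```python
import numpy as np

t = np.linspace(0, 0.01, 480)                    # 10 ms of time
signal = 0.5 * np.sin(2 * np.pi * 1_000 * t)     # wanted 1 kHz audio (differential mode)
hum = 0.2 * np.sin(2 * np.pi * 50 * t)           # 50 Hz noise induced equally on both wires

hot = +signal + hum      # in-polarity conductor carries the signal plus the noise
cold = -signal + hum     # out-of-polarity conductor carries the inverted signal plus the same noise

received = hot - cold    # differential input: the noise cancels, the signal is doubled
print(np.allclose(received, 2 * signal))   # True: the common-mode hum has been rejected
```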

4.2 Balanced Interconnections

There are four basic types, all with good noise-rejection capabilities:

• Active-balanced to active-balanced
• Balanced transformer to balanced transformer
• Active-balanced to transformer
• Balanced transformer to active-balanced

The method of wiring a balanced interconnection is always the same: shielded, twisted-pair wire is used. The earth connection via the shield is always terminated on the input side. This procedure avoids the possibility of ground-loop potentials occurring between the two interconnected devices.

4.3 Unbalanced Interconnection

Consists of one signal and a ground reference. It is transmitted over one conductor
and a ground, with the ground sometimes used as a shield. Unbalanced circuits do
not exhibit CMR. Also, the ground connection will create a ground loop when both
pieces of interconnected equipment have a ground reference, which they usually do.

Unbalanced circuits are found in domestic and semi-pro audio systems. Some signal processors used in pro audio have unbalanced inputs and/or outputs.

4.4 Unbalanced to Unbalanced Interconnection

Generally interconnected with coaxial cable, in which the centre conductor is the signal wire and the outer conductor is a braided shield that also serves as the ground return. If the two connected devices are each grounded, the shield connection introduces a ground loop. If the shield is grounded at the output end only, no ground loop exists.

5. Impedances for Mic and Line level Systems

All interconnects are partially characterised by their impedance (Z). In any system there are three impedances to consider:

• Output impedance (source, drive)
• Cable impedance (characteristic impedance)
• Input impedance (drain, load)

Impedances for professional audio were first designed around the 600-ohm power-matched system, where all inputs and outputs were 600 ohms.

Current-day professional audio, using modern op-amp (operational amplifier) equipment, has developed the protocol of low-Z outputs (around 60 ohms) and high-Z inputs (10 k to 200 k ohms). As long as this impedance ratio is maintained, any device can be connected to any other with little likelihood of distortion or noise. Where the output Z approaches the input Z, noise and distortion can occur, e.g. plugging a domestic hi-Z output into an input of comparable impedance.
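
The low-Z-output-into-high-Z-input rule can be checked with the simple voltage-divider relation V_in = V_out x Z_in / (Z_out + Z_in). The sketch below uses hypothetical figures to show why a 60-ohm output into a 10-kilohm input loses almost nothing, while a matched 600-ohm system loses half the voltage (about 6 dB).

```python
import math

def level_loss_db(source_z: float, load_z: float) -> float:
    """Voltage loss (in dB) caused by the source/load impedance voltage divider."""
    fraction = load_z / (source_z + load_z)
    return 20 * math.log10(fraction)

print(f"{level_loss_db(60, 10_000):.2f} dB")    # ~ -0.05 dB: modern low-Z out into hi-Z in
print(f"{level_loss_db(600, 600):.2f} dB")      # ~ -6.02 dB: old 600-ohm matched system
```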

6. Circuit level - High and Low

Audio signals exist at a wide range of nominal or average levels (signal levels are usually quoted as voltages). Mic levels are comparatively low, which makes them highly susceptible to noise. Line levels are more robust. Care must be taken to check the standard operating level (SOL) of a device before connecting to it: interconnecting domestic gear (SOL -20dBv) with pro gear (SOL +4dBv) will probably degrade the SNR.
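
For reference, operating levels quoted in decibels can be converted to RMS voltages. A minimal sketch, assuming the common references of 0 dBu = 0.775 V (professional '+4' gear) and 0 dBV = 1 V (domestic '-10' gear); the exact reference intended by the 'dBv' figures above may differ.

```python
def dbu_to_volts(level_dbu: float) -> float:
    return 0.775 * 10 ** (level_dbu / 20)    # 0 dBu referenced to 0.775 V RMS

def dbv_to_volts(level_dbv: float) -> float:
    return 1.0 * 10 ** (level_dbv / 20)      # 0 dBV referenced to 1 V RMS

print(f"+4 dBu  = {dbu_to_volts(+4):.3f} V RMS")    # ~1.228 V, typical pro line level
print(f"-10 dBV = {dbv_to_volts(-10):.3f} V RMS")   # ~0.316 V, typical domestic line level
```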

7. Polarity and Phase

A basic consideration in any electrical interconnection is whether the signal captured by the mic and transmitted through the electronic equipment will keep the same polarity or phase throughout the chain.

Two signals are 'in polarity' when their voltages move together in time. They are 'out of polarity' when the voltages move in opposite directions in time. Phase relationships are more complex and involve degrees of phase shift and a frequency specification, e.g. 90 degrees out of phase at 500 Hz.

When a signal is out of polarity it is 180 degrees out of phase at all frequencies. The term polarity is used when talking about balanced lines because frequency and degree of phase shift are usually not relevant here.

The question of whether or not signals are in polarity with each other becomes critical when signals from a multitrack programme are mixed to mono and interference is caused by out-of-polarity signals.

7.1 Relative Polarity

Describes a comparison between two signals, such as a stereo pair, where their polarity relationship to each other is of primary concern. The relative polarity of a signal may be compared at any two points in the signal chain.

7.2 Absolute Polarity

The signal at any point in the chain is compared with the originally captured acoustic signal (A). Absolute polarity testing of a system is done using an asymmetrical waveform, such as a sawtooth wave, whose inversion will be immediately noticed on an oscilloscope inserted at various points in the chain.

7.3 Polarity of Connectors and Equipment

Confusion can arise when connecting equipment with balanced wiring, which has an in-polarity and an out-of-polarity signal. A balanced connection has three pins: +, - and shield/ground. In the signal chain, hot must always be connected to hot, cold to cold, and so on. Any reversal of this procedure will upset the absolute polarity relationship in the system. The convention for balanced wiring is:

Pin 1 = shield; pin 2 = in-polarity (hot); pin 3 = out-of-polarity (cold)

8. Mixed Format Interconnections

Systems containing both balanced and unbalanced equipment pose a special problem. In an average studio, a large variety of music-oriented processing gear, particularly for guitar and keyboards, is unbalanced. Many signal processing devices are balanced in and unbalanced out, as are many console inserts. A system of interconnection must be devised which will overcome the four main problems of mixed-format systems:

• Increase in noise and distortion due to level mismatching.
• Increase in noise due to impedance mismatching.
• Noise generated by an output being shorted to ground.
• External noise pickup due to lack of proper CMR.

Interface convertors between -10dBv and +4dBv can be used to overcome level and
impedance mismatches. A method of interconnection called forward referencing can be
used to produce CMR between an unbalanced output and a balanced input:

8.1 Forward referencing

Forward referencing will work where mixed interfaces have a balanced input, or where an unbalanced output drives a balanced input. In the wiring scheme, a shielded twisted pair is used, with the shield and the negative wire connected to the unbalanced output's earth connection. At the balanced input, hot and cold are connected to their respective terminals and the earth/shield is lifted. Forward referencing will produce common-mode rejection of EMI and ground-loop noise at the input end.

9. Shielding

Shielding is used to control noise by preventing transmission of EMI from the source to the
receiver. Shielding in the form of braid or foil is commonly used to wrap the single or multiple
conductors in a transmission wire. In the case of a balanced pair the shield is used as the
connection to ground.

10. Grounding

Grounding is a fundamental technique used in the control of EMI. It is done to minimize conducted coupling and to ground shields.

The ideal ground is a zero-potential body with zero impedance capable of sinking any and all
stray potentials and current. In a grounding system all devices and all metal surfaces are
linked in a star configuration to a common ground sunk in the earth which is the ultimate
ground. Every device should be grounded just once to avoid ground loops.

The Ground is a neutral reference around which the AC signal or voltage oscillates which adds
stability, safety and a degree of resistance to EMI to the audio system.

In an audio system there are two types of ground:

1. Power Ground

2. Technical Ground.

Power ground: in a balanced (i.e. grounded) power system of 240 V, the voltage swings from +120 V to -120 V around the 0 V ground. An unbalanced power system may have 240 V at one pole and 0 V at the other, which is more unstable and unsafe.

Technical ground: the audio signal voltage oscillates between positive/negative or hot/cold potentials, with ground again being the 0 V neutral. In a balanced-line system the potentials should balance each other out to zero, which is more stable and less noisy due to CMR.

11. Twisting

Twisting the wires causes electric fields to induce only common-mode signals on the pair, which a balanced input then rejects. Twisting also reduces magnetic EMI pickup by effectively reducing the loop area of the cable to zero, and is vastly more effective than magnetic shielding. The greater the number of turns per unit length, the higher the frequency up to which EMI will be reduced.

12. Separation and Routing

Physical separation of cables has a significant effect upon their interaction with each other, e.g. crosstalk. Coupling between parallel wires diminishes by 3 dB per doubling of the distance between them. It is good practice to group wires of the same level (e.g. mic level or line level) and to keep these groups, or signal classifications, separated from each other. Wires of different levels should cross at right angles.

Wire And Cable For Audio

A wire contains only one conductor while a cable can contain any number of insulated wires in
a protected bundle.

1. Characteristics of Conductors

A conductor is used to conduct electricity from one location to another. Most wire conductors are made from copper. Wire is usually sized to the American Wire Gauge (AWG), where a lower number represents a thicker gauge, e.g. AWG 5 is a thicker gauge than AWG 15.

When choosing the right conductor for a particular interconnection purpose, there are a
number of considerations, including:

• Current capacity – will the conductor size carry the required amount of current without overheating? Is it safe?

• Resistance/impedance – does the conductor size provide sufficiently low resistance so that signal loss in the wire is acceptable?

• Physical strength – is the conductor size durable enough for installation?

• Flexibility – it must not be too brittle or it will break when bent.

• Termination – is the conductor the right size to be terminated in the required connector?

• The right cable configuration – conductors in a cable assembly can take many configurations.

• Insulation characteristics – the type of insulation used on the individual conductors of a cable affects voltage rating, flexibility, cost and ease of termination. A range of plastics and rubbers are used.

• Capacitance and inductance – a conductor operates as a component in an electric circuit, alternately producing an electrical charge and an induced current. These two characteristics must be kept in balance for the best, most transparent conduction.

• Shield characteristics – there are three popular types: braid, spiral or foil. Braid shields, which are made of a fine wire weave, work best at low frequencies. Foil shields are small, light, cheap and easy to terminate, and are used in multipair/multicore cables. Conductive plastic is a rugged alternative to metal braids.

1.1 Coaxial Cable

Unbalanced, single-conductor, shielded cable, commonly used in domestic audio with unbalanced RCA connectors. More recently used in semi-pro audio as a digital connector for S/PDIF (Sony/Philips Digital Interface). High-quality audio-video (AV) coaxial cable is best for digital data over short wire runs.

1.2 Microphone cable

Shielded, twisted pairs providing an overall balanced line. Low-level mic signals require good-quality shielding for maximum protection from airborne EMI. Low capacitance is desirable. Usually finely stranded 22 AWG. Portable mic cables must be rugged, flexible and strong.

1.3 Line level Cable

Physically the same as mic cable but distinguished by the fact that it carries line-level signals. Line-level cables carry signals over greater distances, which means impedance must be kept low (around 60 ohms) so that the frequency response of low-level output signals is not degraded.

1.4 Multipair/Multicore Cable

Individual twisted, shielded pairs bundled together in an outer jacket. Useful where
many mic or line level cables are required, but not both in the one bundle. Each pair
of wires must have an isolated shield.

1.5 Low Impedance loud speaker cables

Must be as transparent as possible to the passage of the audio signal from the amp to
the speakers. Much thicker than mic cables - usually 9 to 14 AWG. Always
unbalanced and usually unshielded. Different cable shapes and strand weaves are
marketed at expensive prices. A short wire length will keep cable
induction/capacitance and impedance effects to a minimum.
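
Why loudspeaker cable must be thick and short can be seen from a quick calculation: the cable's round-trip resistance forms a voltage divider with the loudspeaker's impedance. The sketch below uses hypothetical resistance-per-metre figures, which are illustrative rather than values from these notes.

```python
import math

def cable_loss_db(length_m: float, ohms_per_m: float, speaker_z: float = 8.0) -> float:
    """Level lost across the cable (both conductors) feeding a speaker of given impedance."""
    r_cable = 2 * length_m * ohms_per_m            # out and back
    fraction = speaker_z / (speaker_z + r_cable)
    return 20 * math.log10(fraction)

# Thin cable (~0.05 ohm/m) over 20 m vs thick cable (~0.005 ohm/m) over 5 m, 8-ohm speaker.
print(f"{cable_loss_db(20, 0.05):.2f} dB")    # ~ -1.94 dB lost in the long, thin run
print(f"{cable_loss_db(5, 0.005):.3f} dB")    # ~ -0.054 dB lost in the short, thick run
```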

1.6 Power cables

Must be chosen to meet safety fire and electrical code standards. 14 AWG is strong
enough for AC power runs in fixed installations. Thicker gauges (8 to 10 AWG) are
used as grounding wires.

2. Cable Connectors -Single Line Mic/Line Level

There are a number of connectors used for single twisted-pair balanced connections. The balanced connectors can also be used for single-conductor-plus-shield/ground (unbalanced) connections, as these generally require only two contacts. XLR and tip/ring/sleeve (TRS) connectors are examples of balanced connectors, as they each have three terminals. TS connectors are unbalanced and have only two terminals.

AE07 – Analog Mixing Consoles

1. Introduction

2. Mixer connectors

3. The channel

3.1 The channel path

3.2 Output from channel

3.3 Insert

3.4 Direct out

3.5 Auxiliary

4. Routing

5. The Master Section

6. Mixer Design and applications

6.1 Split consoles

6.2 In-line consoles

6.3 Monitor Consoles

A Recording Console

1. The Recording Process

1.1 Recording

1.2 Monitoring

1.3 Overdubbing

1.4 Mixdown

2. Console Design

2.1 Signal Flow

2.2 Pots, Faders, Buttons.

2.3 Buses

3. In-Line Console Sections

3.1 Channel Input

3.2 Equalisation (EQ)

3.3 Dynamics Section

3.4 Auxiliary Section (Aux)

3.5 Channel Fader

3.6 Channel Path Feeds

4. Monitor Section

4.1 Bus/Tape Switch

4.2 EQ in Mon

4.3 Dynamics in Monitor

4.4 Aux in Monitor

4.5 Monitor Pot/Small Fader

4.6 Monitor Mute Button

4.7 Monitor Solo Button

4.8 Fader in Monitor/fader Rev Button

4.9 Monitor Panpot

4.10 Monitor Bus- stereo

5. Subgroups/Group Faders

6. Master Section

6.1 Aux Send/Return Masters

6.2 Communications Module

6.3 Master Monitor Module

GAIN STRUCTURING

1. Gain stages

2. Preamps

3. EQ

4. Faders

5. Stereo masters and Monitors

6. Level to tape

7. Balancing A Monitor Mix

7.1 Setting Levels

7.2 Setting Pan Positions

AE07 – ANALOG MIXING CONSOLES

1. Introduction

The mixer is a device that takes electrical audio signals from different input transducers (from 2 to as many as 96 or more) and can:

i. send them to different outboard devices, to tape machines or to amplifiers;
ii. blend them together to create a mix.

Every mixer has a number of identical sections to which the input transducers are connected. Each of these sections has identical controls for sending the input transducer's signal out of the desk and for bringing it back. These sections are called channels.

The other section of the mixer is the master section. The master section is different from the channels in that every control on the master section can deal with signal from more than one channel. That is, the master section is designed to receive signal from more than one channel, send it out to outboard gear and receive a return signal if necessary. The master section is also designed to mix and blend the signals from more than one channel.

The outboard devices referred to are signal-processing equipment: equalizers, dynamics processors, or effects processors such as reverb units, delays, chorus, flangers, harmonizers, etc.

2. Mixer connectors

Signals from the transducers get into the mixer through standard connectors. The input circuitry of the mixer might be balanced or unbalanced; most professional mixers have balanced (differential) input circuitry.

There are two levels of signal expected by most mixers:

i. Microphone level – small voltages that need to be pre-amplified.
ii. Line level – larger voltages that do not need to be pre-amplified.

The typical jacks found at the input section of most mixers are:

i. Cannon (XLR) – these typically connect to the microphone-level inputs of the desk, which, as mentioned above, feed the preamplifier circuit.
ii. TRS phone jacks – these typically connect to the line-level inputs. They are TS-compatible, though TRS is used because the inputs are balanced.

All the output sections on most mixers send out line-level signals (with some exceptions). Most are also connected using balanced TRS phone plugs, but on some mixers the main left and right outputs use XLR connectors.

3. The channel

The channel is also called the in/out (I/O) module. Each channel has two sets of inputs:

i. microphone level (XLR);
ii. line level (TRS).

Once the signal comes into the mixer it can be mixed with other signals or sent to outboard devices for processing.

3.1 The channel path

The channel path describes the specific controls on a channel through which a signal on that channel passes. In order to process a signal (send it to a processor) there have to be controls for sending the signal out, either as a single signal from that channel only or in combination with signals from other channels.

The channel path consists of three main controls for determining how much of the signal on a channel gets sent out, either to the master section or to a processor.

Gain – normally a rotary control at the top of the channel. It has two main functions:

i. it controls the preamplifier;
ii. it sets the maximum signal level that enters the desk from the input.

Fader – controls the level of signal that goes out of the channel.

Auxiliary – also controls the level of signal that goes out of the channel, in this case to the auxiliary master control in the master section. Each channel normally has more than one auxiliary; this increases the number of different places the signal on that channel can be sent.

3.2 Output from channel

Each channel on the desk is internally wired to the master section, either to the main L/R output or to some other output on the master section. However, other alternatives are provided to give the engineer access to the signal on each channel in isolation.

If the signal is being sent along with signals from other channels to the same outboard processor, then each signal must first be sent to a single output with a single control in the master section. This single control then acts like a 'bus' for all the signals routed to it. Examples are the auxiliary mentioned above, the group buses and the main L/R bus.

If the signal is to be sent in isolation (on its own), then it need not go to the master section; it may be better to send it out from the channel itself. There are two ways to send a signal out from the channel itself:

i. the insert point;
ii. the direct out.

Processing – there are different types of outboard device, falling into two main categories as far as mixer routing is concerned:

i. processors that should be inserted in the channel path, and
ii. processors that should be used on an auxiliary (supplementary) copy of the main signal.

3.3 Insert

The insert, as mentioned earlier, is a way of sending the signal from just one channel out for processing. It is unique in that the signal that goes out through the insert has to come back again through the insert, or the channel will no longer carry any signal. That is, the signal coming in from the XLR or TRS input is sent out through another connector called the 'insert out' or 'insert send' (or just 'send'). When the signal goes out like this it no longer exists on the channel; it is now inside the processor connected to the insert send. To get the signal back onto the channel path, you have to connect the output of the processor to the 'insert in' or 'insert return' (or just 'return').

Therefore, to use the insert connections attached to every channel on the mixer you need to make two connections:

i. insert send – takes the signal out of the channel and sends it to the processor;
ii. insert return – brings the signal back into the same channel for further routing or mixing with other signals.

3.4 Direct out

The direct out is another connection that allows you to take a signal out of one channel only. It is similar to the insert in that each channel has its own direct out, but it differs in that the direct out does not affect the signal on the channel; there is no 'direct in'. It is simply an optional output for each channel, acting like an auxiliary for a single channel only.

3.5 Auxiliary

Another way of getting a signal out of a channel, but not in isolation, is the auxiliary path. The auxiliary path allows you to send varying levels of signal from different channels out from a single point in the master section. An auxiliary control exists on each channel to determine how much of that channel's signal is sent to the master section. There is also a master control in the master section, to which all the individual channel aux controls send their signals; this sets the overall level of all the individual channel contributions. Therefore the single output connector (TRS) of the auxiliary is not associated with each channel; there is one main output at the master section for all of them.

Channels on a desk can have as many as 8 or more auxiliaries. Each of these auxiliary controls will be connected to its own master auxiliary control, which means there will be 8 auxiliary masters in the master section.

Note, however, that if the auxiliaries are used to send a group of signals to a processor, then in order to mix/blend the processed signal with signals from other channels, or simply to route it somewhere else, it has to be brought back into the desk. This can be done by connecting the output of the processor to another channel on the desk, or by using dedicated inputs in the master section called auxiliary returns.

4. Routing

Routing simply means sending the signal around. Routing on a desk can be either internal or external. If it is external, cable connections will be required; if it is internal, the same connections are made with switches instead of physically handling cables.

The signal path (signal flow) is very important for properly wiring up a mixer and for troubleshooting. The signal flow simply describes everywhere the signal passes through, from its source up to where it terminates (tape or amplifier).

External routing – this involves sending a signal out of the desk and bringing it back again. It is done using the inserts, auxiliaries, buses, direct outs or any other outputs available on the desk.

Internal routing – internal routing involves sending channels to specific buses and sending auxiliaries to the main auxiliary masters. To route to the buses, mixers have a set of switches whose numbers represent the buses to which they are connected: a switch numbered 1 will connect that channel to bus 1, and a switch marked L/R will connect that channel to the main master L/R mix bus.

But internal routing can get a little more complicated than that.

Some mixers have built-in equalizers for each channel. These equalizers are already inserted in the channel path; with a switch you can remove (disconnect) them from the channel path or put them back in (EQ on/off or EQ in/out switches).

Some mixers also have other processors built in, such as dynamics processors.

Some mixers have effects processors such as reverb units built in, with specific auxiliaries dedicated to these effects. To send a signal from a channel to one of these built-in machines, all you have to do is turn up that auxiliary level on the channel and its master control in the master section; the effects machine will then receive and process the signal without any cables being connected.

The output of such an effects machine might be returned to the desk on two dedicated channels that cannot be used for anything else, or the output connection might be left to the user, or it might be brought back into a dedicated auxiliary return in the master section.

Pre or Post

Finally, there are the terms pre- and post-; these determine whether a processor or control (faders and such) has access to the channel signal before or after another one.

Pre- means the control has access to the signal before the specified control. For example, an auxiliary marked pre-fader takes its signal before it reaches the fader, and an auxiliary marked pre-EQ takes its signal before the EQ.

Post- means the control has access to the signal after the specified control. For example, an auxiliary marked post-fader takes its signal after the fader, and a direct out marked post-fader will have its output level affected by fader movements.

5. The Master Section

The master section of a mixer, as stated earlier, controls the master outputs of the desk.

The master section can be designed with many combinations of output options, but in general the outputs will include:

i. Bus (group) outs – a mix bus is simply a control that sums the signals from different channels and sends them out of the desk. A mix bus is also called a group, because multiple channels can be sent to it and it then serves as the main master control for them. Every mix bus has a dedicated output connector (TRS), at least one per bus.
ii. Aux outs
iii. Main out – the main master mix bus sends out two signals, left and right, sometimes with a third option for centre.
iv. Tape out
v. Headphone out
vi. Matrix out
vii. Studio out
viii. Control room out
ix. Solo

Some inputs on the master section are:

i. Auxiliary returns
ii. Tape in or 2-track in
iii. Talkback microphone input

6. Mixer Design and applications

Mixers are designed in a variety of ways to suit specific applications. A mixer designed for use at a concert will be different from one designed for use in a studio or a radio broadcast facility.

The general considerations for a mixer are the number of inputs, how those inputs are laid out on the desk, and the design of the master section.

Some common mixer designs are:

6.1 Split consoles

A split console has three sections: a set of in/out channels on either side of a central master section. A common use for this design is to send signals to a multitrack recorder from one side and monitor the off-tape signal on the other. Some designs have most of the stereo channels on one side and the mono channels on the other.

6.2 In-line consoles

This design incorporates the main in/out module and the monitor module on the same channel strip. This means each channel strip acts as two separate channels, each with its own input jacks. The design saves space, although access to the monitor path is restricted and, in most designs, the two inputs share some controls on the path, such as EQ and aux sends.

6.3 Monitor Consoles

This is a dedicated console for processing monitor signals. Monitor mixers must have
the ability to output multiple mixes of the same channels. These consoles generally
have a lot of auxiliaries and the master section incorporates a lot of alternative paths
like matrices.

A Recording Console

Recording consoles are designed specifically to serve the needs of a recording engineer.

The recording console, or audio production console, provides control of the volume, tone, blending and spatial positioning of signals applied to its inputs by mics, electronic instruments and tape recorders. The console provides a means of directing or routing these signals to appropriate devices such as tape machines, monitor speakers or signal processors, and allows for the subtle blending and combining of a multiplicity of signal sources.

1. The Recording Process

The console is normally used in every phase of audio production and its design reflects the
many different tasks it is called upon to perform.

1.1 Recording

Audio signals from mics, electronic instruments and other sources are recorded to
magnetic tape or a digital format. The inputs can be recorded separately or all at
once, or some combination of both.

Each instrument or sound is generally recorded on a separate track of the master tape (tracking). Signal sources connected to the inputs of the console are assigned to particular console outputs, which are wired to the various input channels of the recorder. The console is used to set and balance the level of each signal as it goes to tape.

1.2 Monitoring

The engineer must be able to hear the different sounds as they are being recorded,
and later as they are played back. The console therefore has a monitor section which
allows us to hear sounds separately or in combination. The outputs from the monitor
section feed the power amplifier and monitor speakers in the control room.

Different monitoring setups apply to different stages of production. During recording one must hear the direct input from the mic. Later, monitoring of sounds off tape is necessary, sometimes in combination with inputs being newly recorded. Finally, all the recorded tracks must be monitored together for balancing and combining in the mixdown stage.

1.3 Overdubbing

The additional recording or tracking of other instruments once the initial recording or bed tracks have been laid. This involves monitoring the already recorded material as the musicians play along. The musicians must be provided with a cue or headphone mix from off tape via the auxiliary sends of the console.

1.4 Mixdown

When all tracking is finished, the recorded material on the various tracks of the multitrack recorder is combined, balanced and enhanced via the console in the mixdown. The console inputs are fed by the playback outputs of the multitrack tape recorder. Equalisation and signal processing may be added via the console to each recorded track as it is mixed.

The tape outputs are finally routed to a two-track recorder (e.g. a 2-track ATR or DAT) via the console's stereo output, called the stereo bus. This allows the final mix to be recorded.

2. Console Design

The design of a console is based on the applications for which the console is made. The
standard controls on a console are of the sort listed below and the requirements are for an
efficient signal flow.

2.1 Signal Flow

Console design reflects different concepts of signal flow, i.e. the particular way in which audio signals are routed from a console's inputs to its outputs. Upon input to the console, a signal is said to follow a particular signal path within the console, which is determined by the settings made on the console surface. Two simultaneous signal paths are the channel and monitor paths.

The design of the console surface always entails a series of parallel modules or channel strips. The same functions are duplicated on each module; however, depending on the design, the functions of the modules may differ slightly.

Signal flow through each module is also basically the same; however, the physical top-to-bottom layout of a channel strip does not necessarily reflect the exact signal path through the console.

Signal flow diagrams are used to describe the lay out of the sections of a module in
the order that the signal passes through them. Different consoles have different
module setups and different signal flow diagrams. There are two basic console types:

2.1.1 Split Line

Divides the controls on the console surface into three separate sections: input, output and monitor. The signal travels from the input section to the output section and then via the master bus to the tape input. A split of the output-section signal is taken for pre-tape monitoring. The signal can return off tape through the input modules if desired, by selecting line instead of mic. Off-tape signals may be selected at the monitor section.

Split-line consoles are the cheapest and simplest type of console design. They are usually limited by their number of outputs and by routing inflexibility. The signal must also travel through quite a few gain stages, which can add noise. They are popular in sound reinforcement and small studios.

2.1.2 In-line

The in-line console places all the input, output and monitor controls for a single audio channel within a single channel strip called an I/O module. The in-line console must have at least as many I/O modules as there are channels on the multitrack. A 32-channel console is commonly used for 24-track recording, leaving 8 modules spare for signal processor returns.

2.2 Pots, Faders, Buttons.

The surface of any console is laid with different controllers:

2.2.1 Pots

Short for potentiometer. Pots work on a simple rotary twist motion. They may be used for boosting or cutting levels (aux and monitor pots) or for making selections (EQ frequency selection, panpots). Gain pots can be side- or centre-detented, i.e. their neutral or zero point is located at the extreme left or in the centre of the swing.

2.2.2 Faders

Slider controls used for gain settings.

2.2.3 Buttons

Switches that select and deselect console functions, signal paths. Channel on/off
buttons (Mutes) often light up when engaged.

2.3 Buses

An electrical conductor on which one or many signals may be collected and combined, e.g. the monitor bus combines all monitor signals, an output bus combines output signals, etc. After signals are combined on the bus, its output is assigned to a single destination, e.g. an auxiliary bus goes to the input of a signal processor, an output bus goes to the channel input of an ATR.

3. In-Line Console Sections

An I/O module of an in-line console will typically contain the following sections:

3.1 Channel Input

A mic/line switch allows for the selection of one of two inputs:

3.1.1 Mic in

The typical mic operating level (-45 to -55 dBV) is far below that of a tape recorder. A mic preamplifier in the console provides the gain to bring the mic signal up to a typical line level (0 dB). The preamp gain pot provides around 70 dB of boost to a mic signal.
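
The amount of preamp gain needed is simply the difference, in dB, between the incoming mic level and the desired line level, or equivalently 20 times the log of the voltage ratio. A small sketch using the levels quoted above:

```python
def gain_needed_db(mic_level_db: float, line_level_db: float = 0.0) -> float:
    """Gain (dB) required to raise a mic-level signal to the console's line level."""
    return line_level_db - mic_level_db

print(gain_needed_db(-50))                     # 50 dB of gain for a -50 dB mic signal
print(gain_needed_db(-55))                     # 55 dB for a quieter source

# The same thing expressed as a voltage ratio: 50 dB is a factor of about 316.
print(f"{10 ** (50 / 20):.0f}x voltage gain")  # ~316x
```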

The mic-in section will also include:

• A +48 V phantom power switch for remote powering of condenser mics.
• A 10 dB pad for attenuating extreme mic signals.
• A phase reverse switch (ø) for 180-degree reversal of signal polarity.
• High-pass and low-pass filters for cutting very low-frequency rumble such as air-conditioning noise (HPF) or very high-frequency noise such as hiss from an amp (LPF) from the input signal.

3.1.2 Line in

Selects a line-level signal into the input section, such as an electronic instrument or a
tape return signal. Sometimes there is a separate line level gain pot which will boost or
cut the input signal by around 30dB.

3.2 Equalisation (EQ)

Fed directly from the input section, although it is usually bypassed until selected by an
EQ to Channel select switch. Equalisation provides boost and cut of the signal at
particular, selected frequencies. An EQ section will contain from 2 to 4 equalisers
which together cover the frequency spectrum. Types of equalisers include:

3.2.1 Filters (HPF and LPF)

Sometimes included in the input section, but these are still equalisers. Provide cut only.

3.2.2 Shelf EQ

Bass and treble boost or cut Pots.

3.2.3 Sweep EQ

Boost or cut Pot plus centre frequency selector pot.

3.2.4 Parametric EQ

Same as the sweep EQ but with the addition of a pot for selecting bandwidth or Q
(typically 0.5 to 9). A high Q means a narrow bandwidth.


3.3 Dynamics Section

Some large, expensive in-line consoles have an on-board dynamics section in each
I/O module to perform gating, compression and limiting. Buttons allow the dynamics
section to be selected at the channel input ie pre-EQ or channel output ie post fader.

3.4 Auxiliary Section (Aux)

A module may contain anywhere from 2 to 8 aux sends and they are numbered
accordingly, ie Aux 1, Aux 2, etc. The channel signal is tapped or split, and the aux pot
determines the level of the split signal sent to its corresponding aux master bus (ie aux
pot 1 sends to aux master bus 1, etc). Aux master bus 1 collects all aux 1 signals
and sends them to an outboard signal processor such as a reverb unit.

Auxes may be mono, or configured in stereo pairs with a corresponding panpot; a
stereo select switch allows for either option. An aux may be placed in the signal path
before or after the channel fader with the use of the PRE/POST switch. A stereo
auxiliary pair is commonly used for the cue/headphone mix sent to musicians when
overdubbing.

The cue mix is pre-fader so that fader movements do not affect the headphone mix.
Mono auxes are commonly used for reverb/echo sends in mixdown and are usually
post-fader so that the effect may be faded with the dry signal.

3.5 Channel Fader

The channel fader controls the overall level of the signal before it is sent to either an
output bus for tracking or to the stereo bus for mixdown. The channel fader is typically
long throw and provides 30 to 40dB of boost or cut to the signal, although the fader
should usually remain around its unity gain position of 0dB.

The fader section also contains a channel cut or MUTE switch which simply turns the
channel on and off, stopping the input signal from going any further.

3.6 Channel Path Feeds

At the output of the channel fader, the channel signal may be routed to one or
more of the following destinations and their associated buses:

• Multitrack tape recorder via an output bus or the direct bus
• Aux master buses
• Monitor system and two-track tape recorder via the master stereo bus
• Solo bus

These buses are accessed via the following buttons found on each I/O module:

3.6.1 Channel Assignment Matrix

The Channel Assignment Matrix can distribute the input signal to any or all tracks on
the multitrack recorder. A signal input into module 14 may be assigned to track 14 on
the multitrack by pressing button 14. Alternatively the same signal may be sent to track
15 by pressing button 15. Two adjacent tracks (14 and 15) may be chosen to operate
as a stereo pair. A pan switch and channel panpot allow the panning of the signal left
or right within the stereo pair.

The output bus combines all the signals assigned to that module's output section. The
output buses are usually wired to the track inputs of the multitrack.

DIRECT button assigns the input signal directly to the output bus associated with that
module ie 14 in direct to 14 out.

3.6.2 Aux Pre/Post Switch

Switches the channel aux send to post channel fader, which means signals in the aux
bus will be affected by fader movements. Post is often selected for effects.

3.6.3 MIX Button

Assigns the input signal from that module directly to the stereo master bus. Use this
button to assign channels for mixdown.

3.6.4 SOLO Button

The solo button is also located by the channel fader. When pressed, the solo switch
simultaneously mutes all other unsoloed channels and routes the soloed signals to a
master solo bus, which in turn cuts into the monitor system. This allows us to check a
particular track in the mix without having to mute every other track manually. A signal
may be soloed pre- or after-fader (PFL or AFL), or in place, where it will retain its
panned position in the mix and any effects assigned to it via the auxes.

3.6.5 Panpot

The panoramic potentiometer allows the incoming signal to be routed in varying
proportions to a pair of output buses or to the master stereo bus. With one side routed
to the left speaker and the other to the right, the panpot allows the signal to be moved
or panned from left to centre to right.
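
As an illustrative aside (a sketch of one common design convention, not a description of
any particular console's circuitry), many panpots approximate a constant-power law so
that perceived loudness stays roughly even as a signal is swept across the stereo field:

    import math

    # Minimal sketch of a constant-power (-3dB centre) pan law.
    # pan runs from -1.0 (hard left) to +1.0 (hard right).
    def pan_gains(pan):
        angle = (pan + 1.0) * math.pi / 4.0   # 0 .. pi/2
        return math.cos(angle), math.sin(angle)

    left, right = pan_gains(0.0)
    print(round(left, 3), round(right, 3))    # centred: both sides ~0.707, ie 3dB down

At the centre each side carries about 0.707 of the signal, ie roughly 3dB down in each
speaker.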

4. Monitor Section

Within an I/O module, the incoming signal actually travels along 2 paths: the channel path and
the Monitor Path. The channel path just discussed uses gain or level adjustment to set the best
possible levels to tape when tracking. The monitor path allows the engineer to create a monitor
mix of the signals being recorded or already recorded. This monitor mix should resemble the
musicality of the program material so that everyone in the control room can judge how the
track is shaping up. The monitor section is composed of the following controls:

4.1 Bus/Tape Switch

Selects which signal feeds the monitor path: either the signal from the I/O module's
output bus or the signal from the multitrack recorder's output (tape return).

4.2 EQ in Mon


Selects the EQ section into the monitor path so that the signal may be equalised
without going to tape.

4.3 Dynamics in Monitor

Selects the dynamics section into the monitor path.

4.4 Aux in Monitor

Selects an aux send into the monitor path so that the signal may have rev/echo added
without being recorded.

4.5 Monitor Pot/Small Fader

Controls the monitor signal level from that module to the monitor bus.

4.6 Monitor Mute Button

Turns the monitor path on or off for each module.

4.7 Monitor Solo Button

Routes the monitor signal to the solo bus.

4.8 Fader in Monitor/fader Rev Button

Places the channel fader in the monitor path and the monitor pot in the channel path,
ie it reverses their functions. This is useful when a channel fader has broken.

4.9 Monitor Panpot

Pans the monitor signal left, right, centre in the stereo monitor mix.

4.10 Monitor Bus- stereo

Combines the monitor signals from all the I/O modules and feeds a stereo signal to the
master stereo bus for output to 2-track and studio monitor speakers. In some consoles
the stereo master bus and the monitor bus are the same.

5. Subgroups/Group Faders

The inline console provides a group function mode where the level of any group of incoming
signals may be adjusted by a single fader which is designated as a group fader. Any channel
fader may be selected as a group fader, and any other fader output may be assigned to its
control. Group faders or subgroups are useful during mixdown to control the level of a group
of instruments in the mix, such as drums or strings, with a single fader.

6. Master Section


The master section of the inline console contains the master controllers for the various buses,
switches which can globally reset the console, and the monitor and communications functions.
The master section is responsible for the overall coordination of the I/O modules.

Most consoles are split into 2 or 3 master modules: Aux Send/Return masters, Communication
module, and Master Monitor section.

6.1 Aux Send/Return Masters

Each aux send bus is controlled by a master level pot which controls the signal level
sent out of the console to signal processors or to the headphone amplifier (cue mix).

A signal sent out to a processor will return to the console processed, or "wet". A
console may provide a number of stereo aux returns, each with a master control pot
and a panpot. The signal from these returns may be assigned to an output bus for
recording to tape or sent directly to the master stereo bus for mixdown or monitoring.

6.2 Communications Module

Usually contains:

• Talkback mic for communicating with artists in the studio.
• Signal generator - sine waves and pink noise for calibration and testing purposes.
• Slating system - allows a voice or tone to be recorded to tape to identify the starts of
songs etc.

6.3 Master Monitor Module

Contains the master controls associated with the control room and studio monitor systems:

• Monitor selector matrix - selects the tape recorder or other stereo output to be
routed to the monitor section.
• Mix selector - routes all off-tape signals and any "live" output bus signals to the
monitor section.
• Studio and control room monitor master level pots and on/off switches.
• Speaker select A/B - chooses nearfield or reference monitor speakers.
• Mono switch.
• Stereo master fader - a single fader used to adjust the output levels from the
mixdown/monitor buses before these levels are sent to the speakers or the 2-track
machine.
• Solo master level pot - controls the level of the output from the solo bus.
• PFL or in-place solo select button.

GAIN STRUCTURING

From mic to console to recorder, there are a multitude of level controllers spread over the
signal path. The active circuit element (usually an op amp) governed by a level control is
called a gain stage. Gain structuring refers to how you choose to set these various levels so
that the SNR of the signal remains acceptable and the overall level stays balanced around
0VU.


1. Gain stages

A signal entering the console will travel through the input gain stage, the EQ section and the
fader section, and possibly a group fader, before going to tape. A split of the signal may go
through the various aux gain stages, as well as the internal gain controls on a signal
processor, before getting to the stereo bus. Gain structuring is the method of adjusting these
levels for an optimum clean signal.

Each level control exists within the context of an entire signal path, and this path is affected
by both upstream and downstream level controls. Also, the SNR can only get worse after the
signal leaves the mic or instrument.

A practical guideline: avoid setting any gain stage at its highest or lowest extreme, with the
possible exception of the original source’s output level. An ideal level for each gain stage is
75% of its maximum output.

The best approach is to start at the beginning of the signal chain and work through each gain
stage. By watching the meters and listening, the levels can be adjusted until an optimum signal
balance is achieved.
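
To make the idea concrete, here is a minimal sketch (the stage names and figures are
invented purely for illustration) showing that gains expressed in dB simply add along the
chain, which is why one extreme setting upstream forces compensation everywhere downstream:

    # Minimal sketch: gains in dB add along the signal chain.
    # Stage names and values are illustrative assumptions only.
    stages = [
        ("mic preamp", 50.0),
        ("EQ boost", 3.0),
        ("channel fader", -2.0),
        ("group fader", 0.0),
    ]

    level = -50.0          # assumed source level in dB
    for name, gain_db in stages:
        level += gain_db
        print(f"after {name}: {level:+.1f} dB")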

2. Preamps

The preamp level pot is the most important gain stage in the signal chain. Mic preamps apply
the most gain to the signal. A very low signal may have to be turned up, but this will inevitably
add noise; it is better to find a mic position where the signal is stronger. A heavy signal
which requires attenuation leaves no headroom and is likely to distort at peak points. Again,
the solution is in trying to moderate the sound source. In most cases it's best to set the preamp
trim in a position that allows some flexibility and avoid changing it.

3. EQ

An equaliser is a gain stage which will boost or cut a signal at a particular frequency. If the
signal from the input preamp is too hot there will be little headroom left for equalisation.
Boosting a signal at certain frequencies before it goes to tape may increase noise or distort
the timbral character of the instrument, and once a frequency is cut and recorded it cannot
easily be replaced. EQ can often be deselected as a gain stage, and this is useful for keeping
noise levels to a minimum.

In general, equalisation should only be used when necessary. Many engineers prefer not to
EQ to tape but to leave their options open for the mixdown.

4. Faders

The channel fader is used to make fine adjustments to the signal as it naturally dips or jumps
as the sound source level changes. The setting of the fader especially affects headroom. If it
is set too high there is little travel left to boost the signal during a quiet passage. Conversely,
setting the fader too low will not allow a proper fadeout before noise sets in.

In practice the fader should be kept around the 0dB (unity) mark on its slider path, with the
signal registering 0VU on the meters. The same considerations apply to group faders. This way
the channel fader is used to change the level of a single instrument in the group, and the group
fader will determine the level of the group of instruments, either to tape or in a mixdown.


5. Stereo masters and Monitors

Don't forget to set the stereo bus master fader at its unity position. This again preserves
headroom and allows for a smooth fadeout of the whole track at mixdown.

The level at which one listens while making gain settings can affect the overall balance of the
sound. This is because of the non-linearity of our hearing (equal loudness contours). If we
listen at too low a level we can miss details in the program like hiss and minute distortion.
Also, we will tend to overcompensate for the frequencies our ears are not picking up, which
could lead to over-equalising a sound. Conversely, monitoring at high levels will lead to ear
fatigue and to cutting too much bass or treble.

It is therefore good to work out a relationship between monitor gain and the optimum signal
level. One way would be to play a well balanced program at 0VU and measure the SPL at the
mix position with an SPL meter. Increase the monitor gain until the meter reads 85dB, which is
the optimum listening level, then mark this point on the monitor gain pot as a reference level
for future monitoring.

6. Level to tape

It's good practice to strive for an average level of 0VU when recording to tape. However, there
are some exceptions to this rule.

• Sync tones should be recorded no higher than -5VU so their signal won’t bleed
onto other tracks.
• Drums have very fast attack transients which can distort at 0VU. Levels of around
-5 to -3 VU will give more headroom and allow for some equalisation later.
• Some high pitched percussion instruments like cowbell or xylophone will also
bleed easily and should be tracked at around -5VU.
• Electric Guitar, especially rhythm guitar, can be recorded at hotter than 0VU
levels, but not so that the meter pins.

7. Balancing A Monitor Mix

Levels are often recorded to tape irrespective of the overall musical balance of the program
being recorded. This is because other considerations such as SNR are primary in recording.
However one must still be able to judge the quality of a performance and an approximation of
the final product. Hence instruments are blended in a musical way (ie mixed) and sent to the
monitor speakers.

The monitor mix is performed using the level pots and panpots in the monitor section to
roughly balance and position the instruments. Often reverb and equalisation will also be
necessary to please a client. These effects can be selected into the monitor path and have no
effect on the recorded sound.

7.1 Setting Levels

A rough balance of the instruments is provided using the monitor level pots. The aim
is to highlight the important elements of the program and also any signal which the
client may wish to study in detail. The feeling, mood or dramatic presence of the
program should be drawn out in the monitor mix; however, everything should be
equally balanced so that all options are still left open.

7.2 Setting Pan Positions

Start with the most important elements such as the vocal, then position the supporting
instruments around them. Panning involves a left-right spread of instruments across
the soundstage. There are no definite rules, but there are some conventions which
most clients will expect to hear.

7.2.1 Center Position

The mix must create a focal point for the listener. This is accomplished by taking certain
key elements such as the lead vocal, lead instrument, kick and snare drum and giving them
prominence in the mix. One way to do this is to pan them to the centre, as this is where the
listener's attention is naturally drawn.

Centre pan position is also used for low frequency sounds which lack clarity and
directionality; the bass guitar is most often at the centre of the mix.

7.2.2 Far Left/Right Position

The most crucial use of extreme L/R panning is in the positioning and opening up of
stereo recorded or stereo emulated tracks. This is so that the localisation cues inherent
in these tracks can be fully present in the mix.

Drum miking usually includes a stereo overhead pair. These should be panned extreme
L/R so that the various elements of the kit will appear in position across the sound stage.
A stereo miked piano should also be opened up so the keyboard will sound from high to low
across the stage. FX returns are often in stereo and these must be panned extreme L/R.

Single instruments can be on the far sides of the mix, especially if the effect required is
one of distance from the centre or commentary on the centre eg backing vocals or lead
guitar playing with lead vocal.

Assignment 2 – AE002

AE08 – Signal Processors

DYNAMIC RANGE PROCESSORS

1. Dynamic Range Control

2. Gain Riding

3. Amplifier Gain transfer characteristics

4. Electronic gain riding

5. The Compressor/Limiter

5.1 Operation

5.2 Compression ratio/Slope of the Compression Curve

5.3 Threshold/Rotation point

5.4 Attack Time (dB/ms)

5.5 Release Time (dB/sec)

5.6 Metering

5.7 The Limiter

5.8 Using Compressors

6. EXPANDERS/GATES

6.1 Operation

6.2 Gate

6.3 Expansion Ratio

6.4 Expansion Threshold

6.5 Range

6.6 Attack

6.7 Hold

6.8 Release

6.9 Key Input

Equalisation & Filters

1. Equalisers: What They Do

2. Equalisation Characteristics

2.1 Equalisation Curve

2.2 Bandwidth and Quality(Q)

2.3 Active/Passive EQ

3. Equaliser Electronics

4. Controls

5. Equaliser Types

5.1 Shelf EQ

5.2 Parametric and Sweep Equalisers

5.3 Notch Filter

5.4 Presence Filter

5.5 Comb Filter

5.6 Graphic Equalisers

6. Practical Equalisation

6.1 Extreme low bass: 20 to 60Hz.

6.2 Bass: 60 to 300Hz.

6.4 Lower Midrange: 300 to 2500Hz.

6.5 Upper Midrange: 2500Hz to 5000Hz.

6.6 "Presence" range: 5000Hz to 7000Hz.

6.7 High frequency range: 7000Hz to 20000Hz.

6.8 Vocal Equalisation

7. Some Uses of Equalisation

8. Eq Chart

DIGITAL SIGNAL PROCESSORS

1. Digital Delay Line (DDL)

2. Pulsing

3. Modulation Parameters

3.1 Sweep range, Modulation amount, depth

3.2 Modulation Type

3.3 Modulation Rate

4. Setting up Reverb devices


AE08 – SIGNAL PROCESSORS

DYNAMIC RANGE PROCESSORS

This unit introduces the student to the basics of controlling the dynamic range of program
material.

1. Dynamic Range Control

In broadcasting and analog recording, the original signal’s dynamic range often exceeds the
capabilities of the medium, and must be reduced or compressed to fit the capabilities of that
medium.

The dynamic range of music is around 120dB, while the dynamic range of analog tape and FM
broadcasting is around 60 to 70dB. Often when we try to reproduce a wide dynamic range
programme through a narrow dynamic range medium, information can be lost in background
noise and distortion. Therefore, the dynamic range of the program material must be
compressed until it fits within the S/N limitations imposed by the medium.

2. Gain Riding

This can easily be demonstrated when miking a singer who is expressively changing their
dynamics to suit the song eg Mariah Carey singing “Can’t Live if Living is without you” where
she goes from a breathy whisper to an impassioned scream. If we watched the VU meter of
the vocal we would see it dip below an acceptable level then suddenly peak into the red.

The engineer’s reaction is to try and balance the level and keep it at unity gain by moving the
fader up and down in response to the dynamic changes. This method, called Gain Riding, is
commonly used to keep a vocal track balanced. However it’s not very practical when the
sound can shift dramatically without warning and where there are many channels, each with
independent gain variables.

Under these conditions you either need an octopus with perfect reflexes or some method of
automatic gain adjustment.

3. Amplifier Gain transfer characteristics

The same situation can be looked at from the point of view of an amplifier whose gain has
been adjusted in response to a signal whose dynamic variations exceed the SNR of the
system.

The input level on the horizontal axis varies from -15 to +15 dB. The SNR of the system is
around 25dB. Three different level adjustments, a, b and c, are made to the input signal in an
attempt to contain it within the dynamic constraints of the system:

a. 0dB adjustment ie flat response

b. A boost of the fader by 10dB

c. A cut of the fader by 10dB


None of these static gain adjustments is adequate to track the dynamics of the input signal and
yield an output where the signal neither distorts nor is lost in the noise floor. In fact a
combination of the a, b and c gain adjustments would be required, ie a continuous or variable
gain change called COMPRESSION.

4. Electronic gain riding

Compression can solve this problem by continuously varying the amp gain as the signal
varies. This produces an output signal that fits within the limits imposed by the system. This
effect may be produced by means of a variable gain amplifier in which the gain of an
electronic circuit varies automatically as a function of the input signal. Two devices which use
a variable gain amplifier to achieve electronic gain control are the Compressor/Limiter and the
Expander/Gate

5. The Compressor/Limiter

5.1 Operation

A compressor/limiter is a variable gain amplifier in which the dynamic range of the
output signal is less than that of the applied input signal. A compressor is, in effect, an
automatic fader. Through compressor operation, high level signals are dropped below
system overload. When an input signal exceeds a predetermined level called the
threshold, compressor gain is reduced and the signal is attenuated.

5.2 Compression ratio/Slope of the Compression Curve

The increase of the input signal needed to cause a 1dB increase in the output signal.
This sets the ratio of input to output dynamic range, where unity gain equals 1:1. For
example, a 2:1 compression ratio setting means that the output dynamic range is half
that of the input; in this instance an 8dB increase in input would produce a 4dB
increase in output, for a gain reduction of 4dB. Higher compression ratios will require
more input signal for a stronger output response. This means that at high levels of
compression, eg 20:1 or infinity to one, there is very little increase in output level.

5.3 Threshold/Rotation point

The threshold setting defines the point at which gain reduction begins. The output of
the compressor is linear below the threshold point.

1:1 No gain reduction. The output is equal to the input; the gain transfer characteristic
is linear.

2:1 Every 1dB of gain beyond the rotation point is reduced by half, eg +10dB of gain
above unity will be compressed to +5dB, a 5dB gain reduction.

4:1 +12dB of gain beyond the threshold will be compressed to +3dB, a gain reduction
of 9dB.

20:1 +10dB of gain beyond the threshold will be compressed to +0.5dB, a gain
reduction of 9.5dB.


A ratio of infinity to one means the output will not rise above the threshold no matter
how large the input level is. This is called limiting.

Threshold settings can be varied with the threshold pot. The lower the threshold
setting, the sooner the compression slope is implemented, so the amount of gain
reduction is always related to the threshold setting. A signal which is 10dB over unity
and compressed by 2:1 will have a gain reduction of 5dB. If the threshold is lowered to
-10dB, the gain reduction of the same signal will increase to 10dB.
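
The figures above can be reproduced with a simple static gain computer. This is only a
sketch of the steady-state arithmetic (real units also apply the attack and release
smoothing described below), and the function name is our own:

    # Minimal sketch of a compressor's static (steady-state) gain computer.
    def compress(level_db, threshold_db, ratio):
        """Output level in dB for a given input level."""
        if level_db <= threshold_db:
            return level_db                    # below threshold: linear, no gain reduction
        return threshold_db + (level_db - threshold_db) / ratio

    print(compress(+10, 0, 2))     # +5dB out  -> 5dB of gain reduction
    print(compress(+12, 0, 4))     # +3dB out  -> 9dB of gain reduction
    print(compress(+10, -10, 2))   # 0dB out   -> 10dB once the threshold is lowered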

Compression can be instantaneously applied at the threshold, producing a standard "hard
knee" curve. In OverEasy compression, which is used in dbx compressors, the increasing
input signal gradually activates the gain reduction amplifier as the threshold is approached,
and full compression is not achieved until a short distance after the threshold point has been
passed. This leads to a gradual compression around the threshold, yielding a curve referred
to as "soft knee". This smoother transition is considered to be less audible than an abrupt
hard knee compression, especially at higher ratios.

5.4 Attack Time (dB/ms)

Since musical signals vary in loudness they will be above the threshold at one instant
and below it the next. The speed with which the gain is reduced after the signal
exceeds the threshold is referred to as the attack time. Attack time is defined as the
time it takes for the gain to decrease by a certain amount, usually 63% of the final
value of the gain reduction.

Why does attack time need to be varied? The ear's perception of the loudness of a
signal is proportional to that signal's average or RMS level. Large, short duration
peaks do not noticeably increase the perceived loudness of a signal. However these
fast peaks add to the feeling of liveness and dynamism, and because they are so fast
they produce little noticeable distortion on analog tape (they may, however, distort in
the digital medium).

Thus attack time needs to be varied so that a balance is preserved between keeping
the edge or peak attack of a signal, whilst compressing it fast enough to control the
average level of the program. In this way our perception of the dynamic range of a
piece is kept intact, ie we don't actually hear the gain reduction occurring.

Instruments with fast attack times like percussion should not necessarily have their
compression attack times set at 0ms. Although this offers maximum protection from
high level transients, it can lead to a lifeless sound. A little bit of attack time will
preserve if not increase the sense of percussive attack.

In general sounds with a fast attack eg percussion will require faster attack time
settings than sounds which slowly swell eg flute, clarinet.

Vocal usually requires a moderate attack time.

5.5 Release Time (dB/sec)

Setting the release time controls the closing of the compression envelope when the
input signal falls below the threshold. The release time is defined as the time required
for the processed signal to return to its original level.


If the release time is set too short and full gain is restored instantaneously, thumps,
pumping and breathing can be heard due to the rapid rise of background noise as the
gain is suddenly increased. Also, if rapid peaks were fed into the compressor, the gain
would be restored after each one and the overall gain of the program material would
not be effectively reduced.

Gentler release times control the rate at which gain is restored to create a smoother
transition between compression and non-compression. Typically, sounds with a long
decay envelope will require longer release time settings eg strings.

As the calibrations on the pots suggest, release times are on the whole much longer
than attack times.
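
As a rough illustration of why attack and release are specified as times, here is a one-pole
smoothing approach commonly used in digital dynamics processing (a sketch with assumed
values, not any particular unit's algorithm). The gain reduction requested by the static curve
is eased in at the attack rate and eased out at the release rate:

    import math

    # Minimal sketch: smoothing gain changes with separate attack/release times.
    # 48kHz sample rate, 5ms attack and 200ms release are illustrative assumptions.
    sample_rate = 48000
    attack_coeff = math.exp(-1.0 / (0.005 * sample_rate))
    release_coeff = math.exp(-1.0 / (0.200 * sample_rate))

    def smooth_gain(target_db, current_db):
        # Gain falling (more reduction requested) -> use the faster attack time.
        coeff = attack_coeff if target_db < current_db else release_coeff
        return coeff * current_db + (1.0 - coeff) * target_db

    gain = 0.0                          # start with no gain reduction
    for _ in range(1000):
        gain = smooth_gain(-6.0, gain)  # static curve asks for 6dB of reduction
    print(round(gain, 2))               # approaches -6dB at the attack rate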

5.6 Metering

Compressors usually have built-in metering to allow monitoring of the amount of gain
reduction taking place. The meter reads 0VU when the input signal is below the
threshold and falls to the left or right to indicate the number of decibels of gain
reduction. Bar graph LED displays are used on some units to indicate amount of gain
reduction.

5.7 The Limiter

When the compression ratio is made large enough (>10:1) the compressor becomes a
limiter. A limiter is used to prevent signal peaks from exceeding a certain level in
order to prevent overloading amplifiers, tapes or discs. Since such a large increase in
the input is required to produce a significant increase in the output, the likelihood of
overloading the equipment is greatly reduced.

Peakstop/clipper:

Limiting can be used to stop short term peaks from creating distortion or actual
damage. Extremely short attack and release times are used so that the ear cannot
hear the gain being reduced and restored.

5.8 Using Compressors

Some common compressor applications are:

* Fattening a kick or snare.

* Adding sustain to a guitar or synth string sound.

* Smoothing out a vocal performance.

* Raising a signal out of a mix.

* Preventing sound system overload.

* Balancing out the different volume ranges of an instrument, eg bass guitar strings.


* Reducing sibilance (de-essing) by inserting an HPF in the compressor's detection
circuit so that compression is triggered when an excess of high frequency signal
is present.

Compressors are also used to reduce the gain of a stereo mix so that the average
level can be boosted and the mix will sound louder. A 2 channel compressor in
stereo mode, or 2 identical compressors in link mode, must be used. This
interconnection links the gain control circuits of the 2 units so that a gain reduction in
one channel will produce an equal gain reduction in the other channel, preventing
centre shifting of information in the stereo image.
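
A minimal sketch of that linking idea (the function names are our own, and the static gain
computer from the earlier sketch is repeated here): both channels are measured, the larger
gain reduction is computed, and the same gain change is applied to left and right so the
stereo image does not shift:

    # Minimal sketch of stereo-linked compression via a shared gain reduction.
    def compress(level_db, threshold_db, ratio):
        if level_db <= threshold_db:
            return level_db
        return threshold_db + (level_db - threshold_db) / ratio

    def linked_gain_reduction(left_db, right_db, threshold_db, ratio):
        """Gain reduction in dB applied equally to both channels."""
        loudest = max(left_db, right_db)
        return loudest - compress(loudest, threshold_db, ratio)

    print(linked_gain_reduction(-2.0, +6.0, 0.0, 4.0))   # both sides turned down by 4.5dB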

6. EXPANDERS/GATES

Many audio signals are restricted in their dynamic range by their very nature eg recordings
with a high level of ambient background noise, amplified guitar sounds with hums and buzzes.
These noises are masked by the sound itself, but become noticeable in its absence. An
expander or gate can be used to stretch the dynamic range of these noisy signals by adjusting
the signal’s internal gain to attenuate the noise as the signal level drops.

6.1 Operation

An expander is a variable gain amplifier in which the dynamic range of the output signal
is greater than the dynamic range of the applied input signal: low levels are attenuated,
high levels are boosted.

Expansion is the process of decreasing the gain of a signal as its level falls and/or
increasing the gain as the level rises. When a signal level is low, ie below the
expansion threshold, gain is lowered and program loudness is reduced. Certain
expanders also perform the opposite function of raising the gain as the signal rises
above the threshold.

In this way, expanders increase the dynamic range of a program by making loud
signals louder and soft signals softer. They can be used as noise reduction devices
by adjusting them such that noise is removed below the threshold level, while the
desired signal will pass above the threshold.

6.2 Gate

A simple expander whose parameters are set to sharply attenuate an input signal
whenever its level falls below a threshold.

6.3 Expansion Ratio

The ratio of input to output dynamic range. The ratio controls downward expansion
from 1:1 or unity to 20:1. The output is adjusted continuously by the input signal over
its entire dynamic range.

eg a 2:1 ratio produces a 2dB change in output level for every 1dB change in input.
The higher the expansion ratio, the greater the dynamic range of the output signal.

The ratio control changes the effect from an expander to a gate. Low ratios below
4:1 produce controlled downward expansion with a smooth transition between signal
and noise reduction. High ratios around 30:1 produce a gating effect where the signal
is abruptly cut off.

6.4 Expansion Threshold

The point below which expansion begins. In the presence of a very low level signal
the expander gain will be at minimum. As the signal rises the expansion decreases.
When the signal passes the threshold, the expander has reduced its expansion to
unity gain and the transfer characteristic of the signal is linear.

Eg when the threshold is 0VU, an expansion ratio of 2:1 will cause a -10dB input to
produce a -20dB output.
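
The same kind of static sketch used for the compressor reproduces this example too
(steady-state arithmetic only; the attack, hold and release behaviour described below is
not modelled):

    # Minimal sketch of a downward expander's static gain computer.
    def expand(level_db, threshold_db, ratio):
        """Output level in dB for a given input level."""
        if level_db >= threshold_db:
            return level_db                        # above threshold: unity gain
        return threshold_db - (threshold_db - level_db) * ratio

    print(expand(-10, 0, 2))    # -20dB, matching the example above
    print(expand(-10, 0, 20))   # a high ratio drops the signal far below audibility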

For gating purposes where high ratios are used, the threshold is often set at the
minimum level required to open the gate. When the gate is closed, the high ratio will
drop the signal below the level of audibility. All levels above the threshold will be
passed without gain change.

For expansion over a wide dynamic range, the threshold is set at a high level -
perhaps at the maximum peak level of the input signal. In this way the program
material is expanded downward. In these circumstances a lower expansion ratio is
used so that the lower levels of the program material will not be expanded downward
to inaudibility.

6.5 Range

Determines the maximum amount of attenuation (from 0dB, or no attenuation, to 100dB,
or inaudible). In many applications it is not desirable for the signal to be gated off
completely when it drops below the threshold, as this can produce a choppy quality to
the sound. Using the range control it is possible to attenuate by only slight amounts,
thus improving naturalness.

6.6 Attack

Adjusts the time taken to reach unity gain after the input signal exceeds the threshold,
ie the time it takes for the attenuation to be removed.

A fast attack signal, eg snare, should have a fast attack setting so that its initial
dynamism is not stifled. A slow attack signal should have a slow attack setting so that
noise is kept down as the signal develops.

6.7 Hold

Adjusts the period of delay before the onset of the release cycle, after the key signal
falls below the threshold. Use it where rapid peaks succeed each other, eg voice
material with lots of short pauses, or instruments with strong sustain like guitar and
piano.

6.8 Release

Adjusts the time for the gain to fall to the value set by the range control. Transients
with fast releases should have fast release settings so that background noise will be
cut immediately. Slow decay sounds like cymbals require longer release times.


6.9 Key Input

Most expanders have a key input control which allows a signal to act as a trigger to
set off downward expansion. The trigger may be:

• Another external signal, eg a snare can be used as a trigger to open a gate,
allowing reverb to sound with the snare. The key input is set to external (EXT).
• The same signal with equalisation or filtering, used for cleaning up a close miked
drum signal so that the other drums don't interfere with the track.
• If the gate is being opened by a hi-hat or cymbal, the key input can be tuned to a
bandpass frequency equivalent to that of the close miked drum. The gate will then
only open when it "hears" this frequency. This is called frequency conscious
gating (see the sketch below).
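
A minimal sketch of the key-input idea (the names are our own, and a real gate would also
filter the key signal and apply the attack, hold and release smoothing described above).
The gate's decision is driven by the level of the key rather than by the programme being
gated:

    # Minimal sketch of a keyed (externally triggered) gate.
    def keyed_gate(programme, key, threshold, floor_gain=0.0):
        """Pass each programme sample while the key exceeds the threshold;
        otherwise attenuate it to the floor gain."""
        out = []
        for sample, k in zip(programme, key):
            gain = 1.0 if abs(k) >= threshold else floor_gain
            out.append(sample * gain)
        return out

    reverb_return = [0.2, 0.3, 0.25, 0.2, 0.15]
    snare_key     = [0.9, 0.8, 0.0, 0.0, 0.0]
    print(keyed_gate(reverb_return, snare_key, threshold=0.5))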

6.10 Expander Applications

• Controlling leakage in the studio
• Reducing feedback on stage mics
• Keying external sounds to a percussion track
• Ducking programs for voice-overs and paging
• Adding new dynamics to existing program material

Equalisation & Filters

Introduction

To equalise or not to equalise? Many audio engineers subscribe to the idea that the least
amount of equalisation is already too much. Another group uses equalisation indiscriminately.
The simplest and best advice is that equalisation should only be used when necessary. In this
chapter we will examine different types of equaliser (including both outboard and console-
mounted models), as well as different types of filter. We will explore what makes them work,
and the effect they have on sound. (SAE students: please use "Practical EQ" Demonstration
Tape No. 1). Finally, we will explain the controls to be found on equalisers and filters, and
show how they are used to arrive at desired effects.

1. Equalisers: What They Do

At its simplest, the equaliser allows the engineer to cut or boost any frequency or group of
frequencies within the audio spectrum. By the same process, unwanted frequencies can be
filtered out. This control over the spectrum of the sound gives the engineer a good deal of
creative freedom in changing timbre and harmonic balance, so equalisers are used not only
to correct a particular sound, but also as a creative tool. For example, you can use them
during overdubs, to match the sound of another instrument; or you can use them to control the
balance within a mix without resorting to great level changes. An equaliser can help "position"
each instrument in a three-dimensional stereo image. It can increase separation between
instruments, or produce a better blend of the sounds of different microphones. Its tasks
range all the way from analysing and improving a control room's acoustics, to getting that
elusive sound which the producer can hear in his head, but never quite describe.

2. Equalisation Characteristics
2.1 Equalisation Curve


An equaliser always selects a frequency range or bandwidth to control. Although one
frequency is often specified, this will be the centre frequency of an equalisation curve
with a slope on one or both sides. For example, an equaliser selects a +4dB boost at
5kHz. Frequencies on either side of this centre will also be boosted, in proportion to
the angle of the slope of the EQ curve.

2.2 Bandwidth and Quality(Q)

A typical bell-shaped equalisation curve has its centre frequency (fc) at the peak cut
or boost point of the bell. The distance between the frequencies at 3dB below peak on
both sides of the curve is called the bandwidth of the curve. The overall shape or
slope rate of the equalisation curve, ie whether it is narrower or wider, is called the
quality or Q of that curve. A wider bandwidth produces a lower Q and vice versa.
Q is determined as the ratio of the centre frequency to the bandwidth:

Quality (Q) = Centre Frequency / Bandwidth
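
A quick worked example of that ratio (the numbers are arbitrary illustrations):

    # Minimal sketch: Q from the centre frequency and the -3dB points.
    def q_factor(centre_hz, lower_hz, upper_hz):
        return centre_hz / (upper_hz - lower_hz)

    print(q_factor(1000, 900, 1100))   # 200Hz wide band at 1kHz  -> Q of 5 (narrow)
    print(q_factor(1000, 500, 1500))   # 1000Hz wide band at 1kHz -> Q of 1 (broad)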

2.3 Active/Passive EQ

Equalisers can be passive or active. Passive equalisers base their performance on
electronically passive components like capacitors, inductors and resistors. Passive EQ
can only cut or attenuate a selected frequency bandwidth.

Most equalisers are active types, whose circuits involve altering the feedback loop of
an op amp (operational amplifier). Active equalisers can cut or boost at selected
frequencies and therefore operate as de facto gain controllers in the signal flow.

3. Equaliser Electronics

Before we look at the equaliser's controls and talk about how to use them, we must
understand a little about the electronics of the thing. The principles can be briefly stated thus:
the various control parameters of equalisation can be manipulated using three basic electronic
components: resistors, capacitors and inductors.

Resistors present an equal opposition (pure resistance) to the signal at all frequencies.
Capacitors present a frequency-discriminating opposition called capacitive reactance (Xc).
Capacitive reactance increases as the frequency decreases and can be expressed in the
following formula, where f is the frequency and C is the capacitance:

Xc = 1 / (2πfC)

From this it can be seen that a capacitor passes high frequencies better than low. Inductors
(such as voice coils) present a frequency-related opposition called inductive reactance (XL),
which increases as the frequency increases. It can be calculated using the following formula,
where L is the inductance:


XL = 2πfL

Therefore an inductor has the opposite effect on the frequency response to that of a capacitor.
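
To see that opposite behaviour numerically, here is a small sketch evaluating both formulas
at a low and a high frequency (the component values are arbitrary illustrations):

    import math

    # Capacitive and inductive reactance from the formulas above.
    def xc(freq_hz, capacitance_farads):
        return 1.0 / (2 * math.pi * freq_hz * capacitance_farads)

    def xl(freq_hz, inductance_henries):
        return 2 * math.pi * freq_hz * inductance_henries

    for f in (100, 10000):
        # 1uF capacitor and 10mH inductor, purely illustrative values
        print(f, round(xc(f, 1e-6), 1), round(xl(f, 10e-3), 1))

The capacitor's reactance falls from roughly 1.6kΩ at 100Hz to about 16Ω at 10kHz, while
the inductor's rises from about 6Ω to around 630Ω.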

Resistors dissipate power through their resistance, but capacitors and inductors do not.

Of course other components, such as balancing networks, amplifiers compensating for
insertion losses and the like, are also found in today's studio equaliser.

4. Controls

Depending upon the type of equaliser or filter the following controls are available:

Bandwidth: This control affects the number of frequencies which are being increased or
decreased around the centre frequency. Also known as "Q" or "Bell".

Frequency Selection: With this control we can choose the frequency that we wish to change.
The main frequency affected by this change will be called the "centre" or "resonant" frequency.

Roll Off Frequency: On a filter, the frequency at which decrease in signal level starts to take
place.

Roll Off Slope: On a filter, the rate at which the frequencies are decreased after the roll off
point. (Measured in dB per octave). (See Diagram 2).

Additional controls found on most graphic equalisers are; high pass and low pass filters;
input/output gain adjustments; and a bypass switch.

5. Equaliser Types

5.1 Shelf EQ

The typical bass and treble boost and cut equalisers. Shelving refers to the rise or
drop in frequency response from a selected frequency which tapers off to a preset
level and continues at this level to the end of the audio spectrum. A shelf EQ will
boost or cut all the frequencies above or below the selected frequency equally by the
same amount.

With this type of equalisation the eventual equalisation curve flattens out, after the
boost or the cut. The equaliser can be identified by its turnover frequency; that is, the
frequency at which the slope begins to flatten out to a linear response. This frequency
is generally 3dB below the maximum boost or cut.


5.2 Parametric and Sweep Equalisers

The difference between a parametric equaliser and a sweep equaliser is that the
sweep equaliser can only control the frequency and the amount of cut/boost, while the
parametric is able to control the bandwidth or "Q" as well.

The sweep EQ provides a bell curve boost or cut for centre frequencies not covered
by the shelf EQ. There is usually a control for selecting a particular centre frequency.
At the selected frequency the sweep EQ will boost or cut with a predetermined Q and
bandwidth.

A parametric equaliser must have these three control parameters per band (frequency,
cut/boost, bandwidth) or it cannot be termed a "parametric" equaliser. Some
parametric equalisers do not provide a continuously variable bandwidth or frequency
control. One such is the Neumann W491. Equalisers of this type are limited in the
number of frequencies and bandwidths that can be selected; however they are still full
parametric equalisers. The diagram shows the Orban 672A parametric equaliser.

Equalisers of this type can be referred to as peaking filters, which work on the LC
circuit principle. The attenuation decreases as frequency rises to the boost frequency;
it reaches a maximum at that frequency, and decreases as signal frequency continues
to increase.

The frequencies covered by these equalisers are divided into frequency bands.
Simple units may have only a single band; sophisticated ones have up to four. There
are also manufacturers offering equalisers of more than four bands, such as the
Orban 672A or the 674A (stereo) which is an eight band equaliser. These are
outboard devices. In a mixing console we usually only find equalisers with a
maximum of four bands. Most consoles will provide a parametric midrange equaliser
and shelved low and high frequency EQ.
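
Although the equalisers described here are analog, the same three controls map directly
onto the peaking-filter recipe widely used in digital equalisers (this sketch follows the
common "audio EQ cookbook" biquad form and is an illustration only, not a description of
any console's circuitry):

    import math

    # Peaking EQ biquad coefficients from the three parametric controls:
    # centre frequency, cut/boost in dB and Q.
    def peaking_coeffs(sample_rate, centre_hz, gain_db, q):
        a = 10 ** (gain_db / 40.0)
        w0 = 2 * math.pi * centre_hz / sample_rate
        alpha = math.sin(w0) / (2 * q)
        b = [1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a]
        a_coeffs = [1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a]
        return b, a_coeffs

    # A +6dB bell at 1kHz with a Q of 2, at a 48kHz sample rate (illustrative values).
    print(peaking_coeffs(48000, 1000, 6.0, 2.0))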

5.3 Notch Filter

A notch filter is a filter used to cut a single frequency, or several single frequencies.
They are generally used to eliminate simple hums or other pure-frequency spurious
signals which may have crept in among a recorded programme.

5.4 Presence Filter

With the presence filter the engineer can modify frequencies within the ear's region of
greatest sensitivity (2kHz to 4kHz). The presence filter has a preset "Q" and is used
to increase intelligibility of speech, or to "lift" an instrument out from the background.
One could also describe it as a "tuned peaking filter".

The filter usually has 5 preset frequencies within the above range, with a maximum
boost (Note: boost only) of 6dB. The unit is an outboard device which is switched
across the applicable audio channel. The "Q" is adjusted in such a way as to
enhance the resonance of the affected audio signal. The presence filter is usually
found in television audio and broadcast studios; in recording studios there are usually
enough other equalisers able to do the same task.


5.5 Comb Filter

Imagine a series of filters (or alternating bands of a 1/3 or 1/6 octave graphic EQ),
covering the entire useful audio spectrum. Filtered bands alternate with unfiltered
bands. This is a comb filter, (so-called because of the shape of the resulting
frequency graph) and they are commonly used in pairs to produce a pseudo-stereo
signal from a monaural source. The two equalisers are set such that the "spikes" of
one match the "notches" of the other; the two resulting signals are sent left and right in
the stereo mix. All those old recordings "electronically reprocessed for stereo" have
been subjected to this process.

5.6 Graphic Equalisers

The graphic equaliser derives its name from its front panel layout, which reflects its
functions graphically by dividing the audio spectrum up into a number of separate
"bands" which can be modified independently.

It is our most powerful tool for correcting acoustic problems in a room, and of course
also has applications in smoothing out audio signals. Used to correct an audio signal,
the graphic equaliser smooths out the peaks and dips of the waveform rather than
radically changing the character of the signal, which is a job best left to the parametric
equaliser.

Because it is so well suited to smoothing out an audio waveform, the graphic
equaliser is often used to "tune" a control room, a studio or a concert hall; that is, to
allow us to modify the room's acoustic characteristics to reach an optimally flat
response. There are many techniques for this "tuning". For details, see the chapter
on Acoustics.

The graphic equaliser can also be used for certain audio effects. For example, a
"flanger" or "comb filter" effect can be approximated by cutting and boosting alternate
bands of a 1/3-octave equaliser to the maximum.

There are two main types of graphic equaliser, active and passive. Manufacturers
such as White, Court Acoustics, Klark Technik and Yamaha are leading
manufacturers of both types. In the passive design, a certain amount of signal level is
lost during processing. This loss of level - typically about 35dB - must be corrected
with an additional power stage.

The "active" graphic equaliser uses active components and induces no signal level
losses. It is actually an amplifier, designed such that only certain frequencies are
amplified (a unity gain amplifier). The passive equaliser, however, when instructed to
boost, say, 3000Hz, will actually cut all other frequencies - thus giving the impression
that 3000Hz has been boosted.

Graphic equalisers have a preset frequency and "Q" factor, whilst the cut/boost is
adjustable.

The principle of operation is that when a capacitor and an inductor are connected in
parallel, they provide great opposition to the signal applied to them at a particular
frequency or group of frequencies, and little resistance at all other frequencies. This
results in a "cut" in the level of the chosen frequencies. When they are connected in
series, however, they provide little resistance to the signal at the selected frequencies,
and great resistance to all others. This provides a "boost" to the selected frequencies.

The resonant frequency is that frequency at which, for a given setting, the parallel
circuit provides the greatest opposition and the series circuits the least opposition.

Since the equaliser achieves its boost or cut of frequencies through reactive
components, a certain amount of phase shift will be introduced into the signal path.
This is especially noticeable when simultaneously boosting and cutting adjacent
bands on the equaliser. The greater the boost and cut, the greater the phase lag. To
minimize the phase lag (reaction time) one should never boost and cut adjacent
frequencies or groups of frequencies.

Graphic equalisers fall into the following groups, according to the number of bands
into which the audio spectrum is divided:

5.6.1 Octave

The centre frequencies selected (by the manufacturer) are one musical octave apart,
eg. 50Hz, 100Hz, 200Hz, 400Hz, 800Hz, 1.6kHz, 3.2kHz, 6.4kHz etc. This unit would
be known as an 8-band graphic equaliser. The octave graphic equaliser, since it
usually only possesses 8 selectable frequencies, offers very little precise control for
studio or acoustic applications and is mostly used by musicians as a special effect to
equalise a musical instrument.

5.6.2 Half Octave

Where the frequencies are 1/2 octave apart, eg. 50, 75, 100, 150, 200, 300, 400Hz etc.
The half octave equaliser is used in recording studios to smooth out the musical signal,
and some older recording consoles had them installed in each channel.

5.6.3 One-Third Octave

Where the centre frequencies are 1/3-octave apart, eg 20, 25, 31.5, 40, 50, 63, 80,
100, 125, 160, 200, 250, 315, 400, 500, 630, and on up to 20kHz. This equaliser provides
31 bands between 20Hz and 20kHz and is generally used for acoustic correction.
However, it is also found as an outboard device in most studios.
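
The spacing itself is easy to generate. The familiar printed centres (31.5, 63, 125Hz and so
on) are rounded nominal values; here is a sketch of the base-ten series commonly used to
derive them:

    # Minimal sketch: exact base-ten one-third-octave centre frequencies.
    # Nominal published centres are rounded versions of these values.
    centres = [1000 * 10 ** (n / 10) for n in range(-17, 14)]
    print(len(centres))                                   # 31 bands
    print(round(centres[0], 1), round(centres[-1], 1))    # ~20.0Hz ... ~19952.6Hz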

5.6.4 One-Sixth Octave

Where the frequencies are 1/6th octave apart. This 1/6-octave range is only used in the
low frequency range, between 20Hz and 150Hz. This type of unit is exclusively used
for acoustic room correction, as it offers excellent control in the low frequency area
where studios have most of their problems. There is just one commonly-used 1/6th-
octave unit, manufactured by White.

All the above units come in stereo or mono versions.

Mixing console equalisers are usually of the "combination" variety, which gives them the
greatest number of equalisation possibilities. There is no such thing as the "best"
mixing console equaliser.


6. Practical Equalisation

The equaliser is the engineer's tuning instrument. Used correctly, the equaliser can blend
sounds together. The parametric equaliser, although the most versatile, can also cause the
most damage. It has a tendency to smooth out the uneven response of most control room
monitor systems, making your mix sound good in one control room but totally different in a
second control room with a different monitor system. Control rooms often have a problem
known as an acoustic well which "swallows up" a certain frequency range. If the engineer, with
the help of the parametric equaliser, boosts frequencies which are in this acoustic well, there
will be no apparent increase of these frequencies in the monitor system. However this boost
will be recognized by the tape and recorded, and can only be heard in other listening
environments.

Too much equalisation can also destroy the "natural" blending of instruments' waveforms. Let's
suppose that you have a drum kit which sounds good to the ear in the studio. As an engineer,
you place microphones next to all the various drums, and you equalise these with a parametric
equaliser so that each sounds good on its own.

When they are mixed, it is likely that the natural waveforms will have been changed and the kit
no longer sounds good as a whole. Much of today's recording involves different studios and
engineers for original recording and for mixdown. If one engineer, during recording, destroys
these natural waveforms with bad equalisation, the second engineer, during mixdown, cannot
repair the damage. Equalisation is always done by ear, and not by memorising some
frequency settings and applying them in all situations. Below is a guide to frequency ranges
and their characteristics from the point of view of equalisation.

6.1 Extreme low bass: 20 to 60Hz.

These frequencies are more felt than heard. They have very limited application in
modern music recording, other than for effects. If the producer wants these low
frequencies, they should only be applied for short periods of time, which increases
their effectiveness. Too much extreme low frequency content will "muddy up" the
sound and send excessive level to tape. A further problem is that, if the tape is played
on radio, the radio station's limiters will react more severely to a tape with excessive
bass; they will limit this tape more, thereby making it sound quieter.

6.2 Bass: 60 to 300Hz.

This is the frequency range which is most important to the "feel" of music; it also
contains most of the fundamental tones. Excessive boosting of this range will result in
a very "boomy" sound, while excessive cut will result in a very thin sound. It is
important to be aware of phasing problems which can be introduced by the equaliser if
the bandwidth adjustment is too narrow.

Phasing problems can be evaluated by switching the monitor system into mono, or by
the use of a phase meter. Note that the change of equalisation will tend to change the
balance of the rhythm section, and thus the feel of the basic body of the song.

6.4 Lower Midrange: 300 to 2500Hz.

This bandwidth deserves the greatest time and attention in setting EQ, because all
listeners' sound systems will be able to reproduce it. (Tip: If you divide your
equalisation time for each instrument into 100 units you should spend about 50 units
in the midrange area, 25 units for the highs and 25 units for the lows). Adding too
much lower midrange will give a "telephone-like" quality, because it tends to mask all
other frequencies. Adding too much around 800Hz will give a horn-like sound.

Be aware that your ears will become fatigued quickly if there is an excess of 1 to 2kHz
in the mix.

6.5 Upper Midrange: 2500Hz to 5000Hz.

The upper midrange is important for acoustic instruments and vocal sounds. Boosting
around 3kHz will add clarity to both without increasing the overall level. Excessive
boost will create a thin and distant sound, and can also cause excessive tape/noise
distortion. Some vocal sounds (eg e, b, o, m, v) may be masked, while others tend to
become more dominant.

6.6 "Presence" range: 5000Hz to 7000Hz.

Boosting this range too much will make the sound thin and annoying. However, applying a
boost of 3 to 6dB around 5000Hz will add additional clarity to the record. This is a
technique often used by disc cutting engineers. In addition, it creates the impression
of extra loudness on the record. On the other hand, sometimes cutting a few dB in
this range will give an instrument a warmer sound.

6.7 High frequency range: 7000Hz to 20000Hz.

Boosting this range when recording acoustic instruments will give "clarity" and
"transparency", but boosting these frequencies after processing through effects units
will tend to add noise and a "hard" sound. Boosting cymbals and vocals at around
12kHz will "clean up" their sound. Cutting in this range will make the sound
compressed and unclear.

6.8 Vocal Equalisation

Where the material to be recorded includes vocals, they are naturally the most
important element. So we note here a few special considerations for vocal EQ. The
vocal range of frequencies (including harmonics) is from 40Hz to 10kHz. However,
only a small part of that range is responsible for clarity and intelligibility. Vocal sounds
can be divided into three main areas: fundamentals, vowels and consonants.

Speech fundamentals occur between 125 and 250Hz. Male fundamentals occur
lower, at around 125Hz. The fundamental region is important as it allows us to tell
who is speaking, and its clear transmission is therefore essential as far as the voice
quality is concerned.

Vowels essentially contain the maximum energy and power of the voice, occurring
over the range 350Hz to 2kHz.

Consonants, occurring over the range of 1.5kHz to 4kHz, contain little energy yet they
are essential to intelligibility.

The frequency range from 63Hz to 500Hz carries 60% of the power of the voice yet it
contributes only 5% to the intelligibility. The 500Hz to 1kHz region produces 35% of the
intelligibility, while the range from 1kHz to 8kHz produces just 5% of the power but
60% of the intelligibility.

By rolling off the low frequencies and accentuating the range from 1kHz to 5kHz - known
as the "presence band" - the intelligibility and clarity of speech can be improved.

Boosting the low frequencies from 100Hz to 250Hz makes speech boomy or chesty.
A cut in the 150Hz to 500Hz region will make it boxy, hollow or tube-like.

Dips around 500Hz to 1kHz produce hardness, whilst peaks around 1kHz and 3kHz
produce a metallic nasal quality. Dips around 2kHz to 5kHz reduce intelligibility and
make speech "wooly" and lifeless. Peaks in the 5-10kHz band produce sibilance and
a gritty quality.

The spectrum of speech changes noticeably with voice level, and equalisation can
again be used to correct this. The accompanying diagram (not reproduced here) shows the
changes that occur in speech as the SPL (dBA) changes.

7. Some Uses of Equalisation

Equalisers can serve a multitude of purposes. They are used to reduce noise and hiss, to
compensate for bad microphone positioning, to compensate for monitor characteristics, to
smooth out peaks and dips, and to create new sounds. They are useful in mixdown, where they
can increase separation of instruments; they can help emphasise certain psychoacoustic
properties; they can equalise telephone lines and through-lines to compensate for losses in
transmission; and they can simulate various acoustic effects. Finally, they can highlight featured
instruments in the final mix, and improve vocal clarity.

The most effective way to use equalisation is to choose the frequency and apply maximum
boost first, then cut back until the desired amount of equalisation has been reached. Our ears
tend to react more easily to boosted sound than to cut sound. It may sometimes be necessary
to cut a particular group of frequencies in order to give the impression that others have been
boosted.

Using an equaliser to boost all frequencies will only produce an overall increase of level, not a
change of EQ. It is far simpler to change the level of an instrument with the fader than the
equaliser.

When equalising, the engineer should bear in mind the final use of the material he or she is
mixing. For instance, if the product is to be used on television, there is no need to concern
yourself with frequencies above 12kHz. Something similar applies to a dance mix where it is
important to concentrate more on frequencies around 100Hz. Of course, we don't always
know the ultimate use of the product, so monitoring on a selection of speakers in the control
room, and at various levels, is always advisable. In the words of the great producer George
Martin: "all you need is ears".

8. Eq Chart

The following table shows the frequency ranges of various instruments and their general
equalisation points. Remember that these stated equalisation frequencies are only guidelines
and will have to be changed according to the instrument and musical key (for acoustic
instruments) in each case.


Violin
  Frequency Range: (G-C4) 196 to 2100Hz
  Overtones: up to 10kHz
  Equalisation: Warmth around 240Hz; String 2.5kHz; Attack 7-10kHz

Double Bass
  Frequency Range: (E-C1) 41 to 260Hz
  Overtones: up to 8kHz
  Equalisation: Fullness 80-100Hz; Body 200Hz; String Noise 2.5kHz

Guitar (A)
  Frequency Range: (E-D3) 41 to 1175Hz
  Overtones: up to 12kHz
  Equalisation: Warmth 240Hz; Clarity 2kHz to 5kHz; Attack 3.5kHz

Trumpet
  Frequency Range: (E-D3) 160 to 1175Hz
  Overtones: up to 15kHz
  Equalisation: Fullness 120Hz to 240Hz; Bell 5kHz; Attack 8kHz

Tuba
  Frequency Range: (B2-A1) 29 to 440Hz
  Overtones: up to 1.8kHz
  Equalisation: Fullness 80Hz; Resonance 500Hz; Cut 1.2kHz

Grand Piano
  Frequency Range: (A2-C5) 27 to 4200Hz
  Overtones: over 13kHz
  Equalisation: Warmth 120Hz; Clarity 2.5 to 4kHz; Attack 8kHz

Flute (small)
  Frequency Range: (D2-C5) 587 to 4200Hz
  Overtones: around 10kHz
  Equalisation: Warmth 500-700Hz; Breath 3.2kHz; Air 6kHz

Oboe
  Frequency Range: (B-F2) 247 to 1400Hz
  Overtones: up to 12kHz
  Equalisation: Body 300Hz; Resonance 1.2kHz; Attack 4.5kHz

Clarinet
  Frequency Range: (D-G3) 147 to 1570Hz
  Overtones: up to 4kHz
  Equalisation: Bell 300Hz; Harmonics 2.5kHz; Air 5.2kHz

Tympani
  Frequency Range: (D-C) 73 to 130Hz
  Overtones: up to 4kHz
  Equalisation: Warmth 90Hz; Attack 2.0kHz; Air 4.5kHz

Bass Guitar (E)
  Frequency Range: (E-C2) 82Hz to 520Hz
  Overtones: up to 8kHz
  Equalisation: Body 80Hz; Warmth 300Hz; String 2.5kHz

Viola
  Frequency Range: (C-C3) 130 to 1050Hz
  Overtones: around 8 to 10kHz
  Equalisation: Fullness 200Hz; String 2.4kHz; Scratch 4.2kHz

Bass Drum
  Frequency Range: not defined (low)
  Overtones: around 4kHz
  Equalisation: Body 120Hz; Box Sound 400Hz; Cut 3.0kHz

Snare Drum
  Frequency Range: not defined
  Overtones: up to 8kHz
  Equalisation: Body 120 & 240Hz; Hollow 400Hz; Snare 2.5kHz

Cymbals
  Frequency Range: not defined
  Overtones: up to 10kHz
  Equalisation: Bell 220Hz; Clarity 7.5kHz; Air 10kHz

Toms
  Frequency Range: not defined
  Overtones: up to 3.5kHz
  Equalisation: Fullness 120Hz (Floor Toms), 240Hz (Hanging Toms); Cut 5kHz

Guitar (E)
  Frequency Range: (E-G3) 82 to 1570Hz
  Overtones: 5kHz
  Equalisation: Fullness 240Hz; Warmth 400Hz; String 2.5kHz

DIGITAL SIGNAL PROCESSORS

Digital signal processors are microprocessors (DSP chips) programmed with algorithms which
perform fast, intensive calculations on the incoming signal. In a digital reverb, the input signal is
regenerated as a series of closely spaced digital delays. The algorithms organise these delays
into defined sets of random delay patterns, resulting in different reverb characteristics, i.e.
simulations of different environments.

The ROM or factory memory of a typical digital reverb unit is programmed with a number of
common types of reverbs such as hall, room, plate, stadium etc. Within these presets there is
more or less flexibility to adjust various reverb and delay parameters to create user-defined
variations.

1. Digital Delay Line (DDL)

The DDL produces a series of discrete repeats of the input signal at regularly spaced and
user-defined time intervals. The DDL converts the signal into a digital PCM (Pulse Code
Modulation) form. This binary signal is fed into a shift register or buffer, where it is temporarily
stored or delayed for a user-defined amount of time before being read out of the buffer by a
clocking oscillator.

DDLs may have flexible parameters for configuring different types of delays such as delay
time, number of repeats and amount of feedback or regeneration. Some units have up to six
parallel delays which can be used to produce echo clusters and panning effects. Some
common delay effects are doubling, slap echoes, long spacey echoes, infinite repeat which is
a simple form of sample looping, and flanging.
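
As a rough illustration of the delay-line idea described above, the sketch below (in Python) implements a delay buffer with feedback/regeneration and a dry/wet mix. The parameter names, sample rate and sample format are assumptions for the example and are not taken from any particular DDL unit.

    # A minimal digital delay line with feedback (illustrative sketch, not a specific unit).
    # Assumes floating-point samples in the range -1.0 to +1.0 and a 48kHz sample rate.

    def delay_effect(samples, delay_ms, feedback=0.4, mix=0.5, sample_rate=48000):
        """Return the input mixed with a delayed, regenerated copy of itself."""
        delay_samples = max(1, int(sample_rate * delay_ms / 1000.0))  # delay time in samples
        buffer = [0.0] * delay_samples                                # the "shift register"
        write_pos = 0
        output = []
        for x in samples:
            delayed = buffer[write_pos]                    # oldest stored sample = delayed signal
            buffer[write_pos] = x + delayed * feedback     # regeneration: feed the output back in
            write_pos = (write_pos + 1) % delay_samples    # advance around the circular buffer
            output.append(x * (1.0 - mix) + delayed * mix) # dry/wet mix
        return output

A higher feedback value gives more repeats; a short delay time with modulation (see below) moves the effect towards flanging or chorus.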

2. Pulsing

When setting delay and reverb times one must be careful to ensure that these timings don’t
conflict with the tempo of the piece of music. Pulsing is the method of determining which
timings can be used with a particular tempo. Song tempos are expressed in beats per minute
(bpm).

Pulse in milliseconds = (60 x 1000) / bpm

A song with a tempo of 120bpm will have a 500ms pulse for each of its beats. Any time value
based on divisions or multiples of this value will work with the song, i.e. 250ms, 125ms, 63ms,
31ms etc., or 1 sec, 1.5 secs, 2 secs etc.

If the tempo is not known, the pulse may be determined with a single delay placed on the
timekeeping sound, such as the snare. Vary the delay time until it falls exactly on the next beat;
this value can then be used as the pulse.
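
The pulse formula above can be turned into a small calculator. The sketch below is illustrative only; the choice of divisions and multiples is an assumption for the example.

    # Pulse and usable delay times from a song tempo.

    def pulse_ms(bpm):
        """Pulse in milliseconds for one beat: (60 x 1000) / bpm."""
        return 60.0 * 1000.0 / bpm

    def related_delay_times(bpm, divisions=4, multiples=4):
        """Delay times (ms) based on divisions and multiples of the beat pulse."""
        beat = pulse_ms(bpm)
        times = [beat / (2 ** n) for n in range(1, divisions + 1)]   # 1/2, 1/4, 1/8 ... of a beat
        times += [beat * n for n in range(1, multiples + 1)]         # 1, 2, 3, 4 beats
        return sorted(times)

    print(pulse_ms(120))              # 500.0 ms per beat at 120 bpm
    print(related_delay_times(120))   # e.g. 31.25, 62.5, 125.0, 250.0, 500.0, 1000.0, ...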

3. Modulation Parameters


3.1 Sweep range, Modulation amount, depth

Determines how much the modulation section or LFO varies the delay time. A
delay with a 2:1 sweep range will sweep over a 2:1 time interval (5ms to 10ms, 100ms
to 200ms). A wide sweep range or full depth will give dramatic flanging effects. With
chorus, as depth increases, so does the detuning of the sound.

3.2 Modulation Type

Different waveform shapes which are used by the LFO.


Triangle - varies delay time from max to min in a cyclical fashion - often used with
chorus and flange.
Sine - smoother than triangle
Square - switches between two delay times
Random - changes delay times at random

3.3 Modulation Rate

Sets the modulation frequency of the LFO. Typical rates are 0.1Hz (1 cycle every 10
secs) to 20Hz. With flanging and chorusing, modulation causes the original pitch to go
flat to a point of maximum flatness, then return to the original pitch and start the cycle
all over. A slower rate produces a slow, gradual detuning. Rate and depth interact with
each other to produce the total amount of pitch change and how often it oscillates. A
faster rate with full depth will sound more out of tune than a slower rate with full depth.
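
To make the interaction of sweep range, modulation type and rate concrete, here is a minimal sketch of an LFO computing an instantaneous delay time. The waveform shapes follow the descriptions above; the specific ranges and rate are assumed example values.

    import math

    # Sketch of an LFO sweeping a delay time between a minimum and maximum value,
    # as used for chorus/flange effects. Ranges and rates here are illustrative assumptions.

    def lfo_delay_ms(t, min_ms=5.0, max_ms=10.0, rate_hz=0.5, shape="triangle"):
        """Instantaneous delay time (ms) at time t seconds for a given LFO shape."""
        phase = (t * rate_hz) % 1.0                     # 0..1 position within one LFO cycle
        if shape == "sine":
            mod = 0.5 * (1.0 + math.sin(2.0 * math.pi * phase))
        elif shape == "square":
            mod = 1.0 if phase < 0.5 else 0.0           # switches between two delay times
        else:  # triangle: ramps up then down each cycle
            mod = 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)
        return min_ms + (max_ms - min_ms) * mod         # sweep over the 2:1 range (5ms to 10ms)

    print(lfo_delay_ms(0.0), lfo_delay_ms(0.5), lfo_delay_ms(1.0))   # 5.0, 7.5, 10.0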

4. Setting up Reverb devices

Set-up functions differ from device to device. The first step in setting up a device is to have a
look at the device manual or documentation.

Adjustable Parameters.

Input level: [dBs] Determines the level of the signal at the input stage of the unit.

Most reverb units are mono-in/stereo-out but generate different reflection patterns
for the L and R audio channels. These two independent signals are assimilated by
our stereo hearing mechanism, enabling us to perceive spaciousness.

The actual input level is program dependent, but a good starting point is to set the i/p
level as high as possible before distortion. This will obtain the best signal-to-noise ratio.

Pre-Delay: [or Initial delay - 1ms to 400ms] Delays the signal before it reaches the
reverb cluster. It indicates the length of time between the sound source and the onset of the
reverb. This gives the effect of separating the dry sound from the reverb.

Longer pre-delay times, where there is a slight gap before the onset of the reverb, will add a
greater sense of clarity to the track.


General pre-delay guidelines:

Up-tempo drums/percussion: 25 to 50 ms
Ballad drums: 40 to 80 ms
Acoustic instruments: 45 to 80 ms
Vocals: 75 to 125 ms
Strings: 100 to 200 ms
Brass: 50 to 100 ms

ER level: [0 to 100%] In simulating ambient spaces there are distinct echoes prevalent at
the beginning of the sound. These are the first or early reflections - reflections from the nearest
reflective surfaces that the direct sound comes into contact with.

Less ER level will make the mix less cluttered.

More ER level on a particular instrument can reinforce the sound of the instrument,
which will help project the perceived 'size' of the instrument.

Decay time: [0 to 12s] Long decay times simulate larger ambient spaces, while shorter
decay times put the instrument in smaller environments.

The general rule regarding decay time is: as long as needed but no longer than
necessary.

Longer decay times on each track will reduce instrument definition and lead to a
somewhat cluttered, sluggish mix.

Try to use shorter decay times, which will allow each layer (track) in the mix some place
in which to be heard.

Density: [0 to 10] This parameter determines the density of the reverb reflections - that
is, the average amount of time between reflections. Lower settings produce minimum
reverb density and lead to a more spacious sound - one can perceive the individual
echoes between reflections.

Higher settings produce a denser reverb, which can be described as closely packed,
tight (thicker) reverb, as one cannot perceive the gaps between the reflections.

Diffusion: [0 to 10] This is the complexity of the reflections and how spread out the cluster
is.

If set to minimum complexity, a cleaner reverb effect is produced - less of a
diffused reverb field builds up. As the diffusion value is increased, the complexity
of the reflections increases, producing a larger, more diffused reverb cluster field.


Feedback/Regeneration: [%] This is a variable control that sends the output of the
delay unit back into the input. It creates multiple repeats. This parameter should be
used moderately, as an excessive amount will result in uncontrollable feedback (a
squealing loop).

AE09 – Microphones

1. Introduction to common Input transducers

2. Transducer types

2.1 Carbon Granules

2.2 Piezo Electric

2.3 Capacitor/electrostatic

2.4 Moving coil (Dynamic)

3. Pick-up or directivity patterns

3.1 Pressure operation

3.2 Pressure Gradient operation (figure-of-eight)

3.3 Cardioids

3.4 Hypercardioid

3.5 Highly Directional Microphones

Microphone Techniques

1. Mono Microphone Techniques

1.1 Distant Microphone Placement

1.2 Close Microphone Placement

1.3 Accent Miking

1.4 Ambient Miking

2. Stereo Miking Techniques

2.1 AB or Spaced Pair

2.2 XY, and Mid-side (MS) or Coincident Pair

2.3 Near Coincident or OSS (Optimal Stereo Sound)


AE09 – MICROPHONES

1. Introduction to common Input transducers

Input transducers are devices for producing an electrical voltage which is, as far as possible, a
replica of the sound waves striking the sensitive part of the transducer. Examples of input
transducers are guitar pickups, contact pickups, the stylus of a record player and, of course, the
microphone. The microphone is the main focus of this section, but it should be noted that
transducer basics apply to all input transducers.

Some important characteristics of microphones for professional use are

i. The frequency response should be as flat as possible over the frequency range 20Hz
to 20kHz. In some microphones, there may be some advantage in having a
frequency response that is not flat; an example of this would be a reduced bass
response for public address microphones.
ii. The microphone's response to starting transients is important. Essentially, it is a matter
of the diaphragm's ability to respond rapidly to the almost violent changes in sound
wave pressure which can sometimes occur at the start of a sound.
iii. Sensitivity – the electrical output should be as high as possible so that the audio signal
produced by the microphone is high relative to system noise produced in other
equipment down the signal path, like mixing consoles.
iv. Self-generated electrical noise should be low. This is often quoted as the equivalent
acoustic noise (in dBA) which would be present even if a perfect, noiseless
microphone were used.
v. The polar pattern should vary as little as possible with frequency.
vi. Environmental influences like humidity and temperature should have little or no effect
on the microphone's output.
vii. Repair and maintenance should be quick and accessible.

2. Transducer types

The basic microphone is made up of:

The diaphragm – a thin, light membrane which moves in the presence of soundwaves so that its
movement accurately follows the sound wave pressure vibrations.

The transducer – the diaphragm is hooked up to the transducer. The transducer, as
mentioned earlier, produces a voltage that is a precise analogue of the diaphragm movements.

The casing – the significance of the casing is that it determines the directional response of the
microphone.

The diaphragm and the transducer have to be looked at together as transducer systems.
Some transducer systems are:


2.1 Carbon Granules

Carbon granules are loosely packed in contact with the diaphragm in such a way that
when the diaphragm moves it moves the granules. This movement causes the granules
to be alternately more and less closely packed, which affects their resistance, and
therefore a DC current flowing through the granules will be modulated. Used in the
past in telephones.

2.2 Piezo Electric

Some crystals, such as barium titanate and quartz, create an e.m.f. when they are
deformed. By attaching a diaphragm to such a crystal, they can work as a transducer.
This has the disadvantage of high impedance.

2.3 Capacitor/electrostatic

These use a conducting diaphragm (for example plastic coated with a metal deposit)
positioned close to a rigid metal plate so that the two of them form a capacitor. The
combination of diaphragm and back plate is known as the capsule. Capacitance
changes occur as a result of the diaphragm moving in the presence of sound waves.
The plates have to be charged, and there are two ways to charge them. One is to use
a DC source of about 50V; the other is to use substances called electrets for the
diaphragm. An electret is a substance that almost permanently retains an electrical
charge. When used in a microphone, there is no need for a polarizing DC voltage.

Using a DC source, the most common way is the 48V standard, also called phantom
powering. In this system, the 48V DC supply is connected to the microphone
terminals on the console. The DC voltage is carried equally on the two signal wires of the
microphone cable. This means there is no voltage difference between the two wires. The 0V
side of the DC supply (return) is connected to the earth. This connection ensures that
microphones connected to the desk that do not use phantom power will not be
adversely affected by the DC voltage, as the net result of the 48V at the transducer is
zero.

2.4 Moving coil (Dynamic)

This is the most widely used type. It depends on the production of an e.m.f. in a small,
lightweight coil attached to a diaphragm. Movements of the latter, as a result of sound
wave pressure, cause alternating voltages to be produced. Such voltages are small,
in the order of 0.5-1mV for speech. The coil is often made of aluminium, for lightness,
and the impedance is commonly about 30 ohms. The coil/diaphragm assembly is
heavier than the diaphragm of an electrostatic microphone and this affects the
transient response.

3. Pick-up or directivity patterns

An important characteristic of a microphone is its response to sound arriving from different
directions. This is also called the polar pattern. The directivity pattern or polar diagram of a
microphone design depends on how the design allows soundwaves to strike the diaphragm.

3.1 Pressure operation


With this design, sound waves can only strike the diaphragm from the front. This
means the movement of the diaphragm is completely determined by the direct
soundwave pressure. The polar pattern of a pressure-operated microphone is
determined by the frequency and by the shape and diameter of the microphone. It can be
assumed that sound coming from any direction will be able to strike the diaphragm.
Sounds coming from 180° will strike the diaphragm by diffraction, assuming their
wavelength is large enough. This means, obviously, that the microphone will show
some directional characteristics as frequency increases.

3.2 Pressure Gradient operation (figure-of-eight)

The basic design of this type of microphone is such that soundwaves have equal access
to both sides of the diaphragm. The microphone theoretically gives equal
outputs for sounds arriving from 0° and 180°, but sounds from 90° result in zero output
because they cancel out. A characteristic of pressure gradient microphones is an
exaggerated output at low frequencies when the source is close to the microphone.
This is called the proximity effect. Microphones designed on the pressure gradient
principle exhibit this proximity effect.

3.3 Cardioids

This polar pattern does not respond to sounds coming from behind the microphone
(180°). This polar pattern is very useful in practice. The cardioid pattern is
achieved by using a phase shift principle. An alternative path for soundwaves to the
rear of the diaphragm (180°) is created by making an aperture in the casing. The
sound waves which come in through this aperture are then delayed by inserting an
acoustic labyrinth in their path. This delay is such that sound which originates
from behind the diaphragm and enters the aperture is delayed just long enough for the
same sound which diffracts round the microphone to reach the front of the diaphragm. The
resultant pressure at the diaphragm for sound originating from behind the microphone
will therefore be zero. Cardioid designs are an extension of the pressure gradient
operation and also exhibit proximity effects.

3.4 Hypercardioid

A hypercardioid microphone can be thought of as an intermediate between a pressure
gradient and a cardioid. The main characteristic of this design is that its frontal lobe is
narrower, which makes it more directional, but at the same time a rear lobe develops,
which means it picks up some sound from behind as well, albeit not as much as the
pressure gradient.
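
The patterns described in 3.1 to 3.4 can be modelled, to a first approximation, as r(angle) = A + B x cos(angle). The coefficients in the sketch below are standard textbook values added here for illustration; they are assumptions, not taken from these notes.

    import math

    # First-order polar responses modelled as r(angle) = A + B*cos(angle).
    # Coefficients are standard textbook values (an assumption for this illustration).
    PATTERNS = {
        "omni (pressure)":            (1.0, 0.0),
        "figure-of-eight (gradient)": (0.0, 1.0),
        "cardioid":                   (0.5, 0.5),
        "hypercardioid":              (0.25, 0.75),
    }

    def response(pattern, angle_deg):
        """Relative sensitivity (negative values indicate the polarity-inverted rear lobe)."""
        a, b = PATTERNS[pattern]
        return a + b * math.cos(math.radians(angle_deg))

    for name in PATTERNS:
        print(name, [round(response(name, d), 2) for d in (0, 90, 180)])
    # cardioid -> 1.0 at 0 deg, 0.5 at 90 deg, 0.0 at 180 deg (no pickup from behind)
    # hypercardioid -> narrower front lobe plus a small rear lobe (-0.5 at 180 deg)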

3.5 Highly Directional Microphones

Highly directional microphone pick up patterns can be achieved in two ways

i. The use of a parabolic reflector – A parabolic reflector with the
microphone at the focus can create very narrow lobes, but of course this will
only be the case for frequencies above that whose wavelength is equal to the diameter
of the parabolic dish. Below this frequency the system will be more or less
omnidirectional. In practice, the highly directional pickup pattern will only start
from about 2kHz upwards.
ii. The use of interference tubes (gun or rifle microphones) – A long slotted
tube is placed in front of the diaphragm. Sounds that arrive off the axis of the tube
enter the slots and arrive at the diaphragm at different times depending on
which part of the tube they enter. There will therefore be phase
cancellations. The most cancellation occurs for frequencies with wavelengths
less than the acoustic length of the tube. Most tubes are about 50cm long.
This microphone design is used mainly for news gathering for radio and TV
work.

Microphone Techniques

Proper miking technique comes down to choosing the right mic and positioning it properly. All
recording situations will have inherent limitations such as availability of mics, size of the
recording space and unwanted leakage. These considerations, coupled with the desired effect,
i.e. what sort of recorded sound you're after, will determine the basic application of miking
techniques.

Microphone Placement is broken down into 2 broad categories: Mono miking or the use of
single or multiple single mics; and Stereo miking which uses pairs of mics to capture the
soundfield in a way which emulates certain binaural features of the ear.

1. Mono Microphone Techniques

The basic idea of mono miking is the collection of various mono sources of sound for
combination in a mix for a simulated stereo effect. Often mono and stereo sources are mixed
together. There are four mono miking styles of mic placement, directly related to the distance of
the mic from its sound source.

1.1 Distant Microphone Placement

The positioning of one or more mics at 3 feet or more from the sound source. Such a
distance picks up a tonally balanced sound from the instrument or ensemble and also
picks up the acoustic environment ie reflected sound. Using this style provides an
open, live feeling to the sound.

Distant miking is often used on large ensembles such as choirs or orchestras. Mic
placement depends on the size of the sound source and the reverberant characteristics of the
room. One must try to strike an overall balance between the ensemble and the overall
acoustics.

A problem with distant miking is that reflections from the floor which reach the mic out
of phase with the direct sound will cause frequency cancellations. Moving the mic
closer to the floor reduces the pathlength of reflected sound and raises the frequency
of cancellation. A height of 1/8 to 1/16 of an inch will keep the lowest cancellation above
10kHz.

The Pressure Zone Mic (PZM) is an electret-condenser mic designed to work on a
boundary such as a wall or a floor. Phase cancellation is eliminated because the mic,
located at the point of reflection, will add direct and reflected sound together resulting
in a smoother overall frequency response. The PZM is therefore well suited for distant
miking applications.


1.2 Close Microphone Placement

The mic is placed 1" to 3' from the source. Only direct, on-axis sound is captured.
Creates a tight present sound quality which effectively excludes the acoustic
environment. Very common technique in studio and live sound reinforcement
applications where lots of unwanted sound (leakage) needs to be excluded. Multitrack
recording often requires that individual instruments be as "clean" as possible when
tracked to tape.

Miking too close may colour the recorded tone quality of a source. Small variations in
distance can drastically alter the way an instrument sounds through the mic. A
common technique when close miking is to search for the instrument's "sweet spot"
by making small adjustments to mic placement near the surface of the instrument.
The sweet spot is where the instrument sounds fullest and richest.

1.3 Accent Miking

A not-too-close miking technique used to highlight an instrument in an ensemble
which is being picked up by distant mics. The accent mic will add more volume and
presence to the highlighted instrument when mixed together with the main mic.

1.4 Ambient Miking

An ambient mic is placed at such a distance that the reverberant or room sound is
more prominent than the direct signal. The ambient mic is used to enhance the total
recorded sound in a number of ways:

a. to restore natural reverb to a live recording
b. to pick up audience reaction at a live concert
c. in a studio, to add the studio room's acoustics back in to a close-miked
recording

2. Stereo Miking Techniques

The use of two identical microphones to obtain a stereo image in which sound sources can be
identified by location, direction and distance. Stereo miking methods rely on principles similar
to those utilized by the ear/brain to localise a sound source. These methods may be used in
close or distant miking setups to record ensembles, orchestras or individual sound sources live
or in the studio.

2.1 AB or Spaced Pair

The two mics (Omni or cardioid) are placed quite far from each other to preserve a
L/R spread or soundstage. The AB method works on the arrival time differences
between the two mics to obtain the stereo image. This is similar to the ear utilizing
Interaural Arrival Time differences to perceive direction.


In placing AB mics, use the 3:1 rule: the distance between the two mics should be at
least 3 times the distance between the mics and the source. This helps maintain phase
integrity between the mics, i.e. there is less chance of out-of-phase cancellations occurring.

The AB stereo method can give an exaggerated stereo spread and can suffer from a
perceived 'hole in the centre' effect. The sound can be warm and ambient, but off-
centre sources can seem diffuse, i.e. not properly located.

2.2 XY, and Mid-side (MS) or Coincident Pair

In both these techniques the mic capsules sit on top of each other. The stereo image
is obtained from the intensity differences produced by the sound source at each mic. This
is similar to the Interaural Intensity Difference utilised by the ear. The images are
usually sharp and accurate, but the stereo spread can seem narrow.

An XY pair is two cardioid mics (top angled left, bottom right) set at an angle of between 90
and 135 degrees. Increasing the angle increases the intensity differences and widens the stereo
image. Two omni mics can be used for more ambience. The Blumlein pair uses two
bidirectional mics set at right angles to each other.

The MS technique utilises a cardioid and a bidirectional mic. It is usually performed with a
decoder box, which produces the various polar patterns which result from this
combination. The side mic picks up ambient sound while the mid mic picks up the direct
sound. It may also be done on three channels of a console, with the third channel
containing the reverse phase of the bidirectional mic.

2.3 Near Coincident or OSS (Optimal Stereo Sound)

Several versions of this method including the ORTF (Office for Radio and TV of
France), NOS and Faulkner. Microphone pair is separated by a distance similar to
that between the 2 ears. Therefore, both level and time differences are used to obtain
the stero image. Uses the best features of AB and XY to produce a soundstage with
sharply focused images and an accurate stero spread.

The Binaural Mic or Dummy Head is a development of the near coincident approach
which mounts 2 mics in ear cavities of a model head. This technique produces
realistic 3D stereo effects which can be heard through headphones.

Assignment 3 – AE003

AE11 – Digital Technology

1. Advantages of Digital Audio over Analog Audio

2. Binary numbers

3. Conversions of Binary numbers to Decimal and vice versa

4. Sampling

5. Aliasing

6. Quantization

6.1 Quantization Error

6.2 Dither

6.3 S/E Ratio

7. Pulse Code Modulation

8. Linear PCM recording section

8.1 Dither Generator

8.2 Input Low Pass Filter

8.3 Sample and Hold

8.4 Analog to Digital conversion

8.5 Record processing

9. Error protection, correction and concealment

9.1 Modulation process.

10. Digital Audio Reproduction

10.1 Demodulation Circuits

10.2 Reproduction Processing

11. Digital to Analog Conversion

11.1 Output sample and hold

11.2 Output Low-Pass Filter and Oversampling


AE11 – DIGITAL TECHNOLOGY

Introduction

Audio was previously recorded, stored and reproduced by analog means and media.
Now, however, due to advances in digital technology, audio can be stored and
reproduced in digital form. Examples of this technology are CD players, DAT
players, digital consoles and digital samplers. For a present-day audio engineer,
it is vital to understand the underlying digital theories and concepts behind
the technologies that are available.

1. Advantages of Digital Audio over Analog Audio

Digital audio can exist in a non-linear form, in comparison with analog audio. This non-
linearity provides more flexibility in audio editing processes. (Compare analogue editing
with a DAW such as Protools.)

There is virtually no degradation of original signal when copying. A perfect copy of the
original source can be made in the digital domain whereas in analog domain tape noise
and hiss are added to the duplicates. (Cassette dubbing compared to CD burning)

Digital audio can be stored on more reliable media such as CDs, MDs and hard disks.
These storage mediums have longer life expectancy than analogue storage mediums like
tape do. (CDs last longer than cassette tapes)

Analog Audio:
♦ Obvious generation loss
♦ Noise added during copying
♦ No perfect copy can be made
♦ Can only be stored on a limited range of analog media
♦ Cannot be manipulated by computer

Digital Audio:
♦ No generation loss
♦ No noise added during copying
♦ Perfect copies can be made
♦ Can be stored on a large number of digital media
♦ Can be manipulated by computer

Analogue technologies have more circuitry that adds noise and distortion to the signal
than digital technologies. Therefore digital technology has more dynamic range than
analogue technology. (A 16-bit digital audio recorder has a dynamic range of 97.8dB - a
value some 30dB above the noise figure for most conventional analog tape recorders.)

2. Binary numbers

Binary numbers are used to represent data in the digital realm. In this system, there are
only two states (a "0" state and a "1" state). Binary digits used in digital devices are called
bits (short for Binary digITS). All numbers in the decimal system can be represented by
the binary system. The decimal system is known as base 10 because there are 10
numbers (0-9) to represent all the figures in this system. The binary system is known as
base 2 because it has only 2 numbers (0 and 1) to represent all the figures in its system.

Digital devices need a fixed length of binary numbers called a "word". Word length refers
to the number of digits and is fixed in the design of a digital device (00000001- 8-bit word
length). For example the Alesis XT20 uses 20-bit word length.

The following is a further illustration of the above-mentioned principle:

For systems with 2-bit word length there can be only four representations

00

01

10

11

For systems with 3-bit word length there will be eight representations

000

001

010

011

100

101

110

111

This can be calculated by the following formula:

Number of representations = 2 to the power of n (where n = word length in bits)

Therefore a 16-bit system would have 2 to the power of 16, i.e. 65536,
representations.


3. Conversions of Binary numbers to Decimal and vice versa

For example, a 5-bit word can represent all combinations of numbers from 0 to 31 (2 to the
power of 5 = 32 values), where:

00000 (Bin) = 0 (Dec)

00001 (Bin) = 1 (Dec)

11111 (Bin) = 31 (Dec)

The first bit in a word is known as the Most Significant Bit (MSB) and the last bit is known as
the Least Significant bit (LSB). One BYTE = eight bits and One NIBBLE = 4 bits.

The word length of a digital device becomes the measure of the "Resolution" of the system
and with digital audio better resolution means better fidelity of reproduction. (24 bit systems are
superior to 16 bit systems)
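
As a worked illustration of the conversions described above, the following sketch converts between binary words and decimal values. It assumes words are written MSB first, as in the examples above.

    # Binary <-> decimal conversion for fixed word lengths (illustrative sketch).

    def binary_to_decimal(word):
        """'11111' -> 31 : each bit weights a power of two, MSB first."""
        value = 0
        for bit in word:
            value = value * 2 + (1 if bit == "1" else 0)
        return value

    def decimal_to_binary(value, word_length):
        """31, 5 -> '11111' : fixed word length, padded with leading zeros."""
        bits = ""
        for _ in range(word_length):
            bits = str(value % 2) + bits
            value //= 2
        return bits

    print(binary_to_decimal("00001"))      # 1
    print(binary_to_decimal("11111"))      # 31
    print(decimal_to_binary(31, 5))        # '11111'
    print(2 ** 16)                         # 65536 representations for a 16-bit word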

4. Sampling

In the digital realm, recordings of analog waves are made through periodic sampling. That
means that when a sound wave is recorded, snapshots of the wave are taken at regular
instants in time. These snapshots are later examined and given a specific value (a binary
number). This process is called discrete time sampling. The sampling rate of a digital
system is defined as the number of snapshots or samples that are taken in one second.
Therefore a device with a sample rate of 44.1kHz takes 44100 samples per second.


According to the Nyquist theorem, S samples per second are needed to completely
represent a waveform with a bandwidth of S/2 Hz, i.e. we must sample at a rate which is
twice the highest throughput frequency to achieve lossless sampling. Therefore for a
bandwidth of 20Hz-20kHz, one must use a sampling frequency of at least 40kHz. It is
therefore necessary to send the recording signal through a low-pass filter before
the sampling circuit, in accordance with the Nyquist theorem.

5. Aliasing

Alias frequencies are produced when the input signal is not filtered effectively to remove all the
frequencies above half of the sampling frequency. This is because there are no longer
adequate samples to represent the deviant high frequencies. The sampler continues to
produce samples at a fixed rate, outputting a stream of false information caused by these
deviant high frequencies. This false information takes the form of new, descending frequencies
which were not present in the original audio signal. These are called alias or fold-over
frequencies. For example, if S is the sampling rate and F is a frequency higher than half the
sampling frequency, then a new frequency Fa is created where Fa = S - F. The solution to
aliasing is to band-limit the input signal at half the sampling frequency using a steep-slope LPF
called an anti-aliasing filter.
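
The fold-over formula Fa = S - F can be checked with a few lines of code. The example frequencies below are assumptions chosen purely for illustration.

    # Alias (fold-over) frequency for an unfiltered input above half the sampling rate.
    # Fa = S - F, as given above, valid for S/2 < F < S.

    def alias_frequency(f_input, sample_rate):
        nyquist = sample_rate / 2.0
        if f_input <= nyquist:
            return f_input                  # below Nyquist: reproduced correctly
        return sample_rate - f_input        # above Nyquist: folds back down

    print(alias_frequency(25000, 44100))    # 19100 Hz alias from a 25 kHz input
    print(alias_frequency(30000, 48000))    # 18000 Hz alias from a 30 kHz input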

6. Quantization

Quantization is the measurement of the amplitude of an analog signal at a discrete sample
time. The accuracy of quantization is limited by the system's resolution, which is represented
by the word length used to encode the signal, i.e. the number of bits in a sample such as 8, 12,
16, 20 or 24 bits. No matter how fine we make the resolution, we cannot fully capture the
complexity of an analog waveform. For current standards a 16-bit (65536 steps) representation
is acceptable.
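
A minimal sketch of quantizing a single sample value, assuming samples lie in the range -1.0 to +1.0 (an assumption made for this example); it shows that the resulting error stays within half an interval.

    # Quantizing a sample value to a given word length and measuring the error (sketch).

    def quantize(x, bits=16):
        levels = 2 ** bits                          # e.g. 65536 steps for 16 bits
        step = 2.0 / levels                         # interval size across the -1..+1 range
        index = round(x / step)                     # nearest quantization interval
        index = max(-(levels // 2), min(levels // 2 - 1, index))
        return index * step

    x = 0.300001
    q = quantize(x, bits=16)
    print(q, abs(x - q), 2.0 / 2 ** 16 / 2)         # error stays within +/- half an interval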


6.1 Quantization Error

This is the difference between the actual analog value at the sample time and the chosen
quantization interval's value, i.e. the difference between the actual and measured
values. Quantization error is limited to +/- 1/2 an interval at the sample time. At the system
output this error will be contained in the output signal. The error will sound like white
noise heard together with the program material. However, there will be no noise
in silent passages of the program. In addition, quantization noise changes with
amplitude, becoming distortion at low levels.

6.2 Dither

Although quantization error occurs at a very low level, its presence must be
considered in high-fidelity music. Particularly at low levels, the error becomes
measurable distortion that is correlated with the signal. To fix quantization errors, a low-level
noise called dither is added to the audio signal before the sampling process. Dither
randomises the effect of quantization error, removing the distortion of
quantization error and replacing it with low-level white noise.

6.3 S/E Ratio

The signal-to-error ratio is closely akin, although not identical, to the signal-to-noise ratio.
Whereas signal-to-noise ratio is used to indicate the overall dynamic range of an
analogue system, the signal-to-error ratio of a digital device indicates the degree of
accuracy used when encoding a signal's dynamic range with regard to the step-
related effects of quantization.

The signal-to-error ratio of a digital device can be formulated as follows:

S/E = 6N + 1.8 (dB), where N is the word length in bits
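
The approximation above gives, for example, 97.8dB for a 16-bit system - the figure quoted earlier in these notes. A trivial sketch:

    # Signal-to-error ratio from word length, using the approximation given above.

    def signal_to_error_db(n_bits):
        """S/E = 6N + 1.8 dB (N = word length in bits)."""
        return 6 * n_bits + 1.8

    for bits in (8, 16, 20, 24):
        print(bits, "bits ->", signal_to_error_db(bits), "dB")
    # 16 bits -> 97.8 dB, matching the dynamic range figure quoted earlier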

7. Pulse Code Modulation

A method of encoding digital audio information which uses a carrier wave in the form of a
stream of pulses which represents the digital data. The original analog waveform is sampled
and its amplitude quantized by the analog to digital (A/D) converter. Binary numbers are sent
to the storage medium as a series of pulses representing amplitude. If two channels are to be
sampled the PCM data may be multiplexed to form one data stream. The data is processed for
error correction and stored. On playback the bit stream is decoded to recover the original
amplitude information at proper sample times and the analog waveform is reconstructed by the
digital to analog converter (DAC).

8. Linear PCM recording section

8.1 Dither Generator

An analog noise signal is added to the analog signal coming from the line amplifier.
The dither causes the audio signal to constantly move between quantization levels.
The noise should resemble noise from analog systems, which is very easy on the ear.
Gaussian white noise is often used.

8.2 Input Low Pass Filter

The analog signal is low-pass filtered by a very sharp cut-off filter to band limit the
signal and its entire harmonic content to frequencies below half of the sampling
frequency. The ideal LPF would have a "Brick wall" cut off, but this is very hard to
achieve. In professional recorders with a sampling frequency of 48kHz, the input filters
are usually designed for 20Hz-20kHz, thus providing a guard band of 4kHz (up to the 24kHz
Nyquist limit) to ensure the attenuation is sufficient.

8.3 Sample and Hold

The S/H circuit time-samples the analog waveform at a fixed periodic rate and holds
the analog value until the A/DC outputs the corresponding digital word. Samples must
be taken precisely at the correct time.

In audio digitization, time information is stored implicitly as samples taken at a fixed
periodic rate, which is accomplished by the S/H circuit. An S/H circuit is essentially a
capacitor and a switch. Maintaining absolute time throughout a digital system is
essential. Variations in absolute timing, called jitter, can create modulation noise.

8.4 Analog to Digital conversion

This is the most critical component of the entire system. The circuit must determine
which quantization increment is closest to the analog waveform's current value, and
output a binary number specifying that level. This is done in less than 20
microseconds. In a 16-bit linear PCM system each of the 65 536 increments must be
evenly spaced throughout the amplitude range so that even the LSBs in the resulting
word are meaningful. Thus the speed and accuracy are the key requirements for an
A/D converter.

8.5 Record processing

After conversion several operations must take place prior to storage:


Multiplexing - Digital audio channel data is processed in a single stream. However the
A/DC outputs parallel data, i.e. entire words. The multiplexer converts this parallel
data to serial data.

Data coding - Raw channel data is properly encoded to facilitate storage and later
recovery. Several types of coding are applied to modify or supplement the original
data. A synchronisation code is a fixed pattern of bits provided to identify the
beginning of each word as it occurs in the bit stream. Address codes are added to
identify location of data in the recording. Other specifications such as sampling
frequency, table of contents, copyright information, even Time Code can be added.

9. Error protection, correction and concealment

As anyone familiar with analog recording will know, magnetic tape is an imperfect medium. It
suffers from noise and dropouts, which in analog recording are audible. In a digital recording of
binary data, a bit is either correct or wrong, with no intermediate stage. Small amounts of noise
are rejected, but inevitably, infrequent noise impulses cause some individual bits to be in error
(bit errors). Dropouts can cause a larger number of bits in one place to be in error. An error of
this kind is called a burst error. Whatever the medium and whatever the nature of the
mechanism responsible, data are either recovered correctly, or suffer some combination of bit
errors and burst errors. In Compact Disc, random errors can be caused by imperfections in the
moulding process, whereas burst errors are due to contamination or scratching of the CD
surface.

The audibility of a bit error depends upon which bit of the sample is involved. If the LSB of one
sample was in error in a loud passage of music, the effect would be totally masked and no one
could detect it. Conversely, if the MSB of one sample was in error in a quiet passage, no one
could fail to notice the resulting loud transient. Clearly a means is needed to render errors from
the medium inaudible. This is the purpose of error correction.

In binary, a bit has only two states. If it is wrong, it is only necessary to reverse the state and it
must be right. Thus the correction process is trivial and perfect. The main difficulty is in
identifying the bits, which are in error. This is done by coding the data by adding redundant
bits. Adding redundancy is not confined to digital technology: airliners have several engines
and cars have twin braking systems. Clearly the more failures which have to be handled, the
more redundancy is needed. If a four-engined airliner is designed to fly normally with one
engine failed, three of the engines have enough power to reach cruise speed, and the fourth
one is redundant. The amount of redundancy is equal to the amount of failure which can be
handled. In the case of the failure of two engines, the plane can still fly, but it must slow down;
this is graceful degradation. Clearly the chances of a two-engine failure on the same flight are
remote.

In digital audio, the amount of error, which can be corrected, is proportional to the amount of
redundancy; the samples are returned to exactly their original value. Consequently corrected
samples are inaudible. If the amount of error exceeds the amount of redundancy, correction is
not possible, and, in order to allow graceful degradation, concealment will be used.
Concealment is a process where the value of a missing sample is estimated from those
nearby. The estimated sample value is not necessarily exactly the same as the original, and so
under some circumstances concealment can be audible, especially if it is frequent. However,
in a well-designed system, concealments occur with negligible frequency unless there is an
actual fault or problem.

Concealment is made possible by rearranging or shuffling the sample sequence prior to
recording, as described below.


In cases where the error correction is inadequate, concealment can be used provided that the
samples have been ordered appropriately in the recording. Odd and even samples are
recorded in different places as shown here. As a result an uncorrectable error causes incorrect
samples to occur singly, between correct samples. In the example shown, sample 8 is
incorrect, but samples 7 and 9 are unaffected and an approximation to the value of sample 8
can be had by taking the average value of the two. This interpolated value is substituted for the
incorrect value. Odd-numbered samples are separated from even-numbered samples prior to
recording. The odd and even sets of samples may be recorded in different places, so that an
uncorrectable burst error only affects one set. On replay, the samples are recombined into
their natural sequence, and the error is now split up so that it results in every other sample
being lost. The waveform is now described half as often, but can still be reproduced with some
loss of accuracy. This is better than not being reproduced at all even if it is not perfect. Almost
all digital recorders use such an odd/even shuffle for concealment. Clearly if any errors are
fully correctable, the shuffle is a waste of time; it is only needed if correction is not possible.
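
A minimal sketch of the interpolation step used in concealment, assuming uncorrectable samples have already been flagged (here marked with None, purely an assumption for the example):

    # Odd/even shuffle concealment (sketch): an uncorrectable error leaves single bad
    # samples between good ones, so a missing value is replaced by the average of its
    # neighbours. 'None' marks a sample flagged as uncorrectable.

    def conceal(samples):
        out = list(samples)
        for i, s in enumerate(out):
            if s is None:
                prev = out[i - 1] if i > 0 else 0.0
                nxt = samples[i + 1] if i + 1 < len(samples) else 0.0
                out[i] = (prev + nxt) / 2.0        # interpolated estimate
        return out

    print(conceal([0.10, 0.20, None, 0.40, 0.50]))
    # the missing middle sample is replaced by the average of 0.2 and 0.4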

In high-density recorders, more data are lost in a given-sized dropout, so adding redundancy
equal to the size of a dropout to every code is inefficient. The efficiency of the system can
instead be raised by using interleaving, as described below.


Sequential samples from the ADC are assembled into codes, but these are not recorded in
their natural sequence. A number of sequential codes are assembled along rows in a memory.
When the memory is full, it is copied to the medium by reading down columns. On replay, the
samples need to be de-interleaved to return them to their natural sequence. This is done by
writing samples from tape into a memory in columns, and when it is full, the memory is read in
rows. Samples read from the memory are now in their original sequence so there is no effect
on the recording. However, if a burst error occurs on the medium, it will damage sequential
samples in a vertical direction in the de-interleave memory. When the memory is read, a single
large error is broken down into a number of small errors whose size is exactly equal to the
correcting power of the codes and the correction is performed with maximum efficiency.
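
A toy sketch of the row/column interleave described above; the 3 x 4 memory size is an arbitrary assumption chosen for illustration:

    # Interleaving sketch: codes are written into memory rows and read out in columns,
    # so a burst error on the medium is broken up into small, correctable errors
    # after de-interleaving.

    def interleave(samples, rows, cols):
        assert len(samples) == rows * cols
        grid = [samples[r * cols:(r + 1) * cols] for r in range(rows)]   # fill rows
        return [grid[r][c] for c in range(cols) for r in range(rows)]    # read columns

    def deinterleave(samples, rows, cols):
        assert len(samples) == rows * cols
        grid = [samples[c * rows:(c + 1) * rows] for c in range(cols)]   # fill columns
        return [grid[c][r] for r in range(rows) for c in range(cols)]    # read rows

    data = list(range(12))
    on_tape = interleave(data, rows=3, cols=4)
    print(on_tape)                       # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
    print(deinterleave(on_tape, 3, 4))   # original order restored
    # A burst wiping out, say, on_tape[0:3] damages samples 0, 4 and 8 - three
    # isolated errors after de-interleaving, not three consecutive samples.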

Interleave, de-interleave, time compression and timebase correction processes cause delay
and this is evident in the time taken before audio emerges after starting a digital machine.
Confidence replay takes place later than the distance between record and replay heads would
indicate. In DASH-format recorders, confidence replay is about one-tenth of a second behind
the input. Synchronous recording requires new techniques to overcome the effect of the
delays.

The presence of an error-correction system means that the audio quality is independent of the
tape/head quality within limits. There is no point in trying to assess the health of a machine by
listening to it, as this will not reveal whether the error rate is normal or within a whisker of
failure. The only useful procedure is to monitor the frequency with which errors are being
corrected, and to compare it with normal figures. Professional digital audio equipment should
have an error rate display.

Some people claim to be able to hear error correction and misguidedly conclude that the
above theory is flawed. Not all digital audio machines are properly engineered, however, and
if the DAC shares a common power supply with the error-correction logic, a burst of errors
will raise the current taken by the logic,
which in turn loads the power supply and interferes with the operation of the DAC. The effect is
harder to eliminate in small battery-powered machines where space for screening and
decoupling components is hard to find, but it is only a matter of design; there is no flaw in the
theory.

Error protection and correction are provided so that the effect of storage defects is minimised.
The data is processed before storage by adding parity bits and check codes, both of which are
redundant data created from the original data to help detect and correct errors. Finally
interleaving is employed in which data is scattered to various locations on the recording
medium.

9.1 Modulation process.

This is the final electronic manipulation of the audio data before storage. Binary code
is not recorded directly; rather, a modulated code in the form of a modulation
waveform is stored, which represents the bit stream.

In the raw binary bit stream there is really no way to directly distinguish between the
individual bits. A long series of ones or zeros would form a static signal upon playback
and timing information would be lost. Additionally, it is inefficient to store binary code
directly onto the medium: it would take too much storage space.


Typically, in the modulation process, it is the transition from one level to another rather
than the amplitude levels, which represents the information on the medium. Various
modulation codes have been designed:

Non-return to zero code (NRZ) - 1s and 0s are represented directly as high and low
levels. Used only where synchronisation is externally generated such as video tape
recordings.

Non-return to Zero Inverted code (NRZI) - only 1s are denoted with amplitude
transitions. Thus any flux change in the magnetic medium indicates a 1.

Modified Frequency Modulation (MFM) - sometimes called Miller code.

Eight to Fourteen Modulation (EFM) - used for CD storage. Blocks of 8 bits are
translated into blocks of 14 bits using a look-up table. Each 1 represents a
transition in the medium, which on a CD means the physical presence of a pit edge.
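
A small sketch of NRZI coding as described above (the level values and starting state are assumptions made for the example):

    # NRZI channel coding sketch: a '1' is recorded as a flux/level transition,
    # a '0' as no transition.

    def nrzi_encode(bits, start_level=0):
        level = start_level
        out = []
        for b in bits:
            if b == "1":
                level ^= 1          # a one causes a transition
            out.append(level)       # a zero leaves the level unchanged
        return out

    def nrzi_decode(levels, start_level=0):
        bits = ""
        previous = start_level
        for level in levels:
            bits += "1" if level != previous else "0"   # any change means a recorded 1
            previous = level
        return bits

    encoded = nrzi_encode("101100")
    print(encoded)                 # [1, 1, 0, 1, 1, 1]
    print(nrzi_decode(encoded))    # '101100'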

10. Digital Audio Reproduction

Digital audio reproduction processes can be compared to the reverse of the digital audio
recording processes.

10.1 Demodulation Circuits

A preamp is required to boost the low-level signal coming off the tape heads. The
waveform is very distorted and only the transitions between levels, corresponding to the
original recorded signal, have survived. A waveform shaper circuit is used to
identify the transitions and reconstruct the modulation code. The modulated data,
whatever its code (EFM, MFM, etc.), is then demodulated to NRZ code, that is, a
simple code in which amplitude information represents the binary information.

10.2 Reproduction Processing

The reproduction processing circuits are concerned with minimising the effects of data
storage. They accomplish the buffering of data to minimise the effects of mechanical
variations and transport problems (e.g. timing variations, tape stretch, bad head
alignment) in the medium. These timing variations will cause jitter. The reproduction
processing circuits also perform error correction and demultiplexing.

1. The circuits firstly de-interleave the data and assemble it in the correct order.
2. The data is read into a buffer whose output occurs at an accurately controlled
rate thus ensuring precise timing data and nullifying any jitter caused by
mechanical variations in the medium.
3. Using redundancy techniques such as parity and checksums, the data is
checked for errors. When error is too extensive for recovery, error
compensation techniques are used to conceal the error. In extreme cases the
signal will be momentarily switched off.
4. The demultiplexer reconverts the serial data to its parallel form. This circuit
takes in single bits and outputs whole words simultaneously.

11. Digital to Analog Conversion


The DAC is the most critical element in the reproduction system, determining how accurately
the digitized signal will be restored to the analog domain. A DAC accepts an input digital word
and converts it into an output analog voltage or current.

11.1 Output sample and hold

When the DAC switches from one output voltage to another, false voltage variations
such as switching glitches can occur, which will produce audible distortion. The output
circuit acquires a voltage from the DAC only when the circuit has reached a stable
output condition. The S/H circuit holds the correct voltage during the intervals when the
DAC switches between samples.

Hence false glitches are avoided by the S/H circuitry. It operates like a gate, removing
false voltages from the analog stream, and like a timing buffer, re-clocking the precise
flow of voltages. Its output is a precise "staircase" analog signal, which resembles the
output of its counterpart in the recording conversion.

11.2 Output Low-Pass Filter and Oversampling

This "anti-imaging" LPF has a design very similar to the input "anti-aliasing" filter.
Oversampling techniques are used in conjunction with this filter.

Oversampling has the effect of further reducing intermodulation and other forms of
distortion. Whenever oversampling is employed, the effective sampling rate of a
signal-processing block is multiplied by a certain factor - commonly ranging between
12 and 128 times the original rate. This significant increase in the sample rate is
accomplished by interpolating between the original sample times. This technique, in effect,
makes educated guesses as to where sample levels would fall at the new sample times
and generates an equivalent digital word for each level.

AE12 – Computer Fundamentals

Hardware

1. An Introduction

1.1 Personal Computer

1.2 Workstation

1.3 Minicomputer

1.4 Mainframe

1.5 Supercomputer

2. Chips

3. CPU

4. Microprocessor

5. RISC

6. Coprocessors

7. Bus

7.1 local or system bus

7.2 PCI bus

7.3 NuBus

8. RAM (main memory)

9. Storage - the disk drive

10. Port

10.1 Serial Port

10.2 Parallel Port

10.3 SCSI

APPLE COMPUTER

1. Macintosh Computer

1.1 PowerPC

Software

1. Data

2. Instruction

3. Program

4. Software

5. Operating System (OS)

5.1 Memory Management

6. System (7.5 and up)

7. Desktop

8. Top

9. File Management System

10. Clipboard

11. Application


AE12 – COMPUTER FUNDAMENTALS

A computer is a programmable machine. The two principal characteristics of a computer are:

• It responds to a specific set of instructions in a well-defined manner.

• It can execute a prerecorded list of instructions (a program).

Modern computers are electronic and digital. The actual machinery -- wires, transistors, and
circuits -- is called hardware; the instructions and data are called software.

Hardware

1. An Introduction

All general-purpose computers require the following hardware components:

• memory: Enables a computer to store, at least temporarily, data and programs.


• mass storage device: Allows a computer to permanently retain large amounts of
data. Common mass storage devices include disk drives and tape drives.
• input device: Usually a keyboard or mouse, the input device is the conduit
through which data and instructions enter a computer.
• output device: A display screen, printer, or other device that lets you see what
the computer has accomplished.
• central processing unit (CPU): The heart of the computer, this is the component
that actually executes instructions.

In addition to these components, many others make it possible for the basic components to
work together efficiently. For example, every computer requires a bus that transmits data from
one part of the computer to another.

Computers can be classified by size and power as follows:

1.1 Personal Computer

A small, single-user computer based on a microprocessor. In addition to the
microprocessor, a personal computer has a keyboard for entering data, a monitor for
displaying information, and a storage device for saving data.

1.2 Workstation

A powerful, single-user computer. A workstation is like a personal computer, but it has a more powerful microprocessor and a higher-quality monitor.

1.3 Minicomputer

A multi-user computer capable of supporting from 10 to hundreds of users simultaneously.

1.4 Mainframe

A powerful multi-user computer capable of supporting many hundreds of users simultaneously.

1.5 Supercomputer

An extremely fast computer that can perform hundreds of millions of instructions per
second.

Hardware refers to objects that you can actually touch, like disks, disk drives, display screens, keyboards, printers, boards, and chips. In contrast, software is untouchable. Software exists as ideas, concepts, and symbols, but it has no substance.

Books provide a useful analogy for describing the difference between software and hardware.
The pages and the ink are the hardware, while the words, sentences, paragraphs, and the
overall meaning are the software.

A computer without software is like a book full of blank pages; you need software to make the computer useful, just as you need words to make a book meaningful.

2. Chips

A small piece of semiconducting material (usually silicon) on which an integrated circuit is embedded. A typical chip can contain millions of electronic components (transistors). Computers consist of many chips placed on electronic boards called printed circuit boards.

There are different types of chips. For example, CPU chips (also called microprocessors)
contain an entire processing unit, whereas memory chips contain blank memory.

Chips come in a variety of packages. The three most common are:

• DIPs: Dual in-line packages are the traditional buglike chips that have anywhere from
8 to 40 legs, evenly divided in two rows.
• PGAs: Pin-grid arrays are square chips in which the pins are arranged in concentric
squares.
• SIPs: Single in-line packages are chips that have just one row of legs in a straight line
like a comb.

In addition to these types of chips, there are also single in-line memory modules (SIMMs),
which consist of up to nine chips packaged as a single unit.

3. CPU

Abbreviation of central processing unit, and pronounced as separate letters. The CPU is the
brains of the computer. Sometimes referred to simply as the processor or central processor,
the CPU is where most calculations take place. In terms of computing power, the CPU is the
most important element of a computer system.

In large machines, CPUs require one or more printed circuit boards. On personal computers and small workstations, the CPU is housed in a single chip called a microprocessor.

Two typical components of a CPU are:

• The arithmetic logic unit (ALU), which performs arithmetic and logical operations such
as addition and multiplication, and all comparison operations.
• The control unit, which extracts instructions from memory and decodes and executes
them, calling on the ALU when necessary.

4. Microprocessor

A silicon chip that contains a CPU. In the world of personal computers, the terms
microprocessor and CPU are used interchangeably. At the heart of all personal computers and
most workstations sits a microprocessor.

Microprocessors also control the logic of almost all digital devices, from clock radios to fuel-
injection systems for automobiles.

Two basic characteristics differentiate microprocessors:

• bandwidth (or internal architecture): The number of bits processed in a single instruction, or the size of the data word that the microprocessor can hold in one of its registers, e.g. 8-bit, 16-bit, 32-bit, 64-bit. A 16-bit processor can process two pieces of 8-bit data in parallel.
• clock speed: Microprocessors are driven by a crystal clock. Given in megahertz
(MHz), the clock speed determines how many instructions per second the processor
can execute. CPU speed is also measured in MIPS (millions of instructions per
second) and Mflops (millions of floating point operations per second)

In both cases, the higher the value, the more powerful the CPU. For example, a 32-bit
microprocessor that runs at 50MHz is more powerful than a 16-bit microprocessor that runs at
25MHz.
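
As a rough back-of-the-envelope illustration of why both figures matter, the raw number of bits a processor can move per second is simply the word size multiplied by the clock rate. This ignores the instruction set, memory speed and many other factors, so it is only a crude comparison:

    # Crude comparison only: raw bits handled per second = word size x clock rate.
    def raw_throughput_bits(word_size_bits, clock_mhz):
        return word_size_bits * clock_mhz * 1_000_000

    print(raw_throughput_bits(32, 50))   # 1,600,000,000 bits per second
    print(raw_throughput_bits(16, 25))   #   400,000,000 bits per second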

In addition to bandwidth and clock speed, microprocessors are classified as being either RISC
(reduced instruction set computer) or CISC (complex instruction set computer).

5. RISC

Pronounced risk, acronym for reduced instruction set computer, a type of microprocessor that
recognizes a relatively limited number of instructions. Until the mid-1980s, the tendency among
computer manufacturers was to build increasingly complex CPUs that had ever-larger sets of
instructions. At that time, however, a number of computer manufacturers decided to reverse
this trend by building CPUs capable of executing only a very limited set of instructions. One
advantage of reduced instruction set computers is that they can execute their instructions very
fast because the instructions are so simple.

Another, perhaps more important advantage, is that RISC chips require fewer transistors,
which makes them cheaper to design and produce. Since the emergence of RISC computers,
conventional computers have been referred to as CISCs (complex instruction set computers).

There is still considerable controversy among experts about the ultimate value of RISC
architectures. Its proponents argue that RISC machines are both cheaper and faster, and are
therefore the machines of the future.

Skeptics note that by making the hardware simpler, RISC architectures put a greater burden
on the software. They argue that this is not worth the trouble because conventional
microprocessors are becoming increasingly fast and cheap anyway.

To some extent, the argument is becoming moot because CISC and RISC implementations
are becoming more and more alike. Many of today's RISC chips support as many instructions
as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated
with RISC chips.

6. Coprocessors

Processors that handle specialised tasks, freeing the main processor to oversee the entire operation. The Floating Point Unit (FPU) is a maths coprocessor specifically designed to crunch non-integer and exponential values (e.g. used for rendering 3D graphics and animation).

7. Bus

A collection of wires through which data is transmitted from one part of a computer to another.
You can think of a bus as a highway on which data travels within a computer. When used in
reference to personal computers, the term bus usually refers to internal bus. This is a bus that
connects all the internal computer components to the CPU and main memory. There's also an
expansion bus that enables expansion boards to access the CPU and memory.

All buses consist of two parts -- an address bus and a data bus.

The data bus transfers the actual data.

The address bus transfers information about where the data should go, i.e. which memory location is being addressed.

The size of a bus, known as its width, is important because it determines how much data can
be transmitted at one time. For example, a 16-bit bus can transmit 16 bits of data, whereas a
32-bit bus can transmit 32 bits of data.

Every bus has a clock speed measured in MHz. A fast bus allows data to be transferred faster,
which makes applications run faster. On PCs, the old ISA bus is being replaced by faster
buses such as PCI.

Control Lines - A sort of traffic cop for data, specifying the functions associated with data and
address lines.

7.1 local or system bus

Many PCs made today include a local bus for data that requires especially fast
transfer speeds, such as video data. The local bus is a high-speed pathway that
connects directly to the processor.

The local bus is a data bus that connects directly, or almost directly, to the
microprocessor. Although local buses can support only a few devices, they provide
very fast throughput. Most modern PCs include both a local bus, for video data, as
well as a more general expansion bus for other devices that do not require such fast
data throughput.

Several different types of buses are used on Apple Macintosh computers. Older Macs
use a bus called NuBus, but newer ones use PCI.

7.2 PCI bus

Acronym for Peripheral Component Interconnect, a local bus standard developed by Intel Corporation. Most modern PCs include a PCI bus in addition to a more general ISA expansion bus. Many analysts, however, believe that PCI will eventually supplant ISA entirely. PCI is also used on newer versions of the Macintosh computer.

PCI is a 32-bit bus, but supports a 64-bit extension for new processors, such as the
Pentium. It can run at clock speeds of 33 or 66 MHz. At 32 bits and 33 MHz, it yields a
throughput rate of 133 MBps. 64-bit implementations running at 66 MHz provide 524
MBps.
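
These throughput figures come from multiplying the bus width (in bytes) by the clock rate; published numbers vary slightly depending on whether the nominal or the exact clock frequency (33.33 and 66.66 MHz) is used in the calculation. A quick check:

    # Peak bus throughput = (bus width in bytes) x clock rate.
    def peak_mbps(bus_bits, clock_mhz):
        return (bus_bits / 8) * clock_mhz

    print(peak_mbps(32, 33.33))   # about 133 MBps
    print(peak_mbps(64, 66.66))   # about 533 MBps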

Although it was developed by Intel, PCI is not tied to any particular family of
microprocessors.

7.3 NuBus

The expansion bus for versions of the Macintosh computers starting with the
Macintosh II and ending with the Performa. Current Macs use the PCI bus.

8. RAM (main memory)

Pronounced ramm, acronym for random access memory, a type of computer memory that can
be accessed randomly; that is, any byte of memory can be accessed without touching the
preceding bytes. RAM is the most common type of memory found in computers and other
devices, such as printers.

There are two basic types of RAM:

• dynamic RAM (DRAM)


• static RAM (SRAM)

The two types differ in the technology they use to hold data, dynamic RAM being the more
common type. Dynamic RAM needs to be refreshed thousands of times per second. Static
RAM needs to be refreshed less often, which makes it faster; but it is also more expensive
than dynamic RAM. Both types of RAM are volatile, meaning that they lose their contents
when the power is turned off.

In common usage, the term RAM is synonymous with main memory, the memory available to
programs. For example, a computer with 8M RAM has approximately 8 million bytes of
memory that programs can use. In contrast, ROM (read-only memory ) refers to special
memory used to store programs that boot the computer and perform diagnostics. Most
personal computers have a small amount of ROM (a few thousand bytes). In fact, both types
of memory (ROM and RAM) allow random access. To be precise, therefore, RAM should be
referred to as read/write RAM and ROM as read-only RAM.

9. Storage - the disk drive

A disk drive is a machine that reads data from and writes data onto a disk. A disk drive
resembles a stereo turntable in that it rotates the disk very fast. It has one or more heads that
read and write data.

The disk consists of concentric rings called tracks. Also, the surface of the disk is cross-
sectioned into wedge-shaped sectors. The areas cross-referenced by the intersection of
tracks and sectors are called blocks.
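
As a purely hypothetical illustration (the geometry below is invented, not that of any real drive), the capacity of a disk is simply the product of the number of tracks, sectors per track, block size and recording surfaces:

    # Hypothetical geometry, purely to show how tracks, sectors and blocks relate to capacity.
    tracks_per_surface = 1024
    sectors_per_track = 63
    bytes_per_block = 512      # one block at each track/sector intersection
    surfaces = 4

    capacity = tracks_per_surface * sectors_per_track * bytes_per_block * surfaces
    print(capacity / (1024 ** 2), "MB")   # 126.0 MB for this made-up geometry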

There are different types of disk drives for different types of disks.

For example, a hard disk drive (HDD) reads and writes hard disks, and a floppy drive (FDD) accesses floppy disks. Hard disks are faster and can hold a lot more data than floppies. A magnetic disk drive reads magnetic disks, and an optical drive reads optical disks.

Disk drives can be either internal (housed within the computer) or external (housed in a separate box that connects to the computer). A PC will have at least one internal HDD with at least 1 Gigabyte of storage space.

10. Port

An interface on a computer to which you can connect a device. Personal computers have
various types of ports. Internally, there are several ports for connecting disk drives, display
screens, and keyboards. Externally, personal computers have ports for connecting modems,
printers, mice, and other peripheral devices.

Almost all personal computers come with a serial RS-232C port or RS-422 port for connecting
a modem or mouse and a parallel port for connecting a printer. On PCs, the parallel port is a
Centronics interface that uses a 25-pin connector. SCSI (Small Computer System Interface)
ports support higher transmission speeds than do conventional ports and enable you to attach
up to seven devices to the same port. All Apple Macintosh computers since the Macintosh
Plus have a SCSI port.

10.1 Serial Port

A port, or interface, that can be used for serial communication, in which only 1 bit is
transmitted at a time. Serial data transfer refers to transmitting data one bit at a time.
The opposite of serial is parallel, in which several bits are transmitted concurrently.
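
A simple way to picture the difference, using an arbitrary example byte: a serial port sends the bits one after another, while a parallel port presents them all at once on separate wires. (Illustration only - real ports add start/stop bits, handshaking and so on.)

    byte = 0b01001101
    serial_stream = [(byte >> i) & 1 for i in range(8)]   # one bit at a time, least significant first
    parallel_word = byte                                  # all eight bits presented together

    print(serial_stream)        # [1, 0, 1, 1, 0, 0, 1, 0]
    print(bin(parallel_word))   # 0b1001101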

Most serial ports on personal computers conform to the RS-232C or RS-422 standards. A serial port is a general-purpose interface that can be used for almost any type of device, including modems, mice, and printers (although most printers are connected to a parallel port).

10.2 Parallel Port

A parallel interface for connecting an external device such as a printer. Most personal
computers have both a parallel port and at least one serial port.

On PCs, the parallel port uses a 25-pin connector (type DB-25) and is used almost
exclusively to connect printers. It is often called a Centronics interface after the
company that designed the original standard for parallel communication between a
computer and printer. (The modern parallel interface is based on a design by Epson.)

A newer type of parallel port, which supports the same connectors as the Centronics
interface, is the EPP (Enhanced Parallel Port) or ECP (Extended Capabilities Port).
Both of these parallel ports support bi-directional communication and transfer rates
ten times as fast as the Centronics port.

Macintoshes have a SCSI port, which is parallel, but more flexible. SCSI ports are used for many types of communication in addition to connecting printers.

10.3 SCSI

Abbreviation of small computer system interface. Pronounced scuzzy, SCSI is a parallel interface standard used by Apple Macintosh computers, some PCs, and many UNIX systems for attaching peripheral devices to computers. All Apple Macintosh computers starting with the Macintosh Plus come with a SCSI port for attaching devices such as disk drives and printers.

SCSI interfaces provide for faster data transmission rates (up to 40 megabytes per
second) than standard serial and parallel ports. In addition, you can attach many
devices to a single SCSI port, so that SCSI is really an I/O bus rather than simply an
interface.

Although SCSI is an ANSI standard, there are many variations of it, so two SCSI
interfaces may be incompatible. For example, SCSI supports several types of
connectors.

While SCSI is the only standard interface for Macintoshes, PCs support a variety of
interfaces in addition to SCSI. These include IDE, enhanced IDE and ESDI for mass
storage devices, and Centronics for printers. You can, however, attach SCSI devices
to a PC by inserting a SCSI board in one of the expansion slots. Many high-end new
PCs come with SCSI built in. Note, however, that the lack of a single SCSI standard
means that some devices may not work with some SCSI boards.

The following varieties of SCSI are currently implemented:

• SCSI: Uses an 8-bit bus, and supports data rates of 4 MBps.
• Fast SCSI: Uses an 8-bit bus, and supports data rates of 10 MBps.
• Ultra SCSI: Uses an 8-bit bus, and supports data rates of 20 MBps.
• Fast Wide SCSI: Uses a 16-bit bus and supports data rates of 20 MBps.
• Ultra Wide SCSI: Uses a 16-bit bus and supports data rates of 40 MBps. Also called
SCSI-3.

APPLE COMPUTER

A personal computer company founded in 1976 by Steven Jobs and Steve Wozniak.
Throughout the history of personal computing, Apple has been one of the most innovative
influences. In fact, some analysts say that the entire evolution of the PC can be viewed as an
effort to catch up with the Apple Macintosh.

In addition to inventing new technologies, Apple also has often been the first to bring
sophisticated technologies to the personal computer.

Apple's innovations include:

• Graphical user interface (GUI). First introduced in 1983 on its Lisa computer. Many
components of the Macintosh GUI have become de facto standards and can be found
in other operating systems, such as Microsoft Windows.
• Color. The Apple II, introduced in 1977, was the first personal computer to offer color
monitors.
• Built-in networking. In 1985, Apple released a new version of the Macintosh with built-
in support for networking (LocalTalk).
• Plug & play expansion. In 1987, the Mac II introduced a new expansion bus called
NuBus that made it possible to add devices and configure them entirely with software.
• QuickTime. In 1991, Apple introduced QuickTime, a multi-platform standard for video,
sound, and other multimedia applications.
• Integrated television. In 1993, Apple released the Macintosh TV, the first personal
computer with built-in television and stereo CD.
• RISC. In 1994, Apple introduced the Power Mac, based on the PowerPC RISC
microprocessor.

1. Macintosh Computer

A popular model of computer made by Apple Computer. Introduced in 1984, the Macintosh
features a graphical user interface (GUI) that utilizes windows, icons, and a mouse to make it
relatively easy for novices to use the computer productively. Rather than learning a complex
set of commands, you need only point to a selection on a menu and click a mouse button.

Moreover, the GUI is embedded into the operating system. This means that all applications
that run on a Macintosh computer have a similar user interface. Once a user has become
familiar with one application, he or she can learn new applications relatively easily.

The Macintosh family of computers is not compatible with the IBM family of personal
computers. They have different microprocessors and different file formats. This can make it
difficult to share data between the two types of computers. Increasingly, however, many
software companies are producing Mac versions of their products that can read files produced
by a Windows version of the software, and vice versa.

Since the Macintosh interface's arrival on the marketplace and its enthusiastic acceptance by customers, numerous software producers have produced similar interfaces. For example, Microsoft offers a Mac-like GUI for PCs called Windows.

There are many different Macintosh models, with varying degrees of speed and power. All
models are available in many different configurations -- different monitors, disk drives, and
memory. All older Macintosh computers use a microprocessor from the Motorola 68000 family,
but in 1994 Apple switched to the PowerPC microprocessor. PowerMacs can also run
programs written for the Motorola processors.

1.1 PowerPC

A RISC-based computer architecture developed jointly by IBM, Apple Computer, and Motorola Corporation. The name is derived from IBM's name for the architecture, Performance Optimization With Enhanced RISC.

The first computers based on the PowerPC architecture were the Power Macs, which
appeared in 1994. Since then, other manufacturers, including IBM, have built PCs
based on the PowerPC. Although the initial reviews have been good, it remains to be seen whether this new architecture can eventually supplant, or even coexist with, the huge number of Intel-based computers in use and on the market.

There are already a number of different operating systems that run on PowerPC-
based computers, including the Macintosh operating system (System 7.5 and higher),
Windows NT, and OS/2.

Software

1. Data

Distinct pieces of information, usually formatted in a special way. All software is divided into
two general categories: data and programs. Programs are collections of instructions for
manipulating data.

Data can exist in a variety of forms -- as numbers or text on pieces of paper, as bits and bytes
stored in electronic memory, or as facts stored in a person's mind.

Strictly speaking, data is the plural of datum, a single piece of information. In practice,
however, people use data as both the singular and plural form of the word.

The term data is often used to distinguish binary machine-readable information from textual
human-readable information. For example, some applications make a distinction between data
files (files that contain binary data) and text files (files that contain ASCII data).

2. Instruction

A basic command. The term instruction is often used to describe the most rudimentary
programming commands. For example, a computer's instruction set is the list of all the basic
commands in the computer's machine language.

3. Program

An organized list of instructions that, when executed, causes the computer to behave in a
predetermined manner. Without programs, computers are useless.

A program is like a recipe. It contains a list of ingredients (called variables) and a list of
directions (called statements) that tell the computer what to do with the variables. The
variables can represent numeric data, text, or graphical images.
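
As a tiny, contrived illustration of the recipe idea (the names and numbers here are invented), a few variables and the statements that operate on them might look like this:

    servings = 4                    # a variable holding numeric data
    dish = "pancakes"               # a variable holding text
    flour_per_serving_g = 50

    total_flour = servings * flour_per_serving_g                   # a statement
    print("Use", total_flour, "g of flour for", servings, dish)    # another statement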

There are many programming languages -- C, C++, Pascal, BASIC, FORTRAN, COBOL, and
LISP are just a few. These are all high-level languages. One can also write programs in low-
level languages called assembly languages, although this is more difficult. Low-level
languages are closer to the language used by a computer, while high-level languages are
closer to human languages.

Eventually, every program must be translated into a machine language that the computer can
understand. This translation is performed by compilers, interpreters, and assemblers.

When you buy software, you normally buy an executable version of a program. This means
that the program is already in machine language -- it has already been compiled and
assembled and is ready to execute.

4. Software

Computer instructions or data. Anything that can be stored electronically is software. The
storage devices and display devices are hardware.

The terms software and hardware are used as both nouns and adjectives. For example, you
can say: "The problem lies in the software," meaning that there is a problem with the program
or data, not with the computer itself. You can also say: "It's a software problem."

The distinction between software and hardware is sometimes confusing because they are so
integrally linked. Clearly, when you purchase a program, you are buying software. But to buy
the software, you need to buy the disk (hardware) on which the software is recorded.

Software is often divided into two categories:

• systems software: Includes the operating system and all the utilities that enable the
computer to function.
• applications software: Includes programs that do real work for users. For example,
word processors, spreadsheets, and database management systems fall under the
category of applications software.

5. Operating System (OS)

The most important program that runs on a computer. Every general-purpose computer must
have an operating system to run other programs. Operating systems perform basic tasks, such
as recognizing input from the keyboard, sending output to the display screen, keeping track of
files and directories on the disk, and controlling peripheral devices such as disk drives and
printers.

For large systems, the operating system has even greater responsibilities and powers. It is like
a traffic cop -- it makes sure that different programs and users running at the same time do not
interfere with each other. The operating system is also responsible for security, ensuring that
unauthorized users do not access the system.

Operating systems can be classified as follows:

• Multi-user: Allows two or more users to run programs at the same time. Some
operating systems permit hundreds or even thousands of concurrent users.
• Multiprocessing: Supports running a program on more than one CPU.
• Multitasking: Allows more than one program to run concurrently.
• Multithreading: Allows different parts of a single program to run concurrently.
• Real-time: Responds to input instantly. General-purpose operating systems, such as
DOS and UNIX, are not real-time.

Operating systems provide a software platform on top of which other programs, called
application programs, can run. The application programs must be written to run on top of a
particular operating system. Your choice of operating system, therefore, determines to a great
extent the applications you can run. For PCs, the most popular operating systems are DOS,
OS/2, and Windows, but others are available, such as Xenix.

As a user, you normally interact with the operating system through a set of commands. For
example, the DOS operating system contains commands such as COPY and RENAME for
copying files and changing the names of files, respectively. The commands are accepted and
executed by a part of the operating system called the command processor or command line
interpreter. Graphical user interfaces allow you to enter commands by pointing and clicking at objects that appear on the screen.

5.1 Memory Management

The OS uses a memory map to keep applications and files from conflicting. When it runs an application, the OS looks at the memory map to determine where to place it, allocates enough RAM and copies the program into memory. The OS then handles requests to open, close and save files within an application.
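
The sketch below is a toy 'first-fit' memory map, greatly simplified compared with what a real operating system does, but it shows the basic idea of finding a free region large enough for an application and marking that space as used:

    # Toy memory map: place each application in the first free region big enough to hold it.
    free_regions = [(0, 4096), (8192, 16384)]   # (start address, size in bytes)

    def allocate(size):
        for i, (start, region_size) in enumerate(free_regions):
            if region_size >= size:
                # shrink the free region and hand back the allocated start address
                free_regions[i] = (start + size, region_size - size)
                return start
        raise MemoryError("no free region large enough")

    app_a = allocate(3000)
    app_b = allocate(6000)
    print(app_a, app_b, free_regions)   # 0 8192 [(3000, 1096), (14192, 10384)]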

6. System (7.5 and up)

On Macintoshes, System is short for System file, an essential program that runs whenever you
start up a Macintosh. The System provides information to all other applications that run on a
Macintosh. The System and Finder programs together make up the Macintosh operating
system.

7. Desktop

In graphical user interfaces, a desktop is the metaphor used to portray file systems. Such a
desktop consists of pictures, called icons, that show cabinets, files, folders, and various types
of documents (that is, letters, reports, pictures). You can arrange the icons on the electronic
desktop just as you can arrange real objects on a real desktop -- moving them around, putting
one on top of another, reshuffling them, and throwing them away.

8. Finder

The desktop management and file management system for Apple Macintosh computers. In
addition to managing files and disks, the Finder is responsible for managing the Clipboard and
Scrapbook and all desktop icons and windows.

9. File Management System

The system that an operating system or program uses to organize and keep track of files. For
example, a hierarchical file system is one that uses directories to organize files into a tree
structure. The OS creates and maintains a directory or file allocation table.

Although the operating system provides its own file management system, you can buy
separate file management systems. These systems interact smoothly with the operating
system but provide more features, such as improved backup procedures and stricter file
protection.

10. Clipboard

A special file or memory area (buffer) where data is stored temporarily before being copied to
another location. Many word processors, for example, use a clipboard for cutting and pasting.
When you cut a block of text, the word processor copies the block to the clipboard; when you
paste the block, the word processor copies it from the clipboard to its final destination. In Microsoft Windows and the Apple Macintosh operating system, the Clipboard (with a capital C) can be used to copy data from one application to another.

The Macintosh uses two types of clipboards. The one it calls the Clipboard can hold only one
item at a time and is flushed when you turn the computer off. The other, called the Scrapbook,
can hold several items at once and retains its contents from one working session to another.
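
As a toy illustration of the cut-and-paste mechanism (not how any real operating system implements it), a clipboard is just a buffer that holds whatever was last cut:

    # Toy clipboard: cut copies the selection into a buffer, paste copies it back out.
    clipboard = None
    document = ["The", "quick", "brown", "fox"]

    def cut(index):
        global clipboard
        clipboard = document.pop(index)

    def paste(index):
        document.insert(index, clipboard)

    cut(1)              # clipboard now holds "quick"
    paste(3)            # paste it back at the end
    print(document)     # ['The', 'brown', 'fox', 'quick']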

11. Application

A program or group of programs designed for end users. Software can be divided into two
general classes: systems software and applications software . Systems software consists of
low-level programs that interact with the computer at a very basic level. This includes operating
systems, compilers, and utilities for managing computer resources.

In contrast, applications software (also called end-user programs ) includes database


programs, word processors, and spreadsheets. Figuratively speaking, applications software
sits on top of systems software because it is unable to run without the operating system and
system utilities.

AE13 – Music Theory II

Music Theory – Intermediate Level

1. Major scales

1.1 Major scales with sharps; the concept of key

1.2 Major scales with flats

2. Natural minor scales and the concept of relative keys

2.1 Harmonic minor and melodic minor scales

3. Memorising key signatures

4. Intervals

4.1 Major intervals and perfect intervals

4.2 Minor intervals

4.3 Augmented and diminished intervals

5. Chords built from major scales

5.1 Primary chords in major keys

Musical Instrument and the Orchestra

1. Strings Section

2. Woodwind Section

3. The Brass Section

4. Transposing Instrument

4.1 Transposing Instruments in the Woodwind Section:

4.2 Transposing instruments in the brass section:

5. The Percussion Section

5.1 Pitched Percussion Instruments

5.2 Unpitched Percussion Instruments

History of Western Art Music

1. Middle Ages (450-1450)

1.1 Social background

1.2 Sacred Music of the Middle ages

1.3 Secular Music In The Middle Ages

2. Renaissance (1450-1600)

2.1 Characteristics of Renaissance Music

2.2 Sacred Music

2.3 Secular Music

3. Baroque (1600-1750)

3.1 Characteristics of Baroque music

4. Classical (1750-1820)

4.1 Characteristics of the classical style

4.2 Important composers of the classical period

5. Romantic (1820-1900)

5.1 Characteristics of romantic music

5.2 Important composers of the romantic period

6. Twentieth century

6.1 Characteristic of twentieth century music

6.2 Important composers of the twentieth century

AE13 – MUSIC THEORY II

Music Theory – Intermediate Level

1. Major scales

A major scale consists of eight notes covering one octave, and follows the pattern of tones and
semitones illustrated below. A major scale can begin on any pitch, but you can hear the pattern
by playing the white notes of the piano keyboard from C to C an octave higher.

If you examine the major scale closely, you can see that the pattern of tones and semitones
between the first four notes is exactly the same as the pattern of tones and semitones between
the last four notes-that is tone-tone-semitone*.

* A four-note sequence following this pattern is sometimes described as a 'major tetrachord'.

We can therefore build a series of major scales by adding successive four-note sequences.
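
The tone/semitone pattern can be treated as a simple procedure. The sketch below builds major scales from the pattern T T S T T T S; it names notes from a single chromatic list, so some results appear as enharmonic equivalents (A# rather than Bb, for example):

    # Build a major scale from the pattern tone-tone-semitone-tone-tone-tone-semitone.
    CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]     # tone = 2 semitones, semitone = 1

    def major_scale(tonic):
        index = CHROMATIC.index(tonic)
        scale = [tonic]
        for step in MAJOR_STEPS:
            index = (index + step) % 12
            scale.append(CHROMATIC[index])
        return scale

    print(major_scale("C"))   # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
    print(major_scale("G"))   # ['G', 'A', 'B', 'C', 'D', 'E', 'F#', 'G']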

1.1 Major scales with sharps; the concept of key

If you look closely at the previous example you will notice some patterns.

If we number the notes of a scale 1,2,3,4,5,6,7,8 then the new scale always starts on
note 5 of the old scale (e.g. C major is followed by G major-and G is the fifth letter in a
musical sequence which begins on C-C D E F G).

Each new scale contains one more sharp than the previous scale.

Each new sharp is also five notes away from the previous sharp. For example, G
major scale has one sharp (F#) while the next major scale (D major) has an
additional sharp (C#) which is five away from F#.

If a piece of music is based around the notes of a particular major scale we say that
the music is in that key. For instance, if a melody is based around the notes from the
D major scale, we say that the melody is in the key of D major, and that D note is the
tonic. Since in the key of D major the F note and the C note will always be sharpened,
we can simplify our notation by writing a key signature.

A key signature

The key signature is a grouping of the sharps or flats. The key signature is displayed
at the beginning of each staff, and indicates which notes are to be sharpened each
time they are played. This saves us the trouble of having to write a sharp against the
note whenever it occurs within the music.

It is possible that the music might also be in a related minor key-this is discussed
later.

Below is a list of major scales with sharps, written out with appropriate key signatures.

1.2 Major scales with flats

If you recall our discussion of enharmonic notes you will remember that the note F# is
exactly the same as the note Gb. It is therefore possible to think of an F# major scale
as being a Gb major scale instead. If we make this enharmonic change, then when we
continue on with our tetrachord-adding process we arrive at a series of keys which
require a progressively decreasing number of flats-rather than an ever increasing
number of sharps and even double-sharps!

As is the case with 'sharp' keys, it is possible to use a key signature with 'flat' keys.
Below is a list of major scales with flats, written out with appropriate key signatures.

2. Natural minor scales and the concept of relative keys

A natural minor scale consists of eight notes covering one octave, and follows the pattern of tones and semitones illustrated below. A natural minor scale can start from any pitch, but you can hear the pattern by playing the white notes of the piano keyboard from A to A an octave higher.

If we examine any given major scale, we will discover that there is a natural minor scale which
shares exactly the same pattern of sharps or flats. For instance G major scale (with one
sharp-F#) contains the same notes as E natural minor scale. The only difference between the
scales is that G major scale runs from G to G and E minor scale runs from E to E.

Major and minor keys which share the same key signature are said to be relative to each
other. Therefore, in the case of the G major and E minor keys mentioned above, E minor may
be described as the relative minor of G major-and in turn G major may be described as the
relative major of E minor.

Below is list of major scales alongside the corresponding relative minor scale. Note that the
relative minor scale starts on the sixth note of the major scale, and the major scale starts on
the third note of the relative minor scale.

2.1 Harmonic minor and melodic minor scales

There are some other minor scales which are you should be aware of.

The harmonic minor scale is a natural minor scale with a raised or sharpened 7th
note. Example 5.9 illustrates the sound of the natural minor scale followed by the
harmonic minor.

In European "art music" history this scale resulted from the desire to provide a
stronger sounding cadence (the chord progression at the end of a section of music). A
very brief history is as follows.

In early music, a melody in a minor mode would often end with the notes 8 7 8 from
the minor scale, and this melody would be harmonised as in the example below.

Over time, composers began to prefer the stronger harmonic flavour associated with
the use of a sharpened 7th note, as illustrated in the next example.

The result is the so-called harmonic minor scale, which uses the raised 7th. You
should note, however, that in popular music (as in early classical music) it is very
common to find melodies which use both the ordinary 7th and the sharpened
7th-thereby creating melodic interest and variety. Below is a typical illustration.

The harmonic minor scale is also a common element in folk music from various
regions of the world; and a melody based entirely on the harmonic minor can often
seem to have a "world music" quality.

There are two forms of melodic minor scale. The first is the "classical" melodic minor, which has a different form when ascending as compared to descending. Even though this scale is hardly ever used in popular music, you will often encounter references to it and should be aware of its origins. This scale emerged as a consequence of raising the 7th note at cadences. To avoid the somewhat awkward leap (for singers) created when a melody moves from the minor 6th note to the raised (major) 7th note, composers began to raise the 6th note as well.

When the melody moved down rather than up, the leaping problem was avoided by using the 7th and 6th from the natural minor scale. As a result the complete "classical" melodic minor is as below (i.e. with a raised 6th and 7th ascending, and the "normal" 6th and 7th descending).

3. Memorising key signatures

It should be realised that a knowledge of key signatures is an indispensable aid in working with
a range of musical elements such as intervals, scales and chords-from interval recognition to
scale and chord construction. You should therefore memorise this information as quickly as
possible.

Please don't fool yourself about how well you know key signatures, even if (particularly if!) you already have some knowledge in this area. You must have instant recall of the pattern of sharps/flats in any major or minor key (at least to five sharps and flats at this stage), and be able to write these sharps and flats in the correct position on the staff. I know from long teaching experience that the lack of this knowledge provides a serious and frustrating impediment to satisfying progress.

A list of key signatures together with the relevant major and minor key is provided below. Here
are a few summarising tips which may help to make the process of memorisation a little easier,
but remember you should ultimately develop instant recall, rather than resorting to your notes
or a calculator to work things out!

Tip 1:

When dealing with keys with sharps think in fives.

If we begin with C major (or A minor)-which have no sharps, subsequent 'sharp' keys are five letter names away from the preceding key. For example C (no sharps) is followed by G major (one sharp)-and G is five letter names away from C in the musical alphabet--C D E F G. Similarly A minor (no sharps) is followed by E minor (one sharp) and E is five letter names away from A in the musical alphabet--A B C D E.

In addition, after the first sharp to be used in a key signature (F#), subsequent sharps are five letters away (i.e. F# is joined by C#, then G#, D# etc.).

You might also find a saying helpful in remembering the order of sharps in the key signature (e.g. Father Christmas Gets Drunk At Every Ball).

Tip 2:

When dealing with keys with flats think in fours.

If we begin with C major (or A minor)-which have no flats, subsequent 'flat' keys are four letter
names away from the preceding key (e.g. C major has no flats; F major has one flat and F is
four letter names away from C in the musical alphabet--C D E F).

Similarly, after the first flat to be used in a key signature (Bb), subsequent flats are four letters
away (i.e. Bb is joined by Eb then Ab etc.).
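
Tips 1 and 2 can be turned into a small procedure: step five letter names at a time for the sharps and four at a time for the flats. (The letters printed stand for F#, C#, G# ... and Bb, Eb, Ab ... respectively.)

    # Generate the order of sharps (think in fives) and flats (think in fours).
    LETTERS = ["A", "B", "C", "D", "E", "F", "G"]

    def step(letter, interval):
        # move 'interval' letter names up, counting the starting letter as 1
        return LETTERS[(LETTERS.index(letter) + interval - 1) % 7]

    sharps = ["F"]
    while len(sharps) < 7:
        sharps.append(step(sharps[-1], 5))     # each new sharp is five letters away

    flats = ["B"]
    while len(flats) < 7:
        flats.append(step(flats[-1], 4))       # each new flat is four letters away

    print(sharps)   # ['F', 'C', 'G', 'D', 'A', 'E', 'B']
    print(flats)    # ['B', 'E', 'A', 'D', 'G', 'C', 'F']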

Tip 3:

The relative minor is three letter names below the major (e.g. C major: count down C, B, A, giving A minor).

Be aware that (as with tips 1 and 2) there will be times when the letter name alone is not enough (e.g. A major is the relative major of F# minor, not F minor) and so there is no substitute for understanding the concepts and then carefully memorising the specific information.

4. Intervals

This chapter examines the way musical intervals are labelled, notated and identified. It
also explores the relationship between intervals and the overtone series, and considers
the distinctive qualities associated with particular intervals, and the sound of various
intervals in two-part harmony. There are numerous written exercises provided at the end
of the chapter, and you should make sure that you are totally comfortable with relevant
terminology and principles of notation which are covered in this chapter. This material
may seem a little 'dry' and technical, but if you build a solid foundation of knowledge in
relation to 'language' of music theory at this stage, you can avoid the frustration of having
to continually refer back to this material when you study aspects of melody, chord
construction, chord-scale relationships etc.

A musical interval is the pitch distance between two notes. A melodic interval is the distance between two adjacent melody notes, while a harmonic interval is the distance between two notes which are played simultaneously (i.e. as 'harmony').

Intervals are described with reference to two aspects---distance and quality. Interval
distance is expressed as a number by counting the total number of letter names
encompassed by the two notes (including the notes themselves).

Therefore the interval between the notes C and F would be numbered as four- C(1), D(2),
E(3), F(4). In interval terminology this is called a fourth.

Similarly the interval between G and B is described as a third, while the interval between
F and D is described as a sixth.

The same numbering process also applies when describing downward intervals. For
example, since G and C below encompass five letter names, the interval from G to C
below is described as a fifth.

The idea of interval quality relates to the particular sound of the interval, which in turn relates to the actual distance between the notes. For example the interval between C and E is numbered as a third, as is the interval between C and Eb. However, the distance between C and E (which we call a 'major' interval) is four semitones, whereas the distance between C and Eb (which we call a 'minor' interval) is only three semitones, and most listeners will describe C-E as having a brighter sound than C-Eb. The various interval qualities are described below.
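
The two-step rule - count letter names for the number, count semitones for the quality - can be sketched as follows (simplified here to thirds only):

    LETTERS = "CDEFGAB"
    SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

    def pitch_class(note):
        # semitone value of a note name such as 'E' or 'Eb'
        return SEMITONES[note[0]] + note.count("#") - note.count("b")

    def describe(lower, upper):
        # interval number: count letter names inclusively, e.g. C(1) D(2) E(3) -> a third
        number = (LETTERS.index(upper[0]) - LETTERS.index(lower[0])) % 7 + 1
        semis = pitch_class(upper) - pitch_class(lower)
        quality = {3: "minor", 4: "major"}.get(semis, "other") if number == 3 else "?"
        return number, semis, quality

    print(describe("C", "E"))    # (3, 4, 'major') - four semitones
    print(describe("C", "Eb"))   # (3, 3, 'minor') - three semitones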

4.1 Major intervals and perfect intervals

We will return to the major scale as an interval reference point. If we consider the
tonic note as a fixed bottom note, then a different interval is created as we add each
of the notes of the major scale to the tonic. As you might expect, many of these
intervals (the second, third, sixth and seventh) are described as major intervals, since
they come from the major scale. However, there are several intervals (the unison,
fourth, fifth and octave), which have become known as perfect intervals.

The origin of the term perfect interval goes back to the ancient Greeks, who are thought to have discovered that these intervals embody the most 'perfect' mathematical relationships. For example, there is a 1:1 relationship between the tonic and its unison (i.e. the same note); a 1:2 relationship between octave and tonic; a 2:3 relationship between fifth and tonic, and a 3:4 relationship between fourth and tonic. Also, if you re-examine the harmonic series, you will notice that the octave, fifth and fourth intervals are the first naturally-occurring intervals within the series.
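
Applying those ratios to an arbitrary tonic of 440 Hz gives the following frequencies (modern equal temperament alters the fourth and fifth very slightly from these pure ratios):

    tonic = 440.0
    print("unison:", tonic * 1 / 1)    # 440.0 Hz (1:1)
    print("fourth:", tonic * 4 / 3)    # about 586.7 Hz (3:4)
    print("fifth :", tonic * 3 / 2)    # 660.0 Hz (2:3)
    print("octave:", tonic * 2 / 1)    # 880.0 Hz (1:2)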

Harmonic series based on C.

Because of their mathematical perfection, the ancient Greeks favoured perfect intervals, and when early Christian church music (which drew extensively from Greek theory) began to use some simple harmony, perfect intervals were preferred. Even now, the sound of a series of perfect intervals can seem suggestive of an earlier period.

4.2 Minor intervals

When a major interval is made smaller by one semitone it becomes a minor interval.
Therefore it is possible to create minor second, minor third, minor sixth and minor
seventh intervals by lowering the appropriate note from the major scale.

As the name suggests, the minor third, minor sixth and minor seventh interval all
occur within the natural minor scale (together with the perfect fourth, fifth and octave
which are part of both the major and natural minor scale), and so you will be able to
use your knowledge of major and minor scales to assist in labelling intervals.

There is, however, one inconsistency you should note at this stage. In both major and
natural minor scales the second note is a major second interval from the tonic.

As well as being used to identify musical intervals, terms such as perfect fourth, major third, minor sixth etc. are also often used to label notes in reference to a given tonic. For example, in relation to a D tonic (or key centre) you will often see a B note described as a major sixth, or an F note labelled as a minor third.

4.3 Augmented and diminished intervals

An augmented interval is one semitone larger than a perfect interval or a major interval.

*This may seem a bit technical, but bear with me. Although the note E# is the same pitch as an F, if it was written as an F the interval would then be numbered as a (minor) third rather than an augmented second. The rule is 'Number first-then quality'.

A diminished interval is one semitone smaller than a perfect interval or a minor interval.

When memorising the principles which govern how we label musical intervals, you
should note the meaning of the labels themselves. We have already examined the
logic of the term 'perfect' intervals, and it is no surprise that 'major' intervals should be
larger than 'minor' intervals.

Since to 'augment' something is to make it even larger in some way, it seems logical
that augmented intervals should be expanded major or perfect intervals. Similarly,
since to 'diminish' something is to make it smaller, it seems logical that diminished
intervals should be compressed minor or perfect intervals.

TO AUGMENT OR TO DIMINISH? OR “WHEN IS A MINOR SIXTH AN AUGMENTED FIFTH?” ETC.

When you are dealing with the notation of intervals things can get a little confusing sometimes. For instance we noted above how an augmented second will sound exactly the same as a minor third. If you examine the example below you can also see that an augmented fourth is exactly the same as a diminished fifth, and that an augmented fifth is the same as a minor sixth etc.

There are a number of factors (too many to discuss at this point) which may influence
the way an interval is notated, but a general 'rule of thumb' is that notation should
follow musical logic where possible.

Let me give you a simple example by way of illustration. In (a) below, the top note is
moving upwards-creating changing harmonic intervals as it does so. Since the sound
is that of expanding intervals, the middle interval is best written as an augmented fifth.
On the other hand, example (b) sees the major sixth being decreased by a semitone,
so it is best to write the second interval as a minor sixth.

5. Chords built from major scales

In the same way that it is possible to create a series of major and minor third intervals
from the notes of the major scale, it is also possible to create a series of major and minor
triads. This pattern of triads-major; minor; minor; major; major; minor; diminished will be
the same no mater what major key is used.

You will have noticed the reference to a diminished triad. A diminished triad is like a
minor triad with a lowered (diminished) fifth-or you can think of it as two superimposed
minor thirds.

The diminished triad is rarely used in popular music, however-so for the moment we will
ignore it. We will revisit the diminished sound at a later point when discussing seventh
chords.

When dealing with harmony it is common practice to label the chords built from a major scale with Roman numerals. This enables us to talk about types of chords and chord progressions etc. without reference to a particular key. A large Roman numeral is used to indicate a major chord, while a small Roman numeral is used to indicate a minor chord. A small Roman numeral together with a small circle indicates a diminished chord. The triads built from the C major scale are illustrated below with both Arabic and Roman names.
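
The pattern can be generated mechanically by stacking thirds on each degree of the scale. The sketch below does this for C major and attaches the Roman numerals:

    # Triads built on each degree of C major, labelled with Roman numerals.
    SCALE = ["C", "D", "E", "F", "G", "A", "B"]
    NUMERALS = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]

    for degree in range(7):
        # stack two thirds: root, third and fifth taken from the scale
        triad = [SCALE[(degree + offset) % 7] for offset in (0, 2, 4)]
        print(NUMERALS[degree], "-".join(triad))
    # I C-E-G, ii D-F-A, iii E-G-B, IV F-A-C, V G-B-D, vi A-C-E, vii° B-D-F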

5.1 Primary chords in major keys

The three most important chords in a major key are those built on the first, fourth and fifth scale degrees (i.e. the chords we label I, IV, and V), and these are described as primary chords.

I will give you a brief description of each of the primary chords and discuss some
common chord progressions (a chord progression is the movement from one chord to
another; and may consist of a single movement or a series of movements). Of course,
it is difficult to convey a sense of a chord in words, but these descriptions (and the
recorded examples) should give you some idea of the typical sound and function of
these important chords.

The I chord is called the tonic chord. It is considered to be the most important chord in
the key, and normally has a sense of repose or finality. The tonic chord may move to
any other chord but has no strong tendency to move in a particular direction. After a
piece of music moves away from the tonic, there is a type of "harmonic gravity" which
results in a return to this chord at some point. Most pieces of music begin and end on
the tonic chord.

The V chord is the next most important chord in a given key, and is known as the
dominant chord. The dominant chord usually has a strong tendency to move back to
the tonic-so when V moves to I it is said to be performing a dominant function. This
function is further enhanced by the addition of a seventh note-creating a dominant
seventh chord (this chord is discussed in more detail later in this chapter). The V-I
chord movement is known in classical music theory as perfect cadence or final
cadence. A cadence is a "final" chord progression-typically found at the end of a
melodic phrase etc. The V-I progression is so predictable that when a V chord moves
to a chord other than I (usually vi) the cadence is described as a deceptive cadence.
Example 6.3 illustrates two I-V-I progressions in different keys (and with different
melodies and feels), followed by one deceptive cadence (I-V-vi).

The IV chord is known as the subdominant chord (i.e. "below the dominant") and is
used in a variety of chord progressions. One common progression involves an upward
step movement to V (i.e. IV-V)-this will often occur as part of a IV-V-I movement.
Example 6.4 illustrates two I-IV-V-I progressions.

The reverse IV V progression (i.e. V-IV) is used quite often in popular music. The
effect of V-IV movement is to release the tension associated with the V chord (which
wants to move to I). If the IV-V or V-IV is repeated the effect is to prolong the tension
by delaying the ultimate resolution to the I chord.

The IV chord can also move down to the I chord, creating a plagal cadence (IV I)-a
sound commonly associated with the "amen" of a Christian church hymn.

In longer chord progressions the IV chord is often used in association with a melody which seems to move away from "home base". The example below illustrates a typical longer progression involving primary triads (I've notated a simple version of the melody below). Note how the I V I sequence at the beginning establishes the sense of the key, but nothing very interesting happens with the melody. When we reach the IV chord, you will hear a typical leaping melodic hook which has a sense of taking the melody away from home base, before the I V I at the end re-establishes the tonic and the melody falls again.

Musical Instrument and the Orchestra

A typical symphony orchestra is made up of 4 main sections, namely

1. Strings;
2. Woodwind;
3. Brass;
4. Percussion

1. Strings Section

The strings section consists mainly of Violin, Viola, Cello and Double Bass. They are the most important instruments in the orchestra and are playing most of the time. The violins are further divided into the First and Second violin sections, each of which could be playing different notes in harmony or in unison.

String instruments produce sound by the vibration of strings that are either bowed or plucked (as in pizzicato). The designs of the 4 string instruments are quite similar apart from their sizes. They all have 4 strings stretched across a fretless fingerboard. The vibrations of the strings are transmitted to the sound box (the body) via the bridge. The bridge is a small thin piece of wood supporting all the 4 strings on top of the body. A musician plays a different pitch by changing the position of his fingers on the fingerboard while bowing. A vibrato effect can be achieved by vibrating the finger touching a string.

It is to be noted here that when a string vibrates, it fluctuates in a complex manner that produces its fundamental frequency as well as the harmonics that give the instrument its characteristic sound.

(Figure: the fundamental, 2nd, 3rd and 4th harmonics of a vibrating string, and the resulting combined waveform.)

If we were to note down the frequencies of the fundamental and each of the harmonics, we will
realize that they form the Harmonic Series.
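
For example, taking a string whose fundamental happens to be 110 Hz (any fundamental gives the same ratios), the harmonics are simply whole-number multiples:

    fundamental = 110.0   # an arbitrary example fundamental
    for n in range(1, 6):
        print(f"harmonic {n}: {fundamental * n:.0f} Hz")
    # harmonic 1: 110 Hz, 2: 220 Hz, 3: 330 Hz, 4: 440 Hz, 5: 550 Hz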

2. Woodwind Section

The Woodwind section consists mainly of flute, oboe (and Cor Anglais), clarinet and bassoon. In more modern orchestral settings, the saxophone is added and it belongs to the woodwind section. The saxophone has a similar mouthpiece and fingering pattern to the clarinet. The two instruments are thus interchangeable.

213
SCHOOL OF AUDIO ENGINEERING AE13 – Music Theory II

Student Notes

The vibration of a column of air within a tube is the principle of sound production of a woodwind instrument. The length of the tube determines the lowest note the instrument can play. To be able to play notes of different pitches, holes are punched at carefully calculated positions along the length of the tube. Different pitches can then be achieved by covering or uncovering the holes. A key mechanism is part of the instrument to aid playing. It allows a long tube to be played without having to move or stretch fingers across the length of the instrument. It also allows fast closing and opening of the holes, which makes the playing of fast passages possible.

(Figure: the bore of a woodwind instrument showing one cycle of the fundamental pitch; when a hole is opened, the fundamental waveform is shortened and thus produces a higher pitch.)

The flute is the only instrument in the woodwind section which does not use a reed. When a flute player blows into the mouthpiece, air rushes through the hole of the mouthpiece and initiates vibrations of the air within the body of the flute. There is a shorter version of the flute called the piccolo which is half the size and plays an octave higher.

Apart from flute and piccolo, the rest of the woodwind family is divided into the Single reed and
Double reed categories. A reed is a thin piece of bamboo attached to the mouthpiece of the
instrument. The sound of a reed instrument comes from vibration of the reed caused by
blowing.

The clarinet is a single reed woodwind instrument (so is the saxophone). It has a mouthpiece with a reed attached. A clarinet player can control the tone of the instrument by forcing his lower lip against the reed.

(Figure: a single reed mouthpiece, showing the mouthpiece, reed, screws and the direction of air flow.)

The oboe and bassoon are double reed instruments. Their mouthpieces are made of two pieces of bamboo reed tied together. The effect of the double reed when vibrating is a more pipe-like sound as compared to that of a single reed instrument. The oboe is a treble instrument of the double reed type. It has a lower-pitched counterpart called the Cor Anglais (or English Horn) which is longer and deeper in tone.

(Figure: a double reed mouthpiece, made of two reeds tied together.)

The bassoon is the bass instrument of the woodwind family. The total length of its air column is about 2 meters. It has a rather comical sound character and is therefore often used for humorous sections of the music.


3. The Brass Section

The brass section consists mainly of trumpet, trombone, French horn and tuba.

The sound of all brass instruments is produced by the vibration of the player’s lips pressing against the mouthpiece. The vibration is transmitted into the column of air within the instrument, which is then amplified at the bell-shaped end.

Different pitches can be achieved by two methods. Firstly, a brass player can play different pitches by changing the strength of his breath and the shape of his lips. With a fixed length of brass tubing, a skilful player can produce the notes of the Harmonic Series. If we examine the Harmonic Series, we find that the notes get closer together towards the higher end of the series. That explains why Baroque trumpets always play in a high register, as this was the only way melodies could be played with the technology available at the time.

Secondly, different pitches can also be achieved by changing the length of the tube. Instead of drilling holes along the length of the instrument (as in the woodwind family), brass instruments produce different notes by changing the physical length of the tube by various means. The trumpet, French horn and tuba employ valve-and-piston systems to redirect air into tubes of different lengths. Combined with the player’s alteration of breath and lips, a relatively wide range of notes can be played.

The trombone employs a slide mechanism to change the length of its tube. This mechanism
allows the player to produce ‘glissando’ effects.

Among the brass section, the trumpet and the French horn are transposing instruments.


4. Transposing Instrument

Woodwind instrument makers have to find the range of notes at which an instrument sounds best. Therefore, there is a need to strike a balance between the length of the instrument and its tone colour. The result is that some instruments play at a different pitch from others.

One such instrument in the woodwind section is the clarinet. Clarinets come in different sizes – shorter ones for higher pitches and longer ones for lower pitches. Different sizes of clarinet are used for different music, and sometimes for different sections of the same piece.

Although musicians could play clarinets of different lengths all at the same pitch by changing their fingering patterns, it would be difficult for them to adopt a different playing style whenever they change instrument. A better solution is to write the music score in a different key while keeping the musician’s fingering pattern consistent.

The result is the so-called transposing instrument. A transposing instrument plays at a different pitch with a standardized fingering pattern. Within an orchestra, the player reads a score in a different key from his fellow players.

The amount of transposition of an instrument is indicated by a note name. The name of the
note is the note produced when the player plays a C on the score. For example, a ‘clarinet in B
flat’ will produce a B flat when the player plays a C (with the standard fingering) on the score.

Since B flat is a major second (two semitones) lower than C, this particular clarinet will always sound a major second lower than the key written on the score. Therefore, in order for the clarinet in B flat to play along with the rest of the orchestra, its score has to be written a major second HIGHER to compensate for its lower pitch. Thus, a little calculation is needed when writing a score for a transposing instrument (see the sketch below). For example, suppose an orchestra is playing a piece in G Major. The music scores for instruments such as violin, piano, trombone and other non-transposing instruments are written in G Major. However, for a clarinet in B flat to take part in this performance, the clarinet player has to read a score that is a major second higher, i.e. in A Major. As the player reads the A Major score, his clarinet sounds a major second lower and thus produces a tune in G Major, which is the desired key of the music being played by the whole orchestra.
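A minimal sketch of this calculation (the helper below and its note spellings are my own, not part of these notes): the written key for a transposing instrument is the concert key shifted upwards by the interval by which the instrument sounds low.

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def written_key(concert_key, semitones_sounding_below):
    """Return the key to write for an instrument that sounds the given number of
    semitones below written pitch (e.g. 2 for a clarinet in B flat)."""
    index = NOTE_NAMES.index(concert_key)
    return NOTE_NAMES[(index + semitones_sounding_below) % 12]

print(written_key("G", 2))   # 'A': a clarinet in B flat part for a piece in G major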

4.1 Transposing Instruments in the Woodwind Section:

A. Clarinet in Bb - notes written a major 2nd above

B. Clarinet in A - notes written a minor 3rd above

C. Clarinet in Eb - notes written a minor 3rd below

D. Bass clarinet in Bb - notes written a major 9th above

E. Cor anglais (in F) - notes written a perfect 5th above

F. Alto (or bass) flute in G - notes written a perfect 4th above the sounding notes.

Occasionally used:


G. Saxophone in Bb (soprano) - notes written a major 2nd above

H. Saxophone in Eb (alto) - notes written a major 6th above

I. Saxophone in Bb (tenor) - notes written a major 9th above

J. Saxophone in Eb (baritone) - notes written an 8ve plus major 6th above

4.2 Transposing instruments in the brass section:

A. French horn in F - notes written a perfect 5th above

B. Horn in E - notes written a minor 6th above

C. Horn in D - notes written a minor 7th above

D. Trumpet in Bb - notes written a major 2nd above the sounding notes.

E. Trumpet in D - notes written a major 2nd below

F. Cornet in Bb - notes written a major 2nd above

G. Cornet in Eb - notes written a minor 3rd below

5. The Percussion Section

The Percussion Section consists of a wide range of instruments that can be divided into two groups – Pitched and Unpitched Percussion.

Pitched percussion instruments are those that have a distinctive pitch, such as the Celesta, Glockenspiel, Vibraphone, Tubular Bells, etc.

Unpitched percussion instruments, such as cymbals, snare drums (side drums), bass drums, gongs, etc., generate overtones that are not harmonically related and that overshadow any sense of pitch. They are therefore perceived to have no distinctive pitch.

5.1 Pitched Percussion Instruments


5.2 Unpitched Percussion Instruments

[Images: percussion instruments, including a timpani.]


History of Western Art Music

Musical styles change from one era in history to the next. These changes are continuous,
and so any boundary line between one style period and the next can be only an
approximation.

The history of western art music can be divided into the following stylistic periods:

Middle Ages (450-1450)

Renaissance (1450-1600)

Baroque (1600-1750)

Classical (1750-1820)

Romantic (1820-1900)

Twentieth century

We know that music played an important role in the cultures of ancient Israel, Greece,
and Rome. But hardly any notated music has survived from these ancient civilizations.
The first style period to be considered is the European Middle Ages, from which notated
music has come down to us.

1. Middle Ages (450-1450)

1.1 Social background

A thousand years of European history are spanned by the phrase “Middle Ages”, beginning around 450 AD with the disintegration of the Roman Empire. The era opened with the “dark ages”, a time of migrations, upheavals, and wars.

The Roman Catholic church had great influence on all segments of society. Most important musicians were priests and worked for the church; writing music and singing became important occupations in the church. Given the church’s preeminence over the centuries, it is no surprise that only sacred music was notated.

Most medieval music was vocal, though a wide variety of instruments served as accompaniment. After about 1100, the organ became the most prominent instrument in the church.

1.2 Sacred Music of the Middle ages

Gregorian Chant

For over 1000 years, the official music of the Roman Catholic church has been Gregorian Chant, which consists of melodies set to sacred Latin texts and sung without accompaniment. It is monophonic in texture. The melodies of Gregorian Chant were meant to enhance parts of religious services.


Gregorian Chant is named after Pope Gregory I (the Great), who reorganized the
Catholic liturgy during his reign from 590 to 604. Although medieval legend credits
Pope Gregory with the creation of Gregorian Chant, we know it evolved over many
centuries. Some of its practices came from the Jewish synagogue of the first century
after Christ. Most of the several thousand melodies known today were created
between 600 and 1300 AD.

At first Gregorian melodies were passed along by oral tradition, but as the number of chants grew to the thousands, they were notated to ensure musical uniformity throughout the western church.

1.3 Secular Music In The Middle Ages

The first large body of secular songs that survives in decipherable notation was
composed during the 12th and 13th centuries by French nobles called troubadours and
trouvères. Most of the songs deal with love and were preserved because nobles had
clerics write them down.

The Development Of Polyphony: Organum

For centuries, western music was basically monophonic, having a single melodic line. But sometime between 700 and 900, the first steps were taken in a revolution that eventually transformed western music: monks in monastery choirs began to add a second melodic line to Gregorian Chant.

Medieval music that consists of Gregorian Chant and one or more additional melodic
lines is called organum.

School Of Notre Dame: Measured Rhythm

After 1150, Paris became the center of polyphonic music. The University of Paris
attracted leading scholars, and the Cathedral of Notre Dame was the supreme
monument of gothic architecture.

Two successive choirmasters of Notre Dame, Leonin and Perotin, are the first notable
composers known by name. They and their followers are referred to as the school of
Notre Dame.

From about 1170 to 1200, the Notre Dame composers developed rhythmic innovations. The music of Leonin and Perotin used measured rhythm, with definite time values and a clearly defined meter. For the first time in music history, notation indicated precise rhythms as well as pitches.

2. Renaissance (1450-1600)

The 15th and 16th centuries in Europe have come to be known as the Renaissance. People
then spoke of “rebirth” or “renaissance” of human creativity. It was a period of exploration and
adventure.

During the Renaissance, the dominant intellectual movement, which was called humanism,
focused on human life and its accomplishments.


The Catholic church was far less powerful in the Renaissance than during the Middle Ages, for the unity of Christendom was shattered by the Protestant Reformation led by Martin Luther (1483-1546).

The invention of printing with movable type (around 1450) accelerated the spread of learning.

2.1 Characteristics of Renaissance Music

In the Renaissance, as in the Middle Ages, vocal music was more important than
instrumental music.

The humanistic interest in language influenced vocal music, creating a close relationship between words and music.

The word painting technique was used in music.

While there is a wide range of emotion in Renaissance music, it is usually expressed in a moderate, balanced way, with no extreme contrasts of dynamics, tone color and rhythm.

The texture of Renaissance music is chiefly polyphonic. A typical choral piece has
four, five or six voice parts of nearly equal melodic interest.

Rhythm is gentler, and each melodic line has great rhythmic independence.

2.2 Sacred Music

The two main forms of sacred music in the Renaissance are the motet and the mass. They are alike in style, but a mass is a longer composition.

2.3 Secular Music

Secular vocal music became increasingly popular. Music was set to poems in various
languages throughout Europe.

Renaissance secular music was written for groups of solo voices and for solo voice
with the accompaniment of one or more instruments.

An important kind of secular vocal music during the Renaissance was the madrigal, a piece for several solo voices set to a short poem, usually about love. A madrigal, like a motet, combines homophonic and polyphonic textures, but it more often uses word painting and unusual harmonies.

3. Baroque (1600-1750)

During the Baroque period, the ruling class was enormously rich and powerful. While most of
the population barely managed to survive, European rulers surrounded themselves with
luxury.

3.1 Characteristics of Baroque music


Unity of mood – a baroque piece usually expresses one basic mood throughout, which is conveyed by continuity of rhythm.

Baroque melody creates a feeling of continuity: an opening melody will be heard again and again in the course of a baroque piece. Dynamic level shows a parallel continuity; the volume seems to stay constant for a stretch of time.

We’ve noted that late baroque music is predominantly polyphonic in texture: two or more melodic lines compete for the listener’s attention. Chords also became increasingly important and more significant in themselves.

Important composers of the baroque period

Antonio Vivaldi (1678-1741)

Johann Sebastian Bach (1685-1750)

George Frideric Handel (1685-1759)

4. Classical (1750-1820)

During the baroque era, scientific methods and the discoveries of geniuses like Galileo and Newton vastly changed people’s view of the world. By the middle of the 18th century, faith in the power of reason was so great that it began to undermine the authority of the social and religious establishment.

Revolutions in thought and action were paralleled by shifts of styles in the visual arts and
music.

4.1 Characteristics of the classical style

Contrast of mood – Great variety and contrast of mood received new emphasis in
classical music. Mood in classical music may change gradually or suddenly,
expressing conflicting surges of elation and depression.

Flexibility of rhythm adds variety to classical music. In contrast to the polyphonic texture of late baroque music, classical music is basically homophonic. However, pieces shift smoothly or suddenly from one texture to another.

Classical melodies are among the most tuneful and easy to remember. They often sound balanced and symmetrical because they are frequently made up of two phrases of the same length.

The classical composers’ interest in expressing shades of emotion led to the widespread use of gradual dynamic change – crescendo and decrescendo.

During the classical period, the desire for gradual dynamic change led to the
replacement of the harpsichord by the piano.

4.2 Important composers of the classical period

Joseph Haydn (1732-1809)


Wolfgang Amadeus Mozart (1756-1791)

Ludwig van Beethoven (1770-1827)

5. Romantic (1820-1900)

The early 19th century brought the flowering of romanticism, a cultural movement that stressed
emotion, imagination and individualism. Romantic writers emphasized freedom of expression.

5.1 Characteristics of romantic music

Romantic music puts unprecedented emphasis on self-expression and individuality of style. Many romantics created music that sounds unique and reflects their personalities.

Musical nationalism was expressed when romantic composers deliberately created music with a specific national identity.

Program Music – instrumental music associated with a story, poem, idea or scene,
became popular. A programmatic instrumental piece can represent the emotions,
characters, and events of a particular story, or it can evoke the sounds and motions of
nature. One example is Tchaikovsky’s Romeo and Juliet, an orchestral work inspired
by Shakespeare’s play.

Romantic composers reveled in rich and sensuous sound, using tone color to obtain variety of mood and atmosphere. There was a greater variety of tone colors than ever before.

New chords and novel ways of using familiar chords were explored. There was more
prominent use of chromatic harmony, which employs chords containing tones not
found in the prevailing major and minor scale.

Romantic music also calls for a wide range of dynamics. Extreme dynamic markings such as ffff and pppp were used.

5.2 Important composers of the romantic period

Franz Schubert (1797-1828)

Robert Schumann (1810-1856)

Frederic Chopin (1810-1849)

Felix Mendelssohn (1809-1847)

Hector Berlioz (1803-1869)

Franz Liszt (1811-1886)

Peter Ilyich Tchaikovsky (1840-1893)

Antonin Dvorak (1841-1904)


Johannes Brahms (1833-1897)

Giacomo Puccini (1858-1924)

Richard Wagner (1813-1883)

6. Twentieth century

The years 1900 to 1913 brought radical new developments in science and art. During the period preceding World War I, discoveries were made that overturned long-held beliefs. Sigmund Freud explored the unconscious and developed psychoanalysis, and Albert Einstein revolutionized the view of the universe with his special theory of relativity. In the visual arts, abstract paintings no longer tried to represent the visual world.

In music, there were entirely new approaches to the organization of pitch and rhythm and a
vast expansion in the vocabulary of sound used, especially percussive sounds.

In the past, composers depended on the listener’s awareness of the general principles underlying the interrelationship of tones and chords. Twentieth-century music relies less on pre-established relationships and expectations; listeners are guided by musical cues contained only within an individual composition.

6.1 Characteristic of twentieth century music

Tone color has become a more important element of music than it ever was before. It
often has a major role, creating variety, continuity, and mood.

Noiselike and percussive sounds are often used, and instruments are played at the very top or bottom of their range. Sometimes even typewriters, sirens and automobile brake drums are brought into the orchestra as noisemakers.

The twentieth century brought fundamental changes in the way chords are treated. Traditionally, chords had been divided into two opposing types: consonant and dissonant. A consonant chord was stable; it functioned as a point of rest or arrival. A dissonant chord was unstable; its tension demanded onward motion, or resolution to a stable, consonant chord.

There were new ways of organizing rhythm. Rapidly changing meters are characteristic of twentieth-century music. Beats are grouped irregularly, and accented beats come at unequal time intervals.

6.2 Important composers of the twentieth century

Claude Debussy (1862-1918)

Igor Stravinsky (1882-1971)

Bela Bartok (1881-1945)

George Gershwin (1898-1937)

AE14 – MIDI

1. The Needs for MIDI

2. MIDI Hardware Connections

2.1 Standard Midi Connections

2.2 Setting up MIDI connections with Macintosh computer

2.3 MIDI interface for PC

3. MIDI Messages

3.1 MIDI Note Numbers

3.2 Channel Message

3.3 Channel Voice Message

3.4 Note On/Off

3.5 Velocity information

3.6 Running status

3.7 Polyphonic key pressure (aftertouch)

3.8 Control change

3.9 Channel modes (under the control change category)

4. MIDI Channels and the Multi-Timbre Sound Module

4.1 Assigning a patch to a "robomusician"

4.2 Individual control via each MIDI Channel

4.3 Changing instrumentation – Program Change

4.4 Active Sense

5. Daisy Chain/MIDI Channels

6. The Advantages of MIDI

7. General MIDI

8. GM Patches

8.1 GM Drum Sounds

9. Saving Midi files

Midi Synchronization Methods

1. MIDI Clock

2. Song Position Pointer (SPP)

3. MIDI Timecode (MTC)

3.1 Quarter frame messages

3.2 Full-Frame Message

MIDI Sequencer

1. What is a sequencer?

2. The Hardware Sequencers

2.1 Keyboard Workstation

3. Software Sequencers

Digital Sampler

1. What is a sampler?

1.1 The sampling process

1.2 The Editing Process

1.3 The Programming Process

1.4 Channel Mapping Process


AE14 – MIDI

Introduction

MIDI stands for Musical Instrument Digital Interface. It is a standard interfacing language used on virtually all modern digital musical instruments and sequencers. This module focuses on the basic theory of MIDI and on sequencing and sampling techniques.

1. The Needs for MIDI

Musical instruments existed only as acoustic devices until the discovery of electricity. This led to the creation of electronic musical instruments: devices that electrically simulated the sound of real instruments. An example is the keyboard. Modern keyboards can play the sound of almost any musical instrument and even non-musical sounds. The keyboard is made up of two separate sections:

a. The section with the keys – called the keyboard


b. The section with the tones – called the tone generator or the tone module. The
tones in the tone module are the different sounds of the different instruments that
the electronic keyboard can play. Each sound, for example, the trumpet sound, is
called a patch.

These two sections can exist together in one device or you can have them exist separately.
When they exist separately you have

a. Keys only – called a keyboard controller. It does not have any sounds of its own, so it can be used for nothing but remote control.
b. Sounds only, no keys – called a tone module or tone generator. It cannot be used without a controller.

Once musicians had experience with electronic musical instruments, they desired the ability to control multiple instruments remotely or automatically from a single device.

Remote control is when a musician plays one musical instrument, and that instrument controls
(one or more) other musical instruments.

Examples of this include combining the sounds of several instruments playing in perfect unison to "thicken" or layer a musical part, or blending certain patches on those instruments. Perhaps a musician wishes to blend the sax patches of 5 different instruments to create a more authentic-sounding sax section in a big band. But, since a musician has only two hands and two feet, it's not possible to play 5 instruments at once unless he has some method of remote control. MIDI was developed in response to this need.

MIDI is a digital language in binary form. Since there were already ways to record and even edit binary data using computer technology, the next development was dedicated processing hardware and software to handle MIDI messages.

This hardware and software allowed MIDI messages to be recorded, edited much as one would edit text in a word processor, and played back at a later time. These devices and programs were called sequencers.

One application for sequencers is live performance, where musicians want "backing tracks" without using prerecorded tapes. MIDI sequencers offered a more flexible alternative that allowed real-time changes to the arrangement.

229
SCHOOL OF AUDIO ENGINEERING AE14 – Midi

Student Notes

2. MIDI Hardware Connections

The visible MIDI connectors on an instrument are female 5-pin DIN jacks. There are separate
jacks for incoming MIDI signals (received from another instrument that is sending MIDI
signals), and outgoing MIDI signals (ie, MIDI signals that the instrument creates and sends to
another device).

You use MIDI cables (with male DIN connectors) to connect the MIDI jacks of various
instruments together, so that those instruments can pass MIDI signals to each other. You
connect the MIDI OUT of one instrument to the MIDI IN of another instrument, and vice versa.

Some instruments have a third MIDI jack labeled "Thru". This is used as if it were an OUT jack,
and therefore you attach a THRU jack only to another instrument's IN jack. In fact, the THRU
jack is exactly like the OUT jack with one important difference. Any messages that the
instrument itself creates (or modifies) are sent out its MIDI OUT jack but not the MIDI THRU
jack. Think of the THRU jack as a stream-lined, unprocessed MIDI OUT jack.

2.1 Standard Midi Connections

The standard MIDI hardware connection is a 5-pin DIN Plug. Out of the 5 pins, only 3
pins are used for MIDI transmissions. Out of the 3 used pins, Pin 2 is used for
shielding and Pins 4 (+) and 5 (-) are for actual transmissions. The extra 2 pins are
reserved for future development of the MIDI standard.

In the diagram below, take note of the position and numbering of the 5 pins. Also note
that Pin 1 and 3 are not used at present.

It is interesting to note that before the establishing of the 5-pin DIN plug as standard,
microphone cables and XLR jacks (with 3 pins) were used in MIDI connection.

An alternative connector type, found mainly with Macintosh computers, is the host connector. Here MIDI IN/OUT/THRU all use a single host cable. This option exists for connection to PCs as well, so a sound module will often have a selector switch at the host port for Mac or PC.

A hardware requirement for connecting a MIDI device to a computer is a MIDI interface. This is simply a device that acts as a translator between the MIDI equipment and the computer, often with MIDI IN, OUT and THRU connections on one end and computer connections such as parallel or host ports on the other.

2.2 Setting up MIDI connections with Macintosh computer

i. Connect the MIDI OUT of the Controller Keyboard to MIDI IN of the Sound
Module.


ii. Connect the Host port of the Sound Module to either Printer or Modem port of
the Macintosh.
iii. Make sure that the selector switch of the Sound Module Host port is switched
to “Mac” for communication with the Macintosh.


2.3 MIDI interface for PC

There are a few ways of interfacing a PC with MIDI. Each makes use of a different communication port on the computer.

Soundcard joystick port – this is the simplest way of connecting MIDI to a PC with a standard Creative-compatible soundcard. A special joystick-port-to-MIDI cable and a software driver are needed to convert the joystick port into a MIDI In/Out interface;

The serial com-port – with a custom-made com-port-to-Host cable and the appropriate software driver, the com-port can function as both the MIDI In and Out through a standard Macintosh Host cable. This setup, however, requires that the MIDI device has a built-in Host port;

The parallel port (printer port) – The parallel port usually works with external MIDI
interface box which provides multiple MIDI Ins and Outs. More advanced models are
also equipped with synchronization functions to allow MIDI sync with tape recorders,
video and other sequencers.

Dedicated MIDI interface card – There are also internal interface cards that perform
similar functions to the external interface boxes.

Some of the major software sequencers are integrated with digital audio and video features, turning the computer into an integrated audio/video production workstation. Well-known examples are Cubase VST by Steinberg and Digital Performer by MOTU.

3. MIDI Messages

Many electronic instruments not only respond to MIDI messages that they receive (at their
MIDI IN jack), they also automatically generate MIDI messages while the musician plays the
instrument (and send those messages out their MIDI OUT jacks).

When a musician pushes down (and holds down) the middle C key on a keyboard, not only does this sound a musical note, it also causes a MIDI Note-On message to be sent out of the keyboard's MIDI OUT jack. That message consists of 3 numeric values:

144 60 64

The musician now releases that middle C key. Not only does this stop sounding the musical
note, it also causes another message -- a MIDI Note-Off message -- to be sent out of the
keyboard's MIDI OUT jack. That message consists of 3 numeric values: 128 60 64. Note that
one of the values is different than the Note-On message.
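A small sketch (my own helper, not part of any MIDI library) that decodes the two messages quoted above: 144 is 90 hex (Note On, channel 1) and 128 is 80 hex (Note Off, channel 1), while 60 is middle C and 64 is a mid-scale velocity.

def describe(status, data1, data2):
    """Decode a three-byte MIDI channel message into readable text."""
    message_type = status & 0xF0        # upper four bits: type of message
    channel = (status & 0x0F) + 1       # lower four bits: channel, shown as 1-16
    if message_type == 0x90 and data2 > 0:
        return f"Note On, channel {channel}, note {data1}, velocity {data2}"
    if message_type == 0x80 or (message_type == 0x90 and data2 == 0):
        return f"Note Off, channel {channel}, note {data1}, velocity {data2}"
    return f"Other message type {message_type:#04x} on channel {channel}"

print(describe(144, 60, 64))   # Note On, channel 1, note 60, velocity 64
print(describe(128, 60, 64))   # Note Off, channel 1, note 60, velocity 64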

You saw above that when the musician pushed down that middle C note, the instrument sent
a MIDI Note On message for middle C out of its MIDI OUT jack. If you were to connect a
second instrument's MIDI IN jack to the first instrument's MIDI OUT, then the second
instrument would "hear" this MIDI message and sound its middle C too. When the musician
released that middle C note, the first instrument would send out a MIDI Note Off message for
that middle C to the second instrument. And then the second instrument would stop sounding
its middle C note.


But MIDI is more than just "Note On" and "Note Off" messages. There are lots more
messages. There's a message that tells an instrument to move its pitch wheel and by how
much. There's a message that tells the instrument to press or release its sustain pedal. There's
a message that tells the instrument to change its volume and by how much. There's a
message that tells the instrument to change its patch (ie, maybe from an organ sound to a
guitar sound). And of course, these are only a few of the many available messages in the MIDI
command set.

And just like with Note On and Note Off messages, these other messages are automatically
generated when a musician plays the instrument. For example, if the musician moves the pitch
wheel, a pitch wheel MIDI message is sent out of the instrument's MIDI OUT jack. (Of course,
the pitch wheel message is a different group of numbers than either the Note On or Note Off
messages). What with all of the possible MIDI messages, everything that the musician did
upon the first instrument would be echoed upon the second instrument. It would be like he had
two left and two right hands that worked in perfect sync.

3.1 MIDI Note Numbers

A MIDI controller (be it a piano-like keyboard, a MIDI guitar, a MIDI drum kit, etc) can have up to 128 distinct pitches/notes. (Consequently, a MIDI controller has a wider range of notes -- ie, more octaves of notes -- than even an acoustic piano). The lowest note upon a MIDI controller is a C (as opposed to an A upon an acoustic piano).

But whereas musicians name the keys using the alphabetical names, with sharps and
flats, and also octave numbers (as shown in the diagram above), this is more difficult
for MIDI devices to process, so they instead assign a unique number to each key. The
numbers used are 0 to 127. As mentioned, the lowest note upon a MIDI controller is a
C and this is assigned note number 0. The C# above it would have a note number of
1. The D note above that would have a note number of 2. Etc. "Middle C" is note
number 60. (ie, There are 59 other keys below middle C upon a MIDI controller). So, a
MIDI note number of 69 is used for A440 tuning. (ie, That is the A note above middle
C).

As mentioned, a MIDI controller can theoretically have 128 keys (or frets, or drum
pads, etc). Of course, in practice, it's too expensive to manufacture a MIDI controller
with 128 keys on it. Besides, most musicians aren't accustomed to playing a keyboard
with more than 88 keys (or a drum kit with 128 pads, etc). It would seem very odd to
the musician -- perhaps even too unwieldy an instrument to play. So typically, most
controllers have less than 128 keys upon them. For example, a "full-size" keyboard
controller will usually have only the 88 keys that a piano has; the lowest key being the
low A like upon a piano. (ie, Its lowest key is actually MIDI Note Number 21). Never
mind that a true "full-size" keyboard controller would have 128 keys and its lowest key
would go down to a C two octaves below the piano's A key. Of course, most keyboard
controllers have a "MIDI transpose" function so that, even if you don't have the full 128
keys, you can alter the note range that your (more limited) keyboard covers. For
example, instead of that lowest A key being assigned to note number 21, you could
transpose it down an octave so that it is assigned a note number of 9.

Some MIDI software or devices don't use MIDI note numbers to identify notes to a
musician (even though that's what the MIDI devices themselves expect, and what they
pass to each other in MIDI messages). MIDI note numbers don't mean that much to a
musician. Instead, the software/device may display note names, such as F#3 (ie, the
F# in the third octave of a piano keyboard).


There is one nagging discrepancy that has crept up between various models of MIDI devices and software programs, and that concerns the octave numbers for note names. If your MIDI software/device considers octave 0 as being the lowest octave of
the MIDI note range (which it ideally should), then middle C's note name is C5. The
lowest note name is then C0 (note number 0), and the highest possible note name is
G10 (note number 127).

Some software/devices instead consider the third octave of the MIDI note range (ie, 2
octaves below middle C) as octave 0. (They do this because they may be designed to
better conform to a keyboard controller that has a more limited range; one which
perhaps doesn't have the two lowest octaves of keys which a 128 key controller would
theoretically have. So they pretend that the third octave is octave 0, because the first
two octaves are physically "missing" on the keyboard). In that case, the first 2 octaves
(that are physically missing) are referred to as -2 and -1. So, middle C's note name is
C3, the lowest note name is C-2, and the highest note name is G8. This discrepancy
is purely in the way that the software/device displays the note name to you.
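The following sketch (the naming helper is my own) shows how the same note number is displayed differently under the two octave-numbering conventions described above:

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(note_number, middle_c_octave=5):
    """middle_c_octave=5 treats octave 0 as the lowest MIDI octave (middle C = C5);
    middle_c_octave=3 matches devices that display middle C as 'C3'."""
    octave = note_number // 12 + (middle_c_octave - 5)
    return f"{NOTE_NAMES[note_number % 12]}{octave}"

print(note_name(60))                      # C5 (octave 0 = lowest octave)
print(note_name(60, middle_c_octave=3))   # C3 (Yamaha-style display)
print(note_name(69))                      # A5: the A440 tuning note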

Besides Note Number, Note On and Note Off messages, MIDI consists of a hierarchy of different categories of messages catering for the different needs of most MIDI devices.

Below is a diagram showing the different types of MIDI messages:

MIDI Messages
    Channel Messages: Voice, Mode
    System Messages: Common, Realtime, Exclusive

3.2 Channel Message

Channel Messages are those that carry a specific MIDI channel number. These channel numbers range from 0 to 15 in binary but are commonly referred to as MIDI Channels 1 to 16. In binary form, the channel number occupies 4 bits of the status byte; the other 4 bits of the status byte specify the type of Channel Message. The status byte is then followed by one or more data bytes.

A typical Channel Message

[Type of message || Channel # ] [ Data byte 1 ] [Data byte 2]…
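As an illustration of that format (the helper name is mine), the status byte packs the message type into its upper four bits and the channel, stored as 0-15, into its lower four bits:

def status_byte(message_type, channel):
    """message_type: e.g. 0x90 for Note On, 0xB0 for Control Change.
    channel: 1-16 as musicians count it, carried on the wire as 0-15."""
    return (message_type & 0xF0) | ((channel - 1) & 0x0F)

print(hex(status_byte(0x90, 1)))    # 0x90: Note On, MIDI channel 1
print(hex(status_byte(0xB0, 10)))   # 0xb9: Control Change, MIDI channel 10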


3.3 Channel Voice Message

Channel Voice Messages are the ones most directly concerned with the music being recorded and played.

The most common Channel Voice Messages are:

Note On
Note Off
Program Change
Control Change
Pitch bend
Polyphonic Aftertouch
Channel Aftertouch

3.4 Note On/Off

A Note On/Off message consists of one status byte and two data bytes, as shown below:

Status byte (8 bits):  [Note On/Off || Chn #]
Data byte 1 (8 bits):  Note Number
Data byte 2 (8 bits):  Velocity

MIDI note numbers relate directly to the western chromatic scale, and the format of the message allows for 128 note numbers, which cover a range of a little over ten octaves, adequate for the full range of most musical material. This quantisation of the pitch scale is geared very much towards keyboard instruments, and is perhaps less suitable for other instruments and cultures where the definition of pitches is not so black and white. Nonetheless, means have been developed of adapting control to situations where unconventional tunings are required. Note numbers normally relate to the musical scale as shown in the table below, although there is a certain degree of confusion here: Yamaha established the use of C3 for middle C, whereas others have used C4. Some software allows the user to decide which convention will be used for display purposes.

The table below shows MIDI note numbers related to the musical scale:

Musical note       MIDI note number
C-2                0
C-1                12
C0                 24
C1                 36
C2                 48
C3 (middle C)      60 (Yamaha convention)
C4                 72
C5                 84
C6                 96
C7                 108
C8                 120
G8                 127


3.5 Velocity information

Note messages are associated with a velocity byte, and this is used to represent the
speed at which a key was pressed or released. The former will correspond to the
force exerted on the key as it is depressed: in other words, 'how hard you hit it' (called
'note on velocity'). It is used to control parameters such as the volume or timbre of the
note at the audio output of an instrument, and can be applied internally to scale the
effect of one or more of the envelope generators in a synthesiser. This velocity value
has 128 possible states.

Note off velocity (or 'release velocity') is not widely used, as it relates to the speed at
which a note is released, which is not a parameter that affects the sound of many
normal keyboard instruments. Nonetheless it is available for special effects if a
manufacturer decides to implement it.

The Note On Velocity zero value is reserved for the special purpose of turning a note
off, for reasons which will become clear under 'Running status' below. If an instrument
sees a note number with a velocity of zero, its software should interpret this as a note
off message.

3.6 Running status

When a large amount of information is transmitted over a single MIDI bus, delays
naturally arise due to the serial nature of transmission wherein data such as the
concurrent notes of a chord must be sent one after the other. It will be advantageous,
therefore, to reduce the amount of data transmitted as much as possible, in order to
keep the delay as short as possible and to avoid overloading the devices on the bus
with unnecessary data.

Running status is an accepted method of reducing the amount of data transmitted, and one which all MIDI software should understand. It involves the assumption that once a status byte has been asserted by a controller there is no need to reiterate this status for each subsequent message of that status, so long as the status has not changed in between. Thus a string of note on messages could be sent with the note on status only sent at the start of the series of note data, for example:

[Note On || Chn #] [Data] [Velocity] [Data] [Velocity] [Data] [Velocity]

It will be appreciated that for a long string of note data this could reduce the amount of
data sent by nearly one third. But as in most music each note on is almost always
followed quickly by a note off for the same note number, this method would clearly
break down, as the status would be changing from note on to note off very regularly,
thus eliminating most of the advantage gained by running status. This is the reason
for the adoption of note on, velocity zero as equivalent to a note off message,
because it avoids a change of status during running status, allowing a string of what
appears to be note on messages, but which is, in fact, both note on and note off.
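A rough sketch (an encoder of my own, assuming the convention just described) of how running status combines with "Note On, velocity zero" note offs: the Note On status byte is sent once, then only pairs of data bytes follow.

def encode_with_running_status(events, channel=1):
    """events: list of (note_number, velocity) pairs; velocity 0 means note off."""
    stream = [0x90 | ((channel - 1) & 0x0F)]    # single Note On status byte
    for note, velocity in events:
        stream.extend([note & 0x7F, velocity & 0x7F])
    return stream

# Middle C on, E on, then both off again: one status byte carries six data bytes.
print(encode_with_running_status([(60, 64), (64, 64), (60, 0), (64, 0)]))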

Running status is not used at all times for a string of same status messages, and will
often only be called upon by an instrument's software when the rate of data exceeds a
certain point. Indeed, an examination of the data from a typical synthesiser indicates that running status is not used during a large amount of ordinary playing. Yet it might be useful for a computer sequencer which can record data from a large number of
MIDI devices, and might transmit it all out of a single port on replay. Even so, the
benefit would not be particularly great even in this case, as the sequencer would be
alternating between the addressing of a number of different devices with different
statuses, in order not to allow any one device to lag behind in relation to the others.

3.7 Polyphonic key pressure (aftertouch)

The key pressure messages are sometimes called 'aftertouch' by keyboard manufacturers. Aftertouch is perhaps a slightly misleading term as it does not make clear what aspect of touch is referred to, and many people have confused it with note off velocity. This message refers to the amount of pressure placed on a key at the bottom of its travel, and it is used to instigate effects based on how much the player leans onto the key after depressing it. It is often applied to performance parameters such as vibrato.

The polyphonic key pressure message is not widely used, as it transmits a separate
value for every key on the keyboard, and thus requires a separate sensor for every
key. This can be expensive to implement, and is beyond the scope of many
keyboards, so most manufacturers have resorted to the use of the channel pressure
message (see below). It should be noted, though, that some manufacturers have
shown it to be possible to implement this feature at a reasonable cost. The message
takes the general format:

[Aftertouch || Chn #] [Note number] [Pressure]

Implementing polyphonic key pressure messages involves the transmission of a considerable amount of data over MIDI which might well be unnecessary, as the message will be sent for every note in a chord every time the pressure changes. As most people do not maintain a constant pressure on the bottom of a key whilst playing, many messages might be sent per note. A technique known as 'controller thinning' may be used by a device to limit the rate at which such messages are transmitted, and this may be implemented either before transmission or at a later stage using a computer. Alternatively this data may be filtered out altogether if it is not required.

3.8 Control change

As well as note information, a MIDI device may be capable of transmitting control information which corresponds to the various switches, control wheels and pedals associated with it. These come under the control change message group, and should be distinguished from program change messages (covered later in this module). The controller messages have proliferated enormously since the early days of MIDI, and not all devices will implement all of them. The control change message takes the general form:

[Control Change || Chn #] [Controller number] [Data]

thus a number of controllers may be addressed using the same type of status byte by
changing the controller number.

Although the original MIDI standard did not lay down any hard and fast rules for the assignment of physical control devices to logical controller numbers, there is now common agreement amongst manufacturers that certain controller numbers will be used for certain purposes, and these are controlled by the MMA and the JMSC.

It should be noted that there are two distinct kinds of controller: the switch type and the analogue type. An analogue controller is any continuously variable wheel, lever, slider or pedal that might take any one of a number of positions; these are often known as continuous controllers. There are 128 controller numbers available, grouped by function, and a detailed breakdown of their use in the majority of MIDI-controlled musical instruments is published and regularly updated by the MMA. The control change messages have become fairly complex, so coverage of them is divided into a number of sections; the topics of sound control, bank select and effects control are left for later coverage on MIDI implementation in synthesisers and effects devices.

The first 64 controller numbers relate to only 32 physical controllers (the continuous controllers). This is to allow for greater resolution in the quantisation of position than would be feasible with the seven bits offered by a single data byte. Seven bits would only allow 128 possible positions of an analogue controller to be represented, and it was considered that this might not be adequate in some cases. For this reason, the first 32 controllers handle the most significant byte (MSbyte) of the controller data, while the second 32 handle the least significant byte (LSbyte). In this way, controller numbers &06 and &26 both represent the data entry slider, for example. Together, the data values can make up a 14-bit number (because the first bit of each data word has to be a zero), which allows the quantisation of a control's position to be one part in 2^14 (16 384). Clearly, not all controllers will require this resolution, but it is available if needed. Only the LSbyte would be needed for small movements of a control. If a system opts not to use the extra resolution offered by the second byte, it should send only the MSbyte for coarse control, and in practice this is all that is transmitted on many devices.
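A brief sketch (the helper names are mine) of how the coarse and fine bytes combine: for the data entry slider, for instance, controller &06 carries the MSbyte and controller &26 (38 in decimal) carries the LSbyte.

def combine_14bit(msb, lsb):
    """Join two 7-bit controller values into one 14-bit position (0-16383)."""
    return ((msb & 0x7F) << 7) | (lsb & 0x7F)

def split_14bit(value):
    """Split a 14-bit value back into its MSbyte and LSbyte."""
    return (value >> 7) & 0x7F, value & 0x7F

print(combine_14bit(0x7F, 0x7F))   # 16383, full-scale position
print(split_14bit(8192))           # (64, 0), the mid-point of the range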

On/off switches can be represented easily in binary form (0 for OFF, 1 for ON), and it
would be possible to use just a single bit for this purpose, but, in order to conform to
the standard format of the message, switch states are normally represented by data
values between &00 and &3F for OFF, and &40-7F for ON. In other words switches
are now considered as 7 bit continuous controllers, and it may be possible on some
instruments to define positions in between off and on in order to provide further
degrees of control, such as used in some 'sustain' pedals (although this is not
common in the majority of equipment). In older systems it may be found that only &00
= OFF and &7F = ON.

3.9 Channel modes (under the control change category)

Although grouped with the controllers, under the same status, the channel mode
messages differ somewhat in that they set the mode of operation of the instrument
receiving on that particular channel.

'Local on/off' is used to make or break the link between an instrument’s keyboard and its own sound generators. Effectively there is a switch between the output of the keyboard and the control input to the sound generators which allows the instrument to play its own sound generators in normal operation when the switch is closed. If the switch is opened, the link is broken and the output from the keyboard feeds the MIDI OUT while the sound generators are controlled from the MIDI IN. In this mode the instrument acts as two separate devices: a keyboard without any sound, and a sound generator without a keyboard. This configuration can be useful when the instrument in use is the master keyboard for a large sequencer system, where it may not always be desired that everything played on the master keyboard results in sound from the instrument itself.

'Omni off' ensures that the instrument will only act on data tagged with its own channel
number(s), as set by the instrument's controls. 'Omni on' sets the instrument to
receive on all of the MIDI channels. In other words, the instrument will ignore the
channel number in the status byte and will attempt to act on any data that may arrive,
whatever its channel. Devices should power-up in this mode according to the original
specification, but more recent devices will tend to power up in the mode that they
were left. Mono mode sets the instrument such that it will only reproduce one note at a
time, as opposed to 'Poly' (phonic) in which a number of notes may be sounded
together.

In older devices the mono mode came into its own as a means of operating an
instrument in a 'multitimbral' fashion, whereby MIDI information on each channel
controlled a separate monophonic musical voice. This used to be one of the only ways
of getting a device to generate more than one type of voice at a time. The data byte
that accompanies the mono mode message specifies how many voices are to be
assigned to adjacent MIDI channels, starting with the basic receive channel. For
example, if the data byte is set to 4, then four voices will be assigned to adjacent MIDI
channels, starting from the basic channel which is the one on which the instrument
has been set to receive in normal operation. Exceptionally, if the data byte is set to
zero, all sixteen voices (if they exist) are assigned each to one of the sixteen MIDI
channels. In this way, a single multitimbral instrument can act as sixteen monophonic
instruments, although on cheaper systems all of these voices may be combined to
one audio output.

Mono mode tends to be used mostly on MIDI guitar synthesizers since each string
can then have its own channel, and each can control its own set of pitch bend and
other parameters. The mode also has the advantage that it is possible to play in a
truly legato fashion - that is with a smooth take over between the notes of a melody -
because the arrival of a second note message acts simply to change the pitch if the
first one is still being held down, rather than re-triggering the start of a note envelope.
The legato switch controller allows a similar type of playing in polyphonic modes by
allowing new note messages only to change the pitch.

In poly mode the instrument will sound as many notes as it is able at the same time.
Instruments differ as to the action taken when the number of simultaneous notes is
exceeded: some will release the first note played in favour of the new note, whereas
others will refuse to play the new note. Some may be able to route excess note
messages to their MIDI OUT ports so that they can be played by a chained device.
The more intelligent of them may look to see if the same note already exists in the notes currently sounding, and only accept a new note if it is not already sounding. Even
more intelligently, some devices may release the quietest note (that with the lowest
velocity value), or the note furthest through its velocity envelope, to make way for a
later arrival. It is also common to run a device in poly mode on more than one receive
channel, provided that the software can handle the reception.
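An illustrative sketch of building these channel mode messages (the helper is my own; the controller numbers follow the standard channel-mode assignments: 122 local control, 124/125 omni off/on, 126 mono on, 127 poly on):

def mode_message(channel, controller, value=0):
    """Channel mode messages share the Control Change status byte (0xB0)."""
    return [0xB0 | ((channel - 1) & 0x0F), controller & 0x7F, value & 0x7F]

print(mode_message(1, 122, 0))    # local off on channel 1
print(mode_message(1, 126, 4))    # mono on: four voices on adjacent channels
print(mode_message(1, 127))       # back to poly mode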


4. MIDI Channels and the Multi-Timbre Sound Module

Most MIDI sound modules today are "multi-timbral". This means that the module can listen to
all 16 MIDI channels at once, and play any 16 of its "patches" simultaneously, with each of the
16 patches set to a different MIDI channel.

It's as if the module had 16 smaller "sub-modules" inside of it. Each sub-module plays its own
patch (ie, instrument) upon its own MIDI channel.

Think of these sub-modules as robotic musicians. I'll call them "robomusicians". You have 16
of them inside one multi-timbral module.

Now think of MIDI channels as channels (ie, inputs) upon a mixing console. You have 16 of
them in any one MIDI setup. (I assume one discrete MIDI bus in this "MIDI setup". Some
setups have multiple MIDI Ins/Outs with more than 16 MIDI channels. But here, let's talk about
a typical MIDI setup which is limited to 16 channels).

Each robomusician (ie, sub-module) has his own microphone plugged into one channel of that
16 channel mixer, so you have individual control over his volume, panning, reverb and chorus
levels, and perhaps other settings.

4.1 Assigning a patch to a "robomusician"

Think of a patch as a "musical instrument". For example, you typically have Piano,
Flute, Saxophone, Bass Guitar, etc, patches in a sound module (even the ones built
into a computer sound card -- often referred to as a "wavetable synth"). Typically,
most modules have hundreds of patches (ie, musical instruments) to chose from. The
patches are numbered. For example, a trumpet patch may be the fifty-seventh patch
available among all of the choices.

Since you have 16 robomusicians, you can pick out any 16 instruments (ie, patches)
among those hundreds, to be played simultaneously by your 16 robomusicians. Each
robomusician can of course play only one instrument at a time. (On the other hand,
each robomusician can play chords upon any instrument he plays, even if it's
traditionally an instrument that can't play chords. For example, if the robomusician
plays a trumpet patch, he can play chords on it, even though a real trumpet is
incapable of sounding more than one pitch at a time).

As an example, maybe your arrangement needs a drum kit, a bass guitar, a piano,
and a saxophone. Let's say that the drums are played by robomusician #10. (He's on
MIDI channel 10 of the mixer). (In fact, with some MIDI modules, channel #10 is
reserved for only drums. In other words, robomusician #10 can play only drums, and
maybe he's the only robomusician who can play the drums). The other robomusicians
are super musicians. Each robomusician can play any of the hundreds of instruments
(ie, patches) in your module, but of course, he still is restricted to playing only one
instrument at a time. So let's say that you tell robomusician 1 to sit at a piano, and
robomusician 2 to pick up a bass guitar, and robomusician 3 to pick up a saxophone.
Let's say that you tell the remaining 12 robomusicians to pick up an accordion, violin,
acoustic guitar, flute, harp, cello, harmonica, trumpet, clarinet, etc, so that each
robomusician has a different instrument to play.

How do you tell a robomusician to pick up a certain instrument? You press a button on his mixer channel, so to speak, and give him a message telling him the number of the patch/instrument you want him to play. How do you do that over MIDI? Well, that's what MIDI messages are for. The MIDI Program Change message is the one that instructs a robomusician to pick up a certain instrument. Contained in the MIDI Program Change message is the number of the desired patch/instrument. So, you send (to the multi-timbral module's MIDI In) a MIDI Program Change message upon the MIDI channel for that robomusician. For example, to tell robomusician 3 to pick up a sax, you send a MIDI Program Change (with a value that selects the Saxophone patch) on MIDI channel 3.
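A hedged sketch of that Program Change message (the helper is mine, and patch number 65, a General MIDI alto sax counted from zero, is only an example; actual patch numbering varies between modules):

def program_change(channel, patch_number):
    """Program Change uses status 0xC0 and carries a single data byte."""
    return [0xC0 | ((channel - 1) & 0x0F), patch_number & 0x7F]

print(program_change(3, 65))   # [194, 65]: 0xC2 = Program Change on channel 3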

4.2 Individual control via each MIDI Channel

After you've told the 16 robomusicians what instruments to pick up, you can now have
them play a MIDI arrangement with these 16 instruments -- each robomusician
playing simultaneously with individual control over his volume, panning, etc.

How do you tell a robomusician what notes to play? You send him MIDI Note
messages on his channel. Remember that only that one robomusician "hears" these
messages. The other robomusicians see only those messages on their respective
channels. (ie, Each robomusician ignores messages that aren't on his channel, and
takes notice of only those messages that are on his channel). For example, the sax
player is robomusician 3, so you send him note messages on MIDI channel 3.

How do you tell a robomusician to change his volume? You send him Volume
Controller messages on his MIDI channel. How do you tell a robomusician to bend his
pitch? You send him Pitch Wheel messages on his MIDI channel. In fact, there are
many different things that a robomusician can do independently of the other 15
robomusicians, because there are many different MIDI controller messages that can
be sent on any given MIDI channel.

And that's why I say that it's as if there are 16 "sub-modules" inside of one multi-timbral module, because these 16 robomusicians really do have independent control over their musical performances, thanks to there being 16 MIDI channels in that one MIDI cable that runs to the multi-timbral module's MIDI In.

4.3 Changing instrumentation – Program Change

OK, let's say that at one point in your arrangement, a 17th instrument needs to be
played -- maybe a Banjo. Well, at that point you've got to have one of your 16
robomusicians put down his current instrument and pick up a Banjo instead. Let's say
that the sax player isn't supposed to be playing anything at this point in the
arrangement. So, you send a MIDI Program Change to robomusician 3 (ie, on MIDI
channel 3 -- remember that he's the guy who was playing the sax), telling him to pick
up a Banjo. Now when you send him note messages, he'll be playing that banjo. Later
on, you can send him another MIDI Program Change to tell him to put down the Banjo
and pick up the saxophone again (or some other instrument). So, although you're
limited to 16 robomusicians playing 16 instruments simultaneously, any of your
robomusicians can change their instruments during the arrangement. (Well, maybe
robomusician 10 is limited to playing only drums. Even then, he may be able to
choose from among several different drum kits).

Parts - So is there a name for these 16 "robomusicians" or "sub-modules" inside of your MIDI module? Well, different manufacturers refer to them in different ways, and I'm going to use the Roland preference, a Part. A Roland multi-timbral module has 16 Parts inside of it, and each usually has its own settings for such things as Volume,
Panning, Reverb and Chorus levels, etc, and its MIDI channel (ie, which MIDI data the
Part "plays"). Furthermore, each Part has its own way of reacting to MIDI data such as
Channel Pressure (often used to adjust volume or brightness), MOD Wheel controller
(often used for a vibrato effect), and Pitch Wheel (used to slide the pitch up and
down). For example, one Part can cause its patch to sound brighter when it receives
Channel Pressure messages that increase in value. On the other hand, another Part
could make its volume increase when it receives increasing Channel Pressure
messages. These Parts are completely independent of each other. Just because one
Part is receiving a Pitch Wheel message and bending its pitch doesn't mean that
another Part has to do the same.

You'll note that the sound of all 16 robomusicians typically comes out of a stereo output of
your sound module (or computer card). That's because most multi-timbral modules
have an internal mixer (which can be adjusted by MIDI controller messages to set
volume, panning, brightness, reverb level, etc) that mixes the output of all 16 Parts to
a pair of stereo output jacks. (ie, The 16 microphones and 16 channel mixing console
I alluded to earlier are built into the MIDI module itself. The stereo outputs of the
module are like the stereo outputs of that mixing console).

4.4 Active Sense

Active Sense is a type of MIDI message (ie, that travels over your MIDI cables just like
note data or controller data). It's used to implement a "safety feature". It can provide
an automatic "MIDI panic" control that will cause stuck notes and other undesirable
effects to be resolved whenever the MIDI connection between modules is broken or
impeded in some way.

Here's how Active Sense works.

Assume that you have two sound modules daisy-chained to the MIDI OUT of a
keyboard controller. Let's also say that all three units implement Active Sense. If you
aren't playing the controller at any given moment (and therefore the controller isn't
sending out MIDI data), then after a little while of sitting there idle, the controller will
automatically send out an Active Sense MIDI message that tells the attached sound
modules (connected to the controller's MIDI OUT) "Hey, I'm still connected to you.
Don't worry that you haven't seen any MIDI note data, or controller messages, or Pitch
Wheel messages, etc, from me in awhile. I'm just sitting here idle because my owner
isn't playing me at the moment". When the sound modules hear this "reassurance"
from the controller, then they expect the controller to repeat this same Active Sense
message every so often as long as it's sitting idle. That's how the sound modules
know that everything is still OK with their MIDI connections. If not for that steady
stream of Active Sense messages, then there would be no other MIDI data sent to the
sound modules while the controller was left idle, and the sound modules would have
no way of knowing whether they were still connected to the controller.

Now, if you walk up to the controller and start playing it, then it will stop sending these
"reassurance" messages because now it will be sending other MIDI data instead. And
then later, if you again leave the controller sitting idle, it will resume sending Active
Sense messages out of its MIDI OUT jack.

By constantly "talking" to the sound modules (sending Active Sense messages
whenever it has no other MIDI data to send), the controller is able to let the modules
know that they're still connected to the controller.

So how is a "safety feature" derived from this?

Let's say that you press and hold down a key on the controller. This sends a MIDI
Note On message to the sound modules telling them to start sounding a note. OK,
while still holding that note down, you disconnect the MIDI cable at the controller's
MIDI OUT (ie, disconnect the daisy-chained modules). Now, the modules are left
sounding that "stuck note". Even if you now release the key on the controller, there's
no way for the controller to tell the modules to stop sounding the note (because you
disabled the MIDI connection). OK, so the modules are holding that stuck note, and
they're waiting for some more MIDI messages from the controller. After awhile, they
start to get worried, because they know that if the controller isn't sending them MIDI
note data, or controllers, or Pitch Wheel messages, etc, then it should at least be
sending them Active Sense messages. And yet, they're not getting even that
reassurance. So, what happens? Well, after each module waits for a suitably long
time (ie, 300 milliseconds, which is a long time for MIDI devices, but a fraction of a
second to humans) without receiving an Active Sense message, then the module
automatically concludes that the MIDI connection is broken, and it automatically turns
off any sounding notes, all by itself. Hence, the stuck note is turned off.
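
A sound module's Active Sense watchdog can be sketched roughly as follows (Python, for illustration only; the 300 millisecond window is the figure quoted above, and the function names here are invented for the sketch, not part of any real device's firmware):

    import time

    ACTIVE_SENSE = 0xFE        # status byte of the Active Sense message
    TIMEOUT = 0.3              # roughly the 300 ms window described above

    last_byte_time = time.monotonic()
    sensing_expected = False   # the watchdog only arms once an Active Sense byte is seen

    def all_notes_off():
        # Stand-in for the module silencing itself (hypothetical helper).
        print("Connection lost - turning off all sounding notes")

    def on_midi_byte(byte):
        # Call this for every incoming MIDI byte; any traffic counts as reassurance.
        global last_byte_time, sensing_expected
        last_byte_time = time.monotonic()
        if byte == ACTIVE_SENSE:
            sensing_expected = True

    def check_connection():
        # Poll this regularly; it fires only if Active Sense was promised and then went silent.
        if sensing_expected and time.monotonic() - last_byte_time > TIMEOUT:
            all_notes_off()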

Active Sense allows any module to detect when the MIDI connection to the preceding
module is broken. For example, let's say that you disconnected the MIDI cable
between the two sound modules. (ie, The first module is still attached to the
controller's MIDI OUT, but now the second module is no longer daisy-chained to the
first module's MIDI OUT). In this case, the first module will still be getting the Active
Sense messages from the controller, but these can't be passed on to the second
module (because it's disconnected from the MIDI chain). Therefore, the second
module will quickly realize that its MIDI connection has been broken.

So, Active Sense is a "safety" feature. It allows a MIDI module to know when its
connection to some other unit's MIDI OUT jack has been broken. In such a case, the
module can take automatic safety precautions such as turning off any sounding notes,
or resetting itself to a default state, in order to avoid being left in a "stuck" or
undesirable state.

Note that many (in fact, most) MIDI units do NOT implement Active Sense, and
therefore aren't set up for this "safety" feature. (Typically, Roland gear DOES
implement Active Sense. Roland is one of the few companies that adhere to the MIDI
spec to this degree). If you connect a controller with Active Sense to a module that
doesn't implement Active Sense, then the controller's reassurance messages are
ignored, and the module won't be able to realize when a connection has been broken.

So how do you know if your unit implements Active Sense? Well, look at the MIDI
Implementation Chart in your users manual. For sound modules, you'll look down the
Recognized column until you get to Active Sense (it's listed in the AUX Messages
section almost at the bottom of the chart). Make sure that there is a circle instead of
an X shown beside Active Sense. For a controller, you'll look down the Transmitted
column.

5. Daisy Chain/MIDI Channels

You can attach a MIDI cable from the second instrument's MIDI THRU to a third instrument's
MIDI IN, and the second instrument will pass onto the third instrument those messages that
the first instrument sent. Now, all 3 instruments can play in unison. You could add a fourth,
fifth, sixth, etc, instrument. We call this "daisy-chaining" instruments.

But, daisy-chained instruments don't always have to play in unison either. Each can play its
own, individual musical part even though all of the MIDI messages controlling those daisy-
chained instruments pass through each instrument. How is this possible? There are 16 MIDI
"channels". They all exist in that one run of MIDI cables that daisy-chain 2 or more instruments
(and perhaps a computer) together. For example, that MIDI message for the middle C note
can be sent on channel 1. Or, it can be sent on channel 2. Etc.

Almost all MIDI instruments allow the musician to select which channel(s) to respond to and
which to ignore. For example, if you set an instrument to respond to MIDI messages only on
channel 1, and you send a MIDI Note On on channel 2, then the instrument will not play the
note. So, if a musician has several instruments daisy-chained, he can set them all to respond
to different channels and therefore have independent control over each one.

Also, when the musician plays each instrument, it generates MIDI messages only on its one
channel. So, it's very easy to keep the MIDI data separate for each instrument even though it
all goes over one long run of cables. After all, the MIDI Note On message for middle C on
channel 1 will be slightly different than the MIDI Note On message for middle C on channel 2.
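
To make that concrete, here is a tiny illustration (Python, as a sketch only) of the Note On bytes for middle C on channels 1 and 2 -- only the low nibble of the status byte, which carries the channel, changes:

    # Note On for middle C (note 60) at velocity 100 on two different channels.
    # The low nibble of the status byte is the channel (0 = channel 1, 1 = channel 2).
    note_on_ch1 = bytes([0x90, 60, 100])   # hex: 90 3C 64
    note_on_ch2 = bytes([0x91, 60, 100])   # hex: 91 3C 64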

6. The Advantages of MIDI

There are two main advantages of MIDI -- it's an easily edited/manipulated form of data, and
also it's a compact form of data (ie, produces relatively small data files).

Because MIDI is a digital signal, it's very easy to interface electronic instruments to computers,
and then do things with that MIDI data on the computer with software. For example, software
can store MIDI messages to the computer's disk drive. Also, the software can playback MIDI
messages upon all 16 channels with the same rhythms as the human who originally caused
the instrument(s) to generate those messages. So, a musician can digitally record his musical
performance and store it on the computer (to be played back by the computer). He does this
not by digitizing the actual audio coming out of all of his electronic instruments, but rather by
"recording" the MIDI OUT (ie, those MIDI messages) of all of his instruments. Remember that
the MIDI messages for all of those instruments go over one run of cables, so if you put the
computer at the end, it "hears" the messages from all instruments over just one incoming
cable.

The great advantage of MIDI is that the "notes" and other musical actions, such as moving the
pitch wheel, pressing the sustain pedal, etc, are all still separated by messages on different
channels. So the musician can store the messages generated by many instruments in one file,
and yet the messages can be easily pulled apart on a per instrument basis because each
instrument's MIDI messages are on a different MIDI channel. In other words, when using MIDI,
a musician never loses control over every single individual action that he made upon each
instrument, from playing a particular note at a particular point, to pushing the sustain pedal at a
certain time, etc. The data is all there, but it's put together in such a way that every single
musical action can be easily examined and edited.

Contrast this with digitizing the audio output of all of those electronic instruments. If you've got
a system that has 16 stereo digital audio tracks, then you can keep each instrument's output
separate. But, if you have only 2 digital audio tracks (typically), then you've got to mix the audio
signals together before you digitize them. Those instruments' audio outputs don't produce
digital signals. They're analog. Once you mix the analog signals together, it would take
massive amounts of computation to later filter out separate instruments, and the process
would undoubtedly be far from perfect. So ultimately, you lose control over each instrument's
output, and if you want to edit a certain note of one instrument's part, that's even less feasible.

Furthermore, it typically takes much more storage to digitize the audio output of an instrument
than it does to record an instrument's MIDI messages. Why? Let's take an example. Say that
you want to record a whole note. With MIDI, there are only 2 messages involved. There's a
Note On message when you sound the note, and then the next message doesn't happen until
you finally release the note (ie, a Note Off message). That's 6 bytes. In fact, you could hold
down that note for an hour, and you're still going to have only 6 bytes; a Note On and a Note
Off message. By contrast, if you want to digitize that whole note, you have to be recording all
of the time that the note is sounding. So, for the entire time that you hold down the note, the
computer is storing literally thousands of bytes of "waveform" data representing the sound
coming out of the instrument's AUDIO OUT. You see, with MIDI a musician records his actions
(ie, movements). He presses the note down. Then, he does nothing until he releases the note.
With digital audio, you record the instrument's sound. So while the instrument is making sound,
it must be recorded.
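
As a rough back-of-the-envelope comparison (assuming CD-quality digital audio at 44.1 kHz, 16 bits, stereo, and a whole note lasting about four seconds at a moderate tempo -- figures chosen purely for illustration):

    # Approximate storage for one four-second whole note.
    midi_bytes  = 3 + 3                  # one Note On plus one Note Off message
    audio_bytes = 44100 * 2 * 2 * 4      # samples/sec * 2 bytes * 2 channels * 4 seconds
    print(midi_bytes, audio_bytes)       # 6 bytes versus roughly 705,600 bytes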

So why not always "record" and "play" MIDI data instead of WAVE data if the former offers so
many advantages? OK, for electronic instruments that's a great idea. But what if you want to
record someone singing? You can strip search the person, but you're not going to find a MIDI
OUT jack on his body. (Of course, I anxiously await the day when scientists will be able to offer
"human MIDI retrofits". I'd love to have a built-in MIDI OUT jack on my body, interpreting every
one of my motions and thoughts into MIDI messages. I'd have it installed at the back of my
neck, beneath my hairline. Nobody would ever see it, but when I needed to use it, I'd just push
back my hair and plug in the cable). So, to record that singing, you're going to have to record
the sound, and digitizing it into a WAVE file is the best digital option right now. That's why
sequencer programs exist that record and play both MIDI and WAVE data, in sync.

7. General MIDI

On MIDI sound modules (ie, whose Patches are instrumental sounds), it became
desirable to define a standard set of Patches in order to make sound modules more
compatible. For example, it was decided that Patch number 1 on all sound modules
should be the sound of an Acoustic Grand Piano. In this way, no matter what MIDI sound
module you use, when you change to Patch number 1, you always hear some sort of
Acoustic Grand Piano sound. A standard was set for 128 Patches which must appear in a
specific order, and this standard is called General MIDI (GM). For example, Patch
number 25 on a GM module must be a Nylon String Guitar. The chart, GM Patches,
shows you the names of all GM Patches, and their respective Program Change numbers.

The patches are arranged into 16 "families" of instruments, with each family containing 8
instruments. For example, there is a Reed family. Among the 8 instruments within the
Reed family, you will find Saxophone, Oboe, and Clarinet.

A GM sound module should be multi-timbral, meaning that it can play MIDI messages
upon all 16 channels simultaneously, with a different GM Patch sounding for each
channel.

Furthermore, all patches must sound an A440 pitch when receiving a MIDI note number
of 69.
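
The relationship between MIDI note numbers and pitch follows the usual equal-tempered formula, so the A440 requirement can be checked like this (a sketch only, not something spelled out in the GM document itself):

    # Equal-tempered pitch of a MIDI note number, referenced to A440 at note 69.
    def note_to_hz(note):
        return 440.0 * 2 ** ((note - 69) / 12)

    print(note_to_hz(69))   # 440.0
    print(note_to_hz(60))   # middle C, roughly 261.6 Hz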

If the GM module also has a built-in "drum module" (ie, usually one of 16 Parts), then
each of that Drum Part's MIDI notes triggers a different drum sound. The assignments of
drum sounds to MIDI notes is shown in the chart, GM Drum Sounds. The Drum Part is
usually set to receive MIDI messages upon channel 10.

The GM standard makes it easy for musicians to put Program Change messages in their
MIDI (sequencer) song files, confident that those messages will select the correct
instruments on all GM sound modules, so the song file plays with the correct
instrumentation automatically. Furthermore, musicians need not worry about parts
being played back in the wrong octave, nor that a snare drum part, for example,
will end up being played back on a Cymbal. The GM spec also
spells out other minimum requirements that a GM module should meet, such as being
able to respond to Pitch and Modulation Wheels, and also be able to play 24 notes
simultaneously (with dynamic voice allocation between the 16 Parts). All of these
standards help to ensure that MIDI Files play back properly upon setups of various
equipment.

The GM standard is actually not encompassed in the MIDI specification, and there's no
reason why someone can't set up the Patches in his sound module to be entirely different
sounds than the GM set. After all, most MIDI sound modules offer such programmability.
But, most have a GM option so that musicians can easily play the many MIDI files that
expect a GM module.

Finally, the GM spec spells out a few global settings. For example, the module should
respond to velocity (ie, for note messages). This may be hard-wired to control the VCA
level (ie, volume) of each note. Some modules may allow velocity to affect other
parameters. The pitch wheel bend range should default to +/- 2 semitones. The module
also should respond to Channel Pressure (often used to control VCA level or VCO level
for vibrato depth) as well as the following MIDI controller messages: Modulation (1)
(usually hard-wired to control LFO amount, ie, vibrato), Channel Volume (7), Pan (10),
Expression (11), Sustain (64), Reset All Controllers (121), and All Notes Off (123).
Channel Volume should default to 90, with all other controllers and effects off (including
pitch wheel offset of 0). Additionally, the module should respond to these Registered
Parameter Numbers: Pitch Wheel Bend Range (0), Fine Tuning (1), and Coarse Tuning
(2). Initial tuning should be standard, A440 reference.

There is a MIDI System Exclusive message that can be used to turn a module's General
MIDI mode on or off. See the MIDI specification. This is useful for modules that also offer
more expansive, non-GM playback modes or extra, programmable banks of patches
beyond the GM set, but need to allow the musician to switch to GM mode when desired.

NOTE: The GM spec doesn't dictate how a module produces sound. For example, one
module could use cheap FM synthesis to simulate the Acoustic Grand Piano patch.
Another module could use 24 digital audio waveforms of various notes on a piano,
mapped out across the MIDI note range, to create that one Piano patch. Obviously, the 2
patches won't sound exactly alike, but at least they will both be piano patches on the 2
modules. So too, GM doesn't dictate VCA envelopes for the various patches, so for
example, the Sax patch upon one module may have a longer release time than the same
patch upon another module.

8. GM Patches

This chart shows the names of all 128 GM Instruments, and the MIDI Program Change
numbers which select those Instruments.

Prog# Instrument Prog# Instrument

PIANO CHROMATIC PERCUSSION

1 Acoustic Grand 9 Celesta

2 Bright Acoustic 10 Glockenspiel

3 Electric Grand 11 Music Box

4 Honky-Tonk 12 Vibraphone

5 Electric Piano 1 13 Marimba

6 Electric Piano 2 14 Xylophone

7 Harpsichord 15 Tubular Bells

8 Clavinet 16 Dulcimer

ORGAN GUITAR

17 Drawbar Organ 25 Nylon String Guitar

18 Percussive Organ 26 Steel String Guitar

19 Rock Organ 27 Electric Jazz Guitar

20 Church Organ 28 Electric Clean Guitar

21 Reed Organ 29 Electric Muted Guitar

22 Accordion 30 Overdriven Guitar

23 Harmonica 31 Distortion Guitar

24 Tango Accordion 32 Guitar Harmonics

BASS SOLO STRINGS

33 Acoustic Bass 41 Violin

34 Electric Bass(finger) 42 Viola

35 Electric Bass(pick) 43 Cello

36 Fretless Bass 44 Contrabass

37 Slap Bass 1 45 Tremolo Strings

38 Slap Bass 2 46 Pizzicato Strings

39 Synth Bass 1 47 Orchestral Strings

40 Synth Bass 2 48 Timpani

ENSEMBLE BRASS

49 String Ensemble 1 57 Trumpet

50 String Ensemble 2 58 Trombone

51 SynthStrings 1 59 Tuba

52 SynthStrings 2 60 Muted Trumpet

53 Choir Aahs 61 French Horn

54 Voice Oohs 62 Brass Section

55 Synth Voice 63 SynthBrass 1

56 Orchestra Hit 64 SynthBrass 2

REED PIPE

65 Soprano Sax 73 Piccolo

66 Alto Sax 74 Flute

67 Tenor Sax 75 Recorder

68 Baritone Sax 76 Pan Flute

69 Oboe 77 Blown Bottle

70 English Horn 78 Shakuhachi

71 Bassoon 79 Whistle

72 Clarinet 80 Ocarina

SYNTH LEAD SYNTH PAD

81 Lead 1 (square) 89 Pad 1 (new age)

82 Lead 2 (sawtooth) 90 Pad 2 (warm)

83 Lead 3 (calliope) 91 Pad 3 (polysynth)

84 Lead 4 (chiff) 92 Pad 4 (choir)

85 Lead 5 (charang) 93 Pad 5 (bowed)

86 Lead 6 (voice) 94 Pad 6 (metallic)

87 Lead 7 (fifths) 95 Pad 7 (halo)

88 Lead 8 (bass+lead) 96 Pad 8 (sweep)

SYNTH EFFECTS ETHNIC

97 FX 1 (rain) 105 Sitar

98 FX 2 (soundtrack) 106 Banjo

99 FX 3 (crystal) 107 Shamisen

100 FX 4 (atmosphere) 108 Koto

101 FX 5 (brightness) 109 Kalimba

102 FX 6 (goblins) 110 Bagpipe

103 FX 7 (echoes) 111 Fiddle

104 FX 8 (sci-fi) 112 Shanai

PERCUSSIVE SOUND EFFECTS

113 Tinkle Bell 121 Guitar Fret Noise

114 Agogo 122 Breath Noise

115 Steel Drums 123 Seashore

116 Woodblock 124 Bird Tweet

117 Taiko Drum 125 Telephone Ring

118 Melodic Tom 126 Helicopter

119 Synth Drum 127 Applause

120 Reverse Cymbal 128 Gunshot

Prog# refers to the MIDI Program Change number that causes this Patch to be selected.
These decimal numbers are what the user normally sees on his module's display (or in a
sequencer's "Event List"), but note that MIDI modules count the first Patch as 0, not 1. So, the
value that is sent in the Program Change message would actually be one less. For example,
the Patch number for Reverse Cymbal is actually sent as 119 rather than 120. But, when
entering that Patch number using sequencer software or your module's control panel, the
software or module understands that humans normally start counting from 1, and so would
expect that you'd count the Reverse Cymbal as Patch 120. Therefore, the software or module
automatically does this subtraction when it generates the MIDI Program Change message.

So, sending a MIDI Program Change with a value of 120 (ie, actually 119) to a Part causes the
Reverse Cymbal Patch to be selected for playing that Part's MIDI data.
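
The off-by-one bookkeeping can be sketched as follows (Python, for illustration; the "chart number" is the human-readable 1-128 value from the GM chart above):

    # Convert a GM chart number (1-128) into the bytes of the Program Change message.
    def program_change_bytes(chart_number, channel):
        status = 0xC0 | (channel - 1)            # Program Change status byte for that channel
        return bytes([status, chart_number - 1]) # the value actually sent is one less

    print(program_change_bytes(120, 10).hex())   # Reverse Cymbal on channel 10 -> 'c977'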

8.1 GM Drum Sounds

This chart shows what drum sounds are assigned to each MIDI note for a GM module
(ie, that has a drum part).

MIDI Note #   Drum Sound                 MIDI Note #   Drum Sound
35 Acoustic Bass Drum 59 Ride Cymbal 2
36 Bass Drum 1 60 Hi Bongo
37 Side Stick 61 Low Bongo
38 Acoustic Snare 62 Mute Hi Conga
39 Hand Clap 63 Open Hi Conga
40 Electric Snare 64 Low Conga
41 Low Floor Tom 65 High Timbale
42 Closed Hi-Hat 66 Low Timbale
43 High Floor Tom 67 High Agogo
44 Pedal Hi-Hat 68 Low Agogo
45 Low Tom 69 Cabasa
46 Open Hi-Hat 70 Maracas
47 Low-Mid Tom 71 Short Whistle
48 Hi-Mid Tom 72 Long Whistle
49 Crash Cymbal 1 73 Short Guiro
50 High Tom 74 Long Guiro
51 Ride Cymbal 1 75 Claves
52 Chinese Cymbal 76 Hi Wood Block
53 Ride Bell 77 Low Wood Block
54 Tambourine 78 Mute Cuica
55 Splash Cymbal 79 Open Cuica
56 Cowbell 80 Mute Triangle
57 Crash Cymbal 2 81 Open Triangle
58 Vibraslap

9. Saving MIDI files

A MIDI file is a data file. It stores information, just like a text (ie, ASCII) file may store the text of
a story or newspaper article, but a MIDI file contains musical information. Specifically, a MIDI
file stores MIDI data -- the data (ie, commands) that musical instruments transmit between
each other to control such things as playing notes and adjusting an instrument's sound in
various ways.

MIDI is binary data, and a MIDI file is therefore a binary file. You can't load a MIDI file into a
text editor and view it. (Well, you can, but it will look like gibberish, since the data is not ASCII,
ie, text. Of course, you can use my MIDI File Disassembler/Assembler utility, available on this
web site, to convert a MIDI file to readable text).

The MIDI file format was devised to be able to store any kind of MIDI message, including
System Exclusive messages, along with a timestamp for each MIDI message. A timestamp is
simply the time when the message was generated. Using the timestamps, a sequencer can
playback all of the MIDI messages within the MIDI file at the same relative times as when the
messages were originally generated. In other words, a sequencer can playback all of the note
messages, and other MIDI messages, with the original "musical rhythms". A MIDI file can also
store other information relating to a musical performance, such as tempo, key signature, and
time signature. A MIDI file is therefore a generic, standardized file format designed to store
"musical performances", and is used by many sequencers. A MIDI file even has provisions for
storing the names of tracks in a sequencer, and other sequencer settings.

MIDI files are not specific to any particular computer platform or product.

There are 3 different "Types" (sometimes called "Formats") of MIDI files.

1. Type 0 files contain only one track, and all of the MIDI messages (ie, the entire
performance) are placed in that one track, even if they represent many musical parts
upon different MIDI channels.
2. Type 1 files separate each musical part upon its own track. Both Type 0 and 1 store
one "song", "pattern", or musical performance.
3. Type 2 files, which are extremely rare, are akin to a collection of Type 0 files all
crammed into one MIDI file. Type 2 is used to store a collection of songs or patterns,
for example, numerous drum beats. (If you need to convert a MIDI file to the various
Types, use my Midi File Conversion utility, available on this web site).
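
A Standard MIDI File begins with an 'MThd' header chunk that records the Type (0, 1 or 2), the number of tracks and the timing division, so the Type can be inspected with a few lines of Python (a sketch only; "song.mid" is just a placeholder filename):

    import struct

    # Read the 14-byte header chunk of a Standard MIDI File: 'MThd', a length of 6,
    # then three big-endian 16-bit fields: format/Type, number of tracks, division.
    with open("song.mid", "rb") as f:
        chunk_id, length = struct.unpack(">4sI", f.read(8))
        midi_type, num_tracks, division = struct.unpack(">HHH", f.read(6))

    print(chunk_id, midi_type, num_tracks, division)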

Other software and hardware devices in addition to sequencers may use MIDI files. For
example, a patch editor may store an instrument's patch settings in a MIDI file, by storing
System Exclusive messages (received from the instrument) within the MIDI file. In this case,
the patch editor may not care about the timestamp associated with each SysEx message.

MIDI Synchronization Methods

1. MIDI Clock

Early MIDI-era equipment achieved synchronization via a separate 'sync' connection carrying a
clock signal at one of a number of rates, usually described in pulses-per-quarter-note (ppqn). As
different manufacturers had their own preferred rates, rate converters were needed in order to
synchronize equipment of different makes.

Once standard MIDI messages could be used for synchronization, the 'pulse sync' method became
obsolete. A group of system messages called the 'system realtime' messages controls the
execution of timed sequences in a MIDI system, and these are often used in conjunction with
the song position pointer to control autolocation within a stored song. The system realtime
messages concerned with synchronization, all of which are single bytes, are:

• Timing clock
• Start
• Continue
• Stop

The timing clock (often referred to as ‘MIDI beat clock’) is a single status byte (&F8) to be
issued by the controlling device six times per MIDI beat. A MIDI beat is equivalent to a musical
semiquaver or 16th note. Therefore, 24 MIDI clocks are transmitted per quarter note (thus 24
ppqn).

As timing is vital in synchronization, system realtime messages are given priority over note
messages when transmitted together. However, note that slight timing errors can still occur,
because a realtime message must wait for any voice message already in progress to finish.
Therefore, it is advisable to avoid sending dense streams of these two types of message
down the same connection where possible.

On receipt of &F8, a device which handles timing information should increment its internal
clock by the relevant amount. This in turn will increment the internal song pointer after six MIDI
clocks (ie, one MIDI beat) have passed. Any device controlling the sequencing of other
instruments should generate clock bytes at the appropriate intervals, and any changes of tempo
within the system should be reflected in a change in the rate of MIDI clocks.
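
Since the clock rate follows the tempo, the spacing of the &F8 bytes is easy to work out (a quick sketch, assuming a tempo of 120 beats per minute):

    # MIDI clock rate at a given tempo: 24 clocks per quarter note.
    tempo_bpm = 120
    clocks_per_second = tempo_bpm / 60 * 24      # 48 clocks per second at 120 bpm
    interval_ms = 1000 / clocks_per_second       # roughly 20.8 ms between &F8 bytes
    print(clocks_per_second, interval_ms)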

'Start', 'Stop' and 'Continue' are used to control the remote machine's replay. 'Start' will cause
the remote machine to play back the song from the beginning. 'Stop' will stop the playback.
'Continue' will cause the remote machine to play back from where it last stopped. Using this
method alone, it is difficult or impossible to start playback from a point in the middle of a song.

2. Song Position Pointer (SPP)

SPP was introduced to overcome this problem. An SPP message is sent, followed by
'Continue'. This tells the remote machine to relocate its song pointer and start playback
from that point. SPP enables slave machines to follow the master even when the master is
fast-forwarded or rewound.
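
An SPP message is the status byte &F2 followed by a 14-bit position counted in MIDI beats (16th notes), sent least significant 7 bits first. A sketch of building one (Python, purely illustrative):

    # Song Position Pointer: status &F2, then LSB and MSB of a 14-bit count of MIDI beats.
    def song_position(beats):
        return bytes([0xF2, beats & 0x7F, (beats >> 7) & 0x7F])

    # Bar 9 of a 4/4 song = 8 bars * 16 sixteenths = 128 MIDI beats.
    print(song_position(128).hex())   # 'f20001'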

3. MIDI Timecode (MTC)

MIDI timecode has two specific functions. Firstly, to provide a means for distributing
conventional SMPTE/EBU timecode data around a MIDI system in a format that is compatible
with the MIDI protocol, and secondly to provide a means for transmitting ‘setup’ messages
which may be downloaded from a controlling computer to receivers in order to program them
with cue points at which certain events are to take place.

Unlike MIDI Clock, MTC represents time in the form of

Hours : minutes : seconds : frames

In an LTC timecode frame, two binary data groups are allocated to each of hours, minutes,
seconds and frames, these groups representing the tens and units of each, so there are eight
binary groups in total representing the time value of a frame. In order to transmit this
information over MIDI, it has to be turned into a format which is compatible with other MIDI
data, i.e. a status byte followed by relevant data bytes.

There are two types of MTC synchronizing message:

1. Quarter-frame message: one which updates a receiver regularly with running timecode,
denoted by the status byte &F1.
2. Full-frame message: one that transmits one-off updates of the timecode position
for situations such as exist during the high-speed spooling of tape machines. It is
transmitted as a universal realtime System Exclusive message.

3.1 Quarter frame messages

One timecode frame is represented by too much information to be sent in one
standard MIDI message, so it is broken down into eight separate messages. Each
message of the group of eight represents a part of the timecode frame value and
takes the general form:

&[F1][data]

The data byte begins with a zero, and the next seven bits of the data word are made up
of a 3-bit code defining whether the message represents hours, minutes, seconds or
frames, MS nibble or LS nibble, followed by the four bits representing the binary value of
that nibble.

Status Byte Data Byte

[ F1 ] [ Type | Time Data ]

0000 Frame LS

0001 Frame MS

0010 Seconds LS

0011 Seconds MS

0100 Minutes LS

0101 Minutes MS

0110 Hours LS

0111 Hours MS

Nibble – half a byte (4 bits)

LS – least significant

MS – most significant

At a frame rate of 30 fps, quarter-frame messages are sent over MIDI at a rate of
120 messages per second. As eight messages are needed to represent a frame fully,
30 x 8 = 240 messages per second would really be required if the receiving device
were to be updated every frame. This was considered too great an overhead in
transmitted data, so the receiving device is instead updated every two frames. If MTC
is transmitted continuously over MIDI it takes up approximately 7.5% of the available
data bandwidth.

Quarter frame messages may be transmitted in forward or reverse order, to emulate
timecode running either forwards or backwards, with the 'frames LS nibble' message
transmitted on the frame boundary of the timecode frame that it represents.

The receiver must in fact maintain a two-frame offset between displayed timecode and
received timecode since the frame value has taken two frames to transmit completely.
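
The eight quarter-frame messages for a given timecode value could be generated roughly as follows. This is a simplified sketch: it packs each time field into its low and high nibbles using the type codes from the table above, and ignores the frame-rate bits that the full MTC specification also places in the hours MS nibble:

    # Simplified sketch of the eight MTC quarter-frame messages for one timecode value.
    # Each message is &F1 followed by a data byte: a 3-bit type, then a 4-bit nibble.
    def quarter_frames(hh, mm, ss, ff):
        fields = [ff, ss, mm, hh]                # frames, seconds, minutes, hours
        messages = []
        for i, value in enumerate(fields):
            messages.append(bytes([0xF1, (2 * i) << 4 | (value & 0x0F)]))      # LS nibble
            messages.append(bytes([0xF1, (2 * i + 1) << 4 | (value >> 4)]))    # MS nibble
        return messages

    for msg in quarter_frames(1, 23, 45, 10):
        print(msg.hex())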

3.2 Full-Frame Message

The full-frame message is categorised as a universal realtime message. The format of this
message is as follows:

& [F0][7F][dev.ID][01][01][hh][mm][ss][ff][F7]

MIDI Sequencer

1. What is a sequencer?

A sequencer can be thought of as a computer dedicated to storing, processing and editing MIDI
messages. Sequencers come in various forms, but all of them resemble a standard computer
system equipped with an operating system, microprocessor and memory, integrated with controls
designed for performing sequencer-specific functions.

The basic functions of a sequencer are to record MIDI messages and store them with timing
references so that, during playback, the messages can be reproduced with their original timing.
These capabilities allow a sequencer to perform the basic function of a conventional tape
recorder.

As MIDI messages are digital information stored on the sequencer, they can be altered. Editing
can therefore be applied to the MIDI messages, such as correcting wrong notes, transposing
notes, and changing the tempo and time signature.

2. The Hardware Sequencers

The most common type of sequencer used during the 70s and 80s was the so-called
"hardware sequencer". These are portable machines with an LCD (LED on the earliest models)
screen and a control panel. On the side of the box are the MIDI IN and THRU/OUT sockets and
sync ports.

Compared with software sequencers, hardware sequencers are restricted by their small LCD
screens, which make editing difficult, especially for beginners. They also have limited RAM,
which restricts the number of MIDI notes that can be recorded.

The advantage of a hardware sequencer is that it is portable, and it therefore remains popular
even nowadays for on-stage performance. A famous example of the hardware sequencer is the
MC-50 by Roland.

Roland MC-50 MKII MIDI Sequencer

2.1 Keyboard Workstation

Sequencers are also built into newer synthesizers and samplers, and the combination is often
known as a "keyboard workstation". It provides the advantages of ease of use and portability
to musicians who need to move around.

3. Software Sequencers

With the popularity of the personal computer, sequencers also became available as software
packages running on PC, Apple Macintosh, Atari and Commodore Amiga computers. Software
sequencers tap into the processing power, storage and control systems of the computer and are
thus more powerful and less restricted than their hardware-based counterparts. There are other
added advantages:

• a bigger monitor screen, which makes editing easier;
• the ability to print music scores;
• support for software and hardware plug-ins to expand the capabilities of the
sequencer;
• most editing functions are performed with standard computer commands (like cut
and paste) which users are already familiar with.

The majority of software sequencers require an external MIDI interface to receive and distribute
MIDI data.

Digital Sampler

1. What is a sampler?

If we can understand the concept of a sound module, it would be easier to grasp the concept
of a digital sampler. As we know, a sound module consists of one or more sound ROM chips
that store sound samples in digital form. During the playback stage, individual sound samples
are called up (by MIDI Program Change number) and loaded into the RAM to be further
modified or combined with the synthesized sounds of the device.

A sampler is similar to a sound module in its internal structure and operations except that there
is no built-in factory preset sound ROM. Therefore, it requires users to input digital sound
samples into its RAM before it can be used. Thus we can say that a sampler is in fact a
dummy sound module i.e. a sound module without preset sound chips.

Figure: a typical sound module structural layout. Sound samples from a ROM chip and preset
synth data from another ROM chip are transferred into RAM, then passed through the mixer and
modulator stage to the mixed output.

There are basically three ways of putting sounds into a sampler's RAM:

1. recording directly into the sampler's line/mic input;
2. loading pre-recorded samples from a sample disc/CD-ROM; and
3. transferring a sound file from another device (such as a computer or another
sampler) via a communication link, e.g. SCSI or USB.

Note that the sample we are concerned with here is usually the sound of just a single note of
an instrument. For example, to create a piano sound program on a sampler, the user records
individual keys of a real piano rather than the performance of a musical piece (although there
are exceptional cases).

Getting sound into a sampler is not the end of the story. The user needs to go through a series
of processes to ensure that the sounds play back correctly across the keyboard. Before a sound
can fully function in a sampler, the following steps are involved:

Sampling – recording the 'raw' sound into the sampler;

Editing – cutting away redundant portions of the sample and defining loops;

Programming – defining how the sample is to be played back and modified by the built-in
algorithms;

Channel mapping – creating a multi-timbral sound module.

1.1 The sampling process

During the sampling process, analog sound enters the sampler via the line input and, on
some samplers, the microphone input. It is then converted into digital samples by the
A/D converter. Alternatively, pre-recorded sound samples stored on digital storage
media could also be transferred directly via SCSI or USB interface.

At this stage, the user needs to consider the transposing characteristics of the
sampler during the playback process. It is possible to create a piano sound program
by just recording a single key (e.g. middle C) of a piano. This is because the sampler
can later transpose this key automatically to span across the whole keyboard by
means of varying the playback sampling rate. For example, if a middle C at 261 Hz is
sampled at 44.1 kHz, then by playing the sample back at sampling rates higher than
44.1 kHz, C#, D, D#, E, F and so on can be obtained.
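
The required playback rate for a given transposition follows from the equal-tempered ratio of a semitone (a sketch, assuming the original note was sampled at 44.1 kHz):

    # Playback sampling rate needed to shift a 44.1 kHz sample by a number of semitones.
    def playback_rate(semitones, original_rate=44100):
        return original_rate * 2 ** (semitones / 12)

    print(playback_rate(1))    # C# from a sampled C: roughly 46722 Hz
    print(playback_rate(12))   # one octave up: 88200 Hz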

However, it has to be noted that on a real-life instrument such as a piano, the tone color,
dynamics and ADSR envelope change when different keys are played. Therefore,
over-transposing a single sample will result in an unnatural or unrealistic sound. This is
even more obvious with the human voice, where a "chipmunk" effect occurs when
transposing too far upwards.

To avoid the problems of over-transposition, multiple notes should be recorded at
appropriate intervals, and this should be planned before the sampling session
takes place.

1.2 The Editing Process

There are two main purposes for editing samples. Firstly, unwanted portions of
the sample, such as noise or silence before the actual sound and an unnecessarily
long sustain phase, have to be trimmed; this greatly reduces the amount of sampler
memory used. Secondly, loops have to be set and the loop type defined.

1.3 The Programming Process

Once all the samples required are edited, the next step would be to define how these
samples are to be played by the sampler. Here, there are two major tasks to be
performed. The first is to assign samples to the appropriate keys as planned during the
sampling stage; this part of the process is called Key-Span. The second is to program the
built-in algorithms of the sampler in order to adjust the ADSR (envelope generator),
filtering and any other effects applied to the samples.

At the end of the programming process a Program (a collection of all the
settings mentioned above) is created. The Program can then be stored together with
the samples for future recall.

1.4 Channel Mapping Process

This process is only necessary if the user intends to create a multi-timbral sound
module out of the sampler. During this process, the user defines the MIDI channels and
adjusts the overall stereo image and volume of the various Programs. 4

4. Assignments 4 and 5 – AE004 and AE005

AE15 – Digital Recording Formats

COMPACT DISC & PLAYER

1. The CD Message

1.1 The medium

1.2 The Pickup

1.3 FOCUSING MECHANISM

1.4 TRACKING

1.5 PICKUP CONTROL

2. SPARS CODE

DIGITAL AUDIO TAPE

1. DAT Mechanism

2. DAT RECORDING SYSTEM

3. WHAT IS RECORDED ON A DAT

3.1 MAIN AREA

3.2 SUB AREA

3.3 ATF AREA

3.4 START ID and SKIP ID

3.5 Absolute Time

3.6 Skip Ids

4. Error Correction System

7. Other Features

7.1 Random Programming

7.2 Various Repeat Functions

7.3 Music Scan

7.4 End Search

7.5 Informative Display

7.6 Construction

7.7 Longer Recording Time

7.8 Material of Tape

7.9 Identification Holes

MINI DISC

1. General Features

1.1 Quick Random Access

1.2 Recordable Disc

2. The Disc

2.1 Pre-Recorded MD

2.2 Recordable MD

3. The Laser Pickup

4. Playback On A Magneto Optical (Mo) Disc

5. Recording

6. Quick Random Access

7. Adaptive Transform Acoustic Coding (Atrac)

8. Shock Proof Memory

9. Mini Disc Specifications

10. Manufacture and Mastering

10.1 K-1216 MD Format Converter

10.2 K-1217 Address generator

DVD

1. DVD-Audio

1.1 Storage Capacity

2. Multi Channels

3. Meridian Lossless Packing

4. Various Audio Formats

5. Audio details of DVD-Video

5.1 Surround Sound Format

5.2 Linear PCM

5.3 Dolby Digital

5.4 MPEG Audio

5.5 Digital Theater Systems

5.6 Sony Dynamic Digital Sound

Super Audio Compact Disc (SACD)

1. Watermark – Anti Piracy Feature

2. Hybrid Disc

AE15 – DIGITAL RECORDING FORMATS

Introduction

CD, DAT, MD, SACD and DVD formats and their operations are explained.

COMPACT DISC & PLAYER

The CD is the audio standard for optical playback systems. In addition, some optical
recorders are designed to conform to its standard and format so that the recorded
information may be reproduced on a conventional Compact Disc Player (CDP).

1. The CD Message

A CD contains digitally encoded audio information in the form of pits impressed into its surface.
The information on the disc is read by the player's optical pickup, decoded, processed and
ultimately converted into acoustical energy. The disc is designed to allow easy access to the
information by the optical system as well as provide protection for the encoded information.

1.1 The medium

The CD's dimensions are: Diameter 12cm

Thickness approx. 1.2 mm

The middle hole for the motor spindle shaft is 15 mm in diameter.

The CDP's laser beam is guided across the disc from the inside to the outside,
starting at the lead-in area, moving outward through the programme area and
ending at the outer edge with the lead-out area.

The lead-in and lead-out areas are designed to provide information to control the
player. The lead-in area contains a Table of Contents (TOC) which provides
information to the CDP, such as the number of musical selections including starting
points and duration of each selection.

The lead-out area informs the player that the end of the disc has been reached.

The CD spins at a fixed Constant Linear Velocity (CLV) of 1.2 m/s for discs with
programmes exceeding 60 minutes. A CD with a programme under 60 minutes has a CLV of
1.4 m/s. The angular velocity (rpm) decreases as the optical pickup moves toward the
outer tracks of the disc. At a linear velocity of 1.2 m/s the angular velocity varies
between about 486 and 196 rpm.

At 1.4 m/s the angular velocity varies between about 568 and 228 rpm.
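
Those figures follow directly from the geometry of the disc: at a constant linear velocity the rotational speed depends on the radius being read. A rough check (assuming, for illustration, a programme area running from about 25 mm to 58 mm radius):

    # Rotational speed (rpm) needed to hold a constant linear velocity at a given radius.
    from math import pi

    def rpm(linear_velocity_m_s, radius_m):
        return linear_velocity_m_s / (2 * pi * radius_m) * 60

    print(rpm(1.2, 0.025))   # innermost programme radius: roughly 460 rpm
    print(rpm(1.2, 0.058))   # outer edge: roughly 200 rpm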

Data is stored in pit formation.

These pits vary in length from 0.833-3.054 µm depending on the encoded data and
the linear velocity of the disc.

The information contained in the pit structure on the disc surface is coded so that the
edge of each pit represents a binary 1 and all space between the edges represents 0. The width
and depth of the pits are approximately 0.5 and 0.11 µm respectively. The track runs
circumferentially from the inside to the outside. The total number of spiral revolutions
on a CD is 20 625.

1.2 The Pickup

The function of the pickup is to transfer the encoded information from the optical disc
to the CDP's decoding circuit (fig. 8). The pickup is required to track the information on
the disc, focus a laser beam and read the information as the disc rotates. The entire
lens assembly is able to move across the disc as directed by the tracking information
taken from the disc and programming information provided by the user.

The pickup must respond accurately under adverse conditions such as playing
damaged or dirty discs or while experiencing vibration or shock.

CDPs use either a three-beam pickup or a single-beam pickup. A laser diode
functions as the optical source for the pickup. The AlGaAs laser is commonly used in
CDPs; its wavelength is 780 nm.

The laser beam is split by a diffraction grating into multiple beams. Diffraction gratings
are plates with slits placed only a wavelength apart (fig 9a). As the beam passes
through the grating it diffracts in different directions resulting in an intense main beam
(primary beam) with successively less intense beams on either side. Only the primary
and secondary beams are used in the optical system of a CDP. The primary beam is
used for reading data and focusing the beam. The outer two beams (secondary
beams) are used for tracking. The light is subjected to a collimator lens that converges
the previously divergent light into a parallel path. The light is then passed
through a Polarization Beam Splitter (PBS). The PBS acts as a one-way mirror
allowing only vertically polarized light to pass to the disc; it reflects all other light.

The light is then directed toward a quarter-wave plate (QWP). The QWP is an
anisotropic crystal material designed to rotate the plane of polarization of linearly
polarized light by 45° (fig 9b).

The laser light is then focused onto the disc by an objective lens. The objective lens
converges the impinging light to the focal point at a distance (d) from the lens called
the focal length (fig. 9c).

The objective lens of a CDP is mounted on a two-axis actuator that is controlled by
focus and tracking servos. The spot size of the primary beam on the surface is 0.8
mm and is further reduced in size to 1.7 µm at the reflective surface of the disc. This is
due to the converging effect of the objective lens.

Accurate control of the focusing system causes dust, scratches or fingerprints on the
surface of the disc to appear out of focus to the reading laser.

The pits appear as bumps from underneath, where the laser enters the medium. The
wavelength of the laser in air is 780 nm. Upon entering the polycarbonate substrate,
which has a refractive index of 1.55, the wavelength is reduced to approximately 500 nm.
The depth of the pits is between 110 and 130 nm, designed to be
approximately one quarter of the laser's wavelength. A pit depth of one quarter of
the laser's wavelength creates a diffraction structure such that the reflected light
undergoes destructive interference. This interference thus decreases the intensity of
light returned to the pickup lens. The presence of pits and land areas is detected in
terms of changing light intensity by photodetectors.
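
The quarter-wavelength figure can be checked quickly (a sketch of the arithmetic used above):

    # Laser wavelength inside the polycarbonate, and the quarter-wave pit depth.
    wavelength_air = 780                                     # nanometres
    refractive_index = 1.55
    wavelength_in_disc = wavelength_air / refractive_index   # roughly 503 nm
    quarter_wave_depth = wavelength_in_disc / 4              # roughly 126 nm
    print(wavelength_in_disc, quarter_wave_depth)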

The light signal is converted to a corresponding electrical signal by the
photodetectors. Thereafter the electrical signals are sent to the decoder for
processing and are eventually converted into an audio signal.

The light reflected from the disc passes through the objective lens again. The light
converges as it passes through the objective lens and is again phase-shifted
45° by the QWP. The plane of polarization of the reflected light is now at right angles to
its original state; it is now horizontally polarized. Since the light is horizontally polarized
it is reflected by the PBS toward a cylindrical lens. The cylindrical lens uses an
astigmatic property to reveal focusing errors in the optical system (fig. 9d). Light
passes through the cylindrical lens and is received by an array of photodetectors,
typically a four-quadrant photodetector.

1.3 FOCUSING MECHANISM

A perfectly focused beam places the focal point of the light on the photodetectors, and
the shape of the image on the photodetectors is correspondingly circular. When the
focal point is in front of the photodetectors an elliptical image is projected on the
photodetectors at an angle. If the focal point moves behind the photodetectors, the
elliptical image is rotated 90o. The photodetectors act as transducers converting the
impinging light signals into corresponding electrical signals. The electrical signals thus
contain information for both focusing and tracking the laser beam, as well as audio
information.

The four-quadrant photodetector (labeled A, B, C, D) [Fig 10] is used to control the
focus servo and to transfer the audio signal to the decoding circuit. The focus circuit
provides control for the vertical positioning of the two-axis objective lens. The focus
correction signal (A+C) - (B+D) is equal to zero when the laser is focused correctly on
the disc, and the shape of the image on the photodetector is correspondingly circular.
When the optical system is out of focus the focus correction signal is a non-zero
value. The focus correction signal provides feedback to the focus servo circuit, which
moves the objective lens up or down accordingly until the laser is focused and the
shape of the image on the photodetectors becomes circular.

The audio information is collected by summing the signals from the four photodiodes
(A+B+C+D). The audio signal must exceed a threshold level prior to activation of the
focus servo. When the disc is first placed in the player the distance between the
objective lens and the disc is large, and therefore the audio signal is below the
threshold and the focus servo is inactive. A focus search circuit initially moves the lens
closer to the disc, causing the audio signal to increase. When the audio signal
exceeds the threshold the focus servo is activated.

1.4 TRACKING

The tracking servo is controlled by the signals received at two outer photodetectors E
and F (Fig. 10). The secondary beams are directed to these photodetectors. The two
outer photodetectors generate a tracking error signal (E-F) [fig. 11]. The tracking
system detects mistracking to the left or right and returns a tracking error signal to the
tracking servo.

The servo moves the pickup accordingly to correct the tracking error. The tracking
error signal controls the horizontal movement of the two-axis objective lens actuator.
The tracking servo continually moves the objective lens in the appropriate direction to
reduce the tracking error.
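
The three signals derived from the photodetector array can be summarised in a short sketch (Python, purely illustrative; A to D are the four quadrants and E, F the two outer tracking detectors described above):

    # Signals derived from the photodetector array of a three-beam pickup.
    def pickup_signals(A, B, C, D, E, F):
        focus_error    = (A + C) - (B + D)   # zero when the beam is in focus
        audio_rf       = A + B + C + D       # the data (RF) signal
        tracking_error = E - F               # zero when the main beam sits on the track
        return focus_error, audio_rf, tracking_error

    print(pickup_signals(1.0, 1.0, 1.0, 1.0, 0.5, 0.5))   # (0.0, 4.0, 0.0)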

1.5 PICKUP CONTROL

Three-beam pickups are mounted on a sled that moves radially across the disc,
providing coarse tracking capabilities. Tracking signals are derived from the signal
used to control the two-axis objective lens actuator. The sled servo operation is
contingent upon the level of the primary beam exceeding the threshold.

Precise tracking is always provided by the tracking servo and corresponding control
circuit. During fast forward or reverse a microprocessor takes control of the tracking
servo to increase locating speed.

2. SPARS CODE

A three-letter code found on CDs stating which recording format was used in the three
stages of album production:

        Recording   Mixing   Mastering
AAD     A           A        D
ADD     A           D        D
DAD     D           A        D
DDD     D           D        D

A – analogue
D – digital

Reference: Ken C. Pohlmann, Fundamentals of Digital Audio, Focal Press

DIGITAL AUDIO TAPE

Introduction

DAT was developed by Sony in the early 1980s as a new consumer format to replace the
analogue Compact Cassette.

High Fidelity Recording

There is no signal degradation during playback or recording because the audio signal is
recorded digitally on a DAT.

The quality of the head, tape and transport in an analogue cassette recorder all affect the
playback of a cassette tape. DAT suffers from none of the tape hiss, noise, distortion and wow
and flutter that are inherent limitations of the compact cassette.

For DAT, both recording (analogue-to-digital conversion, or encoding) and playback
(digital-to-analogue conversion, or decoding) are handled in the digital domain. As a result
there is little, if any, of the above problems associated with analogue recording and
playback.

THE DAT ADVANTAGE

Frequency Response: 20 Hz to 20 kHz

Sampling Frequency: 44.1 or 48 kHz

Dynamic Range: > 90 dB

Bit Depth: 16, 24 bit

Wow & Flutter: unmeasurable

Tape Speed: 8.15 mm/sec (about 1/6 that of the compact cassette at 4.76 cm/sec)

Tape Width: 3.81 mm

Tape Thickness: 13 microns

Cassette Size: 73 mm x 54 mm x 10.5 mm (half the size of the Compact Cassette)

Longest Playback Time: 120 min

High-speed Search: up to 200 times normal playback speed; a 60-minute programme can be
searched in less than 20 seconds

Subcode: like the CD, a subcode can be incorporated to provide information such as track
number and playing time (elapsed and total programme time)

Synchronisation: yes

1. DAT Mechanism

Due to the extremely large amount of digital audio data involved, highly precise, high-density
recording is required. For this reason DAT employs rotating heads like those of a VTR. The
head rotates at a very high speed while the tape runs past it, thereby increasing the relative
head-to-tape speed and improving recording performance. There are two heads, Head A and
Head B, mounted on a rotating drum which is set at a 6° 23' angle against the tape. Signals are
recorded diagonally on the tape, using a helical scan system, in the same way as in a VTR.

The tape wraps around only 90° of the 30 mm drum's circumference, which is remarkably little
compared with the 180° wrap of a VTR. Because of this, the stress on the tape is
minimised, allowing high-speed rewinding, fast-search and cueing without having to "unwrap"
the tape from the drum.

This is a big advantage of DAT.

The drum of the DAT rotates at 2000 rpm (1800 rpm for a VTR), for a relative tape speed of 3.13
metres per second, or 66 times that of an analogue cassette at 4.76 cm/s.
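
The relative head-to-tape speed follows from the drum geometry (a rough check of the figures quoted above, treating the drum surface speed as the relative speed):

    # Relative head-to-tape speed of a 30 mm drum spinning at 2000 rpm.
    from math import pi

    drum_speed = pi * 0.030 * (2000 / 60)            # roughly 3.14 metres per second
    cassette_speed = 0.0476                          # compact cassette tape speed in m/s
    print(drum_speed, drum_speed / cassette_speed)   # about 66 times faster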

2. DAT RECORDING SYSTEM

The DAT recording system is like that of a VTR. Diagonal tracks are created on the tape in the
area used for recording, which is 2.163 mm wide. These tracks are only about 13.6 microns wide,
or less than 1/5th the diameter of a human hair.

While rotating, the two heads, A and B, record alternately, overlapping slightly to leave no
space between tracks. Because the azimuth angles of the two heads are different, crosstalk
between tracks is minimised. The A head is angled at 20° anticlockwise from the perpendicular
to the track and is called the plus (+) azimuth head. The B head, on the other hand, records at an
angle of 20° clockwise from the perpendicular, and is called the minus (-) azimuth head.

In this way the two heads play and record in an interleaved pattern. DAT heads are used for
both recording and playback, and any unwanted audio data is erased by writing over it.

Heads A and B alternately record on 13.591 micron wide tracks as they rotate, but there is
practically no space between the tracks. This is because they slightly overlap as shown in the
drawing. During playback the heads do not read information from adjacent tracks due to their
different azimuth angles.

On both edges of the tape there are auxiliary tracks for fixed head access in as-yet-undecided
future systems. At present, they can be looked at as being nothing more than blank areas that
contribute to tracking stability.

3. WHAT IS RECORDED ON A DAT

The different types of data written on a DAT tape can be divided into the three following
areas:

3.1 MAIN AREA

PCM digital audio data (the main data) is recorded in the main area. A control
signal called the Main ID (Identification Code) is also embedded in this area. This
records, among other things, such information as the sampling frequency, the number of
bits, the anti-piracy status known as the Serial Copy Management System (SCMS), the
number of channels in the recorded signal format and the presence or absence of
emphasis. Due to the important role the Main ID plays as a control signal in playback, it
is repeated 16 times per track to ensure no mistakes occur when the DAT
recorder reads the data.

3.2 SUB AREA

Start IDs, addresses, absolute time and programme numbers are recorded in this area. A similar sub area exists on CD, but DAT has four times the recording capacity. This sub area is designed to give the user the convenience of editing programme numbers, start IDs and skip IDs.

3.3 ATF AREA

Automatic Track Finding (ATF) data is recorded here. These signals are recorded on the track while the head is in contact with the tape across the 90° wrap.

ATF is found on both home VCRs and DAT machines. During playback of a DAT, the playback head must be correctly positioned on the recorded track. Even a slight drift from the track will impair playback because the head cannot read the data correctly.

The ATF signal is recorded to make sure the heads stay correctly positioned on the right track during recording and playback. ATF in DAT does away with the control track head and tracking control knob found in VCRs.

Thus the order of recording on a DAT is SUB, ATF, MAIN, ATF and SUB again.

3.4 START ID and SKIP ID

Of the signals recorded in the subcode, the Start ID is one of the most important. It is a special mark recorded to indicate the beginning of a song. In normal mode it is recorded from the start of a song through to the ninth second. During playback it is used by the search function to locate the start of a song. The Start ID is recorded automatically, and it can also be set manually. Automatic recording occurs at the first rise in signal level after a space of about two seconds or more of silence.

Programme Numbers are recorded automatically at the same time as Start IDs to separate songs. With programme numbers it is also possible to go directly to the song of one's choice using the direct song function, by selecting the appropriate number on the 10-key pad.

Editing of a recorded music programme is possible by adding or deleting Start IDs and Programme Numbers.

3.5 Absolute Time

Absolute Time (ABS) is another example of data that is recorded automatically. It represents the total time that has elapsed since the beginning of the tape, and it serves a useful role in high-speed searching. It cannot normally be added after recording, although it is possible with certain high-grade models. This is essential if synchronisation is needed subsequently.

3.6 Skip IDs

Skip IDs are useful, enabling the user to mark songs or recorded material that one does not want to hear. During playback, when the DAT deck comes to a Skip ID it skips ahead at high speed to the next Start ID. For example, if one is listening to a recorded digital radio broadcast, Skip IDs can be used to exclude commercials and talk, leaving just the music for one's enjoyment.

Skip IDs can be inserted or erased at will during recording and playback.

4. Error Correction System

On a DAT, even if one of the heads is clogged, almost perfect playback is possible from the other head. This is because data is interleaved across the two tracks during recording. The DAT error correction system can restore all of the original PCM data because odd-numbered and even-numbered samples are split between the A and B head tracks during recording. This data is then rearranged during playback. This means that even if playback from the B head is impossible because of clogging, there is still sufficient data on the A head track. The error correction system can then provide approximate correction by interpolating an average value, dividing by two the sum of the neighbouring data values on either side of the missing value.

When the tape is dirty or damaged, the DAT system is designed to suppress noise in all but the worst cases, thanks to powerful error correction and compensation functions.
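The idea of splitting odd- and even-numbered samples between the two head tracks, and concealing a lost track by averaging its neighbours, can be illustrated with a small sketch. This is a simplification: real DAT adds Reed–Solomon coding and a more elaborate interleave, which these notes do not detail.

# Toy illustration of odd/even interleaving and error concealment.
samples = [10, 12, 15, 20, 18, 14, 9, 5]   # original PCM samples (toy data)

track_a = samples[0::2]                    # even-numbered samples -> head A
track_b = samples[1::2]                    # odd-numbered samples  -> head B (assume lost)

# Conceal each missing head-B sample with the average of its two neighbours,
# which are still available from the head-A track.
reconstructed = []
for i, even in enumerate(track_a):
    reconstructed.append(even)                             # sample 2i, read from A
    right = track_a[i + 1] if i + 1 < len(track_a) else even
    reconstructed.append((even + right) // 2)              # estimate of sample 2i+1

print(samples)        # [10, 12, 15, 20, 18, 14, 9, 5]
print(reconstructed)  # [10, 12, 15, 16, 18, 13, 9, 9] - approximate, but no dropout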

7. Other Features
7.1 Random Programming

This function allows one to select the order in which songs are played back with fast
access time.

7.2 Various Repeat Functions

As with a CD, a number of repeat functions are possible. A single song, all the songs
on the tape, a programmed sequence, or songs between any two points can be
repeated at will.

7.3 Music Scan

This function enables rapid checking of everything recorded on the tape by automatically playing the first few seconds of all the songs.

7.4 End Search

This fast-forwards the recorded DAT to the end of the last song where new
programme can be recorded on it.

7.5 Informative Display

Display of subcode information such as the number of songs, time and programming
information is possible.

7.6 Construction

A lid covers the tape and a slider covers the reel hub holes when not loaded. This
protects the tape from fingerprints, dirt, dust and damage that could affect the
recorded data. When loading the cassette, the slider pin is depressed to release the
hub brakes and retract the slider which uncovers the tape reel hub holes. At the same
time, the front lid opens so the tape can be drawn out and wrapped across the
periphery of the rotary head drum.

7.7 Longer Recording Time

At the slower secondary speed the total tape time can be increased to 4 hours. Some DAT tapes use a slightly thinner base film, giving up to 3 hours of recording time at the normal speed and up to 6 hours at the secondary speed.

All recording on a DAT is done on one side only. There is no “Side B” on a DAT, unlike the analogue compact cassette.

7.8 Material of Tape

DAT tape uses unoxidised iron and cobalt alloy metal particles suspended in a binder. This is similar to the metal particle tape used in analogue cassettes and 8 mm (Hi8) video cassettes.

7.9 Identification Holes

There are five identification holes on the bottom of the cassette, one of which is an erasure prevention hole. Unlike on ordinary compact cassettes, this one can be opened or closed by a sliding tab.


MINI DISC

Introduction

The MD was developed in the early 1990s as a new consumer format to replace the analogue Compact Cassette.

1. General Features
1.1 Quick Random Access

Total durability – the disc never gets stretched, broken or tangled. The optical pickup never physically touches the surface of the MD, hence no scratches and no wear and tear. The disc is housed in a casing for further protection.

Superb compactness – a 64 mm diameter disc.

Casing dimensions – 68 x 72 x 5 mm.

1.2 Recordable Disc

Unsurpassed digital sound based on CD technology.

Shock-proof portability – an advanced semiconductor memory buffer provides almost totally shock-resistant operation; no skips whilst jogging or driving.

2. The Disc
2.1 Pre-recorded MD

Pre-recorded Mini Discs are used for music software. The audio signals are recorded in the form of pits on the disc, and the casing helps to protect them. The casing has a read-window shutter only on the bottom surface of the cartridge, leaving the top free for the label. The disc is made of polycarbonate, like the CD.

2.2 Recordable MD

A recordable MD can be recorded and re-recorded up to a million times without signal loss or degradation, and has a lifetime comparable to that of a CD. This recordable disc is based on Magneto Optical (MO) disc technology.

During recording on an MO disc a laser shines from the back of the disc while a magnetic field is applied to the front. The MO disc casing therefore has a read/write window on both sides.

3. The Laser Pickup

The pickup can read both types of MD. For MO discs the pickup reads the magnetic polarity of the disc; for pre-recorded discs it reads the amount of reflected light.

The pickup system is based on a standard CD pickup with the addition of an MO signal readout analyzer and two photodiodes (see diagram).


4. Playback On A Magneto Optical (Mo) Disc

A 0.5 mW laser is focused onto the magnetic layer.

The magnetic signal on the disc affects the polarization of the reflected light (the magneto-optical Kerr effect). The direction of the polarization is converted into light intensity by the MO signal readout analyzer: depending on the direction of polarization, one of the two photodiodes will receive more light.

The electrical signals from the photodiodes are subtracted and, depending on whether the difference is positive or negative, a “1” or a “0” is read.

The same laser is used for the playback of pre-recorded discs. The amount of light reflected depends on whether or not a pit exists.

The pits are covered with a thin layer of aluminium, which improves reflectivity as on a CD.

If no pit exists, most of the light is reflected back through the beam splitter into the analyzer and eventually into the photodiodes.

If a pit does exist, some of the light is diffracted and less light reaches the photodiodes. The electrical signals are summed and, depending on the sum, a “1” or a “0” is read.


5. Recording

Recording requires the use of a laser and a polarising magnetic field.

When the magnetic layer in the disc is heated by the laser to a temperature of about 400°F, it temporarily loses its coercive force (it becomes neutral and loses its magnetism). As the disc rotates and the irradiated (exposed) domain returns to its normal temperature, its magnetic orientation is determined by an external magnetic field produced by the write head.

Polarities of N and S can be recorded, corresponding to “1” or “0”.

The magnetic head is positioned directly across from the laser source, on the opposite side of the disc.

A magnetic field corresponding to the input signal is generated over the laser spot. The rotation of the disc then displaces the area being recorded, allowing the temperature at the spot to drop below the Curie point. At that point the spot takes on the polarity of the applied magnetic field.

6. Quick Random Access

MD has a ‘pre-groove’ which is formed during manufacture.

This groove helps the tracking servo and spindle servo during recording and playback.

Address information is recorded at intervals of 13.3 ms using small zigzags (a wobble) in the pre-groove. The disc therefore has all the addresses (timing) already notched along the groove, even on a blank MD.


There is a User Table of Contents (UTOC) area located around the inner edge of the disc, before the programme area, which contains the order of the music.

Start and end addresses for all music tracks recorded on the disc are stored in this area, enabling easy programming simply by rewriting the addresses.

7. Adaptive Transform Acoustic Coding (ATRAC)

Digital audio data compression reduces the information to roughly a fifth (20%), enabling 74 minutes of recording time on an MD. An MD stores approximately 130 MB of data, or about 1.756 MB/min of compressed 16-bit, 44.1 kHz stereo digital audio.

By comparison, uncompressed 16-bit, 44.1 kHz stereo digital audio for CD is approximately 10 MB/min.

ATRAC starts with 16-bit information and analyses segments of the data for their waveform content. It encodes only those frequency components which are audible to us.

It works on the principles of the threshold of hearing and the masking effect (see diagram).
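A quick check of the figures quoted above (a sketch only; the exact MD numbers depend on format overhead, which these notes do not break down):

fs, bits, channels = 44_100, 16, 2
uncompressed_bps = fs * bits * channels            # bits per second for CD-quality stereo

print(round(uncompressed_bps / 1e6, 2))            # ~1.41 Mbit/s
print(round(uncompressed_bps * 60 / 8 / 1e6, 1))   # ~10.6 MB per minute, i.e. the ~10 MB/min above

compressed_bps = uncompressed_bps / 5              # ~5:1 ATRAC reduction
print(round(compressed_bps / 1e6, 2))              # ~0.28 Mbit/s, consistent with the
                                                   # ~0.3 Mbit/s decoder rate quoted in section 8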


8. Shock Proof Memory

The aim is to prevent skipping or muting while the user is moving about.

The pickup can read information off the disc at a rate of 1.4 Mbit per second, but the ATRAC decoder requires a data rate of only 0.3 Mbit per second for playback.

This allows the use of a buffer memory that can store up to 20 seconds of digital information. It is a read-ahead buffer.

Should the pickup be jarred out of position and stop supplying information, the correct information continues to be supplied to the ATRAC decoder from the buffer memory. As long as the pickup returns to the correct position within 3 – 20 seconds, depending on buffer memory size, the listener will never experience mistracking.

Since signals enter the buffer faster than they leave it, the buffer will eventually become full. At that point the pickup stops reading information from the disc, and resumes as soon as there is room in the memory.

This approach can also be applied to a conventional CD player, but it requires a much larger memory because the CD data is not compressed.

Using a concept called sector repositioning, the pickup can resume reading from the correct point after being displaced: the system memorises the address (recorded every 13.3 ms) at which it was displaced and returns the pickup to the correct position.
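The buffer arithmetic follows directly from the two data rates quoted above; a minimal sketch (the 20-second figure is the one given in these notes, not a manufacturer specification):

# Read-ahead buffer sizing for the shock-proof memory described above.
read_rate = 1.4e6       # bits/s - rate at which the pickup reads the disc
play_rate = 0.3e6       # bits/s - rate at which the ATRAC decoder consumes data
buffer_seconds = 20     # seconds of audio held in the buffer

buffer_bits = play_rate * buffer_seconds
print(round(buffer_bits / 8 / 1e6, 2))                  # ~0.75 MB of RAM

# Once playback is running, the buffer refills at the difference of the two rates,
# so a full 20-second reserve is rebuilt in well under 20 seconds:
print(round(buffer_bits / (read_rate - play_rate), 1))  # ~5.5 s to refill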

9. Mini Disc Specifications

Frequency range: 5 Hz to 20 kHz

Dynamic range: 105 dB

Sampling resolution: 16-bit

Sampling frequency: 44.1 kHz

Disc speed: 1.2 – 1.4 m/s, constant linear velocity (CLV)

User Table of Contents (UTOC): programmable

A hybrid MD has both pre-recorded and recordable areas on one disc. This is useful for language learning and music study, where the student can play back what is already on the disc and record alongside it.

10. Manufacture and Mastering


10.1 K-1216 MD Format Converter

The original 16-bit digital signal is compressed using an ATRAC encoder.

A hard disc is built into the K-1216 for saving the compressed signal data.

Simultaneously with ATRAC compression, the signal is restored back to 16-bit audio for monitoring the MD sound.

A keyboard is supplied with the format converter so that character and text information can be added.

Subcode information is read from the original CD master, converted to MD-format subcode and saved on the hard disc.

10.2 K-1217 Address generator

Used in producing the final MO glass master and only required at actual disc cutting
facilities. It interfaces directly with the cutting machine, supplying the necessary audio,
error correction and address codes via Sony CDX-I code pro.


DVD

Introduction

The DVD format was designed to replace the LaserDisc, boasting a higher storage capacity than LD.

1. DVD-Audio

Even though a DVD is the same size and shape as a CD, it can hold up to 4.7 GB of data on a single-sided, single-layer disc, compared with a CD's storage capacity of only about 650 MB. This is achieved by tightening tolerances to squeeze the data pits closer together.

Like a vinyl record, a double-sided disc needs to be flipped over when side A has finished playing on current DVD players. Where multilayer technology is used, even more data can be packed onto the disc: two layers of different data are found on one or both sides of the disc. Though one layer lies in front of the other, each layer can be read separately by changing the focal point of the laser. This is similar to looking through a rain-spattered windowpane and focusing on the landscape beyond – the windowpane then goes out of focus.

1.1 Storage Capacity

CD: 650 – 700 MB

DVD:

Single-sided, single-layer: 4.7 GB

Double-sided, single-layer: 9.40 GB

Single-sided, dual-layer (DVD-9): 8.54 GB

Double-sided, dual-layer: 17.08 GB

A DVD player requires new heads using shorter-wavelength lasers and more refined focusing mechanisms; other than that, its transport is essentially identical to a CD player's.

The RS-PC (Reed Solomon Product Code) error correction system is approximately 10 times more robust than the current CD system.

2. Multi Channels

LPCM is mandatory, with up to 6 channels at sample rates of 48/96/192 kHz (also 44.1/88.2/176.4 kHz) and sample sizes of 16/20/24 bits, yielding a frequency response of up to 96 kHz and a dynamic range of up to 144 dB. Multichannel PCM will be downmixable by the player, although at 192 and 176.4 kHz only two channels are available. Sampling rates and sizes can vary for different channels by using a predefined set of groups. The maximum data rate is 9.6 Mbps.
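A minimal sketch of why the 9.6 Mbps ceiling forces trade-offs between channel count, sample rate and word length (the channel groupings of the real specification are simplified away here):

MAX_RATE = 9.6e6    # bits per second, DVD-Audio LPCM limit

def lpcm_rate(channels, fs_hz, bits):
    return channels * fs_hz * bits

for channels, fs, bits in [(2, 192_000, 24), (6, 96_000, 16), (6, 96_000, 24)]:
    rate = lpcm_rate(channels, fs, bits)
    verdict = "fits" if rate <= MAX_RATE else "exceeds the limit"
    print(channels, "ch /", fs, "Hz /", bits, "bit ->", round(rate / 1e6, 2), "Mbps,", verdict)

# 2 ch / 192 kHz / 24 bit ->  9.22 Mbps, fits
# 6 ch /  96 kHz / 16 bit ->  9.22 Mbps, fits
# 6 ch /  96 kHz / 24 bit -> 13.82 Mbps, exceeds the limit - hence the need for
#                            mixed rates and sizes across channel groups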


3. Meridian Lossless Packing

Meridian's MLP (Meridian Lossless Packing) scheme is licensed by Dolby. MLP removes
redundancy from the signal to achieve a compression ratio of about 2:1 while allowing the
PCM signal to be completely recreated by the MLP decoder (required in all DVD-Audio
players). MLP allows playing times of about 74 to 135 minutes of 6-channel 96kHz/24-bit audio
on a single layer (compared to 45 minutes without packing). Two-channel 192kHz/24-bit
playing times are about 120 to 140 minutes (compared to 67 minutes without packing).
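These playing times can be sanity-checked from the layer capacity and the LPCM rate; a rough sketch that ignores overheads such as error correction and subcode:

capacity_bits = 4.7e9 * 8              # single-layer DVD capacity in bits
pcm_rate = 6 * 96_000 * 24             # 6-channel 96 kHz / 24-bit: ~13.8 Mbit/s

minutes_unpacked = capacity_bits / pcm_rate / 60
minutes_with_mlp = minutes_unpacked * 2          # MLP gives roughly 2:1 packing

print(round(minutes_unpacked))         # ~45 min, matching the figure quoted above
print(round(minutes_with_mlp))         # ~90 min, inside the 74-135 min range quoted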

4. Various Audio Formats

Other audio formats of DVD-Video (Dolby Digital, MPEG audio, and DTS, described below)
are optional on DVD-Audio discs, although Dolby Digital is required for audio content that has
associated video. A subset of DVD-Video features (no angles, no seamless branching, etc.) is
allowed. It's expected that shortly after DVD-Audio players appear, new universal DVD players
will also support all DVD-Audio features.

DVD-Audio includes specialized downmixing features. Unlike DVD-Video, where the decoder
controls mixing from 6 channels down to 2, DVD-Audio includes coefficient tables called
SMART (system-managed audio resource technique) to control mixdown and avoid volume
buildup from channel aggregation. Up to 16 tables can be defined by each Audio Title Set
(album), and each track can be identified with a table. Coefficients range from 0dB to 60dB.

DVD-Audio allows up to 16 still graphics per track, with a set of limited transitions. On-screen
displays can be used for synchronized lyrics and navigation menus. A special simplified
navigation mode can be used on players without a video display.

Matsushita announced that its new Panasonic and Technics universal DVD-Audio/DVD-Video
players will be available in fall 1999 and will cost $700 to $1,200. Yamaha may also release
DVD-Audio players at the same time. BMG, EMI, Universal, and Warner have all announced
that they will have about 10 to 15 DVD-Audio titles available at launch.

5. Audio details of DVD-Video

The following details are for audio tracks on DVD-Video. Some DVD manufacturers such as
Pioneer are developing audio-only players using the DVD-Video format. Some DVD-Video
discs contain mostly audio with only video still frames.

5.1 Surround Sound Format

A DVD-Video disc can have up to 8 audio tracks (streams). Each track can be in one
of three formats:

• Dolby Digital (formerly AC-3): 1 to 5.1 channels
• MPEG-2 audio: 1 to 5.1 or 7.1 channels
• PCM: 1 to 8 channels

Two additional optional formats are provided: DTS and SDDS. Both require external
decoders and are not supported by all players.

The "0.1" refers to a low-frequency effects (LFE) channel that connects to a sub low
frequency driver (subwoofer).


5.2 Linear PCM

Linear PCM is uncompressed (lossless) digital audio, the same format used on CDs
and most studio masters. It can be sampled at 48 or 96 kHz with 16, 20, or 24
bits/sample. (Audio CD is limited to 44.1 kHz at 16 bits.) It ranges from 1 to 8
channels depending on surround sound format. The maximum bitrate is 6.144 Mbps,
which limits sample rates and bit sizes with 5 or more channels. It's generally felt that
the 96 dB dynamic range of 16 bits or even the 120 dB range of 20 bits combined with
a 48 kHz sampling rate is adequate for high-fidelity sound reproduction. However,
additional bits and higher sampling rates are useful in studio work, noise shaping,
advanced digital processing, and three-dimensional sound field reproduction. DVD
players are required to support all the variations of LPCM, but some of them may
subsample 96 kHz down to 48 kHz, and some may not use all 20 or 24 bits. The
signal provided on the digital output for external digital-to-analog converters may be
limited to less than 96 kHz and less than 24 bits.
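The dynamic-range figures quoted for 16, 20 and 24 bits follow from the usual rule of thumb of roughly 6 dB per bit (a sketch that ignores dither and noise shaping):

# Approximate dynamic range of linear PCM: ~6.02 dB per bit of word length.
def pcm_dynamic_range_db(bits):
    return 6.02 * bits

for bits in (16, 20, 24):
    print(bits, "bit:", round(pcm_dynamic_range_db(bits)), "dB")
# 16 bit: 96 dB, 20 bit: 120 dB, 24 bit: 144 dB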

5.3 Dolby Digital

Dolby Digital is multi-channel digital audio, using lossy AC-3 coding technology from
original PCM with a sample rate of 48 kHz at up to 24 bits. The bitrate is 64 kbps to
448 kbps, with 384 being the normal rate for 5.1 channels and 192 being the normal
rate for stereo (with or without surround encoding). (Most Dolby Digital decoders
support up to 640 kbps.)

Dolby Digital is the format used for audio tracks on almost all DVDs.

5.4 MPEG Audio

MPEG audio is multi-channel digital audio, using lossy compression from original
PCM format with sample rate of 48 kHz at 16 bits. Both MPEG-1 and MPEG-2 formats
are supported. The variable bitrate is 32 kbps to 912 kbps, with 384 being the normal
average rate. MPEG-1 is limited to 384 kbps. The 7.1 channel format adds left-center
and right-center channels, but will probably be rare for home use. MPEG-2 surround
channels are in an extension stream matrixed onto the MPEG-1 stereo channels,
which makes MPEG-2 audio backwards compatible with MPEG-1 hardware (an
MPEG-1 system will only see the two stereo channels.) MPEG Layer III (MP3) and
MPEG-2 AAC (a.k.a. NBC, a.k.a. unmatrix) are not supported by the DVD-Video
standard.

5.5 Digital Theater Systems

DTS (Digital Theater Systems) Digital Surround is an optional multi-channel (5.1) digital audio format, using lossy compression from PCM at 48 kHz at up to 20 bits.
The data rate is from 64 kbps to 1536 kbps (though the DTS Coherent Acoustics
format supports up to 4096 kbps as well as variable data rate for lossless
compression). The DVD standard includes an audio stream format reserved for DTS,
but many players ignore it. According to DTS, existing DTS decoders will work with
DTS DVDs. All DVD players can play DTS audio CDs.

5.6 Sony Dynamic Digital Sound

SDDS (Sony Dynamic Digital Sound) is an optional multi-channel (5.1 or 7.1) digital
audio format, compressed from PCM at 48 kHz. The data rate can go up to 1280


kbps. SDDS is a theatrical film soundtrack format based on the ATRAC compression
format technology that is used in Minidisc.

Super Audio Compact Disc (SACD)

Sony and Philips are promoting SACD, a competing DVD-based format using Direct Stream Digital (DSD) encoding with a frequency response of up to 100 kHz.

SACD and DVD-Audio are not designed to replace the CD format.

DSD is based on the pulse-density modulation (PDM) technique, which uses single bits (1-bit words) to represent the incremental rise or fall of the audio waveform.

The 1-bit system encodes music at 2,822,400 samples a second. This supposedly improves quality by removing the brick-wall filters required for PCM encoding. It also makes downsampling more accurate and efficient.

DSD provides frequency response from DC to over 100 kHz with a dynamic range of over
120 dB. DSD includes a lossless encoding technique that produces approximately 2:1
data reduction (50%) by predicting each sample and then run-length encoding the error
signal. Maximum data rate is 2.8 Mbps.
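The DSD figures relate simply to the CD sampling rate; a quick per-channel check:

cd_fs = 44_100
dsd_fs = 2_822_400

print(dsd_fs / cd_fs)            # 64.0 -> DSD samples at 64 x 44.1 kHz
print(dsd_fs * 1 / 1e6)          # 2.82 Mbit/s per channel at 1 bit per sample,
                                 # matching the ~2.8 Mbps maximum quoted above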

1. Watermark – Anti Piracy Feature

SACD includes a physical watermarking feature - Pit signal processing (PSP) modulates the
width of pits on the disc to store a digital watermark (data is stored in the pit length). The optical
pickup must contain additional circuitry to read the PSP watermark, which is then compared to
information on the disc to make sure it is legitimate – an anti-piracy measure. A downside: because of the requirement for new watermarking circuitry, SACD discs are not playable in existing DVD-ROM drives.

SACD includes text and still graphics, but no video. Sony says the format is aimed at
audiophiles and is not intended to replace the audio CD format – 13 billion CDs worldwide. It
may be revived when yields are high enough that it no longer costs more to make a hybrid
SACD disc than to press both an SACD DVD and a CD.

Future SACDs will have enough room for both a two-channel mix and a multi-channel version of the same music, as well as text and graphics.

With the SACD format, music companies can offer three different types of disc: single-layer, dual-layer and hybrid. The single-layer disc can store a full album of high-resolution music, while a dual-layer disc provides nearly twice the playing time. The hybrid disc is described below.

2. Hybrid Disc

Two layers – the top layer is CD while the bottom layer is high density SACD. It is compatible
with 700 million CD players worldwide.

Sony released an SACD player in Japan in May 1999 and expects the player to be available in
the U.S. for $4,000 by the end of the year. Initial SACD releases will be mixed in stereo, not


multichannel. A number of studios have announced that they will release SACD titles by the
end of the year: Audioquest (2), DMP (5), Mobile Fidelity Labs, Sony (40), Telarc (12), Water
Lily Acoustics (2).

AE16 Digital Audio Workstations

1. Storage Requirements

2. Usage of Storage capacity on disc versus tape

2.1 Disk Storage

3. Digital Audio Requirements

4. Audio Editing

5. Multichannel Recording

6. Concepts in Hard Disk Recording

6.1 The Sound File

6.2 Sound File Storage - The Directory

6.3 Buffering

6.4 The Allocation Unit (AU)

6.5 Access Time (ms)

6.6 Transfer Rate (MB/sec)

6.7 Disk Optimization

7. Multichannel Considerations

7.1 Winchester Magnetic Disc Drives

8. Editing in Tapeless Systems

8.1 Non-destructive Crossfade

8.2 Destructive Crossfade

8.3 Edit-point Searching

8.4 The Edit Decision List (EDL)

9. Pro-tools ( Use manual)


AE16 DIGITAL AUDIO WORKSTATIONS

1. Storage Requirements

In a conventional linear PCM system without data compression, the data rate (bits/sec) from one channel of digital audio depends on the sampling rate and the resolution. E.g. a system operating at 48 kHz using 16 bits will have a data rate of:

48,000 x 16 = 768 kbit/sec, or roughly 5.5 MB per minute
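The same arithmetic extends to any sampling rate, word length and channel count. A minimal Python sketch (decimal megabytes are used, so the results differ slightly from the rounded figures in these notes):

# Storage requirement for uncompressed linear PCM.
def pcm_storage_mb(fs_hz, bits, channels, minutes):
    total_bits = fs_hz * bits * channels * minutes * 60
    return total_bits / 8 / 1e6          # decimal megabytes

print(round(pcm_storage_mb(48_000, 16, 1, 1), 2))    # ~5.76 MB per channel-minute (~5.5 MiB)
print(round(pcm_storage_mb(48_000, 16, 24, 30)))     # ~4147 MB for 24 tracks x 30 min,
                                                     # i.e. roughly 4 GB, as noted below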

2. Usage of Storage capacity on disc versus tape

A 30-minute reel of 24-track tape at 48 kHz/16-bit corresponds to roughly 4,000 MB of audio data, and only when all 24 tracks are recorded for the full 30 minutes. With a disk drive the total storage capacity can be distributed in any way between channels. In fact one cannot talk of 'tracks' in random access systems, since there is simply one central reservoir of storage serving a number of channel outputs when they are required.

In disk-based systems, storage capacity tends to be purchased in units of so many megabytes at a time, corresponding to a number of single-channel minutes at a standard resolution. The system may allow an hour of storage time divided between 4 channel outputs, but the total amount of 'programme time' to which this corresponds depends on how much of the time each channel output is actually being fed with data.

2.1 Disk Storage

All computer mass storage devices are block structured, i.e. the storage space is divided up into blocks of fixed size (typically 512 – 1024 bytes), each of which may be separately addressed. In this way information is accessed quickly by reference to a directory of the disk's contents, containing the locations of the blocks of data relating to particular files.

3. Digital Audio Requirements

Samples, since they are time-discrete, may be processed and stored either contiguously or non-contiguously, provided they are reassembled into their original order (or some other specified order) before analogue conversion. Thus digital audio is ideal for storage on block-structured media such as hard disks, provided that buffering is employed at the inputs and the outputs to smooth the transfer of data to and from the disk.

4. Audio Editing

Audio editing may be accomplished in the digital domain by joining one recording to another in RAM, using the buffer to provide a continuous output. A fast-access disk drive (under 20 ms seek time, with no thermal calibration) makes it possible to locate sound files at different positions and play them back as one continuous stream of audio without any audible glitches.

5. Multichannel Recording

Multichannel recording may be accomplished by dividing the storage capacity between the channels. During system operation, audio data is transferred to and from the disc via RAM using the Direct Memory Access (DMA) controller and bus, bypassing the CPU. Data is also transferred between RAM and the audio interfaces via the buffers, under CPU and user command control. During editing, fading and mixing, data is written from the disc to the DSP unit via RAM, which in turn passes it to the buffered audio outputs.

6. Concepts in Hard Disk Recording

6.1 The Sound File

This is an individual sound recording of any length. The disk is a storage medium in which no one part has any specific time relationship to any other part – i.e. disk recording does not start at one place and finish at another.

6.2 Sound File Storage - The Directory

A directory is used as an index to the store, containing entries specifying what has been stored, the size of each file and its location. Within the directory, or its sub-indexes, the locations of all the pieces of a sound file are registered. When that particular file is requested, the system reassembles the pieces by retrieving them in the correct sequence.

6.3 Buffering

Buffering ensures that time-continuous data may be broken up and made continuous again. A buffer is a short-term RAM store which holds only a portion of the audio data at any one time. Buffering is used to accomplish the following tasks (a minimal sketch follows the list):

• Writing/Reading - Disk media require that audio is split into blocks (typically 512 – 1024 bytes). This is achieved by filling a RAM buffer with a continuous audio input and then reading out of RAM in bursts of disk blocks which are written to disk. On replay the RAM buffer is filled with bursts from the disc blocks and read out in a continuous form. In order to preserve the original order of samples, the buffer must operate in First-In-First-Out (FIFO) mode.
• Timebase Correction - The timing of data entering the buffer may be erratic or have gaps. The timing of data leaving the buffer is made steady by using a reference clock to control the reading process.
• Synchronisation - Buffers are used to synchronise audio data with an external reference such as timecode, by controlling the rate at which data is read out to ensure lock.
• Smooth Editing - Discontinuities at transitions between various sound files are smoothed out and made continuous at their join or crossfade.
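A minimal FIFO sketch of the writing/reading case: bursty, block-sized transfers on one side, a steady one-sample-at-a-time stream on the other. Real systems use fixed-size ring buffers in RAM; a deque keeps the idea visible.

from collections import deque

class AudioFifo:
    def __init__(self):
        self._fifo = deque()

    def write_block(self, block):
        """Called whenever a burst of samples arrives from a disk block."""
        self._fifo.extend(block)

    def read_sample(self):
        """Called once per sample period by the steady output clock."""
        return self._fifo.popleft() if self._fifo else 0   # 0 = underrun (silence)

buf = AudioFifo()
buf.write_block([1, 2, 3, 4])                  # irregular, block-sized input
print([buf.read_sample() for _ in range(6)])   # [1, 2, 3, 4, 0, 0] - order preserved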

6.4 The Allocation Unit (AU)

A minimum AU is defined, representing a package of contiguous blocks.

E.g. an AU of 8 kB = 16 x 512 bytes


6.5 Access Time (ms)

The time taken between the system requesting a file from the disk and the first byte of
that file being accessed by the disc controller. Also access time can mean the time
taken for the head to jump from file to file.

In a disk drive system access time is governed by the speed at which the read/write
heads can move accurately from one place to another, and the physical size of the
disc. The head must move radially across the disk leading to a delay called seek
latency. Then the disc rotates until the desired block moves under the head (rotational
latency). Hence, average access time for a particular hard disk will be the sum of its
seek and rotational latencies.
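A hedged example of how the two latencies combine; the drive figures here are illustrative, not taken from any particular model:

# Average access time = average seek latency + average rotational latency.
def avg_rotational_latency_ms(rpm):
    return 0.5 * (60_000.0 / rpm)      # on average, half a revolution away

def avg_access_time_ms(seek_ms, rpm):
    return seek_ms + avg_rotational_latency_ms(rpm)

print(round(avg_access_time_ms(9.0, 7200), 1))   # 9 ms seek + ~4.2 ms rotational = ~13.2 ms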

6.6 Transfer Rate (MB/sec)

The rate at which data can be transferred to and from the disk once the relevant location has been found. This is a measure of bus speed and CPU rate. Transfer rate, in conjunction with access time, limits the number of channels that can be successfully recorded or played back. These two factors also limit the freedom with which long crossfades and other operational features may be implemented.
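A rough sketch of how sustained transfer rate bounds the channel count; the 50% derating for seeks, directory reads and crossfades is an assumption, not a fixed rule:

def max_channels(transfer_mb_s, fs_hz, bits, efficiency=0.5):
    """Theoretical ceiling on simultaneous channels for a given sustained transfer rate."""
    channel_rate_mb_s = fs_hz * bits / 8 / 1e6
    return int((transfer_mb_s * efficiency) // channel_rate_mb_s)

print(max_channels(10.0, 48_000, 24))   # e.g. a drive sustaining 10 MB/s -> ~34 channels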

6.7 Disk Optimization

To get the best response out of a hard disk recording system, the efficiency of data transfer to and from the store must be optimised by keeping the number of accesses to a minimum for any given file.


7. Multichannel Considerations

In a tapeless system, the concept of track is very loosely defined and a “channel" refers to how
many physical monophonic audio inputs and outputs there are in the system.

Multichannel disk recording systems often use more than one disk drive since there is a limit to
the number of channels which can be serviced by a single drive. This leads to a number of
considerations:


Determine how many channels a given disk can handle.

Next, work out how many disks are required for the total storage capacity.

Then decide which files or channels should be written to which drives. A storage strategy which places files physically close to one another based on their time-contiguous relationship will favour faster playback if those files are to be read off the disk in that same order. Hence a system which imitates a multitrack tape machine, perhaps assigning several channels per disk, is a good strategy, particularly where the playback of instruments in a music piece is required. In sound design, where sound effects are randomly accessed according to picture considerations, this strategy may not work so well.

7.1 Winchester Magnetic Disc Drives

Winchester drives are used in PCs. They are contained within a sealed unit to prevent disk contamination. The disks are rigid platters that rotate on a common spindle. The heads 'float' across the surface, lifted by the aerodynamic effect of the air flow produced between the head and the spinning disk.

Data is stored in a series of concentric rings (tracks). Each track is divided up into blocks. Each block is separated by a small gap and preceded by an address mark which uniquely identifies the block location. The term cylinder relates to all the tracks which lie physically in line with each other in the vertical plane, through the different disk surfaces. A sector refers to a block projected onto the multiple layers of the cylinder.


The diagram illustrates the difference between sector, block, track and cylinder in a hard disk's platters.

8. Editing in Tapeless Systems

Pre-recorded soundfiles are replayed in a predetermined sequence, in accordance with a replay schedule called an Edit Decision List (EDL). Memory buffering is used to smooth the transition from one file to another. Using non-destructive editing, any number of edited masters can be compiled from one set of source files, simply by altering the replay schedule. In this way edit points may be changed and new takes inserted without ever affecting the integrity of the original material.

8.1 Non-destructive Crossfade

When performing a short fade from file X to file Y, both files are read out via memory. At the time of the crossfade the system ensures that data from both files exists simultaneously in different address areas, by reading from the disk ahead of the real-time playback. The exact overlap between old and new material depends on a user-specified crossfade length. At the start of the crossfade the system reads both X and Y samples into a crossfade processor. Time-coincident X and Y samples are blended together and sent to the appropriate channel output.

Because the system must maintain audio data from two audio regions simultaneously, the demands on memory are high. Thus, in general, the larger the RAM capacity, the longer and more gradual the crossfade can be.
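The blending step itself is simple arithmetic; a minimal sketch of a linear crossfade on toy sample lists (real systems work block-by-block and may use equal-power curves instead):

def crossfade(x_tail, y_head):
    """Blend two equal-length sample lists: X fades out while Y fades in."""
    n = len(x_tail)
    out = []
    for i in range(n):
        gain_y = i / (n - 1)          # 0.0 -> 1.0 across the fade
        gain_x = 1.0 - gain_y         # 1.0 -> 0.0 across the fade
        out.append(gain_x * x_tail[i] + gain_y * y_head[i])
    return out

x = [1.0, 1.0, 1.0, 1.0, 1.0]         # end of the outgoing region (file X)
y = [0.2, 0.2, 0.2, 0.2, 0.2]         # start of the incoming region (file Y)
print(crossfade(x, y))                # [1.0, 0.8, 0.6, 0.4, 0.2]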

8.2 Destructive Crossfade

Destructive crossfades can be made as long as the user wants, but they involve non-real-time calculation and separate storage of the crossfaded file. There are two variations of the destructive crossfade:

A real edited master recording is created from the assembled takes and exists as a separate soundfile. You either require more disc space for this operation or you wipe over your previous files.


Crossfade segments are created and stored separately from the main soundfiles. This saves disc space while still allowing long crossfades. It is not a real-time process, however, and the user has to wait for the results.

8.3 Edit-point Searching

Often there is a user interface utilising a moving tape metaphor, allowing the user to
cut, splice, copy and paste files into appropriate locations along a virtual tape.
Variable speed replay in both directions is used to simulate ‘scrubbing’ or reel rocking.

8.4 The Edit Decision List (EDL)

The heart of the real-time editing process is the EDL, which is a list of soundfiles to be sent to particular audio outputs at particular times. The EDL controls the replay process and is the result of the operator having chosen the soundfiles and the places – often specified as SMPTE addresses – at which they are to be joined. To arrive at the final EDL the operator will have auditioned each soundfile and determined the crossfade points.
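The EDL itself is just structured data; a minimal sketch of what one entry might hold (the field names are illustrative, not taken from any particular workstation):

from dataclasses import dataclass

@dataclass
class EdlEvent:
    soundfile: str        # source file on disk
    output: int           # physical output channel
    start_tc: str         # SMPTE address at which replay begins
    in_offset_s: float    # offset into the soundfile
    duration_s: float     # how much of the file to play
    fade_s: float         # crossfade length at the join

edl = [
    EdlEvent("verse_take3.wav",  1, "01:00:10:00", 0.0, 32.5, 0.02),
    EdlEvent("chorus_take1.wav", 1, "01:00:42:12", 1.2, 18.0, 0.05),
]

# Replay simply walks the list in time order and schedules each soundfile:
for event in sorted(edl, key=lambda e: e.start_tc):
    print(event.start_tc, "-> play", event.soundfile, "on output", event.output)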

9. Pro-tools ( Use manual)

Assignment 6 – AE006-1

AE17 – Mastering for Audio and Multimedia

1. Mastering

2. Assembly editing

3. Sweetening

4. Output

5. Tube vs. solid state electronics

6. Quality Digital & Analog Processing

7. Quality issues with CD-R media

8. Should it be mixed to analog or digital?

9. The verdict

10. A note on monitors

11. Considerations for mastering

11.1 The CD has to be loud

11.2 What are the other considerations?

11.3 Solution

12. Technical tip if loudness is important to a mix

13. Improving the Final Product

13.1 EQ Technique

13.2 Creating Space

13.3 Phasing Problems

14. Alternate Mixes

15. TC Finalizer

16. A non-technical perspective

THE LANGUAGE OF EQUALIZATION

1. Baril language

Various CD Formats

1. Red Book

2. Yellow Book

3. Mixed Mode

4. Blue Book

5. Green Book

6. Orange Book


AE17 – MASTERING FOR AUDIO AND MULTIMEDIA

1. Mastering

Mastering includes:

• Adjusting levels so there is consistency between the various songs across the entire album.
• Spacing the songs.
• Adding the PQ subcode.
• Fine-tuning the fade-ins and fade-outs.
• Removing noises: clicks, pops, tape hiss, etc. Digital editing today is done on high-quality 24-bit ADC/DAC and DAWs.
• Equalizing songs to make them brighter or darker – the equalization used is usually subtle – and to bring out instruments that (in retrospect) did not come out properly in the mix. Individual songs are “sweetened” via quality analogue tube processors and/or digital equipment and software.
• Fitting the entire album's dynamic range into the respective medium, e.g. CD – 96 dB, MD – 96 dB, vinyl – 70 dB, cassette – 50 dB (bandwidth roughly 50 Hz to 15 kHz), and ensuring that each master sounds good on its own medium.

Every album has a "voice" in which the message of the artist is delivered. A strong
performance and good recording technique will set the basic tone for this voice.
Mastering can then profoundly affect its impact and resonance. How?

A guideline is to do what serves the music. A wide range of techniques can bring out an
album's native voice. Depth, punch, sense of air and detail can all be enhanced. Vintage
tube processors may work, or the latest technology could be appropriate. The solution
sometimes goes against logic. Experience and a feel for the music determine the best
path.

Most people are familiar with the idea of recording music in a live concert or recording
studio. You make tapes that store the individual performances, or takes. Ultimately these
takes are assembled into a final master tape. This is sent to a replication plant, where
copies are made. The process of creating the final master is called mastering. It has
three steps:

2. Assembly editing

The tapes from the location recording or mixdown sessions are transferred to a digital editor.
The tunes are sequenced into the order you specify and correct spacing is made between
cuts. Beginnings and ends of cuts are faded to black (silence) or room tone (the natural
background noise of the performing space), or the cuts are crossfaded as you wish. Pops,
clicks and strange noises can often be fixed at the mastering stage, depending on their source
and severity.

3. Sweetening

When engineers first began cutting master discs used to produce vinyl records, they designed
signal processors such as compressors, limiters, and equalizers (EQ) to prevent overloading
the cutter head. They noticed that changing the settings could also have beneficial effects on
the music, especially in Pop styles. Equipment and techniques were developed to further
"sweeten" the sound. Since then, this has been considered the heart of the process, where
clarity, smoothness, impact and "punch" are enhanced, depending on the needs of the music.
The goal is to increase the emotional intensity. If the performance, arrangement and recording
quality are good to start with, then the final master sounds even better than the mix tapes, and
even casual listeners can notice the difference. This is what albums need in the pop markets:
big, radio-ready sound and a competitive edge.


4. Output

The finished music tracks are transferred to the media needed for mass production, usually
CDR (Recordable CD). This master disc can be played on any CD player, so the client can
audition it and give final approval. Occasionally cassette replicators prefer 4mm DAT tapes.

Note – for CD replicators, mastering means creating the glass master disc that is used to make stampers (which are then used to press the CDs). Almost every CD plant prefers to make its own glass masters, for quality control reasons. The glass master should be a perfect mirror image of the CDR master disc produced by the music mastering facility. The classic definition of mastering is the final creative step in producing a CD.

Why not just transfer the mixes straight to CDR?

If a mixing engineer is one of the rare few who can mix and master at the same time, then a
straight transfer might do fine. But there are at least three reasons why professionals send
tapes out for mastering:

i. Usually a lot can be done to improve the mixes. The market is demanding. If you
want the disc to compete in the radio markets, in-store play, and the homes of
consumers accustomed to excellent music, it has to be right sonically. Also, since
the mixes were recorded at different times of day over a week or more, you end up
with differences in level and tone. Mastering creates a seamless whole out of a
collection of individual tracks.
ii. The mastering engineer has fresh, experienced ears. By the time the mixes are
done, everyone involved is tired of it. It's tough to keep perspective after you've
heard a mix 50 times. A new outlook, a new set of listening skills - attuned to the
complete presentation rather than the detail of the mixes - can make a huge
difference.
iii. Mastering engineers must be fluent in both the artistic and technical areas of
music-making. Good communications skills are also critical, since a lot of terms
are used to describe sound. For example: "It needs to sound big, resonant, fat,
warm, ambient, taut, sweet, present, smooth and live. It needs air, sparkle, depth,
brilliance, impact, punch, focus, clarity and definition." The goal is to understand
and accept the producer's guidance and then add or remove only what the music
requires, not a bit more. Often the changes are subtle. The mastering facility has
ultra-clean processors that are built to handle stereo signals. This may be obvious
- it is one thing to run a guitar through a limiter and equalizer, and another thing to
run your whole mix through it. A finished mix is a complex balance that can be
made worse as easily as it can be improved. It's worth using the best equipment
available.

5. Tube vs. solid state electronics

No one will ever settle this question, since both solutions have advantages in different areas.
Either tubes or transistors can be truly excellent when the circuit is done carefully with the
highest quality components available.

The best solid state circuits come close to disappearing. They tend to err in the direction of a
faster, more defined, slightly less forgiving sound. The best tube designs approach perfection
from the other side, with a sense of transparency and air. People consistently use the term


"warmth" when describing tube sound. Tube circuits offer a wider array of options if you are
trying to improve the sound, rather than just pass the signal through with no changes.

Digital designs approach the standard set by analog solid state, but are less forgiving. A harder, more closed top-end feel often results from the limits set by current technology, and the sense of air and soundstaging is usually somewhat compromised.

6. Quality Digital & Analog Processing

1. Assume for a moment that you have a great-sounding two-channel mix. Assume
also that you have that mix stored to your satisfaction, either on DAT or a higher-
resolution 24-bit format.
2. You need to edit the mixes into a sequenced tape, usually called a submaster or work tape. Many digital editors will do a good job if they are used correctly. This means straight assembly editing – sequencing the tracks, cross-fading between takes and cleaning up the beginnings and ends of cuts. These processes are not considered sweetening.
3. Sweetening takes a lot of computational horsepower, using at least 24 bits of data,
to maintain the integrity of the music. Lower resolution (16Bit) digital processing
devices, including the ones built into some well-known digital editing workstations,
just don't preserve enough accuracy in their computations. To be fair, many PC's
and Macs are equipped with 24-bit processing cards. They should sound good, or
at least neutral, but a significant number of people say the resulting master lacks
the raw energy that was there in the early mixdowns. They ask what can be done
to fix it. The summary? Consider using an editor to assemble your tracks, and
leave the sweetening to devices that have been proven over time to be more
friendly to the music.

Some digital processors can sound excellent. Still, the better analog
processors haven't been replicated digitally. This includes a number of
custom tools that I use because they make the music sound right. Signal
processors affect a lot of things in the music, and music is inherently an
analog process. The best analog gear has a sound that a digital device
cannot fully model.

4. We may have to go back to analog, through some high-resolution equipment, and back again to digital to get the sounds that people want on their albums. You have added a conversion process to the signal, and theory says that the music will suffer. Technically, this is accurate. In the real world, the benefits usually outweigh the drawbacks. Test instruments will tell you one thing, and your ears may tell you another. We are not talking about the average studio compressor or EQ, but specialized low-distortion, low-noise, wide-bandwidth devices designed for mastering.

Every converter, be it 16-bit or 24-bit, has a unique sound. As an added measure when going to analog, I can re-clock the data stream, effectively eliminating digital jitter. Jitter is an interesting feature of digital audio that can mess up the sound of otherwise good music. Jitter will cause a loss of stereo image focus (blurring) and depth in the reproduced music during digital-to-analogue conversion.

So to recap, if the mixes are already stored digitally and you have digital sweetening tools
that deliver the sound you want, that's great. Sometimes the detailed sound that digital is
known for is perfect. It depends on your style. If you prefer the best that analog has to offer, then convert only once, carefully, since extra conversions can remove the “live” feel. The most important thing is to do what serves the music.

7. Quality issues with CD-R media

As CDR (CD-Recordable) replaces U-matic tape (the professional ¾” video cassette) as the common format for shipping CD masters, many mastering houses use digital editors that write the final CDR at higher than real-time speeds. This saves time for the engineer and allows the facility to generate more product. 2x speeds are common and the trend is towards 4x and beyond. This practice is being mirrored at some replication plants, where glass masters are cut at faster than real-time rates to maximize throughput. I have heard a number of top people discuss sound differences between CDRs recorded in real time and at higher speeds.

CDRs don't play as reliably as manufactured discs, especially on changers, car stereos and
boom boxes. People have noticed variations in level during playback or crackling sounds in
very quiet passages of the music, especially on high-speed copies. As manufacturers of
players cut costs, the problem may even be getting worse - there's equipment out there that
will barely handle a production CD. CDRs can tax this type of gear to its limits, and strange
things result. Dirty laser pickups can also aggravate the problem. (When was the last time you
cleaned the pickup on your player?) The bad news is that a CDR might not sound perfect on
all CD players. The good news is this has never caused a problem on a master that I've
produced for replication at a CD plant. If the master that I send to you for final approval sounds
fine, then the plant will be able to produce discs that sound the same.

One possible explanation for sonic differences recalls the '70s trend towards mastering vinyl at half speed. When discs are mastered in real time, the media has more relative time to respond to the cutting laser. This results in pit geometry that is more precise. When the disc is played back to cut the glass master, there is less chance for jitter and other timing-related problems to occur. A second factor is more measurable: as you increase cutting speeds, error rates sometimes go up, even though discs spinning at higher speeds are considered to be more stable. The built-in correction on the playback side should compensate for any increased errors, but not everyone is convinced. I take the conservative approach and create all CDRs at low speed, using media certified for 6x cutting.

8. Should it be mixed to analog or digital?

Given a choice, mastering facilities prefer to work with source material that is higher in
resolution than the (16-bit) target media. There are two commonly available options that meet
this spec: 24-bit digital files and analog tape.

Many studios now have computers equipped with CD burners, and 24-bit sound cards are
available for a few hundred dollars. This makes it possible to mix directly to computer, and then
burn a CD-ROM disc containing 24-bit stereo music files. This is the least expensive way to
move your tracks into the higher-quality world of 24 bits. Common multitrack systems such as
Paris, Protools, Radar, and VS1680 can also write 24-bit files.

Among 16-bit mixdown formats, DAT is the most familiar. It is convenient and sonically equal
to the CD. It is worth examining the differences between this standard medium and analog
tape, especially 1/2 inch stereo:

1. Dynamic range is roughly equivalent, though measured signal-to-noise ratio favors DAT. If you use noise reduction such as Dolby SR, analog is superior. Most real-world pop and jazz music has a dynamic range considerably less than 85 dB, so noise is not often the issue. In fact, the noisier 1/4" stereo tape is just fine for many styles of music. Most people feel that 1/2" tape preserves more detail in low-level signals. When you go lower than 16 bits of signal in a digital system, the sound is gone. (A point in favor of 24-bit...) This is not true with analog, where you can still track musical information below the noise floor.
2. Measured distortion favors digital, by a factor of at least ten. To many this means
cleaner sound overall. The flip side is that as you record hotter signals onto
analog tape, you get a natural compressor/limiter action as the tape begins to
saturate. This can be a big plus for certain styles of pop, rock and rap music.
3. DAT has a flatter frequency response within the normal 20 Hz to 20 kHz band.
Above 20 kHz, response drops like a stone due to the very sharp anti-aliasing
filters needed for digital. These filters can also cause strange things to happen to
the high end due to their impulse and phase response. Many 15 ips analog
machines have useable response from below 20 Hz out to almost 30 kHz. At 30
ips, finer machines go from near 20 Hz to way past 40 kHz, providing the extra
octave of high end as well as a smoother roll-off. This contributes to the quality of
"air" that people associate with analog tape. The noise spectrum is also different
at 30 ips. Its still there, but has a "silkier" quality. All analog machines have slight
irregularities in lower bass response caused by "head bumps". These bumps,
related to the design of the playback head, are usually 2dB or less in size and do
not cause phase response problems. They are often compensated for naturally
when mixing.

9. The verdict

DAT is generally considered better than 1/4" tape at 15 ips, though if you want the fat in-your-
face sound of tape, analog has the advantage, especially if you have good noise reduction. If
you have a 1/4" machine available, run it in parallel with the DAT for a couple of mixes, then sit
back and listen to both. I have clients that prefer the analog, especially if it's running at 30 ips.

If you compare DAT to 1/2" analog, the picture changes. Even when expensive outboard A/D
converters are used, most folks choose the analog. My opinion is that 1/2" is the format of
choice for most styles of Pop and Rock music. Many hit records are being made in this format.
Once you hear it, you'll know why. While it's more trouble and expense to work with and the
media costs more, the results are worth it.

If you can create and store 24-bit mixes, the comparison with 1/2" is closer. Many people still
prefer analog, but 24-bit can also be excellent, with a greater sense of air than DAT, and
enough resolution to make even long reverb tails sound convincing. 24-bit storage on
inexpensive CDROM discs has a lot of appeal for studios who don't want to buy and maintain
an older 1/2" deck. Compare them side by side if you have the chance.

10. A note on monitors

Many problems that mastering engineers encounter come from mixdowns made on less-than-perfect monitors. It's easier to fix the problems when you know that you can trust the monitors. Every engineer needs a reliable reference, e.g. Mackie HRS824, Genelec 1031A, B&W 805 Nautilus.

Some prefer smaller nearfield monitors placed close to the listener, since these can minimize
problems associated with the acoustics of average rooms. These monitors can't deliver deep
bass, but since this is also true for most speakers that consumers use, it's not a bad
compromise. Also, speakers in this size range can often create a very accurate stereo image,
in depth as well as width.


Some engineers prefer larger wideband monitors e.g. Genelec 1038A, B&W 801 Nautilus,
correctly placed in a room that is designed for listening. These systems will give the most
accurate picture of the music, assuming the speaker/room match is done right. Of course,
most consumers don't have systems this good. Still, there's a lot to be said for speakers that
are just plain accurate, regardless of what the rest of the world uses. If your speakers tell the
truth, then you can compensate based on experience.

11. Considerations for mastering


11.1 The CD has to be loud

If you want maximum level, there is one good reason to go for it at the mastering
stage. Like it or not, radio stations will jam your music through their own limiters prior
to being transmitted over the airwaves. The sonic quality of these processors is
usually not perfect, to say the least. The processors in a good mastering facility will
produce high levels and better maintain the sound quality that you have worked so
hard to create.

11.2 What are the other considerations?

If you master an album for airplay only, all the tracks will sound punchy, but you run
the risk that people at home will experience listening fatigue after 15 to 30 minutes.
Music buffs generally don't enjoy hot sound for more than a little while, even if they
like the material itself.

Looking at this problem from a music industry perspective, record labels are getting
locked into a no-win contest to be 1/2 dB louder than the next guy. What gets
sacrificed is clarity. You can always push the level up a bit, but the live feel and sense
of air will suffer. A good home listening system can reproduce over 90dB of dynamic
range. This is wasted when the music is crammed into the top 10 dB.

11.3 Solution

It's usually possible to settle on a fairly hot compromise level that keeps everyone
happy. For those who want to push a particular track very hard, one possible scenario
is to release promotional EP's or singles, equalized and compressed for airplay. The
record store version can then be mastered for a hi-fi home environment. Here's
another option: Many CDs run 45 to 60 minutes in length, so there is unused time
available after the album is cut onto CD. Fill the extra space with music rather than
silence. The entire album can be mastered first for high fidelity listening. Then the
strongest tracks can be re-mastered specifically for airplay and added onto the tail of
the CD. These hotter versions can be labeled as bonus tracks on a jewel box sticker.
The option doesn't cost the buyer extra, and they can ignore the bonus cuts if they
want to.

12. Technical tip if loudness is important to a mix

Hit the mono button on your console or run the DAT mixes back through a couple of modules
panned to the center, whatever it takes. If the mixes drop noticeably in level when played in
mono, there is a phasing problem. This can be caused by delay lines, guitar processors and
room mics. In extreme cases, an instrument will sound HUGE in the stereo mix, and disappear
in mono. The real world impact occurs when the album goes out over the radio. Stereo
separation degrades as you get farther from a station. This means that for everyone not in a
perfect reception area, your tune starts to sound like a 'music minus one' track - the guitar
disappears.

To summarize, if you want your whole album as loud as possible, I'll work to make it that way.
You just need to be aware of the tradeoffs. If you want very pretty sound and very hot levels,
consider the option of including extra tracks.

A great mix usually produces a great master, and vice versa. Ideally a mix arrives with no major
technical problems, with the overall level, balance and EQ already good, and the producer just
wants a big, radio-ready sound. I had one client whose mixes were seriously flawed. After we
discussed it, the producer decided to replace some weak tracks and remix the entire album.
The release date had to be pushed back six weeks, but the mastered CD got chart action in
Billboard, so everybody was happy.

This brings out a key point: mastering can only improve good product. It cannot fix bad mixes,
mushy tracks, poor arrangements or sloppy playing. If the kick drum sounds great and the
bass guitar is terrible, there will be problems getting both to sound good. A stitch in time saves
nine.

13. Improving the Final Product

• The first suggestion crosses over into the producer's jurisdiction but a number of
clients have found it helpful. If you are still at the mixing stage, consider simplifying
the mix. If you have a lot of guitar, synth and "hot licks" tracks available to spice up
the mix, use them sparingly. Many hit records are very sparse. The producers
leave a lot of "space" so that the strong basic groove does not get cluttered. "Wall
of sound" mixes need to be done carefully, or they get muddy fast. When this
effect is overdone, it is tough to fix at the mastering stage.
• On many tapes I receive, kick and snare drums contribute most of the peak levels.
If this is the case as you mix, try using 3 to 6dB of limiting on either or both
instruments. You may be able to bring overall levels up, get a better balance
between all instruments and still maintain a great drum feel.
• Check your early mixdowns on a good pair of headphones. Listen to tracks from
albums you respect, then compare your sound. Since phones eliminate all room-
related bass problems, you may be able to solve a bass muddiness problem
quickly. Excellent phones still can't image properly, but for resolving spectral
balance, noise and distortion issues, they are generally better than speakers that
cost 10 times as much.
• The best solution from the technical standpoint is to send the mixdown tapes, and
let the mastering engineer assemble and sweeten them into a finished master.
You will preserve more air and detail, and you will avoid adding unwanted “grunge”
to the sound. The final product will have a more polished feel.
• If you elect to use a local facility for assembly editing, ask the engineer to not
normalize, EQ or compress the music with their computer based editor.
• If you have a stereo compressor or spectral processor that you love the sound of,
then use it on the mix - moderately, and before the mix goes to DAT tape. If you
send a tape that is heavily limited, (the T.C. Finalizer and L1 Maximizer come to
mind), or EQ'd too much, then I have less room to maneuver, and may not be able
to create the sound you need. Even if you want the final sound to be stepped-on or
processed with effects, I can almost certainly do that with higher quality than you
get with commonly available studio gear. In general, the less stereo processing
you do, the better the final result. (This includes BBE Sonic Maximizer and Aphex
Aural Exciters, which often add CD-unfriendly grit to the top. A number of clients
remix tracks that were originally BBE'd. It turns out their mixdown monitors were
too smooth in the treble. They had used the exciter to compensate for speaker
defects or poor hearing, and the results on an accurate monitor were bad.)

13.1 EQ Technique

Use a few channels of sweepable or fully parametric EQ to remove problem frequencies from
instruments. It's usually more effective than boosting the characteristics you like. Find the
offending frequency range by boosting 6 to 12 dB and sweeping back and forth - at fairly narrow
bandwidth, if you have the option. Make the sound worse. Then switch from boost to a few dB of
cut at the problem frequency. Vary the bandwidth to fine tune. This technique is effective for
toning down the inherent resonance in snares, acoustic guitars and other instruments. If you
scan across the console and notice that too many channels are boosted at 10 kHz, this might
help solve the problem.

A related technique is removing unneeded frequencies, as opposed to objectionable ones. If your
mix is sounding denser than you like, try notching out or “thinning” (one track at a time)
sections of the musical spectrum above or below the frequencies where a track makes its major
contribution to the mix. The EQ'd result might sound odd when you solo it, but could be the
right answer in the context of the whole mix. You will free up 'space' needed for other
instruments.
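As an illustration of the cut-rather-than-boost idea, here is a minimal sketch of a peaking
(parametric) filter using the widely published RBJ "Audio EQ Cookbook" biquad coefficients. The
300 Hz centre, -4 dB cut and Q of 2 are arbitrary example values, and numpy/scipy are assumed to
be available; the sweep itself is still best done by ear on the console.

import numpy as np
from scipy.signal import lfilter

def peaking_eq(fs, f0, gain_db, q):
    # RBJ cookbook peaking biquad; a negative gain_db gives a cut centred at f0.
    a_lin = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

# Example: a 4 dB cut at 300 Hz, fairly narrow (Q = 2), applied to a mono signal x.
fs = 44100
x = np.random.randn(fs)                # stand-in for the offending track
b, a = peaking_eq(fs, 300.0, -4.0, 2.0)
y = lfilter(b, a, x)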

13.2 Creating Space

When you double track any instrument or vocal, try moving the microphone and the
performer within the recording space. Have the backup vocalists step back six inches
or switch places for the second pass. Add in a different room mic, etc. Our ears are
sensitive to the complex nature of acoustical ambiance. When you provide variation, it
comes across as richness. This is why having a few different reverbs - even inexpensive
ones- is better than having just one, no matter how good the one is.

13.3 Phasing Problems

Detect these easily by listening to the mix in mono. Alternatively, use a phase meter.
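The same check can be sketched numerically: correlate the left and right channels and see how
much level is lost when they are summed to mono. The function below is only an illustration
(numpy assumed); the name and interpretation are not taken from any standard meter.

import numpy as np

def mono_compatibility(left, right):
    # Correlation near +1 is mono-safe, near 0 is very wide/suspect,
    # negative values mean serious cancellation when summed to mono.
    corr = np.corrcoef(left, right)[0, 1]
    stereo_rms = np.sqrt(np.mean((left ** 2 + right ** 2) / 2))
    mono_rms = np.sqrt(np.mean(((left + right) / 2) ** 2))
    drop_db = 20 * np.log10(mono_rms / stereo_rms)
    return corr, drop_db

# corr, drop = mono_compatibility(left_ch, right_ch)  # two float arrays of one mix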

14. Alternate Mixes

When you have the perfect mix on tape, and all the changes are fresh in your mind, consider
creating one or two alternate mixes before moving on to the next track:

• Vocals: Push up the vocal faders one or two dB, then rerun the mix with all other
settings identical. Vocals often get a bit buried in the heat of a long mix session.
When you go back through the mix tapes to choose the keepers, you may prefer
the version with slightly hotter vocals.
• Varispeed: If you can varispeed the multitrack recorder, speed it up a few cents,
up to maybe 1% total. Rerun the mix. The resulting mix may drive people with
perfect pitch nuts, but others will notice a subtly higher energy level in the sound. If
you've never tried it, it's worth checking out at least once.

15. TC Finalizer

The good news is that the 24-bit converters in this unit may be the best ones you'll have
available - better than most DAT machines, for sure. An increasing number of studios can
store the higher-quality 24-bit mixes directly on computer, then burn 24-bit files to CDROM to
send out for mastering or as safeties. Considering the unit as a processor, if it is used in
moderation, primarily as a compressor, then the mixes will be louder, and they may improve in
other ways, depending on the experience of the user.

This device as advertised, with its presets, brick wall multiband limiting etc, will create a sonic
signature that is not reversible. I've gotten too many calls from folks who need significant
improvements to their one and only submaster tape which, it turns out, was “Finalized.” Usually
they were told "We run all our mixes through it because it makes everything sound good and
hot..." While this approach may work for demos and local radio, there's a reason why
mastering facilities have not traded in all their other equipment to buy a Finalizer (or similar
units now on the market.) If someone insists that it will provide all the enhancement you need,
believe them, but also run a second parallel recorder fed directly off the console, with no
processing. That way you will have an unprocessed safety copy if you are unhappy with the
Finalizer's processing.

Once you have the basics - decent mic preamps, EQ and compressors - the rest is icing.
There's a huge array of vintage and modern processors with distinctive sounds. If you like
exotic microphones, be sure to try them in your space before you commit. Some legendary mics
are special-purpose devices, and may have a sound different from what you expect.

16. A non-technical perspective

Much of what I discuss in this presentation relates to technique. Ultimately, mastering has less
to do with technique and more with awareness. To produce the best work, you need to be in a
space very similar to that of an artist in a live performance. Technique has to be a given. How
and when to make the moves should not require thought; go with the “feel”. The issue is how to
express the creative force (Force) that is available to all of us when we work to be receptive.
Excellence becomes possible when receptivity combines with experience, constant practice
and a quality instrument. Most musicians can relate to this way of working, as well as people
familiar with martial arts and similar disciplines. Having all the right factors aligned does not
guarantee a great performance, but increases the chances drastically. Even then, an
occasional failure is handed to everyone, regardless of skill. As in anything else, there will be
clients who cannot be satisfied.

©1999 DRT mastering

THE LANGUAGE OF EQUALIZATION

No treatise on equalization would be complete without offering some sort of a lexicon of
EQ'ese. More than any other aspect of recording, the process of equalization seems to
be replete with adjectives. Rarely do engineers, producers and artists communicate using
specific technical terms like, “try x number of dB boost at frequency y”; most often, it is
more like “let's warm up the guitar. And add a little sparkle to the keyboards.” While there
might be some regionalism in this vocabulary and perhaps some variety in interpretation,
it is still quite amazing how universal many of these terms are and how intuitively the
human mind seems to grasp even unfamiliar terms. Perhaps they reflect a simple
word/frequency association, but it seems there is a reality behind these words
automatically grasped by sonically sensitive people.

While no attempt has been made to be comprehensive or authoritative, the following list
does reflect some sort of consensus on the association between certain commonly used
terms and particular regions of the frequency spectrum. One is encouraged to modify
and add to this brief lexicon according to one’s common usage.

This vocabulary is offered in the interest of spurring better communication among
practitioners of the audio arts.

1. Baril language

Boomy – applied to a sound overabundant in low lows (in the region of 40-60 Hz). These
waves move a lot of air, hence, boomy.

Fat – generally applied to the octave above boominess (say 60-150 Hz). Makes things
sound big, but not earth-shaking.

Woofy – a somewhat nebulous term for sounds that are sort of "covered"- masked by low-end
energy (typically in the region of 125-250 Hz).

Puffy – is like an octave above woofy (say 250 to 500 Hz). It's still sort of a cloud, but not as
big.

Warm – obviously a positive characteristic often found between 200 and 400 Hz. Could easily
degenerate into woofiness or puffiness if overdone.

Boxy – seems to remind one of the sound in a small box-like room (usually found between
600 Hz and 1 kHz).

Telephony – accentuating the limited bandwidth characteristic commonly associated with
telephones (a concentration of frequencies around 1.5-2.5 kHz with a roll-off both above and
below).

Cutting – Here, "cut" means to put an incisive "point" on the sound (2.5-4 kHz does this very
effectively).

Presence – Anywhere from 3-6 kHz can be used to make a sound more present.

Sibilance – Dangerous "s" sounds and lots of other trashiness can often be found at 7-10 kHz.

Zizz – refers to a pleasantly biting high-end resonance (think of a "harpsichord"-type brightness
found around 10-12 kHz).

Glass – A very translucent, but palpable brilliance associated with 12-15 kHz.

Sparkle – A real smooth stratospheric brilliance almost beyond hearing, but can certainly be
sensed, found at 15-20 kHz.

Brightness – Most generally achieved by shelving EQ of everything above 10 kHz.

Darkness – The opposite of brightness (a general lack of highs at 10 kHz and beyond).

Muddiness – Actually a compound problem: woofiness plus puffiness (excess low end and
also low mids).

Thinness – The opposite of muddiness (a deficiency of lows and low mid frequencies).

Openness – The quality of having sufficient highs and lows.
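Purely as a quick reference, the ranges above can be collected into a small lookup table; the
figures are the approximate ones from this lexicon, not a formal standard. A Python sketch:

# Approximate frequency associations (Hz) taken from the lexicon above.
EQ_TERMS = {
    "boomy":     (40, 60),
    "fat":       (60, 150),
    "woofy":     (125, 250),
    "puffy":     (250, 500),
    "warm":      (200, 400),
    "boxy":      (600, 1000),
    "telephony": (1500, 2500),
    "cutting":   (2500, 4000),
    "presence":  (3000, 6000),
    "sibilance": (7000, 10000),
    "zizz":      (10000, 12000),
    "glass":     (12000, 15000),
    "sparkle":   (15000, 20000),
}

def describe(term):
    lo, hi = EQ_TERMS[term.lower()]
    return f"{term}: roughly {lo}-{hi} Hz"

print(describe("warm"))     # warm: roughly 200-400 Hz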

In conclusion, while no one can impart automatic discretion in the use of equalization, it should
be clear that the art of equalization is one that can be studied, practiced and learned.

Various CD Formats

What's confusing is the number of competing CD-ROM formats for artists and record labels
to choose from, almost all of which are incompatible with one another. There are various
CD standards: Audio CD, Enhanced CD, CD-I, CD-Plus, CD-Extra, Mac CD-ROM, DOS
CD-ROM, Windows CD-ROM, etc.

All CDs are the same physical size. Yet the discs vary in two significant ways:

• The way information is stored on the physical disc media


• The nature of the information that can be contained on them

All of the most popular CD technology has been developed by Sony and Philips. Each
type of CD has a given set of parameters, or standards. These different types of CD
standards are named after colours.

1. Red Book

The original disc: the audio CD conforms to the Red Book standard. The compact disc industry
started in 1982 when Sony and Philips created the Compact Disc Digital Audio Standard,
commonly known as the Red Book. Audio is the only type of data that can be stored with the
technology. The Red Book standard allows for up to seventy-four minutes of two-channel (stereo)
music, stored as linear Pulse Code Modulation (PCM) at 44.1 kHz and 16 bits per sample. Red Book
organizes the stored audio into disc tracks, typically one per song. Newer 80-minute CD-audio
discs extend the programme time and correspond to roughly 700 MB of data capacity.
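The capacity figures quoted for audio CDs follow directly from the Red Book parameters
(44.1 kHz, 16 bits, 2 channels, 75 sectors per second). A quick check in Python - the sector
sizes are the standard ones, and the familiar "650/700 MB" marketing figures are the data-mode
numbers expressed in binary megabytes:

# Red Book audio data rate and disc capacity (ignoring error-correction overhead).
bytes_per_second = 44100 * 2 * 2          # 176,400 bytes of audio per second
sector_audio = 2352                        # audio payload bytes per sector
sector_data = 2048                         # user-data bytes per sector in CD-ROM mode

print(bytes_per_second / sector_audio)     # 75.0 sectors per second

for minutes in (74, 80):
    sectors = 75 * 60 * minutes
    print(minutes, "min:",
          round(sectors * sector_audio / 1e6), "MB as audio,",
          round(sectors * sector_data / 1e6), "MB as CD-ROM data")
# 74 min -> ~783 MB as audio, ~682 MB as CD-ROM data (about 650 MiB)
# 80 min -> ~847 MB as audio, ~737 MB as CD-ROM data (about 703 MiB)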

2. Yellow Book

Computer CD-ROMs conform to the Yellow Book standard also published by Sony and
Philips. This standard defines the proper layout of the computer data on a disc. Yellow Book
takes the basic Red Book standard and defines two new track types – computer data and
compressed audio or picture data. The format also adds better error correction (necessary for
computer data) and better random access capabilities.

3. Mixed Mode

When a CD has CD-ROM tracks and CD-Audio tracks it's called a "Mixed Mode" disc. To date,
data cannot be read while sound is being played, so computer applications must use Mixed
Mode in a staged manner. To get around these limitations and to obtain synchronization of
multiple tracks, an interleaved style was added in the form of the CD-ROM/XA standard.

CD-ROM/XA (CD-ROM Extended Architecture) was proposed by Philips, Sony and Microsoft
in 1988 as an extension to Yellow Book. This specification stretches the Yellow Book standard
and adds the most interesting idea from Green Book (CD-I), interleaved audio. Before this
time, audio was confined to one track at a time. Interleaved audio allows the sound to be
divided and intermixed with other content.

The CD-ROM/XA drive needs special hardware to separate the channels. When it does, it
gives the appearance of simultaneous synchronized media. Not only does CD-ROM/XA allow
for intermixed sound, but it also provides various levels of fidelity and compression.
CD-ROM/XA uses a more efficient technique of audio compression than the Red Book,
allowing four to eight times more audio on the same disc.

These hybrid discs became known as Mixed Mode CDs. Mixed Mode CDs placed the ROM
data on track one and audio on track two. And that seemed like an acceptable way to arrange
the information on a disc at that time. The CD worked in a standard audio CD player (however
the user had to manually skip track one) and the technology was designed to work with
computer CD-ROM drives. Everything seemed to be developing smoothly until the music
industry started screening the technology. They found it unacceptable that playing the data
track produced a high-pitched screeching noise, and they preferred the audio tracks to be
placed first with the data track last.

4. Blue Book

Enhanced CD, CD-Plus and Multi-Session are all names for the Blue Book standard.
Enhanced CD is a hybrid disc format which merges the audio-only characteristics of Red Book
and the visual data of Yellow Book. With the Blue Book standard, the music is in the first
session and the computer data is in a second session. Any user can take that CD and play it on
any standard audio CD player, even though it is labelled CD-ROM. The disc will play without any
difficulty, and won't produce any noise because of the computer data. The disc can carry both
audio and computer data because there are two separate sessions, which is why Blue Book discs
are called "Multi-Session" discs.

In 1993/1994, Sony and Philips were busy propagating the name "CD-Plus," the name they
coined to indicate the generic combination of multimedia material with audio CDs. That name
quickly faded when Sony/Philips lost a court battle to market their CD-Plus product to music
enthusiasts worldwide. It seems that a Canadian CD retailer already owned the Canadian
trademark to the CD-Plus name, so Sony/Philips were precluded from selling product under
that name. Sony and Philips next adopted the term "CD-Extra," which is the replacement
name for CD-Plus. That name has flunked with developers and the general public.

For the record, the Recording Industry Association of America (RIAA) has officially dubbed this
new technology "Enhanced CD." This new name for interactive music CDs has gained wide
acceptance and is now the preferred term used to describe discs which contain both audio and
ROM data.

Jackson Browne's Looking East Enhanced CD features the evolution of a song.
Soundgarden's Alive In the Superunknown has an autoplay mode which gives the title the feel
of a cool screen saver. Sarah McLachlan's Surfacing includes video clips of her recording sessions.

The record companies were satisfied with this new technology because it meant everything is
absolutely identical to the audio Red Book discs. The upside is that you can now play your
Enhanced CD on any CD player. The negative aspect is that it can't be played on all CD-ROM
drives.

A single-session CD-ROM player (one sold before September 1994) probably can't play an
Enhanced CD; its single-session driver is not capable of reading a second session. Since Blue
Book technology places the music in the first session and the computer data in the second, a
single-session player - which reads only the first information session stored on a CD-ROM -
cannot fully play a Multi-Session disc. With a single-session CD-ROM drive, one would need to
load new drivers in order to access the outer data session.

The modern Enhanced CD avoids the data track problem by specifying that all computer data
be stored in a second session - one which conventional audio CD players cannot access. In
order for Enhanced CD discs to be properly recognized by a computer, there are prerequisites:

The computer must be equipped with a Multi-Session CD-ROM drive. If a computer was
purchased after September 1994, it would have a Multi-Session drive, as well as a sound card.

The CD-ROM firmware must support Enhanced CD discs. Depending on the rigidity of the
CD-ROM firmware, the drive may or may not recognize Enhanced CD discs properly. The
firmware included in some drives is inflexible to the point that it cannot accommodate the
different data layout contained on Enhanced CD discs. In some cases, one may have to
upgrade the firmware to allow the drive to support Enhanced CD.

Another prerequisite for Enhanced CD discs to be recognized by the computer is:

The CD-ROM device drivers must recognize Enhanced CD. This last requirement means that
the CD-ROM drive must be capable of recognizing that a given disc conforms to the Enhanced
CD standard. When an Enhanced CD disc is inserted into the CD-ROM drive, the current
device drivers must be able to differentiate between audio and data.

5. Green Book

The Green Book CD-ROM standard was introduced by Philips in 1986 and is the basis for a
special format called Compact Disc Interactive, or CD-I. Although CD-I using the Green Book
standard is a general CD-ROM format with the bonus of interleaved audio, all CD-I titles must
play on a CD-I player, because only this piece of equipment can decode the CD-I formatted
discs with a special operating system known as OS/9. A CD-I system consists of a stand-alone
player connected to a TV set. Consumers have been slow to embrace the CD-I technology.
The CD-I market is limited by the special CD-I development environment and the relatively
small number of CD-I player manufacturers.

A unique part of the Green Book standard is the specification for interleaved data. "Interleaved
data" means taking various forms of media, such as pictures, sounds, and movies, and
programming them into one track on a disc. Interleaving is one way to make sure that all data
on the same track is synchronized, a problem that existed with Yellow Book CD-ROM prior to
the integration of the CD-ROM/XA.

6. Orange Book

The Orange Book standard is used to define the writeable CD format. Part 1 of the standard
covers magneto-optical (MO) media, which is completely rewritable. Part 2 covers the CD-R
(compact disc recordable) for compact disc write-once media (which defines the Multi-Session
format that allows for incremental updates to the media). Ordinary CD-ROM drives cannot
write to CD-R media and can only play the first session on a Multi-Session disc.

CD Access time - At the moment, there is no standardization of CD-ROM drives. However,
there is a specific set of criteria which they require, and life would be easier if these criteria
were standardized. Most existing CD-ROM drives operate with relatively slow access times
and transfer rates compared to computer hard drives.

The access time is how long it takes to find and return data. The access time of an average
CD-ROM is 300 milliseconds, while the access time for a hard drive is about ten to twenty
milliseconds.

The transfer rate is the amount of data that is passed in a second. The transfer rate for
CD-ROM is about 150 kilobytes per second, rooted in the technology designed to play a
steady stream of digital CD music. (The transfer rate needed for uncompressed full-screen,
full-motion video is approximately 30MB per second.) Dual speed drives help improve
performance by providing up to 300KB per second transfer rates.
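Those rates translate into long read times for a full disc. A simple back-of-the-envelope
calculation, assuming a 650 MB disc purely for illustration:

# Time to read a full disc at the transfer rates quoted above (1x = 150 KB/s).
disc_kb = 650 * 1000                       # ~650 MB disc, expressed in KB
for speed in (1, 2, 4, 8):
    seconds = disc_kb / (150 * speed)
    print(f"{speed}x ({150 * speed} KB/s): about {seconds / 60:.0f} minutes")
# 1x ~ 72 min, 2x ~ 36 min, 4x ~ 18 min, 8x ~ 9 min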

Assignment 7 – AE006-2
AE18 – Synchronization and Timecode

SYNCHRONIZATION

1. Classification of Synchronization

1.1 Feedback Control Systems (Closed-loop)

1.2 Non Feedback Control (Open-loop control)

2. Pulse Synchronisation

2.1 Click Track

2.2 Frequency Shift Keying (FSK)

3. Timepiece Synchronisation

3.1 SMPTE/EBU TIME CODE

4. Longitudinal, Vertical Interval and Visible Time Code

4.1 Longitudinal Time Code (LTC)

5. Time Code and Audio Tape

6. Copying SMPTE Time Code

6.1 Refreshing or Regenerating Code

6.2 Jam Synchronising

6.3 Reshaping

7. Time Code Generator

8. Methods of Synchronisation

8.1 Chase Lock

8.2 Machine Control

9. Other Synchronization-related concepts

9.1 Slave Code Error

9.2 Slave Code Offset

9.3 Event Triggers

9.4 Flying Offsets

9.5 Slew

9.6 Advanced Transport Controls

10. SMPTE-TO-MIDI Conversion

AE18 – SYNCHRONIZATION AND TIMECODE

SYNCHRONIZATION

Synchronisation causes two or more devices to operate at the same time and speed. In
audio, the term means an interface between two or more independent, normally free-
running machines - two multitracks, for instance, or an audio tape recorder and a video
player. Audio synchronisation has its origin in post-production for film and, later, video,
in work such as dialogue and Foley recording, but synchronising equipment is now very common
in music recording studios as well.

Sound production will involve the engineer in some form of synchronization. Large-scale
industrial shows, for instance, often intermix pre-recorded video, film, slides, multitracked
music and effects with live performance. All of these media must operate synchronously,
on cue, without failure. The importance of understanding how synchronization systems
work cannot be overemphasized.

1. Classification of Synchronization

For two machines to be synchronised there must be a means of determining both their
movements, and adjusting them so that the two operate at the same time. This means that
each machine must provide a sync signal that reflects its movement. The sync signals from
both machines must be compatible; each must show the machine's movement in the same
way, so they can be compared. The sync signals may be either generated directly by the
machine transports or recorded on a media, e.g. tape, handled by the machines and
reproduced in playback.

Synchronization control systems can be classified as either feedback-controlled (closed-loop) or
non-feedback-controlled (open-loop) systems.

1.1 Feedback Control Systems (Closed-loop)

One machine is taken to be the master, and is a reference to which the slave is
adjusted. In a feedback control synchronisation system (Fig. 20-1), a separate
synchroniser compares the slave's sync signal to the master's sync signal, and
generates an error signal, which drives the slave's motor to follow the master.

Feedback control automatically compensates for any variations in the master's
movement, be it small speed variations due to changing power, or gross variations
like bumping the master into fast-forward. It is self-calibrating. As long as the sync
signals are compatible and readable, feedback control will bring the two machines into
lock. Feedback control is the method that is used when two or more normally free-
running machines are to be synchronised. The general term for closed-loop pulse
synchronisation is resolving.

1.2 Non Feedback Control (Open-loop control)

This simpler method is used when the slave machine can be driven directly by the
master’s time code. For example, syncing a drum machine from a MIDI Sequencer’s
clock and bypassing the drum machine's internal clock. This is analogous (similar) to
a physical mechanical linkage between two tape machine transports, causing them to
run from a single motor.

We can further categorise existing synchronisation techniques into two classes - the Pulse
method and the Timepiece method - according to the information contained in each sync
signal.

2. Pulse Synchronisation

Pulse synchronisation relies upon a simple stream of electronic pulses to maintain constant
speed among machines. Pulse methods are used in open-loop systems. Some of the pulse methods
in use are:

2.1 Click Track

A click track is a metronome signal used by live musicians to stay in tempo with a
mechanical, MIDI-sequenced, pre-recorded music or visual program. The click
corresponds to the intended beat of the music, so its rate can vary. On occasion,
amplitude accents or a dual-tone click may be used to denote the first beat of each
measure. Some MIDI sequencers can read and synchronize with click tracks.

2.2 Frequency Shift Keying (FSK)

FSK is a method for translating a low-frequency pulse (such as any of the above
proprietary clocks) to a higher frequency for recording to audio tape or transmitting
over a medium whose low-frequency response is limited. The clock pulse modulates
the frequency of a carrier oscillator, producing a two-tone audio-frequency signal.
Units designed for FSK sync read the two-tone signal and convert it back into the
corresponding low-frequency pulse.
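The principle is easy to sketch: each clock bit selects one of two audio tones for the duration
of the bit. The tone frequencies and bit rate below are arbitrary illustration values (numpy
assumed), not the parameters of any particular proprietary FSK format.

import numpy as np

def fsk_encode(bits, fs=44100, f_low=1200.0, f_high=2400.0, bit_rate=300):
    # Map each bit to a burst of one of two tones, keeping the phase continuous
    # so the result is a clean, recordable audio-frequency signal.
    samples_per_bit = fs // bit_rate
    out = np.zeros(len(bits) * samples_per_bit)
    phase = 0.0
    n = np.arange(samples_per_bit)
    for i, bit in enumerate(bits):
        f = f_high if bit else f_low
        out[i * samples_per_bit:(i + 1) * samples_per_bit] = np.sin(phase + 2 * np.pi * f * n / fs)
        phase += 2 * np.pi * f * samples_per_bit / fs
    return out

tone = fsk_encode([1, 0, 1, 1, 0, 0, 1, 0])   # a short example clock pattern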

3. Timepiece Synchronisation

Pulse-type synchronization methods share a common drawback: the sync signal in all cases has
no marker function. Pulse sync signals can be resolved so that two systems run at the same
rate; however, they convey no information beyond speed. To be in sync, the systems must
start at the same time and at the same point in the song. Should the master be stopped in the
middle of a song, the slave will stop likewise. After the master has been rewound to the start of
the song and begins playing again, the slave will continue from where it last stopped.

Timepiece sync methods fix this problem by using a more complex sync signal, into which
place markers are encoded. These place markers identify individual points in the song. As a
consequence, systems using timepiece synchronisation can lock to one another exactly, even
though they may not start at precisely the same instant and place in the song.

E.g. SMPTE, MIDI clock with Song Position Pointer, Smart FSK, Midi Time Code (MTC)

3.1 SMPTE/EBU TIME CODE

SMPTE Time Code is a synchronisation standard adopted in the United States in the
early 1960’s for video editing. Although still used for that purpose, it is also used in
audio as a spin-off of the audio post production market (film and video sound); and is
widely used for audio synchronising. Sometimes called electronic sprockets, SMPTE
code allows one or more video or audio transports to be locked together via a
synchroniser, and can be used for syncing sequencers and console automation. EBU
is short for the European Broadcasting Union, an organisation like SMPTE that uses the
same code standard.

SMPTE code is a timekeeping signal. It is an audio signal like any other audio signal
and it can be patched in the same manner (the path should be the most direct
possible e.g. without EQ or Noise Reduction). Time code requires a synchroniser,
which will interface with a multi-pin connector on each of the machines that it controls.
LTC itself is recorded on an audio track (usually an edge track), while VITC is incorporated
within the video signal between video frames.

3.1.1 Signal Structure

Time code is a digital pulse stream carrying timekeeping info. The data is encoded
using Manchester Bi Phase Modulation, a technique which defines a binary ‘0’ as a
single clock transition, and a ‘1’ as a pair of clock transitions (see Figure 20-2).

This affords a number of distinct advantages:

1. The signal is immune to polarity reversals: an out-of-phase connection will not
affect the transmission of data.
2. Because the data can be detected using an electronic technique called zero
crossing detection, the Time Code signal's amplitude can vary somewhat without
confusing receiver circuitry.
3. The rate of transmission does not affect the way the synchronizer understands
the code, so it can still read the code if the source transport is in fast-forward.
4. The data can be understood when transmitted backwards i.e. when the source
transport is in rewind.
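A minimal sketch of the biphase-mark idea described above, written in Python purely for
illustration (it models only the two half-cell levels per bit, not a complete 80-bit LTC word):

def biphase_mark_encode(bits, start_level=0):
    # The level always flips at the start of a bit cell; a '1' adds an extra
    # flip in the middle of the cell, a '0' holds the level for the whole cell.
    # Inverting every level (a polarity reversal) leaves the transition pattern,
    # and therefore the data, unchanged.
    level = start_level
    halves = []
    for bit in bits:
        level ^= 1                  # transition at every bit boundary
        halves.append(level)
        if bit:
            level ^= 1              # extra mid-cell transition encodes a '1'
        halves.append(level)
    return halves

print(biphase_mark_encode([0, 1, 1, 0]))   # -> [1, 1, 0, 1, 0, 1, 0, 0]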

Time code is like a 24 hour clock and runs from 00:00:00:00 through 23:59:59:29. The
code address enables each image frame to be identified e.g. on the display one hour, two
minutes, three seconds, and four frames would look like 01:02:03:04.

SMPTE data bits are organised into 80- or 90-bit words, which are synchronised to the
image frame rate, one word per image frame. Data within each word are encoded in BCD
(Binary Coded Decimal) format, which is just a way of reading and writing it, and express
three things: time code address, user bits and sync bits. Figure 20-3 illustrates an 80-bit
SMPTE Time Code word.

3.1.2 Time Code Address

Time Code Address is an eight-digit number - two digits each for hours, minutes,
seconds and frames. The frame rate is the number of frames counted per second, which
differs between countries' television standards.

3.1.3 User Bits

User Bits are eight alphanumeric digits that carry information like the date, reel number, etc.
User bits have no control function; they only relay information to make filing and
organisation easier.

3.1.4 Sync Words

Sync Words indicate the direction of the time code during playback of the machine. If
you go into rewind with time code playing into the head, it is the sync word that informs
the synchroniser or tape machine which direction the time code is travelling.

3.1.5 Frame Rates

Four different types of SMPTE code or frame rates are in use throughout the world.
Each is distinguished by its relationship to a film or video frame rate.

In the US, black-and-white video runs at 30 fps (frames per second), colour video at
approximately 29.97 fps, and film at 24 fps.

In Europe, both film and video run at 25 fps.

30-Frame - Time code with a 30-frame division is the "original time code" also known as
Non-Drop (N/D). Its frame rate corresponds to United States black-and-white NTSC
video.

30 Non-Drop counts like a clock: its time code address represents real time, dividing each
second into 30 frames rather than 1000 milliseconds.

30 Drop-Frame - When colour TV was invented, it couldn't transmit colour at 30- frames
and still be in phase with black-and-white signals. The colour frame rate had to be
reduced to approx. 29.97 fps, to give the colour scan lines enough time to cross the
screen and still produce a clear image.

This fixed one problem, but created another. Time code at that rate ran slower than real
time, making an hour of 30-Frame Time Code last 1 hour and 3.6 seconds of real time. A new time
code, Drop-Frame (D/F), was created to deal with this dilemma.

30 D/F omits frame numbers 00 and 01 at the start of every minute, except every tenth minute of
the TC.

For instance, 01:00:00:00 - 01:00:00:01 ~ 01:00:00:29 – 01:00:01:00

~ 01:00:59:29 - 01:01:00:02 - 01:01:00:03

~ 01:09:59:29 - 01:10:00:00 - 01:10:00:01

30 D/F has remained the US network broadcast standard ever since.
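The bookkeeping above can be captured in a short routine. This is a sketch (Python) of the usual
frame-count-to-drop-frame conversion; the semicolon before the frame field is the common
convention for marking drop-frame addresses:

def frames_to_dropframe(total_frames):
    # 30 drop-frame: skip frame numbers 00 and 01 at each minute boundary,
    # except every tenth minute, exactly as in the example above.
    per_10min = 10 * 60 * 30 - 9 * 2            # 17982 actual frames per 10 minutes
    per_min = 60 * 30 - 2                        # 1798 actual frames in a 'dropping' minute
    d, m = divmod(total_frames, per_10min)
    extra = 0 if m < 1800 else 2 + 2 * ((m - 1800) // per_min)
    n = total_frames + 18 * d + extra            # add back the skipped numbers
    h, rest = divmod(n, 30 * 60 * 60)
    mi, rest = divmod(rest, 30 * 60)
    s, f = divmod(rest, 30)
    return f"{h % 24:02d}:{mi:02d}:{s:02d};{f:02d}"

print(frames_to_dropframe(1799))   # 00:00:59;29
print(frames_to_dropframe(1800))   # 00:01:00;02  (frames 00 and 01 skipped)
print(frames_to_dropframe(17982))  # 00:10:00;00  (tenth minute: nothing skipped)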

25-Frame - In Europe, the mains line frequency is 50 Hz. Time code based on that
reference is most easily divided into 25 frames per second, which is the frame rate for
European video and film (the PAL/SECAM standards, where PAL is Phase Alternating
Line and SECAM is Sequential Colour with Memory). The 25-Frame time code, also
called EBU time code, is used in any country where 50 Hz is the line reference. There is
no 25 Drop-Frame: in Europe, both colour and black-and-white run at the same frame
rate.

24-Frame - Since the film industry used 24 fps as their standard, 24-Frame Time Code
was introduced to correspond with film. This time code is sometimes used by film
composers who have all of their music cue sheets marked in frames based on a rate of
24, or for editing film on tape.

Unless one is doing a video for broadcast, most people choose Non-Drop time code
because it expresses real time. There is no harm in using Drop-Frame for any purpose
because most equipment will sync to either code.

In practical terms, it really doesn't matter which code you use, as long as the usage is
consistent.

However intermixing of different frame rates must be avoided.

4. Longitudinal, Vertical Interval and Visible Time Code

Regardless of the frame rate, there are two basic versions of SMPTE Time Code. They differ in
the physical way they are recorded on video tape (Fig. 20-4).

4.1 Longitudinal Time Code (LTC)

Longitudinal Time Code is recorded on a standard audio tape track. When recorded
to video, LTC is placed on one of the linear audio tracks of the video tape. LTC is the
"original SMPTE Time Code standard", and older video tapes, if they contain time
code, will be striped with LTC.

5. Time Code and Audio Tape

Recording SMPTE Time Code onto a tape is also known as striping the tape. (This term comes from
the motion picture industry, and refers to the stripe of oxide that is placed on the edge of the
film after it has been developed; the stripe is used to record audio for the film.)

Whenever possible record the TC first followed by the audio. This will reduce the likelihood of
time code bleeding into adjacent channels. A TC's square wave frequencies lie in the mid-
frequency band where our hearing is most sensitive. On large multitrack tape recorders, the
edge tracks tend to be unstable. Recording the same time code on tracks 23 and 24 will
protect you from potential dropouts on track 24: if that tape is used often for dubbing or mixing,
there is a danger of the edge track becoming damaged, and a spare time code track becomes
invaluable.

TC should be recorded at a level not greater than -10 VU for professional machines. On semi-
pro machines, the level should be -3 VU.

It is important to record time code on an edge track of a multitrack recorder and then leave a
blank track (or guard track) between the code and the next recorded track.

Allow a minimum amount of running time code before recording any audio on tape. A pre-roll and
post-roll of 30 seconds (before and after the audio) will provide sufficient time for the
synchroniser to “lock” all the equipment up.

6. Copying SMPTE Time Code

Time Code sometimes needs to be copied from one recorder to another, for example when making a
safety of a multitrack master. Copying TC always involves some form of signal conditioning.

Three methods of copying LTC

1. Refreshing
2. Jam syncing
3. Reshaping

6.1 Refreshing or Regenerating Code

Magnetic tape recordings are imperfect in reproducing square wave signals because of
their bandwidth limitations. As a result, some form of signal conditioning is required if a
time code track is to be copied from one tape to another. This is known as Refreshing.

Refreshing or regenerating code is performed by sending the original TC into a time
code generator. The generator locks onto the incoming time code and replicates it,
creating a fresh time code signal. Some generators/regenerators can actually fix
missing bits in the code. This “refreshed” TC is recorded onto tape.

6.2 Jam Synchronising

Jam Synchronising is the same as regenerating code, with an exception. Once the
Jam Sync has started it will continue to generate TC even if the original TC has
stopped “playing”. This is especially useful when T/C is too short on a tape or there
are dropouts. Some generators can jam sync backwards extending the front of the
tape also. In both refreshing and jam sync mode, one can either choose to copy the
user bit information or generate new user bits - different dates.

6.3 Reshaping

Reshaping is a process that passes the existing code through circuitry that restores its
square wave shape, but it cannot repair missing bits.

7. Time Code Generator

Most synchronisers have built-in TC generators, as do certain tape machines, including some
portable 2- or 4-track units used for field production and news gathering, such as Nagra tape
recorders. A stand-alone TC generator must be used in conjunction with synchronisers or tape
machines that do not have built-in TC generators.

Vertical Interval Time Code (VITC) is recorded within the video picture frame - during the
vertical-blanking interval. It can be present in a video signal without being visible on screen.
VITC is structured similarly to LTC, but includes several "housekeeping bits" which bring its
word length to 90 bits.

LTC is most common in audio because VITC cannot be recorded on audio tracks. VITC can
be read from a still frame (whereas LTC cannot), and it provides half-frame accuracy for edits.
Where video and multitrack audio transports must be synchronised, both VITC and LTC may
be used together. In audio-only productions only LTC is used.

Time code that is recorded into the video picture so that it appears on the monitor screen is
called a window dub or burnt-in time code.

8. Methods of Synchronisation
8.1 Chase Lock

Chase Lock is the most common form of synchronisation.

In chase lock, the synchroniser controls both the slave capstan speed and its
transport controls. If the master code is three minutes ahead, for example, the slave
transport will be instructed to advance the tape to that area and park. If the master
code is already in play, the slave will automatically go to that general area and play.
The capstan will be instructed to speed up or slow down until it reaches lock.

8.2 Machine Control

Machine Control is another common method of synchronisation.

In Machine Control, the slave capstan control uses the closed-loop method. A
synchroniser monitors the slave’s and master’s time code signals and will speed up or
“chase” or slow the slave capstan until the TC read from the slave machine matches
the master time code. At that point, the system is considered "locked".

9. Other Synchronization-related concepts


9.1 Slave Code Error

Slave Code Error is simply the time difference between the master and slave codes.
The error can be either positive or negative depending on whether the slave is ahead
of or behind the master. The synchroniser displays the TC in hours, minutes, seconds,
and frames. This error can either be used to determine how far away the slave is from
the master or stored for subsequent usage. E.g. Slave code offset.

9.2 Slave Code Offset

Slave Code Offset is the slave code error added to or subtracted from the slave time
code. This calculation offsets the slave time code so that it matches the master time
code numerically. If the master time code reads 01:00:00:00 and the slave time code
reads 04:00:00:00, the offset would be 03:00:00:00.

If the master and slave time codes were reversed, the offset would be -03:00:00:00.

Offsets are extremely useful when matching a pre-striped music track to a new master
time code. It basically makes the two T/C values the same.
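The offset arithmetic is just frame counting. A minimal sketch (Python, non-drop code assumed
for simplicity; the function names are illustrative, not those of any particular synchroniser):

def tc_to_frames(tc, fps=30):
    # 'HH:MM:SS:FF' -> total frame count.
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def frames_to_tc(frames, fps=30):
    sign = "-" if frames < 0 else ""
    frames = abs(frames)
    h, rest = divmod(frames, fps * 3600)
    m, rest = divmod(rest, fps * 60)
    s, f = divmod(rest, fps)
    return f"{sign}{h:02d}:{m:02d}:{s:02d}:{f:02d}"

def slave_code_offset(master_tc, slave_tc, fps=30):
    # Offset computed as slave minus master, following the example above.
    return frames_to_tc(tc_to_frames(slave_tc, fps) - tc_to_frames(master_tc, fps), fps)

print(slave_code_offset("01:00:00:00", "04:00:00:00"))   # 03:00:00:00
print(slave_code_offset("04:00:00:00", "01:00:00:00"))   # -03:00:00:00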

9.3 Event Triggers

Event Triggers are time code values placed in event trigger registers. The triggers can
be used to start and stop tape machines or to trigger sequences. They are used for
triggering equipment that either cannot or does not need to be synchronised.

9.4 Flying Offsets

Flying offsets are accomplished using a synchroniser capture key, which captures the
offsets while master and/or slave are in motion. This is useful for defining a specific
beat in the music relative to the master time code. For example, if the master is a
video deck and you want a particular beat in the music to occur at a specific point in
the video, you would park the master at that point, run the music slave (unlocked),
and hit the capture key at the desired beat.

9.5 Slew

Slewing is a way of manually advancing or retarding the slave capstan by increments
of a frame or subframe. The transports must be locked before slewing can occur.
Once you have determined the proper slew value, it can be entered as a permanent
offset. If there is already an offset prior to slewing, the slew value will simply be added
to or subtracted from the prior offset. This is also referred to as offset trim.

9.6 Advanced Transport Controls

Similar to transport autolocators, some synchronisers autolocate master and slave
transports, with one major difference: the location points are related to time code values.
In addition to autolocation, some synchronisers offer such features as zone limiting and auto
punch. Zone limiting sets a predetermined time code number that, when reached by the
slave, either stops the transport or takes it out of record. Auto punch brings the slave
in and out of record at specified time code locations. These time code numbers can
sometimes be stored in event trigger registers.

10. SMPTE-TO-MIDI Conversion

Used in conjunction with SMPTE Time Code, MIDI provides a powerful extension to the
automation capabilities of a synchronised system. The device that makes this possible is the
SMPTE-to-MIDI Converter, a unit which reads SMPTE Time Code and translates it to MIDI
clock with Song Position Pointer data. The converter serves as the interface between the
SMPTE-synchronised system and the MIDI-synchronised system.
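The core of the conversion is simple when the tempo is constant: work out how many MIDI beats
(16th notes) have elapsed between the song-start time code and the current time code. The sketch
below (Python) assumes non-drop code and a fixed tempo; a real converter also follows tempo
maps, and the names here are illustrative only.

def smpte_to_spp(tc, song_start_tc, tempo_bpm, fps=30):
    # MIDI Song Position Pointer counts 16th notes (1 MIDI beat = 6 MIDI clocks)
    # from the start of the song. Returns the count and the 3-byte F2 message.
    def to_seconds(t):
        h, m, s, f = (int(x) for x in t.split(":"))
        return ((h * 60 + m) * 60 + s) + f / fps
    elapsed = to_seconds(tc) - to_seconds(song_start_tc)
    sixteenths = int(elapsed * tempo_bpm / 60 * 4)        # 4 sixteenths per beat
    lsb, msb = sixteenths & 0x7F, (sixteenths >> 7) & 0x7F
    return sixteenths, bytes([0xF2, lsb, msb])

pos, msg = smpte_to_spp("01:00:30:00", "01:00:00:00", tempo_bpm=120)
print(pos, msg.hex())    # 240 sixteenths after 30 s at 120 bpm -> 'f27001'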

Stand-alone converters which interface directly with computer sequencers are available from
several manufacturers. In addition, some sophisticated synthesisers having built-in sequencers
are capable of reading time code and performing the conversion internally.

In the studio, SMPTE-to-MIDI conversion is a tremendous aid in combining pre-sequenced
MIDI instrumental parts with multitrack overdubbing of acoustic instruments and vocals.

In live sound reinforcement, automated audio and MIDI data stored in MIDI sequences can
be played back synchronously with multitrack recorders, video or film. This technique is now
commonly used. It provides a hi-fi audio and video presentation, including automated MIDI-
controlled lighting, all at a lower cost.

AE19 – Professional Recording Studios

The Mackie Digital 8-Bus Console

1. Recording on the Mackie

2. Mixing down on the Mackie

AE19 PROFESSIONAL RECORDING STUDIOS

The Mackie Digital 8-Bus Console

This section gives an introduction to professional studio environments. The 24-track studio
built around the Mackie Digital 8-Bus console will be used.

1. Recording on the Mackie

2. Mixing down on the Mackie

Assignment 8 – AE008
AE20 – Audio Postproduction for Video

1. Post Production Techniques

1.1 Consistency

1.2 Depth – Dimension

1.3 Worldize

1.4 Walla/Babble Track

1.5 Room Tone, Ambience Or Presence

1.6 Wild sound

1.7 Multisense Effects

1.8 Panning (dialogue for TV/film)

1.9 Cross Fades

1.10 Split Edit

1.11 Cut Effects

Psychoacoustics

1. Frequency Masking

1.1 Temporary Masking

2. Automated Dialogue Replacement

Sound Editing

1. Supervising Sound Editor / Sound Designer

1.1 Music Editing Specialist

1.2 Foley Sound – Effects Editing

2. Editing

2.1 Six generations in a typical film work

2.2 Four Generations In A Typical TV Sitcom

3. Documentary Production

3.1 Off-Line Editor

3.2 Edit Decision List

3.3 Edit Master

4. Print Masters to Exhibition

4.5 Masters for Video Release

4.6 Print Masters for Analogue Sound Track

AE20 – AUDIO POSTPRODUCTION FOR VIDEO

Introduction

Post production occurs after the production stage. Here audio recording, ADR, addition of
sound effects, music, editing and sweetening take place. Finally, the sound track is
added to the video.

The producer, director and sound editor determine the approach to the soundtrack during the
pre-production meeting, prior to the production stage. The sound editor is responsible for the "sonic"
style or "feel" of the production.

During the production shoot, the on-location recordist is to capture the "production
sound." He would record the dialogue tracks or even wild tracks e.g. ambience onto a
portable mixer and DAT recorder using a fishpole (boom) and shotgun condenser
microphone. This is necessary because film, unlike video, does not carry a soundtrack. The
recordist is to capture the dialogue or ambience as well as he can. At the end of the day,
the sound editor and director will decide which "takes" to "print" and use. Hence detailed
labeling of the various takes by the on-location recordist is essential. This is the job of the
“third man” if he is available.

The boom operator is required to read the script and understand what is going to take
place in the shoot. He is to hold the mike out of the frame when the camera is
tracking out. He must be mindful not to cast a shadow onto the set when he is moving the
mike overhead to capture the best sound possible.

When the film has been edited and is ready for post production work, it will be scheduled
for completion in a number of weeks agreed by the sound editor and director.

The film is transferred via a telecine machine, which converts the edited movie - a rough
cut at this stage - into video. The video is given to the sound editor, who now finds the
matching audio takes on DAT and digitally transfers the audio into a
digital audio workstation (DAW), e.g. Pro Tools 5.1.

Once it has been transferred to a DAW, the engineer will use a synchronizer to sync the
digital audio and video tape recorder together from the DAW using e.g. MIDI machine
control (MMC). This is one possible way of synchronization.

Another method is to make the synchronizer the master controller, where the DAW and
video tape recorder will be slaves or synchronized to it.

Audio and video are then synchronized using a time code offset. The sound editor achieves
synchronization by looking for the clapper board "snapping" on the video - a burnt-in time
code window on screen makes this easier - and then matching its sound on the audio track.

1. Post Production Techniques

Depending on where the characters are on screen, the sound editor will attempt to use
artificially created reverbs to match their location, be it outdoors or indoors. This makes the
sound realistic. Reverb may be increased or decreased depending on how the camera
captures the character in, say, a large hall. If an extreme long shot or establishing shot of
the actor is taken while he is walking in this large empty hall, his footsteps will echo through
the audio track and a large reverb is used. When there is a "cut to" edit on the video, showing
him in a close-up shot in the same hall talking on his mobile phone, the reverb level
would now be lower (with its high frequencies rolled off, or it would interfere with his speech).
This is done to simulate the way the audience "psychoacoustically" performs the cocktail
party effect, suppressing the noise in the character's surroundings while concentrating on
listening to his phone conversation.

Doing this also gives the video perspective, just as encountered in real life. When the
actor finishes talking on the mobile, he walks away from the camera and off-screen. Even though
the camera shot remains the same - revealing an empty hall - the sound of his footsteps is
turned up, as though the audience is doing it "psychoacoustically" again because they have
already finished listening to his phone conversation. On the other hand, if the footsteps' sounds
are not turned up, the video will sound "empty.” Hence there is a need to turn up the footsteps'
level to fill up the “void” in the audio.

The sound editor will add sound effects into the DAW at the same place on the timeline as the
point in the video where the objects are making a sound or noise. E.g. a character is walking in an
empty hall. It is difficult to record the footsteps echoing in a large hall on DAT and then make them
"match" with the rest of the sounds on the DAW. The reverb of that hall environment combined with
the sound of the footsteps will be difficult to control by equalization, since the footsteps will
sound loose (lacking crispness and detail) in comparison to what we hear in real life, because
they are recorded at a distance. The reason for this is that humans are able to psychoacoustically
suppress the reverberation while focusing on the sound of footsteps.

Hence the sound editor will capture the footsteps on a foley stage, where the foley artiste will
walk in the same kind of shoes and on a similar surface. He will capture the sound dry, without
reverb. Subsequently, during post production, he will make sure the sound of the footsteps is
in sync with the character's walking. Once that is correct, he will add reverb from a digital effects
unit. It is "easier" that way for the sound editor to control the ambience level and the sound
of the footsteps as two separate elements, as opposed to a recording that combines the footsteps
and the reverberation of a large hall into one element.

On the other hand the sound editor may use Effects Library CDs e.g. Hollywood Edge for the
sound effects. Here he will choose from many different sound effects on e.g. footsteps with
leather-soled shoes on a wooden floor. Under this listing there might be a number of ways of walking
on different types of wooden floor. The sound editor will choose the most suitable sound effect
for the job. Since most of these sound effects are dry, the sound editor will add reverb after
he has spotted the effects to the picture. He would have to equalize them to make them fit
with the rest of the other audio.

When the music, sound effects, ambience and dialogue replacement are done and mixed
correctly, everything will sound natural, as though it happened during the production stage
itself.

To make later changes easier for the sound editor, all the music
will be mixed into a music stem, the sound effects into a sound effect stem and the dialogue
into a dialogue stem. The final output of these stems is a Print Master.

Thereafter it can be "laid back": the print master is recorded onto film, video or DVD for
exhibition or release.
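
The stem-and-print-master idea can be sketched in a few lines of code. The example below is only an illustration with made-up gain values: it sums three equal-length stems into a print master, which is why a late change request only ever touches one stem rather than the whole mix.

# Minimal sketch: combining stems into a print master (stems assumed to be
# equal-length mono float arrays).
import numpy as np

def build_print_master(dialogue, music, effects, trims=(1.0, 1.0, 1.0)):
    """Sum the three stems with per-stem trim gains."""
    d_gain, m_gain, e_gain = trims
    master = d_gain * dialogue + m_gain * music + e_gain * effects
    peak = np.max(np.abs(master))
    if peak > 1.0:                 # avoid clipping on the final output
        master = master / peak
    return master

# Example with silent placeholder stems, one minute at 48 kHz:
sr = 48000
silence = np.zeros(60 * sr)
print_master = build_print_master(silence, silence, silence)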

1.1 Consistency

Consistency in production sound is more important than having a mixture of great and not-so-
great sounding dailies. This matters especially within a scene, even more than from scene to
scene: inconsistency in production sound is distracting. Make use of different miking
techniques and positions to achieve the desired result without the use of EQ (processing) if
possible, unless there is no other way around it.

1.2 Depth – Dimension

A talent speaking into a bright condenser mike with no reverb added will give a "voice
inside a head" effect. It gives an intimate, "in-your-face" style, different from the usual
reverb-added overdubs. E.g. Apocalypse Now.

On the other hand, making a voice-over narration much more reverberant than the on-screen
action (while the actor is not speaking on screen) is one method of suggesting to the
audience that we are listening to his thoughts.

1.3 Worldize

At times the way to worldize an archive recording is to play it through loudspeakers in a
reverberant environment and pick the result up again with mikes. A record played on a
turntable produces too much direct signal, which may not give the intended effect even
after some processing. Playing the same music over a gymnasium hall's public address
system and recording it will produce a "period-sounding" recording that suits a "period"
production.

1.4 Walla/Babble Track

In a crowded room scene, record the actor's voice first while the crowd, played by the
extras, mimes talking in the background. This prevents spill from the crowd's talking
getting into the production recording.

Record the crowd talking separately, either at the production or the postproduction stage.
This recording is also known as a "walla" or "babble" track. When recording it during
postproduction, ensure that the level of activity, the gender mix and the number of people
are about the same as on screen. You do not want an audience of about one hundred people
applauding while the screen reveals about thirty people. This happened in the classic movie
Casablanca, in Rick's Café near the opening scene after Sam has sung "Knock on Wood".

1.5 Room Tone, Ambience Or Presence

Record "room tone", also called ambient noise, atmosphere or presence. This can be added
during post production (exploiting frequency masking) to ensure good continuity between
cuts in the same scene. This is vital should the room tone vary between takes within the
same scene, since the scene may have been filmed over a few hours or even days.
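
A hypothetical sketch of the most common use of room tone follows: patching a hole in the dialogue track (for example where an unwanted noise has been cut out) with a loop of the recorded tone, so the edit does not drop into unnatural silence.

# Minimal sketch: filling a gap in a dialogue track with looped room tone.
import numpy as np

def fill_with_room_tone(dialogue, room_tone, start, end):
    """Replace dialogue[start:end] with looped room tone (values in samples).

    A real edit would normally also crossfade the joins, but even a straight
    fill avoids the tell-tale "dead air" of pure digital silence.
    """
    length = end - start
    reps = int(np.ceil(length / len(room_tone)))
    patch = np.tile(room_tone, reps)[:length]
    out = dialogue.copy()
    out[start:end] = patch
    return out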

1.6 Wild sound

Wild sound is recorded without the camera rolling, i.e. without synchronization, usually
after a shoot. It is used later in post production to match the scene.

1.7 Multisense Effects

A movie's sound track is designed to interact with the movie itself; the two complement one
another. A movie without audio will appear longer than it really is. The title crawl at the
beginning of Star Wars seems to take a long time scrolling up the screen when watched in
silence; with the music, it seems to march right along.

1.8 Panning (dialogue for TV/film)

With stereo we can pan effects or voices across the stereo panorama to fit the positions of
the characters on the screen. However, unlike effects, this is rarely done for the voice.
The reason is that if a character speaking on the left side of the screen has his voice
panned to the left, then after a cut the same character may appear in the center of the
screen with his voice suddenly panned to the center of the stereo panorama. The audience
will find it jarring and lacking continuity; in other words, it does not flow and feels
unnatural.

Most characters' voices are therefore panned center (mono) no matter where their position
on screen might be. The only exception is when an off-screen character speaks a line, which
will be panned either left or right to distinguish it from the on-screen dialogue.
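
For illustration, here is a sketch of a standard constant-power (sin/cos) pan law, with the dialogue left at center and a sound effect following the on-screen position; the signals are just placeholders.

# Minimal sketch of a constant-power (sin/cos) pan law.
import numpy as np

def pan(mono, position):
    """position: -1.0 = hard left, 0.0 = center, +1.0 = hard right."""
    theta = (position + 1.0) * np.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    left = np.cos(theta) * mono
    right = np.sin(theta) * mono
    return np.column_stack((left, right))

dialogue = np.random.randn(48000) * 0.1      # placeholder signals
door_slam = np.random.randn(4800) * 0.5

dialogue_st = pan(dialogue, 0.0)   # dialogue stays center regardless of framing
door_st = pan(door_slam, -0.7)     # an effect may follow its on-screen position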

1.9 Cross Fades

Cross fade the room tone (presence, atmosphere) from one scene into the next. It is used to
show transitions in time and eases the audience into the next scene, as it sounds more
natural than a "butt splice". However, butt splices can still be used if desired.
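
A minimal sketch of such a transition, assuming the two scenes' room tones are mono arrays at the same sample rate, is an equal-power overlap of the outgoing tail and the incoming head:

# Minimal sketch: equal-power crossfade of room tone between two scenes.
import numpy as np

def crossfade(scene_a_tone, scene_b_tone, fade_samples):
    """Overlap the tail of scene A's tone with the head of scene B's tone."""
    t = np.linspace(0.0, np.pi / 2.0, fade_samples)
    fade_out = np.cos(t)                      # equal-power curves
    fade_in = np.sin(t)
    a_tail = scene_a_tone[-fade_samples:] * fade_out
    b_head = scene_b_tone[:fade_samples] * fade_in
    return np.concatenate((scene_a_tone[:-fade_samples],
                           a_tail + b_head,
                           scene_b_tone[fade_samples:]))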

1.10 Split Edit

A split edit is when the sound of an object is heard before the object appears on screen.
E.g. a phone is heard ringing before it is seen on screen.

1.11 Cut Effects

Everything seen on screen that makes a sound should be heard, and this is covered by a cut
sound effect. "See a car moving. Hear a car moving."

Psychoacoustics

1. Frequency Masking

A scene may have been shot over the course of several hours or days. The performance is fine,
but during editing the change in background noise level between takes is obvious. One solution
is to use the production recording (the sound captured during the shoot itself) with the louder
noise (the wild track) for the entire scene. This produces smooth-sounding edits and good
continuity.
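
As a rough illustration of how an editor might pick that bed programmatically, the sketch below measures the RMS noise floor of each take's room tone and selects the loudest one to lay under the whole scene; the take list and array format are assumptions.

# Minimal sketch: choose the loudest room tone as the bed for the whole scene.
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def loudest_bed(room_tones):
    """room_tones: list of float arrays, one per take; return the noisiest.

    Laid under the entire scene, the loudest tone masks the smaller level
    jumps between takes, which is the masking trick described above.
    """
    return max(room_tones, key=rms)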

1.1 Temporal Masking

A loud sound can mask a softer sound that occurs just after it; this is known as post-
masking (forward masking). Pre-masking, or backward masking, occurs when a loud sound
masks a softer sound that occurred just before it. This is possible because humans
perceive louder sounds slightly faster (on the order of 10 ms) than softer ones.

Sound editors use temporal masking to hide imperfections in an edit and also to maintain
rhythm. For instance, a loud cymbal crash edited "on the beat" uses pre-masking to cover
up any momentary discontinuity at the actual edit point.
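
The sketch below turns this into a simple check of whether a small edit click falls inside the masking window of a nearby loud transient. The window lengths are rough, commonly quoted ballpark figures, not measured data, and the helper itself is hypothetical.

# Minimal sketch: is a small edit click hidden by a nearby loud transient?
PRE_MASK_S = 0.020    # roughly 20 ms of backward (pre-) masking before a loud sound
POST_MASK_S = 0.150   # on the order of 100-200 ms of forward (post-) masking after it

def click_is_masked(click_time, transient_time):
    """Times in seconds; True if the click falls inside either masking window."""
    delta = click_time - transient_time
    if delta >= 0:                    # click after the transient: post-masking
        return delta <= POST_MASK_S
    return -delta <= PRE_MASK_S       # click just before it: pre-masking

# A cymbal crash at 12.000 s hides a tiny edit bump at 11.990 s (pre-masking):
print(click_is_masked(11.990, 12.000))   # True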

2. Automated Dialogue Replacement

Automated Dialogue Replacement (ADR) is also known as looping. The actors or voice talents watch
the video on a monitor in a studio and hear their original dialogue (the production sound) played
from the DAW through headphones. They then repeat their lines in the studio, matching the
performance on the video, and the new delivery is recorded. After the dialogue has been recorded
it is synchronized to the video. The sound editor will check to ensure that the lip-syncing is
perfect, i.e. that the newly recorded lines match the "speaking" or "mouthing" of the lines on
the video. Should the lip-syncing be imperfect, the sound editor may attempt to salvage it by
selecting individual words in the waveform on the editing window and splitting them up in the
digital audio software. He then matches their waveforms against the original dialogue track's
waveform and timeline so that they appear to be in sync with the "mouthing" of those words by
the actor on screen.

If that proves unsuccessful, the sound editor will have to record the actor's voice again and
perform the lip-syncing once more. The purpose of ADR is to replace the dialogue track when the
production recording of it is inconsistent or noisy. These factors make it unsuitable for final
release, unless it is a low-budget production where the production sound (the dialogue track) is
used whether it is of good quality or not.

Sound Editing

1. Supervising Sound Editor / Sound Designer

The term Sound Design encompasses the traditional processes of both editing and mixing. The
sound designer's responsibility ranges from the overall conception of the sound track, including
composing music and shaping its role in the film (giving it a mood or sonic style), to making
specialized, complex sound effects. For example:

• The dinosaur sound in Jurassic Park is a mixture of penguin and baby elephant trumpeting.
• Star Wars – Luke Skywalker's land speeder sound is achieved by sticking a mike in the tube of a
turned-on vacuum cleaner.

• Star Wars – the Storm Troopers' laser gun shots are obtained by recording the vibration, via a
contact pickup, of a tautly strung cable after it has been struck.
• The thud of a fist into a body is achieved by striking an axe into a watermelon.
• A bird's wing-flapping sound is created using the sound produced by opening and closing an
umbrella rapidly.

Sound design is the art of getting the “right sound” in the right place (screen) at the right
time (synchronized).

The sound editor may compose a piece of music or choose music from a CD library to enhance the
mood of the movie. Spotting is performed at this stage: he will spot the music to the movie via
time code. Most movies have about 50% music in them. The important question, or art, is when to
bring in the music to enhance the mood or even heighten tension towards a climax.
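
Spotting cues by time code ultimately comes down to a small arithmetic step: converting HH:MM:SS:FF into a position on the DAW timeline. The sketch below assumes a non-drop-frame 25 fps project at 48 kHz purely for illustration.

# Minimal sketch: converting a spotting timecode into a sample offset.
def timecode_to_samples(tc, fps=25, sample_rate=48000):
    """tc is "HH:MM:SS:FF", non-drop-frame; fps and sample rate are assumptions."""
    hours, minutes, seconds, frames = (int(x) for x in tc.split(":"))
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    return int(round(total_frames * sample_rate / fps))

# A music cue spotted at 00:12:41:10 starts at this sample on a 48 kHz timeline:
print(timecode_to_samples("00:12:41:10"))   # 36547200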

The tempo and style of the music will often set the pace and mood for a certain part or scene in
a movie. A fast tempo, together with quick edits of someone running away from danger, will
usually set the audience's hearts racing; a slower tempo with lots of low, dark notes gives the
impression of impending danger while heightening suspense. On the other hand, a music spe