

Lecture 6 Multimedia Fundamentals

• Why is audio significant for multimedia design?
o It is truly a component of "multi" media, giving the designer another way to captivate the audience.
o In web design, sound is used to highlight effects such as pushing a button, or to animate an introduction designed in Flash.
o Working with sound does not require highly specialized knowledge of harmonics, intervals, sine waves or the principles of acoustics. The only thing to know is how different types of sound, such as human voices and instrumental music, can be incorporated into the computer.
• There are basically two types of audio that can be played on multimedia systems: digitized audio and MIDI.
Nature of Sound
Audio Fundamentals
• Acoustics is the study of sound: the generation, transmission, and reception of sound waves.
• Sound wave: a disturbance caused by the movement of energy traveling through a medium (such as air, water, or any other liquid or solid matter).

 Example: striking a drum

The head of the drum vibrates and disturbs the air molecules close to it, creating regions of molecules with pressure above and below equilibrium. Sound is transmitted by molecules bumping into each other.
Sound Waves
 Sound pressure levels ( loudness or volume) are
measured in decibels (dB).
 The perception of loudness is dependent upon the
frequency or pitch of the sound.

dB Example
30 Very soft whisper
70 Voice conversation
130 75-piece orchestra
170 Jet engine
Human Perception

• Speech is a complex waveform.

 Vowels and bass sounds are low frequencies
 Consonants are high frequencies.
• Humans are most sensitive to lower frequencies, between 20 Hz and 4 kHz.
• Humans normally hear sound frequencies between approximately 20 Hz and 20,000 Hz (20 kHz).
Sound Waves
[Figure: a sound wave with one period (T) marked]
• Sound waves can be characterized by the following properties.
• Period (T)
o The interval at which a periodic signal repeats.
o The time for one cycle to occur.
• Frequency (f)
o Frequency is the speed of the vibration, and this determines the pitch of the sound. It is only useful or meaningful for musical sounds, where there is a strongly regular waveform.
o It measures a physical property of a wave. The unit is the hertz (Hz) or kilohertz (kHz). f = 1/T
o One hertz is one cycle per second.
o Frequency ranges:
   Ultrasound: 20 kHz to 1 GHz
   Cosmic rays: 10^22 Hz
   Human hearing range: up to 20 kHz
• Amplitude
o The measure of sound level: the intensity of the sound.
o For a digital sound, amplitude is the sample value.
• Wavelength
o The distance sound travels in one cycle.
o The higher the frequency of the signal, the shorter the wavelength:
   20 Hz is 56 feet
   20 kHz is 0.7 inch
• Bandwidth
o Frequency range: the range of frequencies a device can produce or a human can hear.
   FM radio: 50 Hz to 15 kHz
   AM radio: 80 Hz to 5 kHz
   CD player: 20 Hz to 20 kHz
   Telephone: 300 Hz to 3 kHz
   Older human ears: 50 Hz to 10 kHz
Wavelength and Speed
 Sound travels at approximately 344m/sec or 1130 ft/sec in air
 Audible sound covers the frequency range from about 20 Hz to 20kHz.
 The wavelength of sound of a given frequency is the distance between
successive repetitions of the waveform as the sound travels through air. It
is given by the following equation:

wavelength = speed/frequency

or, using the common abbreviations of c for speed, f for frequency, and l for wavelength:

l = c/f
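The relation l = c/f can be checked with a few lines of Python. This is only a sketch: the function name `wavelength_m` is made up for illustration, and it assumes the approximate speed of 344 m/s quoted above.

```python
SPEED_OF_SOUND_M_PER_S = 344.0  # approximate speed of sound in air, as quoted above


def wavelength_m(frequency_hz: float,
                 speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the wavelength in metres: wavelength = speed / frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / frequency_hz


# The two ends of the audible range plus one value in between.
for f in (20.0, 1_000.0, 20_000.0):
    print(f"{f:>8.0f} Hz -> {wavelength_m(f):.4f} m")
```

At 20 Hz this gives 17.2 m (roughly 56 feet) and at 20 kHz about 0.017 m (roughly 0.7 inch), which agrees with the figures listed for wavelength earlier.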

• Two attributes of a sound wave affect how you work with sound on a computer: volume and frequency.
• The height of each peak in the sound wave relates to its volume: the higher the peak, the louder the sound.
• The distance between the peaks determines the frequency (sometimes referred to as pitch): the greater the distance, the lower the sound.
• Frequency is measured in hertz (Hz). A pattern that recurs once every second has a frequency of one hertz. If the pattern recurs 1000 times in a second, its frequency is 1000 Hz or 1 kHz (kilohertz).
Example sound waves

The faster the vibration the higher the pitch of the note and the bigger
the vibration the louder the note.
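One way to see both attributes at once is to generate two pure tones. The sketch below is an illustration only: the helper `sine_samples` and the chosen tones (220 Hz and 440 Hz, sampled 8000 times per second) are assumptions, not part of the lecture.

```python
import math


def sine_samples(frequency_hz: float, amplitude: float,
                 sample_rate_hz: int, duration_s: float) -> list[float]:
    """Return samples of amplitude * sin(2*pi*f*t), taken sample_rate_hz times per second."""
    count = round(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(count)
    ]


# Hypothetical tones chosen only for illustration.
low_quiet = sine_samples(220.0, 0.25, 8000, 0.01)  # slower vibration, smaller peaks
high_loud = sine_samples(440.0, 1.00, 8000, 0.01)  # faster vibration, larger peaks

print("period of the 220 Hz tone:", 1.0 / 220.0, "s")
print("period of the 440 Hz tone:", 1.0 / 440.0, "s")
print("peak of the quiet tone:", max(low_quiet))
print("peak of the loud tone:", max(high_loud))
```

The 440 Hz tone repeats twice as often as the 220 Hz tone (higher pitch), and its peaks are four times as high (louder).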
• The decibel (dB) is a dimensionless unit; it is not tied to one specifically defined physical quantity.
• It measures sound pressure or electrical pressure (voltage) levels.
• It is a logarithmic unit that describes a ratio of two intensities, such as two different sound pressures, two different voltages, and so on.
• A bel is the base-ten logarithm of the ratio between two signals. This means that for every additional bel on the scale, the signal represented is ten times stronger. A decibel is one tenth of a bel.
• Example: two loudspeakers, the first playing a sound with power P1, and another playing a louder version of the same sound with power P0.
• The difference in decibels between the two is defined to be:

dB = 10 log10(P0 / P1)

• If the second produces twice as much power as the first, the difference in dB is
10 log10(P0 / P1) = 10 log10(2) ≈ 3 dB.
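The loudspeaker example can be reproduced numerically. This is a minimal sketch; the function name `power_ratio_to_db` and its parameter names are chosen here for illustration, with the louder power divided by the quieter one.

```python
import math


def power_ratio_to_db(p_louder: float, p_quieter: float) -> float:
    """Level difference in dB between two powers: 10 * log10(p_louder / p_quieter)."""
    if p_louder <= 0 or p_quieter <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(p_louder / p_quieter)


print(round(power_ratio_to_db(2.0, 1.0), 2))   # doubling the power
print(round(power_ratio_to_db(10.0, 1.0), 2))  # ten times the power, i.e. one bel
```

Doubling the power gives about 3 dB, and a tenfold increase gives 10 dB, which is one bel.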
Common Sounds
Dynamic and Bandwidth
Types of digital audio
2 main formats:
1. Sampled/Digitized sounds
o digitally converted analog sounds.
o the analog sound waves are captured, converted, and transcribed into a
digital format.
o usually saved as WAV or AIFF files.

2. Synthesized sounds-MIDI
o recreations of the many events that together form one sound
o mostly recreated through the MIDI (Musical Instrument Digital Interface) format
o pitch, frequency, period, and speed data are recorded as numeric data
o when replayed, this numeric data is turned back into actual sound
Synthesized Sounds - MIDI
Advantages
• Smaller file size.
• Composition for many different instruments without actually having the "real" instruments there to do the recording.
• The tempo (speed) of a MIDI file can be changed at will, without changing the actual pitch of the piece. Instrumentation can also be changed instantaneously.
Disadvantages
• Since it is recreated sound, the quality of the output depends on the quality of the sound card or MIDI instrument.
• Some instruments and more expensive sound cards sound indistinguishable from analog sounds, while inexpensive sound cards make the MIDI output sound "tinny" or fake.
Sound in Multimedia Application
 Types of Audio in Multimedia Applications:
 Music – set the mood of the presentation, enhance the emotion,
illustrate points
 Sound effects – to make specific points, e.g., squeaky doors,
explosions, wind, ...
 Narration – most direct message, often effective
Sound in Multimedia Application
• Analog sound needs to be converted into digital data.
• The conversion process is sampling, in which every fraction of a second a sample of the sound is recorded in digital form.
• Two factors affect the quality of the digitized sound:
(1) the number of times per second a sample is taken, which is called the sample rate, and
(2) the amount of information stored about each sample, which is called the sample size.
• Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
• Sampling means measuring the quantity we are interested in, usually at evenly spaced intervals.
• The rate at which it is performed is called the sampling frequency. For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem.
• Sampling in the amplitude or voltage dimension is called quantization.
Nyquist Theorem
 The Nyquist theorem states how frequently we must sample in
time to be able to recover the original sound. For correct
sampling we must use a sampling rate equal to at least twice the
maximum frequency content in the signal. This rate is called the
Nyquist rate.
 Nyquist Theorem: If a signal is band-limited, i.e., there is a
lower limit f1 and an upper limit f2 of frequency components in
the signal, then the sampling rate should be at least 2(f2 − f1)
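A short sketch can make the "at least twice the maximum frequency" rule concrete. The function names are hypothetical. The folding (aliasing) calculation in `apparent_frequency_hz` is an added illustration of what goes wrong below the Nyquist rate; it is not something stated in the slides.

```python
def nyquist_rate_hz(max_frequency_hz: float) -> float:
    """Minimum sampling rate: twice the highest frequency present in the signal."""
    return 2.0 * max_frequency_hz


def apparent_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears to have after sampling.

    Tones above half the sampling rate fold back to a lower frequency.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


print(nyquist_rate_hz(20_000.0))               # rate needed for the full audible range
print(apparent_frequency_hz(3_000.0, 8_000.0))  # below 4 kHz: recovered correctly
print(apparent_frequency_hz(5_000.0, 8_000.0))  # above 4 kHz: shows up as a 3 kHz tone
```

With 8 kHz sampling, only content up to 4 kHz survives intact, which is why telephone-quality audio uses that rate while CD audio uses 44.1 kHz to cover the range up to 20 kHz.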
Signal to Noise Ratio (SNR)
 Measurement of the quality of the signal
 The ratio of the power of the correct signal and the noise is called
the signal to noise ratio (SNR)
 usually measured in decibels (dB)
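As a rough illustration, the SNR in decibels can be computed from the average power of a signal and of the noise. The helper names and the sample values below are invented for the example.

```python
import math


def mean_power(samples: list[float]) -> float:
    """Average power of a sampled signal: the mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(signal power / noise power)."""
    return 10.0 * math.log10(signal_power / noise_power)


# Made-up sample values: the noise is one tenth of the signal amplitude.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

print(round(snr_db(mean_power(signal), mean_power(noise)), 2))
```

A noise amplitude ten times smaller than the signal means a power ratio of 100, that is, an SNR of about 20 dB.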
Digital Audio

• Digital audio is created when a sound wave goes through a process of digitizing.
• Every nth fraction of a second, a sample of sound is taken and stored as digital information. The quality of the digital recording depends on how often the samples are taken and how many numbers are used to represent the value of each sample.
• The sampling rate, or sampling frequency, is measured in kilohertz: thousands of samples per second.
Digitized Sound: Sampling & Quantization
• Sound waves are continuous.
• Samples of the wave are taken to measure the signal at discrete points in time.
• Each sample has to be quantized, that is, represented using one of a fixed set of values.
• The total number of values depends on the number of bits used to represent each sample.
• The more bits used to represent a sample, the smaller the loss, but the larger the bandwidth and resource consumption, such as disk space and memory.
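The trade-off between bits per sample and loss can be sketched as follows. This assumes samples normalized to the range -1.0 to 1.0 and a simple uniform quantizer; the function names `quantize` and `dequantize` are illustrative only.

```python
def quantize(sample: float, bits: int) -> int:
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer codes (0 .. 2**bits - 1)."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    return round((clipped + 1.0) / 2.0 * (levels - 1))


def dequantize(code: int, bits: int) -> float:
    """Map an integer code back to a value in [-1.0, 1.0]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0


original = 0.3  # an arbitrary sample value
for bits in (3, 8, 16):
    restored = dequantize(quantize(original, bits), bits)
    print(bits, "bits -> error", abs(restored - original))
```

The error shrinks as the number of bits grows, while each stored sample takes more space.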
Audio Sampling

Sample rate & Sample size
 The most common sample rates are 11.025 kHz, 22.05 kHz, and 44.1
kHz (CD audio quality), 48kHz (DVD audio quality).
 The higher the sample rate, the more samples that are taken and thus the
better the quality of the digitized sound.

 The two most common sample sizes are 8 bit and 16 bit.
 An 8-bit sample allows for 256 values that are used to describe the sound,
 a 16-bit sample allows for 65,536 values.
 The higher the sample size, the better the quality of the digitized sound and
the larger the file size.

• When preparing audio for multimedia output, especially web output, it is usually unnecessary to go beyond a 22 kHz sampling rate. By the Nyquist theorem, this rate captures frequencies up to about 11 kHz, which covers most of what listeners notice in speech and music.

 Some highs and lows will be lost; however, it is difficult to detect this,
especially with the quality of speakers found on most computers today.
Bit Depth
 One more important factor when sampling sounds is bit depth.

 Bit depth - the dynamic range (or amplitude) of a piece.

 also controls the resolution of the sound wave (higher bit
resolution results in a smoother wave).
 For example, a highly structured classical piece would have a high
bit depth, because of the great dynamic differences between a solo
flute section and a full orchestral section.
 Many pop tunes have less of a bit depth, because they are composed
on a more equal dynamic range.

• When sampling music for the web, a 16-bit depth is adequate. In many instances an 8-bit depth may work as well and decrease the file size.
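The link between bit depth and dynamic range can be estimated with a short calculation. The sketch below uses the standard approximation of about 6 dB per bit, which is not stated in the slides; the function names are illustrative.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct values a sample of the given bit depth can take."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range in dB: 20 * log10(number of levels)."""
    return 20.0 * math.log10(quantization_levels(bits))


for bits in (8, 16):
    print(bits, "bits:", quantization_levels(bits), "levels,",
          round(dynamic_range_db(bits), 1), "dB")
```

Under this approximation 8 bits give roughly 48 dB and 16 bits roughly 96 dB, which is why pieces with large differences between quiet and loud passages benefit from the higher bit depth.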
Quality versus File size
Sampling Rate | Resolution | Stereo or Mono | Bytes Needed for 1 Minute | Note
44.1 kHz | 16-bit | Stereo | 10.5 MB | CD-quality
44.1 kHz | 8-bit | Mono | 2.6 MB | Recording a mono source
22.05 kHz | 16-bit | Stereo | 5.25 MB | CD-ROM
5.5 kHz | 8-bit | Mono | 325 KB | A bad
• Sampling at higher rates, such as 44.1 kHz, more accurately captures the high-frequency content of the sound.
• Audio resolution, such as 8 bit or 16 bit, determines the accuracy with which a sound can be digitized.
• More bits for the sample size yield a recording that sounds more like the original.
Quality vs File size
• The size of a digital recording depends on the sampling rate, the number of bits per sample, the number of channels and the recording duration:
S = R x (b/8) x C x D
S  file size, in bytes
R  sampling rate, in samples per second
b  number of bits per sample
C  number of channels (1 = mono, 2 = stereo)
D  recording duration, in seconds
If we record 10 seconds of stereo music at 44.1 kHz with 16-bit resolution, the file size is

S = 44100 x (16/8) x 2 x 10

= 1,764,000 bytes
= 1722.7 Kbytes
= 1.68 Mbytes
1 Kbyte = 1024 bytes
1 Mbyte = 1024 Kbytes
1 byte = 8 bits
 If we record 20 seconds of mono music at 54.2 kHz,
8 bits, the file size is…
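The formula S = R x (b/8) x C x D translates directly into code. This is a small sketch with a hypothetical function name; it reproduces the 10-second worked example and can be reused for the exercise above.

```python
def audio_file_size_bytes(sample_rate_hz: int, bits_per_sample: int,
                          channels: int, duration_s: float) -> float:
    """Uncompressed size in bytes: S = R x (b/8) x C x D."""
    return sample_rate_hz * bits_per_sample * channels * duration_s / 8


size = audio_file_size_bytes(44_100, 16, 2, 10)  # the worked example: 10 s of CD-quality stereo
print(size, "bytes")
print(round(size / 1024, 1), "Kbytes")
print(round(size / 1024 / 1024, 2), "Mbytes")
```

The same call with 60 seconds instead of 10 gives the one-minute figure for CD-quality stereo listed in the table earlier.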
Audio Compression
 Since audio files are usually large, they must be compressed in order to be
delivered on the web.

 The most common compression type is MP3.

 MP3 compression can reduce audio file size from about 15-20 Megabytes to
2-5 Megabytes.

 MP3 compression works by taking out the highs and lows that are
inaudible to the human ear, and also highs and lows that are "hidden"
behind other sounds.
• For example, if an audio file contained the simultaneous sounds of a train and a single bell, the sound of the bell would be masked by the train and could not be heard. MP3 compression takes out the sound of the bell, thereby reducing the file size.
Audio File Format
Audio File Format Standard
AIFF (Audio Interchange File Format)
 Developed by Apple Inc.
 Uncompressed audio – lossless format – bigger file size
• e.g. 1 minute of audio data (stereo, 44.1 kHz sampling rate, 16 bit): about 10 MB
WAV (Waveform Audio file format)
 Developed by Microsoft and IBM
 Main format used on Windows for raw and typically uncompressed audio
 Uncompressed - big size file – uncommon for audio sharing over the Internet.
MP3 (MPEG Audio Layer III)
• Designed by the Moving Picture Experts Group (MPEG).
• An audio coding scheme with lossy data compression.
• The compression works by reducing the accuracy of certain parts of the sound that are considered to be beyond the auditory resolution ability of most people.
• MP3 can provide about 12:1 compression from a 44 kHz 16-bit stereo WAV file without noticeable degradation of sound quality.
• MP3 playback is not recommended on machines slower than a Pentium or equivalent.
• The de facto standard of digital audio compression for the transfer and playback of music on most digital audio players.
Trade-off when creating an MP3 file: the amount of space used versus the sound quality of the result.
MP3 file quality depends on:
1. The bit rate, which can be set (it specifies how many kilobits the file may use per second of audio).
o The higher the bit rate, the larger the compressed file and the closer it will sound to the original.
o With too low a bit rate, compression artifacts may be audible in the reproduction.
2. The quality of the encoder
o Different encoders produce different quality.
3. The difficulty of the signal being encoded

Types of MP3 bit rate

• Constant Bit Rate (CBR): the simplest, with one bit rate for the entire file; encoding is simpler and faster.
• Variable Bit Rate (VBR): the bit rate changes throughout the file.
A song comprises various parts: some are much easier to compress, such as silence or music containing only a few instruments, while others are more difficult. So the overall quality of the file may be increased by using a lower bit rate for the less complex passages and a higher one for the more complex parts.
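Because the bit rate is the number of kilobits used per second of audio, file sizes follow from simple arithmetic. The sketch below is an illustration under stated assumptions: 1 kilobit is taken as 1000 bits, the function names are invented, and the 128 kbps setting and the passage lengths in the VBR example are hypothetical values.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Size of audio encoded at a constant bit rate."""
    return bit_rate_kbps * 1000.0 * duration_s / 8.0


def vbr_size_bytes(passages: list[tuple[float, float]]) -> float:
    """Size of a variable-bit-rate file given (seconds, kbps) for each passage."""
    return sum(size_bytes(kbps, seconds) for seconds, kbps in passages)


cd_rate = pcm_bit_rate_kbps(44_100, 16, 2)
print("uncompressed:", cd_rate, "kbps")
print("ratio at 128 kbps:", round(cd_rate / 128.0, 1), ": 1")
print("CBR, 60 s at 160 kbps:", size_bytes(160.0, 60.0), "bytes")
# Hypothetical song: 10 s of near silence at 64 kbps, 50 s of dense music at 160 kbps.
print("VBR, same song:", vbr_size_bytes([(10.0, 64.0), (50.0, 160.0)]), "bytes")
```

At 128 kbps the ratio comes out near 11:1, in line with the "about 12:1" figure quoted for MP3, and the VBR file is smaller than the CBR file because the easy passage uses fewer bits.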
Web Audio players
• The most common application for playing audio files is QuickTime, designed by Apple and available for both Mac and PC.

• RealAudio, from RealNetworks, was the first to introduce streaming audio and broadcasting of live events.
• Streaming audio is the process of having the first part of an audio file play while the remainder is being downloaded.

• Winamp is a free digital audio player for Windows, able to play both MP3 files and internet radio stations.

• Windows Media Player 12 (only for Windows 7) is another free player, which includes a CD player, audio and video player, media jukebox, media guide, Internet radio, portable device music file transfer, and audio CD
• It has built-in support for many popular audio and video formats, including 3GP, AAC, AVCHD, MPEG-4, WMV, and WMA. It also supports most AVI, DivX, MOV, and Xvid files.
Audio Editing Software
• Professional league:
o Sonic Foundry (http://www.sonicfoundry.com/) sells many audio editing programs, such as Acid pro 2.0, Acid music 2.0, Acid D.J 2.0, Acid Hip Hop 2.0, Acid Latin 2.0, Acid Rock 2.0, Acid Techno 2.0, Acid Express 2.0, and Vegas audio 2.0.
o A more complex audio editing program is Sound Forge.
o It is used for professional audio production as well as multimedia and internet development.
o Files can be saved in many formats, including WAV, WMA, RM, ASF, and MP3.

 Most popular:
 Adobe Audition – Adobe SoundBooth

 Best free:
 Audacity
MIDI (Musical Instrument Digital Interface)
• It is a communication standard developed in the early 1980s for electronic instruments and computers.
• It specifies the hardware connection between pieces of equipment as well as the format in which data are transferred between them.
• In some ways it is the sound equivalent of vector graphics.
• Common MIDI devices include electronic music synthesizers, modules, and the MIDI devices in common sound cards.
• MIDI is not sound. MIDI works by sending data signals between equipment, not audio signals. These signals are also known as "events" (sent as a series of commands).
• It codes "events" that stand for the production of sounds, e.g. values for the pitch of a single note, its duration, and its volume.
• File size is compact, but the sound generated depends on the playback device: different machines give different quality.
• MIDI files are only suitable for recording music; they cannot be used to store dialogue.
• MIDI data are more difficult to edit and manipulate.
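To show how compact such events are, the sketch below builds the raw bytes of a note-on and a note-off message. The status byte values (0x90 and 0x80, combined with a channel number from 0 to 15) follow the MIDI 1.0 message format; the helper names are made up for this example. The note's duration is not stored in either message: it is the time between the two.

```python
NOTE_ON = 0x90   # status byte for "note on" on channel 0
NOTE_OFF = 0x80  # status byte for "note off" on channel 0


def _check(channel: int, *data: int) -> None:
    """Validate the channel (0..15) and the 7-bit data bytes (0..127)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if any(not 0 <= value <= 127 for value in data):
        raise ValueError("data bytes must be 0..127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Start a note: the note number gives the pitch, the velocity gives the volume."""
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel: int, note: int) -> bytes:
    """Stop a note that was started earlier on the same channel."""
    _check(channel, note)
    return bytes([NOTE_OFF | channel, note, 0])


middle_c = 60  # MIDI note number for middle C
print(note_on(0, middle_c, 100).hex())
print(note_off(0, middle_c).hex())
```

One note therefore costs six bytes, whereas one second of CD-quality digitized audio takes well over a hundred thousand bytes.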
MIDI Devices
4 basic logical groups of MIDI devices:
1. Synthesizers - components that generate sounds based on the input
of MIDI software messages; create pitched and/or percussion sound
2. Controllers - devices to generate MIDI software messages. MIDI
controllers can take the form of almost any acoustic or electronic
instrument such as keyboards, guitars, drum sets, drum pads, and
even woodwind-like instruments. Keep in mind that a controller does
NOT synthesize or generate audible music. MIDI controllers generate
MIDI software messages that are routed through one or more MIDI
3. Sequencers - device incorporating both MIDI software and hardware,
which is used for storing and replaying MIDI software message
sequences. In effect, the sequencer is the electronic version of the
musician in the MIDI world.
4. Networks
MIDI Synthesizers
 Three types:
(i) Integrated Keyboard and Synthesizer,
(ii) Rack-mounted Synthesizer, and
(iii) a Drum Machine.
MIDI Sequencer
Studio Setup
• MIDI data does not encode individual samples.
• MIDI data encode musical events and commands to control instruments.
• MIDI data are grouped into MIDI messages.
• Each MIDI message represents a musical event, e.g., pressing a key, setting a switch or adjusting foot pedals.
• A sequence of MIDI messages is grouped into a track.
• An instrument or a computer that satisfies both the hardware interface and the data format is known as a MIDI device.
MIDI Channels and Modes
 MIDI devices communicate with each other through channels.
 The MIDI standard specifies each MIDI connection has 16 channels.
• Each instrument can be mapped to a single channel (Omni Off), or it can use all 16 channels (Omni On).
• Some instruments are capable of playing more than one note at the same time, e.g., organs and pianos. This is known as polyphony.
• Other instruments, such as the flute, are monophonic since they can only play one note at a time.
 Each MIDI device must be set to one of the modes for receiving MIDI data:
Instrument Patch

• Each MIDI device is usually capable of producing sounds resembling several real instruments and/or noise effects (e.g., telephone, aircraft).
• Each instrument or noise effect is known as a patch, or preset.
• The General MIDI standard specifies 128 patches (numbered from 0 to 127).
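Selecting a patch is itself a MIDI message. As a small sketch, a program change message is a status byte (0xC0 combined with the channel number) followed by the patch number. The function name below is hypothetical.

```python
PROGRAM_CHANGE = 0xC0  # status byte for "program change" on channel 0


def program_change(channel: int, patch: int) -> bytes:
    """Build the two-byte message that selects a patch (0..127) on a channel (0..15)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 0 <= patch <= 127:
        raise ValueError("patch must be 0..127")
    return bytes([PROGRAM_CHANGE | channel, patch])


print(program_change(0, 0).hex())    # first patch on the first channel
print(program_change(9, 127).hex())  # last patch on the tenth channel
```

The patch number fits in seven bits, which is why the numbering runs from 0 to 127.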
MIDI Softwares
MIDI vs Digital Audio
When to use MIDI Audio?
• There is not enough RAM, hard disk space, CPU processing power or bandwidth for digital audio.
• A high-quality MIDI sound source is available.
• High-quality MIDI playback hardware is available.
• There is no need for spoken dialog.
Steps for Adding Sound to a Multimedia Project

1. Determine the file formats that are compatible with the multimedia authoring software and delivery medium.
2. Determine the sound playback capabilities (codecs and plug-ins) that the end user's system offers.
3. Decide what kind of sound is needed, such as background music, sound effects or spoken dialog.
4. Decide where these audio events will occur in the flow of the project. Jot them down in the storyboard.
5. Decide where and when to use either digital audio or MIDI data.
6. Obtain source material by creating it from scratch or purchasing it.
7. Edit the sounds to fit the project.
8. Test the sounds to be sure they are timed properly with the project's images.
 Example MIDI and Digital Sound
 https://www.youtube.com/watch?v=WgnwU6I9SE8

 Digital sound:
 https://archive.org/details/DisneysFrozenOST
Why use Audio?
Obvious advantage: disabled users
For other users:
• Audio is an extra channel of information: if the meaning is unclear using visual information alone, the audio may clarify it.
• It can provide additional information to support different learning styles.
• It can add a sense of realism and convey emotion, time period, geographic location, etc.
• It can direct attention to important events. Non-speech audio may be readily identified by users, for example the sound of breaking glass to signify an error. Since audio can grab the user's attention so successfully, it must be used carefully so as not to unduly distract from other media.
• It can add interest to a presentation or program.
• Like most media, audio files can be large. However, file sizes can be reduced by various methods, and streamed audio can be delivered over the Web.
 Audio can be easily overused. When used in a complex environment
it can increase the likelihood of cognitive overload. Studies have
shown that while congruent use of audio and video can enhance
comprehension and learning, incongruent material can significantly
reduce it. That is, where multiple media are used they should be
highly related to each other to be most effective.
 For most people, audio is not as memorable as visual media.
• Good-quality audio can be difficult to produce and, like other media, most commercial audio, particularly music, is copyrighted.
 Users must have appropriate hardware and software. In an open plan
environment this must include headphones.