
Week 7 - Sound Cards & Digital Audio

Analogue to Digital Conversion


An analog-to-digital converter (ADC) is a device that converts a continuous physical quantity, such as voltage, to a digital number representing the quantity's amplitude. The conversion involves quantization of the input, so a small amount of error is necessarily introduced. Rather than doing a single conversion, an ADC typically performs conversions periodically, producing a sequence of digital values from a signal that is continuous in time and amplitude.
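
A minimal sketch of this sampling-and-quantization process (the rate, bit depth, and test signal are chosen purely for illustration):

```python
import numpy as np

# Sample a "continuous" 440 Hz sine wave at a fixed rate, then quantize
# each sample to an n-bit integer code.
fs = 8000          # sampling rate in samples per second (illustrative)
bits = 8           # bit depth of the hypothetical ADC
t = np.arange(0, 0.01, 1 / fs)          # 10 ms of sample instants
analog = np.sin(2 * np.pi * 440 * t)    # the continuous signal, in [-1, 1]

levels = 2 ** bits
# Quantize: map [-1, 1] onto integer codes 0..levels-1. This rounding step
# is where the quantization error is introduced.
codes = np.round((analog + 1) / 2 * (levels - 1)).astype(int)
reconstructed = codes / (levels - 1) * 2 - 1
print("max quantization error:", np.max(np.abs(analog - reconstructed)))
```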
Audio falls under two different formats: analog and digital. Standard computer systems can only produce or manipulate sound in a digital format, yet real sound from the outside world is analog, and all speakers produce analog signals. Users generally connect digital speakers to a computer; the speakers convert the digital signal to analog format using a digital-to-analog converter (DAC), a process built into typical speakers. A chip called a codec, included on all motherboards, likewise converts digital signals to analog signals.

Embedded sound cards, or onboard audio, come with virtually all motherboards; however, onboard audio produces lower-quality sound than a dedicated sound card. A dedicated sound card gives users the benefit of clear, high-quality sound: dedicated cards have higher signal-to-noise ratios, lower harmonic distortion, and sample resolutions of up to 24-bit at 192 kHz.
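
Those signal-to-noise figures follow from the standard rule of thumb for an ideal N-bit quantizer, SNR ≈ 6.02·N + 1.76 dB; a quick check:

```python
# Theoretical best-case SNR of an ideal N-bit quantizer (full-scale sine
# input). Illustrates why 24-bit cards are quoted with much higher
# signal-to-noise ratios than 16-bit onboard audio.
for n_bits in (16, 24):
    snr_db = 6.02 * n_bits + 1.76
    print(f"{n_bits}-bit: ~{snr_db:.1f} dB")
# 16-bit: ~98.1 dB, 24-bit: ~146.2 dB
```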
PCI
Peripheral Component Interconnect (PCI) is a local computer bus used for attaching hardware devices in a computer system. PCI is part of the PCI Local Bus standard and supports the functions of a processor bus. Any device connected to the PCI bus appears to a bus master as if it were connected directly to the master's own bus, with addresses assigned in the processor's address space.

PCI is a parallel bus, synchronous to a single bus clock.

The interrupt request assigned to a sound card is typically IRQ 5 or IRQ 7.
In a computer, an interrupt request (IRQ) is a hardware signal sent to the processor that temporarily stops a running program so that special software called an interrupt handler can run instead. The handler deals with events such as data arriving from devices like modems or network cards, or a key press or mouse movement. The priority of an interrupt request is identified by its interrupt request level (IRQL).
Interrupt lines are identified by an index of the form IRQ followed by a number. For example, the Intel 8259 family of PICs has eight interrupt inputs, commonly referred to as IRQ0 through IRQ7. x86-based computer systems use two of these PICs, and the combined set of lines is referred to as IRQ0 through IRQ15 (the technical names of the inputs on each chip are IR0 through IR7). IRQ0 through IRQ15 are also the names of the interrupt lines on the ISA bus.
Eye-scanning (eye-tracking) devices, used by web designers, track where people look on the screen.
Psychoacoustics
Signal processing is done by the inner ear, which converts sound waveforms to neural stimuli; as a result, some differences between waveforms are imperceptible. Compressed formats such as MP3 take advantage of this by discarding waveform detail the ear cannot hear.
The ear also gives a nonlinear response to sounds of varying intensity levels. This nonlinear response is called loudness, and it is exploited by telephone networks and audio noise-reduction systems, which compress data samples nonlinearly before transmitting them and expand them for playback. Another effect of the ear's nonlinear response is that sounds close in frequency produce phantom beat notes.
The human ear is limited in the range of frequencies it can hear: roughly 20 Hz (0.02 kHz) to 20,000 Hz (20 kHz). Hearing at the high end of the spectrum decreases with age, so 16 kHz is usually the maximum frequency most adults can hear. The lowest frequency identified as a musical tone is around 12 Hz, and the body's sense of touch can perceive tones between 4 and 16 Hz.
GSM phones, being optimized for voice, cut the amount of data sent to roughly a twentieth by masking out frequencies that need not be recorded.
Surround Sound
In the n.1 notation, the ".1" denotes the bass channel: low-frequency sound is non-directional. Common configurations are 5.1, 7.1, 10.2, and 22.2, the latest stage of surround sound audio.
Dolby surround started with sound compression on film. Each frame has sprocket holes, and between the sprocket holes there is room for two tracks, giving two channels.
Surround sound is the digital processing of a soundtrack from a source such as video, audio, or video games for output to multiple speakers, in a 5.1, 5.2, 6.1, 6.2, 7.1, or 7.2 configuration. The first number identifies the number of speakers and the second the number of subwoofers. For example, a Dolby Digital surround signal processor uses five speakers plus a single subwoofer.
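
As a small illustration, the configurations named above can be captured in a lookup table; the helper below is hypothetical, but the layouts are those of this section:

```python
# Map surround formats to (main speakers, subwoofers). In "N.M" notation the
# first number counts full-range speakers, the second counts LFE channels.
SURROUND_LAYOUTS = {
    "5.1": (5, 1),    # e.g. Dolby Digital: five speakers plus one subwoofer
    "7.1": (7, 1),
    "10.2": (10, 2),
    "22.2": (22, 2),  # the latest stage of surround audio noted above
}

for name, (speakers, subs) in SURROUND_LAYOUTS.items():
    print(f"{name}: {speakers} speakers, {subs} subwoofer(s), "
          f"{speakers + subs} channels total")
```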
Dolby surround
A Dolby Digital system processes AC3-encoded digital sound in its processor. Dolby Labs has produced sound-processing innovations ever since the audio cassette was introduced. Dolby was the first to expand upon the traditional three speakers behind a movie screen with surround-sound speakers, and the first to introduce a low-frequency effects (LFE) channel, a subwoofer producing deep bass, explosions, and other low-frequency sounds.

DTS was introduced as an improvement on Dolby surround sound processing. In contrast to Dolby Digital, DTS applies less compression to the audio signal: DTS uses a 3:1 compression ratio, whereas Dolby Digital uses a ratio of 12:1. Less compression is an advantage because compression affects the breadth of the sound, so DTS gains in fullness and dynamics of the recorded sound.
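
As illustrative arithmetic only (assuming a 48 kHz, 16-bit, six-channel source, which neither codec is tied to), the quoted ratios imply very different encoded bitrates:

```python
# Uncompressed six-channel source versus the two quoted compression ratios.
source_bps = 48_000 * 16 * 6        # 4,608,000 bit/s uncompressed
print("DTS   (3:1):", source_bps // 3, "bit/s")   # ~1.5 Mbit/s
print("Dolby (12:1):", source_bps // 12, "bit/s") # ~384 kbit/s
```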

MIDI
Musical Instrument Digital Interface; it first moved into electronic keyboards.

Musical Instrument Digital Interface (MIDI) is a technical standard that describes a protocol, digital interface, and connectors, and allows a wide variety of electronic musical instruments, computers, and other related devices to connect and communicate with one another. A single MIDI link can carry up to sixteen channels of information, each of which can be routed to a separate device.
The development of the MIDI system was a major facilitator of music technology: it enabled less technically versed musicians and amateurs to build powerful computer-instrument networks and software, and gave computer musicians new, time-saving tools. MIDI was introduced to manufacturers and developers of electronic musical instruments in 1982, and in 1983 its hardware connectors and digital codes were introduced into instrument design.
The whole intention of MIDI was to control common functions, such as note events and timing events, by connecting or interfacing instruments of different manufacture. For example, a note, patch change, or pedal applied to one instrument has the same effect on any other brand of MIDI instrument connected to it. MIDI interfaces soon became available for microcomputers such as the Apple II, which allowed programmers to write MIDI sequencing and librarian software.
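
A sketch of what travels over a MIDI link: a Note On message is three bytes, a status byte (0x90 ORed with the channel) followed by the note number and velocity. The helper function is ours, but the byte layout is the standard one:

```python
# Build a MIDI Note On message: status byte 0x90 | channel, then note
# number and velocity (channel 0-15, note/velocity 0-127).
def note_on(channel: int, note: int, velocity: int) -> bytes:
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

# Middle C (note 60) at moderate velocity on channel 1 (index 0):
msg = note_on(0, 60, 100)
print(msg.hex())  # '903c64'
```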

Audio Encoding Issues
CD-DA (audio CD) has a data rate of 44.1 kHz/16, which means the audio is sampled 44,100 times per second with a bit depth of 16. CD-DA is stereo, using a left and a right channel, and therefore carries double the amount of audio data per second of a mono, single-channel stream.
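
The CD-DA figures multiply out as follows:

```python
# CD-DA data rate, computed from the figures above.
sample_rate = 44_100   # samples per second
bit_depth = 16
channels = 2           # stereo: left and right
bits_per_second = sample_rate * bit_depth * channels
print(bits_per_second)        # 1,411,200 bit/s (~1.4 Mbit/s)
print(bits_per_second // 8)   # 176,400 bytes per second
```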
The MP3 audio format uses lossy data compression, and audio quality improves with increasing bitrate:
32 kbit/s generally acceptable only for speech
96 kbit/s generally used for speech or low-quality streaming
128 or 160 kbit/s mid-range bitrate quality
192 kbit/s a commonly used high-quality bitrate
320 kbit/s highest level supported by MP3 standard
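
As a rough guide (ignoring container overhead), those bitrates translate into file sizes per minute like so:

```python
# Approximate file size for one minute of audio at each MP3 bitrate:
# size = bitrate (bit/s) * seconds / 8 bits per byte.
for kbps in (32, 96, 128, 160, 192, 320):
    size_mb = kbps * 1000 * 60 / 8 / 1_000_000
    print(f"{kbps:3d} kbit/s -> ~{size_mb:.1f} MB per minute")
# e.g. 128 kbit/s works out to roughly 1 MB per minute of audio.
```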

Other encoding issues include: the computational effort (time) of encoding and decoding, which has a big impact on certain aspects of handling tracks; delay and window size, i.e. how long each frequency is sustained; and recovery from lossy transmission and packet errors.
Lossy Compression
"lossy" compression is known as a data encoding method which works by compressing data. The
downside is some of the data is discarded which means some of the data may be lost. Minimum
amount is lost data needs to be handled and transmitted by a computer.

Compression Approaches
Delta encoding is a method of storing or transmitting data in the form of differences between sequential data rather than complete files. It is also known as data differencing, or delta compression. The discrete differences that are recorded are called "deltas" or "diffs". Delta encoding has the advantage of reducing data redundancy, and deltas are more space-efficient than their non-encoded equivalents.
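
A minimal sketch of delta encoding and decoding (the sample data is invented):

```python
# Store the first value, then only the difference ("delta") from each
# value to the one before it.
def delta_encode(samples):
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    samples = [deltas[0]]
    for d in deltas[1:]:
        samples.append(samples[-1] + d)
    return samples

data = [100, 101, 103, 103, 102, 105]
encoded = delta_encode(data)   # [100, 1, 2, 0, -1, 3] - small, redundant values
assert delta_decode(encoded) == data
```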


Predictive coding
Linear predictive coding (LPC) is a tool used mostly in audio signal processing, and in speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech-analysis techniques, and has proven an effective method for encoding good-quality speech at a low bit rate while providing accurate estimates of speech parameters.
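
A minimal linear-prediction sketch, assuming NumPy is available: it fits coefficients so that each sample is approximated by a weighted sum of the p previous samples, using plain least squares rather than the Levinson-Durbin recursion real codecs use:

```python
import numpy as np

# Fit a_1..a_p so that x[n] ~= a_1*x[n-1] + ... + a_p*x[n-p].
def lpc_coefficients(x: np.ndarray, order: int) -> np.ndarray:
    rows = [x[i:i + order][::-1] for i in range(len(x) - order)]
    targets = x[order:]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
    return coeffs

# A decaying sinusoid is well modelled by a low-order linear predictor:
n = np.arange(200)
signal = 0.99 ** n * np.sin(0.3 * n)
a = lpc_coefficients(signal, order=2)
print(a)  # close to the true 2nd-order recursion for this signal
```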
Variable bitrate (VBR) is a bitrate mode used in sound or video encoding for telecommunications and computing. In contrast to constant bitrate (CBR), the amount of output data per time segment of a VBR file varies. VBR allows a higher bitrate, and therefore more storage space, to be allocated to the more complex segments of media files. The average of these rates can be calculated to produce an average bitrate for the file.
The following formats can optionally be encoded in VBR: MP3, Opus, WMA, Vorbis, and AAC audio files.

Variable bitrate encoding is also commonly used for video formats such as MPEG-2 video, MPEG-4 Part 2 video (Xvid, DivX), MPEG-4 Part 10/H.264 video, Theora, Dirac, and other video compression formats.
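
Computing the average bitrate of a hypothetical VBR stream from its per-segment rates:

```python
# Each entry: (bitrate in kbit/s, segment length in seconds).
segments = [(96, 10.0), (192, 4.0), (128, 6.0)]
total_bits = sum(kbps * 1000 * secs for kbps, secs in segments)
total_time = sum(secs for _, secs in segments)
print(f"average bitrate: {total_bits / total_time / 1000:.1f} kbit/s")
```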
Sub-band coding (SBC), in signal processing, is any form of transform coding that breaks a signal into a number of different frequency bands and encodes each one independently. For audio and video signals, this decomposition is generally completed first for data compression.
When SBC is used for audio compression, it makes use of auditory masking in the human auditory system. Although human ears are sensitive to a wide range of frequencies, when a sufficiently loud signal is present at one frequency, weaker signals at nearby frequencies will not be heard: the louder signal masks the softer ones. The louder signal is referred to as the masker, and the raised threshold of hearing in its presence is called the masking threshold.
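
A toy model of masking-based discarding, as a sub-band coder might apply it; the 20 dB threshold and 200 Hz spread here are invented for illustration, not taken from any psychoacoustic standard:

```python
# Keep a spectral component unless a much louder one nearby masks it.
def audible(components, masking_drop_db=20.0, spread_hz=200.0):
    """components: list of (frequency_hz, level_db). Drop a component if a
    louder one within spread_hz exceeds it by masking_drop_db or more."""
    kept = []
    for f, level in components:
        masked = any(
            other_level - level >= masking_drop_db
            and abs(other_f - f) <= spread_hz
            for other_f, other_level in components
        )
        if not masked:
            kept.append((f, level))
    return kept

tones = [(1000, 80), (1100, 50), (3000, 55)]
print(audible(tones))  # the 1100 Hz tone is masked by the loud 1000 Hz masker
```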
Many Compression Standards
PCM (Pulse Code Modulation)
Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, Compact Discs, digital telephony, and other digital audio applications. In a PCM stream, the analog signal is sampled regularly at constant intervals, and the amplitude of each sample is quantized to the nearest value within a range of digital steps.
PCM streams have two basic properties that determine their fidelity to the original analog signal: the sampling rate, which is the number of times per second the samples are taken, and the bit depth, which determines the number of possible digital values each sample can hold.
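
Writing PCM samples with the properties just described (44.1 kHz sampling rate, 16-bit depth) to a WAV file, using only Python's standard library:

```python
import math, struct, wave

# One second of a 440 Hz tone, 44,100 samples/s, 16-bit, mono.
fs, bit_depth, freq = 44_100, 16, 440.0
amplitude = 2 ** (bit_depth - 1) - 1   # 32767: largest 16-bit sample value

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(bit_depth // 8)
    w.setframerate(fs)
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq * n / fs)))
        for n in range(fs)
    )
    w.writeframes(frames)
```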
u-LAW (Mu-law logarithmic coding)
The µ-law algorithm is a companding algorithm, used primarily in the digital telecommunication systems of North America and Japan. Companding algorithms reduce the dynamic range of an audio signal. In analog systems this can increase the signal-to-noise ratio (SNR) achieved during transmission; in the digital domain it can reduce the quantization error, increasing the signal-to-quantization-noise ratio.
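
The µ-law companding curve itself (with µ = 255, the North American/Japanese value), sketched directly from the textbook formula F(x) = sgn(x)·ln(1 + µ|x|)/ln(1 + µ):

```python
import math

MU = 255.0

def mu_law_encode(x: float) -> float:
    """Compress: x in [-1, 1] -> companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_decode(y: float) -> float:
    """Expand: invert the companding curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

quiet = 0.01
print(mu_law_encode(quiet))                 # ~0.23: quiet signals get boosted
print(mu_law_decode(mu_law_encode(quiet)))  # ~0.01: round-trips back
```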
LPC-10E (Linear Predictive Coding, 2.4 kbit/s)
Linear predictive coding, as noted above, represents the spectral envelope (a curve in the frequency-amplitude plane) of a digital speech signal in compressed form. LPC starts from the premise that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds. Although apparently unsophisticated, this model is actually a close approximation of the reality of speech production. The buzz is produced by the glottis (the space between the vocal folds), which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances. Hisses and pops are generated by the action of the tongue, lips, and throat during sibilants and plosives.
CELP (4.8 kbit/s): code-excited LPC, which builds on LPC.
GSM (European Cell Phones, RPE-LPC)
GSM networks operate in a number of different carrier frequency ranges, separated into GSM frequency ranges for 2G and UMTS frequency bands for 3G. Most 2G GSM networks operate in the 900 MHz or 1800 MHz bands; where these bands were already allocated, the 850 MHz and 1900 MHz bands were used instead. The 400 and 450 MHz frequency bands were assigned in some countries because they had previously been used for first-generation systems. Most 3G networks in Europe operate in the 2100 MHz frequency band.
The GSM voice codec produces about 1650 bytes/sec (at 8000 samples/sec).

ADPCM (adaptive, delta PCM, 24/32/40 kbps)
ADPCM is a technique for converting sound or analog information to binary information (a numbering scheme with two values, i.e. strings of 0s and 1s) by taking frequent samples of the sound and expressing the value of the sampled sound modulation in binary terms. ADPCM is used to send sound over fibre-optic long-distance lines, and also to store sound, alongside text, images, and code, on a CD-ROM.
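
A much-simplified sketch of the adaptive idea, using one bit per sample with a step size that grows or shrinks; real ADPCM codecs such as IMA ADPCM quantize to 4 bits and use fixed step-size tables, so treat this only as an illustration of adaptation:

```python
# Encode each sample as +1/-1 relative to a running prediction; the step
# size adapts: it grows on consecutive same-direction codes, else shrinks.
def adpcm_encode(samples):
    step, pred, codes = 0.1, 0.0, []
    for s in samples:
        code = 1 if s >= pred else -1
        codes.append(code)
        pred += code * step
        step = step * 1.5 if len(codes) > 1 and codes[-2] == code else step * 0.7
    return codes

# The decoder mirrors the encoder exactly, so both track the same state.
def adpcm_decode(codes):
    step, pred, out = 0.1, 0.0, []
    for i, code in enumerate(codes):
        pred += code * step
        out.append(pred)
        step = step * 1.5 if i > 0 and codes[i - 1] == code else step * 0.7
    return out

samples = [0.0, 0.2, 0.4, 0.5, 0.4, 0.1]
print(adpcm_decode(adpcm_encode(samples)))  # a staircase roughly tracking the input
```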
MPEG Audio Layers (builds on ADPCM)
Layer-2: From 32 kbps to 384 kbps - target bit rate of 128 kbps
Layer-3: From 32 kbps to 320 kbps - target bit rate of 64 kbps
Complex compression, using perceptual models
RealAudio, Windows Media Formats (builds on above, proprietary)

Compression Encoding
Data compression, also called source coding or bit-rate reduction, involves encoding information using fewer bits than the original representation.

Compression can be either lossy or lossless.
Lossless compression reduces bits by identifying and eliminating statistical redundancy (the number of bits used to transmit a message minus the number of bits of actual information in the message); no information is lost in lossless compression. Lossy compression reduces bits by identifying unnecessary information and removing it.
The process of reducing the size of a data file is known as data compression. Compression reduces resource usage, such as data storage space or transmission capacity, but compressed data must be decompressed before use. Data compression is also subject to a space-time complexity trade-off (the amount of resources needed to execute it). For example, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is decompressed, while decompressing the video in full before watching may be inconvenient or require additional storage.
The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.
New alternatives to traditional systems (which sample at full resolution, then compress) provide efficient resource usage based on principles of compressed sensing; such sensing techniques avoid the need for data compression by sampling on a selected basis.
MP3 (MPEG-1 Layer 3) Audio Compression Approaches
MPEG-1 Layer 3 is a standard technology and format for compressing a sound sequence into a very small file (about one-twelfth the size of the original) while preserving the original level of sound quality when it is played. MP3 files are usually download-and-play files rather than streaming sound.
MP3 digital audio starts from 16-bit samples of the analog signal taken at 44.1 thousand samples per second (44.1 kHz), which means 1.4 million bits of data are required per second of CD-quality sound.
Voice XML & NLP
Voice XML
Voice XML is an Extensible Markup Language (XML) standard for storing and processing digitized voice. It defines human-machine voice interaction and input recognition: Voice XML uses voice as an input to a machine for the processing the user desires, and voice application development is managed by a voice browser.
Voice XML delivers and processes voice dialogs in applications including automated driving assistance, voice access to email, voice directory access, and other services.
There are two basic Voice XML file types:
Static: Hard coded by the application developer
Dynamic: Generated by the server in response to client requests.
Voice XML facilitates the following:
Reduces client/server interactions by specifying multiple interactions per document
Protects developers from low level implementation platform details
Focuses on clear separation of business logic and interaction code
Functions and delivers the same results, regardless of underlying implementation platform
Creates and processes simple dialogs; with the help of Voice XML language tools, complex dialogs may be constructed and maintained (a minimal static document is sketched below).
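
A sketch of a minimal static Voice XML document (the first file type above), built here with Python's standard library; the element names (vxml, form, block, prompt) follow the VoiceXML 2.0 specification:

```python
import xml.etree.ElementTree as ET

# Build the smallest useful VoiceXML dialog: one form whose block speaks
# a single prompt to the caller.
vxml = ET.Element("vxml", version="2.0")
form = ET.SubElement(vxml, "form")
block = ET.SubElement(form, "block")
prompt = ET.SubElement(block, "prompt")
prompt.text = "Welcome to the automated directory service."
print(ET.tostring(vxml, encoding="unicode"))
# <vxml version="2.0"><form><block><prompt>Welcome ...</prompt></block></form></vxml>
```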
NLP

Natural language processing (NLP) is a method of translating between computer and human languages: it lets a computer read a line of text understandably without being fed special keys or codes, automating the translation process between computers and humans. NLP systems are fed statistics and models for interpreting phrases, and are used in voice-recognition software, human-language translation, information retrieval, and artificial intelligence. Since developing human-language translation software is difficult, natural language processing is also being developed to create human-readable text.
The main goal of NLP is to build software that will analyse, understand, and generate human languages naturally, enabling communication with a computer as if it were a human.








References:
Wikipedia. 2014. Analog-to-digital converter. [online] Available at:
http://en.wikipedia.org/wiki/Analog-to-digital_converter [Accessed: 17 Mar 2014].
TopTenREVIEWS. 2014. Sound Cards vs. Onboard Audio - TopTenREVIEWS. [online] Available at:
http://sound-cards-review.toptenreviews.com/sound-cards-vs.-onboard-audio.html [Accessed: 17
Mar 2014].
Wikipedia. 2014. Conventional PCI. [online] Available at:
http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect [Accessed: 17 Mar 2014].
Wikipedia. 2014. Interrupt request. [online] Available at:
http://en.wikipedia.org/wiki/Interrupt_request [Accessed: 17 Mar 2014].
Wikipedia. 2014. Psychoacoustics. [online] Available at:
http://en.wikipedia.org/wiki/Psychoacoustics [Accessed: 18 Mar 2014].
Ravens, D. 2014. Describe How 5.1 Surround Sound Works | eHow. [online] Available at:
http://www.ehow.com/about_6760082_describe-5_1-surround-sound-works.html [Accessed: 18
Mar 2014].
Indiana.edu. 2014. Chapter Three: How MIDI works. [online] Available at:
http://www.indiana.edu/%7Eemusic/etext/MIDI/chapter3_MIDI.shtml [Accessed: 18 Mar 2014].
Wikipedia. 2014. Lossy compression. [online] Available at:
http://en.wikipedia.org/wiki/Lossy_data_compression [Accessed: 18 Mar 2014].
Wikipedia. 2014. Bit rate. [online] Available at: http://en.wikipedia.org/wiki/Bit_rate#Audio
[Accessed: 18 Mar 2014].
Wikipedia. 2014. Delta encoding. [online] Available at: http://en.wikipedia.org/wiki/Delta_encoding
[Accessed: 18 Mar 2014].
Wikipedia. 2014. Linear predictive coding. [online] Available at:
http://en.wikipedia.org/wiki/Linear_predictive_coding [Accessed: 18 Mar 2014].
Wikipedia. 2014. Variable bitrate. [online] Available at: http://en.wikipedia.org/wiki/Variable_bitrate
[Accessed: 18 Mar 2014].
Wikipedia. 2014. Sub-band coding. [online] Available at: http://en.wikipedia.org/wiki/Sub-
band_coding [Accessed: 19 Mar 2014].
Wikipedia. 2014. Pulse-code modulation. [online] Available at: http://en.wikipedia.org/wiki/Pulse-
code_modulation [Accessed: 19 Mar 2014].
Wikipedia. 2014. µ-law algorithm. [online] Available at: http://en.wikipedia.org/wiki/Mu-law_algorithm [Accessed: 19 Mar 2014].
Techopedia.com. 2014. What is Voice XML? - Definition from Techopedia. [online] Available at:
http://www.techopedia.com/definition/24371/voice-xml [Accessed: 7 Apr 2014].
Techopedia.com. 2014. What is Natural Language Processing (NLP)? - Definition from Techopedia.
[online] Available at: http://www.techopedia.com/definition/653/natural-language-processing-nlp
[Accessed: 7 Apr 2014].
Whatis.techtarget.com. 2014. What is ADPCM (adaptive differential pulse-code modulation)? -
Definition from WhatIs.com. [online] Available at: http://whatis.techtarget.com/definition/ADPCM-
adaptive-differential-pulse-code-modulation [Accessed: 6 Apr 2014].
Whatis.techtarget.com. 2014. What is MP3 (MPEG-1 Audio Layer-3)? - Definition from WhatIs.com.
[online] Available at: http://whatis.techtarget.com/definition/MP3-MPEG-1-Audio-Layer-3
[Accessed: 7 Apr 2014].
Wikipedia. 2014. GSM. [online] Available at: http://en.wikipedia.org/wiki/GSM [Accessed: 6 Apr
2014].
Wikipedia. 2014. Data compression. [online] Available at:
http://en.wikipedia.org/wiki/Data_compression [Accessed: 7 Apr 2014].
