
Digital Audio Compression

Abstract

Compared to most digital data types, with the exception of digital video, the data rates
associated with uncompressed digital audio are substantial. Digital audio compression
enables more efficient storage and transmission of audio data. The many forms of audio
compression techniques offer a range of encoder and decoder complexity, compressed
audio quality, and degrees of data compression. The μ-law transformation and ADPCM
coders are simple, low-complexity, low-compression algorithms that deliver medium audio
quality. The MPEG/audio standard is a high-complexity, high-compression, high audio
quality algorithm. These techniques apply to general audio signals and are not specifically
tuned for speech signals.
Compression is the reduction in size of data in order to save storage space or transmission
time. There are numerous goals when compressing data, many of which are especially
relevant to audio. Among these goals is reducing the required storage space, which in turn
reduces the cost of storage. Another goal in compressing audio is reducing the bandwidth
required to transfer the content, an aspect that is especially relevant to the Internet and to
commercial television, both of which require streaming audio and video.
Compression generally comes in two forms, known as lossy and lossless (non-lossy).
Lossless compression looks for redundancy within the data and represents that redundancy
using less information; by reversing the process, the data can be reproduced exactly,
mirroring the original bit for bit. Lossy compression schemes discard part of the data to
obtain a smaller size: a description of the perceptually useful components of the data is
recorded, and the remaining information is left out. When reconstructed during
decompression, the reproduced data can differ substantially from the original, but because
the psychoacoustic models in these schemes discard only the least perceptually relevant
portions of the signal, the removed data can be very hard to detect. Lossy compression
yields large reductions in final storage requirements, which makes the often-imperfect
output quite acceptable. One of the biggest drawbacks of lossy schemes is that their effect
is cumulative: each successive generation of encoding and saving introduces further data
loss. For this reason, they should never be used in the studio and are only of use for final
output.

INTRODUCTION

Digital Audio Data

The digital representation of audio data offers many advantages: high noise immunity,
stability, and reproducibility. Audio in digital form also allows the efficient implementation
of many audio processing functions (e.g., mixing, filtering, and equalization) through the
digital computer.
The conversion from the analog to the digital domain begins by sampling the audio input
in regular, discrete intervals of time and quantizing the sampled values into a discrete number
of evenly spaced levels. The digital audio data consists of a sequence of binary values, each
representing the quantizer level assigned to one audio sample. The method of representing
each sample with an independent code word is called pulse code modulation (PCM). Figure 1
shows the digital audio process.
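
To make the PCM description concrete, the following minimal sketch (in Python, with
illustrative parameter choices that are not part of the original text) samples a synthetic
sine wave at 44.1 kHz and quantizes it to 16-bit code words:

import numpy as np

def quantize_pcm(signal, num_bits=16):
    """Uniformly quantize samples in [-1.0, 1.0] to signed integer PCM code words."""
    levels = 2 ** (num_bits - 1)               # e.g. 32768 per polarity for 16-bit audio
    codes = np.round(signal * (levels - 1)).astype(np.int32)
    return np.clip(codes, -levels, levels - 1)

# A 440 Hz sine stands in for the analog input; sampling it at 44.1 kHz and
# quantizing the samples yields the PCM sequence described above.
sample_rate = 44100
t = np.arange(441) / sample_rate               # 10 ms of audio
analog = 0.8 * np.sin(2 * np.pi * 440 * t)
pcm = quantize_pcm(analog, num_bits=16)
print(pcm[:8])                                 # first few PCM code words

Each code word is independent of its neighbours, which is what makes plain PCM simple to
handle and, as the later sections explain, leaves considerable redundancy for compression
to remove.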

Audio compression:
Audio compression is a form of data compression designed to reduce the size of audio files.
Audio compression algorithms are implemented in computer software as audio codecs.
Generic data compression algorithms perform poorly with audio data, seldom reducing file
sizes much below 87% of the original, and are not designed for use in real time.
Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy
algorithms provide far greater compression ratios and are used in mainstream consumer audio
devices.
As with image compression, both lossy and lossless compression algorithms are used in
audio compression, lossy being the most common for everyday use. In both lossy and lossless
compression, information redundancy is reduced, using methods such as coding, pattern
recognition and linear prediction to reduce the amount of information used to describe the
data.
For most practical audio applications, the slight reduction in audio quality is clearly
outweighed by the benefits: users typically cannot perceive any difference, and space
requirements are substantially reduced. For example, one CD can hold an hour of
high-fidelity uncompressed music, somewhat less than 2 hours of losslessly compressed
music, or about 7 hours of music compressed in MP3 format at medium bit rates.
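
The CD figures follow directly from the uncompressed PCM data rate. A rough back-of-the-envelope
calculation (assuming the usual CD audio parameters of 44.1 kHz, 16 bits per sample, two
channels, and a nominal 700 MB of storage; the exact playing time of a real audio CD differs
slightly) is sketched below:

sample_rate = 44100                 # samples per second
bits_per_sample = 16
channels = 2
disc_bytes = 700 * 1024 * 1024      # nominal 700 MB disc

bytes_per_second = sample_rate * (bits_per_sample // 8) * channels   # 176,400 B/s
minutes = disc_bytes / bytes_per_second / 60
print(f"{bytes_per_second * 8 / 1000:.1f} kbit/s, about {minutes:.0f} minutes per disc")
# prints roughly 1411.2 kbit/s and about 69 minutes of uncompressed stereo audio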

AUDIO COMPRESSION TECHNIQUES:


Lossless audio compression
Lossless audio compression allows one to preserve an exact copy of one's audio files, in
contrast to the irreversible changes made by lossy compression techniques such as Vorbis and
MP3. Compression ratios are similar to those for generic lossless data compression (around
50–60% of the original size), and substantially less than for lossy compression (which
typically yields 5–20% of the original size).
Audio quality
Being lossless, these formats completely avoid compression artifacts. Audiophiles
thus favor lossless compression.
As file storage and communications bandwidth have become less expensive and more
available, lossless audio compression has become more popular.
Difficulties in lossless compression of audio data
It is difficult to maintain all the data in an audio stream and achieve substantial compression.
First, the vast majority of sound recordings are highly complex, recorded from the real world.
As
one of the key methods of compression is to find patterns and repetition, more chaotic data
such as audio doesn't compress well. In a similar manner, photographs compress less
efficiently with lossless methods than simpler computer-generated images do. But
interestingly, even computer generated sounds can contain very complicated waveforms that
present a challenge to many compression algorithms. This is due to the nature of audio
waveforms, which are generally difficult to simplify without a (necessarily lossy) conversion
to frequency information, as performed by the human ear.
The second reason is that values of audio samples change very quickly, so generic data
compression algorithms don't work well for audio, and strings of consecutive bytes don't
generally appear very often. However, convolution with the filter [-1 1] (that is, taking the
first difference) tends to slightly whiten (decorrelate, make flat) the spectrum, thereby
allowing traditional lossless compression at the encoder to do its job; integration at the
decoder restores the original signal. Codecs such as FLAC, Shorten and TTA use linear
prediction to estimate the spectrum of the signal. At the encoder, the estimator's inverse is
used to whiten the signal by removing spectral peaks while the estimator is used to
reconstruct the original signal at the decoder.
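
A minimal sketch of the first-difference idea follows (the actual FLAC, Shorten and TTA
codecs use higher-order linear predictors plus entropy coding, which are not shown here):
the encoder transmits the residual produced by the [-1 1] filter, and the decoder restores
the original signal exactly by cumulative summation.

import numpy as np

def encode_first_difference(samples):
    """Residual of the [-1, 1] filter: each output is x[n] - x[n-1]."""
    samples = np.asarray(samples, dtype=np.int64)
    residual = np.empty_like(samples)
    residual[0] = samples[0]                    # first sample is sent unchanged
    residual[1:] = samples[1:] - samples[:-1]
    return residual

def decode_first_difference(residual):
    """Integration (cumulative sum) at the decoder restores the signal bit for bit."""
    return np.cumsum(residual)

pcm = np.array([100, 104, 109, 111, 110, 108], dtype=np.int64)
residual = encode_first_difference(pcm)         # smaller-magnitude values, easier to code
assert np.array_equal(decode_first_difference(residual), pcm)
print(residual)                                 # [100   4   5   2  -1  -2]

The residual values cluster around zero, so a conventional lossless entropy coder can
represent them in fewer bits than the original samples.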
Lossy audio compression
Lossy audio compression is used in an extremely wide range of applications. In addition to
the direct applications (mp3 players or computers), digitally compressed audio streams are
used in most video DVDs; digital television; streaming media on the internet; satellite and
cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically
achieves far greater compression than lossless compression (data of 5 percent to 20 percent of
the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not
all data in an audio stream can be perceived by the human auditory system. Most lossy
compression reduces perceptual redundancy by first identifying sounds which are considered
perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include
high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are
coded with decreased accuracy or not coded at all.
While removing or reducing these 'unhearable' sounds may account for a small percentage of
bits saved in lossy compression, the real savings comes from a complementary phenomenon:
noise shaping. Reducing the number of bits used to code a signal increases the amount of
noise in that signal. In psychoacoustics-based lossy compression, the real key is to 'hide' the
noise generated by the bit savings in areas of the audio stream that cannot be perceived. This
is done by, for instance, using very small numbers of bits to code the high frequencies of
most signals: not because the signal has little high-frequency information (though this is
often true as well), but because the human ear can only perceive very loud signals in this
region, so softer sounds 'hidden' there simply are not heard.
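
As a toy illustration of that allocation strategy (this is not any real codec's
psychoacoustic model; the band edges and bit budget below are invented for the example), one
can transform a block of samples to the frequency domain, re-quantize each band with a bit
depth that shrinks toward the high frequencies, and let the larger quantization noise sit
where the ear is least sensitive:

import numpy as np

def quantize_band(coeffs, bits):
    """Coarsely re-quantize spectral coefficients with the given bit depth."""
    if bits <= 0 or coeffs.size == 0:
        return np.zeros_like(coeffs)            # band discarded entirely
    scale = np.max(np.abs(coeffs)) or 1.0
    levels = 2 ** (bits - 1)
    return np.round(coeffs / scale * levels) / levels * scale

rate = 44100
n = 2048
t = np.arange(n) / rate
signal = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 12000 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, 1.0 / rate)
edges = [0, 2000, 8000, rate // 2 + 1]          # illustrative band edges in Hz
bit_budget = [12, 6, 2]                         # fewer bits for the higher bands (assumed)

for (lo, hi), bits in zip(zip(edges, edges[1:]), bit_budget):
    band = (freqs >= lo) & (freqs < hi)
    spectrum[band] = quantize_band(spectrum[band], bits)

decoded = np.fft.irfft(spectrum, n=n)
print("worst-case sample error:", np.max(np.abs(decoded - signal)))

The reconstruction error is concentrated in the weak 12 kHz component, where quiet detail is
hardest to hear, while the dominant 300 Hz tone is left almost untouched.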
Usability
Usability of lossy audio codecs is determined by:
• Perceived audio quality
• Compression factor
• Speed of compression and decompression
• Inherent latency of algorithm (critical for real-time streaming applications;
see below)
• Software and hardware support

Lossy formats are often used for the distribution of streaming audio, or interactive
applications (such as the coding of speech for digital transmission in cell phone networks). In
such applications, the data must be decompressed as the data flows, rather than after the
entire data stream has been transmitted. Not all audio codecs can be used for streaming
applications, and for such applications a codec designed to stream data effectively will
usually be chosen.
Audio compression is very similar to general data compression, but because of the complex
and seemingly random nature of digital audio, the algorithms used have to respect a number of
principles. These principles involve understanding how the human hearing system works and
using this information to decide selectively which portions of the data are worth keeping and
which will go unmissed when absent. There are two main methods used to achieve this result.
The first of these, known as psychoacoustics, works by modelling what the ear is capable of
hearing.

NOISE MASKING AND MPEG FILES:

Masking is another important factor. Masking can be broken down into the two most commonly
accepted types, auditory (simultaneous) masking and temporal masking. Auditory masking is
primarily based on the relationship between frequencies and their volumes. The simultaneous
masking effect (sometimes referred to as "auditory masking") may be best described by
analogy: think of a bird flying in front of the sun. The sun is so much brighter that the
bird effectively disappears, just as a loud tone hides a quieter tone at a nearby frequency.
Temporal masking is based on time rather than frequency. With temporal masking the idea is
that it is difficult to hear distinct sounds that occur too close to each other in time. If a
loud and a quiet sound happen too close together, most people cannot distinguish one from the
other and simply assume they are the same sound. There is a masking threshold that determines
how far apart in time sounds must be for humans to recognize them as separate. That distance
is somewhere around five milliseconds, plus or minus, depending on the tones used.

There are many utilities available for compressing audio. One of the most common forms of
audio compression found on the Internet is MP3, or MPEG-1 Layer III. MP3 provides excellent
quality for the file size; however, it requires a large amount of processing power to encode
and decode. MP2, or MPEG-1 Layer II, is the predecessor of MP3 and is not quite as demanding
from a processing point of view. MP2 is less compressed than MP3, which means that at low
bitrates the quality of the sound is rather poor. The ability of MP3 to carry high-quality
audio at very low bitrates is the reason it is so popular on the Internet, where bandwidth is
generally limited. MP3 uses both lossy and lossless compression: first perceptual encoding,
which is lossy, and then Huffman encoding, which is lossless and is the same type of
compression as used in ZIP files. Another commonly used compression format is that of
MiniDisc, which is a very lossy process; it is more lossy than MP3 in that it does not have a
layer of lossless compression. AC3 is becoming more popular; however, it produces
considerably larger files than MP3.

Summary:

Techniques for compressing general digital audio signals include the μ-law transformation and
adaptive differential pulse code modulation (ADPCM); these are low-complexity,
low-compression algorithms that deliver medium audio quality. A third technique, the
MPEG/audio compression algorithm, is an ISO standard for high-fidelity audio compression.
The MPEG/audio standard has three layers of successively greater complexity for improved
compression performance.
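
As a concrete illustration of the simplest technique mentioned above, the following sketch
implements the μ-law transformation with the standard μ = 255 curve used in 8-bit telephony
(ADPCM and the MPEG/audio layers are considerably more involved and are not shown):

import numpy as np

MU = 255.0   # standard parameter for 8-bit μ-law telephony

def mu_law_compress(x, mu=MU):
    """Compand a signal in [-1, 1]: fine resolution near zero, coarse near full scale."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Inverse transformation applied after decoding."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.linspace(-1.0, 1.0, 9)
y = mu_law_compress(x)
# Quantizing y uniformly to 8 bits yields the familiar low-complexity, medium-quality coder.
recovered = mu_law_expand(y)
print(np.max(np.abs(recovered - x)))    # ~0: the companding curve itself is exactly invertible

Because the companded signal devotes more of its range to quiet samples, an 8-bit uniform
quantizer applied after the transformation behaves roughly like a 13- to 14-bit linear
quantizer for low-level signals, which is where most speech and audio energy lies.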
