
AUDIO COMPRESSION

Varun Kumar Sen, RB6703B50, 3460070010,

Dept. of ECE, Lovely Professional University, Phagwara, Punjab, 144402
E-mail: varun01638@gmail.com

ABSTRACT:

Digital audio compression enables more efficient storage and transmission of audio data. The many forms of audio compression techniques offer a range of encoder and decoder complexity, compressed audio quality, and differing amounts of data compression. The MPEG/audio standard is a high-complexity, high-compression, and high-quality algorithm. These techniques apply to general audio signals and are not specifically tuned for speech signals.

1. INTRODUCTION:
Advances in digital audio technology stem from two sources: hardware developments and new signal processing techniques. When processors dissipated tens of watts of power and memory densities were on the order of kilobits per square inch, portable playback devices like an MP3 player were not possible. Now, however, power dissipation, memory densities, and processor speeds have improved by several orders of magnitude. Increasing hardware efficiency and an expanding array of digital audio representation formats are giving rise to a wide variety of new digital audio applications. These applications include portable music playback devices, digital surround sound for cinema, high-quality digital radio and television broadcast, Digital Versatile Disc (DVD), and many others.

This paper introduces digital audio signal compression, a technique essential to the implementation of many digital audio applications. Digital audio signal compression is the removal of redundant or otherwise irrelevant information from a digital audio signal, a process that is useful for conserving both transmission bandwidth and storage space. We begin by defining some useful terminology. We then present a typical “encoder” (as compression algorithms are often called) and explain how it functions.

2. THEORY:

Digital audio compression allows the efficient storage and transmission of audio data. Audio compression is designed to reduce the transmission bandwidth requirement of digital audio streams and the storage size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. General-purpose data compression algorithms perform poorly with audio data, seldom reducing the data size much below 87% of the original; consequently, specifically optimized lossless and lossy audio algorithms have been created. Lossy algorithms provide greater compression rates and are used in mainstream consumer audio devices.

In both lossy and lossless compression, superfluous information is reduced, using methods such as coding, pattern recognition, and linear prediction to decrease the amount of information used to represent the uncompressed data. Lossless audio compression produces digital data that can be expanded into an exact digital duplicate of the original audio stream. With lossy compression, the slightly reduced audio quality is outweighed by the saving in transmission or storage size for most practical audio applications, in which users may not perceive the loss in playback rendition quality. For example, one compact disc (CD) holds approximately one hour of uncompressed high-fidelity music, less than two hours of music compressed losslessly, or seven hours of music compressed in the MP3 format at medium bit rates.
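
To make the storage arithmetic concrete, here is a small Python sketch that converts disc capacity and bit rate into playing time; the 700 MB capacity and the bit rates chosen are illustrative assumptions, so the resulting hours only roughly track the figures quoted above.

```python
# Playing-time arithmetic behind the CD example (illustrative values).
CD_BYTES = 700 * 1024 * 1024         # assumed disc capacity
PCM_BPS = 44_100 * 16 * 2            # sample rate * bit depth * channels

def playtime_hours(bitrate_bps: float) -> float:
    """Hours of audio that fit on CD_BYTES at the given bit rate."""
    return CD_BYTES * 8 / bitrate_bps / 3600

print(f"uncompressed PCM : {playtime_hours(PCM_BPS):.1f} h")      # ~1.2 h
print(f"lossless (~2:1)  : {playtime_hours(PCM_BPS / 2):.1f} h")
print(f"MP3 at 192 kbps  : {playtime_hours(192_000):.1f} h")
```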

There are two processes which reduce the data rate or storage size of a digital audio signal:

1). Dynamic range compression

In this process, compression is performed without reducing the amount of digital data.

2). Time-compressed speech

In this process, compression reduces the amount of time it takes to listen to a recording.

Compression is the reduction in size of data in order to save space or transmission time. There are numerous goals when compressing data, many of which are especially relevant to audio. Among these goals is reducing the required storage space, which in turn also acts to reduce the cost of storage. Another goal in compression of audio is reducing the bandwidth required to transfer the content. This aspect is especially relevant when applied to the Internet and commercial television, both of which require streaming audio and video.

Compression is generally presented in two different forms, known as lossy and non-lossy, or lossless. Lossless compression uses formulas to look for redundancy within data and to represent that redundancy using less information. By reversing the process, the data can be reproduced in an exact form, mirroring the original bit for bit. Lossy compression schemes throw away part of the data to get a smaller size. Using formulas, a description of the useful components of the data is recorded, and any excess information is left out. When reconstructed during decompression, the reproduced data is often substantially different from the original; but since only the least perceptually relevant portions of the signal are prone to disposal, owing to the psychoacoustic complexity of the compression methods, the removed data can be very hard to detect. Lossy compression results in vast improvements in final storage requirements, which makes the often-imperfect output quite acceptable. One of the biggest drawbacks of lossy schemes is that the effect is additive: successive iterations of saving the data will show progressively greater data loss. For this reason, they should never be used in the studio and are only of use for final output.
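
The bit-for-bit versus approximate distinction can be demonstrated directly. The following Python sketch uses zlib as a stand-in lossless coder and crude requantization as a stand-in "lossy" step; neither is a real audio codec, they merely exhibit the round-trip behavior described above.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
audio = (rng.standard_normal(4096) * 1000).astype(np.int16)  # stand-in PCM

# Lossless: decompressing returns the original bit for bit.
packed = zlib.compress(audio.tobytes())
restored = np.frombuffer(zlib.decompress(packed), dtype=np.int16)
assert np.array_equal(audio, restored)           # exact duplicate

# "Lossy": discard the 6 least significant bits, then reconstruct.
coarse = audio >> 6                              # smaller alphabet packs tighter
approx = (coarse.astype(np.int32) << 6).astype(np.int16)
assert not np.array_equal(audio, approx)         # close, but not bit-exact
print(len(packed), len(zlib.compress(coarse.tobytes())))
```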

3. BASIC BUILDING BLOCKS:

Figure 1 shows a generic encoder that takes blocks of a sampled audio signal as its input. These blocks typically consist of between 500 and 1500 samples per channel, depending on the encoder specification. For example, the MPEG-1 layer III (MP3) specification takes 576 samples per channel per input block. The output is a compressed representation of the input block that can be transmitted or stored for subsequent decoding.
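
As a minimal illustration of this blocking step, the sketch below (a hypothetical helper, not taken from any standard) splits a mono PCM signal into 576-sample input blocks, zero-padding the final partial block.

```python
import numpy as np

def frame_blocks(pcm: np.ndarray, block: int = 576) -> np.ndarray:
    """Split a 1-D signal into fixed-size blocks, shape (n_blocks, block)."""
    pad = (-len(pcm)) % block              # samples needed to complete the tail
    return np.pad(pcm, (0, pad)).reshape(-1, block)

signal = np.arange(2000, dtype=np.float32)  # stand-in mono channel
print(frame_blocks(signal).shape)           # (4, 576)
```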

3.1 Psychoacoustics:

The basic approach to reducing the size of the input data is to eliminate information that is inaudible to the ear. This type of compression is often referred to as perceptual encoding. To help determine what can and cannot be heard, compression algorithms rely on the field of psychoacoustics, i.e., the study of human sound perception. Specifically, audio compression algorithms exploit the conditions under which signal characteristics obscure or mask each other. This phenomenon occurs in three different ways: threshold cut-off, frequency masking, and temporal masking. The remainder of this section explains the nature of these concepts; subsequent sections explain how they are typically applied to audio signal compression.

3.2 Threshold Cut-off:

The human ear detects sounds as a local variation in air pressure, measured as the Sound Pressure Level (SPL). If variations in the SPL are below a certain threshold in amplitude, the ear cannot detect them. This threshold, shown in Figure 2, is a function of the sound’s frequency.
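
The paper does not give a formula for this curve, but a widely cited approximation of the absolute threshold of hearing in quiet (due to Terhardt) can serve as a sketch of the threshold in Figure 2:

```python
import numpy as np

def threshold_in_quiet_db(f_hz: np.ndarray) -> np.ndarray:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt)."""
    f = f_hz / 1000.0                       # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

freqs = np.array([100.0, 1000.0, 4000.0, 15000.0])
print(threshold_in_quiet_db(freqs))  # lowest (most sensitive) near 3-4 kHz
```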

3.3 Frequency Masking:

Even if a signal component exceeds the hearing threshold, it may still be masked by louder components that are near it in frequency. This phenomenon is known as frequency masking or simultaneous masking. Each component in a signal can cast a “shadow” over neighboring components. If the neighboring components are covered by this shadow, they will not be heard. The effective result is that one component, the masker, shifts the hearing threshold. Figure 3 shows a situation in which this occurs.
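
A toy numerical version of this “shadow” might look as follows; the 15 dB offset and the 10 dB-per-bin spreading slope are illustrative assumptions, not values from any standard:

```python
import numpy as np

levels_db = np.array([70.0, 40.0, 62.0, 20.0, 10.0])  # component levels
masker = levels_db.argmax()                            # loudest component

# Threshold raised by the masker, decaying with spectral distance.
distance = np.abs(np.arange(len(levels_db)) - masker)
shifted_threshold = levels_db[masker] - 15.0 - 10.0 * distance

audible = levels_db > shifted_threshold
print(audible)   # [ True False  True False False]
```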

3.4 Temporal Masking:

Just as tones cast shadows on their neighbors in the frequency domain, a sudden increase in volume can mask quieter sounds that are temporally close. This phenomenon is known as temporal masking. Figure 4 illustrates a typical temporal masking scenario: events below the indicated threshold will not be heard. Note that the premasking interval is much shorter than the post-masking interval.

3.5 Spectral Analysis:

Of the three masking phenomena explained above, two are best described in the frequency domain. Thus, a frequency domain representation, also called the “spectrum” of a signal, is a useful tool for analyzing the signal’s frequency characteristics and determining thresholds. There are several different techniques for converting a finite time sequence into its spectral representation, and these typically fall into one of two categories: transforms and filter banks. Transforms calculate the spectrum of their inputs in terms of a set of basis sequences; e.g., the Fourier Transform uses basis sequences that are complex exponentials. Taking the spectrum of a signal has two purposes: to derive the masking thresholds in order to determine which portion of the signal can be dropped, and to generate a representation of the signal to which the masking thresholds can be applied.

The most popular transform in signal processing is the Fast Fourier Transform (FFT). Given a finite time sequence, the FFT produces a complex-valued frequency domain representation. Encoders often use FFTs as a first step toward determining masking thresholds. Another popular transform is the Discrete Cosine Transform (DCT), which outputs a real-valued frequency domain representation. Both the FFT and the DCT suffer from distortion when transforms are taken from contiguous blocks of time data. To solve this problem, inputs and outputs can be overlapped and windowed in such a way that, in the absence of lossy compression techniques, entire time signals can be perfectly reconstructed. For this reason, most transform-based encoding schemes employ an overlapped and windowed DCT known as the Modified Discrete Cosine Transform (MDCT). Some compression algorithms that use the MDCT are MPEG-1 layer III, MPEG-2 AAC, and Dolby AC-3.
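
A minimal NumPy sketch of the windowed, overlapped MDCT and its inverse is shown below; with the sine window, which satisfies the Princen-Bradley condition, overlap-added blocks reconstruct the interior of the signal exactly when no quantization is applied.

```python
import numpy as np

def make_mdct(N: int):
    """Return windowed MDCT/IMDCT functions for blocks of 2*N samples."""
    n = np.arange(2 * N)
    w = np.sin(np.pi / (2 * N) * (n + 0.5))                         # sine window
    C = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, np.arange(N) + 0.5))

    def mdct(x):                        # 2N windowed samples -> N coefficients
        return (w * x) @ C

    def imdct(X):                       # N coefficients -> 2N windowed samples
        return (2.0 / N) * w * (C @ X)

    return mdct, imdct

N = 64
mdct, imdct = make_mdct(N)
x = np.random.default_rng(1).standard_normal(8 * N)

out = np.zeros_like(x)
for s in range(0, len(x) - 2 * N + 1, N):           # 50%-overlapped blocks
    out[s:s + 2 * N] += imdct(mdct(x[s:s + 2 * N]))

# Time-domain aliasing cancels on all fully overlapped (interior) samples.
assert np.allclose(x[N:-N], out[N:-N])
```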

Filter banks pass a block of time samples through several band-pass filters to generate different signals corresponding to different subbands in frequency. After filtering, masking thresholds can be applied to each subband. Two popular filter bank structures are the poly-phase filter bank and the wavelet filter bank. The poly-phase filter bank uses parallel band-pass filters of equal width whose outputs are downsampled to create one (shorter) signal per subband. In the absence of lossy compression techniques, a decoder can achieve perfect reconstruction by upsampling, filtering, and adding each subband. This type of structure is used in all of the MPEG-1 audio encoders.
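
The subband idea can be sketched with the simplest possible two-band filter bank (a Haar pair) standing in for the 32-band polyphase bank used by MPEG-1; the analysis/downsample/synthesis pattern is analogous.

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(512)
pairs = x.reshape(-1, 2)

low  = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # downsampled lowpass band
high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # downsampled highpass band

# Masking thresholds could now be applied per band; here we just invert.
rec = np.empty_like(x)
rec[0::2] = (low + high) / np.sqrt(2)
rec[1::2] = (low - high) / np.sqrt(2)
assert np.allclose(x, rec)                        # perfect reconstruction
```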

3.6 Noise Allocation:

To encode a signal’s representation, an encoder must quantize (assign a binary representation to) each frequency component. If the encoder allocates more bits to each frequency, less error (noise) is introduced, but more space is required to store the result. Conversely, fewer bits allocated to each frequency result in more noise, but less space is required to store the result. However, if noise is kept beneath the shifted hearing threshold, it will not be audible. In order to preserve storage space (transmission bandwidth), perceptual encoders use just enough bits to encode each frequency without introducing audible quantization noise. This technique, known as noise allocation, can be problematic when the input signal contains a brief attack preceded by silence. Noise introduced in the frequency domain will be evenly spread throughout the reconstructed time block. Because the shifted hearing threshold is determined by both the loud and soft parts of the input, the silent segment of the reconstructed signal will often be audibly corrupted. This is called a pre-echo; the most common solution to this problem is to use time blocks short enough that the temporal masking phenomenon obscures the noise.
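
The bits-versus-noise trade-off at the heart of noise allocation is easy to exhibit numerically. The uniform quantizer below is an illustrative stand-in, not any codec’s actual quantizer:

```python
import numpy as np

def quantize(values: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization of values in [-1, 1) at the given bit depth."""
    step = 2.0 / 2 ** bits
    return np.clip(np.round(values / step) * step, -1.0, 1.0 - step)

coeffs = np.random.default_rng(3).uniform(-1, 1, 10_000)
for bits in (4, 8, 12):
    noise = coeffs - quantize(coeffs, bits)
    print(bits, "bits -> noise power", np.mean(noise ** 2))
# Each extra bit roughly quarters the noise power (~6 dB per bit), so an
# encoder can spend just enough bits to stay below the shifted threshold.
```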

4. APPLICATIONS:

The purpose of this section is to discuss some existing standards in digital audio compression, in particular MPEG-1 layer III, MPEG-2 AAC, and Dolby AC-3. Features of interest for each standard include which compression techniques are used, special details or unique characteristics, and target applications. Before discussing MPEG algorithms, let us briefly discuss the MPEG naming convention. The Moving Pictures Experts Group (MPEG) has established many audio compression standards. When we speak of MPEG audio encoding schemes, we always refer to both a phase and a layer. The phase is an Arabic numeral that indicates the MPEG release, e.g., MPEG-1 is the standard that was released in 1992. The layer is a Roman numeral between I and III; this refers to a particular decoder. Higher layers indicate a more complex decoder, e.g., MPEG-1 layer III, known commonly as MP3, is the most powerful decoder specified by MPEG-1. (Note that MP3 does not stand for MPEG-3.) Unless otherwise specified, MPEG standards are backward compatible, i.e., the decoders used by older standards can interpret data compressed by newer standards. This backward compatibility applies to both a decoder’s phase and layer. Thus an MPEG-1 layer III decoder can decode a backward-compatible MPEG-2 signal, and any layer III decoder can read data from the simpler layer II. An exception is MPEG-2 Advanced Audio Coding (AAC): this is an extension to the MPEG-2 standard that was added in 1997 to improve compression for surround-sound signals.

MPEG-1 layer III, with a minimum output rate of 112 kbps, has become the most popular format for Internet music. The first encoding stage combines a 32-subband polyphase filter bank with an MDCT that transforms the filter bank’s 32 output signals into the frequency domain. This combination is known as a hybrid filter bank because it employs both a filter bank and a transform. In the event of a time-domain attack, the input block size is cut from 576 samples to 192, which helps avoid pre-echo artifacts. After the MDCT, masking thresholds are calculated and applied based on the spectral properties of the input signal. Stereo redundancies are then eliminated, and the resulting frequency components are Huffman encoded, formatted into a frame, and stored or transmitted. Because the decoder does not have to calculate masking thresholds, it is less complex than the encoder. All the decoder must do is unpack the frame, decode the Huffman code, rescale the frequency lines, and apply the inverse filter bank. This simplified decoding process is desirable for applications such as portable music devices because it greatly reduces the cost and computing demands of real-time decoding.
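
As a sketch of the Huffman stage, the generic code builder below assigns shorter bit strings to more frequent quantized values; note that the actual MP3 standard uses fixed, predefined Huffman tables rather than building a code per signal.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Map each symbol to a prefix-free bit string (shorter = more frequent)."""
    counts = Counter(symbols)
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)                          # unique tiebreaker for the heap
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)      # two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

quantized = [0, 0, 0, 1, 0, -1, 0, 2, 1, 0]  # small values dominate
print(huffman_code(quantized))               # 0 gets the shortest code
```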
Dolby AC-3 (Dolby Digital) is the audio compression
format used in many movie theaters, home theaters,
and in High-Definition Television (HDTV) in the
United States. The AC-3 encodes a 5.1 channel audio
signal into a 384 kbps bit stream. The first stage of an
AC-3 encoder takes 512 input samples and applies an
MDCT. In order to preserve dynamic range, it then
splits the output frequency components into
mantissas and exponents. These values are then
quantized, coupled with other channels, and packed
into frames for transmission.
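
The mantissa/exponent split can be illustrated with NumPy’s frexp, which factors each coefficient into a fractional mantissa and a power-of-two exponent; AC-3’s actual bit-level format differs, but the dynamic-range idea is the same.

```python
import numpy as np

coeffs = np.array([0.5, -0.01, 0.0003, 0.9])
mantissas, exponents = np.frexp(coeffs)      # coeffs = mantissas * 2**exponents
print(mantissas)                             # magnitudes in [0.5, 1)
print(exponents)                             # coarse scale per coefficient
assert np.allclose(coeffs, np.ldexp(mantissas, exponents))
```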

5. CONCLUSION:

The purpose of this term paper is to present the concepts of contemporary audio compression. The MPEG/audio compression algorithm is an ISO standard for high-fidelity audio compression. The MPEG/audio standard has three layers of successive complexity for improved compression performance. It is a high-complexity, high-compression, and high-audio-quality algorithm. These techniques apply to general audio signals and are not specifically tuned for speech signals.
