Академический Документы
Профессиональный Документы
Культура Документы
Introduction
5/8/2013
Outline
Psychoacoustic Coding
MPEG
Chapter 5: Audio Compression
Sampling(1)
Standard
BW (Hz)
Compression
Rate (KHz)
Precision
(bits)
Bit rate
(kbps)
Quality
IMAADPCM
200-20000
ADPCM
8 - 44.1
32-350
Telephone,
CD
G.711
200-3200
-law PCM
64
Telephone
G.722
50-7000
DPCM
16
64
AM Radio
G.728
200-3200
low-delay
CELP
16
Telephone
G.723
200- 3200
ADPCM
5.3 or
6.3
H.323
Audio CD
20-20000
Linear PCM
44.1
16
1411.2
(stereo)
CD
MPEG-1
20-20000
sub-band
coding
32-48
2-15
256-384
near-CD
Sample
5/8/2013
Sampling(2)
5/8/2013
Uniform or Linear
Quantization
Quantization levels are evenly spaced.
16-bit samples provide plenty of dynamic
range.
CDs do this.
Compression factor 1:1.
File formats
.WAV (MS)
.AIFF (Unix and Mac)
Nonuniform Quantization
5/8/2013
Non-linear Sampling
Output
Input
ln(1 + |x|)
ln(1 + )
10
5/8/2013
-Law Companding
11
-Law Encoding
High-resolution
PCM encoding
(12, 14, 16 bits)
8-bit
Table
Lookup
Inverse
Table
Lookup
-Law
encoding
Sender
Input
Amplitude
33(0)
34(1)-35(2)
...
1111
0000
15
16
...
1111
0000
31
32
...
...
011
Code
Value
0
1
...
010
Quantization
0000
0001
...
1111
0000
47
48
...
...
16
001
...
...
496(463)-512(479)
000
...
...
248(215)-255(222)
256(223)-271(238)
Segment
...
...
124(91)-127(94)
128(95)-135(102)
Receiver
...
...
62(29)-63(30)
64(31)-67(34)
Step
Size
1
14-bit
decoding
1111
63
12
5/8/2013
-Law Decoding
High-resolution
PCM encoding
(12, 14, 16 bits)
Table
Lookup
8-bit
Inverse
Table
Lookup
-Law
encoding
Sender
14-bit
decoding
Receiver
-Law
Encoding
0000000
0000001
Multiplier
1
252(219)
264(231)
...
...
16
0111111
126(93)
132(99)
...
...
0101111
0110000
63(30)
66(33)
...
...
0011111
0100000
...
...
0001111
0010000
Decode
Amplitude
33(0)
35(2)
504(471)
13
Outline
Psychoacoustic Coding
MPEG
Chapter 5: Audio Compression
14
5/8/2013
Difference Encoding
Exploit temporal redundancy in samples.
Difference between 2 x-bit samples can be
represented with significantly fewer than x-bits
Transmit the difference (rather than the sample).
Differential-PCM (DPCM)
15
DPCM
Signal to be encoded
x x x
Prediction Difference/Error
Quantizer
Entropy
Coder
+ +
Predictor
Predicted Signal
x (n) x (n 1)
Simple difference
x (n) hk x (n k )
k1
x (n) 0
x-bit
DPCM
difference
x
Reconstructed Signal
x x x
Linear predictor
PCM
Chapter 5: Audio Compression
16
5/8/2013
17
18
5/8/2013
19
ADPCM
+
y-bit
PCM
sample
Difference
Quantizer
Predicted
PCM
Sample n+1
Predictor
Step-Size
Adjuster
x-bit
ADPCM
difference
+ +
Dequantizer
20
10
5/8/2013
Outline
Psychoacoustic Coding
MPEG
Chapter 5: Audio Compression
21
Psychoacoustic Principles
Based on extensive studies of human perception.
Average human does not hear all frequencies the
same way.
Limitations of the human sensory system leads to
facts that can be used to cut out unnecessary data
in an audio signal.
Two main properties of the human auditory
system:
22
11
5/8/2013
Tq ( f ) 3.64 f /1000
0.8
23
Simultaneous Masking
24
12
5/8/2013
Masking Threshold
25
Simultaneous Masking
Example
Signal
Noise
Signal + Noise
(SNR = 24 dB)
26
13
5/8/2013
Critical Bands(1)
Human auditory
Limited and frequency dependant resolution
25 critical bands
Bark: Barkhausen
1 Bark = width of one critical band
Critical band number (Bark) for a given frequency, z(f):
f < 500Hz => z(f) f/100
f > 500Hz => z(f) 9 + 4 log2(f/1000)
Also given as
z( f ) 13.0 arctan 0.76 f 3.5 arctan( f 2 /56.25)
27
28
Critical Bands(2)
14
5/8/2013
Temporal Masking
Example;
Play 1 kHz masking tone at 60 dB and 1.1 kHz test tone at 40dB.
SPL (dB)
simultaneous
Pre-masking
Post-masking
60
40
20
t (ms)
-50
200
400
Chapter 5: Audio Compression
29
Additive process
The minimum level of audibility in the presence of masking noise
Time varying function of frequency at each frequency at a given
time
500Hz
Masker
Audio
Level
200Hz
Masker
Masker
Threshold
2KHz
Masker
5KHz
Masker
0.05
0.1
0.2
0.5
1.0
2.0
5.0
10
20 KHz
Masked Frequency
Chapter 5: Audio Compression
30
15
5/8/2013
Summary
Auditory masking
Range of masking is 1 Bark (or critical band)
Two factors for masking - frequency masking and
temporal masking.
How can we use this information for
compression?
31
MPEG-1 Audio
Sampling frequency
32, 44.1 and 48 kHz
32
16
5/8/2013
Layer 1
DCT type filter with one frame and equal frequency spread per
band.
Psychoacoustic model only uses frequency masking.
Layer 2
Use three frames in filter (before, current, next, a total of 1152
samples).
This models a little bit of the temporal masking.
Layer 3
Better critical band filter is used (non-equal frequencies)
Psychoacoustic model includes temporal masking effects, and
takes into account stereo redundancy.
Huffman coder.
Known as MP3
Chapter 5: Audio Compression
33
Spectral Analysis
&
Psychoacoustic
Model
Decoding of
Bitstream
32 Subband
Filter
Bit allocation,
Quantization
& Coding
Encoding of
Bitstream
Encoded
Bitstream
Synthesis
Filterbank
Encoded
Bitstream
34
17
5/8/2013
Sub-band Filtering
35
36
18
5/8/2013
Psychoacoustic Model
37
s(n)
N2 b1
N - FFT length
b - number of bits per sample
38
19
5/8/2013
N 1
X[k]
x[n]e
2 nk
N
k 0,..., N 1
n 0
Eulers formula
e jx cos x j sin x
Chapter 5: Audio Compression
39
h(t)
40
20
5/8/2013
j 2
k
N Z[k]
for k 0,1,..., N / 2 1
k
j 2
N
/ 2] Y[k] e
Z[k]
for k 0,1,...,N / 2 1
41
x[n]e
X[k]
n 0
N
1
2
n 0
N
1
2
j 2
k
n
N
N
1
2
k
j 2 (2n )
N
x[2n]e
y[n]e
x[ 2n 1]e
j 2
k
(2n 1)
N
n 0
j 2
k
n
(N / 2)
n 0
j 2
N
1
k
k 2
j 2
n
(N / 2)
N
z[n]e
n 0
42
21
5/8/2013
Y[k]
j 2
y[n]e
k
n
(N / 2)
N
1
2
, Z[k]
n 0
j 2
z[n]e
k
n
(N / 2)
n 0
2 k
j
X[k] Y[k] e N Z[ k]
for k = 0,1,...,
N
1
2
First half
X[k N / 2] Y[k] e
2 k
N Z[ k]
for k = 0,1,...,
N
1
2
Second half
43
X[k N / 2]
x[n]e
j 2
n 0
N
1
2
x[2n]e
n 0
N
1
2
x[2n]e
n 0
N
1
2
n 0
N
1
2
j 2
j 2
kN / 2
(2n )
N
N
1
2
k
(2n )
N
e j 2 n
x[ 2n 1]e
j 2
n 0
N
1
2
x[2n 1]e
k N / 2
(2n 1)
N
j 2
k
(2n 1)
N
e j 2n e j
n 0
k
j 2 (2n )
N
x[2n]e
y[n]e
n 0
k N / 2
n
N
N
1
2
x[2n 1]e
j 2
k
(2n 1)
N
n 0
j 2
k
n
N /2
j 2
N
1
k 2
k
j 2
n
N
N /2
z[n]e
Y[k] e
j 2
k
N Z[k]
n 0
44
22
5/8/2013
DFTs of
even samples
Y[0]
X[0]
Y[1]
X[1]
Y[2]
X[2]
X[3]
Z[0]
W 80
-1
X[4]
Z[1]
W 81
-1
X[5]
Z[2]
W 82
-1
X[6]
Z[3]
W 83
-1
X[7]
Y[3]
DFTs of
odd samples
W Nk e
X[k] Y[k] e
2 k
N Z[k]
X[k N / 2] Y[k] e
2 k
N Z[ k]
2 k
N
45
x[0]
x[4]
Y1[0]
Z1[0]
W 20
-1
x[2]
x[6]
W 20
-1
x[1]
x[5]
Y2[1]
Z 2[0]
Z 2[1]
W 40
-1
W 41
-1
W 20
-1
W 20
-1
Stage 3
8-point FFTs
+
Y4 [0]
Y4 [1]
Y4 [2]
Y4 [3]
Z 4 [0]
+
+
x[3]
x[7]
Y2[0]
Stage 2
4-point FFTs
W 40
W 41
-1
-1
W 80
Z 4 [1] W 1
8
Z 4 [2] 2
W8
Z 4 [3] 3
W8
X[0]
X[1]
X[2]
X[3]
-1
X[4]
-1
X[5]
-1
X[6]
-1
X[7]
2 k
j
W Nk e N
46
23
5/8/2013
Data Windowing(1)
47
Data Windowing(2)
Hann Window
2 n
1
w[n] 1 cos
N
2
48
24
5/8/2013
Data Windowing(3)
2n
1
w[n] 1 cos
2
N
1/16
overlapped
window
512 samples
(12 ms frame)
448 samples
(10.9 ms)
49
w(n)x(n)e
n 0 Hann
Window
2 kn
N
dB
0k
FFT
N
2
PSD
Conversion to dB
50
25
5/8/2013
PSD Example
CD-quality pop music sampled @44.1 kHz, 16 bits per sample
Tonal masker x
Noise masker o
51
4 dB
Tone Masker
SPL (dB)
SPL (dB)
24 dB
Critical
Band
Critical
Band
52
26
5/8/2013
Energies from all spectral components within each critical band are
combined.
53
54
27
5/8/2013
Calculation of Individual
Masking Threshold(1)
Slope =-17
Slope =42
36
30
24
18
12
6
55
Calculation of Individual
Masking Threshold(2)
56
28
5/8/2013
Tonal maskers
Noise maskers
57
m1
l1
58
29
5/8/2013
0.5Bark Window
Chapter 5: Audio Compression
59
Spread of Masking
60
30
5/8/2013
Band
10
11
12
13
14
15
16
Level
(dB)
12
10
10
60
35
20
15
61
Scaling
62
31
5/8/2013
Bit Allocation
63
Quantization Noise
64
32
5/8/2013
Bit Allocation
Each bit
increases
resolution by
~6dB
SNR 20 log
Determined by
Tonal or Noise
masking
threshold
V
VQN
2 N 1
1/2
6.02N(dB)
20 log
Iterative process:
For each sub-band, determine NMR = SNR SMR
Allocate bits one at a time to the sub-band with the lowest NMR. Recalculate
NMR values.
Iterate until all bits used.
Chapter 5: Audio Compression
65
Bitstream Organization
Encoded Samples
384 for Layer 1, 1152 for Layers 2 and 3
Side Info
Encoding parameters (Layer specific)
Frame Header (32 bits)
Syncword
Bit-rate and sampling information.
Number of channels (1 or 2)
Misc. other stuff.
66
33
5/8/2013
Frame Header
Sampling rate frequency index
00 - 44.1 kHz
01 - 48 kHz
10 - 32 kHz
11 - reserved
Pad bit
0 - frame is not padded
1 - frame is padded with one extra slot
Private bit
Channel Mode
00 - Stereo
01 - Joint stereo (Stereo)
10 - Dual channel (2 mono channels)
11 - Single channel (Mono)
Misc.
Copyright
Original
Protection bit
0 - Protected by CRC (16-bit CRC follows header)
1 - Not protected
67
Frame Structure
(0xfff0)
sync
Framelen
Bit-allocation
16-bit
16-bit
4x32=128-bit
Scale-factors
6xN=>192-bit (max.)
Audio Data
12 samples x N
Scale-factors - 6 bits
68
34
5/8/2013
Layer 1
384 samples per frame
Up to 448 kbps
Layer 2
1152 samples per frame
Up to 384 kbps
Layer 3 (MP3)
69
70
35
5/8/2013
Improvements
z(f) f/100
z(f) 9 + 4 log2 (f/1000)
71
Layer
I
II
III
Target
bit rate
192 kbps
128 kbps
64 kbps
Ratio
4:1
6:1
12:1
Quality @
64 kbits
2.1 to 2.6
3.6 to 3.8
Quality @
128 kbits
4+
4+
Theoretical
Min. Delay
19 ms
35 ms
59 ms
72
36