Speech Processing
Contents
Introduction
Time-Dependent Processing of Speech
Short-Time Energy and Average Magnitude
Short-Time Average Zero Crossing Rate
Speech vs. Silence Discrimination Using Energy and Zero-Crossing
The Short-Time Autocorrelation Function
The Short-Time Average Magnitude Difference Function
Time-Domain Methods for
Speech Processing
Introduction
Speech Processing Methods
Time-Domain Methods:
Operate directly on the waveform of the speech
signal.
Frequency-Domain Methods:
Operate on some form of spectral
representation.
Time-Domain Measurements
Average zero-crossing rate, energy, and the
autocorrelation function.
Very simple to implement.
Provide a useful basis for estimating
important features of the speech signal, e.g.,
Voiced/unvoiced classification
Pitch estimation
Time-Dependent
Processing of Speech
Time-Dependent Nature of Speech
[Figure: speech waveform of the utterance "This is a test."]
Short-Time Behavior of Speech
Assumption
The properties of the speech signal change
slowly with time.
Analysis Frames
Short segments of the speech signal.
Frames usually overlap one another.
Time-Dependent Analyses
Analyzing each frame may produce either a
single number, or a set of numbers, e.g.,
Energy (a single number)
Vocal tract parameters (a set of numbers)
n: Frame index
x(m): Speech signal
T[ ]: A linear or nonlinear transformation.
w(m): A window function (finite or infinite).
General Form
$$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$
Example: Short-Time Energy
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
This is the general form with
$$T[x(m)] = x^2(m), \qquad w(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
so that
$$E_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m) = \sum_{m=n-N+1}^{n} x^2(m)$$
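The rectangular-window energy sum can be sketched in a few lines of Python. This is only an illustration: the function name `short_time_energy` and the convention that x(m) = 0 for m < 0 are assumptions, not part of the slides.

```python
def short_time_energy(x, N):
    """E_n = sum of x(m)^2 for m = n-N+1 .. n (rectangular window of length N).

    Assumption: samples before the start of the signal are treated as zero.
    """
    energies = []
    for n in range(len(x)):
        start = max(0, n - N + 1)          # clip the window at the signal start
        energies.append(sum(v * v for v in x[start:n + 1]))
    return energies


x = [1.0, -2.0, 3.0, 0.0, -1.0]
E = short_time_energy(x, N=3)
print(E)  # [1.0, 5.0, 14.0, 13.0, 10.0]
```

Each output value covers the N most recent samples, so the sequence rises while the window fills and then tracks the local signal level.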
General Short-Time-Analysis Scheme
[Block diagram: Linear Filter, T[ ], Lowpass Filter. The lowpass filter depends on the choice of window.]
Lip Sync
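Short-Time Energy
$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\, w^2(n-m) = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m) = x^2(n) * h(n)$$
where $h(n) = w^2(n)$.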
Short-Time Average Magnitude
$$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m) = |x(n)| * w(n)$$
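Both quantities are convolutions, so they can be computed by filtering x^2(n) with h(n) = w^2(n) and |x(n)| with w(n). The sketch below assumes a finite causal window given as a list; the helper names `convolve` and `energy_and_magnitude` are illustrative only.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def energy_and_magnitude(x, w):
    """Return (E_n, M_n) for n = 0 .. len(x)-1.

    E_n = x^2(n) * h(n) with h(n) = w^2(n); M_n = |x(n)| * w(n).
    """
    h = [v * v for v in w]
    E = convolve([v * v for v in x], h)[:len(x)]
    M = convolve([abs(v) for v in x], w)[:len(x)]
    return E, M


x = [1.0, -2.0, 3.0, 0.0, -1.0]
w = [1.0, 1.0, 1.0]                  # rectangular window, N = 3
E, M = energy_and_magnitude(x, w)
print(E)  # [1.0, 5.0, 14.0, 13.0, 10.0]
print(M)  # [1.0, 3.0, 6.0, 5.0, 4.0]
```

Because E_n squares each sample while M_n only takes its magnitude, the single large sample (3.0) dominates E_n much more than it dominates M_n.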
Block Diagram Representation
x(n) -> [ ]^2 -> x^2(n) -> h(n) -> E_n, with $h(n) = w^2(n)$
x(n) -> | | -> |x(n)| -> w(n) -> M_n
What is the effect of windows?
The Effects of Windows
Window length
Window function
Rectangular Window
$$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
$$H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$$
What is this?
[Figure: $|H(e^{j\omega})|$ for N = 8, with a peak value of 8, the mainlobe width and the peak sidelobe marked, and frequency-axis ticks at multiples of $2\pi/N$.]
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sid
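To explore the questions above numerically, the frequency response of the rectangular window can be evaluated both by direct summation and by the closed form. This is a minimal sketch; the function names are illustrative, and the closed form is not evaluated at ω = 0, where it is a 0/0 limit.

```python
import cmath
import math


def rect_response_direct(N, omega):
    """H(e^{jw}) = sum of e^{-jwn} for n = 0 .. N-1."""
    return sum(cmath.exp(-1j * omega * n) for n in range(N))


def rect_response_closed(N, omega):
    """sin(wN/2) / sin(w/2) * e^{-jw(N-1)/2}, valid for w not a multiple of 2*pi."""
    ratio = math.sin(omega * N / 2) / math.sin(omega / 2)
    return ratio * cmath.exp(-1j * omega * (N - 1) / 2)


N = 8
peak = abs(rect_response_direct(N, 0.0))                  # value at w = 0
first_zero = abs(rect_response_direct(N, 2 * math.pi / N))  # first null
difference = abs(rect_response_direct(N, 0.3) - rect_response_closed(N, 0.3))
print(peak, first_zero, difference)
```

The peak equals N and the first null falls at ω = 2π/N, so a longer window gives a narrower mainlobe, that is, a lowpass filter with a lower cutoff and a smoother energy contour.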
Commonly Used Windows
[Figure: Rectangular, Hamming, Bartlett, Hanning, and Blackman windows plotted for n = 0 to 20, with amplitudes from 0 to 1.]
Commonly Used Windows
Rectangular
$$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
Bartlett (Triangular)
$$w(n) = \begin{cases} 2n/(N-1), & 0 \le n \le (N-1)/2 \\ 2 - 2n/(N-1), & (N-1)/2 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
Hanning
$$w(n) = \begin{cases} 0.5 - 0.5\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
Hamming
$$w(n) = \begin{cases} 0.54 - 0.46\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
Blackman
$$w(n) = \begin{cases} 0.42 - 0.5\cos[2\pi n/(N-1)] + 0.08\cos[4\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
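The window definitions translate directly into code. The sketch below assumes N >= 2 (the formulas divide by N - 1) and uses 0.08 as the second cosine coefficient of the Blackman window, which is the standard value; the function names are illustrative.

```python
import math


def rectangular(N):
    return [1.0] * N


def bartlett(N):
    # Rises linearly to 1 at the midpoint, then falls linearly back to 0.
    return [2 * n / (N - 1) if n <= (N - 1) / 2 else 2 - 2 * n / (N - 1)
            for n in range(N)]


def hanning(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def blackman(N):
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)) for n in range(N)]


N = 21
windows = {f.__name__: f(N) for f in (rectangular, bartlett, hanning, hamming, blackman)}
for name, w in windows.items():
    print(name, round(w[0], 3), round(w[N // 2], 3))
```

For odd N every window reaches 1 at the midpoint; they differ in how they approach the edges, which is what trades mainlobe width against sidelobe level.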
Commonly Used Windows
[Figure: panels for the Rectangular, Hamming, Hanning, Bartlett, and Blackman windows.]
Examples: Short-Time Energy
Provides a basis for distinguishing
voiced speech segments from unvoiced
segments.
Silence detection.
Differences of E_n and M_n
$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2$$
E_n emphasizes large sample-to-sample variations in x(n).
Recursive formulas:
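The recursive formulas themselves did not survive on this slide. As an illustration only, one well-known case is assumed here: with the infinite exponential window h(n) = a^n for n >= 0 (0 < a < 1), the convolution E_n = x^2(n) * h(n) reduces to the recursion E_n = a E_{n-1} + x^2(n). The function names and the value of a are hypothetical.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, assuming the window h(n) = a**n for n >= 0."""
    energies, prev = [], 0.0
    for v in x:
        prev = a * prev + v * v
        energies.append(prev)
    return energies


def direct_energy(x, a):
    """Same quantity from the convolution sum, for comparison."""
    return [sum((a ** (n - m)) * x[m] ** 2 for m in range(n + 1))
            for n in range(len(x))]


x = [1.0, -2.0, 3.0, 0.0, -1.0]
E_rec = recursive_energy(x, a=0.5)
E_dir = direct_energy(x, a=0.5)
print(E_rec)  # [1.0, 4.5, 11.25, 5.625, 3.8125]
```

The recursion needs one multiply and one add per sample regardless of the effective window length, which is why infinite-duration windows can be attractive in practice.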
Short-Time Average
Zero-Crossing Rate
Voiced and Unvoiced Signals
[Figure: waveform segments from the word "This", showing the voiced /i/ and the unvoiced /s/.]
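The Short-Time Average Zero-Crossing Rate
$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m)$$
$$\operatorname{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases} \qquad w(m) = \begin{cases} \dfrac{1}{2N}, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$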
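A direct implementation of Z_n follows from the definition. In this sketch the term for m = 0 is set to zero because x(-1) is unavailable; that boundary choice and the function names are assumptions.

```python
def sgn(v):
    """sgn[x] = 1 for x >= 0 and -1 for x < 0."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, N):
    """Z_n = sum over the last N samples of |sgn x(m) - sgn x(m-1)| / (2N)."""
    # d[m] is 2 at a sign change and 0 otherwise; d[0] = 0 by assumption.
    d = [0] + [abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(1, len(x))]
    rates = []
    for n in range(len(x)):
        start = max(0, n - N + 1)
        rates.append(sum(d[start:n + 1]) / (2 * N))
    return rates


x = [1, -1, 1, -1, 1, 1, 1, 1]
Z = zero_crossing_rate(x, N=4)
print(Z[3], Z[4], Z[7])  # 0.75 1.0 0.25
```

With the 1/(2N) weighting, each sign change contributes 1/N, so Z_n reads as zero crossings per sample within the window.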
The Short-Time
Autocorrelation Function
Autocorrelation Functions
$$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)$$
[Figure: x(m) and its shifted copy x(m+k), offset by lag k.]
Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $\phi(k) \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
4. If x(m) has period P, i.e. x(m) = x(m+P), then $\phi(k) = \phi(k+P)$.
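Properties 1 to 3 can be checked numerically on a short finite-energy sequence. The sketch below assumes x(m) = 0 outside the stored samples; the function name `autocorr` is illustrative.

```python
def autocorr(x, k):
    """phi(k) = sum over m of x(m) * x(m+k), with x taken as zero outside its range."""
    n = len(x)
    return sum(x[m] * x[m + k] for m in range(n) if 0 <= m + k < n)


x = [1.0, 2.0, -1.0, 3.0]
phi = {k: autocorr(x, k) for k in range(-3, 4)}
energy = sum(v * v for v in x)
print(phi[0], phi[1], phi[-1], phi[2])  # 15.0 -3.0 -3.0 5.0
```

Property 4 concerns periodic signals of infinite extent, so it is not exercised by this finite example.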
Short-Time Version
$$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$$
[Figure: x(m), the windowed segment x(m)w(n-m), and the shifted windowed segment x(m+k)w(n-k-m) at lag k.]
Property
$$R_n(k) = R_n(-k)$$
[Figure: R_n(-k) pairs x(m)w(n-m) with x(m-k)w(n+k-m); R_n(k) pairs it with x(m+k)w(n-k-m).]
$$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$$
$$= \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)\, [w(n-m)\, w(n-m-k)]$$
$$= \sum_{m=-\infty}^{\infty} y_k(m)\, h_k(n-m) = y_k(n) * h_k(n)$$
where
$$y_k(n) = x(n)\, x(n+k), \qquad h_k(n) = w(n)\, w(n-k)$$
Property
$$R_n(k) = R_n(-k) = y_k(n) * h_k(n)$$
[Block diagram: x(n) is multiplied by a copy of itself shifted by k samples to form y_k(n), which is filtered by h_k(n) to give R_n(k).]
Another Formulation
Let $w'(n) = w(-n)$. Then
$$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w[-(m-n)]\, x(m+k)\, w[-(m-n+k)]$$
$$= \sum_{m=-\infty}^{\infty} x(m)\, w'(m-n)\, x(m+k)\, w'(m-n+k)$$
$$= \sum_{m=-\infty}^{\infty} x(m+n)\, w'(m)\, x(m+n+k)\, w'(m+k)$$
With $w'(n) \ne 0$ for $0 \le n \le N-1$:
$$R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\, w'(m)]\, [x(n+m+k)\, w'(m+k)]$$
A noncausal formulation
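The finite-sum form is straightforward to compute. The sketch below assumes a rectangular w'(m) and a test signal with period 4; the function name and the test signal are illustrative and not taken from the slides.

```python
def short_time_autocorr(x, n, k, w_prime):
    """R_n(k) = sum for m = 0 .. N-1-k of [x(n+m) w'(m)] [x(n+m+k) w'(m+k)].

    Requires n + N - 1 < len(x), where N = len(w_prime), and 0 <= k < N.
    """
    N = len(w_prime)
    return sum(x[n + m] * w_prime[m] * x[n + m + k] * w_prime[m + k]
               for m in range(N - k))


x = [1.0, 0.0, -1.0, 0.0] * 8        # periodic test signal, period P = 4
w_prime = [1.0] * 16                 # rectangular window, N = 16
R0 = short_time_autocorr(x, 0, 0, w_prime)
R2 = short_time_autocorr(x, 0, 2, w_prime)
R4 = short_time_autocorr(x, 0, 4, w_prime)
print(R0, R2, R4)  # 8.0 -7.0 6.0
```

The peak at lag k = P = 4 is smaller than R_n(0) by the factor 1 - k/N = 0.75, because fewer products fit inside the window as the lag grows.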
Examples
[Figure: short-time autocorrelation for voiced and unvoiced speech, N = 401.]
[Figure: short-time autocorrelation for N = 401, N = 251, and N = 125, with $R(k) = 1 - k/N$, $|k| < N$.]
Modified Short-Time
Autocorrelation Function
Original Version:
$$R_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\, w'(m)]\, [x(n+m+k)\, w'(m+k)]$$
Modified Version:
$$\hat{R}_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\, w_1(m)]\, [x(n+m+k)\, w_2(m+k)]$$
$$w_1(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
$$w_2(m) = \begin{cases} 1, & 0 \le m \le N-1+K \\ 0, & \text{otherwise} \end{cases}$$
K: Max. lag
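With these two rectangular windows, w_1(m) limits the sum to 0 <= m <= N-1 and w_2(m+k) equals 1 over that whole range for 0 <= k <= K, so the modified version reduces to a sum of N products at every lag. The sketch below assumes this reduced form; the function name and the test signal are illustrative.

```python
def modified_autocorr(x, n, k, N):
    """Modified short-time autocorrelation: sum for m = 0 .. N-1 of x(n+m) x(n+m+k).

    Requires n + N - 1 + k < len(x); valid for 0 <= k <= K (the maximum lag).
    """
    return sum(x[n + m] * x[n + m + k] for m in range(N))


x = [1.0, 0.0, -1.0, 0.0] * 8        # periodic test signal, period P = 4
N = 16
Rm0 = modified_autocorr(x, 0, 0, N)
Rm2 = modified_autocorr(x, 0, 2, N)
Rm4 = modified_autocorr(x, 0, 4, N)
print(Rm0, Rm2, Rm4)  # 8.0 -8.0 8.0
```

Unlike the original version, the value at lag P equals the value at lag 0 here, since the number of products no longer shrinks with k. The cost is that N + K samples are needed per frame.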
Examples
[Figure: modified short-time autocorrelation for voiced and unvoiced speech, N = 401, annotated "Similar".]
[Figure: modified short-time autocorrelation for N = 401, N = 251, and N = 125.]
The Short-Time Average Magnitude Difference Function
$$\gamma_n(kP) \approx 0, \qquad k = 0, \pm 1, \pm 2, \ldots$$
Computationally more effective than au
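The slide defining the average magnitude difference function was not recovered, so the sketch below assumes a common simplified form with rectangular windows: the sum over m = 0 .. N-1 of |x(n+m) - x(n+m+k)|. The function name, the sign convention of the lag, and the test signal are all assumptions.

```python
def amdf(x, n, k, N):
    """Assumed form: sum for m = 0 .. N-1 of |x(n+m) - x(n+m+k)|.

    Requires n + N - 1 + k < len(x).
    """
    return sum(abs(x[n + m] - x[n + m + k]) for m in range(N))


x = [1.0, 0.0, -1.0, 0.0] * 8        # periodic test signal, period P = 4
N = 16
values = [amdf(x, 0, k, N) for k in range(9)]
print(values)
```

The function dips to zero at lags that are multiples of the period (k = 0, 4, 8 here) and uses only subtractions and absolute values, with no multiplications.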
Example
[Figure: examples for voiced and unvoiced speech.]
Exercise
Record a piece of your own speech and perform
voiced/unvoiced segmentation.
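One possible starting point for the exercise is a frame-by-frame rule on energy and zero-crossing rate. This is a rough sketch, not a reference solution: the thresholds `energy_floor` and `zcr_split`, the function names, and the synthetic stand-in signal are all hypothetical and would need tuning on a real recording.

```python
import math


def frame_features(x, frame_len, hop):
    """Return (mean energy, zero crossings per sample) for each frame."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = sum(v * v for v in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / frame_len))
    return feats


def classify(energy, zcr, energy_floor=1e-4, zcr_split=0.25):
    """Hypothetical rule: low energy is silence; otherwise a high ZCR is unvoiced."""
    if energy < energy_floor:
        return "silence"
    return "unvoiced" if zcr > zcr_split else "voiced"


# Synthetic stand-in for a recording: a slow sine (voiced-like), a small
# rapidly alternating sequence (unvoiced-like), then silence.
voiced = [math.sin(2 * math.pi * m / 50) for m in range(200)]
unvoiced = [0.1 * (-1) ** m for m in range(200)]
silence = [0.0] * 200
signal = voiced + unvoiced + silence

labels = [classify(e, z) for e, z in frame_features(signal, frame_len=100, hop=100)]
print(labels)
```

On real speech the frames would usually overlap, and the two thresholds would be estimated from segments known to be silence or background noise.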