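Noise estimate:
$$\hat{\mu}(\omega) = E\{|N_i(\omega)|\}$$
Estimate the clean speech spectrum $\hat{S}_i(\omega)$ by applying a gain function $G_i(\omega)$ to the corrupted speech spectrum $Y_i(\omega)$, using the estimated noise $\hat{\mu}(\omega)$:
$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
$$G_i(\omega) = f\big(Y_i(\omega), \hat{\mu}(\omega)\big)$$
$$\hat{\mu}(\omega) = \frac{1}{M} \sum_{M \text{ noise-only frames}} |Y_i(\omega)|$$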
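The averaging over noise-only frames can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name `estimate_noise_magnitude` and the example frames are hypothetical, and deciding which frames are noise-only (for instance with a VAD) is assumed to happen elsewhere.

```python
def estimate_noise_magnitude(noise_frames):
    """Average |Y_i(w)| over M noise-only frames, separately for each frequency bin."""
    m = len(noise_frames)
    if m == 0:
        raise ValueError("need at least one noise-only frame")
    n_bins = len(noise_frames[0])
    totals = [0.0] * n_bins
    for frame in noise_frames:
        for k, y in enumerate(frame):
            totals[k] += abs(y)  # magnitude of the complex spectrum value
    return [total / m for total in totals]


# Two hypothetical noise-only frames with three frequency bins each.
frames = [[3 + 4j, 1 + 0j, 0 + 2j],
          [0 + 5j, 3 + 0j, 0 - 2j]]
mu_hat = estimate_noise_magnitude(frames)
print(mu_hat)
```

Each entry of `mu_hat` is the long-term noise magnitude for one frequency bin, which the gain functions in the following slides take as input.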
Magnitude Spectral Subtraction
Signal model:
$$Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$
Estimation of clean speech spectrum:
$$\hat{S}_i(\omega) = \big[\,|Y_i(\omega)| - \hat{\mu}(\omega)\,\big]\, e^{j\theta_{y,i}(\omega)}$$
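Spectral Subtraction:
$$\hat{S}_i(\omega) = \underbrace{\left[1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|}\right]}_{G_i(\omega)} |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)} = G_i(\omega)\, Y_i(\omega)$$
PS: half-wave rectification
$$G_i(\omega) \leftarrow \max\big(0,\, G_i(\omega)\big)$$
$$|\hat{S}_i(\omega)| = \max\big(0,\, |Y_i(\omega)| - \hat{\mu}(\omega)\big)$$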
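Magnitude spectral subtraction with half-wave rectification can be sketched per frame as below. The names `magnitude_subtraction`, `noisy` and `mu_hat` are illustrative assumptions, and the guard for a zero-magnitude bin is a choice made here to avoid division by zero.

```python
def magnitude_subtraction(frame, mu_hat):
    """Apply G = max(0, 1 - mu/|Y|) to each bin; the noisy phase is left unchanged."""
    out = []
    for y, mu in zip(frame, mu_hat):
        mag = abs(y)
        # Half-wave rectification: a negative gain is replaced by zero.
        gain = max(0.0, 1.0 - mu / mag) if mag > 0.0 else 0.0
        out.append(gain * y)
    return out


noisy = [3 + 4j, 1 + 0j, 0 + 0j]   # hypothetical noisy spectrum, three bins
mu_hat = [1.0, 2.0, 0.5]           # hypothetical noise magnitude estimate
clean = magnitude_subtraction(noisy, mu_hat)
print([abs(s) for s in clean])
```

Multiplying the complex value by a real gain scales the magnitude and keeps the phase of the noisy spectrum, which is what the estimator above prescribes.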
Power Spectral Subtraction
Signal model:
$$Y_i(\omega) = S_i(\omega) + N_i(\omega)$$
Estimation of clean speech spectrum:
$$E\{|\hat{S}_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}$$
Spectral Subtraction:
$$E\{|\hat{S}_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\} = \left[1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}\right] E\{|Y_i(\omega)|^2\} = |G_i(\omega)|^2\, E\{|Y_i(\omega)|^2\}$$
PS: half-wave rectification
$$|\hat{S}_i(\omega)|^2 = \max\big(0,\, |Y_i(\omega)|^2 - |\hat{\mu}(\omega)|^2\big)$$
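A per-frame sketch of power spectral subtraction follows, with the expectations replaced by instantaneous values. The function name and the example numbers are assumptions for illustration only.

```python
import math


def power_subtraction(frame, noise_power):
    """Subtract the noise power from |Y|^2 per bin, clip at zero, keep the noisy phase."""
    out = []
    for y, n_pow in zip(frame, noise_power):
        y_pow = abs(y) ** 2
        s_pow = max(0.0, y_pow - n_pow)          # half-wave rectification
        gain = math.sqrt(s_pow / y_pow) if y_pow > 0.0 else 0.0
        out.append(gain * y)
    return out


noisy = [3 + 4j, 1 + 0j]        # hypothetical noisy spectrum
noise_power = [9.0, 4.0]        # hypothetical noise power estimate per bin
clean = power_subtraction(noisy, noise_power)
print([abs(s) ** 2 for s in clean])
```

The gain is the square root of the bracketed term, so the output power in each bin equals the clipped difference of noisy power and noise power.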
Suppression Behavior
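$$|G_i(\omega)|^2 = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\gamma_i(\omega)}, \qquad \gamma_i(\omega) = \frac{E\{|Y_i(\omega)|^2\}}{E\{|N_i(\omega)|^2\}}$$
[Figure: gain $G_i(\omega)$ plotted against $\gamma_i(\omega)$]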
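To get a feel for the suppression behavior, the gains of magnitude and power spectral subtraction can be tabulated against the ratio of noisy power to noise power. In this sketch `gamma` stands for that ratio; the function names and the chosen grid of values are assumptions, not taken from the slides.

```python
import math


def gain_magnitude_subtraction(gamma):
    """G = max(0, 1 - mu/|Y|), written in terms of gamma = |Y|^2 / mu^2."""
    return max(0.0, 1.0 - 1.0 / math.sqrt(gamma))


def gain_power_subtraction(gamma):
    """G = sqrt(max(0, 1 - mu^2/|Y|^2)), written in terms of gamma."""
    return math.sqrt(max(0.0, 1.0 - 1.0 / gamma))


def to_db(gain):
    return 20.0 * math.log10(gain) if gain > 0.0 else float("-inf")


for gamma_db in (0, 3, 6, 10, 20):
    gamma = 10.0 ** (gamma_db / 10.0)
    g_mag = to_db(gain_magnitude_subtraction(gamma))
    g_pow = to_db(gain_power_subtraction(gamma))
    print(f"{gamma_db:3d} dB  magnitude: {g_mag:7.1f} dB  power: {g_pow:7.1f} dB")
```

For every ratio above 1, magnitude subtraction attenuates at least as strongly as power subtraction, and both gains approach 1 (0 dB) as the ratio grows.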
Wiener Filter in Frequency Domain
Wiener Estimation
Goal: find the linear filter $G_i(\omega)$ such that the MSE
$$E\Big\{\big|S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)}\big|^2\Big\}$$
is minimized.
Solution: set to zero the partial derivative of
$$E\big\{|S_i(\omega) - \hat{S}_i(\omega)|^2\big\} = E\big\{\big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big)\big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big)^*\big\}$$
with respect to the real part of $G_i(\omega)$, which yields the condition:
$$\frac{\partial\, E\big\{|S_i(\omega) - \hat{S}_i(\omega)|^2\big\}}{\partial\, \mathrm{Re}\{G_i(\omega)\}} = 0$$
and hence, for uncorrelated speech and noise, we have:
$$\mathrm{Re}\{G_i(\omega)\} = \frac{E\{|S_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = \frac{E\{|S_i(\omega)|^2\}}{E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\}} = \frac{E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$$
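A small numerical check of the Wiener solution for a single frequency bin is sketched below. It assumes a real gain and uncorrelated speech and noise, in which case the MSE reduces to $(1-G)^2 E\{|S|^2\} + G^2 E\{|N|^2\}$. The power values are made up for illustration.

```python
def wiener_gain(speech_power, noise_power):
    """G = E{|S|^2} / (E{|S|^2} + E{|N|^2})."""
    return speech_power / (speech_power + noise_power)


def mse(gain, speech_power, noise_power):
    """E{|S - G*Y|^2} for a real gain G and uncorrelated S and N."""
    return (1.0 - gain) ** 2 * speech_power + gain ** 2 * noise_power


ps, pn = 4.0, 1.0                      # hypothetical speech and noise powers
g_opt = wiener_gain(ps, pn)
for g in (g_opt - 0.01, g_opt, g_opt + 0.01):
    print(f"G = {g:.2f}  MSE = {mse(g, ps, pn):.4f}")
```

Moving the gain slightly away from `g_opt` in either direction increases the MSE, which is consistent with the derivative condition above.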
Generalized Formula
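Generalized magnitude squared spectral gain function:
$$|G_i(\omega)|^2 = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\gamma_i(\omega)}, \qquad \gamma_i(\omega) = \frac{E\{|Y_i(\omega)|^2\}}{E\{|N_i(\omega)|^2\}}$$
Practical heuristic form of spectral subtraction rule:
$$|\hat{S}_i(\omega)|^2 = \left[1 - \frac{|\hat{\mu}(\omega)|^2}{|Y_i(\omega)|^2}\right] |Y_i(\omega)|^2$$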
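Ephraim-Malah Suppression Rule (EMSR)
MMSE Estimation
$$G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{\left(\frac{1}{1+\mathrm{SNR}_{post}}\right)\left(\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}\right)}\; M\!\left[(1+\mathrm{SNR}_{post})\left(\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}\right)\right]$$
with:
$$M[\theta] = e^{-\theta/2}\left[(1+\theta)\, I_0\!\left(\frac{\theta}{2}\right) + \theta\, I_1\!\left(\frac{\theta}{2}\right)\right]$$
$$\mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{|\hat{\mu}(\omega)|^2} - 1$$
$$\mathrm{SNR}_{prio}(\omega) = (1-\alpha)\, \max\big(\mathrm{SNR}_{post}(\omega),\, 0\big) + \alpha\, \frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{|\hat{\mu}(\omega)|^2}$$
Here $I_0$ and $I_1$ are modified Bessel functions, and $G_{i-1}(\omega)\, Y_{i-1}(\omega)$ is the estimate from the previous frame.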
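The EMSR gain and the two SNR estimates can be sketched in plain Python as below. Several things here are assumptions: the function names, the default smoothing value `alpha=0.98` (a value often quoted in the literature, not given on the slide), and the power-series evaluation of the Bessel functions, which is only suitable for moderate arguments. A numerical library routine would be preferable in practice.

```python
import math


def bessel_i(order, x, terms=200):
    """Modified Bessel function I_order(x) via its power series (order 0 or 1)."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))   # ratio of consecutive series terms
        total += term
    return total


def m_function(theta):
    """M[theta] = exp(-theta/2) * [(1 + theta) I0(theta/2) + theta I1(theta/2)]."""
    return math.exp(-theta / 2.0) * (
        (1.0 + theta) * bessel_i(0, theta / 2.0) + theta * bessel_i(1, theta / 2.0)
    )


def emsr_gain(snr_prio, snr_post):
    ratio = snr_prio / (1.0 + snr_prio)
    theta = (1.0 + snr_post) * ratio
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(ratio / (1.0 + snr_post)) * m_function(theta)


def snr_post(y, mu_hat):
    """A posteriori SNR: |Y|^2 / mu^2 - 1."""
    return abs(y) ** 2 / mu_hat ** 2 - 1.0


def snr_prio(post, prev_estimate, mu_hat, alpha=0.98):
    """A priori SNR from the current a posteriori SNR and the previous frame's estimate G*Y."""
    return (1.0 - alpha) * max(post, 0.0) + alpha * abs(prev_estimate) ** 2 / mu_hat ** 2


print(emsr_gain(100.0, 100.0), 100.0 / 101.0)
```

At high SNR the printed EMSR gain lies close to the Wiener-type value $\mathrm{SNR}_{prio}/(1+\mathrm{SNR}_{prio})$, which is a useful sanity check on the implementation.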
Nonlinear Estimation
Interpretation
The Power Spectral Subtraction method is interpreted as a time-variant filter with magnitude frequency response:
$$|G_i(\omega)| = \sqrt{1 - \frac{|\hat{\mu}(\omega)|^2}{|Y_i(\omega)|^2}}$$
The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech signal is calculated directly. The noise level $|\hat{\mu}(\omega)|^2$ is estimated by averaging over many non-speech frames where the background noise is assumed to be stationary.
Negative values resulting from spectral subtraction are replaced by zero. This results in musical noise: a succession of randomly spaced spectral peaks emerges in the frequency bands, so the residual noise is composed of narrowband components located at random frequencies that turn on and off randomly in each short-time frame.
[Figure: $y[k]$ → magnitude subtraction → $\hat{s}[k]$]
Solutions
- Flooring factor
- Over-subtraction factor
- SNR-dependent subtraction factor
- Averaging estimated noise level over K frames
- Reduce noise variance at each frequency: apply a simple recursive first-order lowpass filter (using smoothing coefficient p controlling bandwidth & time constant of the LP filter)
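Two of the remedies listed above can be sketched as follows. The exact form used here, $\max(|Y|^2 - a\,|\hat{\mu}|^2,\; b\,|\hat{\mu}|^2)$ with over-subtraction factor $a$ and flooring factor $b$, is one common formulation and is an assumption, as are the function names and the default parameter values.

```python
def subtract_with_floor(y_power, noise_power, oversubtraction=2.0, floor=0.01):
    """Power subtraction with over-subtraction factor and a spectral floor, per bin."""
    return [max(y - oversubtraction * n, floor * n)
            for y, n in zip(y_power, noise_power)]


def smooth_noise(previous, current, p=0.9):
    """First-order recursive lowpass: p * previous + (1 - p) * current, per bin."""
    return [p * a + (1.0 - p) * b for a, b in zip(previous, current)]


# Hypothetical noisy power, noise power, and two successive noise measurements.
print(subtract_with_floor([10.0, 1.0], [2.0, 2.0], oversubtraction=2.0, floor=0.1))
print(smooth_noise([1.0, 1.0], [3.0, 0.0], p=0.5))
```

The floor keeps a small amount of broadband residual noise instead of isolated zeros, which masks the random peaks heard as musical noise. A value of `p` close to 1 gives a long time constant and a low-variance noise estimate.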
Solutions
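$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
- Magnitude averaging: replace $|Y_i(\omega)|$ in the calculation of $G_i(\omega)$ by a local average over frames, so that $G_i(\omega)$ is based on an average while it is applied to the instantaneous $Y_i(\omega)$
- EMSR (p7)
- Augment $G_i(\omega)$ with a soft-decision VAD: $G_i(\omega) \rightarrow P(H_1 \mid Y_i(\omega)) \cdot G_i(\omega)$, where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation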
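Magnitude averaging and the soft-decision weighting can be combined in a short sketch. The names `local_average_magnitude`, `enhance_frame`, `half_width` and `p_speech` are hypothetical, and the speech presence probabilities are assumed to be supplied by a separate soft-decision VAD that is not shown. The magnitude subtraction gain is used here only as an example of a gain function.

```python
def local_average_magnitude(frames, i, half_width=1):
    """Average |Y_j(w)| over frames j around frame i, clipped at the sequence ends."""
    lo = max(0, i - half_width)
    hi = min(len(frames), i + half_width + 1)
    n_bins = len(frames[i])
    return [sum(abs(frames[j][k]) for j in range(lo, hi)) / (hi - lo)
            for k in range(n_bins)]


def enhance_frame(frames, i, mu_hat, p_speech, half_width=1):
    """Gain from the averaged magnitude, weighted by P(H1|Y), applied to the instantaneous Y_i."""
    avg_mag = local_average_magnitude(frames, i, half_width)
    out = []
    for y, mag, mu, p in zip(frames[i], avg_mag, mu_hat, p_speech):
        gain = max(0.0, 1.0 - mu / mag) if mag > 0.0 else 0.0
        out.append(p * gain * y)
    return out


frames = [[2 + 0j], [4 + 0j], [6 + 0j]]   # three hypothetical frames, one bin each
print(enhance_frame(frames, 1, mu_hat=[1.0], p_speech=[0.5]))
```

Because the gain depends on a local average, it fluctuates less from frame to frame than a gain computed from each instantaneous magnitude, while the output still follows the instantaneous spectrum.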