
Speech Enhancement

Noise Reduction


Pham Van Tuan
Electronic & Telecommunication Engineering
Danang University of Technology
Introduction
Aims:
Improvement in the intelligibility of speech for human
listeners.
Improvement in the quality of speech that makes it
more acceptable to human listeners.
Modifications to the speech that lead to improved
performance of automatic speech or speaker
recognition systems.
Modifications to the speech so that it may be
encoded more effectively for storage or transmission.
Noise Types:
Additive acoustic noise
Acoustic reverberation
Convolutive channel effects
Electrical interference, Codec distortion
General Scheme
The signal is first transformed into another domain to obtain a
better representation of the speech signal.
The noise level is estimated by a noise estimation algorithm.
The noise component is removed from the noisy speech
signal by applying the gain function.
The gain function is designed from different linear and
non-linear estimators.
Noise Estimation
Noise estimation is the most difficult part of noise reduction algorithms,
especially for non-stationary and non-white noise (whose
characteristics change over time and across frequency bands).
Single-channel noise reduction methods can be divided into:
Exploiting the periodicity of voiced speech.
Auditory model based systems.
Optimal linear estimators.
Statistical model based systems using optimal non-linear estimators.
Due to the spectral overlap between speech and noise
signals, the denoised speech signals obtained from single-channel
methods exhibit more speech distortion.
However, single-channel methods are low-cost and compact.
Additive Noise Model
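Microphone signal is
$$y[k] = s[k] + n[k]$$
where $s[k]$ is the desired signal and $n[k]$ is the noise.
[Figure: the desired signal $s[k]$ and the noise signal(s) add up to the microphone signal $y[k]$, from which the estimate $\hat{s}[k]$ of the desired signal is computed.]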
Goal: Estimate s[k] based on y[k]
Applications: SE in conferencing, handsfree telephony, hearing aids,
digital audio restoration, speech recognition, speech-based technology
Will consider speech applications: s[k] = speech signal
The noise can be stationary or non-stationary, narrowband or
broadband. Interfering speakers are also considered.
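To experiment with the additive model y[k] = s[k] + n[k], it helps to mix a clean signal with noise at a known SNR. The sketch below is only an illustration: it assumes white Gaussian noise, and the helper name `add_noise_at_snr` and its `seed` parameter are hypothetical, not taken from the course material.

```python
import math
import random

def add_noise_at_snr(s, snr_db, seed=0):
    """Return y[k] = s[k] + n[k], with n scaled to reach the target SNR in dB.

    Assumption: n is white Gaussian noise; real recordings may contain
    non-stationary or coloured noise instead.
    """
    rng = random.Random(seed)
    n = [rng.gauss(0.0, 1.0) for _ in s]
    p_s = sum(x * x for x in s) / len(s)   # average signal power
    p_n = sum(x * x for x in n) / len(n)   # average noise power before scaling
    scale = math.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return [a + scale * b for a, b in zip(s, n)]
```

Calling `add_noise_at_snr(s, 10.0)` on a list of samples `s` returns a noisy copy whose signal-to-noise power ratio is 10 dB.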
Strictly speaking: the estimation of statistical quantities via time
averaging is only admissible when the signal is stationary and ergodic.
The signal is chopped into frames (e.g. 10-20 ms); for each frame $i$ a
frequency-domain representation is
$$Y_i(\omega) = S_i(\omega) + N_i(\omega)$$
(desired signal contribution plus noise contribution),
where the spectral components are short-time spectra of the time-domain
signal frames $y_i(k)$, $s_i(k)$, $n_i(k)$ (obtained by a windowing
technique using window $w$).
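The framing step can be sketched as follows. This is a minimal illustration, assuming a Hann window and a naive O(N^2) DFT so that no external library is needed; the helper names `hann`, `frame_signal` and `dft` are hypothetical.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n (an assumed choice of window w)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def frame_signal(y, frame_len, hop):
    """Split y into overlapping, windowed frames y_i(k)."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frames.append([y[start + k] * w[k] for k in range(frame_len)])
    return frames

def dft(frame):
    """Naive DFT: short-time spectrum Y_i at each frequency bin."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]
```

At a sampling rate of 8 kHz, for example, a 20 ms frame corresponds to `frame_len=160`, and `hop=80` gives 50% overlap. In practice an FFT routine would replace the naive `dft`.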
Observation
[Figure: magnitude-squared DFT coefficients of a noisy, voiced speech
sound and the estimated noise power spectral density
(PSD) of the noisy speech.]
Observation
Speech is an on/off (time-varying) signal, hence some
frames contain speech + noise and some frames contain noise only.
A speech detection algorithm, or Voice Activity Detection
(VAD), is needed to distinguish between these two types of
frames (based on statistical features).
How to design a VAD?
Normal VAD [McAulay, Malpass 1980]
Soft-decision estimators [Sohn, Sung 1998]
Minimum statistics [Martin 1994, 2001]
Percentile filter [Pham, Kubin 2005]
.
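None of the cited VAD methods is reproduced here. As a much simpler illustration of a frame-wise speech/noise decision, the sketch below thresholds the short-time frame energy against a noise floor. The function names, the `n_init` and `margin_db` parameters, and the assumption that the first few frames contain noise only are all illustrative choices.

```python
import math

def frame_energy_db(frame):
    """Short-time energy of one frame in dB (small offset avoids log(0))."""
    e = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(e + 1e-12)

def energy_vad(frames, n_init=5, margin_db=6.0):
    """Return one boolean per frame: True = speech + noise, False = noise only.

    Assumption: the first n_init frames are noise-only; their mean energy
    serves as the noise floor.
    """
    energies = [frame_energy_db(f) for f in frames]
    floor_db = sum(energies[:n_init]) / n_init
    return [e > floor_db + margin_db for e in energies]
```

A fixed energy threshold like this breaks down at low SNR or with non-stationary noise, which is what motivates the statistical approaches listed above.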
Estimation
Definition: $\mu(\omega)$ = average amplitude of the noise spectrum
$$\mu(\omega) = E\{|N_i(\omega)|\}$$
Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$
by (long-time) averaging over $M$ noise-only frames
$$\hat{\mu}(\omega) = \frac{1}{M} \sum_{\text{noise-only frames}} |Y_i(\omega)|$$
Estimate the clean speech spectrum $S_i(\omega)$, using a gain function $G_i(\omega)$ of the
corrupted speech spectrum $Y_i(\omega)$ + the estimated $\hat{\mu}(\omega)$:
$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
$$G_i(\omega) = f\big(Y_i(\omega), \hat{\mu}(\omega)\big)$$
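The long-time averaging over M noise-only frames can be sketched as below. The function name and the representation of a spectrum as a list of complex DFT coefficients are assumptions made for illustration; the speech/noise flags are expected to come from some VAD.

```python
def estimate_noise_magnitude(spectra, is_speech):
    """Average |Y_i| per frequency bin over the frames flagged as noise-only.

    spectra   : list of frames, each a list of complex DFT coefficients
    is_speech : list of booleans from a VAD (True = speech present)
    """
    noise_frames = [Y for Y, s in zip(spectra, is_speech) if not s]
    if not noise_frames:
        raise ValueError("no noise-only frames available")
    m = len(noise_frames)                      # M in the averaging formula
    n_bins = len(noise_frames[0])
    return [sum(abs(Y[b]) for Y in noise_frames) / m for b in range(n_bins)]
```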
Magnitude Spectral Subtraction
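Signal model:
$$Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$
Estimation of clean speech spectrum:
$$\hat{S}_i(\omega) = \big[\,|Y_i(\omega)| - \hat{\mu}(\omega)\,\big]\, e^{j\theta_{y,i}(\omega)} = \underbrace{\left[1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|}\right]}_{G_i(\omega)} Y_i(\omega)$$
PS: half-wave rectification
$$|\hat{S}_i(\omega)| = \max\big(0,\; |Y_i(\omega)| - \hat{\mu}(\omega)\big)$$
$$G_i(\omega) \leftarrow \max\big(0,\; G_i(\omega)\big)$$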
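Magnitude spectral subtraction with half-wave rectification, i.e. a gain of max(0, 1 - noise magnitude / |Y|) applied to each bin while the noisy phase is kept, can be sketched as follows. The function name and the list-based spectrum representation are assumptions.

```python
def magnitude_subtraction(Y, mu_hat):
    """Apply G = max(0, 1 - mu_hat / |Y|) per bin; the noisy phase is kept.

    Y      : list of complex DFT coefficients of one noisy frame
    mu_hat : list of estimated noise magnitudes, one per bin
    """
    S_hat = []
    for y, mu in zip(Y, mu_hat):
        mag = abs(y)
        # Half-wave rectification: negative gains are clipped to zero.
        gain = max(0.0, 1.0 - mu / mag) if mag > 0.0 else 0.0
        S_hat.append(gain * y)
    return S_hat
```

Because the gain is real and non-negative, multiplying by it changes only the magnitude of each coefficient.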
Power Spectral Subtraction
Signal model:
$$Y_i(\omega) = S_i(\omega) + N_i(\omega)$$
$$E\{|S_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}$$
Estimation of clean speech spectrum:
$$E\{|S_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\} = \left[1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}\right] E\{|Y_i(\omega)|^2\} = G_i^2(\omega)\, E\{|Y_i(\omega)|^2\}$$
PS: half-wave rectification
$$|\hat{S}_i(\omega)|^2 = \max\big(0,\; |Y_i(\omega)|^2 - \hat{\mu}^2(\omega)\big)$$
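The power spectral subtraction gain, the square root of 1 minus the ratio of noise power to noisy-speech power, clipped at zero, can be sketched per frequency bin as follows. The function name is hypothetical, and in practice the two powers would be short-time estimates rather than true expectations.

```python
import math

def power_subtraction_gain(y_power, noise_power):
    """G = sqrt(max(0, 1 - noise_power / y_power)) for one frequency bin."""
    if y_power <= 0.0:
        return 0.0
    # Half-wave rectification keeps the argument of the square root >= 0.
    return math.sqrt(max(0.0, 1.0 - noise_power / y_power))
```

For |Y| = 2 and a noise magnitude of 1, this gives a gain of about 0.87, whereas magnitude subtraction gives 0.5, so power subtraction attenuates less at the same SNR.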
Suppression Behavior
$$G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$$
[Figure: suppression curves of the gain $G_i(\omega)$]
Wiener Filter in Frequency Domain
Wiener Estimation
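Goal: find the linear filter $G_i(\omega)$ such that the MSE
$$E\Big\{\big|S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)}\big|^2\Big\}$$
is minimized.
Solution: take the partial derivative of
$$E\big\{|S_i(\omega) - \hat{S}_i(\omega)|^2\big\} = E\big\{\big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big)\big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big)^*\big\}$$
with respect to the real part of $G_i(\omega)$ and set it to zero, which yields the condition:
$$\frac{\partial\, E\big\{|S_i(\omega) - \hat{S}_i(\omega)|^2\big\}}{\partial\, \mathrm{Re}\{G_i(\omega)\}} = 0$$
and hence we have (with $E\{|Y_i|^2\} = E\{|S_i|^2\} + E\{|N_i|^2\}$ for uncorrelated speech and noise):
$$\mathrm{Re}\{G_i(\omega)\} = \frac{E\{|S_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = \frac{E\{|S_i(\omega)|^2\}}{E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\}} = \frac{E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$$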
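The Wiener gain in its two equivalent forms, (noisy power - noise power) / noisy power and SNR / (1 + SNR), can be sketched per frequency bin as below. The function names are hypothetical, and the clamping to [0, 1] is an added safeguard for estimated powers, not part of the derivation.

```python
def wiener_gain(y_power, noise_power):
    """Wiener gain (E|Y|^2 - E|N|^2) / E|Y|^2 for one frequency bin.

    Assumption: the result is clamped to [0, 1], because power estimates
    can make the raw expression negative.
    """
    if y_power <= 0.0:
        return 0.0
    return min(1.0, max(0.0, (y_power - noise_power) / y_power))

def wiener_gain_from_snr(snr):
    """Equivalent form E|S|^2 / (E|S|^2 + E|N|^2), with snr = E|S|^2 / E|N|^2."""
    return snr / (1.0 + snr)
```

With a noisy power of 4 and a noise power of 1 (an SNR of 3), both forms give 0.75, which is the square of the corresponding power spectral subtraction gain.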
Generalized Formula
Generalized magnitude squared spectral gain function
$$G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$$
Practical heuristic form of spectral subtraction rule:
$$|\hat{S}_i(\omega)|^2 = |Y_i(\omega)|^2 \left[1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2}\right]$$
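Ephraim-Malah Suppression Rule (EMSR)
MMSE Estimation
$$G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{\left(\frac{1}{1+\mathrm{SNR}_{post}}\right)\left(\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}\right)} \; M\!\left[(1+\mathrm{SNR}_{post})\left(\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}\right)\right]$$
with:
$$M[\theta] = e^{-\theta/2}\left[(1+\theta)\, I_0\!\left(\frac{\theta}{2}\right) + \theta\, I_1\!\left(\frac{\theta}{2}\right)\right]$$
where $I_0$ and $I_1$ are modified Bessel functions,
$$\mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat{\mu}^2(\omega)} - 1$$
$$\mathrm{SNR}_{prio}(\omega) = (1-\alpha)\,\max\big(\mathrm{SNR}_{post}(\omega),\, 0\big) + \alpha\, \frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{\hat{\mu}^2(\omega)}$$
where the second term of $\mathrm{SNR}_{prio}$ uses the previous frame.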
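The EMSR gain can be evaluated numerically as sketched below. The sketch assumes the standard form of the rule: gain = (sqrt(pi)/2) * sqrt(r / (1 + SNR_post)) * M[(1 + SNR_post) * r], with r = SNR_prio / (1 + SNR_prio) and M[t] = exp(-t/2) * ((1 + t) I0(t/2) + t I1(t/2)). The Bessel functions are computed from their power series so that no external library is needed, which is adequate for moderate arguments only. All function names and the default `alpha=0.98` are assumptions, and `snr_post` must be greater than -1.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind I_order(x), by power series."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
    return total

def m_func(theta):
    """M[theta] = exp(-theta/2) * ((1 + theta) I0(theta/2) + theta I1(theta/2))."""
    half = theta / 2.0
    return math.exp(-half) * ((1.0 + theta) * bessel_i(0, half) + theta * bessel_i(1, half))

def emsr_gain(snr_post, snr_prio):
    """Ephraim-Malah gain for one bin from a posteriori and a priori SNR."""
    ratio = snr_prio / (1.0 + snr_prio)
    theta = (1.0 + snr_post) * ratio
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(ratio / (1.0 + snr_post)) * m_func(theta)

def decision_directed_prio(snr_post, prev_clean_power, noise_power, alpha=0.98):
    """A priori SNR: (1 - alpha) max(snr_post, 0) + alpha |G_prev Y_prev|^2 / noise_power.

    Assumption: alpha=0.98 is only an illustrative default for the smoothing weight.
    """
    return (1.0 - alpha) * max(snr_post, 0.0) + alpha * prev_clean_power / noise_power
```

For large SNR values the gain approaches the Wiener gain SNR_prio / (1 + SNR_prio); for example `emsr_gain(20.0, 10.0)` lies close to 10/11.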
Gain functions
[Figure: comparison of gain functions. Labels: Wiener Estimation,
Power Spectral Subtraction, Magnitude Spectral Subtraction,
Maximum Likelihood, Non-linear Estimation, Ephraim-Malah Suppr. Rule
(= most frequently used in practice).]
Interpretation
The Power Spectral Subtraction method is interpreted as a time-
variant filter with magnitude frequency response:
$$G_i(\omega) = \sqrt{\max\left(0,\; 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2}\right)}$$
The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech
signal is calculated directly. The noise level $\hat{\mu}^2(\omega)$ is estimated
by averaging over many non-speech frames, where the
background noise is assumed to be stationary.
Negative values resulting from spectral subtraction are
replaced by zero. This results in musical noise: a
succession of randomly spaced spectral peaks emerges in
the frequency bands, so the residual noise is composed
of narrow-band components located at random frequencies
that turn on and off randomly in each short-time frame.
[Figure: $y[k]$ → magnitude subtraction → $\hat{s}[k]$]
Solutions
Flooring factor
Over-subtraction factor
SNR-dependent subtraction factor
Averaging estimated noise level over K frames
Reduce noise variance at each frequency: apply a simple
recursive first-order low-pass filter (the smoothing coefficient
p controls the bandwidth and time constant of the LP filter)
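The slides name these remedies without giving formulas. The sketch below assumes one commonly used formulation: the noise power is multiplied by an over-subtraction factor before subtraction, the result is floored at a small fraction of the noise power instead of zero, and the noise estimate is smoothed by a first-order recursion. The function names and the default values of `over`, `floor` and `p` are illustrative assumptions.

```python
def oversubtract_with_floor(y_power, noise_power, over=2.0, floor=0.01):
    """Power subtraction with an over-subtraction factor and a spectral floor.

    Returns max(|Y|^2 - over * noise_power, floor * noise_power), so the
    output never drops to zero and isolated spectral peaks are masked.
    """
    return max(y_power - over * noise_power, floor * noise_power)

def smooth_noise_estimate(prev_estimate, current_value, p=0.9):
    """First-order recursive low-pass: p * previous + (1 - p) * current.

    A value of p closer to 1 gives a longer time constant (narrower bandwidth).
    """
    return p * prev_estimate + (1.0 - p) * current_value
```

An SNR-dependent subtraction factor could be obtained by making `over` a function of the estimated SNR of each frame.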
Solutions
$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
- Magnitude averaging: replace $Y_i(\omega)$ in the
calculation of $G_i(\omega)$ by a local average over
frames (instantaneous → average)
- EMSR (p7)
- augment $G_i(\omega)$ with soft-decision VAD:
$$G_i(\omega) \leftarrow P\big(H_1 \mid Y_i(\omega)\big) \cdot G_i(\omega)$$
where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation
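Magnitude averaging and soft-decision weighting can be sketched as below. The symmetric window of `k` frames on each side is an assumption (the slides only say "local average over frames"), and both function names are hypothetical. The speech-presence probability is taken as given; computing it is the task of a soft-decision VAD.

```python
def local_average_magnitude(mags, i, k=2):
    """Average |Y_j| at one bin over frames j = i-k .. i+k (clipped at the edges).

    mags : list of magnitudes |Y_j| of one frequency bin, one per frame
    """
    lo = max(0, i - k)
    hi = min(len(mags), i + k + 1)
    window = mags[lo:hi]
    return sum(window) / len(window)

def soft_decision_gain(gain, p_speech):
    """Scale the gain G_i by P(H1 | Y_i), the speech-presence probability."""
    return p_speech * gain
```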

Wavelet Denoising
Additive noise model in the wavelet domain:
Hard thresholding
Soft thresholding
Shrinking
[Figure: Hard Thresholding, Soft Thresholding, Optimal Shrinking]
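The slides show the thresholding rules only as figures. The sketch below uses the standard textbook definitions of hard and soft thresholding applied to a single wavelet coefficient; the function names and the threshold parameter `t` are illustrative, and the choice of `t` as well as the optimal shrinking rule are not covered.

```python
def hard_threshold(c, t):
    """Keep coefficient c if |c| > t, otherwise set it to zero."""
    return c if abs(c) > t else 0.0

def soft_threshold(c, t):
    """Zero coefficients with |c| <= t and shrink the others toward zero by t."""
    if abs(c) <= t:
        return 0.0
    return c - t if c > 0 else c + t
```

Hard thresholding leaves large coefficients untouched but is discontinuous at the threshold, whereas soft thresholding is continuous but biases every retained coefficient by `t`.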
LAB ASSIGNMENT
Noise Reduction For Speech Enhancement
1. From the provided Matlab codes, present flowcharts of the
following algorithms: Spectral Subtraction, Wiener Filter and Log-MMSE.
Make sure that you understand all parts of the algorithms.
2. Test the algorithms with the provided audio samples. Evaluate the processed
speech quality informally, using subjective evaluation with CCR
(Table 1, described below). Comment on this first test.
3. From the observed results, identify and explain the sensitive variables of the
examined algorithms that affect their performance.
4. Propose solutions to improve the performance of the algorithms.
5. Test the modified algorithms with the provided audio samples again.
Evaluate the processed speech quality informally, using subjective
evaluation with CCR.
6. Hand in your report at the end of the Lab session.