
2009 6th International Multi-Conference on Systems, Signals and Devices

Speech Signal Enhancement Using Neural Network and Wavelet Transform
Department of Communication and Electronics Engineering, Philadelphia University, P.O. Box 1, Amman 19392, Jordan. First author e-mail: haleddaq@yahoo.com
Abstract-- Speech enhancement is concerned with processing a corrupted or noisy speech signal in order to improve its quality or intelligibility. Our goal is to enhance a speech signal corrupted by noise and obtain a clean signal of higher quality, since the presence of noise in speech signals contributes a high degree of inaccuracy to any system that requires speech processing. Noise cancellation for the speech signal was carried out with neural networks, and three methods were tested: 1. the adaptive linear neuron (ADALINE); 2. the feed-forward neural network (FFNN) enhancement method; 3. the wavelet transform and Adaline enhancement method. The results obtained showed high quality owing to fast processing and a high signal-to-noise ratio. The tested signal was enhanced by 10 dB with Adaline, 3 dB with FFNN and 8 dB with the wavelet transform and Adaline enhancement method.

Keywords-- Speech signal; neural network; discrete wavelet transform.

I. INTRODUCTION

In many speech signal processing applications the desired signal is not available directly; it is distorted for several reasons. The distortion may be caused by an interfering signal, by the equipment used in the system, by the transmission medium, and so on. Because of this, the speech signal has to be estimated from the noisy observation [1]. For speech signals in particular, the distortion caused by interference from the surroundings is a major problem in most applications: the performance of speech coding and recognition systems that operate in noisy environments (such as moving vehicles or crowded areas) decreases. Speech enhancement has therefore been an active research topic since the 1970s.
Speech enhancement is motivated by the need to improve the performance of voice communication systems in noisy environments. Applications range from front ends for speech recognition systems [14,15] to the enhancement of telecommunications in aviation, military teleconferencing and cellular conditions. The goal is either to improve the perceived quality of the speech or to increase its intelligibility. Improving quality can be important for reducing listener fatigue in high-stress and high-noise environments. In the recording industry, improving the quality of recorded speech may be desirable even if the noise level is low to begin with. Quality can be measured in terms of speech recognition performance; enhancement preprocessing techniques, however, have not proven successful at improving human recognition rates.
On the other hand, improvements in intelligibility have been clearly demonstrated in the field of automatic speech recognition (ASR): speech enhancement preprocessing can greatly improve the performance of ASR systems in noisy environments [2,3].
II. LITERATURE SURVEY
There are several different approaches to noise reduction and elimination. These algorithms can be classified into two main classes, parametric and non-parametric methods [19]. Non-parametric methods include the popular spectral-subtraction-based algorithms. Parametric methods, as opposed to non-parametric methods, assume a model for the signal generation process. The parametric model describes the predictable structures and the observable patterns in the process, and with this a priori information about the signal, model-based algorithms are expected to yield better results. However, the dependency on the model may also yield poor results if the selected model does not fit the signal, so model selection is a critical point in these algorithms. A widely accepted model is the linear prediction model, which enables the prediction of the signal from its past samples. This model has found many applications in several signal processing problems, and most enhancement algorithms are based on linear prediction. Enhancement algorithms that rely on a parametric model operate in two steps: first, the presumed model parameters are estimated; next, an optimal filter is constructed to filter out the undesired part described by the model.
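To make the two-step structure concrete, the following sketch shows only the first step: fitting a linear-prediction (autoregressive) model to one frame of speech. The frame length, model order and the use of SciPy's Toeplitz solver are illustrative assumptions, not details taken from this paper.

import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=10):
    """Estimate AR/LPC coefficients of one speech frame (parameter-estimation step)."""
    # Biased autocorrelation up to lag `order`
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf[:order + 1] / len(frame)
    # Solve the Yule-Walker (normal) equations R a = r
    a = solve_toeplitz((acf[:order], acf[:order]), acf[1:order + 1])
    return np.concatenate(([1.0], -a))   # prediction-error filter A(z)

# The second step of such an algorithm would build an optimal filter
# (e.g. Wiener or Kalman) from A(z) and a noise estimate.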
Since the early days of speech enhancement, researchers have faced the problem of the degradation of speech enhancement system performance when such systems are used in adverse environments. This problem is encountered each time a system is used in real-world applications, facing a great diversity of conditions: background noise, channel interference, microphone distortions, and so on. Many solutions have been developed to deal with each problem separately. Classically, these solutions have been classified into two main areas: speech enhancement and model adaptation.
Spectral subtraction offers a simple and computationally efficient approach to the suppression of additive noise in a desired speech signal. This method has been studied extensively for over twenty years [4,5,6]. The research has focused on achieving a higher degree of noise suppression at lower cost; the latter requirement is especially important in hands-free telephony applications. However, the main shortcoming of this method, the difficulty of noise estimation, especially during speech segments, was not overcome for a long time. The key idea of these approaches is to estimate the background noise and then remove this estimate from the noisy speech. The noise characteristics are usually updated during non-speech segments, i.e. a voice activity detector (VAD) is required to separate speech from non-speech sequences. The errors of this updating clearly depend on the VAD quality and, moreover, the update cannot be performed during speech-activity segments. An approach based on the combination of spectral subtraction with iterative Wiener filtering, which we refer to as extended spectral subtraction, addresses this problem: its key feature is the capability to update the background noise estimate during speech activity.
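A minimal sketch of this family of methods is given below: the noise spectrum is estimated from frames that a crude energy-based VAD flags as non-speech and is then subtracted from the noisy magnitude spectrum. The STFT parameters, the 20% energy threshold and the spectral floor are assumptions made only for illustration; the extended scheme described above would additionally update the noise estimate during speech activity.

import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs=8000, nperseg=256, floor=0.02):
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Crude energy VAD: frames in the lowest 20% of energy are treated as noise-only
    energy = mag.sum(axis=0)
    noise_frames = mag[:, energy < np.percentile(energy, 20)]
    noise_mag = noise_frames.mean(axis=1, keepdims=True)
    # Subtract the noise estimate and apply a spectral floor
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced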
Wiener filtering and extended Wiener filtering have also been used in the literature. The general approach of model-based speech enhancement algorithms is to estimate the system model parameters from the noisy observations and then apply an optimal filter based on this model. Early studies focused mainly on Wiener filtering. Lim and Oppenheim proposed an enhancement algorithm for speech contaminated by white Gaussian noise: Wiener filtering is applied in the spectral domain to estimate the speech parameters, and the procedure is iterated to refine the estimates. Feder et al. extended the method to a two-microphone system; the parameter estimation step becomes more complicated with the introduction of coupling systems that are modeled as FIR systems [7-9].
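The iterative Wiener idea can be sketched as follows, assuming for simplicity that a per-bin noise PSD estimate is already available and that a small fixed number of iterations is used; neither assumption comes from this paper.

import numpy as np
from scipy.signal import stft, istft

def iterative_wiener(noisy, noise_psd, fs=8000, nperseg=256, iters=3):
    """noise_psd: per-frequency-bin noise power estimate (length nperseg//2 + 1)."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    noisy_psd = np.abs(Z) ** 2
    speech_psd = np.maximum(noisy_psd - noise_psd[:, None], 1e-10)
    for _ in range(iters):
        gain = speech_psd / (speech_psd + noise_psd[:, None])   # Wiener gain
        speech_psd = (gain ** 2) * noisy_psd                    # refined speech PSD
    _, enhanced = istft(gain * Z, fs=fs, nperseg=nperseg)
    return enhanced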
The problem of reducing the disturbing effects of additive white noise on a speech signal has also been considered when a 'noise reference' is not available. The problem examined here is the enhancement of speech degraded by additive noise under the basic assumption that the enhancement system does not have access to any signal other than the corrupted speech itself. That is, no 'noise reference' signal is available that would allow one to employ classical adaptive noise cancelling. The objective is to obtain higher quality and/or intelligibility of the noisy speech for tasks such as speech compression, speech recognition and speaker verification, by improving the performance of the relevant digital voice processor. The output from the previous Wiener filter iteration is used along with the original data to obtain less muffled-sounding speech.
The enhancement results based on this model are reported to be superior to those of previous algorithms. The aim of this work is to obtain a better state-space model for speech in order to further improve the enhancement results within the Kalman filter formulation [10,11]. A speech model is proposed that emphasizes the voicing characteristics in different frequency intervals: the voiced-unvoiced mixture of the excitation is computed separately for each frequency interval. Band-pass voicing analysis of speech has also been confirmed to improve speech coding algorithms. Based on the proposed excitation model, a state-space model for speech is formulated, and the constructed model is then used in Kalman filtering of the noisy speech. The Wiener filter is used to form an estimate of the input noise, which is then subtracted from the input noisy speech.
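A compact sketch of such a Kalman-filter enhancer is shown below for an AR (linear-prediction) speech model in companion form. The AR coefficients and the process and measurement noise variances are assumed known here; in practice they would be estimated from the noisy signal, and the excitation model discussed above would refine them per frequency band.

import numpy as np

def kalman_enhance(noisy, ar_coeffs, q_var, r_var):
    """Kalman-filter a noisy speech vector using s[n] = sum(a_k s[n-k]) + w[n], y[n] = s[n] + v[n]."""
    a = np.asarray(ar_coeffs, dtype=float)
    p = len(a)
    # Companion-form state transition matrix
    F = np.vstack([a, np.hstack([np.eye(p - 1), np.zeros((p - 1, 1))])])
    H = np.zeros((1, p)); H[0, 0] = 1.0
    Q = np.zeros((p, p)); Q[0, 0] = q_var
    x, P = np.zeros((p, 1)), np.eye(p)
    out = np.zeros(len(noisy))
    for n, y in enumerate(noisy):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the noisy observation
        K = P @ H.T / (H @ P @ H.T + r_var)
        x = x + K * (y - H @ x)
        P = (np.eye(p) - K @ H) @ P
        out[n] = x[0, 0]              # current clean-speech estimate
    return out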


Further improvements in recognition accuracy can be obtained in difficult environments by combining acoustical pre-processing with arrays of multiple microphones. The use of microphone arrays is motivated by the desire to improve the effective SNR of the speech presented to the recognition system. Close-talking microphones, for example, produce higher SNRs than desktop microphones under normal circumstances because they pick up a relatively small amount of additive noise and because the incoming signal is not degraded by reverberated components of the original speech.
III. ADAPTIVE LINEAR NEURON (ADALINE) ENHANCEMENT METHOD

As mentioned above, the main objective of a noise cancellation system is to enhance the speech signal and obtain a clean signal with improved quality. Such systems have been widely used in long-distance telephony applications. However, the presence of noise in speech signals contributes a high degree of inaccuracy to systems that require speech processing, such as speech recognition, synthesis, and speaker identification and verification systems; the desired parametric representation will therefore carry a high amount of inaccuracy. For this reason, we chose the adaptive linear neuron (ADALINE) to perform the task of noise cancellation, based on two considerations: 1) its ability to act as an adaptive filter, and 2) the high processing speed it achieves.
Adaline is certainly an approach worth studying, given that it is the most widely used neural network approach in practical signal processing applications today. Its fast processing speed comes from its simple network architecture with nearly the minimum number of elements. The performance of the Adaline system is best evaluated by comparison with other systems; an alternative speech enhancement system is therefore constructed using the well-known multilayer perceptron (MLP) network trained with the back-propagation algorithm (BPA).
Widrow and Hoff [12] developed and published the learning rule, called the delta rule, which adjusts the weights to reduce the difference between the net input to the output unit and the desired output, resulting in a least mean squared (LMS) error. Adaline networks use this LMS learning rule. The weights on the interconnections of an Adaline network, as well as its bias, are adjustable, and the delta rule is the learning rule used to adapt them. The architecture of an Adaline is shown in Fig. 1.

Figure 1: Architecture of Adaline

The Adaline has only one output unit. This output unit receives input from several units and also from a bias whose activation is always +1. The Adaline thus resembles a single-layer network: it receives input from several neurons and from the bias unit, and the bias weight is trained in the same manner as the other weights. In Fig. 1, an input layer with X1, X2, ..., Xn and a bias, and an output layer with only one output neuron, are created. The links between the input and output neurons carry weighted interconnections.

Wi(new) = Wi(old) + α (t − y_in) xi        (2)

where Wi are the weights, α is the learning rate, and (t − y_in) is the error.
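Written as code, one delta-rule update might look like the following; the symbol alpha stands in for the learning rate and the variable names are illustrative.

import numpy as np

def delta_rule_step(w, b, x, t, alpha=0.05):
    y_in = w @ x + b                 # net input of the Adaline
    error = t - y_in                 # (t - y_in) in Eq. (2)
    w = w + alpha * error * x        # Wi(new) = Wi(old) + alpha*(t - y_in)*xi
    b = b + alpha * error            # bias trained the same way
    return w, b, error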
Fig. 3 presents the results of the Adaline enhancement method: the SNR is improved by about 10 dB. The quality of the filtered signal was measured with a subjective method [20], in which five students were asked to listen carefully and give a judgment. The result shown in Fig. 3 is the filtered signal that the five students judged to be nearly free of noise.

In this paper we perform enhancement on a speech signal with random noise. The implementation adds normalized random noise to the clean speech signal in order to produce the corrupted, or noisy, speech signal shown in Fig. 2.

Figure 2: Structure of the noise cancellation system
The corrupted speech signal is then used as the target signal for the ADALINE. Based on the LMS rule, the ADALINE adapts to cancel out the noise from the noisy signal and produce the clean speech signal. The test speech signal was recorded via a sound card in nearly ideal conditions. The recording, sampled at 8000 Hz, is used as the clean (noise-free) signal, and random noise is added to it to obtain the noisy signal. The noisy speech signal is compared with the noise-free one until we obtain a speech signal that is clear to the listener. The following steps describe the enhancement system used; a minimal code sketch of these steps is given after the step list.
Step 1: Set the learning rate to 0.05.
Step 2: Initialize the weights to 1.2*(rand(1, p) - 0.5)     (1)
        where p is the dimensionality of the input space.
Step 3: Set the target signal to the corrupted speech signal.
Step 4: Set the input signal to the noise-free signal.
Step 5: For each epoch, calculate the output and the error.
Step 6: Adjust the weights of the network.
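The sketch below follows Steps 1-6 with the stated learning rate of 0.05 and the weight initialisation of Eq. (1); the epoch count, the frame length p and the framing of consecutive samples into inputs are assumptions, not values reported in the paper.

import numpy as np

def train_adaline(clean, noisy, p=8, alpha=0.05, epochs=10):
    """Adaline noise-cancellation training: clean frames as input, noisy samples as target."""
    w = 1.2 * (np.random.rand(p) - 0.5)          # Step 2, Eq. (1)
    b = 1.2 * (np.random.rand() - 0.5)
    for _ in range(epochs):                      # Step 5: one pass = one epoch
        for n in range(p, len(clean)):
            x = clean[n - p:n]                   # Step 4: input = noise-free frame
            t = noisy[n]                         # Step 3: target = corrupted sample
            y_in = w @ x + b                     # Adaline output
            err = t - y_in                       # error
            w += alpha * err * x                 # Step 6: delta-rule weight update
            b += alpha * err
    return w, b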

Figure 3: Adaline results. A. Noise-free signal. B. Noisy signal. C. Filtered signal.
IV. FEED FORWARD NEURAL NETWORK (FFNN) ENHANCEMENT METHOD
There are two basic types of networks: those with feedback and those without. In networks with feedback, the output value can be traced back to the input values. In networks without feedback, for every input vector presented to the network an output vector is calculated and can be read from the output neurons; there is no feedback, only a forward flow of information. Networks with this structure are called feed-forward networks [13]. Various nets fall under the feed-forward classification. One of the most popular types of feed-forward network is the back-propagation network; the radial basis function network is also included in this category and is closely related to the back-propagation network, except for a variation in the activation function used.
A multilayer feed-forward back-propagation network with one layer of Z hidden units is shown in Fig. 4; the output unit Y has bias W0k and the hidden unit Z has bias V0k.

Figure 4: Architecture of back propagation network


Both the output units and the hidden units have biases; the bias acts like a weight on a connection from a unit whose output is always 1. From Fig. 4 it is clear that the network has one input layer, one hidden layer and one output layer, although there can be any number of hidden layers. The input layer is connected to the hidden layer, and the hidden layer to the output layer, by means of interconnection weights; as the number of layers grows, the time taken for convergence and for minimizing the error may become very high. The bias is provided for the hidden and output layers to act upon the net input being calculated. The training algorithm of back-propagation involves four stages:
1. Initialization of weights
2. Feed forward
3. Back-propagation of errors
4. Updating of the weights and biases
During the first stage, initialization, the weights are assigned small random values. During the feed-forward stage, each input unit (Xi) receives an input signal and transmits it to each of the hidden units Z1...Zp. Each hidden unit then calculates its activation and sends its signal Zj to each output unit, and each output unit calculates its activation to form the response of the net for the given input pattern. During back-propagation of the error, each output unit compares its activation Yk with its target value tk to determine the error associated with that pattern at that unit. Based on this error, the factor δk (k = 1, ..., m) is computed and is used to distribute the error at output unit Yk back to all units in the previous layer; similarly, the factor δj (j = 1, ..., p) is computed for each hidden unit Zj. During the final stage, the weights and biases are updated using the δ factors and the activations.
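The four stages can be sketched for a single-hidden-layer network as follows; the sigmoid activation, the layer sizes and the learning rate are illustrative assumptions.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bp_epoch(X, T, V, v0, W, w0, alpha=0.1):
    """One epoch of back-propagation over input rows X with target rows T."""
    for x, t in zip(X, T):
        # Stage 2: feed forward
        z = sigmoid(x @ V + v0)                      # hidden activations Zj
        y = sigmoid(z @ W + w0)                      # output activations Yk
        # Stage 3: back-propagate errors
        delta_k = (t - y) * y * (1 - y)              # output-layer factor
        delta_j = (delta_k @ W.T) * z * (1 - z)      # hidden-layer factor
        # Stage 4: update weights and biases
        W += alpha * np.outer(z, delta_k); w0 += alpha * delta_k
        V += alpha * np.outer(x, delta_j); v0 += alpha * delta_j
    return V, v0, W, w0

# Stage 1: initialise weights with small random values, e.g.
# V, v0 = 0.1*np.random.randn(n_in, n_hidden), 0.1*np.random.randn(n_hidden)
# W, w0 = 0.1*np.random.randn(n_hidden, n_out), 0.1*np.random.randn(n_out)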
In this section we use a speech signal recorded in a noisy environment. The recorded signal is used as the input signal and is compared with a noise-free target signal. Fig. 5 presents the results of the FFNN enhancement method: the SNR is improved by about 3 dB. The training here is on-line training, which differs from the off-line training used with the ADALINE network.
V. NEURAL NETWORK AND DISCRETE WAVELET TRANSFORM NOISE CANCELLATION

Figure 5: FFNN results. A. Noise-free signal. B. Noisy signal. C. Filtered signal.

Wavelets are mathematical functions that decompose data into different frequency components, so that each component can be studied with a resolution matched to its scale. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering and seismic geology, and interchanges between these fields during the last ten years have led to many new wavelet applications such as image compression, turbulence, human vision, radar and earthquake prediction. Fig. 7 illustrates a random signal of 1000 samples divided into a number of frequency components (subsignals).
In this section we use the ADALINE as the neural network that filters each subsignal produced by the discrete wavelet transform (DWT) (Fig. 6) [16-18].

Figure 6: Practical noise reduction using the neural net and wavelet transform.

This system relies on the DWT to generate the desired subsignals for the neural network (ADALINE); that is, multiple inputs are applied to the neural network, their number depending on the selected decomposition level. The same process is applied to the noisy signal, which serves as the target. The output of the neural network is then passed to the inverse wavelet transform to reconstruct the enhanced signal. The aim of this method is to filter the speech signal from the noise in selected subsignals of distinct frequency sub-bands (Fig. 8). This greatly assists in eliminating specific frequencies while preserving other frequencies that are essential for many applications, such as automatic speech recognition (ASR).
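A sketch of the scheme in Fig. 6 is given below using PyWavelets: both signals are decomposed, an LMS (Adaline-style) filter is run on each sub-band, and the inverse transform reconstructs the enhanced signal. The choice of wavelet ('db4'), the decomposition level and the LMS parameters are assumptions made only for illustration.

import numpy as np
import pywt

def lms_filter(x, d, p=8, alpha=0.05):
    """LMS-filter reference band x toward desired band d; return the filter output."""
    w = 1.2 * (np.random.rand(p) - 0.5)
    y = np.zeros(len(x))
    for n in range(p, len(x)):
        frame = x[n - p:n]
        y[n] = w @ frame
        w += alpha * (d[n] - y[n]) * frame
    return y

def dwt_adaline_enhance(clean_ref, noisy, wavelet="db4", level=4):
    ref_bands = pywt.wavedec(clean_ref, wavelet, level=level)    # DWT subsignals
    noisy_bands = pywt.wavedec(noisy, wavelet, level=level)
    filtered = [lms_filter(r, n) for r, n in zip(ref_bands, noisy_bands)]
    # Inverse wavelet transform plays the role of the reconstruction block in Fig. 6
    return pywt.waverec(filtered, wavelet)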

Figure 9: PSD of the filtered signal by A. Adaline, B. FFNN, C. neural net and wavelet transform.

Figure 7: Random signal of 1000 samples divided into four levels of discrete wavelet transform subsignals.

To evaluate the quality of the described methods we use the power spectral density (PSD). Comparing the PSD of the noise-free signal with the PSD computed for the filtered signal of each of the three methods (Fig. 9), we observe that the PSD of the noise-free signal is most similar to the PSD of the Adaline-filtered signal. In the quantitative assessment using the mean value and standard deviation, Adaline also performs best (Table 1).
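The PSD comparison and the Table 1 statistics can be sketched as follows with Welch's method; the sampling rate and segment length are assumed.

import numpy as np
from scipy.signal import welch

def compare_psd(clean, filtered, fs=8000, nperseg=512):
    f, psd_clean = welch(clean, fs=fs, nperseg=nperseg)
    _, psd_filt = welch(filtered, fs=fs, nperseg=nperseg)
    # Simple distance between the two PSDs plus Table-1-style statistics
    psd_error = np.mean(np.abs(psd_clean - psd_filt))
    return psd_error, filtered.mean(), filtered.std()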

Table 1: Quantitative assessment of the presented methods

Figure 8: Noisy and filtered signals by the neural net and wavelet transform.

VI. CONCLUSION
In this paper, three methods were tested: the Adaline, the feed-forward neural network (FFNN) enhancement method, and the wavelet transform and Adaline enhancement method. The results obtained showed high quality owing to fast processing and a high signal-to-noise ratio. The tested signal was enhanced by 10 dB with Adaline, 3 dB with FFNN and 8 dB with the wavelet transform and Adaline enhancement method. Comparing the PSD of the noise-free signal with the PSD calculated for each method's filtered signal, the Adaline-filtered signal has the best quality in many cases. The wavelet transform and Adaline enhancement method can be used in speaker identification, where enhancement and feature extraction are both required.
VII. REFERENCES
[1] Alcantara, J. I., et al., "Preliminary evaluation of a formant enhancement algorithm on the perception of speech in noise for normally hearing listeners," Audiology, 33(1), pp. 15-27, 1994.
[2] Kates, J. M., "Speech enhancement based on a sinusoidal model," Journal of Speech and Hearing Research, 37(2), pp. 449-464, 1994.
[3] Warren, R. M., et al., "Spectral restoration of speech: intelligibility is increased by inserting noise in spectral gaps," Perception and Psychophysics, 59(2), pp. 275-283, 1997.
[4] Gautam Moharir, "Spectral subtraction method for speech enhancement," M.Tech. thesis, Department of Electrical Engineering, I.I.T. Bombay, Mumbai, India, Jan. 2002.
[5] Boll, S. F., "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-27, pp. 113-120, Apr. 1979.
[6] Gautam Moharir, Pushkar Patwardhan and Preeti Rao, "Spectral enhancement preprocessing for the HNM coding of noisy speech," Proc. of International Conference on Spoken Language Processing, Sep. 2002.
[7] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 208-211, 1979.
[8] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. II, Detroit, MI, USA, pp. 355-358, May 1993.
[9] Simon Haykin, Adaptive Filter Theory, Prentice-Hall, ISBN 0-13-322760-X, 1996.
[10] Grewal, Mohinder S., and Angus P. Andrews, Kalman Filtering: Theory and Practice, Upper Saddle River, NJ, USA: Prentice Hall, 1993.
[11] Jacobs, O. L. R., Introduction to Control Theory, 2nd Edition, Oxford University Press, 1993.
[12] Widrow, B. and Hoff, M. E., "Adaptive switching circuits," in 1960 IRE WESCON Convention Record, pp. 96-104, 1960.
[13] Hagan, M. T., and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994.
[14] Elisabeth Zetterholm, Voice Imitation: A Phonetic Study of Perceptual Illusions and Acoustic Success, PhD thesis, Lund University, 2003.
[15] D. Gabor, "Theory of communication," Journal of the I.E.E., vol. 93, pp. 429-441, 1946.
[16] P. Goupillaud, A. Grossmann, J. Morlet, "Cycle-octave and related transforms in seismic signal analysis," Geoexploration, 23, pp. 85-102, 1984-1985.
[17] A. Grossmann and J. Morlet, "Decomposition of Hardy functions into square integrable wavelets of constant shape," SIAM J. Math. Anal., vol. 15, pp. 723-736, 1984.
[18] Meyer, "Wavelets," Ed. J. M. Combes et al., Springer-Verlag, Berlin, p. 21, 1989.
[19] D. A. Linbarger, "Parametric and non-parametric methods of improving bearing estimation in narrowband passive sonar systems," PhD thesis, Engineering, Electronics and Electrical, Rice University, 2007.
[20] Lingyun Gu, John G. Harris, Rahul Shrivastav, and Christine Sapienza, "Disordered speech assessment using automatic methods based on quantitative measures," EURASIP Journal on Applied Signal Processing, vol. 2005, no. 9, pp. 1400-1409, 2005.
