
Uppsala University

Department of Engineering Sciences


Adaptive Signal Processing 5p spring 2006

Acoustic Echo Cancellation

Authors:
Aleksandar Jovanovic
Kalle Nilvér
Patrik Söderberg
Magnus Broberg

Abstract
This paper describes Matlab implementations of acoustic echo cancellation algorithms and
analyses of the broader systems involved. It is the result of a project in the course Adaptive
Signal Processing at Uppsala University. It focuses on the Normalized Least Mean Square
(NLMS) algorithm and the Variable Impulse Response Double Talk Detector (VIRE DTD).
Stereophonic Acoustic Echo Cancellation (SAEC) is discussed, and we recommend topics for
further work on this project.

Acknowledgements
Professor Andreas Jakobsson at Karlstad University has developed an assignment [1] for a course
in adaptive signal processing which clearly illustrates the effects of acoustic echo cancellation. It
includes implementations of the NLMS algorithm and of the Geigel DTD. We have used the
Matlab code as a starting point for our work and have also used the sounds included, since our
own recordings produced strange results, perhaps due to some downsampling that we made.
Andreas was also kind enough to mail us an article about the Fast Normalized Cross Correlation
(FNCC) DTD and to answer a question on the behavior of the VIRE DTD.

Mikael Sternad has been our supervisor and we have received a lot of support from him.

Daniel Aronsson showed us a useful way of plotting data in a timed sequence, a technique we
subsequently used to analyze how the filter adapted. The technique also clearly showed the
effects of adaptation while double talk was present. He also suggested a loop method for
extracting a suitable threshold for double-talk detection.

Lars Johan Brännmark helped out when we had problems measuring the room impulse response.
After a short effort on his part the measurements worked correctly.

Simon Mika, Simon Moritz and Carl-Johan Larsson made some progress with this project last
year and some of our work is based on their results.

[1] Åhgren, Per and Jakobsson, Andreas (2006) Course material for a course in adaptive signal processing at Karlstad University

Table of contents
1 Conclusion
2 Introduction
3 Background
4 System overview
5 Filter adaptation
5.1 LMS
5.2 NLMS
6 Talk detection
6.1 Far-end talk detection
6.2 Double talk detection
6.2.1 Geigel
6.2.2 VIRE DTD
6.2.3 Other
7 Comfort noise generator
8 Measuring room impulse response
9 Stereophonic Acoustic Echo Cancellation (SAEC)
10 Real time implementation
11 Views on further development
12 Figure index
13 Matlab code segment index
14 Mathematical formula index
15 Subject index
16 Bibliography
17 Appendix
17.1 aec.m - For acoustic echo cancellation
17.2 ir.m - For calculating impulse response of a room

1 Conclusion
There are many ways of solving the acoustic echo cancellation problem, and the “market” is
flooded with algorithms for both adaptation and double-talk detection. We opted for the VIRE
DTD algorithm proposed by Per Åhgren [2], but a stable implementation was very difficult and
its behavior was somewhat unpredictable.

This year we improved substantially on last year’s work, making our solutions better in all
previously implemented areas and implementing all remaining parts of a complete AEC solution.
The results were very good, but there are of course things that could be improved in a system as
complicated as this. The next step might be to cut down the computation time and take the step to
a full real-time system.

The results of our system can be viewed in Figure 1.

Figure 1: SPCLAB result window showing, from top to bottom, far-end talk, near-end talk, microphone
pickup, filtered signal, double-talk detection, far-end talk detection and finally an indication of when
adaptation is taking place.

[2] Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation

spclab( xF(indV), v(indV), y(indV), e(indV), isDT(indV)*max(e)/2,
isFET(indV)*max(e)/2, isAdapt(indV)*max(e)/2 );
Matlab code segment 1: Using spclab to plot the results.

The performance of our filter can be viewed in Figure 2 and Figure 3.

Figure 2: ERLE plot through Matlab command
plot( smooth(-10*log10((((e(1:100000))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 5000) );

Figure 3: ERLE plot with NLP through Matlab command
plot( smooth(-10*log10((((e(1:100000).*not(isNLP(1:100000)))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 1000) );
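The quantity plotted in Figures 2 and 3 is the Echo Return Loss Enhancement (ERLE): the power of the microphone signal relative to the power of the residual error, in dB. As a minimal illustration (a Python sketch rather than the report's Matlab; the function name erle_db and the toy signals are our own):

```python
import numpy as np

def erle_db(y, e, eps=1e-12):
    """Echo Return Loss Enhancement in dB: power of the microphone
    signal y relative to the residual error e. Higher is better."""
    y = np.asarray(y, dtype=float)
    e = np.asarray(e, dtype=float)
    return 10.0 * np.log10((np.sum(y ** 2) + eps) / (np.sum(e ** 2) + eps))

# A canceller that removes 99% of the echo amplitude gives ~40 dB ERLE.
y = np.sin(np.linspace(0.0, 20.0, 8000))  # stand-in microphone signal
e = 0.01 * y                              # residual after cancellation
print(round(erle_db(y, e)))  # -> 40
```

The Matlab commands above compute the same ratio sample by sample and smooth it, which is why the figures show ERLE evolving over time.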

2 Introduction
This report is the result of a project course in adaptive signal processing at Uppsala University.
The aim of the project is to improve last year’s research on acoustic echo cancellation. More
specifically, the tasks were to implement adaptation algorithms in such a way that the results
exceeded those of the previous group using one microphone and one loudspeaker, and to further
examine the possibility of using two microphones and two loudspeakers. The tool used is
primarily Matlab.

3 Background

The problem of acoustic echo cancellation arises in hands-free telephony and teleconferencing
systems. In early telephony the microphone and loudspeaker were separated and no sound could
propagate between the loudspeaker and the microphone, so no echo was transmitted back. Using
a hands-free loudspeaker telephone, however, the sound from the loudspeaker is picked up by the
microphone and transmitted back to the sender, who perceives it as an echo. This severely
reduces conversation quality, even at very small echo delays.

Figure 4: A telephone conference using an IP-telephony system.

In a room with no propagation delay and no room impulse response (i.e. a studio with dampening
walls and the microphone placed right at the loudspeaker), the solution would simply be to
subtract the input (far-end talk), which is readily available, from the signal picked up by the
microphone, which consists of both near-end talk and far-end talk. After the subtraction the
output signal would consist of near-end talk only. This is not possible in practice, however, since
the room both alters the sound and spreads it over time. Using IP-telephony, as illustrated in
Figure 4, this spread over time varies according to the delay in the network, so IP-telephony
introduces even more problems. Due to these problems the input must be modified accordingly
before we subtract it. The problem is that the parameters according to which it should be
modified are unknown. This is where adaptive filtering technology comes in. The adaptive filter
adjusts according to inputs and outputs to estimate the parameters according to which the input
must be modified if the subtraction is to be useful.

Acoustic echo cancellation also introduces a second problem: detecting when nothing but far-end
talk is entering the microphone, and when other things are entering as well. The adaptation
algorithm uses only one measurement, the difference between the modified input and the real
input. If this difference is zero, and no near-end talk is present, the filter will be an exact copy of
the room impulse response and hence work as intended. If, at this time, there is near-end talk, the
difference will be equal to the near-end talk and the filter algorithm will interpret this as an error
in the filter. The filter will therefore adapt to cancel out the near-end talk as well and, as a result,
it will cease to work.

The same techniques are used in data networking where there are also problems with echoes.

4 System overview
To solve the acoustic echo problem the setup in Figure 5 was used.

Figure 5: System overview. The following notations are used: v(t) = white noise, ĝ(t) = adaptive coloring filter,
n̂(t) = comfort noise, NLP = Non-Linear Processor, e(t) = error signal, d(t) = estimated echoic signal,
ĥ(t) = adaptive filter, y(t) = uncorrected output, s(t) = near-end talk, n(t) = ambient near-end noise.

The goal is to mimic h(t), the room impulse response, with the adaptive filter ĥ(t). The Comfort
Noise Generator and the NLP are used to further improve the output, but they are not an essential
part of the adaptive filtering problem.

5 Filter adaptation
A filter is something that transforms data to extract useful information from a noisy environment.
In digital filtering there are two primary types: infinite impulse response (IIR) and finite impulse
response (FIR). An IIR filter can normally achieve the same filter characteristics as a FIR filter
using less memory and fewer calculations, at the cost of possibly becoming unstable. As the
filters become more complex, though, IIR filters need more parameters and the advantages are
reduced. Because of the high complexity of the many strong and sharp peaks in a room impulse
response, the filters used in acoustic echo cancellation are usually of the FIR type [3].

For the filter in an acoustic echo canceller to work in the real case, with changing parameters
such as different room acoustics and people moving around in the room, a filter with adapting
parameters (taps) is necessary.

Figure 6: Filter adaptation.

There are numerous adaptive algorithms applicable to acoustic echo cancellation, such as least
mean squares (LMS), recursive least squares (RLS), the affine projection algorithm (APA) and
various derivatives thereof. LMS is an old, simple and proven algorithm which has turned out to
work well in comparison with newer, more advanced algorithms. In this project we use
normalized LMS (NLMS) for the main filter and LMS for the noise generation.

5.1 LMS
In 1959 Widrow and Hoff introduced the LMS algorithm. Over the years this has been by far the
most used adaptive filtering algorithm, for several reasons: it was the first, it requires relatively
little computation, and it works well, at least for slow changes in the filter.

The LMS filter uses a gradient method of steepest descent to adapt its weights to minimize the
cost function c(n) defined in Mathematical formula 1.

c(n) = E( e(n)² )

Mathematical formula 1: Function c(n) to be minimized.

[3] Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models Offer Better Modeling Capabilities than Their FIR Counterparts?

where e(n) is defined in Figure 5. In comparison to other algorithms LMS is relatively simple, as
it requires neither correlation function calculations nor matrix inversions; for each sample in the
signal LMS uses only two multiplications and two additions per filter tap.

h_(n+1)(i) = h_n(i) + (µ/2) · e(n) · x(n − i)

Mathematical formula 2: Adjustment of taps using the LMS algorithm.

The taps are adapted as shown in Mathematical formula 2, where h, e and x are defined in Figure
5 and µ is the step length, chosen between zero and one over the largest eigenvalue of the
correlation matrix of the input.

if(isNT(k)) % If there is No Talk, adapt comfort noise filter


% Adapt using LMS
CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end
Matlab code segment 1: LMS implementation.

5.2 NLMS
The primary disadvantage of the LMS algorithm is its slow convergence rate, which is the result
of the static step length µ. In NLMS, µ is normalized by the energy of the signal vector as in
Mathematical formula 3, and the algorithm therefore achieves a much faster convergence rate
than LMS at a low cost. To avoid division by zero, a small number σ is often added to the energy.

µ_NLMS(n) = µ0 / (σ + x(n)ᵀ · x(n))

Mathematical formula 3: Step length adjustment using the NLMS algorithm.

h_(n+1)(i) = h_n(i) + µ0 / (σ + x(n)ᵀ · x(n)) · e(n) · x(n − i)

Mathematical formula 4: Adjustment of taps using the NLMS algorithm.

The performance of NLMS has been satisfying, as shown in Figure 7.

Figure 7: The error declines with time as the adaptive filter converges towards the room impulse response.
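To make the update rule concrete, here is a minimal NLMS system-identification sketch in Python (the report's actual code is in Matlab; the three-tap toy "room", the step size and all variable names are illustrative assumptions). It applies Mathematical formula 4 sample by sample and recovers a known impulse response from a noise excitation:

```python
import numpy as np

def nlms(x, d, n_taps, mu0=0.5, sigma=1e-6):
    """NLMS system identification: adapt h_hat so that filtering x
    through h_hat tracks the desired signal d (Mathematical formula 4)."""
    h_hat = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)                   # newest sample first
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        e[n] = d[n] - h_hat @ x_buf            # error signal
        # normalized step mu0 / (sigma + x^T x), then LMS-style update
        h_hat += mu0 / (sigma + x_buf @ x_buf) * e[n] * x_buf
    return h_hat, e

rng = np.random.default_rng(0)
h_room = np.array([0.5, -0.3, 0.1])            # toy "room impulse response"
x = rng.standard_normal(5000)                  # far-end excitation
d = np.convolve(x, h_room)[:len(x)]            # echo at the microphone
h_hat, e = nlms(x, d, n_taps=3)
print(np.allclose(h_hat, h_room, atol=1e-3))   # -> True
```

With no near-end signal present the error shrinks towards zero and the filter converges to the room response, which is the behavior Figure 7 shows for the real system.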

6 Talk detection
Talk detection is used for deciding when to activate the NLP and when to adapt the filter ĥ(t),
among other things. There are two types of talk detection: far-end talk detection and double-talk
detection.

6.1 Far-end talk detection


The far-end talk is detected by measuring the power of the signal and comparing it to some
threshold. The threshold is calculated dynamically as shown in Matlab code segment 2.

% Calculate the far-end detection threshold.


temppow = zeros(floor(length(xF)/fs)-1, 1);
for k = 1:length(temppow),
temppow(k) = xF((k-1)*fs+1:k*fs)' * xF((k-1)*fs+1:k*fs) / fs;
end
% FE threshold is 1/10 of the average one-second block power
farendThres = mean(temppow) / 10;
Matlab code segment 2: Dynamic calculation of the far-end detection threshold.

Measurement and comparison is then made as in Matlab code segment 3.

energyOfSignalBlock = signalBlock' * signalBlock;
powerOfSignalBlock = energyOfSignalBlock / filterLength;
isFET(k) = (powerOfSignalBlock > farendThres);
Matlab code segment 3: Measurement of far-end talk power and comparison to calculated threshold.

This method results in the detection that is shown in Figure 8.

Figure 8: Far-end talk detection.

6.2 Double talk detection


One of the more difficult parts of acoustic echo cancellation is knowing when to stop adapting
the filter. The filter must be adapted only when there is far-end talk alone, not when there is both
far- and near-end talk. The near-end talk would make the system estimation process fail and
produce extremely erroneous results. It is therefore necessary to detect when there is both far-end
and near-end talk. This is what is called double-talk, and for that a double-talk detector (DTD) is
needed.

However, there is a problem in the real case where near-end talk is not available by itself but only
in combination with far-end talk in the microphone signal. The difficulty is to distinguish the
different sub-signals and to know which is which.

There are several solutions to this problem and we have chosen to implement two of these and
study them in terms of performance and computational complexity.

Figure 9: Double talk detection. When both far-end talk and near-end talk are present a detection variable
(marked with a blue line) is set.

6.2.1 Geigel
The Geigel algorithm is a very simple DTD with low computational complexity. It is based on
the assumption that the far-end talk has lower power than the near-end talk when the signal
reaches the microphone. The room will most likely have dampened the far-end signal, and the
volume on the speaker is, with any luck, not turned up too much. In practice we form a decision
variable as shown in Mathematical formula 5.

d(t) = |y(t)| / max{ |x(t)|, …, |x(t − n)| }

Mathematical formula 5: Geigel decision variable.

If d(t) becomes larger than some predetermined threshold there is double-talk.
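A minimal sketch of the Geigel decision rule in Python (illustrative only; the window length n and the threshold value here are assumptions, not the values used in our Matlab implementation):

```python
import numpy as np

def geigel_dtd(x, y, n=128, thresh=0.71):
    """Geigel detector (Mathematical formula 5): declare double talk
    when |y(t)| exceeds thresh times the largest recent far-end
    magnitude max(|x(t)|, ..., |x(t-n)|)."""
    is_dt = np.zeros(len(y), dtype=bool)
    for t in range(len(y)):
        recent_max = np.abs(x[max(0, t - n):t + 1]).max()
        is_dt[t] = np.abs(y[t]) > thresh * recent_max
    return is_dt

# Far-end talk echoed at half amplitude; near-end talk joins at t = 100.
x = np.ones(200)            # far-end signal
y = 0.5 * x                 # echo picked up by the microphone
y[100:] += 1.0              # near-end speaker starts here
flags = geigel_dtd(x, y)
print(flags[:100].any(), flags[100:].all())  # -> False True
```

The detector stays silent while the microphone only carries the attenuated echo, and trips as soon as the near-end contribution pushes |y(t)| above the scaled far-end maximum.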

We implemented this and got it to work very well when the power of the far-end signal was
significantly lower than that of the near-end signal. This was an acceptable solution to the
double-talk problem, but the application areas we aimed at, with unknown speaker and
microphone positions, demanded a more flexible solution.

6.2.2 VIRE DTD


The Variance Impulse Response algorithm (VIRE) calculates the variance of the maximum value
of the recent taps in the adaptive filter.

Figure 10: The development of the taps during filter adaptation. When double talk is present the taps diverge
from the average.

If the variance exceeds some threshold, which could be varied over time, we have double-talk. In
other words, if the estimated room impulse response changes a lot, we assume that it is not the
room that has changed, but that some other source of sound has appeared. The formula is
somewhat complicated,

σ_γ²(t) = λ · σ_γ²(t − 1) + (1 − λ) · (γ − γ̄(t))²
γ̄(t) = λ · γ̄(t − 1) + (1 − λ) · γ
γ = max{ |ĥ_1|, …, |ĥ_n| }

Mathematical formula 6: The VIRE algorithm.

with γ̄(t) being the running mean of γ, and λ a forgetting factor [4]. The calculation is very
lightweight, as it needs only five multiplications.

forgettingFactor = 0.97; % Forgetting factor for VIRE.


Matlab code segment 4: Eventually the forgetting factor was set to 0.97. Tests showed promising results using
0.5 as well.

We got this algorithm to work very well, but it demanded certain tweaking that seemed to be
input specific. Especially λ, the forgetting factor, was a challenge to understand and optimize. To
get good results we needed to change it a lot, and we couldn’t find a good way to estimate it over
time.

% VIRE DTD
if k > 10,
tap(k) = max(abs(tempFilter));
tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
    variance(k) = forgettingFactor*variance(k-1) + ...
        (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
end

if(k > 10*filterLength && variance(k) > vireThres && k+DTMemory < length(xF))
isDT(k : k+DTMemory) = 1;
end
Matlab code segment 5: VIRE DTD algorithm.

Because the calculation is very lightweight, however, it fit our purpose best of the algorithms we
read about, so we decided to go with it. Nonetheless, there are some other algorithms worth
looking at.

[4] Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation

Figure 11: The VIRE variance. When it exceeds the threshold (marked with a red line) it triggers the detector
(marked with a blue line).

6.2.3 Other
There are several other ways of detecting double talk. The Cheap Normalized Cross Correlation
(CNCR) algorithm, for example, is based on comparing the variances of the estimated signal and
the measured signal.

It might be a good idea to go with another DTD algorithm if the real-time implementation goal is
dropped; it would have saved us a lot of time if we had chosen an easier algorithm, and that might
have given us better results as well.

7 Comfort noise generator


The purpose of the Comfort Noise Generator (CNG) is to create synthetic noise while the
controller unit has shut down the output from the system. The reason one would want to shut
down the output is that it is unnecessary to transmit data when the near end is silent, as that
would mean that only far-end sound was fed back to the sender. By having a Non-Linear
Processor (NLP) it is possible to stop this sound from being transmitted. The NLP is activated
when there is far-end talk but no double talk, and hence no near-end talk, as shown in Matlab
code segment 6.

isNLP(k) = ( not(isDT(k)) && isFET(k) );
Matlab code segment 6: Setting status of NLP.

However, if nothing at all were transmitted, the user on the other end might suspect that the line
has gone down. To avoid this, comfort noise is sent instead. This noise is colored according to the
background noise in the room. This is done through an adaptive filter ĝ(t), which is calculated by
LMS with the error signal used by the main filter ĥ(t) as an adaptive parameter.


Figure 12: This figure shows the activation of the NLP according to far-end talk and near-end talk. The NLP
should be activated when there is far-end talk but no near-end talk.

White noise is created with the wgn (White Gaussian Noise) command in Matlab. We set the
strength of this noise statically (assigning -27 to a parameter specifying power in decibels
relative to a watt) as shown in Matlab code segment 7, but we would rather set it dynamically
according to the intensity of the ambient noise in the room. This has, however, proven very
difficult, and therefore the noise level is adjusted to suit our equipment. If other equipment is to
be used, this parameter may have to be altered.

whiteNoise(k) = wgn(1,1,-27);
Matlab code segment 7: The creation of white noise.

The coloring filter ĝ (t ) is updated if there is near-end talk as shown in Matlab code segment 8.

if(isNT(k))
% Adapt using LMS
CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end
Matlab code segment 8: Adaptation of comfort noise coloring filter.

Over time the filter will adapt to model the noise that is present in the near-end room, as
illustrated in Figure 13.

Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples, respectively. Notice the slight
divergence at 158000.

Finally, a block of white noise samples generated as in Matlab code segment 7,
whiteNoiseBlock, is filtered through the coloring filter CNGFilter if the NLP is activated,

if(isNLP(k))
comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
e(k) = comfortNoise(k);
end
Matlab code segment 9: Coloring of comfort noise.

and the comfort noise is set as the output. Figure 14 shows what the colored noise looks like at
the times the NLP is active, that is, comfortNoise(k). This result is added to the output signal.


Figure 14: The generated comfort noise as the NLP turns off the microphone output. The noise filter diverges
somewhat between 150000 and 165000 samples, when undetected near-end talk creates a louder noise level,
but then it starts to converge again.

8 Measuring room impulse response


When the adaptive filter works optimally it converges towards the impulse response of the LEM
system. A measured room impulse response is therefore useful for comparison with the filter.
Determining the impulse response is also essential for having control over all the factors of the
simulation and for being able to vary them.

To measure the impulse response of a filter (in our case a room) there are several methods that
can be used: one can record the echo of an impulse such as a loud bang, one can use sine waves
of all the different frequencies as input and see what the system does, or one can record what
comes out of the system when white noise is used as input. In the latter two cases the response of
the system is deconvolved with the input, and the resulting signal is the impulse response.

Figure 15: Impulse response in room 1116 at Magistern.

Using sine waves will produce the best result, but going through all the relevant frequencies is a
time-consuming task. For the impulse method to work optimally, an infinitely short pulse with
infinite height would be needed. A loud bang such as a clap or a balloon popping would work,
but in most cases would not give a very good result. The final method of recording white noise
has the potential of giving a good result, while it can quite easily be realized using Matlab and
doesn't require anything other than a computer equipped with a microphone and a speaker. This
is the method we have chosen.

To be able to use division in the frequency domain instead of deconvolution in the time domain,
white noise with constant power is needed. This is accomplished by generating the noise in the
frequency domain with random phase.

r=ones(1,fftLen/2+1); % constant amplitude
arg=[0 2*pi*rand(1,fftLen/2-1) 0]; % random phase
X_0_nyq=r.*exp(i*arg); % complex noise in freq. domain
X=[X_0_nyq conj(fliplr(X_0_nyq(2:end-1)))]; % mirror
x=ifft(X); % transform to time domain
Matlab code segment 10: Noise generation courtesy of Lars-Johan Brännmark.

Further improvement of the result can be achieved by using the mean of multiple recordings.
The periodic signal is played back and recorded at the same time, after which the recording is
divided into periods again.

periods = 32;
for j=1:periods
xx(((j-1)*fftLen+1):(j*fftLen)) = x; % make signal periodic
end
r = audiorecorder(fs,16,1);
sound(xx, fs); % play noise
recordblocking(r,length(xx)/fs); % record
noiserec = getaudiodata(r);
for j=1:periods
x_rec(j,:) = noiserec(((j-1)*fftLen+1):(j*fftLen)).'; % split into blocks
end
Matlab code segment 11: Recording of noise.

Finally, the impulse response is calculated by dividing the recording by the noise in the frequency
domain, as presented in Matlab code segment 12.

for j=1:periods
X_REC = fft(x_rec(j,:)); % transform recorded noise
ir(j,:) = ifft(X_REC./X); % determine one impulse response
end
impulse_response = mean(ir);
Matlab code segment 12: Calculate impulse response.
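The whole measurement chain can be sanity-checked in simulation. The Python sketch below (illustrative; the report's code is in Matlab, and the toy "room" is an assumption) generates constant-magnitude random-phase noise as in Matlab code segment 10 and recovers the impulse response by frequency-domain division as in Matlab code segment 12. Because the excitation is exactly periodic, playing it through the room corresponds to a circular convolution, which is what makes the plain FFT division exact:

```python
import numpy as np

fft_len = 256
rng = np.random.default_rng(1)

# White noise with constant magnitude and random phase, exactly
# periodic with period fft_len (as in Matlab code segment 10).
arg = np.concatenate(([0.0], 2 * np.pi * rng.random(fft_len // 2 - 1), [0.0]))
X_half = np.exp(1j * arg)                                  # amplitude 1
X = np.concatenate((X_half, np.conj(X_half[1:-1][::-1])))  # mirror spectrum
x = np.fft.ifft(X).real                                    # time-domain noise

# A toy "room". Playing the periodic noise through it is a circular
# convolution, so the recording can be simulated in the frequency domain.
h_room = np.zeros(fft_len)
h_room[:4] = [1.0, 0.6, -0.3, 0.1]
rec = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_room)).real

# Divide the recording by the noise in the frequency domain
# (as in Matlab code segment 12) to recover the impulse response.
ir = np.fft.ifft(np.fft.fft(rec) / X).real
print(np.allclose(ir, h_room))  # -> True
```

In the real measurement the recording also contains microphone noise, which is why averaging the division result over many periods, as in Matlab code segment 12, improves the estimate.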

9 Stereophonic Acoustic Echo Cancellation (SAEC)

In teleconferencing systems, stereo sound offers a better user experience than a mono system. It
gives the users the possibility to distinguish between different voices by determining which
loudspeaker delivers the sound. But cancelling the acoustic echo that comes from two speakers
into two microphones, which is required if we want stereo sound, turns out to be a very complex
problem.

Figure 16: A typical Loudspeaker Enclosure Microphone (LEM) setup in the stereophonic case.

One microphone signal, y1(t), can be modeled with the following equation:

y1(t) = h11 ∗ x1(t) + h12 ∗ x2(t) + n(t)

Mathematical formula 7: Microphone signal.

where ∗ denotes convolution, h11 and h12 are the room impulse responses from the two speakers
to the microphone, and n(t) is the noise of the room. The other microphone signal can be modeled
similarly since the system is symmetrical. This makes it at least four times as computationally
heavy as the mono case.

One big problem with SAEC is what is commonly referred to as the non-uniqueness problem [5].
This problem arises from the fact that the signals x1(t) and x2(t) are highly correlated, since they
originate from the same source, and from the fact that in a typical scenario where you would like
stereo sound, different people speak alternately [6]. The algorithms used must track both near-end
and far-end changes in the echo paths, which is not easy since they can change drastically when
another person starts talking. It is therefore important to keep the room impulse response estimate
very close to the real room impulse response before the paths change, which demands a fast
adaptive filter, and that is of course very hard to accomplish [7].

[5] Sankaran, Sundar G. (1999) On Ways to Improve Adaptive Filter Performance
[6] Yukawa, Masahiro, Murakoshi, Noriaki and Yamada, Isao (2005) Efficient Fast Stereo Acoustic Echo Cancellation Based on Pairwise Optimal Weight Realization Technique
[7] Åhgren, Per (2004) Stereophonic Acoustic Echo Cancellation

There are different ways to approach SAEC and the non-uniqueness problem. One is to reduce
the correlation of the input signals without adding audible distortion to the signal. One way to do
this is to introduce a nonlinearity in one of the input signals. One could also add random noise to
the input channels; a possible way to do this without destroying the sound would be to add noise
that humans can't hear.
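One concrete decorrelation method discussed in the SAEC literature is to add a small half-wave-rectified copy of each channel to itself, using opposite halves for the two channels. The Python sketch below is illustrative only (we did not implement this; the distortion amount alpha is an assumption). It shows that the nonlinearity lowers the inter-channel correlation while leaving the signals mostly intact:

```python
import numpy as np

def add_nonlinearity(x, alpha=0.5, positive=True):
    """Add a half-wave-rectified copy of the signal to itself:
    x' = x + alpha*(x + |x|)/2 (positive half) or
    x' = x + alpha*(x - |x|)/2 (negative half)."""
    half = (x + np.abs(x)) / 2 if positive else (x - np.abs(x)) / 2
    return x + alpha * half

rng = np.random.default_rng(2)
s = rng.standard_normal(10_000)           # common far-end source signal
x1 = add_nonlinearity(s, positive=True)   # left channel distortion
x2 = add_nonlinearity(s, positive=False)  # right channel, opposite half
rho = np.corrcoef(x1, x2)[0, 1]
print(rho < 1.0)  # -> True: correlation drops below 1
```

Even this modest decorrelation helps the adaptive filters distinguish the two echo paths, at the price of a small, usually inaudible, distortion.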

It seems very hard to solve the SAEC problem, and extremely hard to implement it in a
practically usable way. It would be very computationally heavy, and the different user scenarios
that could come up make SAEC a big, if not impossible, challenge for next year's group.

10 Real time implementation


A sampling rate of 8000 Hz was chosen because, by the Nyquist theorem, a band-limited
analogue signal can be sampled at a frequency double its bandwidth and be recovered. Thus, by
sampling at 8 kHz, a voice signal with a bandwidth of 4 kHz can be sampled and recovered.
Simulations have shown that a filter with 1024 taps is suitable for modeling room impulse
responses [8].

Echo cancellation is a very demanding process [9]. We did not succeed in implementing a
real-time software echo canceller running natively on a PC with the help of the Matlab software.
In our case, using a 1024-tap adaptive filter and a sampling rate of 8 kHz, the time needed to
process 25 seconds of speech was approximately 39 seconds, and the time taken for the NLMS
algorithm to converge was about 10 seconds.

Real-time implementation is possible through the use of custom Very Large Scale Integration
(VLSI) processors or Digital Signal Processors (DSP). These processors are specially designed
for signal processing tasks, their computational power is very high10, and they can process
instructions in parallel. DSP programs are driven by hardware interrupts: sampling at 8 kHz, the
program is interrupted every 125 µs by the next sample. On a 40 MHz DSP each instruction
takes 25 ns to complete, so there are 125 µs/25 ns = 5000 machine cycles available for the echo
canceling calculation before the next sample arrives11.
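The cycle budget above can be checked directly: dividing the sample period by the instruction cycle time is the same as dividing the DSP clock rate by the sampling rate.

```python
# Machine cycles available per sample on a 40 MHz DSP sampling at 8 kHz.
# (125 us sample period) / (25 ns cycle time) == clock_hz / sample_rate_hz.
sample_rate_hz = 8_000
clock_hz = 40_000_000

cycles_per_sample = clock_hz // sample_rate_hz
print(cycles_per_sample)  # 5000
```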

8
Berkeman, Anders and Owall, Viktor, Architectural Tradeoffs for a Custom Implementation of an Acoustic Echo
Canceller
9
Raghavendran, Srinivasaprasath (2003) Implementation of an Acoustic Echo Canceller Using Matlab
10
Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
11
Chong Chew, Wee and Boroujeny, Farhang (1997) Software Simulation and Real-time Implementation of Acoustic
Echo Cancelling

Figure 17: A process diagram for handling incoming data continuously.

A multiplication has several times the complexity of an addition, so only multiplications are
considered when choosing a suitable DSP. The division used by the NLMS update has a high
complexity, but it occurs far less frequently than the multiplications in this algorithm and is
therefore considered negligible in the analysis.
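A rough count of the multiplications per sample makes the argument concrete. The constants below are our own naive illustration (an optimized implementation would, e.g., update the signal energy recursively), not figures from the report:

```python
# Approximate multiplications per sample for an N-tap NLMS filter:
# filtering, signal-energy computation, and the tap update each cost
# on the order of N multiplications, while the single division is a
# constant cost and is ignored here.
def nlms_mults_per_sample(n_taps):
    filtering = n_taps      # s(k) = x' * w
    energy = n_taps         # x' * x (could be made recursive, ~2 mults)
    update = n_taps + 2     # (mu0 / energy) * e(k) * x, plus scalars
    return filtering + energy + update

n = 1024
fs = 8_000
mults_per_second = nlms_mults_per_sample(n) * fs
print(mults_per_second)  # 24592000, i.e. ~24.6 million mults/s
```

At roughly 25 million multiplications per second for 1024 taps at 8 kHz, it is clear why the multiplication count dominates the choice of DSP.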

Unlike Matlab, where data is represented in floating point, DSP implementations store data with
finite precision. Unnecessarily large word lengths result in larger arithmetic blocks and larger
memories; such extra hardware consumes power without any performance gain. Therefore, all
signals should use a minimal word length. On the other hand, since the word length also
determines resolution and dynamic range, there is a trade-off between performance and power
consumption. It is important to keep signals wide enough to avoid overflow and rounding errors,
or at least to keep the probability of such events to a minimum.
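The word-length trade-off can be illustrated with a common 16-bit fixed-point format. The sketch below is our own example (the Q15 format and helper names are not from the report): quantization introduces a bounded rounding error, while values outside the representable range overflow and must be saturated.

```python
# Q15 fixed point: 16 bits, one sign bit, values in [-1, 1) with a
# resolution of 2**-15. Wider words shrink the rounding error but cost
# hardware and power; narrower words risk overflow.
Q15_SCALE = 1 << 15  # 32768

def to_q15(x):
    """Quantize a float to Q15, saturating on overflow."""
    i = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, i))  # saturate

def from_q15(i):
    return i / Q15_SCALE

# Rounding error is bounded by half a quantization step:
err = abs(from_q15(to_q15(0.3)) - 0.3)
assert err <= 0.5 / Q15_SCALE

# Overflow: 1.5 saturates to the largest representable value.
assert to_q15(1.5) == Q15_SCALE - 1
```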

11 Views on further development


• Keep algorithms as simple as possible
• Continue the investigation and implement real-time adaptation
• Pay less attention to the Stereophonic Acoustic Echo Cancellation problem

12 Figure index
Figure 1: SPCLAB result window. .................................................................................................. 5
Figure 2: ERLE plot......................................................................................................................... 6
Figure 3: ERLE plot with NLP ........................................................................................................ 7
Figure 4: A telephone conference using an IP-telephony system.................................................... 8
Figure 5: System overview. .............................................................................................. 9
Figure 6: Filter adaptation.............................................................................................................. 10
Figure 7: The error decline. ........................................................................................................... 12
Figure 8: Far-end talk detection..................................................................................................... 13
Figure 9: Double talk detection...................................................................................................... 14
Figure 10: The development of the taps during filter adaptation................................................... 15
Figure 11: The VIRE variance....................................................................................................... 17
Figure 12: Activation of NLP according to far end talk and near end talk.................................... 18
Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples. ........................ 19
Figure 14: The generated comfort noise as the NLP turns off the microphone output. ................. 20
Figure 15: Impulse response in room 1116 at Magistern. ............................................................. 21
Figure 16: A typical LEM setup in the stereophonic case ............................................................. 23
Figure 17: A process diagram for handling incoming data continuously..................................... 25

13 Matlab code segment index


Matlab code segment 1: Using spclab to plot the results................................................................. 6
Matlab code segment 2: Dynamic calculation of the far-end detection threshold......................... 12
Matlab code segment 3: Measurement and comparison of far-end talk power. ............................ 13
Matlab code segment 4: Setting the forgetting factor.................................................................... 16
Matlab code segment 5: VIRE DTD algorithm. ............................................................................ 16
Matlab code segment 6: Setting status of NLP.............................................................................. 18
Matlab code segment 7: The creation of white noise. ................................................................... 19
Matlab code segment 8: Adaptation of comfort noise coloring filter............................................ 19
Matlab code segment 9: Coloring of comfort noise....................................................................... 20
Matlab code segment 10: Noise generation courtesy of Lars-Johan Brännmark. ......................... 21
Matlab code segment 11: Recording of noise................................................................................ 22
Matlab code segment 12: Calculate impulse response. ................................................................. 22

14 Mathematical formula index


Mathematical formula 1: Function c(n) to be minimized. ............................................................. 10
Mathematical formula 2: Adjustment of taps using LMS algorithm............................................. 11
Mathematical formula 3: Step length adjustment using NLMS algorithm.................................... 11
Mathematical formula 4: Adjustment of taps using NLMS algorithm. ......................................... 11
Mathematical formula 5: Geigel decision variable........................................................................ 14
Mathematical formula 6: The VIRE algorithm.............................................................................. 16
Mathematical formula 7: Microphone signal................................................................................. 23

15 Subject index
CNG, 17
DSP, 24, 25
DTD, 2, 3, 17
ERLE, 6, 7, 26
Geigel, 3, 15
LEM, 23
NLMS, 2, 3, 11, 18, 25
NLP, 17, 18, 20
Nyquist theory, 24
Real time implementation, 24
room impulse response, 3, 20, 23, 24
SAEC, 2, 22, 23, 24
VIRE DTD, 3
WGN, 18

16 Bibliography
1. Åhgren, Per and Jakobsson, Andreas (2006) Course material for the course in adaptive signal
processing at Karlstad University
http://www.it.kau.se/ee/utbildning/kurser/tel614/Download.html

2, 4, 10. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation
http://www.ahgren.com/publications/phdthesis.pdf

3. Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models
Offer Better Modeling Capabilities than Their FIR Counterparts?
http://www.telecom.tuc.gr/Greek/Liavas/publications/Acoustic%20Echo%20Cancellation%20Do
%20IIR%20Models%20Offer%20Better%20Modeling%20Capabilities%20than%20Their%20FI
R%20Counterparts.pdf

5. Sankaran, Sundar G. (1999) On Ways to Improve Adaptive Filter Performance
http://scholar.lib.vt.edu/theses/available/etd-122099-153321/unrestricted/Chapter07.pdf

6. Yukawa, Masahiro, Murakoshi, Noriaki and Yamada, Isao (2005) Efficient Fast Stereo Acoustic
Echo Cancellation Based on Pairwise Optimal Weight Realization Technique
http://www.hindawi.com/GetPDF.aspx?doi=10.1155/ASP/2006/84797

7. Åhgren, Per (2004) Stereophonic Acoustic Echo Cancellation
http://www1.shellkonto.se/ahgren/research_saec.html

8. Berkeman, Anders and Owall, Viktor, Architectural Tradeoffs for a Custom Implementation of
an Acoustic Echo Canceller
http://www.norsig.no/norsig2002/Proceedings/papers/cr1125.pdf

9. Raghavendran, Srinivasaprasath (2003) Implementation of an Acoustic Echo Canceller Using
Matlab
http://etd.fcla.edu/SF/SFE0000169/Raghavendran_thesis.pdf

11. Chong Chew, Wee and Boroujeny, Farhang (1997) Software Simulation and Real-time
Implementation of Acoustic Echo Cancelling
http://www.ece.mtu.edu/ee/faculty/rezaz/index_files/Seminapapers2004/Kashulpatel.pdf

17 Appendix

17.1 aec.m - For acoustic echo cancellation

%clear; close all;


disp('Initialize...')

filterLength = 1000; % Length of adaptive filter.


forgettingFactor = 0.97; % Forgetting factor for VIRE.
mu0 = 1; % Step size parameter.
DTMemory = 1000; % Number of samples of inactivity after a DT.
firstDoubleTalk = 10*filterLength; % Where to start double talk detection
noiseStrength = -27; % Statically assigned noise strength

% Load data files.

% % Uses our soundfiles


% OUR_SOUND_FROM = 5; %in seconds
% OUR_SOUND_TO = 15; %in seconds
%
% xF = wavread( 'kalle.wav' ) * 100000;
% xE = wavread( 'kalle_room.wav' ) * 100000;
% v = wavread( 'magnus_tyst.wav' ) * 100000;
% xF = downsample(xF, 5);
% xE = downsample(xE, 5);
% v = downsample(v, 5);
%
% OUR_SOUND_FROM = 44100/5 * OUR_SOUND_FROM + 1;
% OUR_SOUND_TO = 44100/5 * OUR_SOUND_TO;
%
% xF = xF(OUR_SOUND_FROM:OUR_SOUND_TO);
% xE = xE(OUR_SOUND_FROM:OUR_SOUND_TO);
% v = v(OUR_SOUND_FROM:OUR_SOUND_TO);
% fs = 44100/5;

% Uses test soundfiles


xF = readData( 'FarEnd.pcm' );
h = [0 1 -0.8 0.3 -0.1 0.1 -0.5 0.3 0];
% xE = filter(h,[1],xF);
% filterLength = 9;
xE = readData( 'FarEndEcho.pcm' );
v = readData( 'NearEnd.pcm' );
fs = 8000;
y = xE + v;

% calculate the variance of the voice, in the real case we don't have this
% so we use a statically assigned value instead
varianceOfVoice = var(v);
% Rough estimate of a suitable threshold for VIRE DTD; should be done as an
% estimate in the loop, since in the real case we don't have xF in advance
vireThres = (mu0*varianceOfVoice / (2 - mu0))*mean(1/norm(xF)^2);

% the number of filters to save for plotting
numberOfSaves = 100;
saveAtSample = floor(length(xF)/numberOfSaves);

% Initialize adaptive filtering.
e = zeros( size(xE) ); % Error signal.
s = zeros( size(xE) ); % Estimated echo signal.

signalBlock = eps*ones(filterLength,1); % Adaptive filter state for time t.
if (exist('filterTaps') == 0 || not(length(filterTaps) == filterLength))
filterTaps = eps*ones(filterLength,1); % Adaptive filter weights.
end
tempFilter = eps*ones(filterLength,1); % Adaptive filter weights.
saveFilter = eps*ones(filterLength,numberOfSaves);
saveTempFilter = eps*ones(filterLength,numberOfSaves);

tap = zeros( size(xE) ); % the maximum tap of the filter over time
tapmean = zeros( size(xE) ); % the mean of the max taps
variance = zeros( size(xE) ); % the variance of the max taps

isDT = logical(zeros( size(xE) )); % is double talk
isFET = logical(zeros( size(xE) )); % is far-end talk
isAdapt = logical(zeros( size(xE) )); % is adapt filter
isNLP = logical(zeros( size(xE) )); % is non linear processor
isNT = logical(zeros( size(xE) )); % is near-end talk only (preallocated; set in the loop)

comfortNoise = zeros( size(xE) );


CNGFilterLength = 100;
if (exist('CNGFilter') == 0 || not(length(CNGFilter) == CNGFilterLength))
CNGFilter = eps*ones( CNGFilterLength,1 );
end
saveCNGFilter = zeros(CNGFilterLength,numberOfSaves);
whiteNoise = wgn(length(xF),1,noiseStrength); %zeros( size(xE) );
whiteNoiseBlock = eps*ones( CNGFilterLength, 1 );

% Calculate the far-end detection threshold, not for the real case
% should be done in the loop using fethres = max(fethresh, power(last fs samples))
% or statically assigned
temppow = zeros(floor(length(xF)/fs)-1, 1);
for k = 1:length(temppow),
temppow(k) = xF((k-1)*fs+1:k*fs)' * xF((k-1)*fs+1:k*fs) / fs;
end
farendThres = max(temppow) * 1/10; % FE threshold is 1/10 of the maximum block power

% alternative way of calculating far end threshold
%farendThres = max(smooth(xF.^2, fs)) / 10;

% Perform the adaptive filtering.
disp('Perform adaptive filtering...')
q = waitbar(0,'Adapting filter... Please hold on...');
for k = 1:length(xF),
% whiteNoise(k) = wgn(1,1,noiseStrength); % generate white noise continously

% Update the filter signalBlock.
if k > filterLength,
signalBlock = flipud(xF(k-filterLength+1:k));
whiteNoiseBlock = flipud(whiteNoise(k-CNGFilterLength+1:k));
end
s(k) = signalBlock' * filterTaps; % Estimated echo value.
e(k) = y(k) - s(k); % Prediction error.

energyOfSignalBlock = signalBlock' * signalBlock;
powerOfSignalBlock = energyOfSignalBlock / filterLength;
isFET(k) = (powerOfSignalBlock > farendThres);

% Always adapt temp filter using NLMS
tempFilter = filterTaps + (mu0/(energyOfSignalBlock + 1)) * e(k) * signalBlock;

% VIRE DTD
if k > 1,
    tap(k) = max(tempFilter);
    tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
    variance(k) = forgettingFactor*variance(k-1) + (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
end

if (variance(k) > vireThres && k > firstDoubleTalk)
    isDT(k : min([k+DTMemory length(xF)])) = 1;
end

isNLP(k) = isFET(k) && not(isDT(k));
isAdapt(k) = isNLP(k);
isNT(k) = not(isFET(k)) && not(isDT(k));

if (isNT(k))
% Adapt comfort noise filter using LMS
CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end

if (isNLP(k))
comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
e(k) = comfortNoise(k);
end

if ( isAdapt(k) )
% use the temp filter on the signal (in the next iteration)
filterTaps = tempFilter;
end

% save filters regularly for plotting
if (mod(k, saveAtSample) == 0)
    saveCNGFilter(:, k/saveAtSample) = CNGFilter;
    saveFilter(:, k/saveAtSample) = filterTaps;
    saveTempFilter(:, k/saveAtSample) = tempFilter;
    waitbar( k/length(xF), q ); % update progress bar
end
end
close(q);

% Display the far-end, near-end, measured signal, and the result
% after removing the estimated echo.
indV = 1:length(xF);
spclab( xF(indV), v(indV), y(indV), e(indV), isDT(indV)*max(e)/2, ...
    isFET(indV)*max(e)/2, isAdapt(indV)*max(e)/2 );
figure;
plot (isFET*max(xF), 'DisplayName', 'Far-end voice detection');
title('Far-end voice detection');
hold all;
plot (xF, 'DisplayName' , 'Far-end voice signal');
hold off;
figure;
plot (isDT*max(xF), 'DisplayName' , 'Double talk detection' );
title('Double talk detection');
hold all;
plot (abs(xF), 'DisplayName', 'Far-end voice signal');
plot (-1*abs(v), 'DisplayName' , 'Near-end voice signal');
hold off;

figure;
plot (y, 'DisplayName' , 'Recorded signal');
title('Resulting signal');
hold all;
plot (e, 'DisplayName' , 'Echo cancelled signal');
hold off;

figure;
semilogy (isDT*mean(variance)*5, 'DisplayName', 'Double talk detector');
title('Double talk variance/threshold');
hold all;
semilogy (smooth(variance,1000), 'DisplayName', 'VIRE variance');
semilogy (vireThres*ones(length(xF),1), 'DisplayName', 'Double talk threshold');
hold off;
figure;
plot (isNLP*max(xF), 'DisplayName', 'Non linear processor' );
title('NLP');
hold all;
plot (abs(xF), 'DisplayName', 'Far end signal');
plot (-abs(v), 'DisplayName', 'Near end signal');
hold off;
figure;
plot (comfortNoise, 'DisplayName' , 'CNG' );
title('CNG');
hold all;
plot (isNLP*max(comfortNoise), 'DisplayName', 'Non linear processor');
hold off;
figure;
plot (tap, 'DisplayName' , 'Largest tap' );
title('Tap development');
hold all;
plot (variance, 'DisplayName', 'Variance of largest tap');
hold off;
figure;
hold off;

for k=1:length(saveFilter(1,:))
plot(saveFilter(:,k));
axis([1 filterLength -1 1]);
title(k);
drawnow;
pause(0.1);
end

figure;
hold off;
for k=1:length(saveCNGFilter(1,:))
plot(saveCNGFilter(:,k));
title(k);
drawnow;
pause(0.1);
end

% Plot ERLE using NLP:
plot( smooth(-10*log10((((e(1:100000).*not(isNLP(1:100000)))).^2+eps) ./ ...
    ((y(1:100000)).^2+eps)+eps), 1000) );

% Plot ERLE not using NLP:
plot( smooth(-10*log10((((e(1:100000))).^2+eps) ./ ...
    ((y(1:100000)).^2+eps)+eps), 5000) );

17.2 ir.m - For calculating impulse response of a room

% Hi!
%
% A trick I usually use is to generate a periodic noise signal, where the period
% must be longer than the room impulse response (an ordinary living room
% typically has an impulse response of about 0.5 s). Then compute the FFT
% block-wise, with block length = the period of the noise. To avoid losing
% accuracy, it is important that the magnitude of X is approximately 1 at all
% frequencies, which is achieved by generating the noise in the frequency
% domain, with constant magnitude and random phase, for example like this:

pause(30)

fs=44100; % Sampling frequency
fftLen=fs*1; % Impulse response no longer than 0.5 s; block length becomes fs/2

% Constant magnitude between 0 Hz and Nyquist
r=ones(1,fftLen/2+1);
% Phase angle 0 at 0 Hz and Nyquist, the rest random, uniformly
% distributed between 0 and 2*pi
arg=[0 2*pi*rand(1,fftLen/2-1) 0];
% Generate complex noise in the frequency domain
X_0_nyq=r.*exp(i*arg);
% Mirror and conjugate the spectrum between Nyquist and 2*pi
X=[X_0_nyq conj(fliplr(X_0_nyq(2:end-1)))];

x=ifft(X); % Compute the time sequence (should be real if everything works as intended)
x=0.99*x/max(abs(x));
periods = 32;
xx = zeros(1, periods*fftLen); % Periodize, here 32 periods
for j=1:periods
    xx(((j-1)*fftLen+1):(j*fftLen)) = x;
end

% For the identification, use fs/2 points in the FFT. Do as you have done before, but now
% for each block, and then average. The noisiness can be reduced further if you
% cut away any initial silence at the beginning of the recorded noise, or
% average over more periods.

r = audiorecorder(fs,16,1);
sound(xx, fs);
recordblocking(r,length(xx)/fs);
noiserec = getaudiodata(r);

x_rec = zeros(periods,fftLen); % initialize array for the periodic recording
ir = zeros(periods,fftLen); % initialize array for the impulse responses
for j=1:periods
    x_rec(j,:) = noiserec(((j-1)*fftLen+1):(j*fftLen)).'; % split the recording into blocks
    X_REC = fft(x_rec(j,:));
    ir(j,:) = ifft(X_REC./X); % determine the impulse response
    % [c,b] = deconv(x_rec(j,:),x);
    % ir(j,:) = c;
end

% the final impulse response
ir_avg = mean(ir);
