I. INTRODUCTION
The performance of Acoustic Echo Cancellation (AEC) depends
heavily on the platform: more specifically, on the audio
interface, the interface driver and related hardware, any pre-
and post-amplifiers, and the characteristics of the microphone
and speaker. Achieving full-duplex voice communication with
sufficient echo cancellation on a wide variety of platforms is
therefore a research challenge, and the majority of echo
cancellers require platform-specific tuning to provide optimal
performance. The major components of an AEC are the Adaptive
Echo Estimation Filter (AEEF) and the corresponding adaptation
algorithm. The AEEF is usually implemented in an integer
representation, mostly 16-bit and in specific cases 32-bit,
because a fractional-format implementation has high resource
requirements. This restricts AEC performance to a limited
range of Echo Return Loss (ERL) cases. Under high negative
ERL, the filter coefficients may saturate, underflow or
overflow. One simple way to address this problem is to
attenuate the microphone signal used for error estimation.
Because this error drives the adaptation of the AEEF, the
filter coefficients then do not saturate, overflow or
underflow. The error signal is, however, re-scaled to obtain
the actual error before it is processed in the subsequent
blocks. Similarly, when the ERL is very high, say > 30 dB, the
integer representation of the coefficients is not sufficient
to estimate the echo. In this case, the far-end signal fed to
the AEC as reference is attenuated sufficiently for the echo
to be estimated. In a varying echo path environment and for
diverse platforms, dynamic estimation of the attenuations for
the microphone signal and the far-end signal is required for a
general purpose AEC.
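The attenuate-then-rescale idea described above can be sketched as follows. This is only an illustration, assuming the attenuation is a power-of-two right shift applied to integer samples; the function name `scaled_error` and the parameter `mic_shift` are hypothetical and do not come from the paper.

```python
def scaled_error(mic_sample: int, echo_estimate: int, mic_shift: int) -> tuple[int, int]:
    """Return (adaptation_error, rescaled_error) for one sample.

    Assumption: attenuation is a right shift by `mic_shift` bits, and the
    echo estimate comes from a filter adapted on the attenuated signal.
    """
    attenuated_mic = mic_sample >> mic_shift            # attenuate the microphone signal
    adaptation_error = attenuated_mic - echo_estimate   # error that drives AEEF adaptation
    rescaled_error = adaptation_error << mic_shift      # undo the attenuation for later blocks
    return adaptation_error, rescaled_error
```

The filter only ever sees the small `adaptation_error`, which is what keeps integer coefficients away from saturation, while the subsequent blocks receive the error at its original scale.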
Version 1.0
(generally called the error signal). This ST detection
algorithm relies on an AEEF cancellation of at least 6 dB.
Hence, ERL measurement is invoked whenever the power level of
the echo-free signal is at least a predefined threshold of
6 dB lower than the power level of the microphone signal.
Figure 2 illustrates the low-level block diagram of ERL
estimation.
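The 6 dB condition above can be written as a small check. This is a minimal sketch, assuming power values are already available per frame or window; the function name `is_single_talk` and the handling of zero power are assumptions made for illustration.

```python
import math


def is_single_talk(mic_power: float, error_power: float, margin_db: float = 6.0) -> bool:
    """True when the echo-free (error) signal is at least `margin_db` below the mic signal."""
    if mic_power <= 0.0:
        return False   # no microphone energy: nothing to measure
    if error_power <= 0.0:
        return True    # error fully cancelled: cancellation exceeds any margin
    cancellation_db = 10.0 * math.log10(mic_power / error_power)
    return cancellation_db >= margin_db
```

For example, an error power four times smaller than the microphone power corresponds to about 6.02 dB of cancellation and passes the check, while a factor of two (about 3 dB) does not.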
[Figure residue: block diagrams of the AEC with gain tuning
(attenuator, amplifier, NLP & CNG, convergence stability
detector, NLP threshold tuning) and of ERL estimation (frame
energy estimation of x(n), d(n) and e(n); long-term averaging
of Px(l); threshold comparisons; single-talk discrimination;
power loss estimation; ERL confirmation; microphone signal
attenuation; full-duplex and half-duplex outputs).]
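1. Compute the long-term power average of the far-end signal
   and compare it with a predefined threshold.

   $$P_x(l) = \sum_{i=0}^{N-1} x^2(i)$$

   $$P_x^{LT}(l) = \alpha P_x^{LT}(l-1) + (1-\alpha) P_x(l)$$

   where N is the frame size of 10 msec and $\alpha$ is the
   averaging factor. (The squared terms and the symbol
   $\alpha$ are reconstructed; they are not legible in the
   source.)
   a. On success, compute the long-term power averages of the
      microphone signal and the far-end signal, $P_d^L(l)$
      and $P_x^L(l)$.
   b. On failure, the AEC works in full-duplex mode if its
      past mode was full duplex; otherwise it works in
      half-duplex mode.
2. Compute the short-term power of the microphone and far-end
   signals and compare the ratio with a predefined threshold.

   $$P_d(l) = \sum_{i=0}^{N-1} d^2(i), \qquad P_e(l) = \sum_{i=0}^{N-1} e^2(i)$$

   $$P_d^W(l) = \sum_{i=0}^{W-1} P_d(l-i), \qquad P_e^W(l) = \sum_{i=0}^{W-1} P_e(l-i)$$

   where W is a window of 32 frames.
   a. On success, this indicates a single-talk region, and
      the algorithm proceeds with ERL estimation as described
      in the steps below.
   b. On failure, the AEC works in full-duplex mode.
3. Compute the ratio of the long-term power average of the
   microphone signal to the long-term power average of the
   far-end signal.

   $$\mathrm{ERL} = 10 \log_{10} \frac{P_d^L(l)}{P_x^L(l)}$$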
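The steps above can be sketched in a few helper functions. This is a minimal illustration rather than the authors' implementation: taking frame power as a sum of squared samples, the name and value of the averaging factor `alpha`, and all function names are assumptions. The direction of the ERL ratio (microphone over far-end) follows the wording of the final step.

```python
import math
from typing import Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Short-term power of one frame, taken here as the sum of squared samples."""
    return sum(s * s for s in frame)


def long_term_average(previous: float, current: float, alpha: float) -> float:
    """First-order recursive average: alpha * previous + (1 - alpha) * current."""
    return alpha * previous + (1.0 - alpha) * current


def erl_db(mic_power_lt: float, far_power_lt: float) -> float:
    """ERL in dB as the ratio of long-term microphone power to far-end power."""
    return 10.0 * math.log10(mic_power_lt / far_power_lt)
```

In use, `frame_power` would be called once per 10 msec frame for each signal, its result fed through `long_term_average`, and `erl_db` evaluated only in frames that pass the far-end activity and single-talk checks.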
ERL
Logarithmic scale (dB)    Linear scale
-18                       0.0158
-12                       0.0631
-6                        0.2512
-3                        0.5012
0                         1
3                         1.9953
6                         3.9811
12                        15.8489
18                        63.0957
24                        251.1886
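The linear values in the ERL table are consistent with treating the dB figure as a power ratio, that is, linear = 10^(dB/10). A short sketch to regenerate the column (the function name `db_to_linear` is chosen here for illustration):

```python
def db_to_linear(level_db: float) -> float:
    """Convert a power ratio in dB to linear scale: 10 ** (dB / 10)."""
    return 10.0 ** (level_db / 10.0)


# Print the ten ERL levels listed in the table with four decimals.
for level_db in (-18, -12, -6, -3, 0, 3, 6, 12, 18, 24):
    print(f"{level_db:>4} dB -> {db_to_linear(level_db):.4f}")
```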
A. Microphone and Far-end Attenuation Tuning
As the AEEF coefficients are generally represented in integer
format, negative ERL will lead to saturation. So, appropriate
microphone and far-end gain adjustment should be done so that
the filter coefficients do not saturate. The default values of
microphone attenuation and far-end attenuation are provided in
the following Table 2 for different audio modes. These default
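[Table 2: Default microphone and far-end attenuation values
for different audio modes. The row and column labels did not
survive extraction; the surviving values, in source order, are:
Group 1: 4, 0, 0, 1, 2, 5 (split between microphone and
         far-end attenuation unclear)
Group 2: Microphone attenuation 3, 4, 2; Far-end attenuation 1, 1, 1
Group 3: Microphone attenuation 0, 0, 0; Far-end attenuation 0, 2, 5
Group 4: Microphone attenuation 0, 0, 0; Far-end attenuation 3, 5, 6]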
Speaker mode tuning
In general, ERL in speaker audio mode may be lower than 0 dB,
and some platforms have a negative ERL of 24 dB. The default
gain values are set to handle very low ERL cases so that the
filter can work over a wide range of ERL, up to a negative ERL
of 24 dB. However, high attenuation of the microphone signal
affects the
Handset mode tuning
In general, ERL in handset audio mode is more than 12 dB,
i.e., the echo power level at the microphone is less than
four times the far-end signal power level at the handset
speaker. The default tunable values support
[Figure residue: flowchart of NLP threshold tuning.
Recoverable blocks: AEEF adaptation counter (adpt_cntr) with a
20 ms timer; DT/noise/near-end-alone versus ST-alone decision;
echo path filter parsing; dominant effective echo region;
modeled echo path variation parameter; NLP threshold tuning;
NLP threshold confirmation; attenuation update.]
In addition to gain tuning with respect to ERL estimation, if
the convergence stability detector indicates any saturation in
the AEEF coefficients, gain tuning is done accordingly to
avoid saturation of the AEEF coefficients for echo
cancellation.
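One way to picture saturation-driven gain tuning is the sketch below. It is an illustration under stated assumptions, not the paper's detector: 16-bit coefficients, saturation defined as a coefficient sitting at the int16 limit, a maximum attenuation step of 6, and a relax rule based on `low_fraction` of full scale are all hypothetical choices, as is the function name.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def update_mic_attenuation(coeffs, attenuation, max_attenuation=6, low_fraction=0.25):
    """Return the new attenuation step: increment, retain or decrement.

    Assumptions (not from the paper): 16-bit coefficients, saturation means
    a coefficient sits at the int16 limit, and attenuation is relaxed when
    the largest coefficient uses less than `low_fraction` of full scale.
    `coeffs` must be non-empty.
    """
    peak = max(abs(c) for c in coeffs)
    saturated = any(c >= INT16_MAX or c <= INT16_MIN for c in coeffs)
    if saturated:
        return min(attenuation + 1, max_attenuation)   # increment
    if peak < low_fraction * INT16_MAX and attenuation > 0:
        return attenuation - 1                         # decrement
    return attenuation                                 # retain
```

The three return paths mirror an increment / retain / decrement decision on the microphone attenuation; the exact criteria would have to come from the convergence stability detector.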
[Figure residue: gain tuning flowchart. Recoverable blocks:
ERL estimator module; "Frozen?" decision; microphone
attenuation increment, retain or decrement; process next
frame.]
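[Equation and figure residue from the NLP threshold tuning
algorithm. The step numbers 1 to 8 survive without their text.
The recoverable content is given below; squared terms and the
symbols $\alpha$, $\mathrm{MEPV}_s$ and $\mathrm{MEPV}_{lt}$
stand in for notation that is not legible in the source.]

The energy of the modeled echo path h(n) is computed over
segments:

$$E(i) = \sum_{n=p}^{p+4K} h^2(n), \qquad 1 \le i \le 10,$$

with p = 1, K*4, ..., k*K*4 and k = 1, 2, ..., up to a limit
involving N and fs whose exact form is not recoverable. For
the present discussion K = 3.

The total energy and the dominant segment energies are

$$E_T(l) = \sum_{i=1}^{10} E(i)$$

$$E_d(l, p) = \max_{i = 1, \ldots, 10} E(i), \qquad 1 \le p \le 3.$$

Acoustic regions distinguished: noise/near end, double talk,
single-talk echo.

[Figure residue: plot of the impulse response (amplitude
versus samples) with the dominant effective echo region (DEER)
marked.]

The short-term Modeled Echo Path Variation (MEPV) parameter is
formed from the deviation of the parsed echo path filter from
its average:

$$\mathrm{MEPV}_s(l) = \sum_{i=1}^{3} \left( \tilde{W}(i) - W_{avg}(i) \right)^2$$

The long-term MEPV is obtained by recursive averaging:

$$\mathrm{MEPV}_{lt}(l) = \alpha \, \mathrm{MEPV}_{lt}(l-1) + (1-\alpha) \, \mathrm{MEPV}_s(l)$$

where $\alpha$ is the long-term averaging factor.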
b. Region B, in which the long-term MEPV lies between the
first and second thresholds: partial-duplex mode,
c. Region C, in which the long-term MEPV lies between the
second threshold and 0: full-duplex mode,
d. Also, when the far end is active but the impulse response
is not modeled at all, the MEPV estimate is very low. This
leads to echo leakage because of the low NLP threshold value;
in such cases the NLP threshold is set to maximum to achieve
half-duplex communication.
ii. If any double talk or near-end data appears while the
long-term MEPV is in the same region, the counter is reset.
iii. If the long-term MEPV changes frequently to other
regions, the counters of those regions are incremented. At the
end of the 400 ms time frame, the NLP threshold is updated
with the value corresponding to the region with the maximum
counter value.
iv. For case iii above, the counters of the other regions are
reset after 400 ms.
12. The NLP threshold confirmation module verifies whether
the NLP threshold parameter is frozen.
a. If frozen, the NLP threshold tuning parameter value is
held and the algorithm stops.
b. If not frozen, the NLP threshold tuning parameter is
updated and the algorithm is repeated.
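The region-based update of the NLP threshold described in the steps above can be sketched as follows. This is a hedged illustration only: the threshold values, the averaging factor `alpha`, the assumption that values above the first threshold map to half duplex, and the choice of 20 updates per 400 ms window (one update per 20 ms) are all placeholders, and the counter reset on double talk is not modeled. All names are hypothetical.

```python
from collections import Counter


def smooth_mepv(previous_lt, short_term, alpha=0.9):
    """Long-term MEPV as a first-order recursive average of the short-term value."""
    return alpha * previous_lt + (1.0 - alpha) * short_term


def classify_region(mepv_lt, upper_threshold, lower_threshold):
    """Map the long-term MEPV to a duplex region (threshold values are placeholders)."""
    if mepv_lt > upper_threshold:
        return "half"      # assumed region above the first threshold
    if mepv_lt > lower_threshold:
        return "partial"   # Region B
    return "full"          # Region C, down to 0


class RegionVoter:
    """Counts region decisions over a window and reports the most frequent one."""

    def __init__(self, window_updates=20):   # assumed: 20 updates of 20 ms = 400 ms
        self.window_updates = window_updates
        self.counts = Counter()
        self.updates = 0

    def update(self, region):
        """Add one decision; return the winning region when the window ends, else None."""
        self.counts[region] += 1
        self.updates += 1
        if self.updates < self.window_updates:
            return None
        winner = self.counts.most_common(1)[0][0]
        self.counts.clear()    # counters are reset after the window
        self.updates = 0
        return winner
```

A caller would smooth the short-term MEPV each update, classify it, pass the region to `RegionVoter.update`, and retune the NLP threshold only when a region is returned at the end of a window.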
V. CONCLUSION
An automatic AEC tuning algorithm is proposed. The proposed
algorithm estimates the ERL and tunes the attenuation factors
of the microphone and far-end signals for higher echo
cancellation. It also estimates the convergence stability of
the modeled echo path in order to update the NLP threshold
parameter and provide echo-free output. Based on the platform
characteristics and the modeled echo path, the tuned NLP
threshold parameter triggers full-duplex, marginal-duplex or
half-duplex voice communication.