Two-Sensor Adaptive Noise Cancellation


Douglas A. Mann, Member, IEEE
Abstract: A comparative performance analysis of adaptive noise cancellation (ANC) using the least-mean-squares (LMS) and optimal Wiener filtering algorithms is presented. Robustness to realistic non-stationary noise environments is explored using simulations in MATLAB. An adaptive noise canceller using the LMS algorithm is also tested in a recording studio.
Index Terms: noise cancellation, Wiener filter, LMS
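[Figure 1: block diagram. The reference input u(n) feeds the adaptive filter, whose output is subtracted from the primary input d(n). The difference is the system output and also the error signal e(n).]
Fig. 1. Two-sensor adaptive noise canceller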
I. INTRODUCTION
The adaptive filtering solution presented uses a two-microphone setup to reduce the background noise in a speech recording. This approach uses one microphone for the primary input signal and a second microphone for the reference noise estimate, as shown in figure 1. The primary signal consists of the relevant speech signal corrupted by a noise signal. The noise signal can be considered as background noise in an acoustic environment or as additive noise contributed by a communications channel. The reference input signal is adaptively filtered and the result is subtracted from the primary signal to obtain a clean speech signal estimate. For optimal performance, it is assumed that the noise and the speech signals are uncorrelated. This assumption also implies that the reference microphone is completely isolated from the speech signal.
In general terms, the goal is to remove the noise component from the primary signal, leaving the clean speech signal. To eliminate the noisy component, it is necessary to determine the transfer function between the reference signal and the noisy component of the primary signal. Although the true order and coefficient values of this transfer function are unknown, an arbitrary Nth-order all-zero model filter can be used to estimate the transfer function. For each Nth-order filter, N optimal filter coefficients exist for the estimation of the transfer function.
In this paper, the filter coefficients are determined using both the Wiener-Hopf solution and the least-mean-squares (LMS) algorithm. As discussed in Section II, the Wiener-Hopf solution provides the exact optimal coefficients for a stationary uncorrelated noise source. While the Wiener-Hopf approach provides the optimal solution, the computational overhead of determining the filter weights can be prohibitive, which leads to other algorithms such as LMS, discussed in more detail in Section IV.
II. WIENER FILTERING
The Wiener-Hopf solution provides the optimal coefficients for the Nth-order all-zero transfer function, but does not indicate whether the filter order is sufficient. Since both the desired clean speech signal and the transfer function are unknown, the best estimate results from minimizing the energy of the difference between the primary signal and the filtered reference signal. As the filter order increases, the energy of this error signal decreases. At a certain point the error levels off and the benefit of additional order becomes insignificant; that order is considered sufficient to represent the transfer function. Since the Wiener-Hopf solution requires the noise to be stationary, the variance is time-invariant, which allows the minimum mean-square error (MSE) to be evaluated.
The desired signal is assumed to be a sinusoid corrupted by a white noise source with zero mean and a variance of 0.5. Figure 2 shows that a filter order of 20 gives a good estimate that minimizes the MSE for this scenario. The flat region shows that the MSE is relatively constant for orders 20 through 50, demonstrating that minimal gains are realized beyond an order-20 transfer function. For the purposes of the Wiener ANC implementation, a 20th-order FIR filter is used to evaluate the performance of the system under different noise conditions. In addition to white Gaussian noise, tests include non-stationary cafeteria, laundry room, and party noise environments.
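The role of the uncorrelated-signals assumption and the origin of the Wiener-Hopf equations can be sketched in a few lines. This is an illustrative derivation rather than part of the original analysis: s(n) (clean speech) and v_0(n) (noise component of the primary signal) are symbols introduced here, all signals are assumed zero mean, and R and p denote the autocorrelation matrix of the reference vector x(n) and its cross-correlation with d(n).

```latex
\begin{align*}
d(n) &= s(n) + v_0(n), \qquad y(n) = \mathbf{w}^T\mathbf{x}(n), \qquad e(n) = d(n) - y(n) \\
E[e^2(n)] &= E[s^2(n)] + E\big[(v_0(n) - y(n))^2\big]
  \quad \text{(} s \text{ uncorrelated with } v_0 \text{ and } \mathbf{x} \text{)} \\
J(\mathbf{w}) &= E[e^2(n)] = \sigma_d^2 - 2\,\mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\,\mathbf{w},
  \qquad \mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)], \quad \mathbf{p} = E[d(n)\,\mathbf{x}(n)] \\
\nabla_{\mathbf{w}} J &= 2\,\mathbf{R}\,\mathbf{w} - 2\,\mathbf{p} = \mathbf{0}
  \;\Rightarrow\; \mathbf{R}\,\mathbf{w}_o = \mathbf{p} \\
J_{\min} &= \sigma_d^2 - \mathbf{p}^T\mathbf{R}^{-1}\mathbf{p}
\end{align*}
```

Because the speech power E[s^2(n)] does not depend on the weights, minimizing the output energy is the same as minimizing the residual noise, and J_min evaluated for increasing filter orders gives a curve of the kind used to choose the order.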
D. A. Mann is a music engineering graduate student at the University of Miami, Coral Gables, FL 33124 USA (e-mail: d.mann@umiami.edu).
[Figure 2: plot of minimum MSE (dB) versus filter order M, for M from 0 to 100.]
Fig. 2. Minimum MSE vs. MA filter order
III. REAL-TIME WIENER
As expected, the Wiener-Hopf algorithm performs quite well in eliminating stationary white noise. In comparison, the algorithm produced greater error in the time-varying, non-stationary noise environments. While the squared error signal was visibly worse for the non-stationary noise, the perceived audio quality was nearly identical to that of the uncorrupted speech signal. In an attempt to improve performance, the time-varying noise environment can be considered stationary over short time periods, or blocks. The signal can be considered wide-sense stationary over each block, with a constant mean and variance. A typical speech processing block size of approximately 20 milliseconds is used to track the changes in the noise statistics. For each block, the input autocorrelation matrix R and the cross-correlation vector p of the input and desired signal are computed. The optimal Wiener filter coefficients for the all-zero block noise filter are then determined from the product of the inverse autocorrelation matrix and the cross-correlation vector, according to (1). After filtering the noise reference through the designed moving average (MA) filter, the result is subtracted from the primary signal to provide a noise-reduced block.
[Figure 3: four panels (white noise, laundry room, cafeteria, party crowd), each plotting the squared error e^2 against time in seconds.]
Fig. 3. Error-square for stationary white noise and non-stationary laundry room, party, and cafeteria noise using the Wiener-Hopf algorithm.
$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p} \tag{1}$$
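The block procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB code used for the results: the function names, the biased correlation estimates, the small diagonal loading term, and the synthetic test signals (a white reference of variance 0.5, a sinusoid standing in for speech, and a hypothetical two-tap noise path that changes halfway through) are all introduced here.

```python
import math
import random

def correlations(d, x, taps):
    """Estimate the Toeplitz autocorrelation matrix R of x and the
    cross-correlation vector p between d and x for one block."""
    n = len(x)
    r = [sum(x[i] * x[i - k] for i in range(k, n)) / n for k in range(taps)]
    p = [sum(d[i] * x[i - k] for i in range(k, n)) / n for k in range(taps)]
    R = [[r[abs(i - j)] for j in range(taps)] for i in range(taps)]
    return R, p

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            f = aug[row][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[row][c] -= f * aug[col][c]
    w = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = sum(aug[row][c] * w[c] for c in range(row + 1, m))
        w[row] = (aug[row][m] - s) / aug[row][row]
    return w

def block_wiener(d, x, taps, block, loading=1e-9):
    """Design one Wiener filter per block, w = R^-1 p, and return
    the noise-reduced signal e(n) = d(n) - y(n)."""
    out = []
    for start in range(0, len(d), block):
        db, xb = d[start:start + block], x[start:start + block]
        R, p = correlations(db, xb, taps)
        for i in range(taps):
            R[i][i] += loading  # assumption: tiny loading keeps R invertible
        w = solve(R, p)
        for n in range(start, start + len(db)):
            # Filter the reference, reusing samples from the previous block at the edge.
            y = sum(w[k] * x[n - k] for k in range(taps) if n - k >= 0)
            out.append(d[n] - y)
    return out

def demo(n=2000, block=500, taps=4, seed=0):
    """Synthetic check: returns noise power before and after cancellation."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(n)]   # white reference
    s = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for speech
    h_first, h_second = [0.8, -0.4], [-0.5, 0.3]              # hypothetical noise paths
    v = []
    for i in range(n):
        h = h_first if i < n // 2 else h_second
        v.append(sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0))
    d = [si + vi for si, vi in zip(s, v)]
    e = block_wiener(d, x, taps, block)
    noise_in = sum(vi * vi for vi in v) / n
    noise_out = sum((ei - si) ** 2 for ei, si in zip(e, s)) / n
    return noise_in, noise_out
```

Calling `demo()` returns the noise power in the primary signal and the residual noise power after block cancellation. Because each block is solved independently, the weights can follow a noise path that changes between blocks, at the cost of noisier estimates for short blocks.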
The results of this approach are summarized in figure 3. It can be seen that the squared error between the pure signal and the noise-cancelled version is significantly greater for the block processing approach than for the analysis of the entire recording. The short-time block allows the time-varying noise characteristics to be tracked, but it is especially sensitive to transient noises such as plates clanking in the cafeteria environment. This author expected the short-time block approach to be better at eliminating the time-varying noise, but the analysis proved otherwise. Decent noise suppression was produced, but for the highly non-stationary cafeteria noise environment there were significant audible artifacts where transient audio events occurred in the noise reference signal. Further investigation shows that the severity of these artifacts increases where the mean and variance change drastically from block to block. The time-varying changes were also observed in spectrographic analysis of the noise signals. Increasing the block size reduces the sensitivity of the Wiener filter coefficients to transients by averaging them out with the dominating overall noise statistics.
For an effective real-time implementation, where the buffering latency is less than 50 milliseconds, the block size can be no larger than approximately 2048 samples at a sampling rate of 44100 samples per second. Using a block size of 2048, the block Wiener algorithm performs well for all noise environments except cafeteria noise. Only after increasing the block size to 4096 samples does the noise suppression become reasonably acceptable, which presents a buffering latency of at least 92 milliseconds, too long for pseudo real-time applications. The squared-error signal for the cafeteria noise environment is shown in figure 4. It can be seen that the error is quite significant at the transient locations when the block size is less than 4096 samples. This finding motivates the use of the LMS algorithm for adaptive noise cancellation.
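As a quick check of the block sizes and latencies quoted in this section, the buffering latency is the block length divided by the sampling rate. The symbols N (block length in samples), f_s (sampling rate), and T (latency) are introduced here for illustration.

```latex
\begin{align*}
T &= \frac{N}{f_s}, \qquad f_s = 44100 \text{ samples/s} \\
N = 882:\quad  T &= 882 / 44100 = 20.0 \text{ ms} \\
N = 1024:\quad T &= 1024 / 44100 \approx 23.2 \text{ ms} \\
N = 2048:\quad T &= 2048 / 44100 \approx 46.4 \text{ ms} \\
N = 4096:\quad T &= 4096 / 44100 \approx 92.9 \text{ ms} \\
N = 8192:\quad T &= 8192 / 44100 \approx 185.8 \text{ ms}
\end{align*}
```

Only the block sizes up to 2048 samples stay under the 50 millisecond target, which is why the 4096-sample block needed for the cafeteria noise is not usable in real time.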
IV. LEAST-MEAN-SQUARES ALGORITHM
As previously mentioned, the Wiener-Hopf solution provides optimal cancellation of stationary noise and performs well even for non-stationary noise. Unfortunately, the Wiener-Hopf solution requires that the entire signal be available to determine the optimal filter coefficients, which is unrealistic for real-time adaptive noise cancellation. As previously discussed, even an acceptable block implementation of the Wiener-Hopf solution requires buffer sizes longer than a real-time implementation allows. The LMS algorithm remedies this problem by adapting the filter weights according to the incoming audio data as it is received. Because of its adaptive nature, the LMS algorithm is robust enough for a variety of signal conditions. The LMS algorithm involves computing the output of a linear filter in response to the noise reference and generating the estimation error between this output and the desired response. The estimation error is then used to adjust the filter weights. The LMS algorithm is summarized below in (2) through (6).
$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,e(n)\,\mathbf{x}(n) \tag{2}$$
where
$$y(n) = \mathbf{w}^T(n)\,\mathbf{x}(n) \tag{3}$$
$$e(n) = d(n) - y(n) \tag{4}$$
$$\mathbf{w}(n) = [\,w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{M-1}(n)\,]^T \tag{5}$$
$$\mathbf{x}(n) = [\,x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-M+1)\,]^T \tag{6}$$
and the average time constant of convergence, in iterations, is approximately
$$\tau \approx \frac{1}{2\mu\lambda_{avg}} \tag{5}$$
[Figure 4: four panels for block sizes of 1024, 2048, 4096, and 8192 samples, each plotting the squared error e^2 against time in seconds.]
Fig. 4. Short-time Wiener filtering for block sizes of 1024, 2048, 4096, and 8192 samples
[Figure 5: four panels (Corrupted, Desired, Output, Error) for μ = 0.001, each plotted against iterations from 0 to 1000.]
Fig. 5. LMS noise cancellation using sinusoidal input, μ = 0.001
To use the LMS algorithm, a filter order must be determined, as well as the step-size parameter μ. To maintain consistency, the filter order remains at M = 20, as in the Wiener-Hopf implementation. The selection of the step-size parameter dictates the rate of convergence as well as the misadjustment. Typically the step-size parameter is small compared to unity. A small μ yields a small misadjustment, but requires more time for convergence. In contrast, a large μ converges quickly, but has greater misadjustment error. To ensure stability, the maximum step-size parameter is given by (7). The initial rate of convergence is dominated by the fastest mode (8), and the final rate of convergence is dominated by the slowest mode (9). This means that convergence can take a long time when μ or λ_min is small, or when the eigenvalue spread is large.
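$$\mu < \frac{1}{\lambda_{\max}} \tag{7}$$
$$\tau \approx \frac{1}{2\mu\lambda_{\max}} \tag{8}$$
$$\tau \approx \frac{1}{2\mu\lambda_{\min}} \tag{9}$$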
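The LMS recursion maps directly onto a short loop. The following is a minimal sketch in plain Python, not the MATLAB code behind the figures: the zero initial weights, the function names, and the synthetic test signals (a white reference of variance 0.5, a low-level sinusoid standing in for speech, and a hypothetical three-tap noise path) are assumptions made for illustration.

```python
import math
import random

def lms(d, x, taps, mu):
    """LMS adaptive noise canceller.

    d: primary input samples, x: reference input samples.
    Returns the error signal e(n), which is the system output,
    and the final weight vector.
    """
    w = [0.0] * taps      # weight vector w(n); zero start is an assumption
    buf = [0.0] * taps    # tap-input vector [x(n), x(n-1), ..., x(n-M+1)]
    errors = []
    for d_n, x_n in zip(d, x):
        buf = [x_n] + buf[:-1]                       # shift in the newest reference sample
        y = sum(wk * xk for wk, xk in zip(w, buf))   # y(n) = w^T(n) x(n)
        e = d_n - y                                  # e(n) = d(n) - y(n)
        w = [wk + 2.0 * mu * e * xk for wk, xk in zip(w, buf)]  # w(n+1) = w(n) + 2 mu e(n) x(n)
        errors.append(e)
    return errors, w

def demo(n=3000, taps=4, mu=0.01, seed=1):
    """Synthetic check: returns final weights and noise power before/after."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(n)]          # white reference
    s = [0.5 * math.sin(2 * math.pi * 0.02 * i) for i in range(n)]  # stand-in for speech
    h = [0.8, -0.4, 0.2]                                            # hypothetical noise path
    v = [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0) for i in range(n)]
    d = [si + vi for si, vi in zip(s, v)]
    e, w = lms(d, x, taps, mu)
    tail = range(n - 1000, n)   # measure after the training period
    noise_in = sum(v[i] ** 2 for i in tail) / 1000
    noise_out = sum((e[i] - s[i]) ** 2 for i in tail) / 1000
    return w, noise_in, noise_out
```

Calling `demo()` returns the adapted weights together with the noise power before and after cancellation over the last 1000 samples. Rerunning it with a smaller or larger `mu` shows the tradeoff between convergence time and misadjustment, and a sufficiently large `mu` makes the weights diverge.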
A simple simulation was developed to demonstrate the ability of the LMS algorithm to remove unwanted noise from a sinusoidal signal. This evaluation shows the number of iterations required for a noisy tonal signal to approximate a pure sinusoid for various step-size parameters. It can be seen in figure 6 that for the large step size μ = 0.1 the filter becomes unstable, resulting in the large output and error signals in the lower two subplots. For a relatively small μ = 0.001, the signal has not yet converged after 1000 iterations, as evidenced by the tapered error signal. With a step size of 0.01 the filter is stable, and its output closely approximates the desired signal by 1000 iterations. The results of the simple simulation give insight into the range of values that give a reasonable convergence time and stability, for further analysis in a more realistic simulation.
[Figure 6: four panels (Corrupted, Desired, Output, Error) for μ = 0.1, each plotted against iterations from 0 to 1000.]
Fig. 6. LMS noise cancellation using sinusoidal input, μ = 0.1
Various trials were conducted for the different noise environments to determine a suitable step-size parameter for all conditions. As expected, each noise environment requires a different duration to reach convergence. For the same step size, the whiter the noise environment, the faster it reaches convergence. According to (7), the upper limit for the selection of μ using the cafeteria noise is approximately 29, which far exceeds the stable range found in simulation. According to the average time constant relation (5), τ ≈ 1/(2μλ_avg), a training period of 1 second would require μ = 0.0016. Step-size parameters of 0.1, 0.01, and 0.001 were used to get a coarse estimate of the performance characteristics. In all cases except white noise, μ = 0.001 was not suitable for most applications, as convergence was not reached until after 5 seconds. In contrast, the selection of μ = 0.1 showed quick convergence for all cases, but has a greater misadjustment. For μ = 0.01, a good compromise was reached for the application of real-time audio noise cancellation. The training period, after which convergence is reached, takes approximately one second for the cafeteria environment. The tradeoff of this longer training period is a reduction in the misadjustment, resulting in a better approximation of the actual minimum mean-square error. Other simulations using tonal input signals resulted in the filter weights becoming unstable for μ = 0.1, so a smaller value gives a greater chance of stability.
The filter weight training for the cafeteria noise is shown in figure 7. Only weights w0 through w4 of the 20 filter weights are shown, to illustrate the weight adaptation clearly. It can be seen that the transient plate clank at approximately one second requires a drastic adjustment of the filter weights and produces an increase in the MSE in the learning curve plot.
[Figure 7: two panels for the cafeteria noise: the weight tracks for w0 to w4 and the learning curve (MSE), each plotted against time in seconds.]
Fig. 7. Weight track and learning curve for cafeteria.wav using μ = 0.01
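The eigenvalues that appear in the step-size limits of this section do not have to be computed explicitly to get a first estimate. The sketch below assumes that the stability bound (7) is read as μ < 1/λ_max and the average time constant as τ ≈ 1/(2μλ_avg). Since the trace of R is the sum of its eigenvalues and every diagonal entry of R equals r(0), tr(R) = M r(0) is never smaller than λ_max, and λ_avg = r(0), the average power of the reference signal. The function name and its arguments are hypothetical.

```python
def step_size_guides(x, taps, tau_iterations):
    """Rough step-size guides from the reference signal power.

    x: reference samples, taps: filter length M,
    tau_iterations: desired average time constant in iterations.
    Returns (conservative upper bound on mu, mu giving the desired time constant).
    """
    r0 = sum(v * v for v in x) / len(x)   # r(0): average power of the reference
    trace = taps * r0                     # tr(R) = M r(0) for a Toeplitz R
    mu_bound = 1.0 / trace                # conservative, since tr(R) >= lambda_max
    lam_avg = trace / taps                # mean eigenvalue = tr(R) / M = r(0)
    mu_for_tau = 1.0 / (2.0 * tau_iterations * lam_avg)
    return mu_bound, mu_for_tau
```

For a one-second training period at 44100 samples per second, `tau_iterations` would be 44100. The trace-based bound is tighter than 1/λ_max, so it errs on the side of stability when the eigenvalue spread is large.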
V. ACTUAL PERFORMANCE
To compare the simulated performance to actual performance, a test was conducted in the Weeks Recording Studio at the University of Miami. A pair of AKG 414 super-cardioid microphones was arranged as a stereo Blumlein pair, as shown in figure 8. The goal was to achieve maximal isolation of the speech signal from the reference signal while minimizing acoustic delay. The mounting configuration of the Blumlein pair seemed a good choice given these constraints. The party noise sound file was played through a loudspeaker directed at the ceiling on the far side of the room to create a diffuse noise environment. The speaker was seated approximately one meter from the primary microphone, on its axis. After several trials, it was determined that there was too much leakage into the reference microphone using this configuration, so movable acoustic isolation panels were added to reduce reflections into the reference microphone, as shown in figures 9 and 10. The final configuration, shown in figure 10, provided good isolation from the speech signal. As in the simulation, the audio used a sample rate of 44100 samples per second and 16-bit resolution.
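Recordings made with two physically separated microphones generally need to be time-aligned before adaptive filtering, and one common way to estimate the inter-channel delay is to locate the peak of the cross-correlation between the channels. The following is a minimal sketch of that idea in plain Python; the function names, the brute-force search over a limited lag range, and the use of the absolute value of the correlation are assumptions made here, not details of the processing used in the test.

```python
import random

def estimate_delay(ref, primary, max_lag):
    """Return the lag L (in samples) that maximizes |sum_n ref(n) * primary(n + L)|.

    A positive result means the primary channel lags the reference;
    a negative result means the reference lags the primary.
    """
    best_lag, best_val = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(primary):
                acc += r * primary[j]
        if abs(acc) > best_val:
            best_lag, best_val = lag, abs(acc)
    return best_lag

def delay(signal, samples):
    """Delay a signal by a non-negative number of samples,
    zero-padding the start and keeping the original length."""
    if samples <= 0:
        return list(signal)
    return ([0.0] * samples + list(signal))[:len(signal)]

def demo(seed=2):
    """Synthetic check: a noise burst and a copy delayed by 7 samples."""
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(300)]
    primary = delay(ref, 7)
    return estimate_delay(ref, primary, 20)
```

Once the lag is known, the earlier channel is delayed by that many samples so that the noise components line up. A lag expressed in samples converts to time by dividing by the sampling rate; for example, 1980 samples at 44100 samples per second is about 44.9 ms.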
The recorded audio was processed using the same algorithm as in the simulation. To account for the acoustic delays, the cross-correlation of the reference signal and the primary signal was computed to determine the sample delay due to acoustic propagation times. Using this method for the final configuration, a delay of 1980 samples, or 44.9 ms, was computed between the reference and primary arrival times. To account for this, the primary signal was delayed by 1980 samples to improve the correlation of the input and reference signals. Despite this adjustment, the results of this test still proved unsatisfactory. It is unclear exactly what led to the failure of the algorithm, but it is likely due to the acoustic echoes inherent to the room. The noise component of the primary signal and the reference noise estimate could in fact be quite different due to reflection and attenuation caused by the acoustic treatments in the room. In further tests, it might be helpful to perform a similar test in an anechoic chamber to prove or disprove this hypothesis. System identification could be performed to account for the filtering characteristics of the room at the reference sensor versus the primary sensor. Once identified, the filter could be used to condition the signal to improve the estimate.
VI. CONCLUSIONS
It has been shown that the Wiener-Hopf method provides the optimal solution, but is not feasible for many real-time implementations. The tracking nature of the LMS algorithm allows a close approximation of the Wiener optimal solution after convergence is reached. The convergence time is governed by the step-size factor, which also controls how closely the mean-square error approaches the optimal value. Under simulation, the LMS algorithm performed quite well for all environments using step sizes less than 0.1. Testing shows that a more sophisticated design would be necessary to implement the adaptive noise canceller in a realistic hardware solution.
Fig. 8. Blumlein pair stereo arrangement shown from speaker perspective between two acoustic isolators.
Fig. 10. Improved separation configuration. Left: primary sensor; right: reference sensor.
Fig. 9. Improved separation configuration (speaker perspective side view)
The noise samples used in this project, as well as the noise-reduced versions of the speech signals, can be downloaded from http://downbitaudio.com/een689final. Also available are stereo audio recordings from the studio session, for takes using different distances and configurations. The left channel is the primary signal and the right channel is the reference signal.
VII. ACKNOWLEDGEMENT
I would like to thank Dr. Paul Mermelstein for his teaching efforts in the EEN698 adaptive filter theory course. Thanks to Joe Abatti for the use of the Weeks Recording Studio to conduct measurements. Special thanks to Sam Drazin for his time and assistance in microphone selection and positioning; without his help I would not have been able to gather the data necessary to evaluate the feasibility of a real-world implementation of the designed LMS algorithm.
