Beamforming
Using Uniform Circular Arrays
for Distant Speech Recognition
in Reverberant Environment and
Double-Talk Scenarios
conducted at the
Signal Processing and Speech Communication Laboratory
Graz University of Technology, Austria
by
BSc Hannes Pessentheiner, 0573063
Supervisors:
Dipl.-Ing. Dr.sc.ETH Harald Romsdorfer
MSc Dr.techn. Tania Habib
Assessors/Examiners:
Univ.-Prof. Dipl.-Ing. Dr.techn. Gernot Kubin
Statutory Declaration
I declare that I have authored this thesis independently, that I have not used other than the
declared sources/resources, and that I have explicitly marked all material which has been quoted
either literally or by content from the used sources.
date
(signature)
Statutory Declaration (German version)
I declare in lieu of oath that I have written this thesis independently, that I have not used sources/resources other than those stated, and that I have marked as such all passages taken literally or in content from the sources used.
Graz, date
(signature)
Acknowledgement
Abstract
Beamforming is crucial for hands-free mobile terminals and voice-enabled automated home environments based on distant-speech interaction, because it mitigates causes of system degradation, e.g., interfering noise sources, room reverberation, closed-loop feedback problems, and competing speakers. The objective of this thesis is to find the most common and state-of-the-art broadband beamformers which are able to attenuate or eliminate the competing speaker in double-talk scenarios and which are compatible with the uniform circular microphone array or, if they are not, to make them compatible. Moreover, a new beamformer for improved spatial filtering in reverberant environments is introduced. Another objective is to design a MATLAB framework that simplifies the implementation of different microphone array geometries and beamformers, and that evaluates their performance and the quality of the corresponding enhanced output signals numerically and graphically by considering different objective measures, e.g., a word recognizer based on a simple grammar and a limited dictionary that covers all words appearing in the CHiME corpus and in the audio signals used in this work. For the evaluation, speech signals are played back synchronously and separately by two loudspeakers in a reverberant environment, recorded by a uniform circular microphone array, and subsequently filtered by different beamformers.
Zusammenfassung (English translation)
Nowadays, beamforming is an important component in telecommunications and voice control for suppressing disturbances such as unwanted noise sources, competing speakers, reverberation, or feedback loops. The aims of this thesis are to find beamformers that are compatible with a circular microphone array, to adapt beamformers that are not compatible, and to develop a new beamformer that better suppresses the competing speaker in a double-talk scenario. A further aim is to create a simulation and evaluation environment in MATLAB that makes it easy to integrate different microphone array geometries and beamformers and to assess, graphically and numerically, the quality of the beamformers and of the signals they filter. In addition to the well-known quality measures for the filtered signals, a speech recognizer is used that is based on a simple grammar and recognizes a fixed number of different words which are defined in the CHiME corpus and occur in the audio signals used. To evaluate beamformer performance and the quality of the filtered signals, selected speech signals were played back by two loudspeakers in a reverberant room, recorded with a circular microphone array, and then filtered with the available beamformers.
Contents
1 Introduction
  1.1 Introduction
  1.2 Motivation
  1.3 Objective
2 Sound Capture with Microphone Arrays
  2.1 Capturing Sound
    2.1.1 The Wave Propagation
    2.1.2 The Wavevector-Frequency Domain
    2.1.3 Beamforming with Microphone Arrays
  2.2 Channel Mismatch
  2.3 Gain Self-Calibrating Algorithms
  2.4 Short-Time Stationarity of Speech
3 Microphone Arrays
  3.1 Uniform Linear Array
    3.1.1 The Array Response
    3.1.2 The Beam Pattern
    3.1.3 Spatial Aliasing and Grating Lobes due to Phase Ambiguities
    3.1.4 Characteristics of the Uniform Linear Array
  3.2 Uniform Circular Array
    3.2.1 The Array Response
    3.2.2 The Beam Pattern
    3.2.3 Spatial Aliasing and Grating Lobes due to Phase Ambiguity
    3.2.4 Characteristics of a Uniform Circular Array
7 Results
  7.1 Beam Pattern Evaluation
  7.2 Enhanced Signal Evaluation
C Abbreviations
D Symbols
1 Introduction
1.1 Introduction
During World War I, scientists realized that antenna arrays used for secret wireless communication can, under certain circumstances, transmit information directionally. Armed forces exploited this knowledge in World War II to communicate with allies without radiating their information in all directions. Consequently, opposing forces could not easily intercept secret messages from arbitrary positions in the immediate vicinity of the arrays. To improve the quality of long-distance communication, military intelligence services also used these arrays for message reception in order to suppress interference produced by opposing forces, natural noise sources, or atmospheric disturbances.
Nowadays, the use of antenna arrays (or, more generally, sensor arrays) is a fundamental and important technique to improve data transmission or data reception over long distances, e.g., interplanetary communication between satellites, radio astronomy, etc. In everyday life, sensor arrays improve communication between, e.g., underwater research facilities and their corresponding submarine vehicles, or hand-held devices and mobile phone base stations. Especially in times of hands-free functionality of mobile devices, microphone arrays become more and more relevant, as the following scenario shows: a person is driving a car and wants to phone a friend without using their hands; it is therefore necessary to use the mobile device in hands-free mode. The mobile phone is generally fixed near the instrument panel or somewhere on the dashboard. Phoning without any modifications in hardware and without any additional signal processing techniques for speech enhancement and noise reduction leads to distorted and noisy communication because of noise and interference produced by mechanical vibrations of the wheels and the engine of the car [1]. A way around the problem is to use beamforming in order to attenuate noise and interference (both arrive from all directions) and to focus on the speaker driving the car. A prerequisite for using beamforming techniques is to equip the hand-held devices with multiple microphones (omnidirectional or directional) and proper signal processing techniques, e.g., source localization or source tracking, source separation, and beamforming algorithms.
1.2 Motivation
In audio signal processing, beamforming provides the ability to separate two or more sound sources. It is used like an acoustic camera to focus on desired sources and eliminate interfering ones. The choice of the right beamformer depends on the operating environment or working area, e.g., a reverberant conference room, home environments, etc. In full-duplex teleconferencing, beamforming mitigates causes of system degradation: interfering noise sources, room reverberation, closed-loop feedback problems, and competing speakers. It is fundamental for hands-free mobile terminals [2] (see Fig. 1.1) and for voice-enabled automated home environments based on distant-speech interaction, where a distributed microphone network enables the monitoring of speech activity within a room (see DIRHA). Furthermore, the right choice of the corresponding microphone array is as important as the right choice of the beamformer. The UCA (uniform circular array) increases the performance of the beamformer when the source distance is not known, and it enables focusing on sources which are larger than the microphone array.
Figure 1.1: This figure shows the speech signal processing in a hands-free mobile terminal. The boxed elements highlight the focus of this thesis. Blocks shown: microphone array, BF, AEC & NR, SC, EC, ERC, SD, BWE. Legend: BF: Beamforming; AEC: Acoustic Echo Cancellation; NR: Noise Reduction; SC: Speech Coding.
1.3 Objective
The objective of this work is to find the most common, state-of-the-art broadband beamformers which are compatible with the UCA or, if they are not, to make them compatible, to modify an existing beamformer, or to introduce a new one for improved spatial filtering in reverberant environments and double-talk scenarios. Another objective is to design a MATLAB framework that simplifies the implementation of different microphone array geometries and beamformers, and that evaluates the beamformers' performance and the quality of their corresponding enhanced output signals numerically and graphically by considering different objective measures and a word recognizer based on a simple grammar and a limited dictionary that covers all words appearing in the CHiME corpus and in the audio signals used in this work. For the evaluation, specially composed signals are played back by loudspeakers in a reverberant environment, recorded by a UCA, and subsequently filtered by different beamformers.
Figure 2.1: This figure shows four different types of microphone arrays. The UCA and the uniform rectangular array are planar arrays, whereas the uniform and non-uniform linear arrays are referred to as line arrays.
The wave equation
\[ \nabla^2 p(x, y, z, t) = \frac{1}{c^2}\,\frac{\partial^2 p(x, y, z, t)}{\partial t^2}, \tag{2.1} \]
derived by Richard Feynman, describes a simple linear propagation model [4][5][6]. In this model $p$ is the instantaneous sound pressure fluctuation, and $c$ is the sound velocity, i.e. the propagation speed of sound. The sound pressure depends on three space variables $(x, y, z)$ and one time variable $t$. In simple terms, $p(\mathbf{r}, t)$ represents the acoustic pressure field, where $\mathbf{r} = (x, y, z)^T$ describes the position of a microphone. The four-dimensional Fourier transform, in this case the temporal (one-dimensional) and the spatial (three-dimensional) Fourier transform of $p(\mathbf{r}, t)$, results in
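\[ P(\mathbf{k}, \omega) = \int_{-\infty}^{+\infty} \left[ \int_{-\infty}^{+\infty} p(\mathbf{r}, t)\, e^{-i\omega t}\, dt \right] e^{i\mathbf{k}^T\mathbf{r}}\, d\mathbf{r} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} p(\mathbf{r}, t)\, e^{-i(\omega t - \mathbf{k}^T\mathbf{r})}\, d\mathbf{r}\, dt, \]
with the inverse transform
\[ p(\mathbf{r}, t) = \frac{1}{(2\pi)^4} \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} P(\mathbf{k}, \omega)\, e^{i(\omega t - \mathbf{k}^T\mathbf{r})}\, d\mathbf{k}\, d\omega, \tag{2.2} \]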
The wavevector is
\[ \mathbf{k} = \begin{pmatrix} k_x \\ k_y \\ k_z \end{pmatrix} = -\frac{2\pi}{\lambda} \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix} = -\frac{2\pi f}{c} \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix} = -\frac{\omega}{c} \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \]
which depends on the angular frequency $\omega$, the angular wavenumber $k$ ($|\mathbf{k}| = 2\pi/\lambda$ is the magnitude), and the directional information $(k_x, k_y, k_z)$ retrieved from the spherical coordinates. The variables $\theta$ and $\phi$ represent the elevation and azimuth as shown in Fig. 2.2. In this work the elevation is set to $90^\circ$. The wavevector describes the phase variation of a monochromatic plane wave, and its components $k_x$, $k_y$ and $k_z$ define the change of phase in the corresponding direction. In simple words, the wavevector gives information about the direction of propagation and the wavelength of the monochromatic plane wave. In real scenarios a microphone captures an infinite number of monochromatic waves from all directions at each point of time, e.g.,
Figure 2.2: The coordinate system used in this work with its azimuth $\phi$, its elevation $\theta$, and its position vector $\mathbf{r}$, e.g., a microphone position, with coordinates $(x, y, z)$.
j=0
where $p(\mathbf{r}_0, t)$ is the sound pressure field at position $\mathbf{r}_0$, and $s_0(\mathbf{r}_0, t)$ is the signal captured by a microphone at position $\mathbf{r}_0$. In the theoretical part of this work $s_0(\mathbf{r}_0, t)$ is modeled as a single, monochromatic plane wave. Thus, $s_0(\mathbf{r}_0, t)$ assumes an anechoic room with a single source. The near-field sound capture model is
\[ D_n(\omega, \mathbf{r}_n, \mathbf{r}_s) = \frac{1}{\|\mathbf{r}_s - \mathbf{r}_n\|}\, A_n(\omega)\, U_n(\omega)\, e^{-i\frac{\omega}{c}\|\mathbf{r}_s - \mathbf{r}_n\|}, \]
where $D_n(\omega, \mathbf{r}_n, \mathbf{r}_s)$ is the sound capture model of a microphone with index $n$, $\|\mathbf{r}_s - \mathbf{r}_n\|$ is the distance between the source and the microphone with index $n$, $A_n(\omega)$ is the frequency response of an amplifier and/or an ADC (analog-to-digital converter), $U_n(\omega)$ represents the microphone characteristics, the exponential term describes the phase rotation due to the distance between the microphone and the source, $R(\omega)$ is the source signal in frequency domain, and $N_n(\omega)$ is the noise modeled as a zero-mean Gaussian random process in frequency domain. A near-field model is physically more precise than a far-field model, and it is valid for both fields, the near- and the far-field, but it requires the source direction and the distance between the source and the microphone array. It is better to use the near-field model in case of numerical approximations [7] of the optimal beamformer coefficients.
The far-field model is
\[ D_n(\omega, \boldsymbol{\gamma}) = \frac{1}{\rho}\, A_n(\omega)\, U_n(\omega)\, e^{i\frac{\omega}{c}\|\mathbf{r}_n\|\cos(\phi_s - \phi_n)}, \]
where $\boldsymbol{\gamma} = (\rho, \phi_s, \phi_n)^T$ and $\rho$ is the average distance between the source and each microphone, i.e. the attenuation $1/\rho$ of the source signal captured by each microphone is the same. It is easier to do calculations with the far-field model, and it is more suitable for determining the weighting coefficients analytically. It depends on the source direction only.
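The difference between the two capture models can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the source distance `rho` is measured from the array centre, the speed of sound is taken as 343.2 m/s, and the function names are hypothetical. The code compares the exact source-to-microphone path of the near-field model with the plane-wave approximation of the far-field model and reports the resulting phase error.

```python
import math

C = 343.2  # assumed speed of sound in m/s


def near_field_path(rho, phi_s, r_n, phi_n):
    """Exact distance ||r_s - r_n|| for source and microphone in the xy-plane."""
    src = (rho * math.cos(phi_s), rho * math.sin(phi_s))
    mic = (r_n * math.cos(phi_n), r_n * math.sin(phi_n))
    return math.dist(src, mic)


def far_field_path(rho, phi_s, r_n, phi_n):
    """Plane-wave approximation: rho minus the projection of the microphone
    position onto the source direction."""
    return rho - r_n * math.cos(phi_s - phi_n)


def phase_error(rho, f=1000.0, phi_s=math.radians(30.0),
                r_n=0.1, phi_n=math.radians(90.0)):
    """Phase difference (rad) between near-field and far-field path models."""
    omega = 2.0 * math.pi * f
    exact = near_field_path(rho, phi_s, r_n, phi_n)
    approx = far_field_path(rho, phi_s, r_n, phi_n)
    return (omega / C) * (exact - approx)


for rho in (0.5, 2.0, 10.0):
    print(f"rho = {rho:5.1f} m -> phase error = {phase_error(rho):.4f} rad")
```

The phase error shrinks as the source moves away from the array, which is why the far-field model is adequate for distant sources while the near-field model remains valid in both regimes.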
\[ p(\mathbf{r}, t) = A_0\, e^{i(\omega_0 t - \mathbf{k}_0^T\mathbf{r})}, \tag{2.3} \]
where $\mathbf{k}_0$ defines the direction of propagation. According to [4] the wavevector-frequency representation of (2.3) is
\[ P_0(\mathbf{k}, \omega) = A_0\,(2\pi)^4\, \delta(\mathbf{k} - \mathbf{k}_0)\, \delta(\omega - \omega_0), \]
which yields a single point in the wavevector-frequency space spanned by the spatial-frequency and the temporal-frequency space due to the one- and three-dimensional Dirac impulses. Fig. 2.3 depicts a wavevector-frequency space for three different scenarios. The distance between the center of the coordinate system and a single point defines the wavelength or frequency of the wave, the brightness of a point describes the magnitude, and the position reveals the direction of propagation.
An important Fourier property in the temporal-frequency domain is that a convolution in time domain corresponds to a multiplication in frequency domain; the same holds in the spatial-frequency domain. Thus, a convolution in the spatio-temporal domain
\[ y(\mathbf{r}, t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(\mathbf{r} - \boldsymbol{\xi}_1, t - \xi_2)\, p(\boldsymbol{\xi}_1, \xi_2)\, d\boldsymbol{\xi}_1\, d\xi_2 \]
corresponds to
\[ Y(\mathbf{k}, \omega) = H(\mathbf{k}, \omega)\, P(\mathbf{k}, \omega) \]
in the wavevector-frequency domain, where $h(\mathbf{r}, t)$ is the spatio-temporal impulse response, and $H(\mathbf{k}, \omega)$ is its wavevector-frequency representation. This property enables filtering the acoustic scalar pressure field in the wavevector-frequency domain, which is exploited by beamforming. The following equation manipulates the frequency response of propagating waves from a given direction $\mathbf{k}_0$:
\[ H(\mathbf{k}, \omega) = \delta(\mathbf{k} - \mathbf{k}_0)\, G(\omega), \]
where $G(\omega)$ is the frequency response and $\mathbf{k}_0$ describes the direction of the propagating wave.
Figure 2.3: Wavevector-frequency space for three scenarios: a sinusoidal wave with high frequency, a sinusoidal wave with low frequency, and an impulse, each propagating on the $k_x$-axis.
where $N$ is the number of microphones, $\mathbf{r}_n$ represents the position of the microphone with index $n$, $\tau_n$ is a microphone-position and steering-direction specific delay which yields constructive overlapping for signals from the direction $\phi_s$, and $y(t)$ is the mono output of the beamformer. In this work the signal captured by each microphone is modeled as a monochromatic plane wave:
\[ y(t) = \sum_{n=1}^{N} A_0\, e^{i(\omega_0 t - \mathbf{k}_0^T\mathbf{r}_n)}. \]
This special case yields constructive overlapping only if all microphones are placed on a straight line symmetrically around the x-axis, $\mathbf{r}_n = (0, y_n, 0)^T$ (see Fig. 2.4), assuming that a monochromatic plane wave propagates from $\phi_s = 0^\circ$, i.e. $\mathbf{k} = (k_x, 0, 0)^T$ and $\mathbf{k}^T\mathbf{r}_n = 0$. A change in the direction of the propagating wave of about $45^\circ$ without changing the steering direction $\phi_s$ yields constructive interference for just a few frequencies within a broadband spectrum. Thus, the signal components of a periodic broadband signal overlap constructively (see footnote 9) only for the directions $0^\circ$ and $180^\circ$. Consider the following example: two microphones ($N = 2$) are placed symmetrically around the x-axis. The distance between the two microphones is $d = 0.05$ m.
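The output is
\[ y(t) = \sum_{n=1}^{2} A_0\, e^{i(\omega_0 t - \mathbf{k}_0^T\mathbf{r}_n)}, \tag{2.4} \]
with
\[ \mathbf{k}_0 = -\frac{\omega_0}{c} \begin{pmatrix} \sin(\theta_s)\cos(\phi_s) \\ \sin(\theta_s)\sin(\phi_s) \\ \cos(\theta_s) \end{pmatrix} = -\frac{\omega_0}{c} \begin{pmatrix} \cos(\phi_s) \\ \sin(\phi_s) \\ 0 \end{pmatrix}, \]
\[ \mathbf{r}_n = r_n \begin{pmatrix} \sin(\theta_n)\cos(\phi_n) \\ \sin(\theta_n)\sin(\phi_n) \\ \cos(\theta_n) \end{pmatrix} = r_n \begin{pmatrix} \cos(\phi_n) \\ \sin(\phi_n) \\ 0 \end{pmatrix}, \]
where $\theta_s = 90^\circ$ is the elevation of the source and $\theta_n = 90^\circ$ is the elevation of a microphone with index $n$. This work assumes an array where all microphones are placed on the xy-plane with $\theta_n = 90^\circ$. Multiplying both vectors in terms of a scalar product yields
\[ \mathbf{k}_0^T\mathbf{r}_n = -\frac{\omega_0}{c}\, r_n \left( \cos(\phi_s)\cos(\phi_n) + \sin(\phi_s)\sin(\phi_n) \right) = -\frac{\omega_0}{c}\, r_n \cos(\phi_s - \phi_n), \]
so that
\[ y(t) = \sum_{n=1}^{2} A_0\, e^{i\left(\omega_0 t + \frac{\omega_0}{c} r_n \cos(\phi_s - \phi_n)\right)} = A_0\, e^{i\omega_0 t} \sum_{n=1}^{2} \underbrace{e^{i\frac{\omega_0}{c}\frac{d}{2}\cos(\phi_s - \phi_n)}}_{D_n(\omega_0, \phi_s)} = A_0\, e^{i\omega_0 t} \left( e^{i\frac{\omega_0}{c}\frac{d}{2}\cos(\phi_s - \phi_1)} + e^{i\frac{\omega_0}{c}\frac{d}{2}\cos(\phi_s - \phi_2)} \right) \]
for $r_n = |d/2| = d/2$ (the distance $r_n$ between the center of the coordinate system and the microphone with index $n$ has to be positive), and $D_n(\omega, \phi_s)$ is the sound capture model of a microphone with index $n$. Assuming that $\phi_1 = +90^\circ$ and $\phi_2 = -90^\circ$ yields
\[ y(t) = A_0\, e^{i\omega_0 t} \left( e^{i\frac{\omega_0}{c}\frac{d}{2}\sin(\phi_s)} + e^{-i\frac{\omega_0}{c}\frac{d}{2}\sin(\phi_s)} \right) = A_0\, e^{i\omega_0 t}\; 2\cos\!\left( \frac{\omega_0}{c}\,\frac{d}{2}\,\sin(\phi_s) \right). \]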
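The two-microphone result can be checked numerically. The following sketch assumes unit amplitude ($A_0 = 1$), drops the common factor $e^{i\omega_0 t}$, and uses $c = 343.2$ m/s; the function names are illustrative only. It sums the two capture-model terms directly and compares the sum with the closed form $2\cos\left(\frac{\omega_0}{c}\frac{d}{2}\sin\phi_s\right)$.

```python
import cmath
import math

C = 343.2  # assumed speed of sound in m/s


def two_mic_sum(f0, d, phi_s_deg):
    """Direct sum of the capture-model terms for microphones at +90 and -90 degrees."""
    omega0 = 2.0 * math.pi * f0
    phi_s = math.radians(phi_s_deg)
    mic_angles = (math.radians(90.0), math.radians(-90.0))
    return sum(
        cmath.exp(1j * (omega0 / C) * (d / 2.0) * math.cos(phi_s - phi_n))
        for phi_n in mic_angles
    )


def two_mic_closed_form(f0, d, phi_s_deg):
    """Closed form 2*cos((omega0/c) * (d/2) * sin(phi_s))."""
    omega0 = 2.0 * math.pi * f0
    return 2.0 * math.cos((omega0 / C) * (d / 2.0) * math.sin(math.radians(phi_s_deg)))


for angle in (0.0, 45.0, 90.0):
    direct = two_mic_sum(3000.0, 0.05, angle)
    closed = two_mic_closed_form(3000.0, 0.05, angle)
    print(f"phi_s = {angle:5.1f} deg: direct = {direct.real:+.6f}, closed = {closed:+.6f}")
```

For a wave from $\phi_s = 0^\circ$ both terms add to 2 at every frequency, whereas for other directions the amplitude depends on frequency, as stated in the text.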
Footnote 9: In this work, captured signals overlap constructively if all frequency components of the superposition of these signals exhibit constructive interference (perfect fit).
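For an even number of microphones and the aperture mentioned above, the output (the sum of all signals) is
\[ y(t) = A_0\, e^{i\omega_0 t}\; 2 \sum_{k=1}^{N/2} \cos\!\left( \frac{\omega_0}{c}\,\frac{d}{2}\,(2k - 1)\,\sin(\phi_s) \right), \]
and the output for an odd number of microphones with a microphone in the middle of the coordinate system, the same microphone spacing, and the same array alignment as mentioned before (see Fig. 2.4) is
\[ y(t) = A_0\, e^{i\omega_0 t} \left[ 1 + 2 \sum_{k=1}^{(N-1)/2} \cos\!\left( \frac{\omega_0}{c}\, k\, d\, \sin(\phi_s) \right) \right], \tag{2.5} \]
because the microphones of the odd array sit at the distances $0, d, 2d, \ldots$ from the center.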
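As a plausibility check, the sums for even and odd microphone counts can be compared against a direct summation over the microphone positions. The sketch below is an illustration under assumptions: unit amplitude, $c = 343.2$ m/s, microphones on the y-axis at $y_n = d\left(\frac{N-1}{2} - n\right)$, and hypothetical function names. For the odd case the microphones sit at $0, \pm d, \pm 2d, \ldots$, so the cosine arguments use $k\,d$.

```python
import cmath
import math

C = 343.2  # assumed speed of sound in m/s


def direct_sum(N, d, f0, phi_s_deg):
    """Sum of plane-wave phase terms over N microphones centred on the y-axis."""
    k = 2.0 * math.pi * f0 / C
    s = math.sin(math.radians(phi_s_deg))
    return sum(cmath.exp(1j * k * d * ((N - 1) / 2.0 - n) * s) for n in range(N))


def cosine_sum(N, d, f0, phi_s_deg):
    """Closed cosine sums for even and odd N (common factor A0*exp(i*w0*t) dropped)."""
    k = 2.0 * math.pi * f0 / C
    s = math.sin(math.radians(phi_s_deg))
    if N % 2 == 0:
        # microphone pairs at +-(2m - 1) * d / 2
        return 2.0 * sum(math.cos(k * (d / 2.0) * (2 * m - 1) * s)
                         for m in range(1, N // 2 + 1))
    # centre microphone plus pairs at +-m * d
    return 1.0 + 2.0 * sum(math.cos(k * m * d * s)
                           for m in range(1, (N - 1) // 2 + 1))


for N in (4, 5):
    print(N, direct_sum(N, 0.05, 2500.0, 35.0).real, cosine_sum(N, 0.05, 2500.0, 35.0))
```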
One can see that the amplitude of the output signal depends on the amplitude of the wave $A_0$, its frequency $\omega_0$, its direction of propagation $\phi_s$, the microphone spacing $d$, the sound velocity $c$, and the number of microphones $N$. The main task of a DS-BF is to align (delay or advance) the signals, so that signals from $\phi_s$ captured by the microphones at positions $\mathbf{r}_n$ overlap constructively. This yields
\[ y(t) = \frac{1}{N} \sum_{n=1}^{N} A_0\, e^{i(\omega_0 [t - \tau_n] - \mathbf{k}_0^T\mathbf{r}_n)} = A_0\, e^{i\omega_0 t}\, \frac{1}{N} \sum_{n=1}^{N} e^{-i(\omega_0 \tau_n + \mathbf{k}_0^T\mathbf{r}_n)}. \tag{2.6} \]
All delays $\tau_n$ in the exponent of (2.6) have to be determined in a way that both terms $\omega_0\tau_n$ and $\mathbf{k}_0^T\mathbf{r}_n$ cancel each other out for the steering direction $\phi_s$. Signals from different directions overlap destructively (see footnote 10). It is noteworthy that the exponent in (2.6) becomes positive for negative $\mathbf{k}_0^T\mathbf{r}_n$ and $|\mathbf{k}_0^T\mathbf{r}_n| > |\omega_0\tau_n|$, i.e. the system exhibits non-causal behaviour. An additional delay $T_0$ eliminates the non-causality [4], which results in
\[ y(t) = A_0\, e^{i\omega_0 t}\, \frac{1}{N} \sum_{n=1}^{N} e^{-i(\omega_0 [\tau_n + T_0] + \mathbf{k}_0^T\mathbf{r}_n)}. \tag{2.7} \]
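The alignment step and the additional delay $T_0$ can be illustrated with a short sketch. It is not the thesis framework (which is written in MATLAB) but a minimal Python illustration under assumptions: a four-element linear array on the y-axis with $d = 0.05$ m, $c = 343.2$ m/s, unit amplitude, and hypothetical helper names. The delays are chosen as $\tau_n = r_n\cos(\phi_s - \phi_n)/c$ so that $\omega_0\tau_n$ cancels $\mathbf{k}_0^T\mathbf{r}_n$ for the steering direction.

```python
import cmath
import math

C = 343.2  # assumed speed of sound in m/s


def ula_polar(N, d):
    """Microphones on the y-axis as (r_n, phi_n) pairs with r_n >= 0."""
    mics = []
    for n in range(N):
        y = d * ((N - 1) / 2.0 - n)
        mics.append((abs(y), math.pi / 2.0 if y >= 0 else -math.pi / 2.0))
    return mics


def steering_delays(mics, phi_s):
    """Delays that cancel k0^T r_n for a plane wave from direction phi_s."""
    return [r_n * math.cos(phi_s - phi_n) / C for r_n, phi_n in mics]


def make_causal(delays):
    """Add a common delay T0 so that no channel has to be advanced in time."""
    t0 = max(0.0, -min(delays))
    return [tau + t0 for tau in delays], t0


def ds_output(mics, delays, f0, phi):
    """Normalised delay-and-sum output for a unit plane wave arriving from phi."""
    omega0 = 2.0 * math.pi * f0
    total = 0j
    for (r_n, phi_n), tau in zip(mics, delays):
        k_dot_r = -(omega0 / C) * r_n * math.cos(phi - phi_n)
        total += cmath.exp(-1j * (omega0 * tau + k_dot_r))
    return total / len(mics)


mics = ula_polar(4, 0.05)
phi_s = math.radians(25.0)
phi_other = math.radians(80.0)
delays = steering_delays(mics, phi_s)
causal_delays, t0 = make_causal(delays)

print("gain towards steering direction:", abs(ds_output(mics, delays, 2000.0, phi_s)))
print("gain with causal delays:        ", abs(ds_output(mics, causal_delays, 2000.0, phi_s)))
print("gain towards 80 degrees:        ", abs(ds_output(mics, delays, 2000.0, phi_other)))
```

The common delay $T_0$ only rotates the phase of the output, so the gain towards the steering direction stays at one while all channel delays become non-negative.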
Microphone arrays provide the ability to increase the quality of the captured sound. In general, microphone arrays are better than a single microphone, because they increase the SNR (see footnote 11) of the captured signals. The more elements are used without changing the distance between the elements, the better the array works at higher frequencies because of an increase of the grating lobe frequency $f_{gl}$. If the sensor spacing decreases, the spatial aliasing frequency $f_{sa}$ and the grating lobe frequency $f_{gl}$ increase. If the array consists of fewer elements, it becomes sensitive to noise, reverberation, and other interferences at higher frequencies.
The use of microphone arrays also introduces some problems which limit the performance:
- inherent noise of the microphones,
- deviations in the microphone frequency responses, i.e. manufacturing variations, and
- deviations in the microphone positions.
These problems introduce channel mismatches (see Section 2.2) and therefore require robust microphone array signal processing algorithms, e.g. an MPDR-BF (minimum power distortionless response beamformer) with a proper loading level, i.e. a standard MPDR-BF with additional constraints (see Section 4.4).
Footnote 10: In this work, captured signals overlap destructively if one or more frequency components of the superposition of these signals exhibit destructive interference.
Footnote 11: SNR: signal-to-noise ratio.
Figure 2.4: The left figure shows a ULA which consists of an even number of microphones, whereas the right one consists of an odd number of microphones.
n
L
,
Lm,n
Figure 2.5: Projection of the microphone positions on the DOA line (a) and linear interpolation of the captured energy levels for gain self-calibration (b). The symbol $d_i$ represents the distance between the center of the coordinate system and the microphone projection on the DOA line. This distance is also necessary for delay computations (see Section 3.1.2 and Section 3.2.2).
According to [8], the level is interpolated as a straight line towards the DOA:
\[ \tilde{L}(d) = \begin{pmatrix} a_1(n) & a_2(n) \end{pmatrix} \begin{pmatrix} d \\ 1 \end{pmatrix} = a_1(n)\, d + a_2(n). \]
Minimizing
\[ \sum_{m=1}^{N} \left( \tilde{L}(d_m) - L_m \right)^2 \]
yields the MMSE (minimum mean square error) solution for the parameters $a_1$ and $a_2$ [8], where $L_m$ is the RMS per frame for each channel. Determining the parameters and levels with this interpolation method enables the compensation of the gain mismatch.
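A least-squares line fit of this kind can be sketched in a few lines. The example is an assumption-laden illustration, not the algorithm of [8]: it treats $a_1$ as the slope and $a_2$ as the offset of the line, uses made-up per-channel RMS levels, and derives a correction gain as the ratio of the interpolated to the measured level, which is one plausible way to use the fitted line.

```python
def fit_level_line(distances, levels):
    """Least-squares fit of L(d) = a1 * d + a2 to per-channel RMS levels."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(levels) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, levels))
    a1 = sxy / sxx
    a2 = mean_l - a1 * mean_d
    return a1, a2


def calibration_gains(distances, levels):
    """Hypothetical correction gain per channel: interpolated / measured level."""
    a1, a2 = fit_level_line(distances, levels)
    return [(a1 * d + a2) / l for d, l in zip(distances, levels)]


# Projected microphone distances on the DOA line (m) and illustrative RMS levels.
proj = [-0.10, -0.03, 0.03, 0.10]
matched = [0.80, 0.87, 0.93, 1.00]      # lies exactly on a line
mismatched = [0.80, 0.87, 0.80, 1.00]   # third channel reads too low

print(fit_level_line(proj, matched))
print(calibration_gains(proj, mismatched))
```

With perfectly matched channels all gains equal one; a channel that reads too low receives a gain above one.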
\[ \frac{0.55\ \mathrm{m}}{343.2\ \mathrm{m/s}} \approx 0.0016\ \mathrm{s} = 1.6\ \mathrm{ms}. \]
Thus, stationarity of speech can be assumed within the whole array geometry. A sampling frequency of $f_s = 48000$ Hz and a block size of 256 samples result in a time resolution of 0.0053 seconds. In this case speech is stationary and quasi-periodic for at least three blocks. Time alignment in frequency domain and in case of block processing is efficient only if the signal within a block is (quasi-)periodic.
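The two numbers above follow from simple arithmetic, reproduced here as a short sketch. The function names are illustrative, and the interpretation of 0.55 m as the largest distance a wave travels across the array is an assumption.

```python
def travel_time(distance_m, c=343.2):
    """Time a sound wave needs to cover distance_m at sound velocity c (m/s)."""
    return distance_m / c


def block_duration(block_size, fs):
    """Duration in seconds of one processing block of block_size samples."""
    return block_size / fs


t_array = travel_time(0.55)
t_block = block_duration(256, 48000)
print(f"travel time across the array: {t_array * 1e3:.1f} ms")
print(f"block duration:               {t_block * 1e3:.2f} ms")
print(f"three blocks:                 {3 * t_block * 1e3:.1f} ms")
```

The travel time across the array is well below the duration of a single block, which supports the stationarity assumption made in the text.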
3 Microphone Arrays
3.1 Uniform Linear Array
Microphone arrays are not subject to any restrictions in geometry. There are special types of geometries which are more attractive than randomly positioned microphones because their time-alignment functions are simple and easy to implement. In case of a ULA (uniform linear array), all microphones are placed equidistantly (uniformly) on a straight line. It exhibits a front-back ambiguity because of its linear geometry [7], as shown in Section 3.1.2.
Figure 3.1: Both surface plots show the array response of a ULA of two (a) and three (b) microphones with a spacing of $d = 0.05$ m over all frequencies and angles. Marked features: main lobe, grating lobe, and nulls.
Figure 3.2: A ULA, which consists of four microphones, receives signals from two different directions: $20^\circ$ and $160^\circ$. Although both sources are positioned at different places, the delays $\tau_{1,1} = \tau_{2,1}$, $\tau_{1,2} = \tau_{2,2}$, and $\tau_{1,3} = \tau_{2,3}$ are identical in both cases. The delay pattern reveals the delays necessary for each microphone to obtain constructive overlapping for a certain direction.
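The front-back ambiguity illustrated in Fig. 3.2 can be reproduced numerically. The following sketch assumes a four-element ULA on the y-axis with $d = 0.05$ m (the spacing used in the other examples of this chapter, not stated for this figure), $c = 343.2$ m/s, and delays of the form $\tau_n = r_n\cos(\phi - \phi_n)/c$; the function name is hypothetical.

```python
import math

C = 343.2  # assumed speed of sound in m/s


def ula_delays(N, d, phi_deg):
    """Per-microphone delays for a plane wave from azimuth phi_deg.

    Microphones sit on the y-axis, so their polar angles are +90 or -90 degrees.
    """
    phi = math.radians(phi_deg)
    delays = []
    for n in range(N):
        y = d * ((N - 1) / 2.0 - n)
        r_n = abs(y)
        phi_n = math.pi / 2.0 if y >= 0 else -math.pi / 2.0
        delays.append(r_n * math.cos(phi - phi_n) / C)
    return delays


front = ula_delays(4, 0.05, 20.0)
back = ula_delays(4, 0.05, 160.0)
other = ula_delays(4, 0.05, 60.0)
for a, b in zip(front, back):
    print(f"{a * 1e6:+8.2f} us   {b * 1e6:+8.2f} us")
```

Because $\sin(20^\circ) = \sin(160^\circ)$, both directions produce the same delay pattern, so a linear array cannot distinguish a source from its mirror image about the array axis.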
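In general, the array reference point is in the center of the coordinate system and of the ULA. The delay is calculated as follows [4]: inserting $\mathbf{k}_0^T\mathbf{r}_n$ into the beamformer's output (2.7) results in
\[ y(t) = \frac{1}{N} \sum_{n=1}^{N} e^{i(\omega_0 [t - \tau_n] - \mathbf{k}_0^T\mathbf{r}_n)} = e^{i\omega_0 t}\, \frac{1}{N} \sum_{n=1}^{N} e^{-i(\omega_0 \tau_n + \mathbf{k}_0^T\mathbf{r}_n)}, \]
\[ y(t) = e^{i\omega_0 t}\, \frac{1}{N} \sum_{n=1}^{N} \underbrace{e^{-i\omega_0 \tau_n}}_{B_n(\omega_0)}\; \underbrace{e^{i\frac{\omega_0}{c} r_n \cos(\phi_s - \phi_n)}}_{D_n(\omega_0, \phi_s)} \]
in frequency domain, and the beam pattern (the sum of the products of array response and beamformer kernels) for the steering direction $\phi_s$ is
\[ H(\omega, \phi, \phi_s) = \frac{1}{N} \sum_{n=1}^{N} B_n(\omega)\, D_n(\omega) = \frac{1}{N} \sum_{n=1}^{N} e^{-i\frac{\omega}{c} r_n \cos(\phi - \phi_n)}\, e^{i\frac{\omega}{c} r_n \cos(\phi_s - \phi_n)}, \]
\[ H(\omega, \phi, \phi_s) = \frac{1}{N} \sum_{n=1}^{N} e^{i\frac{\omega}{c} r_n \left( \cos(\phi_s - \phi_n) - \cos(\phi - \phi_n) \right)}, \tag{3.1} \]
with the delay
\[ \tau_n = \frac{r_n \cos(\phi - \phi_n)}{c}, \tag{3.2} \]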
where $c$ is the sound velocity, $r_n$ is the distance between the virtual reference point and the microphones, $\phi_n$ is the microphone angle, $\phi_s$ is the steering direction and the desired source angle, and $\phi$ is the delay compensating angle, which should be the source angle for a perfect capture of signals from this direction. Fig. 3.3a depicts a 4-element ULA with steering direction $\phi_s = 25^\circ$. The virtual mapping line, which passes the virtual reference point and is perpendicular to $\phi_s$, splits the 2-dimensional plane into two half planes: the waves captured by the microphones in the right half plane experience a delay in time (i.e. a causal system behaviour), and all others experience an advance in time (i.e. a non-causal system behaviour) due to the use of a beamformer. Waves from the competing source direction $\phi_c$ do not experience delays that lead to constructive overlapping; but there are constructive interferences for a countable number of signal components of a broadband signal due to a perfect match of the phases of the captured signal components, their wavelengths, and the beamformer's delays. In that case the delays are equivalent to $2\pi$ phase rotations or integer multiples thereof. Fig. 3.3b illustrates a virtual mapping line for waves from $\phi_c = 0^\circ$. In case of a continuous array it reveals the alignment for each sensor. Constructive overlapping occurs if the virtual mapping line of $\phi_c$ is parallel to the virtual mapping line of $\phi_s$. Because derivations of the implementation mentioned above are lacking, the closed-form solution of the beam pattern is derived in this work. It is crucial to determine an analytical description of the distance between the virtual reference point and each microphone.
Figure 3.3: The left figure shows a delay pattern for signals from $\phi_s$ captured by all microphones; the delays are due to the steering direction, the sound velocity, and the microphone position. The right figure shows the virtual mapping lines of both sources, the desired and the competing source.
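For an even number of microphones this distance is
\[ r_n = d\left( \frac{N - 1}{2} - n \right), \quad n \in \{0, 1, 2, \ldots, N/2 - 1\}, \quad N = 2k, \quad k \in \mathbb{N}. \tag{3.3} \]
Inserting (3.3) into (3.1) gives
\[ H(\omega, \phi, \phi_s) = \frac{1}{N} \sum_{n=0}^{N/2-1} e^{i\frac{\omega}{c} r_n \left( \cos(\phi_s - \phi_{pos}) - \cos(\phi - \phi_{pos}) \right)} + \frac{1}{N} \sum_{n=0}^{N/2-1} e^{i\frac{\omega}{c} r_n \left( \cos(\phi_s - \phi_{neg}) - \cos(\phi - \phi_{neg}) \right)}, \]
where $\phi_{pos} = 90^\circ$ and $\phi_{neg} = -90^\circ$ are the microphone angles. This yields
\[ H(\omega, \phi, \phi_s) = \frac{1}{N} \sum_{n=0}^{N/2-1} e^{i\frac{\omega}{c} d \left( \frac{N-1}{2} - n \right) \left( \sin(\phi_s) - \sin(\phi) \right)} + \frac{1}{N} \sum_{n=0}^{N/2-1} e^{-i\frac{\omega}{c} d \left( \frac{N-1}{2} - n \right) \left( \sin(\phi_s) - \sin(\phi) \right)}. \]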
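In comparison to (3.1) the beam pattern equation is now split into two parts, because (3.3) has to be positive. If the index $n$ runs up to $N - 1$, $r_n$ becomes negative for $n > N/2 - 1$, which compensates the sign generated by the expression $\cos(\phi_s - \phi_{neg}) - \cos(\phi - \phi_{neg})$. Further simplification gives
\[ H(\psi) = \frac{1}{N} \left[ e^{i\psi\frac{N-1}{2}} \sum_{n=0}^{N/2-1} e^{-i\psi n} + e^{-i\psi\frac{N-1}{2}} \sum_{n=0}^{N/2-1} e^{i\psi n} \right], \tag{3.4} \]
where
\[ \psi = \frac{\omega}{c}\, d \left( \sin(\phi_s) - \sin(\phi) \right). \]
The finite geometric series
\[ \sum_{n=0}^{N/2-1} x^n = \frac{1 - x^{N/2}}{1 - x} \]
yields a closed-form expression that describes the beam pattern in terms of a Dirichlet kernel [10]:
\[ H(\psi) = \frac{1}{N} \left[ e^{i\psi\frac{N-1}{2}}\, \frac{1 - e^{-i\psi\frac{N}{2}}}{1 - e^{-i\psi}} + e^{-i\psi\frac{N-1}{2}}\, \frac{1 - e^{i\psi\frac{N}{2}}}{1 - e^{i\psi}} \right] \]
\[ = \frac{1}{N} \left[ \frac{e^{i\psi\frac{N}{2}} - 1}{e^{i\frac{\psi}{2}} - e^{-i\frac{\psi}{2}}} + \frac{1 - e^{-i\psi\frac{N}{2}}}{e^{i\frac{\psi}{2}} - e^{-i\frac{\psi}{2}}} \right] = \frac{1}{N}\, \frac{e^{i\psi\frac{N}{2}} - 1 - e^{-i\psi\frac{N}{2}} + 1}{e^{i\frac{\psi}{2}} - e^{-i\frac{\psi}{2}}} = \frac{1}{N}\, \frac{2i}{2i}\, \frac{e^{i\psi\frac{N}{2}} - e^{-i\psi\frac{N}{2}}}{e^{i\frac{\psi}{2}} - e^{-i\frac{\psi}{2}}}, \]
\[ H(\psi) = \frac{1}{N}\, \frac{\sin\!\left( \frac{N}{2}\psi \right)}{\sin\!\left( \frac{1}{2}\psi \right)} = \frac{1}{N}\, \frac{\sin\!\left( \frac{N}{2}\, \frac{\omega}{c}\, d \left( \sin(\phi_s) - \sin(\phi) \right) \right)}{\sin\!\left( \frac{1}{2}\, \frac{\omega}{c}\, d \left( \sin(\phi_s) - \sin(\phi) \right) \right)} \tag{3.5} \]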
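The closed form can be checked against the direct sum over all microphones. This sketch is illustrative only: it writes the phase variable as `psi` $= \frac{\omega}{c} d(\sin\phi_s - \sin\phi)$, assumes $c = 343.2$ m/s, handles the limit where both sine functions vanish explicitly, and uses hypothetical function names.

```python
import cmath
import math

C = 343.2  # assumed speed of sound in m/s


def psi(f, d, steer_deg, phi_deg):
    """Phase variable (omega / c) * d * (sin(phi_s) - sin(phi))."""
    return (2.0 * math.pi * f / C) * d * (
        math.sin(math.radians(steer_deg)) - math.sin(math.radians(phi_deg)))


def beam_pattern_direct(N, psi_val):
    """Direct sum over signed microphone positions d * ((N - 1) / 2 - n)."""
    return sum(cmath.exp(1j * psi_val * ((N - 1) / 2.0 - n)) for n in range(N)) / N


def beam_pattern_closed(N, psi_val):
    """Dirichlet-kernel form sin(N * psi / 2) / (N * sin(psi / 2))."""
    den = math.sin(psi_val / 2.0)
    if abs(den) < 1e-12:
        # Both sines vanish: main lobe (m = 0) or grating lobe (m != 0).
        m = round(psi_val / (2.0 * math.pi))
        return (-1.0) ** (m * (N - 1))
    return math.sin(N * psi_val / 2.0) / (N * den)


for N in (4, 5):
    for angle in (0.0, 20.0, 60.0):
        p = psi(5000.0, 0.05, 25.0, angle)
        print(N, angle, round(beam_pattern_direct(N, p).real, 6),
              round(beam_pattern_closed(N, p), 6))
```

Both forms agree for even and odd $N$, and the pattern equals one when the look angle coincides with the steering direction.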
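for an even and odd number of microphones. In case of an odd number of microphones, the derivation differs from the previous one, i.e.
\[ r_n = d\left( \frac{N - 1}{2} - n \right), \quad n \in \{0, 1, 2, \ldots, (N - 1)/2\}, \quad N = 2k + 1, \quad k \in \mathbb{N} \]
and
\[ H(\omega, \phi, \phi_s) = -\frac{1}{N} + \frac{1}{N} \sum_{n=0}^{(N-1)/2} e^{i\frac{\omega}{c} d \left( \frac{N-1}{2} - n \right) \left( \sin(\phi_s) - \sin(\phi) \right)} + \frac{1}{N} \sum_{n=0}^{(N-1)/2} e^{-i\frac{\omega}{c} d \left( \frac{N-1}{2} - n \right) \left( \sin(\phi_s) - \sin(\phi) \right)}, \]
which yields
\[ H(\psi) = \frac{1}{N} \left[ (-1) + e^{i\psi\frac{N-1}{2}}\, \frac{1 - e^{-i\psi\frac{N+1}{2}}}{1 - e^{-i\psi}} + e^{-i\psi\frac{N-1}{2}}\, \frac{1 - e^{i\psi\frac{N+1}{2}}}{1 - e^{i\psi}} \right] = \ldots = \frac{1}{N}\, \frac{\sin\!\left( \frac{N}{2}\psi \right)}{\sin\!\left( \frac{1}{2}\psi \right)}, \]
with $\psi = \frac{\omega}{c}\, d \left( \sin(\phi_s) - \sin(\phi) \right)$. The term $-1$ is required; otherwise the captured waves from the microphone in the center of the coordinate system are considered twice. The first null in the beam pattern can be calculated by setting the argument of $\sin\!\left( \frac{N}{2}\psi \right)$ to $\pi$. If both sine functions vanish at the same time, i.e. the argument of the sine function in the denominator is a nonzero integer multiple of $\pi$, a grating lobe (a lobe with the same maximum gain as the main lobe) occurs in the beam pattern. If the argument of the sine function in the numerator is approximately an odd multiple of $\pi/2$ beyond the first null while the sine function in the denominator does not vanish, a side lobe (a lobe with a gain smaller than the gain of the main lobe) occurs.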
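The position of the first null follows directly from setting $\frac{N}{2}\psi = \pi$ with $\psi = \frac{\omega}{c} d(\sin\phi_s - \sin\phi)$, which gives $\sin\phi = \sin\phi_s - \frac{c}{N d f}$. A minimal sketch, assuming $c = 343.2$ m/s and hypothetical function names, computes this angle and confirms that the beam pattern vanishes there.

```python
import cmath
import math

C = 343.2  # assumed speed of sound in m/s


def first_null_deg(N, d, f, steer_deg=0.0):
    """Angle (degrees) of the first null, or None if it lies outside the visible region."""
    s = math.sin(math.radians(steer_deg)) - C / (N * d * f)
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def beam_pattern(N, d, f, steer_deg, phi_deg):
    """Delay-and-sum beam pattern of a ULA via the direct sum over microphones."""
    p = (2.0 * math.pi * f / C) * d * (
        math.sin(math.radians(steer_deg)) - math.sin(math.radians(phi_deg)))
    return sum(cmath.exp(1j * p * ((N - 1) / 2.0 - n)) for n in range(N)) / N


null_4k = first_null_deg(4, 0.05, 4000.0)
null_500 = first_null_deg(4, 0.05, 500.0)
print("first null at 4000 Hz:", null_4k)
print("first null at  500 Hz:", null_500)
```

At low frequencies the main lobe is so wide that no null falls inside the visible angular range, which the function signals by returning `None`.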
Figure 3.4: Array response for different frequencies (1716 Hz, 3432 Hz, 5148 Hz, and 6864 Hz) of a ULA, which consists of two microphones, and which exhibits a looking direction of $0^\circ$.
\[ D_n(\omega, \phi_s) = e^{i\frac{\omega}{c} d \cos(\phi_s - \phi_n)}, \tag{3.6} \]
where $n \in \{1, 2\}$ is the microphone index, $f$ is the observed frequency with $\omega = 2\pi f$, $\phi_n$ is the microphone angle, and $\phi_s$ is the steering direction. This model describes the captured monochromatic plane waves by multiplying the model's frequency response with the Fourier transform of $r(t)$, i.e. $R_1(\omega) = R(\omega) D_1(\omega)$, where $r(t)$ is the source signal. Now, let us focus on the exponent of the capturing model, which contains information about the phase of the received waves:
\[ \frac{\omega}{c}\, d \cos(\phi_s - \phi_n) = \frac{2\pi}{\lambda}\, d \cos(\phi_s - \phi_n). \tag{3.7} \]
Equation (3.7) considers the distance between both microphones, which is an essential quantity for determining the grating lobe frequency $f_{gl}$, which is depicted in Fig. 3.5.
Spatial aliasing and grating lobes occur because the waves captured with both microphones exhibit the same frequency and the same phase, which results in constructive overlapping after summing both waves. The frequency of the grating lobe can be calculated as follows:
\[ d \cos(\phi_s - 90^\circ) \overset{!}{=} m\, \lambda_{gl} = \frac{c\, m}{f_{gl}}, \tag{3.8} \]
where $m$ describes the number of the grating lobe and $m \in \mathbb{N}$. Rewriting (3.8) yields
\[ f_{gl} = \frac{c\, m}{d \cos(\phi_s - 90^\circ)} = \frac{c\, m}{d \cos(\phi_s - \phi_n)}. \tag{3.9} \]
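The result changes slightly if we consider a DS-BF. The system response for the microphone with index $n$ is
\[ H_n(\omega, \phi) = e^{i\frac{\omega}{c} r_n \left( \cos(\phi_s - \phi_n) - \cos(\phi - \phi_n) \right)}, \]
which consists of the sound capture model $D_n(\omega, \phi_s) = e^{i\frac{\omega}{c} r_n \cos(\phi_s - \phi_n)}$ and the beamformer kernel $B_n(\omega) = e^{-i\frac{\omega}{c} r_n \cos(\phi - \phi_n)}$. The grating lobe frequency becomes
\[ f_{gl} = \frac{c\, m}{d \left[ \cos(\phi_s - \phi_n) - \cos(\phi - \phi_n) \right]}. \tag{3.10} \]
Figure 3.5: In this figure a grating lobe occurs for a certain frequency and its integer multiples by considering a two-element microphone array with distance $d$ between both microphones.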
Both equations, (3.9) and (3.10), are independent of the number of microphones in case of
equidistantly distributed microphones, but they depend on the microphone spacing. Thus,
using additional microphones with the same sensor interval does not change the frequency of
the maximum point of the grating lobes, but a change in microphone spacing will do so.
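The dependence of the grating lobe frequency on the microphone spacing can be illustrated with the form $f_{gl} = \frac{c\,m}{d\cos(\phi_s - \phi_n)}$ of (3.9). The sketch below is an illustration under assumptions: $c = 343.2$ m/s, a microphone angle of $90^\circ$, the absolute value of the cosine so that the frequency is positive for either side of the array, and an infinite result when the wave arrives broadside and no grating lobe can form. The function name is hypothetical.

```python
import math

C = 343.2  # assumed speed of sound in m/s


def grating_lobe_frequency(d, phi_s_deg, phi_n_deg=90.0, m=1):
    """Frequency (Hz) of the m-th grating lobe for microphone spacing d (m)."""
    proj = abs(math.cos(math.radians(phi_s_deg - phi_n_deg)))
    if proj < 1e-12:
        return math.inf  # broadside arrival: path difference is zero at all frequencies
    return C * m / (d * proj)


for d in (0.05, 0.025):
    print(f"d = {d} m: f_gl = {grating_lobe_frequency(d, 90.0):.1f} Hz")
print("second grating lobe:", grating_lobe_frequency(0.05, 90.0, m=2))
```

For $d = 0.05$ m and a wave travelling along the array axis the first grating lobe appears at 6864 Hz, the highest frequency shown in Fig. 3.4; halving the spacing doubles this frequency, while the number of microphones does not enter the expression.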
Figure 3.6: The surface plots show the array response for all frequencies and angles. Computations are based on a ULA consisting of two (a) and three (b) microphones with a microphone spacing of $d = 0.05$ m. Marked features: main lobe, side lobe, and nulls.
Figure 3.7: The one-dimensional plots show the array response for all angles but a certain number of frequencies. Computations are based on a ULA consisting of two (a) and three (b) microphones with a microphone spacing of $d = 0.05$ m. (a) One-dimensional plot of a two-element array. (b) One-dimensional plot of a three-element array. Marked features: main lobe, grating lobe, and side lobe.
Grating Lobe
Main Lobe
Lobe
Side
Main Lobe
Grating Lobe
Figure 3.8: The polar plots show the directivity pattern. Computations are based on a ULA consisting of
two (a) and three (b) microphones with a microphone spacing of d = 0.05m.
(3.11)
where $N$ is the number of microphones. Equation (3.11) is true for different equidistant sensor intervals because perfect overlapping has to occur in all microphone-pair combinations, the number of which depends on the number of microphones.
Figure 3.9: The surface plots (a-b) show the array response for all frequencies and angles. The one-dimensional plots (c-d) show the array response for the given frequencies f = {4000, 5000, 6000, 7000} Hz, and so do the polar plots (e-f). Computations are based on a ULA consisting of four (a,c,e) and five (b,d,f) microphones. The distance between all microphones is d = 0.05 m.
Figure 3.10: The surface plots (a-b) show the array response for all frequencies and angles. The one-dimensional plots (c-d) show the array response for the given frequencies f = {4000, 5000, 6000, 7000} Hz, and so do the polar plots (e-f). Computations are based on a ULA consisting of four (a,c,e) and five (b,d,f) microphones. The distance between all microphones is d = 0.025 m.
February 28, 2012
Figure 3.11: This figure shows the beam pattern for all angles and (all) frequencies. Computations are based on a ULA consisting of four (a,c,e) and five (b,d,f) microphones. The distance between all microphones is d = 0.025 m and, in comparison to the previous figures, the steering direction is set to $\varphi_s = 90^\circ$, which implies the use of a beamformer (here: DS-BF).
(3.12)
n=1
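where $r_n = r$ and
$$\varphi_n = \frac{2\pi n}{2N} \cdot \frac{180^\circ}{\pi} \quad \text{or} \quad \varphi_n = \frac{2\pi (n-1)}{2N} \cdot \frac{180^\circ}{\pi}, \qquad n = \{1, 2, \dots, 2N\},\; N \in \mathbb{N}.$$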
The virtual mapping line splits the two-dimensional plane into two half-planes. The monochromatic plane waves captured by the microphones in the half-plane that includes the source experience a delay in time (i.e. a causal system behaviour), and all others experience an advance in time (i.e. a non-causal system behaviour) due to the use of a beamformer. A close look at Fig. 3.13 and a focus on the half-plane closest to the desired source with steering direction $\varphi_s$ reveals that all microphones within this area sample the sound field at certain positions. Summing all captured signals without manipulating the phase does not lead to constructive overlapping. The beamformer has to shift the signals in time, so that all signal components from the steering direction $\varphi_s$ overlap constructively. The time-shift depends on the speed of sound $c$, the constant radius $r$, the microphone positions $r_n$, and the steering direction $\varphi_s$. The frequency-independent delay is modeled as
$$\tau_n = \frac{r}{c} \cos(\varphi - \varphi_n).$$

Figure 3.12: The surface plots show the array response for all frequencies and angles. Computations are based on a UCA consisting of four (a) and eight (b) microphones with a diameter of d = 0.20 m.
All microphones in the remaining half-plane capture the impinging waves, and the beamformer advances all waves from the direction $\varphi_s$ in time. Advancing indicates non-causality, which does not cause any problems in practice because of block-processing and the short-time stationarity of speech.
Due to delaying in the right half-plane and advancing in the left one, signal components from direction $\varphi_s$ overlap constructively, whereas signal components from direction $\varphi_c$ are advanced in the left half-plane and delayed in the right one. Fig. 3.14 shows the corresponding delay pattern for monochromatic plane waves from the steering direction $\varphi_s$ and from the competing source direction $\varphi_c$, which is shifted by 180 degrees compared to the steering direction $\varphi_s$.
The beamformer shifts the captured waves from $\varphi_s$ onto the virtual mapping line, where they exhibit the same phase and, thus, overlap constructively. Waves from other directions experience the same shifts, but they are mapped onto an ellipsoid instead of the straight line that is necessary for constructive overlapping.
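The delay and advance pattern described above can be sketched numerically. The following minimal example assumes microphone angles of $360^\circ (n-1)/N$ for an $N$-element UCA and reads positive values of $\tau_n = \frac{r}{c}\cos(\varphi - \varphi_n)$ as delays and negative values as advances; the function names and this sign convention are assumptions of the sketch.

```python
import math

def uca_angles_deg(num_mics):
    """Microphone angles of a uniform circular array in degrees, starting at 0."""
    return [360.0 * n / num_mics for n in range(num_mics)]

def steering_delays(radius, steer_deg, num_mics, c=343.57):
    """tau_n = r/c * cos(phi - phi_n) for every microphone of the UCA.

    Positive values are read as delays, negative values as advances
    (an assumed convention for this illustration).
    """
    return [radius / c * math.cos(math.radians(steer_deg - phi_n))
            for phi_n in uca_angles_deg(num_mics)]

# Eight microphones on a circle with a diameter of 0.20 m, steering towards 0 degrees.
delays = steering_delays(radius=0.10, steer_deg=0.0, num_mics=8)
```

The microphone facing the source obtains the largest delay $r/c$, the opposite microphone the largest advance $-r/c$, and the shifts of all microphones sum to zero.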
Figure 3.13: Structure of a UCA with radius r and 8 microphones. The bold diagonal line is the virtual mapping line, which is the reference line for delay computations.

Figure 3.14: This figure shows a delay pattern for signals captured by all microphones of a competing, undesired source.
or
$$r_n \left(\cos(\varphi_s - \varphi_n) - \cos(\varphi - \varphi_n)\right) = m\,\lambda, \quad \forall n \quad \text{(all possible microphone-pair combinations)},$$
with $m \in \mathbb{N}$.
Figure 3.16: The surface plots (a,b) show the beam pattern for all frequencies and angles. The one-dimensional plots show the beam pattern for the given frequencies f = {2000, 7000} Hz, and so do the polar plots (e-f). Computations are based on a UCA consisting of eight (a,c,e) and twelve (b,d,f) microphones. The diameter of the UCA is d = 0.55 m.
Figure 3.17: The surface plots (a,b) show the beam pattern for all frequencies and angles. The one-dimensional plots show the beam pattern for the given frequencies f = {2000, 7000} Hz, and so do the polar plots (e-f). Computations are based on a UCA consisting of 16 (a,c,e) and 24 (b,d,f) microphones. The diameter of the UCA is d = 0.55 m.
Figure 3.18: The surface plots (a,b) show the beam pattern for all frequencies and angles. The one-dimensional plots show the beam pattern for the given frequencies f = {2000, 7000} Hz, and so do the polar plots (e-f). Computations are based on a UCA consisting of 16 (a,c,e) and 24 (b,d,f) microphones. The diameter of the UCA is d = 0.20 m.
Figure 3.19: In this figure, computations are based on a UCA consisting of 16 (a,c,e) and 24 (b,d,f) microphones. The diameter of the UCA is d = 0.20 m and, in comparison to the previous figures, the steering direction is set to 11.25°.
Beamforming
4.1 Beamforming
Beamforming is a signal processing technique which enables source localization, source separation, signal de-reverberation, etc. It refers to designing a spatio-temporal filter [4], i.e. it manipulates signals in the temporal and the spatial domain.
Captured signals contain signal components from different sources: a desired source and competing sources. Temporal filtering alone does not work well for eliminating the competing sources, because the desired and competing sources may occupy the same frequency bands, and filtering the affected bands leads to a loss of desired-source information. A beamformer extracts a desired signal from a specific direction in a reverberant environment influenced by interfering sources. It combines the signals captured by the microphones in a way that signals from a certain direction experience constructive overlapping while others experience destructive interference. The beamformer thus modifies the directionality of the array.
The choice of a beamforming algorithm depends on the characteristics of the environment. For instance, a conference room may exhibit strong or weak reverberation, and it may contain interfering sources such as a video projector, a (CPU) fan, an air conditioner, etc. On the one hand, a DS-BF is able to filter out stationary, non-coherent noise signals; on the other hand, its performance decreases if there is a lot of reverberation and if the noise signals or interferences are coherent.
$$\tau_n = \frac{r_n \cos(\varphi - \varphi_n)}{c}. \tag{4.1}$$
It describes the delay generated by the beamformer for the waves captured by the microphone with index $n$. These delays steer the beamformer into the direction of the desired source in case of $\varphi = \varphi_s$. In time-domain signal processing, they have to be integer multiples of the sampling period. A fractional delay can be rounded to the closest integer multiple, but this causes small changes in the beam pattern. Another way to allow fractional delays is to increase the sampling rate of the ADC, or to apply up-sampling and interpolation, time-shifting, and down-sampling during signal processing. Nevertheless, this method needs much more resources than rounding.
A better way to realize fractional delays is to perform these temporal shifts in the frequency domain. In case of block-processing, a DFT or FFT transforms the time-domain signal into the frequency domain. An arbitrary phase shift in the frequency domain changes the phase spectrum. A transformation back into the time domain yields a signal whose amplitudes correspond to the amplitudes of the fractionally shifted signal. In this case, the steering delay quantization error depends only on the amplitude resolution of the system, which is generally high enough for quantizers with 16 bits or more. Fig. 4.1 shows the steering delay quantization with a low amplitude resolution: a small phase shift in the frequency domain does not lead to a time shift in the time domain because of the poor amplitude resolution.
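The frequency-domain approach to fractional delays can be illustrated with a short sketch. It is not taken from the framework; it uses a naive DFT to stay self-contained, the helper names are hypothetical, and the shift is circular within the block, so windowing and overlap handling are left out.

```python
import cmath
import math

def dft(x):
    """Naive O(K^2) DFT; adequate for a short illustrative block."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
            for k in range(K)]

def idft(X):
    """Naive inverse DFT with 1/K scaling."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]

def fractional_delay(x, delay):
    """Delay a real block by `delay` samples (may be fractional) via a phase shift."""
    K = len(x)
    X = dft(x)
    shifted = []
    for k in range(K):
        signed_k = k if k <= K // 2 else k - K  # map the bin index to a signed frequency
        shifted.append(X[k] * cmath.exp(-2j * math.pi * signed_k * delay / K))
    return [v.real for v in idft(shifted)]

# A cosine with exactly three periods per block, delayed by half a sample.
K = 16
x = [math.cos(2 * math.pi * 3 * n / K) for n in range(K)]
y = fractional_delay(x, 0.5)
expected = [math.cos(2 * math.pi * 3 * (n - 0.5) / K) for n in range(K)]
max_error = max(abs(a - b) for a, b in zip(y, expected))
```

With floating-point amplitudes the deviation from the analytically shifted cosine stays at the level of rounding errors, which mirrors the statement that the quantization error is governed by the amplitude resolution only.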
Figure 4.1: This figure shows the steering delay quantization with a low amplitude resolution and rounding to the nearest quantization step. A phase shift in the frequency domain does not lead to any shift in the time domain.
$$W_n(\omega) = e^{-i\frac{\omega}{c}\frac{d}{2}\cos(\varphi - \varphi_n)} \tag{4.2}$$
(4.3)
for the whole array, with the sum running from $n = 1$ to $N$, where $\omega$ is the angular frequency, $N$ is the number of microphones, $c$ is the speed of sound, $d$ is the array diameter, $\varphi_n$ is the angle of the microphone with index $n$, and $\varphi$ is an arbitrary angle which should be the direction of the desired source. In the following pseudo code $N_\varphi$ is the number of beams, $N_f$ is the number of frequencies, $N_m$ is the number of microphones, and $\varphi_n$ is the microphone angle vector.
Algorithm 1 Delay&Sum Beamformer
for j = 1 : Nphi do
    for k = 1 : Nf do
        W(j, :, k) = 1/Nm * exp(-i * 2*pi*f(k)/c * d/2 * cos(phi(j) - phi_n))
    end for
end for
- W is a (Nphi x Nm x Nf)-matrix.
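A runnable counterpart of Algorithm 1 might look as follows. This is a sketch under assumptions, not the framework code: nested lists replace the three-dimensional matrix, the weights use a negative exponent so that they compensate a capture model with a positive exponent, and the helper `array_response` is hypothetical.

```python
import cmath
import math

def ds_weights(beam_angles_deg, mic_angles_deg, freqs_hz, diameter, c=343.57):
    """W[j][n][k] = 1/Nm * exp(-i * 2*pi*f_k/c * d/2 * cos(phi_j - phi_n))."""
    num_mics = len(mic_angles_deg)
    W = []
    for phi in beam_angles_deg:
        per_mic = []
        for phi_n in mic_angles_deg:
            tau = diameter / 2 / c * math.cos(math.radians(phi - phi_n))
            per_mic.append([cmath.exp(-2j * math.pi * f * tau) / num_mics
                            for f in freqs_hz])
        W.append(per_mic)
    return W

def array_response(weights, mic_angles_deg, freq_hz, source_deg, diameter, c=343.57):
    """Sum of weight times capture model exp(+i*w/c*r*cos(phi_src - phi_n))."""
    total = 0j
    for w, phi_n in zip(weights, mic_angles_deg):
        tau = diameter / 2 / c * math.cos(math.radians(source_deg - phi_n))
        total += w * cmath.exp(2j * math.pi * freq_hz * tau)
    return total

mics = [45.0 * n for n in range(8)]              # 8-element UCA
W = ds_weights([0.0, 90.0], mics, [1000.0, 2000.0], diameter=0.20)
w_beam0_2k = [W[0][n][1] for n in range(8)]      # beam towards 0 degrees at 2000 Hz
resp_look = array_response(w_beam0_2k, mics, 2000.0, 0.0, 0.20)
resp_back = array_response(w_beam0_2k, mics, 2000.0, 180.0, 0.20)
```

In the look direction the phase terms cancel and the response equals one; from the opposite direction the terms no longer align, so the magnitude stays below one.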
Figure 4.2: Left: Array processing without a beamformer. The output signal consists of the non-delayed captured signals; there is no constructive interference. Right: Array processing with a DS-BF. The output signal consists of a single pulse due to the compensation of the relative delays.

[Figure: Block diagram of the frequency-domain DS-BF: K-sample buffer, windowing, K-point DFT, phase shifts, sum, 1/N scaling, K-point IDFT, windowing, K-sample buffer.]
4.4 Minimum Power Distortionless Response Beamformer with Loading Level and Sample Matrix Inversion
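$$x(\omega) = d(\omega)\,s(\omega) + n(\omega), \tag{4.4}$$
where $x(\omega) = (x_1(\omega), x_2(\omega), \dots, x_N(\omega))^T$ is the input vector, $d(\omega) = (d_1(\omega), d_2(\omega), \dots, d_N(\omega))^T$ is the capturing or steering vector, $s(\omega)$ is the desired signal, and $n(\omega) = (n_1(\omega), n_2(\omega), \dots, n_N(\omega))^T$ is the noise vector. All vectors exhibit the dimension $(N \times 1)$, where $N$ is the number of microphones. The output signal is
$$y(\omega) = w^H(\omega)\,x(\omega), \tag{4.5}$$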
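where $(\cdot)^H$ stands for the Hermitian transpose. The main target is to output the desired signal only, without any influence of noise and other interferences; that is
$$y(\omega) \overset{!}{=} s(\omega). \tag{4.6}$$
With the covariance matrix $R_{xx}(\omega)$ of the input vector, this leads to the constrained minimization
$$\min_{w(\omega)} \; w^H(\omega) R_{xx}(\omega) w(\omega) \tag{4.7}$$
$$\text{subject to } w^H(\omega) d(\omega) = 1. \tag{4.8}$$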
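The corresponding cost function with the Lagrange multiplier $\lambda$ is
$$J(w, \lambda) = w^H(\omega) R_{xx}(\omega) w(\omega) + \lambda \left(w^H(\omega) d(\omega) - 1\right) + \lambda^* \left[d^H(\omega) w(\omega) - 1\right] \tag{4.9}$$
or
$$J(w, \lambda) = w^H(\omega) R_{xx}(\omega) w(\omega) + 2\,\mathrm{Re}\left\{\lambda \left(w^H(\omega) d(\omega) - 1\right)\right\}. \tag{4.10}$$
The complex gradient of (4.9) with respect to $w^H(\omega)$, assuming $w(\omega)$ as a constant, leads to
$$\nabla_{w^H(\omega)} J(w, \lambda) = R_{xx}(\omega) w(\omega) + \lambda\, d(\omega) = 0 \qquad \big|\; \cdot R_{xx}^{-1}(\omega)$$
$$0 = w(\omega) + \lambda\, R_{xx}^{-1}(\omega) d(\omega)$$
and
$$w(\omega) = -R_{xx}^{-1}(\omega)\, \lambda(\omega)\, d(\omega). \tag{4.11}$$
Inserting (4.11) into the constraint (4.8) yields
$$\lambda(\omega) = -\frac{1}{d^H(\omega) R_{xx}^{-1}(\omega) d(\omega)} \tag{4.12}$$
and thus
$$w(\omega) = \frac{R_{xx}^{-1}(\omega) d(\omega)}{d^H(\omega) R_{xx}^{-1}(\omega) d(\omega)}. \tag{4.13}$$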
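With an additional quadratic constraint on the norm of the coefficient vector, the optimization problem becomes
$$\min_{w(\omega)} \; w^H(\omega) R_{xx}(\omega) w(\omega) \tag{4.14}$$
$$\text{subject to } w^H(\omega) d(\omega) = 1, \quad w^H(\omega) w(\omega) \le T, \tag{4.15}$$
which leads to
$$w_{DL}(\omega) = \frac{\left(R_{xx}(\omega) + \varepsilon I\right)^{-1} d(\omega)}{d^H(\omega) \left(R_{xx}(\omega) + \varepsilon I\right)^{-1} d(\omega)}, \tag{4.16}$$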
where $T$ is the quadratic norm threshold, $\varepsilon$ is the loading level, and $I$ is the identity matrix. According to [15] and [16], the increase in robustness of the MVDR-BF and MPDR-BF is a trade-off between the suppression of the side lobes and the ability to cancel interferences and attenuate noise. If the loading level $\varepsilon$ is zero, the beamformer behaves as an ordinary but sensitive MPDR-BF. If $\varepsilon \to \infty$, the beamformer behaves as a DS-BF [17]. The use of Newton's method may lead to the optimal loading level parameters, but this requires the knowledge of all imbalances.
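The two limiting cases of the loading level can be checked numerically. The sketch below assumes the common diagonally loaded MPDR form $w = (R_{xx} + \varepsilon I)^{-1} d \,/\, (d^H (R_{xx} + \varepsilon I)^{-1} d)$; the small Gaussian-elimination solver, the function names, and the example matrix are illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x

def mpdr_weights(R, d, loading=0.0):
    """w = (R + loading*I)^-1 d / (d^H (R + loading*I)^-1 d)."""
    n = len(d)
    loaded = [[R[i][j] + (loading if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    x = solve(loaded, d)
    denom = sum(d[i].conjugate() * x[i] for i in range(n))  # d^H x
    return [xi / denom for xi in x]

# Hypothetical 2x2 Hermitian covariance matrix and steering vector.
R = [[2 + 0j, 0.5 - 0.5j], [0.5 + 0.5j, 1 + 0j]]
d = [1 + 0j, 1j]
w = mpdr_weights(R, d)
constraint = sum(wi.conjugate() * di for wi, di in zip(w, d))  # w^H d
w_loaded = mpdr_weights(R, d, loading=1e9)
```

Without loading the weights satisfy the distortionless constraint $w^H d = 1$; with a very large loading level they approach $d/N$, i.e. the delay-and-sum solution.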
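$$\min_{w(\omega)} \; w^H(\omega) R_{xx}(\omega) w(\omega) \tag{4.17}$$
$$\text{subject to } w^H(\omega) d(\omega) = 1, \quad w^H(\omega) R_{xx}^{-1}(\omega) w(\omega) \le T, \tag{4.18}$$
which leads to variable loading levels for the eigenvalues of $R_{xx}(\omega)$ and the weighting coefficients
$$w_{VL}(\omega) = \frac{\left(R_{xx}(\omega) + \mu R_{xx}^{-1}(\omega)\right)^{-1} d(\omega)}{d^H(\omega) \left(R_{xx}(\omega) + \mu R_{xx}^{-1}(\omega)\right)^{-1} d(\omega)}. \tag{4.19}$$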
$$R_{xx}(\omega) = \frac{1}{K} \sum_{n=0}^{K-1} x[n]\, x[n]^H, \tag{4.20}$$
where $x[n]x[n]^H$ yields an $(N \times N)$-matrix for each time step $n$, and $K$ scales the sum of these matrices. The estimator uses a rectangular window function. The estimated matrix can be ill-conditioned or inaccurately estimated because of a lack of training data, silent signals, or non-stationary interferences. Diagonal or variable loading with proper loading levels eliminates the problem of ill-conditioned matrices.
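The estimator and the conditioning problem can be sketched in a few lines. The function name and the snapshot values are made up for illustration, and the averaging over $K$ snapshots with a factor $1/K$ is assumed.

```python
def sample_covariance(snapshots):
    """R = 1/K * sum_n x[n] x[n]^H for K snapshot vectors of length N."""
    K = len(snapshots)
    N = len(snapshots[0])
    R = [[0j] * N for _ in range(N)]
    for x in snapshots:
        for i in range(N):
            for j in range(N):
                R[i][j] += x[i] * x[j].conjugate() / K
    return R

# Three hypothetical snapshots of a two-microphone array.
R = sample_covariance([[1 + 0j, 1j], [1 + 0j, -1j], [2 + 0j, 0j]])

# A single snapshot yields a rank-one matrix whose determinant vanishes.
R_single = sample_covariance([[1 + 0j, 1j]])
det_single = R_single[0][0] * R_single[1][1] - R_single[0][1] * R_single[1][0]
```

The single-snapshot case shows why a lack of training data makes the inversion impossible without loading: adding a multiple of the identity matrix shifts all eigenvalues away from zero.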
In this work the coefficients are calculated in the frequency domain in two different ways. The first implementation considers diagonal loading according to
$$w_{DL}(\omega) = \frac{\left(R_{xx}(\omega) + \varepsilon I\right)^{-1} d(\omega)}{d^H(\omega) \left(R_{xx}(\omega) + \varepsilon I\right)^{-1} d(\omega)}, \tag{4.21}$$
where $\varepsilon = x(\omega)^H x(\omega) \cdot 10^{-3}$, and the second implementation computes its coefficients according to
$$w_{VL}(\omega) = \frac{\left(R_{xx}(\omega) + \mu R_{xx}^{-1}(\omega)\right)^{-1} d(\omega)}{d^H(\omega) \left(R_{xx}(\omega) + \mu R_{xx}^{-1}(\omega)\right)^{-1} d(\omega)}, \tag{4.22}$$
where $\mu = 10^{-2}$ [15]. (Note: The values of $\varepsilon$ and $\mu$ depend on the signals (speech or music) and other conditions (e.g., single- or double-talk).) The spectrum of each captured signal frame is weighted with the corresponding coefficients after they have been calculated (see Fig. 4.4). The pseudo code of the diagonal and variable loading algorithms is shown in Algorithms 2 and 3. In the following pseudo code $N_\varphi$ is the number of beams, $N_f$ is the number of frequencies, $N_m$ is the number of microphones, and $\varphi_n$ is the microphone position vector.
Algorithm 2 MPDR Beamformer with Diagonal Loading
for j = 1 : Nphi do
    for k = 1 : Nf do
        d(:) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi(j) - phi_n))
        Rxx = alpha * Rxx + (1 - alpha) * n(w) * n(w)^H + eps * I
        W(j, :, k) = (Rxx^-1 * d) / (d^H * Rxx^-1 * d)
    end for
end for
- d is a (Nm x 1)-steering-vector.
- Rxx is the covariance matrix.
- W is a (Nphi x Nm x Nf)-matrix.

Algorithm 3 MPDR Beamformer with Variable Loading
for j = 1 : Nphi do
    for k = 1 : Nf do
        d(:) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi(j) - phi_n))
        Rxx = alpha * Rxx + (1 - alpha) * n(w) * n(w)^H + eps * I
        W(j, :, k) = ((Rxx + mu * Rxx^-1)^-1 * d) / (d^H * (Rxx + mu * Rxx^-1)^-1 * d)
    end for
end for
- d is a (Nm x 1)-steering-vector.
- Rxx is the covariance matrix.
- W is a (Nphi x Nm x Nf)-matrix.

[Figure 4.4: Block diagram of the frequency-domain weighting: K-sample buffer, windowing, K-point DFT, weighting coefficients, sum, K-point IDFT, windowing, K-sample buffer.]
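$$\hat{B}(f, \varphi) = \sum_{n=1}^{N} w_n(f)\, e^{\,i\frac{2\pi f}{c}\frac{d}{2}\cos(\varphi - \varphi_n)} \tag{4.23}$$
or in vector notation
$$\hat{B}(f) = G(f)\, w(f), \tag{4.24}$$
where $w(f)^T = (w_1(f), w_2(f), \dots, w_N(f))$ is the coefficient vector and $G(f)$ is a matrix containing $M \times N$ elements according to $G_{m,n} = e^{\,i\frac{2\pi f}{c}\frac{d}{2}\cos(\varphi_m - \varphi_n)}$. A simple LS-solution is obtained by minimizing
$$\arg\min_{w(f)} \left\| G(f) w(f) - \hat{B}(f) \right\|_2^2 \tag{4.25}$$
$$\text{subject to } w^H(\omega) d(\omega) = 1. \tag{4.26}$$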
is the
(4.27)
i=1
The RLSFI-BF assumes the same desired response for all frequencies, i.e. $\hat{B}(f) = \hat{B}$, and
$$\arg\min_{w(f)} \left\| G(f) w(f) - \hat{B}(f) \right\|_2^2 \tag{4.28}$$
subject to
$$\frac{\left|w^H(f)\, d(f)\right|^2}{w^H(f)\, w(f)} \ge \gamma, \qquad w^H(f)\, d(f) = 1, \tag{4.29}$$
where the first constraint describes the white noise gain, which is bounded from below by $\gamma$. The lower bound is a parameter which enables controlling the robustness of the beamformer [19]. If the gain is smaller than one, the beamformer amplifies spatially white noise. In case of a super-directive beamformer the white noise gain is smaller than $10^{-3}$ at lower frequencies, and that is the reason why the RLSFI-BF is sensitive to white noise. The unconstrained least-squares problem (4.28) and both constraints (4.29) have to span a convex set in the Euclidean space. Any two points within a convex set can be joined with a straight line segment without leaving the set; a cube or a disc exhibits this property (see Fig. 4.5a). If part of such a line segment lies outside the set, the set is non-convex (see Fig. 4.5b), and convex optimization methods are not able to determine the optimal coefficients. According to [21], (4.28) is a convex function because of its squared L2-norm. The constraints are convex too; equation (4.29a) describes a Euclidean ball, whereas the elements of (4.29b) lie in a hyperplane. Since the unconstrained least-squares problem (4.28) and both constraints (4.29) exhibit convexity, convex optimization algorithms are able to approximate the optimal solutions, e.g., by using cvx (see footnote 22), the modeling system for disciplined convex programming, which is efficient in case of constrained norm minimization [21]. The pseudo code of the beam-design is shown in Algorithm 4. In the following pseudo code $N_\varphi$ is the number of beams, $N_f$ is the number of frequencies, $N_m$ is the number of microphones, and $\varphi_n$ is the microphone position vector.
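To make the role of the white noise gain more tangible, the following sketch evaluates it for two weight vectors. It assumes the common definition $\mathrm{WNG} = |w^H d|^2 / (w^H w)$; the function name and the example weights are hypothetical.

```python
def white_noise_gain(w, d):
    """WNG = |w^H d|^2 / (w^H w), a common definition assumed for this sketch."""
    numerator = abs(sum(wi.conjugate() * di for wi, di in zip(w, d))) ** 2
    denominator = sum(abs(wi) ** 2 for wi in w)
    return numerator / denominator

# Delay-and-sum weights w = d/N for a steering vector with unit-magnitude entries.
d = [1 + 0j, 1j, -1 + 0j, -1j]
w_ds = [di / len(d) for di in d]
wng_ds = white_noise_gain(w_ds, d)

# Large weights of opposite sign, as they typically occur in super-directive designs.
d_pair = [1 + 0j, 1 + 0j]
w_sd = [50.5 + 0j, -49.5 + 0j]
wng_sd = white_noise_gain(w_sd, d_pair)
```

Both weight vectors fulfil $w^H d = 1$, but the delay-and-sum weights reach a gain equal to the number of microphones, whereas the second set stays far below one and therefore amplifies spatially white noise.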
Figure 4.5: (a) Convex set. (b) Non-convex set. In a non-convex set there are points which cannot be connected with every other point without leaving the set, whereas in a convex set every point can be connected with every other point within the set.
Algorithm 4 Robust Least Squares Frequency Invariant Beamformer
[~, l] = min(abs(phi - phi_s))
B(l) = 1/sqrt(2)
for k = 1 : Nf do
    for j = 1 : Nphi do
        G(j, :, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi(j) - phi_n))
    end for
    d(:, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi_s - phi_n))
    ...
- Far-Field model.

22 cvx: http://cvxr.com/cvx/
4.6 Multiple Null Synthesis Robust Least Squares Frequency Invariant Beamformer
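$$\arg\min_{w(f)} \left\| G(f) w(f) - \hat{B}(f) \right\|_2^2 \tag{4.30}$$
subject to
$$\frac{\left|w^H(f)\, d(f)\right|^2}{w^H(f)\, w(f)} \ge \gamma, \qquad w^H(f)\, d(f) = 1, \tag{4.31}$$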
and
$$\text{subject to } w^H(\omega) V(\omega) = 0, \tag{4.32}$$
where
$$V = [v_1, v_2, \dots, v_S] \tag{4.33}$$
is a matrix which consists of vectors $v$ that describe the sound capture model of the competing sources, and $S$ is the number of nulls. Again, convex optimization algorithms determine the weighting coefficients, as shown in Section 4.5. The pseudo code of the new beam-design is shown in Algorithm 5. In the following pseudo code $N_\varphi$ is the number of beams, $N_f$ is the number of frequencies, $N_m$ is the number of microphones, and $\varphi_n$ is the microphone position vector.

23 MNS-RLSFI-BF - Multiple Null Synthesis Robust Least Squares Frequency Invariant Beamformer
Algorithm 5 Multiple Null Synthesis Robust Least Squares Frequency Invariant Beamformer
[~, l] = min(abs(phi - phi_s))
B(l) = 1/sqrt(2)
for k = 1 : Nf do
    for j = 1 : Nphi do
        G(j, :, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi(j) - phi_n))
    end for
    d(:, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi_s - phi_n))
    v_1(:, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi_c1 - phi_n))
    ...
    v_S(:, k) = exp(i * 2*pi*f(k)/c * d/2 * cos(phi_cS - phi_n))
    V = [v_1, v_2, ..., v_S]
    ...
- Far-Field model.
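$$F_{2K \times 2K} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & w & w^2 & \cdots & w^{2K-1} \\ 1 & w^2 & w^4 & \cdots & w^{2(2K-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & w^{2K-1} & w^{2(2K-1)} & \cdots & w^{(2K-1)(2K-1)} \end{pmatrix}$$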
$$y_{FBF} = \left(y_{FBF}(nK - K),\; y_{FBF}(nK - K + 1),\; y_{FBF}(nK - K + 2),\; \dots,\; y_{FBF}(nK),\; \dots,\; y_{FBF}(nK + K - 1)\right)^T.$$
Consequently, the DFT of $y_{FBF}$ is a $(2K \times 2K)$-diagonal-matrix
$$Y_{FBF} = \mathrm{diag}\{F\, y_{FBF}\}. \tag{4.34}$$
(2K×2K)(2K×2K)(2K×2K)(2K×1)
where $\mu$ is the fixed step-size parameter, and $S_{Y_{FBF},Y_{FBF}}(n, k)$ is the power estimate of the fixed-beamformer output in the $k$-th frequency bin with the forgetting factor $\lambda$,
$$S_{Y_{FBF},Y_{FBF}}(n, k) = \lambda\, S_{Y_{FBF},Y_{FBF}}(n - 1, k) + (1 - \lambda)\, |Y_{FBF}(n, k)|^2,$$
with $0 \le k \le 2K - 1$ and $|Y_{FBF}(n, k)|$ as the magnitude of the $k$-th frequency bin of $Y_{FBF}(n)$.
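The recursive power estimate is a first-order smoothing of the squared bin magnitudes. A minimal sketch, with a hypothetical function name and made-up input values, could look like this:

```python
def update_power(prev_power, spectrum, forgetting):
    """S(n,k) = lambda * S(n-1,k) + (1 - lambda) * |Y(n,k)|^2 for every bin k."""
    return [forgetting * s + (1.0 - forgetting) * abs(y) ** 2
            for s, y in zip(prev_power, spectrum)]

# Feeding the same two-bin spectrum repeatedly: the estimate converges to |Y|^2.
power = [0.0, 0.0]
for _ in range(200):
    power = update_power(power, [2 + 0j, 1j], forgetting=0.9)
```

A forgetting factor close to one yields a smooth but slowly adapting estimate, whereas smaller values track changes faster at the cost of a noisier estimate.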
The vector $E_{B,m}$ is the result of the DFT of the error signal
$$E_{B,m}(n) = F\, e_{B,m}(n), \tag{4.35}$$
(2K×2K)(2K×2K)(2K×2K)(2K×1)
where $c$ describes the time lag because of block processing, $x_m(n) = (0, x_m(nK), \dots, x_m(nK + K - 1))^T$ is the input-data vector of channel $m$, and $v = \mathrm{diag}\{(0, 1)\}$ is a matrix which eliminates circular convolution effects, i.e. the first block (the first $K$ samples) is discarded and the second block is stored.
(4.36)
where $J = \mathrm{diag}\{(+1, -1, +1, -1, \dots, -1)\}$ is a $(2L \times 2L)$-matrix and realizes a circular shift of $L$ samples in the frequency domain. This input matrix is fundamental for calculating the coefficients of the AIC according to
$$A_m(n + 1) = A_m(n) + \underbrace{G}_{(2K \times 2K)}\; \underbrace{\mu(n)}_{(2K \times 2K)}\; \underbrace{X_{A,m}^H(n)}_{(2K \times 2K)}\; \underbrace{E_A(n)}_{(2K \times 1)}, \tag{4.37}$$
where
$$\mu(n) = \mu\, \mathrm{diag}\left\{ S_{X_A,X_A}^{-1}(n, 0), \dots, S_{X_A,X_A}^{-1}(n, 2K - 1) \right\},$$
where $\mu$ is the fixed step-size parameter, and $S_{X_A,X_A}(n, k)$ is the power estimate of the signal defined in (4.36) in the $k$-th frequency bin with the forgetting factor $\lambda$,
$$S_{X_A,X_A}(n, k) = \lambda\, S_{X_A,X_A}(n - 1, k) + (1 - \lambda) \sum_{m=1}^{N} |X_{A,m}(n, k)|^2.$$
The output signal $y_{AIC}(n)$, necessary for the computation of $E_A(n)$, is the result of
$$y_{AIC}(n) = F^{-1} \left[ \sum_{m=1}^{N} X_{A,m}(n)\, A_m(n) \right].$$
This work does not provide any pseudo code of the GSC because of its extensive source code. See the framework function beamdesignGSC.m for more details.
[Figure: Block diagram of the GSC with the blocks FBF, ABM, and AIC. The inputs x1[n], ..., xN[n] are buffered, windowed, and transformed by DFTs; delays achieve causality in case of block processing; the output eA[n] is the enhanced signal.]
Beamforming
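$$DI(f) = 10 \log_{10} \frac{|U(f, \varphi_s, \theta_s)|^2}{\frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} |U(f, \varphi, \theta)|^2 \, d\theta\, d\varphi} \tag{5.1}$$
$$DI = 10 \log_{10} \int_0^{f_{max}} \frac{|U(f, \varphi_s, \theta_s)|^2}{\frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} |U(f, \varphi, \theta)|^2 \, d\theta\, d\varphi} \, df, \tag{5.2}$$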
where $U(f, \varphi, \theta)$ is the directivity of the array for a certain frequency $f$, azimuth $\varphi$, and elevation $\theta$. The higher the directivity index (DI), the higher the ability to attenuate a competing speaker and the higher the increase in SNR in noisy listening situations, which results, e.g., in an improved speech recognition ability.
$$LLR = \log \frac{a_e^T R_{oo}\, a_e}{a_o^T R_{oo}\, a_o},$$
where $a_e$ is the LPC-vector of the enhanced signal, $a_o$ is the LPC-vector of the original signal, and $R_{oo}$ is the autocorrelation matrix of the original signal. The lower the LLR, the better the speech quality of the enhanced signal.
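The measure can be sketched for a single frame as follows. The example assumes the usual form of the log-likelihood ratio, the natural logarithm of the ratio of the two quadratic forms; the function names and the first-order LPC vectors are made up for illustration.

```python
import math

def quad_form(a, R):
    """a^T R a for a real vector a and a real square matrix R."""
    n = len(a)
    return sum(a[i] * R[i][j] * a[j] for i in range(n) for j in range(n))

def llr(a_enh, a_orig, R_orig):
    """log( a_e^T R_oo a_e / a_o^T R_oo a_o ) for one frame."""
    return math.log(quad_form(a_enh, R_orig) / quad_form(a_orig, R_orig))

# Toeplitz autocorrelation matrix with r(0) = 1 and r(1) = 0.5.
R_oo = [[1.0, 0.5], [0.5, 1.0]]
a_o = [1.0, -0.5]   # first-order LPC vector that matches R_oo
a_e = [1.0, -0.2]   # LPC vector of a hypothetical enhanced frame
llr_same = llr(a_o, a_o, R_oo)
llr_diff = llr(a_e, a_o, R_oo)
```

Identical LPC vectors give a ratio of zero, and any deviation of the enhanced signal's LPC vector from the optimal one increases the value, which is consistent with lower values indicating better quality.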
s[n
+
kN
]|
n=1
k=1
where $K$ is the number of frames which are part of a segment, $N$ is the number of samples of a frame, $s[n]$ is the noise-free speech signal, and $\hat{s}[n]$ is the enhanced signal. The higher the measure, the better the attenuation of noise and interferences during pauses.
where $M$ is the number of frequency bands, $K$ is the number of frames which are part of a segment, $W(m, k)$ are the weights according to [29], and $S_o(m, k)$ and $S_e(m, k)$ are the spectral slopes of the original and the enhanced signal for the $m$-th frequency band at frame $k$. The magnitude of each weight reflects whether the band is near a spectral peak or valley, and whether the peak is the largest in the whole spectrum. The lower the WSS, the better the speech quality of the enhanced signal.
Scale:
Description
very natural, no degradation
fairly natural, little degradation
somewhat natural, somewhat degraded
fairly unnatural, fairly degraded
very unnatural, very degraded
C-OVRL Scale:
Rating  Description
5       excellent
4       good
3       fair
2       poor
1       bad
26 HTK-Receipt: http://www.keithv.com/software/htk/
27 Triphone is the abbreviation of three phonemes.
28 Synvo: http://www.synvo.com/
Beamforming
Figure 6.1: This figure shows the recording environment, the cocktail party room, at the Signal Processing and Speech Communication Laboratory Graz.
6.2.2 Microphones
Behringer ECM8000 measurement microphones are used for measuring the room and channel impulse responses and for the recording. The ECM8000 is a precise electret condenser measurement microphone; it exhibits an ultra-linear frequency response and a well-balanced, true omnidirectional pattern. The deviations in the frequency response and the omnidirectional pattern are shown in Fig. 6.2(a).
Figure 6.2: This figure shows the microphone characteristics and the loudspeaker frequency response (transfer function magnitude in dB over log-frequency in Hz).
range, a bandwidth ranging from 10 Hz to 200 kHz, and extremely low-noise and distortionless circuits.
The SM Pro Audio SM PR8E is a multi-channel preamplifier system for studio applications. It exhibits eight independent preamplifiers, a flat frequency response between 20 and 20 000 Hz (the absolute deviations are between 0 and 0.5 dB), and high-quality components, which yield a high quality in audio processing.
The RME Fireface 800 is a 24-bit eight-channel FireWire audio interface with a sampling frequency of 96 kHz.
6.3 Recording
6.3.1 Setup
The recording took place in a confined space: the cocktail party room with a closed window and a closed door. The heating system regulated the temperature to exactly 20.6 °C. The array with its 24 microphones was placed in the middle of the room (only two different diameters were considered, d = 0.20 m and d = 0.55 m, see Fig. 6.4 (a) and (d)), and the loudspeakers were placed around the array at 0°, 45°, 120°, and 300° at a distance of 2 m relative to the center of the array. The loudspeaker in the upper-left part of Fig. 6.1 represents an interfering source in the form of a PC-fan. The UCA is mounted at a height of 1.175 m, and the center of each loudspeaker is at a height of 1.30 m (see Fig. 6.4 (b) and (c)). Section 6.2 lists the equipment used in this setup. In summary, the recording required the following equipment:
Equipment:
Array
Microphones
Loudspeakers
Calibrator
Audio-Interface
ADC/DAC
PC
01
24
02
01
01
02
01
01
x
x
x
x
x
x
x
x
6.3.2 Calibration
Each microphone, including its corresponding channel, was calibrated with the Cirrus CR: 511E. The calibrator, fixed on a microphone, generated a 1000 Hz oscillation with an SPL of 94 dB, which was recorded for 15 seconds. The subsequent computation of the RMS value of the recorded signal considered the signal between 4 and 12 seconds only. The computation of the compensation gains $g_n$ is as follows:
$$x_{RMS}^{(n)} = \sqrt{\frac{1}{M} \sum_{k=m}^{m+M} \left(x^{(n)}[k]\right)^2}$$
Figure 6.3: This figure shows the compensation gains for each channel (gain over microphone number, relative to the reference microphone). The compensation gains scale each channel so that all recorded signals exhibit the same RMS value.
where $n$ is the channel number, $x^{(n)}[m]$ represents the recorded signal, and $P^{(n)}$ is the power of the recorded signal. The channel which exhibits the highest power $P_{ref,dB}$ is the reference channel for the computation of the compensation gains $g_n$:
$$g_n = 10^{\left(P_{dB}^{(n)} - P_{ref,dB}\right)/20}.$$
The compensation gains $g_n$ (see Fig. 6.3) scale each channel so that all recorded signals exhibit the same RMS value:
$$x_{new}^{(n)}[m] = \frac{1}{g_n}\, x^{(n)}[m].$$
The reference channel is not affected. During the 30-minute calibration the air temperature remained constant at 20.6 °C.
The loudspeakers were calibrated relative to each other. A microphone (the same for both loudspeakers) recorded a test signal 20 cm in front of the center of the loudspeaker membrane. Again, the power differences yielded the compensation gains.
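The calibration steps can be summarized in a short sketch. It assumes that the gain of a channel is its RMS value relative to the strongest channel, which follows from the requirement that dividing by the gain equalizes all RMS values; the function names and the synthetic test tone are illustrative assumptions.

```python
import math

def rms(x):
    """Root mean square of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def compensation_gains(channels):
    """Gain of each channel relative to the channel with the highest RMS value."""
    levels = [rms(ch) for ch in channels]
    reference = max(levels)
    return [level / reference for level in levels]

def apply_gains(channels, gains):
    """x_new[m] = x[m] / gain, so that every channel ends up at the reference RMS."""
    return [[v / g for v in ch] for ch, g in zip(channels, gains)]

# Synthetic 1000 Hz calibration tone at 48 kHz, captured with three sensitivities.
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
channels = [tone, [0.8 * v for v in tone], [0.7 * v for v in tone]]
gains = compensation_gains(channels)
calibrated = apply_gains(channels, gains)
levels_after = [rms(ch) for ch in calibrated]
```

The strongest channel keeps a gain of one and is therefore not affected, while the weaker channels are scaled up to the same RMS value.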
29 CHiME-Corpus: http://spandh.dcs.shef.ac.uk/projects/chime/research_corpus.html
verb:
colour:
preposition:
letter:
digit:
coda:
{bin|lay|place|set},
{blue|green|red|white},
{at|by|in|with},
{a|b|c|...|x|y|z},
{zero|one|two|...|seven|eight|nine}, and
{again|now|please|soon}.
28 samples
sine-window
OLA (see footnote 32) with 50 % overlapping
343.57 m/s
48000 Hz
16000 Hz
01
90
ideal
ideal omnidirectional
ideal and lossless far-field to avoid requirement of source distance

30 MATLAB: http://www.mathworks.de/products/matlab/index.html
31 PD: http://puredata.info/
32 OLA - Overlap and Add
Figure 6.4: This figure shows the microphone array with a diameter of d = 0.55 m (a,b,c) and d = 0.20 m (d), the loudspeakers (a-b), and the calibrator (c) used in this work.
Beamforming
Results
The DS-BF exhibits the highest 3dB-BW at lower frequencies, i.e. below 2000 Hz (d=0.20m, mics=08), 3000 Hz (d=0.20m, mics=12), and 4000 Hz (d=0.20m, mics=24), and the lowest at higher frequencies. According to Fig. A.1, the 3dB-BW of the DS-BF exhibits the same progression for a constant diameter and a different number of microphones.
In comparison to the DS-BF, both the RLSFI- and the MNS-RLSFI-BF exhibit a smaller 3dB-BW at lower frequencies and almost the same 3dB-BW (there are only small deviations) at higher frequencies.
Main-to-Side-Lobe Ratio
The higher the MSR, the higher the attenuation over all angles outside of the main lobe. According to Fig. A.2, the larger the diameter of the UCA the smaller the MSR over all frequencies,
and the higher the number of microphones the higher the MSR.
The DS-BF does not exhibit an MSR at very low frequencies because of a missing side lobe:
there is only a main lobe. That implies a higher spatial aliasing frequency in comparison to
the RLSFI-BF and the MNS-RLSFI-BF. The DS-BF features high MSRs above 12000 Hz and low
MSRs below that frequency. In comparison to the other data-independent beamformers, the
DS-BF exhibits a poorer MSR progression the higher the number of microphones.
The RLSFI-BF features the best MSR progressions. The smaller the number of microphones,
the better its MSR in comparison to the DS-BF and the MNS-RLSFI-BF. In general, its progressions are similar to those of the MNS-RLSFI-BF for frequencies above 1000 Hz and
identical below that frequency, because the MNS-RLSFI-BF is a hybrid model
which features an RLSFI-BF for lower frequencies.
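One possible way to compute an MSR from a sampled beam pattern is sketched below. It is an assumption about the procedure, not the implementation used in this work: the main lobe is taken to extend from the steering direction to the first local minimum on each side, and the MSR is the ratio of the main-lobe peak to the highest remaining sample. If the main lobe covers the whole circle, no MSR is returned, which corresponds to the missing side lobe of the DS-BF at very low frequencies.

```python
import math

def main_to_side_lobe_ratio_db(power, main_index=0):
    """MSR in dB for a beam pattern sampled uniformly over 360 degrees.

    power      -- list of normalized power values (circular, so index wraps)
    main_index -- index of the steering direction (main-lobe peak)
    Returns None when there is no side lobe.
    """
    n = len(power)

    def null_offset(direction):
        # Walk away from the peak until the power starts to rise again.
        offset = 0
        while offset < n - 1:
            current = power[(main_index + direction * offset) % n]
            following = power[(main_index + direction * (offset + 1)) % n]
            if following > current:
                break
            offset += 1
        return offset

    right = null_offset(+1)
    left = null_offset(-1)
    if right + left >= n - 1:
        return None                      # main lobe only
    side = [power[(main_index + o) % n] for o in range(right + 1, n - left)]
    side_peak = max(side)
    if side_peak == 0.0:
        return math.inf
    return 10.0 * math.log10(power[main_index] / side_peak)

# Hand-made patterns for illustration (not measured data).
with_side_lobes = [1.0, 0.5, 0.1, 0.2, 0.05, 0.2, 0.1, 0.5]
main_lobe_only = [1.0, 0.9, 0.8, 0.7, 0.8, 0.9]
print(main_to_side_lobe_ratio_db(with_side_lobes))
print(main_to_side_lobe_ratio_db(main_lobe_only))
```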
Directivity Index
The higher the DI the higher the ability in attenuating a competing speaker and the higher the
increase in SNR in case of noisy listening situations which results, e.g., in an improved speech
recognition ability. According to Fig. A.3, an increase in the number of microphones leads
to higher DI-values, whereas an increase in diameter yields a smaller frequency range which
exhibits the highest DI-values.
The DS-BF exhibits good DIs for a low number of microphones. It approaches the indices of
the other beamformers at higher frequencies. The higher the number of microphones the faster
the approach. In comparison to the other beamformers the DS-BF features the smallest DIs at
lower frequencies.
The RLSFI-BF exhibits the best overall directivity-index-progression. The MNS-RLSFI-BF
features similar progressions for a high number of microphones, but the worst DIs for a low
number of microphones due to the bad performance of the optimization algorithms for a lower
number of microphones.
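The directivity index can be sketched numerically as the ratio of the power in the look direction to the power averaged over the whole sphere. The following is a minimal sketch for a delay-and-sum beamformer only, assuming ideal omnidirectional microphones, far-field plane waves, c = 343 m/s, and a look direction in the array plane. The function name and the integration step are hypothetical choices, and the polar angle is measured from the axis perpendicular to the array.

```python
import cmath
import math

def ds_directivity_index_db(freq_hz, num_mics, diameter_m,
                            steer_az_deg=0.0, c=343.0, step_deg=2.0):
    """Directivity index (dB) of a delay-and-sum UCA beamformer."""
    k = 2.0 * math.pi * freq_hz / c
    r = diameter_m / 2.0
    mic_angles = [2.0 * math.pi * m / num_mics for m in range(num_mics)]
    az0 = math.radians(steer_az_deg)
    # Steering phases for a look direction in the array plane (polar angle 90 deg).
    steer = [k * r * math.cos(az0 - a) for a in mic_angles]

    def power(polar, az):
        acc = 0j
        for a, s in zip(mic_angles, steer):
            acc += cmath.exp(1j * (k * r * math.sin(polar) * math.cos(az - a) - s))
        return abs(acc / num_mics) ** 2

    d = math.radians(step_deg)
    n_polar = int(round(180.0 / step_deg))
    n_az = int(round(360.0 / step_deg))
    integral = 0.0
    for i in range(n_polar):
        polar = (i + 0.5) * d                      # midpoint rule in the polar angle
        ring = sum(power(polar, j * d) for j in range(n_az))
        integral += ring * math.sin(polar) * d * d
    mean_power = integral / (4.0 * math.pi)        # average over the unit sphere
    return 10.0 * math.log10(power(math.pi / 2.0, az0) / mean_power)

di_low = ds_directivity_index_db(10.0, 8, 0.20)
di_2k = ds_directivity_index_db(2000.0, 8, 0.20)
print(di_low, di_2k)
```

At very low frequencies the array is nearly omnidirectional, so the index approaches 0 dB. This is consistent with the small DIs reported above for the DS-BF at lower frequencies.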
2D Beam Pattern
Fig. A.4 and A.5 show the beam patterns for different beamformers, numbers of microphones,
and diameters, which reflect the results mentioned before. In general, an increase in diameter yields a smaller main lobe and smaller side lobes, a lower spatial aliasing
frequency, and a lower attenuation over all angles and frequencies. An increase in the number of microphones
results in a later decrease of attenuation of the side lobes and a higher attenuation over all angles
and frequencies.
In comparison to the DS-BF, the RLSFI-BF features side lobes with higher attenuation at
frequencies up to 8000 Hz. The MNS-RLSFI-BF exhibits a beam pattern similar to that of the RLSFI-BF, but with an additional area exhibiting a very high attenuation due to the null placement.
The higher the number of microphones, the more angles are affected by this area. A low
number of microphones combined with a large diameter causes highly attenuated frequency bands (see
Fig. A.5e) due to the choice of parameters used for the optimization and the low number of
microphones.
3D Beam Pattern
The three-dimensional beam pattern shown in Fig. A.6 and A.7, which is generally ignored in scientific
papers, becomes attractive in case of reverberant environments, because it enables an estimation
of the influence of reflections from different elevation angles, e.g., reflections from the floor or
the ceiling, which may facilitate the decision to place a projector above or beside the array. In
Fig. A.6a the DS-BF exhibits a widespread main lobe, whereas the MNS-RLSFI-BF in Fig.
A.6e features a flat main lobe at different elevation angles, which results in a higher attenuation
of reflections impinging from different elevation angles. In case of positioning a projector above
a UCA, the MNS-RLSFI-BF is a good choice because it exhibits a higher attenuation of signals
with frequency f = 2000 Hz arriving at an elevation angle of 0° than the DS-BF (compare Fig. A.6a and e). One
way to improve the behaviour of the RLSFI-BF and the MNS-RLSFI-BF is to introduce additional
constraints for different elevation angles.
The two remaining measures appearing in these tables, the improvement in the global signal
to interference plus noise ratio (iGSINR) and a gain measure (GAIN), are not discussed in this
work because of their low significance in the evaluation.
Synthetic Data
The synthetic scenarios,
1st Scenario: Double-Talk (Speaker 03 and 12 @ MRA = {0°, 45°}; see footnote 35) and
2nd Scenario: Double-Talk (Speaker 03 and 12 @ MRA = {300°, 120°}),
exhibit an ideal array aperture, no deviations in microphone positions, no mismatches in microphones and loudspeakers, no room impulse responses, and a constant and well-known sound
velocity c = 343 m/s. The synthetic audio signals match the audio signals played back by the
loudspeakers in the real scenarios, except that they feature ideal time shifts and attenuations
between the loudspeakers and the microphones and no influence of the loudspeaker characteristics.
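The ideal time shifts of such a synthetic scenario can be sketched for the far-field case. This is a minimal sketch under stated assumptions, not the simulation used in this work: the source is modeled as a plane wave in the array plane, delays are taken relative to the array center, the first microphone is assumed to sit at 0°, and the function name is hypothetical. Attenuations are not modeled here.

```python
import math

def uca_farfield_delays(num_mics, diameter_m, source_az_deg, c=343.0):
    """Propagation delay (seconds) at each UCA microphone relative to the
    array center for a plane wave arriving from azimuth source_az_deg.
    Negative values mean the wavefront reaches the microphone earlier
    than it reaches the center."""
    r = diameter_m / 2.0
    phi_s = math.radians(source_az_deg)
    delays = []
    for m in range(num_mics):
        phi_m = 2.0 * math.pi * m / num_mics   # assumed microphone angle
        delays.append(-(r / c) * math.cos(phi_s - phi_m))
    return delays

# 24 microphones, d = 0.55 m, source at 0 degrees, as in the 1st scenario.
delays = uca_farfield_delays(24, 0.55, 0.0)
fs = 16000  # one of the sampling rates listed in the setup
print([round(tau * fs, 3) for tau in delays[:4]])
```

The microphone facing the source leads the center by r/c, the opposite microphone lags by the same amount, and the delays of a uniform circular array sum to zero.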
In the 1st scenario both speakers are close together. The numerical results match the graphical evaluation in Section 7.1 and the theoretical behaviour of the data-independent and data-dependent beamformers described in Chapter 4. Among the data-independent beamformers, the MNS-RLSFI-BF exhibits the highest word
recognition rate (Sp.1 = 86.66 % and Sp.2 = 81.66 %); among the
data-dependent beamformers, the GSC does (Sp.1 = 83.33 % and Sp.2 = 79.17 %). Moreover, both exhibit
the highest difference between the word recognition rate of the enhanced signal and the word
recognition rate of the signal captured by the nearest microphone: the MNS-RLSFI-BF achieves
Sp.1 = 36.66 % and Sp.2 = 35.84 %, the GSC Sp.1 = 33.33 % and Sp.2 = 31.67 %. The data-independent beamformers exhibit a better performance than the data-dependent beamformers.
According to the word recognition rates, the MNS-RLSFI-BF is better than the RLSFI-BF and
the DS-BF (in this order), which corresponds to the theory. Besides the word recognition rates,
the MNS-RLSFI-BF always achieves the highest Cbak for both speakers, but it does not achieve
the highest PESQ and Covrl in both cases due to a different pronunciation of the words. The
highest PESQ is achieved with an array diameter of d=0.55 m, the highest Csig with d=0.20
m, the highest Cbak with d=0.20 m, and the highest Covrl with d=0.20 m. The word
recognition rates and the other objective measures exhibit a high correlation. See Tab. 7.1 for
more details.
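The improvement figures quoted above are differences between the word recognition rate of the enhanced signal (WRe) and that of the nearest microphone (WRn). The following sketch picks the best setup for the MNS-RLSFI-BF and speaker 03 from the values listed in Table B.5. The helper name is hypothetical, and a setup label such as (12,55) is read as 12 microphones on an array of 0.55 m diameter.

```python
# Values taken from Table B.5 (MNS-RLSFI-BF, speaker 03, MRA = 0 degrees).
SETUPS = ["(08,20)", "(12,20)", "(24,20)", "(08,55)", "(12,55)", "(24,55)"]
WRE = [83.33, 84.17, 82.50, 81.67, 86.67, 83.33]   # enhanced signal, in %
WRN = [50.83, 50.83, 50.83, 50.00, 50.00, 50.00]   # nearest microphone, in %

def best_setup(setups, wr_enhanced, wr_nearest):
    """Return (setup, WRe, improvement) for the setup with the highest WRe."""
    best = max(range(len(setups)), key=lambda i: wr_enhanced[i])
    improvement = wr_enhanced[best] - wr_nearest[best]
    return setups[best], wr_enhanced[best], improvement

setup, wre, improvement = best_setup(SETUPS, WRE, WRN)
print(setup, wre, round(improvement, 2))
```

The selected setup agrees with the entry for this beamformer and speaker in Tab. 7.1, up to rounding in the last digit.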
In the 2nd scenario both speakers are talking face-to-face. Again, the numerical results match
the graphical evaluation in Section 7.1 and the theoretical behaviour of the data-independent and data-dependent beamformers described in Chapter 4. Among the data-independent beamformers, the MNS-RLSFI-BF exhibits the highest
word recognition rate (Sp.1 = 89.17 % and Sp.2 = 91.66 %); among
the data-dependent beamformers, the GSC does (Sp.1 = 78.33 % and Sp.2 = 71.67 %). The MNS-RLSFI-BF
achieves the highest difference between the word recognition rate of the enhanced signal and
the word recognition rate of the signal captured by the nearest microphone: the MNS-RLSFI-BF achieves Sp.1 = 37.50 % and Sp.2 = 45.00 %, the GSC Sp.1 = 27.50 % and Sp.2 = 25.84
%. The data-independent beamformers exhibit a better performance than the data-dependent
beamformers. According to the word recognition rates, the MNS-RLSFI-BF is better than the
RLSFI-BF and the DS-BF, but the RLSFI-BF and the DS-BF exhibit the same performance
due to the positions of the speakers: a smaller beamwidth, the main advantage of the RLSFI-BF, does not matter if both speakers are talking face-to-face. Besides the word recognition
rates, the MNS-RLSFI-BF achieves the highest PESQ and composite measures. The highest
PESQ is achieved with an array diameter of d=0.55 m, the highest Csig with d=0.55 m, the
highest Cbak with d=0.20 m, and the highest Covrl with d=0.55 m. The word recognition
rates and the other objective measures exhibit a high correlation in case of data-independent
and data-dependent beamformers. See Tab. 7.2 for more details.
35 The number of the speaker corresponds to the number of the speaker in the CHiME database.
Summarizing and comparing the results of both double-talk scenarios with synthetic data
yields the following hypotheses:
the best data-independent beamformer is the MNS-RLSFI-BF,
the best data-dependent beamformer is the GSC,
the numerical results match the beamformers' theoretical behaviour,
medium correlation between the word recognition rate and all other obj. measures for data-independent beamformers (DI-BF) and side-by-side scenarios,
high correlation between the word recognition rate and all other obj. measures for DI-BF and face-to-face scenarios,
medium correlation between the word recognition rate and all other obj. measures for data-dependent beamformers (DD-BF) and side-by-side scenarios,
high correlation between the word recognition rate and all other obj. measures for DD-BF and face-to-face scenarios,
a small diameter yields the best perceptual results when both speakers are close together,
a large diameter yields the best perceptual results when both speakers talk face-to-face.
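One way to quantify a correlation between the word recognition rate and another objective measure is the Pearson coefficient over the six setups of a beamformer. The sketch below uses the WRe and PESQ rows of Table B.1 (DS-BF, speaker 03). It is an illustration only: how the correlation levels "medium" and "high" were determined in this work is not stated here, so the choice of the Pearson coefficient is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Rows of Table B.1, setups (08,20) ... (24,55).
wre = [74.17, 71.67, 71.67, 77.50, 81.67, 84.17]    # word recognition rate in %
pesq = [2.340, 2.342, 2.346, 2.359, 2.386, 2.388]

r = pearson(wre, pesq)
print(round(r, 3))
```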
PESQ
Csig
Cbak
Covrl
85.00 %
50.83 %
34.17 %
24/20
x
x
86.66 %
50.00 %
36.66 %
12/55
76.66 %
50.83 %
25.84 %
08/20
MPDRDL
38.33 %
50.00 %
-11.65 %
12/55
DD
MPDRVL
x
x
x
83.33 %
50.00 %
33.33 %
24/55
GSC
x
6
PESQ
Csig
Cbak
Covrl
77.50 %
46.67 %
30.83 %
24/20
81.66 %
45.83 %
35.84 %
08/55
MNS
x
6
80.83 %
46.67 %
34.16 %
08/20
MPDRDL
x
6
Speaker 2 (s = 45 )
36.66 %
48.33 %
-11.66 %
08/55
DD
MPDRVL
85.00 %
52.50 %
32.50 %
(24,55)
DS
85.00 %
52.50 %
32.50 %
(24,55)
DI
RLSFI
x
x
x
x
89.17 %
52.50 %
36.67 %
(24,55)
MNS
x
6
35.00 %
52.50 %
-17.50 %
(12,55)
MPDRDL
Speaker 1 (s = 300 )
32.50 %
52.50 %
-20.00 %
(12,55)
DD
MPDRVL
x
x
x
x
78.33 %
58.83 %
27.50 %
08/20
GSC
x
6
87.50 %
45.83 %
41.66 %
(24,55)
DS
87.50 %
45.83 %
41.66 %
(24,55)
DI
RLSFI
x
x
x
x
91.66 %
48.33 %
43.33 %
(08,20)
MNS
x
6
37.50 %
45.83 %
-08.33 %
(24,55)
MPDRDL
Speaker 2 (s = 120 )
25.00 %
45.83 %
-20.83 %
(08,55)
DD
MPDRVL
Table 7.2: This table summarizes the numerical results of the double-talk scenario 2 based on synthetic data in Chapter B (Appendix).
Highest WRe-Rate
WRn-Rate
Improvement
Setup
Best BF
Setups exh. Improvements
Highest
Highest
Highest
Highest
78.33 %
48.33 %
30.00 %
12/55
DS
DI
RLSFI
Table 7.1: This table summarizes the numerical results of the double-talk scenario 1 based on synthetic data in Chapter B (Appendix).
84.17 %
50.00 %
34.17 %
24/55
MNS
x
6
Speaker 1 (s = 00 )
Highest
Highest
Highest
Highest
Highest WRe-Rate
WRn-Rate
Improvement
Setup
Best BF
Setups exh. Improvements
DS
DI
RLSFI
x
x
x
x
71.67 %
45.83 %
25.84 %
08/55
GSC
x
6
x
x
x
79.17 %
48.33 %
30.84 %
12/55
GSC
Summarizing and comparing the results of both single-talk scenarios with real data yields the
following hypotheses:
each data-independent beamformer is suitable for the single-speaker scenario,
the best data-dependent beamformer is the MPDRDL-BF,
the numerical results match the beamformers' theoretical behaviour in case of DI-BF,
high correlation between the word recognition rate and all other obj. measures for DI-BF for side-by-side and face-to-face scenarios,
no correlation between the word recognition rate and all other obj. measures for DD-BF for side-by-side and face-to-face scenarios,
a small diameter yields the best perceptual results when both speakers are close together,
a small diameter yields the best perceptual results when both speakers talk face-to-face.
highest objective measures in both directions. The highest PESQ is achieved with an array
diameter of d=0.20 m, the highest Csig with d=0.20 m, the highest Cbak with d=0.20 m, and
the highest Covrl with d=0.20 m. The word recognition rates and the other objective
measures exhibit a high correlation for both the data-independent and the data-dependent
beamformers. See Tab. 7.6 for more details.
Summarizing and comparing the results of both double-talk scenarios with real data yields the
following hypotheses:
the best data-independent beamformer is the DS-BF,
the best data-dependent beamformer is the GSC,
the numerical results match the beamformers' theoretical behaviour in case of DI-BF,
high correlation between the word recognition rate and all other obj. measures for DI-BF for side-by-side and face-to-face scenarios,
high correlation between the word recognition rate and all other obj. measures for DD-BF for face-to-face scenarios,
no correlation between the word recognition rate and all other obj. measures for DD-BF for side-by-side scenarios,
a small diameter yields the best perceptual results when both speakers are close together,
a small diameter yields the best perceptual results when both speakers talk face-to-face.
PESQ
Csig
Cbak
Covrl
x
x
x
x
95.83 %
91.67 %
04.16 %
24/20
x
x
x
x
95.83 %
91.67 %
04.16 %
24/20
95.83 %
91.67 %
04.16 %
08/20
MPDRDL
x
3
x
x
94.17 %
93.33 %
00.84 %
24/55
DD
MPDRVL
91.67 %
91.67 %
00.00 %
08/20
GSC
Highest
Highest
Highest
Highest
PESQ
Csig
Cbak
Covrl
x
x
x
x
89.17 %
79.17 %
10.00 %
24/55
x
x
x
x
89.17 %
79.17 %
10.00 %
24/55
MNS
x
6
83.33 %
81.67 %
01.66 %
12/55
MPDRDL
x
5
Speaker 2 (s = 45 )
x
x
x
x
73.33 %
77.50 %
-04.17 %
08/20
DD
MPDRVL
x
x
x
x
68.33 %
60.83 %
07.50 %
24/55
DS
x
5
64.17 %
60.83 %
03.34 %
08/55
DI
RLSFI
62.50 %
55.00 %
07.50 %
12/20
MNS
64.17 %
60.83 %
03.34 %
08/55
MPDRDL
x
4
Speaker 1 (s = 00 )
56.67 %
55.00 %
01.67 %
24/20
DD
MPDRVL
x
x
64.17 %
55.00 %
09.17 %
24/20
GSC
x
x
x
x
50.00 %
33.33 %
16.67 %
24/55
DS
x
6
41.67 %
33.33 %
08.34 %
08/55
DI
RLSFI
40.00 %
33.33 %
06.67 %
24/20
MNS
38.33 %
30.83 %
07.50 %
12/55
MPDRDL
x
4
Speaker 2 (s = 45 )
25.83 %
30.83 %
-05.00 %
12/55
DD
MPDRVL
Table 7.4: This table summarizes the numerical results of the double-talk scenario 1 based on real data in Chapter B (Appendix).
Best BF
Setups exh. Improvements
Highest WRe-Rate
WRn-Rate
Improvement
Setup
x
x
x
x
89.17 %
79.17 %
10.00 %
24/55
DS
x
6
DI
RLSFI
x
6
Table 7.3: This table summarizes the numerical results of the single-talk scenario 1 based on real data in Chapter B (Appendix).
x
x
x
x
95.83 %
91.67 %
04.16 %
24/20
MNS
x
6
Speaker 1 (s = 00 )
Highest
Highest
Highest
Highest
Highest WRe-Rate
WRn-Rate
Improvement
Setup
Best BF
Setups exh. Improvements
DS
x
6
DI
RLSFI
x
6
37.50 %
33.33 %
04.17 %
08/55
GSC
77.50 %
77.50 %
00.00 %
08/20
GSC
PESQ
Csig
Cbak
Covrl
x
x
x
x
95.83 %
93.33 %
02.50 %
24/20
x
x
x
x
95.83 %
93.33 %
02.50 %
24/20
93.33 %
93.33 %
00.00 %
08/55
MPDRDL
x
1
92.50 %
91.67 %
00.83 %
08/20
DD
MPDRVL
x
x
89.17 %
95.00 %
-05.83 %
12/55
GSC
Highest
Highest
Highest
Highest
PESQ
Csig
Cbak
Covrl
x
x
x
x
88.33 %
79.17 %
09.16 %
24/55
x
x
x
x
88.33 %
79.17 %
09.16 %
24/55
MNS
x
4
83.33 %
79.17 %
04.16 %
24/55
MPDRDL
x
3
Speaker 2 (s = 120 )
x
x
x
x
74.17 %
74.17 %
00.00 %
08/20
DD
MPDRVL
x
x
x
x
66.67 %
50.83 %
15.84 %
24/55
DS
x
6
61.67 %
50.83 %
10.84 %
08/20
DI
RLSFI
61.67 %
50.83 %
10.84 %
08/20
MNS
48.33 %
50.83 %
-02.50 %
12/55
MPDRDL
Speaker 1 (s = 300 )
52.50 %
50.83 %
01.67 %
24/55
DD
MPDRVL
x
x
x
x
57.50 %
50.83 %
06.67 %
08/20
GSC
x
5
x
x
x
x
49.17 %
35.83 %
13.34 %
24/55
DS
x
5
36.67 %
33.33 %
03.34 %
08/55
DI
RLSFI
35.83 %
33.33 %
02.50 %
08/55
MNS
24.17 %
30.83 %
-06.66 %
08/20
MPDRDL
Speaker 2 (s = 120 )
23.33 %
35.83 %
-12.50 %
12/55
DD
MPDRVL
Table 7.6: This table summarizes the numerical results of the double-talk scenario 2 based on real data in Chapter B (Appendix).
Best BF
Setups exh. Improvements
Highest WRe-Rate
WRn-Rate
Improvement
Setup
x
x
x
x
88.33 %
79.17 %
09.16 %
24/55
DS
x
4
DI
RLSFI
x
4
Table 7.5: This table summarizes the numerical results of the single-talk scenario 2 based on real data in Chapter B (Appendix).
x
x
x
x
95.83 %
93.33 %
02.50 %
24/20
MNS
x
3
Speaker 1 (s = 300 )
Highest
Highest
Highest
Highest
Highest WRe-Rate
WRn-Rate
Improvement
Setup
Best BF
Setups exh. Improvements
DS
x
3
DI
RLSFI
x
3
x
x
x
x
34.17 %
30.83 %
03.34 %
08/20
GSC
x
2
75.83 %
74.17 %
01.66 %
08/20
GSC
8.1 Conclusion
In double-talk scenarios with simulated wave propagation, free-field conditions, and perfect
sound capturing without any deviations in microphone positions, loudspeaker characteristics,
and microphone characteristics, the beam pattern and numerical evaluations confirm the theoretical behaviour of the data-independent beamformers and highlight the advantages of the newly established MNS-RLSFI-BF, which achieves the highest word recognition rates when both speakers
are talking face-to-face or side-by-side. The differences between the word recognition rates of the
DS-BF and the RLSFI-BF are smaller than the differences between the DS-BF and the MNS-RLSFI-BF or the RLSFI-BF and the MNS-RLSFI-BF, which underlines the high performance of
the MNS-RLSFI-BF. A high correlation between the word recognition rates and the remaining
objective measures (PESQ, C-SIG, C-BAK, and C-OVRL) in case of face-to-face scenarios
and a medium correlation in case of side-by-side scenarios show that a high speech quality of the
enhanced signal corresponds to a high word recognition rate. The MNS-RLSFI-BF achieves the
highest C-BAK in side-by-side and face-to-face scenarios and the highest PESQ and composite
measures (C-SIG, C-BAK, and C-OVRL) in face-to-face scenarios.
In case of data-dependent beamformers, the GSC yields the highest word recognition rates
for side-by-side and face-to-face scenarios. It is noteworthy that the GSC is the most complex
and CPU-intensive beamformer in this work. The MPDRVL-BF always exhibits lower word
recognition rates than the nearest microphone for both double-talk scenarios, and so does the
MPDRDL-BF for face-to-face scenarios. A medium correlation in side-by-side scenarios and a high correlation in face-to-face
scenarios between the word recognition rates and the remaining objective measures
show that a high speech quality of the enhanced signal corresponds to a high word
recognition rate. The GSC achieves the highest C-SIG and C-BAK in side-by-side scenarios and
the highest PESQ and composite measures in face-to-face scenarios.
Additionally, a small array diameter yields the best perceptual results when both speakers are
close together, whereas a large diameter yields the best perceptual results when both speakers
talk face-to-face.
In real double-talk scenarios with minimal deviations in temperature (0.5 °C), small deviations
in loudspeaker characteristics, microphone characteristics, and microphone positions (a non-ideal
array aperture), the DS-BF always achieves the highest word recognition rates among the data-independent beamformers due to the mismatches and deviations, which affect the performance of
the RLSFI-BF and the MNS-RLSFI-BF. The differences between the word recognition rates of all
data-independent beamformers are larger than in case of the synthetic scenario, which points out
the robustness of the DS-BF and the performance loss of the RLSFI-BF and the MNS-RLSFI-BF due to the mismatches and deviations. This matches the theoretical behaviour of super-directive beamformers
in case of real scenarios. There is a high correlation between the word
recognition rates and the remaining objective measures in case of side-by-side and face-to-face
scenarios. Thus, a high speech quality of the enhanced signal corresponds to a high word
recognition rate. The DS-BF achieves the best PESQ and composite measures in side-by-side
and face-to-face scenarios.
In case of data-dependent beamformers the MPDRDL-BF yields the highest word recognition
rates in side-by-side, the GSC in face-to-face scenarios. There is a high correlation between the
word recognition rates and the remaining objective measures in case of face-to-face scenarios,
and no correlation in case of side-by-side scenarios. The GSC achieves the highest PESQ and
composite measures in case of side-by-side and face-to-face scenarios.
Additionally, a small array diameter yields the best perceptual results when both speakers
talk side-by-side or face-to-face.
What this all amounts to is that data-dependent beamformers exhibit a lower performance
and a higher sensitivity to mismatches and deviations than super-directive data-independent
beamformers. A comparison between the results of the synthetic and real double-talk scenarios shows that reverberation, mismatches, and deviations in microphone characteristics, loudspeaker
characteristics, and microphone positions yield a decrease in the absolute word recognition rate
of between 10 % and 60 %.
76
Beamforming
Bibliography
[1] X. Zhang and J. Hansen, Csa-bf: A constrained switched adaptive beamformer for speech
enhancement and recognition in real car environments, Speech and Audio Processing, IEEE
Transactions, vol. 11, no. 6, pp. 733745, November 2003.
[2] P. Vary and R. Martin, Digital Speech Transmission.
2006.
[3] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, ser. Springer
Topics in Signal Processing. Springer, January 2008, vol. 1.
[4] J. Benesty, Y. Huang, and M. Sondhi, Springer Handbook of Speech Processing. Springer,
December 2007.
[5] F. Zotter and H. Pomberger, Acoustic holography and holophony, Lecture Notes, Institute
of Electronic Music and Acoustics, Inffeldgasse 10/3, 8010 Graz, Austria, October 2011.
[6] H. Teutsch, Wavefield decomposition using microphone arrays and its application to acoustic scene analysis, Ph.D. dissertation, Technische Fakultat der Friedrich-Alexander Universitat Erlangen-N
urnberg, Erlangen-N
urnberg, Germany, October 2005.
[7] I. Tashev, Sound Capture and Processing: Practical Approaches.
Inc., July 2009.
[8] , International conference for multimedia and expo icme 2004, Taipei, Taiwan, June
2004.
[9] M. Ehsan and G. Kubin, Frame change ratio: A measure to model short-time stationarity
of speech, in Innovations in Information Technology, 2006, November 2006, pp. 15.
[10] G. Wei, Discrete singular convolution for beam analysis, Engineering Structures, vol. 23,
pp. 10451053, January 2001.
[11] J. Fung, Literature Survey EE 381K Multi-Dimensional Signal Processing - Effects of
Steering Delay Quantization in Beamforming, 10000 Burnet Road, Austin, Texas 78758,
Applied Research Laboratories, The University of Texas at Austin.
[12] J. Li and S. P., Robust Adaptive Beamforming (1st edition).
October 2005.
77
Bibliography
[16] J. Li, P. Stoica, and Z. Wang, On Robust Capon Beamforming and Diagonal Loading,
Signal Processing, IEEE Transactions, vol. 51, no. 7, pp. 17021715, July 2003.
[17] P. Lilja and H. Saarnisaari, Robust Adaptive Beamforming in Software Defined Radio with
Adaptive Diagonal Loading, in Military Communications Conference, 2005. MILCOM
2005. IEEE, vol. 4, October 2005, pp. 25962601.
[18] T. S. Laseetha and R. Sukanesh, Robust Adaptive Beamformers using Diagonal Loading,
Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected
Areas in Telecommunications (JSAT), March Edition 2011, pp. 7379, March 2011.
[19] E. Mabande, A. Schad, and W. Kellermann, Design of Robust Superdirective Beamformers
as a Convex Optimization Problem, in Acoustics, Speech and Signal Processing, 2009.
ICASSP 2009. IEEE International Conference, April 2009, pp. 77 80.
[20] M. Wolfel and J. McDonough, Distant Speech Recognition.
June 2009.
[21] M. Grant and S. Boyd, cvx Users Guide for cvx Version 1.21, Code Commit: 808, 201107-17, Doc Commit: 806, 2011-02-25, d/b/a CVX Research, 1104 Claire Ave., Austin, TX
78703-2502, April 2011.
[22] P. Tsai, K. Ebrahim, G. Lange, Y. Paichard, and M. Inggs, Null placement in a circular
antenna array for passive coherent location systems, in Radar Conference, 2010 IEEE,
May 2010, pp. 11401143.
[23] W. Herbordt and W. Kellermann, Computationally efficient frequency-domain robust generalized sidelobe canceller, in 7th International Workshop on Acoustic Echo and Noise
Control, Darmstadt University of Technology, Darmstadt, Germany, 2001.
[24] J. Shynk, Frequency-Domain and Multirate Adaptive Filtering, Signal Processing Magazine, IEEE, vol. 9, no. 1, pp. 14 37, January 1992.
[25] J. Chen, H. Gu, H. Wang, and W. Su, Mathematical analysis of main-to-sidelobe ratio after
pulse compression in pseudorandom code phase modulation cw radar, in Radar Conference
2008, IEEE, May 2008, pp. 15.
[26] G. W. Elko, A new technique to measure electroacoustic transducer directivity indices
in reverberant fields, in Applications of Signal Processing to Audio and Acoustics, 1993.
Final Program and Paper Summaries, 1993 IEEE Workshop, October 1993, pp. 6467.
[27] H. Yi and P. Loizou, Evaluation of objective quality measures for speech enhancement,
Audio, Speech, and Language Processing, IEEE Transactions, vol. 16, no. 1, pp. 229238,
January 2008.
[28] J. H. L. Hansen and B. L. Pellom, An effective quality evaluation protocol for speech
enhancement algorithms, in The International Conference on Speech and Language Processing, 1998, pp. 28192822.
[29] D. Klatt, Prediction of perceived phonetic distance from critical-band spectra: A first
step, in Acoustics, Speech, and Signal Processing, IEEE International Conference on
ICASSP 82., vol. 7, May 1982, pp. 12781281.
[30] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of
speech quality (pesq) - a new method for speech quality assessment of telephone networks
and codecs, in Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP 01),
vol. 2, 2001, pp. 749752.
78
Bibliography
John
[43] M. Bar, Visual objects in context, Nature Reviews Neuroscience, vol. 5, pp. 617629,
August 2004.
[44] R. Scholte, B. Roozen, and I. Lopez, Twelfth international congress on sound and vibration, Portugal, Lisbon, July 2005.
[45] T. Habib and H. Romsdorfer, Comparison of SRP-PHAT and Multiband-PoPi Algorithms
for Speaker Localization using Particle Filters, in Proc. of the 13th Int. Conference on
Digital Audio Effects (DAFx-10), September 2010.
79
Beamforming
80
[Figure A.1: 3dB-beamwidth (degree) versus frequency (Hz) for MRA = 0°, six panels.]
[Figure A.2: MSR (dB) versus frequency (Hz), six panels.]
[Figure A.3: directivity index versus frequency (Hz), six panels.]
Figure A.4: These plots depict beampatterns of different beamformers for d=0.20m.
Figure A.5: These plots depict beampatterns of different beamformers for d=0.50m.
Figure A.6: These plots depict beampatterns of different beamformers for d=0.20m and f=2000Hz.
Figure A.7: These plots depict beampatterns of different beamformers for d=0.50m and f=2000Hz.
DS-BF (Speaker 03, MRA = 0°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           74.17     71.67     71.67     77.50     81.67     84.17
WRn (%)           50.83     50.83     50.83     50.00     50.00     50.00
iGSINR (dB)      12.467    12.563    12.578    11.950    12.068    12.099
GAIN (dB)        -1.234    -1.251    -1.242    -3.407    -3.427    -3.427
LLR               2.414     2.540     2.555     2.565     2.606     2.627
sSNR             -4.745    -4.719    -4.721    -4.359    -4.324    -4.319
WSS              51.640    51.631    51.588    53.168    52.801    52.590
PESQ              2.340     2.342     2.346     2.359     2.386     2.388
Csig              1.555     1.427     1.414     1.397     1.374     1.357
Cbak              2.092     2.095     2.097     2.115     2.132     2.135
Covrl             1.880     1.817     1.813     1.807     1.810     1.803
Table B.1: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
DS-BF (Speaker 12, MRA = 45°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           58.33     60.83     60.83     70.00     78.33     77.50
WRn (%)           46.67     46.67     46.67     45.83     48.33     45.83
iGSINR (dB)       1.732     1.726     1.730     2.586     2.577     2.595
GAIN (dB)        -1.074    -1.072    -1.079    -3.045    -3.070    -3.072
LLR               2.382     2.514     2.547     2.587     2.552     2.648
sSNR             -4.626    -4.834    -4.606    -4.311    -4.530    -4.264
WSS              50.527    50.511    50.408    52.068    51.566    51.298
PESQ              2.292     2.241     2.284     2.343     2.284     2.378
Csig              1.569     1.403     1.396     1.375     1.380     1.341
Cbak              2.085     2.047     2.083     2.118     2.079     2.143
Covrl             1.866     1.757     1.776     1.791     1.765     1.794
Table B.2: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
RLSFI-BF (Speaker 03, MRA = 0°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           80.83     83.33     85.00     80.00     79.17     80.83
WRn (%)           50.83     50.83     50.83     50.00     50.00     50.00
iGSINR (dB)      14.255    14.463    13.923    15.793    15.792    16.231
GAIN (dB)        -3.887    -4.212    -4.471    -2.768    -3.224    -2.582
LLR               2.543     2.570     2.475     2.579     2.633     2.642
sSNR             -4.106    -4.007    -3.932    -3.947    -3.837    -3.911
WSS              52.561    51.527    51.513    58.129    57.158    60.362
PESQ              2.374     2.381     2.388     2.432     2.414     2.489
Csig              1.435     1.420     1.522     1.382     1.325     1.332
Cbak              2.142     2.159     2.167     2.141     2.146     2.155
Covrl             1.835     1.834     1.888     1.824     1.789     1.823
Table B.3: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
RLSFI-BF (Speaker 12, MRA = 45°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           70.83     71.67     77.50     74.17     73.33     71.67
WRn (%)           46.67     46.67     46.67     45.83     48.33     45.83
iGSINR (dB)       2.201     2.278     2.370     2.755     2.754     2.566
GAIN (dB)        -3.030    -3.139    -3.491    -3.018    -3.344    -2.379
LLR               2.455     2.525     2.353     2.579     2.571     2.631
sSNR             -4.048    -4.277    -3.910    -3.956    -4.179    -4.019
WSS              50.590    50.452    50.701    54.868    55.696    56.387
PESQ              2.320     2.297     2.349     2.392     2.333     2.419
Csig              1.510     1.425     1.631     1.388     1.353     1.337
Cbak              2.134     2.109     2.156     2.144     2.096     2.143
Covrl             1.850     1.797     1.925     1.815     1.766     1.800
Table B.4: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MNS-RLSFI-BF (Speaker 03, MRA = 0°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           83.33     84.17     82.50     81.67     86.67     83.33
WRn (%)           50.83     50.83     50.83     50.00     50.00     50.00
iGSINR (dB)      14.474    14.490    13.909    16.132    15.880    16.327
GAIN (dB)        -3.899    -4.210    -4.465    -2.790    -3.243    -2.581
LLR               2.493     2.565     2.460     2.637     2.618     2.616
sSNR             -4.062    -3.989    -3.935    -3.883    -3.799    -3.892
WSS              51.267    51.049    51.502    57.925    58.652    59.952
PESQ              2.372     2.378     2.394     2.463     2.437     2.465
Csig              1.497     1.428     1.541     1.344     1.341     1.348
Cbak              2.153     2.162     2.170     2.161     2.149     2.147
Covrl             1.868     1.838     1.901     1.821     1.805     1.819
Table B.5: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MNS-RLSFI-BF (Speaker 12, MRA = 45°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           79.17     79.17     79.17     81.67     80.00     75.83
WRn (%)           46.67     46.67     46.67     45.83     48.33     45.83
iGSINR (dB)       2.194     2.280     2.371     2.758     2.763     2.566
GAIN (dB)        -3.038    -3.135    -3.477    -3.053    -3.373    -2.375
LLR               2.456     2.514     2.344     2.666     2.584     2.663
sSNR             -3.998    -4.244    -3.922    -3.877    -4.106    -3.999
WSS              50.579    50.800    50.846    54.620    55.188    56.421
PESQ              2.328     2.277     2.323     2.421     2.333     2.411
Csig              1.515     1.422     1.625     1.318     1.344     1.298
Cbak              2.141     2.099     2.142     2.165     2.104     2.139
Covrl             1.857     1.784     1.909     1.796     1.763     1.776
Table B.6: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRDL-BF (Speaker 03, MRA = 0°)
Measure         (08,20)   (12,20)   (24,20)   (08,55)   (12,55)   (24,55)
WRe (%)           76.67     75.00     71.67     66.67     63.33     64.17
WRn (%)           50.83     50.83     50.83     50.00     50.00     50.00
iGSINR (dB)      19.601    20.223    22.641    15.718    16.374    18.302
GAIN (dB)         9.088     8.836     9.098     0.184    -0.479    -0.965
LLR               8.801     4.232     0.909     3.641     0.765     0.632
sSNR             -7.967    -7.804    -7.762    -6.551    -6.253    -6.077
WSS              29.989    26.709    25.433    32.322    30.250    32.511
PESQ              3.222     3.242     3.241     2.676     2.678     2.614
Csig             -4.290     0.452     3.883     0.669     3.648     3.727
Cbak              2.462     2.505     2.516     2.274     2.308     2.273
Covrl            -0.528     1.850     3.559     1.658     3.146     3.147
Table B.7: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRDL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        80.83    77.50    72.50    59.17    62.50    64.17
WRn (%)        46.67    46.67    46.67    45.83    48.33    45.83
iGSINR (dB)    3.356    3.320    3.400    3.119    3.733    3.604
GAIN (dB)      9.590   10.228    9.578    0.930   -0.844   -0.876
LLR            8.966    5.671    1.152    3.989    0.973    0.615
sSNR          -7.780   -7.658   -7.433   -6.605   -6.204   -6.108
WSS           32.446   28.561   27.872   33.397   33.368   34.379
PESQ           3.293    3.337    3.321    2.928    2.794    2.832
Csig          -4.440   -0.988    3.660    0.453    3.476    3.858
Cbak           2.491    2.547    2.558    2.383    2.345    2.362
Covrl         -0.573    1.177    3.483    1.674    3.111    3.318
Table B.8: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRVL-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)         0.00    35.00    34.17     0.00    38.33    33.33
WRn (%)        50.83    50.83    50.83    50.00    50.00    50.00
iGSINR (dB)   10.036   10.734   12.016    8.383    7.832    8.260
GAIN (dB)     -1.490   -1.569   -1.779   -1.955   -3.148   -3.474
LLR            0.813    0.751    0.733    0.693    0.689    0.691
sSNR          -4.585   -4.388   -4.285   -5.237   -4.992   -4.921
WSS           32.875   33.659   32.243   37.553   37.410   40.802
PESQ           2.073    2.157    2.160    2.053    2.023    2.129
Csig           3.210    3.318    3.351    3.280    3.267    3.298
Cbak           2.106    2.153    2.171    2.023    2.025    2.056
Covrl          2.616    2.710    2.732    2.629    2.608    2.668
Table B.9: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRVL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        31.67    32.50    31.67    34.17    36.67    33.33
WRn (%)        46.67    46.67    46.67    45.83    48.33    45.83
iGSINR (dB)    1.910    2.134    1.917    1.456    1.743    1.767
GAIN (dB)     -0.795   -0.692   -0.985   -1.454   -2.570   -2.987
LLR            0.672    0.619    0.748    0.580    0.586    0.621
sSNR          -5.315   -5.237   -5.101   -5.633   -5.532   -5.369
WSS           34.003   35.264   35.215   42.374   38.586   42.449
PESQ           2.239    2.253    2.309    2.180    2.128    2.218
Csig           3.445    3.497    3.399    3.429    3.425    3.409
Cbak           2.131    2.134    2.170    2.024    2.032    2.059
Covrl          2.814    2.844    2.824    2.755    2.736    2.764
Table B.10: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
GSC (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        71.67    73.33    69.17    77.50    82.50    83.33
WRn (%)        50.83    50.83    50.83    50.00    50.00    50.00
iGSINR (dB)   12.467   12.563   12.578   11.950   12.068   12.099
GAIN (dB)     -1.234   -1.251   -1.242   -3.407   -3.427   -3.427
LLR            0.768    0.716    0.989    0.775    0.686    0.915
sSNR          -4.994   -5.002   -5.243   -4.833   -4.864   -4.933
WSS           21.577   21.994   22.942   21.638   21.287   22.447
PESQ           2.784    2.739    2.649    2.954    2.977    2.923
Csig           3.788    3.810    3.466    3.883    3.990    3.712
Cbak           2.499    2.474    2.409    2.590    2.601    2.563
Covrl          3.291    3.278    3.059    3.424    3.490    3.322
Table B.11: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
GSC (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        61.67    65.83    64.17    67.50    79.17    77.50
WRn (%)        46.67    46.67    46.67    45.83    48.33    45.83
iGSINR (dB)    1.732    1.726    1.730    2.586    2.577    2.595
GAIN (dB)     -1.074   -1.072   -1.079   -3.045   -3.070   -3.072
LLR            0.753    0.707    0.879    0.767    0.664    1.021
sSNR          -5.359   -5.438   -5.492   -5.162   -5.109   -5.398
WSS           23.686   24.040   24.706   26.001   24.031   25.686
PESQ           2.792    2.779    2.715    2.974    3.039    3.019
Csig           3.789    3.825    3.603    3.862    4.026    3.631
Cbak           2.465    2.452    2.413    2.548    2.597    2.557
Covrl          3.291    3.301    3.156    3.413    3.532    3.322
Table B.12: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
DS-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        84.17    80.83    82.50    84.17    77.50    85.00
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   11.148   11.107   11.168   11.866   11.813   11.946
GAIN (dB)     -2.842   -2.829   -2.844   -3.812   -3.778   -3.826
LLR            0.579    0.718    0.328    0.627    0.574    0.376
sSNR          -3.968   -3.989   -3.949   -5.093   -5.160   -5.099
WSS           15.627   15.716   15.139   16.010   15.929   14.897
PESQ           3.121    3.099    3.131    3.097    3.095    3.273
Csig           4.238    4.082    4.507    4.172    4.225    4.546
Cbak           2.766    2.754    2.776    2.681    2.677    2.773
Covrl          3.700    3.611    3.840    3.654    3.680    3.932
Table B.13: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
DS-BF (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        75.00    79.17    82.50    80.00    75.00    87.50
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    2.474    2.462    2.474    3.062    3.053    3.063
GAIN (dB)     -2.474   -2.455   -2.477   -3.635   -3.593   -3.632
LLR            0.694    0.745    0.429    0.623    0.651    0.346
sSNR          -4.942   -4.963   -4.918   -5.420   -5.480   -5.435
WSS           19.010   18.880   18.148   18.031   19.333   17.155
PESQ           3.043    3.042    3.059    3.157    3.143    3.233
Csig           4.043    3.991    4.333    4.193    4.145    4.532
Cbak           2.644    2.643    2.660    2.675    2.656    2.717
Covrl          3.555    3.529    3.710    3.690    3.655    3.899
Table B.14: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
RLSFI-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        84.17    80.83    82.50    84.17    77.50    85.00
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   11.148   11.107   11.168   11.866   11.813   11.946
GAIN (dB)     -2.842   -2.829   -2.844   -3.812   -3.778   -3.826
LLR            0.579    0.718    0.328    0.627    0.574    0.376
sSNR          -3.968   -3.989   -3.949   -5.093   -5.160   -5.099
WSS           15.627   15.716   15.139   16.010   15.929   14.897
PESQ           3.121    3.099    3.131    3.097    3.095    3.273
Csig           4.238    4.082    4.507    4.172    4.225    4.546
Cbak           2.766    2.754    2.776    2.681    2.677    2.773
Covrl          3.700    3.611    3.840    3.654    3.680    3.932
Table B.15: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
RLSFI-BF (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        75.00    79.17    82.50    80.00    75.00    87.50
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    2.474    2.462    2.474    3.062    3.053    3.063
GAIN (dB)     -2.474   -2.455   -2.477   -3.635   -3.593   -3.632
LLR            0.694    0.745    0.429    0.623    0.651    0.346
sSNR          -4.942   -4.963   -4.918   -5.420   -5.480   -5.435
WSS           19.010   18.880   18.148   18.031   19.333   17.155
PESQ           3.043    3.042    3.059    3.157    3.143    3.233
Csig           4.043    3.991    4.333    4.193    4.145    4.532
Cbak           2.644    2.643    2.660    2.675    2.656    2.717
Covrl          3.555    3.529    3.710    3.690    3.655    3.899
Table B.16: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
MNS-RLSFI-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        85.83    86.67    85.83    89.17    87.50    89.17
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   11.201   11.178   11.205   11.999   11.945   12.015
GAIN (dB)     -2.859   -2.852   -2.859   -3.857   -3.832   -3.842
LLR            0.264    0.174    0.162    0.254    0.258    0.206
sSNR          -3.918   -3.907   -3.902   -4.999   -5.069   -5.048
WSS           12.802   13.058   12.601   12.388   12.217   11.336
PESQ           3.276    3.264    3.276    3.361    3.353    3.411
Csig           4.682    4.765    4.789    4.746    4.740    4.836
Cbak           2.863    2.857    2.866    2.839    2.832    2.867
Covrl          4.006    4.041    4.060    4.082    4.076    4.155
Table B.17: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MNS-RLSFI-BF (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        91.67    91.67    90.83    89.17    90.83    87.50
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    2.480    2.461    2.475    3.081    3.056    3.068
GAIN (dB)     -2.481   -2.478   -2.485   -3.664   -3.635   -3.644
LLR            0.305    0.230    0.202    0.282    0.259    0.221
sSNR          -4.903   -4.893   -4.889   -5.342   -5.404   -5.388
WSS           16.143   16.183   15.940   15.719   15.388   14.837
PESQ           3.117    3.112    3.117    3.278    3.263    3.282
Csig           4.513    4.588    4.622    4.638    4.655    4.711
Cbak           2.702    2.700    2.704    2.754    2.745    2.760
Covrl          3.834    3.869    3.888    3.978    3.980    4.019
Table B.18: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
MPDRDL-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)         0.00     0.00     0.00     0.00    35.00     0.00
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   19.348   17.904   20.125   15.357   15.181   16.233
GAIN (dB)      7.302    6.954    7.260   -0.046    0.154   -0.129
LLR            0.993    0.941    0.970    0.973    0.974    0.959
sSNR          -7.538   -7.497   -7.494   -6.182   -6.336   -6.234
WSS           34.192   34.564   34.166   38.053   37.197   37.962
PESQ           2.483    2.477    2.483    2.170    2.204    2.175
Csig           3.261    3.307    3.284    3.058    3.085    3.075
Cbak           2.107    2.104    2.109    2.015    2.028    2.015
Covrl          2.845    2.864    2.857    2.576    2.609    2.588
Table B.19: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MPDRDL-BF (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        35.00    36.67    35.00    35.83    37.50    37.50
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    3.688    3.723    3.688    2.380    2.433    2.349
GAIN (dB)      7.473    7.070    7.428    0.141    0.401    0.131
LLR            0.972    0.913    0.935    0.894    0.921    0.884
sSNR          -7.462   -7.433   -7.409   -6.413   -6.522   -6.418
WSS           37.441   37.860   37.380   40.342   39.569   40.206
PESQ           2.641    2.640    2.640    2.276    2.371    2.325
Csig           3.348    3.405    3.386    3.182    3.218    3.223
Cbak           2.164    2.163    2.168    2.036    2.079    2.060
Covrl          2.960    2.987    2.979    2.686    2.754    2.731
Table B.20: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
MPDRVL-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        25.83     0.00     0.00    27.50    32.50    30.00
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   12.917   12.688   12.870   11.124   11.038   11.146
GAIN (dB)     -0.286   -0.335   -0.293   -0.538   -0.501   -0.499
LLR            1.348    1.375    1.132    1.134    1.145    0.929
sSNR          -4.825   -4.834   -4.862   -5.498   -5.520   -5.546
WSS           31.485   31.250   31.382   34.271   34.328   36.084
PESQ           1.953    1.999    1.966    2.005    2.026    1.980
Csig           2.600    2.602    2.831    2.826    2.828    3.006
Cbak           2.043    2.066    2.048    2.006    2.014    1.978
Covrl          2.255    2.280    2.377    2.387    2.398    2.460
Table B.21: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MPDRVL-BF (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        23.33    21.67    17.50    25.00    24.17    20.00
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    2.094    2.081    2.092    2.172    2.126    2.161
GAIN (dB)      2.709    2.582    2.693    0.983    1.001    1.034
LLR            1.113    1.235    0.921    0.941    0.848    0.790
sSNR          -5.708   -5.714   -5.738   -6.070   -6.055   -6.086
WSS           34.310   34.213   33.844   36.668   36.589   38.106
PESQ           2.157    2.203    2.190    2.184    2.219    2.140
Csig           2.939    2.843    3.162    3.111    3.229    3.227
Cbak           2.065    2.088    2.083    2.039    2.057    2.007
Covrl          2.520    2.496    2.649    2.613    2.690    2.645
Table B.22: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
GSC (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        78.33    76.67    73.33    77.50    75.00    77.50
WRn (%)        50.83    50.83    50.83    51.67    52.50    52.50
iGSINR (dB)   11.148   11.107   11.168   11.866   11.813   11.946
GAIN (dB)     -2.842   -2.829   -2.844   -3.812   -3.778   -3.826
LLR            0.808    0.822    1.072    0.822    0.852    1.090
sSNR          -5.106   -5.057   -5.330   -4.738   -4.845   -4.953
WSS           22.816   22.582   24.393   21.997   22.378   23.083
PESQ           2.923    2.906    2.779    2.928    2.846    2.903
Csig           3.819    3.796    3.446    3.815    3.731    3.514
Cbak           2.550    2.546    2.456    2.581    2.532    2.548
Covrl          3.374    3.354    3.112    3.376    3.292    3.211
Table B.23: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
GSC (Speaker 12, MRA = 120°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        65.83    65.83    70.00    71.67    65.83    70.00
WRn (%)        48.33    48.33    48.33    45.83    45.83    45.83
iGSINR (dB)    2.474    2.462    2.474    3.062    3.053    3.063
GAIN (dB)     -2.474   -2.455   -2.477   -3.635   -3.593   -3.632
LLR            0.816    0.847    1.128    0.773    0.827    1.008
sSNR          -5.465   -5.539   -5.625   -4.950   -5.077   -5.117
WSS           25.867   25.769   26.233   24.072   25.668   24.913
PESQ           2.966    2.927    2.880    3.048    2.994    3.045
Csig           3.810    3.755    3.433    3.919    3.816    3.668
Cbak           2.527    2.504    2.473    2.611    2.565    2.593
Covrl          3.383    3.336    3.151    3.484    3.401    3.355
Table B.24: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 120°.
DS-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    94.17    95.83    94.17    94.17    95.00
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)   -0.312   -0.302   -0.275   -0.292   -0.280   -0.202
GAIN (dB)      3.583    1.189    0.647    0.825   -1.589   -2.553
LLR            0.272    0.324    0.336    0.239    0.290    0.356
sSNR           1.643    1.286    1.184    0.207    0.193    0.199
WSS           27.533   27.021   27.023   32.766   31.800   30.296
PESQ           3.653    3.668    3.649    3.186    3.193    3.247
Csig           4.767    4.728    4.705    4.473    4.434    4.412
Cbak           3.291    3.279    3.264    2.941    2.950    2.986
Covrl          4.202    4.191    4.171    3.807    3.794    3.813
Table B.25: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
RLSFI-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    94.17    95.83    94.17    94.17    95.00
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)   -0.312   -0.302   -0.275   -0.292   -0.280   -0.202
GAIN (dB)      3.583    1.189    0.647    0.825   -1.589   -2.553
LLR            0.272    0.324    0.336    0.239    0.290    0.356
sSNR           1.643    1.286    1.184    0.207    0.193    0.199
WSS           27.533   27.021   27.023   32.766   31.800   30.296
PESQ           3.653    3.668    3.649    3.186    3.193    3.247
Csig           4.767    4.728    4.705    4.473    4.434    4.412
Cbak           3.291    3.279    3.264    2.941    2.950    2.986
Covrl          4.202    4.191    4.171    3.807    3.794    3.813
Table B.26: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MNS-RLSFI-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    94.17    95.83    94.17    94.17    95.00
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)   -0.312   -0.302   -0.275   -0.292   -0.280   -0.202
GAIN (dB)      3.583    1.189    0.647    0.825   -1.589   -2.553
LLR            0.272    0.324    0.336    0.239    0.290    0.356
sSNR           1.643    1.286    1.184    0.207    0.193    0.199
WSS           27.533   27.021   27.023   32.766   31.800   30.296
PESQ           3.653    3.668    3.649    3.186    3.193    3.247
Csig           4.767    4.728    4.705    4.473    4.434    4.412
Cbak           3.291    3.279    3.264    2.941    2.950    2.986
Covrl          4.202    4.191    4.171    3.807    3.794    3.813
Table B.27: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRDL-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.83    94.17    92.50    93.33    90.00    92.50
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)    0.592    0.609    0.361    0.018    0.079   -0.093
GAIN (dB)     -3.980   -7.796   -8.745   -3.269   -6.896   -7.390
LLR            0.427    0.495    0.501    0.362    0.471    0.498
sSNR           0.047    0.084    0.059   -0.306   -0.133   -0.139
WSS           34.293   33.657   33.360   37.995   37.304   35.380
PESQ           3.163    3.176    3.128    2.996    2.942    2.918
Csig           4.252    4.196    4.163    4.185    4.047    4.022
Cbak           2.909    2.922    2.899    2.781    2.771    2.773
Covrl          3.682    3.662    3.622    3.555    3.460    3.441
Table B.28: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRVL-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        90.83    91.67    93.33    92.50    92.50    94.17
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)   -0.246   -0.278   -0.287   -0.050   -0.338   -0.026
GAIN (dB)      7.932    6.594    5.331    8.248    7.450    5.705
LLR            0.318    0.380    0.417    0.310    0.378    0.396
sSNR           1.721    1.576    1.491   -0.287   -0.244   -0.075
WSS           31.152   31.285   31.129   39.658   39.138   37.403
PESQ           3.361    3.341    3.397    2.820    2.845    2.864
Csig           4.512    4.435    4.432    4.117    4.067    4.076
Cbak           3.131    3.111    3.134    2.686    2.705    2.736
Covrl          3.918    3.870    3.897    3.428    3.417    3.435
Table B.29: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
GSC (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        91.67    90.00    85.00    85.00    89.17    84.17
WRn (%)        91.67    91.67    91.67    93.33    93.33    93.33
iGSINR (dB)   -0.312   -0.302   -0.275   -0.292   -0.280   -0.202
GAIN (dB)      3.583    1.189    0.647    0.825   -1.589   -2.553
LLR            0.232    0.228    0.205    0.261    0.256    0.236
sSNR          -0.710   -0.815   -1.199   -0.493   -0.663   -1.202
WSS           33.841   34.002   36.170   37.855   37.132   38.468
PESQ           3.316    3.144    2.787    2.962    3.097    2.876
Csig           4.549    4.448    4.237    4.270    4.363    4.238
Cbak           2.938    2.848    2.637    2.754    2.813    2.664
Covrl          3.908    3.770    3.479    3.580    3.696    3.519
Table B.30: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
DS-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        80.83    82.50    85.00    83.33    83.33    89.17
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.548    0.680    0.649   -0.019    0.105    0.031
GAIN (dB)     -4.130    1.857   -6.739   -6.966   -0.469  -10.010
LLR            0.241    0.393    0.314    0.252    0.508    0.476
sSNR           1.221    1.128    0.880    0.010   -0.003    0.051
WSS           22.830   29.109   25.790   28.240   35.096   28.244
PESQ           3.814    3.697    3.747    3.391    3.120    3.306
Csig           4.939    4.656    4.797    4.624    4.136    4.342
Cbak           3.374    3.269    3.300    3.058    2.880    3.020
Covrl          4.381    4.165    4.269    3.997    3.600    3.814
Table B.31: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
RLSFI-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        80.83    82.50    85.00    83.33    83.33    89.17
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.548    0.680    0.649   -0.019    0.105    0.031
GAIN (dB)     -4.130    1.857   -6.739   -6.966   -0.469  -10.010
LLR            0.241    0.393    0.314    0.252    0.508    0.476
sSNR           1.221    1.128    0.880    0.010   -0.003    0.051
WSS           22.830   29.109   25.790   28.240   35.096   28.244
PESQ           3.814    3.697    3.747    3.391    3.120    3.306
Csig           4.939    4.656    4.797    4.624    4.136    4.342
Cbak           3.374    3.269    3.300    3.058    2.880    3.020
Covrl          4.381    4.165    4.269    3.997    3.600    3.814
Table B.32: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MNS-RLSFI-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        80.83    82.50    85.00    83.33    83.33    89.17
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.548    0.680    0.649   -0.019    0.105    0.031
GAIN (dB)     -4.130    1.857   -6.739   -6.966   -0.469  -10.010
LLR            0.241    0.393    0.314    0.252    0.508    0.476
sSNR           1.221    1.128    0.880    0.010   -0.003    0.051
WSS           22.830   29.109   25.790   28.240   35.096   28.244
PESQ           3.814    3.697    3.747    3.391    3.120    3.306
Csig           4.939    4.656    4.797    4.624    4.136    4.342
Cbak           3.374    3.269    3.300    3.058    2.880    3.020
Covrl          4.381    4.165    4.269    3.997    3.600    3.814
Table B.33: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRDL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        80.00    82.50    74.17    80.00    83.33    81.67
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.472    0.384    0.565   -0.626   -0.205   -0.515
GAIN (dB)    -12.209   -7.552  -15.586  -10.413   -4.671  -14.578
LLR            0.374    0.600    0.436    0.445    0.748    0.746
sSNR          -0.060   -0.047   -0.009   -0.248   -0.267   -0.144
WSS           33.817   33.372   32.299   36.628   36.259   32.541
PESQ           3.199    3.264    3.214    3.082    3.063    3.004
Csig           4.333    4.143    4.292    4.164    3.844    3.844
Cbak           2.923    2.957    2.943    2.835    2.827    2.833
Covrl          3.741    3.680    3.732    3.591    3.423    3.403
Table B.34: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRVL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        73.33    72.50    69.17    72.50    72.50    71.67
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.017    0.226    0.186   -0.464   -0.281   -0.067
GAIN (dB)      1.413    8.505   -1.038    1.506    9.404   -1.609
LLR            0.293    0.447    0.408    0.354    0.552    0.496
sSNR           2.009    1.998    1.604   -0.287   -0.440   -0.003
WSS           24.672   30.879   27.289   33.098   39.897   33.308
PESQ           3.625    3.577    3.675    3.128    2.955    3.048
Csig           4.755    4.512    4.643    4.317    3.948    4.120
Cbak           3.320    3.254    3.300    2.879    2.740    2.857
Covrl          4.189    4.029    4.152    3.699    3.411    3.560
Table B.35: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
GSC (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        77.50    76.67    71.67    77.50    75.83    70.83
WRn (%)        77.50    79.17    77.50    79.17    81.67    79.17
iGSINR (dB)    0.548    0.680    0.649   -0.019    0.105    0.031
GAIN (dB)     -4.130    1.857   -6.739   -6.966   -0.469  -10.010
LLR            0.241    0.270    0.224    0.307    0.383    0.240
sSNR          -0.415   -0.484   -0.833   -0.437   -0.611   -1.082
WSS           33.094   34.853   35.930   34.325   36.147   36.524
PESQ           3.418    3.305    3.077    3.106    3.111    2.843
Csig           4.608    4.494    4.395    4.342    4.249    4.232
Cbak           3.010    2.939    2.801    2.851    2.830    2.669
Covrl          3.990    3.872    3.705    3.697    3.649    3.504
Table B.36: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
DS-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        61.67    56.67    65.83    62.50    60.00    68.33
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    3.522    3.566    3.754    3.004    3.027    3.085
GAIN (dB)      3.509    1.084    0.510    0.018   -2.638   -3.610
LLR            0.450    0.488    0.489    0.566    0.663    0.791
sSNR           0.237    0.339    0.354   -0.448   -0.240   -0.135
WSS           41.097   40.327   39.715   44.022   42.541   38.537
PESQ           2.706    2.726    2.723    2.641    2.639    2.688
Csig           3.892    3.872    3.874    3.707    3.619    3.553
Cbak           2.655    2.676    2.680    2.560    2.582    2.641
Covrl          3.255    3.256    3.258    3.122    3.081    3.083
Table B.37: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
DS-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        37.50    36.67    43.33    41.67    43.33    50.00
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)   -0.007   -0.003    0.027    0.268    0.302    0.227
GAIN (dB)     -4.647    1.445   -7.262   -8.363   -2.143  -11.608
LLR            0.408    0.547    0.450    0.476    0.718    0.704
sSNR          -0.298   -0.265   -0.112   -0.761   -0.680   -0.401
WSS           42.125   43.631   41.245   42.990   45.948   39.153
PESQ           2.832    2.803    2.841    2.728    2.626    2.720
Csig           4.001    3.827    3.972    3.861    3.524    3.657
Cbak           2.674    2.652    2.696    2.589    2.525    2.635
Covrl          3.369    3.265    3.362    3.245    3.019    3.149
Table B.38: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
RLSFI-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        62.50    62.50    60.00    64.17    61.67    59.17
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    4.003    3.497    2.209    4.236    4.491    3.929
GAIN (dB)      7.984    6.328    2.249    6.054    5.683    4.217
LLR            0.452    0.481    0.501    0.440    0.474    0.522
sSNR          -0.641   -0.444   -0.514   -1.117   -1.181   -1.206
WSS           47.804   43.547   43.152   50.387   49.547   50.943
PESQ           2.718    2.698    2.630    2.607    2.635    2.482
Csig           3.836    3.833    3.775    3.760    3.748    3.594
Cbak           2.558    2.591    2.557    2.457    2.472    2.388
Covrl          3.216    3.215    3.152    3.115    3.125    2.968
Table B.39: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
RLSFI-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        41.67    34.17    38.33    35.83    30.00    31.67
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)    0.643    0.393    0.472    0.047    0.222   -0.157
GAIN (dB)    -10.379   -0.341   -8.014   -7.156    0.740   -5.060
LLR            0.371    0.445    0.443    0.402    0.474    0.429
sSNR          -0.649   -0.497   -0.512   -0.848   -1.056   -0.834
WSS           48.972   49.948   45.633   48.350   51.022   52.660
PESQ           2.604    2.563    2.709    2.691    2.530    2.547
Csig           3.841    3.731    3.860    3.866    3.672    3.714
Cbak           2.495    2.478    2.577    2.528    2.420    2.430
Covrl          3.158    3.080    3.229    3.216    3.031    3.056
Table B.40: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MNS-RLSFI-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        60.83    62.50    60.00    57.50    61.67    59.17
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    3.937    3.530    2.262    4.008    4.479    3.931
GAIN (dB)      8.002    6.322    2.256    6.062    5.683    4.214
LLR            0.469    0.481    0.500    0.474    0.483    0.509
sSNR          -0.669   -0.430   -0.511   -1.134   -1.186   -1.205
WSS           48.301   43.984   43.535   50.882   49.231   50.842
PESQ           2.713    2.706    2.627    2.593    2.635    2.486
Csig           3.812    3.833    3.771    3.711    3.742    3.611
Cbak           2.550    2.593    2.553    2.446    2.474    2.391
Covrl          3.199    3.218    3.148    3.083    3.123    2.979
Table B.41: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MNS-RLSFI-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        36.67    39.17    40.00    35.83    30.00    32.50
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)    0.582    0.399    0.503    0.055    0.222   -0.150
GAIN (dB)    -10.478   -0.378   -7.954   -7.123    0.761   -5.042
LLR            0.397    0.436    0.444    0.472    0.481    0.431
sSNR          -0.635   -0.472   -0.510   -0.858   -1.057   -0.832
WSS           48.731   49.697   45.588   48.823   50.657   52.496
PESQ           2.606    2.613    2.736    2.660    2.535    2.558
Csig           3.817    3.772    3.876    3.772    3.671    3.719
Cbak           2.498    2.505    2.590    2.510    2.425    2.437
Covrl          3.147    3.126    3.250    3.152    3.034    3.065
Table B.42: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRDL-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        61.67    61.67    61.67    64.17    59.17    53.33
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    1.527    1.344    1.638    2.413    2.988    3.780
GAIN (dB)     -4.655   -8.837   -8.987   -3.351   -6.789   -7.127
LLR            0.663    0.731    0.723    0.805    0.925    1.078
sSNR          -0.349   -0.122   -0.148   -0.612   -0.324   -0.323
WSS           45.740   44.074   43.717   46.598   44.731   41.976
PESQ           2.559    2.558    2.510    2.477    2.492    2.454
Csig           3.542    3.486    3.469    3.338    3.241    3.086
Cbak           2.515    2.540    2.519    2.453    2.492    2.493
Covrl          2.995    2.970    2.939    2.849    2.813    2.723
Table B.43: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRDL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        27.50    30.00    30.83    30.00    38.33    26.67
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)    1.081    0.882    1.082    0.347    0.378    0.227
GAIN (dB)    -12.763   -7.633  -16.125  -11.826   -5.984  -15.344
LLR            0.551    0.754    0.584    0.660    0.926    0.978
sSNR          -0.481   -0.387   -0.265   -0.619   -0.584   -0.382
WSS           46.625   45.018   43.772   47.679   45.604   41.629
PESQ           2.605    2.596    2.619    2.526    2.506    2.495
Csig           3.677    3.477    3.678    3.508    3.241    3.217
Cbak           2.523    2.535    2.563    2.469    2.476    2.511
Covrl          3.083    2.982    3.097    2.956    2.818    2.810
Table B.44: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
MPDRVL-BF (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        50.00    56.67    56.67    55.00    55.00    53.33
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    2.545    3.144    4.199    3.070    5.078    6.118
GAIN (dB)      8.265    7.192    5.892    9.582    9.218    7.233
LLR            0.521    0.594    0.604    0.681    0.811    0.878
sSNR          -0.544   -0.289   -0.099   -1.786   -1.543   -1.127
WSS           46.143   46.140   45.021   51.427   50.493   46.865
PESQ           2.427    2.400    2.502    2.248    2.275    2.332
Csig           3.605    3.513    3.575    3.285    3.176    3.174
Cbak           2.437    2.440    2.509    2.236    2.271    2.349
Covrl          2.958    2.899    2.984    2.695    2.657    2.693
Table B.45: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
MPDRVL-BF (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        24.17    25.00    24.17    22.50    25.83    22.50
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)   -0.024    0.023    0.055   -0.116   -0.227    0.032
GAIN (dB)      1.209    8.391   -1.378    1.702    9.851   -0.770
LLR            0.515    0.667    0.600    0.626    0.845    0.813
sSNR          -0.944   -1.083   -0.418   -2.108   -2.344   -1.435
WSS           45.786   47.383   44.627   49.662   52.235   46.453
PESQ           2.644    2.668    2.763    2.388    2.314    2.390
Csig           3.745    3.589    3.740    3.441    3.149    3.280
Cbak           2.518    2.509    2.616    2.295    2.227    2.361
Covrl          3.138    3.068    3.199    2.848    2.659    2.776
Table B.46: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
GSC (Speaker 03, MRA = 0°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        60.00    56.67    64.17    58.33    57.50    52.50
WRn (%)        55.00    55.00    55.00    60.83    60.83    60.83
iGSINR (dB)    3.522    3.566    3.754    3.004    3.027    3.085
GAIN (dB)      3.509    1.084    0.510    0.018   -2.638   -3.610
LLR            0.406    0.385    0.379    0.545    0.480    0.449
sSNR          -1.189   -1.131   -1.649   -0.767   -0.833   -1.354
WSS           43.852   43.003   44.148   45.651   44.962   46.309
PESQ           2.626    2.538    2.200    2.536    2.590    2.295
Csig           3.864    3.840    3.633    3.650    3.756    3.598
Cbak           2.507    2.475    2.273    2.478    2.505    2.322
Covrl          3.193    3.139    2.862    3.036    3.119    2.888
Table B.47: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 0°.
GSC (Speaker 12, MRA = 45°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        32.50    30.83    28.33    37.50    32.50    36.67
WRn (%)        33.33    35.00    33.33    33.33    30.83    33.33
iGSINR (dB)   -0.007   -0.003    0.027    0.268    0.302    0.227
GAIN (dB)     -4.647    1.445   -7.262   -8.363   -2.143  -11.608
LLR            0.390    0.407    0.353    0.540    0.596    0.404
sSNR          -1.087   -1.197   -1.506   -0.891   -1.123   -1.614
WSS           45.182   45.128   46.260   44.591   45.629   45.550
PESQ           2.755    2.699    2.438    2.629    2.603    2.346
Csig           3.946    3.896    3.783    3.721    3.638    3.682
Cbak           2.566    2.533    2.381    2.523    2.488    2.335
Covrl          3.295    3.243    3.052    3.122    3.064    2.957
Table B.48: This table contains the resulting measurements of the enhanced file of speaker 12 with s = 45°.
DS-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    95.00    95.83    93.33    95.00    93.33
WRn (%)        91.67    93.33    93.33    93.33    95.00    95.00
iGSINR (dB)   -0.264   -0.254   -0.309   -0.550   -0.753   -0.958
GAIN (dB)     -1.246   -2.132   -2.499   -4.317   -5.196   -5.819
LLR            0.224    0.252    0.279    0.239    0.289    0.359
sSNR           1.465    1.190    1.164    0.030   -0.034    0.023
WSS           27.499   26.002   25.177   34.998   32.407   30.810
PESQ           3.585    3.575    3.623    3.121    3.184    3.129
Csig           4.777    4.755    4.763    4.414    4.424    4.332
Cbak           3.247    3.236    3.263    2.883    2.927    2.915
Covrl          4.173    4.160    4.191    3.739    3.782    3.713
Table B.49: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
RLSFI-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    95.00    95.83    93.33    95.00    93.33
WRn (%)        91.67    93.33    93.33    93.33    95.00    95.00
iGSINR (dB)   -0.264   -0.254   -0.309   -0.550   -0.753   -0.958
GAIN (dB)     -1.246   -2.132   -2.499   -4.317   -5.196   -5.819
LLR            0.224    0.252    0.279    0.239    0.289    0.359
sSNR           1.465    1.190    1.164    0.030   -0.034    0.023
WSS           27.499   26.002   25.177   34.998   32.407   30.810
PESQ           3.585    3.575    3.623    3.121    3.184    3.129
Csig           4.777    4.755    4.763    4.414    4.424    4.332
Cbak           3.247    3.236    3.263    2.883    2.927    2.915
Covrl          4.173    4.160    4.191    3.739    3.782    3.713
Table B.50: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MNS-RLSFI-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        95.00    95.00    95.83    93.33    95.00    93.33
WRn (%)        91.67    93.33    93.33    93.33    95.00    95.00
iGSINR (dB)   -0.264   -0.254   -0.309   -0.550   -0.753   -0.958
GAIN (dB)     -1.246   -2.132   -2.499   -4.317   -5.196   -5.819
LLR            0.224    0.252    0.279    0.239    0.289    0.359
sSNR           1.465    1.190    1.164    0.030   -0.034    0.023
WSS           27.499   26.002   25.177   34.998   32.407   30.810
PESQ           3.585    3.575    3.623    3.121    3.184    3.129
Csig           4.777    4.755    4.763    4.414    4.424    4.332
Cbak           3.247    3.236    3.263    2.883    2.927    2.915
Covrl          4.173    4.160    4.191    3.739    3.782    3.713
Table B.51: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MPDRDL-BF (Speaker 03, MRA = 300°)
             (08,20)  (12,20)  (24,20)  (08,55)  (12,55)  (24,55)
WRe (%)        92.50    90.83    91.67    93.33    93.33    89.17
WRn (%)        91.67    93.33    93.33    93.33    95.00    95.00
iGSINR (dB)    0.650    0.686    0.773   -0.639   -0.712   -0.882
GAIN (dB)     -7.566  -10.590  -10.675   -9.736  -12.394  -11.690
LLR            0.314    0.390    0.402    0.361    0.488    0.672
sSNR           0.299    0.192    0.215   -0.011   -0.014   -0.017
WSS           35.050   35.380   34.169   38.756   36.760   34.570
PESQ           3.010    2.996    2.960    2.870    2.853    2.818
Csig           4.269    4.179    4.157    4.104    3.980    3.790
Cbak           2.846    2.830    2.823    2.734    2.739    2.738
Covrl          3.611    3.558    3.532    3.449    3.383    3.277
Table B.52: This table contains the resulting measurements of the enhanced file of speaker 03 with s = 300°.
MPDRVL-BF (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     92.50 %     91.67 %     90.00 %     92.50 %     92.50 %     91.67 %
WRn     91.67 %     93.33 %     93.33 %     93.33 %     95.00 %     95.00 %
iGSINR  -0.200 dB   -0.204 dB   -0.342 dB   -0.002 dB   -0.324 dB   -0.567 dB
GAIN    3.262 dB    3.912 dB    2.050 dB    4.237 dB    4.909 dB    3.125 dB
LLR     0.266       0.329       0.338       0.296       0.372       0.435
sSNR    1.126       1.082       1.315       -0.487      -0.521      -0.149
WSS     32.529      31.053      28.997      40.886      38.716      36.285
PESQ    3.233       3.244       3.324       2.801       2.831       2.839
Csig    4.476       4.431       4.489       4.109       4.068       4.031
Cbak    3.023       3.036       3.103       2.656       2.683       2.728
Covrl   3.833       3.820       3.894       3.411       3.411       3.402
Table B.53: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
GSC (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     88.33 %     87.50 %     85.83 %     88.33 %     89.17 %     85.83 %
WRn     91.67 %     93.33 %     93.33 %     93.33 %     95.00 %     95.00 %
iGSINR  -0.264 dB   -0.254 dB   -0.309 dB   -0.550 dB   -0.753 dB   -0.958 dB
GAIN    -1.246 dB   -2.132 dB   -2.499 dB   -4.317 dB   -5.196 dB   -5.819 dB
LLR     0.182       0.207       0.193       0.209       0.231       0.207
sSNR    -0.817      -1.063      -1.652      -0.691      -0.906      -1.492
WSS     33.528      35.217      35.706      37.763      37.769      37.444
PESQ    3.350       3.164       2.922       2.972       2.987       2.844
Csig    4.625       4.471       4.335       4.330       4.317       4.258
Cbak    2.949       2.833       2.677       2.747       2.740       2.637
Covrl   3.963       3.789       3.597       3.615       3.616       3.515
Table B.54: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     80.00 %     79.17 %     80.00 %     79.17 %     82.50 %     88.33 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  0.084 dB    0.352 dB    0.312 dB    0.290 dB    0.562 dB    0.382 dB
GAIN    2.080 dB    3.950 dB    3.874 dB    -0.991 dB   0.886 dB    0.614 dB
LLR     0.187       0.400       0.398       0.196       0.551       0.599
sSNR    1.416       0.990       1.054       0.042       -0.116      -0.021
WSS     24.999      29.768      29.644      31.661      33.205      31.552
PESQ    3.778       3.703       3.752       3.236       3.258       3.234
Csig    4.954       4.646       4.679       4.558       4.192       4.143
Cbak    3.354       3.258       3.286       2.962       2.952       2.958
Covrl   4.365       4.162       4.203       3.877       3.702       3.670
Table B.55: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
RLSFI-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     80.00 %     79.17 %     80.00 %     79.17 %     82.50 %     88.33 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  0.084 dB    0.352 dB    0.312 dB    0.290 dB    0.562 dB    0.382 dB
GAIN    2.080 dB    3.950 dB    3.874 dB    -0.991 dB   0.886 dB    0.614 dB
LLR     0.187       0.400       0.398       0.196       0.551       0.599
sSNR    1.416       0.990       1.054       0.042       -0.116      -0.021
WSS     24.999      29.768      29.644      31.661      33.205      31.552
PESQ    3.778       3.703       3.752       3.236       3.258       3.234
Csig    4.954       4.646       4.679       4.558       4.192       4.143
Cbak    3.354       3.258       3.286       2.962       2.952       2.958
Covrl   4.365       4.162       4.203       3.877       3.702       3.670
Table B.56: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
MNS-RLSFI-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     80.00 %     79.17 %     80.00 %     79.17 %     82.50 %     88.33 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  0.084 dB    0.352 dB    0.312 dB    0.290 dB    0.562 dB    0.382 dB
GAIN    2.080 dB    3.950 dB    3.874 dB    -0.991 dB   0.886 dB    0.614 dB
LLR     0.187       0.400       0.398       0.196       0.551       0.599
sSNR    1.416       0.990       1.054       0.042       -0.116      -0.021
WSS     24.999      29.768      29.644      31.661      33.205      31.552
PESQ    3.778       3.703       3.752       3.236       3.258       3.234
Csig    4.954       4.646       4.679       4.558       4.192       4.143
Cbak    3.354       3.258       3.286       2.962       2.952       2.958
Covrl   4.365       4.162       4.203       3.877       3.702       3.670
Table B.57: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
MPDRDL-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     78.33 %     79.17 %     79.17 %     80.83 %     83.33 %     83.33 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  -0.770 dB   -0.232 dB   -0.340 dB   0.299 dB    1.140 dB    1.118 dB
GAIN    -6.902 dB   -6.020 dB   -5.983 dB   -4.920 dB   -4.012 dB   -4.124 dB
LLR     0.292       0.642       0.673       0.302       0.800       1.082
sSNR    -0.025      -0.045      -0.008      -0.215      -0.136      -0.137
WSS     34.641      34.542      35.046      37.444      36.195      33.562
PESQ    3.220       3.213       3.117       2.987       2.939       2.873
Csig    4.423       4.059       3.965       4.246       3.717       3.410
Cbak    2.929       2.925       2.878       2.786       2.777       2.764
Covrl   3.794       3.610       3.514       3.582       3.297       3.118
Table B.58: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
MPDRVL-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     74.17 %     74.17 %     74.17 %     71.67 %     70.00 %     69.17 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  -0.594 dB   -0.178 dB   0.170 dB    -0.669 dB   -0.277 dB   -0.186 dB
GAIN    7.985 dB    11.559 dB   9.543 dB    9.020 dB    13.038 dB   10.686 dB
LLR     0.235       0.512       0.499       0.262       0.628       0.712
sSNR    2.614       2.254       2.243       -0.089      -0.450      0.106
WSS     27.260      32.082      31.524      35.497      36.708      35.207
PESQ    3.540       3.560       3.577       2.985       2.986       2.910
Csig    4.741       4.424       4.453       4.304       3.917       3.799
Cbak    3.300       3.253       3.264       2.807       2.776       2.785
Covrl   4.133       3.973       3.997       3.614       3.419       3.326
Table B.59: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
GSC (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     75.83 %     75.00 %     70.83 %     70.00 %     71.67 %     60.83 %
WRn     74.17 %     79.17 %     79.17 %     80.83 %     79.17 %     79.17 %
iGSINR  0.084 dB    0.352 dB    0.312 dB    0.290 dB    0.562 dB    0.382 dB
GAIN    2.080 dB    3.950 dB    3.874 dB    -0.991 dB   0.886 dB    0.614 dB
LLR     0.199       0.256       0.229       0.218       0.337       0.252
sSNR    -0.619      -0.391      -0.798      -0.676      -0.593      -1.036
WSS     34.417      35.938      37.214      35.841      36.330      37.818
PESQ    3.399       3.123       2.850       3.024       3.103       2.451
Csig    4.628       4.390       4.241       4.370       4.290       3.972
Cbak    2.979       2.851       2.685       2.786       2.826       2.476
Covrl   3.987       3.726       3.510       3.666       3.665       3.174
Table B.60: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     61.67 %     60.83 %     61.67 %     61.67 %     60.00 %     66.67 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  1.814 dB    1.831 dB    1.858 dB    2.098 dB    2.234 dB    2.342 dB
GAIN    -1.981 dB   -2.935 dB   -3.234 dB   -5.366 dB   -6.033 dB   -7.133 dB
LLR     0.375       0.389       0.390       0.447       0.511       0.571
sSNR    0.041       0.109       0.163       -0.586      -0.495      -0.330
WSS     44.421      42.008      40.832      45.927      42.760      39.328
PESQ    2.725       2.712       2.754       2.568       2.593       2.634
Csig    3.951       3.950       3.984       3.769       3.746       3.739
Cbak    2.628       2.643       2.675       2.503       2.543       2.597
Covrl   3.285       3.284       3.325       3.111       3.120       3.146
Table B.61: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
DS-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     38.33 %     37.50 %     42.50 %     35.83 %     37.50 %     49.17 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  0.444 dB    0.493 dB    0.377 dB    1.541 dB    1.101 dB    1.524 dB
GAIN    1.314 dB    3.117 dB    3.080 dB    -1.914 dB   0.336 dB    -0.604 dB
LLR     0.324       0.500       0.476       0.361       0.757       0.791
sSNR    -1.333      -0.805      -0.768      -1.182      -0.773      -0.625
WSS     42.814      44.476      43.400      43.222      44.321      40.947
PESQ    2.839       2.809       2.842       2.692       2.655       2.727
Csig    4.086       3.872       3.927       3.956       3.516       3.555
Cbak    2.607       2.615       2.640       2.544       2.544       2.611
Covrl   3.414       3.288       3.335       3.274       3.034       3.097
Table B.62: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
RLSFI-BF (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     61.67 %     52.50 %     46.67 %     55.83 %     50.83 %     46.67 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  -0.739 dB   0.526 dB    0.881 dB    3.603 dB    1.472 dB    2.395 dB
GAIN    -8.654 dB   -4.146 dB   -5.386 dB   -1.048 dB   -4.528 dB   0.694 dB
LLR     0.440       0.380       0.462       0.389       0.398       0.411
sSNR    -0.684      -1.405      -0.680      -0.799      -0.591      -0.859
WSS     52.412      48.716      50.281      51.991      54.053      55.447
PESQ    2.371       2.486       2.472       2.627       2.323       2.458
Csig    3.598       3.762       3.655       3.808       3.598       3.653
Cbak    2.357       2.393       2.421       2.475       2.329       2.367
Covrl   2.910       3.059       2.995       3.145       2.882       2.974
Table B.63: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
RLSFI-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     32.50 %     30.83 %     30.00 %     36.67 %     30.83 %     30.00 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  -0.130 dB   -0.480 dB   0.243 dB    1.132 dB    0.582 dB    0.469 dB
GAIN    -3.097 dB   3.515 dB    2.252 dB    2.301 dB    2.679 dB    7.562 dB
LLR     0.364       0.466       0.562       0.357       0.430       0.458
sSNR    -1.334      -1.341      -1.082      -1.275      -0.590      -1.324
WSS     46.782      47.842      48.443      51.536      52.237      53.780
PESQ    2.675       2.694       2.783       2.675       2.533       2.611
Csig    3.911       3.807       3.757       3.875       3.707       3.712
Cbak    2.501       2.502       2.557       2.472       2.442       2.422
Covrl   3.234       3.189       3.207       3.204       3.047       3.085
Table B.64: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
MNS-RLSFI-BF (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     61.67 %     51.67 %     46.67 %     50.00 %     53.33 %     47.50 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  -0.984 dB   0.536 dB    0.887 dB    3.349 dB    1.288 dB    2.430 dB
GAIN    -8.508 dB   -4.140 dB   -5.387 dB   -1.000 dB   -4.457 dB   0.690 dB
LLR     0.468       0.369       0.465       0.414       0.398       0.407
sSNR    -0.723      -1.405      -0.678      -0.844      -0.616      -0.853
WSS     52.477      48.672      50.657      53.131      53.899      55.495
PESQ    2.360       2.488       2.468       2.603       2.290       2.455
Csig    3.562       3.776       3.647       3.759       3.579       3.655
Cbak    2.349       2.394       2.417       2.453       2.312       2.365
Covrl   2.887       3.067       2.989       3.106       2.856       2.974
Table B.65: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
MNS-RLSFI-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     31.67 %     29.17 %     30.00 %     35.83 %     30.83 %     30.00 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  -0.095 dB   -0.483 dB   0.243 dB    1.138 dB    0.606 dB    0.469 dB
GAIN    -2.963 dB   3.518 dB    2.238 dB    2.344 dB    2.733 dB    7.563 dB
LLR     0.392       0.457       0.565       0.376       0.450       0.456
sSNR    -1.390      -1.339      -1.073      -1.314      -0.608      -1.323
WSS     46.975      47.953      48.443      51.752      52.472      53.607
PESQ    2.668       2.692       2.781       2.668       2.512       2.614
Csig    3.876       3.815       3.753       3.849       3.672       3.717
Cbak    2.493       2.501       2.557       2.464       2.429       2.425
Covrl   3.212       3.192       3.204       3.187       3.018       3.089
Table B.66: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     45.83 %     45.83 %     46.67 %     47.50 %     48.33 %     44.17 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  1.509 dB    1.053 dB    1.968 dB    1.577 dB    1.383 dB    2.641 dB
GAIN    -9.472 dB   -11.988 dB  -11.241 dB  -8.881 dB   -11.067 dB  -10.238 dB
LLR     0.543       0.620       0.635       0.626       0.745       0.979
sSNR    -0.154      -0.101      -0.114      -0.394      -0.258      -0.288
WSS     47.310      46.252      45.491      49.414      45.923      44.078
PESQ    2.376       2.358       2.289       2.255       2.231       2.219
Csig    3.541       3.461       3.410       3.365       3.258       3.027
Cbak    2.429       2.431       2.403       2.341       2.363       2.368
Covrl   2.897       2.852       2.793       2.743       2.687       2.570
Table B.67: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
MPDRDL-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     24.17 %     23.33 %     20.83 %     22.50 %     20.00 %     16.67 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  -0.635 dB   -0.494 dB   -0.584 dB   0.152 dB    0.231 dB    -0.288 dB
GAIN    -5.727 dB   -4.913 dB   -4.277 dB   -4.656 dB   -3.538 dB   -2.950 dB
LLR     0.470       0.819       0.843       0.486       0.992       1.264
sSNR    -0.838      -0.476      -0.501      -0.762      -0.485      -0.521
WSS     49.368      48.409      48.501      48.025      46.975      44.214
PESQ    2.403       2.366       2.298       2.435       2.376       2.260
Csig    3.615       3.241       3.174       3.629       3.082       2.757
Cbak    2.384       2.396       2.361       2.414       2.410       2.372
Covrl   2.943       2.740       2.672       2.970       2.670       2.457
Table B.68: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
MPDRVL-BF (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     50.83 %     49.17 %     45.83 %     0.00 %      50.00 %     52.50 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  1.781 dB    2.964 dB    3.144 dB    1.861 dB    3.083 dB    3.614 dB
GAIN    3.729 dB    4.608 dB    2.897 dB    5.602 dB    6.127 dB    4.847 dB
LLR     0.476       0.532       0.535       0.560       0.649       0.738
sSNR    -1.201      -1.057      -0.603      -2.110      -1.908      -1.482
WSS     50.267      48.245      46.516      52.550      50.062      47.210
PESQ    2.363       2.386       2.456       2.146       2.207       2.194
Csig    3.576       3.550       3.605       3.338       3.305       3.232
Cbak    2.336       2.370       2.445       2.159       2.218       2.259
Covrl   2.901       2.905       2.972       2.667       2.688       2.652
Table B.69: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
MPDRVL-BF (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     20.00 %     20.00 %     18.33 %     22.50 %     23.33 %     17.50 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  -0.411 dB   -0.335 dB   -0.184 dB   0.124 dB    0.184 dB    -0.104 dB
GAIN    8.105 dB    11.746 dB   9.751 dB    9.315 dB    13.070 dB   11.494 dB
LLR     0.438       0.664       0.655       0.477       0.879       0.984
sSNR    -2.475      -2.123      -1.703      -3.040      -2.747      -2.375
WSS     46.373      48.745      47.764      48.823      49.048      46.765
PESQ    2.623       2.588       2.634       2.422       2.464       2.371
Csig    3.807       3.532       3.578       3.623       3.233       3.089
Cbak    2.407       2.396       2.452       2.259       2.295       2.291
Covrl   3.157       2.996       3.045       2.958       2.784       2.672
Table B.70: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
GSC (Speaker 03, MRA = 300°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     57.50 %     56.67 %     55.00 %     53.33 %     51.67 %     54.17 %
WRn     50.83 %     51.67 %     51.67 %     55.83 %     50.83 %     50.83 %
iGSINR  1.814 dB    1.831 dB    1.858 dB    2.098 dB    2.234 dB    2.342 dB
GAIN    -1.981 dB   -2.935 dB   -3.234 dB   -5.366 dB   -6.033 dB   -7.133 dB
LLR     0.360       0.382       0.374       0.433       0.463       0.392
sSNR    -1.225      -1.400      -2.093      -0.949      -1.154      -1.837
WSS     45.416      45.046      45.627      46.847      45.607      45.994
PESQ    2.652       2.563       2.301       2.498       2.497       2.327
Csig    3.913       3.839       3.685       3.733       3.712       3.679
Cbak    2.507       2.455       2.283       2.440       2.436       2.308
Covrl   3.227       3.146       2.936       3.056       3.048       2.944
Table B.71: This table contains the resulting measurements of the enhanced file of speaker 03 with φ_s = 300°.
GSC (Speaker 12, MRA = 120°)
        (08,20)     (12,20)     (24,20)     (08,55)     (12,55)     (24,55)
WRe     34.17 %     32.50 %     33.33 %     32.50 %     31.67 %     37.50 %
WRn     30.83 %     37.50 %     37.50 %     33.33 %     35.83 %     35.83 %
iGSINR  0.444 dB    0.493 dB    0.377 dB    1.541 dB    1.101 dB    1.524 dB
GAIN    1.314 dB    3.117 dB    3.080 dB    -1.914 dB   0.336 dB    -0.604 dB
LLR     0.337       0.395       0.378       0.380       0.572       0.424
sSNR    -1.501      -1.131      -1.658      -1.220      -1.029      -1.464
WSS     45.345      45.385      45.603      43.675      44.150      44.967
PESQ    2.752       2.629       2.475       2.681       2.633       2.313
Csig    3.998       3.863       3.786       3.925       3.694       3.646
Cbak    2.538       2.502       2.393       2.533       2.519       2.332
Covrl   3.320       3.190       3.073       3.252       3.111       2.924
Table B.72: This table contains the resulting measurements of the enhanced file of speaker 12 with φ_s = 120°.
Abbreviations
ADC
BF
BW
CHiME
CPR
CPU
Cbak
C-BAK
Covrl
C-OVRL
Csig
C-SIG
DAC
DI
DI-BF
DD-BF
DIRHA
DOA
DS
GSC
LLR
LMS
MMSE
MNS-RLSFI
MPDR
MPDRDL
MPDRVL
MSR
MVDR
NLMS
OLA
PESQ
RLSFI
RMS
SNR
sSNR
TDOA
UCA
ULA
VAD
WRe
WRn
WSS
Symbols
c           sound velocity
t           time variable
T_0         time interval
f           frequency variable
f_s         sampling frequency
f_sa        spatial aliasing frequency
f_gl        grating lobe frequency
λ_gl        grating lobe wave length
ω           angular frequency
ω_0         angular frequency of a propagating plane wave
d           distance between each microphone in case of a ULA or diameter of a UCA
τ_n         delay of channel n
r_n         microphone position vector
k           wavevector
k_0         direction of propagation of a plane wave
A_0         amplitude of a plane wave
y(t)        (mono-)output signal
s(r_n, t)   signal captured by microphone n
p(r, t)     acoustic pressure at position r in spatio-temporal domain
P(k, ω)     acoustic pressure at position r in wavevector-frequency domain
θ           elevation
φ           azimuth
φ_s         azimuth angle of the desired source / steering direction
φ_n         azimuth angle of microphone n
φ_l         azimuth angle of looking direction
φ_c         azimuth angle of competing source
φ_ML1       azimuth angle of the main lobe of a ULA
φ_ML2       azimuth angle of the main lobe of a ULA due to front-back ambiguity
θ_n         elevation angle of microphone n
θ_s         elevation angle of the desired source
d           angle resolution
h(r, t)
H(k, )
H(, , s )
Dn ()
Bn ()
N ()
R()
N
U (f )
Uopt (f )
M (f )
(f )
Gm
L
Lm
L(d)
SLnum