Convention Paper
Presented at the 130th Convention
2011 May 13–16 London, UK
The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have
been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from
the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes
no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
ABSTRACT
The diffuse-field correlation of the two signals generated by a stereophonic microphone setup has an effect
on the perception of spatial width. A correlation meter is often used to measure the correlation coefficient.
However, due to the frequency dependence of the correlation function, the correlation coefficient is not an
appropriate value for predicting the perceived width when it comes to time-delay stereophony.
By using the newly defined “Diffuse-Field Image Predictor” (DFI Predictor) presented in this paper, an
attempt is made to reliably predict perceived width. Listening tests show that the DFI Predictor is fairly
suitable for this task. The aim of the study is to compare the spatial properties of different stereophonic
microphone techniques by one calculated value.
fields [6]. The correlation of the diffuse sound has [...] preference for a specific recording technique. The diffuse-field correlation of spaced microphones is dependent on frequency. Thus, the correlation coeffi-
[Plot from a preceding figure: axis labels "Magnitude" and "Correlation", time axis in s from 0 to 0.02.]
[Plot for Fig. 2: coherence (-1 to 1) against frequency in Hz (10^0 to 10^4, logarithmic). Curves: A/B with omnis with 10 cm spacing; A/B with omnis with 50 cm spacing; X/Y with cardioids and 90° offset angle; ORTF with supercardioids.]
Fig. 2: Coherence functions of different stereo microphone setups in a diffuse sound field.
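The frequency dependence of the A/B curves in Fig. 2 can be illustrated with the classical diffuse-field model for two omnidirectional microphones, sin(kd)/(kd) with k = 2πf/c. This is a minimal sketch under that assumption; whether the curves in Fig. 2 were computed this way is not stated in the text, and the function name and the speed of sound are hypothetical choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def diffuse_coherence_omni_pair(freq_hz: float, spacing_m: float) -> float:
    """Diffuse-field coherence of two spaced omnis, modelled as sin(kd) / (kd)."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND
    if kd == 0.0:
        return 1.0  # limit of sin(x) / x for x -> 0
    return math.sin(kd) / kd


if __name__ == "__main__":
    # Compare the two A/B spacings named in the legend of Fig. 2.
    for spacing in (0.10, 0.50):
        row = [diffuse_coherence_omni_pair(f, spacing) for f in (50, 100, 400, 1000, 4000)]
        print(f"{spacing:.2f} m:", ", ".join(f"{v:+.3f}" for v in row))
```

In this model the small spacing stays close to 1 over a wide low-frequency range, while the larger spacing reaches its first zero at c/(2d), which is why a single broadband number hides most of the difference between the two setups.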
The advantage of the coherence function is that you have the “similarity” of the input signals at every frequency. In [9] the coherence function is defined as:
$$C_{xy}(f) = \frac{|P_{xy}(f)|^2}{P_{xx}(f) \cdot P_{yy}(f)} \qquad (3)$$
with $P_{xy}(f)$ being the cross power density spectrum and $P_{xx}(f)$ and $P_{yy}(f)$ being the power density spectra. This type of coherence function is called the magnitude squared coherence. In [6] another version of the coherence function is defined (see figure 2):
$$\gamma_{xy}(f) = \frac{P_{xy}(f)}{\sqrt{P_{xx}(f) \cdot P_{yy}(f)}} \qquad (4)$$
again with $P_{xy}(f)$ being the cross power density spectrum and $P_{xx}(f)$ and $P_{yy}(f)$ being the power density spectra.
The complex coherence function $\gamma_{xy}(f)$ is able to describe the phase shift between the two input signals. In a nutshell it can be said that the complex coherence function $\gamma_{xy}(f)$ denotes a correlation coefficient for every frequency. Thus, the “similarity” of two broadband input signals can be described more precisely than with the correlation coefficient or the degree of coherence. Furthermore, the coherence function is not dependent on the power spectrum density of the input signals. In [10] it is shown that low frequencies have a strong effect on the spatial impression. In figure 2 the coherence function for different stereo microphone setups in a diffuse field is shown.
An A/B setup with omnidirectional microphones with a spacing of 10 cm is almost completely mono below 400 Hz. For coincident setups the signal correlation is the same for every frequency. In this case, the correlation coefficient leads to a better conclusion on signal correlation.
3. THE DFI PREDICTOR
The DFI Predictor is based on the complex coherence function for microphones in the diffuse sound field. A weighting function $\chi(f)$ is applied which describes a 3 dB per octave attenuation of the coherence function (see figure 3). By summing up the weighted coherence function, the DFI Predictor represents the frequency-dependent correlation of microphones in a diffuse field with a single value.
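The complex coherence of equation (4) can be estimated from two sampled signals by averaging cross and auto spectra over many segments. The following is a minimal sketch, not the procedure used in the paper: the function name, the Hann window, the 50 % overlap and the sign convention conj(X)·Y are all assumptions made for illustration.

```python
import numpy as np


def complex_coherence(x, y, nfft=1024):
    """Estimate gamma_xy(f) = Pxy / sqrt(Pxx * Pyy) from segment-averaged spectra."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.hanning(nfft)
    hop = nfft // 2  # 50 % overlap, an arbitrary choice for this sketch
    pxx = np.zeros(nfft // 2 + 1)
    pyy = np.zeros(nfft // 2 + 1)
    pxy = np.zeros(nfft // 2 + 1, dtype=complex)
    for start in range(0, min(len(x), len(y)) - nfft + 1, hop):
        xf = np.fft.rfft(window * x[start:start + nfft])
        yf = np.fft.rfft(window * y[start:start + nfft])
        pxx += np.abs(xf) ** 2          # auto spectrum of x
        pyy += np.abs(yf) ** 2          # auto spectrum of y
        pxy += np.conj(xf) * yf         # cross spectrum (assumed sign convention)
    # Common scaling factors cancel in the ratio, so no normalisation is needed.
    return pxy / np.sqrt(pxx * pyy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(1 << 16)
    right = rng.standard_normal(1 << 16)
    gamma = complex_coherence(left, right)
    msc = np.abs(gamma) ** 2  # magnitude squared coherence, equation (3)
    print(gamma.shape, float(msc.mean()))
```

Averaging over several segments is essential here: with a single segment the magnitude of the estimate is 1 at every frequency, regardless of how similar the signals are.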
[Plot for Fig. 3: magnitude in dB against frequency in Hz (10^0 to 10^4, logarithmic).]
[Plots for Fig. 4: coherence against frequency in Hz (logarithmic). Top panel curves: γnoise1,noise2(f) and γnoise1,noise3(f). Bottom panel, titled "Coherence function of the two noise signals noise-left, noise-right and the theoretical coherence at low frequencies": curves γnoiseL,noiseR(f) and γxy(f).]
Fig. 3: Weighting of the coherence function.
Fig. 4: Top diagram: Coherence of the initial noise signals. Bottom diagram: Result for an A/B setup with omnis at 0.2 m distance. DFI Predictor = 0.2957
The DFI Predictor is defined as:
$$\mathit{DFI} = \frac{1}{n} \cdot \sum_{f = 100\,\mathrm{Hz}}^{6000\,\mathrm{Hz}} \left[ \gamma_{xy}(f) \cdot \chi(f) \right]^2 \qquad (5)$$
with $n$ being the FFT length, $\gamma_{xy}(f)$ being the complex coherence function and $\chi(f)$ being the weighting function.
In [11], [12] a frequency range from 40 Hz to 1.5 kHz was used to calculate the DFI Predictor. However, the frequency range most likely has to match the stimulus used in the listening test.
In the listening test described below, the stimulus was female speech. The effect on the results of the listening test is shown in section 4. The DFI Predictor can be considered as a correlation coefficient. A small DFI Predictor value indicates a low correlation in the frequency range 100 Hz to 6 kHz. A high DFI Predictor value indicates a high correlation in that frequency range.
The definition of the DFI Predictor is a first approach until further studies can be performed to check and refine this definition.
4. LISTENING TEST
The stimuli for the listening test were created in Matlab. The aim was to create different stimuli with a defined coherence. The basis is an arbitrary stereo microphone setup and its theoretical coherence function in the diffuse sound field [6].
Based on this coherence function, two noise signals are created that have approximately the same coherence as predicted by this calculation. These two noise signals were separately convolved with the diffuse part of a mono room impulse response. To isolate the diffuse part of the impulse response, the first 100 ms were cut off.
The convolution yields a stereo diffuse room impulse response with a defined coherence. This impulse response is then convolved with a dry mono recording. The result is the stimulus. Figure 4 shows an example of a noise signal with a defined coherence. The subjects listened to the stimuli via headphones. The test software used for the listening test is based on a MUSHRA test (see figure 5). After a short introduction the subjects were able to run the listening test with the test software by themselves.
Fig. 5: The test software was specially designed for the listening test
In this MUSHRA-like test design, the subjects compared the stimuli to the reference and all other stimuli. A stimulus that is perceived as wider gets a higher quantitative rating on a scale from -4 to 4. If no difference is perceived between a stimulus and the reference, the rating is zero.
In the perfect case there should be a gradual characteristic from narrow to wide after quantifying all stimuli. The reference in the listening test was an X/Y setup with cardioids and a 90-degree offset angle (DFI Predictor = 0.5607). The microphone setups simulated for the listening test are shown in figure 6. A shuffle function was integrated in the test software. As a result, every subject listened to a different order of stimuli. The dry recording used for all stimuli was female speech (SQAM-CD). Eight subjects participated in the listening test.
[Plot for Fig. 6: perceived-width rating (lower end labelled "narrow") against DFI-Predictor (0 to 1). Surviving point labels: 0cm/45°, 2cm/30°, 10cm, 2cm.]
Fig. 6: Relationship between the DFI Predictor and the perceived spatial width.
The results of the listening test are also shown in figure 6. All curves show a similar behavior. All stimuli with a smaller DFI Predictor value than the reference were perceived as narrower and vice versa. The hidden reference was well recognized by all subjects and as expected was positioned in the middle of the scale. These results already show that the DFI Predictor can be used to predict perceived width.
At the moment the definition of the DFI Predictor is fairly rough. In further studies the stimuli for the listening test should be expanded with regard to frequency range to improve the significance of the DFI Predictor.
Figure 7 shows the results of the listening test plotted against the correlation coefficient of the two noise signals used for the decorrelation of the impulse response. Depending on whether the diffuse sound field was simulated with pink noise or white noise, the results vary. The graphs show that the correlation coefficient is less able to lead to conclusions about perceived spatial width.
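Equation (5) can be sketched numerically as follows. Several details are assumptions of this sketch rather than statements from the paper: the weighting is modelled as an amplitude factor sqrt(ref_hz / f), which falls by about 3 dB per octave; the anchor frequency `ref_hz` at which the weight equals 1 is hypothetical; and the square in equation (5) is evaluated as a squared magnitude, which is identical for real-valued coherence. Because of these choices, the sketch is not expected to reproduce the DFI values quoted in the text.

```python
import math


def weighting(freq_hz, ref_hz=100.0):
    """Amplitude weight falling by about 3 dB per octave (ref_hz is an assumed anchor)."""
    return math.sqrt(ref_hz / freq_hz)


def dfi_predictor(freqs_hz, gamma, n_fft, f_lo=100.0, f_hi=6000.0, ref_hz=100.0):
    """Equation (5): (1 / n) * sum over f_lo..f_hi of |gamma(f) * chi(f)|^2."""
    total = 0.0
    for f, g in zip(freqs_hz, gamma):
        if f_lo <= f <= f_hi:
            total += abs(g * weighting(f, ref_hz)) ** 2
    return total / n_fft


if __name__ == "__main__":
    fs, n_fft = 48000, 4096  # hypothetical sampling rate and FFT length
    freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    for level in (0.2, 0.5, 1.0):
        # Frequency-independent coherence, as the text describes for coincident setups.
        print(level, dfi_predictor(freqs, [level] * len(freqs), n_fft))
```

With a frequency-independent coherence the value grows with the square of the coherence, so the ordering from low to high correlation described in section 3 is preserved.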
[Plots for Fig. 7: rating ("Quantitation", -4 to 4) against correlation coefficient (0 to 1). Left panel: correlation coefficient determined with white noise. Right panel: correlation coefficient determined with pink noise.]
Fig. 7: The results of the correlation coefficient vary, depending on whether the diffuse sound field was
simulated with pink or white noise.
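One way to see why the correlation coefficient depends on the noise colour: for two signals with the same power spectrum P(f) and a real-valued coherence γ(f), the correlation coefficient equals the power-weighted mean of the coherence, Σ γ(f)·P(f) / Σ P(f). The sketch below applies this to a spaced omni pair using the classical sin(kd)/(kd) diffuse-field model. The model, the 0.2 m spacing, the 20 Hz to 20 kHz band and all names are assumptions for illustration; it does not reproduce the stimuli behind Fig. 7.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def coherence(freq_hz, spacing_m):
    """Assumed diffuse-field coherence of two spaced omnis: sin(kd) / (kd)."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND
    return math.sin(kd) / kd if kd else 1.0


def white(freq_hz):
    return 1.0  # flat power spectrum


def pink(freq_hz):
    return 1.0 / freq_hz  # power falls by 3 dB per octave


def correlation_coefficient(spacing_m, spectrum, f_lo=20.0, f_hi=20000.0, step=1.0):
    """Power-weighted mean of the coherence over the band f_lo..f_hi."""
    num = den = 0.0
    f = f_lo
    while f <= f_hi:
        p = spectrum(f)
        num += coherence(f, spacing_m) * p
        den += p
        f += step
    return num / den


if __name__ == "__main__":
    for name, spectrum in (("white", white), ("pink", pink)):
        print(name, round(correlation_coefficient(0.2, spectrum), 3))
```

Pink noise puts more weight on the low frequencies, where the spaced pair is highly coherent, so the same setup yields a noticeably higher correlation coefficient than with white noise even though its coherence function is unchanged.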