a Microphone Array
Heungkyu Lee1, Jounghoon Beh2, June Kim3, and Hanseok Ko2
1 Introduction
This paper is motivated by the need for an abnormal acoustic source detection capability
that compensates for the detection performance of an image sensor [1][2]. While the image
sensor rotates from left to right or vice versa, the boundary region that lies outside the
camera view is not covered by security monitoring, so a suspicious person can intrude
through this small blind spot. To cope with this issue, microphone array technology can
be used to locate an abnormal acoustic source and obtain its coordinates. An abnormal
acoustic source is defined as either a speech signal or a man-made acoustic signal. Using
the microphone array, the Time Difference Of Arrival (TDOA) between the signals of the
array elements is computed as a first step. Because the accuracy of the estimated
direction of arrival (DOA) angle is especially poor in noisy outdoor environments, we
employ an end-point detection algorithm that triggers only on acoustic sources exceeding
a pre-defined, adapted threshold. At this point we cannot yet know whether the detected
acoustic source is a valid one. Outdoor environments contain natural sounds such as wind,
rain, birdsong, thunder, and breaking waves, while man-made acoustic sources are equally
varied: speech, footsteps, the breaking of a steel-barred window, and so on. To resolve
this issue, we model the sounds of the natural environment with Hidden Markov Models
(HMMs) and then verify each detected acoustic source against them. Using the
environmental sound models as anti-models, we propose an out-of-normal acoustic (OONA)
rejection method based on an N-best likelihood ratio test (LRT). Figure 1 describes the
overall process flow.
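The TDOA step described above can be illustrated with a standard GCC-PHAT cross-correlation between two microphone channels. The sketch below is an assumption-laden stand-in, not the paper's implementation: the function name, signal lengths, and random test signal are all illustrative; only the 11 kHz rate comes from the paper.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """Estimate the time difference of arrival (TDOA) between two
    microphone signals using the GCC-PHAT weighting."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Rearrange so that index max_shift corresponds to zero delay
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                        # delay in seconds

# Illustrative check: a 5-sample delay at an 11 kHz sampling rate
fs = 11000
x = np.random.default_rng(0).standard_normal(2048)
delayed = np.concatenate((np.zeros(5), x))[:2048]
print(gcc_phat_tdoa(delayed, x, fs))   # -> 5/11000, about 4.55e-4 s
```

The recovered delay, combined with the known microphone spacing, yields the DOA angle used for localization.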
S. Zhang and R. Jarvis (Eds.): AI 2005, LNAI 3809, pp. 966-969, 2005.
© Springer-Verlag Berlin Heidelberg 2005
$$W_k = \arg\max_{1 \le k \le K} P(O \mid \lambda_k) \qquad (1)$$

where $O$ is the observation sequence, $W_k$ is the most likely acoustic type, and $\lambda_k$ ($k = 1, \dots, K$) are the environmental sound models. The N-best OONA rejection method based on sub-words is then induced by the likelihood ratio test (LRT) [4] as follows:
$$\mathrm{LRT}(X) = \frac{P(X \mid H_0)}{P(X \mid H_1)} = \frac{P(O_n \mid \lambda_0)}{P(O_n \mid \bar{\lambda})} \ge \tau \qquad (2)$$

where $H_0$ is the hypothesis that the detected source belongs to the claimed model and $H_1$ is the alternative hypothesis; $\lambda_0$ is the claimed acoustic model, $\bar{\lambda}$ is its anti-model, and $\tau$ is a given threshold value. For the N-best models, this equation becomes
$$\mathrm{LRT}(O_n) = \ln P(O_n \mid \lambda_0) - \frac{1}{nBest} \sum_{m=1}^{nBest} \ln P(O_n \mid \lambda_m) \qquad (3)$$
where $\lambda_0$ is the environmental sound model with the maximum likelihood score and the $\lambda_m$ are the environmental sound models with the N-best likelihood scores; the variable $nBest$ is the number of most likely candidates. Finally, the likelihood ratio is compared with the given threshold for the verification task. If the value falls below the threshold, the candidate is considered abnormal: near-equal likelihood scores across all models indicate that no corresponding model exists in the given acoustic model set. Thus we decide that the detected acoustic source is abnormal.
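Equations (1) and (3) reduce to an argmax over model scores followed by a log-domain ratio test. A minimal sketch, using made-up log-likelihood scores in place of real CHMM outputs (the function name and threshold are illustrative assumptions):

```python
import numpy as np

def verify_source(log_scores, class_names, n_best, threshold):
    """Eq. (1): pick the most likely class; Eq. (3): compare its score
    against the mean of the next n_best competitors. A small ratio means
    every model fits about equally badly -> flag the source as abnormal."""
    order = np.argsort(log_scores)[::-1]          # descending likelihood
    best_class = class_names[order[0]]
    lrt = log_scores[order[0]] - log_scores[order[1:1 + n_best]].mean()
    return best_class, ("abnormal" if lrt < threshold else "valid")

classes = ["beach", "bird", "rain", "thunder", "wind"]

# Flat scores: no environmental model clearly fits -> rejected as abnormal.
flat = np.array([-400.0, -401.0, -402.0, -401.5, -400.5])
# One model dominates -> accepted as a valid environmental sound.
peaked = np.array([-380.0, -430.0, -440.0, -435.0, -445.0])

print(verify_source(flat, classes, n_best=3, threshold=10.0))    # -> ('beach', 'abnormal')
print(verify_source(peaked, classes, n_best=3, threshold=10.0))  # -> ('beach', 'valid')
```

The first case models the paper's rejection scenario: a speech signal scores poorly against every environmental anti-model, so the N-best average is close to the top score and the source is flagged as abnormal.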
3 Experimental Results
The acoustic input used to detect an occurring sound is sampled as 11 kHz PCM. Acoustic signals are analyzed in 125 ms frames with 10 ms overlap and converted into 39-dimensional feature vectors consisting of 13 MFCCs (including log energy) and their first and second derivatives. The training data set was collected from natural scenes and previously recorded waves. There are five classes in total, with about 3 hours of recordings overall: sounds of wind, rain, birdsong, thunder, and breaking waves. For the test data, the Aurora2 speech DB is used.
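The 39-dimensional vector (13 static MFCCs plus first and second derivatives) can be assembled from a static MFCC matrix with a standard regression-delta computation. This sketch assumes the static MFCCs are already extracted; the random matrix merely stands in for real features:

```python
import numpy as np

def delta(feat, width=2):
    """Regression deltas over +/- `width` frames (edge-padded),
    using the common normalized-slope formula."""
    denom = 2 * sum(w * w for w in range(1, width + 1))
    T = feat.shape[0]
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    d = np.zeros_like(feat)
    for w in range(1, width + 1):
        d += w * (padded[width + w:width + w + T] - padded[width - w:width - w + T])
    return d / denom

# 100 frames of 13 static MFCCs (random stand-ins for real features)
mfcc = np.random.default_rng(0).standard_normal((100, 13))

# Static + delta + delta-delta -> the 39-dimensional observation vectors
features = np.hstack([mfcc, delta(mfcc), delta(delta(mfcc))])
print(features.shape)   # -> (100, 39)
```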
The sound models are constructed as left-to-right Continuous Hidden Markov Models (CHMMs) with 50 states and 16 mixtures. First, we evaluate recognition performance by feeding the environmental sound waves back into the proposed system in order to verify the training accuracy. This also verifies that the constructed acoustic models remain robust when a false alarm (an environmental sound) is detected, since false alarms should be discarded. The results for different numbers of mixtures are shown in Table 1.
Table 1. Recognition performance to verify the training accuracy

No. of mixtures      |   1   |   2   |   4   |   8   |  16   |  32
Recognition rate (%) | 86.02 | 94.37 | 95.26 | 96.20 | 96.84 | 96.08
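Behind these recognition rates, scoring a candidate against each class model amounts to computing ln P(O | λ) per HMM. The toy log-domain forward pass below uses a discrete-observation HMM as a deliberately simplified stand-in for the paper's 50-state, 16-mixture CHMMs; all model parameters are invented for illustration.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-domain forward algorithm: ln P(O | lambda) for a discrete HMM.
    log_pi: (S,) initial probs, log_A: (S, S) transitions, log_B: (S, V) emissions."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = logsumexp_i [ alpha_i(t-1) + log a_ij ] + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Two toy 2-state models: one prefers symbol 0, the other symbol 1.
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
model_zero = np.log([[0.9, 0.1], [0.8, 0.2]])   # emits 0 most of the time
model_one = np.log([[0.1, 0.9], [0.2, 0.8]])    # emits 1 most of the time

obs = [0, 0, 1, 0, 0]
score_zero = log_forward(obs, log_pi, log_A, model_zero)
score_one = log_forward(obs, log_pi, log_A, model_one)
print(score_zero > score_one)   # True: the 0-heavy model fits this sequence better
```

In the actual system the emission terms would be Gaussian-mixture densities over the 39-dimensional MFCC vectors rather than a discrete table.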
To evaluate the OONA rejection rate, speech data from the Aurora2 DB is applied; in the proposed system, a speech signal is considered an abnormal acoustic source. As shown in Table 2, all of the speech data is classified as speech, i.e., every speech utterance is decided to be an abnormal acoustic sound. Some environmental sounds are confused with other environmental types, but this does not affect the final result because all environmental sounds are still classified into valid classes. The results show that abnormal acoustic verification performs well when a speech signal is detected, owing to the fact that MFCC feature vectors of speech differ strongly from those of environmental sounds. In addition, the man-made acoustic signals also proved very different from the environmental sounds, although the test data were not sufficient for a full evaluation.
Table 2. Confusion matrix (TND: Total Number of Data, ACC: Accuracy, %)

        | Speech | Beach | Bird | Rain | Thunder | Wind
Speech  |  1064  |   0   |   0  |   0  |    0    |   0
Beach   |     0  |  268  |   5  |   2  |    1    |   0
Bird    |     0  |    1  |  18  |   0  |    0    |   0
Rain    |     0  |    0  |   0  | 178  |   26    |   0
Thunder |     0  |    0  |   0  |   0  |    3    |   0
Wind    |     0  |    0  |   0  |   0  |    0    |  15
TND     |  1064  |  269  |  23  | 180  |   30    |  15
ACC     |   100  | 99.6  | 78.3 | 98.9 |  10.0   | 100
4 Conclusions
In order to verify whether a detected source is valid, we proposed an out-of-normal acoustic rejection method based on the N-best likelihood ratio test using natural environmental sound models. The results show that the verification rate for abnormal acoustic sources is high when a speech signal is detected.
Acknowledgements
This work was supported by grant No. 10012805 from the Korea Institute of Industrial Technology Evaluation & Planning Foundation.
References
[1] J.A. Cadzow, "Multiple source location - the signal subspace approach," IEEE Trans. on Signal Processing, vol. 38, no. 7, pp. 1110-1125, July 1990.
[2] C.H. Knapp and G.C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-24, no. 4, August 1976.
[3] C.E. Mokbel and G.F.A. Chollet, "Automatic word recognition in cars," IEEE Trans. on Speech and Audio Processing, vol. 3, pp. 346-356, Sept. 1995.
[4] E. Lleida and R.C. Rose, "Utterance verification in continuous speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 8, March 2000.