Department of Psychology,
The University of Melbourne, Melbourne, Australia
nba@unimelb.edu.au
I. INTRODUCTION
ICICS 2009
[Figure: System block diagram with stages Pre-processing, Voiced Speech Detection, Features Generation, Classification, and Classification result.]
II. SPEECH DATA
A. SUSAS Database.
The Speech under Simulated and Actual Stress (SUSAS)
[13] database comprises a wide variety of acted and actual
stresses and emotions. Only speech recorded under actual stress
conditions was used in this study. The speech samples were
selected from the Actual Speech under Stress domain, which
contains recordings made by passengers during roller-coaster
rides. This domain includes recordings from 7 speakers
(4 male and 3 female) reading words from the SUSAS 35-word
list. The stress level was determined subjectively from the
position of the roller-coaster at the time of recording.
A total of 3179 speech recordings were used in this study:
1202 representing high stress, 1276 representing moderate
stress, and 701 representing neutral speech.
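As a quick arithmetic check of the SUSAS counts above, the sketch below confirms that the three classes add up to 3179 recordings and computes each class's share. The largest share is the accuracy a trivial classifier would reach by always predicting the most frequent class. The dictionary keys are illustrative labels, not SUSAS identifiers.

```python
# Class counts for the SUSAS Actual Speech under Stress subset, as reported in the text
counts = {"high stress": 1202, "moderate stress": 1276, "neutral": 701}

total = sum(counts.values())  # should equal the 3179 recordings used
proportions = {label: n / total for label, n in counts.items()}

# Accuracy obtained by always predicting the most frequent class
majority_label = max(proportions, key=proportions.get)
majority_baseline = proportions[majority_label]

print(total, majority_label, round(100 * majority_baseline, 2))
```

The moderate-stress class is the largest at about 40.1% of the recordings.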
B. ORI Database.
The soundtracks of video recordings from the Oregon
Research Institute (ORI) [14] were used to select speech
samples for processing. The data included 71 parents (27
mothers and 44 fathers) who were video-recorded while engaged
in a family discussion with their children, during which the
family was asked to work through several problem-solving tasks.
The videotapes were annotated by a trained psychologist based
on both speech and facial expressions, using the Living in
ORI
database
Number of samples:
  High stress   Low stress   Neutral
  133           143          59
  206           220          121
III. SYSTEM STRUCTURE
$$E_i = \frac{1}{N_f N_t} \sum_{y=1}^{N_f} \sum_{x=1}^{N_t} s(x, y) \qquad (1)$$
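Eq. (1) is simply the mean of s(x, y) over N_f frequency rows and N_t time columns. A minimal NumPy sketch, assuming s is stored as a 2-D array with frequency along the rows; `band_energies` is a hypothetical helper that applies Eq. (1) separately to each frequency band (for example CB, Bark or ERB bands), and the band edges passed to it are inputs, not values taken from the paper.

```python
import numpy as np

def average_energy(s):
    """Eq. (1): mean of s(x, y) over N_f frequency rows and N_t time columns."""
    s = np.asarray(s, dtype=float)
    n_f, n_t = s.shape
    return float(s.sum() / (n_f * n_t))

def band_energies(spec, freqs_hz, edges_hz):
    """Apply Eq. (1) to each frequency band [edges[i], edges[i+1]) of a spectrogram."""
    spec = np.asarray(spec, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    energies = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        rows = (freqs_hz >= lo) & (freqs_hz < hi)
        # A band containing no spectrogram rows gets 0.0 (a choice made for this sketch)
        energies.append(average_energy(spec[rows]) if rows.any() else 0.0)
    return np.array(energies)

spec = np.arange(8, dtype=float).reshape(4, 2)  # 4 frequency rows, 2 time frames
print(average_energy(spec))                      # 3.5
print(band_energies(spec, [0, 100, 200, 300], [0, 150, 400]))  # [1.5 5.5]
```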
[Figure: Feature generation block diagrams. First diagram blocks: speech, spectrogram calculation, 12 log-Gabor filters, averaging, average energy, optimal features selection. Second diagram: speech, spectrogram calculation, CB, Bark or ERB bands, average energy, 12 log-Gabor filters, averaging, optimal features selection.]
B. Pre-processing
Both the SUSAS and ORI data sets were recorded in real-life
noisy conditions. To reduce the background noise, the
wavelet-based method developed by Donoho [16] was applied.
Speech signals of length $N$ and standard deviation $\sigma$ were
decomposed using the wavelet transform with the db2 mother
wavelet up to the second level, and the universal threshold
$\lambda = \sigma\sqrt{2\log N}$ was applied to each wavelet sub-band. The
signal was then reconstructed using the inverse wavelet
transform (IWT). Voiced speech was extracted using a
rule-based adaptive endpoint detection method [19].
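The thresholding step can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: the text does not say whether soft or hard thresholding was used, so soft thresholding is assumed, and σ is estimated here with the common median-absolute-deviation rule on the finest detail sub-band, which may differ from how the authors obtained σ.

```python
import numpy as np

def universal_threshold(sigma, n):
    """Donoho's universal threshold: lambda = sigma * sqrt(2 ln N)."""
    return sigma * np.sqrt(2.0 * np.log(n))

def mad_noise_sigma(finest_detail):
    """Assumed noise estimate from the finest detail sub-band: median(|d|) / 0.6745."""
    return float(np.median(np.abs(finest_detail)) / 0.6745)

def soft_threshold(coeffs, lam):
    """Shrink coefficients towards zero by lam; values with |c| <= lam become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

# Example: threshold a few detail coefficients of a 16000-sample signal
lam = universal_threshold(sigma=0.1, n=16000)
print(soft_threshold([-0.9, -0.2, 0.05, 0.6], lam))
```

With PyWavelets, the decomposition and reconstruction around these functions would be `pywt.wavedec(x, "db2", level=2)`, which returns `[cA2, cD2, cD1]`, followed by thresholding the detail arrays and `pywt.waverec(coeffs, "db2")`; the reconstruction can be one sample longer than an odd-length input and should be trimmed to the original length.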
C.
Upper band edges [Hz] (first band set): 24.75, 52.25, 82.8, 116.95, 154.9, 197.25, 244.35, 296.85, 355.2, 420.25, 501.7, 573.35, 663.2, 763.2, 874.6, 998.75, 1136.9, 1290.7, 1462.05, 1652.8, 1865.35, 2101.9, 2365.35, 2658.8, 2985.45, 3349.3, 3754.5
Wavelet packet bands [Hz] (lower-upper): 0-250, 250-500, 500-750, 750-1000, 1000-1250, 1250-1500, 1500-1750, 1750-2000, 2000-2250, 2250-2500, 2500-2750, 2750-3000, 3000-3250, 3250-3500, 3500-3750, 3750-4000
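The wavelet packet bands listed above are consistent with a full level-4 wavelet packet decomposition of speech sampled at 8 kHz: 2^4 = 16 equal bands of 250 Hz up to the 4 kHz Nyquist frequency. That sampling rate and depth are inferred from the band list, not stated in this section. The `erb_bands` helper is a hypothetical illustration of building contiguous ERB-width bands from the Glasberg and Moore ERB formula; it is an assumption and does not reproduce the first band set exactly (its second edge is about 52.07 Hz rather than 52.25 Hz).

```python
def wavelet_packet_bands(fs_hz, level):
    """Uniform bands from a full wavelet packet tree of depth `level`, in frequency order."""
    nyquist = fs_hz / 2.0
    n_bands = 2 ** level
    width = nyquist / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]

def erb_width(f_hz):
    """Glasberg and Moore (1990) equivalent rectangular bandwidth in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_bands(f_max_hz, f_start_hz=0.0):
    """Contiguous bands whose width equals the ERB at each band's lower edge."""
    bands, lo = [], f_start_hz
    while lo + erb_width(lo) <= f_max_hz:
        hi = lo + erb_width(lo)
        bands.append((lo, hi))
        lo = hi
    return bands

wp = wavelet_packet_bands(fs_hz=8000, level=4)
print(wp[0], wp[-1], len(wp))
erb = erb_bands(4000.0)
print(len(erb), erb[0])
```

In PyWavelets the terminal nodes of `pywt.WaveletPacket(x, "db2", maxlevel=4)` come in natural (Paley) order by default; `get_level(4, order="freq")` returns them sorted by frequency, matching the band list.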
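[Equations (3) and (7) are not recoverable from the source. Eq. (3) involves the radial term of the log-Gabor filter; Eq. (7) is an average over indices m and n, with m = 1, ..., N_r.]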
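The feature-generation diagrams apply 12 log-Gabor filters to the spectrogram, but the filter definition did not survive extraction. The sketch below uses the widely used frequency-domain log-Gabor form, a Gaussian on a logarithmic frequency axis multiplied by a Gaussian in orientation, and treats the spectrogram as a 2-D image. The split of the 12 filters into 3 scales and 4 orientations, and every parameter default, are assumptions rather than the paper's settings.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=3, n_orients=4, min_wavelength=3.0,
                   mult=2.0, sigma_on_f=0.55, d_theta_on_sigma=1.5):
    """Frequency-domain log-Gabor filters, n_scales * n_orients of them (assumed layout)."""
    # Normalised frequency grids (cycles/sample) in the same layout as np.fft.fft2 output
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0  # placeholder so log() is defined at DC; DC gain is zeroed below
    theta = np.arctan2(fy, fx)
    theta_sigma = np.pi / n_orients / d_theta_on_sigma

    bank = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)  # centre frequency of this scale
        radial = np.exp(-np.log(radius / f0) ** 2 / (2.0 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0  # log-Gabor filters have no DC component
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # Wrapped angular distance between each frequency sample and the filter orientation
            d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
            angular = np.exp(-d_theta ** 2 / (2.0 * theta_sigma ** 2))
            bank.append(radial * angular)
    return bank

def filter_spectrogram(spec, bank):
    """Magnitude of each filter's response to a 2-D spectrogram."""
    spectrum = np.fft.fft2(spec)
    return [np.abs(np.fft.ifft2(spectrum * g)) for g in bank]

spec = np.abs(np.random.default_rng(0).normal(size=(64, 100)))  # stand-in spectrogram
bank = log_gabor_bank(*spec.shape)
responses = filter_spectrogram(spec, bank)
```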
The averaged arrays were then converted to 1D vectors via
row-by-row concatenation.
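Because the averaging equation is not recoverable, this sketch assumes it is a mean over non-overlapping blocks of N_r by N_c elements of each filtered spectrogram; that interpretation, the hypothetical `block_average` and `to_feature_vector` helpers, and concatenating several filter outputs into one vector are all assumptions. The row-by-row concatenation described above corresponds to NumPy's C-order `ravel`.

```python
import numpy as np

def block_average(a, n_r, n_c):
    """Mean over non-overlapping n_r x n_c blocks; rows/columns that do not fill a block are dropped."""
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape[0] // n_r, a.shape[1] // n_c
    trimmed = a[:rows * n_r, :cols * n_c]
    return trimmed.reshape(rows, n_r, cols, n_c).mean(axis=(1, 3))

def to_feature_vector(averaged_arrays):
    """Row-by-row (C-order) flattening of each averaged array, concatenated into one 1-D vector."""
    return np.concatenate([np.asarray(a).ravel(order="C") for a in averaged_arrays])

a = np.arange(16, dtype=float).reshape(4, 4)
avg = block_average(a, 2, 2)    # [[2.5, 4.5], [10.5, 12.5]]
vec = to_feature_vector([avg])  # [2.5, 4.5, 10.5, 12.5]
print(vec)
```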
F. Optimal Feature Selection Using Mutual Information Criteria
The full set of N_F feature vectors was reduced to a small
subset of N_s < N_F vectors using the mutual information (MI)
feature selection algorithm [18]. MI measures the information
shared by two random variables X and Y and is defined as:
$$I(X;Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} \qquad (9)$$
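Eq. (9) can be estimated from data by counting joint occurrences. A minimal sketch, assuming each continuous feature is discretised into equal-width bins before its MI with the class labels is computed. Ranking features by their MI with the labels alone ("maximum relevance") is a simplification; the algorithm of [18] may also account for redundancy between selected features. `select_features` and `n_bins` are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Eq. (9) for discrete x, y using empirical probabilities (natural log, so nats)."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)  # count co-occurrences of each (x, y) pair
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # terms with p(x, y) = 0 contribute nothing
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))

def select_features(X, labels, n_select, n_bins=10):
    """Rank feature columns of X by MI with the labels; return the top n_select column indices."""
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        binned = np.digitize(X[:, j], edges[1:-1])  # discretise continuous values into bins
        scores.append(mutual_information(binned, labels))
    order = np.argsort(scores)[::-1]
    return [int(j) for j in order[:n_select]]

labels = [0, 0, 0, 1, 1, 1]
X = np.column_stack([[0, 1, 0, 1, 0, 1], labels])  # column 1 duplicates the labels
print(select_features(X, labels, n_select=1))
```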
$$p(\mathbf{u} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \mathbf{p}) = \sum_{i=1}^{M} p_i\, b_i(\mathbf{u}) \qquad (10)$$
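Eq. (10) has the form of a Gaussian mixture model (GMM) density, with mixture weights p_i and component densities b_i. The sketch below shows how such a model could score a feature vector and how a maximum-likelihood decision between per-class models could be made. It assumes diagonal covariance matrices and already-trained parameters (training, for example by expectation-maximisation, is not shown); function names and the toy models are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(u, weights, means, variances):
    """log p(u) = log sum_i p_i b_i(u) with diagonal-covariance Gaussian components b_i."""
    u = np.asarray(u, dtype=float)
    weights = np.asarray(weights, dtype=float)      # shape (M,)
    means = np.asarray(means, dtype=float)          # shape (M, D)
    variances = np.asarray(variances, dtype=float)  # shape (M, D), diagonal of each covariance
    d = u.shape[-1]
    diff = u[None, :] - means
    log_b = -0.5 * (d * np.log(2.0 * np.pi)
                    + np.sum(np.log(variances), axis=1)
                    + np.sum(diff ** 2 / variances, axis=1))
    a = np.log(weights) + log_b
    a_max = a.max()
    return float(a_max + np.log(np.sum(np.exp(a - a_max))))  # log-sum-exp for numerical stability

def classify(u, models):
    """Pick the class whose GMM gives u the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(u, *models[label]))

# Toy 1-D models: (weights, means, variances) per class
models = {
    "neutral": ([1.0], [[0.0]], [[1.0]]),
    "high stress": ([0.5, 0.5], [[4.0], [6.0]], [[1.0], [1.0]]),
}
print(classify([4.5], models))
```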
IV.
APIA% for each data set (columns: Averaged 12 Log-Gabor Filters / ERB-Spectrograms / Wavelet Packets and Log-Gabor Filters):
SUSAS: vowels under actual stress (3 stress levels): 77.58 / 81.82 / 84.85
SUSAS: vowels under actual stress (3 stress levels): 79.03 / 79.09 / 81.27
SUSAS: mixed vowels under actual stress (3 stress levels): 73.76 / 70.69 / 76.72
SUSAS: words under actual stress (3 stress levels): 64.70 / 70.63 / 76.35
ORI: natural speech sentences (5 emotions): 39.60 / 53.40 / 45.5
V. CONCLUSIONS