Tang C, Li M (2019) Speech Recognition in High Noise Environment. Ekoloji 28(107): 1561-1565.
© Foundation Environmental Protection & Research-FEPR Received: 6 Dec 2017 / Accepted: 17 Sep 2018
The MFCC speech feature analysis is based on the way the human ear receives external sound, using the linear relationship between perceived pitch and frequency to design the sound-collection method of the computer system. Through such a bionic simulation, the system receives for analysis the voice information closest to what the human ear perceives, and this reception characteristic is in line with the auditory characteristics modelled by the Mel frequency scale. The calculation process of the MFCC coefficients is shown in Fig. 2.

In the system, we divide the voice band into a series of triangular filter bands according to the critical bands, forming a Mel filter bank.

VOICE TRAINING AND RECOGNITION

The pre-processed speech signal is processed frame by frame to extract a series of feature vectors, and this feature-vector sequence can be used for speech training and recognition.
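As an illustration of the triangular Mel filter bank described above, the following sketch builds the filters. The filter count, FFT size, and sampling rate are assumed values for illustration (the paper does not specify them), and the 2595·log10 mel mapping is one common convention:

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale convention; the paper does not state which variant it uses.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000):
    """Build triangular filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Map the band edges to FFT bin indices.
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising edge of the triangle
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[i, k] = (right - k) / max(right - center, 1)
    return fbank
```

Each row of the returned matrix is one triangular band; multiplying it with a power spectrum yields the per-band energies from which the MFCC coefficients are computed.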
In recent years, the hidden Markov model (HMM) (Wu et al. 2009), trained with the Baum-Welch algorithm, has become one of the main models for determining the driving state of traffic vehicles. A hidden Markov process is a double stochastic process: one stochastic process describes the short-term stationary segments of a non-stationary signal, while the other describes how each short-term stationary segment transitions to the next, i.e. the dynamic characteristics of the short-term statistical features. Based on these two stochastic processes, the HMM can effectively solve the problem of identifying short-term stationary signal segments with different parameters and of tracking the transitions between them.
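The double stochastic structure can be made concrete with a toy example: one matrix drives the hidden state transitions, a second drives the symbol emissions, and the forward algorithm evaluates the observation probability P that training maximizes. All probabilities below are invented toy values, not parameters from this paper:

```python
import numpy as np

# Toy 2-state HMM: each row of A sums to 1 (state-transition process),
# each row of B sums to 1 (symbol-emission process).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])      # hidden state transitions
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # observation (emission) probabilities
pi = np.array([1.0, 0.0])       # always start in state 0

def forward_prob(obs):
    """P(O | M) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return alpha.sum()

p = forward_prob([0, 1, 1])
```

Baum-Welch re-estimates A, B, and pi to increase exactly this quantity for the training observations.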
The Baum-Welch algorithm is used to solve the parameter estimation of the hidden Markov model in speech training. Given an observation sequence A, it determines the model parameters M that maximize the observation probability P; finding the M that maximizes P is a functional extreme-value problem. The Baum-Welch algorithm uses a recursive, stepwise extremum search to increase P locally over the observation sequence, and finally obtains an optimized set of model parameters M. Thereby, an optimized training model of the speech is obtained from the training input, for the recognition of subsequent input speech.

The Viterbi algorithm is used as the speech recognition algorithm in the system and is described as follows. First, create an array a′_t(j) for each state of the training model. The initial state S1 corresponds to the array variable a′_0(1), initialized to 1, with all other state values initialized to 0. The state value at each moment is calculated from the observed symbol o_t at time t and written into the array a′_t(j); a_ij = 0 if there is no transition between states i and j. A second array of state records is created, saving at each time the state i that yields the largest value. Finally, when the state transitions are complete, the recorded state array is output; this is the best state sequence, i.e. the best recognition of the input voice in the environment.
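The Viterbi recursion just described can be sketched directly from those arrays (a minimal illustration: the array names follow the description above, while the model matrices A, B, pi are left as inputs):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return the best hidden-state path for an observed symbol sequence."""
    T, N = len(obs), A.shape[0]
    a = np.zeros((T, N))           # the a'_t(j) score array from the text
    back = np.zeros((T, N), int)   # state-record array (backpointers)
    a[0] = pi * B[:, obs[0]]       # initial state values
    for t in range(1, T):
        for j in range(N):
            scores = a[t - 1] * A[:, j]
            back[t, j] = scores.argmax()          # remember the largest-score state
            a[t, j] = scores.max() * B[j, obs[t]]
    # Trace the state records backwards to output the best state sequence.
    path = [int(a[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```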
ANTI-NOISE IMPROVEMENT

In order to improve the recognition rate of speech in complex environments such as automobiles, many anti-noise techniques have been introduced into speech recognition systems. The existing popular anti-noise technologies include the noise compensation method (Zhou et al. 2013), the speech extension method, the feature-removal anti-noise method, and noise-extraction and speech-insensitive methods. Each method suits certain circumstances, but none of them handles complex environments ideally. Therefore, we optimized the anti-noise algorithms in this system.

The algorithm we use in this system is based on a feature-abandonment algorithm and a speech enhancement algorithm. The speech enhancement algorithm filters noise from the voice waveform and effectively eliminates broadband noise. The feature-abandonment algorithm then discards the sub-band speech features polluted by noise, leaving the clean sub-band features; in this way, even whole-band speech features contaminated by noise can be abandoned. Since the two algorithms have different advantages, different applicable scenes, and complementary anti-noise behaviour, their combination may be a better way: the combination inherits the advantages and abandons the defects of the two algorithms.
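The feature-abandonment idea can be illustrated as masking feature dimensions whose sub-band SNR is poor. This is a simplified sketch: the paper does not specify its actual selection rule or threshold, so both are assumed here:

```python
import numpy as np

def abandon_features(feats, band_snr_db, thresh_db=0.0):
    """Zero out (abandon) feature dimensions whose sub-band SNR is below a
    threshold, keeping only the clean sub-band features.

    thresh_db is a hypothetical cutoff chosen for illustration only.
    """
    keep = band_snr_db >= thresh_db      # boolean mask of clean sub-bands
    return feats * keep, keep
```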
For speech enhancement, we have mainly improved the iterative (repeated) Wiener filtering method. We use a Wiener filtering algorithm based on a priori SNR estimation, as proposed by Scalart et al.; traditional iterative Wiener filters are often used for speech enhancement under additive-noise conditions. The noise model is shown in formula (4).

x(t) = s(t) + n(t)    (4)

In formula (4), x(t) represents the noisy speech signal, s(t) represents the clean signal without noise, and n(t) represents the additive noise in the speech (Gao et al. 2012).
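Under the additive model of formula (4), an a-priori-SNR Wiener gain of the kind attributed to Scalart et al. can be sketched per frequency bin as follows. The simple maximum-likelihood SNR estimate is an assumption for illustration, not the paper's exact estimator:

```python
import numpy as np

def wiener_gain(snr_prio):
    """Wiener gain H = SNR_prio / (1 + SNR_prio), applied per frequency bin."""
    return snr_prio / (1.0 + snr_prio)

def enhance_frame(X, noise_psd, eps=1e-12):
    """One-frame Wiener enhancement: X is the noisy spectrum of x = s + n."""
    snr_post = (np.abs(X) ** 2) / (noise_psd + eps)     # a posteriori SNR
    snr_prio = np.maximum(snr_post - 1.0, 0.0)          # simple ML a priori estimate
    return wiener_gain(snr_prio) * X                    # attenuate noisy bins
```

Iterating this filtering, with the noise PSD re-estimated from silent segments, is the "repeated" Wiener scheme that the improvements below refine.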
We improved the traditional iterative Wiener filter and used linear prediction analysis to estimate P̂_s(ω)_{i+1} for the enhanced speech. The improvements are as follows:

(1) Subtracting αP̂_n(ω) (α > 1) in time frames with higher spectral amplitude better highlights the speech spectrum, suppresses pure-tone noise, and improves noise-reduction performance. The current frame is taken from the Wiener-filtered signal, and the frame data is weighted with the data of the previous silent-segment frame, achieving a dynamic update of the silent-segment noise spectrum.

(2) Since the distortion of the filtered spectrum is largely due to the difference between the noise power spectrum P_n(ω) and its estimate P̂_n(ω), frame averaging of the noisy speech spectrum X(ω) can reduce the filtering distortion.

The improved initialization formula is given in formula (5):

P̂_s(ω)_0 = P_x(ω) − ηP̂_n(ω),   if P_x(ω) − ηP̂_n(ω) ≥ μP_x(ω)
P̂_s(ω)_0 = μP_x(ω),             if P_x(ω) − ηP̂_n(ω) < μP_x(ω)        (5)

The improved iterative filter formula is given in formula (6):

H(ω)_i = P̂_s(ω)_i / (P̂_s(ω)_i + P̂_n(ω)),   if P̂_s(ω)_i − λP̂_n(ω) ≥ ψP_x(ω)
H(ω)_i = ψP_x(ω) / (ψP_x(ω) + P̂_n(ω)),      if P̂_s(ω)_i − λP̂_n(ω) < ψP_x(ω)    (6)
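Formulas (5) and (6) can be applied per frequency bin as in the following sketch. The numeric values of η, μ, λ, and ψ are hypothetical, since the paper does not give them:

```python
import numpy as np

# Hypothetical threshold constants for illustration; the paper does not
# specify numeric values for eta, mu, lambda, or psi.
ETA, MU, LAM, PSI = 2.0, 0.05, 1.5, 0.02

def init_speech_psd(Px, Pn_hat):
    """Formula (5): over-subtract the noise estimate, with a spectral floor."""
    sub = Px - ETA * Pn_hat
    floor = MU * Px
    return np.where(sub >= floor, sub, floor)

def iter_wiener_gain(Ps_i, Pn_hat, Px):
    """Formula (6): gain for iteration i, floored when the speech estimate is weak."""
    strong = (Ps_i - LAM * Pn_hat) >= PSI * Px
    gain_strong = Ps_i / (Ps_i + Pn_hat)
    gain_floor = (PSI * Px) / (PSI * Px + Pn_hat)
    return np.where(strong, gain_strong, gain_floor)
```

The floor terms in both formulas prevent negative power estimates and over-attenuation in bins dominated by noise, which is the musical-noise weakness of plain spectral subtraction.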
In our new model, we combine two complementary models: the optimized iterative Wiener filter model and the feature-abandonment model. During recognition, the new model first applies the iterative Wiener filter to the noisy speech, and the current frame data is then weighted with the silent-segment data. The resulting MFCC speech features, with the noise-polluted features disabled, are used as input to the feature-abandonment model, which finally recognizes the speech. Experimental verification shows that the noise robustness of the recognition has been greatly improved.

CONCLUSION

The object of this research is speech recognition under high noise in complex environments such as the automobile. Starting from an ordinary speech recognition system, we mainly improved and optimized its anti-noise and noise-reduction capabilities, so that the system can recognize speech in a noisy environment. The improvement combines feature-abandonment denoising with speech-enhancement denoising in the anti-noise stage. This new method effectively eliminates the influence of noise on the speech recognition system and greatly improves its recognition rate and accuracy in noisy environments.

ACKNOWLEDGEMENTS

Project supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (No. kj1603810, kj1737456).
REFERENCES
Gao LY, Zhu W, Sang ZX, Mi L (2012) Speech enhancement algorithm based on improved spectral subtraction.
Modern Electronics Technique, 35(17): 60-62.
Hu ZQ, Zeng YM, Zong Y, Li MC (2014) Improvement of MFCC parameters extraction in speaker recognition. Computer Engineering and Applications, 50(7): 217-220.
Jiang JZ, Zhang DF, Zhang LH (2012) Speech enhancement algorithm for high noise environment. Computer Engineering and Applications, 49(20): 222-225.
Lee CC, Mower E, Busso C, Lee S, Narayanan S (2009) Emotion recognition using a hierarchical binary decision
tree approach. Speech Communication, 53(9): 1162-1171.
El Ayadi M, Kamel MS, Karray F (2011) Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3): 572-587.
Wu SQ, Falk TH, Chan WY (2009) Automatic speech emotion recognition using modulation spectral features. International Conference on Digital Signal Processing: 768-785.
Zhou YH, Li FL, Tong F (2013) The microphone array speech enhancement and HMM recognition joint
processing in noise environment. NCMMSC’ 2013, 47(6): 564-576.