% Recognition Accuracy

Method        Input             Sunna  Okkati  Rendu  Moodu  Nalugu  Idu  Aru  Edu  Enamidi  Tommidi
LPC+DTW       stored speech      100    100     100    100    100    100  100  100    100      100
LPC+DTW       real-time speech    -      -       -      -      -      -    -    -      -        -
MFCC+DTW      stored speech      100    100     100    100    100    100  100  100    100      100
MFCC+DTW      real-time speech    -      -       -      -      -      -    -    -      -        -
MFCC+HMM      stored speech      100    100     100    100    100    100  100  100    100      100
MFCC+HMM      real-time speech    -      -       -      -      -      -    -    -      -        -
WAVELET+DTW   stored speech      100    100     100    100    100    100  100  100    100      100
WAVELET+DTW   real-time speech    -      -       -      -      -      -    -    -      -        -
MFCCs have been the dominant features used for speech recognition for some time. Their success is due to their ability to represent the speech amplitude spectrum in a compact form. Each step in the process of creating MFCC features is motivated by perceptual or computational considerations. The following paragraphs examine these steps, and the assumptions behind them, in more detail.
The figure shows the process of creating MFCC features. The first step is to divide the speech signal into frames, usually by applying a windowing function at fixed intervals. The aim is to model small sections of the signal that are statistically stationary. The window function, typically a Hamming window, removes edge effects. A cepstral vector is generated for each frame.
[Figure: block diagram of the MFCC feature-creation process, starting from the input waveform]
The voice signal, sampled directly from the microphone, is processed to extract features. The method used for feature extraction is Linear Predictive Coding (LPC), using an LPC processor. The basic steps of the LPC processor include the following:

1. Preemphasis: The digitized speech signal, s(n), is put through a low-order digital filter to spectrally flatten the signal and to make it less susceptible to finite-precision effects later in the signal processing. The output of the preemphasis network, ~s(n), is related to the input s(n) by the difference equation

   ~s(n) = s(n) - a*s(n-1),   with typically 0.9 <= a <= 1.0.
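As a sketch, the preemphasis step amounts to a one-line first-order filter. The coefficient a = 0.95 and the toy signal below are illustrative choices, not values fixed by the text:

```python
def preemphasize(s, a=0.95):
    """First-order preemphasis: ~s(n) = s(n) - a*s(n-1); the first sample passes through."""
    return [s[n] - a * s[n - 1] if n > 0 else s[0] for n in range(len(s))]

signal = [1.0, 2.0, 3.0, 2.0, 1.0]                       # toy input samples
print([round(v, 2) for v in preemphasize(signal)])       # [1.0, 1.05, 1.1, -0.85, -0.9]
```

The filter boosts high frequencies and flattens the spectral tilt of voiced speech before the later analysis stages.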
2. Frame Blocking: The output of the preemphasis step, ~s(n), is blocked into frames of N samples, with adjacent frames separated by M samples. If x_l(n) is the l-th frame of speech and there are L frames within the entire speech signal, then

   x_l(n) = ~s(M*l + n),   where n = 0, 1, ..., N-1 and l = 0, 1, ..., L-1.

3. Windowing: After frame blocking, the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. If we define the window as w(n), 0 <= n <= N-1, then the result of windowing is the signal

   ~x_l(n) = x_l(n) * w(n),   where 0 <= n <= N-1,

a typical choice being the Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)).
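Frame blocking and Hamming windowing (steps 2 and 3) can be sketched as follows; the frame length N = 32, shift M = 16, and the toy signal are illustrative values only:

```python
import math

def hamming(N):
    """Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_block(s, N, M):
    """x_l(n) = ~s(M*l + n): frames of N samples, shifted by M samples."""
    L = 1 + (len(s) - N) // M                  # number of complete frames
    return [s[M * l : M * l + N] for l in range(L)]

def window_frames(frames):
    """Multiply each frame pointwise by the Hamming window."""
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]

s = [float(n % 7) for n in range(100)]         # toy "speech" signal
frames = frame_block(s, N=32, M=16)            # 50% overlap between frames
windowed = window_frames(frames)
print(len(frames), len(windowed[0]))           # 5 32
```

With M < N the frames overlap, so no sample near a frame boundary is seen only through the tapered window edges.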
4. Autocorrelation Analysis: The next step is to autocorrelate each frame of the windowed signal in order to give

   r_l(m) = sum_{n=0}^{N-1-m} ~x_l(n) * ~x_l(n+m),   m = 0, 1, ..., p,

where the highest autocorrelation index, p, is the order of the LPC analysis.

5. LPC Analysis: The next processing step is the LPC analysis, which converts each frame of p+1 autocorrelations into an LPC parameter set by using Durbin's method. This can formally be given as the following algorithm:
   E^(0) = r(0)
   k_i = ( r(i) - sum_{j=1}^{i-1} a_j^(i-1) * r(i-j) ) / E^(i-1),   1 <= i <= p
   a_i^(i) = k_i
   a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1),   1 <= j <= i-1
   E^(i) = (1 - k_i^2) * E^(i-1)

Solving the above recursively for i = 1, 2, ..., p, the LPC coefficients, a_m, are given as

   a_m = a_m^(p),   1 <= m <= p.
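Steps 4 and 5 can be sketched together in plain Python; the helper names and the decaying-exponential toy frame are illustrative:

```python
def autocorr(x, p):
    """r(m) = sum_n x(n)*x(n+m) for m = 0..p, over one windowed frame."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]

def durbin(r):
    """Durbin's recursion: autocorrelations r(0..p) -> LPC coefficients a_1..a_p."""
    p = len(r) - 1
    a = [0.0] * (p + 1)          # a[1..p] hold the current coefficient set
    E = r[0]                     # prediction error energy E^(0)
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / E
        a_new = a[:]
        a_new[i] = k             # a_i^(i) = k_i
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        E *= (1.0 - k * k)       # E^(i) = (1 - k_i^2) * E^(i-1)
    return a[1:], E

# Toy frame: a decaying exponential whose best first-order predictor is ~0.9.
x = [0.9 ** n for n in range(64)]
coeffs, err = durbin(autocorr(x, p=2))
print(round(coeffs[0], 3))       # close to 0.9
```

The recursion needs only O(p^2) operations per frame, which is why Durbin's method is the standard way to solve the LPC normal equations.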
LPC advantages:
- LPC provides a good model of the speech signal.
- It is a production-based method.
- LPC represents the spectral envelope with low-dimensional feature vectors.
- It provides linear characteristics.
- LPC leads to a reasonable source-vocal tract separation.
- LPC is an analytically tractable model; the method is mathematically precise and straightforward to implement in either software or hardware.

LPC disadvantages:
- LP models the input signal with constant weighting over the whole frequency range, whereas human perception does not have constant frequency resolution across that range.
- A serious problem with LPC coefficients is that they are highly correlated, while less correlated features are desirable for acoustic modeling.
- An inherent drawback of conventional LP is its inability to include speech-specific a priori information in the modeling process.
versa. Essential time-axis structures of speech patterns are continuity, monotonicity (a restriction on relative timing within an utterance), a limit on how fast acoustic parameters can change, and so on. These conditions can be realized as restrictions on the warping function F, or on the points c(k) = (i(k), j(k)).
1) Monotonic conditions: i(k-1) <= i(k) and j(k-1) <= j(k).
2) Continuity conditions: i(k) - i(k-1) <= 1 and j(k) - j(k-1) <= 1.
As a result of these two restrictions, the following relation holds between two consecutive points:

   c(k-1) = (i(k), j(k)-1), (i(k)-1, j(k)-1), or (i(k)-1, j(k)).
3) Boundary conditions: i(1) = 1, j(1) = 1, and i(K) = I, j(K) = J.
4) Adjustment window condition:

   |i(k) - j(k)| <= r,
where r is an appropriate positive integer, called the window length. This condition reflects the fact that time-axis fluctuations, in ordinary cases, never cause an excessive timing difference.
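The monotonicity, continuity, boundary, and adjustment-window conditions can all be folded into the usual DTW dynamic program. The sketch below uses the symmetric form (diagonal steps weighted 2, total normalized by I + J); the distance function and toy sequences are illustrative, and r must be at least |I - J| for the endpoint to remain reachable:

```python
def dtw(A, B, r=2, dist=lambda a, b: abs(a - b)):
    """Symmetric DTW with monotonicity, continuity, boundary, and
    |i - j| <= r adjustment-window conditions (r >= |len(A)-len(B)| required)."""
    I, J = len(A), len(B)
    INF = float("inf")
    g = [[INF] * J for _ in range(I)]
    g[0][0] = 2 * dist(A[0], B[0])          # boundary condition at c(1) = (1, 1)
    for i in range(I):
        for j in range(J):
            if i == j == 0 or abs(i - j) > r:
                continue                     # outside the adjustment window
            d = dist(A[i], B[j])
            best = INF
            if i > 0 and j > 0:
                best = min(best, g[i-1][j-1] + 2 * d)   # diagonal step, weight 2
            if i > 0:
                best = min(best, g[i-1][j] + d)          # step along the i-axis
            if j > 0:
                best = min(best, g[i][j-1] + d)          # step along the j-axis
            g[i][j] = best
    return g[I-1][J-1] / (I + J)             # time-normalized distance

A = [1.0, 2.0, 3.0, 2.0, 1.0]
B = [1.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(dtw(A, B, r=2))                        # 0.0: B is just a time-warped A
```

Because every candidate predecessor is one of the three points allowed by the monotonicity and continuity relations, the recursion searches exactly the permissible warping functions.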
Neither too steep nor too gentle a gradient should be allowed for the warping function F, because such deviations may cause undesirable time-axis warping. Too steep a gradient, for example, causes an unrealistic correspondence between a very short segment of pattern A and a relatively long segment of pattern B. In such a case, a short segment in a consonant or phoneme-transition part may happen to coincide well with an entire steady vowel part. Therefore, a restriction called the slope constraint condition is set upon the warping function F, as a restriction on its first derivative in discrete form. The slope constraint condition is realized as a restriction on the possible relation among (or the possible
configuration of) several consecutive points on the warping function, as shown in Fig. 6.2(a) and (b). Concretely, if point c(k) moves m consecutive times in the direction of the i (or j) axis, then c(k) is not allowed to step further in the same direction before stepping at least n times in the diagonal direction. The effective intensity of the slope constraint can be evaluated by the measure P = n/m.
The larger the P measure, the more rigidly the warping function slope is restricted. When P = 0, there are no restrictions on the warping function slope. When P = ∞ (that is, m = 0), the warping function is restricted to the diagonal line j = i, and nothing more than conventional pattern matching with no time normalization takes place. Generally speaking, if the slope constraint is too severe, time normalization does not work effectively; if it is too lax, discrimination between speech patterns in different categories is degraded. Thus, neither too large nor too small a value of P is desirable. Section IV reports the results of an investigation of the optimum compromise on the P value through several experiments.
In Fig. 6.2(c) and (d), two examples of permissible point c(k) paths under the slope constraint condition P = 1 are shown. The Fig. 6.2(c) type is directly derived from the above definition, while Fig. 6.2(d) is an approximated type with one additional constraint: the second derivative of the warping function F is restricted, so that the point c(k) path does not change direction orthogonally. This new constraint reduces the number of paths to be searched. Therefore, the simpler Fig. 6.2(d) type is adopted hereafter, except for the P = 0 case.
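For the P = 1 case, the slope constraint can be built directly into the recurrence as a three-way step pattern: an axis step must be sandwiched by diagonal steps. A minimal sketch, assuming symmetric weighting (diagonal moves weighted 2); the function name dtw_p1 and the toy inputs are illustrative:

```python
def dtw_p1(A, B, dist=lambda a, b: abs(a - b)):
    """Symmetric DTW with the P = 1 slope constraint: after one step along
    an axis, the path must immediately take a diagonal step."""
    I, J = len(A), len(B)
    INF = float("inf")
    d = [[dist(a, b) for b in B] for a in A]     # local distance table
    g = [[INF] * J for _ in range(I)]
    g[0][0] = 2 * d[0][0]
    for i in range(I):
        for j in range(J):
            if i == j == 0:
                continue
            cands = []
            if i >= 1 and j >= 2:                # j-step then diagonal
                cands.append(g[i-1][j-2] + 2 * d[i][j-1] + d[i][j])
            if i >= 1 and j >= 1:                # pure diagonal
                cands.append(g[i-1][j-1] + 2 * d[i][j])
            if i >= 2 and j >= 1:                # i-step then diagonal
                cands.append(g[i-2][j-1] + 2 * d[i-1][j] + d[i][j])
            if cands:
                g[i][j] = min(cands)
    return g[I-1][J-1] / (I + J)

print(dtw_p1([1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 3.0, 2.0]))   # 0.0
```

Under this pattern a grid point directly beside the start, such as (1, 2) in 1-based terms, is unreachable, which is exactly the effect of forbidding two consecutive axis steps.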