
Speech recognition is the translation of spoken words into text.

% Recognition Accuracy (stored speech)

Word      LPC+DTW   MFCC+DTW   MFCC+HMM   Wavelet+DTW
Sunna       100        100        100         100
Okkati      100        100        100         100
Rendu       100        100        100         100
Moodu       100        100        100         100
Nalugu      100        100        100         100
Idu         100        100        100         100
Aru         100        100        100         100
Edu         100        100        100         100
Enamidi     100        100        100         100
Tommidi     100        100        100         100

MFCCs have been the dominant features used for speech recognition for some time. Their success is due to their ability to represent the speech amplitude spectrum in a compact form. Each step in the process of creating MFCC features is motivated by perceptual or computational considerations. We examine these steps in more detail in the following paragraphs; a more complete description of the process and its assumptions can be found in the literature.

The figure shows the process of creating MFCC features. The first step is to divide the speech signal into frames, usually by applying a windowing function at fixed intervals. The aim here is to model small sections of the signal that are statistically stationary. The window function, typically a Hamming window, removes edge effects. We generate a cepstral vector for each frame.
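The pipeline just described can be sketched as follows. This is a minimal illustration using NumPy and SciPy, not the implementation used in this work; the sampling rate, frame length, hop size, and filter count are assumed values chosen only for the example.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Sketch: frame -> Hamming window -> power spectrum -> mel filterbank -> log -> DCT."""
    # 1. Frame blocking with overlap; each frame is assumed quasi-stationary.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    # 2. Hamming window to reduce edge effects at frame boundaries.
    frames = frames * np.hamming(frame_len)
    # 3. Power spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 4. Triangular mel filterbank; mel(f) = 2595 * log10(1 + f/700).
    mel_max = 2595 * np.log10(1 + (fs / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / fs).astype(int)
    fbank = np.zeros((n_filters, spec.shape[1]))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 5. Log filterbank energies, then DCT to decorrelate -> cepstral coefficients.
    energies = np.log(spec @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Each row of the returned matrix is one cepstral vector, one per frame, as described above.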


Feature Extraction Using Linear Predictive Coding


One of the most powerful signal analysis techniques is the method of linear prediction. LPC has become the predominant technique for estimating the basic parameters of speech: it provides an accurate estimate of the speech parameters and is also an efficient computational model of speech. The basic idea behind LPC is that a speech sample can be approximated as a linear combination of past speech samples. By minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the predicted values, a unique set of parameters, the predictor coefficients, can be determined. These coefficients form the basis of LPC of speech. The analysis provides the capability for computing the linear prediction model of speech over time. The predictor coefficients can then be transformed into a more robust set of parameters known as cepstral coefficients.

The voice signal, sampled directly from the microphone, is processed to extract its features. The method used for feature extraction is Linear Predictive Coding, using an LPC processor. The basic steps of the LPC processor are the following:

1. Preemphasis: The digitized speech signal, s(n), is put through a low-order digital system to spectrally flatten the signal and to make it less susceptible to finite-precision effects later in the signal processing. The output of the preemphasis network is related to its input, s(n), by the difference equation:

~s(n) = s(n) - a s(n-1),   0.9 <= a <= 1.0
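The preemphasis step can be sketched in a few lines. The coefficient value a = 0.95 below is a typical choice from the allowed range, assumed for illustration.

```python
import numpy as np

def preemphasize(s, a=0.95):
    """Apply ~s(n) = s(n) - a * s(n-1); a near 1 spectrally flattens the signal."""
    out = np.empty_like(s, dtype=float)
    out[0] = s[0]                  # first sample has no predecessor
    out[1:] = s[1:] - a * s[:-1]
    return out
```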

2. Frame Blocking: The output of the preemphasis step, ~s(n), is blocked into frames of N samples, with adjacent frames separated by M samples. If x_l(n) is the l-th frame of speech, and there are L frames within the entire speech signal, then

x_l(n) = ~s(M l + n)

where n = 0, 1, ..., N-1 and l = 0, 1, ..., L-1.

3. Windowing: After frame blocking, the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. If we define the window as w(n), 0 <= n <= N-1, then the result of windowing is the signal:

~x_l(n) = x_l(n) w(n)

where 0 <= n <= N-1.

A typical window is the Hamming window, which has the form

w(n) = 0.54 - 0.46 cos(2 pi n / (N-1)),   0 <= n <= N-1
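Steps 2 and 3 together can be sketched as follows. The frame length N = 256 and shift M = 128 are assumed example values, not parameters taken from this work.

```python
import numpy as np

def frame_and_window(s, N=256, M=128):
    """Block ~s into L overlapping frames x_l(n) = ~s(M*l + n), then apply the
    Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame."""
    L = 1 + (len(s) - N) // M          # number of complete frames
    frames = np.stack([s[l * M: l * M + N] for l in range(L)])
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))
    return frames * w
```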

4. Autocorrelation Analysis: The next step is to autocorrelate each frame of the windowed signal in order to give

r_l(m) = sum_{n=0}^{N-1-m} ~x_l(n) ~x_l(n+m),   m = 0, 1, ..., p

where the highest autocorrelation index, p, is the order of the LPC analysis.

5. LPC Analysis: The next processing step converts each frame of p+1 autocorrelations into an LPC parameter set by using Durbin's method. This can formally be given as the following algorithm:

E^(0) = r(0)
k_i = [ r(i) - sum_{j=1}^{i-1} alpha_j^(i-1) r(i-j) ] / E^(i-1),   1 <= i <= p
alpha_i^(i) = k_i
alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1),   1 <= j <= i-1
E^(i) = (1 - k_i^2) E^(i-1)

Solving the above recursively for i = 1, 2, ..., p, the LPC coefficients a_m are given as

a_m = alpha_m^(p),   1 <= m <= p
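Steps 4 and 5 can be sketched together: autocorrelate a windowed frame, then run Durbin's recursion. This is an illustrative implementation, not the one used in this work; the order p = 10 is an assumed default.

```python
import numpy as np

def lpc_coefficients(frame, p=10):
    """Autocorrelate a windowed frame, then solve for the LPC coefficients
    a_1..a_p with Durbin's recursion."""
    N = len(frame)
    # Step 4: autocorrelation r(m) for m = 0..p.
    r = np.array([np.dot(frame[:N - m], frame[m:]) for m in range(p + 1)])
    # Step 5: Durbin's recursion; a[j] holds alpha_j of the current order.
    a = np.zeros(p + 1)
    E = r[0]                                   # E^(0) = r(0)
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E   # reflection coefficient k_i
        a_new = a.copy()
        a_new[i] = k                           # alpha_i^(i) = k_i
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, E = a_new, (1 - k * k) * E          # E^(i) = (1 - k_i^2) E^(i-1)
    return a[1:]                               # a_m = alpha_m^(p)
```

On a long realization of a first-order autoregressive signal, the first coefficient recovers the generating coefficient, which is a quick sanity check of the recursion.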

LPC advantages:
- LPC provides a good model of the speech signal.
- It is a production-based method.
- LPC represents the spectral envelope by low-dimension feature vectors.
- It provides linear characteristics.
- LPC leads to a reasonable source-vocal tract separation.
- LPC is an analytically tractable model.
- The method of LPC is mathematically precise and straightforward to implement in either software or hardware.

LPC disadvantages:
- LP models the input signal with constant weighting over the whole frequency range, whereas human perception does not have constant frequency resolution over the whole range.
- A serious problem with LPC coefficients is that they are highly correlated, while less correlated features are desirable for acoustic modeling.
- An inherent drawback of conventional LP is its inability to include speech-specific a priori information in the modeling process.

Restrictions on Warping Function


The warping function F is a model of time-axis fluctuation in a speech pattern. Accordingly, it should approximate the properties of actual time-axis fluctuation. In other words, function F, when viewed as a mapping from the time axis of pattern A onto that of pattern B, must preserve linguistically essential structures in the pattern A time axis, and vice versa. Essential speech pattern time-axis structures are continuity, monotonicity (or restriction of relative timing in a speech), limitation on the acoustic parameter transition speed in a speech, and so on. These conditions can be realized as restrictions on the warping function F, expressed at each point c(k) = (i(k), j(k)).
1) Monotonic conditions: i(k-1) <= i(k) and j(k-1) <= j(k).
2) Continuity conditions: i(k) - i(k-1) <= 1 and j(k) - j(k-1) <= 1.


As a result of these two restrictions, the following relation holds between two consecutive points:

c(k-1) = (i(k), j(k)-1), (i(k)-1, j(k)-1), or (i(k)-1, j(k)).

3) Boundary conditions: i(1) = 1, j(1) = 1, and i(K) = I, j(K) = J.
4) Adjustment window condition: |i(k) - j(k)| <= r,

where r is an appropriate positive integer, called the window length. This condition corresponds to the fact that time-axis fluctuation in usual cases never causes an excessive timing difference.
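Conditions 1) through 4) can be sketched as a dynamic-programming recursion: the three-predecessor relation above becomes the minimum over three cells, and the adjustment window simply skips cells with |i - j| > r. This is a minimal unweighted sketch with an absolute-difference local distance, not the full symmetric DTW formulation.

```python
import numpy as np

def dtw_distance(A, B, r=None):
    """DTW under the step c(k-1) in {(i, j-1), (i-1, j-1), (i-1, j)},
    boundary conditions (1,1) -> (I,J), and adjustment window |i - j| <= r."""
    I, J = len(A), len(B)
    r = max(I, J) if r is None else r          # no window by default
    g = np.full((I + 1, J + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            if abs(i - j) > r:
                continue                       # outside the adjustment window
            d = abs(A[i - 1] - B[j - 1])       # local frame distance
            g[i, j] = d + min(g[i - 1, j], g[i - 1, j - 1], g[i, j - 1])
    return g[I, J]
```

Because the warping absorbs timing differences, a sequence matched against a time-stretched copy of itself still yields zero distance.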

5) Slope constraint condition:

Neither too steep nor too gentle a gradient should be allowed for the warping function F, because such deviations may cause undesirable time-axis warping. Too steep a gradient, for example, causes an unrealistic correspondence between a very short pattern A segment and a relatively long pattern B segment. A case can then occur in which a short segment in a consonant or phoneme-transition part happens to be in good coincidence with an entire steady vowel part. Therefore, a restriction called the slope constraint condition is set upon the warping function F, bounding its first derivative in discrete form. The slope constraint condition is realized as a restriction on the possible relation among (or the possible configuration of) several consecutive points on the warping function, as shown in Fig. 6.2(a) and (b). Concretely, if point c(k) moves forward in the direction of the i (or j) axis m consecutive times, then point c(k) is not allowed to step further in the same direction before stepping at least n times in the diagonal direction. The effective intensity of the slope constraint can be evaluated by the measure P = n/m.

Fig. Slope constraint on warping function

The larger the measure P, the more rigidly the warping function slope is restricted. When P = 0, there are no restrictions on the warping function slope. When P = infinity (that is, m = 0), the warping function is restricted to the diagonal line j = i, and nothing more than conventional pattern matching occurs, with no time normalization. Generally speaking, if the slope constraint is too severe, time normalization does not work effectively; if it is too lax, discrimination between speech patterns in different categories is degraded. Thus, setting neither too large nor too small a value for P is desirable. Section IV reports the results of an investigation of an optimum compromise on the P value through several experiments.

In Fig. 6.2(c) and (d), two examples of permissible point c(k) paths under the slope constraint condition P = 1 are shown. The Fig. 6.2(c) type is directly derived from the above definition, while Fig. 6.2(d) is an approximated type with one additional constraint: the second derivative of the warping function F is restricted, so that the point c(k) path does not change its direction orthogonally. This new constraint reduces the number of paths to be searched. Therefore, the simpler Fig. 6.2(d) type is adopted hereafter, except for the P = 0 case.
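A P = 1 step pattern of the Fig. 6.2(d) kind can be sketched as a DTW recursion whose only permissible predecessors are (i-1, j-1), (i-1, j-2), and (i-2, j-1): every horizontal or vertical advance is paired with a diagonal one, keeping the local slope between 1/2 and 2. This is an unweighted simplification for illustration, not the exact symmetric form with its step weights.

```python
import numpy as np

def dtw_p1(A, B):
    """DTW with slope constraint P = n/m = 1: two j-steps (or two i-steps)
    are never taken without an intervening diagonal step."""
    I, J = len(A), len(B)
    d = np.abs(np.subtract.outer(np.asarray(A, float), np.asarray(B, float)))
    g = np.full((I + 1, J + 1), np.inf)
    g[1, 1] = d[0, 0]                          # boundary condition c(1) = (1, 1)
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            if i == 1 and j == 1:
                continue
            best = g[i - 1, j - 1] + d[i - 1, j - 1]           # diagonal step
            if j >= 2:   # (i-1, j-2) -> (i, j-1) -> (i, j): two j-steps per i-step
                best = min(best, g[i - 1, j - 2] + d[i - 1, j - 2] + d[i - 1, j - 1])
            if i >= 2:   # (i-2, j-1) -> (i-1, j) -> (i, j): two i-steps per j-step
                best = min(best, g[i - 2, j - 1] + d[i - 2, j - 1] + d[i - 1, j - 1])
            g[i, j] = best
    return g[I, J]
```

Identical sequences still match exactly, while paths steeper than slope 2 (or shallower than 1/2) are simply never generated.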
