average squared values of the filter output. In an analogue representation this is the sampled value of the envelope. An estimate is produced roughly 50 times per second, and for each segment a decision is made on whether the speech is voiced or unvoiced. Voiced sounds include the vowels /a/, /e/ and /o/; unvoiced sounds include /s/ and /f/. Voiced sounds have a periodic structure, as can be seen in the short-time spectrum in the figure below. Strictly speaking, the structure of voiced sounds is not exactly periodic but quasi-periodic. This near-periodicity of speech is caused by the vibration of the vocal cords. The shape of the spectral envelope that fits the short-time spectrum of voiced speech is related to the transfer characteristics of the vocal tract and to the spectral tilt due to the glottal pulse. The envelope has peaks, which correspond to the resonant modes of the vocal tract. There are three to five resonances below 5 kHz; the first three usually appear below 3 kHz and are very important in speech synthesis and perception. These peaks are sometimes called formants. The period of the fundamental harmonic is called the pitch period. Unvoiced sounds have a noise-like structure.
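The per-segment energy measure described above can be sketched as follows. This is a minimal illustration, not any standard's classifier: the frame length, the two synthetic signals and the decision threshold are all assumptions chosen for demonstration.

```python
import numpy as np

def short_time_energy(x, frame_len):
    """Mean squared value of each frame (the 'average squared
    values of the filter output' described above)."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)

# Hypothetical illustration: one loud quasi-periodic frame followed
# by one quiet noise-like frame.  The 0.01 threshold is an
# assumption for this demo, not a value from LPC-10.
fs = 8000
t = np.arange(fs // 50) / fs                   # one 20 ms frame
voiced = 0.8 * np.sin(2 * np.pi * 120 * t)     # strong 120 Hz tone
rng = np.random.default_rng(0)
unvoiced = 0.05 * rng.standard_normal(len(t))  # weak noise
e = short_time_energy(np.concatenate([voiced, unvoiced]), len(t))
labels = ["voiced" if v > 0.01 else "unvoiced" for v in e]
print(labels)   # the high-energy frame is classified as voiced
```

In a real coder this energy cue is combined with other features (such as the zero-crossing rate discussed next) rather than used alone.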
more energy, since its samples have larger amplitudes, while unvoiced speech has higher frequencies and consequently crosses the x = 0 axis more often than voiced speech. Both voiced and unvoiced speech have average values close to zero. After the analysis is done at the transmitter, the parameters, including the voicing decision and the pitch period, are sent to the receiver. The speech at the transmitter is sampled at 8000 samples per second and broken into segments of 180 samples, that is, 22.5 ms of speech per segment. To estimate the pitch period, LPC-10 uses an algorithm known as the Average Magnitude Difference Function (AMDF). To summarize, the parameters that need to be transmitted are the voiced/unvoiced indication, the pitch period and the coefficients of the vocal-tract filter.
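The AMDF computes D(k) = mean(|x[n] − x[n+k]|) over a range of lags k; for quasi-periodic speech D(k) dips sharply near the pitch period, so the minimizing lag is taken as the period. A minimal sketch over one 180-sample frame at 8 kHz, as in LPC-10 (the 60–320 Hz search range is an illustrative assumption, not the LPC-10 specification):

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=60.0, fmax=320.0):
    """Estimate the pitch period (in samples) with the Average
    Magnitude Difference Function D(k) = mean(|x[n] - x[n+k]|)."""
    kmin = int(fs / fmax)                 # smallest lag searched
    kmax = int(fs / fmin)                 # largest lag searched
    d = [np.mean(np.abs(frame[:-k] - frame[k:]))
         for k in range(kmin, kmax + 1)]
    return kmin + int(np.argmin(d))       # lag with the deepest dip

# One 180-sample (22.5 ms) frame: a 100 Hz tone has an
# 8000 / 100 = 80-sample pitch period.
fs = 8000
n = np.arange(180)
frame = np.sin(2 * np.pi * 100.0 * n / fs)
print(amdf_pitch(frame, fs))   # 80
```

Real implementations add refinements (low-pass filtering, dip-sharpness checks) to reject spurious minima in noisy frames.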
Only the index into the excitation codebook is transmitted. Although ten bits can address 1024 sequences, it would be very difficult to search 1024 arbitrary sequences every 0.625 ms. The G.728 algorithm therefore uses a product codebook, in which each excitation sequence is represented by a normalized excitation sequence and a gain; the final excitation is the product of the normalized excitation and the gain. Of the ten bits, seven are used as an index into a shape codebook of 128 sequences and three are used to encode the gain.

G.728 Encoder Operation. The codebook of 1024 candidate codevectors is available to the encoder. For each input segment the encoder searches these 1024 codevectors stored in the excitation codebook and selects the one that minimizes a frequency-weighted mean squared error. A group of five consecutive samples, taken at 125 µs intervals, is called a vector or codevector of the signal, and a group of four vectors forms one adaptation cycle. Each codevector is passed through a gain scaling unit and a synthesis filter, and the ten-bit index of the best codevector is transmitted to the decoder. The excitation gain, the synthesis filter coefficients and the perceptual weighting filter are updated periodically.

G.728 Decoder Operation. The decoder also operates on a segment-by-segment basis. When it receives the 10-bit index, it performs a table look-up to find the corresponding codevector in the excitation codebook. This codevector is then passed through a gain scaling unit and a synthesis filter to produce the decoded signal vector. The signal vector is next processed to enhance the quality
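The 7 + 3 split of the product-codebook index can be illustrated with simple bit packing. Note this is only a sketch of the arithmetic: the choice of putting the shape index in the high bits is an assumption for illustration, not necessarily the bit layout defined by the G.728 standard.

```python
def pack_index(shape_idx, gain_idx):
    """Pack a 7-bit shape index (0..127) and a 3-bit gain index
    (0..7) into one 10-bit codeword, giving 128 * 8 = 1024
    effective excitation sequences.  Bit layout (shape in the
    high bits) is an illustrative assumption."""
    assert 0 <= shape_idx < 128 and 0 <= gain_idx < 8
    return (shape_idx << 3) | gain_idx

def unpack_index(codeword):
    """Decoder side: split the 10-bit codeword back into the two
    table look-up keys."""
    return codeword >> 3, codeword & 0b111

cw = pack_index(shape_idx=100, gain_idx=5)
print(cw, unpack_index(cw))   # 805 (100, 5)
```

The point of the product structure is that the encoder can search 128 shapes and 8 gains (136 evaluations, with shortcuts) instead of 1024 independent sequences.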
The difference between the input x and the output y is called the quantization error, sometimes referred to as quantization distortion or quantization noise. It is interesting to find the variance of this difference (x − y). The variance measures the deviation from the mean value; because the deviation may be negative, we take its square. That is why we speak of the mean squared quantization error σ². The variance is the mean of the squared deviations. Although the amplitude levels of speech define a discrete random variable, they are modelled with continuous distributions because these simplify the analysis. The variance of a discrete set is given by σ² = Σᵢ pᵢ(xᵢ − yᵢ)², where yᵢ is the quantized value of xᵢ. The standard deviation is the square root of σ², which is σ. For a continuous random variable the variance is given by σ² = ∫ (x − y)² f(x) dx, where f(x) is the pdf of x, y is the quantizer output for input x, and the integral is taken from minus infinity to plus infinity; the standard deviation is again the square root of this.
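The mean squared quantization error can also be estimated empirically by averaging (x − y)² over many samples. Below, a uniform mid-rise quantizer with an illustrative step size Δ = 0.1 is applied to uniformly distributed input; for input spanning many steps, the error is approximately uniform on (−Δ/2, Δ/2), so the empirical σ² should approach the well-known value Δ²/12.

```python
import numpy as np

def mse_quantization_error(x, quantize):
    """Empirical mean squared quantization error:
    sigma^2 = mean((x - y)^2), where y = quantize(x)."""
    y = quantize(x)
    return np.mean((x - y) ** 2)

# Uniform mid-rise quantizer with step delta (illustrative choice)
delta = 0.1
quantize = lambda x: delta * (np.floor(x / delta) + 0.5)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100_000)   # input uniform on [-1, 1)
sigma2 = mse_quantization_error(x, quantize)
print(sigma2)   # close to delta**2 / 12 ~ 8.33e-4
```

This numeric check mirrors the continuous formula σ² = ∫ (x − y)² f(x) dx with f(x) the uniform pdf on [−1, 1).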