
EE 284 Homework 2

Karla M. Khalid

Problem #1. (see ee284hw2_1.m)

Answers to Questions 1-5. The spectra of the voiced segment of the signal obtained using FFT and
LPC are shown in Figure 1. From the LPC spectra obtained using Matlab and Praat, the following
formant frequencies were obtained:

              F1             F2              F3              F4
From Matlab   253.9 Hz       2332 Hz         3058 Hz         3468 Hz
From Praat    278.7421 Hz    2231.6052 Hz    3048.9945 Hz    3463.3310 Hz

The table shows that the two tools give similar, though not identical, formant estimates, and the
agreement tightens with frequency: the estimates differ by about 25 Hz at F1 but by less than 5 Hz at F4.
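The formant-picking step can be sketched as follows (a Python/NumPy sketch of what the Matlab script does, since ee284hw2_1.m is not reproduced here; the function name and the single-resonator test case are illustrative, not the homework signal):

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Estimate formant frequencies (Hz) from LPC coefficients a = [1, a1, ..., ap].
    Keep one root of each complex-conjugate pole pair and convert its angle to Hz."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.01]        # upper-half-plane root of each pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency in Hz
    return np.sort(freqs)

# Toy check: a 2nd-order all-pole resonator with its pole pair placed at 1000 Hz.
fs = 8000.0
f0, r = 1000.0, 0.95
theta = 2 * np.pi * f0 / fs
a = [1.0, -2 * r * np.cos(theta), r ** 2]       # (1 - r e^{j t} z^-1)(1 - r e^{-j t} z^-1)
print(formants_from_lpc(a, fs))                 # ~[1000.]
```

With a real order-p LPC vector, this returns up to p/2 candidate formants; the lowest four would correspond to F1-F4 in the table above.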

Figure 1. FFT vs LPC Spectrum of the Voiced Segment


Answer to Question 6. After pre-emphasis, the speech sample sounded brighter. Figure 2 compares the
FFT and LPC spectra of the original and the pre-emphasized speech signal and shows the effect of
pre-emphasis: the amplitudes of the high-frequency bands are increased and the amplitudes of the
low-frequency bands are decreased, flattening the spectrum.
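Pre-emphasis is the first-order high-pass difference y[n] = x[n] - a*x[n-1]. A minimal NumPy sketch (a = 0.97 is an assumed coefficient; the homework script may use a different value):

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1], y[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

# A constant (lowest-frequency) signal is almost entirely removed,
# while a fast-alternating (highest-frequency) signal is boosted:
print(pre_emphasis(np.ones(5)))                 # ~[1, 0.03, 0.03, 0.03, 0.03]
print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))     # ~[1, -1.97, 1.97, -1.97]
```

This is exactly the "flattening" seen in Figure 2: low bands are attenuated by roughly 1 - alpha while high bands are nearly doubled.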

Figure 2. FFT vs LPC Spectrum of the Original vs Pre-emphasized Voiced Segment


Answer to Question 7. There is no significant reduction in the residual energy beyond a certain filter
order p; for the given voiced segment, the error energy remains more or less constant after p = 16. The
error-energy curve can therefore be used to choose the optimum prediction order for the LP analysis.
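The curve in Figure 3 can be computed in one pass with the Levinson-Durbin recursion, which yields the prediction-error energy E_p for every order up to the maximum. A Python/NumPy sketch on a synthetic AR(2) signal (the coefficients 1.3 and -0.8 are illustrative; for true AR(2) data the knee appears at order 2, just as the voiced segment's knee appears near p = 16):

```python
import numpy as np

def lpc_error_energies(x, max_order):
    """Levinson-Durbin on the autocorrelation of x; returns [E_0, E_1, ..., E_max_order]."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[:N - k], x[k:]) for k in range(max_order + 1)])
    a = np.zeros(max_order + 1)
    a[0] = 1.0
    E = [r[0]]
    for p in range(1, max_order + 1):
        k = -np.dot(a[:p], r[p:0:-1]) / E[-1]   # reflection coefficient
        a[:p + 1] = a[:p + 1] + k * a[p::-1]    # update predictor coefficients
        E.append(E[-1] * (1.0 - k * k))         # error energy never increases
    return np.array(E)

# Synthetic AR(2) signal: x[n] = 1.3 x[n-1] - 0.8 x[n-2] + w[n]
rng = np.random.default_rng(0)
w = rng.standard_normal(4000)
x = np.zeros_like(w)
for n in range(2, len(w)):
    x[n] = 1.3 * x[n - 1] - 0.8 * x[n - 2] + w[n]

E = lpc_error_energies(x, 8)
print(np.round(E / E[0], 3))   # sharp drop through order 2, nearly flat after
```

Because E_p = E_{p-1}(1 - k_p^2), the curve is monotonically non-increasing, and the order where it flattens is the natural choice of p.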

Figure 3. Residual Energy vs LPC Filter Order


Problem #2. (see ee284hw2_2.m)

Answers/Solution to Item 1. The residual has been saved to residual.wav. Figure 4 below shows the plot
of the voiced segment vs the residual. Figure 5 on the next page shows the FFT spectrum of the voiced
segment vs the FFT spectrum of the residual.
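Computing the residual amounts to FIR-filtering the frame with the analysis filter A(z) = 1 + a1 z^-1 + ... + ap z^-p. A NumPy sketch with toy first-order coefficients (in the actual script the vector a would come from Matlab's lpc()):

```python
import numpy as np

def inverse_filter(x, a):
    """FIR analysis filter A(z): e[n] = sum_k a[k] * x[n-k], with x[m] = 0 for m < 0."""
    x = np.asarray(x, dtype=float)
    e = np.zeros(len(x))
    for k, ak in enumerate(a):
        e[k:] += ak * x[:len(x) - k]
    return e

# Toy check: a decaying exponential is perfectly predicted by A(z) = 1 - 0.9 z^-1,
# so its residual collapses to a single impulse.
a = [1.0, -0.9]
x = 0.9 ** np.arange(6)
print(inverse_filter(x, a))   # [1, 0, 0, 0, 0, 0]
```

For real speech the prediction is not perfect, so the residual keeps the pitch impulses while the formant envelope is removed, which is why its spectrum in Figure 5 is much flatter than the segment's.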

Figure 4. Plot of the Voiced Speech Segment vs the Residual


Figure 5. Spectrum of the Voiced Speech Segment vs the Spectrum of the Residual

Answers/Solution to Item 2. Figures 6 (zoomed in) and 7 show that speech can be perfectly reconstructed
by using the residual as input to the LPC synthesis filter.
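The reconstruction is perfect because the all-pole synthesis filter 1/A(z) exactly inverts the analysis filter A(z). A self-contained Python/NumPy sketch of the round trip (the coefficients and the random stand-in signal are illustrative, not the homework frame):

```python
import numpy as np

def inverse_filter(x, a):
    """Analysis A(z): e[n] = sum_k a[k] * x[n-k]."""
    e = np.zeros(len(x))
    for n in range(len(x)):
        e[n] = sum(a[k] * x[n - k] for k in range(min(n, len(a) - 1) + 1))
    return e

def synthesis_filter(e, a):
    """Synthesis 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        y[n] = e[n] - sum(a[k] * y[n - k] for k in range(1, min(n, len(a) - 1) + 1))
    return y

a = [1.0, -1.3, 0.8]                  # assumed toy LPC coefficients (stable poles)
rng = np.random.default_rng(1)
x = rng.standard_normal(200)          # stand-in for the voiced segment
y = synthesis_filter(inverse_filter(x, a), a)
print(np.allclose(y, x))              # True: analysis then synthesis is the identity
```

Substituting the synthesis recursion into the analysis equation shows the a[k] terms cancel sample by sample, which is what Figures 6 and 7 confirm empirically.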

Figure 6. (Zoomed in) portion of the plot of the original voiced segment vs the synthesized speech
(using the residual as input)

Figure 7. FFT spectrum of the original voiced segment vs the FFT spectrum of the synthesized
speech (using the residual as input)

Problem #3. (see ee284hw2_3.m)

Answers/Solution to Item 1. For this problem, speech is reconstructed using white Gaussian noise as
excitation. The resulting speech sound has been saved as rec_speech_wn_input.wav. Figure 8 shows
the plot of the original voiced segment vs the reconstructed speech. Figure 9 shows the FFT spectrum of
the reconstructed speech plotted against the FFT spectrum of the original voiced segment.
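Before driving the synthesis filter 1/A(z), the noise excitation should be scaled so its energy matches the residual's, otherwise the output level is arbitrary. A NumPy sketch of this (assumed) gain rule:

```python
import numpy as np

def match_energy(excitation, residual):
    """Scale the excitation so its total energy equals the residual's."""
    g = np.sqrt(np.sum(residual ** 2) / np.sum(excitation ** 2))
    return g * excitation

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)              # white Gaussian excitation
residual = 0.25 * rng.standard_normal(1000)    # stand-in for the LPC residual
scaled = match_energy(noise, residual)
print(np.isclose(np.sum(scaled ** 2), np.sum(residual ** 2)))   # True
```

The scaled noise then replaces the residual as the input to the same synthesis filter used in Problem 2; since white noise carries no pitch pulses, the result keeps the formant envelope but sounds whispery rather than voiced.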

Figure 8. Plot of the Original Voiced Segment vs Speech reconstructed using white noise as
excitation
Figure 9. FFT Spectrum of the Voiced Segment vs Speech Reconstructed Using Gaussian Noise as
Input

Answers/Solution to Item 2a&b. Figure 10 shows the plots of (two cycles of) the impulse train, the filtered
impulse train, the time-reversed filtered impulse train, and its derivative, respectively. Although plots C
and D look like glottal flow waveforms, they differ noticeably from those shown on page 8 of the lecture
notes, especially in the amplitudes and the widths/durations of the different phases.
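The construction behind Figure 10 can be sketched in Python/NumPy as follows (the pitch, sampling rate, and double-real-pole smoothing filter are assumed stand-ins for the values used in ee284hw2_3.m):

```python
import numpy as np

fs = 8000                        # assumed sampling rate
f0 = 100                         # assumed pitch -> period of fs // f0 = 80 samples
period = fs // f0
n_samples = 2 * period           # two cycles, as plotted in Figure 10

impulses = np.zeros(n_samples)   # plot A: the impulse train
impulses[::period] = 1.0

# Plot B: smooth each impulse with a double real pole at z = b
# (an assumed stand-in for the low-pass glottal shaping filter).
b = 0.94
flow = np.zeros(n_samples + 2)
for i in range(n_samples):
    flow[i + 2] = impulses[i] + 2 * b * flow[i + 1] - b * b * flow[i]
flow = flow[2:]

flow_rev = flow[::-1]                            # plot C: slow rise, abrupt closure
dflow = np.diff(flow_rev, prepend=flow_rev[0])   # plot D: glottal flow derivative
```

Time-reversing the filtered train is what gives the characteristic glottal shape: a gradual opening phase ending in a sharp closure, whose derivative shows the strong negative spike at the closure instant.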

Figure 10. Synthetic Glottal Flow Waveforms


Answers/Solution to Item 2c. For this problem, speech is reconstructed using the synthetic glottal flow
derivative obtained in (a) as excitation. The resulting speech sound has been saved as
rec_speech_glottalpulse_input.wav. Figure 11 shows the plot of the original voiced segment vs the
reconstructed speech. Figure 12 shows the FFT spectrum of the reconstructed speech plotted against the
FFT spectrum of the original voiced segment. There are clear differences between the spectrum of the
original and that of the reconstructed signal: the original sounds brighter (more energy in the higher
frequencies), whereas the synthesized signal sounds robotic.

Problem #4. (see ee284hw2_4.m)

For this problem, single vowel sounds were obtained from the North Texas vowel database; voiced
segments were extracted from each vowel file; and the MFCC vectors for each vowel file were obtained
using the Voicebox toolbox. I am still working on the PCA analysis, as I cannot yet figure out how to
produce the plots I want from the output of the Matlab pca() function, but I will submit them as soon as
they are done.
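One workable approach for the pending step: center the MFCC vectors, project them onto the first two principal components, and scatter-plot the resulting scores per vowel. In Matlab, the second output of pca(), score, already contains these projected coordinates, so plotting score(:,1) against score(:,2) is enough. A NumPy sketch of the same projection on toy data (the two 12-dimensional clusters stand in for frames of two different vowels):

```python
import numpy as np

def pca_project(X, k=2):
    """Center X (rows = MFCC frames) and project onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = components
    return Xc @ Vt[:k].T                               # scores: one k-D point per frame

# Toy stand-in: two 'vowel' clusters in 12-D MFCC space
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 12)) + 4.0
B = rng.standard_normal((50, 12)) - 4.0
scores = pca_project(np.vstack([A, B]))
print(scores.shape)                                    # (100, 2)

# Plotting (matplotlib) would then be:
# plt.scatter(scores[:50, 0], scores[:50, 1], label='vowel A')
# plt.scatter(scores[50:, 0], scores[50:, 1], label='vowel B')
```

If the vowels are acoustically distinct, their frames should form separated clusters in this 2-D view, which is the plot the problem asks for.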