1. Aditi Kanjolia , 1200202
2. Keerthana Sravanthi, 1200313
ee1200202@iiti.ac.in
CLASSIFICATION OF VOICED AND UNVOICED SPEECH SIGNALS USING FOURIER TRANSFORM
Introduction
Speech is an acoustic signal produced by the speech production system. From our
understanding of signals and systems, the characteristics of a system depend on its design.
A linear time-invariant system is completely characterized by its impulse response.
However, the nature of the response depends on the type of input excitation applied to the
system. A similar phenomenon occurs in the production of speech.
Based on the input excitation, speech production can be broadly categorized into three
cases: the first, where the input excitation is nearly periodic; the second, where the input
excitation is random and noise-like; and the third, where there is no excitation to the
system. Accordingly, the speech signal can be broadly divided into three regions: voiced
speech, unvoiced speech and silence.
Our aim is to distinguish between voiced and unvoiced speech.
Voiced sounds consist of a fundamental frequency and its harmonic components, produced by
the vocal cords (vocal folds). The vocal tract modifies this excitation signal, giving rise to
formant (pole) and sometimes anti-formant (zero) frequencies. With purely unvoiced sounds,
there is no fundamental frequency in the excitation signal and therefore no harmonic
structure. Instead, the airflow is forced through a vocal tract constriction, which can occur
at several places between the glottis and the mouth. Some sounds are produced with a
complete stoppage of airflow followed by a sudden release, producing an impulsive turbulent
excitation that is often followed by a more protracted turbulent excitation. Unvoiced sounds
are also usually quieter and less steady than voiced ones.
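The harmonic structure described above can be illustrated with a small MATLAB sketch. It uses a synthetic signal, not recorded speech: the sampling rate, the 120 Hz fundamental and the five harmonics with decaying amplitudes are all assumptions chosen so that each harmonic falls exactly on an FFT bin.

```matlab
% Sketch: harmonic structure of a synthetic "voiced-like" signal.
% All values (fs, f0, number of harmonics) are illustrative assumptions.
fs = 8000;                 % sampling rate in Hz
N  = 8000;                 % one second of samples -> 1 Hz bin spacing
t  = (0:N-1) / fs;         % time axis in seconds
f0 = 120;                  % assumed fundamental frequency in Hz

% Sum of the first five harmonics with decaying amplitudes.
x = zeros(1, N);
for k = 1:5
    x = x + (1 / k) * sin(2 * pi * k * f0 * t);
end

X    = abs(fft(x));        % magnitude spectrum
half = X(1:N/2);           % keep non-negative frequencies only
f    = (0:N/2-1) * fs / N; % frequency of each bin in Hz

% The five largest bins should sit at multiples of f0.
[~, idx]  = sort(half, 'descend');
peakFreqs = sort(f(idx(1:5)));
disp(peakFreqs);
```

In this idealised case the five strongest bins lie at 120, 240, 360, 480 and 600 Hz, i.e. at integer multiples of the fundamental. Real voiced speech is only nearly periodic, so its harmonics are broader and shaped by the vocal tract.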
Voiced sounds, e.g., a, b, are essentially due to vibrations of the vocal cords and are
oscillatory. Therefore, over short periods of time, they are well modelled by sums of
sinusoids. This makes the short-time Fourier transform a useful tool for speech processing.
Unvoiced sounds, such as s and sh, are more noise-like, as shown in the figure below. They
have a wide-band spectrum.
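One way to see the difference between a sinusoidal and a wide-band spectrum is to compare a single short analysis frame of each kind. The sketch below is only an illustration under stated assumptions: both frames are synthetic stand-ins (a sum of harmonics and white noise), the 30 ms Hamming-windowed frame is a common but arbitrary choice, and spectral flatness is a measure introduced here for illustration, not one used in the report.

```matlab
% Sketch: compare the spectrum of a sinusoidal (voiced-like) frame with
% a noise-like (unvoiced-like) frame. Signals are synthetic stand-ins.
fs = 8000;                          % sampling rate in Hz (assumed)
L  = 240;                           % 30 ms analysis frame
n  = (0:L-1)';                      % sample index, column vector
w  = 0.54 - 0.46 * cos(2 * pi * n / (L - 1));   % Hamming window

% Voiced-like frame: a few harmonics of an assumed 120 Hz fundamental.
voiced = zeros(L, 1);
for k = 1:5
    voiced = voiced + (1 / k) * sin(2 * pi * k * 120 * n / fs);
end

% Unvoiced-like frame: white Gaussian noise.
unvoiced = randn(L, 1);

% Power spectrum of a windowed frame (all FFT bins).
powerSpec = @(x) abs(fft(x .* w)).^2;
keep = 2:L/2;                       % positive-frequency bins, DC skipped

% Spectral flatness: geometric mean / arithmetic mean of the power
% spectrum. Near 0 for peaky spectra, closer to 1 for flat ones.
flatness = @(P) exp(mean(log(P + eps))) / mean(P);

Pv = powerSpec(voiced);
Pu = powerSpec(unvoiced);
flatVoiced   = flatness(Pv(keep));
flatUnvoiced = flatness(Pu(keep));
fprintf('flatness: voiced-like %.4f, noise-like %.4f\n', flatVoiced, flatUnvoiced);
```

The harmonic frame concentrates its energy in a few narrow peaks, so its flatness is close to zero, while the noise frame spreads energy across all bins and scores much higher. The exact values for the noise frame change from run to run because the noise is random.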
Formants
Wikipedia defines formants as "the spectral peaks of the sound spectrum of the voice". In
speech science and phonetics, formant is also used to mean an acoustic resonance of the
human vocal tract. It is often measured as an amplitude peak in the frequency spectrum of
the sound, though in vowels spoken with a high fundamental frequency, as in a female or
child voice, the frequency of the resonance may lie between the widely spaced harmonics,
and hence no peak is visible.
Fourier Transform
The Fourier transform, named after Joseph Fourier, is a mathematical transformation
employed to transform signals between the time domain and the frequency domain. It has
many applications in physics and engineering.
The Fourier Transform decomposes any function into a sum of sinusoidal basis functions.
Each of these basis functions is a complex exponential of a different frequency. The Fourier
Transform therefore gives us a unique way of viewing any function - as the sum of simple
sinusoids.
The Fourier Series showed us how to rewrite any periodic function into a sum of sinusoids.
The Fourier Transform is the extension of this idea to non-periodic functions.
The Fourier Transform of a function g(t) is defined by:
\[ G(f) = \int_{-\infty}^{\infty} g(t)\, e^{-i 2\pi f t}\, dt \tag{1} \]
The result is a function of the frequency f. The magnitude of G(f) indicates how strongly
the frequency f is present in g(t), and G(f) is often called the spectrum of g. In addition,
g can be obtained from G via the inverse Fourier Transform:
\[ g(t) = \int_{-\infty}^{\infty} G(f)\, e^{i 2\pi f t}\, df \tag{2} \]
Equation [2] states that we can obtain the original function g(t) from the function G(f) via
the inverse Fourier transform. As a result, g(t) and G(f) form a Fourier pair: they are
distinct representations of the same underlying signal. We can write this equivalence with
the following symbol:
\[ g(t) \overset{\mathcal{F}}{\longleftrightarrow} G(f) \tag{3} \]
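For sampled signals, the transform pair has a discrete counterpart, which is what MATLAB's fft and ifft compute. The sketch below evaluates the discrete sum directly and compares it with fft, then checks that ifft recovers the original samples. The 16-sample test vector is an arbitrary assumption chosen for illustration.

```matlab
% Sketch: the discrete counterpart of the transform pair, checked
% against MATLAB's fft and ifft on a short test vector.
N = 16;
n = 0:N-1;
g = cos(2 * pi * 3 * n / N) + 0.5 * sin(2 * pi * 5 * n / N);  % test signal

% Direct evaluation of G[k] = sum_n g[n] * exp(-1i*2*pi*k*n/N).
G_direct = zeros(1, N);
for k = 0:N-1
    G_direct(k + 1) = sum(g .* exp(-1i * 2 * pi * k * n / N));
end

G_fft  = fft(g);            % same quantity, computed by the FFT
g_back = real(ifft(G_fft)); % inverse transform recovers g

errForward = max(abs(G_direct - G_fft));
errInverse = max(abs(g_back - g));
fprintf('forward error %.2e, round-trip error %.2e\n', errForward, errInverse);
```

Both errors should be at the level of floating-point round-off. The cosine at 3 cycles per frame shows up in bin k = 3 (MATLAB index 4) with magnitude N/2, which is how a single sinusoid appears in the discrete spectrum.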
Given below is a table of a few example sounds with their classification. The phonetic
transcription of each example word is given in parentheses.

Voiced                       Unvoiced
b   book (bʊk)               p   please (pliz)
v   vanilla (vənIlə)         f   five (faIv)
ð   they (ðeI)               θ   thirty (θɜrti)
d   dish (dIʃ)               t   ten (tɛn)
z   zero (zIroʊ)             s   sir (sɜr)
ʒ   genre (ʒɑnrə)            ʃ   she (ʃi)
MATLAB CODE
We use MATLAB code for the experiments. We record some sounds using the wavrecord
command, compute the Fast Fourier Transform of each recording using the fft command, and
then classify each one as a voiced or unvoiced speech signal.
%Recording sound
% Plotting the
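A minimal sketch of the flow described above (record, take the FFT, plot, classify) could look like the following. It is not the report's original code: the synthetic test signal, the 1 kHz band edge and the 0.5 threshold are assumptions chosen for illustration. The audiorecorder calls are shown in comments as the recording interface available in current MATLAB releases, where wavrecord may not be present.

```matlab
% Sketch of a record -> FFT -> classify flow. The decision rule and the
% 1 kHz / 0.5 thresholds are illustrative assumptions, not the report's.
fs = 8000;                              % sampling rate in Hz (assumed)

% To record from a microphone instead of using the synthetic signal:
%   rec = audiorecorder(fs, 16, 1);     % 16-bit, mono
%   recordblocking(rec, 1);             % record for 1 second
%   x = getaudiodata(rec);
t = (0:fs-1)' / fs;
x = sin(2*pi*150*t) + 0.5*sin(2*pi*300*t);   % synthetic voiced-like stand-in

N = numel(x);
X = abs(fft(x));                        % magnitude spectrum
X = X(1:floor(N/2));                    % non-negative frequencies
f = (0:floor(N/2)-1)' * fs / N;         % bin frequencies in Hz

% Plot the magnitude spectrum.
plot(f, X);
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
title('Magnitude spectrum');

% Fraction of spectral energy below 1 kHz.
lowRatio = sum(X(f < 1000).^2) / sum(X.^2);
if lowRatio > 0.5
    label = 'voiced';
else
    label = 'unvoiced';
end
fprintf('low-band energy ratio %.2f -> %s\n', lowRatio, label);
```

The rule rests on the heuristic that voiced sounds tend to carry most of their energy at low frequencies, while noise-like sounds such as s spread energy toward higher frequencies. With the synthetic stand-in, all energy lies below 1 kHz, so the sketch labels it as voiced. Thresholds for real recordings would need to be tuned on actual data.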
IMPLEMENTATION
The above code was applied to some vowels and consonants (A, P, B, S, Z, T and D).
Here are the results:
[Figures: results for A, P, B, S, Z, T and D]