
DSP Lab Project

Speech Recognition using MFCC


Project Report

Deepak Chandran - B110116EC


Hashin Jithu - B110704EC
Hemanth P - B110147EC

1 Problem Statement

There has been a dramatic increase in the adoption of biometric verification in
our daily lives, e.g., laptop fingerprint scanners, Siri, etc. Among these, voice
verification occupies a large share of biometric verification due to its ease of
use: systems that use the human voice for verification do not require the user
to be anywhere near the verification system.
Depending on the problem specification, the task is either Automatic Speaker
Identification (determining who is speaking) or Automatic Speaker Verification
(validating whether the speaker is the person they claim to be). The aim of this
project is to implement a speaker identification system using MFCC concepts.

2 Theory

2.1 Feature Extraction

The recognition performance is dictated by the extraction of the best parametric
representation of the speech signal. Several methods are commonly used for
feature extraction, such as MFCC, LPC, and PLP; in this project we focus our
efforts on MFCC.
The Mel-Frequency Cepstral Coefficient (MFCC) technique is based on human
hearing perception: the perceived frequency content of speech sounds does not
follow a linear scale. The mel frequency scale is accordingly approximately
linear below 1000 Hz and logarithmic above 1000 Hz.
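A commonly used analytic form of this scale (the report does not state which
variant was used) maps a frequency f in Hz to

    mel(f) = 2595 * log10(1 + f / 700),

which is roughly linear for small f and logarithmic for large f.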

2.2 MFCC

The steps involved in calculating the MFCC coefficients are shown in Fig. 1.
Continuous speech coming from a source such as a microphone is processed over
short periods of time: the signal is divided into frames, each overlapping the
previous one so that the transitions between frames are smooth. In the second
step, a Hamming window is applied to each frame to reduce the distortion
introduced at the frame boundaries. After windowing, each frame undergoes an
FFT and is converted from the time domain to the frequency domain. In
mel-frequency wrapping, the spectrum of each frame is passed through a
mel-scale band-pass filter bank to mimic the human ear. In the final stage, the
signal is converted back using the Discrete Cosine Transform (DCT); the DCT is
used instead of an inverse FFT because it is more appropriate for the
real-valued log filter-bank outputs.
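As a minimal MATLAB sketch of these steps, assuming a simplified triangular
filter bank spaced uniformly on the mel scale (the function name, parameters,
and filter-bank details below are illustrative, not taken from the project's
actual code; hamming and dct require the Signal Processing Toolbox):

    function c = mfcc_frame(frame, fs, numFilters, numCoeffs)
    % Compute MFCCs for one speech frame (column vector of N samples).
    N = length(frame);
    spec = abs(fft(frame .* hamming(N))).^2;  % window, FFT, power spectrum
    spec = spec(1:floor(N/2)+1);              % keep non-negative frequencies

    mel  = @(f) 2595*log10(1 + f/700);        % Hz -> mel
    imel = @(m) 700*(10.^(m/2595) - 1);       % mel -> Hz
    edges = imel(linspace(0, mel(fs/2), numFilters + 2)); % filter edges in Hz
    freqs = (0:floor(N/2)) * fs / N;          % FFT bin frequencies

    logE = zeros(numFilters, 1);
    for k = 1:numFilters
        lo = edges(k); mid = edges(k+1); hi = edges(k+2);
        w = max(0, min((freqs-lo)/(mid-lo), (hi-freqs)/(hi-mid))); % triangle
        logE(k) = log(w * spec + eps);        % log filter-bank energy
    end

    c = dct(logE);                            % DCT instead of inverse FFT
    c = c(1:numCoeffs);                       % keep the first few coefficients
    end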

2.3 Feature Matching

A speaker recognition system should be able to determine with what probability
the speech of an unknown speaker matches speech present in the database. It
would be a tedious task to store all the feature vectors generated during the
training phase. Using the process of vector quantization, each feature vector
can be quantized to one of several template vectors, so that a small number of
representative vectors (a codebook) is created from the dataset. In the
recognition stage, the unknown speaker's speech is compared to the codebook of
each speaker and the difference is measured.

Figure 1: Steps involved in calculating the MFCC coefficients

Figure 2:
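The report does not state which codebook-generation algorithm was used; the
LBG algorithm is the classic choice here. As a minimal sketch, the fragment
below uses plain k-means clustering as a stand-in (kmeans and pdist2 are from
MATLAB's Statistics and Machine Learning Toolbox; variable names and the
codebook size K = 16 are our assumptions):

    % Build a codebook of K representative vectors for one speaker.
    % trainVecs: T-by-D matrix of MFCC vectors from that speaker's training speech.
    K = 16;                               % codebook size (assumed value)
    [~, codebook] = kmeans(trainVecs, K); % codebook: K-by-D centroid matrix

    % Distortion of an unknown utterance against this codebook:
    % testVecs: U-by-D matrix of MFCC vectors from the unknown speech.
    d = pdist2(testVecs, codebook);       % U-by-K Euclidean distances
    distortion = mean(min(d, [], 2));     % average nearest-codeword distance

The identified speaker is then the one whose codebook yields the smallest
distortion.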

3 Implementation

In the framing section, the speech signal is converted into frames of N
samples, with consecutive frames separated by M samples; in our implementation,
M = 100 and N = 256, so adjacent frames overlap by N - M = 156 samples. In the
windowing section we used the Hamming window.
The acoustic vectors created by the MFCC process capture the characteristics of
a speaker's voice. When an unknown speaker records his/her voice into MATLAB, a
fingerprint of that voice is created in the same way, and a suitable match is
determined using the Euclidean distance technique.
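A sketch of the framing and windowing steps with these parameters (x is the
recorded speech as a column vector; the variable names are ours, not the
report's):

    % Slice the signal into overlapping frames: N = 256 samples per frame,
    % consecutive frames M = 100 samples apart (overlap of N - M = 156).
    N = 256;  M = 100;
    numFrames = floor((length(x) - N) / M) + 1;
    frames = zeros(N, numFrames);
    for i = 1:numFrames
        first = (i-1)*M + 1;
        frames(:, i) = x(first : first + N - 1);
    end
    frames = frames .* hamming(N);  % window each column (implicit expansion, R2016b+)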

4 Observations

To implement the speaker recognition system, a simple voice command like
"Hello" was used.

Figure 3: Speech Signal

Figure 4: Framed Signal

Figure 5: Signal after windowing

Figure 6: Autocorrelation

5 Results

The aim of this project was to implement a speaker recognition system that
could, at a high level, differentiate between genders. The features extracted
from the unknown speech were compared to the stored feature set, and the gender
of the unknown speakers was identified successfully.

The Euclidean distance was used to compare the test recording to the database,
and the speech was recognized correctly 9 times out of 10. The crude speaker
recognition code was written in MATLAB; it compares the average pitch of the
recorded wav file as well as the vector differences between the formant peaks
in the PSD of each file.
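The pitch-comparison code itself is not reproduced in the report; a minimal
autocorrelation-based pitch estimate, consistent with the autocorrelation plot
in Fig. 6, might look like the following (the 50-400 Hz search range is our
assumption of a typical human pitch span; xcorr is from the Signal Processing
Toolbox):

    % Estimate pitch from the autocorrelation of a voiced segment.
    % x: speech samples (column vector), fs: sampling rate in Hz.
    r = xcorr(x, 'coeff');             % normalized autocorrelation
    r = r(length(x):end);              % keep lags 0, 1, 2, ...
    lags = floor(fs/400):ceil(fs/50);  % candidate pitch periods in samples
    [~, k] = max(r(lags + 1));         % +1 because r(1) holds lag 0
    pitchHz = fs / lags(k);            % lag of the strongest peak -> pitch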

