1 Problem Statement
There has been a dramatic increase in the adoption of biometric verification in
our daily lives, e.g., laptop fingerprint scanners, voice assistants such as Siri, etc.
Among these, voice verification occupies a large share of biometric verification
due to its ease of use: systems that use the human voice for verification do not
require the user to be physically near the verification system.
Depending on the problem specification, the task is either Automatic Speaker
Identification (determining who is speaking) or Automatic Speaker Verification
(validating whether the speaker is the person they claim to be). The aim of this
project is to implement a speaker identification system using MFCC concepts.
2 Theory

2.1 Feature Extraction
Recognition performance is dictated by extracting the best parametric representation of the speech signal. Several methods are commonly used for feature
extraction, such as MFCC, LPC, and PLP; in this project we focus our efforts
on MFCC.
The Mel-Frequency Cepstral Coefficient (MFCC) technique is based on human
hearing perception. Human perception of the frequency content of speech signals
does not follow a linear scale: the mel frequency scale is approximately linear
below 1000 Hz and logarithmic above 1 kHz.
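This Hz-to-mel mapping is commonly written with the O'Shaughnessy formula. A minimal sketch in Python (illustrative only; the project itself was written in MATLAB):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Map frequency in Hz onto the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping from mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Near-linear below 1 kHz, logarithmic above: by construction,
# 1000 Hz maps to roughly 1000 mels.
print(hz_to_mel(1000.0))
```

Note how compressive the scale is at high frequencies: the interval 4-8 kHz spans far fewer mels than 0.5-4.5 kHz, mirroring the ear's coarser resolution at high pitch.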
2.2 MFCC
The steps involved in calculating the MFCC coefficients are shown in Fig. 1.
Continuous speech coming from a source such as a microphone is processed over
short time intervals: it is divided into frames, each overlapping the previous
one so that transitions between frames are smooth. In the second step, a
Hamming window is applied to each frame to reduce the spectral distortion
introduced by framing. After windowing, the speech signal undergoes an FFT
and is converted from the time domain to the frequency domain. In mel-frequency
wrapping, each frame's spectrum is passed through a mel-scale filter bank to
mimic the human ear. In the final stage, the signal is converted back using the
Discrete Cosine Transform (DCT); the DCT is used instead of an inverse FFT
because it is more appropriate for the real-valued log filterbank energies.
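The FFT, mel filterbank, log, and DCT steps above can be sketched in Python (illustrative only; the project was implemented in MATLAB). The filterbank construction and the helper names `mel_filterbank` and `mfcc_from_frames` are our own; exact filter shapes and normalizations vary between implementations:

```python
import numpy as np

def mel_filterbank(num_filters, nfft, fs):
    """Triangular filters spaced evenly on the mel scale (one common
    construction; details differ across MFCC implementations)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_from_frames(frames, fs, num_filters=20, num_ceps=13):
    """Windowed frames -> FFT -> power spectrum -> mel filterbank
    -> log -> DCT-II, keeping the lowest num_ceps coefficients."""
    nfft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2
    fbank = mel_filterbank(num_filters, nfft, fs)
    energies = np.log(power @ fbank.T + 1e-10)   # avoid log(0)
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps),
                                    (2 * n + 1)) / (2 * num_filters))
    return energies @ basis.T
```

Each frame yields one short vector of cepstral coefficients (13 here), which is the acoustic feature vector used in the matching stage.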
2.3 Feature Matching
Figure 1: Steps involved in calculating the MFCC coefficients.
It would be a tedious task to store all the feature vectors generated during the
training phase. Using vector quantization, each feature vector can be quantized
to one of several template vectors, so that a small number of representative
code vectors is created from the dataset. In the recognition stage, the unknown
speaker's speech is compared to the codebook of each speaker and the resulting
distortion is measured.
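The codebook training and distortion measurement described above can be sketched as follows (illustrative Python, not the project's MATLAB code; plain k-means is used here as a stand-in for the closely related LBG algorithm of classic VQ, and the helper names are our own):

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Build a k-vector codebook from training feature vectors
    with plain k-means."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest code vector
        d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each code vector to the centroid of its cluster
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each vector to its nearest code vector;
    the speaker whose codebook yields the smallest distortion wins."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

An utterance from the enrolled speaker should give a much smaller distortion against that speaker's codebook than speech from anyone else.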
Figure 2: Feature matching using vector-quantization codebooks.
3 Implementation
In the framing stage, the speech signal is converted into frames of N samples,
with consecutive frames separated by M samples. In our implementation,
M = 100 and N = 256. In the windowing stage we used the Hamming window.
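A minimal sketch of this framing and windowing step, using the same M = 100 and N = 256 (illustrative Python rather than the project's MATLAB; `frame_signal` is a hypothetical helper name):

```python
import numpy as np

def frame_signal(signal, N=256, M=100):
    """Split a 1-D signal into overlapping frames of N samples,
    successive frames starting M samples apart, each multiplied
    by a Hamming window."""
    num_frames = 1 + (len(signal) - N) // M
    window = np.hamming(N)
    frames = np.empty((num_frames, N))
    for i in range(num_frames):
        frames[i] = signal[i * M : i * M + N] * window
    return frames

x = np.random.randn(16000)      # one second of synthetic signal at 16 kHz
frames = frame_signal(x)
print(frames.shape)             # (158, 256) for a 16000-sample signal
```

With N = 256 and M = 100, each frame shares 156 samples with its predecessor, which gives the smooth frame-to-frame transition mentioned above.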
The acoustic vectors created by the MFCC process capture the characteristics
of a speaker's voice. When an unknown speaker records his/her voice into
MATLAB, a fingerprint of that voice is created in the same way, and a suitable
match is determined using the Euclidean distance technique.
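The matching decision can be sketched as follows (illustrative Python, not the project's MATLAB code; `identify_speaker` and the codebook dictionary are assumptions for the example):

```python
import numpy as np

def identify_speaker(test_features, codebooks):
    """Compare test feature vectors against each enrolled speaker's
    codebook; return the name with the smallest average Euclidean
    distance to the nearest code vectors."""
    best_name, best_dist = None, np.inf
    for name, cb in codebooks.items():
        d = np.linalg.norm(test_features[:, None, :] - cb[None, :, :], axis=2)
        avg = d.min(axis=1).mean()
        if avg < best_dist:
            best_name, best_dist = name, avg
    return best_name
```

For example, with two toy codebooks, one near the origin and one far from it, feature vectors close to the origin are assigned to the first speaker.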
4 Observations
Figure 6: Autocorrelation
5 Results
The aim of this project was to implement a speaker recognition system that
could, at a high level, differentiate between genders. After the features were
extracted from the unknown speech, they were compared to the stored feature
set, and the gender of the unknown speakers was identified successfully.
Euclidean distance was used to compare the test recording to the database, and
the speech was recognized correctly 9 out of 10 times. The crude speaker
recognition code was written in MATLAB; it compares the average pitch of the
recorded WAV file as well as the vector differences between the formant peaks
in the PSD of each file.
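The average-pitch comparison relies on autocorrelation, as in Fig. 6: for a voiced frame, the autocorrelation peaks at a lag equal to the pitch period. A minimal sketch (illustrative Python rather than the project's MATLAB; the search range 60-400 Hz is an assumption covering typical speech):

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch of one voiced frame from the highest
    autocorrelation peak within the plausible lag range for speech."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag bounds in samples
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 200.0 * t)   # synthetic 200 Hz "voice"
print(round(pitch_autocorr(tone, fs)))  # prints 200
```

Averaging such per-frame estimates over a recording gives the average pitch, whose clear separation between typical male and female ranges is what makes the gender decision work.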