Voice Recognition Using FFT Transformation

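FFT:
Let x_0, ..., x_{N-1} be complex numbers. The DFT is defined by the formula
X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N},   k = 0, ..., N-1.
It is equivalent to \sum_{n=0}^{N-1} x_n \omega^{k n}, where \omega = e^{-2\pi i / N} is a primitive Nth root of unity.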
In mathematics, the discrete Fourier transform (DFT) is a specific kind of Fourier
transform, used in Fourier analysis. It transforms one function into another, which is
called the frequency domain representation, or simply the DFT, of the original function
(which is often a function in the time domain). But the DFT requires an input function
that is discrete and whose non-zero values have a limited (finite) duration. Such inputs
are often created by sampling a continuous function, like a person's voice. And unlike the
discrete-time Fourier transform (DTFT), it only evaluates enough frequency components
to reconstruct the finite segment that was analyzed. Its inverse transform cannot
reproduce the entire time domain, unless the input happens to be periodic (forever).
Therefore it is often said that the DFT is a transform for Fourier analysis of finite-domain
discrete-time functions. The sinusoidal basis functions of the decomposition have the
same properties.
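As a minimal sketch, the DFT sum can be evaluated directly and compared with MATLAB's built-in fft, which uses the same sign convention (a negative exponent in the forward transform). The test vector x and the variable names are illustrative assumptions, not part of the original notes.

```matlab
% Direct O(N^2) evaluation of the DFT definition, compared with fft.
N = 8;
x = (1:N).' + 1i*(N:-1:1).';           % arbitrary complex test vector (illustrative)
w = exp(-2i*pi/N);                     % primitive Nth root of unity
[k, n] = ndgrid(0:N-1, 0:N-1);         % all (k, n) index pairs
W = w .^ (k .* n);                     % DFT matrix: W(k+1, n+1) = w^(k*n)
X_direct = W * x;                      % X_k = sum over n of x_n * w^(k*n)
X_fft = fft(x);                        % same transform via an FFT algorithm
max_err = max(abs(X_direct - X_fft));  % expected to be close to machine precision
```

The direct form costs on the order of N^2 operations, which is why fft is used for the 100000-point transforms in these notes.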
Since FFT algorithms are so commonly employed to compute the DFT, the two terms are
often used interchangeably in colloquial settings, although there is a clear distinction:
"DFT" refers to a mathematical transformation, regardless of how it is computed, while
"FFT" refers to any one of several efficient algorithms for the DFT.
Implementation in MATLAB:
1. Record the WAV file, in this case through the speakers and microphone. Because the
recordings are the basis for all subsequent analysis, it is essential that every dataset is
produced with the same equipment and process.
2. Read the WAV file into MATLAB using the core function wavread. (Current MATLAB
releases provide audioread in place of wavread.)
3. Perform a fast Fourier transform on the samples using the core function fft.
4. Convert the complex FFT values to real numbers by multiplying each element by its
complex conjugate. The result is the squared magnitude (power) at each frequency.
5. Keep only the important part of the data, here the first 5000 points, which hold the
lowest frequencies.
6. Split the result into bins and combine each bin into a single value. The function
soundSig sums each bin, which after step 7 gives the same result as averaging.
7. Standardise the result by dividing it by its sum. To compare two samples, multiply
their signatures element by element and add up the products. The higher the number,
the higher the correlation between the two samples and the more likely the voices match.
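Step 4 can be checked on a small vector. The sketch below uses a made-up eight-sample signal (an assumption for illustration, not real audio) to show that the element-wise product with the complex conjugate has no imaginary part and equals the squared magnitude.

```matlab
% Element-wise product with the conjugate gives the squared magnitude.
f = fft([1 2 3 4 0 0 0 0]);                  % small illustrative signal
q = f .* conj(f);                            % step 4: multiply by the complex conjugate
max_imag = max(abs(imag(q)));                % imaginary parts cancel
max_diff = max(abs(real(q) - abs(f).^2));    % same values as abs(f).^2
```

Because the product is already real and non-negative, a later call to abs on it does not change the values.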
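function [p]=soundSig(filename)
% soundSig  Normalized 50-bin spectral signature of a sound file.
% Save as soundSig.m. In current MATLAB releases, replace wavread with
% audioread and pass the full file name.
in=wavread(filename);       % read the samples
in=in(:,1);                 % use the first channel if the file is stereo
f=fft(in,100000);           % zero-pad or truncate to 100000 points
q=f.*conj(f);               % power at each frequency
q=abs(q(1:5000));           % keep the low-frequency part as real values
p=zeros(1,50);              % one value per bin
for i=1:50
t=(i-1)*100+1;              % first index of bin i
p(i)=sum(q(t:t+99));        % total power in bin i (100 points)
end
p=p/sum(p);                 % normalize so the signature sums to 1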
% Comparison script: build one signature per recording.
test=soundSig('mewave');
alok=soundSig('alokwave');
fugi1=soundSig('meagainwave');
fugi2=soundSig('meagain2wave');
jon=soundSig('jonwave');
alokother=soundSig('alok2wave');
fugiother=soundSig('me2wave');
% Sum of element-wise products of two signatures: a larger value suggests a closer match.
sum(test.*alok)
sum(test.*fugi1)
sum(test.*fugi2)
sum(test.*alokother)
sum(test.*fugiother)
sum(test.*jon)
sum(alok.*jon)
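The comparison can be tried without recordings. The sketch below is an assumption-laden stand-in: it replaces the WAV files with synthetic tones (the frequencies, amplitudes and the names a1, a2, b are invented for illustration) and rewrites the soundSig steps as anonymous functions so the whole thing runs as one script.

```matlab
% Signature comparison on synthetic "voices" instead of recorded WAV files.
Fs = 11025;                                        % sample rate in Hz (assumed)
t = (0:Fs-1) / Fs;                                 % one second of samples
a1 = sin(2*pi*120*t) + 0.5*sin(2*pi*240*t);        % "speaker A", first take
a2 = 0.9*sin(2*pi*120*t) + 0.6*sin(2*pi*240*t + 0.5); % "speaker A", second take
b  = sin(2*pi*200*t) + 0.5*sin(2*pi*400*t);        % "speaker B"

spec = @(x) abs(fft(x(:), 100000)).^2;             % power spectrum, 100000 points
bins = @(q) sum(reshape(q(1:5000), 100, 50), 1);   % 50 bins of 100 points each
sig  = @(x) bins(spec(x)) / sum(bins(spec(x)));    % normalized signature

sameScore = sum(sig(a1) .* sig(a2));               % same "speaker"
diffScore = sum(sig(a1) .* sig(b));                % different "speakers"
```

With these tones, sameScore should come out larger than diffScore, since the two takes of speaker A put their power in the same bins. Real voices are far less clean, so scores from recordings will be closer together.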
wavfinfo: Returns a text description of the contents of a sound (WAV) file.
MATLAB also includes a general-purpose audio/video file information function named
mmfileinfo. The mmfileinfo function returns information about the audio data in a
file as well as the video data in the file, if present.
wavread: Returns sound data from a sound (WAV) file.
wavrecord: Records sound from the audio input device. The example below records
5 seconds of 16-bit audio sampled at 11025 Hz and plays it back with wavplay.
Fs = 11025;
y = wavrecord(5*Fs,Fs,'int16');
wavplay(y,Fs);
wavplay: Plays the samples in y at sample rate Fs: wavplay(y,Fs)
wavwrite: Writes a Microsoft WAVE (.wav) sound file: wavwrite(y,filename)
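Current MATLAB releases no longer ship the wav* functions; audiowrite, audioread and audioinfo cover writing, reading and inspecting files. The sketch below is a hedged illustration of that mapping: it writes a synthetic tone to a temporary file and reads it back. The tone and the file name are assumptions. Recording and playback (audiorecorder, sound) need audio hardware and are not shown.

```matlab
% Round trip through a WAV file using the newer audio functions.
Fs = 11025;                          % sample rate in Hz
t = (0:Fs-1).' / Fs;                 % one second, column vector
y = 0.5 * sin(2*pi*440*t);           % illustrative 440 Hz tone
fname = [tempname '.wav'];           % temporary file name (assumed writable)
audiowrite(fname, y, Fs);            % counterpart of wavwrite
[y2, Fs2] = audioread(fname);        % counterpart of wavread
info = audioinfo(fname);             % counterpart of wavfinfo
delete(fname);                       % clean up the temporary file
```

The samples read back differ from y only by the quantization of the default 16-bit format.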
nextpow2
p = nextpow2(A) returns the exponent of the smallest power of two that is greater than or
equal to the absolute value of A (that is, the smallest p that satisfies 2^p >= abs(A)).
This function is useful for optimizing FFT operations, which are most efficient when the
sequence length is an exact power of two.
If A is non-scalar, nextpow2 as described here returns the exponent for length(A). Current
MATLAB releases instead work element by element, so nextpow2(length(A)) is the
unambiguous form.
Examples:
For any integer n in the range from 513 to 1024, nextpow2(n) is 10.
For a 1-by-30 vector A, length(A) is 30 and nextpow2(length(A)) is 5.
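A short sketch of how nextpow2 is typically paired with fft: the exponent is turned into a transform length, and fft zero-pads the input to that length. The vector x is an illustrative assumption.

```matlab
% Choose a power-of-two FFT length for a 30-sample vector.
x = ones(1, 30);               % illustrative data
p = nextpow2(length(x));       % exponent: 2^5 = 32 is the first power of two >= 30
nfft = 2^p;                    % transform length
X = fft(x, nfft);              % fft zero-pads x from 30 to nfft samples
```

Zero-padding changes the spacing of the frequency points but adds no new information about the signal.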
The linspace function generates linearly spaced vectors. It is similar to the colon operator
":", but gives direct control over the number of points.
y = linspace(a,b) generates a row vector y of 100 points linearly spaced between and
including a and b.
y = linspace(a,b,n) generates a row vector y of n points linearly spaced between and
including a and b.
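The difference from the colon operator is easiest to see side by side. In this small sketch (values chosen only for illustration), linspace fixes the number of points while the colon form fixes the step.

```matlab
% linspace fixes the point count; the colon operator fixes the step size.
a = linspace(0, 1, 5);     % 5 points from 0 to 1, both ends included
b = 0:0.25:1;              % same points, written with an explicit step
c = linspace(0, 1);        % default of 100 points
```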
