OUTLINE
I. THE SPEECH SIGNAL
II. THE HIDDEN MARKOV MODEL
III. SPEECH RECOGNITION USING HMM
INTRODUCTION
APPLICATIONS:
1. HANDS-FREE COMPUTING
2. AUTOMATIC TRANSLATION
EARLY HISTORY
OUTLINE:
SPEECH PRODUCTION
3-STATE REPRESENTATION
SPEECH REPRESENTATION
SPECTRAL REPRESENTATION
THE SPEECH SIGNAL
PRE-PROCESSING
WINDOWING
SPEECH TO FEATURE VECTORS
FEATURE EXTRACTION
POST-PROCESSING
SPEECH PRODUCTION
[Diagram: speech production model — lungs, epiglottis, vocal tract, lips; a random-noise generator models unvoiced sounds]
SPEECH REPRESENTATION
Types:
1. Time-domain representation
2. Frequency-domain representation
OBTAINING FEATURE VECTORS
Why do we need feature vectors?
Pipeline: Preprocessing -> Frame Blocking and Windowing -> Feature Extraction -> Post-processing
Pre-processing:
Purpose: to modify the raw speech signal so that it is more suitable for feature extraction.
Steps: noise cancellation, pre-emphasis, voice activation detection (VAD).
Pre-emphasis
Purpose: to emphasize high-frequency components, because high-frequency components often have a low SNR.
H(z) = 1 - 0.5 z^(-1);  S1(z) = H(z) S(z)
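The pre-emphasis filter above can be sketched in Python (a minimal illustration; the function name and NumPy usage are my own, and alpha = 0.5 follows the transfer function on the slide, though values around 0.95-0.97 are also common in practice):

```python
import numpy as np

def pre_emphasis(signal, alpha=0.5):
    """Apply the pre-emphasis filter H(z) = 1 - alpha * z^(-1).

    In the time domain this is s1[n] = s[n] - alpha * s[n-1],
    which boosts high-frequency components relative to low ones.
    """
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```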
Windowing
Hamming window:
w(k) = 0.54 - 0.46 cos(2πk / (K - 1)),  k = 0, ..., K - 1
K = number of samples in a frame
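Frame blocking and Hamming windowing together might be sketched as follows (function name, frame length, and hop size are illustrative assumptions, not from the slides):

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a signal into overlapping frames and apply a Hamming window.

    The window is w[k] = 0.54 - 0.46 * cos(2*pi*k / (K - 1)),
    where K = frame_len is the number of samples per frame.
    """
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    k = np.arange(frame_len)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * k / (frame_len - 1))
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * window
```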
Feature Extraction:
Common methods: LPC and MFCC.
Advantages of MFCC:
- MFCC reduces the information in speech to a small number of coefficients.
- MFCC tries to model loudness.
- MFCC resembles the human auditory model and is easy to compute.
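A minimal MFCC computation for one windowed frame might look like the sketch below (power spectrum, triangular mel filterbank, log, DCT-II). All names and parameter choices here are my own illustrative assumptions; production code would typically use a tested library:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    """MFCC sketch: power spectrum -> mel filterbank -> log -> DCT-II."""
    n_fft = 512
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2  # power spectrum
    # Triangular filters spaced evenly on the mel scale up to Nyquist
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, c):
            fbank[i, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[i, k] = (hi - k) / max(hi - c, 1)
    log_e = np.log(fbank @ spec + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coeffs
    m = np.arange(n_ceps)[:, None]
    k = np.arange(n_filters)[None, :]
    return np.cos(np.pi * m * (k + 0.5) / n_filters) @ log_e
```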
Post-Processing
Purpose: to give more weight to certain features.
Techniques: weight functions, normalization.
MARKOV CHAINS:
What is a Markov process?
What is a first-order Markov process? One in which the next state depends only on the current state: P(q_t | q_(t-1), q_(t-2), ...) = P(q_t | q_(t-1)).
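As a concrete illustration of the first-order Markov property (this toy two-state weather chain is my own, not from the slides), the state distribution evolves using only the current distribution and the transition matrix:

```python
import numpy as np

# Toy two-state chain over (Rain, Sun); each row is P(next state | current).
A = np.array([[0.7, 0.3],   # from Rain
              [0.4, 0.6]])  # from Sun

# First-order Markov property: p_{t+1} = p_t A depends only on p_t,
# not on any earlier history of the process.
p = np.array([1.0, 0.0])    # start in Rain with certainty
for _ in range(3):
    p = p @ A
```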
HMM example
Notation:
Problem I: the Evaluation Problem
Problem II: finding the most likely state sequence
Problem III: training the model parameters
Solution to Problem I
Naive enumeration over all state sequences is exponential in the sequence length, so we move on to a recursive algorithm called the Forward Algorithm.
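The Forward Algorithm can be sketched in a few lines of NumPy (the function signature is my own; A, B, and pi are the transition, emission, and initial-state parameters):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm for Problem I: compute P(O | lambda).

    alpha[t, i] = P(O_1..O_t, q_t = i | lambda), with the recursion
    alpha[t, j] = (sum_i alpha[t-1, i] * A[i, j]) * B[j, O_t].
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()
```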
Solution to Problem II
Given an HMM, we are trying to find the most likely state sequence for a particular observation sequence.
Employing a greedy algorithm, we would want to find the sequence of hidden states that maximizes the probability of the states given the observations. So which approach is preferred?
Segmental K-means algorithm
Let the algorithm build up best partial state sequences of length 1, 2, 3, ..., T.
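Problem II is commonly solved with the Viterbi algorithm, which uses dynamic programming over those partial sequences rather than a purely greedy choice at each step. A minimal sketch, with names of my own choosing:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Problem II: most likely hidden-state sequence (Viterbi).

    delta[t, j] = best score of any state path of length t+1 ending in j;
    psi[t, j] remembers which previous state achieved that best score.
    """
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # (N, N): from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```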
Training an HMM:
For each word v in the vocabulary, we must build an HMM λ_v, i.e., we must estimate the model parameters (A, B, π) that optimize the likelihood of the training-set observation vectors of the v-th word.
Testing:
For each unknown word to be recognized, we first measure the observation sequence O = O_1, O_2, ..., O_T via feature analysis of the speech corresponding to the word, then compute the model likelihoods P(O | λ_v) for all candidate models, and finally select the word whose model likelihood is highest.
A simple yes/no example.
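The testing procedure above, applied to a toy yes/no vocabulary, might be sketched as follows. The model parameters here are invented purely for illustration (one state per word, two observation symbols), and `recognize` and `forward` are my own names:

```python
import numpy as np

def forward(A, B, pi, obs):
    """P(O | lambda) via the forward algorithm (Problem I)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def recognize(models, obs):
    """Select the word v whose model maximizes P(O | lambda_v)."""
    return max(models, key=lambda w: forward(*models[w], obs))

# Toy one-state models: "yes" mostly emits symbol 0, "no" mostly emits 1.
models = {
    "yes": (np.array([[1.0]]), np.array([[0.9, 0.1]]), np.array([1.0])),
    "no":  (np.array([[1.0]]), np.array([[0.1, 0.9]]), np.array([1.0])),
}
```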
Continuous Speech Recognition
THANK YOU