
SPEECH RECOGNITION SYSTEMS

A Presentation By -
Ayush Rungta
(1409122011)
INTRODUCTION

The process of enabling a computer to identify and respond to the sounds produced in human speech is referred to as Speech Recognition.

You talk to your computer, phone or device, and it uses what you said as input to trigger some action.
TYPES OF SPEECH RECOGNITION

There are two types of Speech Recognition Systems:

Speaker Dependent SRS
Speaker-dependent software is commonly used for dictation software.

Speaker Independent SRS
Speaker-independent software is more commonly found in telephone applications.
Speaker Dependent System

Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so the computer can analyze how the person talks.

This often means users have to read a few pages of text to the computer before they can use the speech recognition software.
Speaker Independent System

Speaker-independent software is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system.

The downside is that speaker-independent software is generally less accurate than speaker-dependent software.
RECOGNITION

Fig.- Block diagram of the recognition process: Voice Input -> Analog-to-Digital conversion -> Speech Engine (Acoustic Model, Decoder, Language Model) -> Display / Feedback.
Digitization

The analog-to-digital converter (ADC) translates the analog sound wave into digital data that the computer can understand.

To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals.

It then uses filters to measure the energy levels at various points on the frequency spectrum.
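As a rough illustration of this step, here is a minimal sketch in Python with NumPy (the presentation names no tools, so the library, the 16 kHz sampling rate and the 440 Hz test tone are all illustrative assumptions). It samples a stand-in "analog" wave at fixed intervals, quantizes each measurement to 16-bit values as an ADC would, and then measures energy across the frequency spectrum with an FFT standing in for the filter bank.

```python
import numpy as np

# A minimal sketch of digitization: sample a continuous waveform at a fixed
# rate and quantize each measurement to 16-bit values, as an ADC would.
# The 16 kHz rate and the 440 Hz test tone are illustrative, not from the slides.

SAMPLE_RATE = 16000          # samples per second (a common rate for speech)
DURATION = 0.5               # seconds of audio to "record"

# Stand-in for the analog wave: a 440 Hz tone evaluated at the sample instants.
t = np.arange(0, DURATION, 1.0 / SAMPLE_RATE)
analog_wave = 0.8 * np.sin(2 * np.pi * 440 * t)

# Quantize: map the [-1, 1] amplitude range onto 16-bit signed integers.
digital_samples = np.round(analog_wave * 32767).astype(np.int16)

# Energy at various points on the frequency spectrum (the slides mention
# "filters"; an FFT is one common way to realize such a measurement).
spectrum = np.abs(np.fft.rfft(digital_samples.astype(np.float64)))
freqs = np.fft.rfftfreq(len(digital_samples), d=1.0 / SAMPLE_RATE)
peak = freqs[np.argmax(spectrum)]
print(f"{len(digital_samples)} samples, dominant frequency ~ {peak:.0f} Hz")
```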
Noise Removal

The system filters the digitized sound to remove unwanted noise, and sometimes separates it into different bands of frequency (frequency is the number of sound-wave cycles per second, heard by humans as differences in pitch).

It also normalizes the sound, or adjusts it to a constant volume level.

Two microphones can be used to remove noise: one facing the speaker and one facing away from the speaker, so the background sound picked up by the second microphone can be subtracted out.
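The sketch below illustrates these three ideas on made-up signals, assuming Python with NumPy and SciPy (the slides do not specify any tools, and the 100-4000 Hz band and toy tones are assumptions for illustration only).

```python
import numpy as np
from scipy.signal import butter, lfilter

RATE = 16000
t = np.arange(0, 1.0, 1.0 / RATE)

# Toy signals: "speech" as a 300 Hz tone, "noise" as a 50 Hz hum plus hiss.
speech = 0.5 * np.sin(2 * np.pi * 300 * t)
noise = 0.3 * np.sin(2 * np.pi * 50 * t) + 0.05 * np.random.randn(len(t))

mic_front = speech + noise   # microphone facing the speaker
mic_rear = noise             # microphone facing away (picks up mostly noise)

# Two-microphone noise removal: subtract the rear, noise-only signal.
cleaned = mic_front - mic_rear

# Band-pass filter to keep the frequency band where speech energy lives
# (100-4000 Hz is a common choice, not taken from the slides).
b, a = butter(4, [100, 4000], btype="bandpass", fs=RATE)
filtered = lfilter(b, a, cleaned)

# Normalization: scale the sound to a constant peak level.
normalized = filtered / np.max(np.abs(filtered))
print("peak after normalization:", np.max(np.abs(normalized)))
```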
Speech Engine

The speech engine is the software program that recognizes speech. It takes a spoken utterance, compares it to the vocabulary, and matches the utterance to vocabulary words (a rough sketch of this matching idea follows the component list).

The Speech Engine consists of the following three components:

Acoustic Model
Language Model
Decoder
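As a minimal sketch of "compare to the vocabulary and match", the Python snippet below matches tokens of a hypothetical utterance (already turned into text) against a small made-up vocabulary using the standard-library difflib; real engines match at the acoustic level, so this only conveys the idea.

```python
import difflib

# Hypothetical vocabulary for illustration; not from the slides.
VOCABULARY = ["switch", "channel", "nine", "lamp", "television", "off", "on"]

def match_to_vocabulary(utterance: str) -> list[str]:
    """Return the closest vocabulary word for each spoken token."""
    matched = []
    for token in utterance.lower().split():
        candidates = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.6)
        matched.append(candidates[0] if candidates else "<unknown>")
    return matched

print(match_to_vocabulary("Swich to chanel nine"))
# ['switch', '<unknown>', 'channel', 'nine']
```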
Acoustic Model

An Acoustic Model is a file that contains statistical representations of each of the distinct sounds that make up a word. Each of these statistical representations is assigned a label called a phoneme.

The English language has about 40 distinct sounds that are useful for speech recognition, and thus about 40 different phonemes.
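To make "statistical representation labelled with a phoneme" concrete, here is a minimal sketch in Python: each phoneme label maps to a Gaussian (mean and variance) over a single toy feature value. The phoneme labels, numbers and single-feature setup are invented for illustration; real models use many features per frame and far richer statistics.

```python
import numpy as np

# Toy acoustic model: one Gaussian per phoneme label (all numbers made up).
acoustic_model = {
    "AH": {"mean": 0.2, "var": 0.04},
    "S":  {"mean": 0.8, "var": 0.02},
    "T":  {"mean": 0.5, "var": 0.03},
}

def log_likelihood(feature: float, stats: dict) -> float:
    """Log-probability of one feature value under a phoneme's Gaussian."""
    mean, var = stats["mean"], stats["var"]
    return -0.5 * (np.log(2 * np.pi * var) + (feature - mean) ** 2 / var)

def best_phoneme(feature: float) -> str:
    """Label the feature with the most likely phoneme."""
    return max(acoustic_model, key=lambda p: log_likelihood(feature, acoustic_model[p]))

print(best_phoneme(0.75))   # 'S' under these made-up statistics
```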
Language Model

A Statistical Language Model is a file used by a Speech Recognition Engine to recognize speech. It contains a large list of words and their probabilities of occurrence, and is used in dictation applications.

It tries to capture the properties of a language and to predict the next word in a speech sequence.
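A minimal sketch of this idea, using a bigram model in Python: count word pairs in a tiny made-up corpus and use the counts to estimate which word is likely to follow the current one. The corpus is invented for illustration; real language models are built from far larger text collections.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus of commands (illustrative only).
corpus = [
    "switch to channel nine",
    "switch off the lamp",
    "switch to channel five",
    "turn on the television",
]

# Count bigrams: how often each word follows each previous word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def next_word_probs(prev: str) -> dict:
    """Probability of each possible next word, given the previous word."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("switch"))   # {'to': 0.666..., 'off': 0.333...}
```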
Decoder

The Decoder is the software program that takes the sounds spoken by a user and searches the Acoustic Model for the equivalent sounds. When a match is made, the Decoder determines the phoneme corresponding to the sound. It keeps track of the matching phonemes until it reaches a pause in the user's speech.

It then searches the Language Model or Grammar file for the equivalent series of phonemes. If a match is made, it returns the text of the corresponding word or phrase to the calling program.
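The sketch below mirrors that loop in Python under heavy simplification: each incoming sound frame is matched to its nearest phoneme template, phonemes are collected until a pause, and the sequence is looked up in a small pronunciation dictionary that stands in for the Language Model / Grammar file. All templates, lexicon entries and frame values are invented for illustration.

```python
# Toy "acoustic model": one representative feature value per phoneme (made up).
PHONEME_TEMPLATES = {"N": 0.1, "AY": 0.4, "T": 0.7, "S": 0.9}

# Toy lexicon mapping phoneme sequences to words (made up).
LEXICON = {("N", "AY", "N"): "nine", ("T", "EH", "N"): "ten"}

PAUSE = None   # marker for a pause in the user's speech

def closest_phoneme(feature: float) -> str:
    """Match one sound frame to the nearest phoneme template."""
    return min(PHONEME_TEMPLATES, key=lambda p: abs(PHONEME_TEMPLATES[p] - feature))

def decode(frames: list) -> list:
    """Turn a stream of sound frames (with pauses) into recognized words."""
    words, phonemes = [], []
    for frame in frames:
        if frame is PAUSE:
            words.append(LEXICON.get(tuple(phonemes), "<unknown>"))
            phonemes = []
        else:
            phonemes.append(closest_phoneme(frame))
    return words

# Frames that roughly match N, AY, N, followed by a pause.
print(decode([0.12, 0.38, 0.09, PAUSE]))   # ['nine']
```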
Neural Networks

Artificial Neural Networks (ANNs) are systems consisting of interconnected computational nodes that work somewhat like human neurons.

Neural networks can be used to approximate functions or to classify data into classes, which in the speech recognition domain can be phonemes, sub-phoneme units, syllables or words.

The ability to learn by adapting the strengths of inter-neuron connections (synapses) is a fundamental property of artificial neural networks.
Fig.- A diagram representing a simple neural network.
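To show "learning by adapting connection strengths" in code, here is a tiny feed-forward network with one hidden layer, trained by gradient descent on a made-up two-class problem (think of the two classes as two toy phoneme categories). It is a minimal NumPy sketch, not the architecture used in real acoustic models, and all sizes and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                              # toy 2-feature inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy class labels

W1 = rng.normal(scale=0.5, size=(2, 8))   # input -> hidden connection strengths
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output connection strengths

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    hidden = sigmoid(X @ W1)                                 # forward pass
    output = sigmoid(hidden @ W2)
    grad_out = (output - y) * output * (1 - output)          # backward pass (MSE loss)
    grad_W2 = hidden.T @ grad_out
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ grad_hidden
    W1 -= 0.5 * grad_W1 / len(X)                             # adapt the "synapse" strengths
    W2 -= 0.5 * grad_W2 / len(X)

accuracy = np.mean((sigmoid(sigmoid(X @ W1) @ W2) > 0.5) == y.astype(bool))
print(f"training accuracy: {accuracy:.2f}")
```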
PROCESS OF SPEECH RECOGNITION

Fig.- Pipeline of the process: Speaker Recognition -> Speech Recognition -> Parsing and Arbitration -> device switches S1, S2, ..., SK, ..., SN.
Input From User

Fig.- The pipeline receives the spoken input "Switch on Channel 9".
Authentication

Fig.- The Speaker Recognition stage answers "Who is speaking?" by identifying the speaker among known voices (e.g. Annie, David, Cathy).
Understanding

Fig.- The Speech Recognition stage answers "What is he saying?" by recognizing words such as On, Off, TV, Fridge, Door.
Inferring & Execution
What is he
talking S1
about?

Speaker Speech
Parsing S2
and
Recognition Recognition
Arbitration

SK

Switch,to,channel,nine
Channel->TV
SN
Dim->Lamp
On->TV,Lamp
Framework of Voice Recognition

Fig.- The overall framework: Face Recognition and Gesture Recognition feed Parsing and Arbitration, covering the Authentication, Understanding, and Inferring and Execution stages.


APPLICATIONS

In-car systems

Healthcare

Military

Telephony

Education and Daily Life

People with Disabilities


THE END
