
Visit: www.geocities.com/chinna_chetan05/forfriends.




Many people today access their company’s information system by logging in from home.
Internet services and telephone banking are also widely used in the corporate and private
sectors. Therefore, protecting one’s resources or information with a simple password is
neither reliable nor secure in today’s world. Biometrics are methods for recognizing a user
based on his/her unique physiological and/or behavioral characteristics. This paper
presents the voice signal, a unique behavioral characteristic, for speaker verification over
telephone lines using an artificial neural network (ANN) in a banking application. A
multi-layer feed-forward ANN capable of verifying a speaker among a group of speakers is
designed. The spectral density of the recorded voice signal is used for characterization.
Finally, the feasibility of the speaker recognition system is tested. This system was found
to be more efficient in speaker


There is a vital need for speaker identification in all spheres of life, the most
important being that such a system enables people to have secure access to information
and property. It offers a significant advantage in electronic banking and Internet access.
Large sums of money are lost each year to white-collar crime, fraud and embezzlement. In
today’s complex economic times, businesses and individuals alike are falling victim to
these devastating crimes. Employees embezzle funds or steal goods from employers, then
disappear or hide behind legal issues. Individuals can easily become helpless victims of
identity theft, stock schemes and other scams that rob them of their money.

One way to prevent such white-collar crimes, and to shorten the lengthy process of
locating perpetrators and serving them with a judgment, is to use biometric techniques to
verify individuals. Artificial neural networks (ANNs) are intelligent systems loosely
related to a simplified biological model of the human brain. Voice signals are attenuated
and distorted over telephone lines, yet an ANN, despite the nonlinear, noisy and
non-stationary environment, remains good at recognizing and verifying unique
characteristics of signals such as speech. Speaker recognition means identifying or
verifying a person based on his/her voice in the form of speech.

Speaker recognition is the generic term used for two related problems:
1. Speaker Identification: the problem is to determine the identity of a
speaker from a known group of N possible speakers.
2. Speaker Verification: essentially the same problem as speaker identification, except
that a claimed identity is also given and the task is “merely” to confirm or
reject the identity claim.
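To make the distinction concrete, here is a minimal Python sketch (not from the paper, which works in MATLAB). It assumes each enrolled speaker already has a similarity score for the incoming utterance, where a higher score means a closer match; the score dictionary, the speaker names and the threshold are hypothetical.

```python
from typing import Dict

# Hypothetical: similarity of one utterance to each enrolled speaker (higher = closer).
Scores = Dict[str, float]


def identify(scores: Scores) -> str:
    """Identification: choose the best match among the N known speakers."""
    return max(scores, key=lambda name: scores[name])


def verify(scores: Scores, claimed: str, threshold: float) -> bool:
    """Verification: accept the claimed identity only if its score clears a threshold."""
    return scores.get(claimed, float("-inf")) >= threshold


if __name__ == "__main__":
    s = {"alice": 0.91, "bob": 0.35, "carol": 0.12}
    print(identify(s))              # alice
    print(verify(s, "bob", 0.8))    # False
    print(verify(s, "alice", 0.8))  # True
```

Identification answers "who is this?" with one of N labels, while verification only answers "yes or no" for the identity that was claimed.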
The speaker recognition problem using an ANN is divided into two parts:

i) Feature extraction ii) Pattern matching.

1 Email: chinna_chetan05@yahoo.com
Text-dependent audio signals are recorded over telephone lines for different
speakers. In feature extraction, the Signal Processing Toolbox of MATLAB is used to convert
the recorded sound files into a form suitable as input vectors to a neural network. In
pattern matching, the output of the neural network identifies and verifies unique
characteristics of the speech-signal features. The feature extraction, the neural network
architecture, and the software and hardware involved in developing the speaker
identification and verification system are described in this paper. The first few sections
are dedicated to the speaker recognition system architecture; its application in e-banking
is discussed later.

The speaker recognition system over telephone lines investigated in this paper uses an
artificial neural network, as shown in Figure 1.

Figure 1: Block diagram of the speaker recognition system using an ANN

In this paper, the speaker recognition system reported is text-dependent. The system
is trained on a group of people to be identified, each person speaking the same phrase.
The voices are recorded on a standard 16-bit computer sound card from a telephone handset
receiver. Although the frequency of the human voice ranges from 0 kHz to 20 kHz, most of
the signal content lies in the 0.3 kHz to 4 kHz range. The frequency band over telephone
lines is limited to 0.3 kHz to 3.4 kHz, and this is the band of interest in this paper.
Therefore, a sampling rate of 16 kHz is used; it satisfies the Nyquist criterion because it
exceeds twice the 3.4 kHz upper band edge (6.8 kHz). The voices are stored as sound files
on the computer. Digital signal processing techniques are used to convert the sound files
into input vectors for the neural network. The output of the neural network verifies the
speaker in the group.
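The sampling-rate argument above can be checked with a few lines of Python. The band edges and the 16 kHz rate are taken from the text; the function name is a hypothetical helper written for this illustration.

```python
def satisfies_nyquist(sample_rate_hz: float, max_signal_hz: float) -> bool:
    """True if the sampling rate exceeds twice the highest frequency to be captured."""
    return sample_rate_hz > 2.0 * max_signal_hz


# Values stated in the text.
TELEPHONE_BAND_HZ = (300.0, 3400.0)
SAMPLE_RATE_HZ = 16_000.0

min_rate_hz = 2.0 * TELEPHONE_BAND_HZ[1]  # the sampling rate must exceed this bound
nyquist_freq_hz = SAMPLE_RATE_HZ / 2.0    # highest frequency representable at 16 kHz

if __name__ == "__main__":
    print(min_rate_hz)      # 6800.0
    print(nyquist_freq_hz)  # 8000.0
    print(satisfies_nyquist(SAMPLE_RATE_HZ, TELEPHONE_BAND_HZ[1]))  # True
```

At 16 kHz the representable range extends to 8 kHz, which covers both the 3.4 kHz telephone band and the 4 kHz upper limit of most voice content mentioned above.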


Speaker recognition over the telephone network presents many challenges, such as:

1. Variations in handset microphones, which result in severe mismatches between data
gathered from these microphones.
2. Signal distortion due to the telephone channel.
3. Inadequate control over speaker/speaking conditions.

The bare audio signal cannot be fed into the neural network, because several speakers
may produce similar signals. Feature extraction consists of obtaining characteristic
parameters of a signal that are then used to classify it.

For speaker recognition, the features extracted from a speech signal should
be consistent for the desired speaker while exhibiting large deviations from
other speakers. Here, in feature extraction, the Signal Processing Toolbox of MATLAB is
used to convert the recorded sound files into a form suitable as input vectors to a neural
network.
Features such as spectral density give different representations for different speakers
uttering the same text. The power spectral densities of two different speakers uttering
the same word are shown in Figure 2 (speaker X) and Figure 3 (speaker Y).

Figure 2: PSD of Speaker X
Figure 3: PSD of Speaker Y

Figures 2 and 3 show that the power spectral densities (PSDs) of speakers X and Y
differ from each other.
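The paper computes PSDs with MATLAB’s Signal Processing Toolbox and does not say which estimator it uses. As an illustration only, here is a plain-Python periodogram, one common PSD estimator, applied to a single frame of samples; the function names are hypothetical, and the 300 Hz to 3400 Hz limits are the telephone band stated earlier. A practical system would use an FFT and often an averaged estimator such as Welch’s method.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def periodogram(x: Sequence[float], fs: float) -> List[float]:
    """One-sided periodogram PSD estimate via a direct O(N^2) DFT."""
    n = len(x)
    psd = []
    for k in range(n // 2 + 1):
        x_k = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(x_k) ** 2 / (fs * n)
        if 0 < k < n / 2:
            p *= 2.0  # fold the mirrored negative-frequency bin into the one-sided estimate
        psd.append(p)
    return psd


def band_psd(x: Sequence[float], fs: float,
             lo_hz: float = 300.0, hi_hz: float = 3400.0) -> List[Tuple[float, float]]:
    """(frequency, PSD) pairs restricted to the telephone band."""
    n = len(x)
    return [(k * fs / n, p) for k, p in enumerate(periodogram(x, fs))
            if lo_hz <= k * fs / n <= hi_hz]


if __name__ == "__main__":
    fs = 16_000.0
    tone = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(160)]  # 1 kHz test tone
    peak_hz, _ = max(band_psd(tone, fs), key=lambda fp: fp[1])
    print(peak_hz)  # 1000.0
```

A frame of n samples yields n // 2 + 1 PSD bins spaced fs / n apart, so the length of the feature vector (1000 points in the experiments reported later) depends on framing choices the paper does not spell out.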


Artificial neural networks (ANNs) are intelligent systems loosely related to a
simplified biological model of the human brain. They are composed of many simple
elements, called neurons, operating in parallel and connected to each other through
multipliers called connection weights or strengths. Neural networks are trained by
adjusting the values of these connection weights between the neurons.
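A single neuron as described above can be sketched in a few lines of Python: a weighted sum of its inputs through the connection weights, plus a bias, passed through a sigmoid. The bias term and the example weight values are illustrative assumptions.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic activation, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of inputs through the connection weights, then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    # 1.0 * 0.4 + 0.5 * (-0.2) = 0.3, and sigmoid(0.3) is about 0.5744
    print(round(neuron([1.0, 0.5], [0.4, -0.2], 0.0), 4))  # 0.5744
```

Training changes only `weights` and `bias`; the structure of the computation stays fixed.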

Neural networks have a self-learning capability, are fault tolerant and robust to noise,
and have applications in system identification, pattern recognition, classification,
speech recognition, image processing, etc. In this speaker recognition application, the
ANN is used for pattern matching, and the performance of a feed-forward artificial neural
network is investigated.

A three-layer feed-forward neural network with a sigmoidal hidden layer followed by a
linear output layer is used for pattern matching. The network is trained with the error
back-propagation algorithm. An adaptive learning rate is used, i.e. the learning rate is
adjusted during training to speed up convergence.

Figure 4: The multi-layer feed-forward (MLP) neural network.

The MLP network in Figure 4 is constructed in the MATLAB 6P1 environment.

The input to the MLP network is a vector containing the PSDs. Ten hidden nodes are used.
The number of output nodes depends on the number of speakers. An initial learning rate,
an allowable error and a maximum number of training cycles (epochs) are the parameters
specified to the MATLAB neural network during the training phase.
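The training procedure described above (sigmoid hidden layer, linear output layer, error back-propagation on the sum-squared error, an adaptive learning rate, and stopping on an error goal or an epoch limit) can be sketched in plain Python. This is a minimal illustration, not the paper’s MATLAB code: the full-batch update, the weight initialisation range, and the adaptation rule (grow the rate by 1.05 after an improving epoch; reject the step and shrink the rate by 0.7 if the error rises by more than 4%) are assumptions chosen to resemble a common adaptive-learning-rate scheme. The defaults for hidden nodes, initial rate, error goal and epoch limit follow the values given in the text.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Tuple[Matrix, Vector, Matrix, Vector]    # (W1, b1, W2, b2)
Sample = Tuple[Sequence[float], Sequence[float]]  # (PSD feature vector, one-hot target)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: Sequence[float], p: Params) -> Tuple[Vector, Vector]:
    W1, b1, W2, b2 = p
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]  # linear outputs
    return h, y


def sse(data: List[Sample], p: Params) -> float:
    # Sum-squared error over every sample and every output node.
    return sum((yk - tk) ** 2 for x, t in data for yk, tk in zip(forward(x, p)[1], t))


def gradients(data: List[Sample], p: Params) -> Params:
    W1, b1, W2, b2 = p
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [[0.0] * len(row) for row in W2]
    gb2 = [0.0] * len(b2)
    for x, t in data:
        h, y = forward(x, p)
        dy = [2.0 * (yk - tk) for yk, tk in zip(y, t)]  # dE/dy for linear outputs
        for k, d in enumerate(dy):
            gb2[k] += d
            for j, hj in enumerate(h):
                gW2[k][j] += d * hj
        for j, hj in enumerate(h):
            # Error back-propagated through W2 and the sigmoid derivative h * (1 - h).
            dz = sum(d * W2[k][j] for k, d in enumerate(dy)) * hj * (1.0 - hj)
            gb1[j] += dz
            for i, xi in enumerate(x):
                gW1[j][i] += dz * xi
    return gW1, gb1, gW2, gb2


def descend(p: Params, g: Params, lr: float) -> Params:
    def vec(a: Vector, ga: Vector) -> Vector:
        return [ai - lr * gi for ai, gi in zip(a, ga)]

    W1, b1, W2, b2 = p
    gW1, gb1, gW2, gb2 = g
    return ([vec(r, gr) for r, gr in zip(W1, gW1)], vec(b1, gb1),
            [vec(r, gr) for r, gr in zip(W2, gW2)], vec(b2, gb2))


def train(data: List[Sample], n_hidden: int = 10, lr: float = 0.01,
          goal: float = 0.01, max_epochs: int = 10000, seed: int = 0) -> Tuple[Params, float]:
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    p: Params = ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
                 [0.0] * n_hidden,
                 [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)],
                 [0.0] * n_out)
    err = sse(data, p)
    for _ in range(max_epochs):
        if err <= goal:                    # error goal reached: stop early
            break
        candidate = descend(p, gradients(data, p), lr)
        new_err = sse(data, candidate)
        if new_err > 1.04 * err:           # assumed rule: reject the step, shrink the rate
            lr *= 0.7
            continue
        if new_err < err:                  # assumed rule: improvement, grow the rate
            lr *= 1.05
        p, err = candidate, new_err
    return p, err
```

Each output node stands for one speaker, so targets are one-hot vectors. Real 1000-point PSD vectors work the same way but are slow in pure Python; an array library would be the natural next step.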


The most straightforward use of speaker recognition is when one has to gain access to
a secure bank account. Voice is fully compatible with the existing transmission protocols
over telephone channels, so no special adaptations are needed beyond installing the
system itself. For the time being, such a service is restricted to operations within the
accounts held by a single individual: one can check the status of an account, transfer
money between one’s own savings accounts, etc.

Here, voice samples of different users uttering the same phrase are recorded under
controlled and uncontrolled conditions. A user who wants to access his account utters the
same phrase over the telephone line. The speaker recognition system identifies the user as
a particular account holder and grants access to that account. If the user is not an
account holder, i.e. his voice does not match any person in the group of users, the system
rejects his identity claim and denies access to the account.
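The accept/reject logic described above can be sketched as follows. It assumes the network produces one output per enrolled account holder and that a caller is rejected when no output reaches a threshold; the threshold of 0.5 and the account labels are hypothetical, since the paper does not state how “unidentified” is decided.

```python
from typing import List, Optional, Sequence


def decide(outputs: Sequence[float], holders: List[str],
           threshold: float = 0.5) -> Optional[str]:
    """Map network outputs (one per enrolled account holder) to an access decision."""
    if len(outputs) != len(holders):
        raise ValueError("need exactly one output per account holder")
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    if outputs[best] < threshold:
        return None           # no holder matches well enough: identity rejected
    return holders[best]      # caller identified as this account holder


if __name__ == "__main__":
    holders = ["holder_A", "holder_B", "holder_C"]  # hypothetical account labels
    print(decide([0.10, 0.92, 0.05], holders))      # holder_B
    print(decide([0.20, 0.31, 0.15], holders))      # None
```

Raising the threshold rejects more impostors at the cost of turning away more genuine callers whose voice varies between sessions.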


The MLP network is trained with the PSDs of ten voice samples, recorded at different
instants under controlled and uncontrolled speaking conditions, from ten different
speakers uttering the same phrase every time. Controlled speaking conditions refer to
noise- and distortion-free conditions, unlike uncontrolled speaking conditions, which
include noise and distortion over the transmission lines. Each sample has 1000 PSD
points. An adaptive learning rate is used for the MLP network, with an initial learning
rate of 0.01. The allowable sum-squared error and the maximum number of epochs specified
to the MATLAB neural network program are 0.01 and 10000, respectively. The sum-squared
error goal is reached within 1000 epochs.
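As a quick size check on the network described above, the number of trainable parameters follows from the layer sizes. The sketch assumes one bias per hidden and output node and one output node per speaker (ten speakers); the text implies the latter but does not state either explicitly.

```python
def mlp_param_count(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Weights plus biases of a single-hidden-layer feed-forward network."""
    hidden = n_inputs * n_hidden + n_hidden    # input-to-hidden weights + hidden biases
    output = n_hidden * n_outputs + n_outputs  # hidden-to-output weights + output biases
    return hidden + output


if __name__ == "__main__":
    # 1000 PSD points and 10 hidden nodes are from the text; 10 outputs is assumed.
    print(mlp_param_count(1000, 10, 10))  # 10120
```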

A success rate of 90% is achieved when the trained MLP network is tested with the same
samples used in the training phase. However, with untrained samples, a success rate of
only about 70% is obtained. This is attributed to inconsistency between the PSDs of the
test samples and those used in the training phase. The MLP network is also tested with
voice samples of people who are not included in the training set, and the network
successfully classifies these samples as unidentified.


The use of an artificial neural network in the speaker recognition system proved to be
fairly successful. The success rate of this system could be increased by using features
such as pitch, autocorrelation and cepstrum. This concept of speaker recognition has a
variety of applications in fields such as e-banking.

