
Artificial Neural Networks

- Introduction -

Biological inspiration
Animals are able to react adaptively to changes in their
external and internal environment, and they use their nervous
system to perform these behaviours.

An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.

The nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality is a natural starting point.

Biological inspiration

[Figure: a biological neuron, showing the dendrites, the soma (cell body), the axon, and the synapses]

The information transmission happens at the synapses.

How do our brains work?

Biological neuron: a processing element

Dendrites: input
Cell body: processor
Synapse: link
Axon: output

A neuron is connected to other neurons through about 10,000 synapses.

A neuron receives input from other neurons; the inputs are combined.

Once the combined input exceeds a critical level, the neuron generates an electrical pulse that travels from the cell body, down the axon, to the next neuron(s).

The axon endings almost touch the dendrites or cell body of the next neuron.

Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters: chemicals which are released from the first neuron and which bind to receptors on the second.

This link is called a synapse. The strength of the signal that reaches the next neuron depends on factors such as the amount of neurotransmitter available.

Elements of ANNs
Processing Elements (PE)
Processing Elements (PEs) are artificial neurons, analogous to biological neurons.
Each PE receives inputs, processes them, and delivers an output.
An input can be raw input data or the output of another PE.
The output can be the final result, or it can be an input to other neurons.

Artificial neurons
Neurons work by processing information. They receive and provide information in the form of spikes.

[Figure: the McCulloch-Pitts model, with inputs x1, x2, ..., xn weighted by w1, w2, ..., wn, a summing unit, and output y]

z = sum_{i=1}^{n} w_i x_i ;  y = H(z)

Artificial neurons
The McCulloch-Pitts model:
spikes are interpreted as spike rates;
synaptic strengths are translated into synaptic weights;
excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
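A minimal sketch of such a neuron, with a hard threshold H as the output function; the function name, weights, and threshold values are illustrative, not from the slides:

```python
# A McCulloch-Pitts neuron: weighted sum of the inputs followed by a
# hard threshold H(z).

def mcculloch_pitts(inputs, weights, threshold=0.0):
    """Return 1 if the weighted input sum reaches the threshold, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs))  # z = sum_i w_i * x_i
    return 1 if z >= threshold else 0                # y = H(z)

# Example: two excitatory inputs (positive weights) acting as an AND gate.
print(mcculloch_pitts([1, 1], [1.0, 1.0], threshold=2.0))  # fires: 1
print(mcculloch_pitts([1, 0], [1.0, 1.0], threshold=2.0))  # silent: 0
```

A negative weight makes the corresponding input inhibitory: it pulls z down and makes the neuron less likely to fire.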

Artificial neural networks

[Figure: a network of connected neurons mapping inputs to an output]

An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.

Artificial neural networks


Tasks to be solved by artificial neural networks:
controlling the movements of a robot based on self-perception and other information (e.g., visual information);
deciding the category of potential food items (e.g., edible or non-edible) in an artificial world;
recognizing a visual object (e.g., a familiar face);
predicting where a moving object is going, when a robot wants to catch it.

Neural network mathematics

[Figure: a feedforward network mapping inputs x1, ..., x4 through two layers to a single output]

First layer:  y^1_i = f(x_i, w^1_i) for i = 1, ..., 4;  y^1 = (y^1_1, y^1_2, y^1_3, y^1_4)
Second layer: y^2_j = f(y^1, w^2_j) for j = 1, 2, 3;    y^2 = (y^2_1, y^2_2, y^2_3)
Output:       yOut = f(y^2, w^3)
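The slides leave the unit function f unspecified; assuming f is a sigmoid of the weighted input sum, the layered computation can be sketched as (the 4-3-1 shape matches the figure, the weight values are illustrative):

```python
# Forward pass through a small feedforward network: each unit computes
# f(inputs, weights) = sigmoid(sum of weighted inputs).
import math

def f(inputs, weights):
    """One unit: sigmoid of the weighted input sum."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2, w3):
    y1 = [f([xi], [wi]) for xi, wi in zip(x, w1)]  # y^1_i = f(x_i, w^1_i)
    y2 = [f(y1, wj) for wj in w2]                  # y^2_j = f(y^1, w^2_j)
    return f(y2, w3)                               # yOut  = f(y^2, w^3)

x  = [0.5, -1.0, 2.0, 0.0]          # four inputs
w1 = [0.8, -0.3, 0.5, 0.1]          # one weight per first-layer unit
w2 = [[0.4, 0.2, -0.5, 0.3]] * 3    # three second-layer units
w3 = [1.0, -1.0, 0.5]               # output unit
print(forward(x, w1, w2, w3))       # a single output in (0, 1)
```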

Computers vs. Neural Networks

Standard Computers          Neural Networks
one CPU                     highly parallel processing
fast processing units       slow processing units
reliable units              unreliable units
static infrastructure       dynamic infrastructure

Neural network tasks

control
classification
prediction
approximation

These can all be reformulated, in general, as FUNCTION APPROXIMATION tasks.

Approximation: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.

Why Artificial Neural Networks?

There are two basic reasons why we are interested in building artificial neural networks (ANNs):

Technical viewpoint: some problems, such as character recognition or the prediction of future states of a system, require massively parallel and adaptive processing.
Biological viewpoint: ANNs can be used to replicate and simulate components of the human (or animal) brain, thereby giving us insight into natural information processing.

How do ANNs work?

An artificial neuron is an imitation of a human neuron.

Now, let us have a look at the model of an artificial neuron.

[Figure: inputs x1, x2, ..., xm flow into a processing unit that produces the output]

y = x1 + x2 + ... + xm

How do ANNs work?

Not all inputs are equal: each input xi carries a weight wi.

[Figure: inputs x1, ..., xm with weights w1, ..., wm feeding the processing unit]

y = x1*w1 + x2*w2 + ... + xm*wm

How do ANNs work?

The signal is not passed down to the next neuron verbatim: the weighted sum vk is passed through a transfer function (activation function) f(vk).

[Figure: inputs x1, ..., xm, weights w1, ..., wm, a summing unit, the transfer function f(vk), and the output]

The output is a function of the input that is affected by the weights and the transfer function.

There are three types of layers: input, hidden, and output.
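The processing pipeline above (weighted sum, then transfer function) can be sketched as follows; the specific transfer functions and values are illustrative choices:

```python
# A single artificial neuron: weighted sum v_k followed by a transfer
# (activation) function f(v_k).
import math

def neuron(x, w, transfer):
    vk = sum(wi * xi for wi, xi in zip(w, x))  # v_k = sum_i w_i * x_i
    return transfer(vk)                        # y = f(v_k)

step    = lambda v: 1.0 if v >= 0 else 0.0      # hard threshold
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))  # smooth squashing

x, w = [1.0, 2.0, -1.5], [0.4, 0.3, 0.2]        # v_k = 0.7
print(neuron(x, w, step))     # 1.0
print(neuron(x, w, sigmoid))  # about 0.668
```

Swapping the transfer function changes the neuron's behaviour without touching the weighted-sum machinery, which is why the two stages are usually described separately.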

Artificial Neural Networks

An ANN can:
1. compute any computable function, given an appropriate choice of network topology and weight values;
2. learn from experience!
Specifically, by trial-and-error.

Learning by trial-and-error
A continuous process of:
Trial:
processing an input to produce an output (in ANN terms: computing the output function for a given input).
Evaluate:
evaluating this output by comparing the actual output with the expected output.
Adjust:
adjusting the weights.
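The trial/evaluate/adjust cycle can be sketched with the perceptron learning rule, one simple instance of this idea; the AND-gate data, learning rate, and epoch count are illustrative:

```python
# Trial-and-error learning: repeat (compute output, compare with target,
# nudge the weights) until the network gets every sample right.

def train(samples, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Trial: compute the output for this input.
            y = 1 if (w[0] * x[0] + w[1] * x[1] + b) >= 0 else 0
            # Evaluate: compare the actual output with the expected one.
            error = target - y
            # Adjust: move the weights in proportion to the error.
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_gate)
for x, t in and_gate:
    y = 1 if (w[0] * x[0] + w[1] * x[1] + b) >= 0 else 0
    print(x, y)  # matches the AND truth table after training
```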

Learning Paradigms
Supervised learning
Unsupervised learning
Reinforcement learning

Supervised learning
This is what we have seen so far!
A network is fed with a set of training samples (inputs and corresponding outputs), and it uses these samples to learn the general relationship between the inputs and the outputs.
This relationship is represented by the values of the weights of the trained network.

Unsupervised learning
No desired output is associated with the training data!
Faster than supervised learning.
Used to find structure within data:
Clustering
Compression
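As an illustration of finding structure without labels, here is a tiny one-dimensional k-means clustering sketch; the algorithm choice and the data are illustrative, not from the slides:

```python
# 1-D k-means: alternate between assigning points to their nearest center
# and moving each center to the mean of its assigned points.

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        # Move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]       # two obvious clumps, no labels
print(kmeans_1d(data, centers=[0.0, 5.0]))  # [1.0, 9.0]
```

No target output was ever supplied; the two cluster centers emerge purely from the structure of the data.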

Reinforcement learning
Like supervised learning, but:
Weight adjustment is not directly related to the error value.
The error value is used to randomly shuffle the weights!
Learning is relatively slow due to the randomness.

Q. How does each neuron work in ANNs?
What is back-propagation?

A neuron:
receives input from many other neurons;
changes its internal state (activation) based on the current input;
sends one output signal to many other neurons, possibly including its input neurons (making the ANN a recurrent network).

Back-propagation is a type of supervised learning, used at each layer to minimize the error between the layer's response and the actual data.
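A minimal sketch of the backward pass on a tiny 2-2-1 sigmoid network, with the analytic gradient checked against a numerical slope; the network shape, weights, and data are illustrative assumptions:

```python
# Back-propagation: compute the output error, then propagate it backwards
# through the layers to obtain the gradient of every weight.
import math

def sig(v): return 1.0 / (1.0 + math.exp(-v))

def forward(x, W1, W2):
    h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sig(sum(w * hi for w, hi in zip(W2, h)))
    return h, y

def backprop(x, target, W1, W2):
    """Gradients of the squared error E = (y - target)^2 / 2."""
    h, y = forward(x, W1, W2)
    dy = (y - target) * y * (1 - y)          # delta at the output unit
    gW2 = [dy * hi for hi in h]              # dE/dW2: output-layer gradient
    dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    gW1 = [[dh[j] * xi for xi in x] for j in range(len(h))]
    return gW1, gW2

W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [0.5, -0.6]
gW1, gW2 = backprop([1.0, 2.0], 1.0, W1, W2)

# Sanity check one gradient numerically: nudge W2[0] and difference the error.
eps = 1e-6
def err(W2v):
    _, y = forward([1.0, 2.0], W1, W2v)
    return 0.5 * (y - 1.0) ** 2
numeric = (err([0.5 + eps, -0.6]) - err([0.5 - eps, -0.6])) / (2 * eps)
print(abs(numeric - gW2[0]) < 1e-8)  # True: backprop matches the numeric slope
```

Subtracting a small multiple of these gradients from the weights is one "adjust" step of the learning loop described earlier.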

Applications Areas
Function approximation / regression
including time-series prediction and modelling; the network is trained to predict future values of a series
(e.g., stock market index prediction).

Classification: pattern recognition
including pattern and sequence recognition, novelty detection, and sequential decision making
(radar systems, face identification, handwritten text recognition).

Data processing
including filtering, clustering, blind source separation, and compression
(data mining, e-mail spam filtering).

Applications Areas
Clustering: the data set is so complicated that there is no obvious way to identify the categories. An ANN is used to identify the distinguishing features of the data set, classify the items, and place them in clusters.
Ex: used in adaptive resonance theory.
Association: an ANN is trained to remember a number of unique patterns; if a presented pattern is distorted, the network associates it with the closest stored pattern.
Ex: useful with noisy or incomplete data.

Advantages / Disadvantages
Advantages
Adapts to unknown situations.
Powerful: it can model complex functions.
Easy to use: it learns by example, and very little domain-specific expertise is needed.

Disadvantages
Forgets.
Not exact.
Large complexity of the network structure.

SOME ANN APPLICATIONS


ANN application areas:
Tax form processing to identify tax fraud
Enhancing auditing by finding irregularities
Bankruptcy prediction
Customer credit scoring
Loan approvals
Credit card approval and fraud detection
Financial prediction
Energy forecasting
Computer access security (intrusion detection and
classification of attacks)
Fraud detection in mobile telecommunication networks

Conclusion
Artificial neural networks are an imitation of biological neural networks, but much simpler ones.

Computing has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful; furthermore, there is no need to devise an algorithm in order to perform a specific task.

Conclusion
Neural networks also contribute to areas of research such as neurology and psychology. They are regularly used to model parts of living organisms and to investigate the internal mechanisms of the brain.
Many factors affect the performance of ANNs, such as the transfer functions, the size of the training sample, the network topology, and the weight-adjustment algorithm.

CLASSIFICATION AND
REGRESSION TREES (CART)
Also known as decision trees.
Decision Trees:
Like a species-identification key: class labels are assigned to objects by following a path through a series of simple rules or questions, the answers to which determine the next direction through the path.
A decision tree is a supervised learning algorithm, which must be provided with a training set that contains objects with class labels.
It looks like a cluster-analysis dendrogram or partitioning diagram, but those come from unsupervised methods that take no account of pre-assigned class labels.

CLASSIFICATION AND
REGRESSION TREES (CART)
CART aims to use a set of predictor variables to estimate the
means of one or more response variables. A binary tree is
constructed by repeatedly splitting the data set into subsets. Each
individual split is based on a single predictor variable and is
chosen to minimise the variability of the response variables in
each of the resulting subsets.

The tree begins with the full data set and ends with a series of terminal nodes. Within each terminal node, the means of the response variables are used as predictions for future observations.
CART is closer to ANOVA than to regression, in that the data are divided into a discrete number of subsets based on categorical predictors, and predictions are determined by subset means.

CLASSIFICATION AND
REGRESSION TREES (CART)
Two criteria must be defined:
1. A measure of impurity or inhomogeneity.
2. A rule for selecting the optimum tree.
Produce a very large tree and then prune it into successively smaller trees. The skill of each tree is determined by cross-validation: divide the full data into subsets, drop one subset, grow the tree on the remaining data, and test it on the omitted subset.
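The first criterion can be sketched as follows, using Gini impurity as the inhomogeneity measure and an exhaustive threshold search on a single predictor; the data values are illustrative:

```python
# One CART split: measure node impurity with the Gini index, then pick the
# threshold on a single predictor that minimises the weighted impurity of
# the two resulting subsets.

def gini(labels):
    """Impurity: 1 minus the sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(x, y):
    """Try every threshold on one predictor; keep the lowest weighted impurity."""
    best = None
    for t in sorted(set(x)):
        left  = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[1]:
            best = (t, score)
    return best

x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = ['a', 'a', 'a', 'b', 'b', 'b']
print(best_split(x, y))  # (3.0, 0.0): splitting at 3.0 gives two pure nodes
```

Growing a full tree simply applies this search recursively to each resulting subset, which is the repeated-splitting procedure the text describes.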

CLASSIFICATION AND
REGRESSION TREES (CART)
A SUMMARY:
Explain variation of single response variable by one or more
explanatory or predictor variables.
Response variable can be quantitative (regression trees) or categorical
(classification trees).
Predictor variables can be categorical and/or quantitative.
Trees are constructed by repeated splitting of the data, with each split defined by a simple rule based on a single predictor variable.

CLASSIFICATION AND
REGRESSION TREES (CART)
At each split, the data are partitioned into two mutually exclusive groups, each of which is as homogeneous as possible. The splitting procedure is then applied to each group separately.
The aim is to partition the response into homogeneous groups while keeping the tree as small and as simple as possible.
Usually an overlarge tree is created first and then pruned back to the desired size by cross-validation.

Each group is typically characterised by either the distribution (categorical response) or the mean value (quantitative response) of the response variable, the group size, and the predictor variables that define it.