Institute of
Business Informatics
Supervised Learning
Uwe Lämmel
www.wi.hswismar.de/~laemmel
U.laemmel@wi.hswismar.de
Neural Networks
- Idea
- Artificial Neuron & Network
- Supervised Learning
- Unsupervised Learning
- Data Mining: Other Techniques
Supervised Learning
- Feed-Forward Networks
- Examples: Bank Customer, Customer Relationship
Connections
- feed-forward: input layer → hidden layer → output layer
- feed-back / auto-associative: from the (output) layer back to a previous (hidden/input) layer
- all neurons fully connected to each other: Hopfield network
...
Perception
- first step of recognition
- becoming aware of something via the senses
[Figure: picture → mapping layer → output layer]
Perceptron
[Figure: perceptron; the links into the output layer are trainable and fully connected]
- input layer: binary input, passed through, no trainable links
- propagation function: net_j = Σ_i o_i · w_ij
- activation function: o_j = a_j = 1 if net_j ≥ θ_j, 0 otherwise
- A perceptron can learn every function it can represent, in finite time.
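As a minimal sketch of the propagation and activation functions above (class and variable names are assumptions, not from the slides):

```java
// Sketch of one perceptron output neuron: net_j = sum_i o_i*w_ij,
// o_j = 1 if net_j >= theta_j, else 0. All names are illustrative.
public class PerceptronNeuron {
    double[] w;    // weights w_ij from the input neurons
    double theta;  // threshold theta_j

    PerceptronNeuron(double[] w, double theta) { this.w = w; this.theta = theta; }

    // propagation function: weighted sum over the binary inputs o_i
    double net(int[] o) {
        double net = 0.0;
        for (int i = 0; i < w.length; i++) net += o[i] * w[i];
        return net;
    }

    // activation function: binary threshold
    int output(int[] o) { return net(o) >= theta ? 1 : 0; }
}
```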
Linearly Separable?
Neuron j should output 0 iff neurons 1 and 2 have the same value (o1 = o2), and 1 otherwise (XOR):
net_j = o1·w1j + o2·w2j
(1) 0·w1j + 0·w2j < θj
(2) 0·w1j + 1·w2j ≥ θj
(3) 1·w1j + 0·w2j ≥ θj
(4) 1·w1j + 1·w2j < θj
Adding (2) and (3) gives w1j + w2j ≥ 2·θj; by (1), θj > 0, so w1j + w2j > θj, contradicting (4): no such weights exist, XOR is not linearly separable.
[Figure: neuron j with weights w1j, w2j from neurons 1 and 2; in the (o1, o2) plane, no straight line separates (0,0) and (1,1) from (0,1) and (1,0)]
Learning is easy
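A minimal sketch of the classical perceptron learning rule, w_ij' = w_ij + η·(t_j − o_j)·o_i, here learning the AND function; the learning rate, initial values and the threshold-as-weight handling are assumptions of this example:

```java
// Sketch of perceptron learning on AND; all concrete values are illustrative.
public class PerceptronLearning {
    public static void main(String[] args) {
        double[] w = {0.0, 0.0};
        double theta = 0.5, eta = 0.5;
        int[][] in = {{0,0}, {0,1}, {1,0}, {1,1}};
        int[] t = {0, 0, 0, 1};   // teaching output: logical AND
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int p = 0; p < in.length; p++) {
                double net = in[p][0]*w[0] + in[p][1]*w[1];
                int o = net >= theta ? 1 : 0;  // threshold activation
                double d = eta * (t[p] - o);   // zero if the pattern is already correct
                w[0] += d * in[p][0];          // strengthen links from active inputs
                w[1] += d * in[p][1];
                theta -= d;                    // threshold learns like a weight on a constant -1 input
            }
        }
        System.out.println("w1=" + w[0] + " w2=" + w[1] + " theta=" + theta);
    }
}
```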
Exercise
Decoding
input: binary code of a digit
output: unary representation, with as many 1s as the digit represents (5 → 11111)
architecture: [figure]
Exercise
Decoding
input: binary code of a digit
output: classification: 0 ~ 1st neuron, 1 ~ 2nd neuron, ..., 5 ~ 6th neuron, ...
architecture: [figure]
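To make the two target encodings of these exercises concrete, the sketch below prints, for every digit, the 4-bit input together with the unary target and the one-neuron-per-class target; the bit order and array layout are assumptions:

```java
// Sketch: training patterns for the decoding exercises (representation assumed).
public class DecoderPatterns {
    public static void main(String[] args) {
        for (int d = 0; d <= 9; d++) {
            int[] input = new int[4];           // binary code of the digit, LSB first
            for (int b = 0; b < 4; b++) input[b] = (d >> b) & 1;
            int[] unary = new int[10];          // as many 1s as the digit represents
            for (int i = 0; i < d; i++) unary[i] = 1;
            int[] oneHot = new int[10];         // classification: (d+1)-th neuron fires
            oneHot[d] = 1;
            System.out.printf("%d: in=%s unary=%s class=%s%n", d,
                java.util.Arrays.toString(input),
                java.util.Arrays.toString(unary),
                java.util.Arrays.toString(oneHot));
        }
    }
}
```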
Exercises
1. Look at the Excel file of the decoding problem.
2. Implement (in PASCAL/Java) a 4-10 perceptron which transforms the binary representation of a digit (0..9) into a decimal number. Implement the learning algorithm and train the network.
3. Which task can be learned faster (unary representation or classification)?
Exercises
5. Develop a perceptron for the recognition of the digits 0..9 (pixel representation); input layer: 3x7 input neurons. Use SNNS or JavaNNS.
6. Can we recognize numbers greater than 9 as well?
7. Develop a perceptron for the recognition of capital letters (input layer: 5x7).
Multi-Layer Perceptron
- overcomes the limits of the single-layer perceptron
- several trainable layers
- a two-layer perceptron can classify convex polygons
- a three-layer perceptron can classify arbitrary sets
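To illustrate the extra power of a second layer, the sketch below computes XOR, which the earlier slides showed to be impossible for a single-layer perceptron; the weights and thresholds are set by hand for illustration, not trained:

```java
// Sketch: a two-layer threshold network computing XOR (hand-set weights).
public class XorMlp {
    static int step(double net, double theta) { return net >= theta ? 1 : 0; }

    static int xor(int o1, int o2) {
        int h1 = step(o1 + o2, 0.5);   // hidden neuron 1: OR
        int h2 = step(o1 + o2, 1.5);   // hidden neuron 2: AND
        return step(h1 - h2, 0.5);     // output: OR and not AND = XOR
    }

    public static void main(String[] args) {
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                System.out.println(a + " XOR " + b + " = " + xor(a, b));
    }
}
```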
Feed-Forward Network
Training
[Figure: a training pattern p is applied to the input layer (o_i = p_i); hidden layer(s): neurons N_j with net_j and o_j = act_j; output layer: neurons N_k with net_k and o_k = act_k]
Backpropagation Learning Algorithm
- supervised learning
- the error is a function of the weights w_i: E(W) = E(w1, w2, ..., wn)
- we are looking for a minimal error: a hollow in the error surface
- backpropagation uses the gradient of the error surface for weight adaptation
[Figure: error surface over weight1 and weight2]
Problem
[Figure: the teaching output is known only for the output layer; the input and hidden layers have no direct error signal]
Gradient Descent
Gradient: vector orthogonal to the level curves of a surface, pointing in the direction of the strongest slope.
[Figure: gradient vectors on an error surface]
The derivative of a function in a certain direction is the projection of the gradient onto this direction.

Example: Newton approximation of √a for a = 5, using f(x) = x² − a:
tan α = f′(x) = 2x
tan α = f(x) / (x − x′), hence x′ = x − f(x)/f′(x) = ½ · (x + a/x)
x = 2
x′ = ½ · (x + 5/x) = 2.25
x″ = ½ · (x′ + 5/x′) ≈ 2.2361
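The same iteration in code; a = 5 and the starting point x = 2 follow the example above:

```java
// Newton approximation of sqrt(a): f(x) = x^2 - a, f'(x) = 2x,
// so the tangent meets the x-axis at x' = x - f(x)/f'(x) = (x + a/x)/2.
public class NewtonSqrt {
    public static void main(String[] args) {
        double a = 5.0, x = 2.0;
        for (int i = 0; i < 4; i++) {
            x = 0.5 * (x + a / x);   // next approximation
            System.out.println(x);   // 2.25, 2.2361..., ...
        }
    }
}
```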
Backpropagation Learning
- gradient-descent algorithm
- supervised learning: an error signal is used for weight adaptation
- error signal δ_j:
  - (teaching output − calculated output), if j is an output neuron
  - weighted sum of the error signals of its successors, if j is a hidden neuron
- weight adaptation: w_ij′ = w_ij + η · o_i · δ_j
  - η: learning rate
  - δ_j: error signal
Standard Backpropagation Rule
gradient descent needs the derivative of the activation function.
logistic function: f_logistic(x) = 1 / (1 + e^(−x)), with f′ = f · (1 − f), hence:

δ_j = o_j · (1 − o_j) · Σ_k δ_k · w_jk   if j is a hidden neuron
δ_j = o_j · (1 − o_j) · (t_j − o_j)      if j is an output neuron

w_ij′ = w_ij + η · o_i · δ_j
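Putting the two delta rules and the weight adaptation together, here is a compact sketch for a 2-2-1 network trained on XOR; the architecture, initialisation, learning rate and bias handling are illustrative assumptions, not values from the slides:

```java
import java.util.Random;

// Sketch of standard backpropagation for a 2-2-1 network on XOR.
public class Backprop {
    static double f(double x) { return 1.0 / (1.0 + Math.exp(-x)); } // logistic

    public static void main(String[] args) {
        Random rnd = new Random(1);
        double[][] wIH = new double[3][2]; // input + bias -> hidden
        double[] wHO = new double[3];      // hidden + bias -> output
        for (int i = 0; i < 3; i++) {
            wHO[i] = rnd.nextDouble() - 0.5;
            for (int j = 0; j < 2; j++) wIH[i][j] = rnd.nextDouble() - 0.5;
        }
        double eta = 0.5;
        double[][] in = {{0,0}, {0,1}, {1,0}, {1,1}};
        double[] t = {0, 1, 1, 0};

        for (int epoch = 0; epoch < 50000; epoch++) {
            for (int p = 0; p < 4; p++) {
                double[] o = {in[p][0], in[p][1], 1.0};       // inputs, constant bias
                double[] h = new double[3];
                for (int j = 0; j < 2; j++) {                 // forward: hidden layer
                    double net = 0;
                    for (int i = 0; i < 3; i++) net += o[i] * wIH[i][j];
                    h[j] = f(net);
                }
                h[2] = 1.0;                                   // hidden bias
                double net = 0;                               // forward: output neuron
                for (int i = 0; i < 3; i++) net += h[i] * wHO[i];
                double out = f(net);

                double dOut = out * (1 - out) * (t[p] - out); // output delta rule
                double[] dHid = new double[2];                // hidden delta rule
                for (int j = 0; j < 2; j++)
                    dHid[j] = h[j] * (1 - h[j]) * dOut * wHO[j];

                for (int i = 0; i < 3; i++)                   // w' = w + eta*o*delta
                    wHO[i] += eta * h[i] * dOut;
                for (int i = 0; i < 3; i++)
                    for (int j = 0; j < 2; j++) wIH[i][j] += eta * o[i] * dHid[j];
            }
        }
        System.out.println("output weights after training: "
            + wHO[0] + " " + wHO[1] + " " + wHO[2]);
    }
}
```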
Backpropagation
Examples:
XOR (Excel)
Bank Customer
Backpropagation: Problems
[Figure: error surface with marked problem regions]
A: flat plateau
- weight adaptation is slow
- finding a minimum takes a lot of time
C: leaving a minimum
- if the modification in one training step is too high, the minimum can be lost
Solution: Quickprop
- assumption: the error curve is a square function (a parabola)
- calculate the vertex of the parabola:
  Δw_ij(t) = S(t) / (S(t−1) − S(t)) · Δw_ij(t−1)
- slope of the error curve: S(t) = ∂E / ∂w_ij(t)
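For a single weight, the vertex jump looks like this; the growth cap and the gradient fallback of the full algorithm are omitted, and all names are assumptions of this simplified sketch:

```java
// Simplified Quickprop step for one weight:
// dw(t) = S(t) / (S(t-1) - S(t)) * dw(t-1), the vertex of the assumed parabola.
public class QuickpropStep {
    double sPrev = 0.0;   // previous slope S(t-1)
    double dwPrev = 0.1;  // previous weight change dw(t-1)

    double step(double s) {
        // (real Quickprop guards against sPrev == s and caps the growth factor)
        double dw = s / (sPrev - s) * dwPrev; // jump toward the parabola's vertex
        sPrev = s;
        dwPrev = dw;
        return dw;                            // apply as: w += dw
    }
}
```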
Solution: Rprop
step size:
b_ij(t) = η⁺ · b_ij(t−1)   if S(t−1) · S(t) > 0
b_ij(t) = η⁻ · b_ij(t−1)   if S(t−1) · S(t) < 0
b_ij(t) = b_ij(t−1)        otherwise

weight change:
Δw_ij(t) = −Δw_ij(t−1)           if S(t−1) · S(t) < 0  (*)
Δw_ij(t) = −sgn(S(t)) · b_ij(t)  otherwise

(*) S(t) is set to 0 (S(t) := 0); at time t+1 the last ('otherwise') case will be applied.
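The same case table for a single weight in code; η⁺ = 1.2 and η⁻ = 0.5 are the usual defaults, assumed here, and the variable names are this sketch's own:

```java
// Sketch of one Rprop update for a single weight; s is the current slope S(t).
public class RpropStep {
    double etaPlus = 1.2, etaMinus = 0.5;  // usual defaults (assumed)
    double b = 0.1;                        // step size b_ij
    double sPrev = 0.0, dwPrev = 0.0;

    double step(double s) {
        double dw;
        if (sPrev * s > 0) {               // slope kept its sign: speed up
            b *= etaPlus;
            dw = -Math.signum(s) * b;
        } else if (sPrev * s < 0) {        // sign change: minimum was overstepped
            b *= etaMinus;
            dw = -dwPrev;                  // revert the previous step
            s = 0.0;                       // (*) so the last case applies at t+1
        } else {
            dw = -Math.signum(s) * b;      // the 'otherwise' case
        }
        sPrev = s;
        dwPrev = dw;
        return dw;                         // apply as: w += dw
    }
}
```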
Exercise - JavaNNS
Pattern Recognition
[Figure: network with input layer, 1st hidden layer, 2nd hidden layer, output layer]
Font Example
- input: 24x24 pixel array
- output layer: 75 neurons, one neuron for each character:
  - digits
  - letters (lower case and capital)
  - separators and operator characters
Exercise
- load the network font_untrained
- train the network, using various learning algorithms (look at the SNNS documentation for the parameters and their meaning):
  - Backpropagation: η = 2.0
  - Backpropagation with momentum: η = 0.8, mu = 0.6, c = 0.1
  - Quickprop: η = 0.1, mg = 2.0, n = 0.0001
  - Rprop: δ = 0.6
Data Pre-processing
objectives:
- prospects of better results
- adaptation to algorithms
- data reduction
- trouble shooting
methods:
- selection and integration
- completion
- transformation:
  - normalization
  - coding
  - filter
Completion / Cleaning
- missing values:
  - ignore / omit the attribute
  - add values:
    - manually
    - a global constant ("missing value")
    - the average
    - the most probable value
  - remove the data set
- noisy data
- inconsistent data
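As a sketch of the "average" option above; coding missing values as NaN is an assumption of this example:

```java
import java.util.Arrays;

// Sketch: replace missing values (coded as NaN) by the attribute's average.
public class Imputation {
    public static void main(String[] args) {
        double[] col = {3.0, Double.NaN, 5.0, 4.0, Double.NaN};
        double avg = Arrays.stream(col)
                           .filter(v -> !Double.isNaN(v))
                           .average().orElse(0.0);          // average of known values
        for (int i = 0; i < col.length; i++)
            if (Double.isNaN(col[i])) col[i] = avg;         // fill the gaps
        System.out.println(Arrays.toString(col));           // [3.0, 4.0, 5.0, 4.0, 4.0]
    }
}
```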
Transformation
Normalization
Coding
Filter
Normalization of Values
- normalization, equally distributed in the range [0,1], e.g. for the logistic function:
  act = (x − minValue) / (maxValue − minValue)
- in the range [−1,+1], e.g. for the activation function tanh:
  act = (x − minValue) / (maxValue − minValue) · 2 − 1
- logarithmic normalization:
  act = (ln(x) − ln(minValue)) / (ln(maxValue) − ln(minValue))
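The three formulas as code; the method names are assumptions:

```java
// Sketch of the three normalizations from the slide.
public class Normalize {
    // range [0,1], e.g. for the logistic function
    static double toUnit(double x, double min, double max) {
        return (x - min) / (max - min);
    }
    // range [-1,+1], e.g. for tanh
    static double toSym(double x, double min, double max) {
        return (x - min) / (max - min) * 2.0 - 1.0;
    }
    // logarithmic normalization (requires positive values)
    static double toLog(double x, double min, double max) {
        return (Math.log(x) - Math.log(min)) / (Math.log(max) - Math.log(min));
    }
    public static void main(String[] args) {
        System.out.println(toUnit(5, 0, 10));   // 0.5
        System.out.println(toSym(5, 0, 10));    // 0.0
        System.out.println(toLog(5, 1, 100));   // ~0.35
    }
}
```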
Bank Customer
[Figure: customer attributes: credit history, debt, collateral, income]
Special Offer
estimated annual income per customer, given:

                 will cancel | will not cancel
gets an offer          43.80 |           66.30
gets no offer           0.00 |           72.00

problem: a test set containing 10,000 customer data records
Who will cancel? Whom to send an offer?
no mailing action: only the 9,000 customers who will not cancel generate income:
9,000 × 72.00 = 648,000
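A quick computation with the table's values shows why targeting matters: the 1,000 / 9,000 split follows from the "no mailing" figure above, while the gain/loss framing at the end is this sketch's own:

```java
// Sketch: expected annual income for different mailing strategies.
public class MailingGain {
    public static void main(String[] args) {
        int cancel = 1_000, stay = 9_000;              // implied by 9,000 x 72.00
        double none = stay * 72.00;                    // no mailing: 648,000
        double all  = cancel * 43.80 + stay * 66.30;   // offer to everyone: 640,500
        System.out.println("no mailing: " + none);
        System.out.println("mail all  : " + all);      // worse than doing nothing!
        // mailing pays off only if it mostly reaches customers who would cancel:
        double perCancel = 43.80 - 0.00;               // gain per reached canceller
        double perStay   = 66.30 - 72.00;              // loss per reached non-canceller
        System.out.println("gain per reached canceller    : " + perCancel);
        System.out.println("loss per reached non-canceller: " + perStay);
    }
}
```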
Data
[Figure: the data set, with important results and missing values marked]
Results: Data Mining Cup 2002
gain: additional income from the mailing action, if the target group is chosen according to the analysis
[Figure: enthusiasm, better results, wishes]
Data
DMC2007
Ability to Generalize
a trained net can also classify data it was not trained with
Development of an NN Application
1. build a network architecture
2. input of a training pattern
3. calculate the network output
4. compare it to the teaching output:
   - error is too high → modify weights and continue training; possibly change parameters
   - quality is good enough → use the test set data
5. evaluate the output on the test set, comparing it to the teaching output:
   - error is too high → change parameters or the architecture and train again
   - quality is good enough → the network is ready
Possible Changes
architecture of the NN:
- size of the network
- shortcut connections
- partially connected layers
- remove/add links
- receptive areas
- learning parameters
- size of layers
- using genetic algorithms
Memory Capacity
the number of patterns a network can store without generalisation.
To figure out the memory capacity:
- change the output layer into a copy of the input layer
- train the network with an increasing number of random patterns:
  - the error becomes small: the network stores all patterns
  - the error remains high: the network cannot store all patterns
  - the memory capacity lies in between
Summary
- feed-forward networks
- the perceptron (has limits)
- learning is mathematics
- backpropagation is a "backpropagation of error" algorithm; it works like gradient descent
- activation functions: logistic, tanh