
Deep Learning

Anil Saraswathy
CTO, Inapp
AI over the years

• Artificial Intelligence: a program that can sense, reason, act, and adapt
• Machine Learning: algorithms whose performance improves as they are exposed to more data over time
  • Naive Bayes
  • kNN
  • Decision Tree
  • SVM
  • K-Means
  • Dimensionality Reduction, e.g., FLD, PCA
• Deep Learning: multi-layer neural networks learn from vast amounts of data
Intuition of Neural Network
Feed-forward Neural Networks

Given training examples $(x^{(i)}, y^{(i)})$, the network computes a hypothesis $h_\theta(x^{(i)})$ via Forward Propagation.

Learning is the adjusting of the weights $w_{i,j}$ such that the cost function $J(\theta)$ is minimized (a form of Hebbian learning).
Simple learning procedure: Back Propagation (of the error signal)
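
As a minimal sketch of forward propagation, assuming sigmoid activations and arbitrary parameters (the layer sizes and values below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: compute the hypothesis h_theta(x)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # each layer applies a <- sigma(W a + b)
    return a

# A hypothetical 2-3-2 network with random parameters
rng = np.random.default_rng(42)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
print(forward(np.array([1.0, -1.0]), weights, biases))
```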
Example Application

• Handwriting Digit Recognition

A 16 × 16 image of a handwritten digit enters as 256 pixel values $x_1, \dots, x_{256}$; the machine outputs ten scores $y_1, \dots, y_{10}$, one per digit class (here it recognizes a "2").

$f: \mathbb{R}^{256} \to \mathbb{R}^{10}$

In deep learning, the function $f$ is represented by a neural network.
Element of Neural Network

A neuron computes a function $f: \mathbb{R}^K \to \mathbb{R}$:

$z = a_1 w_1 + a_2 w_2 + \cdots + a_K w_K + b \qquad a = \sigma(z)$

Here $a_1, \dots, a_K$ are the inputs, $w_1, \dots, w_K$ the weights, $b$ the bias, and $\sigma$ the activation function.
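
A minimal sketch of this single neuron in NumPy, assuming a sigmoid activation (the input, weight, and bias values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # Activation function sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a, w, b):
    # z = a1*w1 + a2*w2 + ... + aK*wK + b
    z = np.dot(a, w) + b
    # output a = sigma(z)
    return sigmoid(z)

# Illustrative inputs, weights, and bias (assumed values)
print(neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), 0.2))
```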
Neural Network

Neurons are arranged in layers: an input layer $x_1, \dots, x_N$, hidden Layers 1 through $L$, and an output layer $y_1, \dots, y_M$; each layer feeds its outputs forward as the next layer's inputs.

Deep means many hidden layers.
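
As a sketch of what "many hidden layers" means for the parameters, each layer contributes one weight matrix and one bias vector (the layer sizes below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: N=4 inputs, two hidden layers, M=3 outputs
sizes = [4, 8, 8, 3]

# One weight matrix W^l and one bias vector b^l per layer
weights = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l]))
           for l in range(len(sizes) - 1)]
biases = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

for l, (W, b) in enumerate(zip(weights, biases), start=1):
    print(f"Layer {l}: W shape {W.shape}, b shape {b.shape}")
```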


Example of Neural Network

With inputs $(1, -1)$: the first neuron has weights $(1, -2)$ and bias $1$, so $z = 1 \cdot 1 + (-1) \cdot (-2) + 1 = 4$ and $\sigma(4) = 0.98$; the second neuron has weights $(-1, 1)$ and bias $0$, so $z = 1 \cdot (-1) + (-1) \cdot 1 + 0 = -2$ and $\sigma(-2) = 0.12$.

Sigmoid Function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
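
A quick numerical check of this example (the weight/bias wiring follows the reconstruction above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, -1.0])
print(sigmoid(np.dot(x, [1.0, -2.0]) + 1.0))  # sigma(4)  ≈ 0.98
print(sigmoid(np.dot(x, [-1.0, 1.0]) + 0.0))  # sigma(-2) ≈ 0.12
```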
Example of Neural Network
1 0.73 2 0.72 3 0.51
0
-2 -1 -1
1 0 -2
-1 0.5 -2 0.12 -1 0.85
0
1 -1 4
0 0 2

𝑓: 𝑅2 → 𝑅2 1 0.62 0 0.51
𝑓 = 𝑓 =
−1 0.83 0 0.85
Different parameters define different function
Softmax as output

The outputs should behave like a probability distribution:
• $1 > y_i > 0$
• $\sum_i y_i = 1$

Softmax Layer: $y_i = \dfrac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}$

Example: for $z = (3, 1, -3)$, the exponentials are $e^3 \approx 20$, $e^1 \approx 2.7$, $e^{-3} \approx 0.05 \approx 0$, giving $y \approx (0.88, 0.12, \approx 0)$.
How to set network parameters

$\theta = \{W^1, b^1, W^2, b^2, \cdots, W^L, b^L\}$

The 16 × 16 = 256 pixels of the image are the inputs $x_1, \dots, x_{256}$ (ink → 1, no ink → 0); the softmax outputs $y_1, \dots, y_{10}$ are class scores, e.g. $y_1 = 0.1$ ("is 1"), $y_2 = 0.7$ ("is 2"), $y_{10} = 0.2$ ("is 0").

Set the network parameters $\theta$ such that:
• Input "1" → $y_1$ has the maximum value
• Input "2" → $y_2$ has the maximum value
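
A hedged sketch of turning the ten softmax outputs into a digit prediction; the output values are hypothetical, and the class ordering (digits 1 through 9, then 0) follows the slide's labeling:

```python
import numpy as np

# Hypothetical softmax outputs y1..y10 for one image
y = np.array([0.1, 0.7, 0.02, 0.02, 0.02, 0.02, 0.02, 0.05, 0.02, 0.03])

# Classes in the slide's order: digits 1..9, then 0
classes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print("is", classes[int(np.argmax(y))])  # -> is 2
```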


Training Data

• Preparing training data: images and their labels
(sample images labeled "5", "0", "4", "1", "9", "2", "1", "3")

Using the training data to find the network parameters.
Cost

Given a set of network parameters $\theta$, each example has a cost value. For an image of "1", the network outputs $y_1 = 0.2$, $y_2 = 0.3$, …, $y_{10} = 0.5$, while the target is $(1, 0, \dots, 0)$; the cost $L(\theta)$ measures how far the output is from the target.

Cost can be the Euclidean distance or the cross entropy of the network output and target.
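
A sketch of both cost choices for this example; only $y_1$, $y_2$, and $y_{10}$ are given on the slide, so the remaining output entries are assumed to be zero:

```python
import numpy as np

# Network output for an image of "1" (y1=0.2, y2=0.3, y10=0.5 from the
# slide; the other entries are assumed for illustration)
y = np.array([0.2, 0.3, 0, 0, 0, 0, 0, 0, 0, 0.5])
target = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Euclidean distance between output and target
print(np.linalg.norm(y - target))

# Cross entropy: -sum(target * log(y)); eps guards against log(0)
eps = 1e-12
print(-np.sum(target * np.log(y + eps)))
```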
Total Cost

For all training data, the total cost is

$C(\theta) = \sum_{r=1}^{R} L^r(\theta)$

where $L^r(\theta)$ is the cost of the network output $y^r$ for example $x^r$ against its label $\hat{y}^r$. This measures how bad the network parameters $\theta$ are on this task.

Find the network parameters $\theta^*$ that minimize this value.
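
Continuing the sketch, the total cost simply sums the per-example costs; cross entropy is assumed here as the per-example cost $L^r$:

```python
import numpy as np

def cross_entropy(y, target, eps=1e-12):
    return -np.sum(target * np.log(y + eps))

def total_cost(outputs, targets):
    # C(theta) = sum over r of L^r(theta)
    return sum(cross_entropy(y, t) for y, t in zip(outputs, targets))

# Toy usage with two (output, label) pairs
outputs = [np.array([0.2, 0.8]), np.array([0.9, 0.1])]
targets = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(total_cost(outputs, targets))
```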
Mini-batch: Faster, Better!

• Randomly initialize $\theta^0$
• Pick the 1st mini-batch (e.g. examples $x^1, x^{31}, \dots$):
  $C = C^1 + C^{31} + \cdots$, then $\theta^1 \leftarrow \theta^0 - \eta \nabla C(\theta^0)$
• Pick the 2nd mini-batch (e.g. examples $x^2, x^{16}, \dots$):
  $C = C^2 + C^{16} + \cdots$, then $\theta^2 \leftarrow \theta^1 - \eta \nabla C(\theta^1)$
• Continue until all mini-batches have been picked: that completes one epoch
• Repeat the above process
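
A minimal sketch of this loop; the gradient computation is passed in as a function since the slides leave backpropagation details out, and the batch size, learning rate, and toy problem are assumptions:

```python
import numpy as np

def sgd(theta, X, Y, grad_fn, eta=0.1, batch_size=32, epochs=10):
    """Mini-batch gradient descent: theta <- theta - eta * grad C(theta)."""
    rng = np.random.default_rng(0)
    R = len(X)
    for _ in range(epochs):                    # repeat the whole process
        order = rng.permutation(R)             # random mini-batch assignment
        for start in range(0, R, batch_size):  # until all batches are picked
            batch = order[start:start + batch_size]
            theta = theta - eta * grad_fn(theta, X[batch], Y[batch])
        # reaching here completes one epoch
    return theta

# Toy usage: fit y = w*x by least squares over a mini-batch
def grad(w, x, y):
    # gradient of C(w) = sum (w*x - y)^2 over the mini-batch
    return np.sum(2 * (w * x - y) * x)

X = np.linspace(0, 1, 100)
Y = 3.0 * X
print(sgd(0.0, X, Y, grad, eta=0.01))  # approaches w = 3
```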


Conclusion

• Be well versed in data structures and algorithms
• Be fluent in one or more programming languages
• Be a full-stack engineer
• Be familiar with several design approaches
• Be able to translate vague design requirements into clear specifications
• Be able to move among several abstractions at different stages of a project
• Be able to write solid code (reliable, efficient, and easy to maintain)
• Be able to communicate well with others
