INPUT
X = \begin{bmatrix} x_1^{(1)} & \cdots & x_1^{(m)} \\ \vdots & & \vdots \\ x_{n_x}^{(1)} & \cdots & x_{n_x}^{(m)} \end{bmatrix}   // shape (n_x, m): one column per training example
OUTPUT
\hat{Y} = [\, \hat{y}^{(1)} \; \cdots \; \hat{y}^{(m)} \,], where \hat{y}^{(i)} is the probability that y = 1
PARAMETERS
weights : w = \begin{bmatrix} w_1 \\ \vdots \\ w_{n_x} \end{bmatrix}   // shape (n_x, 1)
biases : b   // a single scalar
activation function : g(z)
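As a minimal NumPy sketch (the function name initialize_parameters is illustrative, not from the original), these parameters for a single logistic unit might be set up like this:

    import numpy as np

    def initialize_parameters(n_x):
        # Weights as an (n_x, 1) column vector, bias as a single scalar.
        # Zero initialization is fine for a single logistic unit.
        w = np.zeros((n_x, 1))
        b = 0.0
        return w, b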
HYPERPARAMETERS
learning rate
number of iterations
WORKING
FORWARD PROPAGATION
Computes the output A for a given X.
Z = w^T X + b
A = g(Z)   // Here, g = sigmoid
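A hedged NumPy sketch of this forward pass, assuming X has shape (n_x, m), w has shape (n_x, 1), and b is a scalar; the helper names are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_propagation(w, b, X):
        # Z = w^T X + b  -> shape (1, m); A = sigmoid(Z)
        Z = np.dot(w.T, X) + b
        A = sigmoid(Z)
        return A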
COST CALCULATION
Calculates the cost for the generated A.
J = f(Y, A)
Note that the cost function J(w, b) contains all the information about the correct answers. J can be
modeled after any distance-like function that has a minimum, such as the cross-entropy loss
-[y \log(a) + (1 - y) \log(1 - a)].
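For the cross-entropy loss above, the averaged cost could be computed as in this sketch (A and Y are assumed to have shape (1, m); the function name is illustrative):

    import numpy as np

    def compute_cost(A, Y):
        # J = -(1/m) * sum over examples of [y log(a) + (1 - y) log(1 - a)]
        m = Y.shape[1]
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        return float(cost)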
BACKWARD PROPAGATION
We want to minimize J by updating w and b. To do that, we need the derivative functions
\partial J / \partial w and \partial J / \partial b, which retain the information about Y. We derive them
from the model chosen for J and plug in the missing values for Y. Once we have \partial J / \partial w and
\partial J / \partial b, we can easily see whether to increment or decrement w and b just by plugging them in.
dZ = A - Y   // dZ depends on the choice of g(Z) and J(Y, A)
dw = \frac{1}{m} X \, dZ^T
db = \frac{1}{m} \sum_{i=1}^{m} dZ^{(i)}
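A minimal sketch of these three gradient formulas, assuming the sigmoid / cross-entropy pairing above so that dZ = A - Y:

    import numpy as np

    def backward_propagation(X, A, Y):
        # dZ = A - Y holds for the sigmoid / cross-entropy pairing above.
        m = X.shape[1]
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m      # shape (n_x, 1), matches w
        db = np.sum(dZ) / m           # scalar, matches b
        return dw, db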
PARAMETER UPDATE
Finally, we update w and b using dw and db calculated in the last step.
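One way this update might look, with alpha standing for the learning rate and num_iterations for the number of iterations from the hyperparameters section (both names are illustrative):

    def update_parameters(w, b, dw, db, alpha):
        # Gradient descent step: move against the gradient by the learning rate.
        w = w - alpha * dw
        b = b - alpha * db
        return w, b

    # A possible training loop tying the sketches above together:
    # for _ in range(num_iterations):
    #     A = forward_propagation(w, b, X)
    #     cost = compute_cost(A, Y)
    #     dw, db = backward_propagation(X, A, Y)
    #     w, b = update_parameters(w, b, dw, db, alpha)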
Shallow Neural Network
INPUT
X = \begin{bmatrix} x_1^{(1)} & \cdots & x_1^{(m)} \\ \vdots & & \vdots \\ x_{n_x}^{(1)} & \cdots & x_{n_x}^{(m)} \end{bmatrix}   // shape (n_x, m): one column per training example
W^{[1]} = \begin{bmatrix} w_1^{[1]} \\ \vdots \\ w_{n_1}^{[1]} \end{bmatrix}   // shape (n_1, n_x)
b^{[1]} = \begin{bmatrix} b_1^{[1]} \\ \vdots \\ b_{n_1}^{[1]} \end{bmatrix}   // shape (n_1, 1)
g^{[1]}(z)   // Usually ReLU
n_1 : number of units in the hidden layer
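A hedged sketch of initializing these shapes; the output-layer parameters W2 and b2 (shapes (1, n_1) and (1, 1)) are included only so the backward-propagation step below has something to reference, and the small random scaling is an assumption:

    import numpy as np

    def initialize_parameters(n_x, n_1):
        # W1: (n_1, n_x), b1: (n_1, 1); output layer W2: (1, n_1), b2: (1, 1).
        W1 = np.random.randn(n_1, n_x) * 0.01   # small random values break symmetry
        b1 = np.zeros((n_1, 1))
        W2 = np.random.randn(1, n_1) * 0.01
        b2 = np.zeros((1, 1))
        return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}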
WORKING
FORWARD PROPAGATION
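The detailed formulas for this step appear to have been lost in extraction; the sketch below shows the standard two-layer forward pass that the backward-propagation formulas further down assume (ReLU in the hidden layer, sigmoid at the output), with the intermediate values saved in a cache for back propagation:

    import numpy as np

    def forward_propagation(X, params):
        # Z1 = W1 X + b1, A1 = g[1](Z1) with g[1] = ReLU,
        # Z2 = W2 A1 + b2, A2 = g[2](Z2) with g[2] = sigmoid.
        Z1 = np.dot(params["W1"], X) + params["b1"]
        A1 = np.maximum(0, Z1)
        Z2 = np.dot(params["W2"], A1) + params["b2"]
        A2 = 1.0 / (1.0 + np.exp(-Z2))
        cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
        return A2, cache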
COST CALCULATION
J = f(Y, A)
BACKWARD PROPAGATION
The cache from forward propagation is passed as an input to the hidden layer during back propagation.
dZ^{[2]} = A^{[2]} - Y   // as before, depends on the choice of g(Z) and J(Y, A)
dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}
db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}
dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]})   // * is element-wise
dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T
db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
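A minimal NumPy sketch of these six formulas, assuming a sigmoid output with cross-entropy cost (so dZ^[2] = A^[2] - Y) and a ReLU hidden layer (so g^[1]'(Z^[1]) is 1 where Z^[1] > 0 and 0 elsewhere); the cache and params names follow the earlier sketches:

    import numpy as np

    def backward_propagation(X, Y, params, cache):
        m = X.shape[1]
        A1, A2, Z1 = cache["A1"], cache["A2"], cache["Z1"]
        dZ2 = A2 - Y                                    # sigmoid output + cross-entropy cost
        dW2 = np.dot(dZ2, A1.T) / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = np.dot(params["W2"].T, dZ2) * (Z1 > 0)    # (Z1 > 0) is g[1]'(Z1) for ReLU
        dW1 = np.dot(dZ1, X.T) / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m
        return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}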
PARAMETER UPDATE
Finally, the parameters are updated using the calculated derivatives.
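As with the single-unit case, a sketch of this update, with alpha again denoting the learning rate:

    def update_parameters(params, grads, alpha):
        # Same gradient descent rule as before, applied to both layers.
        params["W1"] -= alpha * grads["dW1"]
        params["b1"] -= alpha * grads["db1"]
        params["W2"] -= alpha * grads["dW2"]
        params["b2"] -= alpha * grads["db2"]
        return params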