
3 Types of Gradient Descent Algorithms for Small & Large Data Sets
Machine Learning (http://blog.hackerearth.com/machine-learning)
 March 7, 2017


   40 5

Introduction


Gradient descent is an iterative algorithm for finding a minimum of an objective function (cost function) J(θ). It is widely used in machine learning for minimizing functions. Its variants differ in accuracy and in how much computation each update costs, and both factors are discussed in detail below. This tutorial explains how the algorithm reaches that objective and when to prefer each variant.

Why use the gradient descent algorithm?


We use gradient descent to minimize functions like J(θ). In gradient descent, our first step is to initialize the parameters to some values and then keep changing these values until we reach the global minimum. In every iteration, we calculate the derivative of the cost function and update the values of all parameters simultaneously using the rule

θ_j := θ_j − α · ∂J(θ)/∂θ_j

where 'α' is the learning rate.
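To make the update rule concrete, here is a single illustrative step with made-up numbers (θ = 2, α = 0.1, and a gradient of 4 at that point); the values are chosen only for this example:

theta = 2.0                     # current parameter value (made up for the example)
alpha = 0.1                     # learning rate
grad = 4.0                      # value of dJ/dtheta at theta = 2.0
theta = theta - alpha * grad    # theta becomes 1.6: one small step downhill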

We will use linear regression as the running example in this article while talking about gradient descent, although the ideas apply to other algorithms too, such as

Logistic regression

Neural networks

In linear regression we have a hypothesis function:

h_θ(x) = θ_0 + θ_1 · x

where θ_0 and θ_1 are the parameters and x is the input feature. In order to fit the model, we try to find the parameters such that the hypothesis fits the data in the best possible way. To find the values of the parameters we define a cost function J(θ) and use gradient descent to minimize it.

Cost function (ordinary least squares error)

J(θ_0, θ_1) = (1 / 2m) · Σ_{i=1…m} (h_θ(x^(i)) − y^(i))²

Gradient of the cost function

∂J/∂θ_0 = (1 / m) · Σ_{i=1…m} (h_θ(x^(i)) − y^(i))
∂J/∂θ_1 = (1 / m) · Σ_{i=1…m} (h_θ(x^(i)) − y^(i)) · x^(i)

Plot between the parameters and the cost function: J(θ) forms a bowl-shaped surface over (θ_0, θ_1), and gradient descent walks down this surface towards its minimum.
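As a quick illustration of these formulas, the snippet below computes J(θ) and its gradient for a tiny one-feature dataset with NumPy; the data and variable names are made up for this example:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # input feature
y = np.array([2.0, 4.1, 6.0, 8.2])   # target values
t0, t1 = 0.0, 1.0                    # current parameters theta_0, theta_1
m = len(x)

h = t0 + t1 * x                                  # hypothesis h_theta(x) for every example
J = (1.0 / (2 * m)) * np.sum((h - y) ** 2)       # cost (ordinary least squares)
grad0 = (1.0 / m) * np.sum(h - y)                # dJ/dtheta_0
grad1 = (1.0 / m) * np.sum((h - y) * x)          # dJ/dtheta_1
print(J, grad0, grad1)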

How does the gradient descent algorithm work?


The following pseudo-code explains how it works:

1. Initialize the parameters with some values (say θ_0 = 0, θ_1 = 0).

2. Keep changing these values iteratively in such a way that they minimize the objective function J(θ).
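To make these two steps concrete, here is a minimal, self-contained sketch that minimizes the toy objective J(θ) = (θ − 3)²; the objective and the numbers are made up purely for illustration:

alpha = 0.1                       # learning rate
theta = 0.0                       # step 1: initialize the parameter with some value
for _ in range(100):              # step 2: keep changing theta to reduce J(theta)
    grad = 2 * (theta - 3)        # dJ/dtheta for this toy objective
    theta = theta - alpha * grad  # move a small step against the gradient
print(theta)                      # approaches 3, the minimizer of J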

Types of Gradient Descent Algorithms


The variants of gradient descent are defined by how much of the data we use to calculate the derivative of the cost function in each update. Depending upon the amount of data used, the time complexity and accuracy of the algorithms differ from each other.

1. Batch Gradient Descent

2. Stochastic Gradient Descent


3. Mini-Batch Gradient Descent

How does batch gradient descent work?


It is the first, basic type of gradient descent, in which we use the complete available dataset to compute the gradient of the cost function.
Since we need to calculate the gradient on the whole dataset to perform just one update, batch gradient descent can be very slow and is intractable for datasets that don't fit in memory. After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation:

θ_j := θ_j − α · (1/m) · Σ_{i=1…m} (h_θ(x^(i)) − y^(i)) · x_j^(i)

where 'm' is the number of training examples.

If you have 300,000,000 records, you need to stream all of them from disk for every update, because they cannot all be held in memory.

After computing the sum over all examples for one iteration, we take a single step.

Then we repeat this for every step.

This means it takes a long time to converge.

Especially because disk I/O is typically a system bottleneck anyway, this approach inevitably requires a huge number of reads.

http://blog.hackerearth.com/3-types-gradient-descent-algorithms-small-large-data-sets 6/17
1/20/2018 3 Types of Gradient Descent Algorithms for Small & Large Data Sets | HackerEarth Blog

Contour plot: the path of the parameters after every iteration.

Batch gradient descent is not suitable for huge datasets. The code below shows how to implement batch gradient descent in Python.

import numpy as np


def gradient_descent(alpha, x, y, ep=0.0001, max_iter=10000):
    converged = False
    num_iter = 0
    m = x.shape[0]  # number of samples

    # initial parameters theta_0 and theta_1
    t0 = np.random.random()
    t1 = np.random.random()

    # total error, J(theta)
    J = sum((t0 + t1 * x[i] - y[i]) ** 2 for i in range(m))

    while not converged:
        # gradient of the cost function w.r.t. t0 and t1, averaged over all samples
        grad0 = 1.0 / m * sum(t0 + t1 * x[i] - y[i] for i in range(m))
        grad1 = 1.0 / m * sum((t0 + t1 * x[i] - y[i]) * x[i] for i in range(m))

        # compute the updates in temporary variables ...
        temp0 = t0 - alpha * grad0
        temp1 = t1 - alpha * grad1

        # ... and only then update both parameters simultaneously
        t0 = temp0
        t1 = temp1

        # sum of squared errors with the new parameters
        e = sum((t0 + t1 * x[i] - y[i]) ** 2 for i in range(m))

        if abs(J - e) <= ep:
            print('Converged, iterations:', num_iter)
            converged = True

        J = e             # update error
        num_iter += 1     # update iteration counter

        if num_iter == max_iter:
            print('Max iterations exceeded!')
            converged = True

    return t0, t1

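The post does not show the function being called; the following is one way it might be used, with synthetic data generated purely for the example:

import numpy as np

# synthetic data: y is roughly 2 + 3*x plus a little noise
np.random.seed(0)
x = np.random.rand(100)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(100)

t0, t1 = gradient_descent(alpha=0.1, x=x, y=y, ep=1e-6, max_iter=10000)
print('intercept:', t0, 'slope:', t1)   # should end up close to 2 and 3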

How does stochastic gradient descent work?


Batch gradient descent turns out to be a slow algorithm, so for faster computation we prefer stochastic gradient descent (SGD).
The first step of the algorithm is to shuffle the whole training set. Then, to update the parameters, we use only one training example per iteration to compute the gradient of the cost function. Because it uses a single training example per update, SGD is faster on large datasets. SGD may not reach the same accuracy, but the results are computed faster.
After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation, applied to one example i at a time:

θ_j := θ_j − α · (h_θ(x^(i)) − y^(i)) · x_j^(i),   for i = 1, …, m

where 'm' is the number of training examples.

Following is the pseudo-code for stochastic gradient descent (a sketch implementation follows below):

1. Randomly shuffle the training set.

2. In the inner loop:

First step: pick the first training example and update the parameters using only this example.

Second step: pick the second training example and update the parameters using only that example, and so on up to the m-th example.

3. Repeat until we get close to the global minimum.


SGD never actually converges the way batch gradient descent does; instead, it ends up wandering around some region close to the global minimum.
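A minimal SGD sketch for the same linear regression setup, written to mirror the batch version above; this code is my own illustration, not from the original gist:

import numpy as np

def sgd(alpha, x, y, num_epochs=50):
    """Stochastic gradient descent for simple linear regression y ~ t0 + t1*x."""
    m = x.shape[0]
    t0, t1 = np.random.random(), np.random.random()

    for epoch in range(num_epochs):
        # visit the training examples in a random order each epoch
        for i in np.random.permutation(m):
            # gradient of the cost for this single example
            error = t0 + t1 * x[i] - y[i]
            # update both parameters using only this one example
            t0, t1 = t0 - alpha * error, t1 - alpha * error * x[i]
    return t0, t1

# example call with the same synthetic data as before:
# t0, t1 = sgd(alpha=0.01, x=x, y=y)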

How does mini-batch gradient descent work?


Mini-batch gradient descent is the most widely used of the three and gives both precise and fast results by working on a small batch of training examples at a time. Rather than using the complete dataset, in every iteration we use a subset of 'b' training examples, called a batch, to compute the gradient of the cost function. Common mini-batch sizes range between 50 and 256, but can vary for different applications.
In this way, the algorithm

reduces the variance of the parameter updates, which can lead to more stable convergence;

can make use of highly optimized matrix operations, which makes computing the gradient very efficient.

After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation, applied to one batch of b consecutive examples at a time:

θ_j := θ_j − α · (1/b) · Σ_{k=i…i+b−1} (h_θ(x^(k)) − y^(k)) · x_j^(k)

where 'b' is the mini-batch size and 'm' is the total number of training examples.
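A sketch of the mini-batch variant under the same assumptions; the batch size of 64 and the function name are my own choices for the example:

import numpy as np

def minibatch_gd(alpha, x, y, b=64, num_epochs=50):
    """Mini-batch gradient descent for simple linear regression y ~ t0 + t1*x."""
    m = x.shape[0]
    t0, t1 = np.random.random(), np.random.random()

    for epoch in range(num_epochs):
        indices = np.random.permutation(m)          # shuffle once per epoch
        for start in range(0, m, b):
            batch = indices[start:start + b]        # indices of the next mini-batch
            error = t0 + t1 * x[batch] - y[batch]   # vectorized over the whole batch
            grad0 = error.mean()                    # average gradient w.r.t. t0
            grad1 = (error * x[batch]).mean()       # average gradient w.r.t. t1
            t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1
    return t0, t1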

Some of the important points to remember are:

Updating parameters simultaneously-

While implementing the algorithm, the parameters should be updated simultaneously. This means the new values should first be stored in temporary variables and only then assigned back to the parameters, as in the fragment below.
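A minimal illustration of the difference; the toy objective and its partial derivatives here are made up purely for this example:

# Toy objective J(t0, t1) = (t0 - 1)**2 + (t0*t1 - 2)**2, made up for this example.
d_dt0 = lambda t0, t1: 2 * (t0 - 1) + 2 * (t0 * t1 - 2) * t1   # dJ/dt0
d_dt1 = lambda t0, t1: 2 * (t0 * t1 - 2) * t0                   # dJ/dt1

alpha, t0, t1 = 0.1, 0.5, 0.5

# Correct: both partial derivatives are evaluated at the OLD (t0, t1),
# stored in temporaries, and only then assigned back.
temp0 = t0 - alpha * d_dt0(t0, t1)
temp1 = t1 - alpha * d_dt1(t0, t1)
t0, t1 = temp0, temp1

# Incorrect (sequential update): t0 is overwritten first, so the second
# line would evaluate the derivative at the new t0 instead of the old one.
# t0 = t0 - alpha * d_dt0(t0, t1)
# t1 = t1 - alpha * d_dt1(t0, t1)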

Learning rate 'α'-

α is a crucial parameter that controls how large the steps taken by the algorithm are.

1. If α is too large, the algorithm takes large steps, may overshoot the minimum, and may not converge.

2. If α is small, the steps are smaller and convergence is easier, but it takes more iterations.


Checking that gradient descent is working-

Plot the value of the cost function against the number of iterations. This plot helps to identify whether gradient descent is working properly or not.

"J(θ) should decrease after every iteration and should become constant (or converge) after some iterations."

This holds because after every iteration of gradient descent, θ takes values that move J(θ) further towards the minimum, i.e. the value of J(θ) decreases after every iteration.
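One way to produce such a plot is to record J(θ) after every iteration and hand the history to matplotlib; the sketch below does this for batch gradient descent on a small synthetic dataset, and the data, variable names, and choice of learning rates are my own for the example:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
x = np.random.rand(100)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(100)
m = len(x)

def cost_history(alpha, num_iters=200):
    """Run batch gradient descent and record J(theta) after every iteration."""
    t0, t1 = 0.0, 0.0
    history = []
    for _ in range(num_iters):
        h = t0 + t1 * x
        grad0 = (h - y).mean()
        grad1 = ((h - y) * x).mean()
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1
        history.append(((t0 + t1 * x - y) ** 2).sum() / (2 * m))
    return history

# J should fall and flatten out; larger alphas converge in fewer iterations,
# but a value that is too large can make J(theta) oscillate or diverge.
for alpha in (0.01, 0.1, 1.0):
    plt.plot(cost_history(alpha), label='alpha = {}'.format(alpha))
plt.xlabel('Number of iterations')
plt.ylabel('J(theta)')
plt.legend()
plt.show()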


Figure: J(θ) decreases with every iteration.

Variation in gradient descent with learning rate-


Summary
In this article, we learned the basics of the gradient descent algorithm and its types. These optimization algorithms are widely used in neural networks these days, so they are important to understand. The image below shows a quick comparison of all 3 types of gradient descent algorithms:

(http://blog.hackerearth.com/wp-content/uploads/2017/02/Gradient_Descent_Types.png)


   40 5

ABOUT THE AUTHOR



Anubhav Anushi Tushar (http://blog.hackerearth.com/author/anubhav?post) (mailto: anubhavaron000051@gmail.com)

3 learners pursuing graduation (DTUite)

