
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10; Total marks: 10 × 2 = 20
______________________________________________________________________________

QUESTION 1:
The gradient along a particular dimension points in the same direction across successive updates. What can you say about the momentum term for that dimension? Choose the correct option.

a. The momentum term increases
b. The momentum term decreases
c. Cannot comment
d. It remains the same

Correct Answer: a

Detailed Solution:

When we push a ball down a hill, the ball accumulates momentum as it rolls downhill, becoming faster and faster on the way (until it reaches its terminal velocity, if there is air resistance, i.e. γ < 1). The analogy is the same for parameter updates: when successive gradients point in the same direction, the momentum term keeps growing. Hence option a.
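
As a quick illustrative sketch (plain Python; the values of γ and η are made up, not from the course), the momentum term keeps growing as long as the gradient keeps the same sign:

    gamma, eta = 0.9, 0.1        # illustrative momentum coefficient and learning rate
    grad = 1.0                   # gradient along one dimension, same sign every step
    v = 0.0
    for step in range(5):
        v = gamma * v + eta * grad   # momentum update: v_t = gamma*v_{t-1} + eta*grad
        print(step, round(v, 5))     # 0.1, 0.19, 0.271, 0.3439, 0.40951 -- steadily increasing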

______________________________________________________________________________

QUESTION 2:
Comment on the learning rate of Adagrad. Choose the correct option.
a. Learning rate is adaptive
b. Learning rate increases for each time step
c. Learning rate remains the same for each update
d. None of the above

Correct Answer: a

Detailed Solution:

Adagrad is an adaptive learning rate method: each parameter gets its own learning rate, adapted from that parameter's gradient history.
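
A minimal sketch of the idea (NumPy; the toy gradient function and constants are assumptions for illustration): each parameter divides the base learning rate by the root of its own accumulated squared gradients, so every parameter effectively gets its own learning rate:

    import numpy as np

    theta = np.array([1.0, 1.0])
    G = np.zeros_like(theta)            # per-parameter accumulator of squared gradients
    eta, eps = 0.1, 1e-8

    def grad(theta):
        # toy objective whose first dimension is steep and second is shallow
        return np.array([10.0 * theta[0], 0.1 * theta[1]])

    for _ in range(3):
        g = grad(theta)
        G += g ** 2
        theta = theta - eta / np.sqrt(G + eps) * g   # per-parameter effective step size
    print(theta)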

______________________________________________________________________________

QUESTION 3:
Adagrad has its own limitations. Which of the following is one of them?
a. Accumulation of the positive squared gradients in the denominator
b. Overshooting minima
c. Learning rate increases, thus hindering convergence and causing the loss function
to fluctuate around the minimum or even diverge
d. Getting trapped in local minima

Correct Answer: a

Detailed Solution:

Accumulation of the squared gradients in the denominator is a problem because every added term is
positive, so the accumulated sum keeps growing during training. This in turn causes the learning rate
to shrink and eventually become infinitesimally small.
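
A short numeric illustration (made-up constants) of why the effective step η/√G can only shrink, since the accumulator G never decreases:

    import math

    eta, G = 0.1, 0.0
    for t in range(1, 6):
        g = 1.0                          # assume the gradient magnitude stays the same
        G += g ** 2                      # accumulator only grows
        print(t, round(eta / math.sqrt(G + 1e-8), 4))
    # effective step: 0.1, 0.0707, 0.0577, 0.05, 0.0447 -- monotonically shrinking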

______________________________________________________________________________

QUESTION 4:
What is the full form of RMSProp?
a. Retain Momentum Propagation
b. Round Mean Square Propagation
c. Root Mean Square Propagation
d. None of the above

Correct Answer: c

Detailed Solution:

RMSProp stands for Root Mean Square Propagation.

______________________________________________________________________________

QUESTION 5:
RMSProp resolves the limitation of which optimizer?
a. Adagrad
b. Momentum
c. Both a and b
d. Neither a nor b

Correct Answer: a

Detailed Solution:

RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving
average of the squared gradients instead of an ever-growing sum. It uses the magnitude of recent
gradients to normalize the current gradient.
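
A hedged sketch of the update (NumPy; ρ, η and the toy gradient are illustrative assumptions): the squared gradients enter an exponentially decaying average rather than a growing sum, and the current gradient is normalized by the root of that average:

    import numpy as np

    rho, eta, eps = 0.9, 0.01, 1e-8
    theta = np.array([1.0, 1.0])
    Eg2 = np.zeros_like(theta)                # running average of squared gradients

    def grad(theta):
        return 2.0 * theta                    # toy gradient of f(theta) = ||theta||^2

    for _ in range(100):
        g = grad(theta)
        Eg2 = rho * Eg2 + (1 - rho) * g ** 2  # moving average, old history decays away
        theta = theta - eta / np.sqrt(Eg2 + eps) * g
    print(theta)                              # moves toward the minimum at the origin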

____________________________________________________________________________

QUESTION 6:
Which of the following statements is true?

a. The gradient update rules for the Momentum optimizer and the Nesterov Accelerated
Gradient (NAG) optimizer are the same
b. The Momentum optimizer and the Nesterov Accelerated Gradient (NAG) optimizer
perform differently irrespective of learning rates
c. The possibility of oscillations is lower with Nesterov Accelerated Gradient (NAG) than
with the Momentum optimizer
d. None of the above

Correct Answer: c

Detailed Solution:

We can view Nesterov Accelerated Gradient (NAG) as a correction factor for the Momentum
optimizer. If the added velocity leads to a high loss, the momentum method can be very slow, since the
optimization path it takes exhibits large oscillations. In NAG, if the added velocity (which is used
to compute the intermediate, look-ahead parameters) leads to a bad loss, the gradient evaluated at that
look-ahead point directs the update back towards the last position. This helps NAG avoid oscillations.
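
The difference can be written as a small sketch (illustrative constants; the toy quadratic is an assumption): Momentum evaluates the gradient at the current parameters, while NAG first applies the accumulated velocity and evaluates the gradient at that look-ahead point, which pulls the update back when the velocity is heading somewhere bad:

    gamma, eta = 0.9, 0.1

    def momentum_step(theta, v, grad_fn):
        v = gamma * v + eta * grad_fn(theta)        # gradient at the current position
        return theta - v, v

    def nag_step(theta, v, grad_fn):
        lookahead = theta - gamma * v               # intermediate (look-ahead) parameters
        v = gamma * v + eta * grad_fn(lookahead)    # gradient at the look-ahead point
        return theta - v, v

    grad_fn = lambda x: 2.0 * x                     # toy 1-D quadratic f(x) = x^2
    theta, v = 5.0, 0.0
    for _ in range(50):
        theta, v = nag_step(theta, v, grad_fn)
    print(theta)                                    # approaches the minimum at 0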

_____________________________________________________________________________

QUESTION 7:
The following is the equation of the update vector for the momentum optimizer:
v_t = γ·v_{t−1} + η·∇_θ J(θ)
What is the range of γ?
a. 0 and 1
b. >0
c. >=0
d. >=1

Correct Answer: a

Detailed Solution:

A fraction of the update vector of the past time step is added to the current update vector. γ is
that fraction which lies between 0 and 1.
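
As a small worked check (made-up numbers): with a constant gradient g, the recursion v_t = γ·v_{t−1} + η·g converges to the finite limit η·g/(1 − γ) when γ lies between 0 and 1, but blows up for γ ≥ 1. This is the "terminal velocity" mentioned in the solution to Question 1:

    gamma, eta, g = 0.9, 0.1, 1.0
    v = 0.0
    for _ in range(200):
        v = gamma * v + eta * g
    print(v)                         # approaches 1.0
    print(eta * g / (1 - gamma))     # closed-form limit, finite only for gamma < 1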

______________________________________________________________________________

QUESTION 8:
Why is it required at all to choose different learning rates for different weights?

a. To avoid the problem of diminishing learning rate
b. To avoid overshooting the optimum point
c. To reduce vertical oscillations while navigating the optimum point
d. This would aid in reaching the optimum point faster

Correct Answer: d

Detailed Solution:

With an adaptive learning rate, the learning rate is reduced for parameters with large
gradients and increased for parameters with small gradients. This helps in reaching the optimum
point much faster, which is the benefit of choosing a different learning rate for each weight.
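
A toy illustration (invented gradient values) of the benefit: with a single global learning rate the weight with the tiny gradient barely moves, whereas scaling each weight by the root of its own squared gradient (the Adagrad-style idea from Question 2) gives both weights comparable steps:

    import numpy as np

    g = np.array([10.0, 0.01])       # gradients of a steep and a flat dimension
    eta = 0.1
    plain_step = eta * g                                  # one learning rate for all weights
    adaptive_step = eta * g / np.sqrt(g ** 2 + 1e-8)      # per-weight normalization
    print(plain_step)      # [1.0, 0.001]  -> the flat dimension crawls
    print(adaptive_step)   # [~0.1, ~0.1]  -> both dimensions make similar progress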

______________________________________________________________________________

QUESTION 9:
What is the major drawback of setting a large learning rate for updating weight parameter for
Gradient Descent?

a. Slower convergence
b. Stuck in local minima
c. Overshoots optimum point
d. None of the above

Correct Answer: c

Detailed Solution:

Too large a learning rate makes the updates overshoot the optimum point, causing the loss function
to fluctuate around it or even diverge.
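
A minimal demonstration (toy quadratic, assumed values) of the overshoot: gradient descent on f(x) = x², whose gradient is 2x, is stable only for learning rates below 1; a large learning rate makes the iterates jump over the minimum and diverge:

    def run(eta, steps=10, x=1.0):
        for _ in range(steps):
            x = x - eta * 2.0 * x    # gradient descent step on f(x) = x^2
        return x

    print(run(0.1))    # shrinks toward the minimum at 0
    print(run(1.5))    # overshoots each step; |x| grows without bound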

____________________________________________________________________________

QUESTION 10:
For a gradient of smaller magnitude, what should be the suggested learning rate for
updating the weights?

a. Small
b. Large
c. Cannot comment
d. Same learning rate for small and large gradient magnitudes

Correct Answer: b

Detailed Solution:

For a smaller gradient magnitude, the learning rate should be large so that the optimum is reached quickly.

______________________________________________________________________

************END*******
