
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10; Total marks: 10 × 2 = 20
______________________________________________________________________________

QUESTION 1:
The gradient along a particular dimension points in the same direction across successive updates. What can you say about the momentum term for that dimension? Choose the correct option.

a. The momentum term increases
b. The momentum term decreases
c. Cannot comment
d. It remains the same

Correct Answer: a

Detailed Solution:

When we push a ball down a hill, the ball accumulates momentum as it rolls downhill, becoming faster and faster on the way (until it reaches its terminal velocity, if there is air resistance, i.e. γ < 1). The analogy is the same for parameter updates: when successive gradients point in the same direction, the momentum term keeps growing. Hence option a.
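
As a quick illustrative sketch (plain Python; the values of γ and η are made up, not from the course), the momentum term keeps growing as long as the gradient keeps the same sign:

    gamma, eta = 0.9, 0.1        # illustrative momentum coefficient and learning rate
    grad = 1.0                   # gradient along one dimension, same sign every step
    v = 0.0
    for step in range(5):
        v = gamma * v + eta * grad   # momentum update: v_t = gamma*v_{t-1} + eta*grad
        print(step, round(v, 5))     # 0.1, 0.19, 0.271, 0.3439, 0.40951 -- steadily increasing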

______________________________________________________________________________

QUESTION 2:
Comment on the learning rate of Adagrad. Choose the correct option.
a. Learning rate is adaptive
b. Learning rate increases for each time step
c. Learning rate remains the same for each update
d. None of the above

Correct Answer: a

Detailed Solution:

Adagrad is an adaptive learning rate method: each parameter gets its own learning rate, adapted from that parameter's gradient history.
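
A minimal sketch of the idea (NumPy; the toy gradient function and constants are assumptions for illustration): each parameter divides the base learning rate by the root of its own accumulated squared gradients, so every parameter effectively gets its own learning rate:

    import numpy as np

    theta = np.array([1.0, 1.0])
    G = np.zeros_like(theta)            # per-parameter accumulator of squared gradients
    eta, eps = 0.1, 1e-8

    def grad(theta):
        # toy objective whose first dimension is steep and second is shallow
        return np.array([10.0 * theta[0], 0.1 * theta[1]])

    for _ in range(3):
        g = grad(theta)
        G += g ** 2
        theta = theta - eta / np.sqrt(G + eps) * g   # per-parameter effective step size
    print(theta)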

______________________________________________________________________________

QUESTION 3:
Adagrad has its own limitations. Which of the following is one of them?
a. Accumulation of the positive squared gradients in the denominator
b. Overshooting minima
c. Learning rate increases, thus hindering convergence and causing the loss function
to fluctuate around the minimum or even diverge
d. Getting trapped in local minima

Correct Answer: a

Detailed Solution:

Accumulation of the squared gradients in the denominator is a problem because every added term is
positive, so the accumulated sum keeps growing during training. This in turn causes the learning rate
to shrink and eventually become infinitesimally small.
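
A short numeric illustration (made-up constants) of why the effective step η/√G can only shrink, since the accumulator G never decreases:

    import math

    eta, G = 0.1, 0.0
    for t in range(1, 6):
        g = 1.0                          # assume the gradient magnitude stays the same
        G += g ** 2                      # accumulator only grows
        print(t, round(eta / math.sqrt(G + 1e-8), 4))
    # effective step: 0.1, 0.0707, 0.0577, 0.05, 0.0447 -- monotonically shrinking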

______________________________________________________________________________

QUESTION 4:
What is the full form of RMSProp?
a. Retain Momentum Propagation
b. Round Mean Square Propagation
c. Root Mean Square Propagation
d. None of the above

Correct Answer: c

Detailed Solution:

RMSProp stands for Root Mean Square Propagation.

______________________________________________________________________________

QUESTION 5:
RMSProp resolves the limitation of which optimizer?
a. Adagrad
b. Momentum
c. Both a and b
d. Neither a nor b

Correct Answer: a

Detailed Solution:

RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving
average of the squared gradients instead of an ever-growing sum. It uses the magnitude of recent
gradients to normalize the current gradient.
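
A hedged sketch of the update (NumPy; ρ, η and the toy gradient are illustrative assumptions): the squared gradients enter an exponentially decaying average rather than a growing sum, and the current gradient is normalized by the root of that average:

    import numpy as np

    rho, eta, eps = 0.9, 0.01, 1e-8
    theta = np.array([1.0, 1.0])
    Eg2 = np.zeros_like(theta)                # running average of squared gradients

    def grad(theta):
        return 2.0 * theta                    # toy gradient of f(theta) = ||theta||^2

    for _ in range(100):
        g = grad(theta)
        Eg2 = rho * Eg2 + (1 - rho) * g ** 2  # moving average, old history decays away
        theta = theta - eta / np.sqrt(Eg2 + eps) * g
    print(theta)                              # moves toward the minimum at the origin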

____________________________________________________________________________

QUESTION 6:
Which of the following statements is true?

a. The gradient update rules for the Momentum optimizer and the Nesterov Accelerated
Gradient (NAG) optimizer are the same
b. The Momentum optimizer and the Nesterov Accelerated Gradient (NAG) optimizer
perform differently irrespective of learning rates
c. The possibility of oscillations is lower with Nesterov Accelerated Gradient (NAG) than
with the Momentum optimizer
d. None of the above

Correct Answer: c

Detailed Solution:

We can view Nesterov Accelerated Gradient (NAG) as a correction factor for the Momentum
optimizer. If the added velocity leads to a high loss, the momentum method can be very slow, since the
optimization path it takes exhibits large oscillations. In NAG, if the added velocity (which is used
to compute the intermediate, look-ahead parameters) leads to a bad loss, the gradient evaluated at that
look-ahead point directs the update back towards the last position. This helps NAG avoid oscillations.
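
The difference can be written as a small sketch (illustrative constants; the toy quadratic is an assumption): Momentum evaluates the gradient at the current parameters, while NAG first applies the accumulated velocity and evaluates the gradient at that look-ahead point, which pulls the update back when the velocity is heading somewhere bad:

    gamma, eta = 0.9, 0.1

    def momentum_step(theta, v, grad_fn):
        v = gamma * v + eta * grad_fn(theta)        # gradient at the current position
        return theta - v, v

    def nag_step(theta, v, grad_fn):
        lookahead = theta - gamma * v               # intermediate (look-ahead) parameters
        v = gamma * v + eta * grad_fn(lookahead)    # gradient at the look-ahead point
        return theta - v, v

    grad_fn = lambda x: 2.0 * x                     # toy 1-D quadratic f(x) = x^2
    theta, v = 5.0, 0.0
    for _ in range(50):
        theta, v = nag_step(theta, v, grad_fn)
    print(theta)                                    # approaches the minimum at 0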

_____________________________________________________________________________

QUESTION 7:
The following is the equation of the update vector for the momentum optimizer:
v_t = γ·v_{t−1} + η·∇_θ J(θ)
What is the range of γ?
a. 0 and 1
b. >0
c. >=0
d. >=1

Correct Answer: a

Detailed Solution:

A fraction of the update vector of the past time step is added to the current update vector. γ is
that fraction which lies between 0 and 1.
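
As a small worked check (made-up numbers): with a constant gradient g, the recursion v_t = γ·v_{t−1} + η·g converges to the finite limit η·g/(1 − γ) when γ lies between 0 and 1, but blows up for γ ≥ 1. This is the "terminal velocity" mentioned in the solution to Question 1:

    gamma, eta, g = 0.9, 0.1, 1.0
    v = 0.0
    for _ in range(200):
        v = gamma * v + eta * g
    print(v)                         # approaches 1.0
    print(eta * g / (1 - gamma))     # closed-form limit, finite only for gamma < 1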

______________________________________________________________________________

QUESTION 8:
Why is it required at all to choose different learning rates for different weights?

a. To avoid the problem of diminishing learning rate
b. To avoid overshooting the optimum point
c. To reduce vertical oscillations while navigating the optimum point
d. This would aid in reaching the optimum point faster

Correct Answer: d

Detailed Solution:

With an adaptive learning rate, the learning rate is reduced for parameters with large
gradients and increased for parameters with small gradients. This helps in reaching the optimum
point much faster, which is the benefit of choosing a different learning rate for each weight.
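
A toy illustration (invented gradient values) of the benefit: with a single global learning rate the weight with the tiny gradient barely moves, whereas scaling each weight by the root of its own squared gradient (the Adagrad-style idea from Question 2) gives both weights comparable steps:

    import numpy as np

    g = np.array([10.0, 0.01])       # gradients of a steep and a flat dimension
    eta = 0.1
    plain_step = eta * g                                  # one learning rate for all weights
    adaptive_step = eta * g / np.sqrt(g ** 2 + 1e-8)      # per-weight normalization
    print(plain_step)      # [1.0, 0.001]  -> the flat dimension crawls
    print(adaptive_step)   # [~0.1, ~0.1]  -> both dimensions make similar progress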

______________________________________________________________________________

QUESTION 9:
What is the major drawback of setting a large learning rate for updating weight parameter for
Gradient Descent?

a. Slower convergence
b. Stuck in local minima
c. Overshoots optimum point
d. None of the above

Correct Answer: c

Detailed Solution:

Too large a learning rate makes the updates overshoot the optimum point, causing the loss function
to fluctuate around it or even diverge.
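
A minimal demonstration (toy quadratic, assumed values) of the overshoot: gradient descent on f(x) = x², whose gradient is 2x, is stable only for learning rates below 1; a large learning rate makes the iterates jump over the minimum and diverge:

    def run(eta, steps=10, x=1.0):
        for _ in range(steps):
            x = x - eta * 2.0 * x    # gradient descent step on f(x) = x^2
        return x

    print(run(0.1))    # shrinks toward the minimum at 0
    print(run(1.5))    # overshoots each step; |x| grows without bound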

____________________________________________________________________________

QUESTION 10:
For a gradient of smaller magnitude, what should be the suggested learning rate for
updating the weights?

a. Small
b. Large
c. Cannot comment
d. Same learning rate for small and large gradient magnitudes

Correct Answer: b

Detailed Solution:

For a smaller gradient magnitude, the learning rate should be large so that the optimum is reached quickly.

______________________________________________________________________

************END*******
