
CS6700 : Reinforcement Learning

Written Assignment #1
Intro to RL and Bandits
Deadline: 16 Feb 2018, 11:55 pm
• This is an individual assignment. Collaboration and discussion are strictly prohibited.
• Be precise with your explanations. Unnecessary verbosity will be penalized.
• Check the Moodle discussion forums regularly for updates regarding the assignment.
• Please start early.
• Turn in only the answers on Turnitin.

1. The results shown in Figure 2.3 (of the course textbook uploaded on Moodle) should be
quite reliable because they are averages over 2000 individual, randomly chosen 10-armed
bandit tasks. Why, then, are there oscillations and spikes in the early part of the curve
for the optimistic method? In other words, what might make this method perform
particularly better or worse, on average, on particular early steps?
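For reference when thinking about this question, here is a minimal sketch of the optimistic-initial-value method on a 10-armed testbed. The setup (purely greedy selection, optimistic initial estimates, constant step size) follows the textbook's description of the method, but the specific parameter values and function names are illustrative assumptions, not the exact experiment behind the figure.

    import numpy as np

    def optimistic_greedy_run(steps=1000, k=10, q_init=5.0, alpha=0.1, seed=0):
        # One run of greedy action selection with optimistic initial estimates.
        rng = np.random.default_rng(seed)
        q_true = rng.normal(0.0, 1.0, k)     # true action values, fixed for this run
        Q = np.full(k, q_init)               # optimistic initial estimates
        best = int(np.argmax(q_true))
        picked_best = np.zeros(steps, dtype=bool)
        for t in range(steps):
            a = int(np.argmax(Q))            # purely greedy choice
            r = rng.normal(q_true[a], 1.0)   # noisy reward
            Q[a] += alpha * (r - Q[a])       # constant-step-size update
            picked_best[t] = (a == best)
        return picked_best

    # Averaging many independent runs produces a curve of the kind the question refers to.
    curve = np.mean([optimistic_greedy_run(seed=s) for s in range(200)], axis=0)

Watching which arms the greedy choice cycles through in the first few steps of a single run is a useful way to see where the early spikes come from.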

2. Many tic-tac-toe positions appear different but are really the same because of symmetries.
How might we amend the learning process described in the tic-tac-toe example of the
course text to take advantage of this? In what ways would this change improve the
learning process?
Now think again. Suppose the opponent did not take advantage of symmetries. In
that case, should we? Is it true, then, that symmetrically equivalent positions should
necessarily have the same value?
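As a concrete illustration of the first part, one way to exploit the symmetries is to key the value table by a canonical representative of each position, so that all eight symmetric variants of a board share a single entry. The sketch below is only one possible implementation; the board encoding (a 9-tuple in row-major order) and the function names are assumptions made for illustration.

    def rot90(b):
        # Rotate a 3x3 board (9-tuple, row-major) 90 degrees clockwise.
        return tuple(b[3 * (2 - c) + r] for r in range(3) for c in range(3))

    def reflect(b):
        # Mirror the board left-to-right.
        return tuple(b[3 * r + (2 - c)] for r in range(3) for c in range(3))

    def canonical(b):
        # Lexicographically smallest of the 8 symmetric variants; any fixed rule works.
        variants = []
        cur = b
        for _ in range(4):
            variants.extend([cur, reflect(cur)])
            cur = rot90(cur)
        return min(variants)

    # Value estimates keyed by canonical form: symmetric positions share one estimate.
    values = {}
    board = ('X', ' ', ' ', ' ', 'O', ' ', ' ', ' ', ' ')
    values.setdefault(canonical(board), 0.5)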

3. Suppose, instead of playing against a random opponent, the reinforcement learning
algorithm described in the tic-tac-toe example of the course text played against itself,
with both sides learning. What do you think would happen in this case? Would it learn
a different policy for selecting moves?

4. RL systems do not have to be 'taught' by knowledgeable 'teachers'; they learn from
their own experiences. But teachers of various types can still be helpful. Describe two
different ways in which a teacher might facilitate RL. For each, explain how it can make
learning more efficient.

5. Consider a multi-armed bandit setup where the horizon is T = 10000 time steps and the
number of arms is K = 100. After every 1000 time steps the reward distribution changes.
For example, for t = 1 to 1000, arm a_10 is the optimal arm, while from t = 1001 to 2000,
arm a_80 becomes the optimal arm. The distributions of the other arms also change. A
point at which the distribution changes is termed a breakpoint, so in this environment
there are 9 breakpoints.

• Between UCB1 and the softmax algorithm, which one would you choose in this setting?
Justify your answer.
• Devise a modified UCB1 algorithm that will work in this setting and justify your
choice. Will it work better than softmax? Why? (Hint: use your ideas from MDPs.)
An illustrative sliding-window sketch is given below for reference.
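Purely as a reference point, and not necessarily the intended answer, the sketch below modifies UCB1 by computing its statistics only over a sliding window of recent plays. The window length, exploration constant, and the assumed pull(a) reward interface are illustrative assumptions.

    import math
    from collections import deque

    def sliding_window_ucb1(pull, K=100, T=10000, window=500, c=2.0):
        # UCB1 whose counts and means use only the last `window` plays.
        # `pull(a)` is assumed to return a reward in [0, 1] for arm a.
        history = deque()        # (arm, reward) pairs currently inside the window
        counts = [0] * K         # plays of each arm inside the window
        sums = [0.0] * K         # reward totals of each arm inside the window

        def index(a):
            if counts[a] == 0:   # unseen (or forgotten) arms get top priority
                return float('inf')
            mean = sums[a] / counts[a]
            return mean + math.sqrt(c * math.log(len(history)) / counts[a])

        for t in range(T):
            a = max(range(K), key=index)
            r = pull(a)
            history.append((a, r))
            counts[a] += 1
            sums[a] += r
            if len(history) > window:        # forget the oldest play
                old_a, old_r = history.popleft()
                counts[old_a] -= 1
                sums[old_a] -= old_r

Forgetting old rewards lets an arm's confidence bound widen again after a breakpoint, so the algorithm can rediscover the new optimal arm instead of staying anchored to stale estimates.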

6. Define a bandit setup as follows. At each time instant, a reward is sampled for each
arm of the bandit from some unknown distribution. The agent then picks an arm, and
the environment reveals the rewards of all the arms. Regret is now defined as the
difference between the reward of the best arm at that instant and that of the chosen arm,
summed over all time steps. Would the existing algorithms for bandit problems work
well in this setting? Can we do better by taking advantage of the fact that all rewards
are revealed? For example, exploration is no longer an issue, since the rewards of all
arms are revealed at every time step.
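For concreteness, with r_t(a) denoting the reward revealed for arm a at time step t and a_t the arm the agent picked (notation introduced here only for illustration), the regret described above is

    R_T = \sum_{t=1}^{T} ( \max_{a \in \{1, \dots, K\}} r_t(a) - r_t(a_t) ).

Note that the maximum is taken separately at every time step, not over a single fixed arm.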

7. Suppose you face a 2-armed bandit task whose true action values change randomly from
time step to time step. Specifically, suppose that, for any time step, the true values of
actions 1 and 2 are respectively 0.1 and 0.2 with probability 0.5 (case A), and 0.9 and
0.8 with probability 0.5 (case B). If you are not able to tell which case you face at any
step, what is the best expectation of success you can achieve and how should you behave
to achieve it? Now suppose that on each step you are told whether you are facing case
A or case B (although you still don't know the true action values). This is an associative
search task. What is the best expectation of success you can achieve in this task, and
how should you behave to achieve it?
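For concreteness (the notation here is introduced only for illustration), write q_A(i) and q_B(i) for the true values of action i in cases A and B as given above. Without case information, a policy that picks action i with probability p_i earns a per-step expected reward of

    \sum_i p_i ( 0.5 q_A(i) + 0.5 q_B(i) ),

whereas with case information a policy \pi maps the revealed case to an action and earns 0.5 q_A(\pi(A)) + 0.5 q_B(\pi(B)). The question asks, in each setting, for the maximum of this quantity and the behaviour that attains it.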

