
Thesis:

Second report
Franklin Cardeñoso Fernández
fracarfer5@aluno.puc-rio.br

February 21, 2019

Introduction

Robot learning “consists of a multitude of machine learning approaches in the context of robotics”
[1], and humans are the most common data source in the robot learning process. Task learning can
be done through imitation learning, learning from demonstration (LfD), or reinforcement learning (RL),
which are the most popular techniques for reproducing human behaviour.
Imitation learning copies the user’s movements or actions regardless of the final objective. In comparison,
LfD learns the complete skill through human teaching, taking the final objective into account. Finally,
reinforcement learning searches for the agent’s optimal policy: the agent performs actions and
observations in the environment in order to maximize the sum of rewards in each episode.
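As a toy illustration of the RL loop described above (the agent acts, observes, and maximizes the sum of rewards), the following sketch runs tabular Q-learning on a hypothetical 5-state chain; the environment, state count, and hyperparameters are illustrative choices, not taken from the cited works:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a toy chain: the agent starts at state 0,
    action 1 moves right, action 0 moves left, and only reaching the
    last state yields reward 1."""
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the best next-state value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# greedy policy after learning: the agent moves right in every non-terminal state
policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(4)]
```

The large number of episodes such an agent needs even on a trivial environment is exactly the sample-efficiency issue discussed later in this report.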
These techniques are widely used to endow robots with the capability of performing tasks from human
guidance or human-inspired learning processes. This human guidance is provided through different
demonstration platforms that supply the robot with enough data (task information); the most popular
are visual perception, kinesthetic teaching, motion recording, teleoperated scenarios, and speech feedback.
Kinesthetic teaching guides the robot’s movements through human interaction, which can be performed
by direct contact or through a haptic device that reproduces the correct actions. Normally, haptic devices
are used in teleoperated scenarios where the user cannot interact directly with the robot (e.g., industrial
settings). These platforms use different interfaces to mediate the interaction between humans and robots,
one of which is the haptic device.
Haptic devices have been widely used in robot learning: tactile interfaces were applied successfully to
recognizing human touching behaviours in pet-robots [2], to autonomous robots performing common
human tasks [3] [4], and to learning systems for automobile drivers [5]. [6] and [7] report the
performance obtained using teleoperation and robotic arms in the task of emptying a rigid container.
On the other hand, [8] presents a different approach that combines kinesthetic teaching with a
haptic device for positional/force control using imitation learning.
Finally, for medical tasks, [9] and [10] use a robotic arm as a robot-based haptic device.
Usually, RL is the most popular technique. However, RL presents some issues, one of them being that
it needs a large number of episodes per trial. To solve that issue, researchers use LfD or imitation
learning as initialization, or add a human-in-the-loop approach, thereby decreasing the number of trials
and the time spent in the learning process.
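One simple way to picture demonstration-based initialization is to warm-start the agent's value estimates from the teacher's state-action pairs, so early greedy behaviour imitates the teacher instead of acting randomly. The function, demonstration set, and bonus value below are all hypothetical, chosen only to make the idea concrete:

```python
def warm_start_q(n_states, n_actions, demos, bonus=0.5):
    """Warm-start a Q-table from demonstrations: each demonstrated
    (state, action) pair receives an optimistic initial value, biasing
    the untrained agent toward the teacher's choices."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for s, a in demos:
        Q[s][a] = bonus
    return Q

# hypothetical kinesthetic demonstration: the teacher always moves right
demo = [(0, 1), (1, 1), (2, 1), (3, 1)]
Q0 = warm_start_q(5, 2, demo)
```

RL then refines these initial estimates, which typically needs fewer episodes than starting from an all-zero table.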
Human-in-the-loop is the technique that incorporates a human operator into the control process. This
technique combined with RL is another widely used approach in learning tasks, with good results in
teaching robots [11], collaborative vehicle driving [12], and helping amputees [13]. It has also shown
good performance when applied with other teaching methods [14], [15].

This human input can take the form of visual feedback for deep reinforcement learning techniques
[16], [17], [18] or EEG signal feedback as in [19].
Some research has also been done to improve the performance of RL algorithms that use human-in-
the-loop techniques [20].
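A common pattern in these human-in-the-loop RL setups is to blend the environment reward with a scalar human feedback signal (approval or disapproval expressed through buttons, facial expressions, haptics, or EEG). A minimal sketch of that idea, with illustrative names and weighting rather than any specific method from the cited papers:

```python
def shaped_reward(env_reward, human_feedback, weight=0.5):
    """Combine the environment reward with a scalar human signal in
    [-1, 1] (e.g. +1 approve, -1 disapprove); `weight` controls how
    strongly the human steers learning relative to the environment."""
    return env_reward + weight * human_feedback

# the agent earns extra credit when the human approves an action
r_good = shaped_reward(1.0, +1.0)   # 1.5
r_bad = shaped_reward(1.0, -1.0)    # 0.5
```

Because the human signal reshapes the reward on every step, the agent can be steered away from unproductive behaviour long before the sparse environment reward would.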
One application of reinforcement learning is helping humans with prosthetic devices, where RL deals
with different issues with good performance: RL has been used to find stimulation parameters [21],
to control robotic arms [22] [23] [24] [25] [26], and together with visual perception [27] [28] [29].
- Learning from humans
  - As initialization
    - Imitation learning / learning from demonstration
    - Kinesthetic teaching / teleoperation
  - As reward function
    - Human-in-the-loop
    - Explicit / facial recognition / EEG / haptics / etc.
  - As task definition
    - Inverse reinforcement learning
- Learning to help humans perform a task
  - Teleoperation
  - Haptics

Haptic devices are widely used for medical purposes as well as others:
- Prosthetics
- Human-robot collaboration
- Rehabilitation
- Generic assistance
- Learn what the task is, while at the same time assisting the human in performing it.
- Objective? (prosthetics?)
- Inputs? (EEG / haptics?)
- Outputs? (Movement / speech?)
- Tasks? (Manipulation?)
[30]

References
[1] B. Siciliano and O. Khatib, Springer Handbook of Robotics. Springer Publishing Company, Incorpo-
rated, 2nd ed., 2016.

[2] F. Naya, J. Yamato, and K. Shinozawa, “Recognizing human touching behaviors using a haptic inter-
face for a pet-robot,” in IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference
on Systems, Man, and Cybernetics (Cat. No.99CH37028), vol. 2, pp. 1030–1034 vol.2, Oct 1999.

[3] M. C. Gemici and A. Saxena, “Learning haptic representation for manipulating deformable food
objects,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago,
IL, USA, September 14-18, 2014, pp. 638–645, 2014.

[4] R. B. Hellman, C. Tekin, M. van der Schaar, and V. J. Santos, “Functional contour-following via
haptic perception and reinforcement learning,” IEEE Trans. Haptics, vol. 11, no. 1, pp. 61–72, 2018.

[5] M. A. Goodrich and M. Quigley, “Learning haptic feedback for guiding driver behavior,” in 2004
IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583),
vol. 3, pp. 2507–2512 vol.3, Oct 2004.
[6] L. D. Rozo, P. Jiménez, and C. Torras, “Learning force-based robot skills from haptic demonstration,”
in Artificial Intelligence Research and Development - Proceedings of the 13th International Conference
of the Catalan Association for Artificial Intelligence, l’Espluga de Francolí, Tarragona, Spain, 20-22
October 2010, pp. 331–340, 2010.
[7] L. D. Rozo, P. Jiménez, and C. Torras, “Force-based robot learning of pouring skills using parametric
hidden markov models,” in 9th Workshop on Robot Motion and Control, RoMoCo 2013, Kuslin,
Poland, July 3-5, 2013, pp. 227–232, 2013.
[8] P. Kormushev, S. Calinon, and D. G. Caldwell, “Imitation learning of positional and force skills
demonstrated via kinesthetic teaching and haptic input,” Advanced Robotics, vol. 25, no. 5, pp. 581–
603, 2011.
[9] V. Squeri, M. Casadio, E. Vergaro, P. Giannoni, P. Morasso, and V. Sanguineti, “Bilateral robot
therapy based on haptics and reinforcement learning: Feasibility study of a new concept for treatment
of patients after stroke,” Journal of rehabilitation medicine : official journal of the UEMS European
Board of Physical and Rehabilitation Medicine, vol. 41, pp. 961–5, 11 2009.
[10] M. Gomez Rodriguez, J. Peters, J. Hill, B. Schölkopf, A. Gharabaghi, and M. Grosse-Wentrup,
“Closing the sensorimotor loop: haptic feedback facilitates decoding of motor imagery,” Journal of
Neural Engineering, vol. 8, pp. 1–12, June 2011.
[11] D. Abel, J. Salvatier, A. Stuhlmüller, and O. Evans, “Agent-agnostic human-in-the-loop reinforce-
ment learning,” CoRR, vol. abs/1701.04079, 2017.
[12] H. Liang, L. Yang, H. Cheng, W. Tu, and M. Xu, “Human-in-the-loop reinforcement learning,” in
2017 Chinese Automation Congress (CAC), pp. 4511–4518, Oct 2017.
[13] Y. Wen, J. Si, A. Brandt, X. Gao, and H. Huang, “Online reinforcement learning control for the
personalization of a robotic knee prosthesis,” IEEE Transactions on Cybernetics, pp. 1–11, 2019.
[14] L. Peternel, T. Petric, and J. Babic, “Human-in-the-loop approach for teaching robot assembly tasks
using impedance control interface,” 2015 IEEE International Conference on Robotics and Automation
(ICRA), pp. 1497–1502, 2015.
[15] L. Peternel, T. Petrič, and J. Babič, “Robotic assembly solution by human-in-the-loop teaching
method based on real-time stiffness modulation,” Autonomous Robots, vol. 42, pp. 1–17, Jan 2018.
[16] D. Arumugam, J. K. Lee, S. Saskin, and M. L. Littman, “Deep reinforcement learning from policy-
dependent human feedback,” 2018.
[17] P. F. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement
learning from human preferences,” in NIPS, 2017.
[18] R. Arakawa, S. Kobayashi, Y. Unno, Y. Tsuboi, and S. Maeda, “DQN-TAMER: human-in-the-loop
reinforcement learning with intractable feedback,” CoRR, vol. abs/1810.11748, 2018.
[19] L. Schiatti, J. Tessadori, N. Deshpande, G. Barresi, L. King, and L. Mattos, “Human in the loop of
robot learning: EEG-based reward signal for target identification and reaching task,” pp. 4473–4480,
05 2018.

[20] T. Mandel, Y.-E. Liu, E. Brunskill, and Z. Popovic, “Where to add actions in human-in-the-loop
reinforcement learning,” in AAAI 2017, 2017.

[21] A. J. Brockmeier, J. S. Choi, M. M. DiStasio, J. T. Francis, and J. C. Príncipe, “Optimizing
microstimulation using a reinforcement learning framework,” in 2011 Annual International Conference
of the IEEE Engineering in Medicine and Biology Society, pp. 1069–1072, Aug 2011.

[22] K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch, “Human-like
rewards to train a reinforcement learning controller for planar arm movement,” IEEE Transactions
on Human-Machine Systems, vol. 46, pp. 723–733, Oct 2016.

[23] P. M. Pilarski, T. B. Dick, and R. S. Sutton, “Real-time prediction learning for the simultaneous ac-
tuation of multiple prosthetic joints,” in 2013 IEEE 13th International Conference on Rehabilitation
Robotics (ICORR), pp. 1–8, June 2013.

[24] K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch, “Training
an actor-critic reinforcement learning controller for arm movement using human-generated rewards,”
IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, pp. 1892–1905, Oct
2017.

[25] R. Barone, A. L. Ciancio, R. A. Romeo, A. Davalli, R. Sacchetti, E. Guglielmelli, and L. Zollo,
“Multilevel control of an anthropomorphic prosthetic hand for grasp and slip prevention,” Advances
in Mechanical Engineering, vol. 8, no. 9, p. 1687814016665082, 2016.

[26] G. Vasan and P. M. Pilarski, “Context-aware learning from demonstration: Using camera data
to support the synergistic control of a multi-joint prosthetic arm,” 2018 7th IEEE International
Conference on Biomedical Robotics and Biomechatronics (Biorob), pp. 199–206, 2018.

[27] K. D. Katyal, “In-hand robotic manipulation via deep reinforcement learning,” 2016.

[28] M. Mudigonda, P. Agrawal, M. Deweese, and J. Malik, “Investigating deep reinforcement learning
for grasping objects with an anthropomorphic hand,” 2018.

[29] G. Vasan and P. M. Pilarski, “Learning from demonstration: Teaching a myoelectric prosthesis
with an intact limb via reinforcement learning,” in 2017 International Conference on Rehabilitation
Robotics (ICORR), pp. 1457–1464, July 2017.

[30] A. G. Billard, S. Calinon, and R. Dillmann, “Learning from humans,” in Springer Handbook of
Robotics, pp. 1995–2014, 2016.
