Funding Information:
Abstract: This work presents a trajectory tracking control scheme for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint force. The scheme combines optimal control with the Robust Integral of the Sign of the Error (RISE) and the sliding mode control technique. The sliding variable combined with RISE makes it possible to deal with external disturbances and to reduce the order of the closed-loop system. The adaptive reinforcement learning technique simultaneously tunes the actor and critic networks to approximate the control policy and the cost function, respectively. The convergence of the weights and of the tracking error is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.
Author Comments:
We hope that you find the revised manuscript suitable for publication in Journal of
Electrical Engineering & Technology.
Sincerely yours,
Title Page
† Corresponding Author: School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (nam.daophuong@hust.edu.vn)
* Haiphong University, Vietnam. (tu.vv@dhhp.edu.vn)
** School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (phamthanhloc@gmail.com)
*** School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (huytk23nttyb@gmail.com)
Cover letter
Dear Editor-in-Chief,
I am submitting a manuscript for consideration for publication in this journal. The manuscript is entitled “Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems”.
It has not been published elsewhere and has not been submitted simultaneously for publication elsewhere. There are no conflicts of interest to disclose. All co-authors have liaised directly with each other to confirm agreement with the final content of this manuscript.
In this manuscript, we present an online adaptive dynamic programming algorithm for a manipulator that efficiently addresses the serious problems of actuator saturation and model uncertainties. The motivation is that classical nonlinear control methods have difficulty overcoming these challenges, as does traditional optimal control based on the HJB equation, which is hard to solve.
Keywords: Adaptive Dynamic Programming (ADP), Robotic Systems, Robust Integral of the Sign of the Error (RISE), Sliding Mode Control (SMC).
Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems

Received: date / Accepted: date

Abstract This work presents a trajectory tracking control scheme for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint force. The scheme combines optimal control with the Robust Integral of the Sign of the Error (RISE) and the sliding mode control technique. The sliding variable combined with RISE makes it possible to deal with external disturbances and to reduce the order of the closed-loop system. The adaptive reinforcement learning technique simultaneously tunes the actor and critic networks to approximate the control policy and the cost function, respectively. The convergence of the weights and of the tracking error is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.

Keywords Adaptive Dynamic Programming (ADP) · Robotic Systems · Robust Integral of the Sign of the Error (RISE) · Sliding Mode Control (SMC)

1 Introduction

The motion of a group of physical systems such as robotic manipulators, ships, surface vessels, and quad-rotors can be modelled as mechanical systems with dynamic uncertainties and external disturbances [1]. Furthermore, actuator saturation, full-state constraints, and finite-time control have been addressed in [2] - [7]. For dealing with unknown parameters and disturbances, terminal sliding mode control (SMC) is one of the remarkable solutions because it provides finite-time convergence. In [8], a non-singular terminal sliding surface was employed to obtain an adaptive terminal SMC for a manipulator system. The work in [10] was also based on a non-singular terminal sliding manifold to investigate finite-time control, which seems to be effective in counteracting not only uncertain dynamics but also unbounded disturbances. The authors in [11] extended the terminal sliding mode technique to the control design for exoskeleton systems so that the trajectory of the closed-loop system is driven onto the sliding surface in finite time. To tackle external disturbances, classical robust control designs have investigated input-to-state stability (ISS) with an equivalent attraction region. However, when the external disturbance is a combination of a finite number of step and sinusoidal signals, the closed-loop system in [9] is asymptotically stable. In [14], a disturbance observer based on optimal gain matrices was combined with SMC for under-actuated manipulators. The authors in [15] combined the generalized proportional integral observer (GPIO) technique with continuous SMC to overcome matched/mismatched time-varying disturbances, guaranteeing high tracking performance in compliantly actuated robots. The SMC technique is employed not only for classical manipulators but also for other types of systems, including bilateral teleoperators (BTs) and mobile robotic systems (wheeled mobile robots, tractor-trailer systems) [22] - [24].

Several control schemes have been considered for manipulators to handle input saturation by integrating additional terms into the control structure [2], [4], [7]. In [2], a new desired trajectory was proposed to account for actuator saturation. The additional term is obtained by taking the derivative of the initial Lyapunov candidate function along the state trajectory in the presence of actuator saturation [2]. Furthermore, a new approach was given in [2] to handle not only the actuator constraints but also external disturbances. The given sliding manifold was built on the Sat function of the joint variables. The equivalent SMC scheme was computed, and the boundedness of the input signal was then established. This approach allows the input bound to be adjusted by choosing several parameters appropriately. The work in [7] gives a technique to tackle actuator saturation using a modified Lyapunov candidate function. To account for actuator saturation, the Lyapunov function is augmented with a quadratic form of the difference between the input signal from the controller and the real signal applied to the plant. The control design is obtained by considering the derivative of the Lyapunov function along the system trajectory. To tackle state constraints in manipulators, frameworks based on the Barrier Lyapunov function, the Moore-Penrose inverse matrix, and the Fuzzy-Neural Network technique were proposed in [4], [7], [2]. However, these classical nonlinear control techniques face several challenges, such as finding an appropriate Lyapunov function and handling the dynamics of the additional terms [5], [6], [7].

Optimal control is a remarkable alternative that can solve the above constraint problems through constrained optimization [12], [13], [16] - [21], and model predictive control (MPC) is one of the most effective solutions for tackling these constraint problems in manipulators [17]. The terminal controller and the equivalent terminal region were established for a nominal system of disturbed manipulators with a finite-horizon cost function [17]. This robust MPC technique was also considered for wheeled mobile robots (WMRs) using the kinematic model augmented with a disturbance observer (DO) [13]. This work was extended to the inner-loop model by the backstepping technique [12]. Thanks to
Sliding Variable-based Online ARL 3

Fig. 1 2-DOF Planar Robot Manipulator

Section 3. The offline simulation is shown in Section 4. Finally, the conclusions are pointed out in Section 5.

2 Dynamic Model of Robot Manipulator

Consider the planar robot manipulator system described by the following dynamic equation:

$$ M(\eta)\ddot{\eta} + C(\eta,\dot{\eta})\dot{\eta} + G(\eta) + F(\dot{\eta}) + d(t) = \tau(t) \tag{1} $$

where $M(\eta) \in \mathbb{R}^{n\times n}$ is the generalized inertia matrix, $C(\eta,\dot{\eta}) \in \mathbb{R}^{n\times n}$ is the generalized centripetal-Coriolis matrix, $G(\eta) \in \mathbb{R}^{n}$ is the gravity vector, $F(\dot{\eta}) \in \mathbb{R}^{n}$ is the generalized friction, $d(t)$ is the vector of disturbances, and $\tau(t)$ is the vector of control inputs.

Property 1: The symmetric inertia matrix $M(\eta)$ is positive definite and satisfies, for all $\xi \in \mathbb{R}^{n}$:

$$ a\|\xi\|^{2} \le \xi^{T} M(\eta)\xi \le b(\eta)\|\xi\|^{2} \tag{2} $$

$$ \xi^{T}\left(\dot{M}(\eta) - 2C(\eta,\dot{\eta})\right)\xi = 0 \tag{3} $$

where $a \in \mathbb{R}$ is a positive constant and $b(\eta) \in \mathbb{R}$ is a positive function of $\eta$. The following assumptions will be employed in the stability analysis later.
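
Property 1 can be checked numerically on a concrete arm. The following is a minimal sketch, assuming the standard two-link planar structure for $M$ and $C$ and the hypothetical parameters `P1 = 5`, `P2 = 1`, `P3 = 1`; these values are not stated in the text and were picked only because they make $\tfrac{1}{2}s^{T}Ms$ agree with the $\cos(\eta_2)$ terms of (29). The lower bound `0.3` used in the check is likewise an illustrative choice for the constant $a$ in (2).

```python
import math

# Hypothetical two-link parameters (assumption, not given in the manuscript).
P1, P2, P3 = 5.0, 1.0, 1.0


def inertia(eta):
    """Symmetric inertia matrix M(eta) of a standard 2-DOF planar arm."""
    c2 = math.cos(eta[1])
    return [[P1 + 2.0 * P3 * c2, P2 + P3 * c2],
            [P2 + P3 * c2, P2]]


def coriolis(eta, eta_dot):
    """Centripetal-Coriolis matrix C(eta, eta_dot) matching inertia()."""
    s2 = math.sin(eta[1])
    return [[-P3 * s2 * eta_dot[1], -P3 * s2 * (eta_dot[0] + eta_dot[1])],
            [P3 * s2 * eta_dot[0], 0.0]]


def inertia_rate(eta, eta_dot):
    """Time derivative of M along the motion (M depends on eta_2 only)."""
    r = -P3 * math.sin(eta[1]) * eta_dot[1]
    return [[2.0 * r, r], [r, 0.0]]


def quad_form(a, v):
    """Quadratic form v^T A v for a 2x2 matrix."""
    return sum(v[i] * a[i][j] * v[j] for i in range(2) for j in range(2))


def skew_residual(eta, eta_dot, xi):
    """Left-hand side of (3): xi^T (M_dot - 2C) xi, which should vanish."""
    md = inertia_rate(eta, eta_dot)
    c = coriolis(eta, eta_dot)
    n = [[md[i][j] - 2.0 * c[i][j] for j in range(2)] for i in range(2)]
    return quad_form(n, xi)


if __name__ == "__main__":
    eta, eta_dot, xi = [0.3, -0.7], [0.5, 1.2], [1.0, -2.0]
    print("xi^T M xi           =", quad_form(inertia(eta), xi))
    print("xi^T (M_dot - 2C) xi =", skew_residual(eta, eta_dot, xi))
```

For this particular model the smallest eigenvalue of $M(\eta)$ stays above roughly 0.39 for every $\eta_2$, so any constant $a$ below that value satisfies the lower bound in (2).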

Assumption 1 If $\eta(t), \dot{\eta}(t) \in L_{\infty}$, then the functions $C(\eta,\dot{\eta})$, $F(\dot{\eta})$, $G(\eta)$ and the first and second partial derivatives of the elements of $M(\eta)$, $C(\eta,\dot{\eta})$, $G(\eta)$ with respect to $\eta(t)$, as well as of the elements of $C(\eta,\dot{\eta})$, $F(\dot{\eta})$ with respect to $\dot{\eta}(t)$, exist and are bounded.

Assumption 2 The desired trajectory $\eta_d(t)$ and its first, second, third and fourth time derivatives exist and are bounded.

Assumption 3 The external disturbance vector $d(t)$ and its time derivatives are bounded by known constants.

The control objective is to ensure that the system tracks a desired time-varying trajectory $\eta_d(t)$ in the presence of dynamic uncertainties by using the framework of online adaptive reinforcement learning based optimal control design and a disturbance attenuation technique. Consider the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$, with $\lambda_1 \in \mathbb{R}^{n\times n} > 0$ and $e_1(t) = \eta^{ref} - \eta$, and the corresponding sliding surface:

$$ \mathcal{M} = \{ e_1(t) \in \mathbb{R}^{n} : s(t) = 0 \} \tag{4} $$

According to (1), the dynamic equation of the sliding variable $s(t)$ can be written as:

$$ M\dot{s} = -Cs - \tau + f + d \tag{5} $$

where $f(\eta, \dot{\eta}, \eta^{ref}, \dot{\eta}^{ref}, \ddot{\eta}^{ref})$ is the nonlinear function defined by:

$$ f = M(\ddot{\eta}^{ref} + \lambda_1 \dot{e}_1) + C(\dot{\eta}^{ref} + \lambda_1 e_1) + G + F \tag{6} $$
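
The step from (1) to (5) can be written out explicitly. The derivation below is a sketch under two assumptions that the text implies but does not state at this point: the gain in the sliding variable, written here as $\lambda_1$, is a constant matrix, and the arguments of $M$, $C$, $G$, $F$ are suppressed.

```latex
\begin{align*}
\dot{s} &= \ddot{e}_1 + \lambda_1 \dot{e}_1
         = \ddot{\eta}^{ref} - \ddot{\eta} + \lambda_1 \dot{e}_1, \\
M\dot{s} &= M(\ddot{\eta}^{ref} + \lambda_1 \dot{e}_1) - M\ddot{\eta} \\
         &= M(\ddot{\eta}^{ref} + \lambda_1 \dot{e}_1) + C\dot{\eta} + G + F + d - \tau
            && \text{by (1)}, \\
C\dot{\eta} &= C(\dot{\eta}^{ref} - \dot{e}_1)
             = C(\dot{\eta}^{ref} + \lambda_1 e_1) - Cs
            && \text{since } \dot{e}_1 = s - \lambda_1 e_1, \\
M\dot{s} &= -Cs - \tau
            + \underbrace{M(\ddot{\eta}^{ref} + \lambda_1 \dot{e}_1)
            + C(\dot{\eta}^{ref} + \lambda_1 e_1) + G + F}_{f} + d .
\end{align*}
```

The last line is (5) with $f$ as in (6); the same substitution $\dot{\eta} = \dot{\eta}^{ref} + \lambda_1 e_1 - s$ is what appears as the second argument of $C$ in (9).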

Remark 1: The role of the above sliding variable is to reduce the order of the second-order uncertain/disturbed manipulator system. It enables us to employ adaptive reinforcement learning for a first-order continuous-time nonlinear autonomous system. Additionally, the external disturbance $d(t)$ and the nonlinear function $f$ are handled by RISE in the next section.
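
The order reduction described in Remark 1 rests on identity (5), which can also be confirmed numerically. The sketch below assumes a standard two-link planar arm with hypothetical parameters `P1`, `P2`, `P3`, sets $G = F = 0$ for simplicity, and uses the $\lambda_1$ matrix from the simulation section; none of these modelling choices is confirmed by the text. It computes $M\dot{s}$ directly from (1) and subtracts the right-hand side of (5).

```python
import math

P1, P2, P3 = 5.0, 1.0, 1.0           # hypothetical arm parameters (assumption)
LAM = [[15.6, 10.6], [10.6, 10.4]]   # lambda_1 as chosen in the simulation section


def inertia(eta):
    c2 = math.cos(eta[1])
    return [[P1 + 2.0 * P3 * c2, P2 + P3 * c2], [P2 + P3 * c2, P2]]


def coriolis(eta, eta_dot):
    s2 = math.sin(eta[1])
    return [[-P3 * s2 * eta_dot[1], -P3 * s2 * (eta_dot[0] + eta_dot[1])],
            [P3 * s2 * eta_dot[0], 0.0]]


def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def add(a, b):
    return [a[0] + b[0], a[1] + b[1]]


def sub(a, b):
    return [a[0] - b[0], a[1] - b[1]]


def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def sliding_dynamics_residual(eta, eta_dot, ref, ref_d, ref_dd, tau, d):
    """M*s_dot obtained from (1) minus the right-hand side of (5), with G = F = 0."""
    m, c = inertia(eta), coriolis(eta, eta_dot)
    e1, e1_dot = sub(ref, eta), sub(ref_d, eta_dot)
    s = add(e1_dot, matvec(LAM, e1))
    # (1) with G = F = 0:  M eta_dd = tau - C eta_dot - d
    eta_dd = solve2(m, sub(sub(tau, matvec(c, eta_dot)), d))
    s_dot = add(sub(ref_dd, eta_dd), matvec(LAM, e1_dot))
    lhs = matvec(m, s_dot)
    # f from (6) with G = F = 0
    f = add(matvec(m, add(ref_dd, matvec(LAM, e1_dot))),
            matvec(c, add(ref_d, matvec(LAM, e1))))
    rhs = add(sub(sub(f, matvec(c, s)), tau), d)   # -C s - tau + f + d
    return sub(lhs, rhs)


if __name__ == "__main__":
    t = 2.0
    ref = [3.0 * math.sin(0.1 * t), 3.0 * math.cos(0.1 * t)]
    ref_d = [0.3 * math.cos(0.1 * t), -0.3 * math.sin(0.1 * t)]
    ref_dd = [-0.03 * math.sin(0.1 * t), -0.03 * math.cos(0.1 * t)]
    d = [0.5 * math.sin(t), 0.5 * math.cos(t)]
    print(sliding_dynamics_residual([0.3, -0.7], [0.5, 1.2],
                                    ref, ref_d, ref_dd, [1.0, -2.0], d))
```

A residual at the level of floating-point round-off indicates that (5) and (6) are consistent with (1) for the chosen state, input and disturbance.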

3 Adaptive Reinforcement Learning based Optimal Control Design

Assuming that the dynamic model of the robot manipulator is known, the control input can be designed as

$$ \tau = f + d - u \tag{7} $$

where the term $u$ is designed by the optimal control algorithm and the remaining term $f + d$ will be estimated later. Substituting (7) into (5) gives

$$ M\dot{s} = -Cs + u \tag{8} $$

According to (4) and (8), we obtain the following time-varying model

$$ \dot{x} = \begin{bmatrix} -\lambda_1 e_1 + s \\ -M(\eta^{ref} - e_1)^{-1}\, C(\eta^{ref} - e_1,\ \dot{\eta}^{ref} + \lambda_1 e_1 - s)\, s \end{bmatrix} + \begin{bmatrix} 0_{n\times n} \\ M^{-1} \end{bmatrix} u \tag{9} $$

where $x = [e_1^{T}, s^{T}]^{T}$, and the infinite-horizon cost function to be minimized is

$$ J(x,u) = \int_{0}^{\infty} \left( \frac{1}{2} x^{T} Q x + \frac{1}{2} u^{T} R u \right) dt \tag{10} $$

where $Q \in \mathbb{R}^{2n\times 2n}$ and $R \in \mathbb{R}^{n\times n}$ are positive definite symmetric matrices. However, in order to deal with the tracking control problem, some additional states are introduced, which allows us to avoid working with a non-autonomous system.

Fig. 2 System states q(t) and its references qd(t) = ηd with persistently excited input for the first 100 time units

Case 1: The time-varying desired reference signal is defined as $\eta_d = [\,3\sin(0.1t)\ \ 3\cos(0.1t)\,]^{T}$ and the vector of disturbances is given as $d(t) = [\,0.5\sin(t)\ \ 0.5\cos(t)\,]^{T}$. The positive definite symmetric matrices in the cost function (10) are:

$$ Q = \begin{bmatrix} 40 & 2 & -4 & 4 \\ 2 & 40 & 4 & -6 \\ -4 & 4 & 4 & 0 \\ 4 & -6 & 0 & 4 \end{bmatrix}, \qquad R = \begin{bmatrix} 0.25 & 0 \\ 0 & 0.25 \end{bmatrix} $$
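
The requirement that $Q$ and $R$ be symmetric positive definite can be confirmed for the listed values. The following sketch uses a plain Cholesky factorization as the test (a matrix is symmetric positive definite exactly when the factorization succeeds with positive pivots) and evaluates the integrand of (10) at a given state and input; the function names are illustrative and not taken from the manuscript.

```python
import math

Q = [[40.0, 2.0, -4.0, 4.0],
     [2.0, 40.0, 4.0, -6.0],
     [-4.0, 4.0, 4.0, 0.0],
     [4.0, -6.0, 0.0, 4.0]]
R = [[0.25, 0.0],
     [0.0, 0.25]]


def cholesky(a):
    """Return lower-triangular L with A = L L^T, or None if a pivot is not positive."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return None
                low[i][i] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


def is_symmetric_positive_definite(a):
    n = len(a)
    symmetric = all(a[i][j] == a[j][i] for i in range(n) for j in range(n))
    return symmetric and cholesky(a) is not None


def running_cost(x, u):
    """Integrand of (10): 0.5 x^T Q x + 0.5 u^T R u."""
    xq = sum(x[i] * Q[i][j] * x[j] for i in range(4) for j in range(4))
    ur = sum(u[i] * R[i][j] * u[j] for i in range(2) for j in range(2))
    return 0.5 * xq + 0.5 * ur


if __name__ == "__main__":
    print("Q is SPD:", is_symmetric_positive_definite(Q))
    print("R is SPD:", is_symmetric_positive_definite(R))
    print("cost at x = (1, 0, 0, 0), u = 0:", running_cost([1.0, 0.0, 0.0, 0.0], [0.0, 0.0]))
```

Because the off-diagonal blocks of $Q$ couple the error $e_1$ and the sliding variable $s$, a diagonal-only check would not be sufficient here, which is why the full factorization is used.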
The design parameter in the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$ is chosen so that $\lambda_1 \in \mathbb{R}^{n\times n}$ is a constant positive definite matrix:

$$ \lambda_1 = \begin{bmatrix} 15.6 & 10.6 \\ 10.6 & 10.4 \end{bmatrix} $$
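
To illustrate why a positive definite $\lambda_1$ matters, note that on the sliding surface $s = 0$ the tracking error obeys $\dot{e}_1 = -\lambda_1 e_1$. The sketch below computes the two eigenvalues of the chosen $\lambda_1$ in closed form and integrates the error dynamics with a forward Euler step; the step size, horizon and initial error are arbitrary illustrative choices, not values from the manuscript.

```python
import math

LAMBDA_1 = [[15.6, 10.6],
            [10.6, 10.4]]


def eigenvalues_sym2(a):
    """Eigenvalues (small, large) of a symmetric 2x2 matrix via trace and determinant."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0


def simulate_on_surface(e0, t_end=3.0, dt=1e-3):
    """Forward Euler integration of e1_dot = -lambda_1 e1 (the dynamics when s = 0)."""
    e = list(e0)
    for _ in range(int(round(t_end / dt))):
        de = [-(LAMBDA_1[i][0] * e[0] + LAMBDA_1[i][1] * e[1]) for i in range(2)]
        e = [e[i] + dt * de[i] for i in range(2)]
    return e


if __name__ == "__main__":
    small, large = eigenvalues_sym2(LAMBDA_1)
    print("eigenvalues of lambda_1:", small, large)
    e_final = simulate_on_surface([1.0, -1.0])
    print("error norm after 3 time units:", math.hypot(e_final[0], e_final[1]))
```

Both eigenvalues are positive, so the error decays exponentially once the sliding variable has converged, and the slower eigenvalue sets the decay rate on the surface.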

The remaining control gains in the RISE framework are chosen to satisfy (25), (26), (27) as:

$$ \lambda_2 = \begin{bmatrix} 60 & 0 \\ 0 & 35 \end{bmatrix}, \qquad k_s = \begin{bmatrix} 140 & 0 \\ 0 & 20 \end{bmatrix}, \qquad \gamma_{1j} = 5 $$

and the gains in the actor-critic learning laws are selected to guarantee (21)-(24) as:

$$ k_c = 800, \quad \nu = 1, \quad k_{a1} = 0.01, \quad k_{a2} = 1. $$

On the other hand, according to [1], the function $V$ in (16) can be calculated exactly as

$$ V = 2x_1^{2} - 4x_1x_2 + 3x_2^{2} + 2.5x_3^{2} + x_3^{2}\cos(\eta_2) + x_3x_4 + x_3x_4\cos(\eta_2) + 0.5x_4^{2} \tag{29} $$

Fig. 3 Estimation of the total of external disturbance and nonlinear function by RISE

Although $\psi(X)$ in (16) can be chosen arbitrarily, to facilitate the later comparison between the experimental results and the result in (29), $\psi(X)$ was chosen as

$$ \psi(X) = \left[\, x_1^{2}\ \ x_1x_2\ \ x_2^{2}\ \ x_3^{2}\ \ x_3^{2}\cos(\eta_2)\ \ x_3x_4\ \ x_3x_4\cos(\eta_2)\ \ x_4^{2} \,\right]^{T} $$

According to (29), the exact values of $\hat{W}_c$ in (18) and $\hat{W}_a$ in (19) are

$$ \begin{aligned} \hat{W}_c &= [\, 2\ \ {-4}\ \ 3\ \ 2.5\ \ 1\ \ 1\ \ 1\ \ 0.5 \,] \\ \hat{W}_a &= [\, 2\ \ {-4}\ \ 3\ \ 2.5\ \ 1\ \ 1\ \ 1\ \ 0.5 \,] \end{aligned} \tag{30} $$
34 In the simulation, the covariance matrix is initialized as
35
36 Ψ (0) = diag 100 300 300 1 1 1 1 1
37
while all the NN weights Wc , Wa are randomly initialized in [−1, 1], and
38
the states and the its first time derivative are initialized to random matrices
39
40 q(0), q̇(0) ∈ R2 . It is necessary to guarantee of PE conditions of the critic re-
41 gression vector (in Remark 1) in using this developed algorithm. Unlike linear
42 systems, where PE conditions of the regression translates to sufficient richness
43 of the external input, there is no verifiable method exists to ensure PE regres-
44 sion translates in nonlinear regulation problems. To deal with this situation, a
45 small exploratory signal consisting of sinusoids of varying frequencies is added
46 to the control signal for first 100 times. Each experiment was performed 150
47 times, and data from experiments are displayed in Figure 2, Figure 4, Figure
48 5 and Figure 3 depict the tracking states, control and the updating of NN
49 weights Wc , Wa . It is clear that the problem of tracking was satisfied after
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
Sliding Variable-based Online ARL 11
1 15
2 W c1 W c5
W c2 W c6
3
10 W c3 W c7
4 W c4 W c8
5
6 5
7
8
0
9
10
11 -5
12
13 -10
14
15
16 -15
0 50 100 150
17
18
Fig. 4 The weight of NN for Critic
19
20 15
21 W a1 W a5
22 W a2 W a6
10 W a3 W a7
23
W a4 W a8
24
25 5
26
27
0
28
29
30 -5
31
32
-10
33
34
35 -15
0 50 100 150
36
37
38 Fig. 5 The weight of NN for Actor
39
40
only about 2.5 times through Figure 2. Meanwhile, the weights of NNs are
41
42 compared to (30) as table 1. The highest error which is approximately 0.05 is
43 a acceptable result although the time of convergence is still high. Furthermore,
44 we obtain the tracking performance of the total of external disturbance d(t)
45 and nonlinear function f (t), enabling the disturbance attenuation property of
46 proposed control scheme in Figure 3. These results proved the correctness of
47 the algorithm.
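
The exploratory signal used to promote persistence of excitation can be sketched as a sum of sinusoids that is switched off after the first 100 time units. The manuscript does not give the amplitude or the frequencies, so `PROBE_AMPLITUDE` and `PROBE_FREQS` below are hypothetical placeholders; only the 100-unit duration and the two-dimensional control input come from the text.

```python
import math

PROBE_FREQS = (0.5, 1.0, 2.0, 3.0, 5.0)   # rad per time unit, hypothetical values
PROBE_AMPLITUDE = 0.1                       # peak bound per channel, hypothetical value
PROBE_DURATION = 100.0                      # probing window stated in the text


def probing_signal(t):
    """Small multi-sine probe per control channel, zero outside [0, PROBE_DURATION)."""
    if t < 0.0 or t >= PROBE_DURATION:
        return [0.0, 0.0]
    scale = PROBE_AMPLITUDE / len(PROBE_FREQS)
    p1 = scale * sum(math.sin(w * t) for w in PROBE_FREQS)
    p2 = scale * sum(math.cos(w * t) for w in PROBE_FREQS)
    return [p1, p2]


def excited_control(u, t):
    """Control actually applied: the nominal input u plus the probing signal."""
    p = probing_signal(t)
    return [u[0] + p[0], u[1] + p[1]]


if __name__ == "__main__":
    for t in (0.0, 10.0, 99.9, 100.0, 150.0):
        print(t, excited_control([0.0, 0.0], t))
```

Dividing the amplitude by the number of sinusoids keeps each channel of the probe within `PROBE_AMPLITUDE`, so the exploration remains small relative to the nominal control effort.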
Case 2: The step-function desired reference signal is defined as $\eta_d = [\,2 \cdot 1(t)\ \ 3 \cdot 1(t)\,]^{T}$ and the disturbance is given as $d(t) = [\,50\sin(t)\ \ 50\cos(t)\,]^{T}$.

4. Guo, Yong and Huang, Bing and Li, Ai-jun and Wang, Chang-qing, Integral sliding mode control for Euler-Lagrange systems with input saturation, International Journal of Robust and Nonlinear Control, 29(4), 1088–1100 (2019)
5. He, Wei and Chen, Yuhao and Yin, Zhao, Adaptive Neural Network Control of an Uncertain Robot With Full-State Constraints, IEEE transactions on cybernetics, 46(3), 620–629 (2015)
6. He, Wei and Dong, Yiting, Adaptive fuzzy neural network control for a constrained robot using impedance learning, IEEE transactions on neural networks and learning systems, 29(4), 1174–1186 (2017)
7. He, Wei and Dong, Yiting and Sun, Changyin, Adaptive neural impedance control of a robotic manipulator with input saturation, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(3), 334–344 (2015)
8. Mondal, Sanjoy and Mahanta, Chitralekha, Adaptive second order terminal sliding mode controller for robotic manipulators, Journal of the Franklin Institute, 351(4), 2356–2377 (2014)
9. Lu, Maobin and Liu, Lu and Feng, Gang, Adaptive tracking control of uncertain Euler–Lagrange systems subject to external disturbances, Automatica, 104, 207–219 (2019)
10. Galicki, Miroslaw, Finite-time control of robotic manipulators, Automatica, 51, 49–54 (2015)
11. Madani, Tarek and Daachi, Boubaker and Djouani, Karim, Modular controller design based fast terminal sliding mode for articulated exoskeleton systems, IEEE Transactions on Control Systems Technology, 25(3), 1133–1140 (2016)
12. Yang, Hongjiu and Guo, Mingchao and Xia, Yuanqing and Sun, Zhongqi, Dual closed-loop tracking control for wheeled mobile robots via active disturbance rejection control and model predictive control, International Journal of Robust and Nonlinear Control (2019)
13. Sun, Zhongqi and Xia, Yuanqing and Dai, Li and Liu, Kun and Ma, Dailiang, Disturbance rejection MPC for tracking of wheeled mobile robot, IEEE/ASME Transactions on Mechatronics, 22(6), 2576–2587 (2017)
14. Huang, Jian and Ri, Songhyok and Fukuda, Toshio and Wang, Yongji, A disturbance observer based sliding mode control for a class of underactuated robotic system with mismatched uncertainties, IEEE Transactions on Automatic Control, 64(6), 2480–2487 (2018)
15. Wang, Huiming and Pan, Yongping and Li, Shihua and Yu, Haoyong, Robust sliding mode control for robots driven by compliant actuators, IEEE Transactions on Control Systems Technology, 27(3), 1259–1266 (2018)
16. Vamvoudakis, Kyriakos G and Vrabie, Draguna and Lewis, Frank L, Online adaptive algorithm for optimal control with integral reinforcement learning, International Journal of Robust and Nonlinear Control, 24(17), 2686–2710 (2014)
17. Yu, Yuantao and Dai, Li and Sun, Zhongqi and Xia, Yuanqing, Robust Nonlinear Model Predictive Control for Robot Manipulators with Disturbances, The 37th Chinese Control Conference (CCC), 3629–3633 (2018)
18. Zhu, Yuanheng and Zhao, Dongbin and Li, Xiangjun, Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics, IET Control Theory & Applications, 10(12), 1339–1347 (2016)
19. Lv, Yongfeng and Na, Jing and Yang, Qinmin and Wu, Xing and Guo, Yu, Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics, International Journal of Control, 89(1), 99–112 (2016)
20. Sun, Zhongqi and Dai, Li and Xia, Yuanqing and Liu, Kun, Event-based model predictive tracking control of nonholonomic systems with coupled input constraint and bounded disturbances, IEEE Transactions on Automatic Control, 63(2), 608–615 (2017)
21. Li, Shu and Ding, Liang and Gao, Haibo and Liu, Yan-Jun and Huang, Lan and Deng, Zongquan, ADP-based online tracking control of partially uncertain time-delayed nonlinear system and application to wheeled mobile robots, IEEE transactions on cybernetics (2019)
22. Y. Liu and N. Dao and K. Y. Zhao, On Robust Control of Nonlinear Teleoperators under Dynamic Uncertainties with Variable Time Delays and without Relative Velocity (2019)
23. Nguyena, Tinh and Hoang, Thuong and Pham, Minhtuan and Dao, Namphuong, A Gaussian wavelet network-based robust adaptive tracking controller for a wheeled mobile robot with unknown wheel slips, International Journal of Control, 92(11), 2681–2692 (2019)

Fig. 6 System states q(t) and its references qd(t) = ηd with persistently excited input for the first 100 time units

Fig. 7 Estimation of the total of external disturbance and nonlinear function by RISE

24. Binh, Nguyen Thanh and Tung, Nguyen Anh and Nam, Dao Phuong and Quang, Nguyen Hong, An adaptive backstepping trajectory tracking control of a tractor trailer wheeled mobile robot, 17(2), 465–473 (2019)
25. Bhasin, Shubhendu and Kamalapurkar, Rushikesh and Johnson, Marcus and Vamvoudakis, Kyriakos G and Lewis, Frank L and Dixon, Warren E, A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica, 49(1), 82–92 (2013)

Fig. 8 The weight of NN for Critic

Fig. 9 The weight of NN for Actor

We hope that you find the revised manuscript suitable for publication in Journal of
Electrical Engineering & Technology.
Sincerely yours,
List of changes made in the revised manuscript:
1) The abstract, introduction, and conclusion have been rewritten to clarify the contribution of this submission, as mentioned by Reviewer 1.
2) As mentioned by Reviewer 1, the title was changed to “Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems.”
3) As mentioned by Reviewer 1, Reviewer 4, and Reviewer 5, the simulation results have been improved to point out the disturbance attenuation property of the proposed controller. Furthermore, the case of a step-function desired reference was added to emphasize the contribution of the revised manuscript.
4) As mentioned by Reviewer 1, several remarks, paragraphs, and sentences have been added to point out the role of the sliding variable and RISE in the disturbance attenuation property and the model order reduction of the proposed controller.
5) As mentioned by Reviewer 4, several sentences and formulas have been modified and added to explain several definitions, equations, and the selection of parameters in Simulation Results.
6) As mentioned by Reviewer 5, content has been added to the revised manuscript to describe the neural networks in the actor/critic parts.
7) As mentioned by Reviewer 1, Reviewer 4, and Reviewer 5, several remarks, paragraphs, and sentences have been added or modified to compare this submission with existing works.
8) Several minor typos have been corrected, and some sentences have been revised to improve the readability of the revised manuscript.
We have further clarified several issues raised by the Editor and reviewers. Please find our revised manuscript and a summary of our responses to the reviewers.
Response to the report of the Editor
We would like to thank the Editor for handling this submission and raising valuable
suggestions to help us improve the quality of this paper. Here is our response to Editor’s
comment.
Comment - Based on the advice received, I have decided that your manuscript could
be reconsidered for publication should you be prepared to incorporate major revisions.
When preparing your revised manuscript, you are asked to carefully consider the
reviewer comments which can be found below, and submit a list of responses to the
comments. You are kindly requested to also check the website for possible reviewer
attachment(s).
While submitting, please check the filled in author data carefully and update them if
applicable - they need to be complete and correct in order for the revision to be
processed further.
Comment 0 –
Response: The authors really appreciate the reviewer’s effort in reading through this submission. The comments and suggestions raised by the reviewer have been taken into account. Please see the following responses to each of the review comments.
I think in the present work much more emphasis must be placed on the effect of the “external disturbance” on the overall RL control system, which was not dealt with explicitly in [25]. More specifically, optimality of the underlying RL controller may be lost when an (unmodeled) external disturbance enters the system and the controller is designed without any consideration on this subtle issue. A possible remedy for tackling the robustness would be to introduce a robust control technique into the RL approach, which is actually done in the present work with the sliding mode control. I believe this point would be interesting to some possible readers and thus should be clarified in the revision.
Response: We would like to thank the reviewer for raising this suggestion. The reviewer is correct that “…, the previous work [25] (and other related works on the RL) already took the plant uncertainty into account”. Reference [25] is now mentioned in the introduction of the revised manuscript. It should be noted that the authors of [25] considered an online adaptive reinforcement learning-based method for a first-order continuous-time nonlinear autonomous system $\dot{x} = f(x,u)$ without any external disturbance. However, unlike the work in [25], this work considers a disturbed manipulator described by the second-order continuous-time nonlinear system (1). Therefore, in order to employ the algorithm in [25], the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$, $\lambda_1 \in \mathbb{R}^{n\times n} > 0$, is proposed in this work to reduce the
Comment 2 - Since the current manuscript does not seem to properly point out the main contribution of the work, the authors are suggested to rewrite the abstract, the introduction, and even the title of the manuscript.
Response: The authors are grateful to the reviewer for this valuable suggestion. The abstract, the introduction, and the title of the revised manuscript have been rewritten to clarify the main contribution of this work as follows:
Comment 3 - The simulation part also needs to be improved, especially for clarifying the role of the SMC in enhancing the robustness of the present framework against external disturbance. For example, the relevant part would be more persuasive with additional simulations where the RL controller without the SMC does not guarantee any convergence of the weights to optimal values due to the disturbance.
Response: We are grateful to the reviewer for this valuable suggestion. The role of SMC in the proposed controller is that the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$, $\lambda_1 \in \mathbb{R}^{n\times n} > 0$, enables us to reduce the order of the manipulator model, which is the second-order continuous-time nonlinear system (1). Furthermore, the external disturbances in the manipulator (1) are handled by the framework of the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$, $\lambda_1 \in \mathbb{R}^{n\times n} > 0$, and the main part of the Robust Integral of the Sign of the Error (RISE) (25), (26), (27). The Simulation Results section of the revised manuscript now describes the solution for external disturbances based on RISE estimating the total of the external disturbance $d(t)$ and nonlinear
Comment 4 - In the main body of the manuscript, every sentence begins with an indentation, which decreases the readability of the manuscript. Please make the indentation only at the beginning of a paragraph.
Response: The authors really appreciate the reviewer’s suggestion. The revised manuscript has been modified to improve readability.
Response: We would like to thank the reviewer for raising this valuable suggestion.
The symbol d(t) in eq.1 has been explained in revised manuscript.
Comment 2 - Has eq. 4 been written correctly?
Response: We would like to thank the reviewer for pointing this out. Eq. 4 in the revised manuscript describes the sliding surface, which is defined as the set of $e_1(t)$ satisfying the following equality: $s(t) = \dot{e}_1 + \lambda_1 e_1 = 0$, $\lambda_1 \in \mathbb{R}^{n\times n} > 0$. Regarding the comment by the reviewer, we have modified eq. 4 and added a sentence in front of eq. 4 in the revised manuscript.
Comment 3 - The authors mentioned that eq. 5 comes from eq. 1 and eq. 4. Could you show that in detail?
Response: We are grateful to the reviewer for this suggestion. Eq. 5 is obtained from the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$ and the dynamic equation (1). Therefore, we have added more comments in front of eq. 5 in the revised manuscript to make the result more complete.
Comment 4 - In the Simulation Results section, the values of many parameters are selected, such as lambda 1, lambda 2, kc, …. Which method have the authors used to find those values?
Response: We would like to thank the reviewer for raising this suggestion. The Simulation Results section of the revised manuscript has been modified to describe the method as follows: “… The design parameters in sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$ … remaining control gains in RISE framework are chosen satisfying (25), (26), (27) as: ”, “…and the gains in Actor-Critic learning laws are selected guaranteeing (21) – (24) as: ”. Moreover, the content in front of Remark 3 is also revised as
“… and ks , 1 , 2 are the positive diagonal matrices and
nn nn nn
1 , 2 are the positive control gains selected satisfying the sufficient
condition as: ...”
Comment 5 - The step function should be used as an input signal to test the performance of the controlled system.
Response: The authors are grateful to the reviewer for this valuable suggestion. Case 2, with a step-function desired reference signal, has been added to the Simulation Results section of the revised manuscript, with the results shown in Fig. 6-9.
Response: We would like to thank the reviewer for raising this suggestion. The robustness of the manipulator (1) is obtained by the framework of the sliding variable $s(t) = \dot{e}_1 + \lambda_1 e_1$, $\lambda_1 \in \mathbb{R}^{n\times n} > 0$, and the main part of the Robust Integral of the Sign of the Error (RISE) (25), (26), (27). The Simulation Results section of the revised manuscript now describes the solution for external disturbances based on RISE estimating the total of the external disturbance $d(t)$ and the nonlinear function $f(t)$. Several additional simulation results (Fig. 3, 7) and comments have been added to clarify the disturbance attenuation property in the revised manuscript.
Comment 7 - The proposed controller must be compared with other work that is done in this field.
Response: The authors really appreciate the reviewer for raising this valuable suggestion. The revised manuscript has been modified extensively by adding more remarks and comments to show the difference in comparison with other work. The abstract, introduction, conclusion, main content, and simulation sections of the revised manuscript have been modified and extended to clarify the main contribution as follows:
Response: We would like to thank the reviewer for providing several constructive suggestions that led to significant improvement of this manuscript. In addition to the modifications resulting from the reviewer’s comments, we have also checked and corrected many typos and grammar mistakes. We hope you will find this manuscript satisfactory for publication in Journal of Electrical Engineering & Technology.
Comment 2 - In line 27, parameter τd(t) does not appear, however, d(t) in eq.(1) is not
described.
Response: We would like to thank the reviewer for raising this valuable suggestion.
The symbol d(t) in eq.1 has been explained in revised manuscript.
Comment 3 - In this paper, the neural network (NN) has been mentioned, but it is not described in detail. The content of neural network (NN) should be added and described.
Response: We are grateful to the reviewer for this valuable suggestion. The content on the neural network (NN) has been added and modified in Section 3 of the revised manuscript as follows: “The objective of establishing the NN (16) is to find the actor/critic NN updating laws $\hat{W}_a$, $\hat{W}_c$ to approximate the actor and critic parts obtaining the optimal control law without solving the HJB equation (more details see [25]). Moreover, the smooth NN activation function $\psi(X) \in \mathbb{R}^{N}$ is chosen
Remark 4: It is worth noting that the simulation results in Figs. 2-9 illustrate the good trajectory tracking behavior and the convergence of the actor/critic neural network weights in the presence of dynamic uncertainties and external disturbances. This work is a remarkable extension of the work in [25], which only considers a first-order mathematical model without any disturbances. Additionally, the optimal control algorithm for manipulators in [1] did not consider the adaptive dynamic programming technique.
Response: We would like to thank the reviewer for the constructive comment. The conclusions section has been modified as follows: “This paper addresses the problem of adaptive reinforcement learning design for a second-order uncertain/disturbed manipulator in connection with the sliding variable and the RISE technique. Thanks to the online ADP algorithm based on the neural network, the solution of the HJB equation was obtained by an iterative algorithm, yielding a controller that satisfies not only the weight convergence but also the trajectory tracking requirement in the situation of non-autonomous closed-loop systems. Offline simulations were developed to demonstrate the performance and effectiveness of the optimal control for manipulators.”