
Journal of Electrical Engineering & Technology

Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems
--Manuscript Draft--

Manuscript Number: EETE-D-19-01430R1

Full Title: Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems

Article Type: Original Article

Funding Information:

Abstract: In this work, a trajectory tracking control scheme combining optimal control, the Robust Integral of the Sign of the Error (RISE), and the sliding mode control technique is presented for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint forces. The sliding variable combined with RISE handles external disturbances and reduces the order of the closed-loop system. The adaptive reinforcement learning technique simultaneously tunes the actor and critic networks to approximate the control policy and the cost function, respectively. The convergence of the weights as well as the tracking performance is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.

Corresponding Author: Nam Phuong Dao, Dr


Hanoi University of Science and Technology
Hanoi, Hai Ba Trung District VIET NAM

Corresponding Author Secondary Information:

Corresponding Author's Institution: Hanoi University of Science and Technology

Corresponding Author's Secondary Institution:

First Author: Tu Van Vu, Graduate Student

First Author Secondary Information:

Order of Authors: Tu Van Vu, Graduate Student

Nam Phuong Dao, Dr

Loc Thanh Pham, Graduate Student

Huy Quang Tran, Graduate Student

Order of Authors Secondary Information:

Author Comments:

Response to Reviewers: Dear Professor Jeong-whan Lee,

Please find enclosed the revised submission of the manuscript EETE-D-19-01430


entitled “Online Adaptive Reinforcement Learning of Uncertain Nonlinear Mechanical
Systems.” We would like to thank you and the Associate Editor for considering the
submission, and three reviewers for providing constructive comments and suggestions.
The revised version of this submission has been carefully checked and modified, and all the comments and suggestions provided by the reviewers have been taken into account. A list of the changes in the revision and detailed responses to the reviewers' comments are provided. Based on the comment of Reviewer #1, the title was changed to "Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems."

We hope that you find the revised manuscript suitable for publication in Journal of
Electrical Engineering & Technology.

Sincerely yours,

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation
Title Page

Title Page(Cover letter)

Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems
Van Tu Vu*, Phuong Nam Dao†, Thanh Loc Pham** and Quang Huy Tran***

† Corresponding Author: School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (nam.daophuong@hust.edu.vn)
* Haiphong University, Vietnam. (tu.vv@dhhp.edu.vn)
** School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (phamthanhloc@gmail.com)
*** School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Vietnam. (huytk23nttyb@gmail.com)

Cover letter
Dear Editor-in-Chief,
I am submitting a manuscript for consideration of publication in this journal. The manuscript is entitled "Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems".

It has not been published elsewhere, nor has it been submitted simultaneously for publication elsewhere. There are no conflicts of interest to disclose. We (all the co-authors) have liaised directly with each other to confirm agreement with the final content of this manuscript.

In this manuscript, we present an online adaptive dynamic programming algorithm for a manipulator by which the serious problems of actuator saturation and model uncertainties are efficiently addressed. The motivation is that classical nonlinear control methods can hardly overcome the above challenges, while traditional optimal control relies on the HJB equation, which is hard to solve for general nonlinear systems.

The keywords are Adaptive Dynamic Programming (ADP), Robotic Systems, Robust Integral of the Sign of the Error (RISE), and Sliding Mode Control (SMC).

We believe that this manuscript is appropriate for publication in the Journal of Electrical Engineering & Technology because it contributes an adaptive reinforcement learning control method based on the sliding mode technique to robotics applications.

This manuscript expands on the prior research conducted by Prof. Dixon, Prof. Lewis et al. in the articles "Asymptotic optimal control of uncertain nonlinear Euler-Lagrange systems" and "A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems," published in Automatica (ISSN: 0005-1098). They are the 1st and 25th references in this manuscript.
Thank you for your consideration!
Sincerely,
Dr. Phuong Nam Dao, the corresponding author.


Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems

Received: date / Accepted: date
Abstract In this work, a trajectory tracking control scheme combining optimal control, the Robust Integral of the Sign of the Error (RISE), and the sliding mode control technique is presented for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint forces. The sliding variable combined with RISE handles external disturbances and reduces the order of the closed-loop system. The adaptive reinforcement learning technique simultaneously tunes the actor and critic networks to approximate the control policy and the cost function, respectively. The convergence of the weights as well as the tracking performance is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.

Keywords Adaptive Dynamic Programming (ADP) · Robotic Systems · Robust Integral of the Sign of the Error (RISE) · Sliding Mode Control (SMC)
1 Introduction

The motion of a group of physical systems such as robotic manipulators, ships, surface vessels, and quad-rotors can be considered as mechanical systems with dynamic uncertainties and external disturbances [1]. Furthermore, actuator saturation, full-state constraints, and finite-time control have been addressed in [2]-[7]. To deal with unknown parameters and disturbances, terminal sliding mode control (SMC) is one of the remarkable solutions, with the consideration of finite-time convergence. In [8], a non-singular terminal sliding surface was employed to obtain an adaptive terminal SMC for a manipulator system. The work in [10] was also based on a non-singular terminal sliding manifold to investigate finite-time control, which seems to be effective in counteracting not only uncertain dynamics but also unbounded disturbances. Authors in [11]
have extended the terminal sliding mode technique to establish a control design for exoskeleton systems that ensures the trajectory of the closed-loop system can be driven onto the sliding surface in finite time. In order to tackle the challenge of external disturbances, classical robust control design has investigated input-to-state stability (ISS) with an equivalent attraction region. However, in the situation where the external disturbance is a combination of a finite number of step signals and sinusoidal signals, the closed-loop system in [9] is asymptotically stable. In [14], optimal gain matrices based on a disturbance observer, combined with SMC, were presented for under-actuated manipulators. Authors in [15] considered the framework of the generalized proportional integral observer (GPIO) technique and continuous SMC to overcome matched/mismatched time-varying disturbances, guaranteeing a high tracking performance in compliantly actuated robots. The SMC technique is employed not only for classical manipulators but also for different types of robots, including bilateral teleoperators (BTs) and mobile robotic systems (wheeled mobile robots, tractor-trailer systems) [22]-[24]. Several control schemes have been considered for manipulators to handle the input saturation disadvantage by integrating additional terms into the control structure [2], [4], [7]. In [2], a new desired trajectory was proposed due to the actuator saturation. The additional term is obtained after taking the derivative of the initial Lyapunov candidate function along the state trajectory in the presence of actuator saturation [2]. Furthermore, a new approach was given in [2] to tackle not only the actuator constraints but also external disturbances. The given sliding manifold was realized with the Sat function of the joint variables. The equivalent SMC scheme was computed, and then the boundedness of the input signal was established. This approach allows the input bound to be adjusted by choosing several appropriate parameters. The work in [7] gives a technique to tackle actuator saturation using a modified Lyapunov candidate function. Due to the actuator saturation, the Lyapunov function is augmented with a quadratic form of the relation between the input signal from the controller and the real signal applied to the plant. The control design was obtained after considering the Lyapunov function derivative along the system trajectory. In order to tackle the drawback of state constraints in manipulators, the framework of the barrier Lyapunov function, the Moore-Penrose inverse matrix, and the fuzzy-neural network technique was proposed in [4], [7], [2]. However, these aforementioned classical nonlinear control techniques face several challenges, such as finding an appropriate Lyapunov function and the dynamics of the additional terms [5], [6], [7]. Optimal control is a remarkable way to solve the above constraint problems by considering constraint-based optimization [12], [13], [16]-[21], and model predictive control (MPC) is one of the most effective solutions to these constraint problems for manipulators [17]. The terminal controller as well as the equivalent terminal region has been established for a nominal system of disturbed manipulators with a finite-horizon cost function [17]. This technique of robust MPC was also considered for wheeled mobile robots (WMRs) with the consideration of the kinematic model after adding a disturbance observer (DO) [13]. This work has been extended to the inner-loop model by the backstepping technique [12]. Thanks to
Sliding Variable-based Online ARL 3

the advantages of the event-triggering mechanism, the computational load of robust MPC has been reduced in control systems for unicycles [20]. An optimal control algorithm was mentioned in [1] after using a classical nonlinear control law. However, the online computation technique was not considered in [1]. Furthermore, it is difficult to find the explicit solution of the Riccati equation and the partial differential HJB (Hamilton-Jacobi-Bellman) equation for general nonlinear systems [16]. The reinforcement learning strategy was established to obtain the controller by Q-learning and temporal difference learning, and was then developed to a new stage by approximate/adaptive dynamic programming (ADP), which has become the appropriate solution in recent years. Thanks to the neural network approximation technique, authors in [16] proposed a novel online ADP algorithm which tunes both the actor and critic terms simultaneously. The training of the critic neural network (NN) was determined by a modified Levenberg-Marquardt technique to minimize the squared residual error. Furthermore, the weight convergence and the tracking convergence were shown with the weights of the actor and critic NNs tuned under the persistence of excitation (PE) condition [16]. Considering the approximate Bellman error, the proposed algorithm in [16] enables simultaneous online adjustment with an unknown drift term. Extending this work, by using a special cost function, a model-free adaptive reinforcement learning scheme has been presented without any information about the system dynamics [18]. Furthermore, by integrating an additional identifier, nonlinear systems were controlled by online adaptive reinforcement learning with completely unknown dynamics [19], [25]. However, these three works have not yet addressed robotic systems or non-autonomous systems [18], [19], [25]. In the work of [21], under the consideration of approximation and discrete-time systems, online ADP tracking control was proposed for the dynamics of mobile robots. Inspired by the above works and the analysis from traditional nonlinear control techniques to optimal control strategies, this work focuses on the framework of online adaptive reinforcement learning for manipulators and nonlinear control, with the main contributions described in the following:

1) In comparison with the previous papers [1]-[11], [14], [15], which presented classical nonlinear controllers in manipulator control systems, an adaptive reinforcement learning based optimal control design is proposed for an uncertain manipulator system in this paper.

2) Unlike the reinforcement learning based optimal control schemes in [16], [18], [19], [21], [25], which are considered for a first-order continuous-time nonlinear autonomous system without any external disturbance, the contribution here is that adaptive dynamic programming, combined with the sliding variable and the Robust Integral of the Sign of the Error (RISE), is employed for second-order uncertain/disturbed manipulators in the situation of trajectory tracking control and non-autonomous systems.

The remainder of this paper is organized as follows. The dynamic model of robotic manipulators and the control objective are given in Section 2. The proposed adaptive reinforcement learning algorithm and theoretical analysis are presented in

Section 3. The offline simulation is shown in Section 4. Finally, the conclusions are pointed out in Section 5.

Fig. 1 2-DOF Planar Robot Manipulator
2 Dynamic Model of Robot Manipulator

Consider the planar robot manipulator systems described by the following dynamic equation:

M(η)η̈ + C(η, η̇)η̇ + G(η) + F(η̇) + d(t) = τ(t)    (1)

where M(η) ∈ R^{n×n} is the generalized inertia matrix, C(η, η̇) ∈ R^{n×n} is the generalized centripetal-Coriolis matrix, G(η) ∈ R^n is the gravity vector, F(η̇) ∈ R^n is the generalized friction, d(t) is the vector of disturbances, and τ(t) is the vector of control inputs.

Property 01: The symmetric inertia matrix M(η) is positive definite and satisfies, for all ξ ∈ R^n:

a‖ξ‖² ≤ ξᵀM(η)ξ ≤ b(η)‖ξ‖²    (2)

ξᵀ(Ṁ(η) − 2C(η, η̇))ξ = 0    (3)

where a ∈ R is a positive constant and b(η) ∈ R is a positive function of η. The following assumptions will be employed in the stability analysis later.

Assumption 1 If η(t), η̇(t) ∈ L∞, then C(η, η̇), F(η̇), G(η) and the first and second partial derivatives of the elements of M(η), C(η, η̇), G(η) with respect to η(t), as well as of the elements of C(η, η̇), F(η̇) with respect to η̇(t), exist and are bounded.

Assumption 2 The desired trajectory ηd(t) and its first, second, third, and fourth time derivatives exist and are bounded.
Assumption 3 The vector of external disturbances d(t) and its time derivatives are bounded by known constants.

The control objective is to ensure that the system tracks a desired time-varying trajectory ηd(t) in the presence of dynamic uncertainties by using the framework of online adaptive reinforcement learning based optimal control design and a disturbance attenuation technique. Consider the sliding variable s(t) = ė1 + λ1e1, with λ1 ∈ R^{n×n} positive definite and e1(t) = η_ref − η, and the corresponding sliding surface:

M = {e1(t) ∈ R^n : s(t) = 0}    (4)

According to (1), the dynamic equation of the sliding variable s(t) can be written as:

Mṡ = −Cs − τ + f + d    (5)

where f(η, η̇, η_ref, η̇_ref, η̈_ref) is the nonlinear function defined by:

f = M(η̈_ref + λ1ė1) + C(η̇_ref + λ1e1) + G + F    (6)

Remark 1: The role of the above sliding variable is to reduce the order of the second-order uncertain/disturbed manipulator system. It enables us to employ adaptive reinforcement learning for a first-order continuous-time nonlinear autonomous system. Additionally, the external disturbance d(t) and the nonlinear function f are handled by RISE in the next section.
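As a quick numerical illustration of Remark 1 (a sketch with made-up gains and initial errors, not part of the paper), the on-surface error dynamics ė1 = −λ1e1 implied by s = 0 can be integrated and compared with the exponential decay e1(t) = e1(0)exp(−λ1t); λ1 is taken diagonal here purely for simplicity:

```python
import math

# Illustrative sketch: the sliding variable s = de1 + lambda1*e1 reduces
# the 2nd-order tracking-error dynamics to a 1st-order one.  On the
# surface s = 0, each error component obeys de1/dt = -lambda1*e1 and
# decays exponentially.  All numbers below are hypothetical.
lam1 = [2.0, 3.0]          # assumed diagonal entries of lambda1
e1 = [1.0, -0.5]           # initial tracking error eta_ref - eta
dt, T = 1e-3, 3.0          # Euler step and horizon (illustrative)

t = 0.0
while t < T:
    # on the sliding surface: de1/dt = -lambda1 * e1
    e1 = [e - dt * l * e for e, l in zip(e1, lam1)]
    t += dt

# compare with the exact solution e1_i(T) = e1_i(0) * exp(-lam1_i * T)
exact = [1.0 * math.exp(-2.0 * T), -0.5 * math.exp(-3.0 * T)]
print(e1, exact)
```

The Euler trajectory matches the closed-form exponential to within the discretization error, which is the order-reduction behavior the sliding variable is introduced for.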
3 Adaptive Reinforcement Learning based Optimal Control Design

Assuming that the dynamic model of the robot manipulator is known, the control input can be designed as

τ = f + d − u    (7)

where the term u is designed by an optimal control algorithm and the remaining term f + d will be estimated later. Therefore, it can be seen that

Mṡ = −Cs + u    (8)

According to (4) and (8), we obtain the following time-varying model

ẋ = [ −λ1e1 + s
      −M(η_ref − e1)⁻¹ C(η_ref − e1, η̇_ref + λ1e1 − s) s ] + [ 0_{n×n}
                                                                M⁻¹ ] u    (9)

where x = [e1ᵀ, sᵀ]ᵀ, and the infinite-horizon cost function to be minimized is

J(x, u) = ∫₀^∞ ( (1/2) xᵀQx + (1/2) uᵀRu ) dt    (10)

where Q ∈ R^{2n×2n} and R ∈ R^{n×n} are positive definite symmetric matrices. However, in order to deal with the tracking control problem, some additional states are introduced. This allows us to avoid non-autonomous systems.
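As a side note (illustrative only, not from the paper), a cost of the form (10) can be sanity-checked numerically by truncating the infinite horizon and applying a rectangle rule; the scalar signals x(t), u(t) and weights q, r below are synthetic choices whose exact cost is easy to compute by hand:

```python
import math

# Sketch: approximate J = int_0^inf (1/2 q x^2 + 1/2 r u^2) dt by
# truncating at a horizon T.  x(t) = exp(-t) and u(t) = -0.5*exp(-t)
# are synthetic signals (not the paper's trajectories); the exact value
# is int_0^inf 1.125*exp(-2t) dt = 0.5625.
q, r = 2.0, 1.0            # scalar stand-ins for Q and R
dt, T = 1e-4, 20.0         # step size and truncation horizon

J, t = 0.0, 0.0
while t < T:
    x = math.exp(-t)
    u = -0.5 * math.exp(-t)
    J += (0.5 * q * x * x + 0.5 * r * u * u) * dt
    t += dt

print(J)  # close to the analytic value 0.5625
```

The truncation is harmless here because the integrand decays exponentially, which mirrors why the admissibility requirement below demands a finite cost.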
Subsequently, adaptive reinforcement learning is considered to find the optimal control solution for the autonomous affine state-space model, with the assumption that the desired trajectory η_ref(t) satisfies η̇_ref(t) = f_ref(η_ref):

Ẋ = A(X) + B(X)u    (11)

where X = [xᵀ, η_refᵀ, η̇_refᵀ]ᵀ and

A(X) = [ −λ1e1 + s
         −M(η_ref − e1)⁻¹ C(η_ref − e1, η̇_ref + λ1e1 − s) s
         f_ref(η_ref)
         ḟ_ref(η_ref) ],   B(X) = [ 0_{n×n}
                                     M⁻¹
                                     0_{2n×n} ]

Define the new infinite-horizon integral cost function to be minimized as

J(X, u) = ∫_t^∞ ( (1/2) Xᵀ Q_T X + (1/2) uᵀRu ) dτ    (12)

where

Q_T = [ Q  0
        0  0 ]    (13)

In order to guarantee stability in the optimal control design, we consider the class of "admissible policies" described in [16], [18].

Definition 1 [16], [18] (Admissible Policy): A control input μ(X) is called admissible with respect to (12) on U if μ(X) is continuous on U, μ(0) = 0, the affine system (11) is stabilized by μ(X) on U, and J(X) is finite for every X ∈ U.

The optimal control objective can now be stated as finding an admissible control signal μ*(X) such that the cost function (12) associated with the affine system (11) is minimized. According to classical Hamilton-Jacobi-Bellman (HJB) equation theory [25], the optimal controller u*(X) and the corresponding optimal cost function V*(X) are given by:

u*(X) = −(1/2) R⁻¹ Bᵀ(X) (∂V*(X)/∂X)ᵀ    (14)

H*(X, u*, ∂V*/∂X) = (∂V*/∂X)(A + Bu*) + (1/2) Xᵀ Q_T X + (1/2) u*ᵀRu* = 0    (15)

However, it is hard to solve the HJB equation directly, and an offline solution requires complete knowledge of the mathematical model. Thus, a simultaneous-learning based online solution is considered, using neural networks to represent the optimal cost function and the corresponding optimal controller [25]:

V(X) = Wᵀψ(X) + εv(X)    (16)

u*(X) = −(1/2) R⁻¹ Bᵀ(X) ( (∂ψ/∂x)ᵀ W + (∂εv(x)/∂x)ᵀ )    (17)
where W ∈ R^N is the vector of unknown ideal NN weights, N is the number of neurons, ψ(X) ∈ R^N is a smooth NN activation function, and εv(X) ∈ R is the function reconstruction error. The objective of establishing the NN (16) is to find actor/critic NN updating laws Ŵa, Ŵc that approximate the actor and critic parts, obtaining the optimal control law without solving the HJB equation (for more details see [25]). Moreover, the smooth NN activation function ψ(X) ∈ R^N is chosen depending on the description of the manipulator (see Section 4). In [25], the Weierstrass approximation theorem enables us to uniformly approximate not only V*(X) but also ∂V*(X)/∂X, with εv(x), ∂εv(x)/∂x → 0 as N → ∞. For a fixed number N, the critic V̂(X) and the actor û(X) are employed to approximate the optimal cost function and the optimal controller as:

V̂(X) = Ŵcᵀψ(X)    (18)

û(X) = −(1/2) R⁻¹ Bᵀ(X) (∂ψ/∂x)ᵀ Ŵa    (19)

The adaptation laws of the critic weights Ŵc and the actor weights Ŵa are implemented simultaneously to minimize the integral squared Bellman error and the squared Bellman error δhjb, respectively:

δhjb = Ĥ(X, û, ∂V̂/∂X) − H*(X, u*, ∂V*/∂X)
     = Ŵcᵀσ + (1/2) Xᵀ Q_T X + (1/2) ûᵀRû    (20)

where σ(X, û) = (∂ψ/∂x)(A + Bû) is the critic regression vector. Similar to the work in [25], the adaptation law of the critic weights is given by:

(d/dt)Ŵc = −kc λ σ / (1 + νσᵀλσ) δhjb    (21)

where ν, kc ∈ R are constant positive gains and λ ∈ R^{N×N} is a symmetric estimated gain matrix computed as follows:

(d/dt)λ = −kc λ (σσᵀ / (1 + νσᵀλσ)) λ;  λ(ts⁺) = λ(0) = φ0 I    (22)

where ts⁺ is a resetting time satisfying λmin{λ(t)} ≤ φ1, with φ0 > φ1. This resetting ensures that λ(t) remains positive definite and prevents the covariance wind-up problem [25]:

φ1 I ≤ λ(t) ≤ φ0 I    (23)

Moreover, the actor adaptation law can be described as:

(d/dt)Ŵa = proj{ −(ka1 / √(1 + σᵀσ)) (∂ψ/∂x) B R⁻¹ Bᵀ (∂ψ/∂x)ᵀ (Ŵa − Ŵc) δhjb − ka2 (Ŵa − Ŵc) }    (24)
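To make the update mechanics concrete, the following sketch performs a single Euler step of the critic law (21) driven by the Bellman error (20); all numbers (weights, regression vector, quadratic terms, gains) are made up for illustration and do not come from the paper's simulation:

```python
# One Euler step of the critic adaptation (21) with Bellman error (20).
# N = 2 weights; sigma, Qterm and Rterm stand in for sigma(X, u_hat),
# (1/2) X'Q_T X and (1/2) u'R u evaluated at the current state (all
# values hypothetical).
Wc = [0.5, -0.2]                  # current critic weight estimate
sigma = [1.0, 0.3]                # critic regression vector
Qterm, Rterm = 0.8, 0.1           # assumed quadratic cost terms
lam = [[2.0, 0.0], [0.0, 2.0]]    # gain matrix lambda (kept diagonal)
kc, nu, dt = 10.0, 1.0, 1e-3

# Bellman error (20): delta = Wc'sigma + (1/2)X'Q_T X + (1/2)u'R u
delta = sum(w * s for w, s in zip(Wc, sigma)) + Qterm + Rterm

# normalized gradient step (21):
# dWc/dt = -kc * lam*sigma / (1 + nu*sigma'lam*sigma) * delta
lam_sigma = [sum(lam[i][j] * sigma[j] for j in range(2)) for i in range(2)]
denom = 1.0 + nu * sum(s * ls for s, ls in zip(sigma, lam_sigma))
Wc = [w - dt * kc * ls / denom * delta for w, ls in zip(Wc, lam_sigma)]
print(delta, Wc)
```

Since delta > 0 here, the step moves both weights so as to shrink the Bellman residual, which is the descent direction the normalization in (21) preserves.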
Remark 2: The convergence of the estimated actor/critic weights Ŵc and Ŵa depends on the PE condition of σ/√(1 + νσᵀλσ) ∈ R^N in [25]. Unlike the work in [25], this algorithm does not involve the identifier design and focuses on the manipulator control design. Moreover, the learning technique in the adaptation laws (22), (24) differs from the data-driven online integral reinforcement learning in [16], [18]. In order to develop this adaptive reinforcement learning for manipulator systems in the trajectory tracking control problem, it is necessary to consider the manipulator dynamics as the affine system (11).

Consequently, the control design (7) is completed by implementing an estimate of the lumped term Δ = f + d, which is designed based on the Robust Integral of the Sign of the Error (RISE) framework [1] as follows:

Δ̂j(t) = (ksj + 1)sj(t) − (ksj + 1)sj(0) + ρj(t)    (25)

where ρ(t) ∈ R^n is computed by the following equation:

(d/dt)ρj = (ksj + 1)λ2j sj + γ1j sgn(sj)    (26)

and ks ∈ R^{n×n}, γ1 ∈ R^{n×n}, λ2 ∈ R^{n×n} are positive diagonal matrices, while ζ1 ∈ R, ζ2 ∈ R are positive control gains selected to satisfy the sufficient condition:

γ1j > ζ1 + (1/λ2j) ζ2    (27)

Remark 3: In the early work [1], the optimal control design was considered for uncertain/disturbed mechanical systems within the RISE framework. The work in [1] is extended here by integrating adaptive reinforcement learning into the trajectory tracking problem with the consideration of non-autonomous systems.
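The RISE estimator (25)-(26) can be exercised on a toy scalar example. The sketch below (made-up gains and disturbance, not the paper's setup) closes the loop on the reduced dynamics ṡ = Δ − Δ̂ that results from substituting the estimate of the lumped term f + d (written Δ here) into (7), and checks that both the sliding variable and the estimation error become small:

```python
import math

# Scalar RISE sketch: plant reduced to ds/dt = Delta - Delta_hat
# (unit inertia), with Delta an unknown smooth "disturbance".
# Gains are hypothetical but satisfy (27) for this Delta, since
# |dDelta/dt| <= 0.5 and gam1 = 3 > 0.5 + 0.5/5.
ks, lam2, gam1 = 10.0, 5.0, 3.0
dt, T = 1e-4, 10.0

s, s0, rho = 0.5, 0.5, 0.0        # sliding variable and RISE state
t = 0.0
while t < T:
    Delta = 1.0 + 0.5 * math.sin(t)            # unknown lumped term
    Delta_hat = (ks + 1.0) * (s - s0) + rho    # estimate, eq. (25)
    # rho dynamics, eq. (26); sgn implemented via copysign
    rho += dt * ((ks + 1.0) * lam2 * s + gam1 * math.copysign(1.0, s))
    s += dt * (Delta - Delta_hat)              # reduced error dynamics
    t += dt

print(abs(s), abs(Delta - Delta_hat))  # both driven toward zero
```

Note that sgn(s) enters only through the integral defining ρ, so the resulting estimate Δ̂ is continuous; this is the property that lets RISE absorb the bounded-derivative disturbance asymptotically rather than by discontinuous switching.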
4 Simulation Results

In this section, to verify the effectiveness of the proposed tracking control algorithm, a simulation is carried out on a 2-DOF planar robot manipulator system, modeled by the Euler-Lagrange formulation (1). In the case of 2-DOF planar robot manipulator systems (n = 2), the matrices in (1) can be represented as follows:

M(η) = [ ϱ1 + 2ϱ2 cos η2   ϱ3 + ϱ2 cos η2
         ϱ3 + ϱ2 cos η2    ϱ3 ],
G(η) = [ ϱ4 cos η1 + ϱ5 cos(η1 + η2)
         ϱ5 cos(η1 + η2) ],
C(η, η̇) = [ −ϱ2 sin η2 η̇2   −ϱ2 sin η2 (η̇1 + η̇2)
             ϱ2 sin η2 η̇1    0 ]    (28)

where ϱi, i = 1…5, are constant parameters depending on the mechanical parameters and the gravitational acceleration. In this simulation, these constants are chosen as ϱ1 = 5, ϱ2 = 1, ϱ3 = 1, ϱ4 = 1.2g, ϱ5 = g. Two simulation scenarios are considered to validate the performance of the proposed controller.
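As a quick sanity check (illustrative, not part of the paper), the matrices in (28) can be verified numerically against the skew-symmetry property (3); the joint state and test vector below are arbitrary, and only ϱ2 enters since G does not appear in (3):

```python
import math

r2 = 1.0  # rho_2 from the simulation setup; the other rho_i drop out of (3)

def M_dot(eta2, deta2):
    # time derivative of M(eta) in (28): d/dt cos(eta2) = -sin(eta2)*deta2
    m = -r2 * math.sin(eta2) * deta2
    return [[2.0 * m, m], [m, 0.0]]

def C_mat(eta2, d1, d2):
    s2 = r2 * math.sin(eta2)
    return [[-s2 * d2, -s2 * (d1 + d2)], [s2 * d1, 0.0]]

eta2, d1, d2 = 0.7, 0.4, -1.1   # arbitrary joint angle and velocities
xi = [0.9, -0.3]                # arbitrary test vector

Md, Cm = M_dot(eta2, d2), C_mat(eta2, d1, d2)
N = [[Md[i][j] - 2.0 * Cm[i][j] for j in range(2)] for i in range(2)]
quad = sum(xi[i] * N[i][j] * xi[j] for i in range(2) for j in range(2))
print(quad)  # ~0: N is skew-symmetric, so the quadratic form vanishes
```

The quadratic form vanishes for any state and any ξ, confirming that the simulated model inherits the passivity-type structure assumed in Property 01.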
Fig. 2 System states q(t) and its references qd(t) = ηd with persistently excited input for the first 100 times
Case 1: The time-varying desired reference signal is defined as ηd = [3sin(0.1t)  3cos(0.1t)]ᵀ, with the vector of disturbances given as d(t) = [0.5sin(t)  0.5cos(t)]ᵀ. The positive definite symmetric matrices in the cost function (10) are:

Q = [ 40   2  −4   4
       2  40   4  −6
      −4   4   4   0
       4  −6   0   4 ],   R = [ 0.25  0
                                 0    0.25 ]

The design parameters in the sliding variable s(t) = ė1 + λ1e1 are chosen such that λ1 ∈ R^{n×n} is a constant positive definite matrix:

λ1 = [ 15.6  10.6
       10.6  10.4 ]

The remaining control gains in the RISE framework are chosen to satisfy (25), (26), (27) as:

λ2 = [ 60   0
        0  35 ],   ks = [ 140   0
                            0  20 ],   γ1j = 5

and the gains in the actor-critic learning laws are selected to guarantee (21)-(24) as:

kc = 800, ν = 1, ka1 = 0.01, ka2 = 1.

On the other hand, according to [1], the value function V in (16) can be calculated precisely as

V = 2x1² − 4x1x2 + 3x2² + 2.5x3² + x3²cos(η2) + x3x4 + x3x4cos(η2) + 0.5x4²    (29)
Fig. 3 Estimation of the total of external disturbance and nonlinear function by RISE
Although ψ(X) in (16) can be chosen arbitrarily, to facilitate the later comparison between the experimental result and the expression in (29), ψ(X) is chosen as

ψ(X) = [ x1²  x1x2  x2²  x3²  x3²cos(η2)  x3x4  x3x4cos(η2)  x4² ]ᵀ

and, according to (29), the exact values of Ŵc in (18) and Ŵa in (19) are

Ŵc = [ 2  −4  3  2.5  1  1  1  0.5 ]ᵀ
Ŵa = [ 2  −4  3  2.5  1  1  1  0.5 ]ᵀ    (30)
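A one-line consistency check (illustrative; the state values below are arbitrary) confirms that the ideal weights in (30) reproduce (29) through the critic parameterization (18):

```python
import math

# Verify V_hat = Wc'psi(X) from (18) with the ideal weights (30)
# reproduces the exact value function (29) at an arbitrary state.
x1, x2, x3, x4, eta2 = 0.3, -1.2, 0.8, 0.5, 0.9   # arbitrary state

psi = [x1*x1, x1*x2, x2*x2, x3*x3, x3*x3*math.cos(eta2),
       x3*x4, x3*x4*math.cos(eta2), x4*x4]
Wc = [2.0, -4.0, 3.0, 2.5, 1.0, 1.0, 1.0, 0.5]    # weights from (30)

V_hat = sum(w * p for w, p in zip(Wc, psi))
V_exact = (2*x1*x1 - 4*x1*x2 + 3*x2*x2 + 2.5*x3*x3
           + x3*x3*math.cos(eta2) + x3*x4
           + x3*x4*math.cos(eta2) + 0.5*x4*x4)     # eq. (29)
print(V_hat, V_exact)
```

The two values agree up to floating-point rounding, which is why the weight errors reported in Table 1 can be read directly as approximation errors of the learned critic.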
In the simulation, the covariance matrix is initialized as

Ψ(0) = diag[ 100  300  300  1  1  1  1  1 ]

while all the NN weights Wc, Wa are randomly initialized in [−1, 1], and the states and their first time derivatives are initialized to random vectors q(0), q̇(0) ∈ R². It is necessary to guarantee the PE condition of the critic regression vector (see Remark 2) when using this developed algorithm. Unlike linear systems, where the PE condition of the regression translates to sufficient richness of the external input, no verifiable method exists to ensure a PE regression in nonlinear regulation problems. To deal with this situation, a small exploratory signal consisting of sinusoids of varying frequencies is added to the control signal for the first 100 times. Each experiment was performed over 150 times, and the data from the experiments are displayed in Figure 2, Figure 4, Figure 5 and Figure 3, which depict the tracking states, the control, and the updating of the NN weights Wc, Wa. It is clear that the tracking objective was satisfied after
Fig. 4 The weight of NN for Critic

Fig. 5 The weight of NN for Actor
only about 2.5 time units, as shown in Figure 2. Meanwhile, the weights of the NNs are compared with (30) in Table 1. The highest error, approximately 0.05, is an acceptable result, although the convergence time is still long. Furthermore, we obtain the tracking performance of the total of the external disturbance d(t) and the nonlinear function f(t), which demonstrates the disturbance attenuation property of the proposed control scheme in Figure 3. These results prove the correctness of the algorithm.

Case 2: The step-function desired reference signal is defined as ηd = [2·1(t)  3·1(t)]ᵀ, with the disturbance given as d(t) = [50sin(t)  50cos(t)]ᵀ.
Table 1 Comparison between the proposed algorithm and exact values

W     proposed algorithm    exact value
W1    2.02                  2.00
W2    -3.95                 -4.00
W3    2.98                  3.00
W4    2.50                  2.50
W5    1.00                  1.00
W6    1.00                  1.00
W7    1.00                  1.00
W8    0.50                  0.50
The parameters in the simulation are chosen as described in Case 1. It can be seen that our algorithm is effective in tracking the desired reference, in weight convergence, and in disturbance attenuation, as described in Figs. 6-9.

Remark 4: It is worth noting that the simulation results in Figs. 2-9 illustrate the good behavior of the trajectory tracking and the convergence of the actor/critic neural network weights in the presence of dynamic uncertainties and external disturbances. This work is a remarkable extension of the work in [25], which only considered a first-order mathematical model without any disturbances. Additionally, the optimal control algorithm for manipulators in [1] did not consider the adaptive dynamic programming technique.

5 Conclusions

This paper addresses the problem of adaptive reinforcement learning design for second-order uncertain/disturbed manipulators in connection with the sliding variable and RISE techniques. Thanks to the online ADP algorithm based on neural networks, the solution of the HJB equation was obtained by an iterative algorithm, yielding a controller that achieves not only weight convergence but also trajectory tracking in the situation of non-autonomous closed-loop systems. Offline simulations were carried out to demonstrate the performance and effectiveness of the optimal control for manipulators.
40 References
41
1. Dupree, K., Patre, P. M., Wilcox, Z. D., Dixon, W. E.: Asymptotic optimal control of uncertain nonlinear Euler–Lagrange systems. Automatica, 1, 99–107 (2011)
2. Hu, X., Wei, X., Zhang, H., Han, J., Liu, X.: Robust adaptive tracking control for a class of mechanical systems with unknown disturbances under actuator saturation. International Journal of Robust and Nonlinear Control, 29(6), 1893–1908 (2019)
3. Yang, L., Yang, J.: Nonsingular fast terminal sliding-mode control for nonlinear dynamical systems. International Journal of Robust and Nonlinear Control, 21(16), 1865–1879 (2011)
Sliding Variable-based Online ARL 13
4. Guo, Y., Huang, B., Li, A., Wang, C.: Integral sliding mode control for Euler–Lagrange systems with input saturation. International Journal of Robust and Nonlinear Control, 29(4), 1088–1100 (2019)
5. He, W., Chen, Y., Yin, Z.: Adaptive neural network control of an uncertain robot with full-state constraints. IEEE Transactions on Cybernetics, 46(3), 620–629 (2015)
6. He, W., Dong, Y.: Adaptive fuzzy neural network control for a constrained robot using impedance learning. IEEE Transactions on Neural Networks and Learning Systems, 29(4), 1174–1186 (2017)
7. He, W., Dong, Y., Sun, C.: Adaptive neural impedance control of a robotic manipulator with input saturation. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(3), 334–344 (2015)
8. Mondal, S., Mahanta, C.: Adaptive second order terminal sliding mode controller for robotic manipulators. Journal of the Franklin Institute, 351(4), 2356–2377 (2014)
9. Lu, M., Liu, L., Feng, G.: Adaptive tracking control of uncertain Euler–Lagrange systems subject to external disturbances. Automatica, 104, 207–219 (2019)
10. Galicki, M.: Finite-time control of robotic manipulators. Automatica, 51, 49–54 (2015)
11. Madani, T., Daachi, B., Djouani, K.: Modular controller design based fast terminal sliding mode for articulated exoskeleton systems. IEEE Transactions on Control Systems Technology, 25(3), 1133–1140 (2016)
12. Yang, H., Guo, M., Xia, Y., Sun, Z.: Dual closed-loop tracking control for wheeled mobile robots via active disturbance rejection control and model predictive control. International Journal of Robust and Nonlinear Control (2019)
13. Sun, Z., Xia, Y., Dai, L., Liu, K., Ma, D.: Disturbance rejection MPC for tracking of wheeled mobile robot. IEEE/ASME Transactions on Mechatronics, 22(6), 2576–2587 (2017)
14. Huang, J., Ri, S., Fukuda, T., Wang, Y.: A disturbance observer based sliding mode control for a class of underactuated robotic system with mismatched uncertainties. IEEE Transactions on Automatic Control, 64(6), 2480–2487 (2018)
15. Wang, H., Pan, Y., Li, S., Yu, H.: Robust sliding mode control for robots driven by compliant actuators. IEEE Transactions on Control Systems Technology, 27(3), 1259–1266 (2018)
16. Vamvoudakis, K. G., Vrabie, D., Lewis, F. L.: Online adaptive algorithm for optimal control with integral reinforcement learning. International Journal of Robust and Nonlinear Control, 24(17), 2686–2710 (2014)
17. Yu, Y., Dai, L., Sun, Z., Xia, Y.: Robust nonlinear model predictive control for robot manipulators with disturbances. The 37th Chinese Control Conference (CCC), 3629–3633 (2018)
18. Zhu, Y., Zhao, D., Li, X.: Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET Control Theory & Applications, 10(12), 1339–1347 (2016)
19. Lv, Y., Na, J., Yang, Q., Wu, X., Guo, Y.: Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. International Journal of Control, 89(1), 99–112 (2016)
20. Sun, Z., Dai, L., Xia, Y., Liu, K.: Event-based model predictive tracking control of nonholonomic systems with coupled input constraint and bounded disturbances. IEEE Transactions on Automatic Control, 63(2), 608–615 (2017)
21. Li, S., Ding, L., Gao, H., Liu, Y.-J., Huang, L., Deng, Z.: ADP-based online tracking control of partially uncertain time-delayed nonlinear system and application to wheeled mobile robots. IEEE Transactions on Cybernetics (2019)
22. Liu, Y., Dao, N., Zhao, K. Y.: On robust control of nonlinear teleoperators under dynamic uncertainties with variable time delays and without relative velocity (2019)
23. Nguyen, T., Hoang, T., Pham, M., Dao, N.: A Gaussian wavelet network-based robust adaptive tracking controller for a wheeled mobile robot with unknown wheel slips. International Journal of Control, 92(11), 2681–2692 (2019)
Fig. 6 System states q(t) and their references qd(t) = ηd with persistently excited input for the first 100 s
Fig. 7 Estimation by RISE of the sum of the external disturbance and the nonlinear function
24. Binh, N. T., Tung, N. A., Nam, D. P., Quang, N. H.: An adaptive backstepping trajectory tracking control of a tractor trailer wheeled mobile robot. 17(2), 465–473 (2019)
25. Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K. G., Lewis, F. L., Dixon, W. E.: A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 49(1), 82–92 (2013)
Fig. 8 The weights of the critic NN
Fig. 9 The weights of the actor NN
Re: Submission EETE-D-19-01430 entitled “Online Adaptive Reinforcement Learning of Uncertain Nonlinear Mechanical Systems”

Dear Professor Jeong-whan Lee,
Please find enclosed the revised submission of the manuscript EETE-D-19-01430
entitled “Online Adaptive Reinforcement Learning of Uncertain Nonlinear Mechanical
Systems.” We would like to thank you and the Associate Editor for considering the
submission, and three reviewers for providing constructive comments and suggestions.
The revised version of this submission has been carefully checked and modified, and
all the comments and suggestions provided by the reviewers have been taken into
account. List of changes of the revision and detailed responses to the reviewers’
comments are provided. Based on the comment of Reviewer #1, the title was changed
to “Sliding Variable-based Online Adaptive Reinforcement Learning of
Uncertain/Disturbed Nonlinear Mechanical Systems.”

We hope that you find the revised manuscript suitable for publication in Journal of
Electrical Engineering & Technology.

Sincerely yours,
List of changes made in the revised manuscript:
1) The abstract, introduction, and conclusion have been rewritten to clarify the contribution of this submission, as mentioned by Reviewer 1.
2) As mentioned by Reviewer 1, the title was changed to “Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems.”
3) As mentioned by Reviewers 1, 4, and 5, the simulation results have been improved to point out the disturbance attenuation property of the proposed controller. Furthermore, the case of a step-function desired reference was added to emphasize the contribution of the revised manuscript.
4) As mentioned by Reviewer 1, several remarks, paragraphs, and sentences have been added to point out the role of the sliding variable and RISE in the disturbance attenuation property and model order reduction of the proposed controller.
5) As mentioned by Reviewer 4, several sentences and formulas have been modified or added to explain definitions, equations, and the selection of parameters in the Simulation Results.
6) As mentioned by Reviewer 5, a description of the neural networks in the actor/critic parts has been added to the revised manuscript.
7) As mentioned by Reviewers 1, 4, and 5, several remarks, paragraphs, and sentences have been added or modified to compare this submission with existing works.
8) Several minor typos have been corrected, and some sentences have been revised to improve the readability of the revised manuscript.

We further clarified several issues raised by the Editor and reviewers. Please find our
revised manuscript, and a summary of our responses to the reviewers.
Response to the report of the Editor
We would like to thank the Editor for handling this submission and raising valuable
suggestions to help us improve the quality of this paper. Here is our response to Editor’s
comment.

Comment - Based on the advice received, I have decided that your manuscript could
be reconsidered for publication should you be prepared to incorporate major revisions.
When preparing your revised manuscript, you are asked to carefully consider the
reviewer comments which can be found below, and submit a list of responses to the
comments. You are kindly requested to also check the website for possible reviewer
attachment(s).

While submitting, please check the filled in author data carefully and update them if
applicable - they need to be complete and correct in order for the revision to be
processed further.

In order to submit your revised manuscript, please access the following web site:
https://www.editorialmanager.com/eete/
Your username is: nam.daophuong@hust.edu.vn

If you forgot your password, you can click the 'Send Login Details' link on the EM Login
page.

We look forward to receiving your revised manuscript before 26 Feb 2020.


In order to add the due date to your electronic calendar, please open the attached file.
Response: The authors sincerely thank the Editor for handling this submission. The reviewers’ comments have been taken into account, and the manuscript has been modified accordingly. The responses to the reviewers’ comments are listed in the following contents. We hope that you find the revised manuscript suitable for publication in Journal of Electrical Engineering & Technology.
Response to the comments of Reviewer #1
The authors would like to thank the reviewer for the valuable comments and advice. Below is a list of the changes made according to the reviewer’s comments, along with replies to each comment.

Comment 0 - Please see the attachment.

REFEREE’S REPORT [Submission ID: EETE-D-19-01430]
“Online Adaptive Reinforcement Learning of Uncertain Nonlinear Mechanical Systems”

The submitted manuscript aims to present an approximate optimal control for uncertain mechanical systems being the form of Euler–Lagrange systems, via the online adaptive reinforcement learning technique. In particular, to deal with model uncertainty and external disturbance, the authors integrate the sliding mode control (SMC) into the reinforcement learning (RL) framework. For a possible publication at JEET, the authors are recommended to revise the manuscript based on the comments listed below.

Response The authors really appreciate the reviewer’s effort reading through this
submission. The comments and suggestions raised by the reviewer have been
taken into account. Please see the following responses to each of the review
comments.

Comment 1 - Main contribution of this work is very unclear to me, especially when compared with the work [25]. To the best of my knowledge, the previous work [25] (and other related works on the RL) already took the plant uncertainty into account. Thus, at first glance, it was hard for me to find the underlying reason behind employing the SMC in addition to the RL controller (that will be already robust in a sense).

I think in the present work much more emphasis must be placed on the effect of the “external disturbance” on the overall RL control system, which was not dealt with explicitly in [25]. More specifically, optimality of the underlying RL controller may be lost when an (unmodeled) external disturbance enters the system and the controller is designed without any consideration on this subtle issue. A possible remedy for tackling the robustness would be to introduce a robust control technique into the RL approach, which is actually done in the present work with the sliding mode control. I believe this point would be interesting to some possible readers and thus should be clarified in the revision.

Response: We would like to thank the reviewer for raising this suggestion.

The reviewer is correct that “…, the previous work [25] (and other related works on the RL) already took the plant uncertainty into account”. The reference [25] has been mentioned in the introduction of the revised manuscript. It should be noted that the authors in [25] considered an online adaptive reinforcement learning-based method for a first-order continuous-time nonlinear autonomous system ẋ = f(x, u) without any external disturbance. However, unlike the work in [25], a disturbed manipulator is described by the second-order continuous-time nonlinear system (1). Therefore, in order to employ the algorithm in [25], the sliding variable s(t) = ė1 + λ1e1, with λ1 ∈ R^{n×n} positive definite, is proposed in this work to reduce the order of the manipulator model. Moreover, it is worth noting that the adaptive reinforcement learning could still not be carried out directly on the reduced-order system (9) because it is a non-autonomous system. Hence, this work introduces additional states to avoid the non-autonomous description. This proposed technique enables us to develop the adaptive reinforcement learning as described in the revised manuscript. Furthermore, the external disturbance in the manipulator (1) is handled by the framework of the sliding variable s(t) = ė1 + λ1e1 and the Robust Integral of the Sign of the Error (RISE) (25, 26, 27). The abstract, introduction, conclusion, main content, and Simulation Results in the revised manuscript have been modified and extended to clarify the main contribution as follows:

1. In the section of Abstract:
a. “In this work, a trajectory tracking control scheme within the framework of optimal control, the Robust Integral of the Sign of the Error (RISE), and the sliding mode control technique is presented for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint forces. The sliding variable combined with RISE makes it possible to deal with the external disturbance and reduces the order of the closed-loop system. The adaptive reinforcement learning technique tunes the actor and critic networks simultaneously to approximate the control policy and the cost function, respectively. The convergence of the weights as well as the tracking performance is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.”
2. In the section of Introduction:
a. “Furthermore, by integrating an additional identifier, nonlinear systems were controlled by online adaptive reinforcement learning with completely unknown dynamics [9], [25]. However, these works have not yet addressed robotic systems or non-autonomous systems [18], [19], [25].”
b. “Unlike the reinforcement learning-based optimal control in [16], [18], [19], [21], [25], which is considered for first-order continuous-time nonlinear autonomous systems without any external disturbance, the contribution here is that adaptive dynamic programming combined with the sliding variable and the Robust Integral of the Sign of the Error (RISE) is employed for second-order uncertain/disturbed manipulators in the setting of trajectory tracking control and non-autonomous systems.”
3. In the section of Conclusion:
a. “This paper addresses the problem of adaptive reinforcement learning design for second-order uncertain/disturbed manipulators in connection with the sliding variable and RISE techniques. Thanks to the online ADP algorithm based on the neural network, the solution of the HJB equation was achieved by an iterative algorithm, yielding a controller that guarantees not only weight convergence but also trajectory tracking for the non-autonomous closed-loop system. Offline simulations were developed to demonstrate the performance and effectiveness of the optimal control for manipulators.”
4. In the section of Dynamic Model of Robot Manipulator:
a. Remark 1: The role of the above sliding variable is to reduce the order of the second-order uncertain/disturbed manipulator system. It enables us to employ the adaptive reinforcement learning for a first-order continuous-time nonlinear autonomous system. Additionally, the external disturbance d(t) and the nonlinear function f are handled by RISE in the next section.
b. Remark 3: In the early work [1], the optimal control design was considered for uncertain/disturbed mechanical systems by the RISE framework. The work in [1] is extended here by integrating adaptive reinforcement learning into the trajectory tracking problem with the consideration of non-autonomous systems.
5. In the section of Simulation Results:
a. Remark 4: It is worth noting that the simulation results in Figs. 2-9 illustrate the good trajectory tracking behavior and the convergence of the actor/critic neural network weights in the presence of dynamic uncertainties and external disturbances. This work is a remarkable extension of the work in [25], which only considers a first-order mathematical model without any disturbances. Additionally, the optimal control algorithm for manipulators in [1] did not consider the adaptive dynamic programming technique.

Comment 2 - Since the current manuscript does not properly point out the main contribution of the work, the authors are suggested to rewrite the abstract, the introduction, and even the title of the manuscript.

Response: The authors are grateful to the reviewer for this valuable suggestion. The
abstract, the introduction, and the title of the revised manuscript have been
rewritten to clarify the main contribution of this work as follows:

1. The title of the manuscript is modified as “Sliding Variable-based Online Adaptive Reinforcement Learning of Uncertain/Disturbed Nonlinear Mechanical Systems”
2. In the section of Abstract:
a. “In this work, a trajectory tracking control scheme within the framework of optimal control, the Robust Integral of the Sign of the Error (RISE), and the sliding mode control technique is presented for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint forces. The sliding variable combined with RISE makes it possible to deal with the external disturbance and reduces the order of the closed-loop system. The adaptive reinforcement learning technique tunes the actor and critic networks simultaneously to approximate the control policy and the cost function, respectively. The convergence of the weights as well as the tracking performance is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.”
3. In the section of Introduction:
a. “Furthermore, by integrating an additional identifier, nonlinear systems were controlled by online adaptive reinforcement learning with completely unknown dynamics [9], [25]. However, these works have not yet addressed robotic systems or non-autonomous systems [18], [19], [25].”
b. “Unlike the reinforcement learning-based optimal control in [16], [18], [19], [21], [25], which is considered for first-order continuous-time nonlinear autonomous systems without any external disturbance, the contribution here is that adaptive dynamic programming combined with the sliding variable and the Robust Integral of the Sign of the Error (RISE) is employed for second-order uncertain/disturbed manipulators in the setting of trajectory tracking control and non-autonomous systems.”

Comment 3 - The simulation part also needs to be improved, especially for clarifying the role of the SMC in enhancing the robustness of the present framework against external disturbance. For example, the relevant part would be more persuasive with additional simulations where the RL controller without the SMC does not guarantee any convergence of the weights to optimal values due to the disturbance.

Response: We are grateful to the reviewer for this valuable suggestion. The role of the SMC in the proposed controller is that the sliding variable s(t) = ė1 + λ1e1, λ1 ∈ R^{n×n} > 0, enables us to reduce the order of the manipulator model, which is the second-order continuous-time nonlinear system (1). Furthermore, the external disturbance in the manipulator (1) is handled by the framework of the sliding variable and the main part of the Robust Integral of the Sign of the Error (RISE) (25, 26, 27). The Simulation Results section in the revised manuscript discusses the solution for external disturbances based on the RISE estimating the sum of the external disturbance d(t) and the nonlinear function f(t). Furthermore, the simulation scenario is also considered in a new situation where the desired trajectory is a step function. Several additional simulation results (Figs. 3, 7) and comments have been added to clarify the disturbance attenuation property in the revised manuscript.

Comment 4 - In the main body of the manuscript, every sentence begins with an
indentation, which decreases the readability of the manuscript. Please make the
indentation only at the beginning of a paragraph.

Response: The authors really appreciate the reviewer’s suggestion. The revised manuscript has been modified to improve its readability.

Response to the comments of Reviewer #4


The authors would like to thank the reviewer for the valuable comments and advice. Below is a list of the changes made according to the reviewer’s comments, along with replies to each comment.

Comment 1 - The symbol d(t) in eq. 1 is not defined.

Response: We would like to thank the reviewer for raising this valuable suggestion. The symbol d(t) in eq. (1) has been explained in the revised manuscript.
Comment 2 - Has eq. 4 been written correctly?

Response: We would like to thank the reviewer for pointing out this comment. Eq. 4 in the revised manuscript describes the sliding surface, which is defined as the set of tracking errors e1(t) ∈ R^n satisfying the equality s(t) = ė1 + λ1e1 = 0, λ1 ∈ R^{n×n} > 0. Regarding the comment by the reviewer, we have modified eq. 4 and added a sentence in front of eq. 4 in the revised manuscript.

Comment 3 - The authors mentioned that eq. 5 comes from eq. 1 and eq. 4. Could you show that in detail?

Response: We are grateful to the reviewer for this suggestion. Eq. 5 is obtained from the sliding variable s(t) = ė1 + λ1e1 and the dynamic equation (1). Therefore, we have added more comments in front of eq. 5 in the revised manuscript to make the result more complete.

Comment 4 - In the Simulation Results section, the values of many parameters are selected, such as λ1, λ2, kc, etc. Which method have the authors used to find those values?

Response: We would like to thank the reviewer for raising this suggestion. The Simulation Results section in the revised manuscript has been modified to describe the method as follows: “… The design parameters in the sliding variable s(t) = ė1 + λ1e1 are chosen such that λ1 ∈ R^{n×n} is a constant positive definite matrix”, “… The remaining control gains in the RISE framework are chosen to satisfy (25), (26), (27)”, and “… the gains in the actor-critic learning laws are selected to guarantee (21)-(24)”. Moreover, the content in front of Remark 3 is also revised as “… and ks, Λ1, Λ2 ∈ R^{n×n} are positive diagonal matrices and β1, β2 ∈ R are positive control gains selected to satisfy the sufficient condition …”
Comment 5 - The step function should be used as an input signal to test the performance of the controlled system.

Response: The authors are grateful to the reviewer for this valuable suggestion. The Simulation Results section in the revised manuscript has been extended with Case 2, a step-function desired reference signal, with the results in Figs. 6-9.

Comment 6 - The robustness of the proposed controller must be tested.

Response: We would like to thank the reviewer for raising this suggestion. The robustness of the manipulator (1) is obtained by the framework of the sliding variable s(t) = ė1 + λ1e1, λ1 ∈ R^{n×n} > 0, and the main part of the Robust Integral of the Sign of the Error (RISE) (25, 26, 27). The Simulation Results section in the revised manuscript discusses the solution for external disturbances based on the RISE estimating the sum of the external disturbance d(t) and the nonlinear function f(t). Several additional simulation results (Figs. 3, 7) and comments have been added to clarify the disturbance attenuation property in the revised manuscript.

Comment 7 - The proposed controller must be compared with other work that is done in this field.

Response: The authors really appreciate the reviewer for raising this valuable suggestion. The revised manuscript has been modified extensively by adding remarks and comments to show the difference in comparison with other works. The abstract, introduction, conclusion, main content, and simulation sections in the revised manuscript have been modified and extended to clarify the main contribution as follows:

1. In the section of Abstract:
a. “In this work, a trajectory tracking control scheme within the framework of optimal control, the Robust Integral of the Sign of the Error (RISE), and the sliding mode control technique is presented for an uncertain/disturbed nonlinear robot manipulator without holonomic constraint forces. The sliding variable combined with RISE makes it possible to deal with the external disturbance and reduces the order of the closed-loop system. The adaptive reinforcement learning technique tunes the actor and critic networks simultaneously to approximate the control policy and the cost function, respectively. The convergence of the weights as well as the tracking performance is established by theoretical analysis. Finally, a numerical example is investigated to validate the effectiveness of the proposed control scheme.”
2. In the section of Introduction:
a. “Unlike the reinforcement learning-based optimal control in [16], [18], [19], [21], [25], which is considered for first-order continuous-time nonlinear autonomous systems without any external disturbance, the contribution here is that adaptive dynamic programming combined with the sliding variable and the Robust Integral of the Sign of the Error (RISE) is employed for second-order uncertain/disturbed manipulators in the setting of trajectory tracking control and non-autonomous systems.”
3. In the section of Dynamic Model of Robot Manipulator:
a. Remark 3: In the early work [1], the optimal control design was considered for uncertain/disturbed mechanical systems by the RISE framework. The work in [1] is extended here by integrating adaptive reinforcement learning into the trajectory tracking problem with the consideration of non-autonomous systems.
4. In the section of Simulation Results:
a. Remark 4: It is worth noting that the simulation results in Figs. 2-5 illustrate the good trajectory tracking behavior and the convergence of the actor/critic neural network weights in the presence of dynamic uncertainties and external disturbances. This work is a remarkable extension of the work in [25], which only considers a first-order mathematical model without any disturbances. Additionally, the optimal control algorithm for manipulators in [1] did not consider the adaptive dynamic programming technique.
Response to the comments of Reviewer #5
The authors would like to thank the reviewer for the valuable comments and advice. Below is a list of the changes made according to the reviewer’s comments, along with replies to each comment.

Comment 1 - The English language of the paper needs improvement.

Response: We would like to thank the reviewer for providing several constructive suggestions that led to significant improvement of this manuscript. In addition to the modifications resulting from the reviewer’s comments, we have also checked and corrected many typos and grammar mistakes. We hope you will find this manuscript satisfactory for publication in Journal of Electrical Engineering & Technology.

Comment 2 - In line 27, parameter τd(t) does not appear, however, d(t) in eq.(1) is not
described.

Response: We would like to thank the reviewer for raising this valuable suggestion. The symbol d(t) in eq. (1) has been explained in the revised manuscript.

Comment 3 - In this paper, the neural network (NN) has been mentioned, but it is not described in detail. The content of the neural network (NN) should be added and described.

Response: We are grateful to the reviewer for this valuable suggestion. The content of the neural network (NN) has been added and modified in chapter 3 of the revised manuscript as follows: “The objective of establishing the NN (16) is to find the actor/critic NN updating laws Ŵa, Ŵc to approximate the actor and critic parts, obtaining the optimal control law without solving the HJB equation (more details in [25]). Moreover, the smooth NN activation function σ(X) ∈ R^N is chosen depending on the description of the manipulators (see chapter 4).”

Comment 4 - In Chapter 4, how do the authors show the superiority of the proposed algorithm? Please give details.

Response: We would like to thank the reviewer for the constructive comment. In chapter 4 of the revised manuscript, Remark 4 has been added to illustrate the superiority of the proposed algorithm:

Remark 4: It is worth noting that the simulation results in Figs. 2-9 illustrate the good trajectory tracking behavior and the convergence of the actor/critic neural network weights in the presence of dynamic uncertainties and external disturbances. This work is a remarkable extension of the work in [25], which only considers a first-order mathematical model without any disturbances. Additionally, the optimal control algorithm for manipulators in [1] did not consider the adaptive dynamic programming technique.

Comment 5 - Chapter 5 (Conclusions) is not specific and detailed enough.

Response: We would like to thank the reviewer for the constructive comment. The conclusions section has been modified as follows: “This paper addresses the problem of adaptive reinforcement learning design for second-order uncertain/disturbed manipulators in connection with the sliding variable and RISE techniques. Thanks to the online ADP algorithm based on the neural network, the solution of the HJB equation was achieved by an iterative algorithm, yielding a controller that guarantees not only weight convergence but also trajectory tracking for the non-autonomous closed-loop system. Offline simulations were developed to demonstrate the performance and effectiveness of the optimal control for manipulators.”