This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Near Optimal Event-Triggered Control of Nonlinear Discrete-Time Systems Using Neurodynamic Programming

Avimanyu Sahoo, Hao Xu, Member, IEEE, and Sarangapani Jagannathan, Senior Member, IEEE

Abstract— This paper presents an event-triggered near optimal control scheme for uncertain nonlinear discrete-time systems. Event-driven neurodynamic programming (NDP) is utilized to design the control policy. A neural network (NN)-based identifier, with event-based state and input vectors, is utilized to learn the system dynamics. An actor–critic framework is used to learn the cost function and the optimal control input. The NN weights of the identifier, the critic, and the actor NNs are tuned aperiodically once per trigger instant. An adaptive event-trigger condition to decide the trigger instants is derived. Thus, a suitable number of events are generated to ensure a desired accuracy of approximation. A near optimal performance is achieved without using value and/or policy iterations. A detailed analysis of nontrivial inter-event times with an explicit formula to show the reduction in computation is also derived. The Lyapunov technique is used in conjunction with the event-trigger condition to guarantee the ultimate boundedness of the closed-loop system. The simulation results are included to verify the performance of the controller. The net result is the development of event-driven NDP.

Index Terms— Event-triggered control (ETC), Hamilton–Jacobi–Bellman equation, neural networks (NNs), neurodynamic programming (NDP), optimal control.

I. INTRODUCTION

EVENT-TRIGGERED control (ETC) [1]–[6], which has evolved as an alternative control paradigm in recent times, has been found to be effective in terms of resource utilization. The ETC scheme uses events to sample the system state and execute the controller in an aperiodic manner. The aperiodic sampling and execution reduce the computational costs for the closed-loop system. In the case of a networked control system (NCS) [7], the ETC scheme saves network bandwidth due to the event-based aperiodic transmissions. The sampling and transmission instants, referred to as event-trigger instants, are decided using a state-dependent criterion. The threshold in

Manuscript received January 23, 2014; revised May 15, 2015 and June 7, 2015; accepted June 19, 2015. This work was supported in part by Intelligent Systems Center, Missouri University of Science and Technology, Rolla, MO, USA, and in part by the National Science Foundation under Grant ECCS 1406533. A. Sahoo and S. Jagannathan are with the Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409 USA (e-mail: [email protected]; [email protected]). H. Xu is with the Department of Electrical Engineering, College of Science and Engineering, Texas A&M University–Corpus Christi, Corpus Christi, TX 78412 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2015.2453320

the criterion is designed analytically via the Lyapunov stability technique. Thus, the event-triggered paradigm saves resources and maintains both stability and closed-loop performance. Recently, various ETC schemes [1]–[6] have been introduced in the literature for linear [3], [4] and nonlinear systems [1], [2]. Typically, in the ETC schemes, the system dynamics are considered either completely known [1], [2] or subject to a small uncertainty [3]. In contrast, in [5] and [6], an attempt has been made to design event-based controllers for systems with uncertain dynamics. In [5], the knowledge of the system dynamics is partially relaxed using an event-based neural network (NN) approximator. The NN-based design is extended to the case of completely unknown dynamics in [6]. In both cases, the state-dependent criteria, referred to as event-trigger conditions, are made adaptive. This is in contrast with the traditional nonadaptive event-trigger conditions [1], [2]. These adaptive criteria generate the required number of events during the initial online learning phase of the NN, which facilitates the event-based approximation of the unknown dynamics with aperiodic weight updates. A tradeoff is observed between the accuracy of the NN approximation and the reduction in computation. However, these controller designs [5], [6] render only stability without optimizing any performance index. Imer and Basar [9] studied optimal ETC in a constrained communication scenario using the certainty equivalence principle. Furthermore, Molin and Hirche [10] extended the linear quadratic Gaussian approach to an event-triggered context using a separation principle. However, these methods [9], [10] use a backward-in-time Riccati equation-based solution with completely known system dynamics. Traditionally, adaptive dynamic programming [11]–[14] or neurodynamic programming (NDP) [15] techniques are used to design the optimal control policy in a forward-in-time and online manner.
These techniques use policy and/or value iterations to solve the Hamilton–Jacobi–Bellman (HJB) equation online. However, a significant number of iterations within a sampling interval are needed to maintain system stability, resulting in a high computational cost. Furthermore, the knowledge of the control coefficient function is also necessary to compute the optimal control policy. For finite-time [16] optimal control, the solution to the HJB equation (i.e., the cost function) becomes explicitly time varying, and the terminal cost constraint must be satisfied at the same time. The event-based sampling of the state

2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


vector and the uncertain system dynamics complicate the problem further. Therefore, NDP over the finite horizon becomes more involved than in the infinite-horizon case. Motivated by the above limitations, in this paper, we propose a novel NDP technique to solve the fixed-final-time optimal control problem. An event-triggered uncertain nonlinear discrete-time system is considered for the purpose of design. The proposed approach functions in a forward-in-time and online manner. Two NNs, in an actor–critic [17] framework, are used to approximate the time-varying cost function and the optimal control input. An NN identifier is also used to relax the need for complete knowledge of the system dynamics. A novel adaptive event-trigger condition is developed which not only reduces the number of controller updates but also facilitates the NN approximation. Aperiodic NN tuning laws are introduced to update the identifier, the actor, and the critic NN weights. The NN weights are updated once per trigger instant and held during the inter-event times. These aperiodic updates reduce the computation when compared with the traditional NN-based schemes [18]. The Lyapunov direct method in [2] and [19] is used to prove the ultimate boundedness (UB) of the closed-loop event-triggered system. The contributions of this paper include: 1) the design of an event-triggered finite-time optimal control scheme for an uncertain nonlinear discrete-time system; 2) the design of a novel adaptive event-trigger condition; 3) the development of a tuning scheme to update the NNs aperiodically to save computation; and 4) the demonstration of the closed-loop stability using the Lyapunov technique. The rest of this paper is organized as follows. Section II presents the background along with the problem statement. Section III details the finite-horizon event-based optimal control design. The main results are claimed in Section IV, and the nontriviality of the inter-event times is discussed in Section V.
The simulation results are included in Section VI. Finally, the conclusions are drawn in Section VII. The Appendix contains the detailed proofs of the lemmas and the theorems.

II. BACKGROUND

In this section, we present a brief background on the ETC. Subsequently, the near optimal control design is formulated. A discussion on the extension of NN approximation to event-based sampling is also presented.

A. Background on ETC

Consider the uncertain nonlinear discrete-time system represented as

$x_{k+1} = f(x_k) + g(x_k) u_k \qquad (1)$

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$ represent the system state and the control input vectors, respectively. The smooth functions $f(x_k) \in \mathbb{R}^n$ and $g(x_k) \in \mathbb{R}^{n \times m}$ denote the system dynamics, which are considered unknown. Let the equilibrium point $x = 0$ be unique in a compact set $D_x$ for all $x_k \in D_x \subset \mathbb{R}^n$. The following standard assumption is necessary in order to proceed.
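As a running illustration, the dynamics (1) can be simulated with placeholder choices of $f$ and $g$; both functions are unknown to the controller in the paper's setting, so the definitions below are hypothetical examples, not the benchmark used in the paper's simulation section.

```python
import numpy as np

# A minimal sketch of the affine-in-control system (1), x_{k+1} = f(x_k) + g(x_k) u_k,
# with n = 2 states and m = 1 input. f and g here are illustrative placeholders.
def f(x):
    # hypothetical internal dynamics
    return np.array([0.9 * x[0] + 0.1 * np.sin(x[1]), 0.8 * x[1]])

def g(x):
    # hypothetical n x m input-coefficient matrix
    return np.array([[0.0], [1.0]])

def step(x, u):
    # one step of system (1)
    return f(x) + g(x) @ u
```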

Assumption 1: The system (1) is controllable and observable. The unknown control coefficient matrix $g(x_k)$ is bounded for all $x_k \in D_x$, such that $\|g(x_k)\| \le g_M$, where $g_M > 0$ is a known positive constant. The state vector is available for measurement.

In the event-triggered formalism, the system state vector $x_k$ is released, and the controller is updated, only when an event occurs. Hence, zero-order-holds (ZOHs) are used to retain the last event-sampled state and the control input vectors until the next arrive. The error between the current measured state vector, $x_k$, and the state vector at the ZOH, $\breve{x}_k$, is referred to as the event-trigger error. It is defined by

$e_{ET,k} = x_k - \breve{x}_k. \qquad (2)$

The event-trigger error (2) is used to determine the event-trigger instants by comparing it with a state-dependent threshold. A monotonically increasing subsequence of time instants $\{k_i\}_{i=1}^{\infty}$ with $k_0 = 0$ can be defined as the event-trigger instants. The last held state vector, $\breve{x}_k$, at the ZOH is updated at each $k = k_i$ for $i = 1, 2, \ldots$ with the current system state. Thus, the last held state vector can be written as

$\breve{x}_k = x_{k_i}, \quad k_i \le k < k_{i+1}, \ i = 1, 2, \ldots \qquad (3)$
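A minimal sketch of the trigger mechanism implied by (2) and (3): a ZOH stores the last event-sampled state, and an event is declared when the trigger error exceeds a threshold. The fixed threshold below is a placeholder assumption; the threshold derived in this paper is adaptive and state-dependent.

```python
import numpy as np

class ZOHTrigger:
    """Holds the last event-sampled state (eq. (3)) and computes the
    event-trigger error e_ET,k = x_k - x_held (eq. (2)). The constant
    threshold is a stand-in for the paper's adaptive threshold."""

    def __init__(self, x0, threshold=0.1):
        self.x_held = np.asarray(x0, dtype=float)
        self.threshold = threshold

    def check(self, x_k):
        e_et = x_k - self.x_held               # event-trigger error (2)
        if np.linalg.norm(e_et) > self.threshold:
            self.x_held = x_k.copy()           # ZOH update at trigger instant (3)
            return True                        # event occurred
        return False                           # hold: no transmission
```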

In an event-based framework, the control input can be described as

$u_k = \upsilon(\breve{x}_k), \quad k_i \le k < k_{i+1}, \ \forall i = 1, 2, \ldots \qquad (4)$

where $\upsilon(\breve{x}_k)$ is a function of the event-based state vector. Next, the problem for the finite-horizon optimal control in an event-based scenario is formulated.

B. Problem Formulation

Our primary objective is to design a sequence of control inputs, $u_k$, to minimize a time-varying cost function in an ETC framework. The cost function is given by

$V(x_k, k) = \psi(x_N, N) + \sum_{j=k}^{N-1} r(x_j, u_j, j) \qquad (5)$

where $r(x_j, u_j, j) = Q(x_j, j) + u_j^T R u_j$ is the cost-to-go in the interval of interest $j \in [k, N]$. The function $Q(x_k, k) \in \mathbb{R}$ is a positive definite function that penalizes the system state, $x_k$. The matrix $R \in \mathbb{R}^{m \times m}$ is a positive definite matrix that penalizes the control input, $u_k$. The terminal cost $\psi(x_N, N)$ penalizes the terminal state $x_N$, where $N$ is the terminal time instant. For the finite-horizon case, the cost-to-go, $r(x_k, u_k, k)$, depends explicitly on the time $k$ in the interval of interest $[k, N]$. Therefore, the control input also becomes time varying.

Assumption 2: The initial control input, $u_0$, is admissible [17] to keep the cost function finite.

The terminal cost for the finite-horizon cost function (5) can be written as

$V(x_N, N) = \psi(x_N, N) \qquad (6)$

where $V(x_N, N)$ is the cost at the terminal time $N$. The cost function (5) can also be rewritten as

$V(x_k, k) = r(x_k, u_k, k) + V(x_{k+1}, k+1) \qquad (7)$
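The recursion (7) can be checked numerically by evaluating the finite-horizon cost (5) backward along a stored trajectory; the quadratic choices $Q(x, j) = x^T x$ and $\psi(x_N, N) = x_N^T x_N$ below are illustrative assumptions, not the paper's design.

```python
import numpy as np

# Backward evaluation of the finite-horizon cost (5) along a trajectory
# xs = [x_0, ..., x_N], us = [u_0, ..., u_{N-1}], using the recursion
# V(x_k, k) = r(x_k, u_k, k) + V(x_{k+1}, k+1) from (7).
def cost_to_go(xs, us, R):
    N = len(us)                                # terminal index
    V = float(xs[N] @ xs[N])                   # terminal cost psi(x_N, N), eq. (6)
    for k in range(N - 1, -1, -1):
        r = float(xs[k] @ xs[k]) + float(us[k] @ R @ us[k])  # stage cost Q + u^T R u
        V = r + V                              # recursion (7)
    return V
```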


where $V(x_{k+1}, k+1) = V(x_N, N) + \sum_{j=k+1}^{N-1} r(x_j, u_j, j)$ is the cost from time instant $k+1$ onward. According to Bellman's principle of optimality, the optimal cost, $V^*(x_k, k)$, satisfies the discrete-time HJB equation. It is given by

$V^*(x_k, k) = \min_{u_k} \{ r(x_k, u_k, k) + V^*(x_{k+1}, k+1) \} \qquad (8)$

where $V^*(x_k, k)$ is the optimal cost at the time instant $k$, and $V^*(x_{k+1}, k+1)$ is the optimal cost from $k+1$ onward. The optimal control sequence $u_k^*$ can be derived using the stationarity condition [8] and written as

$u_k^* = -(1/2) R^{-1} g^T(x_k) \, \partial V^*(x_{k+1}, k+1) / \partial x_{k+1}. \qquad (9)$
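Given the gradient of the cost, the stationarity-condition control (9) reduces to a single linear solve; the sketch below assumes $g(x_k)$ and $\partial V^*/\partial x_{k+1}$ are available, whereas in the paper both quantities are approximated by NNs.

```python
import numpy as np

# Control from the stationarity condition (9):
# u*_k = -(1/2) R^{-1} g(x_k)^T dV*(x_{k+1}, k+1)/dx_{k+1}.
# g_xk is the n x m input-coefficient matrix; grad_V_next is the cost gradient.
def optimal_control(g_xk, grad_V_next, R):
    return -0.5 * np.linalg.solve(R, g_xk.T @ grad_V_next)
```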

The optimal control policy (9) depends explicitly on the solution of the HJB equation, i.e., the optimal cost $V^*(x_k, k)$. The control policy is also a function of the control coefficient function $g(x_k)$ and the state vector $x_{k+1}$ at the time instant $k$. It is practically almost impossible to find an analytical solution of the HJB equation. Therefore, approximation-based techniques (NDP) are used to solve the HJB equation. In this paper, the critic and the actor NNs are utilized to approximate the cost function and the optimal control policy, respectively, with the event-based availability of the system state vector. Hence, the universal approximation property of the NNs is revisited with an extension to event-based approximation.

C. NN Approximation with Event-Based Sampling

The universal approximation property [18] of NN can be extended to achieve a desired level of accuracy with the event-based availability of the state vector in (3). The following theorem extends the approximation property of NNs for event-based sampling.

Theorem 1: Let $h(x_k, k) \in \mathbb{R}^n$ be a smooth and continuous function in a compact set for all $x \in D_x$. Then, there exists an NN with a sufficient number of neurons, such that $h(x_k, k)$ can be approximated with event-sampled inputs. Furthermore, the function $h(x_k, k)$ with the constant weights and the event-based time-varying activation function is given by

$h(x_k, k) = W^T \sigma(\breve{x}_k, k) + \varepsilon_e(\breve{x}_k, e_{ET,k}, k) \qquad (10)$

where $W \in \mathbb{R}^{l \times n}$ is the constant unknown target weight matrix with $l$ hidden-layer neurons, while $\sigma(\breve{x}_k, k) \in \mathbb{R}^l$ is a bounded event-based time-varying activation function. The function $\varepsilon_e(\breve{x}_k, e_{ET,k}, k) = W^T [\sigma(\vartheta(\breve{x}_k, e_{ET,k}), k) - \sigma(\breve{x}_k, k)] + \varepsilon(\vartheta(\breve{x}_k, e_{ET,k}), k)$ is the event-sampled reconstruction error, where $\vartheta(\breve{x}_k, e_{ET,k}) = \breve{x}_k + e_{ET,k}$. Then, the function $\sigma(\vartheta(\breve{x}_k, e_{ET,k}), k) = \sigma(x_k, k)$ is the periodic time-based activation function, $\varepsilon(\vartheta(\breve{x}_k, e_{ET,k}), k) = \varepsilon(x_k, k)$ is the traditional reconstruction error, and $\breve{x}_k$ is the latest available event-sampled state.

Proof: Refer to the Appendix.

Remark 1: The event-based reconstruction error $\varepsilon_e(\breve{x}_k, e_{ET,k}, k)$ is a function of the event-trigger error, $e_{ET,k}$, and the traditional NN reconstruction error, $\varepsilon(x_k, k)$. An arbitrarily small event-based reconstruction error can be obtained by increasing both the frequency of events and the number of neurons. Consequently, the design of an event-trigger condition is necessary by considering a tradeoff between the reconstruction error and the computational load.

Fig. 1. Block diagram representation of the ETC system.

III. EVENT-BASED OPTIMAL CONTROLLER DESIGN

In this section, the near optimal event-triggered controller design is detailed for the uncertain discrete-time system.

A. Proposed Solution

The proposed optimal ETC system is shown in Fig. 1. It consists of: 1) a nonlinear discrete-time system, smart sensor, and trigger mechanism with a mirror actor–critic network and 2) an event-based optimal controller. The event-based optimal controller entails three NNs as online approximators: 1) the identifier; 2) the critic; and 3) the actor NNs. These three NNs are used to approximate the system dynamics, the time-varying cost function (which is the solution to the HJB equation), and the control input, respectively. All the NNs use activation functions with event-sampled inputs. The NN weights are updated at the trigger instants only, in an aperiodic manner. The event-trigger instants, $k_i$ for $i = 1, 2, \ldots$, are decided by the smart sensor and the trigger mechanism. The event-trigger condition is evaluated at every time instant $k$ to determine the trigger instants. At the trigger instants, the current system state vector, $x_{k_i}$, and its previous value, $x_{k_i - 1}$, for $i = 1, 2, \ldots$ are sent together to the controller. These event-sampled state vectors are subsequently used to update the NN weights and the control input. The updated value of the control input is then sent to the system, held by the ZOH, and utilized until the next update. Most importantly, the event-trigger condition is made adaptive by designing a suitable threshold. This adaptive trigger condition ensures an online approximation of nonlinear functions, as discussed in Remark 1. The threshold is designed as a function of the actor NN weight estimates and the system state vector. To evaluate the event-trigger condition, the trigger mechanism consists of a mirror actor–critic NN (see Fig. 1).
This mirror actor–critic NN operates in synchrony with the one at the controller. Both actor–critic NNs are initialized with the same initial values, and their weights are adjusted at the event instants. Thus, the adaptive trigger condition is updated at every trigger instant.


Remark 2: The mirror actor–critic NN estimates the NN weights locally at the trigger mechanism, thus relaxing the need for the transmission of NN weights from the controller to the trigger mechanism in the case of NCS. Therefore, the transmission cost only depends upon the transmission of the system state and the control input vectors. Although the addition of a mirror actor–critic NN increases the computational cost, the overall computation is still reduced due to the event-based execution (also see the simulation section).

B. Identifier Design

The input coefficient matrix function $g(x_k)$ is required to compute the optimal control policy (9). This will be generated by the NN-based identifier. The universal approximation property of NNs, in a compact set, can be used to represent the nonlinear system in (1). It is given by

$x_{k+1} = W_I^T \sigma_I(x_k) \bar{u}_k + \varepsilon_I(x_k) \qquad (11)$

where $W_I = [W_f^T \ W_g^T]^T \in \mathbb{R}^{(m+1) l_I \times n}$ denotes the unknown constant target weight matrix of the identifier NN. The matrices $W_f \in \mathbb{R}^{l_I \times n}$, $W_g = [W_{g1}^T \ W_{g2}^T \ \cdots \ W_{gm}^T]^T \in \mathbb{R}^{m l_I \times n}$, and $W_{gp} \in \mathbb{R}^{l_I \times n}$ for $p = 1, \ldots, m$. The function $\sigma_I(x_k) = \mathrm{diag}\{\sigma_f(x_k), \sigma_g(x_k)\} \in \mathbb{R}^{(m+1) l_I \times (m+1)}$ represents the NN activation function matrix, where $\sigma_f(x_k) \in \mathbb{R}^{l_I}$ and $\sigma_g(x_k) = \mathrm{diag}\{\sigma_{g1}(x_k), \sigma_{g2}(x_k), \ldots, \sigma_{gm}(x_k)\} \in \mathbb{R}^{m l_I \times m}$. Furthermore, $\varepsilon_I(x_k) = \varepsilon_f(x_k) + \varepsilon_g(x_k) u_k \in \mathbb{R}^n$ denotes the identifier NN reconstruction error, where $\varepsilon_f(x_k) \in \mathbb{R}^n$ and $\varepsilon_g(x_k) \in \mathbb{R}^{n \times m}$ are the traditional reconstruction errors, and $\bar{u}_k = [1 \ u_k^T]^T \in \mathbb{R}^{m+1}$ is the augmented control input. The subscripts $f$ and $g$ denote the variables for the functions $f(x_k)$ and $g(x_k)$, respectively. The number of neurons in the hidden layer is denoted by $l_I$. The notation $\mathrm{diag}\{\cdot\}$ denotes the matrix formed by the activation function vectors as diagonal blocks, with zero off-diagonal blocks of appropriate dimensions.

Assumption 3 [18]: The target weight matrix, $W_I$, the activation function, $\sigma_I(x_k)$, and the traditional reconstruction error, $\varepsilon_I(x_k)$, of the NN are upper bounded, such that $\|W_I\| \le W_{I,M}$, $\|\sigma_I(\cdot)\| \le \sigma_{I,M}$, and $\|\varepsilon_I(\cdot)\| \le \varepsilon_{I,M}$, where $W_{I,M}$, $\sigma_{I,M}$, and $\varepsilon_{I,M}$ are positive constants.

The control input is updated only at the event-trigger instants and requires the approximated identifier dynamics at these instants. Therefore, the event-based identifier dynamics can be represented as

$\hat{x}_{k+1} = \hat{f}(\breve{x}_k) + \hat{g}(\breve{x}_k) u_k, \quad k_i \le k < k_{i+1}, \ i = 1, 2, \ldots \qquad (12)$

where $\hat{x}_k \in \mathbb{R}^n$ is the identifier state vector at the time instant $k$. The functions $\hat{f}(\breve{x}_k) \in \mathbb{R}^n$ and $\hat{g}(\breve{x}_k) \in \mathbb{R}^{n \times m}$ represent the approximated identifier dynamics. Note that the identifier structure is based on event-sampled states and held during the inter-event times. This novel event-based structure is selected to reduce additional and redundant computation during the inter-event times. The identifier dynamics (12) with the NN approximation can be written as

$\hat{x}_{k+1} = \hat{W}_{I,k}^T \sigma_I(\breve{x}_k) \bar{u}_k, \quad k_i \le k < k_{i+1} \qquad (13)$

where $\hat{W}_{I,k} = [\hat{W}_{f,k}^T \ \hat{W}_{g,k}^T]^T \in \mathbb{R}^{(m+1) l_I \times n}$ is the estimated weight matrix, and $\sigma_I(\breve{x}_k) \in \mathbb{R}^{(m+1) l_I \times (m+1)}$ is the event-sampled activation function matrix for the identifier NN. The identification error can be written as $e_{I,k} = x_k - \hat{x}_k$. Hence, the identification error dynamics using (11) and (13) are found to be

$e_{I,k+1} = \tilde{W}_{I,k}^T \sigma_I(x_k) \bar{u}_k + \hat{W}_{I,k}^T (\sigma_I(x_k) - \sigma_I(\breve{x}_k)) \bar{u}_k + \varepsilon_{I,k} \qquad (14)$

for $k_i \le k < k_{i+1}$, $i = 1, 2, \ldots$, where $\tilde{W}_{I,k} = W_I - \hat{W}_{I,k}$ is the identifier NN weight estimation error. The reconstruction error is denoted by $\varepsilon_{I,k} = \varepsilon_I(x_k)$ for brevity.

Consider the case when an event is triggered, i.e., $\breve{x}_k = x_k$ for $k = k_i$. The identifier dynamics in (13) with the updated state vector can be expressed as

$\hat{x}_{k+1} = \hat{W}_{I,k}^T \sigma_I(x_k) \bar{u}_k, \quad k = k_i, \ i = 1, 2, \ldots \qquad (15)$

Therefore, the identification error dynamics from (14) for $k = k_i$ are written as

$e_{I,k+1} = \tilde{W}_{I,k}^T \sigma_I(x_k) \bar{u}_k + \varepsilon_{I,k}, \quad k = k_i, \ i = 1, 2, \ldots \qquad (16)$

The event-based tuning law for the NN identifier weights can now be selected as

$\hat{W}_{I,k} = \begin{cases} \hat{W}_{I,k-1} + \dfrac{\alpha_I \sigma_I(x_{k-1}) \bar{u}_{k-1} e_{I,k}^T}{[\sigma_I(x_{k-1}) \bar{u}_{k-1}]^T [\sigma_I(x_{k-1}) \bar{u}_{k-1}] + 1}, & k = k_i \\ \hat{W}_{I,k-1}, & k_{i-1} < k < k_i \end{cases} \qquad (17)$

where $\alpha_I$ is the learning gain. The update law (17) requires the state vector $x_{k_i - 1}$ at the trigger instant $k = k_i$. Hence, the current state, $x_{k_i}$, and the previous state, $x_{k_i - 1}$, are sent together to the controller, as proposed in Section III-A. The weight update law (17) is aperiodic in nature and hence saves computation.

The identifier NN weight estimation error dynamics from (17), forwarding one time instant ahead, can be expressed as

$\tilde{W}_{I,k+1} = \begin{cases} \tilde{W}_{I,k} - \dfrac{\alpha_I \sigma_I(x_k) \bar{u}_k e_{I,k+1}^T}{[\sigma_I(x_k) \bar{u}_k]^T [\sigma_I(x_k) \bar{u}_k] + 1}, & k = k_i \\ \tilde{W}_{I,k}, & k_i < k < k_{i+1}. \end{cases} \qquad (18)$
The UB of the identifier NN weight estimation error is guaranteed by the following lemma. Before introducing the lemma, the following assumption is needed.

Assumption 4: The identifier NN activation function $\sigma_I(x_k)$ is Lipschitz continuous in a compact set for all $x_k \in D_x$. Then, there exists a constant $C_{\sigma I}$, such that $\|\sigma_I(x_k) - \sigma_I(\breve{x}_k)\| \le C_{\sigma I} \|x_k - \breve{x}_k\|$.

Lemma 1: Consider the nonlinear discrete-time system (1) along with the identifier (13). Assume Assumptions 1 through 4 hold and the initial NN weights, $\hat{W}_{I,0}$, are initialized in a compact set. Let the identifier NN weights be tuned by (17) at the event-trigger instants, and let the activation function $\sigma_I(x_k)$ satisfy the persistency of excitation (PE) condition [18].


Suppose the control input is stabilizing and the learning gain $\alpha_I$ satisfies $0 < \alpha_I < 1/2$. Then, there exist two positive integers $T$ and $\bar{T}$, such that the weight estimation error $\tilde{W}_{I,k}$ is UB with a bound $B_{\tilde{W}_I}$ for all $k_i > k_0 + T$ or, alternatively, for all $k \ge k_0 + \bar{T}$, where $\bar{T}$ is a function of $T$.

Proof: Refer to the Appendix.

The stabilizing assumption for the control input is later relaxed in the closed-loop stability proof.

C. Controller Design

In this section, event-based actor–critic NN designs are presented. Besides the HJB or temporal difference (TD) error, an additional error term corresponding to the terminal cost is defined and used to tune the critic NN, such that the terminal cost constraint can be properly satisfied.

1) Critic NN Design: Consider (7). It can be rewritten as

$0 = V(x_{k+1}, k+1) + Q(x_k, k) + u_k^T R u_k - V(x_k, k). \qquad (19)$

The cost function in (5), using the universal approximation property of NN [18] in a compact set, can be written as

$V(x_k, k) = W_V^T \varphi(x_k, k) + \varepsilon_{V,k} \qquad (20)$

where $W_V \in \mathbb{R}^{l_V}$ is the unknown constant target critic NN weight vector, and $\varphi(x_k, k) \in \mathbb{R}^{l_V}$ is the time-varying activation function. The traditional NN reconstruction error is denoted by $\varepsilon_{V,k} = \varepsilon_V(x_k, k) \in \mathbb{R}$, for brevity. The number of hidden-layer neurons in the network is given by $l_V$. The following assumption holds for the critic NN.

Assumption 5 [17]: The target NN weights, the activation functions, and the reconstruction errors of the critic NN are bounded above and satisfy $\|W_V\| \le W_{V,M}$, $\|\varphi(\cdot, \cdot)\| \le \varphi_M$, and $|\varepsilon_V(\cdot, \cdot)| \le \varepsilon_{V,M}$, where $W_{V,M}$, $\varphi_M$, and $\varepsilon_{V,M}$ are positive constants. The gradients of the activation function and the reconstruction error satisfy $\|\partial \varphi(\cdot, k)/\partial(\cdot)\| \le \nabla\varphi_M$ and $\|\partial \varepsilon_V(\cdot, k)/\partial(\cdot)\| \le \nabla\varepsilon_{V,M}$, where $\nabla\varphi_M$ and $\nabla\varepsilon_{V,M}$ are positive constants. In addition, the activation function, $\varphi(x_k, k)$, is Lipschitz continuous for all $x_k \in D_x$ and satisfies $\|\varphi(x_k, k) - \varphi(\breve{x}_k, k)\| \le C_\varphi \|x_k - \breve{x}_k\| = C_\varphi \|e_{ET,k}\|$, where $C_\varphi$ is a positive constant.

Equation (19) with (20) can be expressed as

$0 = W_V^T \Delta\varphi(x_k, k) + Q(x_k, k) + u_k^T R u_k + \Delta\varepsilon_{V,k} \qquad (21)$

where $\Delta\varphi(x_k, k) = \varphi(x_{k+1}, k+1) - \varphi(x_k, k)$ and $\Delta\varepsilon_{V,k} = \varepsilon_{V,k+1} - \varepsilon_{V,k}$.

The approximated/estimated cost function by the critic NN with the event-based system state, $\breve{x}_k$, can be represented as

$\hat{V}(\breve{x}_k, k) = \hat{W}_{V,k}^T \varphi(\breve{x}_k, k), \quad k_i \le k < k_{i+1}, \ i = 1, 2, \ldots \qquad (22)$

where $\hat{W}_{V,k} \in \mathbb{R}^{l_V}$ is the estimated weight vector, and $\varphi(\breve{x}_k, k) \in \mathbb{R}^{l_V}$ is the event-based time-varying activation function. The activation function is selected such that $\varphi(0, k) = 0$ for $\|x_k\| = 0$ in order to ensure $\hat{V}(0) = 0$.

The approximated cost function (22) with the event-based availability of the system state $\breve{x}_k$ for $k_i \le k < k_{i+1}$, $i = 1, 2, \ldots$ does not satisfy the relation (21). Therefore, the HJB error or the TD error, $e_{HJB,k}$, associated with (21) can be written as

$e_{HJB,k} = Q(\breve{x}_k, k) + u_k^T R u_k + \hat{V}(\breve{x}_{k+1}, k+1) - \hat{V}(\breve{x}_k, k) \qquad (23)$

for $k_i \le k < k_{i+1}$, $i = 1, 2, \ldots$ Note that $Q(\breve{x}_k, k)$ is a function of the event-based state vector. The HJB equation or the TD error (23) with the approximated cost function (22) can be represented as

$e_{HJB,k} = \hat{W}_{V,k}^T \Delta\varphi(\breve{x}_k, k) + Q(\breve{x}_k, k) + u_k^T R u_k, \quad k_i \le k < k_{i+1} \qquad (24)$

where $\hat{V}(\breve{x}_{k+1}, k+1) = \hat{W}_{V,k}^T \varphi(\breve{x}_{k+1}, k+1)$, and $\Delta\varphi(\breve{x}_k, k) = \varphi(\breve{x}_{k+1}, k+1) - \varphi(\breve{x}_k, k)$.

The terminal cost (6) in terms of the NN approximation (20) can also be represented as

$V(x_N, N) = W_V^T \varphi(x_N, N) + \varepsilon_{V,N} \qquad (25)$

where $\varphi(x_N, N)$ and $\varepsilon_{V,N} = \varepsilon_V(x_N, N)$ are the activation function and the reconstruction error, respectively, at the terminal time $N$. The approximated/estimated terminal cost from (22) can be expressed as

$\hat{V}(x_N, N) = \hat{W}_{V,N}^T \varphi(x_N, N). \qquad (26)$

The terminal state vector, $x_N$, is not known. Thus, it is not possible to compute the estimated terminal cost (26) at time $k$, and hence the actual terminal cost error. Therefore, a projected terminal cost error, $e_{FC,k}$, can be represented as the difference between the desired terminal cost and the estimated cost at the time instant $k$. It is represented by

$e_{FC,k} = \psi(x_N, N) - \hat{W}_{V,k}^T \varphi(\breve{x}_k, N), \quad k_i \le k < k_{i+1}, \ i = 1, 2, \ldots \qquad (27)$

The activation function, $\varphi(\breve{x}_k, N)$, is an explicit function of the final time $N$, which is known. Thus, we can compute $\varphi(\breve{x}_k, N)$ at time $k$. The total error in the cost function estimation becomes

$e_{total,k} = e_{HJB,k} + e_{FC,k}, \quad k_i \le k < k_{i+1}, \ i = 1, 2, \ldots \qquad (28)$

At the event-trigger instants, $k = k_i$, $i = 1, 2, \ldots$, the HJB equation or the TD error can be written from (24) as

$e_{HJB,k} = \hat{W}_{V,k}^T \Delta\varphi(x_k, k) + Q(x_k, k) + u_k^T R u_k \qquad (29)$

where $\Delta\varphi(x_k, k) = \varphi(x_{k+1}, k+1) - \varphi(x_k, k)$. Similarly, the terminal cost error from (27) for $k = k_i$, $i = 1, 2, \ldots$ becomes

$e_{FC,k} = \psi(x_N, N) - \hat{W}_{V,k}^T \varphi(x_k, N). \qquad (30)$

The total error at the trigger instants, by combining (29) and (30), becomes

$e_{total,k} = \hat{W}_{V,k}^T \Delta\bar{\varphi}(x_k, k) + Q(x_k, k) + u_k^T R u_k + \psi(x_N, N) \qquad (31)$

for $k = k_i$, $i = 1, 2, \ldots$, where $\Delta\bar{\varphi}(x_k, k) = \Delta\varphi(x_k, k) - \varphi(x_k, N)$.

To minimize the total error in an event-triggered context, the update law of the critic NN, using the previous values, can be selected as ⎧ T ¯ k−1 , k − 1)etotal,k−1 αV ϕ(x ⎪ ⎪ ⎪ ⎨Wˆ V ,k−1 −

ϕ¯ T (x k−1 , k − 1) ϕ(x ¯ k−1 , k − 1) + 1 Wˆ V ,k = k = ki ⎪ ⎪ ⎪ ⎩ˆ WV ,k−1 , ki−1 < k < ki (32)

Assumption 6: The target NN weights, the activation function, and the reconstruction error of the actor NN are upper bounded and satisfy Wu  ≤ Wu,M , σu (·, ·) ≤ σu,M and εu (·, ·) ≤ εu,M , where Wu,M , σu,M , and εu,M are positive constants. The actor NN activation function is Lipschitz continuous for all x k ∈ D x , such that σu (x k , k) − σu (x˘k , k) ≤ Cσu x k − x˘k  = Cσu eET,k , where Cσu is a positive constant. Moreover, the optimal control input (9) using the gradient of cost function (20) can be expressed as

¯ k−1 , k − 1) = where αV > 0 is the learning gain, ϕ(x

$\phi(x_{k-1},k-1) - \phi(x_{k-1},N)$. The total error $e_{total,k-1}$ can be computed from (31) by moving one time step backward.

Remark 3: Similar to the identifier NN, the critic NN weights are updated aperiodically, at the trigger instants only, and are held during the inter-event times. This further reduces computation when compared with traditional NN-based control.

Adding the difference between (21) and (24) to (27), the total error can be represented in terms of the critic NN weight estimation error, $\tilde W_{V,k} = W_V - \hat W_{V,k}$. It is found to be

$e_{total,k} = -\tilde W_{V,k}^T\bar\phi(\breve x_k,k) - W_V^T\tilde\phi(x_k,\breve x_k,k) - \tilde Q(x_k,\breve x_k,k) + W_V^T\big(\phi(x_N,N) - \phi(\breve x_k,N)\big) - \bar\varepsilon_{V,k}, \quad k_i \le k < k_{i+1}$  (33)

where $\bar\varepsilon_{V,k} = \varepsilon_{V,k} - \varepsilon_{V,N}$, $\bar\phi(\breve x_k,k) = \phi(\breve x_k,k) - \phi(\breve x_k,N)$, $\tilde Q(x_k,\breve x_k,k) = Q(x_k,k) - Q(\breve x_k,k)$, and $\tilde\phi(x_k,\breve x_k,k) = \phi(x_k,k) - \phi(\breve x_k,k)$. It is routine to check that $\|\bar\phi(\cdot,\cdot)\| \le \bar\phi_M$ and $\|\bar\varepsilon_{V,k}\| \le \bar\varepsilon_{V,M}$ from Assumption 5, where $\bar\phi_M$ and $\bar\varepsilon_{V,M}$ are positive constants. The total error at an event-trigger instant, from (33) with $\breve x_k = x_k$ for $k = k_i$, becomes

$e_{total,k} = -\tilde W_{V,k}^T\bar\phi(x_k,k) + W_V^T\big(\phi(x_N,N) - \phi(x_k,N)\big) - \bar\varepsilon_{V,k}.$  (34)

The critic NN weight estimation error dynamics, from (32) by moving one time step forward, can be expressed as

$\tilde W_{V,k+1} = \begin{cases} \tilde W_{V,k} + \dfrac{\alpha_V\,\bar\phi(x_k,k)\,e_{total,k}^T}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1}, & k = k_i \\[4pt] \tilde W_{V,k}, & k_i < k < k_{i+1}. \end{cases}$  (35)

Next, the actor NN design is presented.

2) Actor NN Design: In this section, the optimal control policy is approximated by the actor NN so that it can be implemented forward in time. The identified control coefficient matrix from the NN identifier is also used to update the actor NN. By the approximation property of NNs [18], the optimal control input (9) can be written in a compact set as

$u_k^* = W_u^T\sigma_u(x_k,k) + \varepsilon_{u,k}$  (36)

where $W_u \in \mathbb{R}^{l_u\times m}$ is the unknown constant target weight matrix, $\sigma_u(x_k,k) \in \mathbb{R}^{l_u}$ is the time-varying activation function, and $\varepsilon_{u,k} = \varepsilon_u(x_k,k) \in \mathbb{R}^m$ is the traditional reconstruction error, with $l_u$ neurons in the hidden layer. Similarly, using the gradient of the cost function, the optimal control input can be expressed as

$u_{V,k}^* = -(1/2)R^{-1}g^T(x_k)\nabla\phi^T(x_{k+1},k+1)W_V - (1/2)R^{-1}g^T(x_k)\nabla\varepsilon_{V,k+1}$  (37)

where $\nabla\phi(x_{k+1},k+1) = \partial\phi(x_{k+1},k+1)/\partial x_{k+1}$ and $\nabla\varepsilon_{V,k+1} = \partial\varepsilon_V(x_{k+1},k+1)/\partial x_{k+1}$. The optimal control inputs (36) and (37) must be equal, so their difference satisfies

$0 = W_u^T\sigma_u(x_k,k) + \varepsilon_{u,k} + (1/2)R^{-1}g^T(x_k)\nabla\phi^T(x_{k+1},k+1)W_V + (1/2)R^{-1}g^T(x_k)\nabla\varepsilon_{V,k+1}.$  (38)
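The aperiodic update logic behind (32) and (35) — a normalized-gradient step on the critic weights at trigger instants, with the weights held during inter-event times — can be sketched as follows. This is an illustrative sketch and not the authors' implementation; the function name, array shapes, and the default gain are assumptions.

```python
import numpy as np

def critic_update(W_hat, phi_bar, e_total, alpha_V=0.01, triggered=False):
    # Event-triggered critic update: a normalized gradient step is taken only
    # at trigger instants (k = k_i); between events the weights are held,
    # so no computation is spent there.
    if not triggered:
        return W_hat
    denom = float(phi_bar @ phi_bar) + 1.0  # normalization term from (32)/(35)
    return W_hat - alpha_V * e_total * phi_bar / denom
```

The normalization by $\bar\phi^T\bar\phi + 1$ keeps the step size bounded regardless of the magnitude of the regressor, which is what permits the fixed learning-gain conditions in Theorem 2.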

The approximated/estimated optimal control input from the actor NN in the event-trigger context can be represented as

$u_k = \hat W_{u,k}^T\sigma_u(\breve x_k,k), \quad k_i \le k < k_{i+1},\ i = 1,2,\ldots$  (39)

where $\hat W_{u,k} \in \mathbb{R}^{l_u\times m}$ is the estimated actor NN weight matrix and $\sigma_u(\breve x_k,k) \in \mathbb{R}^{l_u}$ denotes the time-varying event-based activation function. Furthermore, the estimated control input $u_{V,k}$, using the gradient of the estimated cost function (22), can also be written as

$u_{V,k} = -(1/2)R^{-1}\hat g^T(\breve x_k)\nabla\phi^T(\breve x_{k+1},k+1)\hat W_{V,k}$  (40)

for $k_i \le k < k_{i+1}$, $i = 1,2,\ldots$, where $\hat g(\breve x)$ is the approximated event-based control coefficient matrix from the NN-based identifier and $\nabla\phi(\breve x_{k+1},k+1) = \partial\phi(\breve x_{k+1},k+1)/\partial x_{k+1}$. The control policy (39) applied to the system (1) and the control policy (40), which minimizes the estimated cost function (22), will not satisfy (38). Hence, the control input estimation error $e_{u,k}$, for $k_i \le k < k_{i+1}$, $i = 1,2,\ldots$, is defined as the difference between (39) and (40) and is found to be

$e_{u,k} = \hat W_{u,k}^T\sigma_u(\breve x_k,k) + (1/2)R^{-1}\hat g^T(\breve x_k)\nabla\phi^T(\breve x_{k+1},k+1)\hat W_{V,k}.$  (41)

Similar to the critic NN, the actor NN weight update law in the event-triggered context, using the previous values, is chosen as

$\hat W_{u,k} = \begin{cases} \hat W_{u,k-1} - \dfrac{\alpha_u\,\sigma_u(x_{k-1},k-1)\,e_{u,k-1}^T}{\sigma_u^T(x_{k-1},k-1)\sigma_u(x_{k-1},k-1)+1}, & k = k_i \\[4pt] \hat W_{u,k-1}, & k_{i-1} < k < k_i \end{cases}$  (42)

where $\alpha_u$ is the learning gain. The error $e_{u,k-1}$, $k = k_i$, can be computed from (41) with $\breve x_{k_i-1} = x_{k_i-1}$ as

$e_{u,k-1} = \hat W_{u,k-1}^T\sigma_u(x_{k-1},k-1) + (1/2)R^{-1}\hat g^T(x_{k-1})\nabla\phi^T(x_{k-1},k-1)\hat W_{V,k-1}.$  (43)
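The zero-order-hold flavor of (39) — the controller keeps reusing the last transmitted state $\breve x_k$ while the activation remains time-varying — can be sketched as below. The class, the `sigma_u` basis, and all parameter values here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def sigma_u(x, k, N=100):
    # Illustrative time-varying activation (not the paper's basis): state
    # features scaled by a factor of the normalized time-to-go tau.
    tau = (N - k) / N
    return np.exp(-tau) * np.tanh(x)

class EventBasedActor:
    """Evaluates u_k = W_hat^T sigma_u(x_breve, k) as in (39): x_breve is the
    last event-sampled state, held constant between trigger instants."""
    def __init__(self, W_hat):
        self.W_hat = W_hat      # l_u x m estimated actor weights
        self.x_breve = None     # last transmitted state

    def step(self, x_k, k, triggered):
        if triggered or self.x_breve is None:
            self.x_breve = x_k.copy()  # state transmitted at the event
        return self.W_hat.T @ sigma_u(self.x_breve, k)
```

Between events the current plant state is never read: only the held state and the time index enter the control computation, which is the source of the computational savings quantified later in Table I.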

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. SAHOO et al.: NEAR OPTIMAL ETC OF NONLINEAR DISCRETE-TIME SYSTEMS

The control input estimation error can be expressed in terms of the actor NN weight estimation error, $\tilde W_{u,k} = W_u - \hat W_{u,k}$, by subtracting (38) from (41). This is described by

$e_{u,k} = -\tilde W_{u,k}^T\sigma_u(\breve x_k,k) - (1/2)R^{-1}g^T(\breve x_k)\nabla\phi^T(\breve x_{k+1},k+1)\tilde W_{V,k} + (1/2)R^{-1}\tilde g^T(x_k)\nabla\phi^T(\breve x_{k+1},k+1)\tilde W_{V,k} - (1/2)R^{-1}\tilde g^T(\breve x_k)\nabla\phi^T(\breve x_{k+1},k+1)W_V + \varepsilon_{u,k}^{sum1}, \quad k_i \le k < k_{i+1},\ i = 1,2,\ldots$  (44)

where

$\varepsilon_{u,k}^{sum1} = -W_u^T\tilde\sigma_u(x_k,\breve x_k,k) - (1/2)R^{-1}g^T(\breve x_k)\nabla\tilde\phi^T(x_{k+1},\breve x_{k+1},k+1)W_V - (1/2)R^{-1}\tilde g^T(x_k,\breve x_k)\nabla\phi^T(x_{k+1},k+1)W_V - (1/2)R^{-1}g^T(x_k)\nabla\varepsilon_{V,k+1} - \varepsilon_{u,k}$

with $\tilde g(x_k,\breve x_k) = g(x_k) - g(\breve x_k)$, $\tilde\sigma_u(x_k,\breve x_k,k) = \sigma_u(x_k,k) - \sigma_u(\breve x_k,k)$, and $\nabla\tilde\phi(x_{k+1},\breve x_{k+1},k+1) = \nabla\phi(x_{k+1},k+1) - \nabla\phi(\breve x_{k+1},k+1)$. It is clear that $0 \le \|\varepsilon_{u,k}^{sum1}\| \le \varepsilon_{u,M}^{sum1}$, where $\varepsilon_{u,M}^{sum1}$ is a positive constant. Furthermore, from (44), the control input estimation error at $k = k_i$, $i = 1,2,\ldots$, can be written as

$e_{u,k} = -\tilde W_{u,k}^T\sigma_u(x_k,k) - (1/2)R^{-1}g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k} + (1/2)R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k} - (1/2)R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)W_V + \varepsilon_{u,k}^{sum}, \quad k = k_i$  (45)

where $\varepsilon_{u,k}^{sum} = -(1/2)R^{-1}g^T(x_k)\nabla\varepsilon_{V,k+1} - \varepsilon_{u,k}$, and it holds that $\|\varepsilon_{u,k}^{sum}\| \le \varepsilon_{u,M}^{sum}$, where $\varepsilon_{u,M}^{sum}$ is a positive constant. The weight estimation error dynamics of the actor NN, from (42) by moving one time step ahead, become

$\tilde W_{u,k+1} = \begin{cases} \tilde W_{u,k} + \dfrac{\alpha_u\,\sigma_u(x_k,k)\,e_{u,k}^T}{\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1}, & k = k_i \\[4pt] \tilde W_{u,k}, & k_i < k < k_{i+1}. \end{cases}$  (46)

Next, the main results for the near optimal event-triggered system are claimed.

IV. EVENT-TRIGGER CONDITION AND STABILITY ANALYSIS

In this section, we formulate the closed-loop event-triggered dynamics. The main results are claimed by designing an adaptive event-trigger condition. The closed-loop system dynamics are obtained using (1), the actual control input (39), and the ideal control input (36). With simple mathematical manipulation, they are given by

$x_{k+1} = f(x_k) + g(x_k)u_k^* - g(x_k)\big(\tilde W_{u,k}^T\sigma_u(x_k,k) + \varepsilon_{u,k}\big) - g(x_k)\hat W_{u,k}^T\big(\sigma_u(x_k,k) - \sigma_u(\breve x_k,k)\big), \quad k_i \le k < k_{i+1}.$  (47)

At the event-trigger instants, $k = k_i$, with the updated state vector, the closed-loop system dynamics from (47) become

$x_{k+1} = f(x_k) + g(x_k)u_k^* - g(x_k)\big(\tilde W_{u,k}^T\sigma_u(x_k,k) + \varepsilon_{u,k}\big).$  (48)

Before claiming the main result in the theorem, the following lemma is necessary.

Lemma 2 [17]: Consider the nonlinear discrete-time system given by (1). Then, there exists an optimal control policy $u_k^*$ for (1) such that the closed-loop dynamics satisfy the inequality

$\|f(x_k) + g(x_k)u_k^*\|^2 \le K^*\|x_k\|^2$  (49)

where $0 < K^* < 1$ is a constant.

Now, consider the event-trigger error (2). The condition

$D(\|e_{ET,k}\|) > \sigma_{ET,k}\|x_k\|$  (50)

is selected as the event-trigger condition, where the threshold coefficient is given by

$\sigma_{ET,k} = \sqrt{(1-2K^*)\Gamma_{ET}\big/\big(4 g_M^2 C_{\sigma_u}^2\|\hat W_{u,k}\|^2\big)}$  (51)

with $0 < \Gamma_{ET} < 1$ and $0 < K^* < 1/2$, provided $\|\hat W_{u,k}\| > \kappa$, and the dead-zone operator $D(\cdot)$ is defined as

$D(\|e_{ET,k}\|) = \begin{cases} \|e_{ET,k}\|, & \|x_k\| > b_x \\ 0, & \text{otherwise} \end{cases}$  (52)

with $b_x$ being the ultimate bound for the state. The constant $\kappa$ is a user-defined small positive constant that ensures the threshold coefficient is well defined. The system state and the control input vectors are transmitted to the controller and the plant, respectively, when the event-trigger condition (50) is satisfied, i.e., when the event-trigger error exceeds the threshold. Furthermore, an event is also triggered whenever $\|\hat W_{u,k}\| < \kappa$, irrespective of (50).
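The adaptive trigger test (50)–(52), including the dead zone and the $\kappa$ safeguard on the actor weights, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name and default values (loosely borrowed from Section VI) are assumptions, and (51) is evaluated in its square-root form so that the threshold compares norms rather than squared norms.

```python
import numpy as np

def should_trigger(x_k, x_breve, W_hat_u, K_star=0.45, Gamma_ET=0.92,
                   g_M=1.5, C_sigma_u=2.0, b_x=5e-4, kappa=1e-6):
    """Event-trigger check per (50)-(52): returns True when the current state
    must be transmitted to the controller (and the NN weights updated)."""
    W_norm = np.linalg.norm(W_hat_u)
    if W_norm < kappa:
        return True                       # threshold ill-defined: force an event
    if np.linalg.norm(x_k) <= b_x:
        return False                      # dead zone (52): state within ultimate bound
    e_ET = np.linalg.norm(x_k - x_breve)  # event-trigger error (2)
    sigma_ET = np.sqrt((1.0 - 2.0 * K_star) * Gamma_ET) / (2.0 * g_M * C_sigma_u * W_norm)
    return e_ET > sigma_ET * np.linalg.norm(x_k)  # condition (50)
```

During early learning $\|\hat W_{u,k}\|$ changes rapidly, so the threshold adapts and events are frequent; as the weights converge, the threshold settles and events become sparse, consistent with the behavior reported in Fig. 3(b).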


Fig. 2. Evolution of the Lyapunov function.

Next, the following theorem guarantees the UB of the closed-loop event-triggered system. The UB is shown using a Lyapunov function for both cases of triggering, i.e., at the events and during the inter-event times. It is important to mention that the Lyapunov function is not monotonically converging to the ultimate bound during the events and the inter-event times. This is also not necessary to show the stability of the system, as discussed in [2] for ETC systems and in [19] for switched systems. Therefore, in our case, during the inter-event times, the Lyapunov function is allowed to increase, but within a time-varying upper bound. Furthermore, it is shown that, with the trigger of events, the time-varying upper bound and the Lyapunov function converge to the UB, as shown in Fig. 2.

Theorem 2: Consider the nonlinear discrete-time system (1), the NN identifier (13), the critic NN (22), and the actor NN (39). Assume $u_0$ is an initial stabilizing control policy for the system (1) and Assumptions 1 through 6 hold. Let the identifier, the critic, and the actor NN weight estimates, $\hat W_{I,0}$, $\hat W_{V,0}$, and $\hat W_{u,0}$, respectively, be initialized in their respective compact sets with nonzero $\hat W_{u,0}$. Suppose the system state vector is sent to the controller and the NN weights are updated using (17), (32), and (42) according to the event-trigger condition (50). Let the activation functions $\sigma_I(x_k)$, $\phi(x_k,k)$, and $\sigma_u(x_k,k)$ satisfy the PE condition [18]. Then, there exist positive constants $0 < \alpha_I < 1/2$, $0 < \alpha_V < 1/3$, and $0 < \alpha_u < 1/5$ such that the closed-loop event-triggered system state vector $x_k$ and the identifier, critic, and actor NN weight estimation errors, $\tilde W_{I,k}$, $\tilde W_{V,k}$, and $\tilde W_{u,k}$, respectively, are UB for all $k_i \ge k_0 + T$ or, alternatively, for all $k \ge k_0 + \bar T$. Furthermore, $\|V^* - \hat V\| \le b_V$ and $\|u^* - u\| \le b_u$, where $b_V$ and $b_u$ are small positive constants.
Proof: Refer to the Appendix.

Remark 4: The selection of $0 < K^* < 1/2$ satisfies Lemma 2 and varies according to the desired performance of the system. The adaptive event-trigger condition (50) with (51) implicitly depends upon the actor NN weight estimation error $\tilde W_{u,k}$. During the initial learning phase, the NN weight estimation error is large; hence, events are triggered frequently. This facilitates the approximation of the cost function, the control policy, and the system dynamics to achieve near optimal performance.

Remark 5: The dead-zone operator (52) used with the event-trigger condition helps to stop unnecessary triggering due to the NN reconstruction error. The dead zone is enabled once the system state is within the ultimate bound $b_x = \max(b_{1,M}^x, b_{2,M}^x)$ computed from (A.14) and (A.18). The ultimate bound is a function of the tuning parameters $\alpha_I$, $\alpha_V$, and $\alpha_u$, and of the NN reconstruction error bounds $\varepsilon_{I,M}$, $\varepsilon_{V,M}$, and $\varepsilon_{u,M}$. Therefore, the bound can be made arbitrarily small, as mentioned in Remark A.1.

V. NONTRIVIAL MINIMUM INTER-EVENT TIMES

In this section, we discuss the minimum inter-event time and the nontriviality of the inter-event times for the near optimal ETC system.
The minimum inter-event time is the smallest time interval between two consecutive event-sampling instants, i.e., $\delta k_{\min} = \min_{i\in\mathbb{N}}\{\delta k_i\}$, where $\delta k_i = k_{i+1} - k_i$, $i = 1,2,\ldots$, are the inter-event times. It is implicitly defined by the event-trigger condition (50). For a discrete-time system, the trivial minimum inter-event time is one sampling interval $T_s$, i.e., $\delta k_{\min} = 1$. Therefore, it is important to guarantee nontrivial inter-event times, i.e., $\delta k_i > 1$, to reduce the computational load. In the case of approximation-based control design, the inter-event times depend largely on the NN weight estimation error, as characterized in the following theorem.

Theorem 3: Let the hypothesis of Theorem 2 hold. The minimum inter-event time can be expressed as

$\delta k_{\min} \ge \min_{i\in\mathbb{N}}\big\{\ln\big(1 + (1/N_i)(M_i-1)\sigma_{ET,\min}\big)\big/\ln(M_i)\big\}$  (53)

for $i = 1,2,\ldots$, and the nontriviality of the inter-event times is guaranteed if the following condition is satisfied:

$\ln\big(1 + (1/N_i)(M_i-1)\sigma_{ET,\min}\big) > \ln(M_i), \quad \text{for each } i = 1,2,\ldots$  (54)

where $N_i = (\sqrt{K^*}+1)\|x_{k_i}\| + g_M\big(\sigma_{u,M}\|\tilde W_{u,k_i}\| + \varepsilon_{u,M}\big)$, $M_i = \sqrt{K^*} + g_M C_{\sigma_u}\|\hat W_{u,k_i}\|$, and $\sigma_{ET,\min} = \min_{k\in\mathbb{N}}\{\sigma_{ET,k}\|x_k\|\}$ is the minimum event-trigger threshold.
Proof: Refer to the Appendix.

Remark 6: It is important to note that the inter-event times are nontrivial, i.e., $\delta k_i > 1$, $i = 1,2,\ldots$, if (54) is satisfied. To achieve nontrivial inter-event times during the initial learning, the initial NN weights need to be selected close to the target parameters. This reduces the NN weight estimation error $\tilde W_{u,k}$, which in turn decreases $N_i$ and increases $M_i$ in (53). Thus, condition (54) can be satisfied, leading to nontrivial inter-event times. In addition, as the NN weights $\hat W_{u,k}$ are updated, the weight estimation error $\tilde W_{u,k}$ decreases further, and hence so does $N_i$. This further ensures larger inter-event times.

VI. SIMULATION RESULTS

In this section, a two-link robot is considered for the simulation. The dynamics of the two-link robot are given by (1), with internal dynamics $f(x_k)$ and control coefficient matrix $g(x_k)$ as given in [20]. The following simulation parameters were selected. The cost-to-go was selected as a quadratic function with $Q(x_k) = x_k^T Q_x x_k$, $Q_x = I_{4\times4}$, and $R = 0.001\times I_{2\times2}$, where $I$ is the identity matrix. The nonquadratic terminal cost was chosen as $\psi(x_N,N) = 1$. The initial weights for the critic NN were selected as zero. The actor and identifier NN weights were initialized with random values drawn uniformly from the interval $[0,1]$. The time-varying activation functions for both the critic and the actor NNs were constructed from state-dependent and time-dependent terms, i.e., $\phi(x_k,k) = \phi_t(k)\phi_x(x_k)$. The state-dependent part was chosen as $\phi_x(x_k) = \{x_{1,k}^2,\ldots,x_{4,k}^2,\,x_{1,k}x_{2,k},\ldots,x_{1,k}^3 x_{2,k},\ldots,x_{1,k}^4,\ldots,x_{4,k}^4,\ldots,x_{1,k}x_{2,k}x_{3,k}x_{4,k}\} \in \mathbb{R}^{45\times1}$ [20], and the time-dependent part was selected as $\phi_t(k) = \{1, [\exp(-\tau)]^1, \ldots, [\exp(-\tau)]^{44};\ \ldots;\ [\exp(-\tau)]^{44}, [\exp(-\tau)]^{43}, \ldots, 1\} \in \mathbb{R}^{45\times45}$ [17], where $\tau = (N-k)/N$ is the normalized time index. The identifier activation function was chosen as $\tanh\{(x_{1,k})^2, x_{1,k}x_{2,k}, \ldots, (x_{1,k})^5(x_{2,k}), \ldots, (x_{4,k})^6\}$. The number of neurons for the identifier was 39, and for the critic and the actor NNs, 45 each. The learning rates for NN tuning were selected as $\alpha_I = 0.03$, $\alpha_V = 0.01$, and $\alpha_u = 0.05$, per the conditions derived in Theorem 2. The event-trigger condition parameters were $K^* = 0.45$, $\Gamma_{ET} = 0.92$, $C_{\sigma_u} = 2$,


and $g_M = 1.5$. The initial admissible control was selected as $u_0 = [-500x_1 - 500x_3,\ -200x_2 - 200x_4]^T$, and the terminal time was $N = 10\,000$. The ultimate bound selected for the system state was 0.0005. The event-trigger threshold was computed using (50), with (51) and (52), and the above parameters. Fig. 3(a) shows the evolution of the threshold (solid line) over time, along with the event-trigger error (dotted line). From this figure, it is evident that the event-trigger error resets to zero once it reaches the threshold, with the trigger of events. In Fig. 3(b), the cumulative number of trigger instants is plotted against the total sampling instants. Even though a large number of events are triggered in the initial phase, the cumulative number of triggers is reduced. The cumulative triggering became constant after 8000 time instants, which implies that the system state is within the ultimate bound $b_x = 5\times10^{-4}$. The number of events during the simulation time of 10 s, with a sampling interval of 0.001 s, was found to be 110.

Fig. 3. (a) Triggering threshold with event-trigger error. (b) Cumulative number of triggered events versus sampling instants.

A comparison of the computational load, in terms of the multiplications and additions required to compute the event-trigger condition and the controller, is given in Table I. It indicates a reduction in computation of around 65.5% for the event-triggered system. Furthermore, if a communication network is included between the plant and the controller, fewer transmissions are needed due to event-based sampling. This will reduce the communication cost significantly.

TABLE I. COMPARISON OF THE COMPUTATIONAL LOAD BETWEEN THE TRADITIONAL AND THE EVENT-BASED DISCRETE-TIME SYSTEMS

The performance of the optimal controller is shown in Fig. 4. The optimal control input [Fig. 4(b)] regulates the system states to zero, as shown in Fig. 4(a). The control input also converges to zero with the system states. This implies that, with a reduced number of controller executions, the system is near optimally regulated. Furthermore, the HJB equation or TD error, shown in Fig. 4(c), converges to near zero, implying that optimality is achieved in finite time. The terminal cost error also converges to near zero, as shown in Fig. 4(d).

Fig. 4. Convergence of (a) system state, (b) near optimal control inputs, (c) HJB equation or TD error, and (d) terminal cost error.

VII. CONCLUSION

In this paper, a near optimal ETC of uncertain nonlinear discrete-time systems in affine form is introduced. The actor–critic framework, used to solve the finite-horizon optimal control problem with event-based approximation, was able to regulate the system. The novel adaptive event-trigger condition generated the required number of events in the initial learning phase to achieve a small approximation error; it also saved computation through fewer updates of the control law. Near optimality was achieved in finite time with completely unknown system dynamics. With an explicit formula, it is shown that nontrivial inter-event times can exist with proper initialization of the weights and event-based NN weight updates. It was observed that the cumulative number of triggered events varies with the initial NN weights. The effectiveness of the controller is validated in simulation.

APPENDIX

Proof of Theorem 1: By the universal approximation theorem of NNs [21], the smooth and continuous function $h(x_k,k)$ can be represented in a compact set as

$h(x_k,k) = W^T\sigma(x_k,k) + \varepsilon(x_k,k)$

(A.1)

with $x_k$ as the input to the activation function at every sampling instant $k$. Consider the event-based sampling, where the state $x_k$ is available only intermittently, as defined in (3). Equation (A.1) can be expressed as

$h(x_k,k) = W^T\sigma(x_k,k) - W^T\sigma(\breve x_k,k) + W^T\sigma(\breve x_k,k) + \varepsilon(x_k,k).$  (A.2)


where $\sigma(\breve x_k,k)$ and $\breve x_k$ are the event-based activation function and state vector, respectively. The state $x_k$, in terms of the event-based state $\breve x_k$ and the event-trigger error $e_{ET,k}$ in (2), can be written as $x_k = \vartheta(\breve x_k, e_{ET,k}) = \breve x_k + e_{ET,k}$. Substituting this expression, (A.2) can be represented as

$h(x_k,k) = W^T\sigma(\breve x_k,k) + \varepsilon_e(\breve x_k, e_{ET,k}, k)$  (A.3)

where $\varepsilon_e(\breve x_k, e_{ET,k}, k) = W^T[\sigma(\vartheta(\breve x_k,e_{ET,k}),k) - \sigma(\breve x_k,k)] + \varepsilon(\vartheta(\breve x_k,e_{ET,k}),k)$. □

Proof of Lemma 1: The UB of the identifier weight estimation error is proven by demonstrating the boundedness of the weight estimation error for both cases of the trigger condition. A single Lyapunov function is used to evaluate the first difference in each case, and the cases are combined at the end to show the overall UB.

Case I (event instants, $k = k_i$, $i = 1,2,\ldots$): Consider the Lyapunov function candidate

$L_{I,k} = \mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}.$  (A.4)

The first difference, $\Delta L_{I,k} = \mathrm{tr}\{\tilde W_{I,k+1}^T\tilde W_{I,k+1}\} - \mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}$, along the dynamics of the identifier NN weight estimation error (18) for $k = k_i$, becomes

$\Delta L_{I,k} = -2\alpha_I\,\mathrm{tr}\big\{\tilde W_{I,k}^T\sigma_I(x_k)\bar u_k e_{I,k+1}^T\big\}\big/\big([\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]+1\big) + \alpha_I^2\,\mathrm{tr}\big\{e_{I,k+1}[\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]e_{I,k+1}^T\big\}\big/\big([\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]+1\big)^2.$

Substituting the identification error dynamics (16), and using the Cauchy–Schwarz (C–S) inequality with the fact that $[\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]/([\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]+1) \le 1$, the first difference is bounded by

$\Delta L_{I,k} \le -\alpha_I(1-2\alpha_I)\,\mathrm{tr}\left\{\dfrac{\tilde W_{I,k}^T[\sigma_I(x_k)\bar u_k][\sigma_I(x_k)\bar u_k]^T\tilde W_{I,k}}{[\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]+1}\right\} + \dfrac{\alpha_I(2+\alpha_I)\|\varepsilon_{I,k}\|^2}{\|\sigma_I(x_k)\bar u_k\|^2+1}.$

By definition, the augmented control input satisfies $\|\bar u_k\| \ge 1$, and $0 < \sigma_{I,m} \le \|\sigma_I(x_k)\| \le \sigma_{I,M}$ holds due to the PE condition [17] and Assumption 3. Hence, $0 < \sigma_{I,m} \le \|\sigma_I(x_k)\bar u_k\|$. By the above facts, the first term in the above inequality satisfies

$\mathrm{tr}\left\{\dfrac{\tilde W_{I,k}^T[\sigma_I(x_k)\bar u_k][\sigma_I(x_k)\bar u_k]^T\tilde W_{I,k}}{[\sigma_I(x_k)\bar u_k]^T[\sigma_I(x_k)\bar u_k]+1}\right\} \ge \dfrac{\|\tilde W_{I,k}^T\sigma_I(x_k)\bar u_k\|^2}{\|\sigma_I(x_k)\bar u_k\|^2+1} = \dfrac{\|\tilde W_{I,k}^T\sigma_I(x_k)\bar u_k\|^2/\|\bar u_k\|^2}{(\|\sigma_I(x_k)\|^2\|\bar u_k\|^2+1)/\|\bar u_k\|^2} \ge \dfrac{\xi_{I,m}^2}{\sigma_{I,M}^2+1}\|\tilde W_{I,k}\|^2$

where $0 < \xi_{I,m} \le \|\sigma_I(x_k)\bar u_k\|/\|\bar u_k\|$. Substituting the above inequality, the first difference leads to

$\Delta L_{I,k} \le -\alpha_I(1-2\alpha_I)\big(\xi_{I,m}^2/(\sigma_{I,M}^2+1)\big)\|\tilde W_{I,k}\|^2 + B_{\tilde W_I}$  (A.5)

where $B_{\tilde W_I} = \alpha_I(1+2\alpha_I)\varepsilon_{I,M}^2/(1+\sigma_{I,m}^2)$. From (A.5), by selecting $0 < \alpha_I < 1/2$, the Lyapunov first difference $\Delta L_{I,k} < 0$ as long as $\|\tilde W_{I,k}\| > \big((\sigma_{I,M}^2+1)B_{\tilde W_I}\big/\big(\alpha_I(1-2\alpha_I)\xi_{I,m}^2\big)\big)^{1/2} = B^M_{\tilde W,I}$. Therefore, by the Lyapunov theorem [18], the identifier weight estimation error $\tilde W_{I,k}$ is UB with bound $B^M_{\tilde W,I}$ for all $k_i \ge k_0 + T$, with the occurrence of events.

Case II (inter-event times, $k_i < k < k_{i+1}$, $i = 1,2,\ldots$): Consider the same Lyapunov function (A.4). The first difference along the identifier weight estimation error dynamics (18) for $k_i < k < k_{i+1}$ is

$\Delta L_{I,k} = \mathrm{tr}\{\tilde W_{I,k+1}^T\tilde W_{I,k+1}\} - \mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\} = 0.$  (A.6)

From (A.6), the Lyapunov first difference $\Delta L_{I,k}$ remains zero during the inter-event times. This implies that the NN weight estimation error $\tilde W_{I,k}$ remains constant during the inter-event times. The initial weight estimate $\hat W_{I,0}$ is finite and, from Assumption 3, the target weight matrix is bounded; therefore, the initial weight estimation error $\tilde W_{I,0}$ is also bounded. Furthermore, $\tilde W_{I,k}$ is bounded at the trigger instants, as shown in Case I. Thus, the initial value $\tilde W_{I,k_i}$, $i = 1,2,\ldots$, for each inter-event interval, which is the updated value at the previous trigger instant, is also bounded. Consequently, the weight estimation error $\tilde W_{I,k}$ is constant and bounded during the inter-event times, i.e., $k_i < k < k_{i+1}$ for $i = 1,2,\ldots$

From Cases I and II, the identifier weight estimation error is bounded both at the trigger instants and during the inter-event times. Furthermore, with the occurrence of events following each inter-event interval, the identifier weight estimation error $\tilde W_{I,k}$ is UB with bound $B^M_{\tilde W,I}$ for all $k_i \ge k_0 + T$. Alternatively, $\tilde W_{I,k}$ is UB for all $k \ge k_0 + \bar T$, since $k_i$ is a subsequence of $k$ and $\bar T$ is a function of $T$. □

Proof of Theorem 2: The stability of the closed-loop system is proved by considering both cases of the event condition, i.e., event instants $k = k_i$ and inter-event times $k_i < k < k_{i+1}$, $i = 1,2,\ldots$ A single Lyapunov function is evaluated for both cases, and the cases are combined at the end to show the UB.

Case 1 (event instants, $k = k_i$, $i = 1,2,\ldots$): Consider the Lyapunov function candidate

$L_{cl,k} = L_{x,k} + L_{I,k} + L_{V,k} + L_{u,k} + L_{A,k} + L_{B,k}$  (A.7)

where

$L_{x,k} = \Pi_x x_k^T x_k,\quad L_{I,k} = \Pi_I\,\mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\},\quad L_{V,k} = \Pi_V\tilde W_{V,k}^T\tilde W_{V,k},\quad L_{u,k} = \mathrm{tr}\{\tilde W_{u,k}^T\tilde W_{u,k}\},\quad L_{A,k} = \Pi_{I2}\big(\mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}\big)^2,\quad L_{B,k} = \Pi_{V2}\big(\tilde W_{V,k}^T\tilde W_{V,k}\big)^2.$

The positive constants are

$\Pi_x = \alpha_u(1-5\alpha_u)\sigma_{u,m}^2\big/\big(8 g_M^2\sigma_{u,M}^2(\sigma_{u,M}^2+1)\big)$

$\Pi_I = 2\Phi(\sigma_{I,M}^2+1)\big/\big(\alpha_I(1-2\alpha_I)\xi_{I,m}^2\big)$

$\Pi_V = 2\vartheta(\bar\phi_M^2+1)\big/\big(\alpha_V(1-3\alpha_V)\bar\phi_m^2\big)$

$\Pi_{I2} = 2\alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2(\sigma_{I,M}^2+1)\big/\big[4\alpha_I(1-2\alpha_I)\big(2(\sigma_{I,M}^2+1)-\alpha_I(1-2\alpha_I)\xi_{I,m}^2\big)\xi_{I,m}^2(\sigma_{u,m}^2+1)\big]$

and

$\Pi_{V2} = 2\alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2(\bar\phi_M^2+1)^2\big/\big[4\alpha_V(1-3\alpha_V)\big(2(\bar\phi_M^2+1)-\alpha_V(1-3\alpha_V)\bar\phi_m^2\big)\bar\phi_m^2(\sigma_{u,m}^2+1)\big]$

with

$\Phi = \alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2 W_{V,M}^2\sigma_{I,M}^2\big/\big(2(\sigma_{u,m}^2+1)\big) + \Pi_{I2}B_{\tilde W_I}\big(2(\sigma_{I,M}^2+1) - \alpha_I(1-2\alpha_I)\xi_{I,m}^2\big)\big/(\sigma_{I,M}^2+1)$

and

$\vartheta = \alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})(g_M^2 + 2\varepsilon_{I,M}^2)\nabla\phi_M^2\big/\big(4(\sigma_{u,m}^2+1)\big) + \Pi_{V2}\varepsilon_V^{1,M}\big(2(\bar\phi_M^2+1) - \alpha_V(1-3\alpha_V)\bar\phi_m^2\big)\big/(\bar\phi_M^2+1).$

Consider the first term in the Lyapunov function candidate (A.7), $L_{x,k} = \Pi_x x_k^T x_k$. The first difference along the closed-loop system dynamics (48) is bounded above by

$\Delta L_{x,k} \le \Pi_x\|x_{k+1}\|^2 - \Pi_x\|x_k\|^2 \le \Pi_x\big\|f(x_k) + g(x_k)u_k^* - g(x_k)\big(\tilde W_{u,k}^T\sigma_u(x_k,k) + \varepsilon_{u,k}\big)\big\|^2 - \Pi_x\|x_k\|^2.$

Recalling Lemma 2 and applying the C–S inequality $(a+b)^2 \le 2a^2 + 2b^2$, we obtain

$\Delta L_{x,k} \le -(1-2K^*)\Pi_x\|x_k\|^2 + 4\Pi_x g_M^2\sigma_{u,M}^2\|\tilde W_{u,k}\|^2 + 4\Pi_x g_M^2\varepsilon_{u,M}^2.$  (A.8)

Consider the second term in the Lyapunov function (A.7), $L_{I,k} = \Pi_I\,\mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}$. Its first difference follows from (A.5) and is given by

$\Delta L_{I,k} \le -\Pi_I\alpha_I(1-2\alpha_I)\big(\xi_{I,m}^2/(\sigma_{I,M}^2+1)\big)\|\tilde W_{I,k}\|^2 + \Pi_I B_{\tilde W_I}.$  (A.9)

Moving on to the third term in the Lyapunov function candidate (A.7), $L_{V,k} = \Pi_V\tilde W_{V,k}^T\tilde W_{V,k}$, the first difference becomes $\Delta L_{V,k} = \Pi_V\tilde W_{V,k+1}^T\tilde W_{V,k+1} - \Pi_V\tilde W_{V,k}^T\tilde W_{V,k}$. Along the critic NN weight estimation error dynamics (35) for $k = k_i$, the first difference can be represented as

$\Delta L_{V,k} = \dfrac{2\Pi_V\alpha_V\tilde W_{V,k}^T\bar\phi(x_k,k)e_{total,k}}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} + \dfrac{\Pi_V\alpha_V^2 e_{total,k}^T\bar\phi^T(x_k,k)\bar\phi(x_k,k)e_{total,k}}{\big(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1\big)^2}.$

Substituting $e_{total,k}$ from (34) into the above equation and using the C–S inequality, the first difference leads to

$\Delta L_{V,k} \le -\dfrac{2\Pi_V\alpha_V\tilde W_{V,k}^T\bar\phi(x_k,k)\bar\phi^T(x_k,k)\tilde W_{V,k}}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} + \dfrac{2\Pi_V\alpha_V\tilde W_{V,k}^T\bar\phi(x_k,k)\big(\phi(x_N,N)-\phi(x_k,N)\big)^T W_V}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} - \dfrac{2\Pi_V\alpha_V\tilde W_{V,k}^T\bar\phi(x_k,k)\bar\varepsilon_{V,k}^T}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} + \dfrac{3\Pi_V\alpha_V^2\tilde W_{V,k}^T\bar\phi(x_k,k)\bar\phi^T(x_k,k)\bar\phi(x_k,k)\bar\phi^T(x_k,k)\tilde W_{V,k}}{\big(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1\big)^2} + \dfrac{3\Pi_V\alpha_V^2 W_V^T\big(\phi(x_N,N)-\phi(x_k,N)\big)\bar\phi^T(x_k,k)\bar\phi(x_k,k)\big(\phi(x_N,N)-\phi(x_k,N)\big)^T W_V}{\big(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1\big)^2} + \dfrac{3\Pi_V\alpha_V^2\bar\varepsilon_{V,k}\bar\phi^T(x_k,k)\bar\phi(x_k,k)\bar\varepsilon_{V,k}^T}{\big(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1\big)^2}.$

Using Young's inequality, $2a^Tb \le qa^Ta + (1/q)b^Tb$ with $q > 0$, together with the facts $\bar\phi^T(x_k,k)\bar\phi(x_k,k)/(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1) \le 1$ and $1/(\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1) \le 1$, the first difference becomes

$\Delta L_{V,k} \le -\Pi_V\alpha_V(1-3\alpha_V)\dfrac{\tilde W_{V,k}^T\bar\phi(x_k,k)\bar\phi^T(x_k,k)\tilde W_{V,k}}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} + \Pi_V\alpha_V(2+3\alpha_V)\dfrac{W_V^T\big(\phi(x_N,N)-\phi(x_k,N)\big)\big(\phi(x_N,N)-\phi(x_k,N)\big)^T W_V}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1} + \Pi_V\alpha_V(2+3\alpha_V)\dfrac{\bar\varepsilon_{V,k}\bar\varepsilon_{V,k}^T}{\bar\phi^T(x_k,k)\bar\phi(x_k,k)+1}.$

From Assumption 5, $\|\phi(x_N,N) - \phi(x_k,k)\| < 2\phi_M$ and $\|\bar\varepsilon_{V,k}\| \le \bar\varepsilon_{V,M}$. With these facts, simple manipulation using the C–S inequality and the Frobenius norm yields

$\Delta L_{V,k} \le -\Pi_V\alpha_V(1-3\alpha_V)\big(\bar\phi_m^2/(\bar\phi_M^2+1)\big)\|\tilde W_{V,k}\|^2 + \Pi_V\varepsilon_V^{1,M}$  (A.10)

where $\varepsilon_V^{1,M} = \alpha_V(2+3\alpha_V)\big(\phi_M^2 W_{V,M}^2/(\bar\phi_m^2+1)\big) + \alpha_V(2+3\alpha_V)\big(\bar\varepsilon_{V,M}^2/(\bar\phi_m^2+1)\big)$, $0 < \alpha_V < 1/3$, and $0 < \bar\phi_m \le \|\bar\phi(x_k,k)\| \le \bar\phi_M$, which is satisfied by ensuring the PE condition [17].

Consider the next term in the Lyapunov function candidate (A.7), $L_{u,k} = \mathrm{tr}\{\tilde W_{u,k}^T\tilde W_{u,k}\}$. The first difference along the actor NN weight estimation error dynamics (46) for $k = k_i$ becomes

$\Delta L_{u,k} = 2\alpha_u\,\mathrm{tr}\big\{\tilde W_{u,k}^T\sigma_u(x_k,k)e_{u,k}\big\}\big/\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big) + \alpha_u^2\,\mathrm{tr}\big\{e_{u,k}^T\sigma_u^T(x_k,k)\sigma_u(x_k,k)e_{u,k}\big\}\big/\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big)^2.$

Substitute the control input estimation error $e_{u,k}$ from (45) into the above equation. After some mathematical manipulation, using the C–S inequality and the fact $\sigma_u^T(x_k,k)\sigma_u(x_k,k)/(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1) \le 1$, we arrive at

$\Delta L_{u,k} \le -\alpha_u(1-5\alpha_u)\,\mathrm{tr}\left\{\dfrac{\tilde W_{u,k}^T\sigma_u(x_k,k)\sigma_u^T(x_k,k)\tilde W_{u,k}}{\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1}\right\} + \alpha_u(4+5\alpha_u)\,\mathrm{tr}\left\{\dfrac{\big(R^{-1}g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k}\big)\big(R^{-1}g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k}\big)^T}{4\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big)}\right\} + \alpha_u(4+5\alpha_u)\,\mathrm{tr}\left\{\dfrac{\big(R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k}\big)\big(R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)\tilde W_{V,k}\big)^T}{4\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big)}\right\} + \alpha_u(4+5\alpha_u)\,\mathrm{tr}\left\{\dfrac{\big(R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)W_V\big)\big(R^{-1}\tilde g^T(x_k)\nabla\phi^T(x_{k+1},k+1)W_V\big)^T}{4\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big)}\right\} + \alpha_u(4+5\alpha_u)\,\mathrm{tr}\big\{\varepsilon_{u,k}^{sum}\big(\varepsilon_{u,k}^{sum}\big)^T\big\}\big/\big(\sigma_u^T(x_k,k)\sigma_u(x_k,k)+1\big).$

Using the Frobenius norm, Young's inequality, and the relation $\|\tilde g(x_k)\| \le \sigma_{I,M}\|\tilde W_{I,k}\| + \varepsilon_{I,M}$, it holds that

$\Delta L_{u,k} \le -\big(\alpha_u(1-5\alpha_u)\sigma_{u,m}^2/(\sigma_{u,M}^2+1)\big)\|\tilde W_{u,k}\|^2 + \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})(g_M^2+2\varepsilon_{I,M}^2)\nabla\phi_M^2/(4(\sigma_{u,m}^2+1))\big)\|\tilde W_{V,k}\|^2 + \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})\sigma_{I,M}^2\nabla\phi_M^2/(4(\sigma_{u,m}^2+1))\big)\|\tilde W_{I,k}\|^4 + \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})\sigma_{I,M}^2\nabla\phi_M^2/(4(\sigma_{u,m}^2+1))\big)\|\tilde W_{V,k}\|^4 + \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})\sigma_{I,M}^2 W_{V,M}^2\nabla\phi_M^2/(2(\sigma_{u,m}^2+1))\big)\|\tilde W_{I,k}\|^2 + \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})W_{V,M}^2\nabla\phi_M^2\varepsilon_{I,M}^2/(2(\sigma_{u,m}^2+1))\big) + \alpha_u(4+5\alpha_u)\big((\varepsilon_{u,M}^{sum})^2/(4(\sigma_{u,m}^2+1))\big)$  (A.11)

where $0 < \sigma_{u,m} \le \|\sigma_u(x_k,k)\| \le \sigma_{u,M}$ is ensured by the PE condition, $\lambda_{\max}(R^{-1})$ is the maximum eigenvalue of $R^{-1}$, and $0 < \alpha_u < 1/5$.

Considering the next term, $L_{A,k} = \Pi_{I2}\big(\mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}\big)^2$, the first difference, $\Delta L_{A,k} = \Pi_{I2}\big(\mathrm{tr}\{\tilde W_{I,k+1}^T\tilde W_{I,k+1}\}\big)^2 - \Pi_{I2}\big(\mathrm{tr}\{\tilde W_{I,k}^T\tilde W_{I,k}\}\big)^2$, from (A.9), becomes

$\Delta L_{A,k} \le -\Pi_{I2}\alpha_I(1-2\alpha_I)\big(2 - \alpha_I(1-2\alpha_I)\xi_{I,m}^2/(\sigma_{I,M}^2+1)\big)\big(\xi_{I,m}^2/(\sigma_{I,M}^2+1)\big)\|\tilde W_{I,k}\|^4 + \Pi_{I2}(B_{\tilde W_I})^2 + \Pi_{I2}B_{\tilde W_I}\big(2 - \alpha_I(1-2\alpha_I)\xi_{I,m}^2/(\sigma_{I,M}^2+1)\big)\|\tilde W_{I,k}\|^2$  (A.12)

where $2 - \alpha_I(1-2\alpha_I)\xi_{I,m}^2/(\sigma_{I,M}^2+1) > 0$ for $0 < \alpha_I < 1/2$.

Similarly, the first difference of the last term, $L_{B,k} = \Pi_{V2}\big(\tilde W_{V,k}^T\tilde W_{V,k}\big)^2$, using (A.10), can be written as

$\Delta L_{B,k} \le -\Pi_{V2}\alpha_V(1-3\alpha_V)\big(\bar\phi_m^2/(\bar\phi_M^2+1)\big)\big(2 - \alpha_V(1-3\alpha_V)\bar\phi_m^2/(\bar\phi_M^2+1)\big)\|\tilde W_{V,k}\|^4 + \Pi_{V2}\varepsilon_V^{1,M}\big(2 - \alpha_V(1-3\alpha_V)\bar\phi_m^2/(\bar\phi_M^2+1)\big)\|\tilde W_{V,k}\|^2 + \Pi_{V2}\big(\varepsilon_V^{1,M}\big)^2.$  (A.13)

At the final step, combine the individual first differences (A.8)–(A.13) to get the overall first difference. Substituting the constants $\Pi_x$, $\Pi_I$, $\Pi_V$, $\Pi_{I2}$, and $\Pi_{V2}$ from (A.7), the overall first difference satisfies

$\Delta L_{cl,k} \le -(1-2K^*)\Pi_x\|x_k\|^2 - \Phi\|\tilde W_{I,k}\|^2 - \vartheta\|\tilde W_{V,k}\|^2 - \alpha_u(1-5\alpha_u)\big(\sigma_{u,m}^2/(\sigma_{u,M}^2+1)\big)\|\tilde W_{u,k}\|^2 - \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2/(4(\sigma_{u,m}^2+1))\big)\|\tilde W_{I,k}\|^4 - \alpha_u(4+5\alpha_u)\big(\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2/(4(\sigma_{u,m}^2+1))\big)\|\tilde W_{V,k}\|^4 + \varepsilon_{cl,total}^{c1}$  (A.14)

where $\varepsilon_{cl,total}^{c1} = \Pi_I B_{\tilde W_I} + \Pi_V\varepsilon_V^{1,M} + \Pi_{I2}(B_{\tilde W_I})^2 + \Pi_{V2}(\varepsilon_V^{1,M})^2 + 2\alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2 W_{V,M}^2\varepsilon_{I,M}^2/(4(\sigma_{u,m}^2+1)) + \alpha_u(4+5\alpha_u)(\varepsilon_{u,M}^{sum})^2/(\sigma_{u,m}^2+1) + 4\Pi_x g_M^2\varepsilon_{u,M}^2$.

From (A.14), selecting $0 < \alpha_u < 1/5$, the first difference of the Lyapunov function satisfies $\Delta L_{cl,k} < 0$ as long as

$\|\tilde W_{I,k}\| > \max\Big\{\sqrt[4]{4(\sigma_{u,m}^2+1)\varepsilon_{cl,total}^{c1}\big/\big(\alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2\big)},\ \sqrt{\varepsilon_{cl,total}^{c1}/\Phi}\Big\} \equiv b_{\tilde W_I}$

or

$\|\tilde W_{V,k}\| > \max\Big\{\sqrt[4]{4(\sigma_{u,m}^2+1)\varepsilon_{cl,total}^{c1}\big/\big(\alpha_u(4+5\alpha_u)\lambda_{\max}^2(R^{-1})\nabla\phi_M^2\sigma_{I,M}^2\big)},\ \sqrt{\varepsilon_{cl,total}^{c1}/\vartheta}\Big\} \equiv b_{\tilde W_V}$

or

$\|\tilde W_{u,k}\| > \sqrt{2(\sigma_{u,M}^2+1)\varepsilon_{cl,total}^{c1}\big/\big(\alpha_u(1-5\alpha_u)\sigma_{u,m}^2\big)} \equiv b_{\tilde W_u}$

or

$\|x_k\| > \sqrt{\varepsilon_{cl,total}^{c1}\big/\big((1-2K^*)\Pi_x\big)} \equiv b_{1,M}^x.$

This implies that the system state $x_k$ and the NN weight estimation errors of the identifier, the critic, and the actor, $\tilde W_{I,k}$, $\tilde W_{V,k}$, and $\tilde W_{u,k}$, are UB for all $k_i \ge k_0 + T$.

Case 2 (inter-event times, $k_i < k < k_{i+1}$, $i = 1,2,\ldots$): Consider the same Lyapunov function candidate (A.7) as in Case 1. The first difference $\Delta L_{x,k} = \Pi_x\|x_{k+1}\|^2 - \Pi_x\|x_k\|^2$ of the first term, along (47), with Lemma 2 and the C–S inequality, can be written as

$\Delta L_{x,k} \le 2K^*\Pi_x\|x_k\|^2 - \Pi_x\|x_k\|^2 + 4\Pi_x g_M^2\big\|\hat W_{u,k}^T\sigma_u(x_k,k) - \hat W_{u,k}^T\sigma_u(\breve x_k,k)\big\|^2 + 4\Pi_x g_M^2\big(\sigma_{u,M}^2\|\tilde W_{u,k}\|^2 + \varepsilon_{u,M}^2\big).$


From the Lipschitz continuity of the actor NN activation function, in Assumption 6, it holds that

L x,k ≤ −(1 − 2K ∗ )x k 2 + 4g 2M Cσ2 Wˆ u,k 2 eET,k 2 + εc2

cl,total,k

u

c2 εcl,total,k

2 W ˜ u,k 2 4g 2M (σu,M

less than the previous inter-event time ki−1 < k < ki and x hence b2,k . This implies that for all ki ≥ k0 + T , the function c2 c2 , where ε c2 2 2 2 2 εcl,total,k → εcl,M cl,M = 4g M (σu,M b ˜ + εu,M ) Wu

(A.15)

2 ). + εu,M

with = Recall the eventtrigger condition (50). During the inter-event times, for the case when the system state vector is outside the ultimate bound, it holds that eET,k  ≤ σET,k x k . Substituting this inequality in (A.15), the first difference satisfies c2

L x,k ≤ −(1 − 2K ∗ )(1 − ET )x k 2 + εcl,total,k

13

(A.16)

K∗

where 0 < ET < 1 and 0 < < 1/2. Considering the remaining terms of Lyapunov function candidates (A.7), the first differences become zero due to no update. They are represented as

L I,k = 0, L V ,k = 0, L A,k = 0 and L B,k = 0. (A.17) Finally, combining (A.16) and (A.17), the first difference of the overall system is given by c2

L cl,k ≤ −(1 − 2K ∗ )(1 − ET )x k 2 + εcl,total,k . (A.18)

From (A.18), the first difference L cl,k < 0 as long as c2 x . The actor /(1 − 2K ∗ )(1 − ET ))1/2 = b2,k x k  > (εcl,total,k NN weight estimation error, W˜ u,k , is constant during each i th inter-event time, ki < k < ki+1 , as the weights are c2 x held. Therefore, εcl,total,k and hence b2,k are piecewise constant functions. Thus, the system state is bounded by a time-varying x during the inter-event times. The boundedness of bound b2,k the NN weight estimation errors during the inter-event times can be shown as follows. The NN initial weight estimates are finite. Therefore, the initial the weight estimation errors are also bounded. From Case I, the NN weight estimation errors are bounded at the trigger instants. Therefore, the initial values during the inter-event times are bounded. Furthermore, from (A.17), the NN weight estimation errors are remain constant at their respective previous values during the interevent times. Therefore, the NN weight estimation errors W˜ I,k , W˜ V ,k , and W˜ u,k remain bounded during the inter-event times. Note that, from Case I, with trigger of events, the system state vector and the NN weight estimation errors converge to UB for all ki ≥ k0 + T . During the inter-event times, from Case II, the system states are bounded by the time-varying x bound, b2,k , and the NN weight estimation errors are held at their previous values. During the initial learning phase, the x may be large. Therefore, the piecewise constant bound b2,k system state vector may increase. Alternatively, the Lyapunov function L cl,k may increase during the inter-event times, ki < k < ki+1 , for i = 1, 2, . . . as shown in Fig. 2. Since the change in the system state vector is governed by the eventtrigger condition, a large value of the system state vector will lead to an event. Hence, the NN weights and the control inputs will be updated which will make the state and the weight estimation error to converge. 
Furthermore, since each inter-event time is followed by an event, the function \(\varepsilon_{cl,total,k}\) in (A.18), for \(k_i < k < k_{i+1}\), converges to a constant ultimate value \(\varepsilon_{cl,M}\), where \(b_{\tilde W_u}\) is the ultimate bound for \(\tilde W_{u,k}\) from Case I. Therefore, the bound for the system state \(b^{x}_{2,k}\) will also converge, i.e., \(b^{x}_{2,k} \to b^{x}_{2,M}\) for \(k_i \ge k_0 + T\), where \(b^{x}_{2,M} = \bigl(\varepsilon_{cl,M}\big/(1-2K^{*})(1-\Gamma_{ET})\bigr)^{1/2}\) is a constant. Consequently, from Cases I and II, the system state \(x_k\) and the NN weight estimation errors of the identifier, the critic, and the actor, \(\tilde W_{I,k}\), \(\tilde W_{V,k}\), and \(\tilde W_{u,k}\), are UB with the trigger of events for all \(k_i \ge k_0 + T\) or, alternatively, for all \(k \ge k_0 + \bar T\), since \(k_i\) is a subsequence of \(k\) and hence \(\bar T\) is a function of \(T\). Therefore, the Lyapunov function converges to its ultimate value.

Remark A.1: From both cases, the UBs for the system state and the NN weight estimation errors of the identifier, the critic, and the actor NNs are found to be \(b_x = \max(b^{x}_{1,M}, b^{x}_{2,M})\), \(b_{\tilde W_I}\), \(b_{\tilde W_V}\), and \(b_{\tilde W_u}\), respectively. The bounds \(b_x\), \(b_{\tilde W_I}\), \(b_{\tilde W_V}\), and \(b_{\tilde W_u}\) are functions of the learning parameters \(\alpha_I\), \(\alpha_V\), and \(\alpha_u\) and of the NN reconstruction error bounds \(\varepsilon_{I,M}\), \(\varepsilon_{V,M}\), and \(\varepsilon_{u,M}\). Hence, a smaller UB for the closed-loop system can be obtained by selecting \(\alpha_I\), \(\alpha_V\), and \(\alpha_u\) properly and by increasing the number of neurons in the NNs to reduce \(\varepsilon_{I,M}\), \(\varepsilon_{V,M}\), and \(\varepsilon_{u,M}\).

Finally, to show the convergence of the estimated value function and control input to their optimal values, subtract (22) from (20) and (39) from (36) to get

\[
\|V^{*} - \hat V\| = \bigl\|\tilde W_{V,k}^{T}\varphi(x_k,k) + \hat W_{V,k}^{T}\bigl(\varphi(x_k,k) - \varphi(\breve x_k,k)\bigr) + \varepsilon_{V,k}\bigr\|
\le b_{\tilde W_V}\varphi_M + \hat W_{V,\max} C_{\varphi}\,\sigma_{ET,\max}\, b_x + \varepsilon_{V,M} \equiv b_V \tag{A.19}
\]
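The bound (A.19) combines the triangle inequality with the Lipschitz property of the activation function. A quick numerical check of this inequality pattern, using tanh as a stand-in activation (componentwise 1-Lipschitz, so \(C_\varphi = 1\) and \(\|\varphi(\cdot)\| \le \sqrt{n}\)); the dimension, seed, and perturbation size are assumed values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                               # assumed NN output dimension
C_phi, phi_M = 1.0, np.sqrt(n)      # tanh: 1-Lipschitz, norm at most sqrt(n)

for _ in range(1000):
    W_tilde = rng.normal(size=n)            # weight estimation error
    W_hat = rng.normal(size=n)              # weight estimate
    x = rng.normal(size=n)                  # current state
    x_breve = x + 0.1 * rng.normal(size=n)  # held (event-sampled) state
    eps = 0.01 * rng.normal()               # reconstruction error

    # left side of the (A.19)-style decomposition
    lhs = abs(W_tilde @ np.tanh(x)
              + W_hat @ (np.tanh(x) - np.tanh(x_breve)) + eps)
    # right side: triangle inequality + Lipschitz bound on the middle term
    rhs = (np.linalg.norm(W_tilde) * phi_M
           + np.linalg.norm(W_hat) * C_phi * np.linalg.norm(x - x_breve)
           + abs(eps))
    assert lhs <= rhs + 1e-12
print("bound holds on all samples")
```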

and

\[
\|u^{*}_{k} - u_{k}\| = \bigl\|\tilde W_{u,k}^{T}\sigma_u(x_k,k) + \hat W_{u,k}^{T}\bigl(\sigma_u(x_k,k) - \sigma_u(\breve x_k,k)\bigr) + \varepsilon_{u,k}\bigr\|
\le b_{\tilde W_u}\sigma_{u,M} + \hat W_{u,\max} C_{\sigma_u}\,\sigma_{ET,\max}\, b_x + \varepsilon_{u,M} \equiv b_u \tag{A.20}
\]

where \(\hat W_{V,\max} = \max_k\{\|\hat W_{V,k}\|\}\) and \(\hat W_{u,\max} = \max_k\{\|\hat W_{u,k}\|\}\) are the maximum estimated weight values for the critic and the actor NNs, and \(\sigma_{ET,\max}\) is the maximum value of the event-trigger threshold coefficient. The constants \(C_{\varphi}\) and \(C_{\sigma_u}\) are the Lipschitz constants of the critic and the actor NN activation functions, respectively. Note that the bounds \(b_V\) and \(b_u\) depend on the UBs of the system state vector, \(b_x\), and of the NN weights, \(b_{\tilde W_V}\) and \(b_{\tilde W_u}\), which are small, as mentioned in Remark A.1. Thus, \(b_V\) and \(b_u\) are small constants, and the estimated value function and control input converge to near optimal values. \(\blacksquare\)

Proof of Theorem 3: Consider the event-trigger error (2), \(e_{ET,k} = x_k - \breve x_k\). The error dynamics, \(e_{ET,k+1} = x_{k+1} - \breve x_{k+1}\), using the closed-loop system dynamics (47), are upper bounded by

\[
\|e_{ET,k+1}\| \le M_i \|e_{ET,k}\| + N_i, \quad k_i < k < k_{i+1}, \; i = 1, 2, \ldots \tag{A.21}
\]

where \(N_i = \bigl(\sqrt{K^{*}} + 1\bigr)\|x_{k_i}\| + g_M\bigl(\sigma_{u,M}\|\tilde W_{u,k_i}\| + \varepsilon_{u,M}\bigr)\) and \(M_i = \sqrt{K^{*}} + g_M C_{\sigma_u}\|\hat W_{u,k_i}\|\), \(i = 1, 2, \ldots\), with \(0 < K^{*} < 1/2\).
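A step that may help the reader: iterating the recursion in (A.21) from the trigger instant \(k_i\), where the sampled state is refreshed so the event-trigger error resets to zero, yields a geometric-series bound (a sketch, assuming \(M_i \neq 1\)):

```latex
% Iterate \|e_{ET,k+1}\| \le M_i \|e_{ET,k}\| + N_i starting from e_{ET,k_i} = 0:
\|e_{ET,k}\| \;\le\; M_i^{\,k-k_i}\,\underbrace{\|e_{ET,k_i}\|}_{=0}
  \;+\; \sum_{j=k_i}^{k-1} M_i^{\,k-j-1} N_i
  \;=\; N_i \sum_{m=0}^{k-k_i-1} M_i^{\,m}
  \;=\; \frac{N_i\bigl(M_i^{\,k-k_i}-1\bigr)}{M_i-1},
  \qquad M_i \neq 1 .
```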


Remark A.2: The variables \(M_i\) and \(N_i\) are piecewise constant functions, since \(\tilde W_{u,k_i}\), \(\hat W_{u,k_i}\), and \(x_{k_i}\) are constant during each \(i\)th inter-event time. Hence, the error \(e_{ET,k}\) is also a piecewise continuous function.

By the comparison lemma [22], the solution of the inequality (A.21) is bounded above as

\[
\|e_{ET,k}\| \le \sum_{j=k_i}^{k-1} M_i^{\,k-j-1} N_i = \frac{N_i\bigl(M_i^{\,k-k_i} - 1\bigr)}{M_i - 1} \tag{A.22}
\]

for \(k_i < k < k_{i+1}\), each \(i = 1, 2, \ldots\). The lower bound on the inter-event times for the \(i\)th inter-event duration, \(\delta k_i = k_{i+1} - k_i\), is the time it takes \(\|e_{ET,k}\|\) in (A.22) to reach the minimum threshold, \(\sigma_{ET,\min}\), for all \(k \in \mathbb{N}\). It is computed using (50) as

\[
\sigma_{ET,\min} = \min_{k \in \mathbb{N}}\{\sigma_{ET}\|x_k\|\}
= \Bigl((1 - 2K^{*})\Gamma_{ET}\big/ 4 g_M^{2} C_{\sigma_u}^{2} \hat W_{u,\max}^{2}\Bigr)^{1/2} b_x \tag{A.23}
\]

where \(b_x\) is the lower bound of the system state for an event to trigger, as in (52). The bound \(\hat W_{u,\max} = \max_k\{\|\hat W_{u,k}\|\}\) is the maximum value of the actor NN weight estimates for all \(k \in \mathbb{N}\); it exists, since the weight estimates are bounded for all time.

Since the triggering instants are decided by the event-trigger condition, at \(k_{i+1}\) for the \(i\)th inter-event interval, it holds that \(\|e_{ET,k_{i+1}}\| \ge \sigma_{ET,\min}\). Therefore, from (A.22), we obtain

\[
\frac{N_i\bigl(M_i^{\,k_{i+1}-k_i} - 1\bigr)}{M_i - 1} \ge \sigma_{ET,\min}, \quad i = 1, 2, \ldots \tag{A.24}
\]

Solving the above inequality, the lower bound on the inter-event times is found to be

\[
\delta k_i \ge \ln\bigl(1 + (1/N_i)(M_i - 1)\sigma_{ET,\min}\bigr)\big/\ln(M_i), \quad i = 1, 2, \ldots \tag{A.25}
\]

From (A.25), the minimum inter-event time is

\[
\delta k_{\min} = \min_{i \in \mathbb{N}}(\delta k_i)
\ge \min_{i \in \mathbb{N}}\Bigl(\ln\bigl(1 + (1/N_i)(M_i - 1)\sigma_{ET,\min}\bigr)\big/\ln(M_i)\Bigr).
\]

The inter-event times become nontrivial, i.e., \(\delta k_i > 1\), when \(\ln\bigl(1 + (1/N_i)(M_i - 1)\sigma_{ET,\min}\bigr) > \ln(M_i)\), \(i \in \mathbb{N}\), is satisfied. \(\blacksquare\)
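The closed-form bound (A.22) and the inter-event-time formula (A.25) can be sanity-checked numerically. The constants below (`M_i`, `N_i`, `sigma_min`) are assumed values chosen for illustration only; the loop iterates the worst case of the recursion (A.21) with equality, starting from zero error right after an event, and compares the first threshold crossing against the logarithmic lower bound:

```python
import math

# Assumed illustrative constants (not from the paper's simulations).
M_i = 1.4        # expansion factor of the trigger-error recursion (M_i > 1)
N_i = 0.05       # additive term driven by the held state and weights
sigma_min = 0.3  # minimum event-trigger threshold

# Worst case of (A.21) with equality, from e_{ET,k_i} = 0 after an event.
e, k = 0.0, 0
while e < sigma_min:
    e = M_i * e + N_i
    k += 1

# Closed form (A.22): e_k = N_i (M_i^k - 1)/(M_i - 1), so the threshold is
# first reached at k >= ln(1 + (M_i - 1) sigma_min / N_i) / ln(M_i), as in (A.25).
delta_k = math.log(1 + (M_i - 1) * sigma_min / N_i) / math.log(M_i)

print(k, delta_k)
assert k == math.ceil(delta_k)
# Nontrivial inter-event time (more than one step) exactly when
# ln(1 + (M_i - 1) sigma_min / N_i) > ln(M_i).
assert (delta_k > 1) == (math.log(1 + (M_i - 1) * sigma_min / N_i) > math.log(M_i))
```

With these numbers the simulated recursion first crosses the threshold at step 4, which matches the ceiling of the logarithmic bound (about 3.64), so at least three full steps elapse between consecutive events.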

REFERENCES

[1] P. Tabuada, "Event-triggered real-time scheduling of stabilizing control tasks," IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1680–1685, Sep. 2007.
[2] X. Wang and M. Lemmon, "On event design in event-triggered feedback systems," Automatica, vol. 47, no. 10, pp. 2319–2322, Oct. 2011.
[3] E. Garcia and P. J. Antsaklis, "Model-based event-triggered control for systems with quantization and time-varying network delays," IEEE Trans. Autom. Control, vol. 58, no. 2, pp. 422–434, Feb. 2013.
[4] W. P. M. H. Heemels and M. C. F. Donkers, "Model-based periodic event-triggered control for linear systems," Automatica, vol. 49, no. 3, pp. 698–711, Mar. 2013.
[5] A. Sahoo, H. Xu, and S. Jagannathan, "Neural network-based adaptive event-triggered control of affine nonlinear discrete time systems with unknown internal dynamics," in Proc. Amer. Control Conf. (ACC), Washington, DC, USA, Jun. 2013, pp. 6418–6423.
[6] A. Sahoo, H. Xu, and S. Jagannathan, "Neural network-based adaptive event-triggered control of nonlinear continuous-time systems," in Proc. IEEE Int. Symp. Intell. Control (ISIC), Hyderabad, India, Aug. 2013, pp. 35–40.
[7] H. Xu, S. Jagannathan, and F. L. Lewis, "Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses," Automatica, vol. 48, no. 6, pp. 1017–1030, Jun. 2012.
[8] F. L. Lewis and V. L. Syrmos, Optimal Control. Hoboken, NJ, USA: Wiley, 1995.
[9] O. C. Imer and T. Basar, "To measure or to control: Optimal control with scheduled measurements and controls," in Proc. Amer. Control Conf., Minneapolis, MN, USA, Jun. 2006, pp. 14–16.
[10] A. Molin and S. Hirche, "On the optimality of certainty equivalence for event-triggered control systems," IEEE Trans. Autom. Control, vol. 58, no. 2, pp. 470–474, Feb. 2013.
[11] Q. Wei, D. Liu, and X. Yang, "Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, pp. 866–879, Apr. 2015.
[12] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach of discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779–789, Mar. 2013.
[13] Z. Ni, H. He, D. Zhao, X. Xu, and D. V. Prokhorov, "GrDHP: A general utility function representation for dual heuristic dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 3, pp. 614–627, Mar. 2015.
[14] Q. Wei and D. Liu, "Data-driven neuro-optimal temperature control of water–gas shift reaction using stable iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 61, no. 11, pp. 6399–6408, Nov. 2014.
[15] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.
[16] H. Xu and S. Jagannathan, "Neural network based finite horizon stochastic optimal controller design for nonlinear networked control systems," in Proc. IEEE Int. Joint Conf. Neural Netw., Dallas, TX, USA, Aug. 2013, pp. 1–7.
[17] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[18] S. Jagannathan, Neural Network Control of Nonlinear Discrete-Time Systems. Boca Raton, FL, USA: CRC Press, 2006.
[19] H. Ye, A. N. Michel, and L. Hou, "Stability theory for hybrid dynamical systems," IEEE Trans. Autom. Control, vol. 43, no. 4, pp. 461–474, Apr. 1998.
[20] Z. Chen and S. Jagannathan, "Generalized Hamilton–Jacobi–Bellman formulation-based neural network control of affine nonlinear discrete-time systems," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 90–106, Jan. 2008.
[21] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control Signals Syst., vol. 2, no. 4, pp. 303–314, Dec. 1989.
[22] G. Bitsoris and E. Gravalou, "Comparison principle, positive invariance and constrained regulation of nonlinear systems," Automatica, vol. 31, no. 2, pp. 217–222, Feb. 1995.

Avimanyu Sahoo received the B.S. degree in electrical technology from the Cochin University of Science and Technology, Cochin, India, in 2008, and the M.S. degree in electrical engineering from IIT Varanasi, Varanasi, India, in 2011. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, Missouri University of Science and Technology, Rolla, MO, USA. His current research interests include event-sampled control, adaptive control, neural network control, networked control systems, and optimal control.


Hao Xu (M’12) was born in Nanjing, China, in 1984. He received the master’s degree in electrical engineering from Southeast University, Nanjing, in 2009, and the Ph.D. degree from the Missouri University of Science and Technology, Rolla, MO, USA, in 2012. He is currently with Texas A&M University–Corpus Christi, Corpus Christi, TX, USA, where he is an Assistant Professor with the College of Science and Engineering and the Director of the Unmanned Systems Research Laboratory. His current research interests include autonomous unmanned aircraft systems, wireless passive sensor networks, localization, detection, networked control systems, cyber-physical systems, distributed network protocol design, optimal control, and adaptive control.


Sarangapani Jagannathan (SM’09) is currently with the Missouri University of Science and Technology, Rolla, MO, USA, where he is a Rutledge-Emerson Endowed Chair Professor of Electrical and Computer Engineering and the Site Director of the NSF Industry/University Cooperative Research Center on Intelligent Maintenance Systems. He has co-authored 132 peer-reviewed journal articles, most of them in the IEEE Transactions, 235 refereed IEEE conference articles, several book chapters, and three books, and holds 20 U.S. patents. He has supervised around 19 Ph.D. and 29 M.S. students to graduation, and his funding is in excess of U.S. $14 million from various U.S. federal agencies and industrial members. His current research interests include neural network control, adaptive event-triggered control, secure networked control systems, prognostics, and autonomous systems/robotics. Dr. Jagannathan is a fellow of the Institute of Measurement and Control, U.K., and of the Institution of Engineering and Technology, U.K. He has received many awards and has served on the organizing committees of several IEEE conferences. He was a Co-Editor of the IET book series on control from 2010 to 2013, serves as the Editor-in-Chief of Discrete Dynamics in Nature and Society, and serves on many editorial boards. He is the IEEE CSS Technical Committee Chair on Intelligent Control.