A Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control

Hassan A. Kingravi, Girish Chowdhary, Patricio A. Vela, and Eric N. Johnson

H. Kingravi and Assoc. Prof. P. A. Vela are with the School of Electrical and Computer Engineering, and Assoc. Prof. E. N. Johnson is with the School of Aerospace Engineering, at the Georgia Institute of Technology, Atlanta, GA. G. Chowdhary is with the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology, Cambridge, MA. Email: [email protected], [email protected], [email protected], [email protected]

Abstract—Classical work in model reference adaptive control for uncertain nonlinear dynamical systems with a Radial Basis Function (RBF) neural network adaptive element does not guarantee that the network weights stay bounded in a compact neighborhood of the ideal weights when the system signals are not Persistently Exciting (PE). Recent work has shown, however, that an adaptive controller using specifically recorded data concurrently with instantaneous data guarantees boundedness without PE signals. However, the work assumes fixed RBF network centers, which requires domain knowledge of the uncertainty. Motivated by Reproducing Kernel Hilbert Space theory, we propose an online algorithm for updating the RBF centers to remove this assumption. In addition to proving boundedness of the resulting neuro-adaptive controller, a connection is made between PE signals and kernel methods. Simulation results show improved performance.

Index Terms—Kernel, Adaptive control, Radial basis function networks, Nonlinear control systems

I. INTRODUCTION

In practice, it is rarely reasonable to assume that a perfect model of a system is available for controller synthesis. Assumptions introduced in modeling, and changes to system dynamics during operation, often result in modeling uncertainties that must be mitigated online. Model Reference Adaptive Control (MRAC) has been widely studied for broad classes of uncertain nonlinear dynamical systems with significant modeling uncertainties [1, 14, 29, 41]. In MRAC, the system uncertainty is approximated using a weighted combination of basis functions, with the numerical values of the weights adapted online to minimize the tracking error. When the structure of the uncertainty is unknown, a neuro-adaptive approach is often employed, in which a Neural Network (NN) with its weights adapted online is used to capture the uncertainty. A popular example of such a NN is the Gaussian Radial Basis Function (RBF) NN, for which the universal approximation property is known to hold [31]. Examples of MRAC systems employing Gaussian RBFs can be found in [26, 33, 39, 43]. In practice, then, nonlinear MRAC systems typically contain an additional online learning component.

What differentiates MRAC from traditional online learning strategies is the existence of the dynamical system and its connection to a control task. The strong connection between the learning and the evolving dynamical system through the controller imposes constraints on the online learning strategy employed. These constraints ensure viability of the system's closed loop stability and performance. For example, classical gradient-based MRAC methods require a condition on Persistency of Excitation (PE) in the system states [3]. This condition is often impractical to implement and/or infeasible to monitor online. Hence, authors have introduced various modifications to the adaptive law to ensure that the weights stay bounded around an a-priori determined value (usually set to 0). Examples of these include Ioannou's σ-mod, Narendra's e-mod, and the use of a projection operator to bound the weights [14, 41]. Recent work in concurrent learning has shown that if carefully selected and online recorded data is used concurrently with current data for adaptation, then the weights remain bounded within a compact neighborhood of the ideal weights [8, 9]. Neither concurrent learning nor the aforementioned modifications require PE.

When concurrent learning is used, a set of RBF centers must be chosen over the domain of the uncertainty in order to successfully learn it. In prior neuro-adaptive control research, it is either assumed that the centers for the RBF network are fixed [8, 9, 19, 32, 42], or that the centers are moved to minimize the tracking error e [27, 38]. In the former case, the system designer is assumed to have some domain knowledge about the uncertainty to determine how the centers should be selected. In the latter case, because the updates favor instantaneous tracking error reduction, the resulting center movement does not guarantee optimal uncertainty capture across the entire operating domain.

In this work, utilizing the fact that the Gaussian RBF satisfies the properties of a Mercer kernel, we use methods from the theory of Reproducing Kernel Hilbert Spaces (RKHSs) to remove the need for domain knowledge and to propose a novel kernel adaptive control algorithm. Partly due to the success of Support Vector Machines, there has been great interest in recent years in kernel methods, a class of algorithms exploiting RKHS properties which can be used for classification [6], regression [2, 37], and unsupervised learning [10]. We first use RKHS theory to make a connection between PE signals in the state space (i.e., the input space) and PE signals in the higher dimensional feature space (with respect to the RBF bases). In particular, we show that the amount of persistent excitation inserted in the states of the system may vanish or greatly diminish if the RBF centers are improperly chosen. We then utilize this connection along with the kernel linear independence test [30] to propose an online algorithm, called Budgeted Kernel Restructuring (BKR). BKR picks centers so as to ensure that any excitation in the system does not vanish or diminish. The algorithm is then augmented with Concurrent Learning (CL). The resulting kernel adaptive algorithm (BKR-CL) uses instantaneous as well as specifically selected and online recorded data points concurrently for

adaptation. Lyapunov-like analysis proves convergence of the RBF weights to a compact neighborhood of their ideal values (in the sense of the universal approximation property) if the system states are exciting over a finite interval; BKR-CL does not require PE. Further, BKR-CL can control an uncertain multivariable dynamical system effectively even if all the RBF centers are initialized to the same value (for example, to 0). In addition to removing the assumption of fixed RBF centers, we show that, given a fixed budget (maximum number of allowable centers), the presented method outperforms existing methods that uniformly distribute the centers over an expected domain. Effectiveness of the algorithm is shown through simulation and comparison with other neuro-adaptive controllers.

To the authors' knowledge, little work exists explicitly connecting ideas from RKHS theory to neuro-adaptive control. Recent work on bridging kernel methods with filtering and estimation exists under the name of kernel adaptive filtering [22]. Kernel recursive least squares methods [11] and kernel least mean square algorithms [21] have been previously explored for kernel filtering, which is mainly used for system identification problems, where the goal is to minimize the error between the system model and an estimation model. The main contribution of this paper, however, is the presentation of an adaptive control technique which uses RKHS theory. The key requirement is that the tracking error between the reference model and the system, and the adaptation error between the adaptive element and the actual uncertainty, be simultaneously driven to a bounded set; kernel adaptive filtering only addresses the former requirement. While there has been other work applying kernel methods such as support vector machines to nonlinear control problems [20, 30], it does not make any explicit connections to PE conditions, and does not show the boundedness of the weights in a region around their true values. This paper takes the first crucial steps in establishing that connection, and therefore serves to further bring together ideas from machine learning and adaptive control.

Organization: Section II outlines some preliminaries, including the definition of PE and the relevant ideas from RKHS theory. Section III poses the MRAC problem for uncertain multivariable nonlinear dynamical systems and outlines the concurrent learning method. Section IV establishes a connection between PE and RKHS theory, then utilizes the connection to outline the BKR algorithm for online center selection. Section V outlines the BKR-CL algorithm and establishes boundedness of the weights for the centers using Lyapunov-like analysis. Section VI presents the results of simulated experiments. Section VII concludes the paper.

II. PRELIMINARIES

A. Reproducing Kernel Hilbert Spaces

Let H be a Hilbert space of functions defined on the domain D ⊂ R^n. Given any element f ∈ H and x ∈ D, by the Riesz representation theorem [12] there exists some unique element K_x (called a point evaluation functional) s.t.

    f(x) = \langle f, K_x \rangle_H.    (1)

H is called a reproducing kernel Hilbert space if every such linear functional K_x is continuous (something which is not true in an arbitrary Hilbert space). It is possible to construct these spaces using tools from operator theory. The following is standard material; see [13] and [10] for more details. A Mercer kernel on D ⊂ R^n is any continuous, symmetric, positive-semidefinite function of the form k : D × D → R. Associated to k is a linear operator T_k on L²(D) which can be defined as

    T_k(f)(x) := \int_D k(x, s) f(s)\, ds,

where f is a function in L²(D), the space of square integrable functions on the domain D. Then the following theorem holds:

Theorem 1 (Mercer) [23]: Suppose k is a continuous, symmetric, positive-semidefinite kernel. Then there is an orthonormal basis {e_i}_{i∈I} of L²(D) consisting of eigenfunctions of T_k such that the corresponding sequence of eigenvalues (λ_i)_{i∈I} is nonnegative. Furthermore, the eigenfunctions corresponding to non-zero eigenvalues are continuous on D, and k has the representation

    k(s, t) = \sum_{j=1}^{\infty} \lambda_j e_j(s) e_j(t),

where the convergence is absolute and uniform.

If \sum_{i=1}^{\infty} \lambda_i^2 < \infty, this orthonormal set of eigenfunctions {e_i}_{i∈I} can be used to construct a space H := {f | f = \sum_{i=1}^{\infty} \alpha_i e_i} for some sequence of reals (α_i)_{i∈I}, with the constraint that ||f||²_H := \sum_{i=1}^{\infty} \alpha_i^2 / \lambda_i < ∞, where ||f||_H is the norm induced by k. It can be seen that the point evaluation functional K_x is given by K_x = \sum_i \lambda_i e_i(x) e_i. This implies that k(x, y) = ⟨K_x, K_y⟩_H, and that for a function f ∈ H, Equation (1) holds; this is known as the reproducing property. The importance of these spaces from a machine learning perspective is based on the fact that a kernel function meeting the above conditions implies that there exists some Hilbert space H (of functions) and a mapping ψ : D → H s.t.

    k(x, y) = \langle \psi(x), \psi(y) \rangle_H.    (2)

For a given kernel function, the mapping ψ : D → H (i.e., the set of point evaluation functionals) does not have to be unique, and is often unknown. However, since ψ is implicit, it does not need to be known for most machine learning algorithms, which exploit the nonlinearity of the mapping to create nonlinear algorithms from linear ones [10]. The Gaussian function used in RBF networks, given by

    k(x, y) = e^{-\|x - y\|^2 / 2\mu^2}

with bandwidth μ², is an example of a bounded kernel function which generates an infinite dimensional RKHS H.¹ Given a point x ∈ D, the element ψ(x) ∈ H can be thought of as a vector in the Hilbert space H. In D, we have the relation

    \psi(x) = K_x(\cdot) = e^{-\|x - \cdot\|^2 / 2\mu^2}.

The remainder of the paper uses the notation K_x(·) = k(x, ·) = e^{-||x − ·||²/2μ²}.

¹In this paper, we assume that the basis functions for the RBF network are Gaussian, but these ideas can be generalized to other kernel functions.

Fixing a dataset C = {c_1, ..., c_l}, where c_i ∈ R^n, let

    F_C := \Big\{ \sum_{i=1}^{l} \alpha_i k(c_i, \cdot) : \alpha_i \in \mathbb{R} \Big\}    (3)

be the linear subspace generated by C in H. Note that F_C is a class of functions: any RBF network with the same centers and the same fixed bandwidth (but without bias) is an element of this subspace. Let σ(x) = [k(x, c_1), ..., k(x, c_l)]^T, and let W = [w_1, ..., w_l] where w_i ∈ R. Then W^T σ(x) is the output of a standard RBF network. In the machine learning literature, σ(x) is sometimes known as the empirical kernel map. Two different datasets C_1 and C_2 generate two different families of functions F_{C_1} and F_{C_2}, and two different empirical kernel maps σ_{C_1}(x) and σ_{C_2}(x). Finally, given the above dataset C, one can form an l × l kernel matrix given by K_{ij} := k(c_i, c_j). The following theorem is one of the cornerstones of RBF network theory, and will be used in Section IV:

Theorem 2 (Micchelli) [24]: If the function k(x, y) is a positive-definite Mercer kernel, and if all the points in C are distinct, the kernel matrix K is nonsingular.

B. Persistently Exciting Signals

In this paper, we use Tao's definition of a PE signal [41]:

Definition 1: A bounded vector signal Φ(t) is exciting over an interval [t, t + T], T > 0 and t ≥ t_0, if there exists γ > 0 such that

    \int_t^{t+T} \Phi(\tau) \Phi^T(\tau)\, d\tau \ge \gamma I.    (4)

Definition 2: A bounded vector signal Φ(t) is persistently exciting if for all t > t_0 there exist T > 0 and γ > 0 such that

    \int_t^{t+T} \Phi(\tau) \Phi^T(\tau)\, d\tau \ge \gamma I.    (5)
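To make the constructions above concrete, the following short sketch (ours, not from the paper; all function names are illustrative) builds the Gaussian kernel matrix and the empirical kernel map for a small set of distinct centers and numerically checks the nonsingularity promised by Micchelli's theorem.

```python
import numpy as np

def gaussian_kernel(x, y, mu=1.0):
    """Gaussian Mercer kernel k(x, y) = exp(-||x - y||^2 / (2 mu^2))."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * mu ** 2))

def kernel_matrix(C, mu=1.0):
    """l x l matrix K_ij = k(c_i, c_j) for centers C (one per row)."""
    l = C.shape[0]
    K = np.empty((l, l))
    for i in range(l):
        for j in range(l):
            K[i, j] = gaussian_kernel(C[i], C[j], mu)
    return K

def empirical_kernel_map(x, C, mu=1.0):
    """sigma(x) = [k(x, c_1), ..., k(x, c_l)]^T."""
    return np.array([gaussian_kernel(x, c, mu) for c in C])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = rng.normal(size=(5, 2))          # five distinct centers in R^2
    K = kernel_matrix(C)
    # Micchelli's theorem: distinct centers => K is nonsingular
    print("min eigenvalue of K:", np.linalg.eigvalsh(K).min())
    x = np.array([0.3, -0.1])
    sigma = empirical_kernel_map(x, C)   # W.T @ sigma is an RBF net output
    print("sigma(x):", sigma)
```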

The strength of the signal depends on the value of γ.

III. MODEL REFERENCE ADAPTIVE CONTROL AND CONCURRENT LEARNING

This section outlines the formulation of Model Reference Adaptive Control using approximate model inversion [8, 15, 17, 18, 27]. Let D_x ⊂ R^n be compact, x(t) ∈ D_x be the known state vector, and δ ∈ R^k denote the control input. Consider the uncertain multivariable nonlinear dynamical system

    \dot{x} = f(x(t), \delta(t)),    (6)

where the function f is assumed to be Lipschitz continuous in x ∈ D_x, and the control input δ is assumed to be bounded and piecewise continuous. These conditions ensure the existence and uniqueness of the solution to (6) over a sufficiently large domain D_x.

Since the exact model (6) is usually not available or not invertible, an approximate inversion model f̂(x, δ) is introduced, which is used to determine the control input

    \delta = \hat{f}^{-1}(x, \nu),    (7)

where ν is the pseudo control input, which represents the desired model output ẋ and is expected to be approximately achieved by δ. Hence, the pseudo control input is the output of the approximate inversion model

    \nu = \hat{f}(x, \delta).    (8)

This approximation results in a model error of the form

    \dot{x} = \hat{f}(x, \delta) + \Delta(x, \delta),    (9)

where the model error Δ : R^{n+k} → R^n is given by

    \Delta(x, \delta) = f(x, \delta) - \hat{f}(x, \delta).    (10)

A reference model is chosen to characterize the desired response of the system

    \dot{x}_{rm}(t) = f_{rm}(x_{rm}(t), r(t)),    (11)

where f_rm(x_rm(t), r(t)) denotes the reference model dynamics, which are assumed to be continuously differentiable in x for all x ∈ D_x ⊂ R^n, and it is assumed that all requirements for guaranteeing the existence of a unique solution to (11) are satisfied. The command r(t) is assumed to be bounded and piecewise continuous. Furthermore, it is also assumed that the reference model states remain bounded for a bounded reference input.

Consider a tracking control law consisting of a linear feedback part ν_pd = Kx, a linear feedforward part ν_rm = ẋ_rm, and an adaptive part ν_ad(x), in the following form [7, 16, 27]:

    \nu = \nu_{rm} + \nu_{pd} - \nu_{ad}.    (12)

Define the tracking error e as e(t) = x_rm(t) − x(t); then, letting A = −K, the tracking error dynamics are found to be

    \dot{e} = A e + [\nu_{ad}(x, \delta) - \Delta(x, \delta)].    (13)

The baseline full state feedback controller ν_pd = Kx is designed to ensure that A is a Hurwitz matrix. Hence, for any positive definite matrix Q ∈ R^{n×n}, a positive definite solution P ∈ R^{n×n} exists to the Lyapunov equation

    A^T P + P A + Q = 0.    (14)

Let x̄ = [x, δ] ∈ R^{n+k}. Generally, two cases for characterizing the uncertainty Δ(x̄) are considered: in structured uncertainty the structure of the mapping Δ(x̄) is known, whereas in unstructured uncertainty it is unknown. We focus on the latter in this paper. Assume that it is only known that the uncertainty Δ(x̄) is continuous and defined over a compact domain D ⊂ R^{n+k}. Let σ(x̄) = [1, σ_2(x̄), σ_3(x̄), ..., σ_l(x̄)]^T be a vector of known radial basis functions. For i = 2, 3, ..., l, let c_i denote the RBF centroid and let μ denote the RBF bandwidth. Then for each i, the radial basis functions are given as

    \sigma_i(\bar{x}) = e^{-\|\bar{x} - c_i\|^2 / 2\mu^2},    (15)

which can also be written as k(c_i, ·), as per the notation introduced earlier (§II). Appealing to the universal approximation property of RBF NNs [31, 40], we have that given a fixed number of radial basis functions l, there exist ideal weights W* ∈ R^{n×l} and an error term ε̃(x̄) such that the following approximation holds for all x̄ ∈ D, where D is compact:

    \Delta(\bar{x}) = W^{*T} \sigma(\bar{x}) + \tilde{\epsilon}(\bar{x}),    (16)

and ε̄ = sup_{x̄∈D} ||ε̃(x̄)|| can be made arbitrarily small when a sufficient number of radial basis functions is used. To make use of the universal approximation property, a Radial Basis Function (RBF) Neural Network (NN) can be used to approximate the unstructured uncertainty in (10). In this case, the adaptive element takes the form

    \nu_{ad}(\bar{x}) = W^T \sigma(\bar{x}),    (17)

where W ∈ R^{n×l}. The goal is to design an adaptive law such that W(t) → W* as t → ∞ and the tracking error e remains bounded. A commonly used update law, which will be referred to here as the baseline adaptive law, is given as [1, 29, 41]

    \dot{W} = -\Gamma_W \sigma(\bar{x}) e^T P B.    (18)

The adaptive law (18) guarantees that the weights approach their ideal values (W*) if and only if the signal σ(x̄) is PE. In the absence of PE, without additional modifications such as σ-mod [14], e-mod [28], or projection based adaptation [41], the law does not guarantee that the weights W remain bounded. The work in [8] shows that if specifically selected recorded data is used concurrently with instantaneous measurements, then the weights approach and stay bounded in a compact neighborhood of the ideal weights, subject to a sufficient condition on the linear independence of the recorded data; PE is not needed. This is captured in the following theorem.

Theorem 3 [8]: Consider the system in (6), the control law of (12), x̄(0) ∈ D where D is compact, and the case of unstructured uncertainty. For the j-th recorded data point, let ε_j(t) = W^T(t) σ(x̄_j) − Δ(x̄_j), with Δ(x̄_j) for a stored data point j calculated using (10) as Δ(x̄_j) = ẋ_j − ν_j. Also, let p be the number of recorded data points σ(x̄_j) in the matrix Z = [σ(x̄_1), ..., σ(x̄_p)]. If rank(Z) = l, then the following weight update law

    \dot{W} = -\Gamma_W \sigma(\bar{x}) e^T P B - \Gamma_W \sum_{j=1}^{p} \sigma(\bar{x}_j) \epsilon_j^T    (19)

renders the tracking error e and the RBF NN weight errors W̃ uniformly ultimately bounded. Furthermore, the adaptive weights W(t) will approach and remain bounded in a compact neighborhood of the ideal weights W*. The matrix Z will be referred to as the history stack.
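As a concrete illustration of the concurrent learning law (19), the following sketch (our own; not reproduced from the paper) performs one Euler integration step of the weight dynamics. The array shapes and the treatment of B are our conventions for the illustration.

```python
import numpy as np

def cl_weight_update(W, sigma_x, e, P, B, stack_sigmas, stack_deltas,
                     gamma=1.0, dt=0.01):
    """One Euler step of the concurrent learning law (19).

    W            : (l, n) weight matrix
    sigma_x      : (l,) current RBF output sigma(x_bar)
    e            : (n,) tracking error
    P, B         : (n, n) Lyapunov solution and input matrix
    stack_sigmas : list of (l,) recorded sigma(x_bar_j)
    stack_deltas : list of (n,) recorded model errors Delta(x_bar_j)
    """
    # instantaneous (baseline) term: -Gamma * sigma(x) e^T P B
    W_dot = -gamma * np.outer(sigma_x, e @ P @ B)
    # recorded-data term: -Gamma * sum_j sigma_j eps_j^T,
    # with eps_j = W^T sigma_j - Delta_j
    for s_j, d_j in zip(stack_sigmas, stack_deltas):
        eps_j = W.T @ s_j - d_j
        W_dot -= gamma * np.outer(s_j, eps_j)
    return W + dt * W_dot
```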

IV. KERNEL LINEAR INDEPENDENCE AND THE BUDGETED KERNEL RESTRUCTURING (BKR) ALGORITHM

A. PE Signals and the RKHS

In this section we make a connection between adaptive control and kernel methods by leveraging RKHS theory to relate PE of x̄(t) to PE of σ(x̄(t)). Let C = {c_1, ..., c_l}, and recall that F_C represents the linear subspace generated by C in H (see (3)). Let G = σ(x̄(t)) σ(x̄(t))^T; then

    G = \begin{bmatrix} k(\bar{x}, c_1) \\ \vdots \\ k(\bar{x}, c_l) \end{bmatrix}
        \begin{bmatrix} k(\bar{x}, c_1) & \cdots & k(\bar{x}, c_l) \end{bmatrix}
      = \begin{bmatrix}
          \langle\psi_{\bar{x}}, \psi_{c_1}\rangle \langle\psi_{\bar{x}}, \psi_{c_1}\rangle & \cdots & \langle\psi_{\bar{x}}, \psi_{c_1}\rangle \langle\psi_{\bar{x}}, \psi_{c_l}\rangle \\
          \vdots & \ddots & \vdots \\
          \langle\psi_{\bar{x}}, \psi_{c_l}\rangle \langle\psi_{\bar{x}}, \psi_{c_1}\rangle & \cdots & \langle\psi_{\bar{x}}, \psi_{c_l}\rangle \langle\psi_{\bar{x}}, \psi_{c_l}\rangle
        \end{bmatrix},

where ⟨ψ_x̄, ψ_c_i⟩ is shorthand for ⟨ψ(x̄), ψ(c_i)⟩. A matrix G is positive definite if and only if v^T G v > 0 for all nonzero v ∈ R^l. In the above, this translates to

    v^T G v = \sum_{i,j=1}^{l} v_i v_j G_{ij}
            = \sum_{i,j=1}^{l} v_i v_j \langle\psi(\bar{x}), \psi(c_i)\rangle \langle\psi(\bar{x}), \psi(c_j)\rangle
            = \Big\langle \psi(\bar{x}), \sum_{i=1}^{l} v_i \psi(c_i) \Big\rangle \Big\langle \psi(\bar{x}), \sum_{j=1}^{l} v_j \psi(c_j) \Big\rangle
            = \Big\langle \psi(\bar{x}), \sum_{i=1}^{l} v_i \psi(c_i) \Big\rangle^2.

From Micchelli's theorem (Theorem 2), it is known that if c_i ≠ c_j, then ψ(c_i) and ψ(c_j) are linearly independent in H. Further, if the trajectory x̄(t) is bounded for all time, then k(x̄(t), c) ≠ 0. From this, it follows trivially that the signal ∫_t^{t+T} G(τ) dτ is bounded. Furthermore, the next theorem follows immediately:

Theorem 4: Suppose x̄(t) evolves in the state space according to Equation (6). If there exists some time t_f ∈ R+ such that the mapping ψ(x̄(t)) ∈ H is orthogonal to the linear subspace F_C ⊂ H for all t > t_f, then the signal σ(x̄(t)) is not persistently exciting.

If the state x̄(t) reaches a stationary point, we can state a similar theorem.

Theorem 5: Suppose x̄(t) evolves in the state space according to Equation (6). If there exists some state x̄_f ∈ R^{n+k} and some t_f ∈ R+ such that x̄(t) = x̄_f for all t > t_f, then σ(x̄(t)) is not persistently exciting.

Therefore PE of σ(x̄(t)) follows only if neither of the above conditions is met. Figure 1 shows an example of the mapping ψ into H given the trajectory x̄(t), while Figure 2 depicts a geometric description of a non-PE signal in the Hilbert space.

[Fig. 1. An example Hilbert space mapping. The trajectory x̄(t) evolves in the state space R^n via an ODE; the mapping ψ induces an equivalent ODE in H. If c_1 and c_2 are the centers for the RBFs, then they generate the linear subspace F_C ⊂ H, which is a family of linearly parameterized functions, via the mappings ψ(c_1) and ψ(c_2).]

[Fig. 2. An example of a non-PE signal: any trajectory x̄(t) ∈ R^n inducing a trajectory ψ(x̄(t)) ∈ H that is orthogonal to F_C after some time t_f renders σ(x̄(t)) non-PE.]

The signal ψ(x̄(t)) in H becomes orthogonal to the centers C = {c_1, ..., c_l} mapped to H if x̄(t) moves far away from them in R^n. Though orthogonality is desired in the state space R^n for guaranteeing PE, orthogonality in H is detrimental to PE of σ(x̄(t)). Recall that without PE of σ(x̄(t)), the baseline adaptive law of (18) does not guarantee that the weights converge to a neighborhood of the ideal weights. This shows that in order to guarantee PE in σ(x̄(t)), not only does x̄(t) have to be PE, but the centers should also be such that the mapping ψ(x̄(t)) ∈ H is not orthogonal to the linear subspace F_C generated by the centers.

It is intuitively clear that locality is a concern when learning the uncertainty; if the system evolves far away from the original set of centers, the old centers will give less accurate information than new ones that are selected to better represent the current operating domain. Theorem 4 enables us to state this more rigorously: keeping the centers close to x̄(t) ensures persistency of excitation of σ(x̄(t)) (as long as x̄(t) is also PE). At the same time, we do not want to throw out all of the earlier centers either. This is in order to preserve global function approximation (over the entire domain), especially if the system is expected to revisit these regions of the state space.

The work in [8] shows that if the history stack is linearly independent, the law in (19) suffices to ensure convergence to a compact neighborhood of the ideal weight vector W* without requiring PE of σ(x̄(t)). We can reinterpret the concurrent learning gradient descent in our RKHS framework. Note that the concurrent learning adaptive law in (19) will ensure ε_j(t) → ε̃ by driving W(t) → W*. By the above analysis, σ(x̄_j) is the projection of ψ(x̄_j) onto F_C. If ψ(x̄(t)) is orthogonal to F_C in H, the first term in (19) (which is also the baseline adaptive law of (18)) vanishes, causing the evolution of the weights with respect to the tracking error e to stop. The condition that Σ_j σ(x̄_j) σ(x̄_j)^T have the same rank as the number of centers in the RBF network is equivalent to the statement that one has a collection of vectors {x̄_j}_{j=1}^p whose projections {ψ(x̄_j)}_{j=1}^p onto F_C allow the weights to continue evolving to minimize the difference in uncertainty.

B. Linear Independence
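The following toy computation (ours, not from the paper) illustrates Theorem 4 numerically: it evaluates the excitation integral of Definition 2 for centers near versus far from a sinusoidal trajectory, and shows the excitation level γ collapsing when the centers are far from the trajectory.

```python
import numpy as np

def sigma(x, C, mu=1.0):
    """Empirical kernel map sigma(x) = [k(x, c_1), ..., k(x, c_l)]^T."""
    return np.exp(-np.linalg.norm(C - x, axis=1) ** 2 / (2 * mu ** 2))

def excitation_gamma(C, T=10.0, dt=0.01):
    """min eigenvalue of int_0^T sigma sigma^T dt along x(t) = [sin t, cos t]."""
    G = np.zeros((len(C), len(C)))
    for t in np.arange(0.0, T, dt):
        s = sigma(np.array([np.sin(t), np.cos(t)]), C)
        G += np.outer(s, s) * dt
    return np.linalg.eigvalsh(G).min()

near = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])  # centers near the orbit
far = near + 20.0                                        # same centers, far away
print("gamma, near centers:", excitation_gamma(near))   # strictly positive
print("gamma, far centers: ", excitation_gamma(far))    # numerically zero
```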

This section outlines a strategy to select RBF centers online. The algorithm presented here ensures that the RBF centers cover the current domain of operation while keeping at least some centers reflecting previous domains of operation. We use the scheme introduced in [30] to ensure that the RBF centers reflect the current domain of operation; similar ideas are used in the kernel adaptive filtering literature in order to curb filter complexity [22]. At any point in time, our algorithm maintains a 'dictionary' of centers C_l = {c_i}_{i=1}^l, where l is the current size of the dictionary, and N_D is the upper limit on the number of points (the budget). To test whether a new center c_{l+1} should be inserted into the dictionary, we check whether it can or cannot be approximated in H by the current set of centers. This test is performed using

    \gamma = \Big\| \sum_{i=1}^{l} a_i \psi(c_i) - \psi(c_{l+1}) \Big\|_H^2,    (20)

where the a_i denote the coefficients of the linear combination. Unraveling the above equation in terms of (2) shows that the coefficients a_i can be determined by minimizing γ, which yields the optimal coefficient vector â_l = K_l^{-1} k̂_l, where K_l = k(C_l, C_l) is the kernel matrix for the dictionary dataset C_l, and k̂_l = k(C_l, c_{l+1}) is the kernel vector. After substituting the optimal value â_l into (20), we get

    \gamma = k(c_{l+1}, c_{l+1}) - \hat{k}_l^T \hat{a}_l.    (21)

The algorithm is summarized in Algorithm 1. For more details see [30]; in particular, an efficient way to implement the updates is given there, which we do not get into here.

Algorithm 1 Kernel Linear Independence Test
Input: new point c_{l+1}, tolerance η, budget N_D
  Compute k̂_l = k(C_l, c_{l+1})
  Compute â_l = K_l^{-1} k̂_l
  Compute γ as in Equation (21)
  if γ > η then
    if l < N_D then
      Update the dictionary by storing the new point c_{l+1}, and recalculate the γ values for each of the points
    else
      Update the dictionary by deleting the point with the minimal γ and storing c_{l+1}, then recalculate the γ values for each of the points
    end if
  end if
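A minimal Python sketch of Algorithm 1 follows (our own rendering, assuming a Gaussian kernel with k(c, c) = 1; the per-point γ recomputation after an update is done naively here rather than with the efficient recursions of [30]):

```python
import numpy as np

def gamma_of(C, c_new, mu=1.0):
    """Linear independence test value (21) for candidate c_new vs dictionary C.

    Assumes the rows of C are distinct, so K is nonsingular (Micchelli).
    """
    K = np.exp(-((C[:, None, :] - C[None, :, :]) ** 2).sum(-1) / (2 * mu ** 2))
    k_vec = np.exp(-((C - c_new) ** 2).sum(-1) / (2 * mu ** 2))
    a_hat = np.linalg.solve(K, k_vec)        # a_hat = K_l^{-1} k_l
    return 1.0 - k_vec @ a_hat               # k(c, c) = 1 for the Gaussian

def bkr_update(C, c_new, eta=0.1, budget=12, mu=1.0):
    """One dictionary update of Algorithm 1; returns the (possibly new) dictionary."""
    if len(C) == 0:
        return np.atleast_2d(c_new)
    if gamma_of(C, c_new, mu) <= eta:        # approximable: reject candidate
        return C
    if len(C) < budget:                      # room left: append
        return np.vstack([C, c_new])
    # at budget: replace the most redundant dictionary point, i.e. the one
    # with minimal gamma with respect to the remaining points
    gammas = [gamma_of(np.delete(C, i, axis=0), C[i], mu) for i in range(len(C))]
    C = C.copy()
    C[int(np.argmin(gammas))] = c_new
    return C
```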

Note that due to the nature of the linear independence test, every time a state x̄(t) is encountered that cannot be approximated within tolerance η by the current dictionary C_l, it is added to the dictionary. Further, since the dictionary is designed to keep the most varied basis possible, this is the best one can do on a budget using only instantaneous online data. Therefore, in the final algorithm, all the centers can be initialized to 0, and the linear independence test can be periodically run to check whether or not to add a new center to the RBF network. This is fundamentally different from update laws of the kind given in [27], which attempt to move the centers to minimize the tracking error e. Since such update laws are essentially rank-1, if all the centers are initialized to 0, they will all move in the same direction together. Furthermore, from the above analysis and from (15), it is clear that the amount of excitation is maximized when the centers are previous states themselves.

Summarizing, the following are the advantages of picking centers online with the linear independence test (21):
1) In light of Theorem 4, Algorithm 1 ensures that excitation inserted in the system at the current time does not disappear, by selecting centers such that F_C is not orthogonal to the current states. In less formal terms, this means that at least some centers are "sufficiently" close to the current state.
2) This also implies that not all of the old centers are discarded, which means that all of the regions where the system has previously operated are represented to some extent in the dictionary. This implies global function approximation.
3) Algorithm 1 enables the design of adaptive controllers without any prior knowledge of the domain. For example, this method would allow one to initialize all centers to zero, with appropriate centers then selected by Algorithm 1 online.
4) On a budget, selecting centers with Algorithm 1 is better than evenly spacing centers, since the centers are selected along the path of the system in the state space. This results in a more judicious distribution of centers without any prior knowledge of the uncertainty.

If the centers for the system are picked using the kernel linear independence test, and the weight update law is given by the standard baseline law (18), we call the resulting algorithm Budgeted Kernel Restructuring (BKR).

V. THE BKR-CL ALGORITHM

A. Motivation

The BKR algorithm selects centers in order to ensure that any inserted excitation does not vanish. However, to fully leverage the universal approximation property (see (16)), we must also prove that the weights are driven towards their ideal values. In this section we present a concurrent learning adaptive law that guarantees the weights approach and remain bounded in a compact domain around their ideal values by concurrently utilizing online selected and recorded data along with instantaneous data for adaptation. The concurrent learning adaptive law is wedded with BKR to create BKR-CL.

We begin by describing the algorithm used to select, record, and remove data points in the history stack. Note that Algorithm 1 picks the centers discretely; therefore, there always exists an interval [t_k, t_{k+1}], k ∈ N, over which the centers are fixed. This discrete update of the centers introduces switching into the closed loop system. Denote by σ^k the radial basis function vector given by this particular set of centers, and by W^{k*} the ideal set of weights for these centers. Let p ∈ N denote the subscript of the last point stored. For a stored data point x̄_j, let σ_j^k ∈ R^l denote σ^k(x̄_j). We let S_t = [x̄_1, ..., x̄_p] denote the matrix containing the recorded information in the history stack at time t; then Z_t = [σ_1^k, ..., σ_p^k] is the matrix containing the output of the RBF function for the current set of RBF centers. The p-th column of Z_t will be denoted by Z_t(:, p). It is assumed that the maximum allowable number of recorded data points is limited due to memory or processing power considerations. Therefore, we will require that Z_t have a maximum of p̄ ∈ N columns; clearly, in order to be able to satisfy rank(Z_t) = l, p̄ ≥ l. For the j-th data point, the associated model error Δ(x̄_j) is assumed to be stored in the array Δ̄(:, j) = Δ(x̄_j). The uncertainty of the system is estimated by estimating ẋ_j for the j-th recorded data point using fixed-point optimal smoothing and solving equation (10) to get Δ(x̄_j) = ẋ_j − ν_j.

The history stack is populated using an algorithm that aims to maximize the minimum singular value of the symmetric matrix Ω = Σ_{j=1}^p σ^k(x̄_j) σ^{kT}(x̄_j). Thus any data point linearly independent of the data stored in the history stack is included in the history stack. At the initial time t = 0 and k = 0, the algorithm begins by setting Z_t(:, 1) = σ^0(x̄(t_0)). The algorithm then selects sufficiently different points for storage; a point is considered sufficiently different if it is linearly independent of the points in the history stack, or if it is sufficiently different, in the sense of the Euclidean norm, from the last point stored. Furthermore, if Algorithm 1 updates the dictionary by adding a center, then this point is also added to the history stack (if the maximum allowable size of the history stack has not been reached), so the rank of Z_t is maintained. If the number of stored points exceeds the maximum allowable number, the algorithm seeks to incorporate new data points in a manner that increases the minimum singular value of Z_t; the current data point is added to the history stack only if swapping it with an old point increases the minimum singular value.

One can see that Algorithm 1 and Algorithm 2 work similarly to one another. Algorithm 2 enforces a condition that relates to the PE of the signal x̄(t), by ensuring that the history stack is as linearly independent in R^n as possible, while Algorithm 1 does the same for the PE of the signal σ(x̄(t)), trying to ensure that the center set is as linearly independent in H as possible. In order to approximate the uncertainty well, both algorithms are needed; we denote the combined method by the acronym BKR-CL.

B. Lyapunov Analysis

In this section, we prove the boundedness of the closed loop signals when using BKR-CL.

Algorithm 2 Singular Value Maximizing Algorithm for Recording Data Points
Require: p ≥ 1
  if ||σ^k(x̄(t)) − σ_p^k||² / ||σ^k(x̄(t))|| ≥ ε or a new center was added by Algorithm 1 without replacing an old center then
    p = p + 1
    S_t(:, p) = x̄(t)   {store Δ̄(:, p) = ẋ − ν(t)}
  else if a new center was added by Algorithm 1 by replacing an old center then
    if the old center was found in S_t then
      Overwrite the old center in the history stack with the new center
      Set p equal to the location of the data point replaced in the history stack
    end if
  end if
  Recalculate Z_t = [σ_1^k, ..., σ_p^k]
  if p ≥ p̄ then
    T = Z_t
    S_old = min SVD(Z_t^T)
    for j = 1 to p do
      Z_t(:, j) = σ^k(x̄(t))
      S(j) = min SVD(Z_t^T)
      Z_t = T
    end for
    Find max S and let k denote the corresponding column index
    if max S > S_old then
      S_t(:, k) = x̄(t)   {store Δ̄(:, k) = ẋ − ν(t)}
      Z_t(:, k) = σ^k(x̄(t))
      p = p − 1
    else
      p = p − 1
      Z_t = T
    end if
  end if
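The core swap test of Algorithm 2 can be sketched as follows (ours; the bookkeeping for S_t, Δ̄, and the center-replacement branch is omitted for brevity):

```python
import numpy as np

def min_sv(Z):
    """Smallest singular value of the history stack matrix Z (l x p)."""
    return np.linalg.svd(Z, compute_uv=False).min()

def svd_max_swap(Z, sigma_new):
    """Try to swap sigma_new (l,) into a full history stack Z (l x p).

    Mirrors the p >= p_bar branch of Algorithm 2: each column is
    tentatively replaced, and the best swap is kept only if it raises
    the minimum singular value of Z.
    """
    s_old = min_sv(Z)
    scores = np.empty(Z.shape[1])
    for j in range(Z.shape[1]):             # temporarily replace column j
        Z_try = Z.copy()
        Z_try[:, j] = sigma_new
        scores[j] = min_sv(Z_try)
    k = int(np.argmax(scores))
    if scores[k] > s_old:                   # keep the best swap if it helps
        Z = Z.copy()
        Z[:, k] = sigma_new
    return Z
```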

Over each interval between switches in the NN approximation, the tracking error dynamics are given by the following differential equation:

    \dot{e} = A e + [W^T \sigma^k(\bar{x}) - \Delta(\bar{x})].    (22)

The NN approximation error of (16) for the k-th system can be rewritten as

    \Delta(\bar{x}) = W^{k*T} \sigma^k(\bar{x}) + \tilde{\epsilon}^k(\bar{x}).    (23)

We prove the following equivalent of Theorem 3 for the following switching weight update law:

    \dot{W} = -\Gamma_W \sigma^k(\bar{x}) e^T P B - \Gamma_W \sum_{j=1}^{p} \sigma^k(\bar{x}_j) \epsilon_j^{kT}.    (24)

Theorem 6: Consider the system in (6), the control law of (12), x̄(0) ∈ D where D is compact, and the case of unstructured uncertainty. For the j-th recorded data point, let ε_j^k(t) = W^T(t) σ^k(x̄_j) − Δ(x̄_j), and let p be the number of recorded data points σ_j^k := σ^k(x̄_j) in the matrix Z = [σ_1^k, ..., σ_p^k], such that rank(Z_0) = l at t = 0. Assume that the RBF centers are updated using Algorithm 1 and the history stack is updated using Algorithm 2. Then, the weight update law in (24) ensures that the tracking error e of (22) and the RBF NN weight errors W̃^k of (23) are bounded.

Proof: Consider the tracking error dynamics given by (22) and the update law of (24). Since ν_ad = W^T σ^k(x̄), we have ε_j = W^T σ^k(x̄) − Δ(x̄). With W̃^k = W − W^{k*}, the NN approximation error is now given by (23):

    \epsilon_j^k(\bar{x}) = W^T \sigma^k(\bar{x}) - W^{k*T} \sigma^k(\bar{x}) - \tilde{\epsilon}^k(\bar{x}) = \tilde{W}^{kT} \sigma^k(\bar{x}) - \tilde{\epsilon}^k(\bar{x}).

Therefore, over [t_k, t_{k+1}] the weight dynamics are given by the following switching system:

    \dot{\tilde{W}}^k(t) = -\Gamma \Big[ \Big( \sum_{j=1}^{p} \sigma^k(\bar{x}_j) \sigma^{kT}(\bar{x}_j) \Big) \tilde{W}^k(t) - \sum_{j=1}^{p} \sigma^k(\bar{x}_j) \tilde{\epsilon}^{kT}(\bar{x}_j) + \sigma^k(\bar{x}(t)) e^T(t) P B \Big].    (25)

Consider the family of positive definite functions V^k = ½ e^T P e + ½ Tr(W̃^{kT} Γ^{-1} W̃^k), where Tr(·) denotes the trace operator. Note that V^k ∈ C¹, V^k(0) = 0, and V^k(e, W̃^k) > 0 for all (e, W̃^k) ≠ 0, and define Ω^k := Σ_j σ_j^k σ_j^{kT}. Then

    \dot{V}^k(e, \tilde{W}^k) = -\tfrac{1}{2} e^T Q e + e^T P B \tilde{\epsilon}^k - \mathrm{Tr}\Big( \tilde{W}^{kT} \Big( \sum_j \sigma_j^k \sigma_j^{kT} \Big) \tilde{W}^k - \tilde{W}^{kT} \sum_j \sigma_j^k \tilde{\epsilon}_j^{kT} \Big)

    \le -\tfrac{1}{2} \lambda_{min}(Q) \|e\|^2 + \|e\| \|P B\| \bar{\tilde{\epsilon}}^k - \lambda_{min}(\Omega^k) \|\tilde{W}^k\|^2 + \|\tilde{W}^k\| \Big\| \sum_i \sigma_i^k \tilde{\epsilon}_i^k \Big\|

    \le \|e\| \big( C_1^k - \tfrac{1}{2} \lambda_{min}(Q) \|e\| \big) + \|\tilde{W}^k\| \big( C_2^k - \lambda_{min}(\Omega^k) \|\tilde{W}^k\| \big),

where C_1^k = ||PB|| ε̄̃^k, C_2^k = p ε̄̃^k √l (l being the number of RBFs in the network), and Ω^k = Σ_{j=1}^p σ^k(x̄_j) σ^{kT}(x̄_j). Note that since Algorithm 2 guarantees that only sufficiently different points are added to the history stack, then due to Micchelli's theorem [24], Ω^k is guaranteed to be positive definite. Hence, if ||e|| > 2C_1^k/λ_min(Q) and ||W̃^k|| > C_2^k/λ_min(Ω^k), we have V̇^k(e, W̃^k) < 0. Hence the set Π^k = {(e, W̃^k) : ||e|| + ||W̃^k|| ≤ 2C_1^k/λ_min(Q) + C_2^k/λ_min(Ω^k)} is positively invariant for the k-th system.

Let S = {(t_1, 1), (t_2, 2), ...} be an arbitrary switching sequence with finite switches in finite time (note that this is always guaranteed due to the discrete nature of Algorithm 1). The sequence denotes that a system S_k was active between t_k and t_{k+1}. Suppose at time t_{k+1} the system switches from S_k to S_{k+1}. Then e(t_k) = e(t_{k+1}) and W̃^{k+1} = W̃^k + ΔW^{k*}, where ΔW^{k*} = W^{k*} − W^{(k+1)*}. Since V̇^k(e, W̃^k) is guaranteed to be negative definite outside of a compact set, it follows that e(t_k) and W̃^k(t_k) are guaranteed to be bounded. Therefore, e(t_{k+1}) and W̃^{k+1}(t_{k+1}) are also bounded, and since V̇^{k+1}(e, W̃^{k+1}) is guaranteed to be negative definite outside of a compact set, e(t) and W̃^{k+1}(t) are also bounded. Furthermore, over every interval [t_k, t_{k+1}], they will approach the positively invariant set Π^{k+1}, or stay bounded within Π^{k+1} if already inside.
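The size of the invariant set Π^k in Theorem 6 can be evaluated numerically for a given history stack. A small illustration follows (ours, with arbitrary placeholder values for ||PB||, ε̄̃, and Q):

```python
import numpy as np

def ultimate_bound(stack_sigmas, Q, PB_norm, eps_bar):
    """Radius 2*C1/lambda_min(Q) + C2/lambda_min(Omega) from Theorem 6's proof."""
    S = np.column_stack(stack_sigmas)        # l x p matrix [sigma_1 ... sigma_p]
    l, p = S.shape
    Omega = S @ S.T                          # Omega^k = sum_j sigma_j sigma_j^T
    C1 = PB_norm * eps_bar                   # C1^k = ||P B|| * eps_bar
    C2 = p * eps_bar * np.sqrt(l)            # C2^k = p * eps_bar * sqrt(l)
    return (2 * C1 / np.linalg.eigvalsh(Q).min()
            + C2 / np.linalg.eigvalsh(Omega).min())

rng = np.random.default_rng(1)
sigmas = [rng.random(4) for _ in range(8)]   # 8 recorded points, l = 4 RBFs
print(ultimate_bound(sigmas, Q=np.eye(2), PB_norm=1.0, eps_bar=0.01))
```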

Remark 1: Note that BKR-CL, along with the assumption that rank(Z) = l, guarantees that the weights stay bounded in a compact neighborhood of their ideal values without requiring any additional damping terms such as σ-modification [14] or e-modification [29] (even in the presence of noise). Due to Micchelli's theorem [24], as long as the history stack matrix Z_k contains l different points, rank(Z) = l. Therefore, it is expected that this rank condition will be met within the first few time steps, even when one begins with no a-priori recorded data points in the history stack. In addition, a projection operator (see for example [41]) can be used to bound the weights until rank(Z_t) = l. This is a straightforward extension of the presented approach.

VI. APPLICATION TO CONTROL OF WING ROCK DYNAMICS

Modern highly swept-back or delta wing fighter aircraft are susceptible to lightly damped oscillations in roll known as "wing rock". Wing rock often occurs at conditions commonly encountered at landing [34], making precision control in the presence of wing rock critical for safe landing. In this section we use BKR-CL control to track a sequence of roll commands in the presence of simulated wing rock dynamics. Let θ denote the roll attitude of an aircraft, p denote the roll rate, and δ_a denote the aileron control input. Then a model for wing rock dynamics is [25]

    \dot{\theta} = p,    (26)
    \dot{p} = L_{\delta_a} \delta_a + \Delta(x),    (27)

where

    \Delta(x) = W_0^* + W_1^* \theta + W_2^* p + W_3^* |\theta| p + W_4^* |p| p + W_5^* \theta^3    (28)

and L_{δ_a} = 3. The parameters for wing rock motion are adapted from [36]; they are W_1^* = 0.2314, W_2^* = 0.6918, W_3^* = −0.6245, W_4^* = 0.0095, W_5^* = 0.0214. In addition to these parameters, a trim error is introduced by setting W_0^* = 0.8. The ideal parameter vector W* is assumed to be unknown. The chosen inversion model has the form ν = (1/L_{δ_a}) δ_a. This choice results in the modeling uncertainty of (10) being given by Δ(x). The adaptive controller uses the control law of (12). The linear gain K of the control law is given by [0.5, 0.4], a second order reference model with natural frequency of 1 rad/sec and damping ratio of 0.5 is chosen, and the learning rate is set to Γ_W = 1. The simulation uses a time-step of 0.01 sec.
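For reference, a minimal sketch of the simulated plant and reference model follows (ours; the full controller with BKR-CL center and weight updates is not reproduced here, the command sequence r(t) is a stand-in, and the inversion δ_a = ν / L_{δ_a} is our illustrative choice):

```python
import numpy as np

# Wing rock parameters from (26)-(28); W0* = 0.8 is the added trim error
W_star = np.array([0.8, 0.2314, 0.6918, -0.6245, 0.0095, 0.0214])
L_da = 3.0
K_gain = np.array([0.5, 0.4])       # linear gain K of the control law (12)

def delta_uncertainty(x):
    """Delta(x) of (28) for the state x = [theta, p]."""
    th, p = x
    return W_star @ np.array([1.0, th, p, abs(th) * p, abs(p) * p, th ** 3])

def plant_deriv(x, delta_a):
    """Wing rock dynamics (26)-(27)."""
    return np.array([x[1], L_da * delta_a + delta_uncertainty(x)])

def ref_model_deriv(x_rm, r, wn=1.0, zeta=0.5):
    """Second order reference model: natural frequency 1 rad/s, damping 0.5."""
    return np.array([x_rm[1], wn**2 * (r - x_rm[0]) - 2 * zeta * wn * x_rm[1]])

# Euler simulation skeleton, dt = 0.01 s for 60 s, without the adaptive part
dt, x, x_rm = 0.01, np.zeros(2), np.zeros(2)
for k in range(6000):
    r = 1.0 if (k // 1000) % 2 == 0 else -1.0    # stand-in roll command sequence
    e = x_rm - x
    nu = ref_model_deriv(x_rm, r)[1] + K_gain @ e   # nu_rm + nu_pd, no nu_ad
    delta_a = nu / L_da             # our inversion choice for the sketch
    x = x + dt * plant_deriv(x, delta_a)
    x_rm = x_rm + dt * ref_model_deriv(x_rm, r)
print("final tracking error:", x_rm - x)
```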

We choose to make two sets of comparisons. In the first set, we compare BKR-CL with the e-mod, σ-mod, and projection operator adaptive laws (henceforth collectively denoted as "classical adaptive laws") for neuroadaptive control. In all cases, the number of centers was chosen to be 12. In BKR-CL, the centers were all initialized to 0 ∈ R² at t = 0, while in the classical adaptive laws, they were placed evenly across the domain of uncertainty (where the prior knowledge of the domain was determined through experiments conducted beforehand). The BKR-CL algorithm uses the switching weight update law (24). Figure 3(a) shows the tracking of the reference model by the classical laws and BKR-CL, and Figure 3(b) shows the tracking error for the same. As can be seen, the overall error, and especially the transient error, for BKR-CL is lower than that for the classical laws. Figure 4 shows an example of how concurrent learning affects the evolution of weights; it has a tendency to spread the weights across a broader spectrum of values than the classical laws, leading the algorithm to choose from a richer class of functions in F_C. Figure 5 shows the final set of centers that are used; as can be seen, the centers follow the path of the system in the state space. This is another reason why the uncertainty is approximated better by BKR-CL than by the classical algorithms (this is discussed further in Section VI-A).

To illustrate this further, we also ran another set of comparisons between BKR-CL and the classical laws with the centers picked according to the BKR scheme instead of being uniformly distributed beforehand. In all subsequent figures, the BKR versions of e-mod, σ-mod, and the projection operator are denoted by BKR e-mod, BKR σ-mod, and BKR proj, respectively. The tracking performance is compared in Figure 6, and it can be seen here that BKR-CL is again the better scheme. Finally, in both sets of comparisons, note that the control input is tracked better by BKR-CL than by the other algorithms. Figure 7 shows that BKR-CL again drives the weights over a wider domain.

A. Illustration of the Long-Term Learning Effect Introduced by BKR-CL

One of the main contributions of the presented BKR-CL scheme is the introduction of long-term learning in the adaptive controller. Long-term learning here is characterized by better approximation of the uncertainty over the entire domain, and results from the convergence of the NN weights to a neighborhood of their ideal values (in the sense of the universal approximation property of (16)). Figure 8 presents an interesting comparison between the online NN outputs (W^T(t) σ(x̄(t))) for all the controllers, and the NN output with the weights and centers frozen at their values at the final time T of the simulation (W^T(T) σ(x̄(t))), then resimulated with these "learned weights". Figure 8(a) compares the online adaptation performance during the simulation; that is, it compares Δ(x̄(t)) with W^T(t) σ(x̄(t)). We can see that BKR-CL does the best job approximating the uncertainty online, particularly in the transient, whereas the classical laws tend to perform similarly to each other.

[Fig. 3. Tracking with e-mod, σ-mod, the projection operator, and with BKR and concurrent learning (BKR-CL). Panels: angular position θ (rad), angular rate θ̇ (rad/s), position error, angular rate error, and control input δ (rad), each versus time (seconds).]

[Fig. 4. Weight evolution with BKR-CL, e-mod, σ-mod, and the projection operator.]

[Fig. 5. Center placement for the classical laws and BKR-CL, showing the system path, the BKR centers, and the σ- and e-mod centers in the state space.]

In Figure 8(b), we see that the difference is more dramatic when the weights are frozen at their learned values from the simulation; only BKR-CL completely captures the transient and approximates the steady-state uncertainty fairly closely, while the classical laws do quite poorly. This implies that the weights learned with BKR-CL have converged to values that provide a good approximation of the uncertainty over the entire operating domain. On the other hand, in Figure 8(b) we see that if BKR-CL is not used, the NN does not approximate the uncertainty as well, indicating that the weights have not approached their ideal values during the simulation. From Figure 4 we see that the weights for e-mod and the projection operator are closer to those approached by BKR-CL than those of σ-mod, and they do slightly better in uncertainty approximation.

The effect of long-term learning can be further characterized by studying the behavior of the BKR algorithm without concurrent learning and with a higher learning rate (Γ_W = 10). Higher learning rates are often used in traditional adaptive control (without concurrent learning) to guarantee good tracking. Figure 9(a) compares the online adaptation performance.


Except for the initial transient, the NN outputs of the classical laws approximate the uncertainty well locally in time with a higher learning rate. However, Figure 9(b) shows that they do a poor job of estimating the ideal weights when the learning rate is increased, indicating that good local approximation of the uncertainty through high adaptation gains does not necessarily guarantee long-term learning. This is explained by the fact that the classical laws, relying on the gradient based weight update law of (18), drive the weights only in the direction of maximum reduction of the instantaneous tracking error cost, and hence do not guarantee that the weights are driven close to the ideal values that best represent the uncertainty globally. On the other hand, BKR-CL has better long-term adaptation performance, as not only are the centers updated to best reflect the domain of the uncertainty, but the weights are also driven towards the ideal values that best represent the uncertainty over the domain visited.

[Fig. 6. Tracking with BKR e-mod, BKR σ-mod, BKR proj, and with BKR and concurrent learning (BKR-CL). Panels: angular position, angular rate, position error, angular rate error, and control input versus time.]

[Fig. 7. Weight evolution with BKR-CL and with BKR e-mod, BKR σ-mod, and BKR proj.]

[Fig. 8. Plots of the online NN output against the system uncertainty, and the NN output using the final weights frozen at the end of the simulation run: (a) online uncertainty tracking; (b) learned weight uncertainty tracking. Note that the NN weights and centers with BKR-CL better approximate the uncertainty in both cases.]

1.5

step (Algorithm 1) and the CL step (Algorithm 2). Once the budget of the dictionary ND is reached, the BKR step roughly 2 takes O(ND ) steps to add a new center, and is thus reasonably efficient. The main computational bottleneck in BKR-Cl is the computation of SVD of the history stack matrix Ω in the CL step. If there are p points in Ω, this takes O(p3 ) steps. Given the fact that p is usually larger than ND , this can be quite expensive. Table I shows a computation time comparison between the non-BKR controllers and BKR-CL for the Wingrock simulation run for 60 seconds in MATLAB on a dual core Intel 2.93 Ghz processor with 8 GB RAM, while Table VI-B shows the same comparison between BKR controllers and BKR-CL. As can be seen, the computation time increases dramatically for BKR-CL as the size of Ω increases. There are two things to note here however; the number of points added to the history stack Ω is determined by the tolerance  in Algorithm 2. Therefore, if too many points are being added to the history stack, one can increase the tolerance, or set a separate budget to make p fixed. Secondly, we used the native implementation of the SVD in MATLAB

1

0.5

0

0

1000

2000

3000

4000

5000

6000

7000

(b) Learned weight uncertainty tracking Fig. 8. Plots of the online NN output against the system uncertainty, and the NN output using the final weights frozen at the end of the simulation run. Note that the NN weights and centers with BKR-CL better approximate the uncertainty in both cases.

Recent work in incremental SVD computation (see, for example, [4, 5, 35]) speeds up the decomposition considerably, resulting in O(p²) complexity once the first SVD with p points has been computed. Therefore, the computation time for the SVD component of BKR-CL can be made roughly comparable to that of the BKR component, which implies that the algorithm will scale better as the number of points in the history stack increases.
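The cubic growth of the dense SVD cost can be seen directly in a toy measurement (ours, using NumPy rather than the MATLAB implementation discussed above; square matrices stand in for the history stack):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for p in (50, 100, 200, 400):
    Z = rng.normal(size=(p, p))              # square stand-in for the stack
    t0 = time.perf_counter()
    for _ in range(10):
        np.linalg.svd(Z, compute_uv=False)   # singular values only
    print(f"p = {p:4d}: {(time.perf_counter() - t0) / 10:.5f} s per SVD")
```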

[Fig. 9. Plots of the online NN output against the system uncertainty, and the NN output using the final weights frozen at the end of the simulation run, with a high learning rate: (a) online uncertainty tracking; (b) learned weight uncertainty tracking.]

TABLE I
WING ROCK COMPUTATION TIME COMPARISON WITH NON-BKR CONTROLLERS

Centers   e-mod   σ-mod   proj   BKR-CL
8         3.6     3.1     3.2    13.9
10        3.7     3.2     3.3    16.1
12        3.8     3.4     3.5    18.9
14        3.8     3.4     3.6    23.8

TABLE II
WING ROCK COMPUTATION TIME COMPARISON WITH BKR CONTROLLERS

Centers   BKR e-mod   BKR proj   BKR σ-mod   BKR-CL
8         4.2         4.2        4.4         13.9
10        4.4         4.4        4.5         16.1
12        4.6         4.6        4.7         18.9
14        4.8         4.8        5.0         23.8

VII. CONCLUSION

In this work, we established a connection between kernel methods and Persistently Exciting (PE) signals using Reproducing Kernel Hilbert Space theory. In particular, we showed that in order to ensure PE, not only do the system states have to be PE, but the centers also need to be selected in such a way that

This work was supported in part by NSF ECS-0238993, NSF ECS-0846750, and NASA Cooperative Agreement NNX08AD06A. We thank Maximilian M¨uhlegg for his valuable help in revising this paper and in the simulations. R EFERENCES [1] K. J. Astr¨om and B. Wittenmark. Adaptive Control. Addison-Weseley, Readings, 2nd edition, 1995. [2] Pantelis Bouboulis, Konstantinos Slavakis, and Sergios Theodoridis. Adaptive learning in complex reproducing kernel hilbert spaces employing wirtinger’s subgradients. IEEE Trans. Neural Netw. Learning Syst., 23(3):425–438, 2012. [3] S. Boyd and S. Sastry. Necessary and sufficient conditions for parameter convergence in adaptive control. Automatica, 22(6):629–639, 1986. [4] M. Brand. Fast online svd revisions for lightweight recommender systems. In Proc. of SIAM Int. Conf. on Data Mining, 2003. [5] M. Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1):20–30, May 2006. [6] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998. [7] A. J. Calise, M. Sharma, and S. Lee. Adaptive autopilot design for guided munitions. AIAA Journal of Guidance, Control, and Dynamics, 23(5):837–843, 2000. [8] G. Chowdhary. Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2010. [9] G. Chowdhary and E. N. Johnson. Concurrent learning for convergence in adaptive control without persistency

[10] [11]

[12] [13]

[14] [15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

of excitation. In IEEE Conference on Decision and Control, pages 3674–3679, 2010. N. Cristianini and J. Shawe-Taylor. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004. Y. Engel, S. Mannor, and R. Meir. The kernel leastmean-square algorithm. IEEE Transactions on Signal Processing, 52(8):2275–2285, Aug 2004. G. Folland. Real Analysis: Modern Techniques and Their Applications. Wiley, 1999. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, Hoboken, New Jersey, 2001. P. A. Ioannou and J. Sun. Robust Adaptive Control. Prentice-Hall, Upper Saddle River, 1996. E. N. Johnson. Limited Authority Adaptive Flight Control. PhD thesis, Georgia Institute of Technology, Atlanta GA, 2000. E. N. Johnson and A. J. Calise. Adaptive flight control for reusable launch vehicles. Journal of Guidance, Control and Dynamics, 2001. S. Kannan. Adaptive Control of Systems in Cascade with Saturation. PhD thesis, Georgia Institute of Technology, Atlanta GA, 2005. N. Kim. Improved Methods in Neural Network Based Adaptive Output Feedback Control, with Applications to Flight Control. PhD thesis, Georgia Institute of Technology, Atlanta GA, 2003. Y. H. Kim and F.L. Lewis. High-Level Feedback Control with Neural Networks, volume 21 of Robotics and Intelligent Systems. World Scientific, Singapore, 1998. J. Li. and J-H. Liu. Identification and control of dynamic systems based on least squares wavelet vector machines. In Advances in Neural Networks, pages 934–942, 2006. W. Liu, P.P Pokharel, and J.C. Principe. The kernel leastmean-square algorithm. IEEE Transactions on Signal Processing, 56(2):543–554, Feb 2008. W. Liu, J. C. Principe, and S. Haykin. Kernel Adaptive Filtering: A Comprehensive Introduction. Wiley, Hoboken, New Jersey, 2010. J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Phil. Trans. of the Royal Society, 209, 1909. C. A. Micchelli. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Construct. Approx., 2(1):11–22, dec. 1986. Morgan M. Monahemi and Miroslav Krstic. Control of wingrock motion using adaptive feedback linearization. Journal of Guidance Control and Dynamics, 19(4):905– 912, August 1996. S. Narasimha, S. Suresh, and N. Sundararajan. Adaptive control of nonlinear smart base-isolated buildings using gaussian kernel functions. Structural Control and Health Monitoring, 15(4), 2008. N. Nardi. Neural Network based Adaptive Algorithms for Nonlinear Control. PhD thesis, Georgia Institute of Technology, School of Aerospace Engineering, Atlanta, GA, Nov 2000. K. Narendra and A. Annaswamy. A new adaptive

[29] [30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

[38]

[39]

[40]

[41] [42]

[43]

law for robust adaptation without persistent excitation. IEEE Transactions on Automatic Control, 32(2):134– 145, February 1987. K. S. Narendra and A. M. Annaswamy. Stable Adaptive Systems. Prentice-Hall, Englewood Cliffs, 1989. D. Nguyen-Tuong, B. Sch¨olkopf, and J. Peters. Sparse online model learning for robot control with support vector regression. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 3121–3126, 2009. J. Park and I.W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3:246–257, 1991. H.D. Pati˜no, R. Carelli, and B.R. Kuchen. Neural networks for advanced control of robot manipulators. IEEE Transactions on Neural Networks, 13(2):343–354, Mar 2002. D. Patino and D. Liu. Neural network based model reference adaptive control system. IEEE Transactions on Systems, Man and Cybernetics, Part B, 30(1):198–204, 2000. Ahmed A. Saad. Simulation and analysis of wing rock physics for a generic fighter model with three degrees of freedom. PhD thesis, Air Force Institute of Technology, Air University, Wright-Patterson Air Force Base, Dayton, Ohio, 2000. B.M. Sarwar, G.Karypis, J.A. Konstan, and J. Reidl. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Int. Conf. on Computer and Information Technology, 2002. S. N. Singh, W. Yim, and W. R. Wells. Direct adaptive control of wing rock motion of slender delta wings. Journal of Guidance Control and Dynamics, 18(1):25– 30, Feb. 1995. K. Slavakis, P. Bouboulis, and S. Theodoridis. Adaptive multiregression in reproducing kernel hilbert spaces: The multiaccess mimo channel case. IEEE Trans. Neural Netw. Learning Syst., 23(2):260–276, 2012. N. Sundararajan, P. Saratchandran, and L. Yan. Fully Tuned Radial Basis Function Neural Networks for Flight Control. Springer, 2002. S. Suresh, S. Narasimhan, and S. Nagarajaiah. Direct adaptive neural controller for the active control of earthquake-excited nonlinear base-isolated buildings. to appear Structural Control and Health Monitoring, 2011. J.A.K. Suykens, J.P.L. Vandewalle, and B.L.R. De Moor. Artificial Neural Networks for Modeling and Control of Non-Linear Systems. Kluwer, Norwell, 1996. G. Tao. Adaptive Control Design and Analysis. Wiley, New York, 2003. K.Y. Volyanskyy, W.M. Haddad, and A.J. Calise. A new neuroadaptive control architecture for nonlinear uncertain dynamical systems: Beyond σ and e-modifications. IEEE Transactions on Neural Networks, 20(11):1707–1723, Nov 2009. M-G. Zhang and W-H. Liu. Single neuron PID model reference adaptive control based on RBF neural networks. In 2006 International Conference on Machine Learning and Cybernetics, pages 3021–3025, 2006.