LETTER

Communicated by Morris Hirsch

Global Convergence Rate of Recurrently Connected Neural Networks

Tianping Chen
[email protected]
Wenlian Lu
[email protected]
Laboratory of Nonlinear Mathematics Science, Institute of Mathematics, Fudan University, Shanghai, China

Shun-ichi Amari
[email protected]
RIKEN Brain Science Institute, Wako-shi, Saitama 351-01, Japan

We discuss recurrently connected neural networks, investigating their global exponential stability (GES). Some sufficient conditions for a class of recurrent neural networks to belong to GES are given. A sharp convergence rate is given as well.

1 Introduction

Recurrently connected neural networks, sometimes called Grossberg-Hopfield neural networks, have been extensively studied, and many applications in different areas have been found. Such applications heavily depend on the dynamical behavior of the networks. Therefore, analysis of these behaviors is a necessary step for the practical design of neural networks. The dynamical behaviors of recurrently connected neural networks have been studied since the early period of research on neural networks. For example, multistable and oscillatory behaviors were studied by Amari (1971, 1972) and Wilson and Cowan (1972). Chaotic behaviors were studied by Sompolinsky and Crisanti (1988). Hopfield and Tank (1984, 1986) studied the dynamic stability of symmetrically connected networks and showed their practical applicability to optimization problems. Cohen and Grossberg (1983) gave more rigorous results on the global stability of networks. Other work studied the local and global stability of asymmetrically connected networks: Chen and Amari (2001a, 2001b), Cohen and Grossberg (1983), Fang and Kincaid (1996), Hirsch (1989), Kaszkurewicz and Bhaya (1994), Kelly (1990), Matsuoka (1992), Sompolinsky and Crisanti (1988), Wilson and Cowan (1972), Yang and Dillon (1994), and others. Forti, Manetti, and Marini (1994) and Forti and Tesi (1995) proposed three main features a neural network should possess. Among them, the important one is that


a neural network should be absolutely stable. They also pointed out that the negative semidefiniteness of the interconnection matrix guarantees the global stability of a Grossberg-Hopfield network and proved absolute global stability for a class G of activation functions. However, they did not discuss absolute exponential stability or give a convergence rate. Liang and Wu (1999) discussed absolute exponential stability; however, they assumed that the connection matrix has a special structure (the off-diagonal entries are nonnegative) and that the activation functions are bounded. Even with these restrictions, the attraction region is limited. Zhang, Heng, and Fu (1999), Liang and Wang (2000), and Liang and Si (2001) discussed the exponential stability of Grossberg-Hopfield neural networks and estimated the convergence rate. However, these estimates are less sharp. Our goal here is to provide a more accurate estimate of the convergence rate.

2 Some Definitions and Main Results

Definition 1. Class $\bar{G}$ of functions: Let $G = \mathrm{diag}[G_1, \dots, G_n]$, where $G_i > 0$, $i = 1, \dots, n$. $g(x) = (g_1(x), \dots, g_n(x))^T$ is said to belong to $\bar{G}$ if the functions $g_i(x)$, $i = 1, \dots, n$, satisfy
\[ 0 < \frac{g_i(x+u) - g_i(x)}{u} \le G_i. \]

Definition 2. $M^s = \frac{1}{2}(M + M^T)$ is defined as the symmetric part of the matrix $M$.

Here, we investigate the following dynamical system:
\[ \frac{du_i(t)}{dt} = -d_i u_i(t) + \sum_j a_{ij} g_j(u_j(t)) + I_i, \qquad i = 1, \dots, n, \tag{2.1} \]
where $d_i > 0$ and $g(x) = (g_1(x), \dots, g_n(x))^T \in \bar{G}$.

Definition 3. Recurrently connected neural network 2.1 is said to be absolutely exponentially stable if any solution of it converges to an equilibrium point exponentially for any activation functions in $\bar{G}$.

We will prove two theorems on global exponential convergence.

Theorem 1. Let $I_n$ be the identity matrix of order $n$, $D = \mathrm{diag}[d_1, \dots, d_n]$, $A = (a_{ij})_{i,j=1}^n$, and $P = \mathrm{diag}[p_1, \dots, p_n]$, where $p_i > 0$, $i = 1, \dots, n$, and let $g(x) = (g_1(x), \dots, g_n(x))^T \in \bar{G}$ with $g_i(x)$ being differentiable. If there is a positive constant $\alpha > 0$ such that
\[ \alpha < d_i, \qquad i = 1, \dots, n, \tag{2.2} \]
and $\{P[(D - \alpha I_n)G^{-1} - A]\}^s$ is positive definite, then the dynamical system 2.1 has a unique equilibrium point $u^*$ such that
\[ |u_i(t) - u_i^*| = O(e^{-\alpha t}), \qquad i = 1, \dots, n. \tag{2.3} \]
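
As a numerical illustration of theorem 1 (not part of the original letter), the following Python sketch checks the hypothesis for a small example system and then integrates equation 2.1 to observe the predicted $O(e^{-\alpha t})$ decay. All concrete values here (the matrices $D$, $A$, $G$, $P$, the inputs $I_i$, and the tanh activations) are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative (assumed) problem data; not taken from the letter.
n = 3
D = np.diag([2.0, 2.5, 3.0])              # d_i > 0
A = np.array([[ 0.2, -0.4,  0.1],
              [ 0.3,  0.1, -0.2],
              [-0.1,  0.2,  0.0]])         # connection weights a_ij
G = np.eye(n)                              # Lipschitz bounds G_i = 1 for tanh
P = np.eye(n)                              # a positive diagonal P
I_ext = np.array([0.5, -0.3, 0.2])         # external inputs I_i
alpha = 1.0                                # candidate rate, alpha < d_i

# Hypothesis of theorem 1: {P[(D - alpha I_n)G^{-1} - A]}^s is positive definite.
Q = P @ ((D - alpha * np.eye(n)) @ np.linalg.inv(G) - A)
print("lambda_min:", np.linalg.eigvalsh(0.5 * (Q + Q.T)).min())   # should be > 0

# Integrate du/dt = -D u + A g(u) + I with forward Euler and watch the decay.
dt, T = 1e-3, 20.0
u = np.array([1.0, -2.0, 0.5])             # arbitrary initial state
traj = []
for _ in range(int(T / dt)):
    u = u + dt * (-D @ u + A @ np.tanh(u) + I_ext)
    traj.append(u.copy())
traj = np.array(traj)
u_star = traj[-1]                          # late state approximates the equilibrium u*

# The empirical decay exponent of |u(t) - u*| should be at least alpha.
t1, t2 = int(2.0 / dt), int(10.0 / dt)
r1 = np.linalg.norm(traj[t1] - u_star)
r2 = np.linalg.norm(traj[t2] - u_star)
print("measured rate:", -np.log(r2 / r1) / ((t2 - t1) * dt), " alpha:", alpha)
```

On data like this the measured exponent typically comes out well above $\alpha$, which is consistent with theorem 1: the theorem only guarantees $\alpha$ as a lower bound on the decay rate.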

Proof. Pick a small $\epsilon > 0$ such that $d_i > \alpha + \epsilon$ and the least eigenvalue $\lambda_{\min}$ of $\{P[(D - \beta I_n)G^{-1} - A]\}^s$ satisfies
\[ \lambda_{\min} = \lambda_{\min}\{P[(D - \beta I_n)G^{-1} - A]\}^s > 0, \tag{2.4} \]
where $\beta = \alpha + \epsilon$. It is well known that under the conditions given in theorem 1, the dynamical system 2.1 has an equilibrium point $u^* = [u_1^*, \dots, u_n^*]^T$. Define
\[ v_i(t) = u_i(t) - u_i^*, \qquad g_j^*(s) = g_j(s + u_j^*) - g_j(u_j^*). \tag{2.5} \]

Then we have
\[ \frac{dv_i(t)}{dt} = -d_i v_i(t) + \sum_j a_{ij} g_j^*(v_j(t)), \qquad i = 1, \dots, n. \tag{2.6} \]
The new variables $v_i$ put the unique equilibrium at zero. Let $v(t) = [v_1(t), \dots, v_n(t)]^T$ and $g^*(t) = [g_1^*(v_1(t)), \dots, g_n^*(v_n(t))]^T$, and define the Lyapunov function, which was used in Forti and Tesi (1995),
\[ L(k, t) = \frac{1}{2}\sum_{i=1}^n |v_i(t)|^2 + k \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi. \tag{2.7} \]

Direct differentiation leads to
\[ \dot L(k, t) = -v^T(t) D v(t) + v^T(t) A g^*(t) - k\{g^*(t)\}^T P D v(t) + k\{g^*(t)\}^T P A \{g^*(t)\} \tag{2.8} \]
\[ = -\beta v^T(t) v(t) - v^T(t)(D - \beta I_n) v(t) + v^T(t) A g^*(t) - k\beta\{g^*(t)\}^T P v(t) - k\{g^*(t)\}^T P (D - \beta I_n) v(t) + k\{g^*(t)\}^T P A \{g^*(t)\} \tag{2.9} \]
\[ \le -\beta v^T(t) v(t) - v^T(t)(D - \beta I_n) v(t) + v^T(t) A g^*(t) - k\beta\{g^*(t)\}^T P v(t) - k\{g^*(t)\}^T \{P[(D - \beta I_n)G^{-1} - A]\}^s g^*(t) \tag{2.10} \]


\[ = -\left\{(D - \beta I_n)^{1/2} v(t) - \tfrac{1}{2}(D - \beta I_n)^{-1/2} A g^*(t)\right\}^T \left\{(D - \beta I_n)^{1/2} v(t) - \tfrac{1}{2}(D - \beta I_n)^{-1/2} A g^*(t)\right\} + \tfrac{1}{4}\{g^*(t)\}^T A^T (D - \beta I_n)^{-1} A \{g^*(t)\} - k\{g^*(t)\}^T \{P[(D - \beta I_n)G^{-1} - A]\}^s \{g^*(t)\} - \beta v^T(t) v(t) - k\beta\{g^*(t)\}^T P v(t) \tag{2.11} \]
\[ \le \tfrac{1}{4}\{g^*(t)\}^T A^T (D - \beta I_n)^{-1} A \{g^*(t)\} - k\{g^*(t)\}^T \{P[(D - \beta I_n)G^{-1} - A]\}^s \{g^*(t)\} - \beta v^T(t) v(t) - k\beta\{g^*(t)\}^T P v(t) \tag{2.12} \]
\[ = -\{g^*(t)\}^T \left\{ k\{P[(D - \beta I_n)G^{-1} - A]\}^s - \tfrac{1}{4} A^T (D - \beta I_n)^{-1} A \right\} \{g^*(t)\} - \beta v^T(t) v(t) - k\beta\{g^*(t)\}^T P v(t). \tag{2.13} \]
Pick a sufficiently large positive $k$ such that the least eigenvalue
\[ \lambda_{\min} = \lambda_{\min}\left\{ k\{P[(D - \beta I_n)G^{-1} - A]\}^s - \tfrac{1}{4} A^T (D - \beta I_n)^{-1} A \right\} > 0. \tag{2.14} \]
Then we have
\[ \dot L(k, t) \le -\beta v^T(t) v(t) - k\beta\{g^*(t)\}^T P v(t) - \lambda_{\min}\{g^*(t)\}^T \{g^*(t)\}. \tag{2.15} \]

To obtain the desired convergence rate, we use the following equalities:
\[ g_i^*(\xi_i) = g_i'(u_i^*)\xi_i + o(|\xi_i|), \tag{2.16} \]
\[ g_i^*(v_i(t)) v_i(t) = g_i'(u_i^*) v_i^2(t) + o(|v_i(t)|^2), \tag{2.17} \]
\[ 2\int_0^{v_i(t)} g_i^*(\xi)\, d\xi = g_i'(u_i^*) v_i^2(t) + o(|v_i(t)|^2). \tag{2.18} \]


Combining previous equalities, we have
\[ g_i^*(v_i(t)) v_i(t) = 2\int_0^{v_i(t)} g_i^*(\xi)\, d\xi + o(|v_i(t)|^2). \tag{2.19} \]
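
Equality 2.19 can be sanity-checked numerically; the short script below (an added illustration, with an arbitrarily assumed smooth activation and equilibrium value) shows that the residual $g_i^*(v)v - 2\int_0^{v} g_i^*(\xi)\,d\xi$ vanishes faster than $v^2$ as $v \to 0$.

```python
import numpy as np

# Assumed data for the check: a smooth activation and an arbitrary equilibrium value.
u_star = 0.3
g = np.tanh
g_star = lambda s: g(s + u_star) - g(u_star)      # definition 2.5

for v in [1e-1, 1e-2, 1e-3, 1e-4]:
    xs = np.linspace(0.0, v, 10001)
    ys = g_star(xs)
    h = xs[1] - xs[0]
    integral = h * (ys[0] / 2 + ys[1:-1].sum() + ys[-1] / 2)   # composite trapezoid rule
    residual = g_star(v) * v - 2.0 * integral                  # difference of the two sides of 2.19
    print(f"v = {v:8.0e}   residual / v^2 = {residual / v**2:+.3e}")   # tends to 0 with v
```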

Substituting into equation 2.15, we see that
\[ \dot L(k, t) \le -\beta v^T(t) v(t) - k\beta\{g^*(t)\}^T P v(t) - \lambda_{\min}\{g^*(t)\}^T \{g^*(t)\} \]
\[ \le -\beta \sum_{i=1}^n |v_i(t)|^2 - 2k\beta \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi - \lambda_{\min} \sum_{i=1}^n \{g_i'(u_i^*)\}^2 |v_i(t)|^2 + o\Big(\sum_{i=1}^n |v_i(t)|^2\Big) \]
\[ \le -\alpha \sum_{i=1}^n |v_i(t)|^2 - 2k\alpha \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi = -2\alpha L(k, t) \tag{2.20} \]

holds for sufficiently large $t$. Therefore,
\[ \frac{1}{2}\sum_{i=1}^n |v_i(t)|^2 \le L(k, t) \le L(k, 0)e^{-2\alpha t} \tag{2.21} \]
and
\[ v_i(t) = O(e^{-\alpha t}), \qquad i = 1, \dots, n. \tag{2.22} \]

Theorem 1 is proved.

Remark 1. The condition that the derivatives $g_i'(x)$ exist, imposed on the activations $g_i(x)$ in theorem 1, is a little stronger than in the existing literature. However, in practice it is not very restrictive, because a monotonically increasing function is differentiable almost everywhere.

Remark 2. The assumption that $g_i(x)$, $i = 1, \dots, N$, is differentiable can be relaxed to the existence of the right and left derivatives of $g_i(x)$, $i = 1, \dots, N$. This relaxation is significant. For example, monotonically increasing piecewise linear functions satisfy this condition.

Remark 3. Let $g_{\pm}'(x)$ denote the right and left derivatives, respectively. The conclusion of theorem 1 remains valid under the following assumption: for any $\epsilon > 0$, there is $\delta > 0$ such that
\[ \left| \frac{g_i(x \pm h) - g_i(x)}{\pm h} - g_{i\pm}'(x) \right| < \epsilon \qquad \text{for } 0 < h < \delta, \; i = 1, \dots, N. \]

Corollary 1. Let $\xi_i > 0$, $i = 1, \dots, N$, be some constants. Then under any of the following three inequalities,
\[ d_j \xi_j - G_j\Big\{\xi_j A_{jj} + \sum_{i=1, i \ne j}^N \xi_i |A_{ij}|\Big\} > \alpha \xi_j, \qquad j = 1, \dots, N, \tag{2.25} \]
\[ \xi_i [d_i - G_i A_{ii}] - \sum_{j=1, j \ne i}^N \xi_j G_j |A_{ij}| > \alpha \xi_i, \qquad i = 1, \dots, N, \tag{2.26} \]
\[ \xi_i [d_i - G_i A_{ii}] - \frac{1}{2}\sum_{j=1, j \ne i}^N \{\xi_i G_i |A_{ij}| + \xi_j G_j |A_{ji}|\} > \alpha \xi_i, \qquad i = 1, \dots, N, \tag{2.27} \]

there is a unique equilibrium $u^* = [u_1^*, \dots, u_n^*]^T \in R^N$ such that for any solution $u(t)$ of the differential equation 2.1, there holds
\[ u(t) - u^* = O(e^{-\alpha t}) \qquad \text{as } t \to \infty. \tag{2.28} \]

In fact, under any one of the previous three conditions, the matrix $(D - \alpha I_n) - MG$ is an M-matrix (see Berman & Plemmons, 1979), where $M = (m_{ij})_{i,j=1}^N$ with entries
\[ m_{ii} = A_{ii}, \qquad m_{ij} = |A_{ij}|, \quad i \ne j. \tag{2.29} \]

By a property of M-matrices, $(D - \alpha I_n)G^{-1} - M$ is also an M-matrix. Therefore, there exists $P = \mathrm{diag}[p_1, \dots, p_n]$, with $p_i > 0$, $i = 1, \dots, n$, such that
\[ p_i\{[d_i - \alpha]G_i^{-1} - A_{ii}\} - \frac{1}{2}\sum_{j=1, j \ne i}^N \{p_i |A_{ij}| + p_j |A_{ji}|\} > 0, \qquad i = 1, \dots, N. \tag{2.30} \]
By Gershgorin's theorem (see Golub & Van Loan, 1990), all the eigenvalues of the matrix $\{P[(D - \alpha I_n)G^{-1} - M]\}^s$ are positive. Therefore, $\{P[(D - \alpha I_n)G^{-1} - A]\}^s$ is positive definite. This corollary is a direct consequence of theorem 1.
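
The Gershgorin argument can be made concrete with a small numerical example. The sketch below (added for illustration; all matrices and the choice $\xi_i = 1$ are assumptions) checks condition 2.27 and then verifies that $\{P[(D - \alpha I_n)G^{-1} - A]\}^s$ is positive definite for the diagonal choice $p_i = \xi_i G_i$, one convenient scaling consistent with the diagonal-dominance reasoning above.

```python
import numpy as np

# Assumed example data (not from the letter).
N = 3
d = np.array([3.0, 2.5, 2.8])
A = np.array([[ 0.5, -0.3,  0.2],
              [ 0.4,  0.2, -0.3],
              [-0.2,  0.1,  0.3]])
Gdiag = np.array([1.0, 0.8, 1.2])
alpha = 0.5
xi = np.ones(N)                       # take xi_i = 1 in condition 2.27

# Condition 2.27: xi_i[d_i - G_i A_ii] - (1/2) sum_{j!=i} {xi_i G_i|A_ij| + xi_j G_j|A_ji|} > alpha xi_i.
ok = True
for i in range(N):
    off = sum(0.5 * (xi[i] * Gdiag[i] * abs(A[i, j]) + xi[j] * Gdiag[j] * abs(A[j, i]))
              for j in range(N) if j != i)
    ok = ok and (xi[i] * (d[i] - Gdiag[i] * A[i, i]) - off > alpha * xi[i])
print("condition 2.27 holds:", ok)

# Conclusion reached through theorem 1: the symmetric part below is positive definite.
D, G, P = np.diag(d), np.diag(Gdiag), np.diag(xi * Gdiag)   # assumed scaling p_i = xi_i * G_i
Q = P @ ((D - alpha * np.eye(N)) @ np.linalg.inv(G) - A)
print("lambda_min:", np.linalg.eigvalsh(0.5 * (Q + Q.T)).min(), "(should be > 0)")
```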


Remark 4. The results in corollary 1 are generalizations of those given in Chen and Amari (2001a). Moreover, corollary 1 gives a sharper convergence rate.

If the activation functions are sigmoidal functions $g_i(x) = \tanh(G_i x)$, we have the following stronger result, and the proof is much simpler.

Theorem 2. Let $A = (a_{ij})_{i,j=1}^n$, $P = \mathrm{diag}[p_1, \dots, p_n]$, where $p_i > 0$, $i = 1, \dots, n$, and $g_i(x) = \tanh(G_i x)$. If there is a positive constant $\alpha > 0$ such that $\alpha \le d_i$ for $i = 1, \dots, N$ and the least eigenvalue $\lambda_{\min} = \lambda_{\min}\{P[(D - \alpha I_n)G^{-1} - A]\}^s > 0$, then the dynamical system 2.1 has a unique equilibrium point $u^*$ such that
\[ |u_i(t) - u_i^*| = O(e^{-(\alpha + \delta - \epsilon)t}), \qquad i = 1, \dots, n, \tag{2.31} \]
where
\[ \delta = \lambda_{\min} \min_{i=1,\dots,n}\left\{\frac{g_i'(u_i^*)}{P_i}\right\}, \tag{2.32} \]
and $\epsilon > 0$ is any fixed small number.

Proof. Using the same notation as in the proof of theorem 1, define
\[ L_1(t) = \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi \tag{2.33} \]
and
\[ L_2(t) = \frac{1}{2}\sum_{i=1}^n |v_i(t)|^2. \tag{2.34} \]
Under the assumptions, $v_i(t)$, $i = 1, \dots, n$, are bounded. Therefore, there is a constant $C$ such that $G_i \ge g_i'(u_i(t)) > C$ for all $t > 0$ and $i = 1, \dots, n$. Thus,
\[ \frac{1}{2} C |v_i(t)|^2 \le \int_0^{v_i(t)} g_i^*(\xi)\, d\xi \le \frac{1}{2} G_i |v_i(t)|^2. \tag{2.35} \]
Then we can find two positive constants $C_1$ and $C_2$ such that
\[ C_1 L_1(t) \le L_2(t) \le C_2 L_1(t). \tag{2.36} \]

Using equation 2.19, direct differentiation leads to
\[ \dot L_1(t) = -\{g^*(t)\}^T P D v(t) + \{g^*(t)\}^T P A \{g^*(t)\} = -\alpha\{g^*(t)\}^T P v(t) - \{g^*(t)\}^T P (D - \alpha I_n) v(t) + \{g^*(t)\}^T P A \{g^*(t)\} \tag{2.37} \]
\[ \le -\alpha\{g^*(t)\}^T P v(t) - \{g^*(t)\}^T \{P[(D - \alpha I_n)G^{-1} - A]\}^s g^*(t) \tag{2.38} \]
\[ \le -\alpha\{g^*(t)\}^T P v(t) - \lambda_{\min}\{g^*(t)\}^T \{g^*(t)\} \tag{2.39} \]
\[ \le -2\alpha \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi - \lambda_{\min} \sum_{i=1}^n \{g_i'(u_i^*)\}^2 |v_i(t)|^2 + o\Big(\sum_{i=1}^n |v_i(t)|^2\Big) \tag{2.40} \]
\[ \le -2\alpha \sum_{i=1}^n P_i \int_0^{v_i(t)} g_i^*(\xi)\, d\xi - 2\lambda_{\min} \sum_{i=1}^n g_i'(u_i^*) \int_0^{v_i(t)} g_i^*(\xi)\, d\xi + o\Big(\sum_{i=1}^n |v_i(t)|^2\Big) \tag{2.41} \]
\[ \le -2\left( \alpha + \lambda_{\min} \min_{i=1,\dots,n}\left\{\frac{g_i'(u_i^*)}{P_i}\right\} + o(1) \right) L_1(t) \tag{2.42} \]
\[ \le -2(\alpha + \delta - \epsilon) L_1(t) \]

holds for sufŽciently large t. Therefore, n 1X |vi (t)| 2 D L2 (t) · C2 L1 (t) · C2 L1 (0)e ¡2(aCd¡2 2 iD 1

)t

(2.44)

and vi (t) D O(e¡(aCd¡2 )t )

i D 1, . . . , n,

(2.45)

which proves theorem 2. Under the assumptions that the activation functions gi satisfy g 2 GN and g0i (x) > C, i D 1, . . . , n in any Žnite interval for some constant C. The conclusions in theorem 2 are valid too. Remark 5.

Liang and Si (2001) claim that the lower bound mini di is attainable for a special degenerate system (linear). Theorem 2 shows that for a wide class of activation functions, the convergence rate of the nonlinear system, equation 2.1, can be even better than the lower bound mini di given in Liang and Si (2001). Remark 6.

3 Comparisons

In this section, we compare our results with other work that most closely relates to ours. In this other work, the activation functions are assumed to g (x C h)¡gi (x) satisfy 0 · i · Gi . h

Global Convergence Rate of Recurrently Connected Neural Networks

2955

Zhang et al. (1999) estimated convergence rate. The main result is as follows: If there exists 0 < a < min1 ·i·n < di such that the matrix lmax f¡[(D ¡ aIn )G¡1 ¡ A]gs · 0,

(3.1)

then dynamical system 2.1 has a unique equilibrium point u¤ , such that a

|ui (t) ¡ u¤i | D O(e ¡ 2 t )

i D 1, . . . , n.

(3.2)

Instead, in theorem 2, condition 0 < a < min1 ·i·n < di can be relaxed to a · di , and the convergence rate is |ui (t) ¡ u¤i | D O(e ¡(a Cd¡2 )t )

i D 1, . . . , n.

(3.3)

Liang and Wu (1999) and Liang and Wang (2000) addressed absolute exponential stability under various assumptions. In most of these articles, the convergence rate is O(e¡at / 2 ). Recently Liang and Si (2001) discussed the lower bound of the convergence rate and showed that in some case (when all gj (x) D 0), the lower bound a D mini di is attainable. However, in this case, the system is degenerate and has no great interest. Theorem 2 shows that if the activation functions are sigmoidal functions, then under the condition lmin f[(D ¡ min di In )G¡1 ¡ A]gs > 0, i

(3.4)

the convergence rate is |ui (t) ¡ u¤i | D O(e ¡(mini di Cd¡2 )t )

i D 1, . . . , n,

(3.5)

which is even better than O(e¡fmini di gt ). 4 Conclusion

We investigated the absolute exponential stability of a class of recurrent connected neural networks. The convergence rate given here is much sharper than that in the existing literature. Acknowledgments

This project is supported by the National Science Foundation of China 69982003 and 60074005. This work was completed when T. Chen visited the RIKEN Brain Science Institute in Japan.

2956

Tianping Chen, Wenlian Lu, and Shun-ichi Amari

References Amari, S. (1971). Characteristics of randomly connected threshold element networks and neural systems. Proc. IEEE, 59, 35–47. Amari, S. (1972). Characteristics of random nets of analog neuron-like elements. IEEE Trans., SMC-2, 643–657. Berman, A., & Plemmons, R. J. (1979). Nonnegative matrices in the mathematical sciences. New York: Academic Press. Chen, T., & Amari, S. (2001a). Stability of asymmetric HopŽeld networks. IEEE Transactions on Neural Networks, 12(1), 159–163. Chen, T., & Amari, S. (2001b). New theorems on global convergence of some dynamical systems. Neural Networks, 14(3), 251–255. Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. SMC-13, 815–826. Fang, Y., & Kincaid, T. G. (1996). Stability analysis of dynamical neural networks. IEEE Trans. Neural Networks, 7(4), 996–1006. Forti, M., Maneti, S., & Marini, M. (1994). Necessary and sufŽcient conditions for absolute stability of neural networks. IEEE Trans. Circuits Syst.-I: Fundamental Theory Appl., 41(7), 491–494. Forti, M., & Tesi, A. (1995).New conditions for global stability of neural networks with application to linear and quadratic programming problems. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 42(7), 354–366. Golub, G. H., & Van Loan, C. F. (1990). Matrix computation (2nd ed.). Baltimore, MD: Johns Hopkins University Press. Hirsch, M. (1989).Convergent activation dynamics in continuous time networks. Neural Networks, 2, 331–349. HopŽeld J. J., & Tank, D. W. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Nat. Acad. Sci., 79, 3088–3092. HopŽeld J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233, 625–633. Kaszkurewicz, E., & Bhaya, A. (1994). On a class of globally stable neural circuits. IEEE Trans. Circuits Syst.-I: Fundamental Theory Appl., 41(2), 171–174. Kelly, D. G. (1990).Stability in contractive nonlinear neural networks. IEEE Trans. Biomed. Eng., 3(3), 231–242. Liang, X., & Si, J. (2001). Global exponential stability of neural networks with globally Lipschitz continuous activations and its application to linear variational inequality problem. IEEE Transactions on Neural Networks, 12(2), 349– 359. Liang, X., & Wang, J. (2000). Absolute exponential stability of neural networks with a general class of activation functions. IEEE Transactions on Circuits and Systems—I, Fundamental Theory and Applications, 47(8), 1258–1263. Liang, X., & Wu, L. (1999). Global exponential stability of a class of neural circuits. IEEE Transactions on Circuits and Systems—I, Fundamental Theory and Applications, 46(6), 748–751.

Global Convergence Rate of Recurrently Connected Neural Networks

2957

Matsuoka, K. (1992). Stability conditions for nonlinear continuous neural networks with asymmetric connection weights. Neural Networks, 5, 495–500. Sompolinsky, H., & Crisanti, A. (1988). Chaos in random neural networks. Physical Review Letter, 61(3), 259–262. Wilson, H., & Cowan, A. (1972). Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical J., 12, 1–24. Yang, H., & Dillon, T. (1994). Exponential stability and oscillation of HopŽeld graded response neural network. IEEE Trans. Neural Networks, 5(5), 719–729. Zhang, Yi, P. A., Heng, P., & Fu, A. W. C. (1999). Estimate of exponential convergence rate and exponential stability for neural networks. IEEE Trans. Neural Networks, 10(6), 1487–1493. Received September 27, 2001; accepted May 24, 2002.