Exploiting the functional training approach in Radial Basis Function Networks

Cristiano L. Cabrita(2), António E. Ruano(1,2), Pedro M. Ferreira(1,3)

(1) Centre for Intelligent Systems, IDMEC, IST
(2) University of Algarve, 8005-139 Faro, Portugal
(3) Algarve STP – Algarve Science & Technology Park, 8005-139 Faro, Portugal
(e-mail: [email protected], [email protected], [email protected])

This work was supported in part by Fundação para a Ciência e Tecnologia under the Portuguese financial support program PROTEC SFRH/BD/49438/2009. The third author thanks the European Commission for the funding through grant PERG-GA-2008-239451.

Abstract - This paper investigates the application of a novel approach for the parameter estimation of a Radial Basis Function (RBF) network model. The new concept (denoted as functional training) minimizes the integral of the analytical error between the process output and the model output [1]. In this paper, the analytical expressions needed to use this approach are introduced, both for the back-propagation and the Levenberg-Marquardt algorithms. The results show that the proposed methodology outperforms the standard methods in terms of function approximation, serving as an excellent tool for RBF network training.

Keywords: Radial Basis Function neural network training, local nonlinear optimization, parameter separability, functional back-propagation

I. INTRODUCTION

This paper must be seen as a follow-up of [1], where the basic methodology concerning the functional training is presented. As such, the same notation will be used here, and we shall make references to concepts and equations introduced therein. The layout of the present paper is as follows. A brief overview of RBFs is introduced in Section II. Section III describes how the analytical terms needed to implement the functional training can be computed for RBFs. Section IV shows an example, for a very simple uni-dimensional case. The paper ends with conclusions and future work.

II. OVERVIEW OF RBFS

Radial basis function networks were first introduced in the context of neural networks by Broomhead and Lowe [2]. The radial basis function (RBF) network is a feed-forward neural network composed of three fully connected layers. The first is the input layer, which connects the source nodes to a set of M nodes in the hidden layer. The response of the network is given by the output layer, which is a linear combination of the neurons in the hidden layer; the RBF network therefore belongs to the set of models whose parameters can be linearly-nonlinearly separated. The most common basis function employed is the Gaussian function

$$\varphi_i(\mathbf{x}, \mathbf{c}_i, v_i) = e^{-\frac{\|\mathbf{x}-\mathbf{c}_i\|_2^2}{2 v_i}} \qquad (1)$$

where $v_i$ is the variance associated with the basis function of the $i$th neuron. The output of a RBF neural network is defined as

$$y(\mathbf{x}) = \sum_{i=1}^{M+1} u_i\,\varphi_i(\mathbf{x}, \mathbf{c}_i, v_i) = \boldsymbol{\varphi}^T(\mathbf{x}, \mathbf{C}, \boldsymbol{\upsilon})\,\mathbf{u} = \boldsymbol{\varphi}^T(\mathbf{x}, \mathbf{v})\,\mathbf{u} \qquad (2)$$

As a bias term is usually employed in RBFs, we have

$$\boldsymbol{\varphi}(\mathbf{x}, \mathbf{v}) = \boldsymbol{\varphi}(\mathbf{x}, \mathbf{C}, \boldsymbol{\upsilon}) = \left[\varphi_1(\mathbf{x}, \mathbf{c}_1, v_1)\ \ \ldots\ \ \varphi_M(\mathbf{x}, \mathbf{c}_M, v_M)\ \ 1\right]^T \qquad (3)$$

i.e., $n_u = M + 1$. With discretized input data, a compact form for eq. (2) is

$$\mathbf{y}(\mathbf{X}, \mathbf{v}, \mathbf{u}) = \boldsymbol{\Gamma}(\mathbf{X}, \mathbf{v})\,\mathbf{u} \qquad (4)$$
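As an illustration (not part of the original paper), the model (1)-(4) is straightforward to express in code. The following minimal sketch, in Python with NumPy, evaluates a Gaussian RBF network with a bias term for a batch of inputs; all function and variable names are ours.

```python
import numpy as np

def rbf_design_matrix(X, C, v):
    """Basis matrix Gamma of eq. (4): one row per input point.

    X : (N, n) input points, C : (M, n) centres, v : (M,) variances.
    Returns an (N, M+1) array; the last column is the constant bias basis.
    """
    # squared Euclidean distances ||x - c_i||^2 for every (point, centre) pair
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * v[None, :]))                # Gaussian basis, eq. (1)
    return np.hstack([G, np.ones((X.shape[0], 1))])     # append bias column, eq. (3)

def rbf_output(X, C, v, u):
    """Network output y = Gamma(X, v) u, eqs. (2) and (4)."""
    return rbf_design_matrix(X, C, v) @ u

# toy usage: 2 neurons, 1 input dimension (illustrative values)
X = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
C = np.array([[-0.5], [0.5]])
v = np.array([0.3, 0.3])
u = np.array([1.0, -0.5, 0.1])                          # M linear weights plus bias
print(rbf_output(X, C, v, u))
```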

III. COMPUTATION OF TERMS

For the functional version (continuous data case), we have shown in [1] that, to implement the BP algorithm, we would need to have expressions (38-39) available. Furthermore, to implement the LM algorithm using the Jacobian form (32) we would need (44). Otherwise, if Jacobian (27) is employed, then expressions (46-47) are also required. These basic expressions are repeated here:

$$\left[\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \boldsymbol{\varphi}\boldsymbol{\varphi}^T\,d\mathbf{x}\right]^{-1} \qquad (5)$$

$$\frac{\partial}{2\,\partial\mathbf{v}^T}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \boldsymbol{\varphi}\boldsymbol{\varphi}^T\,d\mathbf{x} \qquad (6)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\boldsymbol{\varphi}}{\partial\mathbf{v}^T}\,\frac{\partial\boldsymbol{\varphi}^T}{\partial\mathbf{v}^T}\,d\mathbf{x} \qquad (7)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\boldsymbol{\varphi}^T}{\partial\mathbf{v}^T}\,\boldsymbol{\varphi}\,d\mathbf{x} \qquad (8)$$

As all other terms involve the function to approximate, they cannot be expressed analytically. In order to satisfy the paper length limit, only a brief explanation of the computation of some of the terms above can be given.
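Before deriving the closed forms, it may help to see what the first of these objects is numerically. The sketch below (our own illustration, for a 1-D input and assumed parameter values) approximates the matrix of expression (5) with a simple trapezoidal rule; the analytical expressions of Section III-A give the same matrix without any discretization.

```python
import numpy as np

def basis_vector(x, c, v):
    """phi(x) of eq. (3): the M Gaussian basis values followed by the bias term 1."""
    return np.append(np.exp(-(x - c) ** 2 / (2.0 * v)), 1.0)

def phi_outer_integral(c, v, x_min=-1.0, x_max=1.0, num=2001):
    """Trapezoidal-rule approximation of the matrix in expression (5), 1-D input."""
    xs = np.linspace(x_min, x_max, num)
    w = np.full(num, xs[1] - xs[0]); w[0] *= 0.5; w[-1] *= 0.5    # trapezoid weights
    P = np.array([np.outer(basis_vector(x, c, v), basis_vector(x, c, v)) for x in xs])
    return np.tensordot(w, P, axes=1)

c = np.array([-0.5, 0.5])          # two Gaussian neurons (illustrative values)
v = np.array([0.3, 0.4])
Phi = phi_outer_integral(c, v)
print(np.allclose(Phi, Phi.T))     # Phi is symmetric, as stated in Section III-A
```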

A. Integral of the basis functions matrix

Eq. (5) can be seen as

$$\left[\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \boldsymbol{\varphi}\boldsymbol{\varphi}^T\,d\mathbf{x}\right]^{-1} = \boldsymbol{\Phi}^{-1} \qquad (9)$$

We define $\boldsymbol{\Phi}$ (with dimensions $n_u \times n_u$) as the integral of the basis functions along the $n$-dimensional input domain. This is a symmetric matrix, with its $[i,j]$ element defined as

$$\Phi_{ij} = \int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x} \qquad (10)$$

Computation of (10) considers three different situations:

- if $i, j \neq M+1$:

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} = \left(\frac{\pi v_i v_j}{2(v_i+v_j)}\right)^{n/2} e^{-\frac{\|\mathbf{c}_i-\mathbf{c}_j\|_2^2}{2(v_i+v_j)}} \prod_{k=1}^{n}\left[\operatorname{erf}\!\left(\frac{x_{k_{MAX}}(v_i+v_j) - c_{ik}v_j - c_{jk}v_i}{\sqrt{2 v_i v_j (v_i+v_j)}}\right) - \operatorname{erf}\!\left(\frac{x_{k_{min}}(v_i+v_j) - c_{ik}v_j - c_{jk}v_i}{\sqrt{2 v_i v_j (v_i+v_j)}}\right)\right] \qquad (11)$$

In the last equation, erf(.) represents the error function.

- if $(i = M+1) \vee (j = M+1)$:

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} = \left(\frac{\pi v_i}{2}\right)^{n/2} \prod_{k=1}^{n}\left[\operatorname{erf}\!\left(\frac{x_{k_{MAX}} - c_{ik}}{\sqrt{2 v_i}}\right) - \operatorname{erf}\!\left(\frac{x_{k_{min}} - c_{ik}}{\sqrt{2 v_i}}\right)\right] \qquad (12)$$

- if $i = j = M+1$:

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} d\mathbf{x} = \left(x_{1_{MAX}} - x_{1_{min}}\right)\ldots\left(x_{k_{MAX}} - x_{k_{min}}\right)\ldots\left(x_{n_{MAX}} - x_{n_{min}}\right) \qquad (13)$$
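As a sanity check of the closed form (11), the sketch below (ours, for the 1-D case, using math.erf) compares it with a brute-force trapezoidal integration of the product of two Gaussian basis functions. The parameter values are arbitrary.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def pair_integral_closed(ci, vi, cj, vj, a, b):
    """Eq. (11) for n = 1: closed form of the integral of phi_i * phi_j over [a, b]."""
    s = vi + vj
    u = lambda x: (x * s - ci * vj - cj * vi) / sqrt(2.0 * vi * vj * s)
    return sqrt(pi * vi * vj / (2.0 * s)) * exp(-(ci - cj) ** 2 / (2.0 * s)) \
           * (erf(u(b)) - erf(u(a)))

def pair_integral_numeric(ci, vi, cj, vj, a, b, num=20001):
    """Brute-force trapezoidal integration of the same product of Gaussians."""
    xs = np.linspace(a, b, num)
    y = np.exp(-(xs - ci) ** 2 / (2 * vi) - (xs - cj) ** 2 / (2 * vj))
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs))

# the two values should agree to several decimal places
print(pair_integral_closed(-0.3, 0.25, 0.4, 0.5, -1.0, 1.0))
print(pair_integral_numeric(-0.3, 0.25, 0.4, 0.5, -1.0, 1.0))
```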

B. Derivatives of the Integral of the basis functions matrix

Denoting Equation (6) as the derivative of a matrix, i.e.,

$$\frac{\partial}{2\,\partial\mathbf{v}^T}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \boldsymbol{\varphi}\boldsymbol{\varphi}^T\,d\mathbf{x} = \frac{1}{2}\frac{\partial\boldsymbol{\Phi}}{\partial\mathbf{v}^T} = \frac{1}{2}\boldsymbol{\Phi}' \qquad (14)$$

where $\boldsymbol{\Phi}'$ is an array of dimensions $n_u \times n_u \times n_v$. Its $[i,j,k]$ element is then

$$\Phi'_{ijk} = \frac{\partial}{\partial \mathbf{v}_k}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x} \qquad (15)$$

where $\mathbf{v}_k$ denotes the $k$th nonlinear parameter. To compute (15) we will need the derivatives concerning the terms in Eqs. (11) and (12). Using the fact that

$$\frac{\partial \operatorname{erf}(f(x))}{\partial x} = f'(x)\,\frac{2}{\sqrt{\pi}}\,e^{-f^2(x)} \qquad (16)$$

the computation again considers three different situations. In the following, let

$$U_k(x) = \frac{x(v_i+v_j) - c_{ik}v_j - c_{jk}v_i}{\sqrt{2 v_i v_j (v_i+v_j)}}$$

denote the argument of the error functions in (11).

- if $i, j \neq M+1$:

$$\frac{\partial}{\partial v_i}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x} = \left(\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x}\right)\left[\frac{n\,v_j}{2 v_i (v_i+v_j)} + \frac{\|\mathbf{c}_i-\mathbf{c}_j\|_2^2}{2(v_i+v_j)^2} + \sum_{k=1}^{n}\frac{v_j^2\left[\left(x_{k_{min}}(v_i+v_j) - c_{ik}(2v_i+v_j) + c_{jk}v_i\right)e^{-U_k^2(x_{k_{min}})} - \left(x_{k_{MAX}}(v_i+v_j) - c_{ik}(2v_i+v_j) + c_{jk}v_i\right)e^{-U_k^2(x_{k_{MAX}})}\right]}{\sqrt{2\pi}\left(v_i v_j (v_i+v_j)\right)^{3/2}\left[\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)\right]}\right] \qquad (17)$$

$$\frac{\partial}{\partial c_{ik}}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x} = \left(\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x}\right)\left[-\frac{c_{ik}-c_{jk}}{v_i+v_j} + v_j\sqrt{\frac{2}{\pi v_i v_j (v_i+v_j)}}\;\frac{e^{-U_k^2(x_{k_{min}})} - e^{-U_k^2(x_{k_{MAX}})}}{\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)}\right] \qquad (18)$$

$$\frac{\partial}{\partial c_{jk}}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x} = \left(\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i\,\varphi_j\,d\mathbf{x}\right)\left[\frac{c_{ik}-c_{jk}}{v_i+v_j} + v_i\sqrt{\frac{2}{\pi v_i v_j (v_i+v_j)}}\;\frac{e^{-U_k^2(x_{k_{min}})} - e^{-U_k^2(x_{k_{MAX}})}}{\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)}\right] \qquad (19)$$

- if $(i = M+1) \vee (j = M+1)$:

$$\frac{\partial}{\partial c_{ik}}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} = \left(\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x}\right)\frac{e^{-\frac{(x_{k_{min}}-c_{ik})^2}{2v_i}} - e^{-\frac{(x_{k_{MAX}}-c_{ik})^2}{2v_i}}}{\sqrt{\frac{\pi v_i}{2}}\left[\operatorname{erf}\!\left(\frac{x_{k_{MAX}}-c_{ik}}{\sqrt{2v_i}}\right) - \operatorname{erf}\!\left(\frac{x_{k_{min}}-c_{ik}}{\sqrt{2v_i}}\right)\right]} \qquad (20)$$

$$\frac{\partial}{\partial v_i}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} = \left(\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x}\right)\left[\frac{n}{2 v_i} + \frac{1}{v_i^{3/2}\sqrt{2\pi}}\sum_{j=1}^{n}\frac{\left(x_{j_{min}}-c_{ij}\right)e^{-\frac{(x_{j_{min}}-c_{ij})^2}{2v_i}} - \left(x_{j_{MAX}}-c_{ij}\right)e^{-\frac{(x_{j_{MAX}}-c_{ij})^2}{2v_i}}}{\operatorname{erf}\!\left(\frac{x_{j_{MAX}}-c_{ij}}{\sqrt{2v_i}}\right) - \operatorname{erf}\!\left(\frac{x_{j_{min}}-c_{ij}}{\sqrt{2v_i}}\right)}\right] \qquad (21)$$

Obviously,

$$\frac{\partial}{\partial\beta}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} = 0,\quad \forall k,\ \beta \in \{v_j, c_{jk}\},\ i \neq j$$

- if $i = j = M+1$:

$$\frac{\partial}{\partial\beta}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} d\mathbf{x} = 0,\quad \forall k,\ \forall i,\ \beta \in \{v_i, c_{ik}\} \qquad (22)$$
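Derivative expressions of this kind are easy to get wrong, so a finite-difference check is a cheap safeguard. The sketch below (ours, 1-D case, arbitrary parameter values) compares the closed form of type (18), the derivative of the pairwise integral with respect to a centre, against a central difference.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def pair_integral(ci, vi, cj, vj, a, b):
    """Closed form (11) for n = 1 (see Section III-A)."""
    s = vi + vj
    u = lambda x: (x * s - ci * vj - cj * vi) / sqrt(2.0 * vi * vj * s)
    return sqrt(pi * vi * vj / (2.0 * s)) * exp(-(ci - cj) ** 2 / (2.0 * s)) \
           * (erf(u(b)) - erf(u(a)))

def d_pair_integral_dci(ci, vi, cj, vj, a, b):
    """Derivative of (11) with respect to the centre c_i: the n = 1 case of eq. (18)."""
    s = vi + vj
    u = lambda x: (x * s - ci * vj - cj * vi) / sqrt(2.0 * vi * vj * s)
    ratio = (exp(-u(a) ** 2) - exp(-u(b) ** 2)) / (erf(u(b)) - erf(u(a)))
    bracket = -(ci - cj) / s + vj * sqrt(2.0 / (pi * vi * vj * s)) * ratio
    return pair_integral(ci, vi, cj, vj, a, b) * bracket

ci, vi, cj, vj, a, b = -0.3, 0.25, 0.4, 0.5, -1.0, 1.0
h = 1e-6
fd = (pair_integral(ci + h, vi, cj, vj, a, b)
      - pair_integral(ci - h, vi, cj, vj, a, b)) / (2 * h)
print(d_pair_integral_dci(ci, vi, cj, vj, a, b), fd)   # analytic vs. finite difference
```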

C. Integral of the product of the derivatives

To solve Eq. (7) we denote

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\boldsymbol{\varphi}}{\partial\mathbf{v}^T}\,\frac{\partial\boldsymbol{\varphi}^T}{\partial\mathbf{v}^T}\,d\mathbf{x} = \boldsymbol{\Phi}'' \qquad (23)$$

Here, $\boldsymbol{\Phi}''$ is a sparse array of size $n_v \times n_u \times n_u \times n_v$, where only its $[a,i,j,b]$ elements $\Phi''_{aijb}$ are non-zero, provided that

$$\mathbf{v}_a \in \{c_{ik}, v_i\},\quad \mathbf{v}_b \in \{c_{jk}, v_j\},\quad k = 1 \ldots n \qquad (24)$$

This means that, to compute (23), we must determine

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial c_{ik}}\,\frac{\partial\varphi_j(\mathbf{x}, \mathbf{v}_j)}{\partial c_{jm}}\,d\mathbf{x},\qquad \int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial c_{ik}}\,\frac{\partial\varphi_j(\mathbf{x}, \mathbf{v}_j)}{\partial v_j}\,d\mathbf{x},\qquad \int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial v_i}\,\frac{\partial\varphi_j(\mathbf{x}, \mathbf{v}_j)}{\partial v_j}\,d\mathbf{x} \qquad (25)$$

By derivation, we know that

$$\frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial c_{ik}} = \frac{x_k - c_{ik}}{v_i}\,\varphi_i(\mathbf{x}, \mathbf{v}_i),\qquad \frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial v_i} = \sum_{j=1}^{n}\frac{(x_j - c_{ij})^2}{2 v_i^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i) \qquad (26)$$

Basically, replacing (26) into (25) provides the necessary terms to solve (23), i.e.,

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{x_k - c_{ik}}{v_i}\,\frac{x_m - c_{jm}}{v_j}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (27)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{x_k - c_{ik}}{v_i}\,\sum_{m=1}^{n}\frac{(x_m - c_{jm})^2}{2 v_j^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (28)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \sum_{k=1}^{n}\frac{(x_k - c_{ik})^2}{2 v_i^2}\,\sum_{m=1}^{n}\frac{(x_m - c_{jm})^2}{2 v_j^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (29)$$

The computation of the integrals in equations (27)-(29) relies on two approaches, respectively if $k = m$ and $k \neq m$ (due to lack of space we cannot provide the final equations).
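The neuron-level derivatives (26) are the only ingredients needed above; a direct transcription (ours, for a single multivariate Gaussian basis function), together with a finite-difference check, is shown below.

```python
import numpy as np

def gaussian_basis(x, c, v):
    """phi_i(x, c_i, v_i) of eq. (1) for one neuron; x and c are n-vectors, v a scalar."""
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * v))

def dphi_dc(x, c, v):
    """Gradient w.r.t. the centre, first expression of (26): (x - c)/v * phi."""
    return (x - c) / v * gaussian_basis(x, c, v)

def dphi_dv(x, c, v):
    """Derivative w.r.t. the variance, second expression of (26)."""
    return np.sum((x - c) ** 2) / (2.0 * v ** 2) * gaussian_basis(x, c, v)

x = np.array([0.2, -0.1])
c = np.array([-0.3, 0.4])
v = 0.5
h = 1e-6
fd_c = np.array([(gaussian_basis(x, c + h * np.eye(2)[k], v)
                  - gaussian_basis(x, c - h * np.eye(2)[k], v)) / (2 * h) for k in range(2)])
fd_v = (gaussian_basis(x, c, v + h) - gaussian_basis(x, c, v - h)) / (2 * h)
print(dphi_dc(x, c, v), fd_c)   # should agree
print(dphi_dv(x, c, v), fd_v)   # should agree
```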

D. Integral of the product of derivatives and basis functions

To solve Eq. (8), note that

$$\frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial c_{ik}}\,\varphi_j(\mathbf{x}, \mathbf{v}_j) = \frac{x_k - c_{ik}}{v_i}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j),\qquad \frac{\partial\varphi_i(\mathbf{x}, \mathbf{v}_i)}{\partial v_i}\,\varphi_j(\mathbf{x}, \mathbf{v}_j) = \sum_{k=1}^{n}\frac{(x_k - c_{ik})^2}{2 v_i^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j) \qquad (30)$$

Hence, we have to compute the following 4 integrals:

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{x_k - c_{ik}}{v_i}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (31)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \sum_{k=1}^{n}\frac{(x_k - c_{ik})^2}{2 v_i^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (32)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \frac{x_k - c_{ik}}{v_i}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} \qquad (33)$$

$$\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \sum_{k=1}^{n}\frac{(x_k - c_{ik})^2}{2 v_i^2}\,\varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x} \qquad (34)$$

The integral of (31) can be regarded as

$$\left[\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{x_k - c_{ik}}{v_i}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}-\frac{(x_k-c_{jk})^2}{2v_j}}\,dx_k\right]\sqrt{\frac{2(v_i+v_j)}{\pi v_i v_j}}\;\frac{e^{\frac{(c_{ik}-c_{jk})^2}{2(v_i+v_j)}}}{\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (35)$$

with $U_k$ as defined in Section III-B. Likewise, the integral of (32) can be seen as

$$\sum_{k=1}^{n}\left[\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{(x_k - c_{ik})^2}{2 v_i^2}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}-\frac{(x_k-c_{jk})^2}{2v_j}}\,dx_k\right]\sqrt{\frac{2(v_i+v_j)}{\pi v_i v_j}}\;\frac{e^{\frac{(c_{ik}-c_{jk})^2}{2(v_i+v_j)}}}{\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)}\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,\varphi_j(\mathbf{x}, \mathbf{v}_j)\,d\mathbf{x} \qquad (36)$$

where, denoting

$$E_k(x) = \frac{c_{jk}^2 v_i + c_{ik}^2 v_j - 2x\left(c_{jk} v_i + c_{ik} v_j\right) + (v_i+v_j)\,x^2}{2 v_i v_j},$$

the one-dimensional integral in (35) is

$$\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{x_k - c_{ik}}{v_i}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}-\frac{(x_k-c_{jk})^2}{2v_j}}\,dx_k = \frac{v_j}{v_i+v_j}\left[e^{-E_k(x_{k_{min}})} - e^{-E_k(x_{k_{MAX}})}\right] - \frac{(c_{ik}-c_{jk})\sqrt{\pi v_i v_j/2}}{(v_i+v_j)^{3/2}}\,e^{-\frac{(c_{ik}-c_{jk})^2}{2(v_i+v_j)}}\left[\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)\right] \qquad (37)$$

and the one-dimensional integral in (36) is

$$\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{(x_k - c_{ik})^2}{2 v_i^2}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}-\frac{(x_k-c_{jk})^2}{2v_j}}\,dx_k = \frac{1}{2 v_i^2}\left[A + B\right]$$

with

$$A = \frac{v_i v_j}{(v_i+v_j)^2}\left\{\left[x_{k_{min}}(v_i+v_j) - c_{ik}(2v_i+v_j) + c_{jk}v_i\right]e^{-E_k(x_{k_{min}})} - \left[x_{k_{MAX}}(v_i+v_j) - c_{ik}(2v_i+v_j) + c_{jk}v_i\right]e^{-E_k(x_{k_{MAX}})}\right\}$$

$$B = \frac{\sqrt{\pi v_j}\,v_i^{3/2}\left[\left(c_{ik}^2 - 2 c_{ik} c_{jk} + c_{jk}^2\right)v_i + v_j(v_i+v_j)\right]}{\sqrt{2}\,(v_i+v_j)^{5/2}}\,e^{-\frac{(c_{ik}-c_{jk})^2}{2(v_i+v_j)}}\left[\operatorname{erf}\!\left(U_k(x_{k_{MAX}})\right) - \operatorname{erf}\!\left(U_k(x_{k_{min}})\right)\right]$$

If we consider the $k$th dimension apart, the computation of the integral of (33) is confined to

$$\left[\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{x_k - c_{ik}}{v_i}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}}\,dx_k\right]\frac{\int_{\mathbf{x}_{min}}^{\mathbf{x}_{MAX}} \varphi_i(\mathbf{x}, \mathbf{v}_i)\,d\mathbf{x}}{\sqrt{\frac{\pi v_i}{2}}\left[\operatorname{erf}\!\left(\frac{x_{k_{MAX}}-c_{ik}}{\sqrt{2v_i}}\right) - \operatorname{erf}\!\left(\frac{x_{k_{min}}-c_{ik}}{\sqrt{2v_i}}\right)\right]} \qquad (38)$$

where

$$\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{x_k - c_{ik}}{v_i}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}}\,dx_k = e^{-\frac{(x_{k_{min}}-c_{ik})^2}{2v_i}} - e^{-\frac{(x_{k_{MAX}}-c_{ik})^2}{2v_i}}$$

Lastly, for the calculation of the integral (34), we need to compute

$$\int_{x_{k_{min}}}^{x_{k_{MAX}}} \frac{(x_k - c_{ik})^2}{2 v_i^2}\,e^{-\frac{(x_k-c_{ik})^2}{2v_i}}\,dx_k = \frac{1}{2 v_i}\left[\left(c_{ik} - x_{k_{MAX}}\right)e^{-\frac{(c_{ik}-x_{k_{MAX}})^2}{2v_i}} - \left(c_{ik} - x_{k_{min}}\right)e^{-\frac{(c_{ik}-x_{k_{min}})^2}{2v_i}} + \sqrt{\frac{\pi v_i}{2}}\left(\operatorname{erf}\!\left(\frac{x_{k_{MAX}}-c_{ik}}{\sqrt{2v_i}}\right) - \operatorname{erf}\!\left(\frac{x_{k_{min}}-c_{ik}}{\sqrt{2v_i}}\right)\right)\right] \qquad (39)$$

IV. EXAMPLES

This section illustrates the use of the functional approach in a very simple example. We shall approximate a uni-dimensional function, within the domain $x \in [-1, 1]$. The input data consists of only 6 sample points, given by Gaussian quadrature within the input domain, i.e.:

$$\mathbf{x} = \left[0.9325\ \ 0.6612\ \ 0.2386\ \ -0.2386\ \ -0.6612\ \ -0.9325\right]^T \qquad (40)$$

The set of termination criteria, for all cases, is:

$$\Psi[k-1] - \Psi[k] < \beta[k]$$
$$\left\|\mathbf{v}[k-1] - \mathbf{v}[k]\right\|_2 < \sqrt{\tau}\left(1 + \left\|\mathbf{v}[k]\right\|_2\right)$$
$$\left\|\mathbf{g}[k]\right\|_2 \leq \sqrt[3]{\tau}\left(1 + \Psi[k]\right) \qquad (41)$$

where

$$\beta[k] = \tau\left(1 + \Psi[k]\right) \qquad (42)$$
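A direct transcription of this stopping rule (ours; `psi` is the criterion value, `v` the nonlinear parameter vector and `g` the gradient) could look as follows. Requiring the three conditions jointly follows the usual convention in [3]; the paper does not spell this out, so it is an assumption.

```python
import numpy as np

def should_stop(psi_prev, psi, v_prev, v, g, tau=1e-9):
    """Termination test of eqs. (41)-(42); all three conditions are required (assumption)."""
    beta = tau * (1.0 + psi)                                                   # eq. (42)
    small_decrease = (psi_prev - psi) < beta
    small_step = np.linalg.norm(v_prev - v) < tau ** 0.5 * (1.0 + np.linalg.norm(v))
    small_gradient = np.linalg.norm(g) <= tau ** (1.0 / 3.0) * (1.0 + psi)
    return small_decrease and small_step and small_gradient
```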

and $\tau$ is a measure of the desired number of correct digits in the objective function. This is a criterion typically used in unconstrained optimization [3]. In order to be able to produce graphs, only models with 2 nonlinear parameters will be considered. Therefore, the basis function vector is

$$\boldsymbol{\varphi}(x, c, v) = \begin{bmatrix} e^{-\frac{(x-c)^2}{2v}} \\ 1 \end{bmatrix} \qquad (43)$$

The nonlinear parameters are $\mathbf{v}^T = [c\ \ v]$ and the model has 2 linear parameters. The function to be approximated will be:

$$t(x) = 0.5\,e^{-x^2} - 0.2\,e^{-\frac{(x-0.2)^2}{0.5}} \qquad (44)$$
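For this example, both criteria are simple to evaluate for a given pair of nonlinear parameters (c, v). The sketch below (ours) computes a discrete criterion on the six points (40), with the linear parameters obtained by least squares, and a quadrature approximation of the functional criterion over [-1, 1]. The exact normalisation used in [1] (sum versus mean, the factor 1/2) is not reproduced here; we simply use half the squared error, which is an assumption, so the absolute values need not match the tables that follow.

```python
import numpy as np

def target(x):
    """The function to be approximated, eq. (44)."""
    return 0.5 * np.exp(-x ** 2) - 0.2 * np.exp(-(x - 0.2) ** 2 / 0.5)

def design(x, c, v):
    """Basis vector (43) evaluated at the points in x: Gaussian column plus bias column."""
    return np.column_stack([np.exp(-(x - c) ** 2 / (2.0 * v)), np.ones_like(x)])

def discrete_criterion(c, v, x):
    """Half the sum of squared errors on the data, with u obtained by least squares."""
    G = design(x, c, v)
    u, *_ = np.linalg.lstsq(G, target(x), rcond=None)
    e = target(x) - G @ u
    return 0.5 * float(e @ e), u

def functional_criterion(c, v, num=4001):
    """Quadrature approximation of half the integral of the squared error over [-1, 1]."""
    xs = np.linspace(-1.0, 1.0, num)
    w = np.full(num, xs[1] - xs[0]); w[0] *= 0.5; w[-1] *= 0.5     # trapezoid weights
    G, t = design(xs, c, v), target(xs)
    A = G.T @ (w[:, None] * G)          # approximates the integral of phi phi^T
    b = G.T @ (w * t)                   # approximates the integral of phi t
    u = np.linalg.solve(A, b)
    e = t - G @ u
    return 0.5 * float(w @ e ** 2), u

x_data = np.array([0.9325, 0.6612, 0.2386, -0.2386, -0.6612, -0.9325])   # eq. (40)
print(discrete_criterion(-0.19, 0.31, x_data)[0])
print(functional_criterion(-0.20, 0.30)[0])
```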

i.e., a RBF network with 2 neurons will be approximated by a RBF network with 1 neuron. The global analytical solution, i.e., using the functional approach and the true function (44), obtaining $\hat{\Psi}_a = 4.7\times10^{-5}$ (the $a$ subscript denotes the analytical value), is

$$\hat{\mathbf{p}}_a = \begin{bmatrix}\hat{\mathbf{u}} \\ \hat{\mathbf{v}}\end{bmatrix} = \left[0.219\ \ 0.116\ \ |\ -0.198\ \ 0.314\right]^T \qquad (45)$$

The optimum for the discrete approach and data (40) is

$$\hat{\mathbf{p}}_d = \begin{bmatrix}\hat{\mathbf{u}} \\ \hat{\mathbf{v}}\end{bmatrix} = \left[0.223\ \ 0.113\ \ |\ -0.187\ \ 0.309\right]^T \qquad (46)$$

These values will produce $\hat{\Psi}_d = 1.58\times10^{-4}$ and $\Psi_a(\hat{\mathbf{p}}_d) = 5.31\times10^{-5}$. With the functional approach and data (40),

$$\hat{\mathbf{p}}_f = \begin{bmatrix}\hat{\mathbf{u}} \\ \hat{\mathbf{v}}\end{bmatrix} = \left[0.215\ \ 0.121\ \ |\ -0.200\ \ 0.299\right]^T \qquad (47)$$

and $\Psi_a(\hat{\mathbf{p}}_f) = 4.76\times10^{-5}$.

Although the functional approach uses the same data as the discrete version (6 input points), it produces, at the optimum, a better approximation to the function underlying the data. For the back-propagation algorithm, the learning rate is set to one, $\eta = 1$. In both algorithms, the maximum number of iterations is fixed to 4000.

Notice that, in the case of the Levenberg-Marquardt algorithm, only the performance of the Golub-Pereyra version of the Jacobian matrix is illustrated. A summary of the evaluation criteria for 4 different initial starting points is presented in the tables below. There, Ψf denotes the value of the criterion using the data in (40) for the computation of the integral. In addition, the evolution of the training is presented in figures 1-4, superimposed on the surface of criterion Ψa.

TABLE 1: TRAININGS WITH BP (DISCRETE VERSION)
v[1]        N      v[N]              Ψd[N]       Ψa[N]       Ψf[N]
[-1 0.8]    1248   [-0.186 0.316]    1.58e-004   5.3e-005    5.0e-005
[0.4 0.4]   349    [0.863 0.036]     5.1e-003    1.65e-003   1.52e-003
[0.3 0.4]   260    [-0.187 0.309]    1.58e-004   5.3e-005    5.05e-005
[0.5 0.8]   1386   [0.863 0.036]     5.1e-003    1.65e-003   1.52e-003

TABLE 2: TRAININGS WITH BP (FUNCTIONAL VERSION)
v[1]        N      v[N]              Ψd[N]       Ψa[N]       Ψf[N]
[-1 0.8]    3597   [-0.197 0.328]    1.84e-004   4.74e-005   4.60e-005
[0.4 0.4]   442    [0.923 0.107]     5.8e-003    1.31e-003   1.32e-003
[0.3 0.4]   549    [-0.197 0.328]    1.84e-004   4.74e-005   4.60e-005
[0.5 0.8]   2212   [1.00 0.133]      5.8e-003    1.31e-003   1.32e-003

TABLE 3: TRAININGS WITH LM USING GP JACOBIAN (DISCRETE VERSION)
v[1]        N      v[N]              Ψd[N]       Ψa[N]       Ψf[N]
[-1 0.8]    17     [-0.187 0.309]    1.58e-004   5.30e-005   5.02e-005
[0.4 0.4]   64     [0.807 0.0056]    5.1e-003    1.19e-002   1.52e-003
[0.3 0.4]   15     [-0.187 0.309]    1.58e-004   5.30e-005   5.02e-005
[0.5 0.8]   49     [0.807 0.0054]    5.1e-003    1.32e-002   1.52e-003

TABLE 4: TRAININGS WITH LM USING GP JACOBIAN (FUNCTIONAL VERSION)
v[1]        N      v[N]              Ψd[N]       Ψa[N]       Ψf[N]
[-1 0.8]    17     [-0.200 0.299]    1.84e-004   4.76e-005   4.44e-005
[0.4 0.4]   22     [0.950 0.116]     5.8e-003    1.31e-003   1.32e-003
[0.3 0.4]   17     [-0.200 0.299]    1.84e-004   4.76e-005   4.44e-005
[0.5 0.8]   28     [0.950 0.116]     5.8e-003    1.31e-003   1.32e-003

Regardless of the version employed (whether functional or discrete), it is obvious that the LM algorithm is more efficient than the BP algorithm. The discrete versions consistently achieve a smaller value of Ψd[N], while the functional versions obtain a smaller value of Ψf[N], as expected. Essentially, in both algorithms and for all the starting points, the functional version consistently achieves a small value of Ψa[N], i.e., the error obtained when considering the true function, and not the data.

Fig. 1. 4 different trainings with BP (discrete version). [Surface of criterion Ψa over the (c, v) plane, with the training trajectories superimposed.]

Fig. 2. 4 different trainings with BP (functional version).

Fig. 3. 4 different trainings with LM - GP (discrete version).

Fig. 4. 4 different trainings with LM - GP (functional version).

To investigate how well the proposed algorithms deal with noise, four different Signal to Noise Ratios (SNR) were considered (noise added to the target values). For each one of them, a total of 50 runs were executed. We determine the mean and the variance of the error between Ψ̂a and Ψa, for each SNR, using either the optimal parameters obtained with the functional LM or the optimal parameters obtained when the standard (discrete) approach is employed.

TABLE 5: TRAININGS WITH LM USING GOLUB-PEREYRA JACOBIAN (DISCRETE VERSION). THE STARTING POINT IS [-1, 0.8] AND THE INPUT DATA SIZE IS 6.
SNR (dB)   N      mean(Ψa(p̂d) − Ψ̂a)   var(Ψa(p̂d) − Ψ̂a)
74         17     6.04e-6              8.89e-15
54         17     6.14e-6              8.18e-13
34         16.9   2.0e-5               1.55e-10
31         17     4.13e-5              7.19e-10

TABLE 6: TRAININGS WITH LM USING GOLUB-PEREYRA JACOBIAN (FUNCTIONAL VERSION). THE STARTING POINT IS [-1, 0.8] AND THE INPUT DATA SIZE IS 6.
SNR (dB)   N      mean(Ψa(p̂f) − Ψ̂a)   var(Ψa(p̂f) − Ψ̂a)
74         18     5.23e-7              8.84e-16
54         18     7.27e-7              1.79e-13
34         18.8   1.93e-5              2.39e-10
31         19     3.98e-5              5.85e-10

The results in tables 5-6 clearly show that, for all SNRs considered, smaller means and variances are obtained using the functional version. Therefore, the finding that the functional version gives a better approximation to the underlying function, using the same data, applies also to noisy data.

V. CONCLUSIONS

This paper showed how to apply the new concept of the minimization of the integral of the analytical error, in the context of function approximation, to the Radial Basis Function network. The work presented the necessary mathematical background on how to apply this functional training with the most common gradient-based methods. In order to evaluate the performance of the algorithms, a uni-dimensional function identification problem was used. The results show that the functional version obtains a better approximation to the function underlying the data, both for the deterministic case and for a noisy case.

In the accompanying paper [1], we have claimed that this new approach brings considerable computational savings to the training. For lack of space, this could not be shown here. Until now, we have considered models with only one input. To be useful, this approach must be extended to multi-input problems. This implies that an efficient way to numerically compute equations (40) and (41) in [1] must be found. Subsequently, this new concept must be applied to other linear-nonlinear models, such as MLPs, Wavelet networks, B-splines, Mamdani, and Takagi-Sugeno fuzzy models.

VI. REFERENCES

[1] Ruano, A. E., Cabrita, C. L., and Ferreira, P. M., "Towards a More Analytical Training of Neural Networks and Neuro-Fuzzy Systems", IEEE Int. Symposium on Intelligent Signal Processing, Malta, 19-21 Sept. 2011, in press.
[2] Broomhead, D. S. and Lowe, D., "Multivariable Functional Interpolation and Adaptive Networks", Complex Systems, vol. 2, pp. 321-355, 1988.
[3] Gill, P. E., Murray, W., and Wright, M. H., Practical Optimization, Academic Press, Inc., 1981.