International Journal of Computers and Applications, Vol. 29, No. 3, 2007

ANSER: ADAPTIVE NEURON ARTIFICIAL NEURAL NETWORK SYSTEM FOR ESTIMATING RAINFALL

M. Zhang,∗ S. Xu,∗∗ and J. Fulcher∗∗∗

Abstract

We propose a new neural network model, the Neuron-Adaptive artificial neural Network (NAN). A learning algorithm is derived to tune both the free parameters of the neuron activation function and the connection weights between neurons. We proceed to prove that a NAN can approximate any piecewise continuous function to any desired accuracy, and then relate the approximation properties of NAN models to some special mathematical functions. A Neuron-Adaptive artificial Neural network System for Estimating Rainfall (ANSER), which uses the NAN as its basic reasoning network, is described. Empirical results show that the NAN model performs about 1.8% better than artificial neural network groups, and around 16.4% better than classical artificial neural networks, on a rainfall-estimation experimental database. The empirical results also show that, using the NAN model, ANSER plus can (1) automatically compute rainfall amounts ten times faster, and (2) reduce the average error of rainfall estimates for the total precipitation event to less than 10%.

Key Words

Adaptive neuron, NANN system, estimating, rainfall

∗ Department of Physics, Computer Science and Engineering, Christopher Newport University, Newport News, VA 23606, USA; e-mail: [email protected]
∗∗ School of Computing, University of Tasmania, Launceston, Tasmania 7250, Australia; e-mail: [email protected]
∗∗∗ School of Information Technology and Computer Science, University of Wollongong, Wollongong, NSW 2522, Australia; e-mail: [email protected]
Recommended by Dr. Vaclav Sebesta (paper no. 202-1585)

1. Introduction

The universal approximation ability of Feedforward Neural Networks (FNNs, i.e. MLP/BP) has been a topic of study for some time now [1–6]. Leshno et al. [4], for example, proved that FNNs are able to approximate any continuous function, to any degree of accuracy, provided the network activation function is nonpolynomial. There are two outstanding issues, however [7]:

• The neuron activation functions commonly employed in the aforementioned studies are invariably the sigmoid or some other (fixed) function with no free parameters (for instance, it is not possible to adapt Ψ(x) = sin(x) + e^(−x) + 1/(1 + e^(−x)) to different approximation problems).
• Certain real-world applications are characterized by nonlinear, discontinuous, rather than continuous, describing functions. In such cases, piecewise continuous data simulations are more appropriate.

Some researchers have turned their attention to the setting of free parameters in the neuron activation function [8–12]. For example, Chen and Chen [11] adjusted both the gain and the slope of a generalized sigmoid activation function during learning. The resulting network appeared to provide superior curve-fitting ability compared with classical, fixed-activation-function FNNs; in other words, use of an adaptive sigmoid led to improved data modelling. More recently, Philip and Joseph [13] have shown that robust adaptive neurons with fewer parameters are more efficient than standard FNNs. In this paper we develop the Neuron-Adaptive Network (NAN) model, which incorporates adjustable parameters and is capable of approximating any continuous or (nonlinear) piecewise continuous function, including one with a series of discontinuous, nonsmooth points, to any degree of accuracy. Moreover, our method allows these points to be determined automatically by way of an easy-to-use learning algorithm. We illustrate the effectiveness of our new NAN model on real-world heavy rainfall estimation derived from satellite images.

2. Neuron-Adaptive Artificial Neural Network Model

Preliminary function approximation and data simulation experiments indicated that the proposed NAN model offers several advantages over traditional fixed-neuron networks: (1) greatly reduced network size, (2) faster convergence, and (3) lower simulation error.
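The contrast drawn above between a fixed activation function and the adaptive form can be made concrete with a small sketch (Python; this is our own illustration, not code from the paper):

```python
import math

def psi_fixed(x):
    # The fixed activation cited in the Introduction: no free parameters,
    # so it cannot be adapted to different approximation problems.
    return math.sin(x) + math.exp(-x) + 1.0 / (1.0 + math.exp(-x))

def psi_adaptive(x, A1, B1, A2, B2, A3, B3):
    # The same functional form, but with every coefficient trainable (eq. (1)).
    return A1 * math.sin(B1 * x) + A2 * math.exp(-B2 * x) + A3 / (1.0 + math.exp(-B3 * x))

# With A1 = B1 = A2 = B2 = 0 and A3 = B3 = 1, the adaptive form collapses
# to the standard sigmoid, recovering a classical FNN neuron.
assert psi_adaptive(0.0, 0, 1, 0, 1, 1, 1) == 0.5
```

The free parameters A1–B3 are exactly what the learning algorithm of Section 2.2.1 tunes alongside the connection weights.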

2.1 NAN Model

The NAN network structure is identical to that of a multilayer FNN; in other words, it consists of an n-unit input layer, an m-unit output layer, and at least one hidden layer comprising an intermediate number of processing units (neurons or nodes). There is no activation function in the input layer, the output neuron is a summing unit (linear activation), and the hidden units employ Neuron-adaptive Activation Functions (NAFs), defined as follows:

Ψ(x) = A1 · sin(B1 · x) + A2 · e^(−B2·x) + A3 / (1 + e^(−B3·x))    (1)

where A1, B1, A2, B2, A3, B3 are free parameters that can be adjusted (along with the weights) during training.

2.1.1 Remarks on NANs

If A1 = B1 = A2 = B2 = 0 and A3 = B3 = 1 in equation (1), then the NAN model becomes a standard FNN with sigmoid activation function. Conversely, if A3 and B3 are nonzero real variables, the model becomes a feedforward neural network with function shape auto-tuning, as described in [14]. More generally, when the As and Bs are real variables, the resulting NAN (referred to as NAN-R) is capable of approximating any continuous function (Section 2.2). On the other hand, when they are piecewise continuous functions, the resulting NAN-P is able to approximate any piecewise continuous function (Section 2.3).

2.2 Universal Approximation Capability of NAN-Rs to any Continuous Function

Theorem 1. A NAN-R with a NAF described by (1) can approximate any continuous function to any degree of accuracy.

Whereas this theorem achieves the same result as Leshno et al. [4], it should be noted that the free parameters in our NAF are tuned (Section 2.2.1); Chen and Chen, by contrast, use a fixed neuron activation function [15]. As a result, our approach offers several advantages, including increased flexibility, greatly reduced network size, faster learning, and lower simulation error (see Section 3.4).

2.2.1 Learning Algorithm

The input–output relation of the ith neuron in the kth layer of the NAN-R can be described by:

I_{i,k} = Σ_j [w_{i,j,k} · O_{j,k−1}] − θ_{i,k}    (2)

where j runs over the neurons in layer k − 1, and:

O_{i,k} = Ψ(I_{i,k}) = A1_{i,k} · sin(B1_{i,k} · I_{i,k}) + A2_{i,k} · e^(−B2_{i,k}·I_{i,k}) + A3_{i,k} / (1 + e^(−B3_{i,k}·I_{i,k}))    (3)

In order to train the NAN-R, the following energy function:

E = (1/2) Σ_{j=1}^{m} (d_j − O_{j,l})²    (4)

is adopted, namely the sum of the squared errors between the actual and desired network outputs, over all input patterns. The aim of learning is to minimize this energy function by adjusting the weights and the variables in the activation function. This can be achieved using the steepest-descent gradient rule, expressed as follows:

w_{i,j,k}^(r) = η · w_{i,j,k}^(r−1) + β · ∂E/∂w_{i,j,k}    (5)

θ_{i,k}^(r) = η · θ_{i,k}^(r−1) + β · ∂E/∂θ_{i,k}    (6)

A1_{i,k}^(r) = η · A1_{i,k}^(r−1) + β · ∂E/∂A1_{i,k}    (7)

and similarly for A2_{i,k}, A3_{i,k}, and B1_{i,k} through B3_{i,k}. In order to derive the gradient of E with respect to each adjustable parameter in equations (5)–(7), we define:

∂E/∂I_{i,k} = ζ_{i,k}    (8)

∂E/∂O_{i,k} = ξ_{i,k}    (9)

Now, from (2), (3), (8), and (9), the partial derivatives of E with respect to the adjustable parameters are:

∂E/∂w_{i,j,k} = (∂E/∂I_{i,k}) · (∂I_{i,k}/∂w_{i,j,k}) = ζ_{i,k} · O_{j,k−1}    (10)

∂E/∂θ_{i,k} = −ζ_{i,k}    (11)

∂E/∂A1_{i,k} = ξ_{i,k} · sin(B1_{i,k} · I_{i,k})    (12)

∂E/∂B1_{i,k} = ξ_{i,k} · A1_{i,k} · I_{i,k} · cos(B1_{i,k} · I_{i,k})    (13)

∂E/∂A2_{i,k} = ξ_{i,k} · e^(−B2_{i,k}·I_{i,k})    (14)

∂E/∂B2_{i,k} = −ξ_{i,k} · A2_{i,k} · I_{i,k} · e^(−B2_{i,k}·I_{i,k})    (15)

∂E/∂A3_{i,k} = ξ_{i,k} · 1/(1 + e^(−B3_{i,k}·I_{i,k}))    (16)

∂E/∂B3_{i,k} = ξ_{i,k} · A3_{i,k} · I_{i,k} · e^(−B3_{i,k}·I_{i,k}) / (1 + e^(−B3_{i,k}·I_{i,k}))²    (17)

For (8) and (9), the following equations can be computed:

ζ_{i,k} = ∂E/∂I_{i,k} = (∂E/∂O_{i,k}) · (∂O_{i,k}/∂I_{i,k}) = ξ_{i,k} · ∂O_{i,k}/∂I_{i,k}    (18)
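The parameter derivatives above can be checked mechanically. The following sketch (Python, our own illustration) implements the Ψ-partials that appear in equations (12)–(17) (without the error-signal factor ξ, which simply multiplies each of them) and verifies them against central finite differences:

```python
import math

def naf(x, A1, B1, A2, B2, A3, B3):
    # Neuron-adaptive activation function, eq. (1)
    return A1 * math.sin(B1 * x) + A2 * math.exp(-B2 * x) + A3 / (1.0 + math.exp(-B3 * x))

def naf_grads(x, A1, B1, A2, B2, A3, B3):
    # Analytic partials of psi w.r.t. each free parameter; eqs (12)-(17)
    # carry an additional factor xi that is omitted here.
    e2 = math.exp(-B2 * x)
    e3 = math.exp(-B3 * x)
    s3 = 1.0 / (1.0 + e3)
    return {
        "A1": math.sin(B1 * x),            # cf. (12)
        "B1": A1 * x * math.cos(B1 * x),   # cf. (13)
        "A2": e2,                          # cf. (14)
        "B2": -A2 * x * e2,                # cf. (15)
        "A3": s3,                          # cf. (16)
        "B3": A3 * x * e3 * s3 * s3,       # cf. (17)
    }

# Finite-difference check with arbitrary illustrative parameter values.
params = dict(A1=0.7, B1=1.3, A2=0.4, B2=0.9, A3=1.1, B3=0.8)
x, h = 0.5, 1e-6
g = naf_grads(x, **params)
for name in params:
    p_hi = dict(params); p_hi[name] += h
    p_lo = dict(params); p_lo[name] -= h
    numeric = (naf(x, **p_hi) - naf(x, **p_lo)) / (2 * h)
    assert abs(numeric - g[name]) < 1e-5, name
```

Each analytic partial agrees with its numerical estimate, which is the consistency the steepest-descent updates (5)–(7) rely on.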

while:

∂O_{i,k}/∂I_{i,k} = A1_{i,k} · B1_{i,k} · cos(B1_{i,k} · I_{i,k}) − A2_{i,k} · B2_{i,k} · e^(−B2_{i,k}·I_{i,k}) + A3_{i,k} · B3_{i,k} · e^(−B3_{i,k}·I_{i,k}) / (1 + e^(−B3_{i,k}·I_{i,k}))²    (19)

and:

ξ_{i,k} = Σ_j ζ_{j,k+1} · w_{j,i,k+1},  if 1 ≤ k < l    (20)

ξ_{i,l} = O_{i,l} − d_i,  if k = l    (21)

The above procedure can be summarized as follows:
00: Determine the number of hidden units for the NAN-R.
10: Initialize all weights w_{i,j,k}, threshold values θ_{i,k}, and parameters A1, B1, A2, B2, A3, B3.
20: Input a learning example from the training data set and calculate the actual outputs of all neurons using the present parameter values, according to equations (2) and (3). If the desired output is reached, stop; otherwise go to 30.
30: Evaluate ζ_{i,k} and ξ_{i,k} from equations (18)–(21), and then the gradient values from equations (10)–(17).
40: Adjust the parameters according to the iterative formulae (5)–(7).
50: Input another learning pattern, then go to step 20.
All training examples are presented cyclically until all parameters stabilize, in other words, until the energy function E for the entire training set is acceptably low and the network has converged.

2.3 Universal Approximation Capability of NAN-Ps to any Piecewise Continuous Function

2.3.1 Approximation Theorem

Because a mapping f : R^n → R^m (where n and m are positive integers) can be computed by m mappings f_j : R^n → R (j = 1, 2, . . . , m), it is theoretically sufficient to focus on networks with a single output unit only. The weights vector and the threshold value associated with the jth hidden-layer processing unit are denoted by W_j and Θ_j, respectively. The weight associated with the single output unit is denoted by β_j, and the input vector by X. With this notation, the function that a NAN-P computes is:

f(X) = Σ_{j=1}^{k} β_j · Ψ(W_j · X − Θ_j)

where k is the number of hidden-layer processing units, and Ψ is defined in (1).

Theorem 2. A NAN-P with a NAF described by (1) can approximate any piecewise continuous function with infinite (countable) continuous parts, to any degree of accuracy.

2.3.2 Use of NAN-Ps to Approximate Some Special Functions

Some special functions (such as the Gamma and Beta functions) are of significance in many engineering and physics problems, and are defined either by integration or by series expansion. Such functions exist because the solutions of the differential equations that describe some practical problems cannot be expressed in terms of elementary functions. As these special functions are essentially piecewise continuous (see below), in this section we proceed to approximate them using NAN-Ps.

2.3.2.1 Approximation of the Gamma Function by NAN-P

The Gamma function (Fig. 1) is defined as:

Γ(x) = ∫_0^∞ e^(−t) t^(x−1) dt    (22)

Figure 1. Gamma-function.

where x > 0 and x ∈ R. Γ(x) cannot be explicitly expressed by elementary functions, because:

Γ(x + 1) = ∫_0^∞ e^(−t) t^x dt = [−e^(−t) t^x]_0^∞ + x ∫_0^∞ e^(−t) t^(x−1) dt = x · Γ(x)    (23)

When x > 1, this can be written as:

Γ(x) = (x − 1)Γ(x − 1) = (x − 1)(x − 2)Γ(x − 2) = · · · = (x − 1)(x − 2) · · · (x − n)Γ(x − n)    (24)

When x < 0 and x is not an integer:

Γ(x) = Γ(x + 1)/x    (−1 < x < 0);
Γ(x) = Γ(x + 2)/(x(x + 1))    (−2 < x < −1);
Γ(x) = Γ(x + n)/(x(x + 1) · · · (x + n − 1))    (−n < x < −n + 1)    (25)

(Γ(x) is not defined when x = 0 or x is a negative integer.) We can see from Fig. 1 that the Gamma function is piecewise continuous, with a countable number of continuous parts. According to Theorem 2, this function can therefore be approximated to any degree of accuracy using a single NAN-P.

2.3.2.2 Approximation of the Beta Function by NAN-P

The Beta function (with real variables) is defined as:

B(x, y) = ∫_0^1 t^(x−1) (1 − t)^(y−1) dt    (26)

where x and y are real variables. When y is greater than zero we have:

B(x, y) = (1/y) Σ_{k=0}^{∞} (−1)^k · [y(y − 1) · · · (y − k)] / [k!(x + k)]    (27)

If x, y ≠ 0, −1, −2, . . . , we have:

B(x, y) = ((x + y)/(xy)) · Π_{k=1}^{∞} [k(x + y + k)] / [(x + k)(y + k)]    (28)

Now, the relationship between the Beta and Gamma functions is:

B(x, y) = Γ(x)Γ(y)/Γ(x + y)    (29)

Thus, according to these properties, we can see that the Beta function is also a piecewise continuous function (of two real variables), and according to Theorem 2 it can be approximated to any desired accuracy using a single NAN-P.

3. Application of NANs to Heavy Rainfall Estimation

Global weather patterns have received a great deal of media attention during the last decade or so, ranging from the impact of greenhouse gas emissions on climate change to the havoc wreaked by natural disasters such as hurricanes, earthquakes, and tsunamis. In relation to the latter, the lack of an early warning system in the Indian Ocean contributed to the massive loss of life and property accompanying the Asian tsunami that struck just prior to Christmas 2004.

Several countries, realizing the importance of accurate, timely weather warnings, have accordingly invested significant resources in government-sponsored agencies (such as the U.S. Department of Commerce's National Oceanic and Atmospheric Administration, NOAA). A typical example is the National Environmental Satellite, Data and Information Service (NESDIS) Interactive Flash Flood Analyzer (IFFA) system, housed within NOAA. The IFFA computes precipitation estimates and three-hour precipitation outlooks for convective systems and for extratropical and tropical cyclones, and transmits this information to both the National Weather Service Forecast Office and the River Forecast Centre. The system is limited to computing rainfall estimates for a single convective system at a time, however, because of the considerable time needed for image processing, interpretation, and the computation involved in rainfall estimation, even on state-of-the-art computers. In practice, it is more usual to witness several storms occurring simultaneously, and agencies such as NOAA would therefore prefer to have rainfall estimates available for the entire country. This has been a primary motivating factor behind the development of ANSER, an Artificial neural Network expert system for Satellite-derived Estimation of Rainfall, which has been under development since the early 1990s [16].

Conventional (grid-based) global weather models are necessarily complex, given the nature of the fluid dynamic processes commonly occurring in the atmosphere (processes, it must be emphasized, that are not thoroughly understood) [17–21]. Since the mid-1980s, however, we have developed a better understanding of how geosynchronous and polar-orbiting satellite data can better inform rainfall estimation; a complete review of rainfall schemes that use visible, infrared, or microwave satellite data can be found in [22]. The differential equations used to model such a complex system are inexact, owing to incomplete boundary conditions, various (necessary) simplifying model assumptions, and numerical instabilities. It should also be pointed out that heavy rainfall distributions are not always continuous: summer showers are a good example in which the heavy rainfall distribution is invariably discontinuous and nonsmooth. Not surprisingly, solving such complex equations is computationally expensive. Another motivation for developing ANSER was therefore to realize a weather forecasting system that could handle discontinuous, nonsmooth data input, something that until now has not been available. Most present-day systems yield less than satisfactory results (see, e.g., [23]). Typically, deriving rainfall estimates, even on state-of-the-art supercomputers, can take 30 minutes or more, and the error rates that accompany such estimates can be as high as 30%. Accordingly, a further motivation in developing ANSER was to reduce both computation time and prediction error by taking an alternative, more modern (soft computing) approach to a real-world automatic heavy rainfall estimation system.

Preliminary results showed that although the performance of classic FNNs (MLP/BP) was roughly twice as good as the existing (conventional, model-based) rainfall estimation method, the resulting estimates still exhibited around 17% error on average [24]. We concluded that simple nonlinear, continuous FNNs are incapable of heavy rainfall estimation. Instead, we turned our attention to ANN groups, which were subsequently found to offer much better performance, around 4% error [24]. In other words, ANN groups [25, 26] are capable of simulating piecewise, nonlinear, continuous functions with discontinuous and nonsmooth points. Accordingly, ANN groups were used as the knowledge base and reasoning network in the ANSER system for heavy rainfall estimation. The focus of the present paper is neither ANN groups nor PHONNs [27, 28], but rather Neuron-adaptive Artificial neural Networks (NANs), which have been under development since 1998 [7] and which offer performance marginally better than ANN groups and comparable with PHONNs. Furthermore, an updated ANSER system [24, 27] uses NANs to determine cloud features and nonlinear functions for subsequent use in heavy rainfall estimation. A brief description of the ANSER plus system architecture and operation follows.

Figure 2. ANSER plus SERVER GUI.

3.1 ANSER plus Architecture

ANSER plus comprises a (SUN) SERVER and (PC) USER systems, with the latter communicating with the former via an Ethernet LAN. Within each USER system reside the following subsystems: TRAINING, WEIGHT BASE, and ESTIMATE. The SERVER system uses several TRAINING subsystems (which incorporate Training and Output Result functions) for training weights. Inputs to the SERVER system are derived from satellite data using a resident expert system. The ANSER plus Graphical User Interface (GUI) is shown in Fig. 2.

3.2 ANSER plus Operation

Using satellite data as input, the ANSER plus SERVER system first extracts the following features using pattern recognition and image-processing techniques [14, 20]: cloud top temperature (CT), cloud growth factor (CG), rainburst factor (RB), overshooting top factor (OS), cloud merger factor (M), saturated environment factor (SE), storm speed (S), and moisture correction (MC). An ANN pattern recognition technique [16] is used to determine the cloud merger factor (M). A NAN is used to obtain the rainfall estimate factor G (= f{cloud top temperature + cloud growth}), the inputs to this NAN being cloud top temperature and cloud growth, as advised by an NOAA "expert" [27]. For example, when the cloud top temperature is between −58°C and −60°C and the cloud growth is more than 2/3 latitude, the half-hour rainfall estimate is 0.94 inch [29].

3.3 NAN Reasoning Network

The seven inputs (G, RB, OS, M, SE, S, MC) are fed into a NAN reasoning network (the second NAN used in our system), the output of which is (real-time) half-hourly rainfall estimates [24, 27]. Several NAN reasoning networks are available within the ANSER plus (SUN) SERVER, as appropriate for different operating conditions. The basic three-layer NAN architecture comprises 7 input neurons, 30 hidden neurons, and a single output neuron (the optimum number of hidden units was found by trial and error; any more than 30 yielded no performance improvement but required longer training times).
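The forward pass of such a 7-30-1 reasoning network follows equations (2) and (3) directly. The sketch below (Python, our own illustration) shows the structure only; the weights and NAF parameters are randomly initialized here, whereas in ANSER plus they are learned on the SERVER and downloaded to the USER PCs:

```python
import math
import random

def naf(x, A1, B1, A2, B2, A3, B3):
    # Neuron-adaptive activation function, eq. (1)
    return A1 * math.sin(B1 * x) + A2 * math.exp(-B2 * x) + A3 / (1.0 + math.exp(-B3 * x))

class NANReasoner:
    """Structural sketch of the 7-30-1 NAN reasoning network (eqs (2)-(3)).
    All numeric values are illustrative placeholders, not trained weights."""

    def __init__(self, n_in=7, n_hidden=30, seed=0):
        rng = random.Random(seed)
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                    for _ in range(n_hidden)]
        self.theta_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        # One free-parameter set per hidden unit, started near the sigmoid case.
        self.naf_params = [dict(A1=0.1, B1=1.0, A2=0.1, B2=1.0, A3=1.0, B3=1.0)
                           for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, features):
        # features: [G, RB, OS, M, SE, S, MC]
        hidden = []
        for w, theta, p in zip(self.w_h, self.theta_h, self.naf_params):
            I = sum(wi * xi for wi, xi in zip(w, features)) - theta  # eq. (2)
            hidden.append(naf(I, **p))                               # eq. (3)
        return sum(wo * o for wo, o in zip(self.w_o, hidden))        # linear output

net = NANReasoner()
estimate = net.forward([0.94, 0.2, 0.1, 0.0, 0.3, 0.5, 0.1])  # made-up inputs
```

The output neuron is a plain summing unit, matching the linear-activation output layer described in Section 2.1.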

Figure 3 data (typical ANSER plus USER output):

Time: 06Z    Date: 24th May 2000

  Location (LAT, LONG)    ANSER Rainfall Estimate
  34.144, 84.349          7.10 mm
  37.148, 88.696          27.69 mm
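The percentage-error measure used for the model comparisons in Section 3.4 can be sketched as follows (Python; variable names are ours, and the observation values are invented purely for illustration):

```python
def abs_percent_error(model_output, observation):
    # |error| = |model output - observation| / observation * 100%  (cf. eq. (37))
    return abs(model_output - observation) / observation * 100.0

# Average over all n cases, as in Section 3.4.
estimates = [7.10, 27.69]        # mm, the Fig. 3 estimates
observations = [7.80, 26.00]     # mm, hypothetical ground truth
errors = [abs_percent_error(m, o) for m, o in zip(estimates, observations)]
average_error = sum(errors) / len(errors)
```

Averaging this quantity over every case in the test set gives the single figure (e.g. 5.32% for the NAN) reported in the performance comparison.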

When input data become available, they are simultaneously fed into the NAN reasoning network, which has been previously trained using the aforementioned seven inputs together with the desired output. After training, the reasoning rules, models, and knowledge are contained within the network weights and activation functions (the input–output relationships, it must be emphasized, are difficult to describe by way of mathematical formulae). Because of the massively parallel nature of the neural network, we are able to access all the rules and models simultaneously when deriving rainfall estimates. This results in fast (typically only several seconds) and accurate estimation, especially when the rules, models, knowledge, and factors are complex, nonlinear, and discontinuous. The ANSER plus USER system was developed in order to provide multi-user capability. Because training times on the USER PCs would be unacceptably long, the network weights are downloaded via the Ethernet from the SERVER training subsystem. As a result, once the input data (CT, CG, RB, OS, M, SE, S, and MC) have been provided, the ANSER plus USER system is capable of producing rainfall estimates in real time. Fig. 3 shows a typical output from the ANSER plus USER system.

Figure 3. Typical ANSER plus rainfall estimate.

3.4 Performance Comparison

Standard practice with comparative experiments is to ensure the same environmental conditions throughout. Accordingly, the same groups of testing data and the same parameters (learning rate, number of hidden layers, and so forth) were used for all three models. In each case, we use the following error measure:

|error| = |(output of model − observation)/observation| × 100%    (37)

and average over all n cases. The average errors of the rainfall data estimation results (G) obtained using the ANN, NNG, and NAN models were 6.32%, 5.42%, and 5.32%, respectively. In other words, both NANs and NNGs perform about 16% better than classic (MLP/BP) ANNs. These rainfall estimates were in turn used by the reasoning network within ANSER plus. Although the NAN performance is only marginally better than that of NNGs, it should be pointed out that determining the discontinuous points to use in an NNG is nontrivial. By contrast, NANs can not only simulate piecewise nonlinear continuous functions with discontinuous and nonsmooth points, but are also able to find these points automatically by means of an easy-to-use learning algorithm.

Conventional rainfall estimation yields an average error of almost one-third [30]. An ANN reasoning network [16], by comparison, reduced this error by roughly half. Resorting to neural network groups [24] and/or NANs (as in the present study) saw this error drop by almost an order of magnitude, as indicated in Fig. 4 (using test data throughout). A polynomial higher-order neural network used in an earlier study [27] achieved 3.75% error on these same data.

Figure 4. Comparison of results (1: Xie/Scofield; 2: MLP; 3: NN Group; 4: NAN).

It is instructive to compare not just average errors but also maximum errors. The largest Xie/Scofield rainfall estimation error is a massive +96.21%, which falls to +18.2% with the ANN reasoning network, and to much more acceptable levels for both the NNG (−0.74%) and the NAN (−0.72%). The largest observed error using ANSER plus with a neuron-adaptive ANN reasoning network is only +7.65%, whereas the largest error using ANSER plus with an MLP reasoning network is −21%.

In summary, the best results we obtained were with the neuron-adaptive neural network. These findings support our earlier proposition that NANs are capable of adequately estimating rainfall (which is known to be a complex, nonlinear, discontinuous task), in contrast to simple ANNs, which are inaccurate at points of discontinuity and consequently produce large errors, even though they in turn outperform conventional (rule-based) expert systems. The NAN model not only provides better accuracy, but can also simulate discontinuous functions automatically using a convenient learning algorithm. Using a NAN reasoning network, the ANSER plus system achieves (1) an order-of-magnitude decrease in the time taken to compute rainfall estimates automatically, and (2) average errors for the total precipitation event of around one-tenth of those achieved previously.

4. Conclusion

In this paper we have introduced the NAN with a neuron-adaptive activation function (equation (1)), capable of approximating any continuous (or piecewise continuous) function to any desired accuracy. The approximation ability of NANs for any continuous function is established, and a learning algorithm is derived to tune both the free parameters in the neuron-adaptive activation function and the connection weights between neurons. Our work differs from previous research in that we use adaptive activation functions, whereas the neuron activation functions in most other models are fixed and thus cannot be tuned to adapt to different approximation problems. Moreover, we are able to use a single NAN to approximate any piecewise continuous function, as opposed to neural network groups, for example, which require many ANNs.

Based on the NAN model, we developed a neuron-Adaptive artificial Neural network System for Estimating Rainfall using satellite data (ANSER plus). Empirical results show that the NAN model is about 1.8% better than artificial neural network groups (NNGs) and about 16.4% better than ANNs on an experimental rainfall estimation database. The empirical results also show that, using the NAN model, ANSER plus achieves (1) a tenfold decrease in computation time and (2) an order-of-magnitude decrease in estimation error. Experiments indicate that the proposed NAN models offer several advantages over traditional fixed-neuron networks (such as much reduced network size, faster learning, and decreased simulation error). Furthermore, NAN models are able to approximate piecewise continuous functions effectively, and offer superior performance to neural network groups.

Acknowledgments

This work was financially supported by a USA National Research Council Senior Research Associateship at the Atmospheric Research and Application Division of NOAA (M. Zhang, 1999–2000); the Application and Research Centre, Christopher Newport University; and the Australian Research Council (2002 Institutional Research Grants Scheme, grant X0012569). The authors also thank the following ORA staff within NOAA for their invaluable help: Drs. Roderick Scofield, James Purdom, Frances Holt, Arnie Gruber, and Fuzhong Weng, and Mr. Clay Davenport.

References

[1] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4), 1989, 303–314.
[2] E. Blum & K. Li, Approximation theory and feedforward networks, Neural Networks, 4, 1991, 511–515.
[3] K. Hornik, M. Stinchcombe, & H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 1989, 359–366.
[4] M. Leshno, V.Y. Lin, A. Pinkus, & S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks, 6, 1993, 861–867.
[5] J. Park & I.W. Sandberg, Universal approximation using radial-basis-function networks, Neural Computation, 3, 1991, 246–257.
[6] F. Scarselli & A.C. Tsoi, Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results, Neural Networks, 11(1), 1998, 15–37.
[7] M. Zhang, S. Xu, & J. Fulcher, Neuron-adaptive higher order neural network models for automated financial data modelling, IEEE Transactions on Neural Networks, 13(1), 2002, 188–204.
[8] M. Arai, R. Kohon, & H. Imai, Adaptive control of a neural network with a variable function of a unit and its application, Trans. on Institution Electronic Information Communication, J74-A, 1991, 551–559.
[9] Z. Hu & H. Shao, The study of neural network adaptive control systems, Control and Decision, 7, 1992, 361–366.
[10] T. Yamada & T. Yabuta, Remarks on a neural network controller which uses an auto-tuning method for nonlinear functions, Proc. IJCNN, 2, 1992, 775–780.
[11] T. Chen & H. Chen, Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Transactions on Neural Networks, 6(4), 1995, 904–910.
[12] P. Campolucci, F. Capparelli, S. Guarnieri, F. Piazza, & A. Uncini, Neural networks with adaptive spline activation function, Proc. IEEE MELECON 96, Bari, Italy, 1996, 1442–1445.
[13] N.S. Philip & K.B. Joseph, A neural network tool for analysing trends in rainfall, Computers & Geosciences, 29, 2003, 215–223.
[14] C.T. Chen & W.D. Chang, A feedforward neural network with function shape autotuning, Neural Networks, 9(4), 1996, 627–641.
[15] T. Chen & H. Chen, Approximations of continuous functionals by neural networks with application to dynamic systems, IEEE Transactions on Neural Networks, 4(6), 1993, 910–918.
[16] M. Zhang & R.A. Scofield, Artificial neural network techniques for estimating heavy convective rainfall and recognizing cloud mergers from satellite data, International Journal of Remote Sensing, 15(16), 1994, 3241–3262.
[17] K.A. Browning, Airflow and precipitation trajectories within severe local storms which travel to the right of the winds, Journal of Atmospheric Science, 21, 1964, 634–639.
[18] F.A. Huff, Rainfall gradients in warm season rainfall, Journal of Applied Meteorology, 6, 1967, 435–437.
[19] W.E. Shenk, Cloud top height visibility in strong convective cells, Journal of Applied Meteorology, 13, 1974, 917–922.
[20] R.J. Kane, C.R. Chelius, & J.M. Fritsch, The precipitation characteristics of mesoscale convective weather systems, Journal of Climate & Applied Meteorology, 26(10), 1987, 1323–1335.
[21] C.A. Leary & E.N. Rappaport, The life cycle and internal structure of a mesoscale convective complex, Monthly Weather Review, 115(8), 1987, 1503–1527.
[22] E.C. Barrett & D.W. Martin, The use of satellite data in rainfall monitoring (New York: Academic Press, 1981).
[23] J. Bullas, J.C. McLeod, & B. de Lorenzis, Knowledge Augmented Severe Storms Predictor (KASSPr): An operational test, Proc. 16th Conf. on Severe Local Storms, American Meteorological Society, Boston, 1990, 106–111.
[24] M. Zhang, J. Fulcher, & R. Scofield, Rainfall estimation using artificial neural network group, Neurocomputing, 16(2), 1997, 97–115.
[25] M. Zhang & J. Fulcher, Face recognition using artificial neural network group-based adaptive tolerance (GAT) trees, IEEE Transactions on Neural Networks, 7(3), 1996, 555–567.
[26] M. Zhang, J.C. Zhang, & J. Fulcher, Neural network group models for data approximation, International Journal of Neural Systems, 10(2), 2000, 123–142.
[27] M. Zhang & J. Fulcher, Higher order neural networks for satellite weather prediction, in J. Fulcher & L. Jain (eds.), Applied intelligent systems: New directions (Berlin: Springer, 2004).
[28] J. Fulcher, M. Zhang, & S. Xu, The application of higher-order neural networks to financial time series, in J. Kamruzzaman, R. Begg, & R. Sarker (eds.), Artificial neural networks in finance and manufacturing (Hershey, PA: Idea Group, 2006).
[29] R.A. Scofield, The NESDIS operational convective precipitation estimation technique, Monthly Weather Review, 115, 1987, 1773–1792.
[30] J. Xie & R.A. Scofield, Satellite-derived rainfall estimates and propagation characteristics associated with mesoscale convective systems (MCSs), NOAA Technical Memorandum NESDIS, 25, 1989, 1–49.

Biographies

Ming Zhang was born in Shanghai, China. He received his M.S. degree in information processing and Ph.D. degree in the research area of computer vision from East China Normal University, Shanghai, China, in 1982 and 1989, respectively. He held Postdoctoral Fellowships in artificial neural networks with the Chinese Academy of Sciences in 1989 and the USA National Research Council in 1991. He was a face-recognition airport security system project manager and Ph.D. co-supervisor at the University of Wollongong, Australia, in 1992. From 1994 he was a lecturer at Monash University, Australia, researching artificial neural network financial information systems. From 1995 to 1999 he was a senior lecturer and Ph.D. supervisor at the University of Western Sydney, Australia, with a research interest in artificial neural networks. He also held a Senior Research Associate Fellowship in artificial neural networks with the USA National Research Council in 1999. He is currently an Associate Professor and graduate student supervisor in computer science at Christopher Newport University, VA, USA. With more than 100 papers published, his current research includes artificial neural network models for face recognition, weather forecasting, financial data simulation, and management.

Shuxiang Xu received a B.Sc. in mathematics and an M.Sc. in applied mathematics from the University of Electronic Science and Technology of China, Chengdu, China, in 1989 and 1996, respectively. He won the Australian government's Overseas Postgraduate Research Award in 1996 to pursue a Ph.D. at the University of Western Sydney, Sydney, Australia, and was awarded a Ph.D. in computing by that university in 2000. His current interests include the theory and applications of artificial neural networks, especially the application of ANNs to financial simulation and forecasting and to image recognition. He is currently a lecturer at the School of Computing, University of Tasmania, Australia.

John A. Fulcher is currently a professor in the School of Information Technology and Computer Science at the University of Wollongong, Australia. He is the author of more than 100 publications, including the bestselling textbook Microcomputer Interfacing (Addison Wesley, 1989) and Applied Intelligent Systems (Springer, 2004), as well as three contributed chapters in the Handbook of Neural Computation (Oxford University Press, 1997). He serves as a reviewer for several journals, especially in neural networks and computer education, and is a regular contributor to ACM Computing Reviews. Prof. Fulcher also serves on the editorial board of Computer Science Education (CSE) and has been guest editor for special issues of CSE (Australasia) and Computer Standards and Interfaces (Artificial Neural Networks). He is currently editing a book, along with Prof. Lakhmi C. Jain, for Springer Verlag entitled Handbook of Computational Intelligence. Prof. Fulcher is a senior member of the IEEE and a member of the ACM.

222

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.


artificial Neural network System for Estimating Rainfall (ANSER), which uses NAN as its basic reasoning network, is described. Empirical results show that the NAN model performs about 1.8% better than artificial neural network groups, and around 16.4% better than classical artificial neural networks, when using an experimental rainfall estimation database. The empirical results also show that by using the NAN model, ANSER plus can (1) automatically compute rainfall amounts ten times faster; and (2) reduce the average error of rainfall estimates for the total precipitation event to less than 10%.

Key Words: Adaptive neuron, NANN system, rainfall estimation

1. Introduction The universal approximation ability of Feedforward Neural Networks (FNNs—MLP/BP) has been a topic of study for some time now ([1–6]). Leshno et al. [4], for example, proved that FNNs are able to approximate any continuous function, to any degree of accuracy, provided the network activation function is nonpolynomial. There are two outstanding issues, however [7]:

• The neuron activation functions commonly employed in the aforementioned studies are invariably sigmoid or some other fixed function comprising no free parameters.

2. Neuron-Adaptive Artificial Neural Network Model

Preliminary function approximation and data simulation experiments indicated that the proposed NAN model offers several advantages over traditional fixed-neuron networks, these being: (1) greatly reduced network size, (2) faster convergence, and (3) lower simulation error.

∗ Department of Physics, Computer Science and Engineering, Christopher Newport University, Newport News, VA 23606, USA; e-mail: [email protected]
∗∗ School of Computing, University of Tasmania, Launceston, Tasmania 7250, Australia; e-mail: [email protected]
∗∗∗ School of Information Technology and Computer Science, University of Wollongong, Wollongong, NSW 2522, Australia; e-mail: [email protected]

Recommended by Dr. Vaclav Sebesta (paper no. 202-1585)

2.1 NAN Model

The NAN network structure is identical to that of a multilayer FNN; in other words, it consists of an n-unit input layer, an m-unit output layer, and at least one hidden layer comprising an intermediate number of processing units (neurons or nodes). There is no activation function in the input layer, the output neuron is a summing unit (linear activation), and the hidden units employ Neuron-adaptive Activation Functions (NAFs) defined as follows:

Ψ(x) = A1·sin(B1·x) + A2·e^(−B2·x) + A3/(1 + e^(−B3·x))    (1)

where A1, B1, A2, B2, A3, B3 are free parameters that can be adjusted (along with the weights) during training.

2.1.1 Remarks on NANs

If A1 = B1 = A2 = B2 = 0 and A3 = B3 = 1 in equation (1), then the NAN model becomes a standard FNN with sigmoid activation function. Conversely, if A3 and B3 are nonzero real variables, the model becomes a feedforward neural network with function shape auto-tuning, as described in [14]. More generally, when the As and Bs are real variables, the resulting NAN (referred to as NAN-R) is capable of approximating any continuous function (Section 2.2). On the other hand, when they are piecewise continuous functions, the resulting NAN-P is able to approximate any piecewise continuous function (Section 2.3).

2.2 Universal Approximation Capability of NAN-Rs to any Continuous Function

Theorem 1. A NAN-R with a NAF described by (1) can approximate any continuous function to any degree of accuracy.

Whereas this theorem achieves the same result as Leshno et al. [4], it should be noted that the free parameters in our NAF are tuned (Section 2.2.1); Chen and Chen, by contrast, use a fixed neuron activation function [15]. As a result, our approach offers several advantages, including increased flexibility, greatly reduced network size, faster learning times, and fewer simulation errors (see Section 3.5).

2.2.1 Learning Algorithm

The input–output relation of the ith neuron in the kth layer of the NAN-R can be described by the following:

I_{i,k} = Σ_j [w_{i,j,k}·O_{j,k−1}] − θ_{i,k}    (2)

where j runs over the neurons in layer k − 1, and:

O_{i,k} = Ψ(I_{i,k}) = A1_{i,k}·sin(B1_{i,k}·I_{i,k}) + A2_{i,k}·e^(−B2_{i,k}·I_{i,k}) + A3_{i,k}/(1 + e^(−B3_{i,k}·I_{i,k}))    (3)

In order to train the NAN-R, the following energy function:

E = (1/2) Σ_{j=1}^{m} (d_j − O_{j,l})²    (4)

is adopted, namely, the sum of the squared errors between the actual and desired network outputs, for all input patterns. The aim of learning is to minimize this energy function by adjusting the weights and the free variables in the activation function. This can be achieved by using the steepest descent gradient rule, expressed as follows:

w_{i,j,k}^{(r)} = η·w_{i,j,k}^{(r−1)} + β·∂E/∂w_{i,j,k}    (5)

θ_{i,k}^{(r)} = η·θ_{i,k}^{(r−1)} + β·∂E/∂θ_{i,k}    (6)

A1_{i,k}^{(r)} = A1_{i,k}^{(r−1)} + β·∂E/∂A1_{i,k}    (7)

and similarly for A2_{i,k}, A3_{i,k}, and B1_{i,k}–B3_{i,k}. In order to derive the gradient information of E with respect to each adjustable parameter in equations (5)–(7), we define:

∂E/∂I_{i,k} = ζ_{i,k}    (8)

∂E/∂O_{i,k} = ξ_{i,k}    (9)

Now, from (2), (3), (8), and (9), the partial derivatives of E with respect to the adjustable parameters are:

∂E/∂w_{i,j,k} = (∂E/∂I_{i,k})·(∂I_{i,k}/∂w_{i,j,k}) = ζ_{i,k}·O_{j,k−1}    (10)

∂E/∂θ_{i,k} = −ζ_{i,k}    (11)

∂E/∂A1_{i,k} = ξ_{i,k}·sin(B1_{i,k}·I_{i,k})    (12)

∂E/∂B1_{i,k} = ξ_{i,k}·A1_{i,k}·I_{i,k}·cos(B1_{i,k}·I_{i,k})    (13)

∂E/∂A2_{i,k} = ξ_{i,k}·e^(−B2_{i,k}·I_{i,k})    (14)

∂E/∂B2_{i,k} = −ξ_{i,k}·A2_{i,k}·I_{i,k}·e^(−B2_{i,k}·I_{i,k})    (15)

∂E/∂A3_{i,k} = ξ_{i,k}·1/(1 + e^(−B3_{i,k}·I_{i,k}))    (16)

∂E/∂B3_{i,k} = ξ_{i,k}·A3_{i,k}·I_{i,k}·e^(−B3_{i,k}·I_{i,k})/(1 + e^(−B3_{i,k}·I_{i,k}))²    (17)

For (8) and (9), the following equations can be computed:

ζ_{i,k} = ∂E/∂I_{i,k} = (∂E/∂O_{i,k})·(∂O_{i,k}/∂I_{i,k}) = ξ_{i,k}·∂O_{i,k}/∂I_{i,k}    (18)

while:

∂O_{i,k}/∂I_{i,k} = A1_{i,k}·B1_{i,k}·cos(B1_{i,k}·I_{i,k}) − A2_{i,k}·B2_{i,k}·e^(−B2_{i,k}·I_{i,k}) + A3_{i,k}·B3_{i,k}·e^(−B3_{i,k}·I_{i,k})/(1 + e^(−B3_{i,k}·I_{i,k}))²    (19)

and:

ξ_{i,k} = Σ_j ζ_{j,k+1}·w_{j,i,k+1},  if 1 ≤ k < l;    (20)

ξ_{i,k} = O_{i,l} − d_i,  if k = l.    (21)

The above procedure can be summarized as follows:
00: Determine the number of hidden units for the NAN-R.
10: Initialize all weights w_{i,j,k}, threshold values θ_{i,k}, and parameters A1, B1, A2, B2, A3, B3.
20: Input a learning example from the training data set and calculate the actual outputs of all neurons using the present parameter values, according to equations (2) and (3). If the desired output is reached, stop; otherwise go to 30.
30: Evaluate ζ_{i,k} and ξ_{i,k} from equations (18)–(21), and then the gradient values from equations (10)–(17).
40: Adjust the parameters according to the iterative formulae (5)–(7).
50: Input another learning pattern, then go to step 20.

All the training examples are presented cyclically until all parameters stabilize, or, in other words, until the energy function E for the entire training set is acceptably low and the network has converged.

2.3 Universal Approximation Capability of NAN-Ps to any Piecewise Continuous Function

2.3.1 Approximation Theorem

Now because a mapping f: R^n → R^m (where n and m are positive integers) can be computed by m mappings f_j: R^n → R (where j = 1, 2, . . . , m), it is theoretically sufficient to focus on networks with a single output unit only. The weights vector and the threshold value associated with the jth hidden-layer processing unit are denoted by W_j and Θ_j respectively. The weight associated with the single output unit is denoted by β_j and the input vector by X. With these notations, we see that the function that a NAN-P computes is:

f(X) = Σ_{j=1}^{k} β_j·Ψ(W_j·X − Θ_j)

where k is the number of hidden-layer processing units, and Ψ is defined in (1).

Theorem 2. A NAN-P with a NAF described by (1) can approximate any piecewise continuous function with infinite (countable) continuous parts, to any degree of accuracy.

2.3.2 Use of NAN-Ps to Approximate Some Special Functions

Some special functions (like the Gamma- or Beta-function) are of significance in many engineering and physics problems, and are defined either by integration or by series expansion. Such functions exist because the solutions to the differential equations that describe some practical problems are incapable of being expressed by elementary functions. As these special functions are essentially piecewise continuous functions (see below), in this section we proceed to approximate them using NAN-Ps.

2.3.2.1 Approximation of the Gamma Function by NAN-P

The Gamma-function (Fig. 1) is defined as:

Γ(x) = ∫_0^∞ e^(−t) t^(x−1) dt    (22)

where x > 0 and x ∈ R.

Figure 1. Gamma-function.

Γ(x) cannot be explicitly expressed by elementary functions, because:

Γ(x + 1) = ∫_0^∞ e^(−t) t^x dt = [−e^(−t) t^x]_0^∞ + x ∫_0^∞ e^(−t) t^(x−1) dt = x·Γ(x)    (23)

When x > 1, this can be written as:

Γ(x) = (x − 1)Γ(x − 1) = (x − 1)(x − 2)Γ(x − 2) = · · · = (x − 1)(x − 2) · · · (x − n)Γ(x − n)    (24)

When x < 0 and x is not a negative integer:

Γ(x) = Γ(x + 1)/x    (−1 < x < 0);
Γ(x) = Γ(x + 2)/(x(x + 1))    (−2 < x < −1);
. . .
Γ(x) = Γ(x + n)/(x(x + 1) · · · (x + n − 1))    (−n < x < −n + 1)    (25)

(Γ(x) is not defined when x = 0 or x is a negative integer.) We can see from Fig. 1 that the Gamma-function is piecewise continuous, with a finite (countable) number of continuous parts. According to Theorem 2, this function can be approximated to any degree of accuracy using a single NAN-P.

2.3.2.2 Approximation of the Beta Function by NAN-P

The Beta-function (with real variables) is defined as:

B(x, y) = ∫_0^1 t^(x−1) (1 − t)^(y−1) dt    (26)

where x and y are real variables. When y is greater than zero we have:

B(x, y) = (1/y) Σ_{k=0}^{∞} (−1)^k · y(y − 1) · · · (y − k) / (k!·(x + k))    (27)

If x, y ≠ 0, −1, −2, and so on, we have:

B(x, y) = ((x + y)/(xy)) · Π_{k=1}^{∞} k(x + y + k)/((x + k)(y + k))    (28)

Now the relationship between the Beta- and Gamma-functions is:

B(x, y) = Γ(x)Γ(y)/Γ(x + y)    (29)

Thus according to these properties, we can see that the Beta-function is also a piecewise continuous function (of two real variables). Moreover, according to Theorem 2, this function can be approximated to any desired accuracy using a single NAN-P.

3. Application of NANs to Heavy Rainfall Estimation

Global weather patterns have received a lot of media attention during the last decade or so, ranging from the impact of greenhouse gas emissions on climate change to the havoc wreaked by natural disasters such as hurricanes, earthquakes, and tsunamis. In relation to the latter, the lack of an early warning system in the Indian Ocean contributed to the massive loss of life and property accompanying the Asian tsunamis that struck just prior to Christmas 2004.

Several countries, realizing the importance of accurate, timely weather warnings, have accordingly invested significant resources in government-sponsored agencies (such as the U.S. Department of Commerce's National Oceanic and Atmospheric Administration). A typical example is the National Environmental Satellite, Data and Information Service (NESDIS) Interactive Flash Flood Analyzer (IFFA) system, housed within NOAA. The IFFA computes precipitation estimates and three-hour precipitation outlooks for convective systems, and extratropical and tropical cyclones, and transmits this information to both the National Weather Service Forecast Office and the River Forecast Centre. The system is limited to the computation of rainfall estimates for a single convective system at a time, however, because of the considerable time needed for image processing, interpretation, and the computation involved in rainfall estimation, even using state-of-the-art computers. In practice, it is more usual to witness several storms occurring simultaneously. Agencies such as NOAA would therefore prefer to have available rainfall estimates for the entire country. This has been a primary motivating factor behind the development of ANSER (an Artificial neural Network expert system for Satellite-derived Estimation of Rainfall), which has been under development since the early 1990s [16].

Conventional (grid-based) global weather models are necessarily complex, given the nature of the attendant fluid dynamic processes commonly occurring in the atmosphere (processes, it needs to be emphasized, that are not thoroughly understood) [17–21]. Since the mid-1980s, however, we have developed a better understanding of how geosynchronous and polar-orbiting satellite data are able to better inform rainfall estimation; a complete review of rainfall schemes that use visible, infrared, or microwave satellite data is to be found in [22].

The differential equations used to model such a complex system are inexact, due to incomplete boundary conditions, various (necessary) simplifying model assumptions, and also to numerical instabilities. It should be pointed out that heavy rainfall distributions are not always continuous; summer showers are a good example, in which the heavy rainfall distribution is invariably discontinuous and nonsmooth. Not surprisingly, solving such complex equations is computationally expensive. Another motivation for developing ANSER was therefore to realize a weather forecasting system that could handle discontinuous, nonsmooth data input, something that until this time has not been available.

Most present-day systems yield less than satisfactory results (see, e.g., [23]). Typically, deriving rainfall estimates, even on state-of-the-art supercomputers, can take 30 minutes or more. Moreover, the error rates that accompany such estimates can be as high as 30%. Accordingly, another main motivation in developing ANSER was to reduce both computation time and prediction errors by taking an alternative, more modern (soft computing) approach in our development of a real-world automatic heavy rainfall estimation system. Preliminary results showed that although the performance of classic FNNs (MLP/BP) was roughly twice as

good as the existing (conventional, model-based) rainfall estimation method, the resulting estimates still exhibited on average around 17% error [24]. We concluded that simple nonlinear, continuous FNNs are incapable of heavy rainfall estimation. Instead, we turned our attention to ANN groups, which were subsequently found to offer much better performance, at around 4% error [24]. In other words, ANN groups [25, 26] are capable of simulating piecewise, nonlinear, continuous functions by way of discontinuous and nonsmooth points. Accordingly, ANN groups were used as the knowledge base and reasoning network in the ANSER system for heavy rainfall estimation.

The focus of the present paper is neither ANN groups nor PHONNs [27, 28], but rather Neuron-adaptive Artificial neural Networks (NANs), which have been under development since 1998 [7] and which offer performance marginally better than ANN groups and comparable with PHONNs. Furthermore, an updated ANSER system [24, 27] uses NANs to determine cloud features and nonlinear functions for subsequent use in heavy rainfall estimation. A brief description of the ANSER plus system architecture and operation follows.

Figure 2. ANSER plus SERVER GUI.
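As a minimal illustration of the neuron-adaptive activation of equation (1), and of the remark in Section 2.1.1 that it collapses to a standard sigmoid when A1 = B1 = A2 = B2 = 0 and A3 = B3 = 1, consider the following sketch (the sample inputs are arbitrary illustrative values, not taken from the paper):

```python
import math

def naf(x, A1, B1, A2, B2, A3, B3):
    """Neuron-adaptive activation function of equation (1):
    Psi(x) = A1*sin(B1*x) + A2*exp(-B2*x) + A3/(1 + exp(-B3*x))."""
    return (A1 * math.sin(B1 * x)
            + A2 * math.exp(-B2 * x)
            + A3 / (1.0 + math.exp(-B3 * x)))

def sigmoid(x):
    """Standard logistic sigmoid, the fixed activation of a classical FNN."""
    return 1.0 / (1.0 + math.exp(-x))

# Special case from Section 2.1.1: the NAF reduces to the sigmoid.
for x in (-2.0, 0.0, 3.5):
    assert abs(naf(x, 0, 0, 0, 0, 1, 1) - sigmoid(x)) < 1e-12
```

During NAN training the six parameters are adjusted per hidden neuron along with the weights, which is what distinguishes the model from a fixed-activation FNN.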

3.1 ANSER plus Architecture

ANSER plus comprises a (SUN) SERVER and (PC) USER systems, with the latter communicating with the former via an Ethernet LAN. Within each USER system reside the following subsystems: TRAINING, WEIGHT BASE, and ESTIMATE. The SERVER system uses several TRAINING subsystems (which incorporate Training and Output Result functions) for training weights. Inputs to the SERVER system are derived from satellite data using a resident expert system. The ANSER plus Graphical User Interface (GUI) is shown in Fig. 2.

3.2 ANSER plus Operation

Using satellite data as input, the ANSER plus SERVER system firstly extracts the following features using pattern recognition and image-processing techniques [14, 20]: cloud top temperature (CT), cloud growth factor (CG), rain-burst factor (RB), overshooting top factor (OS), cloud merger factor (M), saturated environment factor (SE), storm speed (S), and moisture correction (MC). An ANN pattern recognition technique [16] is used to determine the cloud merger factor (M). A NAN is used to obtain the rainfall estimate factor G (= f{cloud top temperature + cloud growth}), the inputs to this NAN being cloud top temperature and cloud growth, as advised by an NOAA "expert" [27]. For example, when the cloud top temperature is between −58°C and −60°C and the cloud growth is more than 2/3 latitude, the half-hour rainfall estimate is 0.94 inch [29].

3.3 NAN Reasoning Network

The seven inputs (G, RB, OS, M, SE, S, MC) are fed into a NAN reasoning network (the second NAN used in our system), the output of which is (real-time) half-hourly rainfall estimates [24, 27]. Several NAN reasoning networks are available within the ANSER plus (SUN) SERVER, as appropriate for different operating conditions. The basic three-layer NAN architecture comprises 7 input neurons, 30 hidden neurons, and a single output neuron (the optimum number of hidden units was found by trial and error; any more than 30 yielded no performance improvement but required longer training times).
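A forward pass through such a 7-30-1 NAN, following equations (1)–(3), can be sketched as below. The weights, thresholds, and per-neuron activation parameters here are randomly initialized stand-ins, not the trained values used in ANSER plus, and the input vector is a dummy example:

```python
import math
import random

def naf(x, p):
    """Neuron-adaptive activation, equation (1); p = (A1, B1, A2, B2, A3, B3)."""
    A1, B1, A2, B2, A3, B3 = p
    return (A1 * math.sin(B1 * x)
            + A2 * math.exp(-B2 * x)
            + A3 / (1.0 + math.exp(-B3 * x)))

def make_nan(n_in=7, n_hidden=30, seed=0):
    """Randomly initialized 7-30-1 NAN: hidden weights, thresholds, one
    six-parameter set per hidden neuron, and output weights (all of which
    the paper's learning algorithm would tune)."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    th_h = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    params = [tuple(rng.uniform(0.0, 1.0) for _ in range(6)) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_h, th_h, params, w_o

def forward(net, x):
    """Equations (2)-(3) for the hidden layer; linear summing output unit."""
    w_h, th_h, params, w_o = net
    hidden = [naf(sum(w * xi for w, xi in zip(ws, x)) - th, p)
              for ws, th, p in zip(w_h, th_h, params)]
    return sum(b * h for b, h in zip(w_o, hidden))

# The seven ANSER plus inputs (G, RB, OS, M, SE, S, MC), as dummy values:
net = make_nan()
estimate = forward(net, [0.94, 0.2, 0.1, 0.0, 0.5, 0.3, 0.7])
```

The sketch only shows inference; in the full system the weights arrive pre-trained from the SERVER training subsystem, as described below.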

When input data become available, they are simultaneously fed into the NAN reasoning network, which has been previously trained using the aforementioned seven inputs together with the desired output. After training, the reasoning rules, models, and knowledge are contained within the network weights and activation functions (the input–output relationships, it needs to be emphasized, are difficult to describe by way of mathematical formulae). Now, due to the massively parallel nature of the neural network, we are able to access all the rules and models simultaneously in our derivation of rainfall estimates. This results in fast (typically only several seconds) and accurate estimation, especially when the rules, models, knowledge, and factors are complex, nonlinear, and discontinuous.

The ANSER plus USER system was developed in order to provide multi-user capability. Because training times on the USER PCs would be unacceptably long, the network weights are downloaded via the Ethernet from the SERVER training subsystem. As a result, once the input data (CT, CG, RB, OS, M, SE, S, and MC) have been provided, the ANSER plus USER system is capable of producing rainfall estimates in real time. Fig. 3 shows a typical output from the ANSER plus USER system.

Figure 3. Typical ANSER plus rainfall estimate (Date: 24th May 2000, Time: 06Z):

    LAT     LONG     ANSER Rainfall Estimate
    34.144  84.349   7.10 mm
    37.148  88.696   27.69 mm

3.4 Performance Comparison

Standard practice with comparative experiments is to ensure the same environmental conditions throughout. Accordingly, the same groups of testing data and the same parameters (learning rate, number of hidden layers, and so forth) were used for all three models. In each case, we use the following error measure:

|error| = |output of model − observation data| / observation × 100%    (37)

and average over all n cases. The rainfall data estimation results (G) obtained using the ANN, NNG, and NAN models had average errors of 6.32%, 5.42%, and 5.32%, respectively. In other words, both NANs and NNGs perform about 16% better than classic (MLP/BP or FNN) ANNs. These rainfall estimates were in turn used by the reasoning network within ANSER plus. Although the NAN performance is only marginally better than that of NNGs, it needs to be pointed out that determination of the discontinuous points to use in an NNG is nontrivial. By contrast, NANs can not only simulate piecewise nonlinear continuous functions with discontinuous and nonsmooth points, but are also able to find these points automatically by means of an easy-to-use learning algorithm.

Conventional rainfall estimation yields an average error of almost one-third [30]. An ANN reasoning network [16], by comparison, reduced this error by roughly half. Resorting to neural network groups [24] and/or NANs (as in the present study) saw this error drop by almost an order of magnitude, as indicated in Fig. 4 (using test data throughout). A polynomial higher-order neural network used in an earlier study [27] achieved 3.75% error on these same data.

Figure 4. Comparison of results (1: Xie/Scofield; 2: MLP; 3: NN Group; 4: NAN).

It is instructive to compare not just average errors but also maximum errors. The largest Xie/Scofield rainfall estimation error is a massive +96.21%, which falls to +18.2% with the ANN reasoning network, and to much more acceptable levels for both the NNG (−0.74%) and the NAN (−0.72%). By comparison, the largest observed error using ANSER plus with a neuron-adaptive ANN reasoning network is only +7.65%, whereas the largest error using ANSER plus with an MLP reasoning network is −21%.
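The error measure just described can be sketched as follows, taking the magnitude per case and averaging over all n cases. The (estimate, observation) pairs here are made-up numbers for illustration, not values from the ANSER database:

```python
def abs_percent_error(estimate_mm, observed_mm):
    """|error| = |model output - observation| / observation * 100%."""
    return abs(estimate_mm - observed_mm) / observed_mm * 100.0

def average_error(pairs):
    """Average the per-case |error| over all n (estimate, observation) cases."""
    return sum(abs_percent_error(e, o) for e, o in pairs) / len(pairs)

# Hypothetical (estimate, observation) pairs in mm:
pairs = [(7.1, 7.5), (27.69, 26.0), (12.0, 12.0), (3.3, 3.0)]
avg = average_error(pairs)
```

Note that the measure is undefined for zero-rainfall observations, so in practice only cases with nonzero observed precipitation can be scored this way.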

In summary, the best results we obtained were with the neuron-adaptive neural network. These findings support our earlier proposition that NANs are capable of adequately estimating rainfall (which is known to be a complex, nonlinear, discontinuous task), in contrast to simple ANNs, which are inaccurate at points of discontinuity. This results in large errors with the latter technique, even though it in turn outperforms conventional (rule-based) expert systems. The NAN model not only provides better accuracy, but can also simulate discontinuous functions automatically using a convenient learning algorithm. Using a NAN reasoning network, the ANSER plus system is capable of achieving (1) an order-of-magnitude speed-up in the automatic computation of rainfall estimates, and (2) average errors for the total precipitation event of around one-tenth of those achieved previously.

4. Conclusion

In this paper we have introduced the NAN with a neuron-adaptive activation function (equation (1)), capable of approximating any continuous (or piecewise continuous) function to any desired accuracy. The approximation ability of NANs for any continuous function is established, and a learning algorithm is derived to tune both the free parameters in the neuron-adaptive activation function and the connection weights between neurons. Our work differs from previous research in that we use adaptive activation functions, whereas the neuron activation functions in most other models are fixed and thus cannot be tuned to adapt to different approximation problems. Moreover, we are able to use a single NAN to approximate any piecewise continuous function, as opposed to neural network groups, for example, which require many ANNs.

Based on the NAN model, we developed a neuron-Adaptive artificial Neural network System for Estimating Rainfall using satellite data (ANSER plus). Empirical results show that the NAN model is about 1.8% better than artificial neural network groups (NNGs) and about 16.4% better than ANNs when using an experimental rainfall estimation database. The empirical results also show that, using the NAN model, ANSER plus is able to achieve (1) a tenfold decrease in computation time and (2) an order-of-magnitude decrease in estimation error.

Experiments indicate that the proposed NAN models present several advantages over traditional neuron-fixed networks (such as much reduced network size, faster learning, and decreased simulation error). Furthermore, NAN models are able to effectively approximate piecewise continuous functions, and offer superior performance to neural network groups.

Acknowledgments

This work was financially supported by a USA National Research Council Senior Research Associateship at the Atmospheric Research and Application Division of NOAA (M. Zhang, 1999–2000); the Application and Research Centre, Christopher Newport University; and the Australian Research Council (2002 Institutional Research Grants Scheme, grant X0012569). The authors also thank the following ORA staff within NOAA for their invaluable help: Drs. Roderick Scofield, James Purdom, Frances Holt, Arnie Gruber, and Fuzhong Weng, and Mr. Clay Davenport.

References

[1] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4), 1989, 303–314.
[2] E. Blum & K. Li, Approximation theory and feedforward networks, Neural Networks, 4, 1991, 511–515.
[3] K. Hornik, M. Stinchcombe, & H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 1989, 359–366.
[4] M. Leshno, V.Y. Lin, A. Pinkus, & S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks, 6, 1993, 861–867.
[5] J. Park & I.W. Sandberg, Universal approximation using radial-basis-function networks, Neural Computation, 3, 1991, 246–257.
[6] F. Scarselli & A.C. Tsoi, Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results, Neural Networks, 11(1), 1998, 15–37.
[7] M. Zhang, S. Xu, & J. Fulcher, Neuron-adaptive higher order neural network models for automated financial data modelling, IEEE Transactions on Neural Networks, 13(1), 2002, 188–204.
[8] M. Arai, R. Kohon, & H. Imai, Adaptive control of a neural network with a variable function of a unit and its application, Transactions on Institution Electronic Information Communication, J74-A, 1991, 551–559.
[9] Z. Hu & H. Shao, The study of neural network adaptive control systems, Control and Decision, 7, 1992, 361–366.
[10] T. Yamada & T. Yabuta, Remarks on a neural network controller which uses an auto-tuning method for nonlinear functions, IJCNN, 2, 1992, 775–780.
[11] T. Chen & H. Chen, Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Transactions on Neural Networks, 6(4), 1995, 904–910.
[12] P. Campolucci, F. Capparelli, S. Guarnieri, F. Piazza, & A. Uncini, Neural networks with adaptive spline activation function, Proc. IEEE MELECON 96, Bari, Italy, 1996, 1442–1445.
[13] N.S. Philip & K.B. Joseph, A neural network tool for analysing trends in rainfall, Computers & Geosciences, 29, 2003, 215–223.
[14] C.T. Chen & W.D. Chang, A feedforward neural network with function shape autotuning, Neural Networks, 9(4), 1996, 627–641.
[15] T. Chen & H. Chen, Approximations of continuous functionals by neural networks with application to dynamic systems, IEEE Transactions on Neural Networks, 4(6), 1993, 910–918.
[16] M. Zhang & R.A. Scofield, Artificial neural network techniques for estimating heavy convective rainfall and recognizing cloud mergers from satellite data, International Journal of Remote Sensing, 15(16), 1994, 3241–3262.
[17] K.A. Browning, Airflow and precipitation trajectories within severe local storms which travel to the right of the winds, Journal of Atmospheric Science, 21, 1964, 634–639.
[18] F.A. Huff, Rainfall gradients in warm season rainfall, Journal of Applied Meteorology, 6, 1967, 435–437.
[19] W.E. Shenk, Cloud top height visibility in strong convective cells, Journal of Applied Meteorology, 13, 1974, 917–922.
[20] R.J. Kane, C.R. Chelius, & J.M. Fritsch, The precipitation characteristics of mesoscale convective weather systems, Journal of Climate & Applied Meteorology, 26(10), 1987, 1323–1335.
[21] C.A. Leary & E.N. Rappaport, The life cycle and internal structure of a mesoscale convective complex, Monthly Weather Review, 115(8), 1987, 1503–1527.
[22] E.C. Barrett & D.W. Martin, The use of satellite data in rainfall monitoring (New York: Academic Press, 1981).
[23] J. Bullas, J.C. McLeod, & B. de Lorenzis, Knowledge Augmented Severe Storms Predictor (KASSPr): An operational test, Proc. 16th Conf. on Severe Local Storms, American Meteorological Society, Boston, 1990, 106–111.
[24] M. Zhang, J. Fulcher, & R. Scofield, Rainfall estimation using artificial neural network group, Neurocomputing, 16(2), 1997, 97–115.
[25] M. Zhang & J. Fulcher, Face recognition using artificial neural network group-based adaptive tolerance (GAT) trees, IEEE Transactions on Neural Networks, 7(3), 1996, 555–567.
[26] M. Zhang, J.C. Zhang, & J. Fulcher, Neural network group models for data approximation, International Journal of Neural Systems, 10(2), 2000, 123–142.
[27] M. Zhang & J. Fulcher, Higher order neural networks for satellite weather prediction, in J. Fulcher & L. Jain (eds.), Applied intelligent systems: New directions (Berlin: Springer, 2004).
[28] J. Fulcher, M. Zhang, & S. Xu, The application of higher-order neural networks to financial time series, in J. Kamruzzaman, R. Begg, & R. Sarker (eds.), Artificial neural networks in finance and manufacturing (Hershey, PA: Idea Group, 2006).
[29] R.A. Scofield, The NESDIS operational convective precipitation estimation technique, Monthly Weather Review, 115, 1987, 1773–1792.
[30] J. Xie & R.A. Scofield, Satellite-derived rainfall estimates and propagation characteristics associated with mesoscale convective systems (MCSs), NOAA Technical Memorandum NESDIS, 25, 1989, 1–49.

Biographies

Ming Zhang was born in Shanghai, China. He received his M.S. degree in information processing and Ph.D. degree in the research area of computer vision from East China Normal University, Shanghai, China, in 1982 and 1989, respectively. He held Postdoctoral Fellowships in artificial neural networks with the Chinese Academy of Sciences in 1989 and the USA National Research Council in 1991. He was a face recognition airport security system project manager and Ph.D. co-supervisor at the University of Wollongong, Australia, in 1992. From 1994, he was a lecturer at Monash University, Australia, with a research area of artificial neural network financial information systems. From 1995 to 1999, he was a senior lecturer and Ph.D. supervisor at the University of Western Sydney, Australia, with a research interest in artificial neural networks. He also held a Senior Research Associate Fellowship in artificial neural networks with the USA National Research Council in 1999. He is currently an Associate Professor and graduate student supervisor in computer science at Christopher Newport University, VA, USA. With more than 100 papers published, his current research includes artificial neural network models for face recognition, weather forecasting, financial data simulation, and management.

Shuxiang Xu received a B.Sc. in mathematics and an M.Sc. in applied mathematics, in 1989 and 1996 respectively, from the University of Electronic Science and Technology of China, Chengdu, China. He won the Australian government's Overseas Postgraduate Research Award to pursue a Ph.D. at the University of Western Sydney, Sydney, Australia, in 1996, and was awarded a Ph.D. in computing by that university in 2000. His current interests include the theory and applications of artificial neural networks, especially the application of ANNs to financial simulation and forecasting and to image recognition. He is currently a lecturer at the School of Computing, University of Tasmania, Australia.

John A. Fulcher is currently a professor in the School of Information Technology and Computer Science at the University of Wollongong, Australia. He is the author of more than 100 publications, including the bestselling textbook Microcomputer Interfacing (Addison Wesley, 1989) and Applied Intelligent Systems (Springer, 2004), as well as contributing three chapters to the Handbook of Neural Computation (Oxford University Press, 1997). He serves as a reviewer for several journals, especially in neural networks and computer education, and is a regular contributor to ACM Computing Reviews. Prof. Fulcher also serves on the editorial board of Computer Science Education (CSE) and has been guest editor for special issues of CSE (Australasia) and Computer Standards and Interfaces (Artificial Neural Networks). He is currently editing a book, along with Prof. Lakhmi C. Jain, for Springer Verlag entitled Handbook of Computational Intelligence. Prof. Fulcher is a senior member of IEEE and a member of ACM.
