Forecasting of Geomagnetic Activity with Neural Networks


R.S. Weigel, W. Horton, T. Tajima

Institute for Fusion Studies, The University of Texas, Austin, TX 78712

T. Detman

NOAA Space Environment Center, Boulder, CO 80303

Abstract

Neural networks are developed as a predictive filter for geomagnetic activity. The networks use as inputs a vector containing time-delayed copies of the time series for the geomagnetic activity level (AL) and the product of the interplanetary solar wind velocity and rectified southern component of the interplanetary magnetic field (VBs). Several new methods which improve predictive ability are considered, including a gating method which accounts for different levels of activity, and a preconditioning algorithm which allows the network to ignore very short time fluctuations. These results are compared with the previous work by Hernandez et al. [1993].

1. Introduction

Since the linear prediction studies of Bargatze et al. [1985], which showed that the coupling between the magnetosphere and the solar wind is nonlinear, many attempts have been made to model the magnetosphere and predict electrojet indices. Several related approaches which use a state space reconstruction [Packard et al., 1980] to predict the input-output data include local linear [Price and Prichard, 1993; Vassiliadis et al., 1994], ARMA [Detman and Vassiliadis, 1997], and neural network models [Hernandez et al., 1993]. Another recent method is to model the magnetospheric dynamics as a low dimensional system driven by the solar wind. This is the approach taken in the nonlinear physics based model (WINDMI) developed by Horton and Doxas [1996, 1998]. This model takes into account conservation of energy and charge, and it has the advantage of having relatively few parameters compared to the number of weights required for a successful network model.

Although neural or ARMA networks are successful at predicting the future values of the electrojet index, they do not say anything about the underlying physics of the system. It has been noted [Klimas et al., 1996] that for certain nonlinear differential equations it is possible to determine a network which will have the same input-output relationship by using a Volterra series expansion [Wray and Green, 1994]. Therefore, one approach would be to train a network and then determine the equivalent set of nonlinear differential equations which gives the same input-output response. This method may work in principle, but there is no guarantee that the nonlinear differential equation derived from a successful neural network model will be useful for understanding the dynamics of the system, unless the order of the derived equations is small. For this reason we attempt to determine the network which best predicts the response of the magnetosphere, and we make no restrictions on the architecture or size of the network.

Past studies using neural networks to predict the AL index were not able to demonstrate that their prediction ability was significantly greater than that of a comparable ARMA network [Hernandez et al., 1993]. Given the strong evidence that the magnetospheric dynamics are nonlinear, it is expected that a neural network will perform better than its corresponding ARMA network, since a neural network with a nonlinear activation function can produce a much larger set of models from a given set of input and output data.

We reconsider the Hernandez et al. [1993] problem of constructing a neural network to predict the geomagnetic activity given in the database of Bargatze et al. The relevant input variables are taken to be the solar wind function VBs and the westward electrojet index, AL. In this study we extend the work of Hernandez et al. by including a larger set of data when training and calculating the error, and by using a different scaling method which eliminates the clipping of the output. We also develop a new architecture which takes into account the activity level by using a weighted average of outputs from networks trained on intervals with differing levels of activity [Ramamurti and Ghosh, 1998]. This architecture addresses the scaling problem intrinsic to neural networks with nonlinear activation functions which must cover a wide dynamic range. Finally, we consider preconditioning the input data. Future work will compare the predictive ability of these neural networks with that of physics based models.

2. Neural Networks and State Space Reconstruction

A neural network is a system designed to recognize patterns in a given set of inputs. The architecture consists of an input layer, a number of hidden layers, and an output layer. When used for forecasting purposes, the goal is to produce an output which is the same as the output of the system under consideration, given an input which is believed to contain enough information to determine the output. One of the challenges of using a neural network for forecasting is to determine the relevant input variables. For systems with many degrees of freedom this difficulty is avoided by using results from modern dynamical systems theory. Given a scalar time series O(t), produced from a measurement of an autonomous system with many degrees of freedom, it was shown by Packard et al. [1980] that if the dynamics of the system lies on a low dimensional attractor, then the attractor can be reconstructed by creating a delayed coordinate vector

$$\mathbf{O}(t) = [\,O(t-\tau),\, O(t-2\tau),\, \ldots,\, O(t-m\tau)\,]. \eqno(1)$$

It was shown by Takens [1981] that this attractor reconstruction is one-to-one for typical τ > 0 and for large enough m.

In the case of the solar wind driven response of the magnetosphere, we have an input-output system rather than an autonomous system. Therefore, to better specify the state of the system for a network, we must include a vector which contains information about the input at past times [Casdagli, 1992]. The state of the system, S, as viewed from the input of the neural network is then given by the (l + m)-dimensional vector

$$S(t) = [\,I(t-\tau),\, \ldots,\, I(t-l\tau),\, O(t-\tau),\, \ldots,\, O(t-m\tau)\,], \eqno(2)$$

where I(t) is the system driving variable, and m and l are adjustable time lags. In this work we use a multilayer perceptron in which the input layer has as many units as the dimension of S(t), the activation for the output unit is taken to be linear, and there is a single output, representing the predicted AL.
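As a concrete illustration, the following sketch (assuming NumPy; the function and array names are hypothetical, not taken from the paper) builds the delay-coordinate state vector of Eq. (2) from sampled time series of the driver I(t) = VBs and the output O(t) = AL:

```python
import numpy as np

def delay_state(I, O, t, tau, l, m):
    """State vector of Eq. (2):
    S(t) = [I(t-tau), ..., I(t-l*tau), O(t-tau), ..., O(t-m*tau)].
    I and O are 1-D arrays sampled on a common time grid; t and tau
    are expressed in units of samples."""
    past_inputs = [I[t - k * tau] for k in range(1, l + 1)]
    past_outputs = [O[t - k * tau] for k in range(1, m + 1)]
    return np.array(past_inputs + past_outputs)
```

With τ = 6 samples and l = m = 4, as used in Section 3, S(t) has the eight components fed to the networks there.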

The neural net is given by

$$y = \sum_{j=0}^{M} w_j^{(2)}\, g\!\left( \sum_{i=0}^{d} w_{ji}^{(1)} S_i \right), \eqno(3)$$

where the activation function is given by g(x) = tanh(x). An intrinsic problem with the nonlinear mapping in (3) is that for a fixed set of weights w^(1), w^(2) the activation function will saturate when its argument becomes large. The effect of this saturation is seen in Figure 2c of Hernandez et al. To prevent the saturation problem, and to account for the fact that the magnetosphere has different levels of activity, we propose a new architecture suggested by the physics of the problem, which requires a nonlinear response to the driving amplitude and has the possibility to incorporate current saturation as predicted by physics based models. To model this, we separate the data into three groups corresponding to low, medium and high activity levels, and for each activity level a separate network is trained. Next, all three networks are combined, and their individual outputs are connected to a gating network as shown in Figure 2. In addition to the outputs of the three networks (y1, y2, y3), we connect the vector [y4, ..., y(m+3)] = O(t), so that the final network is

$$Y = \sum_{p=1}^{P} W_p^{(2)}\, g\!\left( \sum_{i=1}^{m+3} W_{pi}^{(1)} y_i \right). \eqno(4)$$
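A minimal sketch of the forward passes of Eqs. (3) and (4), again assuming NumPy with hypothetical names; bias units (the i = 0 and j = 0 terms of Eq. (3)) are omitted for brevity:

```python
import numpy as np

def mlp(S, w1, w2):
    """Single-hidden-layer network of Eq. (3) with a linear output:
    y = sum_j w2[j] * tanh(sum_i w1[j, i] * S[i]).
    w1 has shape (n_hidden, len(S)); w2 has shape (n_hidden,)."""
    return w2 @ np.tanh(w1 @ S)

def gated_output(S, O_past, experts, W1, W2):
    """Gated combination of Eq. (4). `experts` holds the (w1, w2) weight
    pairs of the low-, medium-, and high-activity networks; the gate sees
    their outputs y1..y3 together with the m past outputs O(t-tau), ...,
    O(t-m*tau), for m + 3 inputs in all."""
    y = np.concatenate([[mlp(S, w1, w2) for (w1, w2) in experts], O_past])
    return W2 @ np.tanh(W1 @ y)
```

In Section 3 the gate ends up with seven inputs (the three expert outputs plus four past activity values) and three hidden units.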

The weights W^(1) and W^(2) are determined by training with the previously determined weights w^(1) and w^(2) held fixed. This three level network system is similar in spirit to the local linear filter approach used by Vassiliadis et al. [1994], in that both methods use different filters for different activity levels. The difference is that only three levels (low, medium, and high) should be sufficient for the network, rather than the larger number of subdivisions required for local linear filters.

To determine the weights, we use the scaled conjugate gradient method [Bishop, 1995] to minimize the mean square error between the actual output y and the network output ỹ. We also introduce a regularization term which is intended to penalize networks with very large weight values. The error which is minimized is thus

$$E = \frac{1}{M} \sum_{j=1}^{M} (y_j - \tilde{y}_j)^2 + \lambda\,\Omega. \eqno(5)$$

The first term on the right hand side is the usual mean square error, and the second term is a constant λ multiplied by a weight decay term

$$\Omega = \tfrac{1}{2}\left( \mathbf{w}^{(1)} \cdot \mathbf{w}^{(1)} + \mathbf{w}^{(2)} \cdot \mathbf{w}^{(2)} \right). \eqno(6)$$

Note that in all of the networks considered for this paper we use λ = 0.1. To determine the quality of the network, we use the average relative variance, defined by

$$\mathrm{ARV} = \frac{\sum_{j=1}^{M} (y_j - \tilde{y}_j)^2}{\sum_{j=1}^{M} (y_j - \bar{y})^2}, \eqno(7)$$

where the neural network output is represented by ỹ_j, and the average and actual outputs are represented by ȳ and y_j, respectively. Since the ARV varies between test data sets, we consider a large number of data sets and calculate the mean ARV of all the test sets.
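In code, the training objective and the quality measure are short; the sketch below assumes NumPy arrays and hypothetical function names:

```python
import numpy as np

def regularized_error(y, y_pred, w1, w2, lam=0.1):
    """Training objective of Eqs. (5)-(6): mean square error plus the
    weight-decay penalty Omega = (w1.w1 + w2.w2) / 2, scaled by lambda
    (lam = 0.1 for all networks in the paper)."""
    omega = 0.5 * (np.sum(w1 ** 2) + np.sum(w2 ** 2))
    return np.mean((y - y_pred) ** 2) + lam * omega

def arv(y, y_pred):
    """Average relative variance of Eq. (7): squared prediction error
    normalized by the variance of the actual output.  A predictor that
    always returns the mean of y scores 1; a perfect predictor scores 0."""
    return np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
```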

3. Results

Visual inspection of the AL and VBs time series reveals fluctuations on a very short time scale. Since we are more interested in long time predictions, we expect better predictions if we omit these short time scale fluctuations, so that a network will train on longer term trends. To filter the data, we Fourier transform the time series and then set the highest 95% of the frequency components to zero, which corresponds to approximately 14% of the power. To test the effectiveness of this preconditioning method, we train one network with input and output data that has been filtered, and another for which the raw time series has been used.
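One simple reading of this preconditioning step, sketched with NumPy (the paper does not spell out the transform details, so the real-FFT convention and the `keep_fraction` name here are assumptions):

```python
import numpy as np

def lowpass_precondition(x, keep_fraction=0.05):
    """Zero the highest-frequency Fourier components of a real series,
    keeping only the lowest `keep_fraction` of them (setting the highest
    95% to zero corresponds to keep_fraction = 0.05)."""
    X = np.fft.rfft(x)
    cutoff = max(1, int(keep_fraction * len(X)))
    X[cutoff:] = 0.0            # discard the short-time-scale components
    return np.fft.irfft(X, n=len(x))
```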

Both networks have 8 input and hidden units, use the architecture given in Eq. (3) with τ = 6 (corresponding to 15 min) and l = m = 4, and are trained using intervals 1, 21 and 31 of the Bargatze database. In addition to filtering the data as described above, we scale both AL and VBs by 10³; this scaling method avoids the clipping problem observed by Hernandez et al. To calculate the ARV, we use the unfiltered database output. In Figure 1a we see the 15 min predictions of the network which was trained on filtered input and output data. Figure 1b shows the predictions when unfiltered data is used. As shown in Figure 1, filtering the input and output gives a significant improvement in the ARV: 0.13 vs. 0.31. Also, the network trained on the unfiltered data has a tendency to produce a time-delayed copy of the actual output. This is an indication that the network is not recognizing the underlying dynamics, but is instead minimizing the ARV by producing a time-delayed copy of the output. Note that the quoted ARV is the average ARV found using intervals 2, 3, 8, 10, 15, 21 and 29.

Next we consider a network which takes into account the dynamics of the solar wind–AL response by splitting the training data into three intervals corresponding to low (intervals 1-10), medium (11-20) and high (21-31) activity levels. For each activity interval a network is trained (8 input and hidden units, τ = 6 and l = m = 4) using four of the filtered data sets from that interval. Finally, a fourth gating network is trained (7 inputs and 3 hidden units) with inputs from the three networks and four additional inputs corresponding to the previous four activity levels of the time series we are trying to predict. This network is shown in Figure 2. In Figure 3 we compare the effectiveness of the gating network with the outputs of the single networks. We see that the performance is dramatically improved, especially when the activity level is high (intervals 21-31). The reason for this is that the high activity level intervals contain periods of low activity, and the gating network is able to distinguish which network will produce the best prediction based on the activity level.

In the previous work of Hernandez et al., the performance of ARMA and MA networks was compared with that of a neural network. They found that the best ARMA and MA nets had a 15 min prediction performance in the neighborhood of ARV = 0.20-0.25, while the neural networks had an ARV near 0.45. In comparison, the gated and ungated networks developed here had a 15 min prediction ARV of 0.12 and 0.20, respectively.
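All weights in this section are fitted with the scaled conjugate gradient method of Section 2; reproducing that optimizer is beyond a short sketch, so the stand-in below (an assumption, not the paper's code) minimizes Eq. (5) by plain gradient descent for a network of the form of Eq. (3):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(S, y, n_hidden=8, lam=0.1, lr=1e-3, epochs=5000):
    """Fit the weights of Eq. (3) by gradient descent on Eq. (5).
    S has shape (N, d); y has shape (N,).  Returns (w1, w2) with
    w1 of shape (d, n_hidden) and w2 of shape (n_hidden,)."""
    N, d = S.shape
    w1 = rng.normal(scale=0.1, size=(d, n_hidden))
    w2 = rng.normal(scale=0.1, size=n_hidden)
    for _ in range(epochs):
        h = np.tanh(S @ w1)                      # hidden activations
        err = h @ w2 - y                         # prediction residuals
        # gradients of MSE + lam * (|w1|^2 + |w2|^2) / 2
        g2 = 2.0 * h.T @ err / N + lam * w2
        g1 = 2.0 * S.T @ (np.outer(err, w2) * (1.0 - h**2)) / N + lam * w1
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

# Toy usage: fit a network to a synthetic nonlinear response.
S = rng.normal(size=(200, 8))
w1, w2 = train_mlp(S, np.tanh(S.sum(axis=1)))
```

One expert per activity group, and then the gate of Eq. (4), can be fitted the same way, with the expert weights held fixed while the gate is trained.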

4. Conclusion


Several neural networks using various time delays and prediction times have been created in order to predict the activity level of the magnetosphere given an input of the solar wind. The problem reported in the past, in which the output of the network is clipped, is resolved, and the effectiveness of the network with different training sets is considered. We find that the neural network is a better single step predictor than an equivalent ARMA system studied in Hernandez et al. Also, it is found that preconditioning the data gives predictions which are both more accurate for long range prediction and less sensitive to fluctuations between time steps. Finally, we show that a network which takes into account the different levels of activity is able to significantly improve predictability.

Acknowledgments. This work was supported by NSF Grant No. ATM-97262716 and NASA Grants NAGW-5193 and NAGW-6198.

References

Bargatze, L.F., D.N. Baker, R.L. McPherron, and E.W. Hones, Magnetospheric response to the IMF: substorms, J. Geophys. Res., 90, 6387, 1985.

Bishop, C.M., Neural Networks for Pattern Recognition, Oxford University Press, 1995.

Casdagli, M., A dynamical systems approach to modeling input-output systems, in Nonlinear Modeling and Forecasting, M. Casdagli and S. Eubank, eds., Addison-Wesley, 1992.

Detman, T., and D. Vassiliadis, Review of techniques for magnetic storm forecasting, in Magnetic Storms, Geophysical Monograph 98, American Geophysical Union, 1997.

Hernandez, J.V., T. Tajima, and W. Horton, Neural net forecasting for geomagnetic activity, Geophys. Res. Lett., 20(23), 2707, 1993.

Horton, W., and I. Doxas, A low-dimensional dynamical model for the solar wind driven geotail-ionosphere system, J. Geophys. Res., 103(A3), 4561, 1998.

Klimas, A.J., D. Vassiliadis, D.N. Baker, and D.A. Roberts, The organized nonlinear dynamics of the magnetosphere, J. Geophys. Res., 101, 13,089, 1996.

Packard, N.H., J.P. Crutchfield, J.D. Farmer, and R.S. Shaw, Geometry from a time series, Phys. Rev. Lett., 45, 712, 1980.

Price, C.P., and D. Prichard, The non-linear response of the magnetosphere: October 30, 1978, Geophys. Res. Lett., 20, 771, 1993.

Ramamurti, V., and J. Ghosh, On the use of localized gating in mixture of experts networks, in Proc. SPIE Conf. on Applications of Science and Computational Intelligence, SPIE Proc., Orlando, 1998.

Takens, F., Dynamical Systems and Turbulence, Warwick 1980, D.A. Rand and L.S. Young, eds., Lecture Notes in Mathematics, Vol. 898, Springer-Verlag, Berlin, 1981.

Vassiliadis, D., A.J. Klimas, D.N. Baker, and D.A. Roberts, A description of solar wind-magnetosphere coupling based on nonlinear filters, J. Geophys. Res., 99, 3495, 1994.

Wray, J., and G.G.R. Green, Calculation of the Volterra kernels of nonlinear dynamic systems using an artificial neural network, Biological Cybernetics, 71(3), 187, 1994.

W. Horton, T. Tajima, and R.S. Weigel, Institute for Fusion Studies, The University of Texas, Austin, TX 78712 (e-mail: [email protected])

T. Detman, NOAA Space Environment Center, Boulder, CO 80303

This preprint was prepared with AGU's LaTeX macros v4. File paper2_2col formatted September 21, 1998.

Figure 1. Top panel: 20 min prediction of interval 8 with the network trained on filtered data (ARV = 0.13). Bottom panel: 20 min prediction of the same network trained on unfiltered data (ARV = 0.31). [Both panels plot −AL against Time (hr), comparing Prediction and Actual traces.]

Figure 2. Schematic of the gated network. Three networks (low, medium, high) are trained using different levels of activity, and their outputs y1, y2, y3, together with the past-output vector O = [O(t−τ), ..., O(t−mτ)], become inputs to the gate net, which produces the predicted AL(t). Here S = [I(t−τ), ..., I(t−lτ), O(t−τ), ..., O(t−mτ)].

Figure 3. Comparison of the ARV obtained with the gated network (circles) and without it (stars), as a function of interval number. The training intervals are shown with filled circles. The mean ARV of the nontraining intervals is 0.12 for the gated network and 0.20 for the ungated network.