AN APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ENVIRONMENTAL POLLUTION FORECASTING

Emil Lungu, Mihaela Oprea, Daniel Dunea
University Petroleum-Gas of Ploiesti, Department of Informatics
Bd. Bucuresti Nr. 39, Ploiesti, 100680, Romania
[email protected], [email protected], [email protected]


ABSTRACT
The paper presents an application of feed-forward artificial neural networks to short-term air pollution forecasting. The time series used in the experiments are measurements of air pollutants specific to urban regions (e.g. NO2, SO2, PM10, TSP). Several tests were run in order to obtain the best results for the environmental pollution forecasting problem. We focus on comparisons between the RProp and Quickprop training algorithms with respect to the MSE obtained after a fixed number of epochs. The experimental results are similar, with a slightly better performance for RProp.


KEY WORDS
Artificial neural networks, environmental protection, time series forecasting

1. Introduction

The improvement of environmental protection management involves the development of efficient software instruments that assist experts in making their decisions. Environmental pollution forecasting is one of the critical problems that must be solved in real time. Based on the time series of pollutant concentrations over a certain time window, a forecasting system must provide information on the near-future evolution of those concentrations. Such information is usually correlated with meteorological forecasts. The feed-forward artificial neural network is one of the artificial intelligence techniques that has proved to be a good forecasting instrument [1]. The paper presents an application of feed-forward artificial neural networks to short-term air pollution forecasting. The time series used in the experiments are measurements of air pollutants specific to urban regions (NO2, SO2, PM10, total suspended particles). In order to obtain the best results for the environmental pollution forecasting problem, several tests were run. This paper describes a detailed analysis of the effects that different parameters of the feed-forward neural network have on the accuracy of the forecasted environmental data. The paper is organized as follows. Brief information concerning the field of environmental protection and the time series forecasting problem is given in section 2. The system RNA-AER is described in section 3, and the experimental results obtained so far are discussed in section 4. Finally, the conclusions of the paper and the future work are presented in section 5.

2. Environmental protection and time series forecasting

In recent years environmental problems have become of great importance due to their impact on the quality of life on Earth. The pollution of air, soil and water has major effects on human health as well as on all the ecosystems of our planet. The dramatic effects of environmental pollution can be reduced by developing efficient systems that monitor, analyse, forecast and control the evolution of pollutant concentrations for a specific type of environment (air, soil, water). The databases that collect data (e.g. pollutant concentrations) in such systems can be used to generate the time series used in forecasting systems. Artificial neural networks are one of the artificial intelligence techniques that provide an efficient forecasting instrument. Several forecasting systems that use different types of neural networks have been reported in the literature. [2] describes a feed-forward neural system applied to forecasting ecological phenomena. A neuro-fuzzy approach used for short-term weather forecasting is discussed in [3]. An LVQ network combined with genetic algorithms, presented in [4], is applied to fog occurrence forecasting in Japan. [5] describes an application of a feed-forward neural network combined with genetic algorithms to rainfall forecasting. Some recently reported forecasting neural systems can be found in [6], [7], [8] and [9]. Most forecasting systems are applied in meteorology, and few in the environmental pollution domain. Taking into account that environmental problems are of great concern, it is necessary to use efficient tools for making accurate predictions of the evolution of a certain environmental process. Based on the measured (observed) data, such tools have to recognize patterns and, together with information about seasonal factors, draw the future trend of the process when its recent evolution is close to a known pattern.

Drawing the trend and accurately anticipating changes in the evolution of a process allow efficient planning of future activity, in order to diminish the impact when some prescribed values are exceeded (e.g. the maximum admissible concentration of a specific pollutant). The field of statistics which deals with the analysis of time-dependent data is called time series processing. One of the most widespread types of processing is time series forecasting. Many techniques have been developed and are used in practice, for example: random walks, moving averages, trend models, seasonal exponential smoothing, ARIMA parametric models, etc. Contributions of research in several branches of mathematics and computer science (operational research, numerical analysis, neural networks, evolutionary computing and fuzzy logic) have led to further developments in the field of time series processing. As a consequence, new methods were proposed as attractive alternatives to the classical Holt-Winters (1960) and Box-Jenkins (1976) models. Neural networks, for example, have given good results in time series processing when the data present noise and nonlinear components. Their capacity for learning and generalization recommends them as good tools in a wide range of applications. The most popular architecture used in practice is the multilayer feed-forward neural network. Its processing units (neurons) are organized in layers, and there exist only forward connections (that is, their orientation is from the input layer toward the output). This type of network started to be used extensively in the late 1980s, when the standard back-propagation algorithm was introduced. Since then, multilayer feed-forward artificial neural networks have had wide applicability (finance, health, meteorology, environmental protection). Subsequent research has been oriented towards finding faster algorithms for training the network and providing algorithms to automate the design of an optimal network topology for a specific problem. Among the many faster algorithms we can mention standard back-propagation with momentum or with a variable learning rate, the adaptive RPROP, and algorithms based on standard numerical optimization techniques (Fletcher-Powell, conjugate gradient, quasi-Newton algorithms, Levenberg-Marquardt).
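To make the architecture concrete, the following minimal sketch shows the forward pass of such an n1-n2-n3 network; the tanh activation matches the one used by the system described in section 3, but the variable names and shapes are illustrative only, not taken from the authors' implementation.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of an n1-n2-n3 feed-forward network with tanh units.

    x  : input vector, length n1
    W1 : hidden-layer weights, shape (n2, n1); b1 : hidden biases, length n2
    W2 : output-layer weights, shape (n3, n2); b2 : output biases, length n3
    """
    h = np.tanh(W1 @ x + b1)       # hidden layer (symmetric sigmoid)
    return np.tanh(W2 @ h + b2)    # output layer
```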

3. Description of the system RNA-AER

We have developed the forecasting system RNA-AER for the domain of air pollution forecasting in urban regions (RNA-AER is the Romanian abbreviation of Artificial Neural Network for Air Pollution). It is part of a complex system, based on several Artificial Intelligence techniques (multi-agent systems, knowledge-based systems, artificial neural networks, neuro-fuzzy methods), that is designed to analyse the pollution level of air, water and soil. At present, the RNA-AER system consists of a feed-forward neural network with a single hidden layer. Figure 1 shows the architecture of the network.

Figure 1 Feed-forward neural network with one hidden layer

The activation function used for the hidden and output layers was the symmetric sigmoid function (tanh). Since this function maps the real axis into the interval (-1, 1), the data were normalized before use and transformed back into their real values after simulation. In the training stage we used different learning algorithms, but the best results were obtained with RPROP and QUICKPROP.
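Because tanh maps into (-1, 1), the data must be scaled before training and mapped back afterwards. The paper does not specify the exact transform, so the min-max scaling below (into a slightly smaller interval, a common precaution) is only an assumed, illustrative choice.

```python
import numpy as np

def normalize(series, lo=-0.9, hi=0.9):
    """Scale a raw series into [lo, hi], inside tanh's output range (-1, 1).
    Returns the scaled series and the parameters needed to invert the map."""
    s_min, s_max = float(series.min()), float(series.max())
    scaled = lo + (series - s_min) * (hi - lo) / (s_max - s_min)
    return scaled, (s_min, s_max, lo, hi)

def denormalize(scaled, params):
    """Transform network outputs back into their real (physical) values."""
    s_min, s_max, lo, hi = params
    return s_min + (scaled - lo) * (s_max - s_min) / (hi - lo)
```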

The RPROP algorithm, introduced by Riedmiller and Braun [10], is a supervised batch learning algorithm which accelerates the training process in the flat regions of the error function and when the iterations get near a local minimum. The algorithm maintains a separate update step for each weight. These steps are changed adaptively according to sign changes of the corresponding partial derivative of the error function; they change progressively, but without leaving an initially prescribed interval. The algorithm is described by four parameters, denoted $\eta^+$, $\eta^-$, $\Delta_{\max}$, $\Delta_{\min}$. The first two parameters give the increasing and decreasing factors for adjusting the update step and are chosen such that $0 < \eta^- < 1 < \eta^+$. The step size of the update is bounded by $\Delta_{\min}$ and $\Delta_{\max}$. For the tests we ran, the following values of these parameters were used: $\eta^+ = 1.25$, $\eta^- = 0.5$, $\Delta_{\max} = 50$, $\Delta_{\min} = 0$.
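As an illustration, here is a compact sketch of one RPROP update with the parameter values above. It implements the simpler variant without weight backtracking; the gradient is assumed to come from a separate batch back-propagation pass.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.25, eta_minus=0.5, d_max=50.0, d_min=0.0):
    """One RPROP update over arrays of weights.
    `delta` holds the per-weight step sizes, adapted from sign changes
    of the corresponding partial derivatives of the error function."""
    change = grad * prev_grad
    # same sign: grow the step (up to d_max); sign flip: shrink it (down to d_min)
    delta = np.where(change > 0, np.minimum(delta * eta_plus, d_max), delta)
    delta = np.where(change < 0, np.maximum(delta * eta_minus, d_min), delta)
    grad = np.where(change < 0, 0.0, grad)   # skip the update right after a flip
    w = w - np.sign(grad) * delta
    return w, grad, delta
```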


QuickProp is a batch training algorithm introduced by Fahlman in 1988 [11] which tries to take into consideration information about the second-order derivatives of the error function. In [12] it is shown that QuickProp is a particular case of the multivariate generalization of the secant method for nonlinear equations. The local minimum of the batch error function is reached at a critical point, which is a zero of the gradient. In order to find the zeros of such a system one may apply Newton's iteration. Although this kind of iteration may converge quadratically near the solution, it is strongly dependent on the initial iterate, and the amount of work involved is expensive. In practice, Newton's iteration is replaced by a quasi-Newton iteration, which uses an approximation of the Jacobian and saves much of the computation. Approximating the Jacobian by a diagonal matrix whose entries are computed with finite difference formulas shows that QuickProp belongs to this category of quasi-Newton iterations. Its convergence is no longer quadratic, but it remains superlinear in a vicinity of the solution. In all our experiments with the Quickprop algorithm we used the same value (equal to 1.75) for the maximum growth factor, denoted by $\mu$ in [11].
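A corresponding sketch of one Quickprop step follows; the secant update and the growth bound $\mu = 1.75$ match the description above, while the fallback plain gradient step (used when the secant formula is unusable, e.g. on the first epoch) is an assumption of this sketch.

```python
import numpy as np

def quickprop_step(w, grad, prev_grad, prev_dw, lr=0.1, mu=1.75):
    """One Quickprop update: a secant step that treats the Jacobian of the
    gradient as diagonal, with each weight change limited to mu times the
    previous change (the maximum growth factor)."""
    denom = prev_grad - grad
    usable = (np.abs(denom) > 1e-12) & (prev_dw != 0)
    safe = np.where(usable, denom, 1.0)            # avoid division by ~0
    dw = np.where(usable, prev_dw * grad / safe, -lr * grad)
    limit = mu * np.abs(prev_dw)
    dw = np.where(usable, np.clip(dw, -limit, limit), dw)
    return w + dw, dw
```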

3.1 Implementation

All tests reported in this paper were done using an application implemented in C++. The application offers a friendly user interface from which one may choose the different parameters that describe the network and the training algorithm. The program takes the raw data from a one-column text file and applies the necessary transformations in the pre-processing stage. After training, the application tests the network on the validation set of samples and reports the error. The user can then view graphics of the evolution of the error during the training process and of the actual and forecasted data. The different parts of this application will be adapted and used as the forecasting module in the final RNA-AER forecasting system, which will monitor the concentration levels of the main air pollutants.

4. Experimental results

The experimental data used in our study were collected at the main center of the Environmental Protection Agency in Targoviste. They are daily measured data starting from 3 January 2005. The main pollutants that we analysed were NO2, SO2, PM10 and total suspended particles (TSP). It is known that the use of raw data rarely gives satisfactory results: the training of the ANN then captures only general properties of the data series, without being able to identify more refined characteristics. Therefore, a pre-processing step is needed in which the initial data are transformed so that the new data series excludes some characteristics from the analysis. Before feeding data to the inputs of our network we applied a moving average technique, which acts as a smoother, removing the outliers from the initial data. The resulting series is then used to extract the samples needed to train the network. Since our goal is one-day-ahead forecasting, each sample has the form $(x_{t-k+1}, x_{t-k+2}, \ldots, x_t, x_{t+1})$, and the whole set of samples was obtained by a moving window technique. Here, $x_{t+1}$ represents the forecast target, while the other values are the corresponding inputs. Three fourths of this set was used for training, while the rest was used in the validation process. The following experiments show how different parameters describing the neural network model affect the accuracy of the forecasted data. The topology of the neural network is denoted n1-n2-n3, where n1 is the number of nodes in the input layer, n2 the number of nodes in the hidden layer, and n3 the number of nodes in the output layer. Since the training is sensitive to the initial values of the weights, for all tables presented below 10 tests were performed for each algorithm and the mean of the resulting values was taken.
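The pre-processing and sample-extraction pipeline just described can be sketched as follows; the smoothing window length and k are illustrative assumptions, since the paper does not state them (the 6-x-1 topologies suggest k = 6 inputs).

```python
import numpy as np

def moving_average(series, w=3):
    """Smooth the raw series; the moving average acts as a smoother,
    removing the outliers from the initial data."""
    return np.convolve(series, np.ones(w) / w, mode="valid")

def make_samples(series, k=6):
    """Moving window: each sample is (x_{t-k+1}, ..., x_t, x_{t+1}),
    i.e. k consecutive inputs and the next value as the forecast target."""
    X = np.array([series[t - k:t] for t in range(k, len(series))])
    y = series[k:]
    return X, y

def split(X, y, train_frac=0.75):
    """Three fourths of the samples for training, the rest for validation."""
    n = int(len(X) * train_frac)
    return (X[:n], y[:n]), (X[n:], y[n:])
```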

4.1 Experiment 1

The first study uses as experimental data the daily mean concentration of NO2. Figure 2 presents a comparison of the normalized training error for a 6-4-1 feed-forward neural network trained with the following algorithms: Rprop, Quickprop and standard steepest descent with momentum. The initial values of the network weights were generated randomly in the interval (-0.1, 0.1).

Figure 2 Evolution of the training error (MSE vs. epochs) for different learning algorithms

Figure 3 Real and forecasted concentration of NO2 (mg/m3)

Figure 3 shows the real and forecasted data in the case of training with the RPROP algorithm. After 2000 epochs the normalized mean squared error was 0.00459 for the training data and 0.0078 for the validation data.
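The evaluation protocol behind the tables that follow can be summarized by the sketch below; `train_network` is a placeholder for any of the training routines (it is not the authors' C++ interface), and the normalized MSE is computed on the normalized data.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between target and network output."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def averaged_errors(train_network, X_tr, y_tr, X_va, y_va, runs=10):
    """Train several times from random initial weights in (-0.1, 0.1) and
    report the mean training/validation MSE, as done for all tables."""
    tr, va = [], []
    for _ in range(runs):
        net = train_network(X_tr, y_tr, init_range=0.1, epochs=2000)
        tr.append(mse(y_tr, net.predict(X_tr)))
        va.append(mse(y_va, net.predict(X_va)))
    return float(np.mean(tr)), float(np.mean(va))
```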


Table 1 presents how the training and validation errors depend on the number of neurons in the hidden layer.

Table 1 Experimental results for NO2

Network topology   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                   Training    Validation   Training        Validation
6-4-1              0.00283     0.00842      0.00350         0.00716
6-8-1              0.00246     0.01014      0.00347         0.00691
6-12-1             0.00222     0.01018      0.00352         0.00696
6-16-1             0.00161     0.01475      0.00338         0.00676

For RPROP, increasing the number of hidden neurons steadily decreases the training error while increasing the validation error, which indicates overfitting on the training data. By contrast, the QUICKPROP algorithm shows some oscillations with respect to the increasing number of hidden neurons; in this case the introduction of more neurons in the hidden layer brings some edges with very low weights.

4.2 Experiment 2

The second study is based on a data series of 201 terms representing the daily mean concentration of SO2. Repeating the above strategy, we obtain similar behavior for the training errors (figure 4). As in the first case, among the tested algorithms RPROP has the fastest decay, but this happens only in the first few iterations; afterwards the error decays slowly.

Figure 4 Evolution of the error (MSE vs. epochs) in the training process

Figure 5 Real and forecasted concentration of SO2

From figure 5 we notice good agreement between the real and forecasted data on the training part and a few significant differences between them on the validation part (the right quarter of the graphic). These differences can be made less significant if, for example, we take a longer part of the data series for training the network. To obtain the graphic in figure 5 we trained our 6-4-1 network using 146 samples. After 2000 epochs we obtained a normalized MSE of 0.00348 for the training set and 0.00519 for the validation set. If we extend the training set to 150 samples (which includes the point with the largest difference between real and forecasted data in figure 5), the training error changes to 0.00404 while the validation error decreases to 0.0034. Table 2 shows again (as in table 1) that taking more than 4 neurons in the hidden layer and using RPROP as the learning algorithm results in overfitting on the training data.

Table 2 Experimental results for SO2

Network topology   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                   Training    Validation   Training        Validation
6-4-1              0.00377     0.00387      0.00595         0.00333
6-8-1              0.00248     0.00803      0.00585         0.00326
6-12-1             0.00228     0.00682      0.00596         0.00329
6-16-1             0.00191     0.00883      0.00593         0.00327

4.3 Experiment 3

The next experiment refers to the time series representing the concentration of PM10. In this case the series has only 101 terms; the lack of a longer data series makes the training less effective. The number of network inputs has a major influence on the forecasting performance. Table 3 shows how the training error depends on the number of network inputs. For each case we used the same number of samples (equal to 80). Increasing the number of network inputs decreases the number of testing samples. Even so, the table shows an increase of the MSE on the validation data. This shows that increasing the number of input neurons improves the capability of the network to respond very well for data close to those used in the training process, but the network loses its generalization abilities.

Table 3 Experimental results for PM10

Network topology   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                   Training    Validation   Training        Validation
2-4-1              0.00984     0.01085      0.01143         0.00858
4-4-1              0.00494     0.01644      0.00948         0.00942
6-4-1              0.00408     0.02051      0.00873         0.00941
8-4-1              0.00311     0.02265      0.00731         0.01356

Figures 6 and 7 present the graphics for 2 and 6 neurons in the input layer, respectively.

Figure 6 Real and forecasted concentration of PM10 (2-4-1)

Figure 7 Real and forecasted concentration of PM10 (6-4-1)

Table 4 shows how the network training and testing depend on the number of training samples. The network topology was 2-4-1.

Table 4 Experimental results for PM10 (2-4-1)

No of training samples   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                         Training    Validation   Training        Validation
70                       0.01053     0.00961      0.01233         0.00743
80                       0.00984     0.01085      0.01143         0.00858
90                       0.00955     0.01297      0.01068         0.01261

4.4 Experiment 4

In the following experiment we used a data series of 221 terms representing the concentration of total suspended particles. Figures 8 and 9 show the real data and the network outputs (after training with RProp) for the 2-4-1 and 6-4-1 topologies, respectively. From the right quarter of these graphics we see that taking a higher number of network inputs leads to poorer forecasting capabilities. In the first case the training and validation errors were 0.00889 and 0.00911; in the second case the training error was 0.00539 and the validation error was 0.02473. In both cases the training set represents three fourths of the total set of samples.

Figure 8 Real and forecasted concentration of TSP (2-4-1)

Figure 9 Real and forecasted concentration of TSP (6-4-1)

Table 5 Experimental results for TSP

Network topology   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                   Training    Validation   Training        Validation
2-4-1              0.00889     0.00991      0.01015         0.00778
4-4-1              0.00693     0.01125      0.00905         0.00695
6-4-1              0.00559     0.01769      0.00857         0.00686
8-4-1              0.00473     0.01613      0.00820         0.00657

Table 5 is similar to table 3. From the columns corresponding to Quickprop we see that by increasing the number of inputs we obtain an improvement for both the training and the testing of the network. Table 6 shows that, for a 2-4-1 topology, the Rprop algorithm needs more training data in order to improve its forecasting properties. By contrast, Quickprop is able to identify the pattern of the data series quite well even if only 75% of the whole set of samples is used for training.

Table 6 Experimental results for TSP (2-4-1)

No of training samples   RPROP MSE   RPROP MSE    QUICKPROP MSE   QUICKPROP MSE
                         Training    Validation   Training        Validation
164 (75%)                0.00905     0.00931      0.01015         0.00777
175 (80%)                0.00889     0.00813      0.00982         0.00809
197 (90%)                0.00864     0.00813      0.00962         0.00827

5. Conclusion and future work

The paper described RNA-AER, a short-term forecasting system applied in the area of air pollution in urban regions. The time series used in the experiments include data regarding the concentrations of the most important air pollutants (SO2, NO2, PM10, total suspended particles). The best training algorithms in our experiments were RPROP and QUICKPROP. Taking into account that the time series specific to air pollutant concentrations usually have no regularities, a short-term forecasting neural instrument has to be designed by experiment, testing different topologies of the neural network and different constant and variable values for the parameters of the network (e.g. learning rate, momentum). In this paper we have presented a detailed analysis of such experiments made with our neural forecasting system RNA-AER. Our future work has three directions of research. The first will extend our investigation to long-term forecasting: we will consider networks with more outputs and will implement algorithms that use the predicted values to make new predictions. The second research direction will be oriented towards the implementation of evolutionary algorithms to automate the design of the network topology. Finally, we will focus our attention on the implementation of a selection method that takes several training algorithms into account and chooses the one with the best forecasting properties.

Acknowledgements

The research work reported in this paper is funded by a Romanian Postdoctoral Programme under the CEEX research project no. 1533/2006.

References

[1] E. A. Plummer, Time Series Forecasting with Feed-Forward Neural Networks: Guidelines and Limitations, Master Thesis, University of Wyoming, Department of Computer Science, July 2000.
[2] M. Oprea, Some Ecological Phenomena Forecasting by Using an Artificial Neural Network, Proc. of the 16th IASTED Int. Conf. Applied Informatics, Garmisch-Partenkirchen, Germany, 1998, ACTA Press, 30-33.
[3] M. Teshenehlab, N. Sarmadi, Short-Term Weather Forecasting Using Neuro-Fuzzy Approach, Proc. of the 16th IASTED Int. Conf. Applied Informatics, Garmisch-Partenkirchen, Germany, 1998, ACTA Press, 143-147.
[4] Y. Mitsukura, S. Ito, M. Fukumi, N. Akamatsu, Genetic Fog Occurrence Forecasting System Using a LVQ Network, Proc. of the 16th IASTED Int. Conf. Applied Informatics, Garmisch-Partenkirchen, Germany, 1998, ACTA Press, 200-203.
[5] S. Ito, Y. Mitsukura, M. Fukumi, Rainfall Forecast Using a Neural Network with a Real-Coded Genetical Preprocessing, Proc. of the 16th IASTED Int. Conf. Applied Informatics, Garmisch-Partenkirchen, Germany, 1998, ACTA Press, 210-213.
[6] D. Wieland, F. Wotawa, G. Wotawa, From neural networks to qualitative models in environmental engineering, Computer-Aided Civil and Infrastructure Engineering, Blackwell Publishers, 17(2), 2002, 104-118.
[7] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks - A case study for ozone forecasting, AI Communications, IOS Press, 14(1), 2001, 23-33.
[8] M. Oprea, A case study of knowledge modelling in an air pollution control decision support system, AI Communications, IOS Press, 18(4), 2005, 293-303.
[9] V. Demyanov, M. Kanevski, E. Savelieva, V. Timonin, S. Chernov, V. Polishuk, Neural network residual stochastic cosimulation for environmental data analysis, Proc. of the 2nd ICSC Symposium on Neural Computation NC'2000, Berlin, Germany, 2000, 647-653.
[10] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: The RPROP algorithm, in H. Ruspini, editor, Proc. of the IEEE Int. Conf. on Neural Networks, San Francisco, 1993, 586-591.
[11] S. E. Fahlman, Faster learning variations on backpropagation: an empirical study, in D. S. Touretzky, G. E. Hinton, and T. J. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, San Mateo, CA, 1988, 38-51.
[12] M. N. Vrahatis, G. D. Magoulas, V. P. Plagianakos, Convergence analysis of the quickprop method, Proc. of the International Joint Conference on Neural Networks (IJCNN'99), Washington DC, #848, Session 5.3, 1999.