Neural Comput & Applic (2004) 13: 24–31 DOI 10.1007/s00521-004-0402-7

ORIGINAL ARTICLE

Alfonso Iglesias · Bernardino Arcay · J. M. Cotos · J. A. Taboada · Carlos Dafonte

A comparison between functional networks and artificial neural networks for the prediction of fishing catches

Received: 26 June 2002 / Accepted: 24 February 2004 / Published online: 20 March 2004
© Springer-Verlag London Limited 2004

Abstract In recent years, functional networks have emerged as an extension of artificial neural networks (ANNs). In this article, we apply both network techniques to predict the catches of the Prionace Glauca (a class of shark) and the Katsowonus Pelamis (a variety of tuna, more commonly known as the Skipjack). We have developed an application that will help reduce the search time for good fishing zones and thereby increase the fleet's competitiveness. Our results show that, thanks to their superior learning and generalisation capacities, functional networks are more efficient than ANNs. Our data come from remote sensors. Their spectral signatures allow us to calculate products that are useful for ecological modelling. After an initial phase of digital image processing, we created a database that provides all the necessary patterns to train both network types.

Keywords Artificial neural network (ANN) · Functional network · Remote sensing

A. Iglesias (&) · B. Arcay · C. Dafonte
Dept. of Information and Communications Technologies, University of A Coruna, Spain
E-mail: [email protected]

J. M. Cotos · J. A. Taboada
Remote Sensing Laboratory (TELSIG), Dept. of Electronics and Computation, University of Santiago, Spain

1 Introduction

In recent years, a large number of national and international organisations have focused on the increasing role of the ocean as a world-wide source of provision. The world population grows incessantly, and its nourishment depends more and more on the ocean's natural storeroom. This is why the main preoccupation of the organisations concerned has become the management and protection of these limited natural resources. Already

most of the commercial fisheries have been affected by a policy of quotas, biological closures and the establishment and enlargement of exclusive economic areas by the coastal states. An activity that was in the hands of local fishermen until only a few years ago has rapidly adapted itself to modern times and changed into a worldwide business that moves large sums of money and relies on the latest technologies. Fishing, just like any other industrial activity, has become subject to the volatility of the free market and to the laws of supply and demand; in order to survive, it must decrease its exploitation costs and increase its catch levels. While commercial agreements open the doors of our markets to the very competitive fish prices of other countries, the catch levels of some of our most profitable species are decreasing at an alarming rate due to flagrant overexploitation of the fishing grounds. Since environmental conditions play a fundamental role in fishing operations, monitoring these variables may improve the exploitation results.

Numerous articles use artificial neural networks (ANNs) to analyse the relationship between diverse species and their most appropriate fishing environment. Komatsu et al. [1], for instance, predict the catches of Japanese sardines by analysing the weights that result from a backpropagation training. Brosse et al. [2] also use backpropagation for the prediction of fish abundance in lakes. Other authors, such as Dreyfus-Leon [3], have applied ANNs to the creation of a model of fishermen's behaviour.

This article proposes functional networks to improve the results of a given ANN application for the estimation of fishing possibilities on the basis of environmental and oceanographic conditions. Although other methodologies could be applied to this case, we believe that the nature of the problem leads to the use of ANNs and functional networks.
There is no mathematical model, not even a complex one, and the causal relations are not clearly established. Several authors confirm this statement: Brosse et al. [2] compare the results of a multiple linear regression to an ANN,


and prove that ANNs are much more precise than the two attempted regressions. Groves et al. [4] compare a backpropagation network to a Cox regression for the medical prediction of a certain type of childhood leukemia; their prognoses with the Cox regression are similar to those based on a composition of two ANNs. Other factors in favour of ANNs are their enormous generalisation capacity, the possibility of including both discrete and continuous variables (Aussem et al. [5]), and the fact that there is no previous knowledge and that, consequently, we cannot use a rule-based system. In future articles, we will present a hybrid system of neural-functional networks and expert systems based on the knowledge gathered by both networks. This system will increase our prediction capacity and diminish the error rate.

2 Functional networks

Castillo et al. [6] present functional networks as an extension of ANNs. Our definition is simple but rigorous: a functional network is a network in which the weights of the neurons are substituted by a set of functions. Some of its advantages include the following:

1. Unlike ANNs, functional networks can reproduce certain physical characteristics that lead to the corresponding network in a natural way. However, this reproduction only takes place if we use an expression with a physical meaning inside the function database; since we do not have that kind of information, this particular advantage does not apply in our case.
2. The network parameters can be estimated by solving a linear system of equations. The solution is fast and unique, and it is the global minimum of an error function.

Castillo et al. [7] summarise all the functional network models that can be applied. We have opted for the simplicity and power of the unicity model, explained by Castillo et al. [8], with three functions f_1, f_2 and f_3 (Fig. 1). Our proposal is to predict the Prionace Glauca catches with the functional network shown in Fig. 2, derived from the network presented by Castillo et al. [8]. Our objective is to calculate the functions {f_i}, i = 1,...,5, that allow us to obtain an acceptable prediction of

Fig. 1 A unicity model

our output variable. In order to obtain a unique solution, we only need to fix the functions {f_i}, i = 1,...,5, at one point, as explained by Castillo et al. [8]. How can we train the network? Consider the following equivalence:

z_j = f_5^{-1}( f_1(x_{1j}) + f_2(x_{2j}) + f_3(x_{3j}) + f_4(x_{4j}) )  \Leftrightarrow  f_5(z_j) = f_1(x_{1j}) + f_2(x_{2j}) + f_3(x_{3j}) + f_4(x_{4j}),   j = 1,...,n training patterns    (1)

The error can be measured as follows:

e_j = f_1(x_{1j}) + f_2(x_{2j}) + f_3(x_{3j}) + f_4(x_{4j}) - f_5(z_j),   j = 1,...,n training patterns    (2)
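To make Eq. (2) concrete, the per-pattern error can be written as a small function. The functions and values below are hypothetical stand-ins for illustration, not the fitted ones from this study:

```python
def pattern_error(fs, x, z):
    """e_j = f1(x1j) + f2(x2j) + f3(x3j) + f4(x4j) - f5(zj) for one training pattern.

    fs : the five functions [f1, f2, f3, f4, f5]; x : the four inputs; z : the output.
    """
    return sum(f(xi) for f, xi in zip(fs[:4], x)) - fs[4](z)

# Toy check: with every f_i the identity, e_j = x1 + x2 + x3 + x4 - z
fs = [lambda t: t] * 5
e = pattern_error(fs, [1.0, 2.0, 3.0, 4.0], 10.0)  # -> 0.0
```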

We now approximate the functions {f_s}, s = 1,...,5, by a linear combination of the known functions of a given family \Phi_s = {\phi_{s1}, ..., \phi_{sm}}, s = 1,...,5. In other words:

f_s(x) = \sum_{i=1}^{m} a_{si} \phi_{si}(x),   s = 1,...,5    (3)
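For instance, with the monomial family {1, x, x^2, x^3} used later in Sect. 4.2, Eq. (3) becomes a dot product between coefficients and powers of x. The coefficients below are arbitrary placeholders:

```python
import numpy as np

def f_s(x, a):
    """Evaluate f_s(x) = sum_i a_si * phi_si(x) for the monomial basis {1, x, x^2, ...}."""
    powers = np.array([x ** i for i in range(len(a))])  # phi_s1(x), ..., phi_sm(x)
    return float(np.dot(a, powers))

a_example = np.array([1.0, -0.5, 0.25, -0.125])  # placeholder coefficients a_s1..a_s4
y = f_s(2.0, a_example)  # 1 - 1 + 1 - 1
```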

The sum of the squares of the errors, with z = x_5, is:

Q = \sum_{j=1}^{n} e_j^2 = \sum_{j=1}^{n} ( \sum_{s=1}^{5} \sum_{i=1}^{m} a_{si} \phi_{si}(x_{sj}) )^2    (4)

For reasons of simplicity, the negative sign associated with f_5 is included in the coefficients a_{5i}. We also need to define certain auxiliary conditions that guarantee the unicity of the solution:

f_k(x_0) = \sum_{i=1}^{m} a_{ki} \phi_{ki}(x_0) = \alpha_k,   k = 1,...,5    (5)

where \alpha_k and x_0 are given constants.

Our objective is to calculate the coefficients a_{ki} in such a way that the estimated error becomes as small as possible. This implies that we have to minimise the error function while taking into account the auxiliary conditions. To this effect, we use Lagrange multipliers and construct the auxiliary function Q_\lambda:

Q_\lambda = \sum_{j=1}^{n} ( \sum_{s=1}^{5} \sum_{i=1}^{m} a_{si} \phi_{si}(x_{sj}) )^2 + \sum_{k=1}^{5} \lambda_k ( \sum_{i=1}^{m} a_{ki} \phi_{ki}(x_0) - \alpha_k )    (6)

We obtain the minimum with the following system of linear equations, where the unknown quantities are the coefficients a_{si} and the multipliers \lambda_t:

\partial Q_\lambda / \partial a_{tr} = 2 \sum_{j=1}^{n} ( \sum_{s=1}^{5} \sum_{i=1}^{m} a_{si} \phi_{si}(x_{sj}) ) \phi_{tr}(x_{tj}) + \lambda_t \phi_{tr}(x_0) = 0,   t = 1,...,5; r = 1,...,m
\partial Q_\lambda / \partial \lambda_t = \sum_{i=1}^{m} a_{ti} \phi_{ti}(x_0) - \alpha_t = 0    (7)

If the set of approximating functions \Phi_s = {\phi_{s1}, ..., \phi_{sm}} (with s = 1,...,5 and m the size of the basis used) is linearly independent, the matrix of the system is nonsingular and the system therefore has a unique solution. This constitutes a clear advantage over conventional networks, in which a final solution can have many relative minima.

Functional networks have been applied successfully to solve differential equations [9] and to predict nonlinear models of time series [6]. These examples show that functional networks are a powerful tool: they generalise the classical ANNs and visibly improve their predictions.

Fig. 2 A functional network for the prediction of Prionace Glauca catches

3 A preliminary study of the training data

Our data sources consist of three different satellites. This section explains the nature of these satellites and the products that each of them provides. Before training the functional networks, we performed a sensitivity analysis to determine whether or not the input data of our network present redundant information. Two methods were used for this analysis:

– Principal component analysis
– Kohonen networks (KNs)

3.1 Data sources

Remote sensing allows us to observe the surface of the earth from a privileged position and on a scale that is ideal for especially designed sensors. The range of oceanic parameters that can be measured from space is very wide, going from temperature, colour and rugosity to the height of oceanic surfaces. Satellite applications in this field are so diverse that they have become an essential part of physical, chemical and biological oceanography. The energy reflected by the oceanic surface, with all its physical characteristics, is detected and processed by spatial platforms. Thanks to the spectral responses obtained by the measurement instruments, we can often determine the nature and/or condition of the radiation source, whose spectral emission and reflectance curves become a spectral signature.

To train our two network types, we used the information of three different satellites: NOAA-14, OrbView-2 and Topex-Poseidon. After digital image processing, we had five different kinds of input data: sea surface temperature (SST), thermal fronts, heating-cooling tendency (thermal anomaly), chlorophyll-a concentration and altimetric anomaly. We obtained the SST, thermal front and thermal anomaly data from the NOAA-14 satellite, the chlorophyll-a concentration from OrbView-2 and the altimetric anomaly from Topex-Poseidon. All these variables have a big influence on the behaviour of the studied fish: the temperature, the altimetric and thermal anomalies and the thermal fronts describe the physical environment, and the chlorophyll-a concentration indicates the evolution of the food chain.

The Prionace Glauca, better known as the blue shark, lives in the Atlantic Ocean and the Mediterranean Sea. The catches that appear in our training set correspond to daily data provided by a collaborating boat: there are catch data, but also the coordinates (longitude and latitude) of the place of the catch. Software developed in C and in IDL 5.1 then uses the satellite images to obtain information for each day and for the given coordinates.

Many researchers use remote sensing and ANNs to predict environmental parameters: Murtagh et al. [10], for instance, apply an ANN and statistical methods to compare physical simulation models with observable data. Other authors, such as Yang et al. [11], use data derived from the SPOT satellite for their estimations of biological parameters and water quality.

3.2 An analysis of the principal components

In some cases, satellites provide us with redundant data. It is therefore interesting to find out before the training whether a correlation exists between the variables: if certain data do not bring new information to the training, we can eliminate those data and so increase the number of patterns available despite processing problems (e.g., the presence of clouds). Initially, we had five different kinds of data: sea surface temperature (SST), thermal fronts, heating-cooling tendency (thermal anomaly), chlorophyll-a concentration and altimetric anomaly. Let us now find out if a correlation exists among them. In order to obtain that information, we analysed the principal components of the entry data of our network,

Table 1 The representation quality in the new factorial space

Variable                    Initial   Extraction
SST-network (X1)            1.000     0.780
Thermal anomaly (X2)        1.000     0.698
Thermal front               1.000     0.723
Chlorophyll (X3)            1.000     0.790
Altimetric anomaly (X4)     1.000     0.988

Table 2 The matrix of rotated components

Variable                    Component 1   Component 2   Component 3
SST-network (X1)            -0.880        ≈0            0.124
Thermal anomaly (X2)        0.873         ≈0            0.111
Thermal front               ≈0            0.850         ≈0
Chlorophyll (X3)            ≈0            0.830         ≈0
Altimetric anomaly (X4)     ≈0            ≈0            0.993

with the following results. Table 1 gives us an idea of the representation quality of each variable in the new factorial space: a high extraction means little information loss. According to Table 1, the best represented variable is the altimetry; the worst represented variable is the thermal anomaly of the waters.

Table 2 shows the five variables projected on three principal components. Two variables, thermal anomaly and front presence, are closely related, since they are represented by a similar projection in principal component 2. As a result, we can use just one of the two variables without any significant loss of information. The correlations can be considered logical, because they are based on products that derive from the same satellite (NOAA-14). Principal component 3 contains information that corresponds to the variables chlorophyll, surface temperature (SST) and altimetry. In component 1, the projections have opposite signs, which leads us to conclude that the chlorophyll concentration increases when the temperature decreases.

Now, in order to obtain sufficient training patterns, we simply need to eliminate one element from our study: the thermal fronts or the thermal anomaly. We will keep the latter, because it provides us with a larger amount of data for training and validation.

3.3 A preliminary study with Kohonen networks

Kohonen networks (KNs) [12] have been used successfully to group correlated data. Foody [13], for instance, applies a self-organising map (SOM) for the classification of vegetation data, and his results are similar to those obtained with three alternative grouping algorithms. We therefore used a KN to validate the results of our principal components analysis. We implemented a KN with five entry units. These units represent the variables that, in a first phase, we tried to

Fig. 3 The proposed Kohonen network

relate to the Prionace Glauca catches. The second layer consists of a bidimensional set of 8×8 neurons (see Fig. 3), onto which the different classes of the entry set are projected. The training parameters of the KN are the following:

1. h(t): the learning speed
2. r(t): the adaptation ratio, i.e., the radius of the area affected by the winning unit
3. Horizontal size: the same value used for the creation of the network

We have used the following values, for 2500 training cycles, to obtain the results presented below: h(0) = 0.999 and r(0) = 3.

Figure 4 shows the bidimensional layer of 8×8 units, grouped according to the winning weights and labelled according to this legend:

X = number of the corresponding neuron or processing unit
Y = neuron(s) of the winning entry layer:
1. SST (sea surface temperature)
2. Heating-cooling tendency (thermal anomaly)
3. Presence of a thermal front, taking into account its force
4. Chlorophyll concentration in mg/m3
5. Altimetry

The connection that clearly exists between entry variables 2 (thermal anomaly) and 3 (presence of a thermal front) is due to the fact that the yellow neurons have learned to respond jointly, as if they were one, to the two stimuli. This means that the information presented by these two variables is redundant, and that the training set can be reduced to four entry variables instead of five. Thanks to this simplification, we now have a larger number of complete patterns at our disposal. For various reasons, satellites are indeed not always able to process the data from determined coordinates on Earth: the presence of clouds, for instance, masks the surface temperature of the sea.
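A minimal SOM along these lines can be sketched as follows (8×8 grid, five inputs, h(0) = 0.999, r(0) = 3 and 2500 cycles as in the text; the training data are synthetic, and the linear decay schedules are our assumption, since the paper does not specify them):

```python
import numpy as np

def train_som(data, grid=8, cycles=2500, h0=0.999, r0=3.0, seed=0):
    """Minimal Kohonen map: grid*grid units, each holding a weight vector the
    size of one input pattern; h(t) and r(t) decay as training advances."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, (grid, grid, data.shape[1]))
    ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    coords = np.stack([ii, jj], axis=2).astype(float)
    for t in range(cycles):
        h = h0 * (1.0 - t / cycles)                # learning speed h(t)
        r = max(r0 * (1.0 - t / cycles), 0.5)      # radius of the affected area r(t)
        x = data[rng.integers(len(data))]
        winner = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=2)), (grid, grid))
        d = np.linalg.norm(coords - np.array(winner, dtype=float), axis=2)
        neigh = np.exp(-(d / r) ** 2)[..., None]   # Gaussian neighbourhood of the winner
        W += h * neigh * (x - W)                   # pull weights towards the pattern
    return W

# Synthetic stand-in for the five normalised input variables
data = np.random.default_rng(2).uniform(0.0, 1.0, (95, 5))
W = train_som(data)
```

After training, redundant inputs show up as units whose weight vectors respond jointly to two variables, which is the effect described above for the thermal anomaly and the thermal front.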


Fig. 4 Training results

Figure 4 does not show the relationship between the sea surface temperature and the chlorophyll concentration, because it is limited to the winning variables with a dominant weight; the variables with inhibiting behaviour are left out. This result is coherent with the principal components analysis. The relationships between variables 1 and 3, and between 1, 2 and 3, in the SOM are less clear and did not appear in our previous study, because there the data were represented in three variables, which caused an inevitable loss of information (cf. Table 1). Principal component 3 (cf. Table 2) reflects the grouping of processing units with winning weights of variables 1 and 5, i.e., SST and altimetry; it is less strong than the previous ones.

4 Results

4.1 Results of the ANNs

Before commenting on the results, here is a clarification of the symbols that refer to the parameters of the training algorithm:

– η: the learning parameter, which specifies the step width of the gradient descent. The bigger η, the faster the learning. In our study, we vary this parameter between 1 and 2.
– μ: the momentum term, which specifies the proportion of the previous weight change that is added to the current change. Typical μ values are in the (0, 1) interval; we have fixed it at 0.05.
– c: the flat-spot elimination constant. This value is added to the derivative of the activation function in order to prevent the weight changes from becoming 0 at points with a zero derivative. Typical c values are in the (0, 0.25) interval; we use 0.1.
– dmax: the maximal difference between a teaching value and a network output o_j that is still propagated back as δ_j = 0. If we must consider values above 0.9 as 1, and values below 0.1 as 0, then dmax = 0.1. This avoids excessive training of the network. Typical dmax values are 0, 0.1 or 0.2; we fixed this parameter at 0.1.

Fig. 5 Some of the tested network topologies. We can see four inputs (SST, chlorophyll, the heating-cooling thermal anomaly and the altimetric anomaly). The output is the catches/effort of Prionace Glauca

While creating the network, choosing the training patterns and training the network, we took into account the following points:

– The dimension of the network: in general, a total of three layers is sufficient, although the number of units and layers usually depends on the problem at hand. In the case of the fishing conditions study, we eliminated an entry that did not provide significant information and kept only four parameters. The output is the catch information of the collaborating boat.
– The number of units in the hidden layer: we studied the behaviour of the network according to the number of units in the hidden layer. Using as few units as possible makes it easier to extract knowledge rules. If the network does not reach a solution, more hidden nodes may be needed; if it does, we can try to eliminate a node. On the other hand, a large number of neurons may imply that the network is simply memorising the training set, which means very little generalisation and an increased computing time [14].
– Training data: we could use all available patterns to train the network, but this is neither necessary nor recommendable. We have divided all the data into two different sets: the training set and the validation set. The latter is used to verify whether, in the course of the training, the network shows a small error for data that is not used in the learning algorithm. In general, this type of network generalises easily: given a set of distinct entry vectors, the perceptron learns to adapt to the similarities of these vectors and to ignore the irrelevant ones. At any rate, this extrapolation would be erroneous if we trained the network inadequately or insufficiently: if we used, for instance, only one concrete class of entry vectors, the subsequent identification of the members of this class could be imprecise. Training data must therefore cover the entire expected entry space. Also, we have normalised the entry data as well as the output data inside the [0,1] interval.

29

– Learning weights and parameters: the initial weights of the connections between neurons must be given small, random initial values (±0.5). The selection of the value of η has a significant effect on the network's performance: if η is small, more iterations will be necessary and learning will be slower; if η is very big, the network may move away from the minimum. On the other hand, in the presence of μ (the momentum), we add a portion of the old change to the actual weight change, which tends to keep the weight changes moving in the same direction.
– Local minima in the error space of the weights: if we reach a local minimum, the error may be excessively big. If a network stops learning before it has reached an acceptable solution, we can usually solve the problem by changing the number of hidden nodes, or by starting all over with a different set of initial weights.

We have tried several network topologies with between two and four hidden nodes, and presented the same training and validation sets to these different topologies. The training set consisted of 72 different patterns; the validation set counted 23. We opted for a network with the following characteristics:

1. Two neurons in the hidden layer. This configuration has proven to be the best for our application.
2. The parameters of the backpropagation algorithm affected the training's speed, not its result. More concretely, the weights for η = 1.5, η = 1.75 and η = 2.0 are very similar.

Figure 6 shows how the error decreased as the iterations of the training algorithm increased. The upper curve represents the MSE on the validation set, the lower curve the MSE on the training set. We can see that, as the training advances, the error of both sets diminishes and reaches an acceptable minimum.

4.2 The results of the functional networks

We have trained the functional network of Fig. 2 according to the algorithm described earlier. We used the same data as in the ANN's training so as to compare the results.

The basis for the calculation of functions {f_i}, i = 1,...,4, is the function family {1, x, x^2, x^3}. For f_5, however, we used the family {1, x}, because it allowed us to obtain an easily invertible function, as stipulated by the linear model [7]. The only criterion for our choice of function bases was simplicity: there is no model or mathematical equation that provides us with preliminary information. If a mathematical function with a physical meaning existed in our problem, it would belong to the family of approximating functions; we would also be giving our processing units a physical interpretation that does not exist in ANNs.

After solving the system of equations, the following approximation provides us with the global minimum of the error function:

f_1 = 1.00213 − 0.00744703x + 0.0231891x^2 − 0.0178763x^3
f_2 = 0.997519 − 0.0125818x + 0.0299628x^2 − 0.0148996x^3
f_3 = 1.0024 − 0.00504234x + 0.0083502x^2 − 0.00570404x^3
f_4 = −3.84602 + 12.2356x − 26.8179x^2 + 19.4283x^3
f_5 = 0.998144 − 0.00185626x

With these functions, we have calculated the MSE of the functional network on the training set and the validation set, with the results in Table 3. As we can see, in both cases the results of a network trained with a backpropagation algorithm are improved considerably.

We will repeat this analysis for another species, the Katsowonus Pelamis or Skipjack, to test the generalisation capacity of the functional networks with validation sets of a larger size. The historic data come from the Indian Ocean, off the coast of Somalia:

– SSTMax: the monthly averages of daily maximum temperatures.
– SSTMin: the monthly averages of daily minimum temperatures.
– SSTMedia: the monthly averages of daily mean temperatures.
– X and Y: the monthly wind averages. They play an important role in the mixing of surface waters, which improves the growth of phytoplankton and zooplankton and thereby the entire food chain.
– Altimetry: the monthly altimetric anomaly average, obtained from images of the Topex-Poseidon satellite.
– Catch/effort: the monthly average of Skipjack catches, divided by the hours of effort that were necessary.

Unlike the Prionace Glauca study, the fields in this case were obtained through the collaboration of the boats of

Fig. 6 A representation of the MSE according to the training cycles and the following parameters: η = 1.25, μ = 0.05, c = 0.1, dmax = 0.1

Table 3 A comparison of the results of the prediction of Prionace Glauca catches

                      ANN      Functional network
MSE training set      0.009
MSE validation set    0.019
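As a closing sketch, the fitted functions listed in Sect. 4.2 can be evaluated directly to produce a prediction. We transcribe the printed coefficients and assume that the printed f5 is the function to invert (during training, its sign is folded into the coefficients); the input values are hypothetical normalised values, not real patterns:

```python
# Transcription of the fitted basis functions from Sect. 4.2
def f1(x): return 1.00213 - 0.00744703 * x + 0.0231891 * x**2 - 0.0178763 * x**3
def f2(x): return 0.997519 - 0.0125818 * x + 0.0299628 * x**2 - 0.0148996 * x**3
def f3(x): return 1.0024 - 0.00504234 * x + 0.0083502 * x**2 - 0.00570404 * x**3
def f4(x): return -3.84602 + 12.2356 * x - 26.8179 * x**2 + 19.4283 * x**3
def f5(z): return 0.998144 - 0.00185626 * z

def predict(x1, x2, x3, x4):
    """z = f5^{-1}(f1(x1) + f2(x2) + f3(x3) + f4(x4)); f5 is linear,
    so inverting it is a one-line rearrangement."""
    s = f1(x1) + f2(x2) + f3(x3) + f4(x4)
    return (0.998144 - s) / 0.00185626

z_hat = predict(0.5, 0.5, 0.5, 0.5)  # hypothetical normalised inputs
```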