
Hydrobiologia (2008) 610:99–112 DOI 10.1007/s10750-008-9358-4

PRIMARY RESEARCH PAPER

CHLfuzzy: a spreadsheet tool for the fuzzy modeling of chlorophyll concentrations in coastal lagoons

Georgios K. Sylaios · Nikolaos Gitsakis · Theodoros Koutroumanidis · Vassilios A. Tsihrintzis

Received: 30 July 2007 / Revised: 11 February 2008 / Accepted: 20 February 2008 / Published online: 7 March 2008 © Springer Science+Business Media B.V. 2008

Abstract CHLfuzzy is a user-friendly, flexible, multiple-input single-output Takagi-Sugeno fuzzy rule-based model developed in an MS-Excel spreadsheet environment. The model receives a raw dataset consisting of four predictor variables, e.g., water temperature, dissolved oxygen content, dissolved inorganic nitrogen concentration, and solar radiation levels. It then defines fuzzy sets according to a collection of fuzzy membership functions, allowing for the establishment of fuzzy ‘if–then’ rules, and predicts chlorophyll-a concentrations that compare closely with the measured ones. The performance of the model was tested against the Adaptive Neural Fuzzy Inference System (ANFIS), showing satisfactory results. An extensive dataset of environmental observations in Vassova Lagoon (Northern Greece), collected during the years 2001–2002, was used to train the model, and an independent dataset collected during 2004 was used to validate the CHLfuzzy and ANFIS models. Although both models showed a similar performance on the training dataset, with quite satisfactory agreement between observed and modeled chlorophyll-a values, the best results were obtained using the CHLfuzzy model. Similarly, the CHLfuzzy model showed a fairly good ability to hindcast chlorophyll-a concentrations for the verification dataset, thus improving on the ANFIS model forecasts. Overall results suggest that CHLfuzzy can potentially be used as a lagoon water quality forecasting tool requiring limited computational cost.

Keywords Fuzzy model · Fuzzy rules · Fuzzy inference system · Eutrophication · Phytoplankton · Lagoon management

Handling editor: P. Viaroli

G. K. Sylaios (✉) · V. A. Tsihrintzis
Laboratory of Ecological Engineering and Technology, Department of Environmental Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
e-mail: [email protected]

N. Gitsakis
Department of Hydrobiology and Animal Production, University of Thessaly, Volos, Greece

T. Koutroumanidis
Department of Rural Development, Democritus University of Thrace, Orestiada, Greece

Introduction

Lagoon phytoplankton dynamics appear to be influenced by mass transport (advection and dispersion), exogenous environmental factors (e.g., water temperature, light extinction in the water column), and interactive biochemical kinetics involving the available nutrients (Beck, 2005). Modeling of phytoplankton production in coastal lagoons refers to predicting chlorophyll-a concentrations, used as an aggregate variable that is a function of various independently measured environmental parameters. Models of chlorophyll-a production

may involve dynamic water quality models, where hydrodynamics and biogeochemical routines are coupled (Lonin & Tuchkovenko, 2001; Hartnett & Nash, 2004). Neural network modeling emerged as an alternative for algal bloom prediction in shallow waters, where chlorophyll-a production is associated with a series of environmental factors (Yabunaka et al., 1997; Jeong et al., 2001; Wei et al., 2001; Scardi, 2001; Recknagel et al., 2002). Even though these models generally produce comparable results by minimizing the root mean square error of approximations, they lack explicit representations of the relations between dependent and independent variables (Cao et al., 2006). Current research shows that fuzzy logic techniques are capable of treating the complex and non-linear relationships between environmental variables in natural ecosystems (Jørgensen, 1994; Silvert, 2000; Adriaenssens et al., 2004). Moreover, fuzzy interpretations of existing data may provide powerful and convenient tools in the description of processes and the reconstruction of scenarios for the development of lagoon management strategies (Newton & Mudge, 2005). Several investigators develop and utilize fuzzy routines because these routines are able to deal with imprecise, uncertain or ambiguous datasets, and to discover the underlying non-linear and complex relationships among these datasets (Fdez-Riverola & Corchado, 2004; Chen & Mynett, 2006; Laanemets et al., 2006). Previous applications include the use of fuzzy models in clustering of water quality data (Kung et al., 1992), in assessing the impact of environmental factors on fish distribution (Su et al., 2004), in developing fish-stock recruitment relations (Chen et al., 2000), and in predicting phytoplankton biomass in eutrophic lakes (Bobbin & Recknagel, 2001; Chen & Mynett, 2003), lagoons (Marsili-Libelli, 2004; Giusti & Marsili-Libelli, 2005), rivers (Chang et al., 2001; Maier et al., 2001), and reservoirs (Lu & Lo, 2002; Soyupak & Chen, 2004).
Experience from the above works shows that fuzzy logic models could be used in a number of applications, but generally as a refinement to conventional optimization techniques, in which the usual crisp objective and some or all of the constraints are replaced by the fuzzy constraints (Chau, 2006). More specifically, the Takagi-Sugeno (TS) fuzzy systems have been widely applied in environmental studies due to their simplicity in the inference

procedure and the possibility to incorporate a general condition on the physical structure of the system into the fuzzy system. In this article, a Takagi-Sugeno fuzzy model named CHLfuzzy is presented. It is a user-friendly, flexible, fuzzy rule-based model in the form of a spreadsheet, in which the user imports the raw dataset of four predictor variables; the developed system then defines the fuzzy sets according to a collection of fuzzy ‘if–then’ rules and produces an output that compares closely with the values observed in the real world. Since the construction of the membership function is subjective (Cornelissen et al., 2001), CHLfuzzy was designed to consider various membership functions (triangular, trapezoidal, and sigmoid), depending on the type of application. Standard spreadsheet tools are used to derive the proper values for the coefficients involved in the fuzzy decision rules, in order to minimize the standard error between observed and modeled chlorophyll-a values. CHLfuzzy model performance was tested against existing adaptive algorithms for fuzzy inference systems, such as the standard Adaptive Neural Fuzzy Inference System (ANFIS), implemented in Matlab 7.0 (Jang, 1993).

Dataset description

CHLfuzzy and ANFIS models were tested using an extensive dataset obtained in Vassova Lagoon, a typical Mediterranean, fishery-exploited, coastal lagoon, located on the northern coast of the Aegean Sea. The lagoon covers an area of 70 ha with a mean depth of 0.8 m and a maximum depth of about 3 m. It is connected to the open sea (Kavala Gulf, North Aegean Sea) with an inlet channel, allowing tidal water to enter the main lagoon body. Although Vassova Lagoon is located within an agricultural watershed, no direct connection with this area exists; however, water (and probably dissolved constituents) may seep into the lagoon through a peripheral levee and/or through groundwater movement, when water surface elevation behind the levee is higher than lagoon water (Tsihrintzis et al., 2007). The relations between estimates of freshwater input and tidal export loading rates suggest that during most of the year dissolved nutrients are transported from the lagoon to the coastal water. During spring, the development of the seasonal thermocline at the adjacent

Kavala Gulf enhances nutrient concentrations, allowing higher nutrient loads to enter the lagoon during the flood tidal phase. In the summer, nutrient loading rates are almost equal in magnitude during both tidal phases (Sylaios et al., 2004, 2006). The training data consist of 330 sample sets of physical and chemical water parameters (water temperature, salinity, dissolved oxygen concentration, pH, turbidity), nutrient concentrations (nitrate, nitrite, ammonium, phosphate, silicate), chlorophyll-a concentrations, hydrodynamic parameters (flow speed and direction) and meteorological data (wind speed, wind direction, precipitation, and incident short-wave solar radiation), collected during the years 2001–2002 in the lagoon. These parameters were sampled at irregular time intervals at a point located at the center of Vassova Lagoon. Data reduction to determine the four most significant abiotic driving factors was achieved by Principal Components Analysis (PCA). According to PCA methodology, when a variable has an insignificant factor loading on a primary component, that variable contributes insignificantly to the total variance of the system and can therefore be eliminated from further consideration (Haan, 1977). Since chlorophyll-a is considered as the fuzzy model output, PCA was performed only on the remaining environmental variables (Chen and Mynett, 2003). Such reduction enables the model user to define the best available correlations between chlorophyll-a and the water quality parameters sampled in the system. The present analysis indicated that the variables mostly influencing chlorophyll-a concentrations (Chl-a) are water temperature (Temp, in °C), dissolved oxygen concentration (DO, in mg/l), dissolved inorganic nitrogen concentration (DIN = nitrate + nitrite + ammonium, in µM) and incident short-wave solar radiation (SR, in W/m²).
This result appeared consistent with the determination coefficients between Chl-a and Temp (R² = 0.35; P = 0.01; t = 2.54; n = 330), DO (R² = 0.54; P < 0.0001; t = 5.9), DIN (R² = 0.34; P = 0.05; t = 2.59) and SR (R² = 0.62; P < 0.0001; t = 3.39). Therefore, the model was executed using these four abiotic driving factors as fuzzy model input variables. Experimental runs with additional fuzzy premises did not result in a remarkable improvement in model performance, while leading to a significant increase in model complexity, computational time and effort. Chlorophyll-a concentrations utilized by CHLfuzzy and ANFIS were ln-transformed, since ln-transformation of the dependent variable allows for the approximation of the normality assumption and leads to a more valid model (Soyupak & Chen, 2004). When a multiple regression model (MLR) was applied to these four significant independent variables, a poor determination coefficient (R² = 0.172) was obtained. Application of fuzzy modeling techniques is expected to resolve significant relations between chlorophyll-a and the selected environmental variables. Data statistics of the sample set used by the fuzzy logic models are shown in Table 1.

Table 1 Statistics of the dataset imported in CHLfuzzy and ANFIS Takagi-Sugeno fuzzy logic models

Variable        Mean      Median    Standard deviation   Minimum   Maximum
Temp (°C)       17.81     19.40     5.48                 8.00      28.10
DO (mg/l)       7.39      8.08      2.70                 2.56      14.63
DIN (µM)        11.81     7.18      13.08                0.0       62.36
SR (W/m²)       324.75    324.45    130.33               56.53     856.71
Chl-a (µg/l)    2.35      4.02      4.27                 0.10      80.53

Model description

General description

A Fuzzy Model is a tool utilizing the information observed from a complex phenomenon to derive a quantitative model. A fuzzy system is a nonlinear mapping between inputs and outputs. The mapping of inputs to outputs is in part characterized by a set of ‘‘IF–THEN’’ rules. A typical rule for the multiple-input single-output Takagi-Sugeno fuzzy system is of the form:

$$R_r:\ \text{IF}\ \bigl(x_1 \text{ is } A_r^{(1)},\ x_2 \text{ is } A_r^{(2)},\ \ldots,\ x_p \text{ is } A_r^{(p)}\bigr)\ \text{THEN}\ y_r = f_r(x_1, x_2, \ldots, x_p) \tag{1}$$

where $A_r^{(j)}$ is the fuzzy set corresponding to a partitioned domain of input variable $x_j$ in the r-th IF–THEN rule, p is the total number of antecedents, i.e., the fuzzy model input variables, $f_r(\cdot)$ denotes the linear function of the p input variables, and $y_r$ is the consequent of the r-th inference rule. It is assumed that there are $R_r$ (r = 1, 2,…,n) rules in the above mentioned form. The model consequents $f_r$ are defined as linear functions of the inputs by the following expression:

$$y_r = f_r(x_1, x_2, \ldots, x_p) = b_r(0) + b_r(1)\,x_1 + b_r(2)\,x_2 + \ldots + b_r(p)\,x_p \tag{2}$$

where $[b_r(0), b_r(1), b_r(2), \ldots, b_r(p)]$ is the parameter vector. The crisp output of the fuzzy system may be determined by:

$$y = \frac{\sum_{i=1}^{R} w_i\,y_i}{\sum_{i=1}^{R} w_i} \tag{3}$$

where $w_i$ denotes the degree of fulfillment of the i-th fuzzy rule, defined using the minimum or the product conjunction operators. Each developed Takagi-Sugeno fuzzy rule-based system consists of three parts: the definition of Fuzzy Membership Functions (FMFs), the construction of fuzzy decision rules, and the fuzzy reasoning.

The fuzzy membership functions

During the fuzzification process all selected input water quality variables are rescaled into the [0, 1] interval and the corresponding fuzzy values for the ‘‘Low’’ and ‘‘High’’ fuzzy sets are determined through the user-defined fuzzy membership functions (Metternicht, 2001). The CHLfuzzy spreadsheet can produce fuzzy sets using various Fuzzy Membership Functions (Fig. 1): (a) Triangular Fuzzy Membership Functions, (b) Trapezoidal Membership Functions, and (c) Sigmoid Fuzzy Membership Functions, defined as:

$$\mu_A(x) = \frac{(1-m)^{k-1}(x-a)^{k}}{(1-m)^{k-1}(x-a)^{k} + m^{k-1}(b-x)^{k}}, \qquad x \in [a, b] \tag{4}$$

where k is the sharpness factor of the function, indicating an increase in membership to a fuzzy set, m ∈ [0, 1] is the inflection factor representing the turning point of the function, and a, b are typical points of the function, with a membership degree to the fuzzy set considered of 0 and 1, respectively. Sharpness and inflection are the two parameters determining the shape of the fuzzy membership function (Dombi, 1990). In the present study three test cases were examined using the sigmoid Fuzzy Membership Functions: (a) a reduced sharpness, high turning point curve (k = 2, m = 0.8); (b) a moderate sharpness, middle turning point curve (k = 3, m = 0.5); and (c) an increased sharpness, low turning point curve (k = 4, m = 0.2) (Fig. 1).

Fig. 1 Available Fuzzy Membership Functions in CHLfuzzy. (a) Triangular (b) Sigmoid (solid line: k = 2, m = 0.8; dashed line: k = 3, m = 0.5; dotted line: k = 4, m = 0.2), and (c) trapezoidal. Each panel plots the Fuzzy Membership Value (0 to 1) of the ‘‘Low’’ and ‘‘High’’ sets against the rescaled Inputs (0 to 1)
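As an illustration of the fuzzification step, the following minimal Python sketch evaluates the sigmoid membership function of Eq. 4 on rescaled inputs. It is not taken from the spreadsheet: the function names, the min-max form of the rescaling, the linear ramp used for the triangular ‘‘High’’ set, and the definition of ‘‘Low’’ as the complement of ‘‘High’’ are all assumptions made here for illustration.

```python
def rescale(values):
    """Min-max rescale raw values into [0, 1] (assumed form of the rescaling)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def triangular_high(x):
    """Assumed triangular 'High' set: a linear ramp from 0 at x=0 to 1 at x=1."""
    return x


def sigmoid_high(x, k, m, a=0.0, b=1.0):
    """Sigmoid membership of Eq. 4 with sharpness k and inflection m on [a, b]."""
    rise = (1.0 - m) ** (k - 1) * (x - a) ** k
    fall = m ** (k - 1) * (b - x) ** k
    return rise / (rise + fall)


def low_from_high(mu_high):
    """Assumption: the 'Low' membership is the complement of 'High'."""
    return 1.0 - mu_high


if __name__ == "__main__":
    temps = rescale([8.0, 17.81, 28.1])  # min, mean and max of Temp in Table 1
    for x in temps:
        mu_h = sigmoid_high(x, k=4, m=0.2)
        print(round(x, 3), round(mu_h, 3), round(low_from_high(mu_h), 3))
```

With a = 0 and b = 1, Eq. 4 returns 0 at x = 0, 1 at x = 1, and exactly m at x = m, which is one way to read the role of m as the turning point of the curve.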

The fuzzy decision rules

Each fuzzy rule generally consists of the antecedent block (IF-part of the rule) and the consequence (THEN-part of the rule). In this study, 16 fuzzy ‘IF–THEN’ rules were determined to relate the four selected input variables with the output variable of the fuzzy rule-based system. The general form of the fuzzy rules used by the model is presented in Table 2.

The fuzzy reasoning

During the defuzzification process of the model, for each record t (t = 1,…,N) of input environmental variables (e.g., Temp_t, DO_t, DIN_t and SR_t), the computed chlorophyll-a value $\hat{y}_t$ is inferred from the following steps:

Step 1: The weighting factor is computed using the ‘‘min’’ conjunction operation on the fuzzy membership values of each rule:

$$w_{i,t} = \min\{\mu(\mathrm{Temp}_t, A_i),\ \mu(\mathrm{DO}_t, A_i),\ \mu(\mathrm{DIN}_t, A_i),\ \mu(\mathrm{SR}_t, A_i)\} \tag{5}$$

or the product conjunction operation:

$$w_{i,t} = \mu(\mathrm{Temp}_t, A_i) \cdot \mu(\mathrm{DO}_t, A_i) \cdot \mu(\mathrm{DIN}_t, A_i) \cdot \mu(\mathrm{SR}_t, A_i) \tag{6}$$

where $A_i$ represents the fuzzy sets defined in the if-part of rule i (i = 1,…,16), and $\mu(x, A_i)$ denotes the value of the fuzzy membership function for each input variable x and the corresponding fuzzy set A, as defined in the if-part of the rules.

Step 2: For each ‘IF–THEN’ rule i, the modeled value $\hat{y}_{i,t}$ is calculated as:

$$\hat{y}_{i,t} = b_{i,0} + b_{i,1}\,\mathrm{Temp}_t + b_{i,2}\,\mathrm{DO}_t + b_{i,3}\,\mathrm{DIN}_t + b_{i,4}\,\mathrm{SR}_t \tag{7}$$

Step 3: The final fuzzy model output $\hat{y}_t$ is obtained through defuzzification as the weighted average of all 16 inferred results derived by each rule, as given by Eq. 8.

Table 2 General form of the sixteen ‘IF-THEN’ fuzzy rules used in the TS fuzzy logic model for lagoon chlorophyll-a determination

Rule   ‘IF’-part of the rule      ‘THEN’-part of the rule
       Temp   DO   DIN   SR
1      L      L    L     L        ŷ_{1,t} = b_{1,0} + b_{1,1} Temp + b_{1,2} DO + b_{1,3} DIN + b_{1,4} SR
2      L      L    L     H        ŷ_{2,t} = b_{2,0} + b_{2,1} Temp + b_{2,2} DO + b_{2,3} DIN + b_{2,4} SR
3      L      L    H     L        ŷ_{3,t} = b_{3,0} + b_{3,1} Temp + b_{3,2} DO + b_{3,3} DIN + b_{3,4} SR
4      L      L    H     H        ŷ_{4,t} = b_{4,0} + b_{4,1} Temp + b_{4,2} DO + b_{4,3} DIN + b_{4,4} SR
5      L      H    L     L        ŷ_{5,t} = b_{5,0} + b_{5,1} Temp + b_{5,2} DO + b_{5,3} DIN + b_{5,4} SR
6      L      H    L     H        ŷ_{6,t} = b_{6,0} + b_{6,1} Temp + b_{6,2} DO + b_{6,3} DIN + b_{6,4} SR
7      L      H    H     L        ŷ_{7,t} = b_{7,0} + b_{7,1} Temp + b_{7,2} DO + b_{7,3} DIN + b_{7,4} SR
8      L      H    H     H        ŷ_{8,t} = b_{8,0} + b_{8,1} Temp + b_{8,2} DO + b_{8,3} DIN + b_{8,4} SR
9      H      L    L     L        ŷ_{9,t} = b_{9,0} + b_{9,1} Temp + b_{9,2} DO + b_{9,3} DIN + b_{9,4} SR
10     H      L    L     H        ŷ_{10,t} = b_{10,0} + b_{10,1} Temp + b_{10,2} DO + b_{10,3} DIN + b_{10,4} SR
11     H      L    H     L        ŷ_{11,t} = b_{11,0} + b_{11,1} Temp + b_{11,2} DO + b_{11,3} DIN + b_{11,4} SR
12     H      L    H     H        ŷ_{12,t} = b_{12,0} + b_{12,1} Temp + b_{12,2} DO + b_{12,3} DIN + b_{12,4} SR
13     H      H    L     L        ŷ_{13,t} = b_{13,0} + b_{13,1} Temp + b_{13,2} DO + b_{13,3} DIN + b_{13,4} SR
14     H      H    L     H        ŷ_{14,t} = b_{14,0} + b_{14,1} Temp + b_{14,2} DO + b_{14,3} DIN + b_{14,4} SR
15     H      H    H     L        ŷ_{15,t} = b_{15,0} + b_{15,1} Temp + b_{15,2} DO + b_{15,3} DIN + b_{15,4} SR
16     H      H    H     H        ŷ_{16,t} = b_{16,0} + b_{16,1} Temp + b_{16,2} DO + b_{16,3} DIN + b_{16,4} SR

‘‘L’’ and ‘‘H’’ denote the fuzzy sets ‘‘Low’’ and ‘‘High’’, respectively; ŷ_{i,t} (i = 1,…,16; t = 1,…,n; n = 330) is the fuzzy logic modeled chlorophyll-a value, and the coefficients b_{i,j} (i = 1,…,16; j = 0,…,4) are the fuzzy parameters to be computed by the developed spreadsheet

$$\hat{y}_t = \frac{\sum_{i=1}^{16} w_{i,t}\,\hat{y}_{i,t}}{\sum_{i=1}^{16} w_{i,t}} \tag{8}$$

where $w_{i,t}$ is the weighting factor, defined by Eqs. 5 or 6.

Parameter estimation

Following the above defined procedures, the coefficients $b_{i,j}$ (i = 1,…,16; j = 0,…,4) of the fuzzy ‘if–then’ rules can be estimated by minimizing the sum of squares of errors (SSE):

$$\mathrm{SSE}\bigl(b_{i,j}\ (i = 1, \ldots, 16;\ j = 0, \ldots, 4)\bigr) = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 \tag{9}$$

where $y_t$ is the observed ln(Chl-a) and $\hat{y}_t$ is the fuzzy logic modeled value obtained from Eq. 8.

Description of CHLfuzzy spreadsheet

The CHLfuzzy spreadsheet is organized in three interrelated worksheets: Data Input, Fuzzification, and Results. The general scheme of the CHLfuzzy model for the estimation of chlorophyll-a concentrations derived from a set of independent model predictors (Temp, DO, DIN, SR) is presented in Fig. 2.
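The three inference steps (Eqs. 5–8) can be sketched in a few lines of Python. This is an illustrative sketch, not the spreadsheet's formulas: the names `INPUTS`, `RULES`, `rule_weights` and `ts_output` are hypothetical, ‘‘Low’’ is assumed to be the complement of ‘‘High’’, and the text does not state whether the consequents of Eq. 7 use raw or rescaled inputs, so `x` is simply whatever values are supplied.

```python
from itertools import product

INPUTS = ("Temp", "DO", "DIN", "SR")

# The 16 rules of Table 2: every Low/High combination over the four inputs,
# generated in the same order as the table (LLLL, LLLH, LLHL, ...).
RULES = list(product("LH", repeat=len(INPUTS)))


def rule_weights(mu_high, conjunction="min"):
    """Degree of fulfillment of each rule: Eq. 5 ('min') or Eq. 6 ('product').

    mu_high maps each input name to its 'High' membership in [0, 1];
    'Low' is assumed to be 1 - mu_high (an assumption of this sketch).
    """
    weights = []
    for rule in RULES:
        memberships = [
            mu_high[name] if label == "H" else 1.0 - mu_high[name]
            for name, label in zip(INPUTS, rule)
        ]
        if conjunction == "min":
            weights.append(min(memberships))
        else:
            w = 1.0
            for mu in memberships:
                w *= mu
            weights.append(w)
    return weights


def ts_output(x, mu_high, coeffs, conjunction="min"):
    """Eqs. 7-8: weighted average of the 16 linear rule consequents.

    x maps input names to values; coeffs[i] = (b_i0, b_i1, b_i2, b_i3, b_i4).
    """
    weights = rule_weights(mu_high, conjunction)
    consequents = [
        b[0] + sum(bj * x[name] for bj, name in zip(b[1:], INPUTS))
        for b in coeffs
    ]
    return sum(w * y for w, y in zip(weights, consequents)) / sum(weights)
```

Under the complement assumption the product weights of Eq. 6 always sum to one, so the denominator of Eq. 8 only matters for the ‘‘min’’ operator of Eq. 5.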

In the worksheet Initial data of the CHLfuzzy model, the user imports the observed dataset in the form of a matrix (columns: variables, rows: samples). The first four columns in this matrix contain the raw data for each environmental variable (water temperature, dissolved oxygen, nitrate, and solar radiation), while the last column contains the corresponding ln-transformed chlorophyll-a concentrations.

In the Fuzzification worksheet, input data are rescaled into the [0, 1] interval. The user selects the fuzzy membership function (FMF) applied to each input variable, choosing between the triangular FMF model (selection 1 in the ‘Fuzzification Method’ cell) and the sigmoid FMF model (selection 2 in the ‘Fuzzification Method’ cell). In the latter case, the user should also set values for the parameters k (sharpness factor) and m (inflection factor) of the sigmoid FMF. When this process is completed, the Fuzzification worksheet calculates the fuzzy membership values for the ‘High’ and ‘Low’ fuzzy sets, defined as FMF(x,H) and FMF(x,L), respectively. Figure 3 presents an overview of the ‘Fuzzification’ worksheet, and especially the columns for the calculation of fuzzy sets, rules, and weights. The results produced by the application of the 16 fuzzy rules are computed using the $b_{i,j}$ coefficients stored in the Results worksheet. For each ‘if–then’ rule, the weighting factor $w_{i,t}$ is obtained as the minimum membership value over the fuzzy sets of that rule. The application of the minimization procedure of Eq. 9 allows the calculation of the $\hat{y}_{i,t}$ values for each ‘if–then’ rule i and data value t. Finally, the products $w_{i,t} \cdot \hat{y}_{i,t}$ are computed and the final model output is produced, following Eq. 8.

Fig. 2 General scheme for the functioning of a multiple-input single-output Takagi-Sugeno fuzzy inference system followed by the CHLfuzzy spreadsheet. Blocks shown: Model Input Variables (Temp, DO, DIN, SR); FMF Definition (Fuzzification); Rule Base (Rule 1: f1(x), Rule 2: f2(x), …, Rule r: fr(x); Weighting Factors); Aggregation of responses (Fuzzy Output); Defuzzification (Crisp Output); System Optimization

Fig. 3 The worksheet ‘Fuzzification’ for fuzzy sets, fuzzy rules and weights computation

In the Results worksheet the values of coefficients $b_{i,j}$ (i = 1,…,16; j = 0,…,4) are derived. Figure 4 illustrates an overview of the ‘Results’ worksheet, where the b-coefficients and the values of the model performance criteria are computed. The model validation criteria, defined as (a) the sum of the squares of errors (SSE), (b) the root mean squared error (RMSE), (c) the normalized objective function (NOF) and (d) the determination coefficient (R²) between the modeled and observed chlorophyll-a values, are computed in this worksheet. SSE is minimized using the Solver Add-In Tool of MS-Excel, which can perform linear and non-linear optimization. The Solver searches for the best possible solution, given a set of cells with variable values (the $b_{i,j}$ cells) and a target cell (the SSE cell) that is driven as close to zero as possible. A series of iterations takes place in which the $b_{i,j}$ cells are changed in order to minimize the SSE. Best results are produced using quadratic estimates and central differencing for partial derivative estimation, since relations are highly non-linear. These settings may be controlled by the user through the Solver Options dialog box. The model, in the form of a simple spreadsheet, is available at the following web site: www.env.duth.gr/eet/research.html.
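A short observation on the structure of Eq. 9, offered as a sketch rather than as a description of what the Solver does internally: if the membership functions, and therefore the weights $w_{i,t}$, are held fixed while only the coefficients $b_{i,j}$ are varied, the model output of Eq. 8 is linear in those coefficients. The normalized weights $\bar{w}_{i,t}$, the generic input notation $x_{j,t}$ (standing for Temp, DO, DIN and SR at record t) and the convention $x_{0,t} \equiv 1$ are shorthand introduced here for illustration.

```latex
\bar{w}_{i,t} = \frac{w_{i,t}}{\sum_{l=1}^{16} w_{l,t}}, \qquad
\hat{y}_t = \sum_{i=1}^{16} \bar{w}_{i,t}\Bigl(b_{i,0} + \sum_{j=1}^{4} b_{i,j}\,x_{j,t}\Bigr)
          = \sum_{i=1}^{16}\sum_{j=0}^{4} \bar{w}_{i,t}\,x_{j,t}\,b_{i,j},
\qquad x_{0,t} \equiv 1

\mathrm{SSE}(b) = \sum_{t=1}^{N}\Bigl(y_t - \sum_{i=1}^{16}\sum_{j=0}^{4} \bar{w}_{i,t}\,x_{j,t}\,b_{i,j}\Bigr)^{2}
```

Under that assumption the SSE is a convex quadratic function of the 80 coefficients, so any minimum an iterative search reaches is a global one, although the minimizing coefficients themselves need not be unique.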

The ANFIS model

The Adaptive Neural Fuzzy Inference System (ANFIS) is based on the premise of mapping a fuzzy inference system (FIS) into a neural network structure, so that the membership functions and consequent-part parameters are optimized using a hybrid learning algorithm. The ANFIS model supports only first- or zero-order Sugeno-type systems, which produce a single output obtained through a weighted-average defuzzification process. However, in ANFIS the number of output membership functions must be equal to the number of rules, and each rule has unit weight. Moreover, ANFIS does not accept the customization that other basic inference systems, such as CHLfuzzy, allow; the user therefore cannot use membership and defuzzification functions other than those provided. The ANFIS approach follows two different identification methods: the grid partition (GP) and the subtractive clustering (SC) methods. The grid partition method divides the data into rectangular subspaces based on the pre-defined number of membership functions and their types. In contrast, the subtractive clustering method determines datapoint clusters by measuring their potential in the feature space. Clustering algorithm parameters are: the range of influence, i.e., the distance from cluster

Fig. 4 The worksheet ‘Results’ for b-coefficients and model performance criteria computations

center in each data dimension; the quash factor, a multiplier of the range of influence determining the neighborhood of a cluster center; the accept ratio, representing the potential as a fraction of the potential of the first cluster center, above which another datapoint will be accepted as a cluster center; and the reject ratio, i.e., the potential as a fraction of the potential of the first cluster center, below which another datapoint will be rejected as a cluster center. For the present work, the grid partition method was used to initialize the membership functions. The parameters of the membership functions were optimized on the identification dataset by a neural network back-propagation learning algorithm, while the consequent parameters were calculated by the

linear least squares method. The training epoch number for this optimization was set to 100. Values of independent variables were partitioned into membership functions, thus building different Takagi-Sugeno models, using triangular, trapezoidal, generalized bell-shaped, and Gaussian membership functions. Two membership functions were used for each of the antecedent variables. In another approach, the initial model parameters were formed using the subtractive clustering method. The same procedure as for the grid partition model was performed to optimize the parameters of the membership functions and to compute the consequent parameters. In order to produce the optimal model, the parameters of the subtractive clustering algorithm

were varied between 0.5 and 2 for the quash factor, and 0.1 and 1.0 for the cluster radius and accept and reject ratios, respectively. Through optimization of the performance indices obtained for the training data, the optimal parameter combination was sought.

Model validation

The validity of output from the CHLfuzzy and ANFIS models was tested by various statistical tests:

(a) The Sum of the Squares of Errors (SSE), the Root Mean Square Error (RMSE), and the Normalized Objective Function (NOF) of the modeled and observed values of chlorophyll-a were computed. SSE is defined as:

$$\mathrm{SSE} = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 \tag{10}$$

and RMSE as:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{N}} \tag{11}$$

where $y_t$ are the observed ln(Chl-a) values at time t, $\hat{y}_t$ are the fuzzy model values, and N is the total number of data in the dataset. RMSE has to be as close to 0.0 as possible for good prediction. The NOF is the ratio of the RMSE to the overall mean $\langle \hat{y}_t \rangle$ of the data predicted by the model (Tsihrintzis et al., 1998), defined as:

$$\mathrm{NOF} = \frac{\mathrm{RMSE}}{\langle \hat{y}_t \rangle} \tag{12}$$

where $\langle \hat{y}_t \rangle = \frac{1}{N}\sum_{t=1}^{N} \hat{y}_t$ is the average value of the fuzzy model output data. NOF has to be as close to 0.0 as possible. However, when NOF is less than 1.0, the theoretical method is reliable and can be used with sufficient accuracy (Hession et al., 1994; Kornecki et al., 1999; Tsihrintzis et al., 1998).

(b) The validity is also tested through scattergrams, which are graphs of the predicted versus measured ln(Chl-a) values. The best match occurs when all points fall on a 1:1 slope line. Deviation from that line is measured by fitting through the points a straight regression line of the following equation:

$$\ln(\mathrm{Chl})_{\mathrm{Predicted}} = c\,\ln(\mathrm{Chl})_{\mathrm{Measured}} \tag{13}$$

The slope c of this straight line should be equal to 1.0 for a perfect match. If the slope c is less than 1.0, the model underestimates the observed data. If the slope c is greater than 1.0, the model overestimates the observed values. Another parameter that evaluates the accuracy of the agreement is the determination coefficient R², which shows data dispersion around the best fit line. The closer R² is to 1.0, the less the points are scattered around the straight line.

(c) The validity of the performance of the model during extreme chlorophyll-a conditions was tested using the detection rate (DR) and the false alarm rate (FAR). As Vassova Lagoon is a fishery-exploited system, a threshold value in chlorophyll-a concentration of 20 µg/l was established to define a eutrophic episode. Thus, the DR and FAR parameters were used to evaluate model performance during alarming chlorophyll concentrations. The detection rate (DR) was defined as the ratio of the number of modeled episodes in which chlorophyll concentration exceeded the threshold value (DE) to the number of observed episodes of chlorophyll exceedance (EX):

$$\mathrm{DR} = \mathrm{DE}/\mathrm{EX} \tag{14}$$

Similarly, FAR is defined as the ratio of false alarms (FA) predicted by the model to the total number of observed episodes of chlorophyll exceedance (EX):

$$\mathrm{FAR} = \mathrm{FA}/\mathrm{EX} \tag{15}$$
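The validation criteria of Eqs. 10–15 are simple to compute outside the spreadsheet. The Python sketch below is illustrative only, with hypothetical function names. It assumes that the regression line of Eq. 13 is a least-squares fit through the origin, that DE counts observed exceedances that the model also flags, and that FA counts modeled exceedances with no observed one; the text does not spell out these counting rules. The threshold must be expressed on the same scale as the data (for ln-transformed values, the logarithm of 20 µg/l).

```python
import math


def sse(obs, pred):
    """Eq. 10: sum of squared errors between observed and modeled values."""
    return sum((y - yh) ** 2 for y, yh in zip(obs, pred))


def rmse(obs, pred):
    """Eq. 11: root mean square error."""
    return math.sqrt(sse(obs, pred) / len(obs))


def nof(obs, pred):
    """Eq. 12: RMSE divided by the mean of the modeled values."""
    return rmse(obs, pred) / (sum(pred) / len(pred))


def slope_through_origin(obs, pred):
    """Eq. 13: least-squares slope c of pred = c * obs (no intercept, assumed)."""
    return sum(o * p for o, p in zip(obs, pred)) / sum(o * o for o in obs)


def detection_and_false_alarm(obs, pred, threshold):
    """Eqs. 14-15: (DR, FAR) under the counting rules assumed in this sketch."""
    ex = sum(o > threshold for o in obs)                      # observed episodes
    de = sum(o > threshold and p > threshold for o, p in zip(obs, pred))
    fa = sum(o <= threshold and p > threshold for o, p in zip(obs, pred))
    return de / ex, fa / ex


if __name__ == "__main__":
    observed = [25.0, 10.0, 30.0, 5.0]   # made-up concentrations in µg/l
    modeled = [22.0, 21.0, 15.0, 3.0]
    print(rmse(observed, modeled), nof(observed, modeled))
    print(detection_and_false_alarm(observed, modeled, threshold=20.0))
```

Because FAR is normalized by the number of observed episodes rather than by the number of alarms, it can exceed 1.0 when the model raises more false alarms than there are observed episodes.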

Results

CHLfuzzy and ANFIS model calibration

Figure 5 illustrates the scattergrams of the fuzzy estimated chlorophyll-a versus the observed chlorophyll-a values imported in the CHLfuzzy and the ANFIS models. In addition, the values of the performance indices of the CHLfuzzy and the ANFIS models during the training stage are shown in Table 3.

Fig. 5 Scattergram of observed versus modeled ln(Chl) values for the training dataset of Vassova Lagoon, using (a) CHLfuzzy model using sigmoid FMF under increased sharpness factor (k = 4) and limited inflection factor (m = 0.2), and (b) ANFIS model using Gaussian membership function. Axes: Observed Chlorophyll-a Values versus Predicted Chlorophyll-a Values. Fitted lines: (a) CHLfuzzy Training Results, y = 0.966x, R² = 0.97; (b) ANFIS Training Results, y = 0.965x, R² = 0.92

All models examined using different fuzzy membership functions produced results in excellent agreement with the observed data. Chlorophyll-a values produced by the CHLfuzzy model were slightly underestimated, with slope c (Eq. 13) varying from 0.95 (triangular, trapezoidal, and sigmoid with low sharpness factor and high turning point FMFs) to 0.97 (sigmoid with middle to high sharpness factor). The determination coefficient of modeled and observed data obtained very high values (R² = 0.94–0.97). The use of sigmoid fuzzy memberships improved on the results produced by the triangular and trapezoidal functions. RMSE and NOF values were rather low (0.37–0.35 and 0.29–0.25, respectively), meaning that the CHLfuzzy model achieved reliable and sufficiently accurate training. An improvement in the performance of the model, in all validation criteria, occurs when the sigmoid membership functions are used, especially under increased sharpness factor (k = 4) and limited inflection factor (m = 0.2). This is particularly worth mentioning since no rigorous analysis on the methodology for the selection of the sharpness and inflection factors of the FMFs exists. Therefore, CHLfuzzy allows the user to perform different experiments and select the most appropriate form for the FMF according to the criteria established.

In parallel, ANFIS attempted to produce the best function mapping the input variables, i.e., water temperature, dissolved oxygen, nitrate concentration and solar radiation, to the output variable, i.e., chlorophyll-a concentration, utilizing the 2001–2002 dataset. Results from different ANFIS models following the grid partition method showed that the application of different types of membership functions did not result in remarkable differences; however, the Gaussian membership function resulted in the best values for all performance indices. The subtractive clustering method produced results that optimized all performance indices when the cluster radius was set to 0.6, the quash factor to 1.1 and the accept and reject ratios to 0.3 and 0.2, respectively. Figure 6 depicts the membership functions of input variables obtained through subtractive clustering for the four-rules model, while Fig. 7 presents the ANFIS model architecture. In general, the best ANFIS model considered finally achieved quite satisfactory results for the training dataset (Fig. 5b), although its performance on all indices was poorer than that achieved by the CHLfuzzy model (Table 3). It can be seen that

Table 3 Performance criteria for the training dataset under the different FMFs considered in the CHLfuzzy and ANFIS fuzzy logic models

FMF                        c     R2    SSE    RMSE  NOF
CHLfuzzy model
Triangular                 0.95  0.94  27.94  0.37  0.29
Trapezoidal                0.95  0.94  27.90  0.36  0.29
Sigmoid (k = 2, m = 0.8)   0.95  0.95  27.39  0.36  0.28
Sigmoid (k = 3, m = 0.5)   0.97  0.96  26.56  0.35  0.27
Sigmoid (k = 4, m = 0.2)   0.97  0.97  26.04  0.35  0.25
Best ANFIS model           0.96  0.92  29.83  0.38  0.35

Fig. 6 Membership functions of input variables obtained through subtractive clustering and further optimization for the ANFIS model with four rules. Panels (a)–(d) plot the degree of membership against water temperature (°C), dissolved oxygen (mg/l), dissolved inorganic nitrogen (uM) and solar radiation (W/m²)

the ANFIS method under-predicts both the chlorophyll-a concentration (c = 0.96, R2 = 0.92, RMSE = 0.38, NOF = 0.35) and the peak chlorophyll-a values (DR = 0.66 and FAR = 0.75).

CHLfuzzy and ANFIS model validation

To test the ability of the CHLfuzzy and ANFIS models to hindcast chlorophyll-a concentrations, an independent dataset of 177 samples, collected from Vassova Lagoon during 2004, was used. Figure 8 plots the observed versus modeled data pairs produced by both models. Overall, both models show good agreement between simulated and observed values, for the low chlorophyll-a concentrations as well as for the peak conditions. As shown in Table 4, although both models were relatively successful, the CHLfuzzy model gives the best fit to the observations and predicts chlorophyll-a concentrations better than the ANFIS algorithm developed in Matlab (Fig. 9). A more detailed comparison shows that CHLfuzzy improves on the ANFIS forecast by about 18.4% in RMSE and by 17.18% in NOF. In addition, the detection of peak chlorophyll-a values improved, as DR increased from 0.88 in ANFIS to 0.93 in CHLfuzzy. However, CHLfuzzy produced one more false alarm event for chlorophyll-a than ANFIS (FAR = 0.13 versus 0.11).
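The relative improvements quoted above can be checked from the Table 4 values. The following is a minimal sketch, not part of CHLfuzzy: it assumes the usual definitions of the criteria (RMSE as the root of the mean squared error, NOF as RMSE divided by the mean observed value, and slope c taken from a regression through the origin, as the y = cx fits in Fig. 8 suggest). The function names are illustrative.

```python
import math

def performance_criteria(observed, modeled):
    """Return SSE, RMSE, NOF and the through-origin slope c (assumed definitions)."""
    n = len(observed)
    sse = sum((m - o) ** 2 for o, m in zip(observed, modeled))
    rmse = math.sqrt(sse / n)
    nof = rmse / (sum(observed) / n)  # assumed: RMSE normalised by the observed mean
    # Least-squares slope of modeled = c * observed (no intercept)
    c = sum(o * m for o, m in zip(observed, modeled)) / sum(o * o for o in observed)
    return {"SSE": sse, "RMSE": rmse, "NOF": nof, "c": c}

def relative_improvement(reference, candidate):
    """Percentage reduction of an error index relative to the reference model."""
    return 100.0 * (reference - candidate) / reference

# Table 4 values for the validation dataset: ANFIS (reference) vs CHLfuzzy
rmse_gain = relative_improvement(1.30, 1.06)
nof_gain = relative_improvement(0.64, 0.53)
print(f"RMSE improvement: {rmse_gain:.2f}%")
print(f"NOF improvement: {nof_gain:.2f}%")
```

Because the tabulated values are rounded to two decimals, the percentages obtained this way can differ slightly from those quoted in the text.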

Discussion and conclusions

Fig. 7 Network architecture of the ANFIS model (layers: input, inputmf, rules, outputmf, output)

Primary production models for enclosed and semi-enclosed water bodies quantify and estimate chlorophyll-a concentrations using one of three approaches: (a) simple empirical relationships with poor descriptive and predictive capabilities; (b) dynamic water quality models that resolve the relevant physical, chemical and biological processes and interactions, and require precise and detailed datasets for model calibration and validation; and (c) the recently developed fuzzy logic and neural network approaches, which need sufficient, high-quality in situ observations of the related parameters. The present article presents the formulation of such a fuzzy logic


Fig. 8 Scattergram of observed versus modeled ln(Chl) values for the validation dataset of Vassova Lagoon, using (a) CHLfuzzy (y = 0.838x, R2 = 0.77), and (b) ANFIS (y = 0.868x, R2 = 0.67). Axes: observed chlorophyll-a values versus predicted chlorophyll-a values

Table 4 Performance criteria of CHLfuzzy and ANFIS fuzzy logic models for the validation dataset of Vassova Lagoon

Model      c     R2    SSE     RMSE  NOF   DR    FAR
CHLfuzzy   0.84  0.77  199.61  1.06  0.53  0.93  0.13
ANFIS      0.87  0.67  231.07  1.30  0.64  0.88  0.11

model for chlorophyll-a representation, which depends on the four most significantly related environmental factors (i.e., water temperature, dissolved oxygen, dissolved inorganic nitrogen, and solar radiation). Previous investigators selected enclosed water bodies (lakes and reservoirs) for the application of similar methods, whereas CHLfuzzy was successfully applied to the extensive Vassova Lagoon dataset. Moreover, previous studies (Marsili-Libelli, 2004; Fdez-Riverola & Corchado, 2004; Giusti & Marsili-Libelli, 2005) utilized datasets recorded by automated monitoring stations, while the present dataset involved parameters derived through the chemical analysis of water samples (e.g., dissolved inorganic nitrogen and chlorophyll-a concentrations). The present methodology allows for the collection of physical and chemical variables for system description, consisting of qualitatively better but quantitatively limited data. CHLfuzzy follows a series of fuzzy logic operations (fuzzy membership function definition, fuzzy decision rules, fuzzy reasoning, and optimization) in the form of a user-friendly spreadsheet environment. The CHLfuzzy model performance was tested against the standard Takagi-Sugeno ANFIS model routine, showing

Fig. 9 Comparison of modeling results for Vassova Lagoon chlorophyll-a concentrations using (a) CHLfuzzy and (b) ANFIS models, against in situ data. Each panel plots observed and modelled Chl-a (ug/l) across the validation dataset

slightly better results. Both models illustrated that the fuzzy model approach is capable of empirically approximating the underlying non-linear relationships among model inputs and outputs. Moreover, both models allow the user to select the most appropriate fuzzy membership function for each input variable and to set realistic fuzzy rules that describe the biological processes affecting chlorophyll-a concentration. The user can then take the solution obtained from the training dataset and validate the model with an independent dataset. Finally, such an approach could be useful for the monitoring and management of enclosed and semi-enclosed water bodies, since measurements of easily determined environmental parameters, once imported into the fully developed model, would yield reliable chlorophyll-a concentration values.
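The combination of membership functions and if–then rules described above can be illustrated with a small sketch. It is not the CHLfuzzy spreadsheet implementation: the logistic form of the sigmoid membership function, the product operator for combining antecedents, the constant (zero-order) rule consequents and the two-rule base are all assumptions made for illustration, with inputs taken as normalised to [0, 1].

```python
import math

def sigmoid_mf(x, k, m):
    """Assumed logistic membership: k = sharpness, m = inflection point, x in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-k * (x - m)))

def sugeno_predict(inputs, rules, k=4.0, m=0.2):
    """Zero-order Takagi-Sugeno output: firing-strength-weighted mean of rule constants.

    inputs: dict mapping a predictor name to its normalised value in [0, 1]
    rules:  list of (antecedent, consequent); the antecedent maps a predictor
            name to 'high' or 'low', the consequent is a constant output level.
    """
    num = den = 0.0
    for antecedent, consequent in rules:
        strength = 1.0
        for name, label in antecedent.items():
            mu_high = sigmoid_mf(inputs[name], k, m)
            # 'low' is taken as the complement of 'high'; product acts as fuzzy AND
            strength *= mu_high if label == "high" else 1.0 - mu_high
        num += strength * consequent
        den += strength
    return num / den if den > 0 else 0.0

# Hypothetical rule base (illustrative only, not the CHLfuzzy rules)
rules = [
    ({"temperature": "high", "radiation": "high"}, 1.0),  # high normalised chlorophyll-a
    ({"temperature": "low", "radiation": "low"}, 0.0),    # low normalised chlorophyll-a
]

high = sugeno_predict({"temperature": 1.0, "radiation": 1.0}, rules)
low = sugeno_predict({"temperature": 0.0, "radiation": 0.0}, rules)
print(high, low)
```

Changing `k` and `m` reshapes the membership curve, which is the kind of experiment the sharpness and inflection factors are meant to support.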

References

Adriaenssens, V., B. de Baets, P. Goethals & N. de Pauw, 2004. Fuzzy rule-based models for decision support in ecosystem management. The Science of the Total Environment 319: 1–12.
Beck, M. B., 2005. Environmental foresight and structural change. Environmental Modelling & Software 20: 651–670.
Bobbin, J. & F. Recknagel, 2001. Knowledge discovery for prediction and explanation of blue-green algal dynamics in lakes by evolutionary algorithms. Ecological Modelling 146: 253–262.
Cao, H., F. Recknagel, G.-J. Joo & D.-K. Kim, 2006. Discovery of predictive rule sets for chlorophyll-a dynamics in the Nakdong River (Korea) by means of the hybrid evolutionary algorithm HEA. Ecological Informatics 1: 43–53.
Chang, N. B., H. W. Chen & S. K. Ning, 2001. Identification of river water quality using the Fuzzy synthetic evaluation approach. Journal of Environmental Management 63: 293–305.
Chau, K., 2006. A review on integration of artificial intelligence into water quality modeling. Marine Pollution Bulletin 52: 726–733.
Chen, D. G., N. B. Hargreaves, D. M. Ware & Y. Liu, 2000. A fuzzy logic model with genetic algorithm for analyzing fish stock-recruitment relationships. Canadian Journal of Fisheries and Aquatic Sciences 57: 1878–1887.
Chen, Q. & A. E. Mynett, 2003. Integration of data mining techniques and heuristic knowledge in fuzzy logic modelling of eutrophication in Taihu Lake. Ecological Modelling 162: 55–67.
Chen, Q. & A. E. Mynett, 2006. Modelling algal blooms in the Dutch coastal waters by integrated numerical and fuzzy cellular automata approaches. Ecological Modelling 199: 73–81.
Cornelissen, A. M. G., J. van den Berg, W. J. Koops, M. Grossman & H. M. J. Udo, 2001. Assessment of the contribution of sustainability indicators to sustainable development: a novel approach using fuzzy set theory. Agriculture, Ecosystems & Environment 86: 173–185.
Dombi, J., 1990. Membership function as an evaluation. Fuzzy Sets and Systems 35: 1–21.
Fdez-Riverola, F. & J. M. Corchado, 2004. FSfRT: Forecasting system for Red Tides. Applied Intelligence 21: 254–264.
Giusti, E. & S. Marsili-Libelli, 2005. Modelling the interactions between nutrients and the submerged vegetation in the Orbetello Lagoon. Ecological Modelling 184: 141–161.
Haan, C. T., 1977. Statistical Methods in Hydrology. The Iowa State University Press.
Hartnett, M. & S. Nash, 2004. Modelling nutrient and chlorophyll-a dynamics in an Irish brackish water body. Environmental Modelling & Software 19: 47–56.
Hession, W. C., V. O. Shanholtz, S. Mostaghimi & T. A. Dillaha, 1994. Uncalibrated performance of the finite element storm hydrograph model. Transactions of ASAE 37: 777–783.
Jang, J. S. R., 1993. ANFIS: adaptive network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics 23: 665–685.
Jeong, K. S., G. L. Joo, H. W. Kim, K. Ha & F. Recknagel, 2001. Prediction and elucidation of phytoplankton dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network. Ecological Modelling 146: 115–129.
Jørgensen, S. E., 1994. Fundamentals of Ecological Modelling, 2nd ed. Elsevier, Amsterdam.
Kornecki, T. S., G. J. Sabbagh & D. E. Storm, 1999. Evaluation of runoff, erosion and phosphorus modelling system-SIMPLE. Journal of the American Water Resources Association 35: 807–820.
Kung, H., L. Ying & Y. C. Liu, 1992. A complementary tool to water quality index: fuzzy clustering analysis. Water Resources Bulletin 28: 525–533.
Laanemets, J., M.-J. Lilover, U. Raudsepp, R. Autio, E. Vahtera, I. Lips & U. Lips, 2006. A fuzzy logic model to describe the cyanobacteria Nodularia spumigena blooms in the Gulf of Finland, Baltic Sea. In Kuparinen, J., E. Sandberg-Kilpi & J. Mattila (eds), Baltic Sea: A Lost System or a Future Treasury, Hydrobiologia 554: 31–45.
Lonin, S. E. & Y. S. Tuchkovenko, 2001. Water quality modelling for the ecosystem of the Cienaga de Tesca coastal lagoon. Ecological Modelling 144: 279–293.
Lu, R.-S. & S.-L. Lo, 2002. Diagnosing reservoir water quality using self-organizing maps and fuzzy theory. Water Research 36: 2265–2274.
Maier, H. R., T. Sayed & B. J. Lence, 2001. Forecasting cyanobacterium Anabaena spp. in the River Murray, South Australia, using B-spline neurofuzzy models. Ecological Modelling 146: 85–96.
Marsili-Libelli, S., 2004. Fuzzy prediction of the algal blooms in the Orbetello lagoon. Environmental Modelling & Software 19: 799–808.
Metternicht, G., 2001. Assessing temporal and spatial changes of salinity using fuzzy logic, remote sensing and GIS. Foundations of an expert system. Ecological Modelling 144: 163–179.
Newton, A. & S. M. Mudge, 2005. Lagoon-sea exchanges, nutrient dynamics and water quality management of the Ria Formosa (Portugal). Estuarine, Coastal & Shelf Science 62: 405–414.
Recknagel, F., J. Bobbin, P. Whigham & H. Wilson, 2002. Comparative application of artificial neural networks and genetic algorithms for multivariate time-series modeling of algal blooms in freshwater lakes. Journal of Hydroinformatics 4: 125–134.
Scardi, M., 2001. Advances in neural network modelling of phytoplankton primary production. Ecological Modelling 146: 33–45.
Silvert, W., 2000. Fuzzy indices of environmental conditions. Ecological Modelling 130: 111–119.
Soyupak, S. & D.-G. Chen, 2004. Fuzzy logic model to estimate seasonal pseudo steady state chlorophyll-a concentrations in reservoirs. Environmental Modelling and Assessment 9: 51–59.
Su, F., C. Zhou, V. Lyne, Y. Du & W. Shi, 2004. A data-mining approach to determine the spatio-temporal relationship between environmental factors and fish distribution. Ecological Modelling 174: 421–431.
Sylaios, G., V. A. Tsihrintzis, C. Akratos & K. Haralambidou, 2004. Monitoring and analysis of water, salt and nutrient fluxes at the mouth of a lagoon. Water, Air and Soil Pollution: Focus 4: 111–125.
Sylaios, G., V. A. Tsihrintzis, C. Akratos & K. Haralambidou, 2006. Quantification of water, salt and nutrient exchange processes at the mouth of a Mediterranean coastal lagoon. Environmental Monitoring & Assessment 119: 275–301.
Tsihrintzis, V. A., D. L. John & P. J. Tremblay, 1998. Hydrodynamic modeling of wetlands for flood detention. Water Resources Management 12: 251–269.
Tsihrintzis, V. A., G. K. Sylaios, M. Sidiropoulou & E. Koutrakis, 2007. Hydrodynamic modeling and management alternatives in a Mediterranean, fishery exploited coastal lagoon. Aquacultural Engineering 36: 310–324.
Wei, B., N. Sugiura & T. Maekawa, 2001. Use of artificial neural network in the prediction of algal blooms. Water Research 35: 2022–2028.
Yabunaka, K., M. Hosomi & A. Murakami, 1997. Novel application of a back-propagation artificial neural network model formulated to predict algal blooms. Water Science and Technology 36: 89–97.