
SAMPLING AND BIOSTATISTICS

Prediction of Global Distribution of Insect Pest Species in Relation to Climate by Using an Ecological Informatics Method

MURIEL GEVREY AND S. P. WORNER¹

National Centre for Advanced Bio-Protection Technologies, Lincoln University, P.O. Box 84, Canterbury, New Zealand

J. Econ. Entomol. 99(3): 979–986 (2006)

ABSTRACT The aim of this work was to predict the worldwide distribution of two pest species (Ceratitis capitata (Wiedemann), the Mediterranean fruit fly, and Lymantria dispar (L.), the gypsy moth) based on climatic factors. The distribution patterns of insect pests have most often been investigated using classical statistical models or ecoclimatic assessment models such as CLIMEX. In this study, we used an artificial neural network, the multilayer perceptron, trained using the backpropagation algorithm, to model the distribution of each species. The data matrix used to model the distribution of each species was divided into three data sets to 1) develop and train the model, 2) validate the model and prevent over-fitting, and 3) test each model on novel data. The percentage of correct predictions of the global distribution of each species was high: across the three data sets, the models gave 95.8, 81.5, and 80.6% correct predictions, respectively, for the Mediterranean fruit fly and 96.8, 84.3, and 81.5% for the gypsy moth. Kappa statistics used to test the level of significance of the results were highly significant (in all cases P < 0.0001). A sensitivity analysis applied to each model, based on the calculation of the derivatives of each of a large number of input variables, showed that the variables that contributed most to explaining the distribution of C. capitata were annual average temperature and annual potential evapotranspiration. For L. dispar, the average minimum temperature and minimum daylength range were the main explanatory variables. The ANN models and methods developed in this study offer powerful additional predictive approaches in invasive species research.

KEY WORDS backpropagation, Ceratitis capitata, Lymantria dispar, artificial neural networks, biosecurity

Ecological and economic damage caused by pest species when they invade a new area can be devastating to a region or country (Mooney et al. 2005). In conservation and ecological management, it is important to know which species can invade an area. Mapping pest species distributions is a key issue in ecology and conservation, particularly to provide information about potential unwanted organisms. Furthermore, the valid test of any hypothesis regarding a species distribution relies on an accurate knowledge of where the species currently occurs (Brotons et al. 2004). The goal of our study was to predict the regional presence or absence of potentially invasive pest species by using a set of defined climatic variables. Clearly, if the presence or the absence of a species can be predicted, then the possible invasion of the species could be avoided. New Zealand's insular biogeographic setting, wide climatic variation, dynamic geological history, and heterogeneous landscapes have produced a complex pattern of highly distinctive ecosystems and unique biota (Daugherty et al. 1993). Given that incursions, establishment, and impacts are not spatially random, these indigenous ecosystems can be expected to exhibit varying degrees of vulnerability to invasive species. In response to the continuing threats to New Zealand's indigenous biodiversity and ecosystems, and the lack of robust procedures for assessing risk to indigenous species and ecosystems, the New Zealand Biosecurity Council (Biosecurity Council 2003) has called for the development of a risk profile for New Zealand natural ecosystems. The prediction of invasive species distribution can be a first step toward this goal. The successful development of modeling techniques could pay great dividends in this regard. Olden and Jackson (2002) compared traditional techniques such as logistic regression and linear discriminant analysis with alternative techniques such as classification trees and artificial neural networks (ANNs) for species presence or absence prediction. They showed that ANNs are in most cases better than the other methods, particularly when nonlinear data are used. With nonlinear ecological data, nonlinear modeling methods such as ANNs should be preferred (Blayo and Demartines 1991). ANNs provide promising alternatives to traditional statistical approaches

¹ Corresponding author, e-mail: [email protected].

0022-0493/06/0979–0986$04.00/0 © 2006 Entomological Society of America

980 JOURNAL OF ECONOMIC ENTOMOLOGY Vol. 99, no. 3

Table 1. The code and description of the climate variables

Variable name   Description
Tmin            Minimum mean temperature (Celsius)
Tmax            Maximum mean temperature (Celsius)
Tannual         Annual mean temperature (Celsius)
Rmin            Minimum mean rainfall (mm)
Rmax            Maximum mean rainfall (mm)
Rannual         Annual mean rainfall (mm)
PEannual        Annual potential evapotranspiration (mm): amount of water that would evaporate from surface and transpired by plants when supply of water is unlimited. E = 1.6(10T/I)^a (Thornthwaite 1948). T, temperature; I, some heat index; a, constant corrected for daylength and days in month
AEannual        Annual actual evapotranspiration (mm): amount of water that evaporates from surface and is transpired by plants if water supply is limited. Measured experimentally in the field.
P/PEindex       Aridity index (total mean annual rainfall/potential evaporation). A low index indicates dryness.
DLannual        Average annual daylength range (h) (longest day of the year − shortest day of the year)
DLmin           Minimum annual daylength range (h)
DLmax           Maximum annual daylength range (h)
AD50mm          Annual soil moisture deficit at 50-mm depth (mm). Derived from water balance equation Precipitation = evapotranspiration + surface runoff + infiltration ± soil moisture storage. Variables can be derived experimentally.
AS50mm          Annual soil moisture surplus at 50-mm depth (mm). Derived from water balance equation Precipitation = evapotranspiration + surface runoff + infiltration ± soil moisture storage. Variables can be derived experimentally.
Mi              Thornthwaite moisture index: an expression of relative moisture or dryness in an area and availability of moisture for plant use. Mi, sum of monthly P/E values. The higher the index, the wetter.
DD5             Annual sum of normal degree-days above 5°C: an indicator of total heat available for plants or insects in growing season.

and have several advantages over traditional modeling methods when the patterns and relationships in the data are complex. For example, ANNs are not dependent on particular functional relationships; therefore, a priori understanding of variable relationships is not required. Nor do they require the data to be distributed in a certain way (Olden and Jackson 2001). ANNs have been widely applied in ecology during the last 10 yr (Recknagel et al. 1997, Mastrorillo et al. 1998, Brosse et al. 1999, Park et al. 2003, Gevrey et al. 2004).

The current study determines the predictive capacity of ANN models for estimating the occurrence of two pest insect species (Ceratitis capitata, the Mediterranean fruit fly, and Lymantria dispar (L.), the gypsy moth) from a set of climatic variables for a large number of geographic sites throughout the world by using data extracted from the CABI (2003) compendium. The gypsy moth and Mediterranean fruit fly were chosen on the basis of a previous study (unpublished data) in which an ANN, or more precisely an unsupervised algorithm called the self-organizing map, was used to determine the risk of invasion based on pest assemblages. The results showed that both gypsy moth and Mediterranean fruit fly have a high risk of invasion into New Zealand. L. dispar is interesting in that it occurs in two forms, the Asian and European gypsy moth. Because both forms have been found in both Europe and Asia in similar ecosystems, no distinction is made between them in this analysis. Both forms are considered to be a threat to New Zealand.

The prediction of each species' occurrence at a geographic site based on a common set of climatic variables was accomplished using a feed-forward multilayer perceptron ANN (Rumelhart et al. 1986). The importance of the climatic variables for each species was further evaluated using a sensitivity analysis (Gevrey et al. 2003).

Materials and Methods

Data

The data used in this analysis were extracted with permission from the CABI (2003) compendium. This database encompasses a wide range of information on crop protection, including the pests, diseases, weeds, and natural enemies associated with crops grown in most countries worldwide.

Study Areas. The geographic areas represented in the compendium consist of countries, regions, or states of countries such that all countries and continents are represented with the exception of the Arctic and Antarctic. In total, 433 geographic areas were used in our analysis.

Climate. The climate data representing the 433 geographic sites in the global database consisted of long-term, mostly 30-yr average monthly meteorological data compiled from sites on the Internet maintained by recognized meteorological organizations. The large number of climatic variables available for use as descriptor variables was reduced to 16 in total after those variables with a correlation higher than 0.9 were removed (MINITAB release 11, Minitab, Inc. 1996). The 16 variables are summarized in Table 1.

Artificial Neural Networks

To identify relationships between the presence and absence of the insect species and climatic variables at different sites, a feed-forward multilayer perceptron trained by using the backpropagation error algorithm (BP) (Rumelhart et al. 1986) was used. This algorithm has been shown to be appropriate for regression-type problems (Bishop 1995, Patterson 1996) and is the most popular ANN algorithm, having been applied in many fields of investigation. The BP algorithm is a

June 2006

GEVREY AND WORNER: PREDICTION OF INSECT PEST SPECIES

981

Fig. 1. Multilayer feedforward neural network backpropagation algorithm. (a) The signal (values of the independent variables x_i) is propagated from the input neuron layer (i) via the hidden neuron layer (h) and the connection weight (W) to the output neuron layer (o) to compute, using a transfer function (f), the predicted output, the dependent variable (x̂_o). (b) The error (err) between the predicted output (x̂_o) and the observed output (x_o) is calculated, which allows the backpropagation of the signal from the output layer (o) to the input layer (i) with a modification of the weight values (W).

supervised learning algorithm designed to minimize the mean square error between the predictions computed by the network and the observations. A more detailed description can be found in the following sections. The data matrix presented to the network comprised a set of predictor variables (in our case, the climatic variables, which are the independent variables) linked to the dependent variables to be predicted (the observed species presence or absence at particular geographic sites).

Network Architecture. Normally, an ANN is composed of three layers of processing elements called neurons: the input layer, the hidden layer, and the output layer (Fig. 1). Essentially, an ANN mimics the biological neural network, where the neurons of each layer are connected to all neurons of adjacent layers. The neurons receive and send signals through these connections, each of which is assigned a weight. These weights modulate the intensity of the signal they transmit. The network's final configuration, which comprises a specific number of neurons and a specific learning rate, is arrived at by testing various combinations of these features and selecting the one that provides the best compromise between bias and variance (Kohavi 1995). In this study, only one hidden layer is used to reduce computation time and because

the results are often similar to those obtained with multiple hidden layers (Bishop 1995).

Algorithm. Two main steps comprise the algorithm: the propagation of the signal from the input layer to the output layer through the hidden layer, with the objective of computing the predicted output (Fig. 1a), and the backpropagation of the signal from the output layer back to the input layer to modify the weight values (Fig. 1b) so as to minimize the error between the network output and the dependent variable (the desired output). These two steps are repeated many times, each pass being called an epoch, until specified end conditions are reached.

Propagation. The signal, which comprises the values introduced in the input layer, is transmitted from the input layer to the output layer through the hidden layer by using the weight values and an activation function such that an output value is computed. Each neuron in the network, excluding input layer neurons, calculates its output value using the expression

$$x_b = f(net_{ab}) \quad \text{with} \quad net_{ab} = \sum_{b} W_{ab}\, x_a + B_a,$$

where a and b denote the first and the second layer, respectively. $x_a$ is the value of the first-layer neurons, $x_b$ is the value of the second-layer neurons, $W_{ab}$ is the weights value between the layer a and the layer b, $B_a$


is the bias associated with layer a, and f is the activation function. In Fig. 1, the input layer is i, the hidden layer is h, and the output layer is o. The mean square error (MSE) between the computed output ($\hat{x}_o$) and the desired output ($x_o$) is calculated as

$$MSE = \frac{1}{N} \sum (err)^2 \quad \text{where} \quad err = x_o - \hat{x}_o$$

(Fig. 1b).

Backpropagation. Backpropagation involves the modification of the weight values according to the error values. The weights between the output layer and the hidden layer are changed first, and then the weights between the hidden layer and the input layer. The weight values between two layers a and b are modified according to the equation

$$W_{ab}(t+1) = \Delta W_{ab} + \alpha W_{ab}(t) \quad \text{with} \quad \Delta W_{ab} = \eta\, \sigma_{ab}(t+1)\, x_a(t+1) \quad \text{and} \quad \sigma_{ab} = f'(net_{ab})\, E \cdot (x_o).$$

Parameters $\alpha$ and $\eta$ are called the momentum and learning rate, respectively. E differs according to the weights. Between the hidden and the output layer, $E = err$, and between the input and the hidden layer, $E = \sum \sigma_{ho} W_{ho}$. Network learning stops when either the maximum number of iterations or the minimum error, both fixed beforehand, is reached.

Model Development. The matrix consisting of 433 geographic sites × 16 climatic variables was used to train the ANN. During preliminary model development, different neural network architectures were tested with regard to the number of hidden neurons and number of epochs used to obtain the best fit or training performance. The input layer contained the 16 climate variables as the independent predictor variables. These variables were rescaled between 0 and 1 before training. The hidden layer consisted of five neurons. The output comprised the observed species occurrence at the respective geographic areas, represented by 0 for species absence and 1 for presence. The database was randomly split into three parts for network training (50% or 217 geographic areas), validation (25% or 108 geographic areas), and testing (25% or 108 geographic areas). Training was run for 270 epochs for C. capitata and 500 epochs for L. dispar.

Model Validation. Cross-validation was used to test each model to assess prediction accuracy on independent data (Verbyla and Litaitis 1989).
This technique consists of dividing the data matrix into three parts: 50% for network training, 25% for validation, and 25% for testing. Each data set was randomly selected. The training data set is used to adjust the connection weights to obtain the minimum error and therefore the maximum number of correct predictions. The validation data set was used to monitor network error after each training cycle, enabling training to be stopped when the network began to over-train or over-fit the data (Tarassenko 1998). This phenomenon occurs when the model learns the training data so well that it can no longer generalize to new data. The test data set is independent of the training process. It is used to test the reliability of the trained network, in other words, its ability to generalize to or predict new data.

Model Evaluation. For presence–absence data, the simplest and most widely used measure of overall classification accuracy is the number of correctly classified cases, which can be called the correct assignment score. The computed output of each model is a value between 0 and 1. Conventionally, a decision threshold of 0.5 is used, where values higher than 0.5 indicate presence and values lower than 0.5 indicate absence. In this study, scores for correct assignments were expressed as the percentage of cases where each species was correctly classified (as present or absent at a site) over the total number of examined cases. True and false positive and negative rates were calculated (Fielding and Bell 1997, Joy and Death 2004). The true positive rate (also called sensitivity) is the ability of the model to correctly predict species presence (percentage of sites where the species presence was correctly predicted). The true negative rate (also called specificity) is the ability of the model to correctly predict species absence (percentage of sites where species absence was correctly predicted). The false positive rate represents model error predicting presence, and the false negative rate represents model error predicting absence. In this study, Cohen's Kappa coefficient was used to test the sensitivity and specificity of the models. Cohen's Kappa coefficient of agreement is a commonly used statistic that provides a measure of proportional accuracy, adjusted for chance agreement (Cohen 1960). This test provides a robust evaluation of model performance relatively independent of species frequency of occurrence or prevalence (Manel et al. 2001).
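The training procedure described above (rescaling to 0–1, a random 50/25/25 split, a 16-5-1 style network trained by backpropagation, and validation monitoring) can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the sigmoid activation, the learning rate (`eta`), the momentum (`alpha`), the weight initialization, the per-site (online) updates, and all function names are hypothetical choices, since the paper does not report them. The weight update uses the common momentum form (new step = learning-rate term + momentum × previous step) rather than a literal transcription of the printed update rule, and over-fitting is handled by simply keeping the weights with the lowest validation error.

```python
import copy
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rescale01(rows):
    """Rescale every column of a list-of-rows matrix to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - l) / s for v, l, s in zip(r, lo, span)] for r in rows]


def split_indices(n, rng):
    """Random 50/25/25 split; n = 433 gives 217/108/108 sites."""
    idx = list(range(n))
    rng.shuffle(idx)
    quarter = n // 4
    n_train = n - 2 * quarter
    return idx[:n_train], idx[n_train:n_train + quarter], idx[n_train + quarter:]


def forward(x, net):
    """Propagation: input layer -> hidden layer -> single output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["W_ih"], net["B_h"])]
    out = sigmoid(sum(w * h for w, h in zip(net["W_ho"], hidden)) + net["B_o"])
    return hidden, out


def mse(X, y, net):
    return sum((t - forward(x, net)[1]) ** 2 for x, t in zip(X, y)) / len(X)


def train(X, y, X_val, y_val, n_hidden=5, epochs=270, eta=0.3, alpha=0.5, seed=0):
    """Online backpropagation with momentum (eta and alpha are assumed values).
    Returns the weights that gave the lowest validation MSE."""
    rng = random.Random(seed)
    n_in = len(X[0])
    net = {"W_ih": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                    for _ in range(n_hidden)],
           "B_h": [0.0] * n_hidden,
           "W_ho": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
           "B_o": 0.0}
    prev_ih = [[0.0] * n_in for _ in range(n_hidden)]
    prev_ho = [0.0] * n_hidden
    train_mse = [mse(X, y, net)]
    best_val, best_net = mse(X_val, y_val, net), copy.deepcopy(net)
    for _ in range(epochs):
        for x, target in zip(X, y):
            hidden, out = forward(x, net)
            d_out = (target - out) * out * (1.0 - out)  # err * f'(net)
            # Error signal sent back to each hidden neuron (uses old W_ho).
            d_hid = [h * (1.0 - h) * d_out * w
                     for h, w in zip(hidden, net["W_ho"])]
            for j in range(n_hidden):
                prev_ho[j] = eta * d_out * hidden[j] + alpha * prev_ho[j]
                net["W_ho"][j] += prev_ho[j]
                net["B_h"][j] += eta * d_hid[j]
                for i in range(n_in):
                    prev_ih[j][i] = eta * d_hid[j] * x[i] + alpha * prev_ih[j][i]
                    net["W_ih"][j][i] += prev_ih[j][i]
            net["B_o"] += eta * d_out
        train_mse.append(mse(X, y, net))
        val = mse(X_val, y_val, net)
        if val < best_val:  # keep the best weights seen on the validation set
            best_val, best_net = val, copy.deepcopy(net)
    return best_net, train_mse, best_val
```

With the real data, `X` would hold the 16 rescaled climate variables for each site and `y` the 0/1 occurrence records; a site would then be classified as a presence when the second value returned by `forward` exceeds 0.5.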
Contribution of Predictor Variables. A sensitivity analysis carried out on the inputs of the neural network was conducted to indicate input variables that contributed most to explaining species presence or absence. In addition, sensitivity analysis can give important insights into the usefulness of other variables. It can identify insignificant variables that can be omitted in subsequent analysis, and key variables that must always be retained. There are many ways to perform a sensitivity analysis (see Gevrey et al. 2003 for details). The partial derivatives or PaD method (Dimopoulos et al. 1995, 1999) was judged a superior method by Gevrey et al. (2003) and was chosen for this study. The principle underlying the PaD method is that the sum of the squared partial derivatives of each input variable represents the relative contribution of that variable to the ANN output. Because of the random values assigned to the weights at the beginning of the training stage, a calibrated ANN model is run 500 times, and the average percentage of the relative contribution of each variable is recorded to rank the variables by order of importance.
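The PaD principle stated above can be sketched for a network with one hidden layer and a single output. The sketch assumes sigmoid activations in both layers (the paper does not name the transfer function), and the function names and the dictionary layout of the weights are hypothetical. For each site, the partial derivative of the output with respect to each input is computed analytically; the squared derivatives are then summed over sites and expressed as percentages.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(x, net):
    """Network output for one input vector, plus the hidden activations."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["W_ih"], net["B_h"])]
    out = sigmoid(sum(w * h for w, h in zip(net["W_ho"], hidden)) + net["B_o"])
    return hidden, out


def partial_derivatives(x, net):
    """d(output)/d(input i) for every input i, by the chain rule."""
    hidden, out = output(x, net)
    return [out * (1.0 - out) *
            sum(w_o * h * (1.0 - h) * ws[i]
                for w_o, h, ws in zip(net["W_ho"], hidden, net["W_ih"]))
            for i in range(len(x))]


def pad_contributions(X, net):
    """Relative contribution (%) of each input: sum of squared partial
    derivatives over all sites, normalized to total 100."""
    ssd = [0.0] * len(X[0])
    for x in X:
        for i, d in enumerate(partial_derivatives(x, net)):
            ssd[i] += d * d
    total = sum(ssd)
    return [100.0 * s / total for s in ssd] if total else [0.0] * len(ssd)
```

To mirror the procedure in the text, `pad_contributions` would be applied to each of the repeated model runs and the percentages averaged per variable before ranking.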

Table 2. Accuracy of neural network models predicting C. capitata and L. dispar species presence/absence (1/0)

Species      Data set    True negative % (0-0)  True positive % (1-1)  False negative % (0-1)  False positive % (1-0)  % correct classification  Cohen's Kappa coefficient  P value
C. capitata  Training    75.9                   19.9                   0.9                     3.2                     95.8                      0.88                       <0.0001
C. capitata  Validation  66.7                   14.8                   7.4                     11.1                    81.5                      0.49                       <0.0001
C. capitata  Test        69.5                   11.1                   11.1                    8.3                     80.6                      0.41                       <0.0001
L. dispar    Training    76.4                   20.4                   0.5                     2.8                     96.8                      0.91                       <0.0001
L. dispar    Validation  74.1                   10.2                   5.5                     10.2                    84.3                      0.47                       <0.0001
L. dispar    Test        72.2                   9.3                    4.6                     13.9                    81.5                      0.40                       <0.0001

The percentages of true negative, true positive, false negative, false positive, percentage of correct classifications, the Cohen's Kappa coefficient, and the associated P value are reported.
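As a check on how the accuracy measures in Table 2 relate to one another, the following sketch recomputes the percentage of correct classifications and Cohen's Kappa from the four cell percentages reported for C. capitata. The function and variable names are hypothetical, and reading the column code (0-1) as "predicted absent, observed present" is an assumption; Kappa itself is unchanged if the two error columns are swapped. Sensitivity and specificity are computed in the conventional way, as fractions of the observed presences and absences, respectively.

```python
def agreement_stats(tn, tp, fn, fp):
    """Accuracy, sensitivity, specificity, and Cohen's Kappa from a 2 x 2 table.
    Cells may be counts or percentages; they are normalized to proportions."""
    total = tn + tp + fn + fp
    tn, tp, fn, fp = (v / total for v in (tn, tp, fn, fp))
    p_observed = tn + tp  # proportion of sites correctly classified
    # Agreement expected by chance from the row and column totals.
    p_chance = (tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)
    return {
        "accuracy": p_observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (p_observed - p_chance) / (1.0 - p_chance),
    }


# Cell percentages (TN, TP, FN, FP) copied from the C. capitata rows of Table 2.
TABLE_2_C_CAPITATA = {
    "Training": (75.9, 19.9, 0.9, 3.2),
    "Validation": (66.7, 14.8, 7.4, 11.1),
    "Test": (69.5, 11.1, 11.1, 8.3),
}

for name, cells in TABLE_2_C_CAPITATA.items():
    stats = agreement_stats(*cells)
    print(f"{name}: Kappa = {stats['kappa']:.2f}")
```

Rounded to two decimals, the Kappa values recomputed this way agree with those reported in Table 2 for the three C. capitata data sets; small differences can arise elsewhere because the published percentages are themselves rounded.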

Results

Model Evaluation. In this study, the scores for correct assignments by the models of species presence or absence were measured as the percentage of geographic sites where each model correctly predicted the presence or absence of each of the species. We also calculated the Kappa coefficient and its corresponding P value for each species and data set. The ANN gave a very high percentage of correct assignment scores. Using three-fold cross-validation, the lowest score was 80.6% for the C. capitata test data set (Table 2). The highest score was 96.8% for the L. dispar training data set. We obtained 95.8% correct classification for C. capitata during the training stage. The results for the validation data set were 81.5% correct classification for C. capitata and 84.3% for L. dispar. For the L. dispar test set, a correct classification score of 81.5% was obtained. In general, the results were better for the gypsy moth than for the Mediterranean fruit fly. The average Cohen's Kappa coefficient was 0.59, with the highest value equal to 0.91 and the lowest

Fig. 2. Contribution of the 16 independent climatic variables (cf. Table 1) used in the 16-5-1 ANN model for C. capitata occurrence prediction by using the PaD sensitivity analysis method. Standard error bars for each variable were calculated; >500 random models are shown.

equal to 0.40. All the predictions, whatever data sets were used, were significant at P < 0.0001 (Table 2).

Contribution Analysis. The results of the sensitivity analysis, using the PaD method, are shown in Figs. 2 and 3. The results give useful information about the relative importance of the climate variables for predicting species presence and absence. Based on the narrow confidence intervals obtained, we conclude that the method is very stable, whatever the species. For C. capitata (Fig. 2), the annual average temperature (Tannual) contributes most to the model (>28%), followed by annual potential evapotranspiration (PEannual > 15%) and minimum daylength range (DLmin > 9%). All other variables contributed between 1 and 8%. For L. dispar (Fig. 3), average minimum temperature (Tmin) is the most important variable with a contribution >15%, followed by minimum daylength range (DLmin > 11%). The other variables range from 2 to 8%.

Fig. 3. Contribution of the 16 independent climatic variables (cf. Table 1) used in the 16-5-1 ANN model for L. dispar occurrence prediction by using the PaD sensitivity analysis method. Error bars were calculated; >500 random models are shown.

Discussion

Artificial intelligence modeling techniques were used to predict the global distribution of two pest species and identify the climatic features that may have a major influence on that distribution. The two species, C. capitata and L. dispar, were both well predicted by the models, with the percentage of correct classifications between 80.6 and 96.8%. However, the most relevant results are those obtained using the test data set. The test data set presents new data to the model and therefore tests the model's ability to generalize to novel data. For the test data set, the percentage of correct classification was 80.6 and 81.5% for C. capitata and L. dispar, respectively, which could be considered very good prediction for novel data. Moreover, both models performed well, identifying pest species distribution at the global scale, with Kappa values of 0.41 and 0.40 for C. capitata and L. dispar over the test data set. The Kappa test applied to each data set indicates that model results were not due to chance. Landis and Koch (1977) suggested that results are good if 0.4 < K < 0.75, which was the case for the test set for both species, and excellent if K > 0.75, which was the case for the training set. That the ANN models predicted pest species occurrence with a reasonably high level of accuracy by using climate variables confirms that climate has considerable influence over insect pest species distribution.

Because in this study climatic variables are shown to have an important influence on the distribution of the insect species, it is important to identify which variables have the most influence. Sensitivity analysis using the partial derivatives method was applied to each model. Of the climatic factors examined, the annual average temperature and annual potential evapotranspiration explained the largest proportion of C. capitata global distribution. For L. dispar, the average minimum temperature and the minimum annual daylength range are the two variables that contribute most to that species' global distribution. Annual average temperature and evapotranspiration seem to have a strong influence on C. capitata, likely because these flies are sensitive to temperature: eggs, larvae, and pupae will not develop if temperatures fall too low (Duyck and Quilici 2002). In general, this species prefers warm climates (Worner 1988, Vera et al. 2002). For L. dispar, average minimum temperature and minimum daylength range are the major contributory variables. This species overwinters in the egg stage and is sensitive to the minimum temperatures that occur in winter (Régnière and Nealis 2002). Although L. dispar is considered photoperiodically neutral (Hoy 1978), it has been shown that sperm movement (Giebultowicz et al. 1988) and mating behavior in this species are modulated by daylength in combination with temperature (Linn et al. 1992). Such an effect could conceivably influence species establishment in certain areas. Whereas daylength and temperature are likely to be correlated, both variables in combination are well known to have a fundamental influence on insect


development, especially for insects that live in seasonal climates (Speight et al. 1999). These results suggest that different factors are likely to be responsible for each species' distribution, and the methods here support life history studies that identify the environmental factors that limit the viability of an insect population in a particular region. Such information will contribute to more accurate risk assessments applied to invasive species.

This study relies on the integrity and quality of the data used in the analysis. With regard to data quality, species presence records are assumed to be reliable because we tried to use data that had been confirmed by experts. Unfortunately, species absence is another matter. Indeed, failing to detect a species in a geographic area does not guarantee the species is absent from that area (Brotons et al. 2004). Our models do not and cannot distinguish false negatives. Records of true absence would improve the precision of the methods used here.

In this study, the threshold used to distinguish presence or absence of a species at a particular site was 0.5. This threshold was applied to model output with values within the range of 0 to 1 (Fielding and Bell 1997). However, some authors suggest it is possible to find an optimal threshold using the receiver operating characteristic curve. This method has received considerable attention in recent years (Manel et al. 2001). Despite such attention, the 0.5 threshold continues to be more commonly used (Oberdorff et al. 2001). In their study, Fielding and Bell (1997) demonstrated that there are many ways to evaluate the predictive accuracy of any model and that the choice of measurement should be made objectively. In our work, the percentage of correct classifications combined with a Cohen's Kappa test has been used. Because the correct classification ratio alone is a poor guide to the accuracy of model prediction, the Kappa test was used (Manel et al. 2001).
This test also takes into account the effect of prevalence. Other kinds of measurement and tests could have been used to make full use of the percentage of correct classifications, for example, the odds ratio or the normalized mutual information measure. However, such measures have advantages and drawbacks, as explained in Fielding and Bell (1997). Our aim was to assess the effectiveness of the classifier by using a measure that is not affected by prevalence. We chose the Kappa statistic, which determines the probability of obtaining the classifier by chance alone. Moreover, Manel et al. (2001) found that Cohen's Kappa test was the most robust measure of model accuracy because it was only marginally affected by prevalence.

In conclusion, the models developed here show very good prediction of species distribution based on climatic variables. However, several improvements can be made. Much more precision could be achieved with data recorded at a higher resolution. Geographical information system data could facilitate analysis and visualization of species distributions, but more precise data would be required concerning the true distribution of each species. Although the models used



in this study offer an additional tool to conventional pest risk analysis, only climatic variables have been taken into account. The modeling process could be developed further to link multiple layers of environmental, biological, and land-use information.

Acknowledgments

We acknowledge CAB International for allowing us to use data reproduced from CAB 2003. We also thank Joel Pitt for invaluable help compiling the data used in this research. This research was funded by the Centre of Research Excellence in Bio-Protection, Lincoln University, New Zealand (http://bioprotection.lincoln.ac.nz/).

References Cited

Biosecurity Council. 2003. Tiakina Aotearoa protect New Zealand: the biosecurity strategy for New Zealand. Ministry of Agriculture and Forestry, Wellington, New Zealand.
Bishop, C. M. 1995. Neural networks for pattern recognition. Oxford University Press, New York.
Blayo, F., and P. Demartines. 1991. Data analysis: how to compare Kohonen neural networks to other techniques?, pp. 469–476. In A. Prieto [ed.], International Workshop on Artificial Neural Networks, IWANN '91, 17–19 September 1991, Granada, Spain. Springer, Berlin, Germany.
Brosse, S., J.-F. Guegan, J.-N. Tourenq, and S. Lek. 1999. The use of artificial neural networks to assess fish abundance and spatial occupancy in the littoral zone of a mesotrophic lake. Ecol. Model. 120: 299–311.
Brotons, L., W. Thuiller, M. B. Araujo, and A. H. Hirzel. 2004. Presence-absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography 27: 437–448.
[CABI] Commonwealth Agricultural Bureau International. 2003. Crop protection compendium: global module, 5th ed. CAB International, Wallingford, United Kingdom.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Ed. Psychol. Meas. 20: 27–46.
Daugherty, C. H., G. W. Gibbs, and R. A. Hitchmough. 1993. Mega-island or micro-continent? New Zealand and its fauna. Trends Ecol. Evol. 8: 437–442.
Dimopoulos, Y., P. Bourret, and S. Lek. 1995. Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process Lett. 2: 1–4.
Dimopoulos, I., J. Chronopoulos, A. Chronopoulou Sereli, and S. Lek. 1999. Neural network models to study relationships between lead concentration in grasses and permanent urban descriptors in Athens city (Greece). Ecol. Model. 120: 157–165.
Duyck, P. F., and S. Quilici. 2002. Survival and development of different life stages of three Ceratitis spp. (Diptera: Tephritidae) reared at five constant temperatures. Bull. Entomol. Res. 92: 461–469.
Gevrey, M., L. Dimopoulos, and S. Lek. 2003. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 160: 249–264.
Gevrey, M., F. Rimet, Y. S. Park, J. L. Giraudel, L. Ector, and S. Lek. 2004. Water quality assessment using diatom assemblages and advanced modelling techniques. Freshw. Biol. 49: 208–220.
Giebultowicz, J. M., R. A. Bell, and R. B. Imberski. 1988. Circadian rhythm of sperm movement in the male reproductive tract of the gypsy moth, Lymantria dispar. J. Insect Physiol. 34: 527–533.


Fielding, A. H., and J. F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24: 38–49.
Hoy, M. A. 1978. Selection for a non-diapausing gypsy moth: some biological attributes of a new laboratory strain. Ann. Entomol. Soc. Am. 71: 75–80.
Joy, M. K., and Death, R. G. 2004. Predictive modelling and spatial mapping of freshwater fish and decapod assemblages using GIS and neural networks. Freshw. Biol. 49: 1036–1052.
Kohavi, R. 1995. A study of crossvalidation and bootstrap for estimation and model selection. Morgan Kaufmann Publishers, Montréal, Canada.
Landis, J. R., and G. C. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33: 159–174.
Linn, C. E., Jr., Campbell, M. G., and W. L. Roelofs. 1992. Photoperiod cues and the modulatory action of octopamine and 5-hydroxytryptamine on locomotor and pheromone response in male gypsy moths, Lymantria dispar. Arch. Insect Biochem. Physiol. 20: 265–284.
Manel, S., H. C. Williams, and S. J. Ormerod. 2001. Evaluating presence-absence models in ecology: the need to account for prevalence. J. Appl. Ecol. 38: 921–931.
Mastrorillo, S., F. Dauba, T. Oberdorff, J.-F. Guegan, and S. Lek. 1998. Predicting local fish species richness in the Garonne River basin. C R Acad Sci III 321: 423–428.
Minitab, Inc. 1996. MINITAB reference manual, release 11. Minitab, Inc., State College, PA.
Mooney, H. A., Mack, R. N., McNeely, J. K., Neville, L. E., Schei, P. J., and J. K. Waage [eds.]. 2005. SCOPE 63 invasive alien species: a new synthesis. Island Press, Washington, DC.
Oberdorff, T., D. Pont, B. Hugueny, and D. Chessel. 2001. A probabilistic model characterizing fish assemblages of French rivers: a framework for environmental assessment. Freshw. Biol. 46: 399–415.
Olden, J. D., and D. A. Jackson. 2001. Fish-habitat relationships in lakes: gaining predictive and explanatory insight by using artificial neural networks. Trans. Am. Fish. Soc. 130: 878–897.
Olden, J. D., and D. A. Jackson. 2002. A comparison of statistical approaches for modelling fish species distributions. Freshw. Biol. 47: 1976–1995.
Park, Y. S., R. Cereghino, A. Compin, and S. Lek. 2003. Applications of artificial neural networks for patterning and predicting aquatic insect species richness in running waters. Ecol. Model. 160: 265–280.
Patterson, D. 1996. Artificial neural networks. Prentice Hall, Singapore.
Recknagel, F., M. French, P. Harkonen, and K.-I. Yabunaka. 1997. Artificial neural network approach for modelling and prediction of algal blooms. Ecol. Model. 96: 11–28.
Régnière, J., and V. Nealis. 2002. Modelling seasonality of gypsy moth, Lymantria dispar (Lepidoptera: Lymantriidae), to evaluate probability of its persistence in novel environments. Can. Entomol. 134: 805–824.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning internal representations by error propagation, pp. 318–362. In D.E.J.M. Rumelhart [ed.], Parallel distributed processing: explorations in the microstructure of cognition. MIT Press, Cambridge, MA.
Speight, M. R., M. D. Hunter, and A. D. Watt. 1999. Ecology of insects: concepts and applications. Blackwell, Oxford, England.
Tarassenko, L. 1998. A guide to neural computing applications. Arnold, London, United Kingdom.


Vera, M. T., R. Rodriguez, D. F. Segura, J. L. Cladera, and R. W. Sutherst. 2002. Potential geographical distribution of the Mediterranean fruit fly, Ceratitis capitata (Diptera: Tephritidae), with emphasis on Argentina and Australia. Environ. Entomol. 31: 1009–1022.
Verbyla, D. L., and J. A. Litaitis. 1989. Resampling methods for evaluating classification accuracy of wildlife habitat models. Environ. Manage. 13: 783–787.


Worner, S. P. 1988. Ecoclimatic assessment of potential establishment of exotic pests. J. Econ. Entomol. 81: 973–983.

Received 23 June 2005; accepted 2 January 2006.