Application of artificial neural networks to ... - Wiley Online Library

18 downloads 801 Views 920KB Size Report
Artificial neural networks, a branch of artificial intelligence, are computer systems ... We employed a three-layer back propagation neural network to assess its ...
PALEOCEANOGRAPHY, VOL. 11, NO. 4, PAGES 505-512, AUGUST 1996

Application of artificial neural networks to chemostratigraphy Bj6rn A. Malmgren Department of MarineGeology,EarthSciences Centre,UniversityofG6teborg,G6teborg,Sweden

Ulf Nordlund Instituteof EarthSciences, Universityof Uppsala,Uppsala,Sweden

Abstract. Artificial neuralnetworks,a branchof artificial intelligence,are computersystems formedby a numberof simple,highlyinterconnected processing unitsthat havethe abilityto learn a setof targetvectorsfrom a setof associated input signals.Neural networkslearnby selfadjustinga setof parameters,usingsomepertinentalgorithmto minimize the errorbetweenthe desiredoutputand networkoutput.We explorethe potentialof this approachin solvinga problem involvingclassification of geochemical data.The data,takenfrom the literature,are derived from four late Quaternaryzonesof volcanicashof basalticand rhyolithicorigin from the NorwegianSea.Theseashlayersspanthe oxygenisotopezones1, 5, 7, and 11, respectively (last 420,000 years).The data consistof nine geochemicalvariables(oxides)determinedin eachof 183 samples.We employeda three-layerbackpropagationneuralnetworkto assess its efficiency to optimallydifferentiatesamplesfrom the four ashzoneson the basisof their geochemicalcomposition.For comparison,threestatisticalpatternrecognitiontechniques,linear discriminant analysis,the k-nearestneighbor(k-NN) technique,and SIMCA (softindependent modelingof classanalogy),were appliedto the samedata.All of theseshowedconsiderably highererror rates than the artificial neuralnetwork,indicatingthat the backpropagationnetworkwasindeedmore powerfulin correctlyclassifyingthe ashparticlesto the appropriatezoneon the basisof their geochemicalcomposition. Introduction

Neural networks,a branchof artificial intelligence(AI), are computersystemsthat have the ability to "learn."They learn sometargetvaluesor vectorsfroma setof associated inputsignals throughiterativeadjustments of a set of parameters.This is accomplished throughminimizationof the errorbetweenthe networkoutputand the desiredoutputfollowingsomelearning rule. Whereasmultivariate-statistical approaches alwaysproducethe sameresultwhen appliedto the samedataset, a neural networkis morelike a livingsystemin thatvariousanalyses most likely will not produceexactly the sameresult. Neural networkshave foundtheir inspirationfrom the way that the mammalianbrain operates;however,they are able to mimic only the most elementaryfunctionsof the biologicalneuron [Wasserman,1989]. The historyof neural networksdates back to the 1940s, when the first learninglaw for traininga neuralnetworkwas described[Wasserman,1989]. The field underwentrapid expansionduringthe 1980s.Neural networkshavebeenapplied to solvingproblemsin a wide varietyof fields,for example,in the aerospaceindustryas aircraft autopilots,for oil and gas exploration,in the defenseindustryfor weaponsteeringand

targettracking,and in medicinefor breastcancercell analysis [Demuthand Beale, 1994]. Applicationsof artificial neural networks to the Earth sciences are still rare. Networks

have

beenapplied,for example,to problemsof well log interpretations [Baldwinet al., 1989, 1990; Rogerset al., 1992], for identification of linearfeaturesin satelliteimagery[Pennet al., 1993],andfor geophysical inversion problems[Raiche,1991]. We exploreherethe potentialof artificialneuralnetworksin the field of chemostratigraphy. We applieda neuralnetwork algorithm,backpropagation (BP), to a probleminvolvingclassificationof geochemical data.$joholrnet al. [ 1991] identified four distinctvolcanicash zonesin late Quaternarysediments from the NorwegianSea. They suggested that geochemical differences amongtheseashzonesmaybe usedas a stratigraphic tool in theNorwegianSeaand surrounding landareas.The objective of our analyseswas to assessthe capabilityof the BP network to learn to differentiate these ash zones on the basis of

the geochemicalcompositionof particles of a basaltic and rhyolithicorigin.We alsoanalyzedhow the neuralnetworkresultscomparewith variousstatisticalpattern-recognition techniques: linear discriminantanalysis (LDA), the k-nearest neighbortechnique(k-NN), and"softindependent modelingof class analogy" (SIMCA). Furthermore, we wanted to test whether the trained neural network was useful for classification

of new sampleswith knowngeochemical composition. Figure 1 showsa plot of the particlesfrom theseashzones alongthe axesof the first two canonicalvariates.Theseaccount

Copyright 1996by theAmerican Geophysical Union.

ifor 98% of thevariabilityin theoriginaldataset,whichcon-

Papernumber96PA01237. 0883-8305/96/96PA-01237512.00

sistedof nine variables(nine chemicalelements).(The canoni505

,

506

MALMGREN AND NORDLUND: ARTIFICIAL NEURAL NETWORKS

m



B

BB B

DD D

B

A

A BB

DDBD BB C

DDB •E•A

C

D D cC•c

S ,•, BB D

D D

A CA c

e_ 2-

c

AC

ACC• ' CC

Rhyolithicgrains

CC

c c C

-4-

Basalticgrains

5

-10

-5

0

5

10

First canonical variate

Figure1. Configuration ofgrains referable tothefourlateQuaternary volcanic ashzones, A through D, in the Norwegian Seadescribed bySjoholmet al. [1991] alongfirstandsecond canonical variateaxes.The canonical

variate analysis isbased onthegeochemical composition ofindividual ashparticles (ninechemical elements were

analyzed: Na20,MgO,A1203, SiOn, K•O,CaO,TiO•,MnO,andFeO).Twotypes ofgrains, basaltic andrhyolithic, were distinguished within each zone. Thisplane, accounting for98%ofthevariability among group mean vectors in nine-dimensional space (thefirstaxisrepresents 95%),distinguishes basaltic andrhyolithic grains. Apart from basaltic grains from zone C,which maybedifferentiated from such grains from theother zones, grains ofthesame typeareclearly overlapping withregard togeochemical composition among thezones.

cal variatesareproduced througha lineartransformation of the wasprobablynot explosivebecauseit occurred underthe ice originalvariables,allowingfor summarization and visualiza- cap[Sjoholm etal., 1991]. tion of multidimensional datasets[see,for example,Rock, Sjoholm etal. [1991]identified fourtypesof ashparticles: 1988,p. 261].) Two clusters maybe distinguished, onebasaltic colorless particles, palebrownparticles, brownblocky parti-

clusterandonerhyolithic cluster.Apartfrombasalticgrains cles,andblackblocky particles. Thefirsttwotypeswererefrom zone C, no clear distinctionsof ash zonesare seenwithin

the clusters.Considering the lack of distinctsubclusters, it is relevantto ask the questionof how well an artificial neural networkcanreallybetrainedto recognize thesevarious zones. The Data Set

Sjoholmet al. [1991] presented geochemical data for late Quaternary volcanicashzonesfromthe NorwegianSea.The ashis mostprobablyderivedfromvolcaniceruptions on Iceland,sinceits geochemical composition is similarto lavasand

ferredto assilicicandthelasttwoasbasaltic particles. The geochemical analyses were performed on nine chemicalelements,Na•O,MgO, A1203,SiOn,K20, CaO,TiO2,MnO, and

FeOin 183samples total(Table1).Theanalyses displayed a difference in theSiO•content between thebasaltic andrhyolithicparticlesin all of the ashzones;the SiO• waslessthan 54-59%in thebasaltic particles andgreater than68-71%in the particles of a rhyolithicorigin.

Table1. Numberof Basaltic andRhyolithic Particles FromEachoftheLate

tephras fromIceland[Sejrup etal., 1989].Sjoholm etal. [1991] QuaternaryAsh Zones intheNorwegian SeaIncluded inThisStudy identifiedfourdistinctashzones,denoted A, B, C, andD, in coreP57-7fromtheIcelandPlateau(latitude68øN,longitude Basaltic Rhyolithic 14øW; waterdepth,1620m).Theuppermost ashzone,A, spans theoxygen isotope stage1 of Emiliani[1970](12-0kyrB.P.); Zone A 40 28 zoneB, mostof isotopestage5 (128-71kyr B.P.);zoneC, most

Zone B

33

12

22 26 of isotope stage7 (245-186kyrB.P.);andzoneD, thetoppart Zone C 13 9 of isotope stage11(423-362kyrB.P.)andbasalpartof stage Zone D 10(362to about339kyrB.P.).It is thusapparent thatdeposiA totalof 183particles wereanalyzed. AshzoneA isfromtheoxygen tion of the volcanic ashtookplaceonlyunderinterglacial isotope stage 1' zoneB, fromstage 5; zoneC, fromstage 7; andzoneD,

conditions.During glacialperiods,the volcanismon Iceland

fromstage 11.ThedataarefromSjoholm etal. [1991 ].

MALMGREN AND NORDLUND:ARTIFICIALNEURAL NETWORKS

507

eachotherin sucha way thateachof the neuronsin onelayer hasa weighted connection to eachof the neurons in the next A backpropagation networkconsists of an interconnected layer(Figures2a and 2b). The weightterm associated with series of layerscontaining neurons (Figure2a).Eachneuron in eachconnection describes the type (negativeor positive)and thefirstlayeris connected to eachof theelements in aninput strengthof the connection. vector,while the neuronsin the last layercorrespond to the All neuronsin a BP network,exceptfor thosein the first elementsin an outputvector.Betweenthe inputand output layer,are alsoassociated with a transferfunctionanda bias layers,thereareoneormore"hiddenlayers"withanarbitrary term (Figures2b and 2c). The transferfunctionreceivesthe numberof neurons.In our case,there are nine neuronsin the summedsignalsfromtheneurons in thepreviouslayer(or, for inputlayer,corresponding to thechemical elements, andeight the first layer,fromthe inputvector),filtersthese,and sends neuronsin the outputlayer,corresponding to the eightclasses the resulton to the neuronsin the followinglayer (see Appento distinguish (rhyolithic orbasaltic grainsbelonging to oneout dix 1). The transferfunctionmaybe a simplestepfunction,a of fourlayers).The layersin a BP networkare connected to linear function,or a nonlinear(usuallysigmoidal)function.In

BackPropagationNeuralNetworks

a)

input vector

input layer

hidden layer

output layer

output vector

the networkusedin this study,a lineartransferfunctionwas usedin the last (output)neuronlayeranda tan-sigmoid functionwasusedin thepreceding (hidden)layer.The valueof the bias term associatedwith each neuroncan be regardedas a measureof the "importance" of the individualneurons.The biastermis analogous to the constant termin regression equations.

It is primarily the weight terms that contain the "knowledge," or "memory," of a network.Theprocess of traininga networkinvolvesincremental adjustments of the weights in orderto find an optimalformalmappingbetweena setof inputvectorsanda corresponding setof outputvectors. Training thusrequiresa trainingsetwith samples consisting of pairsof input and outputvectors.The optimizationprocesscan be thoughtof as a sophisticated typeof "templatecomparison" in whichthe differencebetweenthe actualoutputandthe desired

b)

outputis usedas the optimization criterion.In the presentapplication,a trainingpair consists of an inputvectorwith the geochemical composition of the ash particlesand an output

Pwl

••-••2

a

c)

a = lin(n)

a

a = tansig(n)

Figure2. (a) A diagram showing thegeneral architecture of a three-layer backpropagation networkwithfiveneurons in the inputlayer,threeneurons in thehiddenlayer,andtwoneurons in theoutputlayer.Eachneuronin thehiddenandoutputlayers receivesweightedsignalsfrom eachneuronin the previous layer.(b)A diagramshowing a singleneuronin a backpropagationnetwork.In forwardpropagation, the incoming signals fromtheneurons of the previouslayer(p) are multipliedwith theweights of theconnections (w) andsummed. Thebias(b) is thenadded,andthe resultingsumis filteredthroughthe trans-

fer functionto producethe activity(a) of the neuron.This is senton to thenextlayeror, in the caseof the lastlayer,representstheoutput.(c) A lineartransferfunction (left) anda sigmoidaltransferfunction(right).

repeatedfor eachtrainingpair in the trainingset: forward propagation andbackpropagation. Thesearerepeated untilthe differencebetweenthe targetoutputand the computedoutput reaches a presetlowerthreshold. The backpropagation stepis sometimes repeatedonlyoncefor everyrun throughall of the samples, i.e., oncefor each"epoch."

•r!•jwn a

vectorwith the eightashclasses. Trainingof a BP networkinvolvestwo main stepsthat are

Forward Propagation

Eachsampleinputvectorthatthenetworkis presented with is propagated throughthe networkwhile beingmodifiedand filteredby the weightsof the connections andby the transfer functionsof the neurons(Appendix1). For eachneuronin the hiddenandoutputlayers,all incomingsignalsn whichare the signalsfromthe connecting neurons multipliedwith the respective connective weights n aresummed andfilteredthrough the neuron'stransferfunction.The resultingactivityof the neuron is thenusedas inputto the next layer.The activitiesproducedby theneurons in theoutputlayerconstitute thefinalre-

sult.Forwardpropagation is performed alone,withoutthe followingbackpropagation step,whennmningan alreadytrained network.

Back Propagation

The difference,or "error,"betweenthe outputvectorresult-

ing from the forwardpropagation stepand the desiredtarget vectoris computed.The errorvalues,or their derivatives,are

508

MALMGREN

AND NORDLUND: ARTIFICIAL NEURAL NETWORKS

usedto incrementally adjustthe weightsbetweenthe output layerandthelastof thehiddenlayersaccording to a learning algorithm basedonthegradient-descent method. Foreachlayer thereafter,goingbackwardthroughthe network,the values usedfor adjusting theweightsaretheerrortermsin theimmediatelysucceeding layer.Sincewearegoingbackward through thenetwork,thesetermshavealreadybeencomputed. Thesize oftheincremental adjustments oftheweights is determined by the learningrate,whichvariesbetween 0 and1. Toohigha learningratemayresultin the networkbeingunableto converge,whiletoolow a learning ratewill resultin excessively slowlearning.

particles) oftheoriginal observations. Theremaining 80%(146 particles) wasusedastraining sets. Wethenemployed a crossvalidation technique forestimating theabilityof theclassifiers tocorrectly predict theclass referability ofthetestsetsamples [Stone,1974;WeissandKapouleas, 1989].The errorrates computed arethusaverage ratesofmisclassification (%) forthe five different test sets.

Results

Network Configuration

It hasbeenshown thattheuseofmore thanonehidden layer

Severaldifferentmodifications of the back propagation inaBPnetwork israrely beneficial [see, forexample, Masters,

processhave beenproposedin orderto enhancethe rate and

1993], and we thereforeconcentrated our evaluationson a

qualityof learning. Amongthese,"momentum" and"adaptive three-layer configuration (oneinputlayer,onehidden layer, learning" wereapplied tothenetworks usedin thisstudy. andoneoutputlayer;cf.Figure2a).Thenumber of neurons in

theinputlayeris ninein thiscase, corresponding to thenine chemical elements, andthenumber of neurons in theoutput Learningin a BP networkis basedon the gradient-descent layeriseight(thenumber ofclasses topredict). Thenumber of

Momentum

method,thatis, theweightsareadjusted sothatthechanges at eachtime will followthe steepest "downhill"directionon the errorsurface.By makingthe adjustments partiallydependent uponpreviouschanges,the risk of gettingcaughtin local minimacanbe reduced(thenetworkis allowedto "slide"outof localirregularities in the errorsurface).A momentum valueof zero meansthat changeswill dependonly uponthe current gradient, whilea valueof 1 will ignorecurrentgradient completely.A valueof about0.90-0.95is commonly applied.

neurons in thehiddenlayer,however, is notfixed.In orderto

determine theoptimal configuration forourparticular applica-

tion,theperformance of networksin whichthe numberof hid-

denneurons wassuccessively increased by 3 (i.e.,3, 6, 9...., 33), wasanalyzed. As shownin Figure3, theerrorratesdecrease gradually foranincreasing number of neurons, withthe lowest errorrateobtained for24neurons (10.0%). Forgreater numbers ofneurons (27,30,and33),noimprovement in error rateisnoticed. Wetherefore drawtheconclusion thattheoptimumprediction ofparticles isobtained fora configuration with AdaptiveLearning 24 neurons in thehiddenlayer.It should be noted,however, withasfewasninehidden neurons yields Adaptivelearning meansthatthelearningrateis dynami- thata configuration

callyadjustedto suitcurrentconditions. In brief, it worksas

almost as low error rates.

follows: If theerrorincreases fromonelearning cycleto the next,thenthelearningrateis decreased. If, ontheotherhand,

theerrordecreases, thenthelearning rateis increased. Adaptivelearning decreases thetotallearning time(i.e., munber of training cycles) bykeeping thelearning rateashighaspossible whileatthesame timemaintaining a highlearning stability.

60-

50-

40-

It should benotedthatin orderfora BPnetwork to perform well asa classifier, thetrainingdatashould be representative 30of theexpected rangeandvariability of thedatato whichit will beapplied. It should alsobenotedthattheoretically, a BPnetworkis ableto perfectly learnanypattern,including random 10noise,provided thata sufficient number of neurons (withnonlineartransfer functions) is included andprovided thattraining timeis sufficiently long.It is, however, onlythegeneralized knowledge thatis usefulwhenapplying the networkon sam0 • • ; 1'2 1'5 1'8 11 2'4 2•7 3'0 3•3 plesoutside thetrainingset.Specific knowledge, suchasthat Number of neurons of random noise,is useless sinceit is validonlywithinthe trainingset.Goodintroductory textson artificialneuralnetFigure3. Changes in errorrate(percentages of misclassificaworks,in general,areBealeandJackson[1990]andCaudill tionsin thetestset)fora three-layer backpropagation network [1990].

withincreasing number of neurons whenapplied to training-

Estimates of Error Rates

testset 1 (80:20%training-test setpartition).Errorrateswere determined foranincremental seriesof 3, 6, 9..... 33 neurons in

Thesuccess ofa classifier maybedetermined bycomputing thepercentage ofmisclassifications, the"errorrate,"ofpredictionsin a datasetwhichis notpartof thetraining set.Instead ofrelyingona singletestsetforestimating theperformance of the variousclassifiers, whichmaybe misleading [Weissand Kapouleas,1989], we createdfive different randomtest sets

fromouroriginaldata.Eachsuchtestsetcontained 20% (37

thehidden layer.Errorrateswerecomputed asaverage rates basedon ten independent trials with differentinitial random

weights andbiases. Theerrorratesrepresent theminimum errorobtained forrunsof 300,600,900,andupto 9000epochs. Theminimum errorrate(9.2%)wasobtained fora configurationwith24 neurons in thehidden layer,although thereis a majorreductionalreadyat nineneurons.

MALMGREN

AND NORDLUND: ARTIFICIAL NEURAL NETWORKS

Number of Training Cycles

In additionto findingtheoptimalnumberof hiddenneurons, we neededto find the optimalnumberof trainingcycles.Too few cyclesresultin poorlearning,whiletoomanycyclesresult in "overlearning," i.e., learningbecomestoo specific(see above).We performed ten separate trialsfor networks withdifferentrandominitial weightsand biases.In orderto monitor the improvement in learning,we trainedeachnetworkiterativelyover30 intervalsof 300 epochseach.The networkswere thustrainedover9000 epochs.We recordedthe errorratesafter eachinterval.The averageof the minimumerrorratesfor eachof the ten trials was then computed.In caseswhere the minimumerrorrateswere reachedafter 8000 epochs,we continuedthe trainingover 20 more 300-epoch-long intervalsto make surethat no furtherimprovementin learningoccurred beyond9000epochs. In nocasedidanyfurtherreduction in error rate take place,whichimpliesthat monitoringovertotally 9000epochsshouldbe sufficient.

509

neuronsconnectedto eachof eight outputneurons)are free to vary.In addition,thebiases,24 in the hiddenlayerand8 in the outputlayer, are also subjectto modificationas trainingproceeds.

For estimatesof the fivefoldaverageerrorrates,we usedthe samestrategyas in the determination of the optimumnumber of hiddenneurons.We thusran ten independent trials for each of the five test sets,and we usedthe averageminimumerror rate as the estimateof overall error rate. The minimumpercentagesof misclassification for the five setswere foundto rangebetween5.7 and 11.4% (Table 2; Figure5), yieldingan averagerate of 9.2%. Hence,on average,90.8% of the observationscouldbe allocatedto the correctparticletype and ash zoneusinga BP network. Statistical Pattern-RecognitionTechniques

Brief descriptions of the statisticalpattern-recognition techniquesused for comparisonare given in Appendix2. Error ratesfor LDA, the k-NN technique,and SIMCA for eachof the five training-testpartitionsare shownin Table2 (cf. Figure5). Learning The LDA fails to assignbetweena third and a half of the parAs discussed before,the back propagationlearningrules ticlesto their properclasses(error rams27.0-48.6%), and the adjusttheweightsandbiasesto minimizetheerrorbetween the fivefoldaverageerror rate amountsto 38.4%. Thus only 51.4actualnetworkoutputandthe desiredoutputaccording to the 73.0% of the grainscouldbe correctlyclassified.The LDA is gradient-descent procedure. Figure4 illustratesthechange in thusnot as successful as the neuralnetworksin discriminating errorratefor trainingset 1 duringlearningusinga three-layer amongthe particles.This is not surprising,considering the network with 24 neuronsin the hidden layer. During the greatoverlapsamongpointsof both typesof particles(Figure learningprocess, theerrorratein thetrainingsetsuccessively 1). ^ linear methodwould not be anticipatedto handlethe decreased to a minimumof 2.1% after 7500 epochs.After 7500 partitioningof theseclustersin an optimalfashion. epochs,no furtherimprovement in errorratewasnoticed.The The k-NN techniquegivesa slightlybetterfivefoldaverage minimumerror rate in the test set (10.8%) was obtainedafter error rate, 30.8%, than the LDA, but the result is still inferior 3300 epochs,indicatingthat mostlynoisewas learnedduring to that of the neuralnetwork.In the individualpartitions,error epochs3300 to 7500. ratesvariedbetween27.0 and40.5% (Table 2; Figure5). In the networkarchitectureused here, 216 weightsin the The fivefoldaverageerrorrateproducedby SIMCA, 28.7%, hiddenlayer(nine inputneuronsconnected to eachof 24 hidis similar to that of the k-NN technique(individualratesare den neurons)and 192 weightsin the outputlayer (24 hidden between 21.6 and 35.1%; listed as SIMCA 1 in Table 2 and

Figure5). However,theanalysisindicatesthatmostof the particlesthat were correctlyassignedcouldequallywell be refer-

2O

able to one or severalother classes,sincetheir SIMCA models

were sufficientlysimilar to them. If error ratesthat also take into consideration thoseparticlesthat were not unequivocally allocatedto one singleclassare computed,theyturn out to be veryhigh,between83.8 and 100.0%with a fivefoldaverageerror rate of 90.8% (reportedas SIMCA 2 in Table2 andFigure

15-

5). Error

Rates in Individual

Classes

5-

O-

2000

4000

6000

8000

Number of trainingcycles

Figure 4. Changesin the errorrate (percentages of misclassifications)in the trainingset with increasingnumberof epochs in the first out of ten trials in training set 1. This networkhad 24 neuronsin the hidden layer, and the network error was monitoredover 30 subsequent intervalsof 300 trainingepochs each. During training, the error rate in the training set decreasedfrom 18.5% after 300 epochsto a minimumof 2.1% after 7500 epochs.The minimum error rate in the test set (10.8%) wasreachedafter 3300 epochs.

To get an ideaaboutthe success of the variousclassifiersin correctlypredictingparticlesfrom each of the individualash zones,error rateswere also computedfor each of the eight classes(Table 3). For the BP network,the configurationwith 24 neuronswas used.As expected,the BP networkgivesthe overall best result. Even if error rates for some zones are lower

for classifiersotherthan the BP network,for example,k-NN givesa lowerrate for basalticparticlesin zoneC and SIMCA error rate is lower for rhyolithicgrainsin zoneD, the BP network by far produces the overallmostbalancedsolution. Although the BP network showedthe lowest error rate (0.4%) for rhyolithicparticlesfrom zoneA, particlesreferable to zone B were generallythe easiestto predict (error rate of 3.7% for rhyolithicparticlesand 5.7% for basalticparticles).

510

MALMGRENAND NORDLUND:ARTIFICIALNEURALNETWORKS lOO

_

. I

_

-

90 • Test Test Test Test Test

80 70

set set set set set

1 2 3 4

5

Average

60 '"" 50 -

ß.-

o

_

40

•._

-

30 20

-

_

_

--

10

--

_

o

BP

LDA

k-NN

SlMCA

1

SlMCA 2

Figure 5. Errorrates(percentages of misclassification in thetestsets)for eachof the five independent trainingtestsetpartitions(80% trainingsetand20% testsetmembers) andaverageerrorratesoverthefive partitionsfor a three-layer backpropagation (BP) neuralnetwork,lineardiscriminant analysis, thek-nearest neigbors technique (k-NN), andSIMCA. Neuralnetworkresultsarebasedonten independent trialswith differentinitial conditions. Errorratesforeachtestsetarerepresented bytheaverage of theminimumerrorratesobtainedduringeachof the ten trials,andthe fivefoldaverageerrorratesare the averages of the minimumerrorratesfor the variouspartitions.

ZonesC and D were slightlymoredifficultto classify(error rate 19.0 and 15.0%, respectively,for basalticparticles,and 15.0and21.9%, respectively, forrhyolithicparticles).

Applicationto Core G52-39 Additionalgeochemical dataonashparticlesfromcoreG5239, drilled east of Iceland,were presentedby Sj'oholmet al.

[1991]. Thesegrainswerederivedfromtwo levels,19-21cm and 23-25 cm. At the lower level, grainsof bothbasalticand

rhyolithicoriginwereanalyzed,whereasonly basalticgrains wereanalyzedfromthe upperlevel.Bothsampleswerecorrelatedto the samevolcanicsystemas zoneA in coreP57-7 by Sj'oholrn et al. [1991].

We appliedthe BP networkwith 24 neuronsin an attemptat classifyingthesegrains.We testedthe resultsof elevenrandomlychosenexperiments usedin the estimatesof fivefoldaverageerrorrates(outof 50 experiments total, 10 trialsper each of the five training-test partitions),andwe assigned eachof the particlesto the classreferredto by the majorityof experiments. The neuralnetworkconfirmsthe inferencemadeby Sjoholmet al. [1991] in assigningthe majorityof bothbasalticand rhyolithicgrainsto zoneA. Thusall of the rhyolithicparticles(10 out of 10) from the lower level were foundto be mostsimilarin geochemical composition to grainsof thistypefromzoneA in referencecoreP57-7. Similarly,73% of the basalticgrainsin the upperlayer(8 out of 11), and86% of the basalticgrainsin thelowerlayer(6 outof 7) wereassigned to zoneA.

Table 2. ErrorRatesin Eachof Five Training-TestSetPartitions,FivefoldAverageErrorRatesin the TestSets,ano95% ConfidenceIntervalsfortheFivefoldAverageErrorRatesfortheTechniques Discussed in ThisPaper Error Rates,%

BP LDA k-NN SIMCA SIMCA

1 2

Fivefold

Set I

Set 2

Set 3

Set 4

Set 5

10.0 43.2 40.5 29.7 97.3

7.6 48.6 27.0 35.1 100.0

5.7 27.0 27.0 27.0 91.9

11.4 35.1 32.4 29.7 83.8

11.4 37.8 27.0 21.6 81.1

Error Rate, % 9.2+ 3.1 38.4+ 10.2 30.8+ 7.4 28.7+ 6.1 90.8+10.2

Thefivefoldaverageerrorratesweredetermined astheaverage errorratesoverfive independent trainingandtestsetsusing80% trainingand20% testpartitions. Errorratesfor theneuralnetworks areaverages oftentrialsfor eachtraining-test setpartitionusingdifferent initialconditions (randominitialweights andbiases). Theminimum fivefolderrorrateforthebackpropagation (BP) networkwasobtainedusing24 neuronsin the hiddenlayer.Apartfromregularerrorratesfor softindependent modelingof class analogy(SIMCA 1), thetotalerrorratesfor misclassified observations andobservations thatcouldbereferable to oneor several othergroupsarereported underSIMCA 2. LDA represents lineardiscriminant analysis, andk-NN, k-nearest neighlx)r.

MALMGREN AND NORDLUND: ARTIFICIAL NEURAL NETWORKS

511

Table 3. AverageErrorRates(Percentages) for BasalticandRhyolithicParticles in AshZonesA ThroughD Type of Zone

Particle

N

BP

LDA

k-NN

$IMCA

A

basaltic

7-13

13.5

15.6

40.3

24.8

96.9

A

rhyolithic

5-10

0.4

13.3

94.0

B

basaltic

3-13

5.7

30.6

18.1

11.5

88.3

B

rhyolithic

2- 6

3.7

70.0

76.7

33.3

86.7

C

basaltic

1- 5

19.0

C

rhyolithic

3- 8

8.7

9.3

7.3

1

$IMCA

29.0

15.0

29.0

88.0

88.3

25.2

49.3

81.3

D

basaltic

1- 2

15.0

50.0

50.0

37.5

75.0

D

rhyolithic

1- 4

21.9

68.8

56.3

18.8

75.0

2

ASbefore,errorratesareaverageerrorratesoverfive experiments basedon 80% trainingsetmembersand20% test setmembers. N is therangeof samplesizesin theseexperiments. For abbreviations andotherdetails,pleasereferto Table 2.

Summaryand Conclusions We applieda backpropagation neuralnetworkto a problem of assigning183 volcanicashparticlesof two differentorigins, basalticand rhyolithic,to four late Quaternaryvolcanicash zonesin the NorwegianSea.Differencesin geochemical compositionof nine chemicalelementsamongthesezoneshave beensuggested to be of stratigraphic significance [Sjohotmet at., 1991]. We wishedto test whethera neural networkwas ableto distinguishthe eightclassesof ashparticlesfrom these variouszones.The predictivepowerof the networkwe applied was compared with that of threestatisticalpattern-recognition techniques (lineardiscriminant analysis,the k-nearestneighbor technique,andsoftindependent modelingof classanalogy).Error rateswere computedbasedon averagepercentage of misclassifications. The optimumconfiguration of the BP network, as well as the optimumnumberof trainingcycles,was establishedfromexperiments. The averageerrorrate for the BP networkwith its optimum configuration is 9.2%. On average,33.6 outof 37 particlesin a test set couldthusbe correctlyclassifiedby the BP network. Noneof the statisticalpattern-recognition techniques managed nearlyaswell astheneuralnetworkto classifythe ashparticles to their appropriatezones.The averageerrorrate is 38.4% for LDA, 30.8% for the k-NN technique,and28.7% for SIMCA. If thoseparticlesthat were shownby SIMCA to alsobe referable

rameters were set as follows: error goal=0.001, learning rate=0.01,increasein learningrate=1.05,multiplierto decrease learning rate=0.7, degreeof momentum=0.95,and error ratio= 1.04. Most of thesesettingsare in accordance with the recommendations providedby Demuthand Beale [1994]. The networkconfiguration was one input layerwith nine (passive) neurons,one hiddenlayer with 24 neuronswith tan-sigmoid transferfunctions,andoneoutputlayerwith eightneuronswith lineartransferfunctions. In additionto momentum andadaptive learning,'•qguyen-Widrow initialization"was employed.This involvesadjustingthe rangesof the randomnumbersinitially assigned to the weightsaccording to the rangesof the expected inputvalues,whichmayspeedup trainingconsiderably. In our case, this initialization procedureonly affectedthe neurons havinga sigmoidaltransferfunction.During training,adjustmentsof weightsandbiastermsweredoneoncefor everyrun throughall of the trainingset,i.e., oncefor eachepochandnot oncefor eachsample.

Appendix 2: Brief Descriptionsof PatternRecognitionTechniquesUsedfor Comparison Linear Discriminant Analysis(LDA)

Normally, discriminantanalysisamountsto establishing

linear functions,representinga planar surfacewith p-one dito one or several of the SIMCA models for other classes are mensions(p=the numberof variables),that optimallydistintaken into account,the SIMCA errorrate is increasedto 90.8%. guishtwo predefinedgroupsof observations. In the caseof the Applicationof the resultsof theBP networkto geochemical ashzone samples,we assignednew observationvectorsto the dataon ashparticlesfromtwo levelsin coreG52-39,drilledto predefinedgroupsthroughcomputations of Mahalanobis' genthe east of Iceland [Sjohotmet at., 1991], indicatedthat the eralizeddistancesbetweenthesevectorsand groupmeanvecmajorityof basalticandrhyolithicparticlesfromthesesamples tors[CooleyandLohnes,1971].Thesedistancemeasures were were assignableto ash zone A. This confirmsthe resultsof thenconverted to probabilitiesof referabilityto the predefined Sjohotrnet at. [1991]. groups.MalrngrenandKennett[1977]appliedthisprocedure to In conclusion,we have demonstratedthat the BP neural a taxonomic problemin recentplanktonicforaminifera. networkis considerably moresuccessful in classifyingparticles from the variousash zonesthan standardstatisticalpatternK-Nearest Neighbors(k-NN) recognitiontechniques.Consideringthe complex nonlinear The k-nearestneighboris a conceptually simpletechnique natureof mostreal-worlddata and consideringalsothe potential of neural networksto handle suchdata, it is reasonableto

based on the Euclidean distance between observations in mul-

assumethat methodssimilarto neuralnetworksmay become veryusefulin the areaof paleoceanography.

tidimensional space[KowalskiandBender,1972].The allocation of the test set membersto the trainingset classesis dependentupon the distancesof the k shortestEuclideandistancesbetweenthesesets.In theapplication to the ashdata,we setk equalto 3, and we monitoredthe distancesfrom eachof the testsetmembersto eachof the trainingsetmembers.A test set memberis referredto that trainingset classto which the

Appendix 1' TechnicalDetailsAbout the Neural Network Used in This Study The softwareused in this studywas MATLAB's Neural Network Toolbox(MathWorksIncorporated).The variouspa-

512

MALMGREN

AND NORDLUND:

ARTWICIAL

majority(two or three)of the threeclosest trainingsetmembersbelonged, asindicated bytheEuclidean distances.

NEURAL

NETWORKS

Chayes, F., Oncorrelations between variables of constant sum,J. Geophys.Res.,65, 4185-4193,1960.

Cooley, W.W.,andP.R.Lohnes, Multivariate DataAnalysis, 364pp., SoftIndependent Modelingof ClassAnalogy(SIMCA) SIMCA may involveany or severalof four distinctlevels [Wold,1976;WoMet al., 1984].Level1 of S]MCAis devoted

JohnWiley,New York, 1971. Demuth, H., and M. Beale, Neural Network Toolbox for Use with

MATLAB, 158pp.,MathWorks, Natick,Mass.,1994.

Emiliani,C., Pleistocene palcotemperatures, Science,168, 822-825, 1970.

todeveloping mathematical rulesforeachofa number ofpreset C.M.,A pattern recognition approach tothederivation of geogroups (termedclasses in SIMCA)in thetrainingsetby fitting Griffiths, separate R-modeprincipalcomponent modelsto eachof them.

logicalinformation fromdrillprocess monitoring, Underwater Tech-

nol.,winter1984, 2-13, 1984.

The dimensionality of a principalcomponent solutionis deHaugen, J.-E.,H.-P.Sejmp, andN.B.Vogt,Chemotaxonomy of Quatertermined by a cross-validation technique. In level2, the prenarybenthie foraminifera usingaminoacids, d. Foraminiferal Res., dictionphase,theserulesareusedto assign newobservations 19, 38-51, 1989.•' Kowalski, B.R.,andC.F.Bender, TheK-nearest neighbor classification to anyof thegivenclasses onthebasisof theirdegree of fit to

role(pattern recognition) applied to nuclear magnetic resonance specthe variousclassmodelsusinga distancemeasure.At both tralinterpretation,Anal. Chem.,44, 1405-1411,1972. levels,atypicalobservations ("outliers"),that is, observations Malmgren,B.A•andJ.P.Kennett,Biometricdifferentiation between recent with a datastructure thatdoesnotaccordwith a classmodel, Globigerina hulloides andGlobigerina falconensis in thesouthern

maybeidentified. In thisway,observations ofunknown affinity thatcannot beclassified withanytrainingsetclassmaybeinterpreted asbeingreferableto a yetunknown class.

Levels3 and4 of SIMCAaredesigned forquantitative predictions of oneor several variables froma multivariate setup through partialleastsquares (PLS)models [Wold,1982].These levels of SIMCA are not used here.

IndianOcean, J. ForaminiJ•ral Res.,7, 130-148,1977.

Masters, T., Practical NeuralNetwork Recipes in C++, 490pp.,Academic,SanDiego,Calif., 1993.

Pearson, K., Mathematical contributions tothetheory of evolution: Ona formofspurious correlation whichmayarisewhenindices areusedin themeasurement oforgans, Proc.R. Soc.London,60,489-498,1897. Penn,B.S.,A•J.Gordon, andR.F.Wendlandt, Usingneural networks to locate edges andlinearfeatures in satellite images, Cornput. Geosci.,

19, 1545-1565, 1993. Sofar,SIMCAmodeling, originally developed fordataanalysis A•,A pattern recognition approach togeophysical inversion using in the field of chemometry, hasbeenappliedto geological Raiche, neuralnets,Geophys. J. Int., 105,629-648,1991. problems by, for example,Griffiths[1984],Haugenet al.

Rock, N.M.S., Numerical Geology, 427pp.,Springer-Verlag, NewYork,

[1989],and Wei[1994]. Relative

1988.

Rogers, S.J.,J.H.Fang,C.L.Karr,andD.A•Stanley, Determination of

abundance data

lithology fromwelllogsusinga neuralnetwork, Am.Assoc. Pet.Geol. Bull., 76, 731-739, 1992.

Relativeabundance data,addingup to a unit valuein indiSejmp,H.-P.,J. Sjoholm, H. Fumes, I. Beyer,L. Eide,E. Jansen, andJ. vidualsamples, havelongbeenknownto besubject to thesoMangerud, Quaternary tephrachronology ontheIceland Plateau, north of Iceland, J. Quat.Sci.,4, 109-114,1989. calledconstant-sum constraint [Pearson, 1897;Chayes, 1960]. J.,H.-P.Sejmp, andH. Fumes, Quaternary volcanic ashzones In ourapplications of LDA andSIMCA,we useda log-ratio Sjoholm, transformation to

relieve

the

constant-sum constraint

ontheIceland Plateau, southern Norwegian Sea,J. Quat.Sci.,6, 159173, 1991.

[Aitchison,1981, 1986]. Log-ratiotransformations imply Stone, M., Cross-validatory choice andassessment of statistical predicmathematical operations in a so-called simplex space, constituttions,J. R. Stat.Soc.,36, 111-147,1974. ing a limitedpart of the originalp-dimensional Cartesian Wasserman, P.D.,NeuralComputing - Theory andPractice, 230pp., space.

Van NostrandReynold,New York, 1989.

Wei,K.-Y.,Statistical pattern recognition inpaleontology using SIMCAMACUP,,/.Paleontol.,68, 689-703, 1994. Acknowledgments.We appreciatecomments on an earlierversionof

thisarticlefromHansThierstein, ETH Zentrum, Z•rich,andCajoter Braak,Agricultural Mathematics Group,Wageningen, Netherlands. This researchwas supported by grant G-GU 04076-321 from the Swedish Natural Science Research Council to B.A•M.

Weiss, S.M.,andI. Kapouleas, Anempirical comparison of pattern recognition, neuralnets,andmachine learning classification methods, in Proceedings ofthe1lth International JointConference onArtificial Intelligence, edited byN.S.Sridharan, pp.781-787, Kaufmann, Calif., 1989.

Wold,S.,Pattern recognition bymeans of disjoint principal component models, PatternRecognit., 8, 127-139,1976.

References

Aitchison, J.,A newapproach to nullcorrelations of proportions, J. Int. Assoc.Math. Geol.,13, 175-189,1981.

Aitchison, J.,TheStatistical Analysis of Compositional Data,416pp., ChapmanandHall, New York, 1986.

Baldwin, J.L.,D.N.Otte,andC.L.WheatIcy, Computer emulation ofhu-

Wold,H., Soilmodeling: Thebasic design andsome extensions, inSystemsUnderDirectObservation, edited byH. Jrreskog andH. Wold, pp. 1-53,North-Holland, New York, 1982.

Wold,S.,C. Albano, W.J.DunnIII, K. Esbensen, S.Hellberg, E.Johansson,W.Lindberg, andM. Sjrstrrm, Modelling datatables byprincipal components andPLS:Classpatterns andquantitative predictive relations,Analusis,12, 477-485,1984.

man mentalprocess:Applicationof neuralnetworksimulations to

problems inwellloginterpretation, SPEdSoc.Pet.Eng.,19619,481493, 1989.

Baldwin, J.L.,R.M.Bateman, andC.L.WheatIcy, Application of neural network totheproblem ofmineral identification fromwelllogs, Log Anal., 3, 279-293, 1990.

Beale, R.,andT. Jackson, NeuralComputing: AnIntroduction, 240pp., AdamHilger,Bristol,England,1990.

Caudill, M.,Neuralnetworks primer,I-VIII, AI Expert, 1990.

B.A•Malmgren, Department of MarineGeology, EarthSciences Cen-

tre, University of G/Steborg, S-41381 Grteborg,Sweden. (e-mail: bjom.malmgren@marine-geology. gu.se)

U. Nordlund, Institute of EarthSciences, University of Uppsala, Norbyvagen 22,S-75236Uppsala, Sweden. (e-mail: ulf.nordlund•pal.uu.se) (Received October16, 1995;revised March26, 1996; accepted April 15, 1996.)