PALEOCEANOGRAPHY, VOL. 11, NO. 4, PAGES 505-512, AUGUST 1996
Application of artificial neural networks to chemostratigraphy

Björn A. Malmgren
Department of Marine Geology, Earth Sciences Centre, University of Göteborg, Göteborg, Sweden

Ulf Nordlund
Institute of Earth Sciences, University of Uppsala, Uppsala, Sweden
Abstract. Artificial neural networks, a branch of artificial intelligence, are computer systems formed by a number of simple, highly interconnected processing units that have the ability to learn a set of target vectors from a set of associated input signals. Neural networks learn by self-adjusting a set of parameters, using some pertinent algorithm to minimize the error between the desired output and network output. We explore the potential of this approach in solving a problem involving classification of geochemical data. The data, taken from the literature, are derived from four late Quaternary zones of volcanic ash of basaltic and rhyolithic origin from the Norwegian Sea. These ash layers span the oxygen isotope zones 1, 5, 7, and 11, respectively (last 420,000 years). The data consist of nine geochemical variables (oxides) determined in each of 183 samples. We employed a three-layer back propagation neural network to assess its efficiency to optimally differentiate samples from the four ash zones on the basis of their geochemical composition. For comparison, three statistical pattern recognition techniques, linear discriminant analysis, the k-nearest neighbor (kNN) technique, and SIMCA (soft independent modeling of class analogy), were applied to the same data. All of these showed considerably higher error rates than the artificial neural network, indicating that the back propagation network was indeed more powerful in correctly classifying the ash particles to the appropriate zone on the basis of their geochemical composition.

Introduction
Neural networks, a branch of artificial intelligence (AI), are computer systems that have the ability to "learn." They learn some target values or vectors from a set of associated input signals through iterative adjustments of a set of parameters. This is accomplished through minimization of the error between the network output and the desired output following some learning rule. Whereas multivariate statistical approaches always produce the same result when applied to the same data set, a neural network is more like a living system in that various analyses most likely will not produce exactly the same result. Neural networks have found their inspiration from the way that the mammalian brain operates; however, they are able to mimic only the most elementary functions of the biological neuron [Wasserman, 1989]. The history of neural networks dates back to the 1940s, when the first learning law for training a neural network was described [Wasserman, 1989]. The field underwent rapid expansion during the 1980s. Neural networks have been applied to solving problems in a wide variety of fields, for example, in the aerospace industry as aircraft autopilots, for oil and gas exploration, in the defense industry for weapon steering and target tracking, and in medicine for breast cancer cell analysis [Demuth and Beale, 1994]. Applications of artificial neural networks to the Earth sciences are still rare. Networks have been applied, for example, to problems of well log interpretations [Baldwin et al., 1989, 1990; Rogers et al., 1992], for identification of linear features in satellite imagery [Penn et al., 1993], and for geophysical inversion problems [Raiche, 1991].

We explore here the potential of artificial neural networks in the field of chemostratigraphy. We applied a neural network algorithm, back propagation (BP), to a problem involving classification of geochemical data. Sjoholm et al. [1991] identified four distinct volcanic ash zones in late Quaternary sediments from the Norwegian Sea. They suggested that geochemical differences among these ash zones may be used as a stratigraphic tool in the Norwegian Sea and surrounding land areas. The objective of our analyses was to assess the capability of the BP network to learn to differentiate these ash zones on the basis of the geochemical composition of particles of a basaltic and rhyolithic origin. We also analyzed how the neural network results compare with various statistical pattern recognition techniques: linear discriminant analysis (LDA), the k-nearest neighbor technique (kNN), and "soft independent modeling of class analogy" (SIMCA). Furthermore, we wanted to test whether the trained neural network was useful for classification
of new samples with known geochemical composition.

Figure 1 shows a plot of the particles from these ash zones along the axes of the first two canonical variates. These account for 98% of the variability in the original data set, which consisted of nine variables (nine chemical elements). (The canonical variates are produced through a linear transformation of the original variables, allowing for summarization and visualization of multidimensional data sets [see, for example, Rock, 1988, p. 261].) Two clusters may be distinguished, one basaltic cluster and one rhyolithic cluster. Apart from basaltic grains from zone C, no clear distinctions of ash zones are seen within the clusters. Considering the lack of distinct subclusters, it is relevant to ask the question of how well an artificial neural network can really be trained to recognize these various zones.

Copyright 1996 by the American Geophysical Union.

Paper number 96PA01237.
0883-8305/96/96PA-01237$12.00

Figure 1. Configuration of grains referable to the four late Quaternary volcanic ash zones, A through D, in the Norwegian Sea described by Sjoholm et al. [1991] along first and second canonical variate axes. The canonical variate analysis is based on the geochemical composition of individual ash particles (nine chemical elements were analyzed: Na2O, MgO, Al2O3, SiO2, K2O, CaO, TiO2, MnO, and FeO). Two types of grains, basaltic and rhyolithic, were distinguished within each zone. This plane, accounting for 98% of the variability among group mean vectors in nine-dimensional space (the first axis represents 95%), distinguishes basaltic and rhyolithic grains. Apart from basaltic grains from zone C, which may be differentiated from such grains from the other zones, grains of the same type are clearly overlapping with regard to geochemical composition among the zones.

The Data Set

Sjoholm et al. [1991] presented geochemical data for late Quaternary volcanic ash zones from the Norwegian Sea. The ash is most probably derived from volcanic eruptions on Iceland, since its geochemical composition is similar to lavas and tephras from Iceland [Sejrup et al., 1989]. Sjoholm et al. [1991] identified four distinct ash zones, denoted A, B, C, and D, in core P577 from the Iceland Plateau (latitude 68°N, longitude 14°W; water depth, 1620 m). The uppermost ash zone, A, spans the oxygen isotope stage 1 of Emiliani [1970] (1-20 kyr B.P.); zone B, most of isotope stage 5 (128-71 kyr B.P.); zone C, most of isotope stage 7 (245-186 kyr B.P.); and zone D, the top part of isotope stage 11 (423-362 kyr B.P.) and basal part of stage 10 (362 to about 339 kyr B.P.). It is thus apparent that deposition of the volcanic ash took place only under interglacial conditions. During glacial periods, the volcanism on Iceland was probably not explosive because it occurred under the ice cap [Sjoholm et al., 1991].

Sjoholm et al. [1991] identified four types of ash particles: colorless particles, pale brown particles, brown blocky particles, and black blocky particles. The first two types were referred to as silicic and the last two as basaltic particles. The geochemical analyses were performed on nine chemical elements, Na2O, MgO, Al2O3, SiO2, K2O, CaO, TiO2, MnO, and FeO, in 183 samples total (Table 1). The analyses displayed a difference in the SiO2 content between the basaltic and rhyolithic particles in all of the ash zones; the SiO2 was less than 54-59% in the basaltic particles and greater than 68-71% in the particles of a rhyolithic origin.

Table 1. Number of Basaltic and Rhyolithic Particles From Each of the Late Quaternary Ash Zones in the Norwegian Sea Included in This Study

          Basaltic   Rhyolithic
Zone A    40         28
Zone B    33         12
Zone C    22         26
Zone D    13          9

A total of 183 particles were analyzed. Ash zone A is from the oxygen isotope stage 1; zone B, from stage 5; zone C, from stage 7; and zone D, from stage 11. The data are from Sjoholm et al. [1991].
Back-Propagation Neural Networks

A back propagation network consists of an interconnected series of layers containing neurons (Figure 2a). Each neuron in the first layer is connected to each of the elements in an input vector, while the neurons in the last layer correspond to the elements in an output vector. Between the input and output layers, there are one or more "hidden layers" with an arbitrary number of neurons. In our case, there are nine neurons in the input layer, corresponding to the chemical elements, and eight neurons in the output layer, corresponding to the eight classes to distinguish (rhyolithic or basaltic grains belonging to one out of four layers). The layers in a BP network are connected to each other in such a way that each of the neurons in one layer has a weighted connection to each of the neurons in the next layer (Figures 2a and 2b). The weight term associated with each connection describes the type (negative or positive) and strength of the connection.

All neurons in a BP network, except for those in the first layer, are also associated with a transfer function and a bias term (Figures 2b and 2c). The transfer function receives the summed signals from the neurons in the previous layer (or, for the first layer, from the input vector), filters these, and sends the result on to the neurons in the following layer (see Appendix 1). The transfer function may be a simple step function, a linear function, or a nonlinear (usually sigmoidal) function. In the network used in this study, a linear transfer function was used in the last (output) neuron layer and a tan-sigmoid function was used in the preceding (hidden) layer. The value of the bias term associated with each neuron can be regarded as a measure of the "importance" of the individual neurons. The bias term is analogous to the constant term in regression equations.

It is primarily the weight terms that contain the "knowledge," or "memory," of a network. The process of training a network involves incremental adjustments of the weights in order to find an optimal formal mapping between a set of input vectors and a corresponding set of output vectors. Training thus requires a training set with samples consisting of pairs of input and output vectors. The optimization process can be thought of as a sophisticated type of "template comparison" in which the difference between the actual output and the desired output is used as the optimization criterion. In the present application, a training pair consists of an input vector with the geochemical composition of the ash particles and an output vector with the eight ash classes.

Training of a BP network involves two main steps that are repeated for each training pair in the training set: forward propagation and back propagation. These are repeated until the difference between the target output and the computed output reaches a preset lower threshold. The back propagation step is sometimes repeated only once for every run through all of the samples, i.e., once for each "epoch."

Figure 2. (a) A diagram showing the general architecture of a three-layer back propagation network with five neurons in the input layer, three neurons in the hidden layer, and two neurons in the output layer. Each neuron in the hidden and output layers receives weighted signals from each neuron in the previous layer. (b) A diagram showing a single neuron in a back propagation network. In forward propagation, the incoming signals from the neurons of the previous layer (p) are multiplied with the weights of the connections (w) and summed. The bias (b) is then added, and the resulting sum is filtered through the transfer function to produce the activity (a) of the neuron. This is sent on to the next layer or, in the case of the last layer, represents the output. (c) A linear transfer function (left) and a sigmoidal transfer function (right).
Forward Propagation
Each sample input vector that the network is presented with is propagated through the network while being modified and filtered by the weights of the connections and by the transfer functions of the neurons (Appendix 1). For each neuron in the hidden and output layers, all incoming signals, which are the signals from the connecting neurons multiplied with the respective connection weights, are summed and filtered through the neuron's transfer function. The resulting activity of the neuron is then used as input to the next layer. The activities produced by the neurons in the output layer constitute the final result. Forward propagation is performed alone, without the following back propagation step, when running an already trained network.
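The forward propagation step can be sketched in a few lines. The sketch below is illustrative Python, not the authors' MATLAB code; it uses arbitrary random weights in place of trained ones, a tan-sigmoid hidden layer, and a linear output layer, matching the 9-input, 8-output layout described above (with a hypothetical hidden layer of 24 neurons).

```python
import numpy as np

# Illustrative forward propagation: summed weighted signals plus bias,
# filtered through each layer's transfer function.
def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = np.tanh(w_hidden @ x + b_hidden)  # tan-sigmoid hidden layer
    return w_out @ hidden + b_out              # linear output layer

rng = np.random.default_rng(0)
w_h, b_h = rng.normal(size=(24, 9)), rng.normal(size=24)  # 9 inputs -> 24 hidden
w_o, b_o = rng.normal(size=(8, 24)), rng.normal(size=8)   # 24 hidden -> 8 classes
activity = forward(rng.normal(size=9), w_h, b_h, w_o, b_o)
print(activity.shape)   # one activity per output class: (8,)
```

In an already trained network this step alone produces the classification result.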
Back Propagation
The difference, or "error," between the output vector resulting from the forward propagation step and the desired target vector is computed. The error values, or their derivatives, are used to incrementally adjust the weights between the output layer and the last of the hidden layers according to a learning algorithm based on the gradient descent method. For each layer thereafter, going backward through the network, the values used for adjusting the weights are the error terms in the immediately succeeding layer. Since we are going backward through the network, these terms have already been computed. The size of the incremental adjustments of the weights is determined by the learning rate, which varies between 0 and 1. Too high a learning rate may result in the network being unable to converge, while too low a learning rate will result in excessively slow learning.
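A single such gradient descent update for the output-layer weights can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; the hidden-layer activity and target vector are made-up values.

```python
import numpy as np

# One gradient descent update of the output-layer weights. With linear
# output neurons, the error term is simply target minus output, and the
# weight change follows the steepest "downhill" direction on the error
# surface, scaled by the learning rate.
learning_rate = 0.01                       # between 0 and 1
hidden = np.array([0.2, -0.5, 0.9])        # hidden-layer activity (made up)
w_out = np.zeros((2, 3))                   # two output neurons, three hidden
target = np.array([1.0, 0.0])              # desired output (made up)

output = w_out @ hidden                    # forward pass, linear transfer
error = target - output                    # error term
w_out += learning_rate * np.outer(error, hidden)
print(w_out[0])   # approximately [0.002, -0.005, 0.009]
```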
Several different modifications of the back propagation process have been proposed in order to enhance the rate and quality of learning. Among these, "momentum" and "adaptive learning" were applied to the networks used in this study.

Momentum

Learning in a BP network is based on the gradient descent method, that is, the weights are adjusted so that the changes at each time will follow the steepest "downhill" direction on the error surface. By making the adjustments partially dependent upon previous changes, the risk of getting caught in local minima can be reduced (the network is allowed to "slide" out of local irregularities in the error surface). A momentum value of zero means that changes will depend only upon the current gradient, while a value of 1 will ignore the current gradient completely. A value of about 0.90-0.95 is commonly applied.

Adaptive Learning

Adaptive learning means that the learning rate is dynamically adjusted to suit current conditions. In brief, it works as follows: If the error increases from one learning cycle to the next, then the learning rate is decreased. If, on the other hand, the error decreases, then the learning rate is increased. Adaptive learning decreases the total learning time (i.e., number of training cycles) by keeping the learning rate as high as possible while at the same time maintaining a high learning stability.

It should be noted that in order for a BP network to perform well as a classifier, the training data should be representative of the expected range and variability of the data to which it will be applied. It should also be noted that theoretically, a BP network is able to perfectly learn any pattern, including random noise, provided that a sufficient number of neurons (with nonlinear transfer functions) is included and provided that training time is sufficiently long. It is, however, only the generalized knowledge that is useful when applying the network on samples outside the training set. Specific knowledge, such as that of random noise, is useless since it is valid only within the training set. Good introductory texts on artificial neural networks, in general, are Beale and Jackson [1990] and Caudill [1990].

Estimates of Error Rates

The success of a classifier may be determined by computing the percentage of misclassifications, the "error rate," of predictions in a data set which is not part of the training set. Instead of relying on a single test set for estimating the performance of the various classifiers, which may be misleading [Weiss and Kapouleas, 1989], we created five different random test sets from our original data. Each such test set contained 20% (37 particles) of the original observations. The remaining 80% (146 particles) was used as training sets. We then employed a cross-validation technique for estimating the ability of the classifiers to correctly predict the class referability of the test set samples [Stone, 1974; Weiss and Kapouleas, 1989]. The error rates computed are thus average rates of misclassification (%) for the five different test sets.

Results

Network Configuration

It has been shown that the use of more than one hidden layer in a BP network is rarely beneficial [see, for example, Masters, 1993], and we therefore concentrated our evaluations on a three-layer configuration (one input layer, one hidden layer, and one output layer; cf. Figure 2a). The number of neurons in the input layer is nine in this case, corresponding to the nine chemical elements, and the number of neurons in the output layer is eight (the number of classes to predict). The number of neurons in the hidden layer, however, is not fixed. In order to determine the optimal configuration for our particular application, the performance of networks in which the number of hidden neurons was successively increased by 3 (i.e., 3, 6, 9, ..., 33) was analyzed. As shown in Figure 3, the error rates decrease gradually for an increasing number of neurons, with the lowest error rate obtained for 24 neurons (10.0%). For greater numbers of neurons (27, 30, and 33), no improvement in error rate is noticed. We therefore draw the conclusion that the optimum prediction of particles is obtained for a configuration with 24 neurons in the hidden layer. It should be noted, however, that a configuration with as few as nine hidden neurons yields almost as low error rates.

Figure 3. Changes in error rate (percentages of misclassifications in the test set) for a three-layer back propagation network with increasing number of neurons when applied to training-test set 1 (80:20% training-test set partition). Error rates were determined for an incremental series of 3, 6, 9, ..., 33 neurons in the hidden layer. Error rates were computed as average rates based on ten independent trials with different initial random weights and biases. The error rates represent the minimum error obtained for runs of 300, 600, 900, and up to 9000 epochs. The minimum error rate (9.2%) was obtained for a configuration with 24 neurons in the hidden layer, although there is a major reduction already at nine neurons.
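The two back propagation refinements described above, momentum and adaptive learning, can be sketched as follows. This is an illustrative Python fragment on a toy one-parameter error surface, not the authors' MATLAB setup; the rate factors 1.05 and 0.7 follow the settings listed in Appendix 1.

```python
# Adaptive learning: raise the learning rate while the error falls from
# one cycle to the next; cut it when the error rises.
def adapt_learning_rate(rate, new_error, old_error, up=1.05, down=0.7):
    return rate * (up if new_error < old_error else down)

# Momentum on a toy error surface E(w) = (w - 3)^2: each change depends
# partly on the previous change; with momentum = 1 the current gradient
# would be ignored entirely.
w, change, rate, momentum = 0.0, 0.0, 0.05, 0.9
for _ in range(300):
    gradient = 2.0 * (w - 3.0)
    change = momentum * change - (1.0 - momentum) * rate * gradient
    w += change
print(round(w, 3))   # slides into the minimum at w = 3.0
```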
Number of Training Cycles
In addition to finding the optimal number of hidden neurons, we needed to find the optimal number of training cycles. Too few cycles result in poor learning, while too many cycles result in "overlearning," i.e., learning becomes too specific (see above). We performed ten separate trials for networks with different random initial weights and biases. In order to monitor the improvement in learning, we trained each network iteratively over 30 intervals of 300 epochs each. The networks were thus trained over 9000 epochs. We recorded the error rates after each interval. The average of the minimum error rates for each of the ten trials was then computed. In cases where the minimum error rates were reached after 8000 epochs, we continued the training over 20 more 300-epoch-long intervals to make sure that no further improvement in learning occurred beyond 9000 epochs. In no case did any further reduction in error rate take place, which implies that monitoring over totally 9000 epochs should be sufficient.
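The interval-monitoring scheme described above can be sketched as follows; the error values below are hypothetical, and the callback stands in for 300 epochs of back propagation training.

```python
# Record the test error after each 300-epoch interval and keep the
# running minimum over 30 intervals (9000 epochs in total).
def min_error_over_intervals(error_after_interval, n_intervals=30):
    best = float("inf")
    for i in range(n_intervals):
        best = min(best, error_after_interval(i))
    return best

# Hypothetical error curve falling toward a plateau.
errors = [max(2.1, 18.5 - 1.2 * i) for i in range(30)]
print(min_error_over_intervals(lambda i: errors[i]))   # 2.1
```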
Learning

As discussed before, the back propagation learning rules adjust the weights and biases to minimize the error between the actual network output and the desired output according to the gradient descent procedure. Figure 4 illustrates the change in error rate for training set 1 during learning using a three-layer network with 24 neurons in the hidden layer. During the learning process, the error rate in the training set successively decreased to a minimum of 2.1% after 7500 epochs. After 7500 epochs, no further improvement in error rate was noticed. The minimum error rate in the test set (10.8%) was obtained after 3300 epochs, indicating that mostly noise was learned during epochs 3300 to 7500.

In the network architecture used here, 216 weights in the hidden layer (nine input neurons connected to each of 24 hidden neurons) and 192 weights in the output layer (24 hidden neurons connected to each of eight output neurons) are free to vary. In addition, the biases, 24 in the hidden layer and 8 in the output layer, are also subject to modification as training proceeds.

For estimates of the fivefold average error rates, we used the same strategy as in the determination of the optimum number of hidden neurons. We thus ran ten independent trials for each of the five test sets, and we used the average minimum error rate as the estimate of overall error rate. The minimum percentages of misclassification for the five sets were found to range between 5.7 and 11.4% (Table 2; Figure 5), yielding an average rate of 9.2%. Hence, on average, 90.8% of the observations could be allocated to the correct particle type and ash zone using a BP network.

Figure 4. Changes in the error rate (percentages of misclassifications) in the training set with increasing number of epochs in the first out of ten trials in training set 1. This network had 24 neurons in the hidden layer, and the network error was monitored over 30 subsequent intervals of 300 training epochs each. During training, the error rate in the training set decreased from 18.5% after 300 epochs to a minimum of 2.1% after 7500 epochs. The minimum error rate in the test set (10.8%) was reached after 3300 epochs.

Statistical Pattern Recognition Techniques

Brief descriptions of the statistical pattern recognition techniques used for comparison are given in Appendix 2. Error rates for LDA, the kNN technique, and SIMCA for each of the five training-test partitions are shown in Table 2 (cf. Figure 5).

The LDA fails to assign between a third and a half of the particles to their proper classes (error rates 27.0-48.6%), and the fivefold average error rate amounts to 38.4%. Thus only 51.4-73.0% of the grains could be correctly classified. The LDA is thus not as successful as the neural networks in discriminating among the particles. This is not surprising, considering the great overlaps among points of both types of particles (Figure 1). A linear method would not be anticipated to handle the partitioning of these clusters in an optimal fashion.

The kNN technique gives a slightly better fivefold average error rate, 30.8%, than the LDA, but the result is still inferior to that of the neural network. In the individual partitions, error rates varied between 27.0 and 40.5% (Table 2; Figure 5).

The fivefold average error rate produced by SIMCA, 28.7%, is similar to that of the kNN technique (individual rates are between 21.6 and 35.1%; listed as SIMCA 1 in Table 2 and Figure 5). However, the analysis indicates that most of the particles that were correctly assigned could equally well be referable to one or several other classes, since their SIMCA models were sufficiently similar to them. If error rates that also take into consideration those particles that were not unequivocally allocated to one single class are computed, they turn out to be very high, between 83.8 and 100.0%, with a fivefold average error rate of 90.8% (reported as SIMCA 2 in Table 2 and Figure 5).

Error Rates in Individual Classes

To get an idea about the success of the various classifiers in correctly predicting particles from each of the individual ash zones, error rates were also computed for each of the eight classes (Table 3). For the BP network, the configuration with 24 neurons was used. As expected, the BP network gives the overall best result. Even if error rates for some zones are lower for classifiers other than the BP network, for example, kNN gives a lower rate for basaltic particles in zone C and SIMCA error rate is lower for rhyolithic grains in zone D, the BP network by far produces the overall most balanced solution. Although the BP network showed the lowest error rate (0.4%) for rhyolithic particles from zone A, particles referable to zone B were generally the easiest to predict (error rate of 3.7% for rhyolithic particles and 5.7% for basaltic particles).
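The five random 80:20 partitions used in the error-rate estimates above can be sketched as follows (plain Python; particle indices stand in for the 183 analyzed grains, and the seed is an arbitrary choice for repeatability).

```python
import random

# Five random training-test partitions: each test set holds 20% (37) of
# the 183 particles, and the remaining 80% (146) form the training set.
random.seed(1)
particles = list(range(183))
partitions = []
for _ in range(5):
    test = set(random.sample(particles, 37))
    train = [p for p in particles if p not in test]
    partitions.append((train, sorted(test)))

print([len(test) for _, test in partitions], len(partitions[0][0]))
# [37, 37, 37, 37, 37] 146
```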
Figure 5. Error rates (percentages of misclassification in the test sets) for each of the five independent training-test set partitions (80% training set and 20% test set members) and average error rates over the five partitions for a three-layer back propagation (BP) neural network, linear discriminant analysis (LDA), the k-nearest neighbors technique (kNN), and SIMCA. Neural network results are based on ten independent trials with different initial conditions. Error rates for each test set are represented by the average of the minimum error rates obtained during each of the ten trials, and the fivefold average error rates are the averages of the minimum error rates for the various partitions.
Zones C and D were slightly more difficult to classify (error rates 19.0 and 15.0%, respectively, for basaltic particles, and 15.0 and 21.9%, respectively, for rhyolithic particles).
Application to Core G5239

Additional geochemical data on ash particles from core G5239, drilled east of Iceland, were presented by Sjoholm et al. [1991]. These grains were derived from two levels, 19-21 cm and 23-25 cm. At the lower level, grains of both basaltic and rhyolithic origin were analyzed, whereas only basaltic grains were analyzed from the upper level. Both samples were correlated to the same volcanic system as zone A in core P577 by Sjoholm et al. [1991].

We applied the BP network with 24 neurons in an attempt at classifying these grains. We tested the results of eleven randomly chosen experiments used in the estimates of fivefold average error rates (out of 50 experiments total, 10 trials per each of the five training-test partitions), and we assigned each of the particles to the class referred to by the majority of experiments. The neural network confirms the inference made by Sjoholm et al. [1991] in assigning the majority of both basaltic and rhyolithic grains to zone A. Thus all of the rhyolithic particles (10 out of 10) from the lower level were found to be most similar in geochemical composition to grains of this type from zone A in reference core P577. Similarly, 73% of the basaltic grains in the upper layer (8 out of 11), and 86% of the basaltic grains in the lower layer (6 out of 7) were assigned to zone A.
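The majority-vote assignment described above can be sketched as follows; the class votes for the single grain below are hypothetical.

```python
from collections import Counter

# A grain is assigned to the zone predicted by most of the eleven
# randomly chosen experiments.
def assign_by_majority(votes):
    return Counter(votes).most_common(1)[0][0]

votes_for_one_grain = ["A", "A", "B", "A", "A", "D", "A", "A", "A", "C", "A"]
print(assign_by_majority(votes_for_one_grain))   # A
```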
Table 2. Error Rates in Each of Five Training-Test Set Partitions, Fivefold Average Error Rates in the Test Sets, and 95% Confidence Intervals for the Fivefold Average Error Rates for the Techniques Discussed in This Paper

                              Error Rates, %
          Set 1   Set 2   Set 3   Set 4   Set 5   Fivefold Error Rate, %
BP         10.0     7.6     5.7    11.4    11.4    9.2 ± 3.1
LDA        43.2    48.6    27.0    35.1    37.8   38.4 ± 10.2
kNN        40.5    27.0    27.0    32.4    27.0   30.8 ± 7.4
SIMCA 1    29.7    35.1    27.0    29.7    21.6   28.7 ± 6.1
SIMCA 2    97.3   100.0    91.9    83.8    81.1   90.8 ± 10.2

The fivefold average error rates were determined as the average error rates over five independent training and test sets using 80% training and 20% test partitions. Error rates for the neural networks are averages of ten trials for each training-test set partition using different initial conditions (random initial weights and biases). The minimum fivefold error rate for the back propagation (BP) network was obtained using 24 neurons in the hidden layer. Apart from regular error rates for soft independent modeling of class analogy (SIMCA 1), the total error rates for misclassified observations and observations that could be referable to one or several other groups are reported under SIMCA 2. LDA represents linear discriminant analysis, and kNN, k-nearest neighbor.
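The fivefold averages and confidence intervals in Table 2 can be recomputed directly from the five per-set rates. The sketch below does this for the BP row in Python, with the t quantile for 4 degrees of freedom (2.776) hardcoded to keep it dependency-free.

```python
import math

# Fivefold average error rate and 95% confidence interval for the BP
# network: mean ± t * s / sqrt(n), with t = 2.776 for n - 1 = 4 degrees
# of freedom.
rates = [10.0, 7.6, 5.7, 11.4, 11.4]           # BP error rates, sets 1-5
n = len(rates)
mean = sum(rates) / n
sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / (n - 1))
half_width = 2.776 * sd / math.sqrt(n)
print(round(mean, 1), round(half_width, 1))    # 9.2 3.1
```

This reproduces the 9.2 ± 3.1 reported for the BP network.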
Table 3. Average Error Rates (Percentages) for Basaltic and Rhyolithic Particles in Ash Zones A Through D

Zone   Type of Particle   N      BP     LDA    kNN    SIMCA 1   SIMCA 2
A      basaltic           7-13   13.5   15.6   40.3   24.8      96.9
A      rhyolithic         5-10    0.4   13.3   -      -         94.0
B      basaltic           3-13    5.7   30.6   18.1   11.5      88.3
B      rhyolithic         2-6     3.7   70.0   76.7   33.3      86.7
C      basaltic           1-5    19.0   -      -      -         -
C      rhyolithic         3-8    15.0   -      -      -         -
D      basaltic           1-2    15.0   50.0   50.0   37.5      75.0
D      rhyolithic         1-4    21.9   68.8   56.3   18.8      75.0

As before, error rates are average error rates over five experiments based on 80% training set members and 20% test set members. N is the range of sample sizes in these experiments. For abbreviations and other details, please refer to Table 2.
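A per-class error rate of the kind reported in Table 3 can be sketched as follows; the labels below are hypothetical, with each class combining zone and particle type.

```python
# Per-class error rate: the percentage of test set members of one class
# that the classifier assigns to any other class.
def class_error_rate(cls, truth, predicted):
    outcomes = [t == p for t, p in zip(truth, predicted) if t == cls]
    return 100.0 * outcomes.count(False) / len(outcomes)

truth     = ["A basaltic", "A basaltic", "A basaltic", "A rhyolithic"]
predicted = ["A basaltic", "B basaltic", "A basaltic", "A rhyolithic"]
print(class_error_rate("A basaltic", truth, predicted))  # one of three missed
```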
Summaryand Conclusions We applieda backpropagation neuralnetworkto a problem of assigning183 volcanicashparticlesof two differentorigins, basalticand rhyolithic,to four late Quaternaryvolcanicash zonesin the NorwegianSea.Differencesin geochemical compositionof nine chemicalelementsamongthesezoneshave beensuggested to be of stratigraphic significance [Sjohotmet at., 1991]. We wishedto test whethera neural networkwas ableto distinguishthe eightclassesof ashparticlesfrom these variouszones.The predictivepowerof the networkwe applied was compared with that of threestatisticalpatternrecognition techniques (lineardiscriminant analysis,the knearestneighbor technique,andsoftindependent modelingof classanalogy).Error rateswere computedbasedon averagepercentage of misclassifications. The optimumconfiguration of the BP network, as well as the optimumnumberof trainingcycles,was establishedfromexperiments. The averageerrorrate for the BP networkwith its optimum configuration is 9.2%. On average,33.6 outof 37 particlesin a test set couldthusbe correctlyclassifiedby the BP network. Noneof the statisticalpatternrecognition techniques managed nearlyaswell astheneuralnetworkto classifythe ashparticles to their appropriatezones.The averageerrorrate is 38.4% for LDA, 30.8% for the kNN technique,and28.7% for SIMCA. If thoseparticlesthat were shownby SIMCA to alsobe referable
to one or several of the SIMCA models for other classes are taken into account, the SIMCA error rate is increased to 90.8%.

Application of the results of the BP network to geochemical data on ash particles from two levels in core G5239, drilled to the east of Iceland [Sjoholm et al., 1991], indicated that the majority of basaltic and rhyolithic particles from these samples were assignable to ash zone A. This confirms the results of Sjoholm et al. [1991].

In conclusion, we have demonstrated that the BP neural network is considerably more successful in classifying particles from the various ash zones than standard statistical pattern recognition techniques. Considering the complex nonlinear nature of most real-world data, and considering also the potential of neural networks to handle such data, it is reasonable to assume that methods similar to neural networks may become very useful in the area of paleoceanography.

Appendix 1: Technical Details About the Neural Network Used in This Study

The software used in this study was MATLAB's Neural Network Toolbox (MathWorks Incorporated). The various parameters were set as follows: error goal = 0.001, learning rate = 0.01, increase in learning rate = 1.05, multiplier to decrease learning rate = 0.7, degree of momentum = 0.95, and error ratio = 1.04. Most of these settings are in accordance with the recommendations provided by Demuth and Beale [1994]. The network configuration was one input layer with nine (passive) neurons, one hidden layer with 24 neurons with tan-sigmoid transfer functions, and one output layer with eight neurons with linear transfer functions. In addition to momentum and adaptive learning, "Nguyen-Widrow initialization" was employed. This involves adjusting the ranges of the random numbers initially assigned to the weights according to the ranges of the expected input values, which may speed up training considerably. In our case, this initialization procedure only affected the neurons having a sigmoidal transfer function. During training, adjustments of weights and bias terms were made once for every run through the entire training set, i.e., once for each epoch and not once for each sample.

Appendix 2: Brief Descriptions of Pattern Recognition Techniques Used for Comparison

Linear Discriminant Analysis (LDA)

Normally, discriminant analysis amounts to establishing linear functions, representing a planar surface with p-1 dimensions (p = the number of variables), that optimally distinguish two predefined groups of observations. In the case of the ash zone samples, we assigned new observation vectors to the predefined groups through computation of Mahalanobis' generalized distances between these vectors and the group mean vectors [Cooley and Lohnes, 1971]. These distance measures were then converted to probabilities of referability to the predefined groups. Malmgren and Kennett [1977] applied this procedure to a taxonomic problem in recent planktonic foraminifera.

K-Nearest Neighbors (kNN)

The k-nearest neighbor technique is conceptually simple, being based on the Euclidean distances between observations in multidimensional space [Kowalski and Bender, 1972]. A test set member is allocated to a training set class on the basis of the k shortest Euclidean distances between that member and the members of the training set. In the application to the ash data, we set k equal to 3, and we monitored the distances from each of the test set members to each of the training set members. A test set member is referred to the training set class to which the majority (two or three) of the three closest training set members belonged, as indicated by the Euclidean distances.
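As a sketch of the allocation rule just described (not the authors' MATLAB implementation), the kNN procedure with k = 3 and majority voting can be written as follows; the two-class training set here is hypothetical.

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """Assign sample x to the class holding the majority among the
    k nearest training samples, measured by Euclidean distance."""
    d = np.sqrt(((train_X - x) ** 2).sum(axis=1))  # distances to all training samples
    nearest = np.argsort(d)[:k]                    # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority vote

# Hypothetical training set: two geochemical variables, two classes
X = np.array([[49.0, 2.1], [50.5, 1.9], [51.0, 2.3],   # class "A" (basaltic-like)
              [71.2, 0.2], [72.8, 0.1], [70.9, 0.3]])  # class "B" (rhyolitic-like)
y = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_classify(X, y, np.array([50.2, 2.0])))  # -> A
```

With nine oxide variables, as in the ash data, the same function applies unchanged; only the number of columns in the training matrix grows.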
Soft Independent Modeling of Class Analogy (SIMCA)

SIMCA may involve any or several of four distinct levels [Wold, 1976; Wold et al., 1984]. Level 1 of SIMCA is devoted to developing mathematical rules for each of a number of preset groups (termed classes in SIMCA) in the training set by fitting separate R-mode principal component models to each of them. The dimensionality of a principal component solution is determined by a cross-validation technique. In level 2, the prediction phase, these rules are used to assign new observations to any of the given classes on the basis of their degree of fit to the various class models, using a distance measure. At both levels, atypical observations ("outliers"), that is, observations with a data structure that does not accord with a class model, may be identified. In this way, observations of unknown affinity that cannot be classified with any training set class may be interpreted as being referable to an as yet unknown class.

Levels 3 and 4 of SIMCA are designed for quantitative predictions of one or several variables from a multivariate setup through partial least squares (PLS) models [Wold, 1982]. These levels of SIMCA are not used here. So far, SIMCA modeling, originally developed for data analysis in the field of chemometrics, has been applied to geological problems by, for example, Griffiths [1984], Haugen et al. [1989], and Wei [1994].

Relative Abundance Data

Relative abundance data, adding up to a unit value in individual samples, have long been known to be subject to the so-called constant-sum constraint [Pearson, 1897; Chayes, 1960]. In our applications of LDA and SIMCA, we used a log-ratio transformation to relieve the constant-sum constraint [Aitchison, 1981, 1986]. Log-ratio transformations imply mathematical operations in a so-called simplex space, constituting a limited part of the original p-dimensional Cartesian space.

Acknowledgments. We appreciate comments on an earlier version of this article from Hans Thierstein, ETH Zentrum, Zürich, and Cajo ter Braak, Agricultural Mathematics Group, Wageningen, Netherlands. This research was supported by grant GGU 04076321 from the Swedish Natural Science Research Council to B.A.M.

References

Aitchison, J., A new approach to null correlations of proportions, J. Int. Assoc. Math. Geol., 13, 175-189, 1981.
Aitchison, J., The Statistical Analysis of Compositional Data, 416 pp., Chapman and Hall, New York, 1986.
Baldwin, J.L., D.N. Otte, and C.L. Wheatley, Computer emulation of human mental process: Application of neural network simulations to problems in well log interpretation, SPE Soc. Pet. Eng., 19619, 481-493, 1989.
Baldwin, J.L., R.M. Bateman, and C.L. Wheatley, Application of a neural network to the problem of mineral identification from well logs, Log Anal., 3, 279-293, 1990.
Beale, R., and T. Jackson, Neural Computing: An Introduction, 240 pp., Adam Hilger, Bristol, England, 1990.
Caudill, M., Neural networks primer, I-VIII, AI Expert, 1990.
Chayes, F., On correlations between variables of constant sum, J. Geophys. Res., 65, 4185-4193, 1960.
Cooley, W.W., and P.R. Lohnes, Multivariate Data Analysis, 364 pp., John Wiley, New York, 1971.
Demuth, H., and M. Beale, Neural Network Toolbox for Use with MATLAB, 158 pp., MathWorks, Natick, Mass., 1994.
Emiliani, C., Pleistocene paleotemperatures, Science, 168, 822-825, 1970.
Griffiths, C.M., A pattern recognition approach to the derivation of geological information from drill process monitoring, Underwater Technol., winter 1984, 2-13, 1984.
Haugen, J.E., H.P. Sejrup, and N.B. Vogt, Chemotaxonomy of Quaternary benthic foraminifera using amino acids, J. Foraminiferal Res., 19, 38-51, 1989.
Kowalski, B.R., and C.F. Bender, The K-nearest neighbor classification rule (pattern recognition) applied to nuclear magnetic resonance spectral interpretation, Anal. Chem., 44, 1405-1411, 1972.
Malmgren, B.A., and J.P. Kennett, Biometric differentiation between recent Globigerina bulloides and Globigerina falconensis in the southern Indian Ocean, J. Foraminiferal Res., 7, 130-148, 1977.
Masters, T., Practical Neural Network Recipes in C++, 490 pp., Academic, San Diego, Calif., 1993.
Pearson, K., Mathematical contributions to the theory of evolution: On a form of spurious correlation which may arise when indices are used in the measurement of organs, Proc. R. Soc. London, 60, 489-498, 1897.
Penn, B.S., A.J. Gordon, and R.F. Wendlandt, Using neural networks to locate edges and linear features in satellite images, Comput. Geosci., 19, 1545-1565, 1993.
Raiche, A., A pattern recognition approach to geophysical inversion using neural nets, Geophys. J. Int., 105, 629-648, 1991.
Rock, N.M.S., Numerical Geology, 427 pp., Springer-Verlag, New York, 1988.
Rogers, S.J., J.H. Fang, C.L. Karr, and D.A. Stanley, Determination of lithology from well logs using a neural network, Am. Assoc. Pet. Geol. Bull., 76, 731-739, 1992.
Sejrup, H.P., J. Sjoholm, H. Furnes, I. Beyer, L. Eide, E. Jansen, and J. Mangerud, Quaternary tephrachronology on the Iceland Plateau, north of Iceland, J. Quat. Sci., 4, 109-114, 1989.
Sjoholm, J., H.P. Sejrup, and H. Furnes, Quaternary volcanic ash zones on the Iceland Plateau, southern Norwegian Sea, J. Quat. Sci., 6, 159-173, 1991.
Stone, M., Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc., 36, 111-147, 1974.
Wasserman, P.D., Neural Computing: Theory and Practice, 230 pp., Van Nostrand Reinhold, New York, 1989.
Wei, K.Y., Statistical pattern recognition in paleontology using SIMCA-MACUP, J. Paleontol., 68, 689-703, 1994.
Weiss, S.M., and I. Kapouleas, An empirical comparison of pattern recognition, neural nets, and machine learning classification methods, in Proceedings of the 11th International Joint Conference on Artificial Intelligence, edited by N.S. Sridharan, pp. 781-787, Kaufmann, Calif., 1989.
Wold, H., Soft modeling: The basic design and some extensions, in Systems Under Indirect Observation, edited by K.G. Jöreskog and H. Wold, pp. 1-53, North-Holland, New York, 1982.
Wold, S., Pattern recognition by means of disjoint principal component models, Pattern Recognit., 8, 127-139, 1976.
Wold, S., C. Albano, W.J. Dunn III, K. Esbensen, S. Hellberg, E. Johansson, W. Lindberg, and M. Sjöström, Modelling data tables by principal components and PLS: Class patterns and quantitative predictive relations, Analusis, 12, 477-485, 1984.

B.A. Malmgren, Department of Marine Geology, Earth Sciences Centre, University of Göteborg, S-413 81 Göteborg, Sweden. (e-mail: [email protected] gu.se)
U. Nordlund, Institute of Earth Sciences, University of Uppsala, Norbyvägen 22, S-752 36 Uppsala, Sweden. (e-mail: ulf.nordlund@pal.uu.se)

(Received October 16, 1995; revised March 26, 1996; accepted April 15, 1996.)