University of Nottingham
School of Chemical, Environmental, and Mining Engineering

APPLICATION OF ARTIFICIAL NEURAL NETWORK SYSTEMS TO GRADE ESTIMATION FROM EXPLORATION DATA

by Ioannis K. Kapageridis, M.Sc.

Thesis submitted to the University of Nottingham for the Degree of Doctor of Philosophy

October 1999

Abstract

Artificial Neural Networks (ANNs) have become increasingly popular within the resources industry. ANN technology provides solutions to problems characterised by a shortage or poor quality of input data. It is a purpose of this research work to show that the estimation of ore grades within a mineral deposit is one such problem, to which ANNs can be applied successfully. Ore grade is one of the main variables that characterise an orebody. Almost every mining project begins with the determination of the ore grade distribution in three-dimensional space, a problem often reduced to modelling the spatial variability of ore grade values. At the early stages of a mining project, the distribution of ore grades has to be determined to enable the calculation of ore reserves within the deposit and to aid the planning of mining operations throughout the entire life of the mine. The estimation of ore grades/reserves is a very important and costly stage in a mining project, and the profitability of the project often depends on the results of grade estimation. For the last three decades the mining industry has adopted and applied geostatistics as the main solution to problems of mineral deposit evaluation. Geostatistics provides powerful tools for modelling most aspects of an ore deposit. However, geostatistics and other more conventional methods require many assumptions and considerable knowledge, skill and time to be applied effectively, while their results are not always easy to justify. The work undertaken in the AIMS Research Unit at the University of Nottingham aimed at assessing the suitability of ANN systems for the problem of ore grade estimation and at the development of a complete ANN-based


system that will handle real exploration data in order to provide ore grade estimates. GEMNET II is a modular neural network system designed and developed by the Author to receive 3D exploration data from an orebody and perform ore grade estimation on a block model basis. The aims of the system are to provide a valid alternative to conventional grade estimation techniques while considerably reducing the time and knowledge required for development and application.


Affirmation

The following papers have been published based on the research presented in this thesis:

Kapageridis I., Denby B. Ore grade estimation with modular neural network systems – a case study. In: Panagiotou G (ed) Information technology in the minerals industry (MineIT '97). AA Balkema, Rotterdam, 1998.

Kapageridis I., Denby B. Neural network modelling of ore grade spatial variability. In: Proceedings of the International Conference on Artificial Neural Networks (ICANN 98), Vol. 1, pp 209–214, Springer-Verlag, Skövde, 1998.

Kapageridis I., Denby B., Hunter G. Integration of a Neural Ore Grade Estimation Tool in a 3D Resource Modelling Package. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN '99), International Neural Network Society and the IEEE Neural Networks Council, Washington D.C., 1999.

Kapageridis I., Denby B., Schofield D., Hunter G. GEMNET II – A Neural Ore Grade Estimation System. In: 29th International Symposium on the Application of Computers and Operations Research in the Minerals Industries (APCOM '99), Denver, Colorado, 1999.

Kapageridis I., Denby B., Hunter G. Ore Grade Estimation and Artificial Neural Networks. Mineral Wealth Journal, Jul.–Sep. 1999, No. 112, The Scientific Society of the Mineral Wealth Technologists, Athens.

Kapageridis I., Denby B. Ore Grade Estimation Using Artificial Neural Networks. In: 2nd Regional VULCAN Conference, Maptek/KRJA Systems, Nice, 1999.


Acknowledgements

I would like to thank Professor Bryan Denby for his guidance and help throughout the duration of my studies at the University of Nottingham. I would also like to thank him for introducing me to the exciting world of the AIMS Research Unit.

Thanks should go to everyone at the AIMS Research Unit, people who have been there and others who still are, and who made it all so much easier. Special thanks to Dr. Damian Schofield for being such a good friend and teacher, and also for sharing his music CD collection with me.

A big thank you goes to the State Scholarships Foundation of Greece for making it all possible. Their investment in me was most appreciated.

Many thanks to everyone at the Nottingham office of Maptek/KRJA Systems for the help and support over the last year of my studies. In particular, I would like to thank Dr. Graham Hunter, David Muller, and Les Neilson for their help and advice.

Finally, I would like to thank all my friends and in particular David Newton, Marina Lisurenko, and Stefanos Gazeas for their support and for some unforgettable times in Nottingham.


Contents

ABSTRACT
AFFIRMATION
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES

1. INTRODUCTION
1.1 THE PROBLEM OF GRADE ESTIMATION
1.2 GRADE DATA FROM EXPLORATION PROGRAMS
1.3 EXISTING METHODS FOR GRADE ESTIMATION
1.3.1 General
1.3.2 Geometrical Methods
1.3.3 Inverse Distance Method
1.3.4 Geostatistics
1.3.5 Conclusions
1.4 BLOCK MODELLING & GRID MODELLING IN GRADE ESTIMATION
1.5 ARTIFICIAL NEURAL NETWORKS FOR GRADE ESTIMATION
1.6 RESEARCH OBJECTIVES
1.7 THESIS OVERVIEW

2. ARTIFICIAL NEURAL NETWORKS THEORY
2.1 INTRODUCTION
2.1.1 Biological Background
2.1.2 Statistical Background
2.1.3 History
2.2 BASIC STRUCTURE – PRINCIPLES
2.2.1 The Artificial Neuron – the Processing Element
2.2.2 The Artificial Neural Network
2.3 LEARNING ALGORITHMS
2.3.1 Overview
2.3.2 Error Correction Learning
2.3.3 Memory Based Learning
2.3.4 Hebbian Learning
2.3.5 Competitive Learning
2.3.6 Boltzmann Learning
2.3.7 Self-Organized Learning
2.3.8 Reinforcement Learning
2.4 MAJOR TYPES OF ARTIFICIAL NEURAL NETWORKS
2.4.1 Feedforward Networks
2.4.2 Recurrent Networks
2.4.3 Self-Organizing Networks
2.4.4 Radial Basis Function Networks and Time Delay Neural Networks
2.4.5 Fuzzy Neural Networks
2.5 CONCLUSIONS

3. RADIAL BASIS FUNCTION NETWORKS
3.1 INTRODUCTION
3.2 RADIAL BASIS FUNCTION NETWORKS – THEORETICAL FOUNDATIONS
3.2.1 Overview
3.2.2 Multivariable Interpolation
3.2.3 The Hyper-Surface Reconstruction Problem
3.2.4 Regularisation
3.3 RADIAL BASIS FUNCTION NETWORKS
3.3.1 General
3.3.2 RBF Structure
3.3.3 RBF Initialisation and Learning
3.4 FUNCTION APPROXIMATION WITH RBFNS
3.4.1 General
3.4.2 Universal Approximation
3.4.3 Input Dimensionality
3.4.4 Comparison of RBFNs and Multi-Layer Perceptrons
3.5 SUITABILITY OF RBFNS FOR GRADE ESTIMATION

4. MINING APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS
4.1 OVERVIEW
4.2 ANN SYSTEMS FOR EXPLORATION AND RESOURCE ESTIMATION
4.2.1 General
4.2.2 Sample Location Based Systems
4.2.3 Sample Neighborhood Based Systems
4.2.4 Conclusions
4.3 ANN SYSTEMS FOR OTHER MINING APPLICATIONS
4.3.1 Overview
4.3.2 Geophysics
4.3.3 Rock Engineering
4.3.4 Mineral Processing
4.3.5 Remote Sensing
4.3.6 Process Control-Optimisation and Equipment Selection
4.4 CONCLUSIONS

5. DEVELOPMENT OF A MODULAR NEURAL NETWORK SYSTEM FOR GRADE ESTIMATION
5.1 INTRODUCTION
5.2 FORMING THE INPUT SPACE FROM 2D SAMPLES
5.3 DEVELOPMENT OF THE NEURAL NETWORK TOPOLOGIES
5.3.1 Overview
5.3.2 The Hidden Layer
5.3.3 Final Weights and Output
5.4 LEARNING FROM 2D SAMPLES
5.4.1 Overview
5.4.2 Module 1 – Learning from Octants
5.4.3 Module 2 – Learning from Quadrants
5.4.4 Module 3 – Learning from Sample 2D Co-ordinates
5.5 TRANSITION FROM 2D TO 3D DATA
5.5.1 General
5.5.2 Input Space: Adding the Third Co-ordinate
5.5.3 Input Space: Adding the Sample Volume
5.5.4 Search Method: Expanding to Three Dimensions
5.6 COMPLETE PROTOTYPE OF THE MNNS
5.7 CONCLUSIONS

6. CASE STUDIES OF THE PROTOTYPE MODULAR NEURAL NETWORK SYSTEM
6.1 OVERVIEW
6.2 CASE STUDY 1 – 2D IRON ORE DEPOSIT
6.3 CASE STUDY 2 – 2D COPPER DEPOSIT
6.4 CASE STUDY 3 – 3D GOLD DEPOSIT
6.5 CASE STUDY 4 – 3D CHROMITE DEPOSIT
6.6 CONCLUSIONS

7. GEMNET II – AN INTEGRATED SYSTEM FOR GRADE ESTIMATION
7.1 OVERVIEW
7.2 CORE ARCHITECTURE AND OPERATION
7.2.1 Exploration Data Processing and Control Module
7.2.2 Module Two – Modelling Grade's Spatial Distribution
7.2.3 Module One – Modelling Grade's Spatial Variability
7.2.4 Final Module – Providing a Single Grade Estimate
7.3 VALIDATION
7.3.1 Training and Validation Errors
7.3.2 Reliability Indicator
7.3.3 Module Index
7.3.4 RBF Centres Visualisation
7.4 INTEGRATION
7.4.1 Neural Network Simulator
7.4.2 Interface with VULCAN – 3D Visualization
7.5 CONCLUSIONS

8. GEMNET II APPLICATION – CASE STUDIES
8.1 OVERVIEW
8.2 CASE STUDY 1 – COPPER/GOLD DEPOSIT 1
8.3 CASE STUDY 2 – COPPER/GOLD DEPOSIT 2
8.4 CASE STUDY 3 – COPPER/GOLD DEPOSIT 3
8.5 CASE STUDY 4 – COPPER/GOLD DEPOSIT 4
8.6 CONCLUSIONS

9. CONCLUSIONS AND FURTHER RESEARCH
9.1 CONCLUSIONS
9.2 FURTHER RESEARCH

APPENDIX A – FILE STRUCTURES
A1. SNNS NETWORK DESCRIPTION FILE
A2. SNNS NETWORK PATTERN FILE
A3. BATCHMAN NETWORK DEVELOPMENT SCRIPT
A4. SNNS2C NETWORK C CODE EXTRACT
A5. VULCAN COMPOSITES FILE

APPENDIX B – CASE STUDY DATA
B1. CASE STUDY 1 – 2D IRON ORE DEPOSIT
B2. CASE STUDY 2 – 2D COPPER DEPOSIT
B3. CASE STUDY 3 – 3D GOLD DEPOSIT
B4. CASE STUDY 4 – 3D CHROMITE DEPOSIT

REFERENCES


List of Figures

Chapter 1
Figure 1.1: Drillholes from exploration programme and development, intersecting the orebody (coloured by gold assays – screenshot from VULCAN Envisage).
Figure 1.2: Compositing of drillhole samples using interval equal to sample length.
Figure 1.3: Polygonal method of ore grade estimation.
Figure 1.4: Triangular method of ore grade estimation.
Figure 1.5: Search ellipse used during selection of samples for ore grade estimation.
Figure 1.6: Frequency histogram (left) and variogram (right) of copper grades (percentages).
Figure 1.7: Grid modelling as visualised in an advanced 3D graphics environment.
Figure 1.8: Sections through a block model intersecting the orebody.

Chapter 2
Figure 2.1: Illustration of a typical neuron [13].
Figure 2.2: Propagation of an action potential through a neuron's axon [13].
Figure 2.3: The five major models of computation as they were presented six decades ago [18].
Figure 2.4: Structure of the processing element [32].
Figure 2.5: Effect of bias on the input to the activation function (induced local field) [32].
Figure 2.6: Common activation functions: (a) unipolar threshold, (b) bipolar threshold, (c) unipolar sigmoid, and (d) bipolar sigmoid [33].
Figure 2.7: Basic structure of a layered ANN [32].
Figure 2.8: Structure of the feedforward artificial neural network. There can be more than one middle or hidden layer [33].
Figure 2.9: a) Recurrent network without self-feedback connections, b) recurrent network with self-feedback connections [32].
Figure 2.10: Structure of a two-dimensional Self-Organising Map [32].
Figure 2.11: Basic structure of the Radial Basis Function Network [33].
Figure 2.12: The concept of Time Delay Neural Networks for speech recognition [40].
Figure 2.13: An approach to FNN implementation [44].

Chapter 3
Figure 3.1: Regularisation network [32].
Figure 3.2: Structure of generalised RBF network [32].
Figure 3.3: Illustration of input space dissection performed by the RBF and MLP networks [69].

Chapter 4
Figure 4.1: ANN for ore grade/reserve estimation by Wu and Zhou [73].
Figure 4.2: General structure of the AMAN neural system.
Figure 4.3: Back-propagation network used in the NNRK hybrid system.
Figure 4.4: Drillhole data used for testing the performance of the NNRK system.
Figure 4.5: 2D approach of learning from neighbour samples arranged on a regular grid.
Figure 4.6: Modular network approach implemented in the GEMNet system [84].
Figure 4.7: Scatter diagram of GEMNet estimates on a copper deposit [84].
Figure 4.8: Contour maps of GEMNet reliability indicator and grade estimates of a copper deposit [84].
Figure 4.9: Back-propagation network used for lateral log inversion [86]. Connections between layers are not shown.
Figure 4.10: Estimated grades and assays (red and blue) vs. actual (black) [89].

Chapter 5
Figure 5.1: Illustration of quadrant and octant search method (special case where only one sample is allowed per sector). Respective grid nodes are also shown.
Figure 5.2: Estimation results from neural network architecture developed for use with gridded data. The use of irregular data has an obvious effect on the performance of the system.
Figure 5.3: Neural network architectures receiving inputs from a quadrant search (left) and from an octant search (right).
Figure 5.4: Improvement in estimation by the introduction of the neighbour sample distance in the input vector.
Figure 5.5: Modular neural network architecture developed for ore grade estimation from 2D samples [113].
Figure 5.6: Partitioning of the original dataset into three parts, each one targeted at a different module of the MNNS.
Figure 5.7: RBF network used as part of module 1 in MNNS. Training patterns from an octant search were used to train the network.
Figure 5.8: Posting of the basis function centres from the RBF network of Fig. 5.7 in the normalised input space (X-Grade, Y-Distance).
Figure 5.9: Graph showing the learned relationship between the network's inputs (grade and distance of neighbour sample) and the network's output (target grade) for the RBF network of Fig. 5.7.
Figure 5.10: Example of an RBF network from Module 2.
Figure 5.11: Posting of the basis function centres from the RBF network of Fig. 5.10 in the normalised input space (X-Grade, Y-Distance).
Figure 5.12: Graph showing the learned relationship between the network's inputs (grade and distance of neighbour sample) and the network's output (target grade) for the RBF network of Fig. 5.10.
Figure 5.13: Module 3 MLP network trained on sample co-ordinates.
Figure 5.14: Learned mapping between sample co-ordinates (easting and northing) and sample ore grade for MLP network of Module 3.
Figure 5.15: 3D version of quadrant search.
Figure 5.16: 3D version of octant search.
Figure 5.17: Simplified 3D search method used in the MNNS for sample selection.
Figure 5.18: Diagram showing the structure of the MNNS for 3D data (units are the neural network modules).
Figure 5.19: Learned weighting of outputs from module one RBF networks by the RBF of module two.
Figure 5.20: Learned relationships between sample co-ordinates, length (inputs) and sample grade (output) from the RBF network of module three.

Chapter 6
Figure 6.1: Posting of input/training samples (blue) and test samples (red) from the iron ore deposit.
Figure 6.2: Scatter diagram of actual vs. estimated iron ore grades.
Figure 6.3: Iron ore grade distributions – actual and estimated.
Figure 6.4: Contour maps of iron ore actual and estimated grades.
Figure 6.5: Posting of input/training samples (blue) and test samples (red) from the copper deposit.
Figure 6.6: Scatter diagram of actual vs. estimated copper grades.
Figure 6.7: Copper grade distributions – actual and estimated.
Figure 6.8: Contour maps of copper actual and estimated grades.
Figure 6.9: 3D view of the orebody and drillhole samples used in the 3D gold deposit study.
Figure 6.10: Scatter diagram of actual vs. estimated gold grades.
Figure 6.11: Gold grade distributions – actual and estimated.
Figure 6.12: Gold grades distribution of the complete dataset.
Figure 6.13: Drillholes from a 3D chromite deposit.
Figure 6.14: Scatter diagram of actual vs. estimated chromite grades.
Figure 6.15: Chromite grade distributions – actual and estimated.

Chapter 7
Figure 7.1: Simplified block diagram showing the operational steps of the data processing and control module in GEMNET II.
Figure 7.2: Normalisation information panel.
Figure 7.3: Interaction between GEMNET II and other parts of the integrated system during operation of the data processing and control module.
Figure 7.4: RBF centres from second module located in 3D space. Drillholes and modelled orebody are also shown.
Figure 7.5: RBF centres of west sector RBF network and respective training samples in the input pattern hyperspace (X-Grade, Y-Distance, Z-Length).
Figure 7.6: Final module's RBF network.
Figure 7.7: Block model coloured by the reliability indicator in GEMNET II.
Figure 7.8: Block model coloured by module index in GEMNET II. Cyan blocks represent first module estimates while red blocks represent second module estimates.
Figure 7.9: First module RBF centres visualisation in GEMNET II. Drillholes and orebody model are also shown.
Figure 7.10: Diagram of the main components of SNNS.
Figure 7.11: Modules and extensions of VULCAN.
Figure 7.12: Menu structure of GEMNET II in Envisage.
Figure 7.13: GEMNET II panels in Envisage.
Figure 7.14: Console window with messages from GEMNET II operation.
Figure 7.15: GEMNET II online help.

Chapter 8
Figure 8.1: Orebody and drillholes from copper/gold deposit 1.
Figure 8.2: Scatter diagram of actual vs. estimated copper grades from copper/gold deposit 1.
Figure 8.3: Copper grade distributions from copper/gold deposit 1.
Figure 8.4: Scatter diagram of actual vs. estimated gold grades from copper/gold deposit 1.
Figure 8.5: Gold grade distributions from copper/gold deposit 1.
Figure 8.6: Plan section (top) and cross section (bottom) of block model coloured by reliability indicator values for the gold grade estimation of copper/gold deposit 1.
Figure 8.7: Plan section (top) and cross section (bottom) of block model coloured by module index for gold and copper grade estimation of copper/gold deposit 1.
Figure 8.8: RBF centres locations and training patterns from module 1 networks, north (top) and east (bottom).
Figure 8.9: Plan section (top) and cross section (bottom) of block model coloured by gold grade estimates for copper/gold deposit 1.
Figure 8.10: Orebodies and drillholes from copper/gold deposit 2.
Figure 8.11: Scatter diagram of actual vs. estimated gold grades from zone TQ1 of copper/gold deposit 2.
Figure 8.12: Gold grade distributions from zone TQ1 of copper/gold deposit 2.
Figure 8.13: Scatter diagram of actual vs. estimated gold grades from zone TQ1A of copper/gold deposit 2.
Figure 8.14: Gold grade distributions from zone TQ1A of copper/gold deposit 2.
Figure 8.15: Scatter diagram of actual vs. estimated gold grades from zone TQ2 of copper/gold deposit 2.
Figure 8.16: Gold grade distributions from zone TQ2 of copper/gold deposit 2.
Figure 8.17: Scatter diagram of actual vs. estimated gold grades from zone TQ3 of copper/gold deposit 2.
Figure 8.18: Gold grade distributions from zone TQ3 of copper/gold deposit 2.
Figure 8.19: Plan section (top) and cross section (bottom) of block model coloured by reliability indicator values for the gold grade estimation of copper/gold deposit 2.
Figure 8.20: Plan section (top) and cross section (bottom) of block model coloured by module index for gold and copper grade estimation of copper/gold deposit 2.
Figure 8.21: RBF centres locations and training patterns from module 1 network north (top) and module 2 network (bottom) in copper/gold deposit 2.
Figure 8.22: Plan section (top) and cross section (bottom) of block model coloured by gold grade estimates for copper/gold deposit 2.


List of Tables

Chapter 4
Table 4.1: Comparison of NNRK, ANN, and kriging estimates.

Chapter 5
Table 5.1: Learning strategy for Module 3 MLP network.

Chapter 6
Table 6.1: Characteristics of datasets from the MNNS case studies.
Table 6.2: Mean absolute errors from case study 1.
Table 6.3: Mean absolute errors from case study 2.
Table 6.4: Actual and estimated average gold grades.
Table 6.5: Mean absolute errors from case study 3.
Table 6.6: Actual and estimated average chromite grades.
Table 6.7: Mean absolute errors from case study 4.

Chapter 7
Table 7.1: System variables available in BATCHMAN.

Chapter 8
Table 8.1: Main characteristics of the four deposits used for testing the final GEMNET II architecture.
Table 8.2: Statistics of data from copper/gold deposit 1.
Table 8.3: Actual and estimated average copper and gold grades from copper/gold deposit 1.
Table 8.4: Samples and block model file information and training pattern generation results for copper/gold deposit 2.
Table 8.5: Statistics from copper/gold deposit 2 and estimation performance results.


1. Introduction

1.1 The Problem of Grade Estimation

Grade estimation is one of the most complicated aspects of mining. It also happens to be one of the most important. The complexity of grade estimation originates from scientific uncertainty, common to similar engineering problems, and from the necessity for human intervention. The combination of scientific uncertainty and human judgement is common to all grade estimation procedures, regardless of the chosen methodology. In statistical terms, grade estimation is a problem of prediction. Geoscientists are given a set of samples from which they need to construct a quantitative model of an orebody's grade by interpolating and extrapolating between these samples. Geoscientists come from very different fields, such as geology, mathematics and statistics. The quantitative model they construct ideally takes into consideration the qualitative model of the orebody built by the geologists interpreting the exploration data. The amount of data available to support the grade estimation process is usually very small compared with the amount of information that has to be extracted from it. The data also occupy a very small volume in 3D space compared to the volume of the orebody that undergoes grade estimation. The quality of the data is dependent on a number of processes that involve human interaction and allow for the introduction of measurement errors at the early stages of sampling, analysing and logging. It should also be noted that exploration data are usually very expensive to obtain. Various methods have been developed for performing grade estimation. Generally, these methods can be classified into three categories: geometrical, distance based and geostatistical. There are certain assumptions inherent to each of these methods, while most of them depend on human judgement and allow for the


introduction of human errors. These assumptions mainly concern the spatial distribution characteristics of grade, such as its continuity in different directions in space. It would be an understatement to say that a great percentage of the people who apply these methods do not understand or take these assumptions into consideration. Especially in the case of geostatistics, due to the built-in complexity of the methodology, people tend to overlook the significance of these assumptions or underestimate the negative effects that any misjudgements might have. As a result, mining projects often begin with 'great expectations' that may never become reality. Over- or underestimation of grades is only one of the many unforgiving results of a wrong choice and application of grade estimation methods. In recent years, many researchers in the field of grade/reserve estimation have noticed these problems and tried to suggest possible alternatives. Some have tried to prove that the assumptions inherent in geostatistics cannot be valid most of the time and that other methods should therefore be considered. However, these discussions commonly concentrate more on disproving geostatistics and other established methodologies than on progressing towards a new and valid method. It seems to be a common belief that the geostatistical methodology has created a special league of people who understand the underlying mechanisms and theory. Unfortunately these people are a minority among the scientists and engineers who are asked to provide grade estimates on which large amounts of investment money will be spent. In most cases people misuse geostatistics or avoid it completely, even though they could benefit from its use. Many geologists build their own picture of the orebody in their minds using their experience and even their instincts. They 'develop' their own methods of estimation by adjusting less advanced methods to the exploration data at the early stages of a mining project. What is even


more unfortunate is that they continue to build confidence in those early models of the orebody, something that inevitably leads them to the difficult position of not being able to fit new data coming from the mine into their model. There are too many examples of successful application of geostatistics and other existing methods for one to disregard them completely. Specifically in the case of geostatistics, this success cannot be credited to luck because, as will be discussed later, it is a very painstaking and time-consuming process that leaves no room for mistakes or misjudgements. Therefore, careful choice of a method and careful application of that method to exploration data can produce reliable results. As already discussed though, the current methods for grade estimation, and particularly geostatistics, require a large amount of knowledge and skill to be applied effectively. They can be very time consuming and difficult to explain to the people who make investment decisions. Finally, their results depend on the skills and experience of the modeller and on the quality of the exploration data. They can also be prone to errors when handling data that do not follow the necessary assumptions. In the next section a brief discussion is given of the exploration data used during grade estimation, in order to explain the potential problems such data can cause in this process.

1.2 Grade Data from Exploration Programs

Drilling is the most common way to enter the 3D space under the ground surface and extract samples from the underlying rock. Other methods exist, such as the construction of shafts and tunnels. Based on the samples obtained, the geologist will draw conclusions as to the presence of a mineralised body. Economics usually dictates the maximum number of drillholes, although this is also controlled by the complexity of the geological environment. There are many types of drilling equipment. The layout of a drilling programme does


not follow specific rules. Figure 1.1 shows a set of drillholes from a copper/gold orebody. Hole spacing and size depend solely on the characteristics of the orebody. This is a major source of complication when it comes to developing a grade estimation technique.

Figure 1.1: Drillholes from exploration programme and development, intersecting the orebody (coloured by gold assays – screenshot from VULCAN Envisage).

Once the samples are obtained and logged, the mineralised parts are prepared for assay. Computers are extensively used during this process for logging and storage of the samples. The outcome of the exploration programme and post-processing is a series of files containing records of drillhole samples. There are usually three files describing the contents and the position of the samples in 3D space. These files are:



• Collar table file: this file contains the co-ordinates of the drillhole collars and the overall geometry of each drillhole.

• Survey table file: this file provides all the necessary information to derive the co-ordinates of individual samples in space. The combination of the survey and collar tables is necessary in order to visualise drillholes correctly using 3D computer graphics, and enables the development of a drillhole database.

• Assay table file: the results of the assay analysis are stored per sample in this file. When combined with the previous two files, this leads to the completion of the drillhole database. This database is the source of input data for the process of grade estimation.

Following the development of a drillhole database is the compositing of drillholes into intervals. These intervals refer to drillhole length and can be fixed, or they can be derived from the sample lengths. In the first case, if the interval is greater than the length of the samples, then more than one sample is used to provide the assay value for that interval. Figure 1.2 illustrates the process of compositing. Compositing is usually a length-weighted average, except in the case of extremely variable density, where compositing must be weighted by length times density [71]. In the case of the intervals being derived directly from the sample lengths, the number of composites equals the number of samples in the database and the compositing procedure reduces to a reconstruction of the database into a single file containing all the information. This type of compositing will be used throughout this thesis to provide the input data files for the various case studies.
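To make the length-weighted average concrete, here is a minimal sketch, in Python, of fixed-interval compositing for a single drillhole; it is an illustration of the calculation described above, not the compositing routine of any particular package, and the optional density weighting follows the extreme-variable-density case noted in [71].

# Fixed-interval, length-weighted compositing for one drillhole.
# samples: list of contiguous (from, to, grade) intervals down the hole;
# density: optional per-sample densities for length-times-density weighting.
def composite(samples, interval, density=None):
    if not samples:
        return []
    end = max(to for _, to, _ in samples)
    composites = []
    start = 0.0
    while start < end:
        stop = start + interval
        weight_sum = grade_sum = 0.0
        for i, (frm, to, grade) in enumerate(samples):
            overlap = min(to, stop) - max(frm, start)  # length inside the interval
            if overlap <= 0.0:
                continue
            w = overlap * (density[i] if density else 1.0)
            weight_sum += w
            grade_sum += w * grade
        if weight_sum > 0.0:
            composites.append((start, stop, grade_sum / weight_sum))
        start = stop
    return composites

# Two 1 m samples composited into a single 2 m interval:
print(composite([(0.0, 1.0, 2.5), (1.0, 2.0, 3.5)], 2.0))  # [(0.0, 2.0, 3.0)]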


Figure 1.2: Compositing of drillhole samples using interval equal to sample length.

A typical composites file starts with a header describing the structure of the file and the format used for reporting the values of the various parameters. After the header follows the main part of the file consisting of the sample records. Records typically contain the following parameters:

Sample id, top xyz, bottom xyz, middle xyz, length, from, to, geocode, assay values

The top, bottom, and middle co-ordinates are derived from the survey and collar tables as explained above. The from and to fields refer to the distance from the drillhole collar to the beginning and end of the sample respectively. There can be a number of codes describing geology, lithology, etc. These parameters allow the interaction between the qualitative model of the orebody, built by the geologist, and the quantitative model, which will be developed after grade estimation. Finally, there can be more than one variable value reported for every composite, e.g. gold and copper grade.

The irregularities of the drilling scheme, the limited number of drillholes that are economically feasible, and the complex procedures necessary for the analysis of the obtained samples account for many of the problems encountered during grade estimation. Additionally, the grades themselves will often exhibit behaviour that is very difficult to model using the information available from an exploration programme. The people responsible for an exploration programme always face the questions of how much it would help to add an extra drillhole to the sample database, whether the cost of the extra drillhole is justified by the derived benefits and, naturally, where in the given area to drill.
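As a concrete illustration of the record structure, the short sketch below parses one such composite record into a Python dictionary. The comma-separated column layout is a hypothetical example chosen for illustration; actual formats vary between packages (the VULCAN composites file is reproduced in Appendix A).

# Hypothetical parser for the composite record layout described above:
# id, top xyz, bottom xyz, middle xyz, length, from, to, geocode, assays...
def parse_composite(line):
    f = line.split(",")
    return {
        "id": f[0].strip(),
        "top": tuple(map(float, f[1:4])),
        "bottom": tuple(map(float, f[4:7])),
        "middle": tuple(map(float, f[7:10])),
        "length": float(f[10]),
        "from": float(f[11]),
        "to": float(f[12]),
        "geocode": f[13].strip(),
        "assays": [float(v) for v in f[14:]],  # e.g. gold and copper grade
    }

record = "DH001,1000,2000,150,1000,2000,148,1000,2000,149,2.0,10.0,12.0,OXIDE,1.25,0.43"
print(parse_composite(record)["assays"])  # [1.25, 0.43]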

1.3 Existing Methods for Grade Estimation

1.3.1 General

In the following paragraphs, several of the most common existing methods for grade estimation will be discussed briefly. Attention will be given to their specific areas of application. Every method presents special characteristics that make it more applicable to certain types of deposits. There is no such thing as a universally applicable method for grade estimation. The selection of a method for a particular deposit depends on the geological and engineering attributes of the latter.

1.3.2 Geometrical Methods

Before computers dominated the field of grade estimation, the geometrical methods were those most often employed [81], and they are still used for quick evaluations of reserves. These methods include the polygonal (Fig. 1.3) and triangular (Fig. 1.4) methods and the method of sections.


Figure 1.3: Polygonal method of grade estimation.

The polygonal method is very often used with drillhole data. It can be applied on plans, cross sections, and longitudinal sections. The average grade of the sample inside the polygon is assigned to the entire polygon and provides the grade estimate for the area of the polygon. The thickness of the mineralisation in the sample is also applied to the polygon to provide a volume for the reserve estimate. The assumption here is that the area of influence of any sample extends halfway to the adjacent sample points. The polygons are constructed by joining the perpendicular bisectors of the lines connecting these sample points. The polygonal method is applied to deposits of simple to moderate geometry with low to medium grade variability (e.g. coal, sedimentary iron, limestone, evaporites).


Figure 1.4: Triangular method of grade estimation.

The triangular method is slightly more advanced than the polygonal method. In this method the triangular area between three adjacent drillholes receives the average grade of the three samples involved. In computational terms, the triangular method is much faster, since the areas are easy to calculate from the co-ordinates of the three points (see the sketch below). This method can be applied to the same cases as the polygonal method. The last of the three geometrical methods mentioned in this thesis, the method of sections, is the most manual one and requires a lot of time and patience. The areas of influence of the drillhole samples extend halfway to adjacent sections and to adjacent drillholes in the same section. The grades of the samples are assigned to their areas of influence. The method of sections is usually applied in deposits with very complex geometry, where the other methods prove inadequate.
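The per-triangle computation is simple enough to state in a few lines. The sketch below, a minimal Python illustration rather than production code, assigns the mean grade of three adjacent drillholes to their triangle and derives the triangle's area from the corner co-ordinates using the shoelace formula.

# Triangular method for a single triangle: area from the three corner
# co-ordinates (shoelace formula) and the mean of the three sample grades.
def triangle_estimate(p1, p2, p3, g1, g2, g3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    area = abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
    return area, (g1 + g2 + g3) / 3.0

# Three drillholes 100 m apart, copper grades in percent:
print(triangle_estimate((0, 0), (100, 0), (0, 100), 1.2, 0.8, 1.0))  # (5000.0, 1.0)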


The geometrical methods suffer from problems concerning the predicted distribution of grades. Depending on the average grade of the deposit and the cutoff grade, they can lead to over- or underestimation of grades.

1.3.3 Inverse Distance Method

Inverse distance weighting, as well as kriging (the geostatistical interpolation tool), belongs to the class of moving average methods. Both are based on repetitive calculations and therefore require the use of computers. Inverse distance weighting consists of searching the database for the samples surrounding a point (or a block) and computing the weighted average of those samples' grades. This average is calculated using the equation below:

g^* = \sum_{i=1}^{n} w_i g_i \qquad (1.1)

where g^* is the grade estimate, g_i is the grade of sample i, w_i is the weight for sample i, and n is the number of samples. The difference between inverse distance weighting and kriging lies in the way the weights w_i are calculated. In the case of inverse distance, the weights are calculated as an inverse power of distance, as follows:

w_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}} \qquad (1.2)

where w_i is the weight for sample i, d_i is the distance between sample i and the estimated point, and p is the inverse distance weighting power. The sample


selection strategy is as important as the weighting power. The following guidelines can be used during sample selection [71]:

• Samples should be chosen from the estimated point's geologic domain;

• The search distance should be at least equal to the distance between samples;

• There should be a maximum number of samples to be selected;

• Samples should be within a specified maximum distance of the estimate point, to prevent excessive extrapolation;

• Trends in the grade should be accounted for by the use of a search ellipse. Modelling of the grade's range of continuity in various directions is necessary to provide the axes of the search ellipse (Fig. 1.5). This is commonly achieved using variogram modelling (see the next section);

• The number of samples taken from any one drillhole should be kept to a maximum of three. More samples lead to redundant data and can cause problems, especially if kriging is used as the interpolation method;

• Quadrant or octant search schemes may be used in the case of clustered data to improve the estimation results [71].

The weighting power, as well as the search radius and the number of samples used, can affect the degree of smoothing. Unfortunately, these can only be found through trial and error, in order to honour the trends in the grade, match production results, or even follow the ideas of the geologist about the deposit. A minimal sketch of the weighting scheme follows.
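The sketch below is a minimal Python rendering of equations (1.1) and (1.2), using a simple spherical search radius and a maximum sample count in place of the full search ellipse and domain rules listed above; it is illustrative only.

# Inverse distance weighting (equations 1.1 and 1.2).
# samples: list of ((x, y, z), grade) tuples; power: weighting power p.
import math

def idw_estimate(point, samples, power=2.0, search_radius=100.0, max_samples=8):
    candidates = []
    for coords, grade in samples:
        d = math.dist(point, coords)
        if 0.0 < d <= search_radius:
            candidates.append((d, grade))
    candidates.sort()                       # closest samples first
    candidates = candidates[:max_samples]
    if not candidates:
        return None                         # nothing close enough to estimate from
    weights = [d ** -power for d, _ in candidates]
    return sum(w * g for w, (_, g) in zip(weights, candidates)) / sum(weights)

samples = [((0, 0, 0), 1.2), ((50, 0, 0), 0.8), ((0, 60, 0), 1.0)]
print(round(idw_estimate((10.0, 10.0, 0.0), samples), 3))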


Figure 1.5: Search ellipse used during selection of samples for grade estimation. The ellipse is divided into quadrants and a maximum number of points is selected from each one of them.

Inverse distance weighting can be applied to deposits with simple to moderate geometry and with low to high variability of grade (e.g. all the types mentioned under the polygonal method, bauxite, lateritic nickel, porphyry copper, gold veins, gold placers, alluvial diamond, stockwork) [71].

1.3.4 Geostatistics

The work of G. Matheron and D. Krige in the early 1960s led to the development of an ore reserve estimation methodology known as geostatistics. The theory of geostatistics combines aspects of different sciences, such as geology, statistics and probability theory. It is a highly complex methodology whose main purpose is the best possible estimation of ore grades within a deposit, given a certain amount of information. Geostatistics, like any other method, will not improve on the quantity and quality of the input data.


Matheron’s theory of regionalised variables [58] forms the basis of geostatistical methodology. In brief, according to this theory, any mineralisation can be characterised by the spatial distribution of a certain number of measurable quantities (regionalised variables) [38]. Geostatistics follows the observation that samples within an ore deposit are spatially correlated with one another. Attention is also given to the relationship between sample variance and sample size. Every geostatistical study begins with the process of structural analysis, which is by far the most important step of this methodology. Structural analysis examines the structures of the spatial distribution of ore grades via the development of variograms. The variogram utilises all the available structural information to provide a model of the spatial correlation or continuity of ore grades. The calculation of a variogram should be based on data from similar geological domains. The variogram function is as follows:

\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left( g(x_i) - g(x_i + h) \right)^2 \qquad (1.3)

where g(x_i) is the grade at point x_i, g(x_i + h) is the grade at a point at distance h from point x_i, and n is the number of sample pairs. Sample pairs are oriented in the same direction and separated by the distance h. Their volume should also be constant; this is taken into consideration during the compositing of drillholes. For the purposes of constant semi-variogram support, compositing should be performed on a constant interval. The variogram function is calculated for different values of the distance h. The resulting graph is known as the experimental variogram. As shown in Figure 1.6, the variogram usually increases with increasing distance and reaches a plateau level. The distance h at which the variogram stops increasing and becomes more or less level is


called the variogram range. The value of the variogram at this distance is called the sill of the variogram (C + C0). Finally, the value of the variogram at distance h = 0 is called the nugget effect (C0). A number of different meanings are given to a nugget effect that is high in comparison to the sill, such as low-quality samples or a non-homogeneous sampling zone. Most of the time it is fairly difficult to identify these three parameters from the experimental variogram graph, and it therefore becomes difficult to fit one of the available models. It is a process that requires skill, experience and large amounts of time. It is also a point where mistakes are made, undermining the entire process of grade estimation. Variogram modelling is followed by the geostatistical method for grade interpolation, called kriging. Kriging is a linear estimation method based on the position of the samples and on the continuity of grades as shown by the variograms. The method finds the optimal weights w_i for equation (1.1) by minimising the estimation variance evaluated from the calculated variograms. Kriging is therefore not based only on distance, as the inverse distance method is.
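Equation (1.3) itself is straightforward to compute. The following sketch builds an omnidirectional experimental variogram in Python, with pair distances grouped to the nearest multiple of the lag; direction and sample support, both important in practice, are deliberately ignored here.

# Omnidirectional experimental variogram (equation 1.3).
# Pair distances are grouped to the nearest multiple of `lag`.
import math
from collections import defaultdict

def experimental_variogram(points, grades, lag, n_lags):
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            k = round(math.dist(points[i], points[j]) / lag)
            if 0 < k <= n_lags:
                sums[k] += (grades[i] - grades[j]) ** 2
                counts[k] += 1
    # gamma(h) = (1 / 2n) * sum of squared grade differences at each lag h
    return {k * lag: sums[k] / (2 * counts[k]) for k in sorted(counts)}

points = [(0, 0), (10, 0), (20, 0), (30, 0)]
print(experimental_variogram(points, [1.0, 1.4, 0.9, 1.2], lag=10.0, n_lags=3))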

Figure 1.6: Frequency histogram (left) and variogram (right) of copper grades (percentages).


There are a number of variations of kriging, each suited to different types of deposits and sampling schemes. The geostatistical methodology is very well documented and there are many good publications in this field [38,20,37,17]. Non-linear variants of kriging, such as log-normal and disjunctive kriging [21,49], have also been developed; these are far more advanced than linear kriging but also far more complicated. Generally, it is difficult to argue with the efficiency and reliability of a properly developed geostatistical study. However, there is always the issue of justifying the extra complexity and cost of geostatistics, especially at the beginning of a mining project when there are no actual values to compare with.
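For concreteness, the sketch below solves the ordinary kriging system for the weights of equation (1.1) under a spherical variogram model. It is a bare Python/NumPy illustration with assumed model parameters (nugget, sill contribution and range), not a substitute for the implementations in the publications cited above.

# Ordinary kriging: solve for the weights of equation (1.1) that minimise
# the estimation variance under a spherical variogram model, subject to
# the unbiasedness constraint (weights sum to one).
import numpy as np

def spherical(h, c0=0.1, c=0.9, a=50.0):    # assumed nugget, sill part, range
    h = np.asarray(h, dtype=float)
    g = np.where(h < a, c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3), c0 + c)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(coords, grades, target):
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = spherical(d)
    lhs[n, :n] = lhs[:n, n] = 1.0           # unbiasedness constraint row/column
    rhs = np.append(spherical(np.linalg.norm(coords - target, axis=1)), 1.0)
    w = np.linalg.solve(lhs, rhs)[:n]       # last unknown is the Lagrange multiplier
    return float(w @ np.asarray(grades))

coords = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0)]
print(round(ordinary_kriging(coords, [1.2, 0.8, 1.0], np.array([10.0, 10.0])), 3))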

1.3.5 Conclusions

From the very brief discussion above, it becomes clear that there is still a need for a fast and reliable method for ore grade estimation, the results of which depend only on the complexity and variability of the given deposit and not so much on the quality and quantity of the given data. The required method should also not depend on the skills and knowledge of the person applying it, while remaining easy to understand and apply. The methods developed so far suffer either from over-simplification of the ore grade estimation process, as in the case of the geometrical methods, or from over-sophistication, as in the case of geostatistics. Choosing one of the available methods is usually a compromise between speed and reliability, cost and attention to detail. This is a compromise very few mining companies are willing to make, but many of them have to because of the resources available to them.

1.4 Block Modelling & Grid Modelling in Grade Estimation
Grade estimation usually involves interpolation between known samples, which become available from an exploration programme or from the development of the mine. The interpolation process is based on locations commonly arranged on a regular geometric structure designed to provide the necessary detail and cover the volume or area of interest. Block and grid models are the main structures used during grade estimation and deposit modelling. The choice between them depends on the type and complexity of the deposit and the value of interest [5].

Figure 1.8: Grid modeling as visualised in an advanced 3D graphics environment.

Grid models (Fig. 1.8) consist of a series of two-dimensional matrices. These matrices may contain estimates of different parameters such as grades, thickness, structures and other values. A grid is usually defined by its origin
in space, i.e. the easting, northing, and elevation of its starting position, the distance between its nodes in both directions, and its dimensions in these directions, i.e. the number of nodes. This structure dramatically reduces the amount of information necessary to represent a complete model of the deposit and has the additional advantage of allowing easy manipulation of the various parameters included, by performing simple calculations between the grids. Grid modelling is best suited to deposits with two of their dimensions significantly greater than the third.

Block models are far more complex structures, being three-dimensional and allowing the storage of more than one parameter. Figure 1.9 shows two sections through a block model. The volume containing the deposit of interest is divided into blocks, each with a specific volume associated with it. These blocks are defined by their centroids' X, Y, and Z co-ordinates relative to the origin of the model. Their dimensions can vary from one block to another, usually decreasing close to geological structures and other features that require more detail. There can be more than one variable associated with every block, some estimated and others derived. Grade estimation on a block model basis means the extension of point samples to block estimates with volume.
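As an illustration of the two structures, the sketch below defines a minimal grid model and block model; the class and field names are hypothetical and simply mirror the parameters described above (origin, node spacing, number of nodes, block centroids, dimensions and stored variables).

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GridModel:
    """2D grid: a matrix of values defined by an origin, node spacing and size."""
    origin: tuple              # (easting, northing, elevation) of the starting node
    spacing: tuple             # node spacing in the two grid directions
    shape: tuple               # number of nodes in each direction
    values: np.ndarray = None  # e.g. thickness or grade estimates, one per node

    def node_position(self, i, j):
        # World coordinates of node (i, j)
        return (self.origin[0] + i * self.spacing[0],
                self.origin[1] + j * self.spacing[1])

@dataclass
class Block:
    """Single block: centroid relative to the model origin, dimensions, variables."""
    centroid: tuple            # relative X, Y, Z of the block centre
    size: tuple                # block dimensions, may vary from block to block
    variables: dict = field(default_factory=dict)  # e.g. {"cu": 1.2, "density": 2.7}

grid = GridModel(origin=(1000.0, 2000.0, 150.0), spacing=(25.0, 25.0), shape=(40, 60))
block = Block(centroid=(12.5, 37.5, -10.0), size=(25.0, 25.0, 10.0))
block.variables["cu"] = 0.85   # an estimated grade variable
print(grid.node_position(3, 5), block)
```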

Figure 1.9: Sections through a block model intersecting the orebody. A surface topography model has limited the block model.

Block models allow the modelling of deposits with very complex geometry. They do, however, require considerable computational power, and they become more demanding as the number of stored variables increases. They are also more difficult to visualise, as they are three-dimensional and can only be effectively plotted in sections.

1.5 Artificial Neural Networks for Grade Estimation
Artificial neural networks (ANNs) are the result of decades of research into a biologically motivated computing paradigm. There are many different opinions as to their definition and applicability to technological problems. It is a common belief, though, that ANNs present an alternative to the concept of programmed or hard computing. The ever-developing ANN technology brought about the concept of neural computing, which finds its way more and more into real engineering problems. ANNs
are parallel computing structures which replace program development with learning [92]. There have been many cases of successful applications of ANNs to function approximation, prediction and pattern recognition problems in the past. This fact, as well as the special characteristics of ANNs that will be discussed in the next chapter, makes them a natural choice for the problem of grade estimation. As discussed in the previous paragraphs, grade estimation is commonly reduced to a problem of function approximation. ANNs, and specifically the chosen type of ANN, can provide, as this thesis will try to prove, a valid methodology for grade estimation.

1.6 Research Objectives
Disregarding the existing methodologies for grade estimation is definitely not one of the aims of this thesis. The GEMNet II system described here was developed to provide a flexible yet complete alternative method, which takes into consideration the theory behind deposit formation while minimising the dependence on certain assumptions. The main objectives of the development of GEMNet II can be identified as follows:

• To find a suitable neural network architecture for the problem of grade estimation.
• To take advantage of the function approximation properties of ANNs.
• To break down the problem of grade estimation into less complex functions that can be modelled using these properties.
• To integrate the developed neural network architecture in a system which will be user-friendly and flexible.
• To provide means of validating the results of this system.
• To minimise the knowledge required for using the system.
• To compare the performance of the system with existing grade estimation techniques on the basis of estimation properties, usability and time requirements.

1.7 Thesis Overview
Given below is a description of the chapters included in this thesis:

• Chapter 2 - Artificial Neural Networks Theory
Gives a brief discussion of the theory behind ANNs, the main ANN architectures and their main application areas.

• Chapter 3 - Radial Basis Function Networks
Examines the special type of ANN architecture which forms the basis of the GEMNet II system. An in-depth analysis of Radial Basis Function Networks is presented in order to provide a better understanding of their operation and their suitability to the problem of grade estimation.

• Chapter 4 - Mining Applications of Artificial Neural Networks
Discusses a number of examples of the application of ANNs to grade/reserves estimation, together with examples of similar applications from non-mining areas. Presents a number of reported uses of ANN systems in mining and shows how this technology is beginning to gain ground in the mining industry.

• Chapter 5 - Development of a Modular Neural Network System for Grade Estimation
Describes the development of prototype modular neural network systems for use with 2D and 3D exploration data. The transition from two to three dimensions is discussed.

• Chapter 6 - Case Studies of the Prototype Modular Neural Network System
Presents a number of case studies which were used to guide the development of the prototype system. These case studies were also used to validate the overall approach.

• Chapter 7 - GEMNet II – An Integrated Modular System for Grade Estimation
Explains the design and development of the GEMNet II system. The system architecture as well as its application is analysed. The integration of the system in an advanced 3D resource-modelling environment is also discussed.

• Chapter 8 - GEMNet II Application – Case Studies
Contains several examples of the application of GEMNet II to real deposits with real sampling schemes. The case studies are presented in order of increasing complexity. Other techniques are applied to the same data in order to provide a basis for comparison and evaluation of the GEMNet II system's performance.

• Chapter 9 - Conclusions – Further Research
Gives a discussion of the conclusions from the research described and the potential areas for further research and development.

2. Artificial Neural Networks Theory

2.1 Introduction

2.1.1 Biological Background
The human brain, and generally the mammalian nervous system, has been the source of inspiration for decades of research into a computational model based not on hard-coded programming but on learning from experience. The human brain, central to the human nervous system, is generally understood not as a single neural network but as a network of neural networks, each having its own architecture, learning strategy, and objectives. The massive parallelism of the human brain, and the advantages deriving from this structure, has always attracted the attention of scientists, especially in the field of computing. Biological neural networks, regardless of their function and complexity, are composed of building blocks known as neurons (Fig. 2.1). The minimal structure of a neuron consists of four elements: dendrites, synapses, cell body, and axon. Dendrites are the transmission channels for information coming into the neuron. The signals which propagate through the dendrites originate from the synapses, which form the input contact points with other neurons. Synapses are also centres of information storage in biological neural networks. There are, however, other storage mechanisms inside biological neurons which are still not very well understood and which extend outside the four-element neuron model described here. The axon is responsible for transmitting the output of the neuron. There is only one axon per neuron, but axons can have more than one branch, the tips of which form synapses upon other neurons [3]. The cell body of the neuron is where most of the processing takes place. The cell body also provides the necessary chemicals and energy for its operation.

Figure 2.1: Illustration of a typical neuron [100].

Transmission of information within biological neural networks is achieved by means of ions, semi-permeable membranes and action potentials, as opposed to simple electronic transport in metallic cables [87]. Neural signals produced at the neuron travel through the axon in the form of ions, which in the case of neurons are called neurotransmitters. The neuron constantly tries to keep a balanced electrical system by transporting excess positive ions out of the cell while holding negative ions inside. These movements of ions through the neuron are known as depolarisation waves or action potentials (Fig. 2.2). The information transmitted between neurons is processed using a number of electrical and chemical processes. The synapses play a leading role in the regulation of these processes. Synapses direct the transmission of information and control the flow of neurotransmitters. The cell body integrates incoming signals and, when these reach the activation threshold, the neuron generates an action potential which propagates through the neuron's axon. Synapses, as already mentioned, are the centres of information storage. The synapses store information by modifying the permeability of the cell to different kinds of neurotransmitters and therefore altering their effect on the neuron's
activation. This information needs to be refreshed periodically in order to maintain the optimal behaviour of the neuron. This form of information storage is also known as synaptic efficiency, which represents the ability of a particular synapse to evoke the depolarisation of the cell body.

Figure 2.2: Propagation of an action potential through a neuron’s axon [100].

All the above knowledge of the way neurons transmit, store, and process information is far from complete, and therefore any derived artificial model cannot be considered anywhere near as complex as its biological counterpart, at the level of both individual neurons and whole networks. ANNs follow the simple four-element model of the biological neuron in the definition of their building block, the artificial neuron or processing element.

2.1.2 Statistical Background
The study of the human brain and other biological nervous structures is not the only source of inspiration and formalisation for the development of artificial neural network models. ANNs are commonly treated as fine-grained parallel implementations of non-linear static or dynamic systems [31]. The biological structures, when simplified to an artificial model, become a system that can be best described by a traditional mathematical or statistical model such as non-parametric pattern classifiers, clustering algorithms, non-linear filters, and statistical regression
models rather than a true biological model. These statistical models are either parametric with a small number of parameters, or non-parametric and completely flexible. Artificial neural network methods cover the area in between, with models of large but not unlimited flexibility given by a large number of parameters, as required in large-scale practical problems [82]. The behaviour and dynamics of the structure of artificial networks can be shown to implement the operation of classical mathematical estimators and optimal discriminators [47]. It is generally accepted that the earlier models of artificial neurons and neural networks in the 1940s and '50s tried to imitate the biological model as closely as possible, while more recent models have been elaborated for new generations of information-processing devices. In most cases of ANNs it is almost impossible to get any agreement between their behaviour and experimental neurophysiological measurements. This results from the over-simplification of the biological nervous systems, which is dictated by the incomplete understanding of the numerous chemical and electrical processes involved.

Understanding the operational properties of ANNs can be approached by a number of different methods. Statistical mechanics is a very important tool for analysing the learning ability of a neural network. Statistical mechanics provides a description of the collective properties of complex systems consisting of many interacting elements, on the basis of the individual behaviour and mutual interaction of these elements [118]. Within this approach, ANNs are defined as ensembles of neurons with certain activity, which interact through synaptic couplings. Both the activities and the synaptic couplings are assumed to evolve dynamically.

In the following paragraphs, various aspects of ANNs will be discussed which further demonstrate the strong connection between statistics and neural computing.

2.1.3 History
Almost every introduction to ANNs begins with a brief presentation of the historical development of ANNs and neural computation in general. There are many good reasons for discussing the history of ANNs. The brief discussion in this paragraph will show how this multi-science field of computing evolved through time. This historical analysis will help to assess the growth and potential of ANNs as an approach to the problem of computing. ANNs are the realisation of one of the first formal definitions of computability, namely the biological model. In the 1930s and '40s there were at least five alternative models of computation (Figure 2.3) [86]:
1. the mathematical model
2. the logic-operational model (Turing machines)
3. the computer model
4. cellular automata
5. the biological model (neural networks)

Figure 2.3: The five major models of computation as they were presented six decades ago [86].

The computer model of von Neumann became the most popular and widely used one, but this did not mean the dismissal of the other approaches. In fact, John von Neumann himself participated in the development of other models, such as the first ANNs [69]. In 1943 Warren McCulloch and Walter Pitts introduced the first models of artificial neurons [60]. Donald Hebb, in his book The Organisation of Behaviour [33], tried to build a qualitative explanation of experimental results from psychology using a specific learning law that he proposed for the synapses of neurons. The first hardware implementations of ANNs included the Snark by Marvin Minsky [64], the Mark I Perceptron by Frank Rosenblatt and others [88], the ADALINE by Bernard Widrow [109], and the Lernmatrix by Karl Steinbuch [98]. After a quiet period in the 1950s and early 1960s, the field of neural computing became once again the centre of research activity. Researchers such as Teuvo
Kohonen [46], James Anderson [2], Stephen Grossberg [30], and Shun-ichi Amari [1] brought back interest in the field, and by the 1980s the first neural network applications became a reality. John Hopfield [34] was another example of an established scientist who helped to raise worldwide awareness of the neural computing field. By the late 1980s the field was very well established, with research groups in most of the major universities and research institutions around the world. David Rumelhart and James McClelland [89] are also worth mentioning for their contribution to the field through the publication of Parallel Distributed Processing, the volumes of which are considered major references of neural computing.

2.2 Basic Structure – Principles

2.2.1 The Artificial Neuron – the Processing Element
The artificial neuron or processing element (PE) is the basic unit of an ANN. It is a simplified version of the four-element model described in Paragraph 2.1. There are both software and hardware implementations of PEs. Their basic structure is illustrated in Figure 2.4.

Figure 2.4: Structure of the processing element [32].

The PE k includes a set of synapses each being identified by a weight w. Each input signal xj to the PE k is multiplied by the synaptic weight wkj. The weighted input
signals are summed by the adder of the PE (linear combiner). The outcome of the summation is passed to an activation function also known as squashing function because it squashes (i.e. limits) the amplitude range of the PE’s output to a finite value [32]. The bias bk is applied to the adder and has the effect of increasing or decreasing the output of the latter. Figure 2.5 shows the effect of the bias on the output of the linear combiner.

Figure 2.5: Effect of bias on the input to the activation function (induced local field) [32].

The following equations describe the model of the PE in mathematical terms:

υk = Σj=1…m wkj xj    (2.1)

and

yk = φ(υk + bk)    (2.2)

where x1, x2, …, xm are the input signals, which are multiplied by the synaptic weights wk1, wk2, …, wkm and then added to give the linear combiner output υk. The bias bk is applied to υk to provide the input to the activation function φ(·). Finally, the output
of the activation function gives the output of the neuron yk. Figure 2.6 illustrates the most common activation functions used in modern PEs.

Figure 2.6: Common activation functions: (a) unipolar threshold, (b) bipolar threshold, (c) unipolar sigmoid, and (d) bipolar sigmoid [53].
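Equations (2.1) and (2.2) are small enough to state directly in code. The following is a minimal sketch of a single PE, assuming a bipolar sigmoid (tanh) activation; the values used are illustrative.

```python
import numpy as np

def processing_element(x, w, b, phi=np.tanh):
    """Output of a single PE: y = phi(sum_j w_j * x_j + b)  (equations 2.1-2.2)."""
    v = np.dot(w, x)          # linear combiner output (2.1)
    return phi(v + b)         # activation applied to combiner output plus bias (2.2)

x = np.array([0.5, -1.0, 0.25])   # input signals x1..xm
w = np.array([0.8, 0.2, -0.5])    # synaptic weights wk1..wkm
print(processing_element(x, w, b=0.1))
```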

2.2.2 The Artificial Neural Network
The model of the artificial neuron or processing element described above forms the basis of the artificial neural network (ANN) structure. ANNs consist of layers of interconnected PEs, as shown in Fig. 2.7. This layered structure is the most common in ANNs and is usually called the fully connected feedforward or acyclic network. However, there are ANNs that do not adopt this structure, as will be discussed in Section 2.4. The starting point of the ANN structure is a layer of input units that allows information to enter the network. The input units cannot be considered as PEs, mainly because no processing of information takes place at them, with the exception of normalisation (when required). Normalisation is the process of
equalising the signal range (commonly to a range between 0.1 and 0.9) of different inputs. Normalisation ensures that changes in the signals of different inputs have the same effect on the network’s behaviour regardless of their magnitude.
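A minimal sketch of such normalisation, assuming simple min-max scaling of each input to the 0.1–0.9 range mentioned above:

```python
import numpy as np

def normalise(x, lo=0.1, hi=0.9):
    """Min-max scale each input column into [lo, hi] so all inputs have equal range."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin)

data = np.array([[1.0, 100.0], [2.0, 300.0], [4.0, 200.0]])
print(normalise(data))
```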

Figure 2.7: Basic structure of a layered ANN [32].

Following the input layer are one or more internal or hidden layers. The use of the word hidden is mainly due to the fact that these layers are not accessible from outside the ANN. The first hidden layer is fully interconnected with the units of the input layer. In other words, all PEs of the hidden layer receive the signal from each input unit. The signals are multiplied by a weight, which is different for every connection. In the case of more than one hidden layer, there is full interconnection between subsequent layers, as in the case of the input and first hidden layer. The final part of the ANN structure is the output layer. The units of this layer are also PEs, which receive the signals from the last hidden layer and perform similar
processing to that of the hidden PEs. If normalisation is used in the input layer, then the outputs of the output PEs have to be transformed back to the range of the original data to get sensible results. This is normally required when the ANN is used for function approximation.

2.3 Learning Algorithms

2.3.1 Overview
Learning from examples is the main operation of any ANN. Learning in this case means the ability of an ANN to improve its performance through an iterative process of adjusting its free parameters. The adjustment of an ANN's free parameters is stimulated by a set of examples presented to the network during the application of a set of well-defined rules for improving its performance, called a learning algorithm. There are many different learning algorithms for ANNs, each with a different way of adjusting the connection weights of PEs and a different way of formalising the measurement of the ANN's performance. They are generally grouped into supervised and unsupervised algorithms. Supervised algorithms are applied when the required ANN outputs are known in advance, while unsupervised algorithms are applied when the correct outputs are not known and need to be found. Over the next paragraphs of this section, the main learning processes and algorithms will be discussed briefly.

2.3.2 Error Correction Learning
In order to explain the error correction learning algorithm, the basic structure of any ANN, the PE, will be examined. The example is based on the assumption that the PE is the only unit of the output layer of a feedforward ANN. As in any learning algorithm, adjusting the synaptic weights of the PEs is an iterative process involving a number of time steps.

The PE k is presented with an input signal vector x(n) at time step n. This signal vector is produced by the units of the previous layer - the last hidden layer in this case. The output signal yk(n) of the ANN's only output is compared to a target output dk(n), which produces an error signal ek(n):

ek(n) = dk(n) – yk(n)

(2.3)

The production of the error signal activates a corrective mechanism – a sequence of corrective adjustments to the synaptic weights of the PE that bring the output signal closer to the target output. A cost function or index of performance is defined based on the error signal as follows [32]:

E(n) = ek²(n) / 2

(2.4)

Eventually the process of adjusting the synaptic weights of the PE reaches a stabilised weight state and learning terminates. This learning process of cost function minimisation is also known as the delta rule or Widrow-Hoff rule [99]. The adjustment Δwkj(n) of the synaptic weight wkj at time step n is given by:

Δwkj(n) = ηek(n)xj(n)

(2.5)

where η is the learning-rate parameter. The new value of the synaptic weight at time step n+1 will be:

wkj(n+1) = wkj(n) + Δwkj(n)

(2.6)

The correct choice of the learning-rate parameter is very important for the overall performance of the ANN.
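Equations (2.3)–(2.6) translate directly into a training loop. The sketch below applies the delta rule to a single linear PE on synthetic data; the learning rate and epoch count are illustrative choices, not values from the thesis.

```python
import numpy as np

def delta_rule(X, d, eta=0.05, epochs=100):
    """Train a single linear PE with the delta (Widrow-Hoff) rule."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = np.dot(w, x)          # PE output for this example
            e = target - y            # error signal (2.3)
            w += eta * e * x          # weight adjustment (2.5) applied as in (2.6)
    return w

# Learn a noisy linear mapping d = 2*x1 - x2
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
d = 2 * X[:, 0] - X[:, 1] + 0.01 * rng.standard_normal(200)
print(delta_rule(X, d))   # approaches [2, -1]
```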

2.3.3 Memory Based Learning
Memory based learning is mainly used for pattern classification purposes. Learning takes the form of past experiences stored in a memory of classified input-output examples {(xi, di); i = 1, 2, …, N}, where xi is the input vector, di the target output, and N the number of patterns [32]. When a new vector xnew is presented to the network, the algorithm will try to classify it by looking at the training data in a local neighbourhood of xnew. There are a number of different algorithms for memory based learning, which differ in the way they define two major aspects:

• the local neighbourhood of the new vector xnew;
• the learning rule applied to the training data in the local neighbourhood of xnew.

In Chapter 3 an in-depth discussion of a very important type of memory-based classifier will be given, namely the radial basis function network.

2.3.4 Hebbian Learning
The oldest of the learning rules is Hebb's postulate of learning [33]. Hebb, in his book The Organisation of Behaviour, made the following statement as the basis for associative learning:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take
place in one or both cells such that A's efficiency, as one of the cells firing B, is increased [33, p.62].

Transferring this statement from the neurobiological context into a more algorithmic language yields the following two-part rule [99]:
1. If two neurons on either side of a synapse are activated simultaneously, then the strength of that synapse is selectively increased.
2. If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.

The second part of the rule was not included in Hebb's original rule but was added for consistency. The mathematical formulation of Hebbian learning is given by the following equation for the adjustment Δwkj(n) of the synaptic weight wkj:

Δwkj(n) = ηyk(n)xj(n)

(2.7)

where xj and yk are the presynaptic and postsynaptic signals at time step n, and η is the learning rate parameter. Hebbian learning is strongly supported by physiological evidence in the area of the brain called the hippocampus.
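A minimal sketch of the update in equation (2.7), assuming bipolar signals so that both parts of the two-part rule are visible in the sign of the product yk xj:

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1):
    """Hebbian adjustment (2.7): strengthen when pre- and postsynaptic signals agree."""
    return w + eta * y * x   # y*x > 0 reinforces the synapse, y*x < 0 weakens it

w = np.array([0.2, -0.1])
x = np.array([1.0, -1.0])   # presynaptic signals
y = 1.0                      # postsynaptic signal
print(hebbian_update(w, x, y))
```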

2.3.5 Competitive Learning
Competitive learning is one of the major types of unsupervised learning. In competitive learning the output PEs of an ANN compete to become active when an input signal is presented. In other words, the output PEs try to provide the output associated with an input vector. Competitive learning is based on three elements [90]:

1. A number of similar PEs, which can, however, have some randomly distributed synaptic weights, causing a different response to a given set of input vectors.
2. A limited strength for each PE.
3. A competition mechanism for the PEs to gain the right to respond to a given input. The mechanism must ensure that only one output PE responds at a time; that PE is called the winner-takes-all neuron.

The winning PE is the one with the largest induced local field υk for an input pattern x. The output of the winning PE is set to one, while the outputs of all other PEs are set to zero. The adjustment of the synaptic weight wkj for the winning PE is given by the following equation:

Δwkj = η(xj – wkj)

(2.8)

while for the losing PEs:

Δwkj = 0

(2.9)

This leads to moving the synaptic weight vector of the winning PE towards the input vector.
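The winner-takes-all update of equations (2.8) and (2.9) can be sketched as follows, with the winner selected by the largest induced local field as described above; the weights and input used are illustrative.

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One competitive learning step: only the winning PE's weights move (2.8)-(2.9)."""
    k = np.argmax(W @ x)            # winner: largest induced local field
    W[k] += eta * (x - W[k])        # move winner's weight vector towards the input
    return W, k

rng = np.random.default_rng(2)
W = rng.random((3, 2))              # three output PEs, two inputs
x = np.array([0.9, 0.1])
W, winner = competitive_step(W, x)
print(winner, W[winner])
```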

2.3.6 Boltzmann Learning
Named in honour of Ludwig Boltzmann, the Boltzmann learning rule is a stochastic learning algorithm based on statistical mechanics [32]. An ANN designed to follow the Boltzmann learning algorithm is called a Boltzmann Machine (BM). A BM implements a stochastic response function to characterise the transitions of individual
PEs between different states. There are two possible states for BM units: on state denoted by +1 or off state denoted by –1. A BM is characterised by an energy function, E, which depends on the states of the BM units:

E(x) = –(1/2) Σi Σj wji xi xj    (2.10)

where xj is the state of PE j, and wji is the synaptic weight between PEs i and j. There are no weights between a PE and itself (wjj = 0); in other words, none of the PEs has self-feedback. During a BM's operation a PE is chosen at random. Its output is characterised in terms of a state transition function:

P(xj → –xj) = 1 / (1 + exp(–ΔEj / T))    (2.11)

where ΔEj is the change in the energy function of the BM as a result of the state transition and T is the pseudotemperature. The PEs in a BM fall into two categories: visible and hidden. The visible units form the connection of the network with its environment. These units have two modes of operation: clamped and unclamped or free running. In clamped mode, the visible units are clamped onto specific states while in unclamped mode they operate freely. The hidden units always operate freely. The adjustment of the synaptic weight is defined by [45]:

Δwji = η(ρ+ji – ρ–ji)

(2.12)

where ρ+ji is the correlation between the states of PEs j and i in clamped mode, and ρ–ji is the correlation between the states of PEs j and i in free-running mode; both correlations range from –1 to +1.

2.3.7 Self-Organized Learning
Self-organising learning is usually considered to be just another way of describing unsupervised learning. It is, however, one member of the group of unsupervised learning algorithms, together with competitive and reinforcement learning. The most widely known model of self-organising networks is that of the Self-Organising Maps or Kohonen Networks, proposed by Teuvo Kohonen [45] as a realisation of the ideas developed by Rosenblatt, von der Malsburg, and other researchers. A Kohonen network is an arrangement of PEs in a multi-dimensional lattice (Par. 2.4.3). This structure enables the identification of the immediate neighbourhood of every PE. Kohonen learning is based on a neighbourhood function φ(i,k) representing the strength of the coupling between PEs i and k during the training process. The neighbourhood function equals one for all units i inside a neighbourhood of radius r around unit k, and zero for all other units. The adjustment of the weight vectors follows the rule below:

wi ← wi + ηφ(i,k)(ξ – wi), for i = 1, …, m    (2.13)

where m is the total number of PEs, η is a learning constant, and ξ is an input vector selected using the desired probability distribution over the input space. The learning process is repeated several times with the neighbourhood radius and the learning constant being reduced according to a schedule. The value of the neighbourhood
function also decreases, so that the influence of each PE upon its neighbours is reduced. The effect of the schedule is to accelerate learning at the beginning of the learning process and to produce smaller corrections towards the end. The overall result of Kohonen's learning algorithm is that each PE learns to specialise in a different region of the input space and to produce the highest output for inputs from that region.
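A minimal one-dimensional sketch of Kohonen learning as in equation (2.13), using a hard neighbourhood of radius r; the shrinking schedule for the radius and the learning constant is an illustrative assumption.

```python
import numpy as np

def train_som(data, m=10, eta=0.5, r=3, epochs=50):
    """Train a 1D Kohonen map of m units on 2D data (equation 2.13)."""
    rng = np.random.default_rng(3)
    W = rng.random((m, data.shape[1]))
    for epoch in range(epochs):
        for xi in data:
            k = np.argmin(np.linalg.norm(W - xi, axis=1))  # best-matching unit
            for i in range(max(0, k - r), min(m, k + r + 1)):
                W[i] += eta * (xi - W[i])  # phi(i,k) = 1 inside the neighbourhood
        # reduce the learning constant and neighbourhood radius on a schedule
        eta *= 0.95
        if epoch % 15 == 14 and r > 0:
            r -= 1
    return W

data = np.random.default_rng(4).random((200, 2))
print(train_som(data)[:3])
```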

2.3.8 Reinforcement Learning
Reinforcement learning is another important member of the group of unsupervised learning algorithms. It is closely related to dynamic programming, which is why it is sometimes referred to as neurodynamic programming. Reinforcement learning is, in essence, an input-output mapping achieved through interaction with the environment (input space) in order to minimise a scalar index of performance [7]. Unlike other learning processes, reinforcement learning aims at minimising a cost-to-go function, defined as the cumulative cost of actions taken over a sequence of steps, instead of the immediate cost. The function of the network is to find these actions and feed them back to the environment. Reinforcement learning is very appealing since it allows the network to interact with its environment and develop the ability to increase its performance on the basis of the outcomes of its experience from this interaction.

2.4 Major Types of Artificial Neural Networks

2.4.1 Feedforward Networks
Beyond any doubt the most popular and widely used ANN structure, the feedforward network is a hierarchical design consisting of fully interconnected layers of PEs. Generally, the operation of this network is the mapping of an n-dimensional input to an m-dimensional output, in other words the modelling of a function F : ℜn → ℜm. This is
achieved by means of training on examples (x1,y1), (x2,y2), …,(xk,yk) of the mapping, where yk = f(xk).

Figure 2.8: Structure of the feedforward artificial neural network. There can be more than one middle or hidden layers [53].

The feedforward network is commonly used together with an error correction algorithm such as backpropagation, gradient descent, or conjugate gradient descent. The structure of the feedforward network, as shown in Fig. 2.8, comprises a number of layers of PEs. There are three types of layers depending on their location and function: input, hidden, and output. The connections between the layers are generally feedforward during presentation of an input signal. However, during training the network allows the backpropagation of error signals to the hidden units in order to adjust the connection weights. Feedforward networks may have more than one hidden layer. Extra hidden layers allow more complex mappings but also require more information for training of the network. The choice is usually between an excessive number of PEs in one hidden layer and a lower number of PEs spread over more than one hidden layer.

2.4.2 Recurrent Networks
The main difference between the ANN structure described above and that of the recurrent networks is the presence of feedback loops. A recurrent network may or may not have input and output units, since the outputs of a single layer of units can be directed back to the inputs of the same units, i.e. every PE branches its output to the inputs of all other units in the layer. Figures 2.9a and 2.9b show examples of recurrent networks with and without input and output units. The other difference between these two examples is the presence of self-feedback loops. In Fig. 2.9a each PE sends its output to the input of every other PE, while in Fig. 2.9b the PEs also receive their own outputs as inputs. Feedback loops are usually passed through unit-delay elements, leading to a non-linear dynamical behaviour [32]. A particular example of a recurrent network is the Amari-Hopfield model or Hopfield network [35]. The Hopfield network consists of a single layer of PEs which receive an initial input vector. This input vector consists of component values which may be either 1 or –1 and are fed one per PE. The initial output from each PE is fed back to a branching node which fans out to every PE except the one from which the output signal originates. The branching connections to every PE are weighted by N–1 weights, N being the total number of PEs in the network. The weighted signals are summed and passed through a threshold activation function, resulting in an updated output. Hopfield networks normally operate asynchronously: the PEs are activated one at a time, so that a single updated output is produced at any given time; a random input is added to the weighted signal sum; and the new updated output is held and used in each future asynchronous activation of any PE [53].

Figure 2.9: (a) Recurrent network without self-feedback connections; (b) recurrent network with self-feedback connections [32].

2.4.3 Self-Organizing Networks
Self-organizing networks or self-organizing maps (SOMs) are a special class of the unsupervised ANNs group. SOMs were developed by Teuvo Kohonen [45]. The learning process applied to these networks, as described in a previous paragraph, follows the competitive learning paradigm. SOMs construct topology-preserving mappings of the input data in such a way that the location of a PE carries semantic information. The SOM can be considered as a specific type of clustering algorithm. A large number of clusters are chosen and arranged on a square or hexagonal grid in one or two dimensions. This grid is in essence a lattice of PEs forming the SOM's single computational layer. Input patterns representing similar examples are mapped to nearby nodes of the grid. Figure 2.10 illustrates the basic SOM structure.

Figure 2.10: Structure of a two-dimensional Self-Organising Map [32].

2.4.4 Radial Basis Function Networks and Time Delay Neural Networks
Radial Basis Function Networks (RBFNs) and Time Delay Neural Networks (TDNNs) are two different ANN topologies with characteristics which separate them from other classes of ANNs. The RBFNs are powerful network structures which construct global approximations to functions using combinations of Radial Basis Functions (RBFs) centred around weight vectors [54]. The basic RBFN structure is shown in Fig. 2.11. A non-linear basis function is centred around each hidden node weight vector. Hidden nodes have an adaptable range of influence or receptive field. The output of the hidden nodes is a radial function of the distance between each pattern vector and each hidden node weight vector. The RBFN structure's original motivation was in terms of functional approximation techniques, regression and regularisation, and biological pattern formation. The RBFN structure was chosen after a series of tests as the basic ANN
structure for the GEMNet II system for ore grade estimation. Chapter 3 gives a more in-depth discussion of RBFNs and the reasons behind their choice as the building block of the GEMNet II system.

Figure 2.11: Basic structure of the Radial Basis Function Network [53].

TDNNs are based on ordinary time delays to perform temporal processing [50, 105]. The TDNN (Fig. 2.12a) is a multi-layered feedforward ANN whose PEs are replicated across time. The building block of a TDNN is a PE whose inputs are delayed in time. The activation of a PE is computed by passing the weighted sum of its inputs through an activation function such as a threshold or sigmoid function. The overall behaviour of the network is modified through the introduction of delays. Each of the M inputs of a PE is delayed by N delay steps. Hidden PEs therefore receive M * N delayed inputs plus M undelayed inputs, a total of M * (N+1) inputs. However, only the hidden PEs activated at any given time step have connections to the inputs, with all the other units having the same connection pattern but shifted to a later point in time according to their delay position.

TDNNs are used for position-independent recognition of features within larger patterns. TDNNs are trained on time-position independent detection of sub-patterns, a feature that makes them independent of error-prone pre-processing algorithms for time alignment. They are used to capture the concept of time symmetry as encountered in the recognition of phonemes using frequency-time images known as spectrograms (Fig. 2.12b).

Figure 2.12: The concept of Time Delay Neural Networks for speech recognition [50].

2.4.5 Fuzzy Neural Networks
Fuzzy logic and fuzzy systems can be used in conjunction with ANNs in more than one way to provide solutions for control problems, decision making, and pattern recognition. The most common way of integrating the two technologies is the implementation of fuzzy logic by ANNs, leading to neuro-fuzzy systems. Fuzzy systems provide means of capturing uncertainty, which is inherent in almost every real-world problem. The essential characteristics of fuzzy logic are as follows [117]:

• Exact reasoning is viewed as a limiting case.
• Everything is a matter of degree.
• Inference is viewed as the process of propagation of elastic constraints.
• Any logical system can be fuzzified.

The integration of ANNs with fuzzy systems results in a Fuzzy Neural Network (FNN) of one of the following types [93]:
• FNN with crisp input signals and fuzzy weights.
• FNN with fuzzy set input signals and crisp weights.
• FNN with both fuzzy input signals and fuzzy weights.

The building block of an FNN is a fuzzy version of the PE described in Paragraph 2.2.1. A possible FNN structure consists of a layered net with an input layer implementing membership functions, a first hidden layer implementing fuzzy rules and combining membership functions, a second hidden layer combining fuzzy values, and an output layer providing defuzzification. Figure 2.13 illustrates an approach to FNN implementation.

Figure 2.13: An approach to FNN implementation.

2.5 Conclusions
The discussion given in this chapter covered the basic concepts of artificial neural networks as well as the major types of ANN learning and architecture. The potential of this technology became clear through examples of ANNs presenting special characteristics and areas of application. The ever-increasing research activity in this field has also been discussed, showing that ANNs are becoming more and more popular as tools for solving an increasing number of real-world problems. ANN technology is finding its way into a number of diverse engineering and decision-making problems in the mining industry, as will be demonstrated in Chapter 4 through a set of successful examples.

3. Radial Basis Function Networks

3.1 Introduction
In this chapter the discussion continues with an analysis of a unique type of ANN, the Radial Basis Function Network (RBFN). RBFs were initially used for solving problems of real multivariate interpolation. Work on this subject has been extensively surveyed by Powell [79], and the theory of RBFs is one of the main fields of study in numerical analysis [96, 80]. RBFNs are very simple structures. Their design is in essence a problem of curve fitting in a high-dimensional space. Learning in RBFNs means finding the hyper-surface in multi-dimensional space that fits the training data in the best possible way. This is clearly different from most of the ANN design principles discussed in the previous chapter. Function approximation and pattern classification are the main areas of RBFN application. One of the main advantages of RBFNs lies in their strong scientific foundation. RBFs have been motivated by statistical pattern processing theory, regression and regularisation, biological pattern formation, and mapping in the presence of noisy data [96]. Therefore, RBFNs have inherited a wide range of useful theoretical properties, which have been used to provide solutions to a much wider range of problems than the RBFs themselves. The choice of RBFNs for the development of GEMNet II was based on these theoretical properties, which will be further discussed over the next paragraphs, but also on results from experiments carried out using data from real mineral deposits. The use of RBFNs also helped achieve one of the main aims of GEMNet II, which is to provide a fast alternative to existing grade estimation techniques. In the tests carried out at the beginning of the project, the speed of development of RBFNs was unmatched by any other architecture tested.

3.2 Radial Basis Function Networks – Theoretical Foundations

3.2.1 Overview
The basic principles of RBFs and of the networks derived from them will be discussed in this section. For the purposes of this thesis, the discussion will concentrate on the theory behind the use of RBFs for interpolation problems and not for pattern classification. The transition from the original RBF methods for interpolation to RBFNs will also be analysed.

3.2.2 Multivariable Interpolation
RBFs were first introduced to the problem of multivariable interpolation as an approach to dealing with irregularly positioned data points. The problem of multivariable interpolation is as follows [79]:

Given m different points {xi; i = 1, 2, …, m} in ℜn and m real numbers {fi; i = 1, 2, …, m}, one has to calculate a function s from ℜn to ℜ that satisfies the interpolation conditions

s(xi) = fi, i = 1, 2, …, m.    (3.1)

The choice of s from a linear space that depends on the positions of the data points forms the approach of RBFs. RBFs have the general form:

φ(||x – xi||), x ∈ ℜn, i = 1, 2, …, m    (3.2)

where φ is a basis function from ℜ+ to ℜ and the norm on ℜn is Euclidean. Several interpolation methods have been considered in which s has the form:
s(x) = Σi=1…m λi φ(||x – xi||), x ∈ ℜn.    (3.3)

With the condition that the matrix

Aij = φ(||xi – xj||), i, j = 1, 2, …, m,    (3.4)

is non-singular, condition (3.1) defines the coefficients {λi; i = 1, 2, …, m} uniquely. The matrix A is normally called the interpolation matrix. These methods have a very useful property, proved by Micchelli [62]: if the data points are distinct then, for all positive integers m and n, A is always non-singular. This theory applies to many choices of φ. However, in the case of basis functions of the form

φ(r) = r^l, r ≥ 0    (3.5)

the theory applies only under conditions concerning the degree l and the dimension m0 of the input space. The class of RBFs covered by Micchelli's theorem includes the following functions:

1. Multiquadratics:

φ(r) = (r² + c²)^(1/2) for some c > 0 and r ∈ ℜ    (3.6)

2. Inverse Multiquadratics:

φ(r) = 1 / (r² + c²)^(1/2) for some c > 0 and r ∈ ℜ    (3.7)

3. Gaussian Functions:

φ(r) = exp(–r² / (2σ²)) for some σ > 0 and r ∈ ℜ    (3.8)

4. Thin Plate Splines:

φ(r) = r² ln(r), r ∈ ℜ    (3.9)

It should be noted that multiquadratics and thin plate splines increase with distance from the centre of the basis function, while Gaussian and inverse multiquadratic functions decrease. The thin plate splines are interpolating functions derived by variational methods [22, 61].
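As a concrete illustration of equations (3.1)–(3.4), the sketch below performs exact RBF interpolation of scattered 2D data by solving Aλ = f with Gaussian basis functions; the width σ is an illustrative choice, not a value from the thesis.

```python
import numpy as np

def rbf_interpolate(X, f, Xq, sigma=0.3):
    """Exact RBF interpolation: one Gaussian basis function per data point."""
    def phi(r):
        return np.exp(-r**2 / (2 * sigma**2))   # Gaussian basis (3.8)
    # Interpolation matrix A_ij = phi(||x_i - x_j||)  (3.4)
    A = phi(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    lam = np.linalg.solve(A, f)                  # coefficients from s(x_i) = f_i (3.1)
    # Evaluate s(x) = sum_i lam_i * phi(||x - x_i||) at the query points (3.3)
    return phi(np.linalg.norm(Xq[:, None] - X[None, :], axis=-1)) @ lam

rng = np.random.default_rng(5)
X = rng.random((30, 2))
f = np.sin(4 * X[:, 0]) * np.cos(4 * X[:, 1])
print(rbf_interpolate(X, f, X[:5]))   # reproduces the first five data values
```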

3.2.3 The Hyper-Surface Reconstruction Problem
The interpolation technique described above suffers from a very serious problem: if the number of data points in the training sample is greater than the number of degrees of freedom of the underlying physical process, then fitting as many RBFs as there are data points leads to over-determination of the hyper-surface reconstruction problem [11]. This is known in neural network terms as overfitting or overtraining. Allowing an RBFN to reach this stage means degradation of its generalisation performance. The problem of learning the hyper-surface defining the output in terms of the input can be either well-posed or ill-posed. These terms have been in use in applied mathematics for over a century. Consider an unknown mapping f between a domain X and an output range Y (both taken as metric spaces). Reconstructing this mapping f is said to be well-posed when the following three conditions are satisfied [101, 66, 44]:

1. Existence: for every input vector x ∈ X there is an output y = f(x), where y ∈ Y.
2. Uniqueness: for any pair of input vectors x, t ∈ X, f(x) = f(t) only if x = t.
3. Continuity: also referred to as stability, continuity requires that for any ε > 0 there exists δ = δ(ε) such that ρx(x, t) < δ implies ρy(f(x), f(t)) < ε, where ρ(⋅,⋅) is the distance between its two arguments in the respective space [32].

A problem is ill-posed when any of these conditions is not satisfied. Normally, a physical phenomenon, such as the deposition of an orebody, is a well-posed problem. Learning from drillhole data is, however, an ill-posed problem because:

• For any pair of input vectors x, t there can be f(x) = f(t) even when x ≠ t.
• It is well known that drillhole and other physical samples from mineral deposits contain physical sampling errors, leading to the possibility of the neural network producing an output outside the range Y for a specified input. That means violation of the continuity criterion.

The second of these reasons has the more serious impact on solving the problem, as lack of continuity means that the computed input-output mapping does not represent the true solution. The issues of hyper-surface reconstruction with RBFs being an ill-posed problem and leading to overfitting need to be addressed. A number of methods have been developed for turning an ill-posed problem into a well-posed one, as well as for preventing overfitting. The most important one, regularisation, will be discussed in the following paragraph.

3.2.4 Regularisation
Regularisation is a method developed by Tikhonov in 1963 [102] for solving ill-posed problems. Its use has been mostly explored in approximation theory. Regularisation aims at overcoming the lack of continuity of an ill-posed problem by means of an auxiliary non-negative functional embedding prior information about the solution. Such information is commonly the assumption that similar inputs correspond to similar outputs. Tikhonov's theory involves two terms:

1. Standard Error Term: denoted by Es(F), this term represents the standard error, i.e. the distance between the desired response (target output) di and the actual response yi for the training examples i = 1, 2, …, N. It is defined as:

Es(F) = (1/2) Σi=1…N (di – yi)² = (1/2) Σi=1…N [di – F(xi)]²    (3.10)

2. Regularising Term: denoted by Ec(F), provides the means for embedding geometrical information about the approximating function F(x) into the solution. This term is defined as:

Ec(F) = (1/2) ||DF||²    (3.11)

where D is a linear differential operator. It is in this operator that prior information about the form of the solution is embedded and therefore its selection depends on the problem at hand.

Regularisation provides a way of reducing the number of basis functions when fitting RBFs by adding a penalty term described above as the regularising term [83]. The principle of regularisation is the following:

Find the function Fλ(x) that minimises the Tikhonov functional E(F), defined by

E(F) = Es(F) + λEc(F)

where λ is a positive real number called the regularisation parameter. The choice of λ is crucial, as it controls the balance between the contribution of the sample data and that of the prior information. It can also be seen as an indicator of the sufficiency of the given data samples to specify the solution to the above minimisation problem. The implementation of the regularisation theory leads to the regularisation

network [77]. As shown in Fig. 3.1, it consists of three layers. The first layer consists of a number of input nodes equal to the dimension m0 of the input vector x. The second or hidden layer consists of non-linear nodes connected directly to all the input nodes. The number of hidden nodes equals the number of samples N.

Figure 3.1: Regularisation network [32].

The activation function used in the hidden nodes is a Green's function G(x, xi). One of the most common Green's functions is the multivariate Gaussian function:

G(x, xi) = exp(–||x – xi||² / (2σi²))    (3.12)

where xi denotes the centre of the function, σi its width or receptive field, and wi the unknown coefficients. These coefficients are defined as follows:

wi = (1/λ)[di – F(xi)], i = 1, 2, …, N    (3.13)

The minimising solution, denoted as Fλ(x), is given by:

Fλ(x) = Σi=1…N wi G(x, xi)    (3.14)

The solution reached by the regularisation network exists in an N-dimensional subspace of the space of smooth functions, the set of Green’s functions constituting the basis for this subspace [77]. As Poggio and Girosi point out, the regularisation network has three useful properties:

1. It is a universal approximator, as it can approximate arbitrarily well any multivariate continuous function, given a sufficient number of hidden nodes.
2. It has the best-approximation property, i.e. given an unknown non-linear function f, there always exists a choice of coefficients that approximates f better than all other choices.
3. It provides the optimal solution. In other words, the regularisation network minimises a functional that measures the solution's deviation from its true value as represented by the training data.
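Substituting (3.14) into (3.13) gives the linear system (G + λI)w = d for the weights, where G is the matrix of Green's function values between the training points. A minimal sketch, assuming Gaussian Green's functions with a common illustrative width:

```python
import numpy as np

def regularisation_network(X, d, lam=1e-2, sigma=0.5):
    """Fit a regularisation network: one Gaussian Green's function per sample.

    Substituting (3.14) into (3.13) yields (G + lam*I) w = d for the weights.
    """
    G = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1)**2 / (2 * sigma**2))
    w = np.linalg.solve(G + lam * np.eye(len(X)), d)
    def F(xq):
        g = np.exp(-np.linalg.norm(xq[:, None] - X[None, :], axis=-1)**2 / (2 * sigma**2))
        return g @ w        # F_lambda(x) = sum_i w_i G(x, x_i)  (3.14)
    return F

rng = np.random.default_rng(6)
X = rng.random((50, 2))
d = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(50)
F = regularisation_network(X, d)
print(F(X[:3]))   # smoothed (not exactly interpolated) responses
```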

3.3 Radial Basis Function Networks

3.3.1 General
The structure described above as the regularisation network has a very important weakness: as the number of basis functions initially depends on the number of training samples, the network produced can be very expensive in computational terms. This can be easily understood by considering the computation of the network's linear weights, which requires the inversion of a very large matrix. There is therefore a need for reducing the complexity of the network, leading to an approximation of the regularised solution. This is achieved by the introduction of a simplified version of the regularisation network, the generalised radial basis function network. From this point on, it will be assumed that RBFNs are generalised RBFNs. RBFNs involve searching for a sub-optimal solution in a lower-dimensional space. This solution approximates the regularised solution discussed before.

3.3.2 RBF Structure
Figure 3.2 illustrates the basic structure of the (generalised) RBFN. The first obvious difference between this network and that of Fig. 3.1 is in the number of hidden layer basis functions. In the RBFN there are m1 RBFs, typically fewer than the number of training samples, while in the regularisation network there were N RBFs, with N
equal to the number of training samples. Other structural differences include the number of weights, which is also reduced to m1, and the introduction of a bias applied to the output unit.

Figure 3.2: Structure of generalised RBFN [32].

Significant differences, not so obvious from the figures, concern the centre positions and receptive fields of the RBFs, as well as the linear weights associated with the output layer. These are all unknown parameters which have to be learned by the RBFN during training. In the regularisation network, only the linear weights are unknown and require training. In the next paragraph, the function of the RBFN will be further analysed. Special attention is given to the initial positioning of the RBF centres and to the RBF learning algorithms.

3.3.3 RBF Initialisation and Learning
For an RBFN to be able to receive training samples and function as a hyper-surface reconstruction network, a number of its parameters need to be calculated. These parameters include:

• The linear weights between the hidden and output layer.
• The bias to the output units.
• The centres of the hidden layer RBFs.

There are a number of methods for RBFN initialisation and learning. The most common methods are:

1. Random Centre Selection: this is the simplest of the methods. The centres are randomly chosen from the training data set. It is commonly used when the training data represent the problem at hand well. Learning using this approach is concentrated on adjusting the linear weights between the hidden and output layer. This is achieved using the pseudoinverse method [11]. The weights are calculated using the formula below:

w = G⁺d    (3.15)

where d represents the target output vector in the training data set, and G⁺ is the pseudoinverse of the matrix G, defined as

G = {gi,j}    (3.16)

where gi,j is the output of RBF i when presented with input vector j. Golub and Van Loan [28] provide an in-depth discussion of the computation of a pseudoinverse matrix.
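A minimal sketch of random centre selection with the pseudoinverse solution of equations (3.15)–(3.16), using numpy's pinv; here G is built with one row per input vector and one column per RBF, and the centre count and width are illustrative assumptions.

```python
import numpy as np

def rbf_weights_pseudoinverse(X, d, centres, sigma=0.4):
    """Random-centre RBFN: linear weights from w = G+ d (3.15), with G from (3.16)."""
    # Gaussian RBF outputs: one row per input vector, one column per centre
    G = np.exp(-np.linalg.norm(X[:, None] - centres[None, :], axis=-1)**2
               / (2 * sigma**2))
    return np.linalg.pinv(G) @ d     # pseudoinverse solution for the weights

rng = np.random.default_rng(7)
X = rng.random((100, 2))
d = np.cos(3 * X[:, 0]) * X[:, 1]
centres = X[rng.choice(len(X), size=15, replace=False)]  # random centre selection
print(rbf_weights_pseudoinverse(X, d, centres)[:5])
```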

2. Self-Organised Centre Selection: the learning method described above requires a data set representative of the problem at hand. There is no guarantee that the randomly selected centres reflect accurately the distribution of the data points. To overcome this problem, a clustering algorithm is used that creates homogeneous groups of data from the given data set. There are a number of clustering algorithms; however, in the case of RBFNs, the k-means clustering algorithm is the most commonly used [23]. Moody and Darken [65] describe the use of the k-means clustering algorithm. The number of centres k is set in advance. With the number of centres set, the algorithm proceeds with the following steps [9]:

I. The values of the initial RBF centres tk(0) are set randomly. These values need to differ from one another.
II. A vector x is selected from the data set and passed to the algorithm. The index k(x) of the best-matching centre for the vector is calculated using the minimum-distance Euclidean criterion:

k(x) = arg mink ||x(n) – tk(n)||, k = 1, 2, …, m1    (3.17)

where tk(n) is the kth centre at iteration n.
III. The RBF centres are adjusted using the following rule:

tk(n+1) = tk(n) + η[x(n) – tk(n)] if k = k(x), and tk(n+1) = tk(n) otherwise    (3.18)

34

Radial Basis Function Networks

where η is the learning-rate parameter receiving values between 0 and 1. This parameter controls the speed of learning, i.e. the degree of adjustment on the particular network parameter, in this case, the RBF centres. IV.

The iteration pointer n is increased by 1 and the algorithm loops back to step II. This process continues until the centres become stable.

The self-organised stage described above is followed by a supervised learning stage, which allows the calculation of the linear weights between the hidden and output layer. The overall approach depends largely on the initial selection of centres. Several enhancements to the initial centre selection have been introduced in order to avoid the situation where some initial centres become trapped in regions of the input space with a low density of data points [14, 15]. An advanced version of this learning method is used in the development stages of the GEMNET II system.
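The self-organised stage of steps I to IV can be sketched in a few lines of Python; the fixed iteration budget and the random sampling order below are simplifications of the "until the centres become stable" loop described above:

```python
import numpy as np

def online_kmeans_centres(X, m1, eta=0.1, n_iter=5000, seed=0):
    """Sequential k-means centre selection (steps I-IV)."""
    rng = np.random.default_rng(seed)
    # I. Initialise the centres randomly, here with distinct training points
    centres = X[rng.choice(len(X), size=m1, replace=False)].copy()
    for n in range(n_iter):
        # II. Present a vector x and find the best-matching centre (eq. 3.17)
        x = X[rng.integers(len(X))]
        k = np.argmin(np.linalg.norm(x - centres, axis=1))
        # III. Move only the winning centre towards x (eq. 3.18)
        centres[k] += eta * (x - centres[k])
        # IV. In practice, iterate until the centres stabilise; a fixed
        # budget of n_iter presentations is used here for simplicity.
    return centres
```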

3. Orthogonal Least Squares: the OLS algorithm involves the sequential addition of new RBFs to a network that starts with a single basis function. Each candidate RBF is positioned at each data point in turn, and the linear weights are calculated for each position. The centre that gives the smallest residual error is retained; in this way the number of RBFs increases step by step. The selection of a candidate data point for centre positioning is done by constructing a set of orthogonal vectors in the space S spanned by the hidden-unit activation vectors for each training pattern. The data point that produces the greatest reduction in the residual error is chosen as the location of the new RBF centre. It is important to stop the algorithm well before every data point is selected, to ensure good generalisation.
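The selection strategy can be illustrated with the simplified, greedy Python sketch below; it refits the linear weights by ordinary least squares at every candidate position instead of using the orthogonalisation that makes true OLS efficient, so it reproduces the behaviour rather than the economy of the algorithm:

```python
import numpy as np

def greedy_centre_selection(X, d, sigma, max_centres):
    """Forward selection of RBF centres: at each step, try every remaining
    data point as a new centre, refit the weights, and keep the centre
    that most reduces the residual error."""
    chosen, remaining = [], list(range(len(X)))
    for _ in range(max_centres):   # stop well before all points are used
        best_err, best_i = np.inf, None
        for i in remaining:
            idx = chosen + [i]
            d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
            G = np.exp(-d2 / (2.0 * sigma ** 2))
            w, *_ = np.linalg.lstsq(G, d, rcond=None)
            err = np.sum((d - G @ w) ** 2)
            if err < best_err:
                best_err, best_i = err, i
        chosen.append(best_i)
        remaining.remove(best_i)
    return X[chosen]
```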

4. Supervised Centre Selection: the basis of this method is the least-mean-square (LMS) algorithm. A supervised learning process based on the LMS algorithm sets all the free parameters of the RBFN. The LMS algorithm takes the form of a gradient descent procedure. Initially, a cost function is defined as follows:

$$E = \frac{1}{2} \sum_{j=1}^{N} e_j^2 \qquad (3.19)$$

where N is the number of training samples and ej is the error, defined as:

$$e_j = d_j - F^{*}(x_j) = d_j - \sum_{i=1}^{M} w_i \, G\!\left( \left\| x_j - t_i \right\|_{C_i} \right) \qquad (3.20)$$

where Ci is the norm-weighting matrix. The method aims at minimising E by adjusting the free parameters of the network: the weights wi, the centres ti, and the receptive fields $\Sigma_i^{-1}$. The adjustments to these three parameters are calculated below [32]:

Linear Weights Adjustment:

$$\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n) \, G\!\left( \left\| x_j - t_i(n) \right\|_{C_i} \right) \qquad (3.21)$$

Centres Position Adjustment:

$$\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G'\!\left( \left\| x_j - t_i(n) \right\|_{C_i} \right) \Sigma_i^{-1} \left[ x_j - t_i(n) \right] \qquad (3.22)$$

Receptive Fields Adjustment:

$$\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G'\!\left( \left\| x_j - t_i(n) \right\|_{C_i} \right) Q_{ji}(n) \qquad (3.23)$$

$$Q_{ji}(n) = \left[ x_j - t_i(n) \right] \left[ x_j - t_i(n) \right]^{T}$$

The update rules for the three parameters, based on the three learning-rate parameters η1, η2, and η3, are given below:

Linear Weights Update Rule:

$$w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)} \qquad (3.24)$$

Centres Positions Update Rule:

$$t_i(n+1) = t_i(n) - \eta_2 \frac{\partial E(n)}{\partial t_i(n)} \qquad (3.25)$$

Receptive Fields Update Rule:

$$\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} \qquad (3.26)$$

It should be noted that this gradient-descent procedure for RBFNs does not involve error back-propagation.
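A compact numpy sketch of the procedure follows. It assumes identity norm-weighting matrices, so each hidden unit carries a single width parameter beta_i = 1/(2*sigma_i^2) instead of a full receptive-field matrix; the three learning rates play the roles of η1, η2, and η3 in (3.24) to (3.26):

```python
import numpy as np

def train_rbf_gradient(X, d, centres, etas=(0.01, 1e-3, 1e-6), epochs=500):
    """Gradient descent on E = 1/2 sum_j e_j^2 over the weights, the
    centres, and the (scalar) widths of Gaussian hidden units."""
    eta_w, eta_t, eta_b = etas
    t = centres.astype(float).copy()
    beta = np.full(len(t), 1e-3)          # width parameters, one per unit
    w = np.zeros(len(t))
    for _ in range(epochs):
        diff = X[:, None, :] - t[None, :, :]      # x_j - t_i
        dist2 = (diff ** 2).sum(axis=2)
        phi = np.exp(-beta * dist2)               # hidden-unit outputs
        e = d - phi @ w                           # errors e_j (eq. 3.20)
        ephi = e[:, None] * phi
        grad_w = -ephi.sum(axis=0)
        grad_t = -2.0 * (w * beta)[:, None] * (ephi[:, :, None] * diff).sum(axis=0)
        grad_b = w * (ephi * dist2).sum(axis=0)
        w -= eta_w * grad_w                       # cf. eq. (3.24)
        t -= eta_t * grad_t                       # cf. eq. (3.25)
        beta = np.maximum(beta - eta_b * grad_b, 1e-8)  # cf. eq. (3.26), widths kept positive
    return w, t, beta
```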

5. Regularisation Based Learning: the final RBF learning method described here is based on regularisation theory. Yee [116] provides the justification for this RBF design procedure, which is based on four main elements:

I. A radial-basis function, G, admissible as the kernel of a mean-square consistent Nadaraya-Watson regression estimate (NWRE) [68, 108].

II. An input norm-weighting matrix, $\Sigma^{-1}$, common to all centres, with entries

$$\Sigma = \mathrm{diag}(h_1, h_2, \ldots, h_{m_0}) \qquad (3.27)$$

where h1, h2, …, hm0 are the bandwidths of a consistent NWRE kernel G for each dimension of the input space. Each bandwidth is given as the product of the sample variance of the ith input variable, estimated from the available training data, and a scale factor determined using a cross-validation procedure.

III. Regularised strict interpolation for the training of the linear weights, using the following equation:

$$\mathbf{w} = (\mathbf{G} + \lambda \mathbf{I})^{-1} \mathbf{d} \qquad (3.28)$$

where G is Green's matrix and I is the N-by-N identity matrix.

IV. The choice of the regularisation parameter λ and of the scale factors, achieved using a method such as common cross-validation (CV). Generally, larger values of λ correspond to greater assumed noise in the measurements. In a similar manner, the larger the value of a specific scale factor, the less important the associated input dimension is for the variation of the network output in relation to variations in the input. In other words, the scale factors can be used to rank the significance of the input variables and can aid the reduction of the input space dimensionality.
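Element III reduces to a single linear solve, and element IV to a search over candidate values of λ. A minimal sketch, with a simple hold-out split standing in for a full cross-validation scheme:

```python
import numpy as np

def regularised_weights(G, d, lam):
    """Equation (3.28): w = (G + lambda*I)^(-1) d, where G is the N x N
    Green's matrix built with one RBF centred on each training sample."""
    return np.linalg.solve(G + lam * np.eye(len(G)), d)

def choose_lambda(G, d, candidates=(1e-4, 1e-3, 1e-2, 1e-1, 1.0)):
    n = len(d)
    tr = np.arange(n) % 5 != 0            # roughly 80% train, 20% validation
    best_lam, best_err = None, np.inf
    for lam in candidates:
        w = regularised_weights(G[np.ix_(tr, tr)], d[tr], lam)
        pred = G[np.ix_(~tr, tr)] @ w     # kernel between held-out and training points
        err = np.mean((d[~tr] - pred) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```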


3.4 Function Approximation with RBFNs

3.4.1 General

In this section, the discussion continues with an evaluation of the function approximation capabilities of RBFNs. It will be shown that the range of RBFNs is broad enough to uniformly approximate any continuous function. The effects of the input space dimension and the amount of input data on the RBFN approximation properties will also be analysed.

3.4.2 Universal Approximation

The universal approximation theorem for RBFNs, as stated by Park and Sandberg [74], opened the way for their use in function approximation problems, which were commonly approached using Multi-Layered Perceptrons. The work of Park and Sandberg [74, 73], Cybenko [19], and Poggio and Girosi [77] led to a new model for function approximation based on generalised RBFNs. Specifically, the theorem can be stated as follows:

Let $G : \mathbb{R}^{m_0} \rightarrow \mathbb{R}$ be an integrable bounded function such that G is continuous and

$$\int_{\mathbb{R}^{m_0}} G(x)\, dx \neq 0$$

Let $\mathfrak{I}_G$ denote the family of RBFNs consisting of functions $F : \mathbb{R}^{m_0} \rightarrow \mathbb{R}$ represented by

$$F(x) = \sum_{i=1}^{m_1} w_i \, G\!\left( \frac{x - t_i}{\sigma} \right)$$

where σ > 0, wi ∈ R, and ti ∈ Rm0 for i = 1, 2, …, m1. Then, for any continuous input-output mapping function f(x), there is an RBFN with a set of centres $\{ t_i \}_{i=1}^{m_1}$ and a common receptive field σ > 0 such that the input-output mapping function F(x) realised by the RBFN is close to f(x) in the Lp norm, p ∈ [1, ∞].


The universal approximation theorem provides the theoretical basis for the design of RBFNs for practical applications.

3.4.3 Input Dimensionality

A very critical issue in the use of RBFNs as function approximators is the dimension of the input space and its effect on the intrinsic complexity of the approximating function(s). It is generally accepted that this complexity increases exponentially in the ratio m0/s, where m0 is the input dimensionality and s is a smoothness index measuring the number of constraints imposed on the approximating function. Therefore, for the RBFN to be able to achieve a sensible rate of convergence, the smoothness index s needs to be increased with the number of parameters in the approximating function. However, the space of approximating functions attainable with RBFNs becomes increasingly constrained as the input dimensionality is increased [32]. Increased dimensionality also has a great effect on the computational overhead incurred during training of the RBFN. The dimension of the input space has direct control over the RBFN architecture: the number of input nodes, the number of RBFs, and consequently the number of linear weights between the hidden and output layer. Therefore, any increase in the input dimensionality causes an increase in computer memory and power requirements, and an almost certain increase in development time. The most common ways of addressing high input dimensionality for a given problem are to identify and ignore the inputs that do not contribute considerably to the output, or to combine inputs that present a high correlation. Another way of reducing the input dimensionality, although not always applicable, is to break a complex problem into a number of low-dimensionality problems that can be more effectively addressed using RBFNs.
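As a rough illustration of the first strategy, the sketch below drops one input from each highly correlated pair; the threshold is arbitrary, and a projection method such as PCA would be the more principled way to combine, rather than discard, correlated inputs:

```python
import numpy as np

def drop_correlated_inputs(X, threshold=0.9):
    """Keep only one input from each highly correlated pair of columns."""
    corr = np.corrcoef(X, rowvar=False)
    keep = list(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                keep.remove(j)            # column j is nearly redundant given i
    return X[:, keep], keep
```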


3.4.4 Comparison of RBFNs and Multi-Layer Perceptrons

Comparison of RBFNs with MLPs is inevitable, since they are both used for similar applications and both are universal approximators. This comparison also leads to a better understanding of the two ANN architectures. The differences between the two architectures are both structural (concerning the topology of the network) and functional (concerning the operation and use of the network):

Structural Differences:

• RBFNs have a single hidden layer; MLPs can have more than one hidden layer.
• Hidden units in RBFNs are different from the output units; MLP hidden units are similar to the output units.

Figure 3.3: Illustration of input space dissection performed by the RBF and MLP networks [54].

Functional Differences:

• RBFNs construct local approximations to non-linear input-output mappings, while MLPs construct global approximations.
• The output layer of an RBFN is always linear, while the MLP output layer can be non-linear, depending on the application.
• RBF hidden units calculate the Euclidean norm between the input vector and their centre, while MLP hidden units compute the inner product of the input vector and their synaptic weight vector.
• MLPs exploit the logistic non-linearity to create combinations of hyperplanes that dissect pattern space into separable regions. RBFNs dissect pattern space by modelling clusters of data directly and are therefore more concerned with data distributions (Fig. 3.3) [54].
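The third functional difference is easy to state in code. The two hidden-unit computations below are generic textbook forms, not taken from any specific system in this thesis:

```python
import numpy as np

def rbf_hidden_unit(x, centre, sigma):
    # Distance from a centre, passed through a localised basis function
    return np.exp(-np.linalg.norm(x - centre) ** 2 / (2.0 * sigma ** 2))

def mlp_hidden_unit(x, weights, bias):
    # Inner product with a weight vector, passed through a global squashing function
    return 1.0 / (1.0 + np.exp(-(weights @ x + bias)))
```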

3.5 Suitability of RBFNs for Grade Estimation

RBFNs, like most ANN structures, have certain properties that establish them as a natural choice for grade estimation. However, RBFNs also have a number of additional useful properties that give them an advantage over other ANN architectures for this specific problem. The first of these properties, and possibly the most important one, is that RBFNs construct local approximations to input-output mappings. It is well known that a mineral ore deposit is a localised phenomenon. Modelling of a deposit's grade in 3D space using drillhole data can be considered a problem of hypersurface reconstruction in 3D space, with this hypersurface consisting of a number of zones that need to be locally approximated. Deposits commonly present localised behaviour, i.e. points close to each other within one area of a deposit tend to have similar grades. Clearly, such an area very rarely extends to the entire deposit, and therefore the approach of fitting RBFs at specific locations can be advantageous. These locations are found by clustering the drillhole data in order to identify areas of similar ore grade behaviour.


RBFNs provide an approach to dealing with ill-posed problems, due to the properties they inherit from regularisation theory. Grade estimation is an ill-posed problem, even though the underlying phenomenon, the orebody deposition, is well-posed. As was shown in Par. 3.2.3, reconstructing a deposit's grade as a hypersurface in the space derived from the drillhole data is an ill-posed problem; hence RBFNs should be the ANN of choice for this task. RBFNs also allow the calculation of reliability measures, such as an extrapolation measure and confidence limits. Due to the localised nature of the approximation performed by RBFNs, it is possible to measure the local data density at a given point x in the input space as an index of extrapolation [52]. Confidence limits for the model prediction can also be calculated from the local confidence intervals developed for each RBF unit, using a weighted average of the latter. These reliability measures were first introduced by Leonard et al. [71, 70], incorporated in a new ANN architecture that computes its own reliability, called the Validity Index (VI) network. Leonard et al. used a two-stage approach based on data densities derived using Parzen windows [75] and an interpolation formula for determining the densities at arbitrary test points. These measures are now standard in most commercial neural network simulators that provide RBFN development options. Further examination of the use of reliability measures will be presented in Chapter 7 with the discussion of the development of the GEMNET II system.

Finally, another advantage of RBFNs over other ANN architectures, derived from their theoretical properties, is their speed of development. In the case of low input dimensionality, RBFN learning is expected to be a lot faster than that of any other ANN architecture used for the same problem. The author approached ore grade estimation using an input space of at most four dimensions (Easting, Northing, Elevation, and sample Length), a number low enough for the networks to be very fast to develop. In later chapters, the suitability of RBFNs for the problem of grade estimation will be further demonstrated using experimental results on a large number of case studies.
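As an illustration of the extrapolation measure mentioned above, the sketch below estimates the local training-data density at a query point with a Gaussian Parzen window. It is written in the spirit of the Validity Index approach rather than as Leonard et al.'s exact formulation:

```python
import numpy as np

def extrapolation_index(x, X_train, h=10.0):
    """Parzen-window density of the training data at x; low values flag
    queries that force the model to extrapolate. The bandwidth h is an
    assumption of this sketch."""
    d2 = ((X_train - x) ** 2).sum(axis=1)
    dim = X_train.shape[1]
    norm = (2.0 * np.pi) ** (dim / 2.0) * h ** dim
    return np.mean(np.exp(-d2 / (2.0 * h ** 2))) / norm
```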


4. Mining Applications of Artificial Neural Networks

4.1 Overview

Artificial Intelligence (AI) tools have been in use for years in a number of mining-related applications. Expert and knowledge-based systems, probably the most popular AI tools, have found their way into a number of computer-based systems supporting everyday mining operations as well as the production of mining equipment. In recent years, AI has provided tools for optimising operations and equipment selection, problems involving amounts of information that humans cannot easily cope with in the process of decision-making. These AI systems, together with an ever-increasing number of sophisticated purpose-built computer software packages, have created a very favourable environment for the introduction of yet another powerful AI tool, the Artificial Neural Network. In the '90s the mining industry was introduced to a number of ANN-based systems, some of which found their way to a fully commercialised product, as will be illustrated by some examples in this chapter. It should be noted, however, that these examples are very few considering the total number of applications at the research level and the overall research effort carried out at universities and research institutes around the world.

The applications described in this chapter are divided into two groups. The first group includes examples of ANN systems for exploration and resource estimation. These systems have many points in common with the GEMNet II system developed by the author and, more importantly, share the same aims. The second group includes examples covering the remaining mining problems. This grouping does not mean in any way that exploration and resource estimation is the most important of the mining tasks, or that there are more ANN systems targeted at this field of mining. The grouping, as well as the selection of the examples, was based purely on the relevance of the applications to the subject of this thesis.

4.2 ANN Systems for Exploration and Resource Estimation

4.2.1 General

Exploration and resource estimation commonly involves the prediction of various parameters characterising a mineral deposit or a reservoir. The input data usually come in the form of samples with known positions in 3D space. The majority of the ANN systems developed for these predictive tasks are based on the relationship between the modelled parameters and the sample locations. The most common practice when developing the training pattern set for an ANN is to generate input-output pairs, with the input being the sample location and the desired output being the value of the modelled parameter at that location. In other words, most of the ANN systems treat the modelling of the unknown parameters as a problem of function approximation in the sample co-ordinate space. Some other systems, like GEMNet II, go a step further to exploit information hidden in the relationship between neighbouring samples. The estimation of a parameter at a specific location in 3D space then depends on information from samples around that location. In fact, GEMNet II uses both this and the above approach wherever possible. Most of the systems described in the following paragraphs work in a 2D input space (Easting, Northing). They also share the same ANN architecture, usually based on the MLP or RBF network.
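For concreteness, the location-based construction of training patterns amounts to little more than the following (the composite values shown are invented for illustration):

```python
import numpy as np

# Hypothetical drillhole composites: one row per sample,
# columns = [Easting, Northing, Elevation, grade]
samples = np.array([
    [1020.0, 2345.0, 150.0, 1.32],
    [1038.0, 2350.0, 148.0, 1.10],
    [1055.0, 2361.0, 152.0, 0.87],
])
X_train = samples[:, :3]   # network inputs: sample location in 3D
y_train = samples[:, 3]    # desired output: grade at that location
```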


4.2.2 Sample Location Based Systems

The first example is an MLP-based ANN for ore grade/resource estimation developed by Wu and Zhou [112]. The network architecture, as shown in Fig. 4.1, is an MLP with four layers: an input layer, two hidden layers, and one output layer. The network receives two inputs, the Easting and Northing of the samples. The two hidden layers are identical and have 28 units each. It is a relatively large network considering the dimension of the input space (2D). However, the developers used a fast learning algorithm called Dynamic Quick-Propagation (DQP) [113], which is based on the quick-propagation algorithm [24], together with a scheme for determining the hidden layer size called Dynamic Node Creation [4]. The size of the network was therefore determined through a learning process and should not be a cause for concern.

[Network diagram: two hidden layers of 7 x 4 units each.]

Figure 4.1: ANN for ore grade/resource estimation by Wu and Zhou [112].

This ANN has been tested on assay composites from a copper deposit. A set of 51 drillhole composites was used to train the network over an area of 3600 square metres. The results of the trained network were compared with results from the polygonal method (manual and computer based), inverse distance, and kriging. These results were based on Hughes, Davis, and Davey [36]. Unfortunately, there was no comparison of the grade/resource estimates with actual values; this limitation tends to be a very common problem in most of these studies. Similar to the above network is the ANN developed more recently by Yama and Lineberry [115], which is again based on the MLP architecture but uses the original back-propagation learning algorithm. This network has a single hidden layer with 50 hidden units instead of two layers. This difference brings back the question of network complexity, i.e. whether to use a single but large hidden layer or multiple but small layers. It seems that most researchers in the field choose a single hidden layer, mainly because of the reduced computational overhead as well as the reduction in the required quantity of training samples. Yama and Lineberry used sulphur data from 1152 samples from a 7315 x 4572-m coal property in northern West Virginia. It should be noted that the use of real data in similar studies is very rare. The property was divided into 25 regions (914 x 914-m) due to computer memory limitations. For every region, a network was trained using the Easting and Northing as inputs and the sulphur values as output. All the data values were normalised before being used for training and testing of the networks. The data were normally distributed, a property that usually causes the networks to give outputs close to the mean value. Presenting the tails of the distribution more often to the network, and with a higher learning rate, reduced this effect. The results obtained from the ANNs were compared with results from kriging.


Clarici et al. [16] described a similar single hidden layer network earlier; in that study, though, only one neural network was used for the entire sampling area. Moving from 2D to 3D input space, Caiti and Parisini [13] used RBF networks to interpolate geophysical properties of ocean sediments, e.g. porosity, density, and grain size. The choice of RBF networks was based on their strong theoretical foundation, especially in function approximation. GEMNet II is based on RBF networks for very similar reasons to those discussed in the previous chapter, as will be further analysed in the following. Caiti and Parisini used the Gaussian as the basis function of the interpolating network. They suggested, as many others have, that any discontinuities of the interpolated property can be handled by a smooth, continuous approximation network provided with enough information close to the discontinuity. The choice of RBF centres was based on the number of values on the z-axis. As they quite logically identified, there are normally many samples along the z-axis and fewer on the x-y plane, due to the sampling techniques used. When there is a large number of samples on the z-axis, the RBF centres are mobile; in other words, their positions can change with learning. When there are few samples on the z-axis, the RBF centres are fixed, i.e. their positions remain unchanged during training, and the network is updated by adding extra RBFs whenever a new sample is presented. Density data from cores in an area of the Tyrrhenian Abyssal Plain in the Mediterranean Sea were used for the training and testing of the network. Part of the data was kept out of the training procedure and then used to test the trained network's prediction accuracy.


One of the very few examples of an ANN system developed into a fully commercial product is Neural Technologies' Prospect Explorer. It is a complete system offering data analysis, visualisation, and detection of anomalies, as well as analysis of the relationships between them. The system is based on a neural structure called AMAN (Advanced Modular Adaptive Network), shown in Fig. 4.2 [70]. AMAN is not a type of neural network; it is a complex system consisting of different types of networks, which are trained in both supervised and unsupervised mode. The user has a choice of networks and learning strategies depending on the problem at hand. As shown in Fig. 4.2, AMAN can be described by the following:

• A set of hierarchically arranged networks: a problem is divided into sub-problems and a network is assigned to each one of them.
• The type of each individual network can be chosen to suit the nature of the specific sub-problem.
• The controller, called the 'supervisor', handles the outputs of the individual networks to form a final result for the problem.

Figure 4.2: General structure of the AMAN neural system.


AMAN, as part of Prospect Explorer, can help to automate the detection of anomalies in large quantities of survey data. Prospect Explorer provides the following functions:

• Anomaly Detection: an interpolated grid forms the basis of a colour map showing areas of potential anomalies. This map can be used as a guide for further analysis.
• Cluster Identification: regions of survey data sharing common types of survey results are identified.
• Correlation Analysis: layers of interpolated data can be correlated to illustrate the relationship between the values of different types of data.
• Fuzzy Search: a pattern-searching tool to analyse how closely regions match a search specification supplied by the user.
• Relationship Explorer: similar to correlation analysis, but performed at specific geographic locations.

Prospect Explorer has been used with success on a reasonably complex exploration task in the Girilambone region of New South Wales, Australia. This case study involved several layers of data from a copper mine area of 110 square kilometres. The system successfully identified the already known deposits in the area, as well as some previously unknown ones. Cortez et al. [18] presented a hybrid system combining ANN technology with geostatistics for grade/resource estimation. Their system, called NNRK ('Neural Network estimation of the drift and Residuals' Kriging'), is based on a network with three inputs (the sample's X, Y, Z co-ordinates), six hidden units, and one output, the respective Zn assay [18]. As shown in Fig. 4.3, the chosen ANN is very simple compared with the larger networks described in the previous examples. This ANN is used for the identification of the underlying large-scale structure (trend modelling), while residual analysis is performed at sampled locations by stationary geostatistical methods that model local spatial correlations. Final estimates are given as the sum of the two estimations. The developers chose geostatistics to support the ANN estimates because a preliminary study showed that the back-propagation network could not follow local variations of grade. In the NNRK system, these are handled by 'residual kriging'.

Figure 4.3: Back-propagation network used in the NNRK hybrid system.

The hybrid system has been applied to a case study from a large Portuguese zinc deposit. As shown in Fig. 4.4, the data are quite spread out in 3D space, a situation very common in case studies with real data. The data came from a drilling programme at the Feitais deposit, a massive orebody belonging to the Aljustrel group of mines in southern Portugal. The dataset, consisting of 768 samples, was split in two parts. The validation set included 160 samples, about 20% of the total; the rest were used for training the ANN, a process that involved 3000 iterations.


Figure 4.4: Drillhole data used for testing the performance of the NNRK system [18].

The results obtained by the NNRK methodology are compared with those produced by the ANN alone and by kriging in the following table:

Table 4.1: Comparison of NNRK, ANN, and kriging estimates.

Populations      n     m (mean)   σ²      σ/m
Sampled data     238   3.314      3.988   0.603
NNRK estim.      238   3.516      2.347   0.436
ANN estim.       238   3.493      0.141   0.108
Krig. estim.     238   3.461      1.281   0.370

The results presented show that the combination of ANN and kriging can improve considerably on the results that can be obtained from either methodology individually. It should be noted, though, that the back-propagation network used in this study is only capable of performing global approximations, leading to smooth estimates. The number of hidden units in this network is also surprisingly low considering the dimensionality of the input space.


4.2.3 Sample Neighbourhood Based Systems

All of the systems above try to reconstruct the ore grade surface from the sample co-ordinates. This strategy works very well when the surface is fairly continuous and there are enough samples covering the considered area. It also works better in 2D than in 3D: a single network tends to produce outputs close to the average value when faced with a very complex deposit in 3D, and sometimes even in 2D. Wu and Zhou [112] created a large network (56 hidden units) to perform grade estimation on a 2D dataset from a fairly continuous copper deposit. Quite reasonably, some researchers have tried to take advantage of the information hidden in the relationship between neighbouring samples. This approach is followed, in general terms, by the most advanced existing methods for grade estimation, such as inverse distance weighting and kriging. Most of the examples following this approach choose as neighbours the samples closest to the estimation point and treat the problem of ore grade estimation as a mapping between the surrounding grades and the grade at the estimation point (Fig. 4.5). The samples are normally arranged on a grid, and the inputs are formed from the eight nodes surrounding the estimation point. A very good example of this technique is given by Williams [110]. The main assumption made in this example is that there is a strong correlation between gold grades and magnetic data. The technique was applied in 2D space. Naturally, building the grid of magnetic data from scattered samples by interpolation can introduce errors due to smoothing. This is a serious disadvantage of all methods that require data arranged on grids.


Figure 4.5: 2D approach of learning from neighbour samples arranged on a regular grid.

This single network approach has an additional limitation: using a single network over the entire area of interest assumes that the learnt mapping between the neighbour samples and the grade applies globally. In other words, the method leads to a global approximation of ore grades. Going a step further, some researchers have introduced multiple networks to overcome these limitations. These modular systems consist of more than one network, each responsible for learning a different area of the deposit. The GEMNet system developed by Burnett [12] is a very good example of a modular neural network system for grade/resource estimation. Figure 4.6 illustrates the principle of GEMNet's operation. The deposit is divided into overlapping zones. The selection of zones was arbitrary, which is one point where improvement could be made. In each zone, a different network was trained, and the final estimate for every point was taken as the average of the networks trained in the specific area. As the zones overlapped, there was almost always more than one network giving estimates. Having more than one estimate led to the introduction of a reliability measure based on the variance of the individual estimates, an indicator that can be used as a guide to the reliability of the final estimate. This indicator was also used in the GEMNet II system with minor changes (Chapter 7).
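The combination rule just described fits in a few lines; a sketch, with invented estimates from three overlapping zone networks:

```python
import numpy as np

def combine_zone_estimates(estimates):
    """Final grade = mean of the zone-network estimates; the variance
    across networks acts as the reliability indicator (higher variance
    means lower reliability)."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.mean(), estimates.var()

grade, reliability = combine_zone_estimates([0.82, 0.79, 0.88])
```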

Figure 4.6: Modular network approach implemented in the GEMNet system [12].


A similar modular approach has been introduced by Geva et al. [27] for function approximation. In both cases, the developers used multiple MLP networks acting in a very similar manner to the RBFs of a single RBF network. The results obtained by both Burnett and Geva have supported the choice of RBF networks as the building unit for the GEMNet II system. However, GEMNet II is quite different in the way these networks are used to provide a combined grade estimate, as will be shown in Chapter 7.

[Scatter plot: actual grade vs. GEMNet prediction for training points, both axes in % Cu, ranging 0.0 to 1.6.]

Figure 4.7: Scatter diagram of GEMNet estimates on a copper deposit [12].

GEMNet has been tested on simple function approximation problems as well as on simulated ore deposits. Even though most of the case studies used 2D data, the results obtained were very encouraging and suggested that further research should be carried out to assess the effectiveness of the modular approach. Figures 4.7 and 4.8 show the results from a 2D study with GEMNet.


[Contour maps: GEMNet reliability indicator (low to high reliability, with training data points marked) and grade estimates (% copper), plotted over Departure (m) vs. Latitude (m), 0 to 600 m.]

Figure 4.8: Contour maps of GEMNet reliability indicator and grade estimates of a copper deposit [12].


4.2.4 Conclusions

The discussion in this section has examined some of the most important examples of neural network based systems for ore grade/resource estimation. A number of techniques have been developed that differ mostly in the number of networks used and in the way data are presented to them. As with the conventional methods for ore grade estimation, it is fairly safe to say that there is no universally applicable solution to the problem. This is particularly true when the neural network system is based on a single network. Such systems varied considerably in their architecture from one study to another; the number of hidden units changed even though the dimensionality of the problem remained constant. Systems with a modular structure, i.e. multiple networks, are more flexible in the way they adjust to a specific deposit. Both the sample co-ordinate and the sample neighbourhood based systems have their advantages and disadvantages, depending on the deposit at hand. One would expect the former to be better suited to continuous deposits, where the grade can be considered a hypersurface in the sample co-ordinate input space (a simple surface in the case of 2D samples). The results obtained from the described studies support this to a certain degree. On the other hand, complex deposits presenting localised behaviour cannot be modelled well by systems producing global approximations, unless there are large amounts of data to describe the local variations, which is very rare. These deposits call for more flexible structures that can construct local approximations of grade. Therefore, modular systems can be the choice for modelling complex deposits using 2D or 3D data.


4.3 ANN Systems for Other Mining Applications

4.3.1 Overview

A number of other mining-related problems have been approached using ANN technology. These problems commonly relate to pattern classification, prediction, and optimisation. ANNs have been successfully applied in these areas and are therefore suitable for similar mining problems. In the following paragraphs, a brief description of such problems and their ANN solutions is given. The applications shown range from geophysics to plant optimisation and illustrate the fact that ANN systems can be useful for a very large number of problems.

4.3.2 Geophysics

Geophysics is a relatively new area for ANN systems. However, in the last few years ANNs have become a very popular tool in the interpretation of seismic and geophysical data from various sources. Garcia et al. [26] used an MLP (Fig. 4.9) trained with back-propagation for the inversion of lateral electrode well logs. Inversion is the process of constructing an earth model from the log data. The data used for training the network were derived from a finite difference method that simulated the lateral log. The trained network was tested using real data, and the results were compared with those from an automated inversion model. The study showed promising results and demonstrated the advantages of using an ANN for this specific problem.


[Network diagram: 51-node input and output layers (one node for every 2 ft interval of a 100 ft log), 40-node hidden layer.]

Figure 4.9: Back-propagation network used for lateral log inversion [26]. Connections between layers are not shown.

In a similar fashion, Rogers et al. [85] used an MLP network for the prediction of lithology from well logs. Malki and Baldwin [56] compared the results produced by neural networks trained using well logs from different service companies. More specifically, networks were trained using data from one service company and tested on data from another; the study was then repeated using training data from both companies, tested on data from each one individually. The results showed that better performance is obtained when using data from both service companies. Wanstedt et al. [107] applied neural networks to the interpretation of geophysical logs for orebody delineation. The data used for the development and testing of their approach were taken from the Zinkgruvan mine in Sweden. The network used was quite small: three layers with 3 inputs, 7 hidden units, and 1 output. The inputs were the gamma-ray, density, and susceptibility readings, and the output was the ore grade (Zn, Pb, or Ag). The study reports good results in estimating the grades and, consequently, in interpreting the lithology (Fig. 4.10). Unfortunately, no numerical measurement of the network's performance is provided.


Figure 4.10: Estimated grades and assays (red and blue) vs. actual (black) [107].

Murat et al. [67] used an MLP for the identification of the first arrival on a seismogram. Roessler [84] used NETS, a neural network simulator written at NASA/Johnson Space Center, to develop a neural network for analysing arrivals of seismic waves transmitted from one borehole and received in another. The network was trained on a binary pixel image of the seismic trace data. The input layer consisted of a large array (97 x 41 = 3977) of input nodes, the hidden layer had 50 units, and the output layer had two units. The network was trained to produce a binary pattern at its outputs, i.e. the outputs were either 1 or 0. The different combinations of outputs indicated the position of the first arrival relative to the current positive lobe. Once again, no numerical measurement of the network's performance during training and testing was provided in the study. Barhen and Reister [6] developed DeepNet, an MLP-based system that predicts well pseudo-logs from seismic data across an oil field. DeepNet combines a very fast learning algorithm, systematic incorporation of uncertainties in the learning process, and a global optimisation algorithm that addresses the optimality of the learning process. The system has been successfully applied in the Pompano field in the Gulf of Mexico.

4.3.3 Rock Engineering

King et al. [43] developed an unsupervised neural network for discovering patterns in roof bolter drill data. The network successfully classified 617 drill patterns into just 9 or 16 unique features representing major geologic features of a mine roof. The patterns consisted of the penetration rate, thrust, drill speed, and torque. A system consisting of this network and an expert system was developed for the evaluation of coal mine roof supports [95]. Millar et al. [63] used self-organising networks to model the complex behaviour of rock masses by classifying input variables related to rock stability into two groups: failure or stability. Walter [106] used Kohonen networks for the classification of mine roof strata into one of 32 strength classes. The developed system can provide an estimate of strength within two seconds, giving the drill operator an almost real-time warning when a potentially dangerous layer is reached.

4.3.4 Mineral Processing

Neural networks have been successfully applied to a number of pattern classification problems. Particle shape and size analysis seems to be a natural field of application for ANNs, and especially for unsupervised techniques. Maxwell et al. [59] developed an ANN-based system for particle size analysis based on video images. The system analyses images of material on a conveyor and predicts the particle size distribution. Oja and Nyström [72] applied self-organising maps to particle shape quantification. Image analysis is performed on mineral slurry particles using a SOM, which extracts the features affecting the behaviour of powders and slurries. The training data set consisted of 3000 binary images of 500 particles. The produced map size was 12 x 10. The developed SOM was tested on 360 particle images with success. The test showed that the SOM was capable of assigning minerals without strong shape features to different clusters. Van Deventer et al. [104] used the SOM for on-line visualisation of flotation performance. The structure of the froth is quantified by the neighbouring grey level dependence matrix. The SOM had a map size of 20 x 20, and Zn grade peaks were classified as positive (Class_+1), zero (Class_0), or negative (Class_-1) for each of the image features. The classification was based on a number of image features. The developed SOM was to be used as part of an automated computer vision system for the control of flotation circuits. Petersen and Lorenzen [76] applied the SOM to the modelling of gold liberation from diagnostic leaching data. The data came from seven different gold mines in South Africa. The ores from the mines were fed to mills and the ore samples were screened into three size intervals. One of the fractions was further screened into six size fractions, giving a total of eight fractions. Representative samples were then fed to a ball mill, and the product was screened into the same six size fractions. Diagnostic leaching was performed on each of the fractions for each of the ore types. The percentage of gold deportment, the percentage of gangue, the percentage of free gold in each fraction, the head grade, and the mass distribution were projected onto a 10 x 10 map. The clustering produced was well defined for the different sample sources (gold mines).


4.3.5 Remote Sensing

Remote sensing is probably one of the most popular areas of neural network application; it presents problems which are ideal for architectures such as the SOM, the LVQ, or even the standard MLP. The examples given here, even though not directly linked to mining activities, demonstrate the potential of ANNs in this field. Bischof et al. [8] used an MLP for the multispectral classification of Landsat images. These images came from a Landsat Thematic Mapper (TM) and were 512 x 512 pixels in size. They were analysed into 7 spectral channels (bands), which were used as the inputs to the network (13 units for each band, representing different intervals from 0 to 255). The network then had to learn to classify the 7 band values into one of four types of land cover (built-up land, forest, water, and agricultural land), each represented by an output of the network. Even though this architecture gave good results, the developers extended the network to include a 7 x 7 pixel map of texture from band 5. Naturally, the number of hidden units was increased from 5 to 8. The results from this extended architecture were better than those of the non-extended one for all types of land. Gopal and Woodcock [29] used an MLP for the detection of forest change from Landsat TM images between 1988 and 1991. A 10-input vector of 10 TM bands (5 from 1988 and 5 from 1991) was used, with the single output being the absolute or the relative change. The results obtained with the developed MLP were better than those obtained with the conventional method for this task. Poulton and Zaverton [78] give a comparative study of different neural network architectures used for the classification of TM images. The architectures compared were the back-propagation network, LVQ, counter-propagation network, functional link, probabilistic network, and the SOM. From the tests performed, they concluded that the LVQ architecture was the most flexible and robust. They also suggested the use of ANNs for the analysis of geochemical and geophysical data, the location of favourable prospects using GIS data, lithologic mapping from remote sensing data, and the estimation of parameters in a similar way to kriging. Krasnopolsky [48] used an MLP for the retrieval of multiple geophysical parameters from satellite data. These parameters were the surface wind speed, columnar water vapour, columnar liquid water, and sea surface temperature (the four outputs of the MLP). The MLP had five inputs, taken from five Special Sensor Microwave Imager brightness temperatures. The hidden layer had 12 units. The simultaneous retrieval of multiple parameters improved the retrieval of each one individually, allowing physically coherent and consistent geophysical fields to be produced. Xiao and Chandrasekar [114] used an MLP for rainfall estimation from radar observations. More specifically, two networks were developed, one using reflectivity as the only input, and the other using both reflectivity and differential reflectivity as inputs. The networks were trained on data obtained from a multiparameter radar and raingauges at the Kennedy Space Center. The trained networks were then used to estimate rainfall for four days during the summer of 1991. The training patterns consisted of a square grid (3 x 3 km) of reflectivity values, as well as the distances from the grid nodes to the point of estimation. The raingauge values were used as the target outputs. The trained network estimates and raingauge values showed good agreement at all sites.


4.3.6 Process Control-Optimisation and Equipment Selection

Process control and optimisation tends to be a tedious task involving large amounts of data from very different sources. ANNs are ideal for handling such tasks, which is why many researchers in the field of process control have turned to them for developing solutions. Process control and optimisation of mineral processing plants, as well as of the mining process itself, is a special case of these tasks and can therefore be approached with neural networks. Van der Walt et al. [103] used the MLP for the simulation of the resin-in-pulp process for gold recovery. Flament et al. [25] used the MLP for the identification of the dynamics of a mineral grinding circuit and the development of a control strategy. Bradford [10] used neural networks in a number of studies modelling the behaviour of different parts of a mineral processing plant. Ryman-Tubb and Bolt of Neural Mining Solutions Pty Ltd [91] describe the use of the AMAN architecture (described earlier) for integrated process system modelling and optimisation. The suggested areas of application include froth flotation, carbon-in-pulp (CIP), milling, and others. Their case study presented a real-life example based on a multi-stage copper extraction process. The trained networks (MLPs) were used for the following:

• Prediction of stripped copper cathode from electrowinning
• Prediction of raw material usage
• Identification of key plant parameters
• Analysis of the effect of plant input parameters
• Economic optimisation to determine cost-effective control settings


The developers claimed the following benefits from the ANN approach:

• Decreased raw material costs
• Increased copper production
• Optimised planning of new and existing heap operations
• Ability to implement a 'just-in-time' purchasing policy
• Planning of new heaps
• Reduced reliance on individual and human operation

Finally, Schofield [94] investigated the use of neural networks as well as other AI tools for the selection of surface mining equipment.

4.4 Conclusions

Quite clearly, the spectrum of neural network applications in mining is very wide. This is demonstrated by a number of exciting and very promising studies by people from different scientific fields. The examples presented in this chapter support the choice of ANNs as the basis for developing solutions to mining problems where conventional techniques fail in one way or another. Mining is always about time and money, and so far neural networks have shown that they can perform well on both counts. The systems described in the above examples were fast and reliable, and in most cases rested on a very stable theoretical background on which the validity of the proposed solution is based. The general trend in the mining industry towards automation to the greatest possible degree calls for technologies such as ANNs, which can utilise large amounts of data to develop models that are otherwise very difficult, or sometimes even impossible, to identify. The speed of ANNs, at least in application mode, also allows the development of real-time or near real-time systems, which can quickly recognise potential problems or even danger during a certain process. Another advantage of ANNs is the minimisation of the assumptions necessary for a given problem. Especially in the case of grade estimation, this attribute proves very valuable. The examples of ANN application to grade estimation given earlier in this chapter supported this and other advantages of neural networks. The ambition of the author is to carry these advantages into an integrated neural network system for grade estimation.


5. Development of a Modular Neural Network System for Grade Estimation

5.1 Introduction

Before moving into the in-depth analysis of the integrated GEMNet II system for grade estimation, it is necessary to go through the development steps that led to the final architecture. Many things changed in the developed architecture over the course of this project. The number of networks, their topological characteristics, the learning algorithm, the error measures, and even the inputs and the dimensionality of the input space were changing, or, one could say, evolving, as more tests were run and the author gained more insight into the numerous algorithms and developments in the field of artificial neural networks. Going through these steps helps to explain the reasoning behind the developed system and how the original aims were met.

GEMNet II was named as the successor to the original GEMNet system [12], also developed at the AIMS Research Unit. The author was very fortunate to have a starting point well ahead of research carried out elsewhere, something that inevitably set the aims for the development of GEMNet II at quite a high level. GEMNet II was developed with real-life situations in mind from day one. The main aim was to find a reliable and robust architecture that required no significant interaction with the user in order to provide accurate grade estimation results. After the identification of this architecture and the proof of its validity through a number of case studies, the next aim was to integrate the architecture into a user-friendly system that would allow straightforward application, with no important parameters to be set by the user. The system should also be capable of removing the 'black box' attribute neural networks are famous for, an attribute completely unacceptable in the mining industry, especially when it comes to grade estimates on which decisions involving large amounts of financial resources will be based.

In this chapter, the development of the modular neural network architecture for grade estimation is described. Mathematica from Wolfram Research [111] was used for the development of all prototype systems, as it was found to be a very resourceful environment providing all the necessary tools for understanding and validating different neural network architectures. Two main principles, or hypotheses, were adopted during the development of the system: that grade estimation can be approached as a hypersurface reconstruction problem in the spatial co-ordinate input vector space, and that grades are the numerical representation of a localised phenomenon (the deposit), so that grades themselves present localised behaviour. As will be seen later, these hypotheses carry a number of implications that have a great effect on the design of GEMNet II.

The author carried out a large number of preliminary tests on various neural network architectures and learning algorithms as part of his MSc project [42]. These tests were based entirely on simulated 2D data arranged on a square grid. The networks were trained on the grid nodes, using the grade at a given node as the required output and the grades at the eight (or the four closest) surrounding nodes as inputs. No information was provided about the spatial location of the input samples, or even the location of the required output. Altogether, the approach was very similar to image analysis techniques using computer vision, with the image in this case being the grade surface. The results of these case studies and their comparison with results from kriging showed great promise; in fact, the developed neural networks performed much better than kriging in most of the cases. However, there was no guarantee that this would happen with real data, and in any case the whole approach was not applicable to real data due to the inflexible arrangement of the inputs (fixed on a regular grid).

The most important issue raised by the above project concerned the formation of the input space, i.e. which input parameters should be used, or how the task of grade estimation should be decomposed into smaller tasks that would be easier to approach using neural networks. In the next paragraph, the shift from fixed-on-a-grid inputs to completely floating-in-space sampling inputs is described in two-dimensional sampling space.

5.2 Forming the Input Space from 2D Samples

It is generally accepted that the characteristics of the input space, as well as its components, play a very important role in the performance of neural networks. The input dimensionality, as was discussed in earlier chapters, controls to a great extent the overall complexity of the neural network topology as well as the amount of training data required to bring the network performance to acceptable levels. It is therefore very important to select the inputs from the available data in a way that helps reduce the complexity of the network while providing the right information for the network to be trained on.

The input space also defines the way of approaching the required task, in this case grade estimation. Using the sample co-ordinates, for example, in two dimensions (easting and northing) as inputs to a network whose output is the grade of the sample means that grade is treated as a surface in the co-ordinate space. This approach seems to be the most popular among researchers dealing with this problem. As explained in the previous chapter, another approach is to use samples close to the estimation point as the source of grade input data. Usually the samples are arranged on a regular grid, which makes things a lot easier. If they are not arranged on a grid, then the grid is constructed by applying a polygonal or inverse distance calculation to the original data, which naturally introduces smoothing errors. The inputs are in this case the grades of the neighbour nodes, and the output is the grade at the point of estimation (also on the grid). Neighbour nodes can be taken to be the eight nodes surrounding the estimation point, or the four that belong to the grid lines passing through the estimation point.

The above approach gives very good results on simulated data and regular sampling schemes, where the smoothing errors introduced by gridding the original data are relatively low. Applying this approach directly to real data, normally obtained with an irregular sampling scheme, leads to the network learning a very smooth distribution of grades (the distribution of the polygonal or inverse distance grid nodes) that does not represent reality. It should be noted that both the polygonal and the inverse distance method assume that the modelled surface is continuous. Clearly, there is a need for a way of presenting to the networks information from neighbour samples that honours their location relative to the point of estimation. In other words, the aim is to form the input space in a way that includes both the surrounding grade values and their relative position in space. A very common way of choosing samples surrounding the point of estimation, used by most of the conventional methods, is the octant or quadrant search (Fig. 5.1). The area surrounding the point of estimation is divided into eight (or four) sectors and a number of samples is chosen from each one of them. This technique ensures that samples are selected from all directions in 2D space, and not only from the direction in which samples are more numerous and closer.


Figure 5.1: Illustration of quadrant and octant search method (special case where only one sample is allowed per sector). Respective grid nodes are also shown.

Dividing the area around the estimation point into octants (or quadrants) provides a way to expand the inflexible input scheme based on grid nodes into a scheme that can accept samples floating in 2D space. The inputs are now the grades of neighbour samples at any distance from the estimation point, rather than those of the surrounding grid nodes (Fig. 5.1). There is no need to grid the original data using an interpolation method that normally introduces errors; the network therefore models the original distribution of grades. The use of octant (or quadrant) search also allows the use of the same neural network architecture as in the case of gridded samples. There is, however, one fundamental difference between the two approaches. In the case of samples arranged on a regular grid, the distance of the inputs from the point of estimation remains constant throughout the sampling area. Using an octant search means that the samples are now at a varying distance from the estimation point
and therefore there is a need to include distance information as part of the input space. This requirement is also derived from the hypothesis that grades present localised behaviour. Therefore it is necessary for the neural network to ‘know’ the distance of any input sample relative to the point of estimation.
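As an illustration of this input scheme, the following minimal sketch (in Python; the function names are illustrative, not part of the MNNS) classifies a 2D neighbour sample into one of the eight octants around an estimation point and forms the (grade, distance) input pair for the corresponding sector:

    import math

    def octant_index(est, sample):
        # The plane around the estimation point is divided into eight
        # 45-degree sectors; return the sector (0-7) holding `sample`.
        dx, dy = sample[0] - est[0], sample[1] - est[1]
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        return int(angle // (math.pi / 4.0))

    def input_pair(est, sample, grade):
        # Input pair presented to the sector: neighbour grade and its
        # distance from the estimation point.
        return grade, math.hypot(sample[0] - est[0], sample[1] - est[1])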

Figure 5.2: Estimation results from the neural network architecture developed for use with gridded data (contour maps: a. Actual Grades, b. MNN Grades). The use of irregular data has an obvious effect on the performance of the system.

The author initially tested the neural network architecture used for gridded data directly on original data arranged irregularly in 2D space. The results were, as expected, not as good as when using gridded data. Clearly, learning a distribution based on inverse distance estimates arranged on a grid is far easier than trying to learn the original data distribution. Figure 5.2 shows contour maps from this test using data from an iron ore deposit. The results of this test showed clearly that it is necessary to provide distance information to the network in order to improve its modelling capacity in the case of irregular data. It should be noted that at this stage the problem of grade estimation is still approached by the use of a single network with
multiple inputs (eight or four) depending on the search method used – octant or quadrant. In order to provide distance information to the network, one input is added per sample, i.e. for each of the eight octants (or four quadrants) there are two inputs: the neighbour sample grade and its distance from the estimation point. This leads to a total of 16 inputs (or eight for quadrant search). The increase in the number of inputs inevitably leads to an increase in the number of hidden units required to handle the complexity of the input space. Figure 5.3 shows two neural networks with 16 and 8 inputs, used to accept data from an octant and a quadrant search respectively.

Figure 5.3: Neural network architectures receiving inputs from a quadrant search (left) and from an octant search (right). The number of hidden units in the right network is lower than in the left because each of its hidden units carries a higher number of input weights.

The idea behind the use of two networks with different input dimensionality was based on the fact that not all estimation points have an adequate number of neighbour samples to complete the training patterns when using octant search. In other words, when there are fewer than eight neighbour samples around the estimation
point, quadrant search and the smaller network are used for the estimation. Naturally, the quadrant search based network can be trained on all locations where the octant search based network is trained. In order to get even closer to a real situation, the developed architecture should be able to handle estimation points at the edges of, or even outside, the sampling area. In these areas there is not enough information to generate complete patterns for either of the two networks, i.e. both octant and quadrant search fail to find any neighbour samples. For this reason, a third neural network is introduced to provide estimates at these points. This network can only depend on data at the point of estimation, and therefore the commonly used input scheme of sample easting and northing is adopted. At this stage, the developed neural network architecture for grade estimation has become modular, in the sense that there are multiple networks providing estimates, each at different estimation points from the others. These three networks are in essence trying to reconstruct the grade hypersurface in their own input vector space. In other words, even though they are only used at specific estimation points, they are still trained on the entire sampling area, or at least the part of it that provides enough information for their training patterns. As shown in Figure 5.4, and compared with the results shown in the previous figure, this architecture provides considerably better estimation performance. The next question is naturally whether this performance can be further improved. As mentioned earlier in this chapter, grades tend to present localised behaviour, i.e. samples close to each other tend to have similar grade values. This similarity normally decreases with the distance between the samples. The effect of this is that it is very difficult to approach grade estimation as a global approximation problem. For this reason a number of researchers have been led to the use of modular neural
networks that construct local approximations of grade. The architecture described so far in this chapter, even though modular, still tries to approximate the entire distribution of grades through each network. It should be noted at this point that in the case of radial basis function networks, the modelled surface is reconstructed by a series of locally trained basis functions, which provides an answer to this problem.

Figure 5.4: Improvement in estimation by the introduction of the neighbour sample distance in the input vector (contour maps: a. Actual Grades, b. MNN Grades).

The author carried the solution even further by breaking the problem of grade surface reconstruction from neighbour points into smaller tasks that are easier to approach with a single neural network. More specifically, structural analysis in geostatistics has been the paradigm for this problem decomposition. In structural analysis, one tries to find a model of grade variability in certain directions in space. The derived models are then used to modify the interpolation method and the sample selection routine. Unfortunately, this is where one of the main disadvantages of geostatistics appears, as structural analysis, and more specifically variography, requires skills and time and also depends on knowledge of the modelled parameter. The author aimed at overcoming these problems, while still taking
advantage of the benefits of structural analysis, by employing neural networks to learn the spatial variability from exploration data. In order to learn the spatial variability of grade, the two networks receiving information from neighbour samples were replaced by a number of networks trained on neighbour samples coming from a single direction in space. In other words, there are eight networks with two inputs (neighbour grade and distance from estimation point) where there was one with 16 inputs, or four networks with two inputs where there was one with eight. There is now one network per sector (octant or quadrant), learning the variability of grade in that direction. As expected, it is far easier for a single network to learn the variability in one direction than in all directions. It is also easier to control the learning process and to monitor the results of training. The results obtained with this architecture are very promising [41]. This is the final architecture developed by the author to handle exploration data from a two-dimensional sampling scheme (Fig. 5.5). It became part of the Modular Neural Network System (MNNS) described in later paragraphs of this chapter. The MNNS can be considered the prototype version of GEMNet II. Several case studies were run using the MNNS and data from simulated and real deposits. These are discussed in detail in the next chapter.
Figure 5.5: Modular neural network architecture developed for grade estimation from 2D samples [41]. The I/O data set used for training, validation and testing feeds three modules: an octant RBFN module (16 inputs, 8 RBFNs, 1 output), a quadrant RBFN module (8 inputs, 4 RBFNs, 1 output), and an X-Y-Grade MLP module (2 inputs, 1 MLP, 1 output).

The design of the input space is followed by the development of the neural network topology and of the learning algorithm. These are explained in the following paragraphs.

5.3 Development of the Neural Network Topologies

5.3.1 Overview

From the discussion in the previous paragraph it becomes clear that the topology of the neural networks used in the developing stages of the MNNS went through many changes. Apart from the input layers already discussed, the hidden layer also kept changing – the number of hidden units, the type of hidden units, and their activation
and output functions. Different error measures were also tested. Overall, only one aspect of the neural networks did not change, and that is the number of output units. The output layer of all the neural networks developed had one unit providing the grade estimate. Two types of neural networks were predominantly tested during the development of the MNNS: the Multi-Layered Perceptron (MLP) and the Radial Basis Function (RBF) network. The choice between them was not easy, as the MLP is very popular in function approximation problems and there is a very good background of theory and practical examples. In theory both architectures can produce very good results given time and training information. However, the RBFN has a great advantage over the MLP in terms of speed of development, which was more than verified during testing. Moreover, the MLP produces global approximations, and in order to get the same effect as the local approximations of the RBF it is necessary to complicate the overall architecture by introducing a number of MLPs trained on localised data. This approach was implemented in the original GEMNet and seemed to produce good results in small to average 2D deposits. The RBFN was chosen as the building unit of the MNNS after a number of tests on both architectures. However, the MLP was still used occasionally as an averaging network for the estimates produced by the various RBFNs, as will be discussed later.

5.3.2 The Hidden Layer

Designing the hidden layer of an RBFN is a very complex task and also a very important one for the overall performance of the network. The number of hidden units depends on the training data and the modelled parameter, and can therefore vary from one dataset to the next. In the case of the MNNS architecture described here, the
problem becomes even more complex, as the original drillhole samples dataset is processed and presented in three different ways. There is training data for the octant search networks, training data for the quadrant search networks and, finally, training data for the network trained on the samples' spatial co-ordinates. In addition to the original dataset, there are also the patterns consisting of the outputs of all these networks, which become the inputs to the final averaging network. In the case of drillhole data, the optimum number of hidden units can only be found during training, by applying one of the automated node generation or destruction algorithms. There are a number of algorithms for adjusting the number of hidden units through training. In the case of the MNNS, a simple training algorithm was employed for adding hidden units (or RBF centres). The basic steps of the algorithm are as follows:

1. Start with a minimum number of RBF centres;
2. Train the network and calculate the validation error;
3. Add one centre;
4. Repeat step 2 and compare with the previous validation error;
5. If the change in error is too small then stop training;
6. If the change in error is significant and the maximum number of centres has not been reached, go to step 3;
7. When the number of centres reaches the maximum, exit the algorithm and save the architecture with the smallest validation error.

Altogether, this algorithm finds the number of hidden units that produces the minimum validation error, and uses the respective topology during estimation. It should not be confused with the learning algorithm used for training the various topologies.
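A rough sketch of the loop above is given below; train_rbfn and validation_error are hypothetical helpers standing in for the MNNS internals, not functions from the actual system:

    def grow_rbf_centres(min_centres, max_centres, tol=1e-5):
        # Steps 1-7 above: grow the hidden layer one RBF centre at a time,
        # keeping the topology with the lowest validation error.
        best_net, best_err = None, float("inf")
        prev_err = float("inf")
        for n in range(min_centres, max_centres + 1):
            net = train_rbfn(n)              # hypothetical: train with n centres
            err = validation_error(net)      # hypothetical: error on validation set
            if err < best_err:
                best_net, best_err = net, err
            if abs(prev_err - err) < tol:    # change in error too small: stop
                break
            prev_err = err
        return best_net
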
Another very important issue concerning the hidden layer in any RBFN is the positioning of the basis function centres (the weights between the input and hidden layers). This normally takes place at the initialisation stage of the learning process, where some of the network's free parameters are set to give learning the best possible start. The initial positioning of the centres is crucial. Thinking of the error as a hypersurface in the weight vector space – in this case the centre vector space – it is fairly easy to understand the importance of starting from a good point on this hypersurface, as it helps find the minimum-error-producing weights. A number of centre positioning algorithms are available, including Kohonen learning, random positioning, k-means clustering, and positioning on samples. After rigorous testing, and for the available testing data, it was found that random positioning of the centres in the input vector space led to better performance than any other positioning algorithm. Again, it should be noted that this depended very much on the data used for the studies; indeed, as will be seen later when using data from a three-dimensional sampling scheme, random positioning was found to be inadequate for more complex data. The author believes that random positioning is ideal for two-dimensional datasets with a relatively low number of training patterns. Random positioning is not expected to perform well when the number of centres to be fitted is considerably lower than the total number of training patterns available. A more difficult choice was that of the basis function. There was almost no agreement between the studies with two-dimensional data as to which basis function helps produce better results. However, the multiquadric and the thin-plate spline seemed to be consistently producing better results, and the author was convinced to use them in further studies. It should be noted that the choice of basis function is
not as crucial for the problem at hand as the smoothing parameter of the function, which carries information about the problem. The smoothing parameter can only be set by experimenting with the training data and is unique to every study. This is one of the points where user intervention is required for optimum results. There are no rules of thumb for this problem, and it is therefore necessary to perform a number of test runs in order to set the smoothing parameter to its ideal value. Generally it is a very quick process and only needs to take place once per study – if more training patterns become available there is normally no need to change the smoothing parameter. It is also possible to use only a representative part of the dataset for this process if the number of training samples is too large and training is time-consuming. The final aspect of the hidden units, and by far the most important one, is the bias. The bias of every unit is normally set to 1 and then adjusted through training. It is very important as it completely changes the behaviour of the unit when presented with data inside its receptive field. Generally, as the bias moves away from the value of 1 (gets smaller), more hidden units become activated than just the unit whose centre location corresponds to the current training pattern.
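For reference, the two basis functions mentioned above have the standard forms sketched below, where r is the distance of the input from the centre and c is the smoothing (width) parameter discussed in the text; the function names are illustrative:

    import numpy as np

    def multiquadric(r, c):
        # Multiquadric basis: sqrt(r^2 + c^2); c is the smoothing parameter.
        return np.sqrt(r**2 + c**2)

    def thin_plate_spline(r):
        # Thin-plate spline basis: r^2 ln(r), taken as 0 at r = 0.
        r = np.asarray(r, dtype=float)
        safe = np.where(r > 0.0, r, 1.0)
        return np.where(r > 0.0, r**2 * np.log(safe), 0.0)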

5.3.3 Final Weights and Output

In order to complete the RBFN architecture, the weights between the hidden layer and the network's single output need to be set. This is achieved by a gradient descent method similar to that used in the MLPs. The RBFNs used in the MNNS are fully interconnected, i.e. units from one layer branch out to every unit of the next layer. As there is only one output unit, the number of weights between the hidden and output layers equals the number of RBF centres in the network. The single output unit simply performs the summation of the hidden units' weighted outputs and passes the result
through an activation function (such as the logistic sigmoid) that also takes the bias of the hidden units into consideration.
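Putting the pieces together, a single-output RBFN forward pass along these lines might look as follows. This is a sketch under the reading that each unit's bias scales its receptive field; the argument names are illustrative, not from the MNNS code:

    import numpy as np

    def rbfn_output(x, centres, biases, weights, phi):
        # Distance of the input vector to every basis function centre.
        r = np.linalg.norm(centres - x, axis=1)
        # Hidden-unit outputs; the per-unit bias scales the receptive field.
        h = phi(r * biases)
        # Weighted summation at the single output unit, squashed by a
        # logistic sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(h @ weights)))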

5.4 Learning from 2D Samples

5.4.1 Overview

Learning in RBFNs has been discussed in detail in Chapter 3. In this paragraph, the details of the learning algorithm used in the MNNS will be discussed. Attention will be given to the effects of the problem characteristics on the learning parameters, i.e. how the learning algorithm is adjusted to perform better with exploration data. In the MNNS architecture there are three neural network modules, each trained on different patterns derived from the same data (Fig. 5.6). It is therefore necessary to describe the learning process for each one of the modules individually, as there are significant differences. The discussion begins with the RBFNs trained using the patterns formed by an octant search.

Figure 5.6: Partitioning of the original dataset into three parts, each one targeted at a different module of the MNNS.


5.4.2 Module 1 – Learning from Octants

Module 1 has eight RBFNs, each with two inputs (neighbour sample grade and distance from estimation point), one output (grade at estimation point) and a varying number of hidden units (RBF centres). Figure 5.7 shows one of these networks. Module 1 can be seen as a modular network with 16 inputs and eight outputs. These outputs are averaged to provide a single grade estimate for the module. Input vectors are normalised, i.e. reduced to vectors of equal length. This is necessary to ensure that changes of equal scale in different inputs have the same effect on the network's performance. The outputs are denormalised to give an estimate in the original range of values.
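The exact normalisation is not reproduced here; as one plausible reading (the normalised input spaces of Figs. 5.8 and 5.11 span roughly ±2.5), a standardisation sketch with its inverse for denormalising the output:

    import numpy as np

    def normalise(x, mean, std):
        # Scale each input component using statistics of the training data.
        return (x - mean) / std

    def denormalise(y, mean, std):
        # Map a network output back to the original range of grade values.
        return y * std + mean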

Figure 5.7: RBFN used as part of module 1 in MNNS. Training patterns from an octant search were used to train the network.

The learning process begins with the initialisation of the RBF centres. This process involves positioning the centres and setting the bias of the basis functions. As already explained, the centres were chosen randomly in the input space and the bias
was usually set to an initial value of 1. The initial centre positions and bias values can be further optimised during the learning process. However, as was found during testing, it is very difficult to train the networks by adjusting all the free parameters simultaneously. Therefore, training in the MNNS concentrated on one parameter at a time. The number of centres, as already discussed, was set by another process, nesting the RBF learning algorithm. The first parameters to be set by the learning algorithm are the weights between the hidden and output layers. These are found by solving a least squares problem using the known output and the output of the network. The rest of the network's free parameters (centre location and bias) were set one at a time by a gradient descent method. Figure 5.8 shows the location of the basis function centres in the input space. The distance between the current input vector and the vector of the basis function centres was measured using the Euclidean distance measure.

Figure 5.8: Posting of the basis function centres from the RBFN of Fig. 5.7 in the normalised input space (X-Grade, Y-Distance).
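The least-squares step for the output weights can be sketched as follows, assuming a design matrix H holding one row of basis-function outputs per training pattern and a vector t of target grades (illustrative names, not the MNNS implementation):

    import numpy as np

    def solve_output_weights(H, t):
        # Minimise ||H w - t||^2 for the hidden-to-output weights w.
        w, *_ = np.linalg.lstsq(H, t, rcond=None)
        return w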


The training pattern set was split into three parts: one third was used for training, the second for validation, and the third for testing. Patterns were randomly assigned to each part. The learning process stopped when the maximum number of centres was reached or when the change in the validation error was less than 0.001%. The architecture with the lowest validation error was saved to be used further during testing and application. In contrast with the networks shown in Fig. 5.3, the networks trained using the octant search had more hidden units than their counterparts in the quadrant search. One explanation for this is that the input dimensionality is the same in both cases, but the problem in the case of octant search is more difficult due to fewer input data than in the quadrant search. The RBFNs in octant search are required to minimise the validation error for the same mapping but with less data, and therefore more basis functions are needed to achieve the mapping. The number of centres varied between 5 and 21 throughout the case studies.

Figure 5.9: Graph showing the learned relationship between the network’s inputs (grade and distance of neighbour sample) and the network’s output (target grade) for the RBFN of Fig. 5.7.


After training and validation of the networks, testing took place to measure the generalisation performance and to provide the basis for comparison with other grade estimation techniques. Figure 5.9 shows an example of a network’s learned mapping between neighbour sample grade and distance, and grade at point of estimation.

5.4.3 Module 2 – Learning from Quadrants

Module 2 has four RBFNs, each with two inputs (neighbour grade and distance) and one output (grade at estimation point). As in the case of Module 1, this module can be considered a modular neural network with 8 inputs and 4 outputs. The outputs from the four networks are averaged to provide a single output. Figure 5.10 shows one of these networks. The number of basis functions was lower than in the Module 1 networks, because quadrant search produces more training patterns than octant search from the same dataset, and it is therefore easier for the RBFNs of Module 2 to produce the same mapping with fewer hidden units.

Figure 5.10: Example of an RBFN from Module 2.


The number of basis functions varied between 2 and 17 throughout the case studies. The learning process was identical to the one used in Module 1. Figure 5.11 shows how the centres of the RBFs were located in the normalised input space for the network in Fig. 5.10 and for a specific case study. Figure 5.12 shows the learned mapping for the same network, i.e. the learned relationship between the inputs (grade and distance of the neighbour samples) and the output (grade at estimation point). It can be seen that generally the network's output increases with increasing neighbour grade and decreases with increasing distance of neighbour sample.

Figure 5.11: Posting of the basis function centres from the RBFN of Fig. 5.10 in the normalised input space (X-Grade, Y-Distance).


Figure 5.12: Graph showing the learned relationship between the network’s inputs (grade and distance of neighbour sample) and the network’s output (target grade) for the RBFN of Fig. 5.10.

5.4.4 Module 3 – Learning from Sample 2D Co-ordinates

The single network of this module is a Multi-Layer Perceptron with two inputs (easting and northing of samples) and one output (sample grade). The number of hidden units, as shown in Fig. 5.13, was 14, but this changed from one study to the other to achieve better results. The activation function of the hidden units was the bipolar sigmoid (tanh).


Figure 5.13: Module 3 MLP network trained on sample co-ordinates.

Learning was based on the steepest descent algorithm. The steepest descent method measures the gradient of the error surface after each complete cycle, and changes the weights in the direction of the steepest gradient. When a minimum is reached a new gradient is measured, and the weights are changed in the new direction. The method is improved by the use of the momentum coefficient and the learning coefficient. The learning coefficient weights the change in the connections. The momentum coefficient is a term which tends to alter the change in the connections in the direction of the average gradient. This can prevent the learning algorithm from stopping in a local minimum rather than the global minimum. In the MNNS the learning process is split into four periods, each with a different number of training cycles and different learning and momentum coefficients. Table 5.1 shows how these coefficients are chosen during training.


Table 5.1: Learning strategy for Module 3 MLP network.

Period          1       2       3       4
Learning Cf.    0.9     0.7     0.5     0.4
Momentum Cf.    0.1     0.4     0.5     0.6
Cycles          1000    100     100     10000
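A sketch of steepest descent with momentum under the schedule of Table 5.1; grad is a hypothetical function returning the gradient of the error with respect to the weight vector w (a NumPy array), and the stopping criterion on the validation error is omitted for brevity:

    import numpy as np

    def train_with_schedule(w, grad):
        # Four periods of (learning coefficient, momentum coefficient, cycles)
        # taken from Table 5.1.
        schedule = [(0.9, 0.1, 1000), (0.7, 0.4, 100),
                    (0.5, 0.5, 100), (0.4, 0.6, 10000)]
        dw = np.zeros_like(w)
        for lr, mom, cycles in schedule:
            for _ in range(cycles):
                # Momentum pulls the step towards the average gradient direction.
                dw = -lr * grad(w) + mom * dw
                w = w + dw
        return w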

From the table it is clear that the change in the weights is more rapid at the beginning of training and is reduced from one period to the next. In most cases the learning process is stopped well before the end of the last period. For example, in the case of the iron ore data discussed before, learning was stopped at period 4, cycle 156. Generally there are no rules for choosing these coefficients, and one has to experiment in order to find the best strategy for training. The training patterns were split into three parts for training (55%), validation (30%), and testing (15%). The validation set was used for guiding the learning process, i.e. the process stopped when there was no significant change in the validation error. At that point the topology was saved to be used for testing and application. Figure 5.14 shows what the network has learned in the case of the iron ore data. It should be noted again that this network is used only for estimating grades at locations where the previous two modules cannot, due to lack of data.


Figure 5.14: Learned mapping between sample co-ordinates (easting and northing) and sample grade for MLP network of Module 3.

5.5 Transition from 2D to 3D Data

5.5.1 General

Having described the modular neural network architecture for use with two-dimensional data, it is now necessary to examine how this architecture can be modified or expanded to accept data from real 3D sampling schemes such as drillhole data. There are certain issues that need to be considered during this expansion. The most obvious is the added dimensionality of the samples: there are now three co-ordinates defining the location of samples in space – easting, northing, and elevation. Perhaps more interesting is the fact that samples now have a volume associated with them. As the assaying procedure is carried out on different drilling core lengths, the
samples come in all sorts of lengths and therefore different volumes. This extra information needs to be considered in the input space of the estimating architecture. The fact that each drillhole can give more than one sample complicates things even further. The neighbour sample search methods have to take this fact into consideration to avoid choosing too many samples from the same drillhole. The search methods described before are also purely 2D and cannot be considered an option with 3D data, especially where the orebody does not follow a specific 2D plane in space. A fully 3D search method therefore needs to be developed. These issues, as well as other minor ones, are discussed over the next paragraphs of this section.

5.5.2 Input Space: Adding the Third Co-ordinate

In the three-dimensional sampling schemes commonly used in exploration programmes, samples are located in space by three co-ordinates: easting, northing, and elevation. As was explained in previous paragraphs, one of the modules of the MNNS is an MLP network trained on the 2D co-ordinates of samples. The same network now needs to increase its input dimensionality to accommodate the elevation co-ordinate of each sample. The inputs of the network change from two to three. This obviously affects the number of weights necessary, i.e. the number of hidden units has to increase. The networks in the other two modules have the distance of the neighbour samples as an input. This distance was previously calculated in 2D space; it is now calculated in 3D space. The centres of the basis functions were initially positioned randomly in the input space. This is inadequate in the case of three-dimensional samples, as was found during testing. The more complex distribution of neighbour
sample distances is responsible for this fact. Therefore a different way of centre positioning needs to be employed.

5.5.3 Input Space: Adding the Sample Volume

The sample volume defines what in geostatistics is called the support of a particular sample. In drillhole data, samples have a certain length along the drillhole itself. In order to cope with the variations in the support of samples, it is normally necessary to pass the samples through compositing and use composites of equal length in the estimation procedure. This is the case for most of the conventional methods of estimation, including geostatistics. In the case of the MNNS approach, there is no need to composite the samples into equal lengths. The architecture is modified to accept the length of the samples as an extra input to all the neural networks involved. Specifically, the network trained on the sample co-ordinates now also accepts the length of the samples – the inputs increase to four (easting, northing, elevation, and length). The networks trained on neighbour samples now receive the neighbour sample length as well as its grade and distance from the estimation point. A complication of the transition to 3D data, relative to the sample volume, is the fact that the estimation now takes place in 3D as well. Block modelling is the norm for 3D grade estimation. As was described before, block modelling is based on blocks with an associated volume. This volume needs to be considered during estimation for the same reasons that sample length is considered during training and estimation. The extra input added to the neural networks enables the introduction of the block volumes during estimation.


5.5.4 Search Method: Expanding to Three Dimensions

The search methods used in the case of 2D data cannot be used with 3D data, because they take no account of the third dimension (elevation), which is necessary to fully define the location of samples in space. The quadrant and octant methods, as shown in Fig. 5.1, select samples from a plane rather than a 3D sample space. Even if this plane is rotated about any of the three axes (easting, northing, elevation), these methods would only be adequate for flat orebodies with little grade variation in one of the three dimensions. It is therefore necessary to expand these search methods to three dimensions. The author first tried to achieve this by applying the quadrant and octant search in all three planes defined by the three axes: the XY, XZ, and YZ planes. Figures 5.15 and 5.16 illustrate how the quadrant and octant search would divide 3D space into sectors.

Figure 5.15: 3D version of quadrant search.


Figure 5.16: 3D version of octant search.

From the figures it becomes clear that the resultant search methods become very complex and very difficult to comprehend in three dimensions. The total number of sectors produced is 64 for quadrant and 512 for octant search. This means that the MNNS would need 64 networks trained on quadrant search data and 512 networks trained on octant data. Even if this were possible in computational terms, there would not be enough samples to fill each sector and provide training patterns for every network. It is therefore necessary to simplify these search methods in order to cope with the geometrical characteristics of exploration sampling schemes. After considering a number of schemes, the author decided to use the simple search method shown in Fig. 5.17. There are only six sectors in this scheme: upper, lower, north, south, east, and west. These sectors are defined by the intersection of four planes: two planes vertical to the XZ plane at ±45° dip, and two planes vertical to the YZ plane at
±45° dip. In other words, these sectors look like square-based pyramids with their apex at the estimation point.

Figure 5.17: Simplified 3D search method used in the MNNS for sample selection.

The advantage of this search scheme is not just the fact that it is very simple and computationally affordable. With this scheme, the drillhole to which the current training point belongs always falls within two opposite sectors. This allows easier control of the number of samples selected from this drillhole, which can help improve the results of estimation. Another advantage of this scheme is that it can handle any inclination of the orebody or of the drilling scheme. The author decided to replace both 2D search methods (quadrant and octant) with this simplified 3D method, which means that the MNNS now has just two modules: one trained on the sample co-ordinates and length, and one trained using data
from the single search method. This also means that the second module now has only six networks, one for every sector of this search scheme.
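The six pyramidal sectors admit a very compact test: because the bounding planes dip at ±45°, a neighbour falls in the sector of its dominant co-ordinate difference. A sketch (the names are illustrative, not from the MNNS code):

    def sector_3d(est, sample):
        # Co-ordinate differences from the estimation point to the sample.
        dx = sample[0] - est[0]
        dy = sample[1] - est[1]
        dz = sample[2] - est[2]
        ax, ay, az = abs(dx), abs(dy), abs(dz)
        # The +/-45-degree planes make the dominant axis decide the pyramid.
        if ax >= ay and ax >= az:
            return "east" if dx > 0 else "west"
        if ay >= az:
            return "north" if dy > 0 else "south"
        return "upper" if dz > 0 else "lower"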

5.6 Complete Prototype of the MNNS

The complete modular neural network system for grade estimation using 3D data is shown in Fig. 5.18. The system comprises three neural network modules responsible for the estimation, and a data processing and control module that generates the training patterns for the networks by applying the search method described.

Figure 5.18: Diagram showing the structure of the MNNS for 3D data (units are the neural network modules).

The second module, or unit as shown in the figure, is a single RBFN trained on the outputs of the six RBFNs of the first module. This network replaced the simple averaging of the RBFNs' outputs that was done previously. It was found necessary as it became clear during testing that some of the RBFNs of the first module were consistently producing estimates closer to the actual values, while others were consistently further from them. The learning process for this RBFN is identical to that of the RBFNs in module one. The number of hidden units varied between six and
nine. Figure 5.19 shows an example of how this network's output varied depending on the outputs of the RBFNs in module one.

Figure 5.19: Learned weighting of outputs from module one RBFNs by the RBFN of module two.

The third module is the neural network modified for 3D data, with four inputs (easting, northing, elevation, and length) and one output (target grade). Unlike in the case of 2D data, where the MLP architecture seemed to perform better, early tests run using 3D data made it clear that the RBFN reduces the validation error even further than the MLP; the third module is therefore based on a single RBFN and not on the MLP as described before. The data processing and control module accepts data in ASCII form and creates training pattern files for the neural networks of the MNNS. The formation of training patterns is based on the search method described. Basically, for every training sample in the dataset, one neighbour sample is chosen from every sector – the one closest to the training sample. The grade of the neighbour sample, its distance from the training sample and its length are written as inputs in the training pattern file of the network responsible for the specific sector, while the training sample grade is written as the required output. Clearly, on some occasions there are no neighbour samples in some of the sectors. In those cases, the training sample is marked for
estimation with module three, which is trained on the training sample co-ordinates. The network of module 3 is however trained on all samples regardless of the results of the search process. Figure 5.20 shows an example of this network’s output depending on its inputs.

Figure 5.20: Learned relationships between sample co-ordinates, length (inputs) and sample grade (output) from the RBFN of module three.
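The pattern-generation step described above might be sketched as follows, reusing the sector_3d classifier from the earlier sketch. Here samples is an in-memory list of (x, y, z, length, grade) tuples standing in for the ASCII data, and all names are illustrative:

    import math

    SECTORS = ("east", "west", "north", "south", "upper", "lower")

    def build_patterns(samples):
        patterns = {s: [] for s in SECTORS}
        module3_only = []                       # points left to module three
        for target in samples:
            nearest = {}
            for nb in samples:
                if nb is target:
                    continue
                sec = sector_3d(target[:3], nb[:3])
                d = math.dist(target[:3], nb[:3])
                if sec not in nearest or d < nearest[sec][0]:
                    nearest[sec] = (d, nb)      # keep closest sample per sector
            if len(nearest) < len(SECTORS):
                module3_only.append(target)     # some sector has no neighbour
                continue
            for sec, (d, nb) in nearest.items():
                # Inputs: neighbour grade, distance, length; required output:
                # the training sample grade.
                patterns[sec].append((nb[4], d, nb[3], target[4]))
        return patterns, module3_only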


After training was stopped, the saved topologies were used for estimation. Initially this was done on the basis of drillhole samples hidden from the training process for testing reasons. Later, most of the drillhole samples were targeted at the training and validation process. Cross-validation was used for testing the validity of the learned mappings and for comparing with other grade estimation techniques. Studies carried out with this architecture [40] supported most of the choices made during the development process described in this chapter. Even at this prototype stage, the system could perform reasonably well on a wide variety of data.

5.7 Conclusions

In this chapter the development of the modular neural network system (MNNS) for grade estimation was described. This system, with some modifications, will become the core of GEMNet II. As explained, the MNNS approaches grade estimation in two different ways:

1. Using a sample's co-ordinates and length to construct the picture of grade in 3D space;
2. Using neighbour samples' grade, distance and length to construct the picture of grade in specific directions in space.

This approach ensures that there is an estimate of grade even in places where sampling density is very low. It also takes advantage of the information hidden in the relationship between neighbour samples, and takes into consideration the support of the samples. Because of that, it can provide estimates that have a volume associated with them, as opposed to point estimates.


The MNNS requires a minimum of human interaction – this interaction is limited to a single parameter of the RBFNs, and it does not require any particular knowledge or skills from the user. The results depend solely on the data at hand – the estimation process adjusts to the available data. However, the described system, being in prototype form, is not very user-friendly, and integration of its results into the process of reserve estimation is difficult. It is therefore necessary to integrate the MNNS into a complete resource-modelling environment in order to get the most out of the system and realise its full potential. This integration will also allow better comparison with the existing methodologies. In a later chapter this integration is described, as well as a number of minor modifications to the MNNS architecture. The targeted resource-modelling environment was one of the leading mining software packages, VULCAN from Maptek/KRJA Systems Ltd. The integration of the MNNS inside VULCAN led to the development of GEMNet II.


6. Case Studies of the Prototype Modular Neural Network System

6.1 Overview

The case studies presented in this chapter were based on the prototype MNNS architecture. In fact there were two versions of the prototype system, as described in Chapter 5: one for 2D data and one for 3D. There are two case studies for each of them. More specifically, these case studies are:

• 2D iron ore deposit
• 2D copper deposit
• 3D gold deposit
• 3D chromite deposit

The 2D deposits have been extensively used in geostatistical as well as neural network case studies and are ideal for comparing different approaches. The 3D deposits have never been used in a published study. These studies are part of a larger set of tests run using the prototype MNNS architecture. The purpose of those tests was to validate the approach and fine-tune the architecture. As the 2D datasets were created specifically to demonstrate the validity of the geostatistical approach, they were ideal for testing the MNNS and comparing its results with those obtained using inverse distance and kriging. The datasets from the four case studies presented here are given in Appendix B. It should be noted that finding datasets from real deposits is fairly difficult, as mining companies are quite reluctant to give information away. Both in the MNNS
studies of this chapter and in the GEMNET II studies of the next, the most common type of deposit is metalliferous. The performance of the MNNS will be compared with inverse distance and kriging, as these are the most commonly used methods for ore grade estimation in metal deposits. As the only known ore grade values are those provided in the samples, part of the dataset is kept out of the information provided to the various methods for estimation. In other words, some of the samples become the testing points where the performance of each method is tested. This clearly compromises the overall performance of each method, but unfortunately there is no other objective way of testing. The estimation performance will be expressed in terms of the mean absolute error on the test set, and also with scatter diagrams of actual vs. estimated grades, histograms of grade distribution, and contour maps of ore grade. The datasets were of varying complexity and size and therefore presented varying difficulty to the estimation techniques used. Table 6.1 summarises the characteristics of these datasets.
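For reference, the error measures used in the comparisons below can be computed as in this small sketch; the percentage form shown here divides each error by the corresponding actual grade, which is one common definition (the thesis does not spell out the exact formula):

    def mean_abs_error(actual, estimated):
        # Mean absolute error over the held-out test samples.
        return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

    def mean_abs_pct_error(actual, estimated):
        # Mean absolute error expressed as a percentage of the actual grade.
        return 100.0 * sum(abs(a - e) / a
                           for a, e in zip(actual, estimated)) / len(actual)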

Table 6.1: Characteristics of datasets from the MNNS case studies.

                     2D Iron Ore   2D Copper     3D Gold         3D Chromite
Total Samples        91            51            112             94
Area/Volume          160,000m2     360,000m2     42,686,028m3    70,010,800m3
Standard Deviation   4.4798        0.3731        0.5521          7.0019
Average Grade        34.59% Fe     0.4658% Cu    0.9316gr/t Au   15.7223%

Results from inverse distance and kriging were obtained using Surfer from Golden Software in the 2D case studies, and VULCAN in the 3D case studies.


6.2 Case Study 1 – 2D Iron Ore Deposit

The dataset used in the first case study of the MNNS architecture is a simulated iron ore deposit [41]. It is a low-grade sedimentary deposit with an average grade of 34.59% Fe. The 91 samples comprise, in essence, two groups of data: 50 of them are samples taken at random over the 160,000m2 (400 x 400m) sampling area, and the other 41 are taken on a regular 100m grid (Fig. 6.1).

Figure 6.1: Posting of input/training samples (blue) and test samples (red) from the iron ore deposit.

The 50 random samples were used for training and validation of the MNNS networks. They were also used as input data for inverse distance and kriging. The 41
grid samples were used for testing all three approaches. The absolute errors produced by the three methods were as follows:

Table 6.2: Mean absolute errors from case study 1.

Method                      Mean Absolute   Mean Absolute %
Inverse Distance Squared    2.77            8.26
Kriging                     2.64            7.90
MNNS                        2.60            7.77

Figure 6.2 shows a scatter diagram of the actual vs. the estimated grades from the various methods.

Figure 6.2: Scatter diagram of actual vs. estimated iron ore grades.

The MNNS is slightly outperforming kriging and inverse distance in this dataset. It should be noted though once more that this dataset was generated to suit a geostatistical study and therefore kriging is expected to give good results. It is quite
obvious that all methods tend to underestimate in high-grade areas. The reason for this, at least in the case of the MNNS, is that these areas are close to the borders of the deposit, where the MLP provides the estimates. The MLP module seems to give estimates close to the average grade. The performance of the three methods becomes even clearer by examining the grade distributions below (Fig. 6.3).

Figure 6.3: Iron ore grade distributions – actual and estimated.

From the above figure it seems that the MNNS generates a smooth distribution, similar to that of the inverse distance. Kriging follows the shape of the actual distribution better. Generally all three methods perform well. The following contour maps show exactly how close the methods were to the actual values and to each other.


Figure 6.4: Contour maps of iron ore actual and estimated grades (%Fe): a. Actual Grades, b. MNNS Grades, c. Kriging Grades, d. Inverse Distance Grades.

Kriging and the MNNS seem to perform better in different regions, except for a part in the southwest of the deposit where they both perform badly. Lack of sufficient training samples is the main reason for the areas of high error produced. The MNNS seems to map the low-grade area in the northwest region better, while kriging did better in the southeast.

6.3 Case Study 2 – 2D Copper Deposit

The 2D copper deposit in this study is in essence a level from a theoretical open pit copper mine [36]. It consists of 51 drillhole composites, as shown in Fig. 6.5. These composites cover an area of 360,000m2 and are concentrated mainly in the central part
of that area. This data has been used by Hughes et al. [36], Wu and Zhou [112] and Burnett [12] for testing different estimation methods.

Figure 6.5: Posting of input/training samples (blue) and test samples (red) from the copper deposit.

The dataset was split into two parts: 30 composites were used for training the networks and 21 for testing the performance of the MNNS as well as of the other methods. The inverse distance and kriging estimates were obtained using the same parameters that Hughes et al. used in their study [36]. The performance of the three estimators in terms of the mean absolute error on the test data is given below:


Table 6.3: Mean absolute errors from case study 2.

Method                      Mean Absolute   Mean Absolute %
Inverse Distance Squared    0.0226          8.21
Kriging                     0.0291          7.18
MNNS                        0.0258          4.81

Figure 6.6 shows a scatter diagram of the actual vs. the estimated copper grades from the various methods.

Figure 6.6: Scatter diagram of actual vs. estimated copper grades.

Once again, the MNNS is performing well compared to the other two methods. Inverse distance and kriging appear to have very similar performance with their estimates being very close. Unfortunately, the locations used to test the performance of the three methods are simply samples that would otherwise have been used as input information. Unlike case study 1 where there was a good spread of the test samples, in
case study 2 and in most of the studies to follow, input data are used for testing, which means that the spread of the test points is not always ideal. In these cases, testing takes the form of cross-validation, where the estimator is trying to recreate sample points from the remaining data set. The actual as well as the estimated copper grade distributions are shown in Fig. 6.7.

Figure 6.7: Copper grade distributions – actual and estimated.

The MNNS in this study tends to slightly overestimate grades close to the average but generally the estimates are well balanced. The other two methods are also performing well. The contour maps in Fig. 6.8 illustrate the results of grade estimation. The actual grade map is limited to the sampling area as there is no information outside it. MNNS is limited to the testing area. Inverse distance and kriging extend to the borders of the map but comparison should be limited to the testing area.


Figure 6.8: Contour maps of copper actual and estimated grades (% Cu): Actual, MNNS, Kriging, Inverse Distance.

Inverse distance is by far the worst method in this case. Kriging does better but fails to split the high-grade area. The MNNS tends to underestimate the high-grade area that kriging models very well on the right side of the map. However, the MNNS is better at finding the shape of the high-grade area, as well as at splitting it into its parts as they appear in the actual grade map.

6.4 Case Study 3 – 3D Gold Deposit

With the third case study, the transition is made from 2D to 3D data. This means that the 3D version of the MNNS architecture is now used. The sample search methods are not the 2D octant and quadrant methods, but the 3D search scheme developed specifically for the MNNS.


The data used in this case study is part of a larger dataset from a copper/gold deposit. The original dataset consists of four orebodies developed along fractures in metasomatised host rocks, which include gneissic granites, mica schists and metasomatites. In this study, only one of the orebodies was used. The input and test data were limited to the drillhole samples located inside this orebody (code named TQ2). The total number of samples was 112. As the dataset is now 3D, the visualisation of the results of estimation becomes more difficult. Contour maps can only be used to show sections through the estimated area. Normally, estimation in 3D deposits is made on a block model basis, but as the actual grade values of the blocks are unknown, the estimation performance can only be measured over a part of the input dataset. The orebody model was created in VULCAN/Envisage during a geological modelling study based on lithology. Fig. 6.9 shows a 3D view of the orebody and drillholes (screenshot from Envisage). It should be noted that in this study VULCAN is used for providing the inverse distance and kriging estimates, and not as an implementation environment for the MNNS. The same study, including the complete dataset with four orebodies, is repeated in the next chapter using GEMNET II, the system fully integrated in VULCAN.


Figure 6.9: 3D view of the orebody and drillhole samples used in the 3D gold deposit study.

From the 112 available samples, 42 (37.5%) were used for testing the performance of the three estimation methods. This means that the MNNS had only 70 samples (62.5%) available to train the various networks. After testing with all three methods, the actual and estimated average gold grades were:

Table 6.4: Actual and estimated average gold grades.

                 Actual    ID2       Kriging   MNNS
Average (gr/t)   0.9316    0.6524    0.6581    0.7420

The mean absolute error was quite high in comparison with the previous two studies. Clearly, a three-dimensional orebody is far more challenging and demanding than a two-dimensional one. The mean absolute errors for the three methods are given below:

Table 6.5: Mean absolute errors from case study 3.

                 ID2       Kriging   MNNS
Mean ABS Error   0.4242    0.3939    0.3162
Mean ABS %       44.10%    40.17%    31.60%

The results for inverse distance and kriging were obtained using cross-validation in VULCAN. Cross-validation was limited to the 42 test samples used for testing the MNNS. The following figure (Fig. 6.10) shows the data fit produced by the three methods.


Figure 6.10: Scatter diagram of actual vs. estimated gold grades.

It is obvious that none of the methods performs very well. The MNNS, even though it performs better than the other methods, tends to overestimate grades close to the average value and underestimate the high-grade samples. This becomes clearer in the next figure (Fig. 6.11) showing the actual and estimated distributions.



Figure 6.11: Gold grade distributions – actual and estimated.

The distribution shown in the above figure as the actual gold grade distribution refers only to the test samples and not the entire dataset. However, as can be seen from the following graph, this distribution is not very far from the distribution of the entire dataset. The main differences are in the low and high grade areas, where the test set had fewer and more samples respectively. This could explain the relatively average performance of all three methods.



Figure 6.12: Gold grade distribution of the complete dataset.

This study, being the first to use 3D data from a real deposit, shows how much more difficult it is for the estimation methods to perform well than with 2D data from simulated deposits. The performance degradation, at least for the MNNS, can be attributed to the higher input dimensionality and the higher complexity of the required mapping. This study also shows that the MNNS, in its first application to 3D data, has outperformed both inverse distance and kriging. What is not clear from this study is the time difference in applying these methods. MNNS required about an hour to generate the training pattern files, train the networks, and provide estimates. Kriging required a complete geostatistical study that, depending on how thorough one wants to be, can take many hours.


6.5 Case Study 4 – 3D Chromite Deposit

The dataset used in the final study of the MNNS is taken from a larger sample database of an undeveloped chromite deposit. There are 94 samples from 26 drillholes in this dataset. There is no geological study and therefore the estimation is not constrained by geology. Normally, there should be an orebody model to limit the samples used for the estimation as well as the locations where the estimation takes place, but in this case the dataset is very small and the lack of geological modelling is not expected to generate problems. The drillholes from the dataset are shown in Fig. 6.13.

Figure 6.13: Drillholes from a 3D chromite deposit.

From the 94 samples, 38 were used for testing the three methods while the remaining 56 were used for training the neural networks and as input information for inverse distance and kriging. The actual and estimated average chromite grades were as follows:


Table 6.6: Actual and estimated average chromite grades.

                             Actual    ID2       Kriging   MNNS
Average grade (% chromite)   15.7639   14.7639   15.1511   16.3449

The estimation performance of all three methods was good considering the fact that there was no limitation as to the samples used, due to the lack of a geological model. This means that the methods had to estimate grades from samples that do not necessarily belong to the same geological domain. The mean absolute errors are given below:

Table 6.7: Mean absolute errors from case study 4.

                  ID2       Kriging   MNNS
Mean ABS Error    3.7687    3.3996    2.4536
Mean ABS %        21.83%    19.82%    16.19%

Once again, the MNNS is outperforming the other two methods, but this time the difference is clearer as they all perform well. The MNNS is closer to the actual average chromite grade and produces the smallest absolute errors of the three methods. This is verified by the data fit graph and grade distribution chart shown in the following figures.



Figure 6.14: Scatter diagram of actual vs. estimated chromite grades.


Figure 6.15: Chromite grade distributions – actual and estimated.


From the above graphs it appears that kriging does better on the low to middle grade samples while the MNNS does better on high-grade samples. Inverse distance tends to overestimate low-grade samples and underestimate high-grade ones. Generally, all three methods perform well.

6.6 Conclusions

The prototype 2D and 3D MNNS architectures were tested in this chapter in four very different case studies. The datasets used came from both simulated and real deposits. Each dataset had a different type of ore as its target quantity, the common point being that all were metallic. The number of samples in each case study was relatively low. These were, however, case studies that aimed at the development of the modular architecture and not at the establishment of the approach as a valid ore grade estimation technique. The low number of samples therefore allowed easy monitoring of the system's performance and fast development times. The performance of MNNS, as measured by the produced absolute errors and estimated grade distributions, compared very well with the performance of inverse distance and kriging. MNNS performed well even on datasets that were designed to demonstrate the validity of the geostatistical approach. Clearly though, there is plenty of room for improvement in the geostatistical studies, although always at the expense of time and effort. The speed of development and the independence of the approach from the knowledge and skills of the user have been demonstrated by these case studies. The quality of the estimates has also shown that the MNNS architecture is a step in the right direction for ore grade estimation using artificial neural networks.


7. GEMNET II – An Integrated System for Grade Estimation

7.1 Overview

In this chapter the discussion continues with the analysis of GEMNET II, the integrated system for grade estimation developed by the author and based on the Modular Neural Network System described in the previous chapter. GEMNET II is mainly written in C and uses parts of SNNS, the Stuttgart Neural Network Simulator from the University of Stuttgart, Germany [97]. The GEMNET II core program is a data processing and control module written in C that processes the samples file as well as the block model file. The core program also makes external calls to parts of the SNNS simulator. These parts are the main simulator kernel, the batch execution language (BATCHMAN), and the C code extraction program (SNNS2C) that converts the trained neural network topologies to C functions. The development of neural networks is controlled by a number of scripts written in the SNNS batch language, which is very similar to AWK and C. GEMNET II is integrated within VULCAN, a leading software package for resource modelling. The control of the system is done through ENVISAGE, VULCAN's graphical editor that provides the graphical user interface for GEMNET II. The interface between GEMNET II and VULCAN is based on a number of scripts written in a very popular scripting language called Perl. Specifically for VULCAN there are a number of extensions to Perl, which are called Lava extensions. These give access to graphical objects and routines in ENVISAGE, which are very useful for integrating external programs like GEMNET II.


The integration of GEMNET II with SNNS and VULCAN provides the following additional functionality that was missing from the MNNS as a standalone system:

• Ability to try practically every neural network architecture without having to modify the core of the system;
• Faster training and application of neural networks during the estimation process;
• A graphical user interface that is easy to learn and use;
• Direct access to an integrated modelling environment allowing the incorporation of the estimation results in a larger scale modelling operation;
• Estimation based on the advanced block modelling that VULCAN provides;
• 3D visualisation of the drillhole samples and the targeted block model;
• 3D visualisation of the estimation results and validation of the training process;
• Straightforward comparison of GEMNET II with other estimation packages and techniques incorporated in VULCAN, like the geostatistical packages GSLIB, Geostokos, and ISATIS;
• Data management based on VULCAN's project file structure;
• Estimation reliability measures.

The MNNS core has also been modified to improve the estimation process and provide a number of reliability measures. The next section gives the details of the core architecture and shows how it was implemented using the SNNS simulator. It should be noted that there were some changes in the names of modules in the MNNS.


7.2 Core Architecture and Operation

7.2.1 Exploration Data Processing and Control Module

This is the main part of GEMNET II. It is a program written entirely in C, with code compatible with both Microsoft Windows based PCs and UNIX based workstations. It is responsible for processing the drillhole samples file and the block model centroids file, normalising the data, generating training patterns for the various networks, and making all the necessary external calls to the neural network simulator (SNNS). Once the development of the neural networks is completed and the C code extracts have been compiled, this module carries on with the estimation process. Figure 7.1 shows schematically the operation of this module.

The first operation of the data processing and control module is to read the samples file and place the sample co-ordinates and assay values (grades) in a number of arrays. This is done to increase the speed of the search process later on. The samples file is normally a map file generated by VULCAN's compositing function. The map file contains a header describing the file structure and records consisting of sample ids, sample co-ordinates, and assay values. The values of the arrays are normalised so that all co-ordinates and assay values vary between zero and one. As was explained before, this ensures that the effects of the range of values are eliminated from the neural network training process. The normalisation information (minimum, maximum, and range values) is stored in a file that is used later to restore the initial values and to ensure that the estimates will also be in the correct range of values. Figure 7.2 shows the normalisation information as reported by GEMNET II in VULCAN. The contents of the normalised arrays (sample co-ordinates and grade) are written to a file used for training the second module's network. As mentioned before, this network is trained on the entire dataset but is used to provide estimates only where there are not enough neighbour samples for the networks of the first module.

Figure 7.1: Simplified block diagram showing the operational steps of the data processing and control module in GEMNET II.


Figure 7.2: Normalisation information panel.
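As an illustration of this step, a minimal sketch of min-max normalisation and de-normalisation follows. This is not the actual GEMNET II code; the struct and function names are hypothetical.

    /* Hypothetical sketch of the normalisation step: scale a variable
       (co-ordinate or assay value) to the 0..1 range and keep the
       minimum, maximum, and range so that estimates can later be
       de-normalised. */
    typedef struct { double min, max, range; } NormInfo;

    NormInfo normalise(double *v, int n)
    {
        int i;
        NormInfo info;
        info.min = info.max = v[0];
        for (i = 1; i < n; i++) {
            if (v[i] < info.min) info.min = v[i];
            if (v[i] > info.max) info.max = v[i];
        }
        info.range = info.max - info.min;
        for (i = 0; i < n; i++)
            v[i] = (info.range > 0.0) ? (v[i] - info.min) / info.range : 0.0;
        return info;  /* in the real system this is stored in a file */
    }

    double denormalise(double v, NormInfo info)
    {
        return v * info.range + info.min;  /* restore original units */
    }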

The next step is the application of the search method. Each sample is taken as the centre of the search scheme. The space around the centre sample is divided into the six sectors described in the previous chapter. The centre sample is in essence the training point for the RBF networks of the neural network modules. All the remaining samples are assigned to one of the sectors depending on their location relative to the centre of the search. It should be clear that, as the discussion is about samples with an associated volume, their location is identified as the centroid of the volume. The normalised distance of each neighbour sample to the centre is calculated and stored, together with the normalised neighbour sample and centre sample grades, in one of six files, one for every sector. At the end of the search process, there are six files, each containing a different number of samples depending on the geometrical characteristics of the drilling (sampling) scheme. In fact the number of training patterns is equal between opposite sectors, e.g. the north sector has the same number of patterns as the south sector, etc. Therefore the sector networks of the first module in GEMNET II are trained on different numbers of samples, while in the MNNS the number of samples was constant. This is because in the MNNS only one neighbour was selected from every sector for every sample, while in GEMNET II the number of samples depends only on the available samples in a sector. This is a fundamental difference between the two implementations of the modular architecture. The networks in the MNNS were not provided with all the available information on the effects of the sample distance as they were trained on only one neighbour sample per sector. In GEMNET II the six RBF networks are trained on all the information available in order to build a more complete model of the distance-grade relationship.

However, there is one implication brought by the above change. The final network, trained on the outputs of the first module's networks, needs to be trained on complete patterns. As each network is trained on different samples there is no synchronisation in their training process, i.e. these networks are trained sequentially and on different centre samples. This problem is rectified by the use of test files. Together with the six training files produced by the search process, there are six test files that contain patterns formed using the closest neighbour in each sector and only for the centres where all sectors have at least one neighbour sample. This way the trained networks can be synchronised and provide individual estimates for the same centre samples, which can then be used for training the final module network. All the pattern files created need to be converted into a format compatible with the neural network simulator used (SNNS). This is fairly easy to do as SNNS reads ASCII pattern files with a very straightforward structure. An example of a training pattern file generated during a GEMNET II case study is given in Appendix A. The data processing module operation proceeds with the processing of the block model centroids file. Again this file is generated in VULCAN using a block model export option that calculates the centroid co-ordinates and volume of each block.
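Stepping back to the search scheme described above, the sector assignment might look like the sketch below. The exact sector geometry is defined in the previous chapter; the axis-aligned classification used here is an illustrative assumption, as are all names.

    #include <math.h>

    /* Hedged sketch of the six-sector search: assign a neighbour sample
       to an east/west, north/south or upper/lower sector according to
       the dominant co-ordinate difference from the search centre, and
       return its normalised distance.  Co-ordinates are assumed to be
       already normalised. */
    enum Sector { EAST, WEST, NORTH, SOUTH, UPPER, LOWER };

    enum Sector classify(const double centre[3], const double sample[3],
                         double *dist)
    {
        double dx = sample[0] - centre[0];
        double dy = sample[1] - centre[1];
        double dz = sample[2] - centre[2];
        *dist = sqrt(dx * dx + dy * dy + dz * dz);
        if (fabs(dx) >= fabs(dy) && fabs(dx) >= fabs(dz))
            return dx >= 0.0 ? EAST : WEST;
        if (fabs(dy) >= fabs(dz))
            return dy >= 0.0 ? NORTH : SOUTH;
        return dz >= 0.0 ? UPPER : LOWER;
    }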


The centroid co-ordinates are real world co-ordinates and not relative to the origin of the block model. The block model centroids are normalised and passed one at a time to the centre of the same search scheme used for the drillhole samples. This normalisation uses the same parameters as the normalisation of the drillhole samples to ensure that relative locations are preserved. The search process is exactly the same, only this time the place of the search centre is taken by block centroids and only one neighbour sample – the nearest – is selected from each sector. The neighbours are again drillhole samples. Each block is flagged depending on the existence of a neighbour sample in each sector. There is one flag for each sector, which is set to one if there is a neighbour or zero if there is not. These flags are written to a file sequentially and are used during the estimation process to control the usage of the individual networks. This will be discussed later when the estimation process is described. The grade, distance, and length of the neighbour samples from the six sectors are written to an input pattern file for the first module's networks. The centroids of the blocks are written to another input pattern file for the second module's network. As mentioned, the choice between module one and two during estimation is controlled by the file containing the flags from the search process.

After the processing of the block model is completed, the data processing and control module continues with the most important aspect of the operation of GEMNET II: the neural network development. The module makes a number of external calls to the SNNS executables, arranged in command line batch files. The first set of calls is targeted at BATCHMAN, the SNNS batch language for neural network development. The calls include as arguments the name of the batch program to be executed as well as a log file name where all the messages from the development process are to be stored. The batch programs are written in the SNNS batch language, which is very similar to AWK and C. The batch language provides access to every function of the SNNS kernel: all the neural network architectures and learning algorithms. The batch programs that come with GEMNET II control the development of all the employed neural networks. The beauty of this approach is that by simply changing a batch program, one has complete control over the learning process. As the batch program is just a text file, an external process, such as VULCAN's graphical user interface, can easily alter it. This way the complete control of GEMNET II neural network development is passed to the interface with VULCAN. An example of a batch program from GEMNET II is given in Appendix A.

The first batch programs train the networks of modules one and two using the training patterns. The log files for these networks are written during this process. After training terminates, the test patterns are presented to these networks to provide synchronised outputs, i.e. individual estimates for the same samples. These outputs are written into 'results' files, which are subsequently used for generating the training patterns for the final module network. The last of the batch programs trains the final module network. Once this process terminates, the neural network development is complete. The trained networks at this stage are in the form of SNNS network files – ASCII files containing the network topology and the weights and biases after training. These networks now need to be converted into C functions to be used during the estimation process. The data processing and control module makes the necessary calls to the C code extraction utility provided with SNNS, SNNS2C. This utility creates both the header (.h) file and the code (.c) file from the network file. Examples of a trained network file as well as the respective header and C code files are given in Appendix A. The module calls SNNS2C to convert all the trained networks to C code. All that is left in order to use the networks is to compile them and link the headers with the application. Upon completion of this process, GEMNET II is ready to provide grade estimates at unknown locations.

The final operation of the data processing and control module is grade estimation on a block model basis. The program reads the flags file and, for each sector, uses the module one network whenever the flag value is one and the module two network whenever it is zero. The input pattern files generated during the block model processing described above provide the input values for the network function calls. The final module network is then called using the outputs of the module one and two network functions. The data processing and control module de-normalises the final estimate and the block model centroids and writes them to an estimates file. Together with the centroid co-ordinates and grade estimate, the module also writes the variance of the individual estimates from the module one and two networks as well as the flags showing which networks are responsible for the estimate. These extra parameters are used to validate the estimation process and identify any problematic areas or networks. After the estimation process is complete the data processing and control module terminates. The main and most important part of GEMNET II operation is then finished; the main contributing parts of this operation are shown in Fig. 7.3. It should be clear that this operation requires only a minimum of human interaction.
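The external calls to BATCHMAN might be issued as in the sketch below. The batchman options shown (-f for the program file, -l for the log file) follow the SNNS documentation, but the file names and surrounding code are illustrative only.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hedged sketch of an external call to the SNNS batch interpreter:
       one call per batch program, each with its own log file. */
    static int run_batch(const char *script, const char *logfile)
    {
        char cmd[512];
        sprintf(cmd, "batchman -f %s -l %s", script, logfile);
        return system(cmd);  /* non-zero status signals a failure */
    }

    int main(void)
    {
        /* e.g. train the east-sector network of module one */
        if (run_batch("east.bat", "east.log") != 0)
            fprintf(stderr, "training of the east network failed\n");
        return 0;
    }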


Figure 7.3: Interaction between GEMNET II and other parts of the integrated system during operation of the data processing and control module.
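At estimation time a compiled network reduces to a plain function call, along the lines of the sketch below. SNNS2C generates functions of the form int name(float *in, float *out, int init); the header name, network name, and input layout here are assumptions for a first module sector network.

    #include "east.h"  /* hypothetical header generated by SNNS2C */

    /* Hedged sketch of calling a compiled sector network: the inputs
       (neighbour grade, distance, length) and the output are all in
       the normalised 0..1 range. */
    float estimate_east(float grade, float dist, float length)
    {
        float in[3];
        float out[1];
        in[0] = grade;
        in[1] = dist;
        in[2] = length;
        east(in, out, 0);  /* init flag as per the SNNS2C convention */
        return out[0];
    }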

7.2.2 Module Two – Modelling Grade's Spatial Distribution

The second neural network module in GEMNET II consists of the RBF network, as described in the MNNS architecture, as well as the batch program that controls its learning process. It is presented before the first module for consistency reasons. In contrast with MNNS, the learning process is not part of the main program but is implemented in the SNNS batch language.


There is no difference in the RBF network topology for this module between the MNNS and GEMNET II. However, there are major differences in the learning process for this network. Most of the case studies run using GEMNET II involved considerably larger datasets – more than a thousand samples. The learning process had to be improved to cope with the abundance of training data. One of the most important changes was in the initialisation of the network. In MNNS, this was simply done by randomly placing the RBF centres in the input space. In GEMNET II this was found to be inadequate due to the large number of samples defining the input space. A more 'intelligent' way of locating the centres has been employed: Kohonen learning. Before the network is trained and its weights adjusted, the input patterns are clustered using a process of self-organisation known as Kohonen learning (Chapter 2). This process ensures that the input samples are clustered according to their statistical properties and an RBF centre is allocated to each cluster. The random positioning of the centres still takes place right before this process to accelerate the initialisation stage of the development; the clustering process is accelerated because its starting point is a random spread of centres in the input space. Initialisation continues with the weights between the hidden and output layer as well as the bias of the hidden units. The initialised network topology is saved in a network file for further examination in the validation stage.

Following the initialisation of the network's input-hidden layer weights (centre positioning), two learning stages take place. As was mentioned before, RBF learning has to concentrate on one free parameter at a time; the learning process becomes unstable if more than one parameter is allowed to change. Therefore, a separate learning process is allocated for the hidden-output layer weights and the bias of the hidden units. The learning parameters are set to experimental values that were found after a large number of tests. The learning process for these two parameters is identical to the one used in MNNS. Training is again stopped when the change in the network's output error becomes very small. The trained network topology is saved in a network file. The final operation of the batch program is to pass the test pattern file through the network and write the results to a text file. This file can be used for generating a scatter plot of actual vs. estimated grade for the specific network. This will be shown later when the validation tools provided by GEMNET II are described. During this development process all the messages coming from the simulator are stored in a log file that can be opened with a text editor for examination. Examining the log file as well as the initialised and trained network files can lead to useful conclusions about the effectiveness of the training process. The author used these files as a guide for setting the learning parameters and the required number of cycles. The network files provide a very useful piece of information: the location of the RBF centres in the normalised input space (Fig. 7.4). This will prove to be very important for validating the network's learning and estimation performance.
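A minimal sketch of the Kohonen-style competitive step used to place the centres is given below; neighbourhood functions and learning-rate decay are omitted, and all names are illustrative rather than taken from the actual implementation.

    /* For every training pattern, find the winning (closest) centre and
       move it towards the pattern.  Repeated over several cycles this
       clusters the patterns and allocates an RBF centre to each cluster. */
    void kohonen_init(double **centre, int ncentres,
                      double **pattern, int npatterns,
                      int dim, int cycles, double rate)
    {
        int c, p, k, d, win;
        for (c = 0; c < cycles; c++) {
            for (p = 0; p < npatterns; p++) {
                double best = 1e300;
                win = 0;
                for (k = 0; k < ncentres; k++) {   /* winner search */
                    double dist = 0.0;
                    for (d = 0; d < dim; d++) {
                        double diff = pattern[p][d] - centre[k][d];
                        dist += diff * diff;
                    }
                    if (dist < best) { best = dist; win = k; }
                }
                for (d = 0; d < dim; d++)          /* move the winner */
                    centre[win][d] += rate * (pattern[p][d] - centre[win][d]);
            }
        }
    }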



Figure 7.4: RBF centres from the second module located in 3D space. Drillholes and modelled orebody are also shown.

7.2.3 Module One – Modelling Grade's Spatial Variability

The changes in the learning process for the RBF networks of the first module are exactly the same as for the second module. The initialisation procedure makes use of Kohonen learning for locating the RBF centres in the input space. From the discussion on the data processing and control module it is clear that the six RBF networks of module one are trained separately and in sequence. The learning procedure is identical. However, there is one problem that became clear during testing. Because of the geometry commonly found in most sampling schemes, the drillholes are arranged in sections typically perpendicular to the orebody. This can lead to some sectors of the search scheme being overcrowded while others have a low number of samples. As there is no way of knowing in advance which sectors will be overcrowded and which will not, the training of the networks can be unbalanced, i.e. some networks have many samples to learn from within the same number of training cycles as others that have only a few training samples. The solution to this problem is a number of filters introduced between the module one networks and the data processing and control module. These filters allow samples inside a distance range to pass as training patterns to the networks, while withholding samples that lie beyond that range. It should be noted that the criterion is the distance range, i.e. a percentage of the maximum distance between samples, and not an absolute distance. By adjusting the search range the number of samples can be limited and the networks can be trained on a similar number of training samples.

A very interesting issue with the first module networks is the visualisation of the RBF centres. In the second module, the input space is the 'real' 3D space defined by the drillhole samples' co-ordinates and therefore visualisation of the RBF centres is straightforward. In the first module networks, though, the input space is not the 3D space of real world co-ordinates, but the hyperspace defined by the distance, grade, and length of neighbour samples. In order to visualise the RBF centres, this space is constructed in Envisage using the training input patterns. A new mapping window is constructed by substituting the three co-ordinates (easting, northing, and elevation) with the grade, distance, and length of samples. The training samples and RBF centres can then be visualised in this hyperspace (Fig. 7.5).


Figure 7.5: RBF centres of west sector RBF network and respective training samples in the input pattern hyperspace (X-Grade, Y-Distance, Z-Length).
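The distance-range filter described above reduces to a simple comparison once distances are normalised, as in this sketch (the type and names are illustrative):

    /* A neighbour sample passes the filter only if its normalised
       distance from the search centre lies within the configured range,
       expressed as a fraction of the maximum distance between samples. */
    typedef struct { double grade, dist, length; } Neighbour;

    int pass_filter(const Neighbour *n, double max_range)
    {
        return n->dist <= max_range;  /* e.g. max_range = 0.25 for 25% */
    }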

It is somewhat difficult to understand the way samples are placed in this hyperspace as well as how the RBF centres are located. However, after careful examination of images like the one in Fig. 7.5, the distribution of samples becomes clearer. A very interesting finding is that samples chosen as neighbours in a specific direction appear to form lines of constant X-Grade and varying Y-Distance. This should have been expected, of course, but pictures like this help in understanding the characteristics of the input space even further.

7.2.4 Final Module – Providing a Single Grade Estimate

The final module consists of a single RBF network responsible for weighting the individual estimates of the first and second module networks. This network does not model the grade in an input vector space. It simply tries to model the relationship between the responses of the first and second module networks and the actual grade values. This network is completely 'unaware' of sample co-ordinates or neighbour sample grades, distances, and lengths. The only information provided to this network is the required output (actual grade at the estimation point) and the estimates of the individual networks. The purpose of this network is to replace the simple averaging that provided a single estimate from the various networks in the earlier architectures. During testing it was found that the final estimate can be brought even closer to the actual value by weighting the individual network estimates. One could argue about the use of an artificial neural network for this task, and in fact several researchers in the field of AI advised the author against using an ANN, or specifically an RBF network, for it. However, the RBF network of the final module proved to be at least good enough for this weighting task and, with this project being dominated by the use of ANNs, the author did not look any further. It should be noted, though, that different ANN architectures were tested. The RBF network of the final module is shown in Fig. 7.6. It is a simple 3D representation of this network and the location of the RBF hidden units has nothing to do with the positioning of the RBF centres before or after training.


Figure 7.6: Final module’s RBF network.

A training process very similar to that of the other neural modules determined the number of RBF centres and their location in the input space. Unfortunately, due to the high dimensionality of this network's input space (6D), it is not possible to use Envisage or any other graphical environment for the direct visualisation of the RBF centres and training samples in the correct input space. It is only possible to examine the learned model using any three of the six inputs at a time. The training process for this network involves the results of the previous networks on the test samples and not on their training samples. This was necessary to allow complete freedom in the number of samples used for training the first module's networks. However, the author believes that this could be a source of inefficiency for the complete architecture, as this is the final RBF network that controls the final estimate produced. If the test samples are not representative of the dataset then the RBF network of the final module could have difficulty providing reliable results. This is an aspect of GEMNET II's operation that needs monitoring. The author suggests that the distributions of grade estimates from the various first and second module networks be compared with the final module network estimates.

The validation of the system's operation during neural network development as well as during grade estimation has been a consideration of the author since the beginning of GEMNET II development. This led to the development of validation tools specific to GEMNET II and implemented using VULCAN's graphical capabilities. These are the subject of the next section of this chapter.

7.3 Validation

7.3.1 Training and Validation Errors

The first and most common way of measuring a neural network's performance is by calculating its estimation error on the training or validation pattern set. The training error is less important as it reflects the performance of the network on the very samples it was trained on. In other words, the training error is not a good measure of a network's performance. However, the training error can indicate problems in the learning process that can be due to an inadequate number of samples or training cycles, or both. If a network cannot reach an acceptable error level regardless of the number of training cycles, then the learning algorithm needs to be modified or the number of samples increased. One has to monitor the progress of the training error curve cycle after cycle in order to determine the origin of high training errors.

A more representative and reliable measure of a network's performance is the validation error. A good learning algorithm should normally be based on the validation error to guide the weight changes, but even if this is not the case, a validation pattern set can help build confidence in the learned mappings. In the case of GEMNET II and samples from drillholes, generating a validation set and using it for measuring performance is not an easy task. In geostatistics and other more conventional methods the developed estimation technique is validated using the process of cross-validation. Cross-validation is in essence the regeneration of the samples by hiding one at a time and trying to estimate it using the remaining samples. In the case of the neural networks in GEMNET II, this is what the training process does. In other words, cross-validation is not applicable in the case of GEMNET II because it can give very misleading results. On the other hand, hiding samples from the training process to use them as a validation set automatically means that GEMNET II has fewer samples to train the networks and therefore less chance of producing good results on the validation set. This is especially applicable when the system is dealing with a very complex orebody that requires as many samples as possible to describe its grade behaviour in space. With this consideration in mind, the author suggests that a validation set be generated at first to measure the networks' generalisation performance. If the validation errors are acceptable then the networks should be retrained using the same training process but including the samples of the validation set, to ensure that the best possible mappings are generated.
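The suggested procedure can be sketched as a simple random hold-out split followed by retraining on the full set; the function below is illustrative only and does not reproduce the actual GEMNET II code.

    #include <stdlib.h>

    /* Randomly hide a fraction of the samples as a validation set.  If
       the validation errors are acceptable, the networks are retrained
       on all nsamples samples using the same training process. */
    void split_samples(int nsamples, double hold_fraction,
                       int *train_idx, int *ntrain,
                       int *valid_idx, int *nvalid)
    {
        int i;
        *ntrain = *nvalid = 0;
        for (i = 0; i < nsamples; i++) {
            if ((double)rand() / RAND_MAX < hold_fraction)
                valid_idx[(*nvalid)++] = i;  /* hidden from training */
            else
                train_idx[(*ntrain)++] = i;
        }
    }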

7.3.2 Reliability Indicator

The learning process in GEMNET II is relatively more complex than in other systems as it involves a very modular neural network structure. It is important to see the final estimate produced as the result of the weighting of individual estimates. Therefore, by measuring the variance of these estimates one can draw conclusions about the reliability of the final estimate. In other words, the higher the agreement between the individual estimates, the higher the reliability of the final estimate, and vice versa. The variance of the individual estimates will be mirrored by the weight values of the final network. A combination of very high and very low weight values in the final network expresses the difficulty of the final network in getting close to the actual grades.

The variance of the first and second module networks' estimates has been used as the basis of a reliability measure or reliability indicator. This is calculated during the estimation process. In VULCAN, the user has to add an extra variable to the block model to be used by GEMNET II for storing the reliability indicator for each block estimated. After the estimation process, the block model can be visualised in 3D or in sections with a colour scheme based on the reliability indicator (Fig. 7.7). This way it is possible to identify areas where GEMNET II has difficulty providing an estimate. The reliability indicator cannot by itself reveal the origin of a problem or even quantify it. It is strictly an indicator, i.e. a guide that can help identify problems.
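One plausible form for the indicator is the plain variance of the individual estimates for a block, as sketched below; the thesis does not give the exact formula, so treat this as an assumption.

    /* Variance of the individual module one/two estimates for one block:
       a low value means the networks agree (more reliable final
       estimate), a high value flags the block for closer inspection. */
    double reliability(const double *estimate, int n)
    {
        int i;
        double mean = 0.0, var = 0.0;
        for (i = 0; i < n; i++)
            mean += estimate[i];
        mean /= n;
        for (i = 0; i < n; i++)
            var += (estimate[i] - mean) * (estimate[i] - mean);
        return var / n;  /* stored in an extra block model variable */
    }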

Figure 7.7: Block model coloured by the reliability indicator in GEMNET II.


7.3.3 Module Index

Another useful source of information is the flags stored in the flags file during the processing of the block model centroids file by the data processing and control module. This file consists of records with six flags each, one for every sector. The flag values are one for sectors with neighbour samples and zero for empty sectors. These values are used during estimation for choosing between the first (sector flag = 1) or the second module networks (sector flag = 0). These flags are stored in the block model. Specific variables have to be set in the model to contain the flag values. The block model can then be visualised in Envisage using a colour scheme that depends on the flag values or module index (Fig. 7.8). By combining the module index and the reliability indicator, it is easy to identify the networks that can present problems during estimation.

Figure 7.8: Block model coloured by module index in GEMNET II. Cyan blocks represent first module estimates while red blocks represent second module estimates.
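The per-sector selection encoded by the flags can be sketched as follows, under the flag convention described above (flag = 1: the sector's module one network; flag = 0: the global module two estimate substitutes). The function name and the interpretation of how the six estimates are gathered are assumptions.

    /* Gather the six individual estimates that feed the final module
       network: for each sector, use the module one sector network if
       the sector has a neighbour sample, otherwise substitute the
       module two (global) estimate. */
    void gather_estimates(const int flag[6], const double sector_est[6],
                          double global_est, double out[6])
    {
        int s;
        for (s = 0; s < 6; s++)
            out[s] = flag[s] ? sector_est[s] : global_est;
    }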


7.3.4 RBF Centres Visualisation

The location of the RBF centres in the input vector space is absolutely crucial to the performance of an RBF network. The RBF centres visualisation tool has been developed specifically for GEMNET II in Envisage and allows both the centres and the training samples of any RBF network from the modular architecture to be displayed (Fig. 7.9). This option displays the RBF centres using a special symbol on the screen and the training samples as crosses. The correct input space is used, i.e. the 3D real world co-ordinate space for the second module and the neighbour sample grade, distance, and length input space for the first module.

Figure 7.9: First module RBF centres visualisation in GEMNET II. Drillholes and orebody model are also shown.


Clearly, this is an option for users who know the basics of the system's operation; otherwise it will not be very useful. By looking at the positions of the RBF centres, one can decide whether the network initialisation procedure is efficient and whether the learned mapping is reliable. A well-spread distribution of centres in the input space, with a high density of centres in areas where the grade presents complex behaviour, suggests that the network has been properly developed. A high density of centres in areas with very few or no samples means that the initialisation and training process needs to be modified. Usually an increase in the number of initialisation or training cycles is required, or an increase in the learning parameters.

7.4 Integration

7.4.1 Neural Network Simulator

Development of neural networks in GEMNET II is based on the Stuttgart Neural Network Simulator (SNNS) developed at the Institute for Parallel and Distributed High Performance Systems (IPVR) at the University of Stuttgart, Germany. SNNS was originally developed for the UNIX operating system but was recently ported to the Microsoft Windows 95/NT environment. It is still based on X Windows and requires an X Server in Windows 95/NT for the graphical user interface. Figure 7.10 shows a schematic diagram of its main components.


Figure 7.10: Diagram of the main components of SNNS.

The four main components of SNNS are the simulator kernel, graphical user interface, batch execution language (BATCHMAN), and network C code extraction tool (SNNS2C). The graphical user interface is not used in GEMNET II as this is provided by Envisage in VULCAN. The other three parts - mainly BATCHMAN and SNNS2C - are extensively used. The simulator kernel includes a number of functions for:

• Network manipulation
• Network structure definition
• Cell (processing element) definition and manipulation
• Learning
• Pattern manipulation
• Pattern propagation
• Network and pattern file handling
• Error calculations
• Memory management


The batch execution language in SNNS, BATCHMAN, has been modelled after languages such as AWK, Pascal, Modula2 and C. BATCHMAN provides a command line or scripting interface to the simulator kernel. It is possible to send commands directly in interactive mode using the interpreter or execute complete batch scripts by calling BATCHMAN with the batch script file name as an argument. The structure of the batch scripts or programs is not predetermined. There are a number of system variables available for monitoring the development of the networks. These can be used during training to create more advanced training algorithms. The available system variables are:

Table 7.1: System variables available in BATCHMAN.

SSE         Sum of squared differences of each output neuron
MSE         SSE divided by the number of training patterns
SSEPU       SSE divided by the number of output neurons
CYCLES      Number of cycles passed
PAT         Number of patterns in the current pattern set
EXIT_CODE   Exit status of an external call
SIGNAL      Integer value of a caught signal during execution

There is a total of eight batch programs in GEMNET II for the development of the eight RBF networks. These programs are very similar to each other and generally follow the same steps:

1. Load the untrained network file and the training and testing pattern files
2. Initialise the network using Kohonen learning
3. Write the initialised network to a file
4. Train the network's hidden-output layer weights
5. Train the network's hidden units' bias
6. Write the trained network to a file
7. Test the network using the test pattern file and write the results to a file

BATCHMAN is called from the data processing and control module using the scripts and a name for the training log file. BATCHMAN runs the scripts and writes all the messages during the steps described above to the log file. After all eight scripts have been executed, control is passed back to the data processing and control module. The user can open the log files with a text editor to get more information about any possible problems as well as the training and validation errors. From the execution of each script the following files are created:

ini.net:   initialised topology   (e.g. eastini.net)
tr.net:    trained topology       (e.g. northtr.net)
.log:      training log file      (e.g. east.log)
.res:      results of testing     (e.g. east.res)

The other SNNS tool used in GEMNET II is the network compiler, SNNS2C. This tool compiles a network file into C source code. There are limitations as to the network types and other SNNS features supported by SNNS2C, but fortunately none of them causes any problems for the GEMNET II modular network architecture; SNNS2C supports all the necessary features for GEMNET II. The input to SNNS2C is the trained network file as created by BATCHMAN after executing the batch scripts of GEMNET II. SNNS2C generates ANSI-C source code and header files. The generated code is compiled separately. The header files are linked to the data processing and control module. This way the produced network C functions are linked to GEMNET II and can be called during grade estimation. During network compilation SNNS2C goes through the following steps:


1. Network loading: the network file is loaded using a function from the simulator kernel.
2. Dividing the network into layers: individual units are grouped into layers with the same type and activation function.
3. Layer sorting: the layers are sorted in topological order.
4. Network writing: the generated network structure, activation functions and pattern propagation code are written to the C source file.

Altogether, SNNS proved to be very useful for the development of neural networks in GEMNET II. The flexibility provided by the batch execution language and the very large library of network types, activation functions, and learning algorithms provided by the simulator kernel allowed quick and easy testing of different learning strategies and network architectures. It would be very time consuming, if not impossible, to do the same development and testing without the simulator, using hard-coded neural networks and learning algorithms.

7.4.2 Interface with VULCAN – 3D Visualization

Grade estimation is part of a much larger process that involves other tasks such as geological modelling and reserves estimation. In order to exploit the full potential of GEMNET II, it has to be integrated in this larger process of mineral deposit evaluation [39]. This was achieved using VULCAN, one of the leading earth resources modelling packages available for the mining industry. VULCAN is a modular package, i.e. it consists of a core module (VULCAN Modeller) and a number of specialised modules like the MineModellers, GeoModellers, SurveyModeller, and Chronos (scheduler) (Fig. 7.11). VULCAN can be customised to include the functionality required by specific projects, and for that reason this system has all the necessary features that allow third party software to be interfaced to it. VULCAN's user interface, Envisage, is an advanced 3D modelling environment that provides 3D CAD and visualisation as well as triangulation modelling, grid mesh modelling, and contouring [57]. VULCAN's GeoModellers provide functions for drilling, borehole visualisation, channel sampling, geological modelling, geostatistics, block and grid modelling, stratigraphic modelling, and other tasks. For geostatistics, the GeostatModeller can be interfaced to the GSLIB, Geostokos, and ISATIS geostatistical packages. Block models can be visualised in 3D and manipulated in many different ways. GEMNET II relies on the importing and exporting functions available for block models in VULCAN as well as the drillhole compositing functions. Envisage provides customised user menus, i.e. users can create their own menus that look and act exactly like the rest of the GUI and provide the functions that the user wants. These functions can be directly linked to a Perl script (VULCAN's supported scripting language), which means that users can add functionality to the system. GEMNET II is interfaced to VULCAN by a number of scripts written in Perl and utilising the extensions for VULCAN, called Lava.


Figure 7.11: Modules and extensions of VULCAN.

The structure of the user interface is shown in Fig. 7.12. The menu for GEMNET II includes options for setting the estimation parameters, network topologies and learning, and validation.


Figure 7.12: Menu structure of GEMNET II in Envisage.

There is a main menu and two sub-menus for the setup and validation. All options lead to panels that accept user input from the keyboard. These panels (Fig. 7.13) access the options available with GEMNET II and allow the user to do the following things:

1. Select the samples file and block model
2. Modify the learning method and network topologies
3. Run GEMNET II with the saved specifications
4. Display the block model using the reliability indicator or the module index
5. Display the input samples and RBF centres in the correct input space

GEMNET II also requires functions already built into Envisage. These include:

1. Drillhole compositing
2. Block model ASCII import/export functions
3. Block model display functions

Figure 7.13: GEMNET II panels in Envisage.

After the user selects the input and output files for the estimation process, GEMNET II can start the network development. The data processing and control module is called using the Run option from the main menu. A console window is opened and GEMNET II begins with the processing of the samples and the generation of the training pattern files (Fig. 7.14).


Figure 7.14: Console window with messages from GEMNET II operation.

The data processing and control module continues its operation in the background while the user carries on using Envisage. Once the network development is complete and the networks are compiled, grade estimation takes place. The results are written to a file selected by the user. This file can then be imported into the block model. The user can then validate the estimation process using the tools described and compare the results with other studies using geostatistics within the Envisage environment. VULCAN's online help is based on a web browser and HTML files for each and every option. A number of pages were added to provide help for GEMNET II. The help is context based, i.e. it depends on the function that the user is trying to access (Fig. 7.15).


Figure 7.15: GEMNET II online help.

The system operates in a very similar manner to other functions in Envisage, which means that users can become familiar with GEMNET II in a very short period of time.

7.5 Conclusions

In this chapter an in-depth discussion was given of GEMNET II, the integrated system for grade estimation based on artificial neural networks. The benefits of the approach were explained, in particular the advantages of the integration with the neural network simulator, SNNS, and the resources modelling package, VULCAN. Even though GEMNET II is based on the basic MNNS architecture described in the previous chapter, there are many improvements that make GEMNET II a much more usable system.


The system has many advanced features that could establish it as a commercial product. It provides validation tools that can help build confidence in the estimates while it removes most of the problems found in other grade estimation techniques. GEMNET II makes very few assumptions about the grade distribution. Its operation does not depend on the user's knowledge of geology, geostatistics, or even neural networks. It should be noted, though, that knowledge of neural networks can sometimes improve the results, but not significantly. Generally, the system adjusts to the data presented to it to achieve the best possible estimation. Even though it is based on artificial neural networks, GEMNET II is not a 'black box' approach. The technique is fairly understandable as it is based on established principles of grade spatial behaviour. The validation tools provided with GEMNET II and the exhaustive monitoring of the network development also help the user to understand how it works and why. In the next chapter the validity of the approach is demonstrated through a number of case studies using real 3D data from different deposits around the world.


8. GEMNET II Application – Case Studies

8.1 Overview

The case studies presented in this chapter are the final tests of the GEMNET II architecture. Their purpose was to demonstrate the full potential of the approach and provide a complete comparison with other estimation techniques. They are presented in order of increasing complexity and difficulty: the number of available samples increases, as does the structural complexity of the deposits. The data used in these case studies come from real deposits. In some of them the 3D co-ordinates of the samples have been changed, without affecting their relative locations, for confidentiality purposes. The number of case studies was limited to four, as in the previous chapter. The selected case studies are the most representative of GEMNET II performance while being quite different from each other. These studies are also ideal for geostatistics and in fact have been used for demonstrating grade/reserves estimation using computer software. However, no results have ever been published using these data other than the papers written by the author during this project.

The deposits in the four case studies that follow present a complex 3D structure. They all come with a complex geological model, which is used for constraining the estimation process. The geological model in some cases is further complicated by the presence of faults and other discontinuities. This factor makes grade estimation an even more challenging task. In all of the case studies, a complete geostatistical study has been performed, the results of which are presented in this chapter together with the study of the GEMNET II application. Unfortunately, the author was not able to get authorisation for publishing results from case studies other than copper/gold deposits. There seems to be an abundance of real copper/gold data available from fully exploited or undeveloped deposits. The same does not apply to other metals and minerals.

The four copper/gold deposits used for testing the estimation performance of GEMNET II have very little in common. Apart from the type and possibly the way they have been formed, these deposits present a very different 3D picture and a very different estimation task. Their size and geometry varies significantly, as does the grade distribution suggested by the available samples. The number of available samples for each of the four deposits varies considerably. The drilling geometry is also different, as is the assaying procedure. These differences ensured that GEMNET II would be tested on very different conditions and data and that the results would reflect its performance over a wide range of problems. Table 8.1 gives the main characteristics of the four deposits used in this chapter. The data from them are given in the accompanying CD-ROM.

Table 8.1: Main characteristics of the four deposits used for testing the final GEMNET II architecture.

                      MAC_DEMO   THOR    SME      GEOST_GOLD
Number of Samples     1361       3612    10,656   30,211
Estimated Grades      Au, Cu     Au      Cu       Au
Number of Orebodies   1          4       5        1

As can be seen from the table, the deposits are given code names. These names are used as a replacement for their original names and locations for confidentiality purposes. The same computer system has been used for all four case studies: a Pentium II 300MHz with 128Mb RAM and 1Gb of virtual memory space running under Microsoft Windows NT 4.0. The time required to complete each case study has been affected by the specifications of the system and therefore comparisons with other similar studies should not be made unless these specifications are the same. The geostatistical studies were also performed using the same computer. GEMNET II was run from VULCAN/Envisage version 3.3. Geostatistics were also run from the same environment, using GSLIB. Therefore the same computational overhead from VULCAN was present while the various approaches were tested.

The measures of performance for the three approaches compared were the mean absolute error, the data fit diagram (scatter plot), and the estimated vs. actual grade distribution diagram. These performance measures were based on samples taken out of each dataset that were not provided as input information for any of the three techniques. In other words, these were unknown samples for the estimators but not for the performance measures. This was considered by the author as a more objective way of comparing the various techniques, as the actual values of those samples were known, as opposed to block estimates of unknown actual grade. For GEMNET II, the reliability indicator values are also shown in slices through the estimated block model. It should be noted again that the reliability indicator is only a guide to the quality of the produced estimates from GEMNET II and not a precise performance measure. For the case studies where the deposit consists of more than one orebody, the samples were split into groups, one for each orebody. The same was applied to the block model. In each run only data from inside an orebody are used and only blocks inside the same orebody are estimated. As a result, two of the case studies (THOR and SME) are much more complicated and took a lot more time to complete.


Finally, the basis of comparison for the various approaches was the results on the test set for GEMNET II, as described in Chapter 7, and cross-validation results for inverse distance and kriging on the same test set. Cross-validation was again performed using GSLIB inside VULCAN.

8.2 Case Study 1 – Copper/Gold Deposit 1

The dataset from the first copper/gold deposit consists of 44 drillholes containing 1361 samples in total. Of these, only 227 samples lie within the geological model of the orebody, as can be seen in Fig. 8.1 below. These samples are used for the estimation process.

Figure 8.1: Orebody and drillholes from copper/gold deposit 1.


The number of samples is quite small and the 3D model of the orebody fairly simple, making this case study a relatively easy task. The following table gives the statistics for the data used in this case study:

Table 8.2: Statistics of data from copper/gold deposit 1.

     Number of Samples   Average Grade   Standard Deviation   Coefficient of Variance   Number of Estimated Blocks
Au   227                 2.34 g/t        3.1758               1.0339                    5,698
Cu   227                 4.01 %          4.7184               0.9424                    5,698

The co-ordinates of the samples have been transformed from their original values for confidentiality purposes. The relative positions of the samples have not been changed. The data processing and control module of GEMNET II generated 1109, 2156, and 1905 patterns in the west-east, north-south, and upper-lower sectors respectively.
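The sector-based pattern generation can be pictured with a small C fragment (a hypothetical rule for illustration only: GEMNET II's actual sector construction and 3D search are not reproduced here). The offset from an estimation point to a neighbouring sample is assigned to one of the three sector pairs according to its dominant component.

#include <math.h>

typedef enum { WEST_EAST, NORTH_SOUTH, UPPER_LOWER } SectorPair;

/* Assign a point-to-sample offset to a sector pair by its dominant
   axis. Hypothetical rule for illustration; not the GEMNET II
   implementation. */
SectorPair classify_offset(double dx, double dy, double dz)
{
    double ax = fabs(dx), ay = fabs(dy), az = fabs(dz);
    if (ax >= ay && ax >= az) return WEST_EAST;
    if (ay >= ax && ay >= az) return NORTH_SOUTH;
    return UPPER_LOWER;
}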

Figure 8.2: Scatter diagram of actual vs. estimated copper grades from copper/gold deposit 1.


First the three methods were tested using the copper grade data. The mean absolute errors produced were 18.9% for GEMNET II, 20.06% for inverse distance squared, and 19.68% for spherical kriging. Clearly, there was not much difference in this case between the different approaches. The data fit diagram of Fig. 8.2 shows exactly how close they were.

Figure 8.3: Copper grade distributions from copper/gold deposit 1.

Unlike the absolute errors, which suggest that GEMNET II is doing slightly better than kriging and inverse distance, the estimated distributions shown in Fig. 8.3 show that inverse distance follows the actual distribution of copper grades more closely, with GEMNET II and kriging presenting very similar distributions. GEMNET II tends to underestimate high-grade samples, but the overall estimation is not biased. On the other hand, inverse distance seems to overestimate low-grade samples. The time requirements for the application of the three methods were quite different, even though geostatistics were fairly straightforward in this case. GEMNET II required 50 minutes to process the samples and block model centroids, develop the networks and perform grade estimation. The geostatistical study required about 3 hours to complete. The time spent for grade estimation using inverse distance and kriging, once the geostatistical study was complete, was about 15 minutes. Even though there is a difference, this study is not ideal for demonstrating the benefits from the speed of GEMNET II application. The difference in time requirements between geostatistics and GEMNET II will be demonstrated in the following case studies, which present a much more complicated structural picture. In the second part of the study, the techniques were tested using gold grades from the same samples. The time requirements were identical to the first part. The errors produced were quite similar as well: 18.78% for GEMNET II, 22.47% for inverse distance squared, and 20.47% for spherical kriging. Figure 8.4 shows the data fit diagram of the estimates and Figure 8.5 the estimated and actual gold grade distributions.

Figure 8.4: Scatter diagram of actual vs. estimated gold grades from copper/gold deposit 1.


Figure 8.5: Gold grade distributions from copper/gold deposit 1.

Quite clearly, GEMNET II tends to underestimate high-grade samples once again, even though this time it does slightly better than in the case of copper grades. Generally, the behaviour of the three estimators is very similar for both estimated grades, copper and gold. The following table shows the actual and estimated average grades for gold and copper.

Table 8.3: Actual and estimated average copper and gold grades from copper/gold deposit 1.

           Actual   ID2    Kriging   GEMNet II
Au (g/t)   2.34     2.54   2.26      1.96
Cu (%)     4.01     3.69   3.72      3.41

The estimation performance of GEMNET II can be monitored through the validation tools developed within VULCAN. These are mainly the reliability indicator and the module index. The RBF centers visualization tool also provides some insight into the process of neural network development for grade estimation in GEMNET II. The following figures illustrate sections through the block model of the copper/gold deposit in this case study coloured according to the reliability indicator (Fig. 8.6) and the module index (Fig. 8.7). Figure 8.8 also illustrates the positions of RBF centers from various networks in their respective input pattern space.

Figure 8.6: Plan section (top) and cross section (bottom) of block model coloured by reliability indicator values for the gold grade estimation of copper/gold deposit 1.


Figure 8.7: Plan section (top) and cross section (bottom) of block model coloured by module index for gold and copper grade estimation of copper/gold deposit 1.

From sections like those in Fig. 8.6 one can identify areas where the estimation process with GEMNET II is problematic. These areas are usually close to the edges of the modelled orebody or around faults and other discontinuities. In this case, the low reliability area is indicated at the middle part of the orebody. This was expected before the estimation process due to a dyke that intersects the orebody at exactly that location. The sections of Fig. 8.7 show which module is responsible for providing the estimate and can help optimise the estimation process in conjunction with the reliability indicator sections.


Figure 8.8: RBF centers locations and training patterns from module 1 networks, north (top) and east (bottom).


The visualization of RBF centers from various networks can help in understanding how the system performs grade estimation and, in particular, how it clusters the training patterns. A good spread of the centers in the input space, as in this case study, means that the neural network development is responding properly to the data at hand.
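The mechanism behind these centres can be restated in a few lines of C (an illustrative sketch only: the width parameterisation below is a common textbook form, whereas the generated SNNS code in Appendix A4 uses a bias parameter in a comparable role). Each RBF unit responds to a pattern according to the squared distance of the input from the unit's centre, passed through a Gaussian, so well spread centres give the network coverage of the whole input space.

#include <math.h>

/* Gaussian response of one RBF unit to an input pattern. 'center'
   holds the unit's centre and 'width' its spread. Illustrative
   parameterisation only (cf. the generated code in Appendix A4). */
double rbf_gaussian(const double *input, const double *center,
                    int dim, double width)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < dim; i++) {
        double diff = input[i] - center[i];
        sum += diff * diff;           /* squared distance to the centre */
    }
    return exp(-sum / (2.0 * width * width));
}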

Figure 8.9: Plan section (top) and cross section (bottom) of block model coloured by gold grade estimates for the copper/gold deposit 1.


The results from grade estimation are shown in Fig. 8.9 as sections through the estimated block model. It should be noted that the real grade values for the blocks are unknown and it is therefore not possible to compare the estimated values with actual ones. It is also of little use to compare the block estimates from the three approaches with each other.

8.3 Case Study 2 – Copper/Gold Deposit 2

The dataset of this case study is a superset of that used in the third case study described in Chapter 6. It is a public domain set from a large undeveloped copper/gold deposit. It consists of four orebodies, as shown in Fig. 8.10. These orebodies occur in the form of chains of lenses (fractions of the deposit) developed along shear fractures in metasomatised host rocks, which include gneissic granites, mica schists and metasomatites. The set contains 77 drillholes providing a total of 3600 observations on lithology, bleaching, structure and assays. Figure 8.10 shows the drillholes together with the lenses in the area. The networks in GEMNET II are trained and tested on each lens individually, i.e. only samples inside the volume of a single lens are used to train and test the networks each time.

Figure 8.10: Orebodies and drillholes from copper/gold deposit 2.


The data processing and control module searched the dataset for each orebody and formed training patterns, the number of which varies from one orebody to the next. The results of the training pattern generation as well as other information about the data used are shown in Table 8.4 below:

Table 8.4: Samples and block model file information and training pattern generation results for copper/gold deposit 2.

Orebody            TQ1      TQ1A    TQ2     TQ3
Samples Included   689      382     133     484
Blocks Included    8,003    8,188   2,040   16,596
Patterns – WE      38,023   3,086   649     7,842
Patterns – NS      16,117   2,829   283     99
Patterns – UL      9,514    7,182   897     12,342

It becomes clear from the table above that the higher the number of available drillhole samples, the higher the number of training patterns produced. This, however, also depends on the sampling geometry and can vary between sectors. In the case of orebody TQ1, for example, and specifically for the west-east sector, the number of generated training patterns is fairly high (38,023). This inevitably leads to longer training times for the networks concerned. In fact, in some cases the time requirements are so high that it is practically impossible to train the networks with all of the available patterns. This is the reason why the data are filtered by a distance criterion, i.e. a percentage of the maximum distance between samples. This criterion, as introduced in the previous chapter, has nothing in common with the structural analysis and the range values in variography. The maximum distance ranges in GEMNET II are set only to limit the number of training patterns per network, depending on the hardware limitations; a minimal sketch of such a filter is given below.
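The effect of the criterion can be shown with a short C fragment (hypothetical code, not the GEMNET II implementation): a candidate pair of samples is kept as a training pattern only if its separation is within a chosen percentage of the maximum distance between samples.

#include <math.h>

typedef struct { double x, y, z; } Point;

/* Euclidean distance between two sample locations. */
static double dist(Point a, Point b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Keep a candidate training pattern only if the pair separation is
   within 'pct' percent of the maximum inter-sample distance.
   Illustrative only; GEMNET II's own filter is not reproduced. */
int keep_pattern(Point a, Point b, double max_dist, double pct)
{
    return dist(a, b) <= (pct / 100.0) * max_dist;
}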

GEMNET II was applied to each orebody individually. The required development and application time varied between orebodies, as did the mean absolute errors produced on the test set. The following table shows statistical information on the four orebodies as well as the estimation performance results from the three estimators.

Table 8.5: Statistics from copper/gold deposit 2 and estimation performance results.

Orebody                   TQ1          TQ1A         TQ2          TQ3
Coefficient of Variance   1.0612       1.0612       1.0615       1.0611
Actual Avg. Grade         0.9109 g/t   0.9272 g/t   0.7339 g/t   1.1354 g/t
ID2 Avg. Grade            0.8571 g/t   0.8610 g/t   0.6843 g/t   1.0719 g/t
Kriging Avg. Grade        0.8577 g/t   0.8683 g/t   0.6794 g/t   1.0587 g/t
GEMNET II Avg. Grade      0.8374 g/t   0.8273 g/t   0.6271 g/t   1.0245 g/t
ID2 ABS %                 22.40 %      20.68 %      31.69 %      19.85 %
Kriging ABS %             18.61 %      16.92 %      25.30 %      17.83 %
GEMNET II ABS %           15.64 %      16.73 %      25.51 %      14.92 %

Grade estimation by GEMNET II took over one hour for the first orebody and similar times for the other three. The time spent on the geostatistical study is more difficult to quantify, as the author spent days completing the variography and performing kriging and inverse distance. The geostatistical study was carried out once for the entire deposit. The results of estimation from the three methods are shown in the following figures (Figs. 8.11 to 8.18).


Figure 8.11: Scatter diagram of actual vs. estimated gold grades from zone TQ1 of copper/gold deposit 2.

Figure 8.12: Gold grade distributions from zone TQ1 of copper/gold deposit 2.


All three methods perform well. Inverse distance produces a very smooth distribution of grades, while kriging and GEMNET II follow the peaks a bit better. GEMNET II also tends to underestimate high-grade samples, something that has been quite consistent throughout the various studies.

Figure 8.13: Scatter diagram of actual vs. estimated gold grades from zone TQ1A of copper/gold deposit 2.

Figure 8.14: Gold grade distributions from zone TQ1A of copper/gold deposit 2.


Figure 8.15: Scatter diagram of actual vs. estimated gold grades from zone TQ2 of copper/gold deposit 2.

Figure 8.16: Gold grade distributions from zone TQ2 of copper/gold deposit 2.

In the TQ2 zone, GEMNET II presents severe underestimation of high-grade samples and overestimation of average-grade samples. Inverse distance fails to follow the actual distribution, while kriging seems to perform better overall. This zone is quite different from the other three in that it has very few samples and a low average grade.

Figure 8.17: Scatter diagram of actual vs. estimated gold grades from zone TQ3 of copper/gold deposit 2.

Figure 8.18: Gold grade distributions from zone TQ3 of copper/gold deposit 2.


It is quite notable from the diagrams presented that all four zones seem to have a very difficult distribution of gold grades. The distribution graphs are all split into two areas, with a very low population around 1.2 g/t. This could be due to the geological modelling, as the samples are selected within the modelled orebodies. If these extend between two or more actual geological zones, then each of the four datasets could include samples from two or more different populations. As a result, the expected good performance from all three approaches is not realised and the absolute errors produced are quite high. The estimation process with GEMNET II was validated using the same tools as in the previous case study. The following figures show slices through the block model coloured according to the reliability indicator, module index, and estimated grades. Screenshots from the RBF centre location tool are also shown. It should be noted that the block model shown includes all four zones, which can be identified by the sub-blocking.


Figure 8.19: Plan section (top) and cross section (bottom) of block model coloured by reliability indicator values for the gold grade estimation of copper/gold deposit 2.


Figure 8.20: Plan section (top) and cross section (bottom) of block model coloured by module index for gold and copper grade estimation of copper/gold deposit 2.


Figure 8.21: RBF centers locations and training patterns from module 1 network north (top) and module 2 network (bottom) in copper/gold deposit 2.


Figure 8.22: Plan section (top) and cross section (bottom) of block model coloured by gold grade estimates for the copper/gold deposit 2.


The results from grade estimation are shown in Fig. 8.22 as sections through the estimated block model. Once again, the real grade values for the blocks are unknown and it is therefore not possible to compare the estimated values with actual ones. The time requirements for the three approaches were significantly different in this case study. The complete geostatistical study, including all four zones, lasted over a week. Much of this time was spent driving the software and examining the results. On the other hand, GEMNET II required a total of about eight hours to develop the networks and complete the grade estimation. Quite clearly, the advantage of GEMNET II in time requirements is significant and, more importantly, the results from GEMNET II did not depend on the author's knowledge of the given dataset. This case study has also shown the importance of geological modelling in the process of grade estimation. If the samples selected as input information for the estimation process are not part of the same geological domain, none of the techniques will be able to perform well. GEMNET II is not meant to replace the very important stage of geological modelling.

8.4 Case Study 3 – Copper/Gold Deposit 3

This case study is very similar to the previous one in that the deposit consists of several (five) orebodies. These orebodies come in the form of almost parallel veins. The models of the orebodies were constructed in VULCAN as part of a geological study. The five orebodies and the associated drillholes are shown in Fig. 8.23.


Figure 8.23: Plan and side views of copper/gold deposit 3 orebodies. Drillholes and extents of block model are also shown.

The pattern generation process for the development of the neural networks had to be adjusted for the elongated shape of the orebodies and the drilling scheme. Specifically, the patterns for module two networks in the east-west direction had to be limited to those within 10% of the maximum distance between samples. This was necessary, as the total number of possible patterns was too high for the hardware-software combination (more than 100,000 patterns). The following table gives information about the pattern generation process for the five zones.


Table 8.6: Samples and block model file information and training pattern generation results for copper/gold deposit 3.

Zone          TQ1     TQ1A    TQ3     TQ4     TQ7
Samples       1,912   829     1,144   534     330
Blocks        4,280   2,425   4,291   344     2,254
Patterns WE   6,018   1,013   100     94      744
Patterns NS   9,244   1,573   239     335     348
Patterns UL   4,945   1,865   4,908   1,288   585

All three methods were tested on the copper grade data from the five zones. It was not possible to test their performance on gold grades due to problems with the specific drillhole database. In the graphs that follow, the data fit and estimated distributions are shown as before. The output of the validation tools for GEMNET II is given at the end of the case study and for the entire block model. The results from the five zones are given in the following table. Again, the geostatistical study for the entire deposit took at least a week to complete, while GEMNET II required about 12 hours to complete the estimation of copper grades.

Table 8.7: Statistics from copper/gold deposit 3 and estimation performance results.

Orebody                  TQ1       TQ1A      TQ3       TQ4       TQ7
Actual Avg. Cu Grade     1.0187    1.0309    1.1006    0.7798    0.6468
ID2 Avg. Cu Grade        0.9963    1.0327    1.0451    0.7580    0.6083
Kriging Avg. Cu Grade    1.0096    0.9623    1.0540    0.7487    0.5744
GEMNET II Avg. Cu Grade  1.0196    0.9498    1.0072    0.7654    0.6594
ID2 ABS % (Cu)           17.96 %   16.12 %   17.70 %   23.64 %   16.23 %
Kriging ABS % (Cu)       14.73 %   15.00 %   14.77 %   14.74 %   14.61 %
GEMNET II ABS % (Cu)     16.32 %   17.16 %   14.67 %   12.80 %   12.39 %


From the above table it is clear that the three techniques perform better than in the previous case study, even though there are certain similarities between the two, especially in the geological modelling and drilling scheme. The improvement in performance can be associated with a much better geological model, which means better separation of the sample groups between the five zones. The following graphs and slices through the deposit's block model are grouped by zone, starting with zone TQ1. As before, the data fit and distribution graphs are given first, followed by the block model slices showing the reliability indicator, module index, and estimated grade values.

Figure 8.24: Scatter diagram of actual vs. estimated copper grades from zone TQ1 of copper/gold deposit 3.


Figure 8.25: Copper grade distributions from zone TQ1 of copper/gold deposit 3.

Figure 8.26: Scatter diagram of actual vs. estimated copper grades from zone TQ1A of copper/gold deposit 3.


Figure 8.27: Copper grade distributions from zone TQ1A of copper/gold deposit 3.

Figure 8.28: Scatter diagram of actual vs. estimated copper grades from zone TQ3 of copper/gold deposit 3.


Figure 8.29: Copper grade distributions from zone TQ3 of copper/gold deposit 3.

Figure 8.30: Scatter diagram of actual vs. estimated copper grades from zone TQ4 of copper/gold deposit 3.


Figure 8.31: Copper grade distributions from zone TQ4 of copper/gold deposit 3.

Figure 8.32: Scatter diagram of actual vs. estimated copper grades from zone TQ7 of copper/gold deposit 3.


Figure 8.33: Copper grade distributions from zone TQ7 of copper/gold deposit 3.

From the above graphs it is clear that GEMNET II tends to underestimate high-grade samples in the three high-grade zones (TQ1, TQ1A, TQ3), while it shows some overestimation of the low-grade samples in the low-grade zone TQ7. Its performance is consistent through the rest of the distribution. Generally, it appears to be less affected by high-grade samples than the other two techniques, which can prove very useful. GEMNET II is meant to be a robust technique that can accept data of unknown quality and still provide sensible results. Its performance is verified by the absolute errors, which were always close to, if not better than, those of kriging. The underestimation of high-grade samples can also be explained by the geometry of the zones. This geometry controls the number of samples being selected as neighbours for the networks of module two in GEMNET II. As the zones are fairly narrow and long, some of the sectors are consistently empty and the network of module one provides the estimate for those. This network, as was explained before, tends to give estimates close to the overall average grade, hence the underestimation of high-grade samples. Inverse distance weighting performed exceptionally well in this case study compared to both kriging and GEMNET II, considering how simple the method really is (see the sketch below). However, it benefited from a complete geostatistical study that improved the search method for the sample selection. The performance of kriging was once again very good, as in the previous studies.
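The simplicity of the method is easy to appreciate from a sketch of inverse distance squared weighting in C (illustrative code only; the case studies themselves used the VULCAN/GSLIB implementations): each neighbouring sample contributes to the estimate with a weight proportional to the inverse of its squared distance.

/* Inverse distance squared estimate at (x, y, z) from n samples.
   Illustrative only -- the studies in this chapter used the
   VULCAN/GSLIB implementations. */
double id2_estimate(const double *sx, const double *sy, const double *sz,
                    const double *grade, int n,
                    double x, double y, double z)
{
    double num = 0.0, den = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        double dx = sx[i] - x, dy = sy[i] - y, dz = sz[i] - z;
        double d2 = dx * dx + dy * dy + dz * dz;
        if (d2 == 0.0)
            return grade[i];   /* the point coincides with a sample */
        num += grade[i] / d2;
        den += 1.0 / d2;
    }
    return num / den;
}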

The following figures show slices through the block model of copper/gold deposit 3 coloured by the reliability indicator, module index, and estimated copper grade values.

Figure 8.34: Plan section (top) and cross section (bottom) of block model coloured by reliability indicator values for the copper grade estimates of copper/gold deposit 3.


It should be noted that the block model was modified to reflect the geological environment of the deposit as modelled by the geologist (not the author!). For this reason there are blocks that have been deleted from the model, as shown in the figures. The block model consists of major blocks and sub-blocks inside them that better follow the zones and other geological entities. As the estimation only takes place inside the zones, the blocks outside them retain their default values, which is why the majority of the blocks in the slices have the same colour.

Figure 8.35: Plan section (top) and cross section (bottom) of block model coloured by module index values for the copper grade estimates of copper/gold deposit 3.

As explained before, the zones are very narrow and the system chooses the module one network (red blocks) in most cases. In the cases where module two networks are also used, the reliability indicator shows disagreement between the individual estimates. This is caused mainly by the module one network, which still contributes to the final estimate by filling empty sectors.
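The kind of disagreement described above can be pictured with a toy measure (an assumption made purely for illustration; GEMNET II's actual reliability indicator is not reproduced here): the spread of the individual module estimates around their mean, with a larger spread meaning lower reliability.

#include <math.h>

/* Toy disagreement measure over the individual module estimates for
   one block: the standard deviation of the estimates around their
   mean. A large value flags blocks where the modules disagree. An
   illustrative stand-in, not GEMNET II's reliability indicator. */
double estimate_spread(const double *estimates, int n)
{
    double mean = 0.0, var = 0.0;
    int i;
    for (i = 0; i < n; i++)
        mean += estimates[i];
    mean /= n;
    for (i = 0; i < n; i++) {
        double d = estimates[i] - mean;
        var += d * d;
    }
    return sqrt(var / n);
}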

Figure 8.36: Plan section (top) and cross section (bottom) of block model coloured by copper grade estimates for the copper/gold deposit 3.

8.5 Case Study 4 – Copper/Gold Deposit 4

The final case study for GEMNET II tested its limits in terms of speed and computational overhead. The dataset used is relatively large, at least for a case study (over 30,000 samples!). It is very different from the previous three, not only in the number of samples but also in the sampling density and the complexity of the orebody. As shown in Fig. 8.37, it is a massive copper/gold deposit that has undergone an extensive exploration programme.


Figure 8.37: Orebody, drillholes, and block model extents from copper/gold deposit 4.

The dataset used in this case study consists of only the underground drillholes, as these intersect the orebody. There are over 300 underground drillholes from existing underground workings that were used to delineate the orebody and prove the reserves. As expected, this was the longest case study in terms of the time required by GEMNET II to complete the estimation process. More specifically, GEMNET II required a total of 14 hours for the entire process. It is quite interesting, though, to give the breakdown of this time across the various processes involved. The generation of training patterns for the neural networks took most of this time (10 hours!). Processing of the block model required around one and a half hours, and the actual estimation process only 30 minutes. The time requirements in this case study are very similar to those of other neural network applications that involve large amounts of data. The author did not perform the geostatistical study. It was performed by a geologist at Maptek who has definitely done a better job than the author could have done himself. Unfortunately there is no information on the time spent during the geostatistical study, but the author believes that it would be at least a matter of days. The following table summarises the results from the application of all three techniques to the data of this case study. Once again, it should be noted that inverse distance weighting benefited from the geostatistical study, which improved significantly the results obtained with this technique.

Table 8.8: Summary of estimation results from copper/gold deposit 4.

                             Average Grade   ABS Error %
Actual                       4.1316          -
Inverse Distance Weighting   3.9264          19.78 %
Kriging                      3.9014          14.46 %
GEMNet II                    3.8907          15.04 %

The performance of the three estimators becomes clearer by examining the data fit and distribution graphs given in the following figures. All three techniques performed well.


Figure 8.38: Scatter diagram of actual vs. estimated gold grades from copper/gold deposit 4.

Figure 8.39: Gold grade distributions from copper/gold deposit 4.


Quite clearly, GEMNET II performs well on low and average grades, with some underestimation of high-grade samples, which, however, is less pronounced than in the previous cases. Unlike the previous case studies, slices through the block model are given in 3D view, illustrating the capabilities of the graphical environment (Envisage) and the benefits from the integration of GEMNET II. As usual, the block model slices are coloured according to the reliability indicator, module index, and estimated gold grade values.

Figure 8.40: 3D view of sections through the block model coloured by the reliability indicator values from copper/gold deposit 4. Orebody model is also shown.


Figure 8.41: 3D view of block model sections coloured by module index values from copper/gold deposit 4.

Figure 8.42: 3D view of block model sections coloured by estimated gold grades from copper/gold deposit 4.


8.6 Conclusions

The case studies in this chapter have demonstrated the use of GEMNET II as an integrated grade estimation system. The case studies were presented in order of increasing complexity and were chosen to illustrate the usability as well as the performance of GEMNET II. They were also chosen to provide the basis for comparison with the two most popular established estimation techniques, inverse distance weighting and kriging. The four case studies were completed without major problems, especially in the application of GEMNET II. The author did not use any additional information or knowledge for its application other than the grades provided by the drillhole samples. The same cannot be said for the other two methods, which required a geostatistical study that normally took days to complete. On the other hand, the author did make use of the validation tools developed for GEMNET II to examine the system's operation and draw conclusions as to potential problems. These validation tools were used to fine-tune the modular neural network architecture that comprises the core of GEMNET II. The benefits of integration with VULCAN, the resource-modelling package, were also demonstrated. The advanced graphical functions provided by its graphical environment, Envisage, allowed the visualisation of the results from GEMNET II and the development and use of specialised validation tools. In all four case studies, GEMNET II performed very well, even in comparison with the other, already established techniques. The results obtained have shown that it is a reliable and fast grade estimation system. GEMNET II has shown its potential as a valid alternative that can handle large amounts of data quickly and without being prone to extreme values.


However, these case studies have clearly demonstrated that GEMNET II, like any other advanced grade estimation system, still depends on the results of geological modelling. Finally, it should be noted once again that the case studies presented have been limited by the fact that most of the data available to the author were confidential. GEMNET II has been applied with success to a number of other deposits, including potash, zinc, and iron ore deposits. Unfortunately, the author did not have the right to publish the results from these studies.


9. Conclusions and Further Research

9.1 Conclusions

Grade estimation is the most computationally intensive stage of a mineral deposit evaluation. It is also one of the most critical ones, as the results obtained at this stage will determine to a great extent the profitability of a mining project. In other words, decisions that involve large amounts of financial resources depend on the results of the grade estimation process. Grade estimation is mainly a process of interpolation from exploration data. There is a high cost associated with exploration in mineral deposits and, for this reason, the amount of available data is usually low in comparison to the area that must be estimated. Depending on the complexity of the given deposit and the required accuracy of the estimates, different techniques are currently used, the most advanced being those provided by the geostatistical methodology. These techniques have been developed to reflect the geological picture of the deposit in space. When effectively applied, they can provide very accurate results. However, the geostatistical methodology, being very complex, requires knowledge and expertise to be effectively applied. This knowledge and expertise is often not present and, in some cases, the people who apply geostatistics have insufficient experience in the field. As a result, the grade estimates produced are not accurate and the mining industry very often doubts the reliability of the method. It is generally accepted by geostatisticians that, given the same data, different people will almost certainly produce different estimates using geostatistics. Geostatistics is also based on assumptions about the distribution of grades, which in many cases are acceptable. There are, however, deposits where the required assumptions cannot be made.


Unfortunately, there are cases when people who apply geostatistics do not consider this fact. In some cases it is also very difficult to establish whether the required assumptions are valid for the given deposit. The above problems have led scientists to search for alternative methods. In recent years the application of Artificial Intelligence (AI) tools in the mining industry has become more common, especially in system control applications and decision-making. One of the most important AI tools, Artificial Neural Networks (ANNs), has been applied with success to problems that involve large amounts of data of unknown quality. ANNs are computing structures based on simplified models of biological neural systems such as the human brain. They develop solutions to problems by 'learning' a required response from examples. One of the problems for which ANNs are very successful in providing solutions is function approximation. Grade estimation can be considered a problem of approximating an unknown function from examples provided by exploration data. There are different ways of forming examples for training ANNs from exploration data. Examples usually come in the form of input-output patterns, with the output being the modelled variable. In the case of grade estimation, the inputs can be the sample co-ordinates in space, or other measurements, and the output is normally the grade. The choice of input parameters dictates the vector space in which the grade will be approximated. This choice is essential for the estimation process using ANNs. The choice of a type of ANN for grade estimation is limited to those following the supervised paradigm explained above. There are two main candidate types of ANNs for function approximation problems: the Multi-Layered Perceptron (MLP) and the Radial Basis Function network (RBF).


The RBF network seems to be a better choice for grade estimation, as it constructs local approximations as opposed to the global approximations of the MLP. It is generally accepted that grade is a localised variable and therefore RBF networks are well suited to its estimation. The objectives of the research presented in this thesis were to go a step further in the development of a neural network based estimation technique than other researchers have done in the past. More specifically, the developed ANN based system for grade estimation should be able to handle 3D exploration data from real deposits and perform estimation on a 3D block model basis. The estimation process itself should honour the distribution of grades in 3D space and take into account the spatial variability of grades in different directions in space. The developed system, GEMNET II, is integrated with one of the leading packages for earth resources modelling, VULCAN. The potential benefits of integration were exploited to the maximum extent. GEMNET II takes advantage of VULCAN's graphical environment and capabilities to present its estimation results in 3D. A number of validation tools measuring the reliability of the produced estimates, as well as showing useful information on the estimation process itself, have been developed using these graphical capabilities. As a result, GEMNET II is not just an ANN based interpolator but a complete system for grade estimation that can be integrated into the larger mine planning and design process. The reliability and estimation performance of GEMNET II has been verified by a number of case studies, some of them presented in this thesis. From the results obtained in most of these studies, and in comparison with results obtained using geostatistics, it becomes clear that GEMNET II is a valid alternative that turns the great potential of ANNs in the field of grade estimation into a complete system.
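As a concrete illustration of the input-output patterns described above, a drillhole sample can be turned into a supervised example whose inputs are the sample co-ordinates and whose output is the grade (a hypothetical C sketch: the structures, the choice of inputs and the scaling to the unit interval are assumptions made for illustration; the real pattern files are shown in Appendix A2).

typedef struct { double x, y, z, grade; } Sample;
typedef struct { double in[3]; double out; } Pattern;

/* Build one supervised training pattern from a drillhole sample.
   Inputs are co-ordinates scaled to [0, 1] over the deposit extents;
   the output is the grade scaled by the maximum observed grade.
   Hypothetical structures for illustration (cf. Appendix A2). */
Pattern make_pattern(Sample s,
                     double xmin, double xmax,
                     double ymin, double ymax,
                     double zmin, double zmax,
                     double gmax)
{
    Pattern p;
    p.in[0] = (s.x - xmin) / (xmax - xmin);
    p.in[1] = (s.y - ymin) / (ymax - ymin);
    p.in[2] = (s.z - zmin) / (zmax - zmin);
    p.out   = s.grade / gmax;
    return p;
}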


GEMNET II is user-independent, i.e. its results do not depend on user input or modifications of the estimation technique. This, however, does not mean that the user has no control over the estimation process, or that GEMNET II can override the geological modelling that should precede any grade estimation. GEMNET II still depends on the given data – in fact, its performance depends solely on them. The user can therefore improve its performance by controlling the data used for building the examples for training the networks in GEMNET II. The validation tools provided by the system can aid the user in this task by indicating areas where GEMNET II is facing difficulties in giving accurate results. The work presented in this thesis shows that ANNs can be used to develop solutions for grade estimation problems and that ANNs as approximators do not lack a mathematical background, a misconception held by many people in the mining industry. ANNs have a very rich theoretical background that spans many different scientific fields. Their application to a problem like grade estimation is not a 'black box' approach, i.e. their results and overall operation can be validated and justified. GEMNET II is a good example of how ANNs can be successfully used to develop a grade estimation solution.

9.2 Further Research

The field of Artificial Neural Networks is rapidly evolving. This means that there is an almost constant development of new architectures and learning algorithms. Therefore, there will always be new ANNs to try for the problem of grade estimation. Regarding GEMNET II, there have been many improvements to the standard Radial Basis Function network since the beginning of the work presented in this thesis. The most important ones concern the intelligent control of the number of RBFs required for a given problem as well as the design and adaptation of their shape.


There is no reason why the hyper-spherical shape of the RBFs should be ideal for all problems. Many researchers have tried other basis functions, such as rectangular basis functions. As the architecture of GEMNET II is modular, i.e. it consists of several neural networks, there will always be room for improvement. The number of networks, the 3D search method, and, most importantly, the way the individual estimates are used to form a single estimate can all be areas of further work. The author suggests that a more flexible search method that adapts to the given sampling geometry could improve the estimation performance as well as the speed of training pattern generation. A new search method would of course result in a change in the number of networks – a varying number of sectors leads to a varying number of networks trained on sector data. The way that individual estimates are used to generate a single estimate for each point can also be further optimised. In GEMNET II, an RBF network is responsible for averaging the individual estimates, but this is not necessarily the only way of achieving this. New networks can be tested and perhaps another solution can be found that is not based on ANNs (a simple possibility is sketched below). The effect of using various ANN modules on the block model estimates needs to be examined. The use of different ANNs for different blocks can be a source of inconsistencies in the estimates produced and can possibly introduce a bias. The author believes that using GEMNET II, and especially the validation tools provided, can help in investigating ways of improving the system. The integration with VULCAN can be taken even further. Direct access to the block model, and possibly allowing the use of grid models as the basis of grade estimation, would significantly increase the speed of the system.
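One simple non-ANN alternative for combining the individual estimates, of the kind suggested above, would be a weighted average, for example weighting each module's estimate by the number of samples that supported it (a hypothetical scheme sketched in C, not a tested part of GEMNET II).

/* Weighted average of per-module estimates, with weights such as the
   number of samples supporting each module. A hypothetical non-ANN
   combiner of the kind suggested in the text. */
double combine_estimates(const double *est, const double *weight, int n)
{
    double num = 0.0, den = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        num += weight[i] * est[i];
        den += weight[i];
    }
    return den > 0.0 ? num / den : 0.0;
}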


The use of a neural network simulator like SNNS helped in the development stages of GEMNET II. Once the architecture is finalised, there is no reason why the system should still depend on a simulator for the development of the neural networks. Including the network code in the core of the system would have significant effects on the speed of training and application of the networks. However, this should not be done at the expense of the flexibility to change critical parameters of the learning algorithm or the RBFs. The validation tools can be further developed to include options for more accurate measurement of the estimates' reliability, such as confidence intervals. An indication of when the networks are extrapolating would also be useful, to flag areas where the sampling is insufficient. The performance of GEMNET II in terms of the block model estimates needs to be investigated. As it is very difficult to obtain the actual block grades from real deposits, other cases should be examined, such as simulated deposits, in order to study the behaviour of the system while estimating volumes larger than those of drillhole samples. The effect of the sample support input to the system needs further investigation. Finally, it should be noted that a system such as GEMNET II, based on artificial neural networks, will require time to gain the acceptance of the mining industry. One should not forget how difficult it was, and how much time it took, for geostatistics to become established and widely used three decades ago. Allowing as many people as possible to experience the use of GEMNET II and draw their own conclusions is the only way to establish it as a valid alternative method for grade estimation and probably the best way towards further improvements.


Appendix A – File Structures

A1. SNNS Network Description File

SNNS network definition file V1.4-3D generated at Tue Sep 28 11:56:29 1999

network name : east
source files :
no. of units : 44
no. of connections : 160
no. of unit types : 0
no. of site types : 0

learning function : RadialBasisLearning
update function : Topological_Order

unit default section :

act | bias | st | subnet | layer | act func | out func
---------|----------|----|--------|-------|------------------|-------------
0.00000 | 0.00000 | h | 0 | 1 | Act_RBF_Gaussian | Out_Identity
---------|----------|----|--------|-------|------------------|-------------

unit definition section :

no. | typeName | unitName | act | bias | st | position | act func | out func | sites
----|----------|----------|---------|---------|----|----------|--------------|----------|-------
1 | | Grade | 0.02936 | 0.00000 | i | 2, 2,72 | Act_Identity | |
2 | | Distance | 0.44031 | 0.00000 | i | 3, 2,72 | Act_Identity | |
3 | | Length | 0.00646 | 0.00000 | i | 4, 2,72 | Act_Identity | |
4 | | c1 | 0.97190 | 0.93847 | h | 1, 7,68 | | |
5 | | c2 | 0.87191 | 0.85472 | h | 2, 7,68 | | |
6 | | c3 | 0.96511 | 0.70236 | h | 3, 7,68 | | |
7 | | c4 | 0.85429 | 0.89278 | h | 4, 7,68 | | |
8 | | c5 | 0.87341 | 0.91482 | h | 5, 7,68 | | |
9 | | c6 | 0.83433 | 0.94826 | h | 1, 7,69 | | |
10 | | c7 | 0.85495 | 0.89164 | h | 2, 7,69 | | |
11 | | c8 | 0.85175 | 0.90161 | h | 3, 7,69 | | |
12 | | c9 | 0.85231 | 0.88927 | h | 4, 7,69 | | |
13 | | c10 | 0.82321 | 0.95233 | h | 5, 7,69 | | |
14 | | c11 | 0.84729 | 0.88713 | h | 1, 7,70 | | |
15 | | c12 | 0.85888 | 0.90323 | h | 2, 7,70 | | |
16 | | c13 | 0.83049 | 0.85461 | h | 3, 7,70 | | |
17 | | c14 | 0.86441 | 0.89943 | h | 4, 7,70 | | |
18 | | c15 | 0.85100 | 0.88699 | h | 5, 7,70 | | |
19 | | c16 | 0.84793 | 0.92424 | h | 1, 7,71 | | |
20 | | c17 | 0.84574 | 0.86460 | h | 2, 7,71 | | |
21 | | c18 | 0.85321 | 0.89859 | h | 3, 7,71 | | |
22 | | c19 | 0.81511 | 0.96778 | h | 4, 7,71 | | |
23 | | c20 | 0.85326 | 0.90119 | h | 5, 7,71 | | |
24 | | c21 | 0.86209 | 0.88879 | h | 1, 7,72 | | |
25 | | c22 | 0.86017 | 0.88878 | h | 2, 7,72 | | |
26 | | c23 | 0.85986 | 0.89859 | h | 3, 7,72 | | |
27 | | c24 | 0.86807 | 0.93349 | h | 4, 7,72 | | |
28 | | c25 | 0.86555 | 0.89128 | h | 5, 7,72 | | |
29 | | c26 | 0.87574 | 0.92786 | h | 1, 7,73 | | |
30 | | c27 | 0.86566 | 0.90208 | h | 2, 7,73 | | |
31 | | c28 | 0.87349 | 0.87303 | h | 3, 7,73 | | |
32 | | c29 | 0.95634 | 0.91290 | h | 4, 7,73 | | |
33 | | c30 | 0.99132 | 0.96239 | h | 5, 7,73 | | |
34 | | c31 | 0.54082 | 0.81074 | h | 1, 7,74 | | |
35 | | c32 | 0.85517 | 0.90117 | h | 2, 7,74 | | |
36 | | c33 | 0.82977 | 0.97302 | h | 3, 7,74 | | |
37 | | c34 | 0.86362 | 0.92169 | h | 4, 7,74 | | |
38 | | c35 | 0.87332 | 0.93760 | h | 5, 7,74 | | |
39 | | c36 | 0.84698 | 0.92475 | h | 1, 7,75 | | |
40 | | c37 | 0.89793 | 0.92879 | h | 2, 7,75 | | |
41 | | c38 | 0.92686 | 0.90759 | h | 3, 7,75 | | |
42 | | c39 | 0.84971 | 0.90679 | h | 4, 7,75 | | |
43 | | c40 | 1.00000 | 0.67025 | h | 5, 7,75 | | |
44 | | Target | 0.10965 | 0.42555 | o | 3,12,72 | Act_Logistic | |
----|----------|----------|---------|---------|----|----------|--------------|----------|-------

connection definition section :

target | site | source:weight
-------|------|----------------------------------------------------------------
4 | | 1: 0.10449, 2: 0.59754, 3: 0.00881
5 | | 1: 0.07599, 2: 0.04265, 3: 0.01435
6 | | 1: 0.06131, 2: 0.21773, 3: 0.00430
7 | | 1: 0.03022, 2: 0.02039, 3: 0.01435
8 | | 1: 0.00691, 2: 0.05634, 3: 0.00287
9 | | 1: 0.15199, 2: 0.02084, 3: 0.01076
10 | | 1: 0.04750, 2: 0.02152, 3: 0.00001
11 | | 1: 0.06909, 2: 0.02042, 3: 0.01578
12 | | 1: 0.03022, 2: 0.01647, 3: 0.01435
13 | | 1: 0.18998, 2: 0.01791, 3: 0.01435
14 | | 1: 0.17349, 2: 0.03287, 3: 0.01004
15 | | 1: 0.03800, 2: 0.03009, 3: 0.01435
16 | | 1: 0.23143, 2: 0.02020, 3: 0.00861
17 | | 1: 0.17349, 2: 0.06453, 3: 0.01004
18 | | 1: 0.11744, 2: 0.02308, 3: 0.01435
19 | | 1: 0.01036, 2: 0.01827, 3: 0.00646
20 | | 1: 0.15026, 2: 0.01704, 3: 0.00574
21 | | 1: 0.05354, 2: 0.02071, 3: 0.01004
22 | | 1: 0.20812, 2: 0.01691, 3: 0.00359
23 | | 1: 0.04836, 2: 0.02118, 3: 0.01435
24 | | 1: 0.03627, 2: 0.03181, 3: 0.00003
25 | | 1: 0.08981, 2: 0.03318, 3: 0.01435
26 | | 1: 0.18048, 2: 0.05929, 3: 0.01004
27 | | 1: 0.13126, 2: 0.06458, 3: 0.00859
28 | | 1: 0.02159, 2: 0.04027, 3: 0.05021
29 | | 1: 0.02159, 2: 0.06224, 3: 0.00574
30 | | 1: 0.05354, 2: 0.04115, 3: 0.00716
31 | | 1: 0.15026, 2: 0.06573, 3: 0.00574
32 | | 1: 0.11744, 2: 0.23762, 3: 0.01435
33 | | 1: 0.09499, 2: 0.37245, 3: 0.01865
34 | | 1: 0.90000, 2: 0.44864, 3: 0.01548
35 | | 1: 0.12867, 2: 0.03576, 3: 0.01578
36 | | 1: 0.15285, 2: 0.02016, 3: 0.00359
37 | | 1: 0.15976, 2: 0.06343, 3: 0.01291
38 | | 1: 0.11054, 2: 0.06900, 3: 0.00717
39 | | 1: 0.22884, 2: 0.06650, 3: 0.01433
40 | | 1: 0.18998, 2: 0.14022, 3: 0.01435
41 | | 1: 0.06304, 2: 0.15300, 3: 0.00430
42 | | 1: 0.10190, 2: 0.02282, 3: 0.00001
43 | | 1: 0.02936, 2: 0.44031, 3: 0.00646
44 | | 4: 1.97859, 5:49.71570, 6:59.90763, 7:-37.35740, 8:-16.30947, 9:37.28542, 10:-39.35506, 11:12.70683, 12:-30.57939, 13:22.76179, 14:-13.37884, 15:-13.06231, 16:-15.15001, 17: 5.64048, 18:-39.55381, 19:50.59846, 20:-33.14001, 21:-9.88406, 22:22.56762, 23:14.93855, 24:39.54713, 25:32.85326, 26:-6.63768, 27:38.57726, 28:12.00233, 29:-22.77133, 30:-3.05694, 31:42.40548, 32:-5.19528, 33:27.19900, 34:-2.25054, 35:-47.43574, 36:48.85722, 37:-50.62162, 38:-30.12116, 39:16.00671, 40:-21.60845, 41:-2.33463, 42:17.54285, 43:-39.69540
-------|------|----------------------------------------------------------------


A2. SNNS Network Pattern File

SNNS pattern definition file V3.2 generated at Tue Jun 16 11:15:00 1998

No. of patterns : 8618
No. of input units : 3
No. of output units : 1

0.001727 0.041451
0.063040 0.041451
0.158895 0.041451
0.140760 0.041451
0.293610 0.041451
0.014680 0.041451
0.162349 0.041451
0.071675 0.041451
0.004318 0.041451
0.008636 0.041451
0.075993 0.041451
0.001727 0.054404
0.063040 0.054404
0.158895 0.054404
0.140760 0.054404
0.293610 0.054404
0.014680 0.054404
0.162349 0.054404
0.071675 0.054404

0.021182 0.002740 0.021057 0.010760 0.020829 0.014347 0.020625 0.009326 0.019989 0.012195 0.019852 0.008608 0.019708 0.014347 0.019544 0.014347 0.019400 0.014347 0.019272 0.014347 0.019161 0.014347 0.021308 0.002740 0.021180 0.010760 0.020946 0.014347 0.020736 0.009326 0.020079 0.012195 0.019935 0.008608 0.019785 0.014347 0.019613 0.014347


A3. BATCHMAN Network Development Script

# GEMNet II East Module Training Procedure
# Optimized 29/7/1999
# Ioannis Kapageridis 1999
#

print("GEMNet II - Neural Network Development") print("Module 1 - East Network") loadNet ("east\eastut.net") loadPattern("east\east.pat") loadPattern("east\eastx.pat") setPattern("east\eastx.pat") print ("Number of patterns :",PAT) trainNet() setInitFunc("Randomize_Weights") initNet() setInitFunc("RBF_Weights_Kohonen",1000.0,0.4,0.0) initNet() setInitFunc("RBF_Weights",-0.8,0.8,0.2,0.9,0.0) initNet() saveNet ("east\eastini.net") print("SSE = ",SSE)

# hidden unit bias training
setLearnFunc("RadialBasisLearning",0.0,0.001,0.0,0.01,0.6)
while CYCLES < 500 do
trainNet()
endwhile
print("SSE = ",SSE)

# RBF centres training
setLearnFunc("RadialBasisLearning",0.001,0.0,0.0,0.01,0.6)
while CYCLES < 1000 do
trainNet()
endwhile
print("SSE = ",SSE)

# hidden-output layer weights training
setLearnFunc("RadialBasisLearning",0.0,0.0,0.001,0.01,0.6)
while CYCLES < 1250 do
trainNet()
endwhile
print("SSE = ",SSE, " MSE = ", MSE)

loadPattern("east\eastx.pat")
setPattern("east\eastx.pat")
saveResult("east\east.res",1,PAT,FALSE,TRUE,"create")
saveNet("east\easttr.net")


A4. SNNS2C Network C Code Extract

/*********************************************************
d:\gemnns\east\east.c
--------------------------------------------------------
generated at Tue Sep 28 12:33:23 1999
by snns2c ( Bernward Kett 1995 )
*********************************************************/

#include <math.h>

#define Act_Logistic(sum, bias) ( (sum + bias < 10000.0) ? ( 1.0 / (1.0 + exp(-sum - bias)) ) : 0.0 )

/* ... further macro, type, unit and layer definitions omitted from this extract ... */

for (member = 0; member < 3; member++) {
    Input[member]->act = in[member];
}

for (member = 0; member < 40; member++) {
    unit = Hidden1[member];
    sum = 0.0;
    for (source = 0; source < unit->NoOfSources; source++) {
        static float diff;
        diff = unit->sources[source]->act - unit->weights[source];
        sum += diff * diff;
    }
    unit->act = Act_RBF_Gaussian(sum, unit->Bias);
};

for (member = 0; member < 1; member++) {
    unit = Output1[member];
    sum = 0.0;
    for (source = 0; source < unit->NoOfSources; source++) {
        sum += unit->sources[source]->act * unit->weights[source];
    }
    unit->act = Act_Logistic(sum, unit->Bias);
};

for (member = 0; member < 1; member++) {
    out[member] = Units[Output[member]].act;
}
return(OK);
}


A5. VULCAN Composites File

*
* DEFINITION
* HEADER_VARIABLES 5
* COMPID C 16 0 key
* CTYPE C 12 0
* DATE C 12 0
* TIME C 12 0
* DESCRP C 80 0
* VARIABLES 17
* DHID C 12 0
* MIDX F 12 3
* MIDY F 12 3
* MIDZ F 12 3
* TOPX F 12 3
* TOPY F 12 3
* TOPZ F 12 3
* BOTX F 12 3
* BOTY F 12 3
* BOTZ F 12 3
* LENGTH F 12 3
* FROM F 12 3
* TO F 12 3
* GEOCOD C 12 0
* BOUND C 12 0
* AU F 12 3
* ORE F 2 0
*
* HEADER:GOLD STRAIGHT 23-Oct-98 16:40:47 Compositing Run

DDFD/A7 1910.318 2088.532 1013.920 1909.987 2088.398 1013.563 1.010 1.550 0
DDFD/A7 1909.656 2088.264 1013.206 1909.325 2088.131 1012.848 1.010 3.930 0
DDFD/A7 1908.994 2087.997 1012.491 1908.662 2087.863 1012.134 1.010 1.370 0
DDFD/A7 1908.331 2087.729 1011.777 1908.000 2087.595 1011.420 1.010 2.990 0
DDFD/A7 1907.669 2087.462 1011.063 1907.338 2087.328 1010.706 1.010 1.650 0
DDFD/A7 1907.007 2087.194 1010.349 1906.676 2087.060 1009.992 1.010 1.070 0
DDFD/A7 1906.345 2086.927 1009.635 1906.014 2086.793 1009.278 1.010 1.620 0
DDFD/A7 1905.683 2086.659 1008.920 1905.352 2086.525 1008.563 1.010 2.690 0
DDFD/A7 1905.020 2086.392 1008.206 1904.689 2086.258 1007.849 1.010 4.230 0
DDFD/A7 1904.358 2086.124 1007.492 1904.027 2085.990 1007.135 1.010 1.550 0
DDFD/A7 1903.696 2085.856 1006.778 1903.365 2085.723 1006.421 1.010 9.120 0

1910.649 2088.666 1014.277 144.720 145.730NONE 0
1909.987 2088.398 145.730 146.740NONE

1013.563 0

1909.325 2088.131 146.740 147.750NONE

1012.848 0

1908.662 2087.863 147.750 148.760NONE

1012.134 0

1908.000 2087.595 148.760 149.770NONE

1011.420 0

1907.338 2087.328 149.770 150.780NONE

1010.706 0

1906.676 2087.060 150.780 151.790NONE

1009.992 0

1906.014 2086.793 151.790 152.800NONE

1009.278 0

1905.352 2086.525 152.800 153.810NONE

1008.563 0

1904.689 2086.258 153.810 154.820NONE

1007.849 0

1904.027 2085.990 154.820 155.830NONE

1007.135 0


Appendix B – Case Study Data

B1. Case Study 1 – 2D Iron Ore Deposit

Easting:
0 10 15 55 125 175 120 160 240 260 235 365 285 345 335 325 350 290 10 85 50 200 400 360 335 5 20 25 50 155 145 130 175 220 205 265 390 325 310 385 325 375 200 55 395
165 270 365 330 330 0 100 200 300 400 50 150 250 350 0 100 200 300 400 50 150 250 350 0 100 200 300 400 50 150 250 350 0 100 200 300 400 50 150 250 350 0 100 200 300 400

Northing:
170 40 135 145 20 50 180 175 185 115 15 60 110 115 170 195 235 230 390 380 270 280 355 335 310 195 105 155 40 15 125 185 185 90 0 65 65 105 150 165 220 215 230 375 245
355 285 340 320 290 0 0 0 0 0 50 50 50 50 100 100 100 100 100 150 150 150 150 200 200 200 200 200 250 250 250 250 300 300 300 300 300 350 350 350 350 400 400 400 400 400

% Fe:
34.3 35.5 28.6 29.4 41.5 36.8 33.4 36 30.2 33.2 33.7 34.3 35.3 31 27.4 33.9 37.6 39.9 27.2 34.2 30.2 30.4 39.9 40 40.6 33.9 32.5 29.6 30.6 40.4 30.1 35.3 41.4 28.5 40.1 24.4 31.6 39.5 34.8 29.9 37.8 29.8 37.4 27.4 36.5
40.8 32.9 40 44.1 41.4 45.3 30.7 40 33.3 33.5 30.4 36.7 27.6 34.7 37.9 40.5 31.8 39.8 35.4 32.4 34.7 34.4 28.9 34.1 31.5 39.1 35.5 34.9 33.7 35.4 36.3 34.5 34.9 27.4 27.5 39 32.4 26.2 40 29.1 39.3 36.6 34.6 38.9 37.9 35.4


B2. Case Study 2 – 2D Copper Deposit

Easting, Northing (paired):
182.88 579.12  243.84 548.64  335.28 548.64  67.06 487.68  152.4 487.68  213.36 487.68  274.32 487.68  335.28 487.68  457.2 487.68  91.44 426.72  152.4 426.72  210.31 426.72  274.32 426.72  335.28 426.72  396.24 426.72  152.4 365.76  243.84 365.76  274.32 365.76  335.28 365.76  396.24 365.76  457.2 365.76  518.16 365.76  579.12 365.76  121.92 335.28  335.28 304.8  396.24 304.8  457.2 304.8

Cu:
0.175 0.417 0.489 0.215 0.396 0.685 0.377 0.427 0.14 0.392 0.32 0.717 0.806 0.889 0.475 0.23 0.833 0.453 0.719 1.009 0.893 0.089 0.092 0.102 0.915 1.335 0.519

Easting:
518.16 579.12 284.68 220.68 152.4 274.32 335.28 396.24 457.2 518.16 579.12 115.82 182.88 274.32 335.28 396.24 457.2 518.16 152.4 274.32 335.28 396.24 457.2 335.28

Northing:
304.8 304.8 304.8 304.8 274.32 243.84 243.84 243.84 243.84 243.84 243.84 219.46 213.36 182.88 182.88 182.88 182.88 182.88 152.4 121.92 121.92 121.92 121.92 64.01

Cu:
0.072 0.04 1.365 0.023 0.644 0.258 0.638 1.615 0.765 0.465 0.034 0.476 0.409 0.165 0.063 0.406 0.909 0.012 0.228 0.224 0.188 0.027 0.395 0.225


B3. Case Study 3 – 3D Gold Deposit

Easting Northing Elevation Length Au
78303.29 4776.742 120.257 0.435 0.028
78017.81 4631.307 93.487 0.682 0.045
78303.38 4776.22 118.688 0.02 0.089
78303.09 4777.861 123.62 0.02 0.104
77902.53 4564.935 74.358 0.199 0.123
78263.42 4744.765 95.17 0.343 0.141
77902.5 4558.01 79.926 0.589 0.155
78303.34 4776.443 119.357 0.03 0.159
77902.73 4564.614 72.496 0.189 0.163
78018.33 4630.683 90.6 0.199 0.172
78018.14 4630.912 91.658 0.02 0.18
78018.53 4630.444 89.493 0.305 0.181
78299.67 4784.724 92.872 0.257 0.199
78303.2 4777.247 121.773 0.03 0.201
78299.8 4784.294 91.44 0.218 0.203
77902.63 4564.766 73.378 0.06 0.207
78265.54 4740.429 95.17 0.857 0.211
77903.05 4564.098 69.507 0.38 0.225
77903.25 4563.777 67.645 0.444 0.227
78264.56 4742.429 95.17 0.644 0.234
77902.54 4557.89 78.833 0.208 0.237
78299.74 4784.509 92.156 0.179 0.238
77902.84 4564.445 71.516 0.199 0.251
77902.94 4564.276 70.536 0.343 0.254
77902.58 4557.775 77.79 0.462 0.256
78299.52 4785.227 94.594 0.305 0.266
78014.15 4644.552 62.082 0.1 0.274
78014.21 4644.394 61.507 0.267 0.277
78264.09 4743.395 95.17 0.218 0.278
78263.03 4745.574 95.17 0.238 0.281
78299.59 4785.007 93.828 0.159 0.285
78220.04 4727.461 95.17 0.961 0.287
78299.46 4785.448 95.36 0.371 0.29
78014.27 4644.235 60.931 2.179 0.305
78102.56 4676.925 153.74 0.946 0.305
78263.72 4744.159 95.17 0.02 0.313
77903.98 4562.514 60.492 0.904 0.321
77906.55 4550.72 95.17 0.497 0.325
78299.38 4785.71 96.27 0.333 0.326
78014.19 4644.46 61.747 0.02 0.327
77903.37 4563.583 66.518 0.333 0.336
78017.97 4631.109 92.573 0.01 0.337
77903.7 4563.003 63.236 0.54 0.349
77902.64 4557.615 76.25 0.343 0.361
77905.67 4552.12 95.17 0.54 0.364
78024.59 4634.531 95.17 0.514 0.364
78174.45 4716.776 88.219 0.286 0.367
77902.52 4557.944 79.33 0.11 0.368
78219.09 4730.069 95.17 0.389 0.368
77903.5 4563.378 65.343 0.286 0.371
77903.15 4563.938 68.576 0.08 0.381
78174.1 4717.228 89.604 0.333 0.381
77903.8 4562.828 62.256 0.343 0.386
78175 4716.042 86.053 0.228 0.398
78264.93 4741.665 95.17 0.199 0.411
78100.83 4680.052 153.74 1.55 0.421
78219.46 4729.059 95.17 0.286 0.438
78220.51 4726.169 95.17 0.659 0.441
78014.1 4644.671 62.514 0.159 0.457
77907.08 4549.872 95.17 0.218 0.475
78014.03 4644.869 63.233 1.513 0.497
78024.98 4633.71 95.17 0.371 0.499
77906.02 4551.568 95.17 0.352 0.506
78265.25 4741.013 95.17 0.179 0.547
78220.76 4725.465 95.17 0.589 0.571
77903.89 4562.671 61.374 0.13 0.596
78013.93 4645.132 64.193 1.549 0.612
78365.7 4791.595 153.84 0.847 0.618
78014.6 4643.391 57.862 0.352 0.623
78014.52 4643.589 58.581 2.154 0.646
78103.53 4675.176 153.74 1.56 0.646
77904.08 4562.323 59.414 0.565 0.656
78014.65 4643.259 57.382 1.527 0.657
78219.82 4728.049 95.17 0.296 0.673
78102 4677.931 153.74 1.513 0.695
78366.26 4789.672 153.84 0.942 0.706
78102.83 4676.444 153.74 0.882 0.714
78220.25 4726.874 95.17 0.169 0.733
78102.24 4677.494 153.74 0.324 0.74
78014.24 4644.328 61.267 0.169 0.741
78172.44 4712.58 95.17 0.745 0.746
78013.66 4645.782 66.644 1.559 0.78
78014.74 4643.008 56.471 1.493 0.782
78101.7 4678.478 153.74 1.558 0.787
78172.1 4713.573 95.17 1.559 0.805
78014.57 4643.457 58.102 0.169 0.81
78365.98 4790.633 153.84 0.802 0.812
78014.38 4643.945 59.876 2.239 0.815
78102.44 4677.144 153.74 0.07 0.822
78365.44 4792.508 153.84 0.38 0.837
78315.56 4779.559 153.84 1.872 0.852
78061.19 4660.662 129.789 0.218 0.853
78172.68 4711.895 95.17 0.333 0.854
78366.52 4788.759 153.84 0.621 0.887
78101.42 4678.981 153.74 0.471 0.912
78013.82 4645.391 65.153 0.13 0.919
78013.32 4646.565 69.625 0.862 0.926


78061.5 4659.817 129.789 0.589 0.937
78315.08 4780.977 153.84 0.637 0.947
78101.14 4679.484 153.74 0.159 0.964
78013.21 4646.83 70.634 0.597 0.966
78061.83 4658.924 129.789 1.547 0.967
78171.71 4714.707 95.17 0.841 0.969
78012.63 4627.031 129.789 1.557 0.981
78171.87 4714.235 95.17 0.904 1.042
78103.19 4675.788 153.74 0.573 1.072
78013.5 4646.161 68.086 1.518 1.076
78100.45 4680.73 153.74 0.435 1.136
78316.04 4778.188 153.84 0.751 1.136
78013.73 4645.618 66.019 0.149 1.14
78013.42 4646.35 68.807 0.913 1.178
78315.32 4780.268 153.84 0.724 1.196
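Unlike the 2D case studies, each gold sample carries a composite length, so a summary grade over several samples is normally length-weighted rather than a plain average. A small illustrative calculation follows, using the first three rows of the table above; it is an illustration, not a procedure taken from the thesis:

# Length-weighted mean Au grade (first three B3 samples shown;
# `lengths` and `grades` are parallel lists read from the table above).
lengths = [0.435, 0.682, 0.02]
grades = [0.028, 0.045, 0.089]

weighted_mean = sum(l * g for l, g in zip(lengths, grades)) / sum(lengths)
print(weighted_mean)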


B4. Case Study 4 – 3D Chrome Deposit

Easting Northing Elevation Chromite
13384.18 22298.82 663.02 7.7
13197.41 22053.74 702.095 23.8
13311.75 22093.95 715.03 20.31
13311.75 22093.95 705.68 18.17
13311.75 22093.95 706.88 28.86
13311.75 22093.95 712.23 9.27
13311.75 22093.95 708.98 26.03
13382.35 22123.39 691.42 15.05
13297.75 22223.5 695.37 15.63
13352 22301.75 683.46 1.01
13352 22301.75 681.86 13.74
13352 22301.75 713.61 7.2
13352 22301.75 686.16 10.96
13352 22301.75 690.66 8.04
13352 22301.75 692.06 16.8
13323.25 22291.75 680.34 7.22
13328.31 22289.96 706.817 11.2
13333.9 22305.12 702.84 26.5
13333.9 22302.75 700.472 26.9
13333.9 22306.71 704.431 12.2
13323.25 22291.75 690.29 15.01
13383.08 22294.74 670.338 22.5
13383.71 22297.08 666.137 15.7
13382.4 22292.2 674.884 16.6
13381.8 22289.96 678.911 22.9
13386.71 22315.73 701.553 14.61
13361.12 22392.97 710.023 9.55
13311.75 22093.95 743.53 16.21
13311.75 22093.95 730.68 24.26
13311.75 22093.95 751.58 14.95
13311.75 22093.95 745.08 21.62
13330.52 22103.94 741.882 28.8
13314.85 22125.05 752.94 9
13313.25 22119.07 759.127 33.1
13318.21 22137.58 739.964 37
13315.14 22126.14 751.808 15.8
13337.66 22146.56 735.332 20.4
13318.38 22138.23 739.292 21.6
13316.42 22139.87 755.798 17.54
13316.27 22139.32 756.364 10.81
13337.04 22128.29 716.674 14.19
13316.98 22141.98 753.606 24.26
13338.19 22148.54 733.281 15.7
13337.92 22147.55 734.306 19.2
13314.53 22172.26 750.899 7.56
13314.2 22170.72 752.776 7.81
13316.2 22180.15 741.285 15.58
13316.9 22183.42 737.302 23.69
13341.15 22162.6 733.222 15
13340.76 22161.13 734.743 17
13341.41 22163.56 732.233 15.4
13297.75 22223.5 758.52 9.21
13297.75 22223.5 762.92 14.65
13297.75 22223.5 724.32 7.57
13297.75 22223.5 727.52 6.14
13346.36 22221.91 722.733 11.4
13347.4 22225.79 717.945 6.3
13320 22262.75 727.66 6.3
13330.32 22301.27 732.779 8.19
13320 22262.75 730.26 10.71
13326.77 22288.02 746.497 17.55
13327.95 22292.43 741.936 19.2
13384.06 22296.62 755.765 12.37
13385.94 22303.66 745.361 15.71
13333.9 22353.24 750.959 8.3
13330.49 22318.76 742.864 15.39
13332.58 22326.58 734.768 10.14
13331.03 22320.78 740.778 7.05
13333.74 22314.04 719.556 26.35
13334.17 22315.65 717.895 12.08
13388.94 22314.85 728.815 18.88
13249.5 22380.69 739.71 29.52
13250.75 22410.38 739.822 14.2
13249.5 22374.89 733.912 26.6
13250.75 22411.83 741.272 15.78
13249.5 22377.69 736.705 23.8
13327.27 22366 750.57 10.2
13358.8 22384.3 719.003 7.89
13357.78 22380.51 722.928 8.62
13357.41 22379.11 724.377 7.01
13335.09 22395.16 720.376 15.33
13420.48 22395.98 757.862 30.4
13396.5 22374 740.55 12.52
13396.5 22374 754.35 17.48
13396.5 22374 742.1 17.19
13396.5 22374 748.05 17.02
13396.5 22374 749.35 17.95
13396.5 22374 751.1 18.31
13429.45 22400.55 731.145 8.9
13428.73 22401.28 730.12 7.5
13432.34 23380.89 731.634 16.3
13431.73 23378.61 734.003 14.6
13432.71 23382.26 730.22 11.6
13297.75 22223.5 767.12 15.75
