Proceedings of CIBB 2014

Towards Effective Fingerprint Prediction using Computational Intelligence Methods

Promise Molale(1), Bhekisipho Twala(2)
(1) Department of Statistics and Operational Research, University of Limpopo, P O Box 107, Medunsa 0240, South Africa, [email protected]
(2) Department of Electrical and Electronic Engineering Science, University of Johannesburg, P.O. Box 524, Auckland Park 2006, South Africa, [email protected]

Keywords: biometrics, fingerprints, computational intelligence, prediction, classification.

Abstract. Fingerprint prediction provides an important mechanism in a fingerprint database: an accurate and consistent prediction model can greatly reduce fingerprint matching time for a large database. We present a two-step modelling procedure which achieved an accuracy of 97.3%, better than previously reported in the literature. In the first step the data are selected using Linear Discriminant Analysis (LDA); in the second step we build models using five classification methods, namely Decision Trees (DT), Support Vector Machines (SVM), the Naïve Bayes Classifier (NBC), k-Nearest Neighbour (k-NN) and Logistic Regression (LR). We classified the fingerprints into five classes: whorl, plain arch, tented arch, left loop and right loop. The assessment was done using the National Institute of Standards and Technology (NIST) database. Logistic Regression performed best of the compared methods.

1 Scientific Background

The use of fingerprints as a unique identifier is probably one of the oldest biometrics. Fingerprints intended as legal "signatures" were found in ancient Babylonian clay tablets from about 2000 BCE. The first scientific paper examining the structures in fingerprints was published in 1694, and the first fingerprint classification system was proposed by Jan Purkinje in 1823 [1]. By the beginning of the 20th century, fingerprint identification was being used for police work. When the FBI's fingerprint identification division was established in 1924, it had 810 000 fingerprint cards in its database. The number had grown to more than 200 million by 2003 and more than 500 million by 2009. This rapid increase in database size, and the resulting time required for matching, led to the first automated fingerprint identification system (AFIS) in 1969 [2].

Fingerprint classification has long been an important part of any fingerprint system. The most widely used fingerprint classification system is that of Henry [3], although the initial scientific studies on fingerprints were made by Galton [4]; the former defines more major classes. Recently, several classifiers, i.e. supervised learning methods, have been developed for fingerprint classification [2]. Most of the existing work on fingerprint prediction has compared different classifiers, e.g. Linear Discriminant Analysis (LDA), Logistic Discrimination (LgD), k-Nearest Neighbour (k-NN), Artificial Neural Networks (ANN), Association Rules (AR), Decision Trees (DT), the Naïve Bayes Classifier (NBC) and Support Vector Machines (SVM). These classifiers generate different model forms: linear, regression, density estimation, networks and trees. Individual classifiers tend to be outperformed by ensemble classifiers, and some individual classifiers perform poorly [8, 10].

Classifiers are learners that are trained on a particular dataset before a model is obtained. The quality of the model is determined by the training step, which involves a number of processes. The key factor affecting the accuracy of the model is the sample data used during training: if the features extracted from the fingerprints cannot distinguish between the different classes, the model will not be able to correctly predict a new instance. Two families of fingerprint databases are currently in common use for assessing classification methods: the National Institute of Standards and Technology (NIST) databases and the international Fingerprint Verification Competition (FVC) databases. Our experiments were run using NIST Special Database 4 [5, 6]. It has been reported in the literature that about 17% (i.e. 340 out of 2000) of the fingerprint pairs in NIST Special Database 4 are ambiguous [19]. This means some of the data are noisy, which would prevent a classifier from forming an accurate model [8]. To overcome this problem we propose a two-step procedure which identifies the noisy data in the first step; Linear Discriminant Analysis was used to select the data for modelling. This paper looks at the five-class case: whorl, plain arch, tented arch, left loop and right loop. The level-1 patterns shown in Figure 1 were used to manually classify the fingerprints in NIST Special Database 4.

Figure 1: The CT class fingerprints: (a) Whorl and (b) Twin Loop. The arches: (c) Tented Arch (TA) and (d) Plain Arch (PA). The loops: (e) Left Loop (LL) and (f) Right Loop (RL) [7].

(a)

(b)

(e)

(c)

(f)

(d)

Figure 2: The two-step modelling procedure. STEP 1: data selection using Linear Discriminant Analysis. STEP 2: build the model.

The major contributions of this paper are the development of a two-step modelling procedure for predicting fingerprint patterns, and an empirical comparison of existing computational intelligence methods used for fingerprint prediction. To the best of our knowledge, this is the first study to examine the proposed two-step procedure for improving fingerprint prediction. The rest of the paper is organized as follows. Section 2 presents the existing methods and develops a new strategy for improving fingerprint prediction. Section 3 discusses the computational intelligence methods used for fingerprint prediction. Section 4 describes the experimental design and presents the results. Finally, the paper is concluded in Section 5.

2 The Two-Step Modelling Procedure

2.1 Justification for the proposed modelling procedure

The proposed two-step procedure was inspired by the reported "poor performance" of Linear Discriminant Analysis [8] and by the concept of prediction. Prediction rests on statistical inference, and proper inference can only be drawn when the sample data are a correct representation of the population; the sample is then used to build a model for prediction, so the starting point is the sample. Since some of the NIST data were noisy, the data needed to be split into two samples from two different populations (data with noise and data without noise). This allows a model to be built on the noise-free sample to predict new instances that are not noisy. Discriminant analysis attempts to establish whether a set of variables can be used to distinguish between two or more groups or classes; linear discriminant analysis in particular studies the relationship between a set of predictors and a categorical response, which is why we chose it to split the data into the two samples.

In our survey of work related to fingerprint prediction we identified three major focal areas for improving fingerprint prediction. The first is improving the algorithms, for example the two-step training process introduced in [20]. The second is the fingerprint features, e.g. classifying the fingerprint database on minutiae sets and singular points [7]. The third is combining classification techniques [8]. The one thing that has been overlooked is the sample data itself, which is also a factor that can improve fingerprint prediction; sample data are extremely important when it comes to prediction, i.e. inference.
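To make step 1 concrete, the sketch below (Python with scikit-learn) illustrates one way the LDA-based selection can be realised: instances that LDA misclassifies, or classifies with low posterior probability, are treated as noisy and set aside. The posterior threshold min_posterior is an illustrative parameter, not part of the procedure itself.

# Step 1 sketch: LDA-based data selection. An assumed rule (illustrative):
# keep only instances whose LDA posterior agrees with the manual class label.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_clean_sample(X, y, min_posterior=0.5):
    """Return the subset of (X, y) that LDA classifies consistently.

    X: (n_samples, 5) array of orientation features.
    y: (n_samples,) array of class labels (A, T, L, R, W).
    min_posterior: assumed confidence threshold (not from the procedure).
    """
    lda = LinearDiscriminantAnalysis().fit(X, y)
    posterior = lda.predict_proba(X)                    # class posteriors
    predicted = lda.classes_[posterior.argmax(axis=1)]  # LDA's own labels
    keep = (predicted == y) & (posterior.max(axis=1) >= min_posterior)
    return X[keep], y[keep]

A model is then trained on the retained sample in step 2.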

2.2 Linear Discriminant Analysis

In the first step of the proposed procedure, Linear Discriminant Analysis was used to select the sample data. The two most important assumptions in LDA are that the data (for the variables) represent a sample from a multivariate normal distribution, and that the variance/covariance matrices of the variables are homogeneous across groups (they are equal). With these assumptions, a linear discriminant function can be computed. To understand how the posterior probabilities are computed for classification purposes, it is important to first consider the so-called Mahalanobis distance, a measure of distance between two points in the space defined by two or more correlated variables. The Mahalanobis distance is used to perform the classification and thus to derive the probabilities. Canonical analysis [10] was also performed when dealing with discriminant functions for multiple classes.

3 Computational Intelligence Methods

3.1 Logistic Regression

Logistic regression analysis, due to [15], is related to logistic discriminant analysis. The dependent variable can only take the values 0 and 1, say, given two classes. This technique is partially parametric, as the probability density functions for the classes are not modelled directly, only the ratios between them. A new element is classified as 0 if $\hat{\pi}(\mathbf{x}) \le c$ and as 1 if $\hat{\pi}(\mathbf{x}) > c$, where $\hat{\pi}(\mathbf{x})$ is the estimated probability of class 1 and $c$ is the cut-off point score. Typically, the error rate is lowest for a cut-off point of $c = 0.5$ [16]. In fact, the slope of the cumulative logistic probability function has been shown to be steepest in the region where $\pi_i = 0.5$. Thus, if $\pi_i > 0.5$ the unknown instance is classified as "1", and if $\pi_i \le 0.5$ it is classified as "0". The generalization of the logistic discrimination approach to the case of three or more classes is known as the Multinomial Logit Model (MLM), and its derivation is similar to that of the logistic discrimination model. For more details about MLMs the interested reader is referred to Hosmer and Lemeshow [17].
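As a minimal illustration of the cut-off rule (binary case; the toy data below are purely illustrative, and scikit-learn is used in place of the logistic discrimination fit):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 1.1], [1.5, 0.3], [2.2, 2.0], [0.1, 0.4]])
y = np.array([0, 1, 1, 0])

model = LogisticRegression().fit(X, y)
pi_hat = model.predict_proba(X)[:, 1]   # estimated probability of class "1"
c = 0.5                                 # cut-off point score
labels = (pi_hat > c).astype(int)       # classify as 1 if pi_hat > c, else 0

For three or more classes, the same library fits a multinomial logit model directly.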

3.2 Support Vector Machine

Support Vector Machines (SVMs), pioneered by [13], are pattern classifiers that can be expressed as hyperplanes that discriminate positive instances from negative instances. The principal goal of the SVM approach is to address the computational problem of predicting with kernels. The basic idea of SVMs is to determine a classifier or regression machine which minimizes both the empirical risk (i.e., the training set error) and the confidence interval (which corresponds to the generalisation or test set error). In other words, the idea is to fix the empirical risk associated with an architecture and then use a method to minimize the generalisation error. Motivated by statistical learning theory, SVMs have successfully been applied to numerous tasks, including classification. They can perform both binary classification (pattern recognition) and real-valued function approximation (regression estimation).

3.3 k-Nearest Neighbour

One of the most venerable algorithms in machine learning is the nearest neighbour. k-NN methods are sometimes referred to as memory-based reasoning, instance-based learning (IBL) or case-based learning techniques [14], and they have been used for classification tasks. They essentially classify an unclassified sample point by assigning it the class of the nearest of a set of previously classified points. The entire training set is stored in memory. To classify a new instance, the Euclidean distance (possibly weighted) is computed between the instance and each stored training instance, and the new instance is assigned the class of the nearest neighbouring instance. More generally, the k nearest neighbours are computed and the new instance is assigned the class that is most frequent among the k neighbours. In our experiments, k is set to 5.

3.4 Decision Tree

DTs [12] are powerful and popular tools for classification and prediction. A DT is constructed by repeatedly splitting the sample into two descendant subsets, starting with the entire sample. The problem is to find the attribute that splits the sample optimally. One property that sets DTs apart from all other classifiers is their invariance to monotone transformations of the predictor variables. For example, replacing any subset of the predictor variables $\{x_j\}$ by (possibly different) arbitrary strictly monotone functions of them, $\{x_j \rightarrow m_j(x_j)\}$, gives rise to the same tree model. Thus, there is no need to experiment with different possible transformations $m_j(x_j)$ for each individual predictor $x_j$ to try to find the best ones. This invariance provides immunity to the presence of extreme values ("outliers") in the predictor variable space. In addition, DTs incorporate a pruning scheme that partially addresses the outlier (noise) removal problem.
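The invariance to monotone transformations can be checked empirically. The small sketch below (scikit-learn's CART-style trees, synthetic data) replaces one predictor by a strictly monotone function of itself and obtains the same predictions:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_mono = X.copy()
X_mono[:, 0] = X[:, 0] ** 3             # strictly monotone m_j(x_j) = x_j^3

tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X_mono, y)

# The transform preserves the ordering of values, so the same candidate
# partitions (and hence the same splits) are available to both trees.
print(np.array_equal(tree_a.predict(X), tree_b.predict(X_mono)))   # True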

3.5 Naïve Bayes Classifier

The NBC is perhaps the simplest and most widely studied probabilistic learning method. It is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the input is high. The independence assumptions it makes often do not hold in reality, which is why it is considered naive. Despite its simplicity, the NBC can often outperform more sophisticated classification methods. The NBC learns from the training data the conditional probability of each attribute $A_i$ given the class label $C$. It can handle an arbitrary number of predictor (independent) attributes, whether continuous or categorical. The strong major assumption is that all attributes $A_i$ are independent given the value of the class $C$. Classification is therefore done by applying Bayes' rule to compute the probability of $C$ given $A_1, \dots, A_n$ and then predicting the class with the highest posterior probability [11].
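A minimal sketch of the NBC decision rule follows (Gaussian class-conditional densities are assumed here, a natural choice for continuous attributes such as the orientation features; WEKA's NaiveBayes may differ in detail):

import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[0.1, 0.2], [0.9, 0.8], [0.15, 0.3], [1.0, 0.7]])
y = np.array([0, 1, 0, 1])

nbc = GaussianNB().fit(X, y)
posterior = nbc.predict_proba(X)                     # P(C | A_1, ..., A_n)
predicted = nbc.classes_[posterior.argmax(axis=1)]   # highest posterior wins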

4 Experimental Design

4.1 Empirical Data Collection

The data used were obtained from the NIST Special Database 4 biometric images dataset [5, 6]. The dataset consists of 256 grey-level images; two different instances (F = first, S = second) are present for each finger. Each fingerprint was manually analysed by a domain expert and assigned to one of the following five classes: arch (A), left loop (L), right loop (R), tented arch (T) and whorl (W). The dataset contains 2000 fingerprint pairs, uniformly distributed over the five classes (whose natural frequencies are approximately A = 3.75%, T = 2.9%, L = 33.8%, R = 31.7% and W = 27.9%). However, only 1682 fingerprint images were used for experimentation: 164 plain arch, 146 tented arch, 464 left loop, 348 right loop and 560 whorl.

The dataset consists of six variables: five independent continuous variables and one dependent categorical variable. The categorical variable, i.e. the class, was manually assigned by a human expert. The five independent numeric variables, A-Orient, B-Orient, C-Orient, D-Orient and X-Orient, were extracted from the fingerprint orientation map. A fingerprint orientation map is a matrix whose cells contain the local orientation of each and every ridge in the original fingerprint image. Each of the five numeric variables represents an average orientation value for a 25 by 25 square block of pixels, where the first four blocks (A, B, C and D) are situated at the corners of the chosen region of interest (ROI), while the fifth block (X) is situated at the centre of the ROI. Figure 3 shows an original fingerprint image overlaid with a typical ROI and the respective locations of the blocks that represent the five numeric variables.

Figure 3: A fingerprint image overlaid with a typical ROI, showing the blocks that represent the five numeric variables.
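The exact ROI placement and orientation estimation are part of the feature extraction credited in the acknowledgments and are not reproduced here; the sketch below only illustrates the block-averaging idea, with illustrative block coordinates. A careful implementation would average orientations as doubled angles, since orientation is a circular quantity.

import numpy as np

BLOCK = 25   # block side length in pixels, as described above

def block_mean(omap, row, col):
    """Mean local ridge orientation over one 25x25 block of the map."""
    return float(omap[row:row + BLOCK, col:col + BLOCK].mean())

def orientation_features(omap, roi_top, roi_left, roi_size):
    """Five features: corner blocks A, B, C, D and centre block X (illustrative layout)."""
    r0, c0 = roi_top, roi_left                              # top-left corner
    r1 = roi_top + roi_size - BLOCK                         # bottom edge
    c1 = roi_left + roi_size - BLOCK                        # right edge
    rc = roi_top + (roi_size - BLOCK) // 2                  # centre row
    cc = roi_left + (roi_size - BLOCK) // 2                 # centre column
    return {"A-Orient": block_mean(omap, r0, c0),
            "B-Orient": block_mean(omap, r0, c1),
            "C-Orient": block_mean(omap, r1, c0),
            "D-Orient": block_mean(omap, r1, c1),
            "X-Orient": block_mean(omap, rc, cc)}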

4.2 Method of Modelling

The NIST dataset was used to carry out the prediction with each classifier's estimated model. k-fold cross-validation, specifically 5-fold cross-validation, was used to estimate the accuracy of each classifier's estimated model. The dataset was divided into a training set and a validation set in the ratio 8:2: 80% was used for training the model and 20% for validating its accuracy. The dataset was divided into 5 subsets (20% each) and the experiments were repeated five times for each classifier, changing the testing data in each of the 5 runs.
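This protocol corresponds to standard stratified 5-fold cross-validation. A sketch of the equivalent setup in Python with scikit-learn follows (synthetic stand-in data replace the selected NIST sample, and scikit-learn defaults are not identical to WEKA's):

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 1682 x 5 orientation-feature sample.
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=5,
                           n_clusters_per_class=1, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),
    "NBC": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),   # k = 5, as in Section 3.3
    "LR": LogisticRegression(max_iter=1000),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv)     # five 80/20 train/test runs
    print(f"{name}: mean accuracy {scores.mean():.3f}")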

The five classification methods were run using the Waikato Environment for Knowledge Analysis (WEKA) software [18]. WEKA is a collection of machine learning algorithms for data mining tasks; the algorithms can either be applied directly to a dataset or called from your own Java code. For the purposes of this study, we followed the former procedure. We also used the default parameter values so that our results are comparable with the work done by [8].

4.3 Results

The results are summarized in Figure 4 and Table 1. LR achieved the highest accuracy (97.3%), followed by SVM (95.5%), k-NN (94.6%), DT (92.3%) and NBC (92.3%). The differences in performance between the classification methods are not significant at the 5% level.
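The specific significance test is not detailed here; one common choice for comparing classifiers evaluated on the same folds is a paired t-test on the per-fold accuracies, sketched below with purely hypothetical fold scores:

from scipy import stats

# Hypothetical per-fold accuracies for two classifiers over the 5 runs.
lr_scores = [0.980, 0.950, 0.970, 0.990, 0.975]
svm_scores = [0.970, 0.960, 0.940, 0.980, 0.930]

t_stat, p_value = stats.ttest_rel(lr_scores, svm_scores)
print(f"p = {p_value:.3f}")   # significant at the 5% level only if p < 0.05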

Figure 4: Misclassification error (%) of the five classification methods (DT, SVM, NBC, k-NN, LR).

Table 1: Five classification methods given six performance measures

Measure                                  DT      SVM     NBC     k-NN    LR
Root Mean Square Error (RMSE) %          16.66   31.90   15.55   12.65   9.21
Relative Absolute Error (RAE) %          11.72   80.44   15.33   12.27   7.88
Mean Absolute Error (MAE) %              3.52    24.18   4.61    3.69    2.37
Root Relative Squared Error (RRSE) %     43.07   82.48   40.21   32.71   23.83
Area Under the ROC Curve %               96.70   99.20   99.40   99.70   99.90
Time taken to build model (seconds)      2.78    3.02    0.07    0.00    0.16
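For reference, the two relative measures in Table 1 compare a model's errors with those of a trivial predictor that always outputs the mean of the actual values; a sketch of the standard (WEKA-style) definitions follows. For classification, WEKA computes these over the per-class probability estimates rather than raw labels.

import numpy as np

def rae(actual, predicted):
    """Relative Absolute Error (%): model absolute error over that of the mean predictor."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    baseline = np.abs(actual - actual.mean()).sum()
    return 100.0 * np.abs(actual - predicted).sum() / baseline

def rrse(actual, predicted):
    """Root Relative Squared Error (%): root of squared error over that of the mean predictor."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    baseline = ((actual - actual.mean()) ** 2).sum()
    return 100.0 * np.sqrt(((actual - predicted) ** 2).sum() / baseline)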

The best model is the one with the lowest ME, RMSE, RAE, MAE and RRSE, an area under the ROC curve of 75% or above, and the least time taken to build the model. As shown in Figure 4 and Table 1, the logistic regression method is found to be effective in predicting the fingerprint class.

5 Conclusion

In this research we have made a comparative analysis of five classification methods, with results obtained using data from the NIST database. The results show that logistic regression performed better than the other classifiers, while k-nearest neighbour took the least time to build its model. Overall, LR had the best performance. Fairly speaking, all the methods built accurate models (none of them struggled with the data), and when we tested for a significant difference in the results, we found that the results were not significantly different at the 5% level. Decision trees, k-nearest neighbour, support vector machines and naïve Bayes are among the top ten algorithms in data mining [9]. The proposed two-step modelling procedure would aid the other existing methods; for instance, improving individual classifier performance would make ensembles (combinations of individual classifiers for improving predictive accuracy) even more accurate. The strengths of the two-step modelling procedure are that:

- It aids other proposed methods for improving fingerprint prediction.
- It consistently improves the performance of all the individual classification methods; none of the classifiers struggles to form an accurate model.
- Correct inference is made.

In fact, we assumed that the data we used were noise free. Future work will include the noisy data in all our experiments. We also plan to replicate our study to predict iris class based on other machine learning algorithms, which would help improve the search time and accuracy of an iris system.

Acknowledgments

This study was carried out with the financial support of the Statistics and Operational Research Department at the University of Limpopo, Medunsa Campus, and the Department of Mechanical Engineering Science, University of Johannesburg. We would also like to thank NIST for allowing us access to the Special Database 4 fingerprint images and their use in all our experiments. The authors would also like to thank Ishmael Msiza, at Chenits Technology Enterprise (ChTE), for helping with the feature extraction process.

References
[1] P. Campisi, R.L. Carter, C.W. Crooks, V. Govindaraju, W. Hamilton, J. Hurt, A.A. Ross, C.J. Tilton and M. Jr. Waymire. "Certified Biometric Professional Learning System. Biometrics: Biometric Modalities". IEEE, 2010.
[2] D. Maltoni, D. Maio, A.K. Jain and S. Prabhakar. "Handbook of Fingerprint Recognition". Springer Science + Business Media, LLC, New York, 2003.

[3] E.R. Henry. "Classification and Uses of Finger Prints". London: Routledge, 1900.
[4] F. Galton. "Finger Prints". London: McMillan, 1892.
[5] C.I. Watson and C.L. Wilson. "NIST Special Database 4: Fingerprint Database". Technical report, National Institute of Standards and Technology, 1992.
[6] G.T. Candela and R. Chellappa. "Comparative Performance of Classification Methods for Fingerprints". NIST Technical Report NISTIR 5163, 1993.
[7] I.S. Msiza, B. Leke-Betechuoh, F.V. Nelwamondo and N. Msimang. "A fingerprint pattern classification approach based on the coordinate geometry of singularities". Proc. IEEE International Conference on Systems, Man, and Cybernetics, San Antonio, pp. 510-51, 2009.
[8] P.T. Molale, B. Twala and S.M. Seeletse. "Fingerprint prediction using classifier ensembles". 53rd Annual South African Statistical Association (SASA) Conference, CSIR Convention Centre, pp. 47-61, 2011.
[9] X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, B. Liu, P.S. Yu, Z.H. Zhou, M. Steinbach, D.J. Hand and D. Steinberg. "The top 10 algorithms in data mining". Knowledge and Information Systems, vol. 14, pp. 1-37, 2008.
[10] G.J. McLachlan. "Discriminant Analysis and Statistical Pattern Recognition". John Wiley & Sons, 2005.
[11] I. Kononenko. "Semi-naïve Bayesian classifier". Proc. of the European Conference on Artificial Intelligence, pp. 206-219, 1991.
[12] J.R. Quinlan. "C4.5: Programs for Machine Learning". Morgan Kaufmann Publishers, Los Altos, California, 1993.
[13] V. Vapnik. "The Nature of Statistical Learning Theory". Berlin: Springer-Verlag, 1995.
[14] D.W. Aha, D. Kibler and M.K. Albert. "Instance-based learning algorithms". Machine Learning, vol. 6, pp. 37-66, 1991.
[15] D.R. Cox. "Some Procedures Associated with the Logistic Qualitative Response Curve". Wiley, New York, pp. 55-71, 1966.
[16] D.E. Rumelhart, G.E. Hinton and R.J. Williams. "Learning Internal Representations by Error Propagation". MIT Press, pp. 318-362, 1986.
[17] D.W. Hosmer and S. Lemeshow. "Applied Logistic Regression". Wiley, New York, 1989.
[18] I.H. Witten and E. Frank. "Data Mining: Practical Machine Learning Tools and Techniques". Morgan Kaufmann Publishers, San Francisco, CA, USA, 2005.
[19] D. Maltoni, D. Maio, A.K. Jain and S. Prabhakar. "Handbook of Fingerprint Recognition". Springer Science + Business Media, New York, 2009.
[20] M. Kamijo. "Classifying fingerprint images using neural network: Deriving the classification state". IEEE International Conference on Neural Networks, vol. 3, pp. 1932-1937, 1993.