An Online Learning Algorithm for Biometric Scores Fusion

Youngsung Kim, Kar-Ann Toh, and Andrew Beng Jin Teoh

Abstract— In biometrics fusion, the match score level fusion has been frequently adopted because it contains the richest information regarding the input pattern. However, in practice, the size of training match scores increases almost exponentially with respect to the number of users. Under this situation, the cost of learning computation and memory usage can be very high. In this paper, we propose an online learning algorithm to resolve the computational problem. While the existing recursive least squares learning approach contains a mismatch between its objective function and the desired classification performance, the proposed online learning directly optimizes the classification performance with respect to fusion classifier design. Since the proposed method includes a weight that varies according to the class type of newly arrived data, an online learning formulation is non-trivial. Our empirical results on several public domain databases show promising potential in terms of verification accuracy and computational efficiency.

I. INTRODUCTION

In conventional security systems, authentication is often tied to tokens or knowledge such as keys, cards, or passwords. Anyone who possesses such authentication means can gain access to the intended control application. Instead of using tokens or knowledge, biometrics use behavioral characteristics or biological traits inherited from the user himself for authentication [1], [2], [3]. This solves the problem of relying on transferable means for identity authentication. However, a biometric system which uses a single trait is often affected by practical problems such as noisy data, non-universality, and ease of spoofing. By combining multiple biometrics from different traits of the same identity, the system can better withstand or even overcome many of these problems.

Fusion of multiple biometrics can be performed either before matching or after matching. For fusion before matching, either sensor level fusion or feature level fusion can be performed. For fusion after matching, the fusion can be performed at the abstract level, rank level, or match score level. Among these levels, match score level fusion is often preferred because it contains the richest information regarding the input pattern and is relatively easy to access [4]. In this paper, we shall adopt the match score level fusion in an online learning formulation.

Fusion of multiple biometrics can be treated as a classification problem since the decision inference is to decide whether

This work was supported by the National Research Foundation of Korea (NRF) through the Biometrics Engineering Research Center (BERC) at Yonsei University (No. R112002105090010(2010)). The authors are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. {myradio, katoh, bjteoh}@yonsei.ac.kr

978-1-4244-7580-3/10/$26.00 ©2010 IEEE

a query identity belongs to one of the two class labels: genuine-user or imposter. For fusion classifier learning, either non-training based methods or training based methods can be adopted [5]. While non-training based methods assume that the outputs of individual biometric classifiers have known statistical properties, training based methods need no such assumption and use the match scores directly for learning. For this reason, the training based method has been frequently adopted for fusion classifier design.

Since the training data consist of match scores of individual biometrics obtained from matching between two templates from the same or different users, the number of scores grows dramatically with respect to the number of enrollees and the number of samples per enrollee. Under this situation, the learning computation and memory allocation can be very demanding when the number of enrollees (as well as the number of samples per enrollee) is large. With limited memory and CPU capability, the learning may break down when the size of the data becomes huge. To circumvent this computational problem, an online or incremental learning can be formulated. Since an online method uses only the newly arrived data to update the system, a low computing cost is anticipated for each processing step.

In this paper, we propose an online learning algorithm to fuse several biometrics based on verification error rate minimization [6]. Different from the conventional least squares error minimization approach, which treats the fusion classifier design and performance evaluation as a two-stage process, the learning solution optimizes the target total error rate directly. Since the proposed method includes a weight that varies according to the class type of newly arrived data, an online learning formulation becomes non-trivial.

The paper is organized as follows. In section II, we provide related preliminaries for immediate reference.
In section III, our proposed online learning algorithm based on a deterministic solution to minimize the verification error is presented. This is followed by a presentation of empirical evidence on several data sets in section IV. Finally, some concluding remarks are given in section V.

II. DEFINITIONS AND PRELIMINARIES

A. Linear estimation models for biometrics fusion

Linear models have been widely used due to their tractability in optimization and related analysis. In this section, we introduce the method in brief. When we have a set of training data {(x_i, y_i)}, i = 1, ..., m, with uni-modal biometric scores as input x_i ∈ R^d and target label y_i ∈ R, we can pack them in matrix and vector form: X = [x̃_1, ..., x̃_m]^T ∈ R^{m×(d+1)} and y = [y_1, ..., y_m]^T ∈ R^m. Having the data in place, we are ready to estimate the parameter vector α ∈ R^{d+1} in the sequel.¹

1) Minimization of residual sum of squared error: To stabilize a solution for estimation, a weight decay regularization can be adopted. The criterion function to be minimized is thus:

    J(α) = ||y − Xα||² + b||α||²,    (1)

where b ∈ R controls the weighting of regularization. By setting the first derivative ∂J/∂α to zero, we can obtain the least squares solution:

    LSE:  α = (X^T X + bI)^{-1} X^T y,    (2)

where I ∈ R^{(d+1)×(d+1)} is an identity matrix.

2) Minimization of total error rate (TER): Consider the following error rates: the false positive rate (FPR = number of false positive samples / m⁻, with m⁻ the number of negative samples), the false negative rate (FNR = number of false negative samples / m⁺, with m⁺ the number of positive samples), and the total error rate (TER = FPR + FNR). The total error rate minimization problem can be solved by adopting a differentiable function to approximate the non-differentiable step function used in error counting. In order to obtain a deterministic closed-form solution, a quadratic approximation function has been adopted in [6] with proper parameter settings to activate a single quadratic arm. The solution which minimizes TER is obtained as:

    TER:  α = M^{-1} v,    (3)

where

    M = bI + (1/m⁻) (X⁻)^T X⁻ + (1/m⁺) (X⁺)^T X⁺ ∈ R^{(d+1)×(d+1)},
    v = ((τ − η)/m⁻) (X⁻)^T 1_{m⁻} + ((τ + η)/m⁺) (X⁺)^T 1_{m⁺},    (4)

with decision threshold τ and offset η, X⁻ ∈ R^{m⁻×(d+1)}, X⁺ ∈ R^{m⁺×(d+1)}, and 1_{m⁻} ∈ R^{m⁻}, 1_{m⁺} ∈ R^{m⁺} denoting vectors of ones. The variables are indicated by superscripts/subscripts − and +
for respective imposters and genuine-users. Using the above solutions, (2) for LSE or (3) for TER, an unseen datum x̃ can be classified after applying a threshold τ to the output x̃^T α. The above linear model can embed nonlinear functions to facilitate nonlinear input-output mapping. We shall adopt a reduced multivariate polynomial model for the embedding in this work. Here we can simply replace X by a polynomial regressor matrix P [5].
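For concreteness, the batch TER solution (3)-(4) can be sketched in a few lines of NumPy. The synthetic Gaussian score matrices and the name `ter_batch` below are our own illustrative choices, with τ = η = 0.5 so that the imposter and genuine-user targets become 0 and 1:

```python
import numpy as np

def ter_batch(X_neg, X_pos, b=1e-4, tau=0.5, eta=0.5):
    """Closed-form TER solution: class-normalized weighted least squares.
    Rows of X_neg / X_pos are imposter / genuine-user score vectors with
    a constant bias component already appended (illustrative sketch)."""
    d = X_neg.shape[1]
    M = b * np.eye(d) + X_neg.T @ X_neg / len(X_neg) \
                      + X_pos.T @ X_pos / len(X_pos)
    v = (tau - eta) / len(X_neg) * X_neg.sum(axis=0) \
      + (tau + eta) / len(X_pos) * X_pos.sum(axis=0)
    return np.linalg.solve(M, v)

# Synthetic two-matcher scores: imposters around 0.2, genuine around 0.8.
rng = np.random.default_rng(0)
X_neg = np.hstack([rng.normal(0.2, 0.1, (200, 2)), np.ones((200, 1))])
X_pos = np.hstack([rng.normal(0.8, 0.1, (40, 2)), np.ones((40, 1))])
alpha = ter_batch(X_neg, X_pos)
# accept a query x when x @ alpha exceeds the threshold tau
```

Note how the 1/m⁻ and 1/m⁺ normalization makes the two classes contribute equally to M and v regardless of the heavy imposter/genuine imbalance.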



B. Recursive least squares (RLS)

The recursive least squares algorithm has been widely used in filter design and incremental learning formulations. It operates under the assumption that training data arrive sequentially, where the parameter vector is updated based on a previously learned parameter and the inverse of a correlation matrix [7].

¹Since a constant x_0 = 1 is included in x̃ = [x_0, x^T]^T, it becomes x̃ ∈ R^{d+1}.

III. AN APPROXIMATE ONLINE TER LEARNING

In practice, the size of the training set is determined by the imposter and genuine-user scores obtained from inter- and intra-user comparisons among all the available templates. With N users and S samples per user, the number of imposter scores is given by

    m⁻ = N(N − 1)S²/2,

and that of genuine-user scores by

    m⁺ = NS(S − 1)/2,

where N is the number of users and S is the number of samples per user. The total number of scores is given by m = m⁻ + m⁺ = NS(NS − 1)/2. To illustrate the almost exponential increment of data, consider the following two cases: (i) N = 100, S = 1 and (ii) N = 1,000, S = 1. The total number of scores is respectively (i) 4,950 and (ii) 499,500. Here we see that case (ii) is approximately 10² times that of case (i) when N of case (ii) is 10 times that of case (i). Therefore, when more users (and samples) are enrolled, a much larger increment of scores is anticipated. In practice, we anticipate that the number of users (as well as the number of samples) increases as time goes by.

Although the RLS algorithm mentioned in the preliminary section can be used to solve the computational cost problem arising from a large training set, there remains a mismatch between the least squares distance learning objective and the performance criterion (FPR and FNR) with respect to biometric verification. In this section, we propose an online version to learn the TER instead of the least squares distance. While the RLS has no weighting for each sequence, the TER includes a varying weight which is determined according to the number of samples of each class. Moreover, since each incoming datum can be either genuine-user or imposter, the entries of the weighting matrix vary according to the class label of new data. Hence, the task towards an online TER learning is non-trivial.

A. Learning based on sequential data

Consider a set of training match scores which arrives one by one: (x̃_1, y_1), (x̃_2, y_2), ..., (x̃_k, y_k). Here, the following notations will be adopted: x̃_k ∈ R^{d+1} as a newly arrived data vector, X_k ∈ R^{k×(d+1)} as the accumulated data matrix from the 1st to the k-th sequence, and y_k ∈ R^k as the accumulated label vector. Based on the packed matrix notation, the original batch mode TER solution (3) can be rewritten as:

    α_k = M_k^{-1} v_k,    (5)

where M_k = X_k^T W_k X_k + bI and v_k = X_k^T W_k y_k are weighted correlation matrices, and W_k ∈ R^{k×k} is a diagonal weight matrix with elements 1/m_k^{(1)} for imposters and 1/m_k^{(2)} for genuine-users. In order to facilitate a summation operation, we use c ∈ {1, 2} to indicate the two classes respectively (c = 1 for imposters and c = 2 for genuine-users).

Now we shall separate the newly arrived data x̃_k and the previously accumulated data X_{k−1}. Here, the packed data can be expressed as:

    X_k = [X_{k−1}; x̃_k^T],  y_k = [y_{k−1}; y_k],  W_k = [W̃_{k−1}, 0; 0, w_k],    (6)

where W̃_{k−1} ∈ R^{(k−1)×(k−1)} contains 1/m_k^{(c)} as elements instead of 1/m_{k−1}^{(c)}, since it is a partition of W_k, and

    w_k = 1/m_k^{(1)} if x̃_k ∈ class 1,  or  1/m_k^{(2)} if x̃_k ∈ class 2,

is a new weight determined by the class label of x̃_k at the k-th sequence. The above weighted correlation matrices M_k and v_k can be rewritten based on the elements of the above partitioned matrices:

    M_k = X_{k−1}^T W̃_{k−1} X_{k−1} + bI + w_k x̃_k x̃_k^T,    (7)

and

    v_k = X_{k−1}^T W̃_{k−1} y_{k−1} + w_k y_k x̃_k.    (8)

Our next task is to express the above M_k in (7) and v_k in (8) in terms of the previous data X_{k−1} and y_{k−1}. The previous weighted correlation matrices are defined as M_{k−1} = X_{k−1}^T W_{k−1} X_{k−1} + bI and v_{k−1} = X_{k−1}^T W_{k−1} y_{k−1} using the previous weight matrix W_{k−1} ∈ R^{(k−1)×(k−1)} with elements 1/m_{k−1}^{(c)}. Separating the class-wise parts M_{k−1}^{(1)}, M_{k−1}^{(2)} of M_{k−1} (and v_{k−1}^{(1)}, v_{k−1}^{(2)} of v_{k−1}), (7) and (8) can be rewritten as:

    M_k = (m_{k−1}^{(1)}/m_k^{(1)}) M_{k−1}^{(1)} + (m_{k−1}^{(2)}/m_k^{(2)}) M_{k−1}^{(2)} + bI + w_k x̃_k x̃_k^T,    (9)

    v_k = (m_{k−1}^{(1)}/m_k^{(1)}) v_{k−1}^{(1)} + (m_{k−1}^{(2)}/m_k^{(2)}) v_{k−1}^{(2)} + w_k y_k x̃_k.    (10)

Essentially, in (9) and (10), a re-scaling by m_{k−1}^{(c)}/m_k^{(c)} is needed to update the (k−1)-th quantities to the k-th, for both M and v. Here, we note again that W̃_{k−1} and W_{k−1} are different, since W̃_{k−1} consists of 1/m_k^{(c)} as elements and W_{k−1} consists of 1/m_{k−1}^{(c)} as elements.

Remark 1: Different from the generalized (weighted) least squares where the weight elements are tied to each sample, the weights for the TER solution are tied to each class. Since the weights of the generalized least squares can be prefixed, an update of the weight matrix is not required. However, the weights of TER vary according to the number of received samples. For this reason, the above scaling m_{k−1}^{(c)}/m_k^{(c)} is employed for weight matrix updating.

B. An approximation to the weighted correlation matrices

To facilitate the recursive formulation, we shall remove the summation expression in (9), (10) using partitioned matrices:

    M_k = [M_{k−1}^{(1)}, M_{k−1}^{(2)}] [(m_{k−1}^{(1)}/m_k^{(1)}) I; (m_{k−1}^{(2)}/m_k^{(2)}) I] + bI + w_k x̃_k x̃_k^T,    (11)

    v_k = [v_{k−1}^{(1)}, v_{k−1}^{(2)}] [(m_{k−1}^{(1)}/m_k^{(1)}); (m_{k−1}^{(2)}/m_k^{(2)})] + w_k y_k x̃_k.    (12)

In order to express the pair [M_{k−1}^{(1)}, M_{k−1}^{(2)}] as a single matrix M_{k−1}, we utilize the stacked matrix [I; I] and its generalized inverse for an approximated formulation. Hence, (9) and (10) can be approximated as:

    M_k ≈ w_up M_{k−1} + w_k x̃_k x̃_k^T,    (13)

and similarly for (8),

    v_k ≈ w_up v_{k−1} + w_k y_k x̃_k,    (14)

where w_up denotes the resulting single updating weight.

Remark 2: Here we note that the inverse of the stacked matrix in (11)-(12) does not exist due to its singularity. However, a pseudo inverse is possible. According to the Moore-Penrose generalized inverse theorem [8], a given matrix A ∈ R^{m×n} has a unique generalized inverse A⁺ ∈ R^{n×m} if the matrix satisfies the following four conditions (considering only the case where A consists of real numbers): (i) AA⁺A = A, (ii) A⁺AA⁺ = A⁺, (iii) (AA⁺)^T = AA⁺, (iv) (A⁺A)^T = A⁺A. The effect of using this generalized inverse is that the pair of class-wise updating weights m_{k−1}^{(1)}/m_k^{(1)} and m_{k−1}^{(2)}/m_k^{(2)} becomes a single approximated weight w_up. The net effect of this approximation is an equal weighting for the accumulated data, with the incoming datum updated with a new weight according to its class label. We shall observe the impact of such approximation in the experiments.

C. An online learning solution

Using the above approximations in (13) and (14), an online TER learning solution can be written. Denoting R_k = M_k^{-1}, the estimated parameter update rule can be written as:

    α_k = α_{k−1} + R_k x̃_k w_k (y_k − x̃_k^T α_{k−1}),    (15)

where the inverse weighted correlation matrix is updated as

    R_k = (1/w_up) [R_{k−1} − R_{k−1} x̃_k (w_up/w_k + x̃_k^T R_{k−1} x̃_k)^{-1} x̃_k^T R_{k−1}].    (16)

Here, (16) was obtained based on the matrix inversion lemma [9], [10],

    (A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1},    (17)

by letting A = w_up M_{k−1}, B = x̃_k, C = w_k, D = x̃_k^T. The online TER algorithm is summarized in Algorithm 1.

Remark 3: The online TER learning is performed without matrix inversion and multiplication of large matrices, since each current R_k can be obtained from the previous R_{k−1}. This is differentiated from the batch mode TER learning where matrix inversion of M_k is required at each training. This is an advantage of the online TER learning in terms of computational cost.

Algorithm 1 Online TER learning algorithm (training)
Input: x̃_k ∈ R^{d+1}, y_k ∈ R, k = 1, 2, ....
Output: α_k.
Initialization: R_0 = (1/b) I ∈ R^{(d+1)×(d+1)}, α_0 = random ∈ R^{d+1}, m_0^{(1)} = 0, m_0^{(2)} = 0.
for k = 1, 2, ... do
  if x̃_k ∈ class 1 then
    m_k^{(1)} = m_{k−1}^{(1)} + 1, m_k^{(2)} = m_{k−1}^{(2)}.
  else (x̃_k ∈ class 2)
    m_k^{(2)} = m_{k−1}^{(2)} + 1, m_k^{(1)} = m_{k−1}^{(1)}.
  end if
  1. Update weights: w_k^{(c)} = 1/m_k^{(c)}, c = 1, 2, and the updating weight w_up from the class-count ratios m_{k−1}^{(1)}/m_k^{(1)} and m_{k−1}^{(2)}/m_k^{(2)}.
  2. Update R_k in (16) and the solution α_k in (15).
end for
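The recursion (15)-(16) and Algorithm 1 can be sketched as follows. The class names, the zero initialization of α_0 (the paper uses a random α_0), and the averaged-ratio reading of w_up are our own illustrative choices, not fixed by the paper:

```python
import numpy as np

class OnlineTER:
    """Sketch of Algorithm 1: approximate online TER learning.

    R tracks the inverse weighted correlation matrix; m holds the
    class counts (index 0: imposter, 1: genuine-user)."""

    def __init__(self, dim, b=1e-4, tau=0.5, eta=0.5):
        self.R = np.eye(dim) / b        # R_0 = (1/b) I
        self.alpha = np.zeros(dim)      # alpha_0 (paper: random init)
        self.m = [0, 0]                 # class counts
        self.tau, self.eta = tau, eta

    def update(self, x, is_genuine):
        c = int(is_genuine)
        prev = self.m[c]
        # w_up: averaged class-count ratios (an assumed, simplified
        # reading of the paper's generalized-inverse approximation).
        w_up = 0.5 * (prev / (prev + 1) + 1.0)
        self.m[c] = prev + 1
        w = 1.0 / self.m[c]             # new class-dependent weight w_k
        y = self.tau + self.eta if is_genuine else self.tau - self.eta
        Rg = self.R / w_up              # inverse of w_up * M_{k-1}
        k = Rg @ x / (1.0 / w + x @ Rg @ x)    # Woodbury gain, eq. (16)
        self.R = Rg - np.outer(k, x @ Rg)
        # eq. (15): correct alpha along the new sample's direction
        self.alpha = self.alpha + self.R @ x * w * (y - x @ self.alpha)

    def score(self, x):
        return x @ self.alpha           # threshold at tau to decide
```

Feeding score vectors (with a constant bias component appended) one by one and thresholding `score` at τ reproduces the verification decision; only a (d+1)×(d+1) matrix is kept in memory regardless of how many scores have been seen.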

IV. EXPERIMENTS

A. Data sets and computing environment

Prior to evaluating the biometric fusion performance, we shall observe the classification performance of the proposed method as a classifier. An experiment is performed on widely used public domain databases. Five data sets from the University of California, Irvine (UCI) repository [11] will be adopted in the experiment to test the effect of the approximation on classification accuracy. For the evaluation of biometric performance, the NIST Biometric Scores Set-Release 1 (BSSR1) [12] will be adopted in a scores fusion scenario.

- UCI data: TABLE I shows a summary of the selected five data sets used in classifier performance evaluation.
- NIST-BSSR1 scores data: TABLE II summarizes the data sets used in our biometric performance evaluation. The NIST-BSSR1 database consists of raw similarity scores from comparing two templates. There are three biometric score sets in the database, namely face-face, finger-finger, and finger-face. (1) The face-face set was generated based on two different face recognition systems (C, G) which were applied to 3,000 users. The data set consists of 3,000 genuine-user scores and 8,997,000 imposter scores. This data set represents the case of a multi-algorithm system [13]. (2) The finger-finger set was generated based on one fingerprint system (V) which was applied to 6,000 users. The same matcher was used to generate scores from the right index finger and the left index finger of the same user. The data set consists of 6,000 genuine-user scores and 35,994,000 imposter scores. This data set represents the case of a multi-instance system [13]. (3) The finger-face set was generated based on the fingerprint and face templates of 517 users. The fingerprint scores were generated similarly to those in (2) and the face scores similarly to those in (1). The data set consists of 517 genuine-user scores and 266,772 imposter scores. This data set represents the case of a multimodal system [13].

In the following, we run all experiments using Matlab [14] on a PC with an Intel Core2 2.5 GHz CPU and 4 GB RAM.

B. Experimental setup

1) Setup for experiments on UCI data: a) Protocol: For training, validation, and test purposes, each data set is partitioned into three sets of similar size, each consisting of randomly permuted samples. For statistical evidence, the evaluation is performed over a total of 12 runs. Treating each set independently, we have six permutations for training-validation-test. Next we randomize the entire data again and perform the six permutations of training-validation-test. This gives 12 test results (we call it 12 runs) where the parameters are selected based on training-validation. The test accuracy results are then averaged. The input data for all UCI data experiments are normalized to a fixed interval, except for the OSELM case which follows the normalization in [15]. To examine the effects beyond the linear input model, we adopt

TABLE I
SUMMARY OF UCI DATA SETS FOR CLASSIFICATION.

Dataset                        #cases (training+validation+test)   #feats (d)
(1) Bupa-liver-disorder        345                                 6
(2) Wisconsin-breast-cancer    683                                 9
(3) Pima diabetes              768                                 8
(4) Mammographic masses        830                                 5
(5) Tic-tac-toe                958                                 9

TABLE II
SUMMARY OF NIST-BSSR1 DATA SETS FOR SCORES FUSION.

Dataset             Biometric traits                      #matchers (name)   #users   #scores per matcher (genuine / imposter)
(1) Face-Face       Face (2 images/person)                2 (C & G)          3,000    3,000 / 8,997,000
(2) Finger-Finger   Fingerprint (right & left fingers)    2 (V x2)           6,000    6,000 / 35,994,000
(3) Finger-Face     Fingerprint & Face                    4 (C & G & V x2)   517      517 / 266,772

Remark: C, G, and V are the names of the matchers.
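The per-matcher score counts in TABLE II follow directly from pairing every probe against the whole gallery; a quick check (the function name is our own):

```python
def bssr1_counts(n_users):
    """Per-matcher genuine/imposter score counts when each user has one
    probe compared against every enrolled user's template."""
    genuine = n_users                    # one same-user comparison each
    imposter = n_users * (n_users - 1)   # probe against every other user
    return genuine, imposter

print(bssr1_counts(3000))   # (3000, 8997000), the Face-Face row
print(bssr1_counts(6000))   # (6000, 35994000), the Finger-Finger row
print(bssr1_counts(517))    # (517, 266772), the Finger-Face row
```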

the RM2 polynomial model as an expansion feature (by replacing X by a polynomial regressor P [5]) for both the batch TER and the online TER, abbreviated respectively as TER-RM and OTER-RM. The proposed method is compared with state-of-the-art algorithms which include IPCA [16], ILDA [17], ISVM [18], and OSELM [15]. All methods are experimented under the same testing protocol described above. As the performance evaluation measure, the classification accuracy is defined as:

    Accuracy = (Number of correctly classified samples) / (Number of total samples).    (18)

b) Parameter setting: The parameters, including (i) the polynomial order in TER-RM, OTER-RM, and ISVM, (ii) the number of hidden nodes in OSELM, and (iii) the number of bases in IPCA and ILDA, are selected based on the above-mentioned training-validation-test protocol. For each run, ten different values of the classifier parameters are examined: (i) polynomial order ranged from 1 to 10, (ii) number of hidden nodes ranged from the feature dimension to ten times the feature dimension, (iii) ten different numbers of bases. Then those selected parameters are used for testing. For TER-RM and OTER-RM, both τ and η are set as 0.5. The bias term b is set following [6].

2) Setup for experiments on NIST-BSSR1 data: a) Protocol: Since the SVM method is found to be among the best methods for fusion in the literature [4], [5], we shall adopt an incremental SVM (ISVM) to compare with the proposed online TER method for fusion. For performance evaluation, apart from the classification accuracy defined in the above setup, the half total error rate which is commonly used in the biometric field [19] is adopted:

    HTER = (FPR + FNR) / 2.    (19)

b) Parameter setting: Here, only low polynomial orders within the range of 1 to 3 will be used in this experiment due to two reasons. Firstly, the decision boundary for biometric problems is not too complex [5]. Another reason is that both TER-RM and ISVM encounter difficulty in terms of both computational and memory requirements for high order polynomials.

C. Results

1) UCI data accuracy results: TABLE III shows the averaged classification accuracy and standard deviation among the compared methods. Here we see that the online TER-RM shows a comparable accuracy with that of the original batch based TER-RM. Since OTER-RM and TER-RM use different RM orders, which are obtained from training-validation, their accuracies can be different. Both the batch and online based TER methods show either comparable or better accuracy as compared with the state-of-the-art methods [16], [17], [18], [15].

2) NIST-BSSR1 data results: TABLE IV shows the averaged accuracy, HTER, and their standard deviations among the compared methods. Under the given settings, the proposed OTER-RM shows a near 100% accuracy with low HTER values. This is similar to TER-RM for all the experimented data sets. In the TER methods, fusion of face and fingerprint shows a better HTER performance than that of multiple same biometrics (two-face fusion and two-fingerprint fusion). ISVM shows a poorer HTER than the TER methods on the given setting. Moreover, the ISVM cannot be computed for large data sizes under the given memory capability. This is because when a high polynomial order and a large number of support vectors are adopted, the polynomial kernel terms grow exponentially.

3) CPU and learning trend: Fig. 1 shows the training execution (CPU) time of TER-RM and OTER-RM when the data arrives one by one. The experiment is performed for ten different RM orders ([1, 10]) using the Mammographic masses data from UCI. Here, the batch based TER-RM generally shows a higher training CPU time than that of online TER-RM for each arriving datum. This is due to the re-computation over the entire data in the batch mode. As data accumulates, batch based TER-RM shows an exponential growth of training execution time while the online TER-RM shows a nearly constant execution time. Moreover, when the polynomial order (r) increases, the computational time of batch based TER-RM increases too. This shows that the approximated online learning can reduce the learning cost under a practical scenario when a large amount of data is accumulated.

To compare the accuracy progress between the batch and online TER-RM techniques, we plot the accuracy along the learning process. Here, the incoming datum x̃_k is predicted based on the previous solution α_{k−1} and compared with the ground truth y_k to obtain the test accuracy. Subsequently, the incoming datum is used for training in the next cycle based on its true label. From Fig. 2, we see a similar learning trend for both batch based and online based TER. This validates the progressive learning capability of the proposed method.

V. CONCLUSION

In this paper, an online learning formulation to directly optimize the decision total error rate with respect to a fusion classifier was proposed for multimodal biometric scores fusion. Since the proposed method includes a class-specific weight which varies according to the incoming data, the update of weights and correlation matrices is non-trivial. Through a generalized inverse multiplication applied to the updating weight, an approximated online learning solution has been proposed. Our empirical evaluations on the publicly available UCI and NIST-BSSR1 databases showed promising potential in terms of good fusion performance with a low computational cost during updating.

REFERENCES

[1] A. Jain, S. Pankanti, and R. Bolle, Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers, 1999.

TABLE III
COMPARISON OF AVERAGE CLASSIFICATION ACCURACY (%) AND STANDARD DEVIATION FOR FIVE DATA SETS.

Dataset                        TER-RM         OTER-RM        IPCA           ILDA           ISVM           OSELM
(1) Bupa-liver-disorder        67.15 (6.40)   68.03 (4.35)   54.06 (4.01)   55.79 (3.04)   57.97 (0.22)   57.83 (1.34)
(2) Wisconsin-breast-cancer    96.60 (1.72)   95.90 (1.87)   95.43 (1.19)   95.50 (1.65)   96.60 (1.41)   85.11 (11.00)
(3) Pima diabetes              74.93 (1.64)   74.77 (1.87)   66.08 (3.74)   65.53 (2.64)   76.01 (1.49)   66.18 (2.76)
(4) Mammographic masses        81.45 (2.21)   82.41 (2.15)   72.44 (3.56)   71.42 (2.82)   79.07 (2.81)   67.13 (12.24)
(5) Tic-tac-toe                97.31 (1.59)   97.52 (1.49)   64.07 (3.06)   93.53 (1.78)   76.72 (1.66)   64.64 (1.31)
TABLE IV
COMPARISON OF AVERAGE ACCURACY (%) AND HTER (%) FOR NIST-BSSR1 DATA SETS.

                      Accuracy                                     HTER
Dataset               TER-RM    OTER-RM   ISVM              TER-RM   OTER-RM   ISVM
(1) Face-Face         97.8945   97.5509   out of memory     4.1635   4.0300    out of memory
(2) Finger-Finger     99.9971   99.9970   out of memory     4.4964   3.7714    out of memory
(3) Finger-Face       99.9919   99.9855   99.9543           0.5108   0.3483    11.8192





Fig. 1. Training CPU time (seconds) versus number of data samples, for TER-RM and OTER-RM with RM orders r = 1, ..., 10.
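The trend in Fig. 1 can be reproduced in miniature by timing a full batch refit at every arrival against the rank-one online update. The synthetic scores, the simplified w_up = 1, and all names below are our own choices, not the paper's experimental setup:

```python
import time
import numpy as np

# Miniature of the Fig. 1 comparison: the batch TER solution refits on
# all accumulated data at every arrival, while the online update touches
# only the new sample. Two-matcher scores with a bias column appended.
rng = np.random.default_rng(0)
samples = []
for i in range(400):
    g = i % 2 == 1                              # alternate the classes
    s = rng.normal(0.8 if g else 0.2, 0.1, 2)
    samples.append((np.append(s, 1.0), g))

# Online: rank-one Woodbury update (w_up fixed to 1 for brevity).
R, alpha, m = np.eye(3) / 1e-4, np.zeros(3), [0, 0]
t0 = time.perf_counter()
for x, g in samples:
    m[g] += 1
    w, y = 1.0 / m[g], (1.0 if g else 0.0)      # tau = eta = 0.5 targets
    k = R @ x / (1.0 / w + x @ R @ x)
    R -= np.outer(k, x @ R)
    alpha += R @ x * w * (y - x @ alpha)
online_t = time.perf_counter() - t0

# Batch: rebuild and solve the weighted system at every arrival.
t0 = time.perf_counter()
for i in range(1, len(samples) + 1):
    seen = samples[:i]
    cnt = [sum(1 for _, h in seen if not h), sum(1 for _, h in seen if h)]
    M, v = 1e-4 * np.eye(3), np.zeros(3)
    for x, g in seen:
        M += np.outer(x, x) / cnt[g]            # class-normalized terms
        v += (1.0 if g else 0.0) * x / cnt[g]
    alpha_b = np.linalg.solve(M, v)
batch_t = time.perf_counter() - t0
```

The per-step batch cost grows with the number of accumulated samples, while the online step stays constant, which is exactly the qualitative gap reported in the experiments (absolute timings are machine-dependent).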

[2] A. K. Jain, A. Ross, and S. Pankanti, "An introduction to biometric recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 4-20, 2004.
[3] A. K. Jain, P. Flynn, and A. A. Ross, Handbook of Biometrics. Springer, 2008.
[4] A. Jain, K. Nandakumar, and A. Ross, "Score normalization in multimodal biometric systems," Pattern Recognition, vol. 38, no. 12, pp. 2270-2285, December 2005.
[5] K.-A. Toh, J. Kim, and S. Lee, "Biometric scores fusion based on total error rate minimization," Pattern Recognition, vol. 41, no. 3, pp. 1066-1082, 2008.
[6] K.-A. Toh and H.-L. Eng, "Between classification-error approximation and weighted least-squares learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 658-669, April 2008.
[7] S. Haykin, Adaptive Filter Theory, 4th ed. Prentice Hall, 2001.
[8] R. Penrose, "A generalized inverse for matrices," Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51, no. 3, pp. 406-413, July 1955.
[9] J. Sherman and W. J. Morrison, "Adjustment of an inverse matrix corresponding to a change in one element of a given matrix," The Annals of Mathematical Statistics, vol. 21, no. 1, pp. 124-127, March 1950.
[10] M. A. Woodbury, "Inverting modified matrices," Memorandum Report 42, Statistical Research Group, Princeton University, 1950.
[11] A. Asuncion and D. Newman, "UCI machine learning repository," 2007. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

Fig. 2. Progress of test accuracy over number of samples.

[12] National Institute of Standards and Technology, "NIST biometric scores set-release 1," 2004. [Online]. Available: http://www.itl.nist.gov/iad/894.03/biometricscores
[13] A. A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics. Springer, 2006.
[14] The MathWorks Inc., "Matlab." [Online]. Available: http://www.mathworks.com/products/matlab/
[15] G.-B. Huang, N.-Y. Liang, H.-J. Rong, P. Saratchandran, and N. Sundararajan, "On-line sequential extreme learning machine," in Proc. IASTED International Conference on Computational Intelligence (CI 2005), July 2005.
[16] J. Weng, Y. Zhang, and W.-S. Hwang, "Candid covariance-free incremental principal component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1034-1040, 2003.
[17] T.-K. Kim, S.-F. Wong, B. Stenger, J. Kittler, and R. Cipolla, "Incremental linear discriminant analysis using sufficient spanning set approximations," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), June 2007, pp. 1-8.
[18] C. P. Diehl and G. Cauwenberghs, "SVM incremental learning, adaptation and optimization," in Proc. International Joint Conference on Neural Networks, 2003, pp. 2685-2690.
[19] N. Poh and S. Bengio, "Database, protocols and tools for evaluating score-level fusion algorithms in biometric authentication," Pattern Recognition, vol. 39, no. 2, pp. 223-233, 2006.