1 Biometric Authentication: A Copula Based Approach

Satish G. Iyengar^a, Pramod K. Varshney^a and Thyagaraju Damarla^b

^a EECS Department, Syracuse University, Syracuse, NY 13244, USA
^b Army Research Laboratory, Adelphi, MD, USA

1.1 Introduction

Biometrics involves the design of automatic human recognition systems that use physical features, such as the face, fingerprints or iris, or behavioral traits, such as gait or rate of keystrokes, as passwords. For example, in building access control applications, a person's face may be matched to templates stored in a database consisting of all enrolled users. The decision to allow or deny entry is then made based on the similarity score generated by the face matching algorithm. Such security systems that rely on biometrics have several advantages over conventional ones where alphanumeric personal identification numbers (PINs) are provided to the users. For example, a PIN, if leaked, may be used by an unauthorized person, causing serious security concerns. However, a person's physical signature belongs only to that individual and is very difficult, if not impossible, to emulate. Further, biometric systems may be more convenient and user-friendly as there is no code to remember or token to carry. However, there exist several limitations. Biometric traits such as face and voice change with age. One may be required to update the system's database to counter this time variability. Environmental noise and noise in the acquisition system further affect the accuracy and reliability of the system. Overlap between physical features, or inter-class similarity (e.g., twins with identical facial features), limits the system's ability to distinguish between classes. There also exist intra-class variations due to differences between the acquired biometric signature of an individual requesting access and his/her template registered in the database. Apart from the noise sources stated above, these differences may also stem from the psychological and behavioral variations of an individual at different instances of time. One method to overcome these limitations is to consider combining multiple sources of information. It


Fig. 1.1. A multi-biometric authentication system. Biometric signatures of disparate modalities such as face, iris and fingerprint are fused.

may include fusing observations of disparate modalities (e.g., voice and face), multiple features (extracted from the same biometric trait), multiple classifiers, or multiple samples of the same source. This method of fusing several biometric sources is called multi-biometrics. Figure 1.1 shows a multimodal biometric system which considers fusion of disparate biometric signatures such as face, iris and fingerprints. Fusion of multimodal biometrics offers several advantages and new possibilities for system improvement. For example, in video/image based person authentication systems for security and access control applications, system performance degrades when the subjects age or when the lighting conditions are poor. The presence of an audio signature along with video would overcome many of these difficulties. In other words, noise in one modality may not affect the other, thus making the system more robust to noise. Secondly, multiple modalities may contain complementary information relevant to the verification task. The level of detail and the type of information present in one modality may be different from the other. An efficient system design is one which exploits this heterogeneity of the multimodal data set. In this chapter, we concern ourselves with the design of rules for fusing
different biometric signatures, and describe a new approach based on copula theory (Dass et al. 2005, Iyengar et al. 2007 and Iyengar et al. 2009). We show how copula theory enables the development of a mathematical framework for the fusion of data from heterogeneous sources. We also consider the problem of analyzing the effect of inter-modal dependence on system performance and describe some recent results from Iyengar et al. 2009. The chapter provides a detailed exposition of copula theory and its applicability to biometric authentication. Although Fig. 1.1 implies fusion of multimodal observations, the developed framework is general enough to also handle problems where the source of heterogeneity is multiple samples, algorithms or classifiers; i.e., the measurements z_1, z_2, \ldots, z_N in Fig. 1.1 may also denote multiple samples, multiple features (extracted from the same modality) or the outputs of multiple algorithms which are combined (jointly processed) at a fusion center.

1.2 Fusion of Multi-biometrics

A biometric authentication task is essentially a binary hypothesis testing problem where data (biometric signatures) from several modalities are fused to test the hypothesis H_1 against H_0, where

H_0: the claimant is an impostor
H_1: the claimant is a genuine user

Fusion of information from multiple biometric sensors can be classified into three different levels:

• Data or feature level fusion: Local observations from each source are processed directly or are transformed so that only relevant features are extracted and retained for further processing. Resultant features from each source are then combined to obtain a global decision regarding the identity of the human under test. This method is the most efficient in terms of performance as it involves minimum information loss. However, the feature sets obtained are typically of high dimensionality and one often suffers from the well known curse of dimensionality. Further, the design of a fusion rule may not be straightforward as the acquired feature sets may not be commensurate (e.g., audio and corresponding video features).
• Score level fusion: Fusion of match scores is second to feature level fusion in terms of performance efficiency as the input undergoes moderate information loss in its transformation to similarity scores. Similarity (or dissimilarity) scores are obtained for each modality by matching the input
data (or features) to those stored in the system's database. The approach is then to fuse the scores thus obtained by matching the input at each source to its corresponding template. The range of score values may differ across matching algorithms, and one of the challenges in the design of a score level fusion rule is to incorporate this heterogeneity.
• Decision level fusion: Local decisions regarding the presence or absence of a genuine user are obtained by independently processing each modality. The binary data thus obtained for each source are subsequently fused to make a global decision. A significant reduction in system complexity can be obtained. However, this comes at the cost of potentially high information loss due to the binary quantization of individual features.

Further, methods for multi-biometric fusion can be classified into two broad categories (Jain et al. 2005): (a) the classification approach and (b) the combination approach. As an example, consider the problem of fusing information from an audio and a video sensor. Approach (a) involves training a classifier to jointly process the audio-visual features or matching scores and classify the claimant as a genuine user (accept) or an impostor (reject). The classifier is expected to be capable of learning the decision boundary irrespective of how the feature vectors or matching scores are generated, and thus no processing is required prior to feeding them into the classifier. Several classifiers based on neural networks, k-NN, support vector machines (SVM) and the likelihood principle have been used in the literature. On the other hand, approach (b) combines all the feature vectors or matching scores to generate a single scalar score, and a global decision regarding the presence or absence of a genuine user is based on this combined score.
Thus, in order to achieve a meaningful combination, the features obtained from disparate modalities must be transformed or normalized to a common domain. The design of such a transform is not straightforward; it is often data-dependent and requires extensive empirical evaluation (Toh et al. 2004, Snelick et al. 2005, Jain et al. 2005). Our interest in this chapter is in the likelihood ratio based fusion approach. The method does not require the data to be transformed to make them commensurable. Further, the method has a strong theoretical foundation and is provably optimal (in both the Neyman-Pearson (NP) and Bayesian sense) (Lehmann 2008). However, it requires complete knowledge of the joint probability density functions (PDFs) of the genuine (f_Z(z|H_1)) and impostor (f_Z(z|H_0)) features. Estimation of these PDFs is thus one major challenge, and optimality suffers if there is a mismatch between the true and the estimated joint PDFs.
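As a concrete (and deliberately simplified) sketch of likelihood ratio based score fusion, the snippet below assumes Gaussian genuine and impostor score densities for two hypothetical matchers and treats the modalities as conditionally independent; all parameter values are illustrative, not taken from any real system.

```python
# Hypothetical score-level fusion by likelihood ratio. The Gaussian
# genuine/impostor score models are made-up stand-ins; the chapter's
# point is that the true (joint) PDFs are required in general.
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed per-matcher score models: ((mu, sigma) under H1, (mu, sigma) under H0).
MODELS = [((0.8, 0.1), (0.3, 0.1)),      # matcher 1, scores in [0, 1]
          ((70.0, 10.0), (30.0, 10.0))]  # matcher 2, scores in [0, 100]

def log_lr(scores):
    """Sum of per-modality log-likelihood ratios (product model)."""
    lam = 0.0
    for z, ((m1, s1), (m0, s0)) in zip(scores, MODELS):
        lam += math.log(gauss_pdf(z, m1, s1) / gauss_pdf(z, m0, s0))
    return lam

def decide(scores, eta=0.0):
    """Accept (genuine) when the log-likelihood ratio exceeds threshold eta."""
    return "accept" if log_lr(scores) > eta else "reject"
```

In the NP setting, `eta` would be chosen to meet a false alarm constraint rather than fixed at zero.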


Next, we discuss statistical modeling of heterogeneous data (features or scores).

1.3 Statistical Modeling of Heterogeneous Biometric Data

A parametric approach to statistical signal processing applications such as detection, estimation and tracking necessitates complete specification of the joint PDF of the observed samples. However, in many cases, the derivation of the joint PDF becomes mathematically intractable. In problems such as fusion of multi-biometrics, random variables associated with each biometric trait may follow probability distributions that are different from one another. For example, it is highly likely that features (or match scores) derived from face and acoustic measurements follow disparate PDFs. The differences in the physics governing each modality result in disparate marginal (univariate) distributions. There may also be differences in signal dimensionality, support and sampling rate requirements across modalities. Moreover, the marginals may exhibit nonzero statistical dependence due to complex inter-modal interactions. Deriving the underlying dependence structure is a challenge and may not always be possible. We can thus identify the following two challenges when modeling the joint distribution of heterogeneous data:

• quantification of inter-modal dependence and interactions;
• derivation of the joint probability density function (PDF) of dependent heterogeneous measurements when the underlying marginals follow disparate distributions.

Prabhakar and Jain 2002 use non-parametric density estimation for combining the scores obtained from four fingerprint matching algorithms and use likelihood ratio based fusion to make the final decision. Several issues such as the selection of the kernel bandwidth and density estimation at the tails complicate this approach. More recently, Nandakumar et al. 2008 consider the use of finite Gaussian mixture models (GMMs) for the genuine and impostor score densities.
They show that GMM models are easier to implement than kernel density estimators (KDE) while also achieving high system performance. However, GMM models require selecting the appropriate number of Gaussian components. The use of too many components may result in over-fitting the data while using too few components may not approximate the true density well. They use a GMM fitting algorithm developed by Figueiredo and Jain 2002 which automatically estimates the number of components and the component parameters using the expectation-maximization (EM) algorithm and the minimum message length criterion.


We present, in this chapter, an alternative approach based on copula theory. We show how copula functions possess all the ingredients necessary for modeling the joint PDF of heterogeneous data. One of the main advantages of the copula approach is that it allows us to express the log-likelihood ratio as a sum of two terms: the first corresponds to the strategies employed by the individual modalities and the second to cross-modal processing. This allows us to separately quantify the system performance due only to the model differences across the two hypotheses and that contributed only by the cross-modal interactions. Thus, it provides an elegant framework to study the effects of cross-modal interactions. Further, as will be evident later, there is also a reduction in multivariate PDF estimation complexity, as the estimation problem can be split into two steps:

(i) estimation of only the marginal distributions;
(ii) estimation of the copula parameter.

The use of nonparametric measures such as Kendall's τ (as opposed to MLE) to estimate the copula parameter further reduces the computational complexity. We now explain how copula theory can be exploited to address the modeling issues discussed above. We begin with the following definition.

Definition 1. A random vector Z = \{Z_n\}_{n=1}^{N} governing the joint statistics of an N-variate data set is termed heterogeneous if the marginals Z_n are non-identically distributed. The variables Z_n may exhibit statistical dependence in that

f_Z(z) \neq \prod_{n=1}^{N} f_{Z_n}(z_n)    (1.1)

The goal is to construct the joint PDF f_Z(z) of the heterogeneous random vector Z. Definition 1 is, of course, inclusive of the special case when the marginals are identically distributed and/or statistically independent. Characterizing multivariate statistical dependence is one of the most widely researched topics and has always been a difficult problem (Mari and Kotz 2001). The most commonly used bivariate measure, Pearson's correlation ρ, captures only the linear relationship between variables and is a weak measure of dependence when dealing with non-Gaussian random variables. Two random variables X and Y are said to be uncorrelated if the covariance

\Sigma_{X,Y} = E(XY) - E(X)E(Y)    (1.2)


Fig. 1.2. Illustrative example showing that the correlation coefficient ρ is a weak measure of dependence.

is zero, i.e.,

\rho = \frac{\Sigma_{X,Y}}{\sqrt{\sigma_X^2 \sigma_Y^2}} = 0.

Statistical independence has a stricter requirement in that X and Y are independent only if their joint density factors as the product of the marginals. In general, zero correlation does not guarantee independence (except when the variables are jointly Gaussian). For example, though the dependence of one variable on the other is evident in the scatter plots (Fig. 1.2 a and b), the correlation coefficient is zero. The problem is further compounded when dependent heterogeneous random variables with disparate PDFs are involved. One then often chooses to assume multivariate Gaussianity or inter-modal independence (also called the product model) to construct a tractable statistical model. A multivariate Gaussian model necessitates Gaussian marginals and thus fails to incorporate the true non-Gaussian marginal PDFs. Assuming statistical independence neglects inter-modal dependence, leading to suboptimal solutions. As we will show later, a copula based model for dependent heterogeneous random vectors allows us to retain the marginal PDFs as well as capture the inter-modal dependence information.

1.3.1 Copula Theory and its Implications

Copulas are functions that couple multivariate joint distributions to their component marginal distribution functions (Nelsen 1999, Kurowicka and Cooke 2006). The main advantage of the copula based approach is that it allows us to define inter-modal dependence irrespective of the underlying marginal distributions. One can construct joint distributions with arbitrary marginals and the desired dependence structure. This is well suited for heterogeneous random vectors.
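The point illustrated by Fig. 1.2 is easy to reproduce numerically: below, Y = X^2 is a deterministic function of X, yet the sample correlation computed from (1.2) is essentially zero. A pure-Python sketch:

```python
# Y = X^2 is completely determined by X, but Pearson's rho only captures
# linear association, so the sample correlation comes out (near) zero.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
ys = [x * x for x in xs]

def covariance(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

cov_xy = covariance(xs, ys)          # E(XY) - E(X)E(Y), as in Eq. (1.2)
var_x = covariance(xs, xs)
var_y = covariance(ys, ys)
rho = cov_xy / (var_x * var_y) ** 0.5   # close to 0 despite full dependence
```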


Sklar (1959) was the first to define copula functions.

Theorem 1 (Sklar's Theorem). Let F_Z(z_1, z_2, \cdots, z_N) be the joint cumulative distribution function (CDF) with continuous marginal CDFs F_{Z_1}(z_1), F_{Z_2}(z_2), \cdots, F_{Z_N}(z_N). Then there exists a copula function C(\cdot) such that for all z_1, z_2, \cdots, z_N in [-\infty, \infty],

F_Z(z_1, z_2, \cdots, z_N) = C(F_{Z_1}(z_1), F_{Z_2}(z_2), \cdots, F_{Z_N}(z_N))    (1.3)

For continuous marginals, C(\cdot) is unique; otherwise C(\cdot) is uniquely determined on Ran F_{Z_1} \times Ran F_{Z_2} \cdots \times Ran F_{Z_N}, where Ran X denotes the range of X. Conversely, if C(\cdot) is a copula and F_{Z_1}(z_1), F_{Z_2}(z_2), \cdots, F_{Z_N}(z_N) are marginal CDFs, then the function F_Z(\cdot) in (1.3) is a valid joint CDF with the marginals F_{Z_1}(z_1), F_{Z_2}(z_2), \cdots, F_{Z_N}(z_N).

Note that the copula function C(u_1, u_2, \cdots, u_N) is itself a CDF with uniform marginals, since u_n = F_{Z_n}(z_n) \sim U(0, 1) (by the probability integral transform). The copula based joint PDF of N continuous heterogeneous random variables can now be obtained by taking the Nth order mixed derivative of (1.3),

f_Z(z) = \left( \prod_{n=1}^{N} f_{Z_n}(z_n) \right) c(F_{Z_1}(z_1), \cdots, F_{Z_N}(z_N)) = f_Z^c(z)    (1.4)

where Z = [Z_1, Z_2, \cdots, Z_N] and we use the superscript 'c' to denote that f_Z^c(z) is the copula representation of f_Z(z). Note that we need to know the true copula density c(\cdot) to have an exact representation as in (1.4). We emphasize here that any joint PDF with continuous marginals can be written in terms of a copula function as in (1.4) (due to Sklar's theorem). However, identifying the true copula is not a straightforward task. A common approach then is to select a copula function k(\cdot) a priori and fit the given marginals and the desired dependence structure to derive the joint distribution. Thus, model mismatch errors are introduced when k(\cdot) \neq c(\cdot), i.e., when the selected copula is not equal to the true dependence structure given by c(\cdot), and it is essential to account for this error and its effect on the detection performance. In the following, we first consider system design and its performance analysis assuming knowledge of the true underlying copula c(\cdot). This allows us to analyze the effects of inter-modal dependence. We defer the discussion on joint PDF construction with potentially misspecified copula functions until Section 1.5.
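A minimal sketch of the construction in (1.4), assuming (for illustration only) an exponential and a standard normal marginal coupled by a bivariate Gaussian copula with parameter θ. The closed-form Gaussian copula density used here is standard, but its choice, and the choice of marginals, are ours rather than prescribed by the chapter.

```python
# Eq. (1.4): a bivariate joint PDF assembled from disparate marginals
# glued together by a Gaussian copula density with dependence parameter theta.
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_density(u1, u2, theta):
    """c(u1, u2; theta) for the bivariate Gaussian copula, |theta| < 1."""
    x = STD_NORMAL.inv_cdf(u1)
    y = STD_NORMAL.inv_cdf(u2)
    q = theta * theta * (x * x + y * y) - 2.0 * theta * x * y
    return math.exp(-q / (2.0 * (1.0 - theta ** 2))) / math.sqrt(1.0 - theta ** 2)

# Disparate marginals: exponential(1) for z >= 0, and standard normal.
f1 = lambda z: math.exp(-z)
F1 = lambda z: 1.0 - math.exp(-z)
f2 = STD_NORMAL.pdf
F2 = STD_NORMAL.cdf

def joint_pdf(z1, z2, theta=0.5):
    # Eq. (1.4): product of the marginals times the copula density.
    return f1(z1) * f2(z2) * gaussian_copula_density(F1(z1), F2(z2), theta)
```

Setting θ = 0 makes the copula density identically one, so (1.4) collapses to the product model.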


1.4 Copula Based Multi-biometric Fusion

The biometric authentication problem can be formulated as a binary hypothesis test. A decision theory problem consists of deciding which of the hypotheses H_0, \cdots, H_k is true based on the acquired observation vector of (say) L samples. An optimal test (in both the Neyman-Pearson (NP) and Bayesian sense) for a two-hypothesis problem (H_0 vs. H_1) computes the log-likelihood ratio Λ and decides in favor of H_1 when the ratio exceeds a pre-defined threshold η,

\Lambda(z) = \log \frac{f_Z(z|H_1)}{f_Z(z|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta    (1.5)

where f_Z(z|H_i) is the joint PDF of the random observation vector z = [z_1, \cdots, z_N]^T \in \mathbb{R}^N under the hypothesis H_i (i = 0, 1). In the NP setup, the threshold η is selected to constrain the false alarm error probability P_F to a value α < 1 while minimizing the probability of miss P_M. The two error probabilities are given as

P_F = P(\Lambda > \eta | H_0), \quad P_M = P(\Lambda < \eta | H_1)    (1.6)

Consider a binary hypothesis testing problem where H_0 and H_1 correspond to an impostor and a genuine user respectively:

H_1: f_Z(z_1, z_2, \cdots, z_N | H_1)
H_0: f_Z(z_1, z_2, \cdots, z_N | H_0) = \prod_{n=1}^{N} f_{Z_n}(z_n | H_0)    (1.7)

The random variables Z_1, \cdots, Z_N are assumed to be statistically independent under the hypothesis H_0, contrary to when H_1 is true.

1.4.1 Log-likelihood ratio test for heterogeneous signals

Using copula functions, the log-likelihood ratio test in (1.5) can be written as

\Lambda_c(z) = \log \frac{f_Z^c(z|H_1)}{f_Z(z|H_0)} = \log \left( \prod_{n=1}^{N} \frac{f_{Z_n}(z_n|H_1)}{f_{Z_n}(z_n|H_0)} \right) + \log c(F_{Z_1}^1(z_1), \cdots, F_{Z_N}^1(z_N))    (1.8)

where the copula density c(\cdot) characterizes dependence under H_1. The superscript i in F_{Z_n}^i(z_n) denotes the CDF of Z_n under hypothesis i. The first term in (1.8) corresponds to the differences in the statistics of each modality across the two hypotheses, while the cross-modal dependence
and interactions are included in the second term. This allows us to exactly factor out the role of cross-modal dependence and quantify the performance gains (if any) achieved due to inter-modal dependence. We denote the test based on the decision statistic in (1.8) as LLRT-H.
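The decomposition in (1.8) can be sketched as follows, with illustrative stand-in models: N(1,1) versus N(0,1) marginals under H1/H0 and a bivariate Gaussian copula (parameter θ) under H1. The function returns the two terms separately, mirroring the split between per-modality and cross-modal processing:

```python
# LLRT-H, Eq. (1.8): product-model term plus copula term. All densities
# here are illustrative stand-ins, not the chapter's actual score models.
import math
from statistics import NormalDist

STD = NormalDist()
H1_MARGINALS = [NormalDist(1.0, 1.0), NormalDist(1.0, 1.0)]
H0_MARGINALS = [NormalDist(0.0, 1.0), NormalDist(0.0, 1.0)]

def log_copula_density(u1, u2, theta):
    """log c(u1, u2; theta) for the bivariate Gaussian copula."""
    x, y = STD.inv_cdf(u1), STD.inv_cdf(u2)
    q = theta * theta * (x * x + y * y) - 2.0 * theta * x * y
    return -q / (2.0 * (1.0 - theta ** 2)) - 0.5 * math.log(1.0 - theta ** 2)

def llrt_h(z, theta=0.4):
    """Returns (Lambda_p, copula term); their sum is Lambda_c of Eq. (1.8)."""
    lam_p = sum(math.log(m1.pdf(zn) / m0.pdf(zn))
                for zn, m1, m0 in zip(z, H1_MARGINALS, H0_MARGINALS))
    copula_term = log_copula_density(H1_MARGINALS[0].cdf(z[0]),
                                     H1_MARGINALS[1].cdf(z[1]), theta)
    return lam_p, copula_term
```

With θ = 0 the copula term vanishes and the statistic reduces to the product-model test discussed next.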

1.4.2 Log-likelihood ratio test for the product distribution

It is interesting to note the form of the test statistic in (1.8). The first term,

\Lambda_p(z) = \log \left( \prod_{n=1}^{N} \frac{f_{Z_n}(z_n|H_1)}{f_{Z_n}(z_n|H_0)} \right)    (1.9)

is the test obtained when the variables Z1 , Z2 , · · · , ZN are statistically independent or when dependence between them is deliberately neglected for simplicity. The test based on this decision statistic is the log-likelihood ratio test for the product distribution (LLRT-P). In problems where the derivation of the joint density becomes mathematically intractable, tests are usually employed assuming independence between variables conditioned on each hypothesis. This naturally results in performance degradation. We now compare performances of LLRT-H and LLRT-P detectors.

1.4.3 Performance analysis

The asymptotic performance of a likelihood ratio test can be quantified using the Kullback-Leibler (KL) divergence, D(f_Z(z|H_1) \| f_Z(z|H_0)), between the PDFs underlying the two hypotheses. For two distributions p_X(x) and q_X(x), the KL divergence is defined as

D(p \| q) = \int p_X(x) \log \left( \frac{p_X(x)}{q_X(x)} \right) dx    (1.10)

and it measures how 'different' p_X(x) is relative to q_X(x).† Further,

• D(p \| q) \geq 0
• D(p \| q) = 0 \Leftrightarrow p = q

For L (independent) users of the system, through Stein's Lemma (Chernoff 1956), we have, for a fixed value of P_M = \beta (0 < \beta < 1),

\lim_{L \to \infty} \frac{1}{L} \log P_F = -D(f_Z(z|H_1) \| f_Z(z|H_0))    (1.11)

† The base of the logarithm is arbitrary. In this chapter, log(·) denotes the natural logarithm unless defined otherwise.
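As a quick sanity check of (1.10), the KL divergence between two unit-variance Gaussians a unit mean apart is 0.5 in closed form; the sketch below verifies this against a crude numerical integration (all values illustrative):

```python
# KL divergence of Eq. (1.10) evaluated two ways for a pair of Gaussians.
import math

def kl_gauss(mu1, s1, mu0, s0):
    # Closed form D(N(mu1, s1^2) || N(mu0, s0^2)).
    return math.log(s0 / s1) + (s1 ** 2 + (mu1 - mu0) ** 2) / (2 * s0 ** 2) - 0.5

def kl_numeric(p, q, lo, hi, steps=100_000):
    # Midpoint-rule approximation of the integral in Eq. (1.10).
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += p(x) * math.log(p(x) / q(x)) * h
    return total

def npdf(mu, s):
    return lambda x: math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

closed = kl_gauss(1.0, 1.0, 0.0, 1.0)       # = 0.5 for a unit mean shift
numeric = kl_numeric(npdf(1.0, 1.0), npdf(0.0, 1.0), -10.0, 12.0)
```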


The greater the value of D(f_Z(z|H_1) \| f_Z(z|H_0)), the faster P_F converges to zero as L \to \infty. The KL divergence is thus indicative of the performance of a log-likelihood ratio test. Further, it is additive when the dependence across the heterogeneous observations is zero,

D(f_Z^p(z|H_1) \| f_Z(z|H_0)) = \sum_{n=1}^{N} D(f_{Z_n}(z_n|H_1) \| f_{Z_n}(z_n|H_0))    (1.12)

where D(f_{Z_n}(z_n|H_1) \| f_{Z_n}(z_n|H_0)) is the KL divergence for a single modality Z_n. The following theorem helps us understand the effect of statistical dependence across the biometric traits on recognition performance.

Theorem 2 (Iyengar et al., 2009). The KL divergence between the two hypotheses (H_1 vs. H_0) increases by an amount equal to the multi-information between the random variables when dependence between the variables is taken into account,

D(f_Z(z|H_1) \| f_Z(z|H_0)) = D^p(f_Z(z|H_1) \| f_Z(z|H_0)) + \underbrace{I_1(Z_1; Z_2, \cdots, Z_N)}_{\geq 0}    (1.13)

where I_i(Z_1; Z_2, \cdots, Z_N) = I(Z_1; Z_2, \cdots, Z_N; H_i). The result in (1.13) is intuitively satisfying as the multi-information I_1(Z_1; Z_2, \cdots, Z_N) (which reduces to the well-known mutual information for N = 2) describes the complete nature of dependence between the variables.
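The identity (1.13) can be verified in closed form for a simple bivariate Gaussian example: take H1 = N(m, Σ) with unit variances and correlation θ, and H0 the product of two standard normals. The mean vector and θ below are arbitrary illustrative choices:

```python
# Numerical check of Theorem 2 / Eq. (1.13) in the bivariate Gaussian case.
import math

theta = 0.6
m = [1.0, 0.5]
det = 1.0 - theta ** 2   # determinant of [[1, theta], [theta, 1]]

# Full divergence D(N(m, Sigma) || N(0, I)), closed form for Gaussians:
# 0.5 * (tr(Sigma) + m'm - N - ln det Sigma), with N = 2 and tr(Sigma) = 2.
d_full = 0.5 * (2.0 + m[0] ** 2 + m[1] ** 2 - 2.0 - math.log(det))

# Product-model divergence: sum of the marginal KLs, Eq. (1.12).
d_prod = 0.5 * (m[0] ** 2 + m[1] ** 2)

# Mutual information of a correlated bivariate Gaussian.
mi = -0.5 * math.log(det)

# Eq. (1.13): the gap between full and product divergences is exactly mi.
assert abs(d_full - (d_prod + mi)) < 1e-12
```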

1.4.4 Effect of statistical dependence across multiple biometrics on fusion

Poh and Bengio (2005) note, "Despite considerable efforts in fusions, there is a lack of understanding on the roles and effects of correlation and variance (of both the client and impostor scores of base classifiers/experts)". While it is widely accepted that it is essential to correctly account for correlation in classifier design (Roli 2002, Ushmaev 2006), the exact link between classification performance and statistical dependence has not, to the best of our knowledge, been established. Some recent contributions in this direction include, apart from Poh and Bengio 2005, Koval et al. 2007 and Kryszczuk and Drygajlo 2008. Poh and Bengio 2005 studied the problem under the assumption of normality for both genuine and impostor features or scores and concluded that


a positive value for the correlation coefficient is detrimental to the system. Contrary to this result, Koval et al. 2007 used error exponent analysis to conclude that non-zero inter-modal dependence always enhances system performance. Recently, Kryszczuk and Drygajlo 2008 considered the impact of correlation for bivariate Gaussian features. They used the Matusita distance as a measure of separation between the PDFs of the competing hypotheses and showed that the conclusions of the above two studies do not hold in general, i.e., they do not extend to arbitrary distributions. Copula theory allows us to answer this important question and is general in treatment. The result in Theorem 2 makes no assumptions about the PDFs of biometric features. From Theorem 2,

D(f_Z(z|H_1) \| f_Z(z|H_0)) \geq D^p(f_Z(z|H_1) \| f_Z(z|H_0))    (1.14)

due to the non-negativity of the multi-information. Thus, in addition to the model differences across the hypotheses, the presence of non-zero dependence further increases the inter-class distinguishability. However, the problem is more complicated when the variables exhibit statistical dependence under both hypotheses, and the result that dependence can only enhance detection performance is no longer true (Iyengar, PhD dissertation). Next, we discuss methods to construct joint PDFs with potentially misspecified copula functions.

1.5 Joint PDF construction using Copulas

As pointed out earlier, the copula density c(\cdot) (the true dependence structure) is often unknown. Instead, a copula density k(\cdot) is chosen a priori from a valid set A = \{k_1(\cdot), \cdots, k_p(\cdot)\} of copula functions. Several copula functions have been defined, especially in the econometrics and finance literature (e.g., Clemen and Reilly 1999), the popular ones among them being the multivariate Gaussian copula, the Student's t copula and copula functions from the Archimedean family. Given a copula density function k(\cdot) and the marginal distributions, the joint PDF estimate then has a form similar to (1.4),

\hat{f}_Z(z) = \left( \prod_{n=1}^{N} f_{Z_n}(z_n) \right) k(F_{Z_1}(z_1), \cdots, F_{Z_N}(z_N)) = f_Z^k(z)    (1.15)

As an example, let Z_1 and Z_2 be the random variables associated with two


Table 1.1. Some well known copula functions

Copula   | C(u_1, u_2)                                                                                              | Kendall's τ
Gaussian | \Phi_N[\Phi^{-1}(u_1), \Phi^{-1}(u_2); \theta]                                                           | \frac{2}{\pi} \arcsin(\theta)
Clayton  | [u_1^{-\theta} + u_2^{-\theta} - 1]^{-1/\theta}                                                          | \frac{\theta}{\theta + 2}
Frank    | -\frac{1}{\theta} \log\left[1 + \frac{(e^{-\theta u_1}-1)(e^{-\theta u_2}-1)}{e^{-\theta}-1}\right]      | 1 - \frac{4}{\theta}\left[1 - \frac{1}{\theta}\int_0^{\theta} \frac{t}{e^t - 1}\, dt\right]
Gumbel   | \exp\left\{-\left[(-\log u_1)^{\theta} + (-\log u_2)^{\theta}\right]^{1/\theta}\right\}                  | 1 - \frac{1}{\theta}
Product  | u_1 u_2                                                                                                  | 0

heterogeneous biometrics; i.e., they may follow disparate distributions. One can first estimate the marginals f_{Z_1}(z_1) and f_{Z_2}(z_2) individually if they are unknown and then proceed to estimate the parameters of the copula density k(\cdot). In the following, we assume that the marginal PDFs are known or have been consistently estimated and concentrate only on copula fitting. Given a copula function K(\cdot) selected a priori, we wish to construct a copula based bivariate density function of the form (1.15) from the acquired data. Table 1.1 lists some of the well known bivariate copulas.† Each of these functions is parameterized by θ, which controls the 'amount of dependence' between the two variables. Thus, θ must be estimated from the acquired bivariate observations.

1.5.1 Estimation using nonparametric dependence measures

Nelsen 1999 describes how copulas can be used to quantify concordance (a measure of dependence) between random variables. Nonparametric rank correlations such as Kendall's tau (τ) and Spearman's rho (ρ_s) measure concordance between two random variables. Now let (z_1(i), z_2(i)) and (z_1(j), z_2(j)) be two observations from a bivariate measurement vector (Z_1, Z_2) of continuous random variables. The observations are said to be concordant if (z_1(i) - z_1(j))(z_2(i) - z_2(j)) > 0 and discordant if (z_1(i) - z_1(j))(z_2(i) - z_2(j)) < 0. The population version of Kendall's τ can be expressed in terms of K(\cdot) (Nelsen 1999):

\tau_{Z_1,Z_2} = 4 \int\!\!\int K(u_1, u_2; \theta)\, dK(u_1, u_2; \theta) - 1    (1.16)

where u_n = F_{Z_n}(z_n). Thus, for a given τ, the integral equation above can be

† \Phi_N(\cdot,\cdot) and \Phi(\cdot) in Table 1.1 denote the standard bivariate and univariate Gaussian CDFs respectively.
solved to obtain an estimate \hat{\theta}_\tau; the subscript τ denotes that it is the τ-based estimate. Table 1.1 shows the relationship between τ and θ for some of the well-known copula functions (Mari and Kotz 2001, Nelsen 1999, Kurowicka and Cooke 2006). When τ is unknown, \hat{\theta}_\tau can be obtained from the sample estimate \hat{\tau}. Given L i.i.d. measurements (z_1(l), z_2(l)), l = 1, 2, \cdots, L, the observations are rank ordered and \hat{\tau} can be computed as

\hat{\tau}_{Z_1,Z_2} = \frac{c - d}{c + d}    (1.17)

where c and d are the numbers of concordant and discordant pairs respectively. Similar relations hold between ρ_s and a copula function K(\cdot). The population version of ρ_s in terms of the copula function K(\cdot) is given as

\rho^s_{Z_1,Z_2} = 12 \int\!\!\int u_1 u_2 \, dK(u_1, u_2) - 3    (1.18)

Equation (1.18) can be used to obtain \hat{\theta}_{\rho_s}; the subscript ρ_s denotes that it is the ρ_s-based estimate. When ρ_s is unknown, its sample estimate can be used. Bivariate measurements (z_1(l), z_2(l)) are first converted to rankings x_i and y_i. The sample estimate \hat{\rho}_s is then given as

\hat{\rho}^s_{Z_1,Z_2} = 1 - \frac{6 \sum_i d_i^2}{L(L^2 - 1)}    (1.19)

where d_i = x_i - y_i is the difference between the ranks of z_1 and z_2, and L is the number of observations. Thus, the joint PDF constructed using the above method captures the inter-modal rank correlations (τ or ρ_s) even though the copula function chosen a priori is misspecified.
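A sketch of the τ-based estimator: count concordant and discordant pairs to obtain τ̂ as in (1.17), then invert the Clayton relation τ = θ/(θ + 2) from Table 1.1. The data below are a toy positively dependent sample, not real match scores:

```python
# Sample Kendall's tau via Eq. (1.17), then the Clayton theta from Table 1.1.
from itertools import combinations

def kendall_tau(z1, z2):
    """Eq. (1.17): (concordant - discordant) / (concordant + discordant)."""
    c = d = 0
    for i, j in combinations(range(len(z1)), 2):
        s = (z1[i] - z1[j]) * (z2[i] - z2[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / (c + d)

def clayton_theta_from_tau(tau):
    # Invert tau = theta / (theta + 2)  =>  theta = 2 * tau / (1 - tau).
    return 2.0 * tau / (1.0 - tau)

z1 = [0.1, 0.4, 0.2, 0.8, 0.6]
z2 = [12.0, 30.0, 35.0, 90.0, 70.0]
tau_hat = kendall_tau(z1, z2)            # 9 concordant, 1 discordant pair
theta_hat = clayton_theta_from_tau(tau_hat)
```

The O(L^2) pair count is fine for small samples; O(L log L) rank-based algorithms exist for large L.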

1.5.2 Maximum likelihood estimation

Maximum likelihood estimation (MLE) based approaches can also be used to estimate θ and are discussed in Bouyé et al. 2000. The copula representation allows one to estimate the marginals first (if unknown) and then the dependence parameter θ separately. This two-step method is known as the


method of inference functions for margins (IFM) (Joe and Xu 1996). Given L i.i.d. realizations (z_1(l), z_2(l)), l = 1, 2, \cdots, L,

\hat{\theta}_{IFM} = \arg\max_\theta \sum_{l=1}^{L} \log k(F_{Z_1}(z_1(l)), F_{Z_2}(z_2(l)); \theta)    (1.20)

When the marginal CDFs in (1.20) are replaced by their empirical estimates,

\hat{F}_{Z_n}(x) = \frac{1}{L} \sum_{l=1}^{L} I(X_l \leq x)    (1.21)

where I(E) is an indicator of the event E, the method is called the canonical maximum likelihood (CML) method,

\hat{\theta}_{CML} = \arg\max_\theta \sum_{l=1}^{L} \log k(\hat{F}_{Z_1}(z_1(l)), \hat{F}_{Z_2}(z_2(l)); \theta)    (1.22)
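A sketch of the CML recipe in (1.21)-(1.22) for a Clayton copula, using a grid search as a stand-in for a proper optimizer. The L/(L+1) rescaling of the empirical CDF (a common practical tweak to keep its values strictly inside (0, 1)) and the toy data are our assumptions:

```python
# Canonical maximum likelihood, Eq. (1.22): empirical marginal CDFs plus
# a grid search over the Clayton copula parameter theta.
import math

def ecdf(sample):
    """Empirical CDF of Eq. (1.21), rescaled by L/(L+1) to stay in (0, 1)."""
    srt = sorted(sample)
    L = len(srt)
    def F(x):
        return sum(1 for v in srt if v <= x) / (L + 1)
    return F

def clayton_log_density(u, v, theta):
    """log k(u, v; theta) for the bivariate Clayton copula, theta > 0."""
    return (math.log(1.0 + theta)
            - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(u ** -theta + v ** -theta - 1.0))

def cml_fit(z1, z2, thetas):
    """Pick the grid point maximizing the CML objective of Eq. (1.22)."""
    F1, F2 = ecdf(z1), ecdf(z2)
    def loglik(theta):
        return sum(clayton_log_density(F1(a), F2(b), theta)
                   for a, b in zip(z1, z2))
    return max(thetas, key=loglik)

# Toy positively dependent sample (illustrative only).
z1 = [0.1, 0.4, 0.2, 0.8, 0.6, 0.3, 0.9, 0.5]
z2 = [12.0, 30.0, 35.0, 90.0, 70.0, 25.0, 95.0, 55.0]
theta_hat = cml_fit(z1, z2, [0.5 * k for k in range(1, 40)])
```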

Though we have only discussed the case of two modalities here, it is easy to extend the method described above to construct joint PDFs when more than two modalities are involved. Clemen and Reilly 1999 discuss the multivariate Gaussian copula approach by computing pair-wise Kendall's τ between the variables. Multivariate Archimedean copulas are studied in Nelsen 1999.

1.6 Experimental Results

In the following, we apply the copula based method to fuse similarity scores from two different face matching algorithms to classify between genuine and impostor users. We consider both the NP and the Bayesian framework for this example. We use the biometric score set developed by the National Institute of Standards and Technology (NIST BSSR 1 database). Three thousand subjects participated in the experiment and two samples were obtained per subject, thus giving (2 × 3000) genuine and (2 × 3000 × 2999) impostor scores. Similarity scores generated by the two face recognition systems are heterogeneous as they use different matching algorithms. Let z_1^i and z_2^i denote the scores generated by face matchers 1 and 2 respectively under the hypothesis H_i. Further, z_1^i \in R_1 = [0, 1] and z_2^i \in R_2 = [0, 100]. The higher the scores, the better the match between the subject under test P_n and the template P_m to which it is compared. In Fig. 1.3, we show a score matrix generated by face matcher 1. An entry at (m, n) in the matrix corresponds to the score obtained when P_n


Fig. 1.3. Scores generated by Face Matcher 1: NIST BSSR 1 Database

is matched to Pm stored in the database. It can be seen that for some Pn (n = 29 and 1273 here), a match score of negative one (−1) is reported for all m. Surprisingly, this is true even in some cases where m = n (e.g., m = n = 29, 1273), i.e., when Pn is matched to its own template. This may be due to errors during data acquisition, image registration or feature extraction caused by poor image quality. Negative one (−1 ∉ R1) thus appears to serve as an indicator to flag the incorrect working of the matcher. The global performance of the biometric authentication system therefore depends on how this anomaly is handled in decision making. The fusion center can be designed to respond in one of the following ways upon reception of the error flag.

(i) Request for retrial: The fusion center does not make a global decision upon receiving the error flag. Instead, the person Pn claiming an identity is requested to provide his or her biometric measurement again. The request-for-retrial design ensures Z1 ∈ R1. We emulate this by deleting the users whose match scores were reported to be negative one. For example, we would delete both the row and the column corresponding to user 29 in the above matrix. We present results using (2 × 2992) genuine and (2 × 2992 × 2991) impostor scores. However, there may be applications where the system does not have the liberty


to request a retrial and the fusion center has to make a decision after each match.

(ii) Censoring (face matchers that generate the error flag): In our example, face matcher 1 generates the error flag. Upon reception of the error flag (z1 = −1), the terms of the log likelihood ratio test that depend on z1 are discarded. Thus, the first and the third terms in

\Lambda(z) = \underbrace{\log \frac{f_{Z_1}(z_1|H_1)}{f_{Z_1}(z_1|H_0)}}_{=0} + \log \frac{f_{Z_2}(z_2|H_1)}{f_{Z_2}(z_2|H_0)} + \underbrace{\log \frac{c_1(\cdot)}{c_0(\cdot)}}_{=0}    (1.23)

are set to zero.

(iii) Accept H0: The system decides in favor of H0 when one or more of the face matchers generate an error flag. This design is conservative and is thus suitable for applications that demand minimal false alarm rates.

(iv) Accept H1: The system decides in favor of H1 when one or more error flags are generated.

(v) Random decision: Toss a fair coin to decide between H0 and H1.

We show performance results for all five designs in this section. The data is first partitioned into two subsets of equal size, where the first subset is the training set used for model fitting. Recognition performance (PF vs. PD) is evaluated on the second subset, the testing set. The marginal PDFs for the impostor (H0) and genuine (H1) scores are shown in Fig. 1.4. A Gaussian mixture model is fit to the scores generated by both face matchers 1 and 2 (Figueiredo and Jain 2002). Scores under both hypotheses are statistically dependent, and a KL divergence based criterion resulted in the use of the Frank and Gumbel copula functions to model the genuine and impostor scores respectively. For more details on the selection procedure, see Iyengar (PhD dissertation). The data is randomly partitioned to generate thirty training-testing sets (resamples) and a mean receiver operating characteristic (ROC) is obtained using threshold averaging (Fawcett 2006). ROCs for LLRT-P and LLRT-H for each strategy are shown in Fig. 1.5. Note that the 'Accept H1' and 'Random decision' methods are more liberal (in granting access) when compared to the other schemes; both suffer heavily from increased false alarm rates. However, the superiority of the copula based method (LLRT-H) over LLRT-P is evident in all five approaches. We note here that Dass et al. 2005 addressed the biometric scores fusion problem using copulas and observed no improvement over the product rule (LLRT-P). To


Fig. 1.4. Marginal PDF estimation: (a), (b) Gaussian models for impostor and genuine scores generated by Face Matcher 1; (c), (d) Gaussian mixture models for impostor and genuine scores generated by Face Matcher 2

account for the error flags (negative ones), they modeled the marginal PDFs as a mixture of discrete and continuous components. However, copula methods require the marginals to be strictly continuous. Further, their analysis was limited to the use of Gaussian copula densities, which are insufficient to model the inter-modal dependence. In this chapter, we have employed different approaches to handle the error flags and have considered the use of a more general family of copula functions with the potential of improving system performance. These reasons could explain the differences between our results and those of Dass et al. 2005.

We now consider the Bayesian framework. In some problems, one is able to assign, or knows a priori, the probabilities of occurrence of the two competing classes H0 and H1, denoted by P(H0) and P(H1) respectively. The objective of a Bayesian detector is to minimize the probability of error PE (or, more generally, the Bayes risk function) given the priors, where

P_E = \min\left(P(H_0|z), P(H_1|z)\right)    (1.24)

P(H0|z) and P(H1|z) are the posterior probabilities of the two hypotheses given the observations. In Fig. 1.6, we plot PE averaged over the thirty resamples versus the prior


Fig. 1.5. Receiver Operating Characteristics for the five approaches to handle error flags using the Neyman-Pearson framework

probability P(H1) for all five strategies. We see that LLRT-H achieves the best performance over the entire range of priors, showing that our copula based approach performs better than the one using the product model.
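The decision logic discussed above can be summarized in a short sketch: the censored statistic of (1.23) with the error-flag policies (ii)-(v), plus the Bayesian threshold that minimizes PE in (1.24). The likelihood-ratio callables below are hypothetical placeholders standing in for the fitted marginal and copula models, not the chapter's actual estimates.

```python
import math
import random

def fused_llr(z1, z2, llr1, llr2, log_copula_ratio, policy="censor"):
    """Fused log likelihood ratio with error-flag handling.
    llr1, llr2: marginal log LR callables for matchers 1 and 2;
    log_copula_ratio: the cross-modal term log(c1/c0). All three are
    placeholder callables for models fitted elsewhere."""
    if z1 == -1:                        # matcher 1 raised its error flag
        if policy == "censor":          # (ii) drop every term involving z1
            return llr2(z2)
        if policy == "accept_H0":       # (iii) conservative: deny access
            return -math.inf
        if policy == "accept_H1":       # (iv) liberal: grant access
            return math.inf
        if policy == "random":          # (v) fair coin
            return random.choice([-math.inf, math.inf])
        raise ValueError("policy (i), request for retrial, is handled upstream")
    return llr1(z1) + llr2(z2) + log_copula_ratio(z1, z2)   # full statistic (1.23)

def bayes_decide(llr, p_h1):
    """Minimum-P_E rule implied by (1.24): accept H1 iff the log
    likelihood ratio exceeds log(P(H0)/P(H1))."""
    return int(llr > math.log((1.0 - p_h1) / p_h1))
```

With equal priors the Bayesian threshold is 0, and under the censoring policy a flagged score is judged on matcher 2 alone.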

Fig. 1.6. Probability of Error vs. P(H1) for the five approaches to handle error flags using the Bayesian framework

1.7 Concluding Remarks

In this chapter, we have discussed the statistical theory of copula functions and its applicability to biometric authentication in detail. Copulas are better descriptors of statistical dependence across heterogeneous sources. No assumptions on the source of heterogeneity are required; the same machinery holds for fusion of multiple modalities, samples, algorithms or multiple classifiers. Another interesting property of the copula approach is that it allows us to separate the cross-modal terms from the unimodal ones in the log likelihood ratio, thus enabling intra-modal vs. inter-modal analyses. Performance analysis in the asymptotic regime proved the intuitive result that, when inter-modal dependence is accounted for in the test statistic, the discriminability between the two competing hypotheses increases over the product rule by a factor exactly equal to the multi-information between the heterogeneous biometric signatures. In all, the copula approach provides a general framework for processing heterogeneous information. The applicability and superiority of our copula based approach were shown by applying it to the NIST-BSSR 1 database. A couple of extensions that are of interest to us include

• Combination of multiple copula densities. Different copula functions exhibit different behavior, and a combination of multiple copula functions may better characterize the dependence between several modalities than a single copula function. It would be interesting to explore this multi-model approach in detail.

• Joint feature extraction. The design of a multibiometric identification system includes, apart from information fusion, several pre-processing steps such as feature selection and extraction. In this chapter, we focused only on the fusion aspect and chose to omit the discussion of feature selection and extraction methods entirely. Deriving features of reduced dimensionality is an essential step where data is transformed so that only relevant information is extracted and retained for further processing. This alleviates the well known curse of dimensionality. There have been several studies and methods proposed for common-modality or homogeneous signals. Heterogeneous signal processing offers new possibilities for system improvement. One can envision a joint feature extraction algorithm that exploits the dependence structure between the multimodal signals.
Development of feature extraction methods that optimize for inter-modal redundancy/synergy could be an interesting direction for future research.
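As a small worked instance of the multi-information factor noted above: for two signatures coupled by a bivariate Gaussian copula with parameter rho, the multi-information (here simply the mutual information) has the standard closed form −½ ln(1 − rho²). The helper below is illustrative, for the bivariate Gaussian-copula case only.

```python
import math

def gaussian_copula_multi_information(rho):
    """Mutual information carried by a bivariate Gaussian copula with
    parameter rho: in the bivariate case, the asymptotic discriminability
    gain of the copula based LLRT over the product rule quoted above.
    Standard closed form for the Gaussian-copula case."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1)")
    return -0.5 * math.log(1.0 - rho * rho)
```

Independence (rho = 0) yields zero gain over the product rule, and the gain grows without bound as |rho| approaches 1.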

1.8 Acknowledgements

This research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement No. W911NF-07-2-0007. It was also supported by ARO grant W911NF-06-1-0250. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government


is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The authors also thank Dr. Anil K. Jain and Dr. Karthik Nandakumar for their valuable comments and suggestions during the preparation of this chapter.

Bibliography

Bouye, E., Durrleman, V., Nikeghbali, A., Riboulet, G. and Roncalli, T. (2000). Copulas for Finance - A Reading Guide and Some Applications. Available at SSRN: http://ssrn.com/abstract=1032533.

Chernoff, H. (1956). Large-sample theory: Parametric case. Ann. Math. Statist., 27, pp. 1–22.

Clemen, R. T. and Reilly, T. (1999). Correlations and copulas for decision and risk analysis. Management Science, 45, pp. 208–224.

Cover, T. and Thomas, J. (2006). Elements of Information Theory. (John Wiley and Sons, Ltd, New Jersey).

Dass, S. C., Nandakumar, K. and Jain, A. K. (2005). A principled approach to score level fusion in multimodal biometric systems. In Proc. of Audio and Video based Biometric Person Authentication.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27 (8), pp. 861–874.

Figueiredo, M. and Jain, A. K. (2002). Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell., 24, pp. 381–396.

Iyengar, S. G., Varshney, P. K. and Damarla, T. (2007). On the detection of footsteps using acoustic and seismic sensing. In Proc. of 41st Annual Asilomar Conference on Signals, Systems and Computers, pp. 2248–2252.

Iyengar, S. G., Varshney, P. K. and Damarla, T. (2009). A parametric copula based framework for multimodal signal processing. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1893–1896.

Iyengar, S. G. PhD dissertation in progress. Syracuse University, Syracuse, NY.

Jain, A. K., Nandakumar, K. and Ross, A. (2005). Score normalization in multimodal biometric systems. Pattern Recognition, 38, pp. 2270–2285.

Joe, H. and Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models. Technical Report, Department of Statistics, University of British Columbia.

Koval, O., Voloshynovskiy, S. and Pun, T. (2007). Analysis of multimodal binary detection systems based on dependent/independent modalities. IEEE 9th Workshop on Multimedia Signal Processing, pp. 70–73.

Kryszczuk, K. and Drygajlo, A. (2008). Impact of feature correlations on separation between bivariate normal distributions. In Proc. of the 19th International Conference on Pattern Recognition, pp. 1–4.

Kurowicka, D. and Cooke, R. (2006). Uncertainty Analysis with High Dimensional Dependence Modeling. (John Wiley and Sons, Ltd, West Sussex, England).

Lehmann, E. L. and Romano, J. P. (2008). Testing Statistical Hypotheses. (Springer, 3rd edition).

Mari, D. and Kotz, S. (2001). Correlation and Dependence. (Imperial College Press, London).


Nandakumar, K., Chen, Y., Dass, S. and Jain, A. K. (2007). Likelihood ratio-based biometric score fusion. IEEE Trans. Pattern Anal. Mach. Intell., 55, pp. 3963–3974.

Nat'l Inst. of Standards and Tech. (2004). NIST Biometric Scores Set, Release 1. http://www.itl.nist.gov/iad/894.03/biometricscores

Nelsen, R. B. (1999). An Introduction to Copulas. (Springer-Verlag, New York).

Poh, N. and Bengio, S. (2005). How Do Correlation and Variance of Base-Experts Affect Fusion in Biometric Authentication Tasks? IEEE Trans. Signal Processing, 53 (11), pp. 4384–4396.

Prabhakar, S. and Jain, A. K. (2002). Decision-Level Fusion in Fingerprint Verification. Pattern Recognition, 35 (4), pp. 861–874.

Roli, F., Fumera, G. and Kittler, J. (2002). Fixed and trained combiners for fusion of imbalanced pattern classifiers. In Proc. of the International Conference on Information Fusion, pp. 278–284.

Snelick, R., Uludag, U., Mink, A., Indovina, M. and Jain, A. K. (2005). Large Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art Systems. IEEE Trans. Pattern Anal. Mach. Intell., 27 (3), pp. 450–455.

Toh, K. A., Jiang, X. and Yau, W. Y. (2004). Exploiting Global and Local Decisions for Multimodal Biometrics Verification. IEEE Trans. Signal Processing, supplement on secure media, 52 (10), pp. 3059–3072.

Ushmaev, O. and Novikov, S. (2006). Biometric fusion: Robust approach. In Proc. of the 2nd Workshop on Multimodal User Authentication (Toulouse, France).

Varshney, P. K. (1997). Distributed Detection and Data Fusion. (Springer, New York).