Formulating Face Verification With Semidefinite Programming



Shuicheng Yan, Member, IEEE, Jianzhuang Liu, Senior Member, IEEE, Xiaoou Tang, Senior Member, IEEE, and Thomas S. Huang, Life Fellow, IEEE

Abstract—This paper presents a unified solution to three unsolved problems in face verification with subspace learning techniques: selection of the verification threshold, automatic determination of the subspace dimension, and deduction of the feature fusing weights. In contrast to previous algorithms, which search for the projection matrix directly, our new algorithm investigates a similarity metric matrix (SMM). For a given verification threshold, this matrix is learned by a semidefinite programming approach, under the constraints that kindred pairs have similarity larger than the threshold and inhomogeneous pairs have similarity smaller than the threshold. Then, the subspace dimension and the feature fusing weights are simultaneously inferred from the singular value decomposition of the derived SMM. In addition, weighted and tensor extensions are proposed to further improve the algorithmic effectiveness and efficiency, respectively. Essentially, the verification is conducted within an affine subspace in this new algorithm, which is, hence, called the affine subspace for verification (ASV). Extensive experiments show that the ASV can achieve encouraging face verification accuracy in comparison to other subspace algorithms, even without the need to explore any parameters.

Index Terms—Dimensionality reduction, face verification, subspace dimension determination, threshold determination.

I. INTRODUCTION

Techniques of dimensionality reduction [3] have been widely applied to face verification [10], [13] due to their computational simplicity and analytical attractiveness. Most of them, such as principal component analysis (PCA) [6], [20], linear discriminant analysis (LDA) [1], [28], and the recently proposed marginal Fisher analysis (MFA) [27], are solved by using the generalized eigenvalue decomposition (GEVD) approach [3], [4]. There are also many other algorithms that can be applied to face verification, such as the Bayesian algorithm [14] and the support vector machine (SVM) [12].

PCA is an eigenvector method designed to model linear variation in high-dimensional data. Its goal is to find a set of mutually orthogonal basis functions that capture the directions of maximum variance in the data and for which the coefficients

are pairwise decorrelated. LDA is a supervised learning algorithm that searches for the projection axes on which the data points of different classes are far from each other while the data points of the same class are close to each other. Recently, subclass discriminant analysis [31] was proposed, which considers each class to consist of multiple subclasses and consequently alleviates the possible nonlinearity.

Most algorithms for dimensionality reduction, such as PCA and LDA, were originally proposed for general classification tasks. Their objective functions are usually heuristically designed, and the derived solutions do not necessarily yield the optimal verification accuracy. When applied to a verification task, these algorithms often suffer from the following disadvantages: 1) the optimal feature dimension cannot be easily determined from the objective function; 2) the verification threshold cannot be obtained automatically in the model training stage; and 3) it is unclear how to combine the projected features with different weights for optimal verification. All of these disadvantages stem from their similar problem formulations, which are solved with the GEVD, that is, a quadratic objective function with a single quadratic constraint.

Motivated by our recent work in [26], we present in this paper a unified solution to the above problems. A transformation matrix, encoding the information of the projection directions as well as the corresponding weights, is used for dimensionality reduction. The first problem, subspace dimension selection, is avoided by directly computing the product of the transformation matrix and its transpose. This product is called the similarity metric matrix (SMM) in this paper. The remaining two problems are solved by recasting the verification task as a semidefinite programming problem [23]–[25] with respect to the SMM. In this formulation, the verification threshold is fixed as a positive constant; the verification accuracy on the training set is guaranteed by a set of constraints in the semidefinite programming problem; and the objective function characterizes the separability of homogeneous and heterogeneous sample pairs. The transformation matrix is then computed from the singular value decomposition of the derived SMM, and the subspace dimension corresponds to the number of positive eigenvalues of the SMM. Essentially, the final low-dimensional representation lies within an affine subspace of the original data space. Hence, we name our algorithm the affine subspace for verification (ASV).

In summary, the ASV algorithm has the following advantages over previous algorithms.

1) By the constraints of the semidefinite programming formulation, the ASV is guaranteed to attain the minimal verification error rate in the context of verification with subspace learning techniques, while most previous subspace learning algorithms, such as LDA and its variant [27], do not optimize the verification accuracy directly.



2) The ASV automatically determines three model parameters: the verification threshold, the subspace dimension, and the feature fusing weights. All three parameters are determined by solving a single optimization problem. Hence, it is superior to traditional subspace learning algorithms, which need to explore all possible parameters to determine the optimal ones for a specific test data set.

Our face verification experiments on three popular face databases validate the effectiveness of the ASV. The rest of the paper is organized as follows. The ASV algorithm is introduced in Section II. Section III presents the weighted and tensor extensions of the ASV. The comparison experiments are presented in Section IV. Finally, we conclude this paper in Section V.

II. AFFINE SUBSPACE FOR VERIFICATION

Let $\{\mathbf{x}_i\}_{i=1}^{N} \subset \mathbb{R}^{m}$ be a set of samples, where the corresponding class labels are $c_i \in \{1, 2, \dots, N_c\}$. Denote by $n_c$ the number of samples belonging to the $c$th class. Since in practice the dimension $m$ is often very large, it is usually necessary to transform the data from the high-dimensional input feature space to a low-dimensional one to alleviate the curse of dimensionality. Many dimensionality reduction techniques have been extensively studied and have achieved much success in face verification tasks [12], [16]–[18]. In Sections II-A and II-B, we first review the general framework of subspace learning techniques used in verification tasks, and then propose our new subspace learning algorithm.

A. Subspace Learning for Verification

To find a low-dimensional representation for face verification, a simple but effective way is to find a column-wise linearly independent matrix $W \in \mathbb{R}^{m \times d}$ (usually $d \ll m$) that transforms the original high-dimensional data into a low-dimensional representation by

$$\mathbf{y}_i = W^{\top}\mathbf{x}_i. \qquad (1)$$

The dimensionality reduction process may greatly facilitate the subsequent verification task, since the verification is then performed within a much lower dimensional feature space in which even stronger discriminating power can be obtained. In a verification task, the question "is this person who he/she claims to be" is posed, and the system attempts to verify the claimed identity by computing the distances between the presented image and the gallery images of the claimed subject in the low-dimensional space induced by the matrix $W$, and then comparing the smallest distance with a preset threshold $\epsilon$, that is,

$$\text{claim} = \begin{cases} \text{accepted}, & \text{if } \min_{g} \|W^{\top}\mathbf{x} - W^{\top}\mathbf{x}_{g}\|^{2} < \epsilon \\ \text{rejected}, & \text{if } \min_{g} \|W^{\top}\mathbf{x} - W^{\top}\mathbf{x}_{g}\|^{2} \ge \epsilon \end{cases} \qquad (2)$$

where $\mathbf{x}$ is the presented image and $\{\mathbf{x}_{g}\}$ are the gallery images of the claimed subject.
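As a concrete illustration, the following minimal Python/NumPy sketch implements the decision rule in (2). The variable names (`W`, `gallery`, `epsilon`) and the use of the squared Euclidean distance are assumptions made for this example; they are not code from the paper.

```python
import numpy as np

def verify(x, gallery, W, epsilon=1.0):
    """Decision rule of (2): accept the claimed identity if the smallest squared
    distance between the probe and the claimed subject's gallery images, measured
    in the subspace induced by W, falls below the threshold epsilon."""
    # Project the probe and the gallery images into the low-dimensional space y = W^T x.
    y = W.T @ x                          # shape (d,)
    Y_gallery = gallery @ W              # shape (n_gallery, d)
    # Smallest squared distance to any gallery image of the claimed subject.
    d2 = np.min(np.sum((Y_gallery - y) ** 2, axis=1))
    return d2 < epsilon                  # True: accepted, False: rejected
```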

A verification task differs from a traditional pattern recognition problem, in which the algorithm finds from a database the object that best matches the input object. However, most previous algorithms for verification with subspace learning techniques ignore the special characteristics of the verification task, which makes them not necessarily optimal for verification. More specifically, the traditional subspace learning algorithms, such as PCA [20] and LDA [1], suffer from the following problems.

1) The optimum of the objective function does not necessarily lead to optimal verification accuracy. For example, the unsupervised algorithm PCA finds the most representative, yet possibly less discriminating, features, and the effectiveness of LDA relies heavily on the assumption that the samples of each class follow a Gaussian distribution with the same covariance matrix.

2) The determination of the subspace dimension and the verification threshold cannot be completed in the subspace learning stage, and the objective functions cannot generally be used to guide the selection of these model parameters. As demonstrated in [27], most linear subspace learning problems are of the form

$$\min_{W} \operatorname{tr}(W^{\top} A W) \quad \text{s.t. } \operatorname{tr}(W^{\top} B W) = c \qquad (3)$$

where $A$ and $B$ are both positive semidefinite and the operator $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. It is easy to prove that the minimum of the objective function is attained by a single projection direction, namely the eigenvector corresponding to the smallest eigenvalue of the generalized eigenvalue decomposition problem $A\mathbf{w} = \lambda B\mathbf{w}$, since any additional projection direction cannot further decrease the value of the objective function.

3) These algorithms only select the projection directions; they do not provide a solution for how to fuse the projected features for optimal verification.

In the following, we present our solution to the above problems by taking advantage of several techniques, including semidefinite programming, singular value decomposition, and tensor-based data representation.

B. Formulating Verification With Semidefinite Programming

Directly learning the transformation matrix $W$ for dimensionality reduction leads to the problem that we have to know the subspace dimension $d$ in advance. To avoid this problem, we instead learn the product of $W$ and its transpose, i.e.,

$$M = W W^{\top}. \qquad (4)$$

It is obvious that the matrix $M$ is positive semidefinite, denoted by $M \succeq 0$. We call $M$ the SMM in this paper, since it determines the inner product of two points $\mathbf{x}_i$ and $\mathbf{x}_j$ in the low-dimensional space by $(W^{\top}\mathbf{x}_i)^{\top}(W^{\top}\mathbf{x}_j) = \mathbf{x}_i^{\top} M \mathbf{x}_j$.

The verification accuracy is often measured by two indices: the false acceptance rate and the false rejection rate. As in (2), to achieve high verification accuracy, the distance between each sample and its nearest sample of the same class should be smaller than the verification threshold, and the distance between two samples from different classes should be larger than the verification threshold.



Fig. 1. Optimization problem for SMM learning.

Fig. 2. Optimization problem for relaxed SMM learning.

Formally speaking, for the samples of the same class, a sufficient condition to achieve the best result is that

$$\|W^{\top}\mathbf{x}_i - W^{\top}\mathbf{x}_{i^{*}}\|^{2} < \epsilon, \quad i = 1, \dots, N \qquad (5)$$

where $\mathbf{x}_{i^{*}}$ is the sample nearest to $\mathbf{x}_i$, measured in the input feature space, that comes from the same class as $\mathbf{x}_i$. In addition, for two samples of different classes, the sufficient and necessary condition to achieve the best result is that

$$\|W^{\top}\mathbf{x}_i - W^{\top}\mathbf{x}_j\|^{2} \ge \epsilon, \quad \forall\, c_i \ne c_j. \qquad (6)$$

For any matrix $W$ and any vector $\mathbf{z}$, we have $\|W^{\top}\mathbf{z}\|^{2} = \operatorname{tr}(W^{\top}\mathbf{z}\mathbf{z}^{\top}W) = \operatorname{tr}(\mathbf{z}\mathbf{z}^{\top}WW^{\top})$. From this property, (5) can be rewritten as

$$\operatorname{tr}(M\Delta_{ii^{*}}) < \epsilon, \quad i = 1, \dots, N \qquad (7)$$

where $\Delta_{ij} = (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{\top}$. From (6), we have

$$\operatorname{tr}(M\Delta_{ij}) \ge \epsilon, \quad \forall\, c_i \ne c_j. \qquad (8)$$

To enhance the potential verification performance on testing data, a reasonable way is to make the distances among the samples in the same class smaller than the verification threshold as much as possible, namely, maximizing $\sum_{i}(\epsilon - \operatorname{tr}(M\Delta_{ii^{*}}))$, and the distances among samples in different classes larger than the threshold as much as possible, namely, maximizing $\sum_{c_i \ne c_j}(\operatorname{tr}(M\Delta_{ij}) - \epsilon)$. This is equivalent to maximizing the objective function

$$J(M) = \sum_{i}\bigl(\epsilon - \operatorname{tr}(M\Delta_{ii^{*}})\bigr) + \gamma \sum_{c_i \ne c_j}\bigl(\operatorname{tr}(M\Delta_{ij}) - \epsilon\bigr) \qquad (9)$$

which, after dropping the constant terms in $\epsilon$, is equivalent to maximizing

$$\gamma \sum_{c_i \ne c_j}\operatorname{tr}(M\Delta_{ij}) - \sum_{i}\operatorname{tr}(M\Delta_{ii^{*}}) \qquad (10)$$

where $\gamma$ is used to balance these two terms.

With the objective function in (10) and the constraints in (7) and (8), the SMM can be obtained by solving the problem defined in Fig. 1. For a fixed $\epsilon$, Fig. 1 is a classical semidefinite programming problem, and there are several general-purpose toolboxes and polynomial-time solvers available for it [24], [25]. In this paper, we use the solver SeDuMi and the CSDP 4.9 toolbox in MATLAB [2] to handle it. From Fig. 1, we can observe that the matrix $M$ scales with the value of the threshold $\epsilon$: if $M$ is the solution for a given $\epsilon$, then $cM$ is also the solution for the threshold $c\epsilon$, $c > 0$. Thus, for simplicity, we fix $\epsilon$ to a constant.

In practical applications, it is probable that no $M$ exists that satisfies all the constraints. In this case, we present a variant formulation of the problem which can tolerate unsatisfied constraints by adding relaxation parameters, as given in Fig. 2. One set of nonnegative parameters is used to transform the constraints from inequalities into equalities, and a second set is used to relax the constraints so that a constraint may be violated, with the violation penalized in the objective function. Note that the objective function described in Fig. 1 is replaced in Fig. 2 with one that characterizes the total verification confidence by taking the unsatisfied constraints into account. This problem can also be solved by semidefinite programming solvers. Commonly, the penalty coefficient $C$ is set to a sufficiently large value, which gives a larger penalty to the items that cannot satisfy the constraints and drives the relaxation variables of every satisfiable constraint to zero. If all the constraints are satisfied, all the relaxation variables will be zero [2]; hence, we can still consider the proposed algorithm to be free of parameter tuning. Note that, if $C$ is sufficiently large, the result approximately minimizes the number of violated constraints, which can be roughly considered as the expectation of the sum of the false acceptance rate and the false rejection rate. In real applications, a different trade-off may be required, e.g., when a low false rejection rate is expected at a specific false acceptance rate.

Once the SMM $M$ is obtained, the transformation matrix can be derived by sufficiently preserving the information of $M$. Let the singular value decomposition of $M$ be

$$M = U \Lambda U^{\top} \qquad (11)$$

where $U = [\mathbf{u}_1, \dots, \mathbf{u}_d]$ contains the eigenvectors associated with the positive eigenvalues of $M$ and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$. Then, the transformation matrix is obtained by

$$W = U \Lambda^{1/2}. \qquad (12)$$

Using the derived transformation matrix $W$, we can conduct the verification as in (2). Note that, for real applications, there may exist very small values among the $\lambda_k$ due to computational errors, and we empirically remove them in our implementation without sacrificing the automatic property of the whole framework.
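As a concrete, hedged illustration of the relaxed formulation sketched in Fig. 2, the following Python snippet uses CVXPY (instead of the SeDuMi/CSDP toolboxes used in the paper) to learn the SMM from slack-relaxed versions of constraints (7) and (8) and then recovers $W$ via (11) and (12). The helper names, the hinge-style slack variables, and the constant `C` are assumptions made for this sketch rather than the exact contents of Figs. 1 and 2.

```python
import numpy as np
import cvxpy as cp

def learn_smm(X, labels, gamma=1.0, C=1e3, eps=1.0):
    """Hedged sketch of relaxed SMM learning (cf. Fig. 2): maximize the
    objective (10) subject to slack-relaxed versions of (7) and (8).
    Assumes every class has at least two training samples."""
    N, m = X.shape
    M = cp.Variable((m, m), PSD=True)           # the similarity metric matrix M = W W^T

    def delta(i, j):                            # Delta_ij = (x_i - x_j)(x_i - x_j)^T
        d = (X[i] - X[j]).reshape(-1, 1)
        return d @ d.T

    same, diff = [], []                         # (i, nearest same-class j) and cross-class (i, j)
    for i in range(N):
        same_idx = [j for j in range(N) if j != i and labels[j] == labels[i]]
        j_star = min(same_idx, key=lambda j: np.sum((X[i] - X[j]) ** 2))
        same.append((i, j_star))
        diff += [(i, j) for j in range(N) if labels[j] != labels[i] and j > i]

    xi = cp.Variable(len(same), nonneg=True)    # slacks for same-class constraints (7)
    eta = cp.Variable(len(diff), nonneg=True)   # slacks for cross-class constraints (8)
    cons, obj = [], 0
    for k, (i, j) in enumerate(same):
        t = cp.trace(M @ delta(i, j))
        cons.append(t <= eps + xi[k])           # relaxed version of (7)
        obj += eps - t
    for k, (i, j) in enumerate(diff):
        t = cp.trace(M @ delta(i, j))
        cons.append(t >= eps - eta[k])          # relaxed version of (8)
        obj += gamma * (t - eps)
    obj -= C * (cp.sum(xi) + cp.sum(eta))       # penalize constraint violations
    cp.Problem(cp.Maximize(obj), cons).solve()
    return M.value

def smm_to_transform(M, tol=1e-8):
    """Eqs. (11)-(12): W = U Lambda^{1/2}, keeping only clearly positive eigenvalues."""
    lam, U = np.linalg.eigh(M)
    keep = lam > tol                            # drop near-zero eigenvalues (Section II-B)
    return U[:, keep] * np.sqrt(lam[keep])
```

In practice, the cross-class pairs would be subsampled (Section IV keeps only the 20 shortest inhomogeneous pairs per cluster after a PCA to 300 dimensions) to keep the number of constraints manageable.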


Fig. 3. Optimization problem for weighted SMM learning.

From the above deduction, we can see that the subspace dimension, the verification threshold, and the feature weights are all automatically determined by solving a single optimization problem.

III. WEIGHTED AND TENSOR VARIANTS


A. Weighted Variant for Effectiveness

The objective function in Fig. 1 may easily be biased by the data pairs whose distances are far from the verification threshold; yet, for real applications, a sample pair with distance close to the threshold is the most difficult to verify and plays an important role in model training. A natural way to handle this problem is to utilize a weighted/nonparametric method and put larger weights on pairs with distances closer to the threshold. In this paper, we use two weighting functions to assign weights to the sample pairs: a sample pair of the same class is weighted by (13), and a sample pair of different classes is weighted by (14), where a constant controls the spread of the weight distribution. We then obtain the weighted SMM learning algorithm shown in Fig. 3. The final transformation matrix can be derived from the learned SMM as in (11) and (12).
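The exact weighting functions (13) and (14) are given in the paper's equations, which did not survive extraction; the sketch below therefore assumes a simple Gaussian-of-margin weighting as a hypothetical stand-in, purely to illustrate how such weights would enter the objective of Fig. 3.

```python
import numpy as np

def pair_weight(dist_sq, eps=1.0, sigma=0.2):
    """Hypothetical stand-in for (13)/(14): pairs whose squared distance lies
    close to the threshold eps receive weights near 1; distant pairs are
    down-weighted. sigma controls the spread of the weight distribution."""
    return np.exp(-abs(dist_sq - eps) / sigma)

# In the weighted objective, each trace term of (10) is simply multiplied by its
# pair weight, e.g. obj += w_ij * (eps - cp.trace(M @ delta(i, j))) for kindred
# pairs, so the SDP structure of Fig. 1 / Fig. 2 is unchanged.
```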

B. Tensorization for Efficiency

In the proposed algorithms, when the feature dimension is large, the computation and memory costs become impractical. As the constraint matrices $\Delta_{ij}$ are not sparse in our formulation of the semidefinite programming problem, we can only handle cases with fewer than about 400 features. In this section, we discuss how to solve this computational problem by encoding the objects as high-order tensors. Assume that the training samples are represented as $n$th-order tensors $\mathcal{X}_i \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$. First, we briefly review some definitions related to tensors; more details on tensors can be found in [22].

Definition 1 (tensor inner product, norm, and distance): The inner product of two tensors $\mathcal{A}$ and $\mathcal{B}$ is defined as $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1, \dots, i_n} \mathcal{A}_{i_1 \cdots i_n} \mathcal{B}_{i_1 \cdots i_n}$. The norm of a tensor $\mathcal{A}$ is defined as $\|\mathcal{A}\| = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$, and the tensor distance between tensors $\mathcal{A}$ and $\mathcal{B}$ is computed by $\|\mathcal{A} - \mathcal{B}\|$.

Definition 2 ($k$-mode product): The $k$-mode product of a tensor $\mathcal{A} \in \mathbb{R}^{m_1 \times \cdots \times m_n}$ with a matrix $U \in \mathbb{R}^{d_k \times m_k}$, denoted by $\mathcal{A} \times_k U$, is defined as $(\mathcal{A} \times_k U)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_n} = \sum_{i_k} \mathcal{A}_{i_1 \cdots i_n} U_{j i_k}$.

Definition 3 ($k$-mode unfolding): The $k$-mode unfolding of an $n$th-order tensor $\mathcal{A}$ into a matrix, denoted by $A^{(k)} \in \mathbb{R}^{m_k \times \prod_{l \ne k} m_l}$, arranges the mode-$k$ fibers of $\mathcal{A}$ as the columns of $A^{(k)}$.

With the tensor representation, the dimensionality reduction is conducted with a series of transformation matrices, denoted by $W_k \in \mathbb{R}^{m_k \times d_k}$, $k = 1, \dots, n$, and we have

$$\mathcal{Y}_i = \mathcal{X}_i \times_1 W_1^{\top} \times_2 W_2^{\top} \cdots \times_n W_n^{\top}. \qquad (15)$$

Then, the constraints defined in (5) and (6) become

$$\|(\mathcal{X}_i - \mathcal{X}_{i^{*}}) \times_1 W_1^{\top} \cdots \times_n W_n^{\top}\|^{2} < \epsilon, \quad i = 1, \dots, N \qquad (16)$$

$$\|(\mathcal{X}_i - \mathcal{X}_j) \times_1 W_1^{\top} \cdots \times_n W_n^{\top}\|^{2} \ge \epsilon, \quad \forall\, c_i \ne c_j \qquad (17)$$

and the objective function given in Fig. 1 is now

$$J(W_1, \dots, W_n) = \gamma \sum_{c_i \ne c_j} \|(\mathcal{X}_i - \mathcal{X}_j) \times_1 W_1^{\top} \cdots \times_n W_n^{\top}\|^{2} - \sum_{i} \|(\mathcal{X}_i - \mathcal{X}_{i^{*}}) \times_1 W_1^{\top} \cdots \times_n W_n^{\top}\|^{2}. \qquad (18)$$

There exists a closed-form solution for this optimization problem if only one transformation matrix is unknown, say $W_k$. Let the $k$-mode unfolded matrix of the tensor $\mathcal{X}_i$ be represented by $X_i^{(k)}$, and let $\tilde{W}_k$ denote the Kronecker product of the remaining transformation matrices in the order induced by the unfolding; then (16) becomes

$$\operatorname{tr}(M_k \Delta^{k}_{ii^{*}}) < \epsilon, \quad i = 1, \dots, N \qquad (19)$$

where $M_k = W_k W_k^{\top}$ and $\Delta^{k}_{ij} = (X_i^{(k)} - X_j^{(k)}) \tilde{W}_k \tilde{W}_k^{\top} (X_i^{(k)} - X_j^{(k)})^{\top}$, and (17) is now

$$\operatorname{tr}(M_k \Delta^{k}_{ij}) \ge \epsilon, \quad \forall\, c_i \ne c_j. \qquad (20)$$

Hence, (18) can be simplified to

$$J(M_k) = \gamma \sum_{c_i \ne c_j} \operatorname{tr}(M_k \Delta^{k}_{ij}) - \sum_{i} \operatorname{tr}(M_k \Delta^{k}_{ii^{*}}). \qquad (21)$$

Therefore, we can iteratively optimize the objective function in (21), with the constraints in (19) and (20), over one $M_k$ at a time until the whole procedure converges. The optimal transformation matrices $W_k$, $k = 1, \dots, n$, are derived from the singular value decompositions of the matrices $M_k$ as in (11) and (12). The optimization problem solved in one iteration is given in Fig. 4.

Fig. 4. Optimization problem for tensor-based SMM learning.
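To make the $k$-mode reduction above concrete, here is a small NumPy sketch (an illustrative assumption about mode orderings and helper names, not the paper's code) that unfolds a tensor along mode $k$ and builds the matrix $\Delta^{k}$ used in (19)–(21); with these $\Delta^{k}$ matrices, each iteration reduces to exactly the SDP of Section II-B, but in the much smaller $m_k$-dimensional space.

```python
import numpy as np

def unfold(T, k):
    """k-mode unfolding (Definition 3): move axis k to the front and flatten the rest."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def delta_k(Ti, Tj, Ws, k):
    """Delta^k_ij of (19): project the difference tensor by all W_l, l != k,
    then form the outer-product matrix in the remaining mode-k space."""
    D = Ti - Tj
    for l, W in enumerate(Ws):           # Ws[l] has shape (m_l, d_l)
        if l != k:
            # l-mode product with W_l^T, i.e. contract mode l of D against W_l.
            D = np.moveaxis(np.tensordot(D, W, axes=([l], [0])), -1, l)
    Dk = unfold(D, k)                     # shape (m_k, product of reduced dims)
    return Dk @ Dk.T                      # Delta^k_ij, an m_k x m_k matrix
```

Each $\Delta^{k}$ is only $m_k \times m_k$ (e.g., $100 \times 100$ for a $100 \times 100$ image), which is what makes the per-iteration SDP tractable.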



Finally, the verification task is conducted based on the following criterion:

$$\text{claim} = \begin{cases} \text{accepted}, & \text{if } \min_{g} \|(\mathcal{X} - \mathcal{X}_{g}) \times_1 W_1^{\top} \cdots \times_n W_n^{\top}\|^{2} < \epsilon \\ \text{rejected}, & \text{otherwise.} \end{cases} \qquad (22)$$

In the algorithm based on the tensor representation, the optimization problem defined in each iteration runs in a much lower dimensional feature space. For example, when an image of size 100 × 100 is treated as a vector, the SMM in the direct vector formulation has on the order of 10^8 entries, which is impossible to handle on a computer of moderate configuration. However, if the image is encoded as a second-order tensor, i.e., a matrix, each iteration involves only a 100 × 100 SMM, i.e., on the order of 10^4 parameters, which greatly reduces the computational complexity of the optimization problem.

C. Relationship to Tensorface

Tensorface [21], [22] is a recently proposed algorithm for dimensionality reduction and face recognition under pose, illumination, and expression variations. It also utilizes higher order statistics [7] for data analysis, similar to our tensor extension of the ASV (TASV). While Tensorface and TASV both utilize tensor-based formulations [8], [9] from an algorithmic viewpoint, they are intrinsically different because of their basic tensor compositions. For Tensorface, a face image is treated as a vector, and the face image ensembles of all persons under different poses, illuminations, and expressions are treated as a high-order tensor. In TASV, however, an image object, such as a face, is directly represented as a higher order tensor. This difference in tensor composition differentiates Tensorface and TASV in several aspects. 1) The physical meanings of the learned subspaces are different. In Tensorface, the subspaces characterize variations caused by external factors, such as pose, illumination, and expression, whereas in TASV the subspaces characterize internal modes, such as rows and columns. 2) Tensorface requires a large number of samples with different poses, illuminations, and expressions for training, which is generally unavailable in practical applications such as face verification; hence, we do not compare our TASV with Tensorface in the experimental section. In contrast, TASV can work even when only two images are available for each person. 3) In Tensorface, the image is treated as a vector. Thus, the feature dimension is still high, and the potential spatial structure information, such as the correlation among row vectors and column vectors, is lost; however, as shown in the previous section, the feature dimension in each iteration of TASV is much smaller.

D. Relationship to SVM

SVM is widely used for two-class classification problems owing to its good generalization capability. If we consider the difference between two samples of the same class as a positive sample, and the difference between two samples of different classes as a negative sample, the ASV algorithm is similar to SVM in formulation, but they differ in several aspects. 1) ASV tries to find a transformation matrix for dimensionality reduction, while SVM can be considered to find one projection vector for dimensionality reduction. 2) The target of SVM is to ensure that the projection of a positive sample is positive while that of a negative sample is negative, whereas the target of ASV is to ensure that the transformation of a positive sample lies within a sphere while that of a negative sample lies outside the sphere. 3) SVM was originally proposed for two-class classification problems, while ASV is natural for multiclass verification problems.

IV. EXPERIMENTS

In this section, we compare our ASV algorithm with two conventional subspace learning algorithms, PCA (Eigenfaces) [20] and Fisherfaces [1], for face verification. Three face databases are used: CMU pose, illumination, and expression (PIE) [19], FERET [15], and XM2VTS [11]. In these experiments, we explore all possible feature dimensions of Eigenfaces and Fisherfaces (all LDA dimensions are explored) and report the best results. Histogram equalization is used for preprocessing in all the experiments. The implementation of Eigenfaces follows the work in [20], and that of Fisherfaces follows the work in [1]. For more detailed comparisons, the results for all feature dimensions are also reported in this section. As there is no parameter to explore, only one result from the ASV is reported. To reduce the computational complexity of the ASV, we first conduct PCA on the training set to reduce the feature dimension to 300, and restrict the inhomogeneous sample pairs in (6) to the shortest 20 pairs for each cluster. The ASV is thereby simplified to a smaller size problem.

A. Database Preparation

The CMU PIE database contains more than 40,000 facial images of 68 persons. The images were acquired in different poses, under variable illumination conditions, and with different facial expressions; 63 persons are used in our experiments due to the data incompleteness of the other five persons. The training set (also used as the gallery set, which contains representative samples for the different subjects) consists of 567 images (nine images per person from poses 11, 27, and 37, with three different illuminations). The probe set contains 756 images (12 images per person from poses 05, 07, 09, and 29, with three different illuminations) for testing. All the images are aligned by fixing the locations of the two eyes and normalized to a size of 64 × 64 pixels. For the FERET database, we use the images of 70 persons, with six images for each person.



Fig. 5. Left: FRR versus FAR curves. Right: EERs versus the projected feature numbers for Eigenfaces, Fisherfaces, and ASV on the CMU PIE database.

Fig. 6. Left: FRR versus FAR curves. Right: EERs versus the projected feature numbers for Eigenfaces, Fisherfaces, and ASV on the FERET database.

Fig. 7. Left: FRR versus FAR curves. Right: EERs versus the projected feature numbers for Eigenfaces, Fisherfaces, and ASV on the XM2VTS database.

TABLE I
EERS AND FRRS (FAR = 0.01%) OF EIGENFACES, FISHERFACES, AND ASV ON THE CMU PIE DATABASE

The FERET images are also aligned by fixing the locations of the two eyes and normalized to a size of 56 × 46 pixels. Three images per person are randomly selected as the training set and the gallery set, and the other three images per person are used as the probe set. The XM2VTS database contains 295 persons, and each person has four frontal images, each taken in a different session. All the images are again aligned by fixing the locations of the two eyes and normalized to a size of 64 × 64 pixels. The images from the first three sessions are used as the training and gallery set, and the images from the last session are used as the probe set.

B. Verification Results of the ASV

We first compare the ASV algorithm that solves the problem in Fig. 2 with Eigenfaces and Fisherfaces for face verification. The experimental results are listed in Figs. 5–7 and Tables I–III. The tables list the equal error rates (EERs) and the FRRs obtained by the three algorithms.

TABLE II
EERS AND FRRS (FAR = 0.01%) OF EIGENFACES, FISHERFACES, AND ASV ON THE FERET DATABASE

TABLE III
EERS AND FRRS (FAR = 0.01%) OF EIGENFACES, FISHERFACES, AND ASV ON THE XM2VTS DATABASE

The EER is defined as the false rejection rate (FRR), or equivalently the false acceptance rate (FAR), at the threshold where the two are equal. The numbers in the parentheses are the feature dimensions corresponding to the lowest EER values. The FRRs at a fixed FAR (0.01%) are also listed in these tables. Figs. 5–7 show the curves of FRR with respect to FAR, and the EERs with different numbers of projected features. From these results, we have the following observations. 1) Even without the need to explore any model parameter, the EER of the ASV is comparable with, and mostly lower than, those of both Eigenfaces and Fisherfaces.



TABLE IV
EERS OF ASV, WEIGHTED AND TENSOR EXTENSIONS OF ASV ON THE CMU PIE, FERET, AND XM2VTS DATABASES

Fig. 8. EERs of the ASV with different sufficiently large values of C on the FERET database.

For the PIE and XM2VTS databases, both the EERs and the FRRs of the ASV are much lower than those of Eigenfaces and Fisherfaces, which validates the effectiveness of the ASV. For the FERET database, although the ASV does not perform as well as Fisherfaces in terms of FRR at a given FAR in many cases, the EER of the ASV is comparable with that of Fisherfaces. A possible explanation for the good performance of Fisherfaces on the FERET database is that the intraclass variation is relatively small in this case, so the assumption of an identical Gaussian distribution for the different classes is roughly satisfied, and this extra prior information helps Fisherfaces work well in this small-sample-size case. 2) The verification accuracies of the Eigenfaces and Fisherfaces algorithms change greatly when different numbers of projected features are used for the final verification. 3) All the results show that, when the distributions of the gallery set and the probe set are not the same (such as in the CMU PIE database, where the selected poses are different, and in the XM2VTS database, where the sessions are in different environments), the verification accuracy of Fisherfaces is even worse than that of Eigenfaces, while when the distributions are similar, Fisherfaces outperforms Eigenfaces. A possible explanation of this phenomenon is that, when the distributions of the gallery set and the probe set are different, the latent Gaussian model in Fisherfaces overfits to the gallery samples, resulting in poor generalization on the probe set.

In the literature, there are many other algorithms for face verification, such as the Bayesian algorithm [14] and the SVM [12]. We did not compare our algorithm with them since the ASV aims to solve the problem of automatic determination of the model parameters in face verification with subspace learning techniques; the compared Eigenfaces and Fisherfaces are the most popular algorithms in this context.

As mentioned in Section II-B, when the penalty parameter C is large enough, it forces as many constraints to be satisfied as possible. Fig. 8 displays the equal error rates of the ASV with different sufficiently large values of C, and the result shows that the ASV is stable when the value of C is reasonably large.

C. Verification Results of the Weighted and Tensor ASV

We also evaluate the verification performance of the ASV and its weighted and tensor extensions, given in Figs. 3 and 4, respectively. The comparative experiments are also conducted on the three databases, CMU PIE, FERET, and XM2VTS, with the same configurations of the training, gallery, and probe sets as in the previous section.

For the weighted ASV, the constant that controls the weights of the different sample pairs is set to 0.2. The EERs of these three algorithms are listed in Table IV. We can see that 1) the weighted ASV improves the face verification performance of the ASV in all the experiments; and 2) the results of the tensor extension of the ASV are a little worse than those of the ASV. As mentioned before, however, the advantage of the tensor ASV is that it reduces the computational cost of the ASV.

V. DISCUSSIONS AND FUTURE WORK

In this paper, a novel algorithm, the ASV, and its two extensions are presented for the task of verification with subspace learning techniques. Instead of directly learning the transformation matrix, the ASV investigates the SMM, the product of the transformation matrix and its transpose. The advantages of this alteration are that we do not have to know the subspace dimension in advance and that we can learn the projection directions and the feature fusing weights simultaneously. The technique of semidefinite programming is employed to search for this matrix, where the objective function characterizes the separability of the kindred and inhomogeneous sample pairs. A set of constraints is imposed to guarantee the optimal verification accuracy on the training set. Then, the transformation matrix is obtained from the singular value decomposition of the SMM. Moreover, the effectiveness and efficiency of the ASV are further enhanced by the weighted and tensor extensions, respectively.

Beyond the elegant solution of the ASV algorithm to the problem of automatic determination of the verification threshold, subspace dimension, and feature fusing weights, the ASV also has limitations in real applications. The ASV assumes that the gallery set and the training set are the same; otherwise, the advantage of the ASV is not guaranteed in the testing stage. A possible way to overcome this limitation is to design an online learning algorithm that adaptively updates the transformation matrix for new subjects. Also, the ASV is not applicable to the case with only one training sample per subject, as in [29] and [30], since it needs multiple images for each subject in the model training stage. This limitation can potentially be overcome by image synthesis approaches that produce new images of the same subjects under novel illuminations or poses [5].

There also exist some interesting general issues that are worthy of further study. The first is how to efficiently solve the semidefinite programming problem when the original feature dimension is over 400, which is the case in some practical applications. The second is that, since most dimensionality reduction formulations take the form of a quadratic objective function with a quadratic constraint, as pointed out in [27], they can all be reformulated as semidefinite programming problems to attain the advantages of the ASV.


Hence, the ASV provides a new tool and a novel viewpoint for general dimensionality reduction, and it is interesting to investigate whether a general framework for dimensionality reduction can be established using the semidefinite programming formulation. The third is that the formulation of the semidefinite programming problem is very similar to that of the SVM algorithm except for the positive semidefinite constraint on the SMM; hence, the intrinsic relationship between these two algorithms deserves further study.

REFERENCES

[1] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[2] B. Borchers, "CSDP, a C library for semidefinite programming," Optim. Meth. Softw., vol. 11, no. 1, pp. 613–623, 1999.
[3] F. Chung, "Spectral graph theory," presented at the Regional Conf. Ser. Mathematics, 1997.
[4] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1991.
[5] Y. Hu, D. Jiang, S. Yan, L. Zhang, and H. Zhang, "Automatic 3D reconstruction for face recognition," in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition, 2004, pp. 843–848.
[6] I. Joliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
[7] T. Kolda, "Orthogonal tensor decompositions," SIAM J. Matrix Anal. Appl., vol. 23, pp. 243–255, 2001.
[8] L. Lathauwer, B. Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, pp. 1253–1278, 2000.
[9] L. Lathauwer, B. Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM J. Matrix Anal. Appl., vol. 21, pp. 1324–1342, 2000.
[10] H. Liu, C. Su, Y. Chiang, and Y. Hung, "Personalized face verification system using owner-specific cluster-dependent LDA-subspace," in Proc. Int. Conf. Pattern Recognition, 2004, vol. 4, pp. 344–347.
[11] J. Luettin and G. Maitre, "Evaluation protocol for the extended M2VTS database (XM2VTS)," Dalle Molle Inst. for Perceptual Artif. Intell., 1998.
[12] J. Matas, "Comparison of face verification results on the XM2VTS database," in Proc. Int. Conf. Pattern Recognition, 2000, vol. 4, pp. 858–863.
[13] K. Messer, "Face authentication competition on the BANCA database," presented at the Int. Conf. Biometric Authentication, 2004.
[14] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognit., vol. 33, no. 11, pp. 1771–1782, 2000.
[15] I. Philips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image Vis. Comput., vol. 16, pp. 295–306, 1998.
[16] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," presented at the IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[17] P. Phillips, P. Grother, R. Micheals, D. Blackburn, E. Tabassi, and J. Bone, "FRVT 2002: Overview and summary," Rep. FRVT 2002, 2003.
[18] P. Phillips, P. Rauss, and S. Der, "FERET (Face Recognition Technology) recognition algorithm development and test report," U.S. Army Res. Lab., ARL-TR-995, 1996.
[19] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.
[20] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Maui, HI, 1991, vol. 1, pp. 586–591.
[21] M. Vasilescu and D. Terzopoulos, "Multilinear analysis of image ensembles: Tensorfaces," in Proc. Eur. Conf. Computer Vision, 2002, vol. 1, pp. 447–460.


[22] M. Vasilescu and D. Terzopoulos, "Multilinear subspace analysis for image ensembles," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003, vol. 2, pp. 93–99.
[23] K. Weinberger, B. Packer, and L. Saul, "Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization," presented at the 10th Int. Workshop on Artificial Intelligence and Statistics (AISTATS-05), 2005.
[24] K. Weinberger and L. Saul, "Unsupervised learning of image manifolds by semidefinite programming," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004, vol. 2, pp. 988–995.
[25] K. Weinberger, F. Sha, and L. Saul, "Learning a kernel matrix for nonlinear dimensionality reduction," in Proc. 21st Int. Conf. Machine Learning, 2004, vol. 1, pp. 839–846.
[26] S. Yan, J. Liu, X. Tang, and T. Huang, "A parameter-free framework for general supervised subspace learning," IEEE Trans. Inf. Forensics Security, vol. 2, no. 1, pp. 69–76, 2007.
[27] S. Yan, D. Xu, B. Zhang, and H. Zhang, "Graph embedding: A general framework for dimensionality reduction," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005, vol. 2, pp. 830–837.
[28] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognit., vol. 34, pp. 2067–2070, 2001.
[29] D. Zhang, S. Chen, and Z.-H. Zhou, "A new face recognition method based on SVD perturbation for single example image per person," Appl. Math. Comput., vol. 163, no. 2, pp. 895–907, 2005.
[30] L. Zhang and D. Samaras, "Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 3, pp. 351–363, Mar. 2006.
[31] M. Zhu and A. Martinez, "Subclass discriminant analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1274–1286, Aug. 2006.

Shuicheng Yan (M'06) received the B.S. and Ph.D. degrees from the Applied Mathematics Department, School of Mathematical Sciences, Peking University, China, in 1999 and 2004, respectively. His research interests include computer vision and machine learning.

Jianzhuang Liu (M’02–SM’02) received the B.E. degree from Nanjing University of Posts and Telecommunications, China, in 1983, the M.E. degree from Beijing University of Posts and Telecommunications, China, in 1987, and the Ph.D. degree from the Chinese University of Hong Kong in 1997. From 1987 to 1994, he was a faculty member in the Department of Electronic Engineering, Xidian University, China. From August 1998 to August 2000, he was a research fellow at the School of Mechanical and Production Engineering, Nanyang Technological University, Singapore. Then, he was a postdoctoral fellow in the Chinese University of Hong Kong for several years. He is now an Assistant Professor in the Department of Information Engineering, Chinese University of Hong Kong. His research interests include image processing, computer vision, pattern recognition, and graphics.

Xiaoou Tang (S’93–M’96–SM’02) received the B.S. degree from the University of Science and Technology of China, Hefei, in 1990, the M.S. degree from the University of Rochester, Rochester, NY, in 1991, and the Ph.D. degree from the Massachusetts Institute of Technology, Cambridge, in 1996. He is a Professor and Director of the Multimedia Lab, Department of Information Engineering, Chinese University of Hong Kong. He is also the Group Manager of the Visual Computing Group, Microsoft Research Asia. His research interests include computer vision, pattern recognition, and video processing.


Dr. Tang was a Local Chair of the IEEE International Conference on Computer Vision (ICCV) 2005, a General Chair of the ICCV International Workshop on Analysis and Modeling of Faces and Gestures 2005, an Area Chair of ICCV 2007, and he will be a Program Chair of ICCV 2009. He is a Guest Editor of the special issue on underwater image and video processing of the IEEE JOURNAL OF OCEANIC ENGINEERING and the special issue on image- and video-based biometrics of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. He is an Associate Editor of IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE.


Thomas S. Huang (S’61–M’63–SM’76–F’79– LF’01) received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, R.O.C., and the M.S. and D.Sc. degrees in electrical engineering from the Massachusetts Institute of Technology (MIT), Cambridge. He was on the Faculty of the Department of Electrical Engineering at MIT from 1963 to 1973 and the School of Electrical Engineering and Director of its Laboratory for Information and Signal Processing at Purdue University, West Lafayette, IN, from 1973 to 1980. In 1980, he joined the University of Illinois at Urbana-Champaign, Urbana, where he is now the William L. Everitt Distinguished Professor of Electrical and Computer Engineering, a Research Professor at the Coordinated Science Laboratory, and Head of the Image Formation and Processing Group at the Beckman Institute for Advanced Science and Technology and Co-Chair of the Institute’s major research theme Human Computer Intelligent Interaction. He has published 20 books, and over 500 papers in network theory, digital filtering, image processing, and computer vision. His professional interests lie in the broad area of information technology, especially the transmission and processing of multidimensional signals. Dr. Huang is a Member of the National Academy of Engineering, a Foreign Member of the Chinese Academies of Engineering and Sciences, and a Fellow of the International Association of Pattern Recognition and the Optical Society of America, and has received a Guggenheim Fellowship, an A. V. Humboldt Foundation Senior U.S. Scientist Award, and a Fellowship from the Japan Association for the Promotion of Science. He received the IEEE Signal Processing Society’s Technical Achievement Award in 1987 and the Society Award in 1991. He was awarded the IEEE Third Millennium Medal in 2000. In 2000, he received the Honda Lifetime Achievement Award for “contributions to motion analysis.” In 2001, he received the IEEE Jack S. Kilby Medal. In 2002, he received the King-Sun Fu Prize from the International Association of Pattern Recognition and the Pan Wen-Yuan Outstanding Research Award. In 2005, he received the Okawa Prize. In 2006, he was named by IS&T and SPIE as the Electronic Imaging Scientist of the Year. He is a Founding Editor of the International Journal Computer Vision, Graphics, and Image Processing and Editor of the Springer Series in Information Sciences, published by Springer Verlag.