Expression-invariant 3D face recognition

10.1117/2.1200811.1366

Michael Bronstein

Tools from differential geometry enable a biometric technology that efficiently and accurately distinguishes between people, even ones as similar as a pair of identical twins.

Face recognition is one of the classical and still unsolved problems that has kept computer vision scientists busy since the early 1970s.1 Though a routine task for humans, identifying a face is a far greater challenge for machines. One of the main difficulties is the vast number of degrees of freedom in the appearance of a human face. External factors such as variations in illumination and head pose can make the same subject look completely different. For this reason, most of today's face-recognition technologies work only in controlled environments, where the influence of such factors on face perception is moderate.

We believe that a major sticking point in face recognition is that it is approached as a problem in 2D image analysis, whereas the human face is in fact a 3D object. Recent research has shown that taking into account the geometry of a face as well as its pose could reduce the sensitivity of identification systems to the environmental context. At the same time, acquiring facial geometry is more painstaking than simply capturing a 2D image, and typically involves using a multicamera setup to record the face from multiple viewpoints. Our two eyes are a good example of this approach: the binocular system gives us the ability to perceive depth.

In the recognition task itself, however, technology deviates from nature. Human visual perception relies mostly on analyzing how an object appears rather than on its geometric structure. Machines, on the other hand, treat human faces as 3D surfaces and compare them using relevant shape-analysis methods. Unfortunately, human faces are nonrigid, and the variability of different shapes a deformable object can assume is immense.
For instance, when we smile, our facial surface moves in a way that dramatically changes its geometry. As a consequence, 3D face-recognition methods have long struggled with sensitivity to facial expressions.

Figure 1. How do you canonize a person? The process of expression-invariant 3D face recognition involves computing the canonical (i.e., most essential) form of the facial surface, which 'undoes' the degrees of freedom due to facial expressions. The resulting forms can then be compared as rigid shapes.

It is admittedly very difficult to model the deformations of the facial surface that result from different expressions. Computer graphics experts routinely encounter this challenge in producing 3D animated movies. What is easier is to characterize the geometric properties of the surface that remain unaltered, or invariant, under such transformations. We found that most facial expressions have little effect on the intrinsic geometry, that is, the structure of distances measured on the facial surface. In differential geometry, such distances are called geodesic, a notion that derives from map-making applications.2

This can be illustrated intuitively as follows. Take a piece of paper and draw a line between a pair of points. If you now bend the paper, the position of the points in space changes, but the length of the path connecting them (which is no longer a straight line, since the paper is curved) remains the same. Similarly, facial expressions are deformations that approximately preserve the geodesic distances, and for this reason are called isometries. Consequently, comparing two faces amounts to examining their geodesic distance structures. Since these structures are isometry-invariant, and expressions are modeled as isometries, the comparison is tantamount to expression-invariant face recognition.
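The paper-bending intuition can be checked numerically. The sketch below is an illustrative toy example, not the authors' implementation: it models a strip of paper as a chain of points in 3D, computes geodesic distances along the chain with Dijkstra's algorithm, and shows that folding the strip leaves those distances unchanged even though the straight-line (Euclidean) distance between the endpoints shrinks.

```python
import heapq
import math

def dijkstra(n, edges, src):
    """Shortest path distances from src over an undirected weighted graph.
    edges: dict mapping (i, j) -> edge length."""
    adj = {i: [] for i in range(n)}
    for (i, j), w in edges.items():
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [math.inf] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def edge_lengths(pts):
    # Consecutive points form the edges of the "paper strip".
    return {(i, i + 1): math.dist(pts[i], pts[i + 1])
            for i in range(len(pts) - 1)}

# A flat strip: five points along the x-axis.
flat = [(float(x), 0.0, 0.0) for x in range(5)]
# The same strip folded 90 degrees at x = 2 (an isometric bend).
bent = [(x, y, z) if x <= 2 else (2.0, 0.0, x - 2.0) for x, y, z in flat]

d_flat = dijkstra(5, edge_lengths(flat), 0)
d_bent = dijkstra(5, edge_lengths(bent), 0)
print(d_flat[4], d_bent[4])                 # geodesic: 4.0 4.0 (unchanged)
print(round(math.dist(bent[0], bent[4]), 3))  # Euclidean shrinks: 2.828
```

On a real facial surface the same idea applies with a triangulated mesh instead of a chain, and the geodesics are computed by fast-marching-type algorithms rather than plain graph Dijkstra.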


Having found a way to capture the expression-invariant characteristics of a face in terms of its geodesic distances, we still have to answer the question of how to compare two sets of geodesic distances describing two faces. The solution lies in analytical techniques used for visualizing multidimensional data. The method we borrowed from these applications is called multidimensional scaling (MDS), and it allows a complicated distance structure to be visualized in a low-dimensional Euclidean space. Applying MDS to our geodesic distances, we obtain low-dimensional representations, introduced by Elad and Kimmel and referred to as canonical forms.3 These forms are a single way of representing all the possible surfaces that have the given geodesic distances (i.e., all the expressions of the face), thus, in a sense, undoing the expressions. Based on this approach, face recognition is performed by 'canonizing' faces, followed by comparing the forms as rigid objects (see Figure 1).

This technique involves two numerical stages: computing the geodesic distances and performing MDS. Both can be carried out very efficiently, allowing for a real-time face-recognition system.4 We tested this method on a database of faces with extreme facial expressions and obtained very high recognition performance. We were even able to distinguish between identical twins.2

In follow-up studies,4,5 we showed that the distance structures capturing the expression-invariant characteristics of faces can be compared directly, without using canonical forms. For illustration, think of stretching a rubber mask to fit your face: the more similar the mask is to your face, the less you need to stretch it. Considering a facial surface as such a mask, we can measure its similarity to another face by quantifying the amount of stretching we have to introduce. This approach has a profound relation to the so-called Gromov–Hausdorff distance,6 another powerful tool of metric geometry.
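The embedding step can be sketched with classical (Torgerson) MDS, one standard variant of the technique; note that the canonical-form pipeline applies MDS to geodesic distances, which are only approximately Euclidean, so in practice the embedding minimizes a distortion criterion rather than being exact. In the toy example below, a distance matrix of points on a line embeds exactly and the pairwise distances are reproduced.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed an n x n distance matrix D into k-dimensional Euclidean
    space via classical MDS (double centering + eigendecomposition)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # keep the top-k eigenpairs
    L = np.sqrt(np.maximum(w[idx], 0.0))     # clip tiny negative eigenvalues
    return V[:, idx] * L                     # n x k coordinates

# Toy "distance structure": four points on a line at 0, 1, 2, 4.
x = np.array([0.0, 1.0, 2.0, 4.0])
D = np.abs(x[:, None] - x[None, :])

X = classical_mds(D, k=1)
# The embedding reproduces D up to translation and sign.
D_rec = np.abs(X[:, 0][:, None] - X[:, 0][None, :])
print(np.allclose(D, D_rec))  # True
```

For facial surfaces the input D holds geodesic distances between sampled surface points, and two canonical forms are then aligned and compared as rigid shapes.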
We believe that borrowing such tools from theoretical geometry and applying them to computer vision problems could lead to important practical applications in face recognition. At this point we have moved beyond feasibility testing, as demonstrated by a lab prototype, and we are currently developing a commercial prototype based on our 3D face-recognition technology.

Author Information

Michael Bronstein
Technion – Israel Institute of Technology
Haifa, Israel
http://www.cs.technion.ac.il/~mbron/

Michael Bronstein received his PhD from the Department of Computer Science, Technion. He has authored more than 40 papers, several patents, and a book, Numerical Geometry of Non-Rigid Shapes. Highlights of his research were featured in CNN news. He co-chaired the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2008 workshop on nonrigid shapes and deformable image alignment.

References

1. T. Kanade, Computer Recognition of Human Faces, Birkhäuser, Basel, 1977.
2. A. M. Bronstein, M. M. Bronstein, and R. Kimmel, Expression-invariant 3D face recognition, Proc. AVBPA, pp. 52–61, 2003.
3. A. Elad and R. Kimmel, On bending invariant signatures for surfaces, IEEE Trans. Pattern Anal. Mach. Intell. 25 (10), pp. 1285–1295, 2003.
4. A. M. Bronstein, M. M. Bronstein, and R. Kimmel, Numerical Geometry of Non-Rigid Shapes, Springer, New York, 2008.
5. A. M. Bronstein, M. M. Bronstein, and R. Kimmel, Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching, Proc. Nat'l Acad. Sci. USA 103, pp. 1168–1172, 2006.
6. M. Gromov, Structures métriques pour les variétés riemanniennes, Cedic/Fernand Nathan, Paris, 1981.

© 2008 SPIE