LPT: Eye Features Localizer in an N-Dimensional Image Space


Mohammad Mahdi Dehshibi 1, Azam Bastanfard 2, and Alireza Abdi 3
1 Young Researchers Club, Islamic Azad University South Tehran Branch, Tehran, Iran
2 IT Research Laboratory, Faculty of Engineering, Islamic Azad University Karaj Branch, Karaj, Iran
3 Faculty of Electrical, Computer and IT, Islamic Azad University Qazvin Branch, Qazvin, Iran

Abstract - Facial feature extraction is one of the most important challenges in the area of facial image processing. This paper introduces a new method for locating eye features that is capable of processing images rapidly while achieving high detection rates. The proposed method is applicable to an n-dimensional space; therefore, a new representation is adopted in which an m×n image consists of m observation sets in an n-dimensional space. The main contribution of this paper is a one-to-one linear transform based on this representation, called the Linear Principal Transformation (LPT). LPT reduces the dimension of the image from n to two and allows all image features to be extracted rapidly and efficiently. A set of experiments on the FERET and IFDB image data sets is presented. The performance of the eye feature extraction system is comparable to the best previous systems, with a success rate of 95.2%.

Keywords: Eye Features Extraction; Image Space; Linear Principal Transform; Iris Detection

1 Introduction

Transforming the input data into a set of features is called feature extraction. In the area of facial image processing, the feature extraction process is the most significant stage in applications such as face recognition [1], facial expression analysis [2], face detection [3], and age classification [4]. Regarding facial feature extraction, there is general agreement that the eyes are the most important facial features; thus, a great research effort has been devoted to their detection and localization [10]. This is due to several reasons, among which are the following:
- The existence of eyes verifies that the object of interest is human.
- Factors that influence the face appearance have less effect on the appearance of the eyes. For instance, the eyes are unaffected by the presence of facial hair and are little altered by small in-depth rotations.
- Knowledge about the eyes' position helps estimate the face scale and the degree of its in-plane rotation.
- Accurate localization of the eyes allows identifying all other facial features of interest.

Due to this agreement, our method concentrates on the extraction of the main eye features: eyebrow, eyelids, and pupil. The proposed method is a straightforward linear one-to-one transform that utilizes eigen analysis to extract the features of interest. LPT assumes that an m×n image consists of m observation sets (vectors) in an n-dimensional vector space. Among these vectors, the vector with the highest variance corresponds to the features of interest. To obtain the features, first the covariance matrix of the image is calculated. Then the covariance matrix is diagonalized by a matrix of its orthonormal eigenvectors. Finally, the eigenvector with the highest eigenvalue is derived and mapped into a two-dimensional vector space so that it can be graphed conveniently. An analysis of the graph's extrema highlights the features of interest. This approach has advantages over other feature extraction schemes in its speed, simplicity, and insensitivity to image conditions such as illumination and occlusion.

Although LPT is similar, in terms of eigen analysis, to the face recognition system introduced by Turk and Pentland [1], it differs fundamentally from the Eigenface method. Eigenface is a clustering method that uses N images to construct a discriminative model. In the Eigenface approach, each image is first vectorized. Then the difference between each vector and the mean image is calculated. Finally, these vectors are stacked side by side to construct the data matrix. At this stage, the eigenvectors and eigenvalues of the covariance matrix are calculated and only the k eigenvectors with the highest eigenvalues are kept; these eigenvectors are called Eigenfaces. In the recognition step, a new image is first projected onto its Eigenface components, and then the Euclidean distance between the input image and the data matrix is used to identify the input image. A minimal illustrative sketch of this baseline is given at the end of this section for contrast.

The subsequent discussion is organized as follows: Section 2 explains the background of facial feature extraction and the defects of projection functions. Section 3 describes the proposed method, which is based on a new view of an image. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.
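The following listing is a hedged sketch of the Eigenface baseline described above (not the authors' code; the array shapes, function names, and the number k of retained eigenvectors are illustrative assumptions):

# Hedged sketch of the standard Eigenface pipeline (illustrative only).
import numpy as np

def eigenfaces(images, k):
    """images: N x d matrix, one vectorized face per row; k <= min(N, d) eigenfaces kept."""
    mean_face = images.mean(axis=0)
    A = images - mean_face                      # subtract the mean image from each face
    # Rows of vt are eigenvectors of the pixel-space covariance A^T A
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return mean_face, vt[:k]                    # the k leading Eigenfaces

def identify(probe, gallery, mean_face, eigvecs):
    """Project probe and gallery into Eigenface space; return nearest gallery index."""
    p = eigvecs @ (probe - mean_face)
    g = (gallery - mean_face) @ eigvecs.T
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))   # Euclidean distance

LPT, in contrast, operates on a single image, treating its rows as the observation set rather than a gallery of vectorized faces.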

2 Background

Facial feature extraction, in general, refers to the detection of eyes, mouth, nose, and other important facial components. For this purpose, various techniques have been proposed, which may be mainly classified into four groups: template-based, appearance-based, color segmentation-based, and geometric feature-based approaches.

The first two groups require a template generated either by an expert or by a machine. Template matching [6], ASM [7], SVM [9], and AdaBoost [10] fall into this category. Although these methods meet with success and popularity, their learning and computational times are long. Furthermore, they lose efficiency under illumination variation and out-of-axis rotation. Color segmentation techniques [5] use skin color to isolate the face; any non-skin-color region within the face can be treated as a candidate for the eyes and/or mouth. These methods are useful only for color images, and their false positive rate is high. In geometric feature-based approaches, the features are extracted using anthropometric relations [4], [11] of the face components. Valley detection filters [12] and the analysis of horizontal and vertical edge projections [8] are such examples. The functions used in this approach cannot distinguish slight changes in the image very well. Geng et al. [13] used the Variance Projection Function for iris localization. Dehshibi et al. [4] used the Combined Projection Function to locate facial feature points such as the iris, nose, center of the lips, and chin.

Even though geometric feature-based methods are free from the defects of the first three approaches and can reduce the size of the problem space considerably, they are not one-to-one. Therefore, all distinct points on the same vertical line are mapped into the same point in the hyper-plane. This defect causes a loss of structure in the data, and the feature extraction process may face problems. Figure 1 shows the result of applying projection functions to an image X. As is evident, the main horizontal change in the picture occurs in line 3, but the projections cannot highlight it. The proposed method, LPT, overcomes this weakness. LPT is a linear transformation that maps a vector space V into a vector space W and highlights the main changes in the image.
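As a toy illustration of this defect (not taken from the paper; the 3×3 pixel values are made up), the following sketch computes row-wise Integral and Variance Projection Functions and shows two images that differ in structure yet have identical projections:

# Hedged illustration: projection functions are not one-to-one.
import numpy as np

def ipf(image):
    # Integral Projection Function: mean intensity of each row
    return image.mean(axis=1)

def vpf(image):
    # Variance Projection Function: intensity variance of each row
    return image.var(axis=1)

a = np.array([[10, 20, 30],
              [40, 200, 60],
              [70, 80, 90]], dtype=float)
b = a.copy()
b[1] = a[1, ::-1]                     # reverse the middle row: different structure

print(np.allclose(ipf(a), ipf(b)))    # True: same integral projection
print(np.allclose(vpf(a), vpf(b)))    # True: same variance projection

The two images are distinct, but neither projection can tell them apart, which is exactly the loss of structure discussed above.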

3 Linear Principal Transformation

The goal of the Linear Principal Transformation (LPT) is to identify the most meaningful basis, i.e., the one that contains the features of interest. The proposed method reveals the hidden structure of the data. As the remainder of the article details, this transformation operates in an n-dimensional vector space.

An image is, in fact, a scene recorded at a particular moment. Even a fixed scene may change imperceptibly due to slight changes in illumination; therefore, an image has a statistical nature. We assume that an image consists of a number of observation sets. This assumption leads to the definition of a new space for representing the image.

Definition 1: Let Xm×n be a grayscale image; this image is an n-dimensional space (ℝn) with m vectors.

To illustrate the above definition, we define a set of maps. This set helps establish the addition and scalar multiplication properties of the defined vector space [14]. These maps are as follows:
1. f + g: F → F, (f + g)(x) = (f(x) + g(x)) mod 256
2. λf: F → F, (λf)(x) = λf(x) mod 256
3. -f: F → F, (-f)(x) = 255 - f(x)
We assume that F is the finite set of gray levels {0, 1, …, 255}.

This new representation of the image space can be used effectively for feature extraction. As is evident in Figure 1(a), the main change in the image occurs in the third row, where the variance is maximum. Therefore, we assume that the directions with the largest variances in our image space contain the features of interest. Let us represent an image by X = [x1, x2, …, xm]T, where xi = [xi1, xi2, …, xin]. Here m and n are the numbers of rows and columns of the given image, respectively. Variance operates on only one observation set; therefore, we must calculate the covariance to find out how the dimensions vary with respect to each other. Consider two rows of the image:
xi = {xi1, xi2, …, xin}
xj = {xj1, xj2, …, xjn}

Figure 1. (a) A 5×3-pixel grayscale image X, (b) Integral Projection Function (IPF), (c) Mean Integral Projection Function (MIPF), (d) Variance Projection Function (VPF), (e) General Projection Function (GPF), and (f) Combined Projection Function (CPF) of the image X.


The covariance between xi and xj is defined as

\sigma_{x_i, x_j} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)    (1)

where \bar{x}_i and \bar{x}_j are the means of the xi and xj data sets, respectively. In general, the covariances between all pairs of observations construct the covariance matrix

C_X = \frac{1}{n} X X^T    (2)

CX is a square, symmetric m×m matrix. The diagonal terms of CX are the variances of the individual rows, and the off-diagonal terms are the covariances between observations. Among the diagonal terms, by our assumption, large values correspond to the structure of interest. The best feature extraction is achieved by defining a linear transformation P such that the dependency between observations in CX is eliminated while the large variances remain. By the following theorem [14], to obtain a diagonal matrix, P must be the matrix of orthonormal eigenvectors of CX.

Theorem 2: A symmetric n×n matrix is diagonalized by a matrix of its orthonormal eigenvectors.

Therefore, the covariance matrix is subjected to eigen analysis, which seeks a set of m orthonormal vectors ψ that best describe the distribution of the rows of the image:

C_X \psi = \lambda \psi    (3)

where λ is an eigenvalue of CX. Once the eigenvectors of the symmetric matrix CX are calculated, the eigenvector corresponding to the highest eigenvalue, ψmax, is chosen. To illustrate this vector in a two-dimensional space, a linear map is defined as follows:

f: \mathbb{R}^n \to (\mathbb{R}^2)^n, \quad f(x_1, x_2, \ldots, x_n) = ((1, x_1), (2, x_2), \ldots, (n, x_n))    (4)

The soundness of the defined map follows from the two properties below, where addition and scalar multiplication on the image side act on the second coordinate of each pair:
(1) For all X, Y ∊ ℝn, f(X + Y) = f(X) + f(Y):
    f(X + Y) = f(x1 + y1, …, xn + yn) = ((1, x1 + y1), …, (n, xn + yn))
    f(X) + f(Y) = ((1, x1), …, (n, xn)) + ((1, y1), …, (n, yn)) = ((1, x1 + y1), …, (n, xn + yn))
(2) For all X ∊ ℝn and every scalar c, f(cX) = c f(X):
    f(cX) = f(cx1, …, cxn) = ((1, cx1), …, (n, cxn))
    c f(X) = c((1, x1), …, (n, xn)) = ((1, cx1), …, (n, cxn))
Hence f is a linear map. ■

Figure 2 shows the map of the eigenvector corresponding to the highest eigenvalue of CX. It is evident that the extremum of the LPT curve lies at the location of the main change in the image. Finding the features of interest is then a matter of analyzing the extrema of this map. This transformation is very useful for locating the primary features of a face, as we describe in the next section.
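As a hedged reading of Eqs. (2)-(4) (illustrative, not the authors' code; the paper does not state whether rows are mean-centered, so the uncentered form of Eq. (2) is used, and the function name lpt is ours), the computation can be sketched as follows:

# Hedged sketch of the LPT computation: covariance -> leading eigenvector -> 2-D curve.
import numpy as np

def lpt(image):
    """image: m x n grayscale array. Returns the points (i, psi_max[i]) to graph."""
    X = np.asarray(image, dtype=float)
    C = X @ X.T / X.shape[1]                    # covariance-like matrix of Eq. (2), m x m
    eigvals, eigvecs = np.linalg.eigh(C)        # eigen analysis of the symmetric matrix, Eq. (3)
    psi_max = eigvecs[:, np.argmax(eigvals)]    # eigenvector with the highest eigenvalue
    idx = np.arange(1, psi_max.size + 1)
    return np.column_stack([idx, psi_max])      # pairs (i, component_i) as in Eq. (4)

The returned curve appears to correspond to what Figures 2 and 3 plot: one point per component of the leading eigenvector, with extrema at the locations of the strongest change in the image.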

4 Locating eye features using LPT

Eye features extraction consists of two stages. In the first stage, known as eye detection, the eye perimeter is extracted from the face image. Next, the region of interest is searched to localize the feature point(s). Many appropriate algorithms have been proposed in the literature [16], [17] for eye detection. Here, we concentrate on locating features within this perimeter.

Figure 2. Linear Principal Transformation. There is just one extremum, at the location of the main change.

In order to locate the exact iris position, the eye searching perimeter is first extracted from the face image. Second, this area is divided into left and right eye searching perimeters to increase the algorithm's accuracy. Finally, the proposed method is applied to each perimeter. Figure 3 shows the result of applying LPT to the left eye searching perimeter.

Figure 3. Result of performing LPT on the left eye image (59×50 pixels). Two maxima are located on the top and bottom eyelids.

The feature localization algorithm is as follows:

Feature Localization Algorithm
1. Extract the eye searching perimeter from the face image and name it "eye".
   eye = eye_detect(face)
2. Calculate the eigenvector matrix of the symmetric matrix eye × eyeT and name it "P".
   [P, eigenvalue] = eigen(eye × eyeT)
3. Choose the eigenvector corresponding to the highest eigenvalue and name it "Phigh".
   max_e = max(eigenvalue)
   max_e_p = find(eigenvalue == max_e)
   Phigh = P[:, max_e_p]
4. Map the eye onto the new basis, Phigh, and name it "LPT".
   LPT = MAPn→2(PhighT)
5. Calculate the derivative of LPT and its extrema.
   extremum = ∅
   f(x) = LPT′
   extremum = extremum ⋃ {x | f(x) = 0}
6. The three features are obtained as follows:
   eyelid_top = first maximum of the extremum set
   eyelid_bottom = second maximum of the extremum set
   iris = 1/2 (eyelid_top + eyelid_bottom)
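The listing below is a hedged Python rendering of the algorithm above (illustrative only: the eye array is assumed to come from an external eye-region detector such as those in [16], [17]; the eigenvector sign normalization and the discrete extremum test are our assumptions, since the pseudocode leaves them implicit):

# Hedged rendering of the Feature Localization Algorithm (illustrative).
import numpy as np

def localize_eye_features(eye):
    """eye: m x n grayscale array of one eye searching perimeter (already detected).
    Returns 1-based row positions of the top eyelid, bottom eyelid, and iris."""
    E = np.asarray(eye, dtype=float)
    # Steps 2-3: eigen analysis of the symmetric matrix eye * eye^T; keep Phigh.
    eigvals, P = np.linalg.eigh(E @ E.T)
    p_high = P[:, np.argmax(eigvals)]
    # Eigenvector sign is arbitrary; make the dominant lobe positive (our assumption).
    if p_high[np.argmax(np.abs(p_high))] < 0:
        p_high = -p_high
    # Step 4: map Phigh to the 2-D curve (i, p_high[i]).
    lpt = np.column_stack([np.arange(1, p_high.size + 1), p_high])
    # Step 5: extrema of the curve, i.e. sign changes of its discrete derivative.
    d = np.diff(lpt[:, 1])
    extrema = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0] + 1
    # Step 6: the first two maxima are taken as the eyelid rows (assumes both exist);
    # the iris row is their midpoint.
    maxima = [i for i in extrema
              if lpt[i, 1] > lpt[i - 1, 1] and lpt[i, 1] > lpt[i + 1, 1]]
    eyelid_top, eyelid_bottom = lpt[maxima[0], 0], lpt[maxima[1], 0]
    iris = (eyelid_top + eyelid_bottom) / 2.0
    return eyelid_top, eyelid_bottom, iris

In practice the eye region of the face is first split into left and right searching perimeters, as described above, and this routine is applied to each of them.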

Choosing the interest region and its neighborhood is crucial for the performance of many eye feature localization algorithms. If this neighborhood is small enough, high accuracy is achieved; in contrast, if it is too big, the presence of hair, background, or glasses noticeably reduces the accuracy. Furthermore, changes in conditions such as illumination and pose can influence the performance of the system. Nevertheless, the proposed method is free from these limitations. For instance, we performed LPT on an area that includes the left eye, eyebrow, and part of the ear and hair, and the obtained results were very favorable. Moreover, besides the three previously mentioned features, we could also extract the top and bottom of the eyebrow line. The result is depicted in Figure 4.

Figure 4. Five extracted features in the eye searching perimeter.

The proposed method yields a low computational cost, allowing real-time processing. The time complexity of the method is O(mn^2), where m is the number of rows and n is the number of columns of the image. The execution time on a Pentium D machine is 0.052090 seconds. In addition, the algorithm is robust against illumination changes, occlusion such as glasses, and other disturbing factors that affect image quality. Figure 5 shows the result of applying the LPT algorithm to some nonstandard images.

Figure 5. Results of performing LPT on (a) a blurred image, (b) an occluded image, and (c) an image with nonstandard illumination. Although the quality of these images is very low, the proposed method detects iris locations with errors of only a few pixels.

The proposed algorithm was evaluated on the Iranian Face Database (IFDB) [8] and on FERET [15], [20]. The performance of the iris localization method is measured by computing the maximum of the left and right eye position estimation errors and by normalizing this error by the annotated eye distance [18]:

\epsilon_{max} = \frac{\max\{\,\|\eta_L - \tilde{\eta}_L\|_2,\; \|\eta_R - \tilde{\eta}_R\|_2\,\}}{\|\eta_L - \eta_R\|_2}    (5)

where \tilde{\eta}_L = (\tilde{\eta}_{Lx}, \tilde{\eta}_{Ly}) and \tilde{\eta}_R = (\tilde{\eta}_{Rx}, \tilde{\eta}_{Ry}) are the estimated left and right eye center positions, respectively, \eta_L and \eta_R are the annotated eye positions, and \|\cdot\|_2 is the Euclidean distance. In this measure, εmax ≤ 0.25 (a quarter of the interocular distance) corresponds roughly to the distance between the eye center and the eye corners, εmax ≤ 0.1 relates to the range of the iris, and εmax ≤ 0.05 corresponds to the range of the cornea. To illustrate the efficiency of the proposed algorithm in eye location estimation, the accuracy of LPT is presented in Figure 6 together with the accuracy of three eye localization methods (Kroon et al. [19], Geng et al. [13], and Dehshibi et al. [4]). Tables 1 and 2 tabulate the comparison of eye localization methods on the IFDB and FERET image data sets for different εmax error limits.
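As a hedged illustration of this measure (not code from the paper; the function and argument names are ours), the following helper computes εmax from annotated and estimated eye centers given as (x, y) pairs:

# Hedged helper for the normalized eye localization error of Eq. (5).
import numpy as np

def eps_max(eta_l, eta_r, est_l, est_r):
    """eta_l, eta_r: annotated left/right eye centers (x, y);
    est_l, est_r: estimated centers. Returns the normalized maximum error."""
    eta_l, eta_r = np.asarray(eta_l, float), np.asarray(eta_r, float)
    est_l, est_r = np.asarray(est_l, float), np.asarray(est_r, float)
    err_l = np.linalg.norm(eta_l - est_l)                       # left-eye error
    err_r = np.linalg.norm(eta_r - est_r)                       # right-eye error
    return max(err_l, err_r) / np.linalg.norm(eta_l - eta_r)    # normalize by eye distance

# Example: an estimate 3 pixels off on one eye, eyes 60 pixels apart -> 0.05
print(eps_max((100, 120), (160, 120), (103, 120), (160, 120)))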

5 Conclusion

We have presented an approach for eye features extraction that minimizes computation time while achieving high localization accuracy. The proposed method may well have broader applications in computer vision and facial image processing. Various techniques have been proposed in the literature for eye features extraction; they can be mainly classified into four groups: geometric feature-based, template-based, color segmentation-based, and appearance-based approaches. The main contribution of this paper is an efficient algorithm for eye features extraction. This algorithm is a linear one-to-one transformation which we call the "Linear Principal Transformation." In comparison with other approaches, this algorithm has high localization accuracy while its computation time is short. Furthermore, it does not suffer from the shortcomings of other algorithms, such as failure under nonstandard illumination or occlusion, long training times, and slow search. Finally, this paper presents a set of detailed experiments on two data sets which include faces under a very wide range of conditions. The experiments also demonstrate the power of the proposed method in comparison with other algorithms.

Figure 6. Comparison of the eye detection results on (a) IFDB and (b) FERET.

Table 1. Comparison of eye localization methods on the IFDB for different εmax error limits.
Method            < 0.05    < 0.1     < 0.15    < 0.2     < 0.25
Kroon et al.       7.5%      65%       78.6%     87%       93.8%
Geng et al.        -         -         68.6%     72.03%    84.25%
Dehshibi et al.    9.2%      60.3%     76.23%    84.33%    90.56%
LPT               23.2%      70%       85.6%     92.36%    95.2%

Table 2. Comparison of eye localization methods on the FERET for different εmax error limits.
Method            < 0.05    < 0.1     < 0.15    < 0.2     < 0.25
Kroon et al.       -         41.95%    74.97%    80.98%    86.48%
Geng et al.        -         -         49.95%    73.97%    78.98%
Dehshibi et al.    -         37.94%    73.97%    80.98%    86.48%
LPT                3%        58.4%     79.98%    86.48%    90.07%

6 References

[1] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, 1991.
[2] Y. Yang, S. S. Ge, T. H. Lee, and C. Wang, "Facial expression recognition and tracking for intelligent human-robot interaction," Intelligent Service Robotics, Springer, Vol. 1, pp. 143-157, 2008.
[3] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting Faces in Images: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, pp. 34-58, 2002.
[4] M. M. Dehshibi and A. Bastanfard, "A new algorithm for age recognition from facial images," Signal Processing, Vol. 90, No. 8, pp. 2431-2444, 2010.
[5] F. Marques and C. Sobrevals, "Facial Feature Segmentation from Frontal View Images," in Proc. of the 11th European Signal Processing Conference, pp. 212-220, 2003.
[6] H. Y. Kim and S. A. Araujo, "Grayscale Template Matching Invariant to Rotation, Scale, Brightness and Contrast," in Proc. of the IEEE Pacific-Rim Symposium on Image and Video Technology, Lecture Notes in Computer Science, Vol. 4872, pp. 100-108, 2007.
[7] M. H. Mahoor, M. Abdel-Mottaleb, and A. N. Ansari, "Improved Active Shape Model for Facial Feature Extraction in Color Images," Journal of Multimedia, Vol. 1, pp. 21-28, 2007.
[8] A. Bastanfard and M. M. Dehshibi, "Iranian face database with age, pose and expression," in Proc. of the International Conference on Machine Vision, pp. 50-56, 2007.
[9] P. Campadelli, R. Lanzarotti, and G. Lipori, "Automatic facial feature extraction for face recognition," in Face Recognition, I-Tech Education and Publishing, 2007.
[10] X. Tang, Z. Ou, T. Su, H. Sun, and P. Zhao, "Robust Precise Eye Location by AdaBoost and SVM Techniques," in Proc. of the International Symposium on Neural Networks, pp. 93-99, 2005.
[11] A. Bastanfard, O. Bastanfard, H. Takahashi, and M. Nakajima, "Toward Anthropometrics Simulation of Face Rejuvenation and Skin Cosmetic," Journal of Visualization and Computer Animation, Vol. 15, pp. 347-352, 2004.
[12] A. Gunduz and H. Krim, "Facial feature extraction using topological methods," in Proc. of ICIP, pp. 673-679, 2003.
[13] X. Geng, Z.-H. Zhou, and S.-F. Chen, "Eye location based on hybrid projection function," Journal of Software, pp. 1394-1400, 2004.
[14] H. Anton and C. Rorres, Elementary Linear Algebra with Applications, 9th edition, Wiley, 2005.
[15] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET Evaluation Methodology for Face-Recognition Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, pp. 1090-1104, 2000.


[16] D. Cristinacce, T. Cootes, and I. Scott, "A multi-stage approach to facial feature detection," in Proc. of the 15th British Machine Vision Conference, pp. 231-239, 2004.
[17] S. Karungaru, M. Fukumi, and N. Akamatsu, "Automatic Human Faces Morphing Using Genetic Algorithms Based Control Points Selection," International Journal of Innovative Computing, Information and Control, Vol. 3, No. 2, pp. 247-258, 2007.
[18] O. Jesorsky, K. J. Kirchberg, and R. W. Frischholz, "Robust face detection using the Hausdorff distance," in Proc. of Audio- and Video-Based Biometric Person Authentication, pp. 90-96, 2001.
[19] B. Kroon, A. Hanjalic, and S. M. P. Maas, "Eye Localization for Face Matching: Is It Always Useful and Under What Conditions?" in Proc. of the ACM International Conference on Content-Based Image and Video Retrieval, pp. 379-388, 2008.
[20] P. J. Phillips and E. M. Newton, "Meta-Analysis of Face Recognition Algorithms," in Proc. of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'02), pp. 224-230, 2002.