J. Vis. Commun. Image R. 24 (2013) 1213–1231


J. Vis. Commun. Image R. journal homepage: www.elsevier.com/locate/jvci

Local descriptors and similarity measures for frontal face recognition: A comparative analysis

Michał Bereta a,b,*, Witold Pedrycz a,c,d, Marek Reformat a

a Department of Electrical and Computer Engineering, University of Alberta, 9107–116 Street, Edmonton, T6R 2V4 AB, Canada
b Institute of Computer Science, Cracow University of Technology, ul. Warszawska 24, 31-155 Kraków, Poland
c Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
d Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Article info

Article history: Received 28 October 2012; Accepted 6 August 2013; Available online 21 August 2013

Keywords: Face recognition; Face identification; Face verification; Local descriptors; Local binary patterns; Gabor filters; FERET database; Local descriptors' taxonomy

Abstract

Face recognition based on local descriptors has recently been recognized as the state-of-the-art design framework for problems of facial identification and verification. Given the diversity of the existing approaches, the main objective of this paper is to present a comprehensive, in-depth comparative analysis of recent face recognition methodologies based on local descriptors. We carefully review and contrast a suite of commonly encountered local descriptors, highlighting their main features in the setting of facial recognition. The main advantages and limitations of the discussed methods are identified. Furthermore, a carefully structured taxonomy of the existing approaches is presented. We show that the presented techniques are particularly suitable for large-scale facial authentication systems, in which a training stage using the overall face database might be computationally prohibitive. A variety of approaches used to fuse local descriptions into global ones are discussed along with their pros and cons. Different similarity measures and possible extensions and hybridizations with statistical learning techniques are elaborated on as well. Experimental results obtained for the FERET database are carefully assessed and compared.

© 2013 Elsevier Inc. All rights reserved.

* Corresponding author at: Department of Electrical and Computer Engineering, University of Alberta, 9107–116 Street, Edmonton, T6R 2V4 AB, Canada. Fax: +1 780 492 1811. E-mail addresses: [email protected], [email protected] (M. Bereta), wpedrycz@ualberta.ca (W. Pedrycz), [email protected] (M. Reformat).
1047-3203/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jvcir.2013.08.004

1. Introduction

Facial recognition has attracted research attention for years due to its wide practical applicability. The area itself poses a great deal of challenges from the perspective of computer vision and image understanding. As an object of recognition and classification, the face has its own specific characteristics, which make many techniques commonly available in image processing not directly applicable. A general overview of face recognition methods can be found in [1]. The methods of face recognition can be roughly divided into two categories, namely global and local approaches. For many years the global approaches held a dominant position. Methods such as Principal Component Analysis (PCA) [2,3], Linear Discriminant Analysis (LDA) [4–6] and Independent Component Analysis (ICA) [7] have brought significant contributions to the field. These methods, however, exhibit some limitations. For instance, it is well known that PCA is sensitive to changes in lighting conditions, which can have a significant detrimental effect. In general, these methods aim to describe the face as a whole, and as such they are highly susceptible to changes in facial appearance, such as different expressions (smile, anger, etc.). The global approaches usually require forming a new subspace for representing face images by optimizing some criterion (e.g., minimizing the reconstruction error in PCA or maximizing the ratio of between-class to within-class distances in LDA). Finding the proper subspace requires a training set; for that reason the obtained solution may be biased and strongly dependent on the particular training set. Additionally, for a large face database these methods may become computationally prohibitive. The publicly available databases usually used for evaluating facial recognition methods (e.g., FERET [8]) consist of images of no more than a thousand individuals. In real-world situations, such as those encountered in public or national safety systems, the number of persons in a database can reach thousands, sometimes with just one image per person. In such cases, solutions that offer good discriminatory abilities without a training stage are especially advantageous. Selecting subsets of the available images solves the problem only partially, as the resulting model will still be biased toward the selected data. Nevertheless, global methods form an important research area and new techniques have been developed [9,10]. A comprehensive overview of such methods can be found in [11].

The local approach comprises an alternative group of facial recognition methods. Here the face is described by expressing specific characteristics at specific points called landmarks or fiducial points. One of the techniques in this category is the widely known Elastic Bunch Graph Matching (EBGM) method [12], which produced very good results in the FERET contest [8]. This approach has been further studied and improved; see [13]. Gabor wavelets [14], also used in EBGM, have proven to be a particularly effective way of forming facial features [15]. However, the automatic localization of facial landmarks can be an issue in itself, affecting the final results. In EBGM, for a given test image these landmarks are localized using a database of training images with manually annotated landmark positions. Existing algorithms for the detection of interest points are used as well [16,17]. While interest points may serve as automatically determined facial landmarks, the problem with such methods is that they deliver a different number of interest points for different images, and additional care must be taken when working with landmarks selected in this way.

Recently reported results for facial recognition show a large improvement compared to those previously known in the literature [18–21]. The visible trend among them is a local approach to facial description. The first stage describes the face at the pixel level by making use of the local neighborhood of each pixel. Then, the image is divided into a number of sub-regions, and from each sub-region a local description is formed as a histogram of the pixel-level descriptions calculated in the previous step. Finally, the regions' information is combined into the final facial description by concatenating the partial histograms.
This approach has several advantages. When properly designed, it does not require a training stage. Additionally, it is more robust to changes in the facial image, such as a change of expression or occlusion. By applying proper pixel-level descriptors, the influence of luminance changes can be minimized as well. For many years texture analysis and classification were not closely linked with face recognition. Recently, however, many texture descriptors have been successfully applied to face recognition. One of the first was the approach based on Local Binary Patterns (LBP) [22]. The successful application of this idea, stemming from texture analysis, inspired further studies; see, e.g., [23–26]. These approaches, often in combination with Gabor filters, form the most successful techniques available today. Their advantage is that in most cases they do not require training; their drawback is the length of the facial description they produce. Apart from the technique used to form the image description, the similarity (or dissimilarity) measure used to compare images from the gallery and test sets has a significant influence on the quality of the final results. In methods based on local descriptors, there is the additional issue of aggregating the local descriptions into a global one. A further difficulty, associated with the length of the description, manifests itself as the "curse of dimensionality". For example, for an image of 128 × 128 pixels, applying the "traditional" bank of 40 Gabor filters yields a description of length 128 × 128 × 40 = 655,360 (when only the magnitude part is taken into account). High dimensionality increases the computing overhead and, more importantly, can significantly affect the accuracy of the model.
While there exist many methods for linear and nonlinear dimensionality reduction, they often suffer from the drawbacks already mentioned. One possibility for alleviating the dimensionality problem is to apply voting among multiple base classifiers, each of which makes its classification decision on the basis of a part of the available description.

There are several main objectives of this study. First, we intend to deliver a comprehensive and in-depth comparative analysis of the most recent local descriptors along with their application to face recognition. Given the truly remarkable diversity of these descriptors, the objective is not only to offer their detailed characterization but also to discuss the rationale behind them and contrast their functionality. Second, and equally important, we develop a taxonomy of the descriptors so that the reader can gain a general view of the ongoing activities in this realm. When local descriptors are concatenated (combined), an inherent curse of dimensionality arises; in this regard, we elaborate on several ways of avoiding (eliminating) this highly undesirable phenomenon. Third, we critically review various architectures of classifiers making use of local descriptors. We have compiled a comprehensive, fully updated list of references that could help the reader gain a better view of the diversity of the most recent research developments in the area. We anticipate that the study could serve as a tutorial, as we have paid attention to a thorough and in-depth exposition of the main concepts. A brief survey of the applications of Local Binary Patterns to facial recognition is given in [27]. However, considering the recent rapid increase in the number of publications on LBP-like descriptors for face recognition, our comparative study aims to offer a more up-to-date introduction to the area. The study is organized as follows. In Section 2, we present a survey of recent facial recognition models based on local descriptors; the most relevant local descriptors are described and a general outline of the recognition procedure is provided.
Section 3 elaborates on different similarity (dissimilarity) measures used for matching faces and discusses possible methods of fusing and weighting partial descriptions derived from different sub-regions and/or local descriptors of different types. A comparison of the most successful recent experimental results is provided in Section 4, while the discussion is covered in Section 5. Conclusions are drawn in Section 6.

2. Local descriptors for face recognition

Describing face images based on local features provides a global description: local features of the image are evaluated in the neighborhood of each pixel and then aggregated to form the final "global" description. This is unlike global methods, in which the entire image is utilized to produce each feature. Some local approaches, such as the already mentioned EBGM [12,13], focus on only a limited number of fiducial points, which must be properly determined in each new image for the system to work reliably. Another approach is to calculate the description for each pixel based only on its neighborhood and then combine the resulting pixel labels into the final descriptor. The final feature vectors are usually constructed as concatenated sub-regional histograms of these labels. The main idea is presented in Fig. 1. The process consists of three steps:
- Select an appropriate local descriptor and determine the description for each selected fiducial point (Fig. 1(a)) or for each pixel in the image (Fig. 1(b)).
- Aggregate the resulting local descriptions to form the final global description of the image. When the local descriptor is calculated for each pixel, the image description is usually formed by concatenating sub-regional histograms of local patterns.
- Match the unknown image description against all known images using a selected similarity (dissimilarity) measure.
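For the per-pixel variant (Fig. 1(b)), the last two steps can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: it assumes the pixel labels have already been computed by some local operator, the 7 × 7 grid of sub-regions is a hypothetical choice, and the chi-square distance with nearest-neighbor matching is used only as one common dissimilarity measure (Section 3 discusses the alternatives).

```python
import numpy as np

def regional_histograms(labels, n_labels, grid=(7, 7)):
    """Split a label image into grid cells and concatenate one histogram per cell.

    Assumes integer labels in [0, n_labels); the grid size is illustrative.
    """
    rows = np.array_split(np.arange(labels.shape[0]), grid[0])
    cols = np.array_split(np.arange(labels.shape[1]), grid[1])
    hists = []
    for r in rows:
        for c in cols:
            cell = labels[np.ix_(r, c)]
            hists.append(np.bincount(cell.ravel(), minlength=n_labels).astype(float))
    return np.concatenate(hists)

def chi_square(h1, h2, eps=1e-10):
    """Chi-square dissimilarity between two histograms (eps avoids 0/0)."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def identify(probe, gallery):
    """Index of the gallery description closest to the probe description."""
    distances = [chi_square(probe, g) for g in gallery]
    return int(np.argmin(distances))
```

The description length is grid cells × labels (here 49 × 256 = 12,544 for 8-bit labels), which illustrates how quickly the dimensionality grows once several label maps are concatenated.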


Fig. 1. The main idea of face representation based on local descriptors: (a) a description based on fiducial points, (b) a description based on histograms of local patterns.

The final result depends on all three steps outlined above. Recently, much effort has been directed toward developing highly discriminative local features. In what follows, we review different local descriptors, and then in Section 3 we discuss similarity (dissimilarity) measures.

2.1. Gabor filters

We start the presentation of local descriptors by briefly recalling the essence of Gabor filters. Gabor filters are important local descriptors in their own right, and the most successful methods described later in this section work in conjunction with Gabor-filtered images. Comprehensive reviews of recent methods in Gabor-based face recognition can be found in [15,14]. Although some of the methods discussed in this study are also covered in these reviews, neither of the two works focuses on local patterns. As recent studies have shown, Gabor filters and LBP-like descriptors should not be perceived as competing or alternative descriptors; instead, they can be treated as complementary ones. Gabor wavelets are widely used in image analysis, and since the successful application of the EBGM algorithm [12] to face recognition, they have proven to be a sound choice for describing facial features. Each Gabor kernel is the product of a Gaussian envelope and a complex plane wave. These kernels exhibit the desirable characteristics of spatial locality and orientation selectivity. The Gabor kernels are self-similar and can be generated from one mother wavelet by scaling and rotating:

$$\psi_{\mu,\nu}(z)=\frac{\|k_{\mu,\nu}\|^{2}}{\sigma^{2}}\,e^{-\|k_{\mu,\nu}\|^{2}\|z\|^{2}/(2\sigma^{2})}\left[e^{i\,k_{\mu,\nu}\cdot z}-e^{-\sigma^{2}/2}\right]\qquad(1)$$

where $z=(x,y)$ and

$$k_{\mu,\nu}=\begin{pmatrix}k_{\nu}\cos\phi_{\mu}\\ k_{\nu}\sin\phi_{\mu}\end{pmatrix},\qquad k_{\nu}=\frac{f_{\max}}{2^{\nu/2}},\qquad \phi_{\mu}=\frac{\mu\pi}{8},$$

in which $\mu$ and $\nu$ parameterize the orientation and scale (frequency) of the Gabor filter, respectively. Usually, $\mu=0,\ldots,\mu_{\max}-1$ with $\mu_{\max}=8$, and $\nu=0,\ldots,\nu_{\max}-1$ with $\nu_{\max}=5$. The parameter $\sigma$ defines the ratio of the Gaussian window width to the wavelength. The value $f_{\max}$ is the maximum frequency under consideration; for example, it could be set to 0.5 or 0.25. The proper value of $f_{\max}$ strongly depends on the actual image size and should be tuned to the particular task.

The Gabor wavelet representation of the image $I(x,y)$ is defined as the convolution of the image with the Gabor kernels, i.e.,

$$G_{\mu,\nu}(x,y)=I(x,y)*\psi_{\mu,\nu}(x,y)$$

where $*$ denotes the convolution operator and $\psi_{\mu,\nu}(x,y)$ is the Gabor kernel with a given orientation and scale. The Gabor wavelet coefficient resulting from the convolution is a complex number

$$G_{\mu,\nu}(x,y)=A_{\mu,\nu}(x,y)\,e^{i\theta_{\mu,\nu}(x,y)}$$

with amplitude $A_{\mu,\nu}(x,y)$ and phase $\theta_{\mu,\nu}(x,y)$. For a long time, only the magnitude was considered a useful descriptor for facial discrimination. Recent studies [18–20], however, indicate that the phase is a very useful discriminating feature, especially when combined with local operators such as those described in the next subsection. The usual set of 40 Gabor filters (8 orientations and 5 frequencies), when applied to every pixel, produces 40 complex-valued images for each face image. For example, for an image of 128 × 128 pixels, using only the magnitude values results in 128 × 128 × 40 = 655,360 features. To limit the number of features, the resulting complex-valued images are often downsampled with a Gaussian pyramid. Another approach is to apply dimensionality reduction methods such as PCA or LDA. Such Gabor features can be used directly to match faces. However, as recent studies show, the full potential of Gabor features is revealed when proper local texture operators are applied to Gabor magnitude images (and/or Gabor phase images) instead of the original pixel-valued images. We briefly describe such local texture operators in Sections 2.2–2.5.
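Eq. (1) and the convolution step can be made concrete with the NumPy sketch below. It is illustrative only: the defaults σ = 2π and f_max = π/2 (a choice common in the Gabor face recognition literature, not the 0.5 or 0.25 quoted above), the odd 31 × 31 kernel support, and the FFT-based "same"-size convolution are assumptions, and f_max should be tuned to the image size as noted in the text.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi, f_max=np.pi / 2):
    """Kernel psi_{mu,nu} of Eq. (1) sampled on an odd size x size grid centred at 0."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = f_max / 2 ** (nu / 2)                       # k_nu
    phi = mu * np.pi / 8                             # phi_mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    # Complex plane wave minus the DC-compensation term exp(-sigma^2 / 2).
    wave = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave

def gabor_responses(image, size=31, n_orient=8, n_scale=5, **kw):
    """Complex responses G_{mu,nu}, shape (n_scale, n_orient, H, W), same size as image."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    shape = (h + size - 1, w + size - 1)            # zero padding -> linear, not circular
    F = np.fft.fft2(image, s=shape)
    off = size // 2
    out = np.empty((n_scale, n_orient, h, w), dtype=complex)
    for nu in range(n_scale):
        for mu in range(n_orient):
            K = np.fft.fft2(gabor_kernel(mu, nu, size, **kw), s=shape)
            out[nu, mu] = np.fft.ifft2(F * K)[off:off + h, off:off + w]
    return out
```

Given `R = gabor_responses(img)`, `np.abs(R)` and `np.angle(R)` give the magnitude maps $A_{\mu,\nu}$ and phase maps $\theta_{\mu,\nu}$ for all 40 filters; the operators described next are applied to these maps.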

2.2. Local binary patterns

Recently reported results for face authentication and verification show the importance of local descriptors originating from texture analysis techniques. We start with basic local binary patterns and then elaborate on some of their extensions.

2.2.1. Local binary pattern (LBP)

The local binary pattern (LBP) uses the differences between a central pixel and its neighboring pixels to express local texture information as a binary pattern. Originally a texture descriptor [26], LBP has been successfully applied to face authentication [22–25], face detection [28], and facial expression recognition [29]. The binary pattern description of a given central pixel $p_c$ is calculated as follows:

$$\mathrm{LBP}(p_c)=\sum_{i=0}^{7}s(p_i-p_c)\,2^{i},\qquad s(x)=\begin{cases}1, & x\ge 0\\ 0, & x<0\end{cases}$$

[…]

Fig. 5. Examples of local patterns with the use of averaged values instead of pixel values: (a) 9 × 9 Multi-scale Block LBP (MB-LBP) with averaged values over 3 × 3 neighborhoods, (b) Circular MB-LBP with radius R and six neighboring patches of 3 × 3.

Fig. 6. Local Ternary Patterns (LTP) and Differential Local Ternary Patterns (DLTP).

$$p'_i=\begin{cases}1, & p_i\ge p_c+t\\ 0, & |p_i-p_c|<t\\ -1, & p_i\le p_c-t\end{cases}$$

where $p_i$ is the value of the neighboring pixel, $p_c$ is the value of the central pixel and $p'_i$ is the value assigned to the neighboring pixel. This approach is less sensitive to noise but is no longer strictly invariant to grey-level transformations. This descriptor is called the Local Ternary Pattern (LTP) [40]. The ternary code produced by this procedure is first split into two parts: the upper pattern (LTPU channel, obtained by replacing all −1 values with zeros) and the lower pattern (LTPL channel, obtained by replacing all 1 values with zeros and all −1 values with 1). Both the upper and the lower patterns are then processed as in the traditional LBP, i.e., the two binary codes are translated into their decimal values and processed separately until the final histogram concatenation or classification. The concept is presented in Fig. 6. In addition, a simple fusion of the lower and upper patterns yields a new local pattern [41]:

$$\mathrm{DLTP}(p)=\left|\mathrm{LTPU}(p)-\mathrm{LTPL}(p)\right|$$

where DLTP(p) stands for the Differential Local Ternary Pattern of pixel p. Instead of using the pixel values directly, as in LTP, one can also use their averages. By incorporating additional threshold values and averaging over regions, the local descriptor becomes potentially more resistant to noisy pixels. Different versions of this general approach lead to different final descriptions. One of them is GRAB (General Region Assigned to Binary) [42], which uses overlapping regions to calculate average values (a computationally efficient integral image can still be used). GRAB was reported to give good results for low-resolution images [42]. Based on the obtained results, the authors concluded that GRAB is better suited to multi-resolution analysis than circular LBP with different radii, because sampling raw pixel values at different distances from the central pixel cannot capture changes in resolution and scale.
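The LBP, LTP/DLTP and block-averaged variants discussed above can be sketched in NumPy as follows. This is a minimal sketch, not the reference implementation of any cited work: the clockwise neighbor ordering starting at the top-left, the dropping of border pixels, and the non-overlapping MB-LBP-style block layout (GRAB's overlapping regions are not reproduced) are choices made here for illustration.

```python
import numpy as np

# 8 neighbours listed clockwise from the top-left corner; neighbour i gets weight 2**i.
# The ordering is an assumption: any fixed ordering gives an equivalent descriptor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _shifted(a, dy, dx, step=1):
    """Interior of `a` shifted by (dy, dx) * step; a border of width `step` is dropped."""
    h, w = a.shape
    return a[step + dy * step:h - step + dy * step, step + dx * step:w - step + dx * step]

def lbp(img):
    """Basic 3x3 LBP: bit i is s(p_i - p_c), with s(x) = 1 for x >= 0."""
    img = np.asarray(img, dtype=float)
    c = _shifted(img, 0, 0)
    code = np.zeros(c.shape, dtype=np.int64)
    for i, (dy, dx) in enumerate(OFFSETS):
        code |= (_shifted(img, dy, dx) >= c).astype(np.int64) << i
    return code

def ltp(img, t):
    """Return (LTPU, LTPL, DLTP) label maps for threshold t."""
    img = np.asarray(img, dtype=float)
    c = _shifted(img, 0, 0)
    upper = np.zeros(c.shape, dtype=np.int64)
    lower = np.zeros(c.shape, dtype=np.int64)
    for i, (dy, dx) in enumerate(OFFSETS):
        nb = _shifted(img, dy, dx)
        upper |= (nb >= c + t).astype(np.int64) << i    # ternary value +1
        lower |= (nb <= c - t).astype(np.int64) << i    # ternary value -1
    return upper, lower, np.abs(upper - lower)

def mb_lbp(img, s=3):
    """MB-LBP-style label: compare the means of the 8 s x s blocks around a central block.

    Block means come from an integral image, so each mean costs O(1).
    """
    img = np.asarray(img, dtype=float)
    ii = np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    means = (ii[s:, s:] - ii[:-s, s:] - ii[s:, :-s] + ii[:-s, :-s]) / (s * s)
    c = _shifted(means, 0, 0, s)
    code = np.zeros(c.shape, dtype=np.int64)
    for i, (dy, dx) in enumerate(OFFSETS):
        code |= (_shifted(means, dy, dx, s) >= c).astype(np.int64) << i
    return code
```

With s = 1 the block means reduce to the pixel values, so `mb_lbp(img, 1)` coincides with `lbp(img)`; larger s trades spatial detail for robustness to noisy pixels.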

Estimating a suitable threshold value for the LTP pattern can be difficult. In [43], the Adaptive Extended Local Ternary Pattern was described, in which the local pattern threshold is determined automatically from local statistics. The approach was tested on an avatar face image database, yielding good results. Another idea, called the Three-Patch Local Binary Pattern (TPLBP), was introduced in [44]. The originality of the method (whose essence is highlighted in Fig. 7) lies in the way each bit of the binary code is calculated: it is based on distances between pixel patches located on a circle around the central patch. Three patches are used to calculate each bit of the binary code of the label, according to the following expression [44]:

$$\mathrm{TPLBP}_{R,P,w,\alpha}(p)=\sum_{i=0}^{P-1}s\big(d(C_i,C_p)-d(C_{(i+\alpha)\bmod P},C_p)\big)\,2^{i}$$

where R is the radius, P is the number of patches regularly located on the circle around pixel p, w is the size of the patch (w × w), and α is the distance between the compared patches on the circle. The distance function d of the grey-level values can be any distance function, e.g., the Euclidean one. In [44] it was suggested to modify the function s as follows:

$$s(x)=\begin{cases}1, & x\ge\tau\\ 0, & x<\tau\end{cases}$$

where τ > 0. One can observe that the general idea of LBP can be easily extended in many different ways. However, further augmentations do not always guarantee improved results, although many of the changes seem reasonable (e.g., average values should be less sensitive to noise). The already mentioned uniform LBP are able to capture the most frequent patterns present in texture (and face) images. However, this observation may not be valid for other modified local patterns, especially when they are applied to already processed images (e.g., images filtered with Gabor kernels). Some researchers (introducing DLBP, Dominant LBP, and SE-LBP, Statistically Effective LBP) [45,46] suggest that in such cases one should not apply the definition of uniform patterns directly, but should rather find the dominant patterns by estimating which labels produced by a given local operator are in fact most common in a given image database, and use only a given percentage of the most common labels. All other, less common values should be treated as a single label, as in the traditional uniform LBP.

Fig. 7. Three Patch LBP (TP-LBP) with radius R, P = 8, w = 3 and α = 2. Note that no average values are used in this local pattern; here d(x) is a distance function.
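The TPLBP expression can be evaluated for a single pixel as sketched below. This is a minimal, unoptimized illustration rather than the code of [44]: patch centers are rounded to the nearest pixel (no interpolation), d is the Euclidean distance, the default τ = 0.01 is a hypothetical value, and the caller must keep every patch inside the image.

```python
import numpy as np

def _patch(img, cy, cx, w):
    """w x w patch centred at (cy, cx), as floats."""
    r = w // 2
    return img[cy - r:cy + r + 1, cx - r:cx + r + 1].astype(float)

def tplbp_at(img, y, x, R=2, P=8, w=3, alpha=2, tau=0.01):
    """TPLBP label of pixel (y, x) using the thresholded s(x) with tau > 0."""
    cp = _patch(img, y, x, w)                       # central patch C_p
    # Patch centres on a circle of radius R, rounded to the nearest pixel (assumption).
    centres = [(y + int(round(R * np.sin(2 * np.pi * i / P))),
                x + int(round(R * np.cos(2 * np.pi * i / P)))) for i in range(P)]
    # Euclidean distance d(C_i, C_p) for every patch on the circle.
    d = [np.linalg.norm(_patch(img, cy, cx, w) - cp) for cy, cx in centres]
    code = 0
    for i in range(P):
        if d[i] - d[(i + alpha) % P] >= tau:        # s(d(C_i,C_p) - d(C_{i+alpha},C_p))
            code |= 1 << i
    return code
```

Each bit compares two distances to the same central patch, so uniform regions give label 0 regardless of their brightness, and τ keeps tiny distance differences from flipping bits.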

2.2.6. Rotation invariant and other circular local patterns

Local binary patterns originated from texture analysis tasks. In this category of problems, rotation-invariant descriptors are important. It can be observed [8] that rotating the image results in a circular shift of the binary LBP code. The shift can be eliminated by taking the minimum value among all possible circular shifts of the LBP code, according to the following expression:

LBPP;R ðpc Þ ¼ min

06n 0
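The rotation-invariant mapping can be sketched as follows (assuming 8-bit codes, P = 8). The `ror` helper implements the circular rotation, and a 256-entry lookup table, an implementation choice made here, maps a whole LBP label image at once; for P = 8 the mapping leaves 36 distinct labels.

```python
import numpy as np

def ror(code, n, P=8):
    """Circular right rotation of a P-bit code by n positions."""
    mask = (1 << P) - 1
    return ((code >> n) | (code << (P - n))) & mask

def lbp_ri(code, P=8):
    """Rotation-invariant label: minimum over all circular rotations of the code."""
    return min(ror(code, n, P) for n in range(P))

# Lookup table so that a whole LBP label image can be mapped with one indexing step.
RI_TABLE = np.array([lbp_ri(c) for c in range(256)], dtype=np.int64)
```

For example, `RI_TABLE[labels]` converts an LBP label image into rotation-invariant labels; the codes 6, 12 and 192 all map to 3 because they are rotations of the same two adjacent set bits.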

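[…]

$$P^{Re}_{\mu,\nu}(x,y)=\begin{cases}0, & \mathrm{Re}(G_{\mu,\nu}(x,y))>0\\ 1, & \mathrm{Re}(G_{\mu,\nu}(x,y))\le 0\end{cases}\qquad P^{Im}_{\mu,\nu}(x,y)=\begin{cases}0, & \mathrm{Im}(G_{\mu,\nu}(x,y))>0\\ 1, & \mathrm{Im}(G_{\mu,\nu}(x,y))\le 0\end{cases}$$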

where $\mathrm{Re}(G_{\mu,\nu}(x,y))$ and $\mathrm{Im}(G_{\mu,\nu}(x,y))$ are the real and imaginary parts of the Gabor response with orientation μ and scale ν for the pixel at coordinates (x, y). QBC coding is a quantization of the Gabor phase feature and is much more stable than the exact phase values: only significant changes in the phase response change the QBC code. Yet, as was empirically shown, it exhibits significant discriminative characteristics. As a result of QBC coding, two binary images are produced for each Gabor filter, which gives 80 binary images per face image for the standard set of 40 Gabor filters. The binary images are further encoded to produce two types of local descriptions, namely the Global Gabor Phase Pattern (GGPP) and the Local Gabor Phase Pattern (LGPP), each computed separately for the real and imaginary parts of the QBC coding. The GGPP label for a given frequency ν is constructed, separately for the real and imaginary binary QBC images, from the 8 QBC bits of the 8 different orientations. The standard set of Gabor filters consists of 5 scales and 8 orientations. Thus, a GGPP label is encoded like an ordinary LBP, by means of eight bits with weights being powers of 2:

$$\mathrm{GGPP}^{Re}_{\nu}(x,y)=\sum_{i=0}^{\mu_{\max}-1}P^{Re}_{i,\nu}(x,y)\,2^{i},\qquad \mathrm{GGPP}^{Im}_{\nu}(x,y)=\sum_{i=0}^{\mu_{\max}-1}P^{Im}_{i,\nu}(x,y)\,2^{i}$$

for $\nu=0,\ldots,\nu_{\max}-1$. Unlike LBP, the label is determined not from the spatial neighborhood but along the third dimension formed by stacking the responses of the Gabor filters of different orientations. As there are 8 different orientations in the standard Gabor filter set, the 8-bit GGPP label is very easy to visualize. As a result, for 40 Gabor filters ($\mu_{\max}=8$, $\nu_{\max}=5$) there are 10 GGPP maps (5 for the real part and 5 for the imaginary part). The second type of local descriptor [18] based on the calculated QBC bits is the Local Gabor Phase Pattern (LGPP). The LGPP map is calculated for each of the real and imaginary QBC binary images by means of an XOR operator. The XOR operator (called the Local XOR Pattern, LXP) is applied to each pair consisting of a central pixel and a binary pixel from its 3 × 3 neighborhood. In other words, for a given pixel at (x, y) we have

$$\mathrm{LGPP}^{Re}_{\mu,\nu}(x,y)=\sum_{i=0}^{7}\big[P^{Re}_{\mu,\nu}(x,y)\oplus P^{Re}_{\mu,\nu}(x_i,y_i)\big]\,2^{i}$$

$$\mathrm{LGPP}^{Im}_{\mu,\nu}(x,y)=\sum_{i=0}^{7}\big[P^{Im}_{\mu,\nu}(x,y)\oplus P^{Im}_{\mu,\nu}(x_i,y_i)\big]\,2^{i}$$

where ⊕ denotes XOR and i iterates over the pixels $(x_i,y_i)$ of the 3 × 3 neighborhood of the pixel at (x, y). As a result, 80 LGPP maps are formed (one for each μ and ν, for both the real and imaginary parts of QBC). Finally, there are 10 + 80 = 90 local pattern maps (in the general case $2\nu_{\max}+2\nu_{\max}\mu_{\max}$), which are used to develop regional histograms that are afterwards concatenated to form the final image description. As can be seen, no Gabor magnitude is required to calculate the Histogram of Gabor Phase Patterns (HGPP), yet this approach produced one of the state-of-the-art results. The key point is an adequately designed set of local operators that is able to capture the characteristics of the Gabor phase.

A different approach, also based on the Gabor phase, called the Local Gabor Phase Difference Pattern (LGPDP), was proposed in [20]. This method encodes Gabor phase difference relationships between neighboring pixels at each scale and orientation of the Gabor filters. To calculate LGPDP, the 3 × 3 neighborhood of a given pixel in the Gabor phase image is considered. For each neighboring pixel, the absolute value of its phase difference with respect to the central pixel is calculated. If this value is between 0 and π/2, a bit value of 1 is assigned to this neighbor; otherwise, 0 is assigned. Thus, the value of 1 is assigned to neighbors with similar phases. The eight bits are encoded as a decimal number, and this LGPDP label reflects how the phase of the central pixel differs from those of its neighbors. For the standard set of 40 Gabor filters there are 40 LGPDP pattern maps. As usual, they are divided into sub-regions from which local histograms are computed and concatenated. This approach is simpler than HGPP [18], produces a shorter image description, and the reported results outperformed those obtained by HGPP.
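The phase-based encodings described above (QBC, GGPP, LGPP and LGPDP) can be sketched in NumPy as follows. This is an illustrative version, not the code of [18] or [20]: the array layout (scale, orientation, H, W), the neighbor ordering, the dropping of border pixels, and the wrapping of phase differences to [0, π] in LGPDP are assumptions made here.

```python
import numpy as np

# Clockwise neighbour order from the top-left; neighbour i gets weight 2**i (assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _neighbour(a, dy, dx):
    """Interior of a 2-D map shifted by (dy, dx); a one-pixel border is dropped."""
    h, w = a.shape
    return a[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]

def qbc(G):
    """QBC bits: 1 where Re (resp. Im) of the complex Gabor response is <= 0, else 0."""
    return (G.real <= 0).astype(np.int64), (G.imag <= 0).astype(np.int64)

def ggpp(bits):
    """bits: QBC maps of shape (n_scale, n_orient, H, W); orientation i gets weight 2**i."""
    weights = (2 ** np.arange(bits.shape[1])).reshape(1, -1, 1, 1)
    return (bits * weights).sum(axis=1)             # shape (n_scale, H, W)

def lgpp(bits2d):
    """LXP on one QBC map: XOR the centre bit with each of its 8 neighbours."""
    c = _neighbour(bits2d, 0, 0)
    code = np.zeros(c.shape, dtype=np.int64)
    for i, (dy, dx) in enumerate(OFFSETS):
        code |= (c ^ _neighbour(bits2d, dy, dx)) << i
    return code

def lgpdp(phase2d):
    """Bit i is 1 when the absolute phase difference to neighbour i is within [0, pi/2]."""
    c = _neighbour(phase2d, 0, 0)
    code = np.zeros(c.shape, dtype=np.int64)
    for i, (dy, dx) in enumerate(OFFSETS):
        # Assumption: differences are wrapped to [0, pi] before thresholding.
        diff = np.abs(np.angle(np.exp(1j * (_neighbour(phase2d, dy, dx) - c))))
        code |= (diff <= np.pi / 2).astype(np.int64) << i
    return code
```

For 40 filters, `re, im = qbc(G)` yields the 80 binary maps; `ggpp(re)` and `ggpp(im)` give the 10 GGPP maps, and `lgpp` applied to each of the 80 binary maps gives the 80 LGPP maps, 90 in total.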
An interesting idea of combining both magnitude and phase information, called the Monogenic Binary Pattern (MBP), was presented in [57]. The uniform LBP was used to encode the magnitude information. As there are 58 different uniform $\mathrm{LBP}_{8,1}$ patterns, six bits are enough to encode them. Two additional bits can be used to store the QBC-encoded phase information. Although the original method was not applied to Gabor magnitude and phase responses, such an application is possible. The main advantage would be the length of the resulting encoding, which is shorter than in many other approaches, since both the magnitude and the phase information are encoded in a single byte. A successful approach to fusing magnitude and phase information has recently been introduced in [19]. The main difference from the LGPP presented in [18], as far as the phase information is concerned, is that the real and imaginary parts are not treated separately. In the Local Gabor XOR Pattern (LGXP) proposed in [19], one compares the phase values of two neighboring pixels and checks whether they belong to the same interval (e.g., [0, π/2)). In this way, the descriptor is less sensitive to variations of the phase values yet still captures important discriminative information. In LGXP, phases are first quantized into different ranges. Then the LXP operator is applied to the quantized phases of the central pixel and each of its 3 × 3 neighbors: if the phases of the central pixel and its neighbor belong to the same interval, 1 is assigned to this neighbor; otherwise, 0 is assigned. Finally, the resulting binary labels are concatenated and converted to the corresponding decimal label assigned to the central pixel. The number of intervals suggested in [19] is 4, which makes LGXP quite similar to LGPP (real and imaginary) [18]. The results reported for LGXP in conjunction with LBP on Gabor magnitude maps (LGBP) surpassed the results of all current state-of-the-art face recognition methods on the FERET database; however, much of the success has to be attributed to the application of LDA on top of the sub-region histograms of local patterns. This complicates the proposed approach and requires a training set. Notwithstanding, this approach is among the most promising ones.

2.5. Other local descriptors

The presented local descriptors do not form a complete list of possible methods. There are many ways in which the presented techniques can be combined and modified to produce a novel descriptor that is potentially better, at least for a given set of face images. On the other hand, there are other local descriptors, not necessarily LBP-like, that can be used in facial recognition. Texture analysis, object recognition and interest region detection have been developing their own computational tools for years and can serve as a source of inspiration for building improved facial recognition algorithms. Not all adaptations of existing methods, however, have been as successful as those based on LBP-like features.
For example, the widely known SIFT operator [58], effective at interest point detection and object/image matching, was also applied to face recognition [16,17]. Although this might produce satisfactory results, one of its drawbacks is that it produces a potentially different number of so-called key points for different images. This requires similarity functions that compare feature vectors of different lengths, which might be difficult. Other descriptors that have been considered are, for example, the edge orientation histogram (EOH) [59] and the histogram of oriented gradients (HOG) [60]. On the other hand, texture analysis still offers many local descriptors that have not yet been applied to facial recognition problems, and some of them seem worth exploring in this context. One possible example is the recently proposed WLD descriptor [61], which has proven successful in texture analysis and face detection. The Weber Local Descriptor (WLD) [61] was proposed as an attempt to simulate Weber's well-known psychological law of perception, which states that the human perception of a pattern depends not only on the change of the stimulus (signal) but also on the original strength of the stimulus. For example, one must shout in a loud environment in order to be heard, while a whisper suffices in a quiet environment. WLD consists of two components: differential excitation and orientation. Differential excitation depends not only on the differences between the intensity values of the central pixel and its neighbors but also on the intensity of the central pixel, according to the expression:

$$\mathrm{WLD}_{\mathrm{diff.ex.}}(p_c) = \arctan\left(\sum_{i=0}^{P-1} \frac{p_i - p_c}{p_c}\right)$$

where p_c is the intensity of the central pixel, p_i are the intensities of its P neighbors, and i iterates over the 3 × 3 neighborhood (P = 8) [61]. The arctan function is used to bound the values of the sum of relative differences.
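A minimal NumPy sketch of the differential excitation above, assuming a grey-level image stored as a 2D array and the 3 × 3 neighborhood (P = 8). Border pixels are simply left at 0 and a small `eps` guards against division by zero on black pixels; both are choices of this sketch rather than part of [61], and the function name is hypothetical. The orientation component and the 2D histogram are not shown.

```python
import numpy as np

def wld_differential_excitation(image, eps=1e-6):
    """Differential excitation of WLD over the 3x3 neighbourhood (P = 8).

    For every interior pixel p_c this returns
    arctan(sum_i (p_i - p_c) / p_c), i.e. the arctan of the summed
    relative differences. Border pixels are left at 0 (a simplification
    of this sketch); eps avoids division by zero on black pixels.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    diff_sum = np.zeros_like(center)
    # Add (p_i - p_c) for each of the 8 shifted neighbour views.
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            diff_sum += neighbour - center
    out = np.zeros_like(img)
    out[1:h - 1, 1:w - 1] = np.arctan(diff_sum / (center + eps))
    return out
```

The Weber behavior is visible directly: a central pixel of intensity 10 surrounded by eight pixels of intensity 20 gives arctan(8) ≈ 1.45, whereas the same +10 step around a central intensity of 100 gives only arctan(0.8) ≈ 0.67.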


M. Bereta et al. / J. Vis. Commun. Image R. 24 (2013) 1213–1231

Other functions, such as the sigmoid, are also possible; however, arctan was reported to produce good results [61]. The orientation component of WLD is based on the gradient direction at the current pixel and is used to quantize the differential excitation values: a two-dimensional histogram is formed over different ranges of orientation values and different ranges of differential excitation. Additionally, the WLD descriptor does not require fixed parameters such as the size of the neighborhood. Similar to LBP_{P,R}, the WLD_{P,R} descriptor can sample P neighbors from a neighborhood of radius R.

Table 1 Summary of local descriptors derived from texture analysis and successfully applied to facial recognition (columns: local descriptor, advantages, disadvantages, comments).

Gabor filters [15,14,12]
Advantages: Well tested and recognized as a local descriptor at a given scale and orientation.
Disadvantages: Computationally more demanding than most other pixel-based local descriptors discussed in this study.
Comments: Until recently, only the magnitude part of the Gabor filter response was considered; the phase response was regarded as less useful due to its rapid changes with displacements.

Local binary pattern (LBP) [22–26]
Advantages: Fast and easy to implement; the basic idea for other local descriptors; easy to visualize; invariant to grey level transformations of pixels such as linear scaling or multiplication.
Disadvantages: Captures only local variance; may be sensitive to noisy pixels.
Comments: The first local binary pattern successfully applied to face recognition.

Uniform Local Binary Pattern (LBP^{u2}) [22]
Advantages: Allows focusing only on the relevant subset of LBP labels, namely those most common in real images; the description produced is shorter than for the original LBP.
Disadvantages: If the processed images are not original images but already processed ones (such as Gabor magnitude maps), the original definition of uniform patterns may not be relevant and may not determine the subset of most common patterns.
Comments: Instead of using the original definition of uniformity, some researchers propose to estimate the subset of patterns most common in a given image set by means of statistical methods.

Circular LBP (LBP_{P,R}) [22]
Advantages: Calculates local patterns from a given number of neighbors (P) evenly distributed along a circle of radius R, which allows flexible adjustment of the size and number of neighbors; supports limited multi-resolution analysis by applying LBP_{P,R} with different R.
Disadvantages: Some neighbors' values must be interpolated (most commonly by bilinear interpolation), which is more computationally demanding.
Comments: The idea of placing neighbors along a circle is utilized in other local patterns, such as Three Patch LBP.

Improved LBP (ILBP) [31]
Advantages: Can limit the influence of noise by thresholding the pixel values against the average value of all neighbors including the central pixel.
Disadvantages: Computing the average values requires more computation; however, it can be implemented efficiently by means of an integral image.
Comments: Using average values instead of original values is an idea also utilized in other local patterns (such as Multi-scale Block LBP).

Multi-scale Block LBP (MBLBP) [37,39], General Region Assigned to Binary (GRAB) [42]
Advantages: Can limit the influence of noise by comparing average values; by changing the block size, a kind of multi-resolution analysis can be performed; MBLBP can be hybridized with other local patterns, e.g. circular LBP.
Disadvantages: Computing the average values requires more computation (an integral image can be used to ease it).
Comments: Unlike in ILBP, the average values are calculated over sub-squares, and several average values are compared with each other, whereas ILBP uses a single average value against which the pixel values are thresholded.

Local Ternary Pattern (LTP) [40], Derivative Local Ternary Pattern (DLTP) [41]
Advantages: The use of three values (-1, 0, 1) instead of two (0, 1) to encode the differences between neighboring pixels makes the method less susceptible to noise.
Disadvantages: Not strictly invariant to grey level transformations of pixels.
Comments: Can also be computed on averaged values instead of original pixel values.

Three Patch Local Binary Pattern (TPLBP) [44], Four Patch Local Binary Pattern (FPLBP) [44]
Advantages: Represent an approach distinct from other local patterns by utilizing a distance measure between vectors of pixel values, which captures local diversity in a novel way.
Disadvantages: More difficult to implement than the others.
Comments: There is no theoretical evidence on why thresholding distance measures instead of pixel values or their averages should work better.

Dominant LBP (DLBP), Statistically Effective LBP (SE-LBP) [45,46]
Advantages: Allow selecting the most common patterns; a shorter description is produced.
Disadvantages: Suitable statistical estimation methods must be chosen; the final results also depend on the cardinality of the subset of selected patterns.
Comments: This step is rather seldom used; only a few works report on this approach. The final results seem to depend more on the choice of the type of local pattern.

Rotation invariant LBP [8], Center-symmetric LBP (CS-LBP) [47]
Advantages: Can be more resistant to geometrical displacements of images such as rotation (rotation invariant LBP); may produce shorter histograms (CS-LBP).
Disadvantages: Rotation invariance may work better for texture analysis than for facial recognition, since many important facial features may depend on orientation.
Comments: The quality of rotation invariant local descriptors for facial recognition has not been widely explored.

Volume LBP (V-LBP), Three Orthogonal Planes LBP (LBP-TOP) [29,48]
Advantages: Capture dynamic (temporal) relationships, thus extending LBP into a third dimension; shown to work well for facial expression recognition.
Disadvantages: Not directly usable for static images.
Comments: Can be reformulated to work not on temporal image sequences but on multiple filtered images obtained from a single image, for example with filters of different orientations or scales [48].

Weber Local Descriptor (WLD) [61]
Advantages: Based on a psychological law of human response to changing stimuli, so it may be more adequate for simulating human-like image perception.
Disadvantages: In the original work a 2D histogram was used, which makes the method more difficult to implement.
Comments: WLD has not yet been tested on face recognition; however, it was shown to give sound results in face detection.
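Two of the noise-limiting variants in Table 1, ILBP and LTP, differ from basic LBP only in the thresholding rule. The sketch below encodes a single 3 × 3 patch with ILBP (all nine pixels, center included, thresholded against their mean, giving a 9-bit label) and with LTP (ternary coding with a tolerance t). The neighbor ordering, the use of ≥ in ILBP, the function names, and the split of the ternary code into an "upper" and a "lower" binary pattern (common practice for LTP, not described in this paper) are all assumptions of this sketch.

```python
import numpy as np

# Clockwise neighbour order starting at the top-left corner; an arbitrary
# but fixed convention chosen for this sketch.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def ilbp_code(patch):
    """ILBP label of a 3x3 patch: all 9 pixels (centre included) are
    thresholded against the mean of the 9 pixels, giving a 9-bit code."""
    p = np.asarray(patch, dtype=np.float64)
    mean = p.mean()
    bits = [p[r, c] >= mean for r, c in NEIGHBOURS] + [p[1, 1] >= mean]
    return sum(int(b) << k for k, b in enumerate(bits))

def ltp_codes(patch, t):
    """LTP of a 3x3 patch with tolerance t > 0, returned as two binary codes.

    'upper' sets bit k when neighbour k >= centre + t (ternary value +1);
    'lower' sets bit k when neighbour k <= centre - t (ternary value -1);
    neighbours within (centre - t, centre + t) map to 0 and set neither bit.
    """
    p = np.asarray(patch, dtype=np.float64)
    c = p[1, 1]
    upper = lower = 0
    for k, (r, col) in enumerate(NEIGHBOURS):
        if p[r, col] >= c + t:
            upper |= 1 << k
        elif p[r, col] <= c - t:
            lower |= 1 << k
    return upper, lower
```

On a flat patch ILBP yields the all-ones label 511 while LTP yields (0, 0): small fluctuations around the center that stay inside the tolerance band leave the LTP code unchanged, which is the noise robustness noted in the table.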

Besides the WLD descriptor, it is possible to consider other recently proposed local descriptors that have not yet been tested for facial recognition, such as the contrast context histogram [62], LBPV [63] or the local descriptor based on the Radon transform [45], which have been reported to yield highly discriminative features in texture analysis.

2.6. Summary

Many local descriptors presented so far in fact express the local variance of the pixel neighborhood and in many cases exhibit gradient-like behavior. One possible direction of further research could involve the application of steerable filters [64,65], which are able to capture higher-order derivative information from Gabor filtered images. The approach presented in [53] can be extended to capture derivative information from an extended neighborhood. Another question is which descriptors offer discriminative information complementary to others, i.e., which combinations work better together than the descriptors considered separately. This is still an open question, although empirical results can shed some light on it. One problem is the resulting length of the description. Most of the presented descriptors have been tested individually, but some preliminary observations show the advantages of combining different descriptors. For example, the results improved when Gabor magnitude and phase were used together. Another argument for designing hybrid systems based on several descriptors is that Gabor filters, which can be regarded as very good descriptors on their own, seem to yield much more discriminative features when further processed by LBP-like operators.

Table 2 Summary of local descriptors used with Gabor filtered images, including specialized operators designed for Gabor phase images (columns: local descriptor, advantages, disadvantages, comments).

Local Gabor Binary Pattern (LGBP) [49,51], Multi-resolution Histograms of Local Variation Patterns (MHLVP) [49], Local Gabor Binary Pattern Histogram Sequence (LGBPHS) [21]
Advantages: Provide more discriminative features (compared to LBP on grey level pixel values) by utilizing images already processed by Gabor filters.
Disadvantages: Use only the magnitude part of the Gabor responses, while newer research shows that superior results can be obtained by also incorporating the phase part.
Comments: These were the first approaches combining the ideas of LBP and Gabor filtering; their satisfactory results motivated further research that brought state-of-the-art results.

Multi-resolution Uniform Local Binary Patterns on Gabor magnitude maps (MULGBP) [50]
Advantages: Multi-resolution analysis may capture more discriminative features.
Disadvantages: Multi-resolution analysis produces longer descriptions.
Comments: The ability of multi-resolution analysis on Gabor filtered images to increase classification accuracy has not been fully tested.

Gabor Volume Three Orthogonal Planes LBP (GV-LBP-TOP) [48]
Advantages: Measures the variation patterns of Gabor magnitude maps in three dimensions, where the third dimension is the scale or orientation of the Gabor filter, thus providing potentially more discriminative features.
Disadvantages: The third dimension is not a spatial dimension and requires additional parameter adjustment (e.g. the optimal neighborhood radius in the scale/orientation dimension may differ from the spatial neighborhood radius).
Comments: The results reported in [48] showed that for Gabor filtered images the third dimension formed by different orientations works better than the one formed by different scales of the Gabor filters.

Multi-band Gradient Component Pattern (MGCP) [52], Local Derivative Pattern (LDP) [53]
Advantages: Make use of derivative local patterns, which may be up to the 4th order in several directions.
Disadvantages: Local derivative patterns of order higher than 4 do not improve the results, as they become sensitive to noise.
Comments: One possible research direction is testing other gradient-like operators, such as steerable filters, which employ derivatives of Gaussians and operate on a bigger neighborhood.

Exploit phase information LGBP (ELGBP) [55]
Advantages: Phase information encoded in this form seems to be complementary to the description obtained from the magnitude part of the Gabor responses.
Disadvantages: Other researchers have shown that the phase part of the Gabor response should be treated with special descriptors due to its rapid spatial changes; thus, this approach to Gabor phase seems to be suboptimal.
Comments: This first attempt to utilize Gabor phase for constructing LBP histogram patterns had an important influence and motivated further research on local patterns of Gabor phase maps.

Histogram of Gabor Phase Patterns (HGPP) [18]
Advantages: By utilizing the QBC coding scheme [56], the Gabor phase is used in a way that limits the effect of its rapid changes while still providing highly discriminative features.
Disadvantages: Slightly more difficult to implement than the other descriptors; it also produces a long description.
Comments: HGPP in fact consists of two patterns, the Global Gabor Phase Pattern (GGPP) and the Local Gabor Phase Pattern (LGPP); LGPP uses an XOR-based operator called the Local XOR Pattern (LXP).

Local Gabor Phase Difference Pattern (LGPDP) [20]
Advantages: A very straightforward way to capture local phase variations that is easy to implement and provides highly discriminative features; despite its simplicity, the reported results are very promising.
Disadvantages: The maximum difference between two neighboring phase values has been set arbitrarily and may not be optimal for different sets of images.
Comments: LGPDP is a very good descriptor considering its simplicity in relation to the quality of the reported results.

Local Gabor XOR Pattern (LGXP) [19]
Advantages: Very easy to implement, yet providing state-of-the-art results.
Disadvantages: Much of the success has to be attributed to the LDA applied on top of the sub-region histograms of local patterns.
Comments: In LGXP, phases are first quantized into different ranges; this, however, should not be confused with QBC coding.

Monogenic Binary Pattern (MBP) [57]
Advantages: Combines information about both magnitude and phase in one pixel label, producing a shorter description that still utilizes both.
Disadvantages: Uniform patterns are used to produce a shorter description; in some cases, however, the set of most common patterns should be estimated by statistical analysis of a training set.
Comments: This approach has not been tested on Gabor magnitude and phase, so its usefulness in conjunction with Gabor filters is difficult to assess.
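The LGXP encoding described before Section 2.5 (phases quantized into four intervals; each of the eight neighbors receives 1 when its quantized phase falls in the same interval as that of the central pixel) can be sketched on a precomputed Gabor phase map. Computing the Gabor responses is outside this sketch; the phase map is assumed to hold values in [-π, π), and the function names and neighbor ordering are hypothetical. An XOR-style convention (1 for different intervals) would simply complement every bit, which permutes the labels and leaves the histogram information unchanged.

```python
import numpy as np

def quantize_phase(phase, n_intervals=4):
    """Map phases in [-pi, pi) to interval indices 0..n_intervals-1."""
    wrapped = np.mod(np.asarray(phase, dtype=np.float64) + np.pi, 2 * np.pi)
    idx = (wrapped / (2 * np.pi) * n_intervals).astype(int)
    return np.minimum(idx, n_intervals - 1)  # guard against rounding at 2*pi

def lgxp_labels(phase, n_intervals=4):
    """8-bit LGXP-style label for every interior pixel of a phase map.

    Bit k is 1 when neighbour k lies in the same phase interval as the
    central pixel (the convention described in the text), else 0.
    """
    q = quantize_phase(phase, n_intervals)
    h, w = q.shape
    center = q[1:h - 1, 1:w - 1]
    labels = np.zeros_like(center)
    # Clockwise from the top-left neighbour (hypothetical ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = q[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        labels |= (neighbour == center).astype(labels.dtype) << bit
    return labels
```

The resulting label maps would then be divided into sub-regions and histogrammed exactly as for the other local patterns.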

Although the final success of each face recognition technique depends on all processing steps (image acquisition, normalization, feature extraction and selection, similarity/dissimilarity measure, etc.), the recent research discussed in this section suggests that approaches to face recognition (especially authentication and verification) based on local descriptors are among the best, state-of-the-art approaches and are likely to bring further improvements in the field. Finally, Tables 1 and 2 summarize the most commonly used and effective local descriptors, contrasting them by pointing out some of their advantages and limitations. The criteria used in this comparative analysis include:

• classification rates reported for a given descriptor
• ease of implementation and flexibility of use
• computational overhead
• support of multi-resolution and multi-scale analysis
• invariance to grey level and geometrical transformations
• length of the description produced
• robustness against noise
• type of variance captured (2D, 3D)
• type of coding used, e.g. LBP-like labels, quantization (e.g. QBC), derivative patterns, etc.
• ability to capture more than one type of image description (such as Gabor magnitude and phase)
• versatility of the descriptor (e.g. whether it works well regardless of the type of image data, such as pixel grey level values or Gabor magnitude or phase maps)
• number of parameters and ease of their adjustment
• additional processing required, such as bilinear interpolation, statistical estimation, etc.

Table 3 provides a list of descriptors that either have not yet been shown to be very successful in facial recognition or have not been extensively experimented with. Fig. 12 offers a taxonomy of the discussed local descriptors. Many local descriptors are similar to each other in some respects while differing according to other criteria, so most of them could be classified into more than one category. The taxonomy takes into account the most discriminative characteristics, which distinguish a given group from all others. Although MBLBP and GRAB have been explicitly placed into two categories (operating on averaged values, and multi-resolution/multi-scale), most other descriptors were allocated to a single group to keep the taxonomy as transparent as possible.

Table 3 Other local descriptors used in general object recognition or texture analysis that either have not shown satisfactory results in facial recognition or have not been fully tested or experimented with.

SIFT [58], Edge Orientation Histogram (EOH) [59], Histogram of Oriented Gradients (HOG) [13,60]
These descriptors have not been shown to outperform the other descriptors reported in this study.

Weber Local Descriptor (WLD) [61], Contrast Context Histogram [62], LBP Variance (LBPV) [63], Radon transform based local descriptor [45]
These descriptors have not been tested in facial recognition problems so far. However, they produced promising results in other problems such as face detection or texture analysis, which may suggest their potential for face recognition tasks.

Fig. 12. Taxonomy of local descriptors discussed in the study.

2.7. Applicability to facial recognition

Many types of local descriptors proposed for texture analysis are designed to be rotation invariant. While this property is essential in many image processing tasks, it is not beneficial in face authentication problems, as the faces are assumed to have been preprocessed and geometrically normalized beforehand (usually according to the positions of the eyes, which are assumed to be known from a previous processing step).



One may argue that local descriptions in the sense presented here do not take into account global features of the face such as its shape. This claim is not entirely valid. As explained in [66], the shape and global arrangement of facial parts such as the eyes, nose, etc. are implicitly taken into consideration. If the faces are geometrically normalized (e.g. aligned according to the location of the eyes), the facial parts and the face shapes of different persons are automatically positioned at slightly different coordinates, so these differences are captured by the spatially arranged sub-region histograms. There is, of course, some variation resulting from different expressions; however, the differences resulting from personal characteristics strongly influence the description and thus the final classification.

3. Similarity measures, feature weighting and fusion of local patterns

3.1. Similarity measures

In face recognition, most of the design effort has been focused on developing a sound description, which is often followed by a dimensionality reduction technique such as PCA or LDA. The final image matching or comparison is usually realized with a simple Euclidean metric or other distance measures such as cosine, correlation or Manhattan distance. Because of the specific nature of histogram-based feature vectors, more specialized similarity/dissimilarity measures are used. The most frequently used are [22]:

• Histogram intersection, which sums up the common part of the two histograms H_1 and H_2. It is a similarity measure in the sense that the higher its value, the more similar the two images are:

$$D(H_1, H_2) = \sum_i \min(H_{1,i}, H_{2,i})$$

where i is taken over all bins of the histograms.

• Chi-square statistic (χ²), which serves as a dissimilarity measure: the lower the value, the more similar the two images are:

$$\chi^2(H_1, H_2) = \sum_i \frac{(H_{1,i} - H_{2,i})^2}{H_{1,i} + H_{2,i}}$$

• Log-likelihood statistic, which is used less frequently than the measures above and requires normalized histograms. With the minus sign, it behaves as a dissimilarity: the lower the value, the more similar the two histograms are:

$$L(H_1, H_2) = -\sum_i H_{1,i} \log(H_{2,i})$$

3.2. Fusing sub-region local pattern descriptions

When local descriptors are applied to face recognition, the usual way of developing the global description is first to form the sub-region histograms and then to concatenate them into the final description. The key point here is the size (and thus the number) of the sub-regions. While this choice is very much problem specific and also depends on the normalization applied to the given face image set, many authors reported that the general approach is quite robust and not strongly affected by the value of this parameter. Clearly, the number of sub-regions should not be too small, as such (large) regions will not be able to properly capture the spatial relationships specific to face images. On the other hand, sub-regions that are too small tend to produce statistically unreliable histograms. In most of the methods described in this paper there are many regional histograms describing the sub-regions of the original images. There are several possibilities for how to proceed with such partial image descriptions in order to complete the final matching. In face authentication problems, the goal is to find, for a given unknown image, the most similar image among the gallery (i.e., known) images. In contrast, in the face verification problem the goal is to decide whether the unknown image is in fact an image of the person whose identity has been claimed. In general, this is done by thresholding the similarity value and rejecting all images whose similarity to the claimed person falls below a predefined threshold. In both situations, the final similarity measure has to combine the histogram information from all sub-regions. There are several options:

1. All sub-regional histograms can be concatenated to form a final, long histogram, and any histogram similarity measure can be used to match the images. This approach, however, is not recommended, as it generates very long descriptions and thus suffers from the curse of dimensionality.
2. The histograms coming from corresponding sub-regions of two images are compared pair-wise and the results are aggregated. The most common and easiest way to do this is the sum rule:

$$S_{final}(I_1, I_2) = \frac{1}{M} \sum_{j=1}^{M} S(H^j_{I_1}, H^j_{I_2})$$

where S is any of the histogram measures given above (histogram intersection, chi-square or log-likelihood statistic), and j iterates over all M sub-regions of all the pattern maps of the images I_1 and I_2 (for the chi-square and log-likelihood statistics, S_final is then a dissimilarity). It is common to incorporate weights for the individual regions in the form:

$$S_{final}(I_1, I_2) = \sum_{j=1}^{M} w_j \, S(H^j_{I_1}, H^j_{I_2})$$

where w_j reflects the importance of a certain region of the given pattern map. There are many possible strategies to adjust (optimize) the weights. One simple approach relies on the known fact that different regions of the face exhibit different levels of importance in face recognition by humans; for example, the eye and nose areas are known to be more important than others [22]. If a training set is available, each sub-region can be evaluated on how well it discriminates faces on its own, and higher weights are assigned to regions with higher classification rates [23]. The Fisher criterion is also used to adjust the weights [21]. The Fisher analysis is performed at the level of sub-region histograms, not at the level of histogram bins: for the jth sub-region, the mean and variance of the intra-personal and extra-personal similarities are computed, and the weight for that region is set so that regions exhibiting good discriminatory characteristics are promoted [21]. One can envision a number of other alternatives. Following [19], consider a situation in which several types of local descriptors are applied to the images, so that for each sub-region there is one histogram per descriptor type. Assume that two different descriptors are used: LGBP_mag (local Gabor binary pattern on Gabor magnitude maps only) and LGXP (local Gabor XOR pattern) [19]. For each sub-region there are then two histograms, one resulting from LGBP_mag (H_LGBP_mag) and one resulting from LGXP (H_LGXP). In [19], two approaches to fusing these histogram types were considered: 1. Feature-level fusion. For the jth sub-region, the two histogram types are concatenated to form a single histogram for each image. We have

$$H^j_{reg\_con, I_1} = [H^j_{LGBP\_mag, I_1}, H^j_{LGXP, I_1}]$$


for image I1 and

$$H^j_{reg\_con, I_2} = [H^j_{LGBP\_mag, I_2}, H^j_{LGXP, I_2}]$$

for image I2, where reg_con denotes the concatenation of the histograms resulting from a given sub-region. Such histograms are further concatenated with the histograms of the other sub-regions to form the final description. The final similarity measure is calculated as:

$$S_{final}(I_1, I_2) = \frac{1}{M} \sum_{j=1}^{M} S(H^j_{reg\_con, I_1}, H^j_{reg\_con, I_2})$$

2. Score-level fusion. Instead of concatenating the histograms resulting from different local descriptors, the similarity measures are calculated separately for each descriptor type and the scores are then combined, possibly by weighting the individual scores. Continuing the example with two local descriptors, the two scores

$$S^j_{LGBP\_mag}(I_1, I_2) = S(H^j_{LGBP\_mag, I_1}, H^j_{LGBP\_mag, I_2})$$

and

$$S^j_{LGXP}(I_1, I_2) = S(H^j_{LGXP, I_1}, H^j_{LGXP, I_2})$$

can be combined to form the jth sub-region score

$$S^j(I_1, I_2) = w \cdot S^j_{LGXP}(I_1, I_2) + (1 - w) \cdot S^j_{LGBP\_mag}(I_1, I_2)$$

which are used in the sum rule or weighted sum rule to calculate S_final. The weight w balances the importance of one type of descriptor against the other, and the approach can easily be extended to more than two descriptor types. Note that w is of a different type than the weights used for sub-regions. This allows much flexibility in adjusting the model; however, it requires more computation and does not guarantee any significant improvement.

Although the image descriptions produced by local descriptors such as LBP are in most cases histogram-like, the described similarity measures are not always used. In [59,19] the cosine distance has been used. In [19] this choice is motivated by the fact that the histogram data are further processed by a special LDA, so the features compared during matching are no longer pure histogram bin values.

3.3. Hybridization with other methods and avoidance of the curse of dimensionality

Applying several local descriptors tends to produce an over-complete representation. This difficulty not only increases the computational demands but can also affect classification accuracy, and there are several possible ways to deal with it. One direction is the application of the already mentioned subspace analysis [19]. Another is to use boosting, such as AdaBoost [67,68], or similar learning schemes to select the attributes best suited for classification. The word “attributes” does not have to be taken literally. In [37] a weak classifier is learnt based on the dissimilarity of two corresponding histogram bins. However, this approach can easily be modified to build the base classifiers on sub-region histograms. This would allow evaluating not individual histogram bins but sub-region histograms of a given pattern map of a given local descriptor, which would additionally make it possible to test which descriptors work better in a given image sub-region.
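A minimal sketch of the sub-region-level variant suggested above: discrete AdaBoost whose weak learners are decision stumps, each looking at the dissimilarity of one pair of corresponding sub-region histograms. It assumes image pairs have already been mapped to vectors of per-region distances labeled as intra- or extra-personal (the feature difference space discussed later in this subsection). The stump form (predict "same person" below a threshold), the function names and the number of rounds are illustrative choices, not the procedure of [37].

```python
import numpy as np

def train_region_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with one-region decision stumps.

    X: (n_pairs, M) per-region dissimilarities (e.g. chi-square) of image
    pairs; y: +1 for intra-personal pairs, -1 for extra-personal pairs.
    A stump on region j predicts +1 ("same person") when X[:, j] <= t.
    Returns a list of (region, threshold, alpha) triples.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    n, m = X.shape
    w = np.full(n, 1.0 / n)                      # pair weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(m):                       # candidate sub-region
            for t in np.unique(X[:, j]):         # candidate threshold
                pred = np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, pred)
        err, j, t, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep alpha finite
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)        # up-weight misclassified pairs
        w /= w.sum()
        model.append((j, float(t), float(alpha)))
    return model

def predict_same_person(model, X):
    """Sign of the weighted stump vote: +1 same person, -1 different."""
    X = np.asarray(X, dtype=np.float64)
    score = np.zeros(X.shape[0])
    for j, t, alpha in model:
        score += alpha * np.where(X[:, j] <= t, 1, -1)
    return np.where(score >= 0, 1, -1)

def selected_regions(model):
    """Sub-regions picked by boosting, i.e. those judged most discriminative."""
    return sorted({j for j, _, _ in model})
```

Because each weak learner is tied to one sub-region, the set returned by `selected_regions` directly indicates which regions (and, with several pattern maps stacked as columns, which descriptors) carry the discriminative information.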
Which of the two approaches (bin-level or sub-region-level base classifiers) is better should, in general, be decided on the basis of empirical results. Boosting with local patterns was also used in [24] and [69] for face recognition, in [70] for facial expression recognition, and in [71] and [72] for gender


classification. In [73] AdaBoost learning was combined with a recent local descriptor, HGPP (histogram of Gabor phase patterns), and applied to face verification. Many boosting algorithms are defined for two-class problems, similarly to the basic formulation of the SVM algorithm. Although modifications of these algorithms exist that handle multi-class classification problems, they are not particularly useful in face recognition. One reason is the large number of classes (individuals). Additionally, in a face recognition system new persons can be added to the database, and a multi-class classifier would have to be re-learnt to discriminate the new classes (new persons). One solution would be to train one classifier per class, treating all other persons as the second class; this, however, is computationally far too expensive. The solution frequently applied in facial recognition is to transform the multi-class problem into a two-class problem by means of the so-called feature difference space, which models dissimilarities between face images [74]. The new training set is formed from dissimilarities between images of the same person (intra-personal space) and dissimilarities between images of different persons (extra-personal space), and only one boosting or SVM classifier is trained. One drawback of this approach is that the number of extra-personal dissimilarities is much higher than that of intra-personal ones. For N images captured for each of P persons, the numbers of intra-personal and extra-personal differences are

$$P\binom{N}{2} \quad \text{and} \quad \binom{NP}{2} - P\binom{N}{2},$$

respectively. For example, for 200 images of 100 persons (two images per person) there are only 100 intra-personal differences but 19,800 extra-personal differences, and the classifier training task becomes ill-posed. If there is only one image of a given person in the database, this approach cannot be applied.
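The pair counts above, and the Fisher-style region weighting of Section 3.2, which works on the same intra- and extra-personal distances, can be sketched together. The sketch assumes sub-region histograms are already computed (an array of shape (images, M, B)), uses the per-region chi-square statistic as the dissimilarity, and uses (m_I - m_E)² / (s_I² + s_E²) as one common form of the Fisher criterion; the exact normalization used in [21] may differ, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np

def pair_counts(n_persons, images_per_person):
    """Intra- and extra-personal pair counts for P persons with N images each."""
    intra = n_persons * comb(images_per_person, 2)
    extra = comb(n_persons * images_per_person, 2) - intra
    return intra, extra

def region_chi2(h1, h2, eps=1e-10):
    """Chi-square distance per sub-region; h1, h2 have shape (M, B)."""
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum(axis=1)

def difference_space(hists, labels):
    """Turn a multi-class gallery into a two-class problem.

    hists: (n_images, M, B) sub-region histograms; labels: person ids.
    Returns X (n_pairs, M) of per-region distances and y with
    1 = intra-personal pair, 0 = extra-personal pair.
    """
    X, y = [], []
    for a, b in combinations(range(len(labels)), 2):
        X.append(region_chi2(hists[a], hists[b]))
        y.append(int(labels[a] == labels[b]))
    return np.array(X), np.array(y)

def fisher_region_weights(X, y, eps=1e-10):
    """Per-region weight (m_I - m_E)^2 / (s_I^2 + s_E^2): regions whose
    intra- and extra-personal distances are well separated score high."""
    intra, extra = X[y == 1], X[y == 0]
    num = (intra.mean(axis=0) - extra.mean(axis=0)) ** 2
    den = intra.var(axis=0) + extra.var(axis=0) + eps
    return num / den
```

`pair_counts(100, 2)` returns `(100, 19800)`, the numbers quoted in the text; even a toy gallery of 3 persons with 2 images each already yields 3 intra-personal against 12 extra-personal pairs.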
Another solution to the high-dimensional feature vectors generated by histograms of local patterns is to use classifier combination methods that are more sophisticated than the simple sum rule presented in the previous subsection. Concatenating histograms resulting from different sub-regions or from different local descriptors can be regarded as a feature combination method, whereas the sum rule can be counted among the classifier combination methods. Other methods, such as majority voting, could be used as well, and boosting can also be seen as a classifier selection and combination method. In [66] the Borda count [75], a well-known election scheme, was tested and compared with the histogram concatenation approach. The results showed that the Borda count is less vulnerable to the curse of dimensionality; refer to Table 1. An additional advantage of classifier combination methods such as the Borda count is that, unlike boosting (which can also be seen as a classifier combination method), they do not require a training set. The results of face recognition methods based on sub-regional histograms of local patterns can be further improved by hybridization with other methods such as LDA or boosting. However, these methods are applicable only if a training set is available, preferably with many face images per person. A similar situation occurs for the weight adjustment discussed in the previous subsection: the Fisher criterion and the other approaches discussed there require a training set.
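The following sketch places the histogram measures of Section 3.1, the (weighted) sum rule of Section 3.2 and a Borda count over sub-regions side by side. The Borda scoring used here (in every sub-region the closest gallery image gets n - 1 points, the next n - 2, down to 0) is a generic variant and an assumption of this sketch; the exact scheme evaluated in [66] may differ. Histograms are assumed to be NumPy arrays, with a probe of shape (M, B) and a gallery of shape (n, M, B); function names are hypothetical.

```python
import numpy as np

EPS = 1e-10  # avoids division by zero and log(0) on empty bins

def intersection(h1, h2):
    """Histogram intersection D: higher means more similar."""
    return float(np.minimum(h1, h2).sum())

def chi_square(h1, h2):
    """Chi-square statistic: lower means more similar."""
    return float(((h1 - h2) ** 2 / (h1 + h2 + EPS)).sum())

def log_likelihood(h1, h2):
    """L = -sum_i H1_i log H2_i on normalized histograms: lower means more similar."""
    return float(-(h1 * np.log(h2 + EPS)).sum())

def sum_rule(regions_a, regions_b, measure=chi_square, weights=None):
    """(Weighted) sum rule over M corresponding sub-region histograms."""
    scores = np.array([measure(a, b) for a, b in zip(regions_a, regions_b)])
    if weights is None:
        return float(scores.mean())          # S_final = (1/M) sum_j S_j
    return float(np.dot(weights, scores))    # S_final = sum_j w_j S_j

def borda_identify(probe, gallery):
    """Identify a probe by a Borda count over sub-regions (chi-square ranking).

    probe: (M, B); gallery: (n, M, B). Returns the winning gallery index.
    """
    n, m = gallery.shape[0], probe.shape[0]
    points = np.zeros(n)
    for j in range(m):
        dists = np.array([chi_square(probe[j], g[j]) for g in gallery])
        order = np.argsort(dists)                 # closest gallery image first
        points[order] += np.arange(n - 1, -1, -1)  # n-1, n-2, ..., 0 points
    return int(np.argmax(points))
```

Unlike the weighted sum rule, the Borda count needs no training data, and because each sub-region contributes only a ranking, a region with unusually large distance values cannot dominate the decision.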

4. Experimental results

In this section, we summarize some empirical results reported in the literature on facial recognition based on local descriptors. The intent is to examine the possible improvements offered by the individual methods. However, one needs to be aware that these results are not directly comparable given the diversity present at various



preprocessing steps, different parameter settings (e.g., Gabor filter frequencies), different similarity measures, etc. Notwithstanding this diversity, a comparative analysis of this nature offers at least a sound semi-quantitative picture of the usefulness and applicability of the existing methods. One of the most frequently used face databases is the FERET database [8], which is one of the largest publicly available face databases. In the usual evaluation protocol, the gallery consists of 1196 images of 1196 persons, and four probe sets are used for testing: Fb (1195 images of 1195 persons), Fc (194 images of 194 persons), Duplicate I (722 images of 243 persons) and Duplicate II (234 images of 75 persons). Probe set Fb tests robustness against different facial expressions, Fc tests the influence of changing illumination, while both Duplicate I (dup1) and Duplicate II (dup2) test robustness against aging, dup2 being a more difficult subset of dup1. Although many face databases are available for facial recognition testing, FERET is considered here to illustrate the differences in relevance and effectiveness of the different methods, as it is the most frequently referenced database. A similar comparison on other databases would be hard to make for the same number of methods described in this paper, as the other databases are not used as often as FERET. For that reason, these results should not be used to judge which local descriptor or methodology is ultimately superior. However, the results summarized in Tables 4 and 5 can serve as a reference and offer a rough evaluation of the potential of a given method. Analyzing Table 5, one notices a large improvement of the local descriptor-based methods over the results obtained several years earlier by methods such as PCA or LDA, and over the best results from FERET'97 (Table 4).
It should also be mentioned that, although the methods referenced in Table 5 sometimes differ considerably in their details, they all fall into the same general scheme discussed here. The differences among them can motivate further studies and the development of new techniques. It can also be observed that the general approach to face recognition based on the discussed framework brings, in almost any of its particular realizations, an improvement over the more traditional reference methods (e.g., PCA, LDA, EBGM). This is especially visible for the Duplicate I and Duplicate II probe sets of the FERET protocol. For example, on Duplicate II a traditional PCA achieves a classification rate of only 22%, while a recently proposed method that uses Gabor phase information with a specialized local descriptor (LGBP_mag + LGXP [19]) reaches 93%. It should be noted that no comparable increase in classification rate has been reported for global facial recognition methods. The following modifications of the basic LBP operator seem to be responsible for the improved classification rates of the subsequent methods:

- Feature weighting
- Application of LBP-like operators to Gabor-filtered images
- Utilization of both the magnitude and the phase parts of the complex Gabor response
- Inclusion of a third dimension in the form of a tensor composed of images filtered with different Gabor wavelet settings (scale and orientation)
- Development of specialized local descriptors for the phase part of the Gabor filter response
- Classification by means of committee voting
- Combination of histogram-like descriptors with subspace analysis (LDA)

Table 4
Results of traditional global and graph-based methods on the FERET database (classification rates, in %, for the four probe sets).

| Method | Fb | Fc | Duplicate I | Duplicate II |
|---|---|---|---|---|
| PCA [20] | 85.0 | 65.0 | 44.0 | 22.0 |
| UMD LDA [20] | 96.2 | 58.8 | 47.2 | 20.9 |
| USC EBGM [20] | 95.0 | 82.0 | 59.1 | 52.1 |
| Best FERET’97 [8] | 96.0 | 82.0 | 59.0 | 52.0 |
| HOG-EBGM [13] | 95.5 | 81.9 | 60.1 | 55.6 |

Table 5
Results of different local descriptors on the FERET database (classification rates, in %, for the four probe sets).

| Method | Fb | Fc | Duplicate I | Duplicate II |
|---|---|---|---|---|
| LBP [23] | 93.0 | 51.0 | 61.0 | 50.0 |
| LBP weighted [23] | 97.0 | 79.0 | 66.0 | 64.0 |
| LGBPHS [21] | 94.0 | 97.0 | 68.0 | 53.0 |
| LGBPHS weighted [21] | 98.0 | 97.0 | 74.0 | 71.0 |
| LGBP_PhaOnly [55] | 93.0 | 92.0 | 65.0 | 59.0 |
| LGBP_PhaOnly weighted [55] | 96.0 | 94.0 | 72.0 | 69.0 |
| ELGBP (Mag + Pha) [55] | 97.0 | 96.0 | 77.0 | 74.0 |
| ELGBP (Mag + Pha)_weighted [55] | 99.0 | 96.0 | 78.0 | 77.0 |
| V-LGBP-s [48] | 97.0 | 99.0 | 77.0 | 71.0 |
| V-LGBP-so [48] | 98.0 | 99.0 | 79.0 | 75.0 |
| V-LGBP-o [48] | 97.0 | 99.0 | 80.0 | 77.0 |
| V-LGBP-ou2 [48] | 97.0 | 98.0 | 77.0 | 74.0 |
| V-LGBP-ous [48] | 97.0 | 98.0 | 80.0 | 77.0 |
| HGPP [18] | 97.6 | 98.9 | 77.7 | 76.1 |
| HGPP_weighted [18] | 97.5 | 99.5 | 79.5 | 77.8 |
| LGPDP [20] | 97.3 | 97.2 | 79.8 | 77.5 |
| MGCP [52] | 97.4 | 97.3 | 77.8 | 73.5 |
| Gabor filters + Borda count [66] | 99.5 | 99.5 | 85.0 | 79.5 |
| LGBP_mag + LGXP [19] | 99.0 | 99.0 | 94.0 | 93.0 |
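Feature weighting, the first modification listed above, gives each sub-region its own weight in the histogram dissimilarity; in LBP weighted [23] this takes the form of a weighted chi-square distance. A minimal sketch, assuming the descriptor is stored as one histogram per sub-region; the function name is hypothetical, and the weights below are arbitrary illustrative numbers, whereas in practice they would be tuned, for example on training data.

```python
import numpy as np

def weighted_chi_square(hist_a, hist_b, region_weights, eps=1e-10):
    """Weighted chi-square distance between two regional-histogram face descriptors.

    hist_a, hist_b: arrays of shape (n_regions, n_bins), one histogram per sub-region.
    region_weights: array of shape (n_regions,); a larger weight means the region counts more.
    """
    hist_a = np.asarray(hist_a, dtype=float)
    hist_b = np.asarray(hist_b, dtype=float)
    per_bin = (hist_a - hist_b) ** 2 / (hist_a + hist_b + eps)
    per_region = per_bin.sum(axis=1)  # plain chi-square of each sub-region
    return float(np.dot(np.asarray(region_weights, dtype=float), per_region))

# Two descriptors with 2 sub-regions x 3 bins; they differ only in region 0.
a = np.array([[2, 0, 2], [1, 1, 2]])
b = np.array([[2, 2, 0], [1, 1, 2]])
d_uniform = weighted_chi_square(a, b, [1.0, 1.0])   # about 4
d_weighted = weighted_chi_square(a, b, [2.0, 1.0])  # about 8: region 0 counts double
```

Setting a weight to zero removes a region from the comparison entirely, which is one simple way to discount regions that are known to be unreliable.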

5. Discussion

After analyzing the general framework of local pattern-based face recognition methods and reviewing several of the most successful realizations of this model, some general observations can be made. Apart from the impressive empirical results obtained by these methods, several advantages of the approach can be enumerated. First, in their basic formulation these models do not require learning, as their potential lies in local descriptors that capture local variations and in forming the global image description from these local features. However, a training set, if available, can easily be utilized, for example to adjust weighted similarity measures or weighted classifier combinations. Training data also make it possible to further process the feature vectors by means of subspace analysis methods such as LDA. As local descriptors are often based on differences between local intensity values, they offer robustness against changes in illumination conditions. They can also be modified (e.g., by using averaged intensity values instead of single-pixel values) to be more robust against noisy data. The general framework is open to novel local descriptors and to combining the responses of several different descriptors, which can provide complementary discriminative features (such as Gabor magnitude and Gabor phase, or LBP at different scales). Additionally, most local descriptors are easy to implement and fast to compute.

On the other hand, local descriptors tend to produce very long feature vectors, which are often overcomplete. This shortcoming can be alleviated by feature selection by means of boosting and/or by dimensionality reduction such as LDA (when a training set is available).
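The illumination argument can be checked directly: a basic LBP code depends only on whether each neighbour is at least as bright as the centre pixel, so any monotonically increasing gray-level change leaves every code unchanged. The sketch below assumes a 3×3 neighbourhood, a clockwise bit order and a >= threshold (conventions vary between papers). The hypothetical `mb_lbp` function illustrates the averaging modification in the spirit of the multi-block LBP of [28], [37]: it compares the means of 3×3 cells of pixel blocks instead of single pixel values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Neighbour positions inside a 3x3 neighbourhood, clockwise from the top-left corner.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_3x3(image):
    """Basic 8-neighbour LBP code for every interior pixel of a grayscale image."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (3, 3))  # (H-2, W-2, 3, 3)
    centre = windows[:, :, 1, 1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (r, c) in enumerate(NEIGHBOURS):
        codes |= (windows[:, :, r, c] >= centre).astype(np.uint8) << bit
    return codes

def mb_lbp(image, block=3):
    """LBP on block means: compares the mean of each of the 8 surrounding
    block x block cells with the mean of the central cell (block=1 gives lbp_3x3)."""
    image = np.asarray(image, dtype=float)
    means = sliding_window_view(image, (block, block)).mean(axis=(-2, -1))
    out_h = image.shape[0] - 3 * block + 1
    out_w = image.shape[1] - 3 * block + 1
    centre = means[block:block + out_h, block:block + out_w]
    codes = np.zeros((out_h, out_w), dtype=np.uint8)
    for bit, (r, c) in enumerate(NEIGHBOURS):
        neighbour = means[r * block:r * block + out_h, c * block:c * block + out_w]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(16, 16)).astype(float)  # stand-in for a face crop
codes = lbp_3x3(face)
# A monotonic gray-level change (here: halve the contrast, raise the brightness)
# preserves every ">=" comparison, and therefore every LBP code.
same_under_illumination = np.array_equal(codes, lbp_3x3(0.5 * face + 20.0))
```

Averaging over blocks trades spatial resolution (the output shrinks by 3·block − 1 pixels per axis) for reduced sensitivity to single-pixel noise.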
Another concern is that the general framework depends on the settings of several parameters, such as the size and number of the sub-regions into which the image is divided to produce the local histograms, the number of histogram bins, the local descriptor type, the similarity measure used, and the feature/classifier combination scheme. These parameters are problem-dependent; however, the empirical results show that the approach tolerates a wide range of values for many of them. For example, local pattern-based methods seem to work well over a wide range of sub-region numbers and histogram bin counts, with only slight changes in classification accuracy.

The list of local descriptors provided in this review is by no means complete. However, it provides an introduction to the most popular and well-established group of local descriptors, which have proved to be useful tools in different facial recognition tasks. At the same time, new descriptors are constantly being developed. For example, in [76] the Local Directional Pattern (LDP) was proposed, which is based on the magnitudes of edge responses in eight directions at every pixel location. In [77], Binarized Statistical Image Features (BSIF) were described. The statistical nature of these features comes from the fact that their binary code is calculated by linearly projecting local image patches onto a subspace generated by independent component analysis. These examples show that the research area is still developing and that new ideas are constantly being added.

One important characteristic of the described image descriptors should be underlined. None of them requires a learning procedure; thus, each image is described by means of the same predefined procedure. This approach has advantages, such as simplicity. However, one should be aware of the possible alternatives. Image descriptors can also be calculated by means of local dictionary learning [78–80]. These methods attempt to learn a set of local visual patterns from a training set of images, and promising results have been reported.
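For the LDP descriptor of [76] mentioned above, a minimal sketch: the eight Kirsch compass masks are applied at each pixel, and the bits corresponding to the k strongest absolute edge responses are set. The value k = 3 is an assumption here (with k = 3 there are at most C(8,3) = 56 distinct codes); the mask order, the bit assignment, the use of correlation rather than convolution and the function names are our own illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Border cells of a 3x3 mask, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
EAST_RING = np.array([-3, -3, 5, 5, 5, -3, -3, -3])  # Kirsch "east" mask, read along RING

def kirsch_masks():
    """The 8 Kirsch compass masks (E, NE, N, NW, W, SW, S, SE); each is the
    previous one rotated by 45 degrees, i.e. its border shifted by one cell."""
    masks = np.zeros((8, 3, 3))
    for k in range(8):
        for value, (r, c) in zip(np.roll(EAST_RING, -k), RING):
            masks[k, r, c] = value
    return masks

def ldp_codes(image, k=3):
    """LDP-style code per interior pixel: bit i is set when the absolute response
    of mask i is among the k largest of the 8 responses."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (3, 3))    # (H-2, W-2, 3, 3)
    responses = np.abs(np.einsum("ijrc,mrc->ijm", windows, kirsch_masks()))  # (H-2, W-2, 8)
    strongest = np.argsort(responses, axis=-1)[..., -k:]  # indices of the k largest
    codes = np.zeros(responses.shape[:2], dtype=np.uint8)
    for j in range(k):
        codes |= np.left_shift(1, strongest[..., j]).astype(np.uint8)
    return codes

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(16, 16)).astype(float)  # stand-in for a face crop
codes = ldp_codes(face)
bits_per_code = np.unpackbits(codes[..., None], axis=-1).sum(axis=-1)
```

Because exactly k bits are set in every code, the histogram of LDP codes needs far fewer bins than the 256 of a plain 8-bit code.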
Additionally, [81] proposes sparse coding as an alternative to the histogram representation of local image features. Among the possible future research directions, one could evidently consider the application of novel local descriptors, which may be inspired by work on texture analysis. Such inspiration has proved fruitful since the successful pioneering application of LBP to facial recognition, and it is plausible that other texture descriptors will also prove useful. The most successful studies first filter the images with Gabor filters and then apply local descriptors whose task is to capture the local variability of the resulting Gabor magnitude and/or phase maps. Some local descriptors (e.g., local derivative patterns) were designed especially for that purpose, and other variability measures can be expected to be adopted for it as well. Further ideas originating from texture analysis can also be adopted for facial recognition. For example, the idea of coding very long texture descriptions by means of a codebook can be adapted to the specificity of facial recognition tasks in order to deal with long feature vectors. Novel distance/similarity measures, classifier combination schemes, and hybridization with statistical and machine learning techniques can bring further improvements.
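The Gabor-then-local-descriptor pipeline described above can be sketched as follows: filter the image with a small bank of complex Gabor kernels, take the magnitude of each response, and compute LBP codes on every magnitude map, giving one LBP map per filter in the spirit of LGBP [21]. The kernel form (no DC correction, sigma tied to the wavelength), the two scales and four orientations, and all function names are illustrative assumptions; the cited methods typically use a larger bank (e.g., 5 scales and 8 orientations).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel (odd size): isotropic Gaussian envelope times a complex
    sinusoid travelling along direction theta. No DC correction is applied."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * along / wavelength)

def gabor_magnitude(image, kernel):
    """|Gabor response| on the 'valid' region. Computed as a correlation; because
    kernel[::-1, ::-1] == conj(kernel), the magnitude equals that of a convolution."""
    windows = sliding_window_view(np.asarray(image, dtype=float), kernel.shape)
    return np.abs(np.einsum("ijrc,rc->ij", windows, kernel))

def lbp_3x3(image):
    """Basic 8-neighbour LBP code for every interior pixel."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (3, 3))
    centre = windows[:, :, 1, 1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (r, c) in enumerate(NEIGHBOURS):
        codes |= (windows[:, :, r, c] >= centre).astype(np.uint8) << bit
    return codes

rng = np.random.default_rng(1)
face = rng.random((32, 32))  # stand-in for a normalized face crop
lgbp_maps = []
for wavelength in (4.0, 8.0):               # 2 scales (hypothetical values)
    for theta in np.arange(4) * np.pi / 4:  # 4 orientations
        kernel = gabor_kernel(9, wavelength, theta, sigma=wavelength / 2)
        lgbp_maps.append(lbp_3x3(gabor_magnitude(face, kernel)))
lgbp = np.stack(lgbp_maps)  # (n_filters, H', W'): one LBP map per Gabor filter
```

Each of the 8 maps would then be divided into sub-regions and histogrammed, which is why Gabor-based descriptors are several times longer than plain LBP descriptors.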

6. Conclusions

This paper provides the basic knowledge about the main steps in building face recognition systems, starting from the general scheme and its building blocks, through the different kinds of local descriptors, and ending with the possible approaches to making the final decision by means of appropriate similarity/dissimilarity measures. The possible ways of combining the local descriptions into a global one are also presented. Many of the presented local descriptors were first applied to texture analysis tasks; however, they have been successfully used for facial recognition. Unlike global methods for face recognition, the presented approach measures local patterns at every pixel of the given image. These local descriptors are the basis for building sub-region descriptions of the image by means of histograms. The sub-region histograms are then concatenated to form the global image description. Three levels of locality can thus be distinguished: the pixel level, at which a local pattern such as LBP is measured; the sub-region level, which provides the histogram of local patterns from the given sub-image; and the global level, resulting from the concatenation of the local histograms.

One big advantage of local pattern-based methods is that they do not require a training step, so novel classes of images (new persons in the database) can easily be added without modifying the previously derived model. The cited empirical results show that methods based on this general framework provide state-of-the-art results. The methods also differ from one another, as the presented general methodology is flexible: it allows not only the use of different local descriptors but also the development of hybrid systems utilizing statistical learning methods such as boosting or SVM and subspace analysis methods such as LDA. Different approaches to combining the local descriptions allow for further extensions of the framework. This paper can serve as updated reference material and as a starting point for studying local pattern-based facial recognition methods. In our opinion, LBP is one of the most successful and promising directions in facial recognition research. Further studies may explore novel local descriptors as the core of the described methodology. Such descriptors could be motivated by recent research in texture analysis, one example being the Weber Local Descriptor (WLD).
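The three levels of locality can be sketched in a few lines: LBP codes at the pixel level, one histogram per cell of a regular grid at the sub-region level, and their concatenation at the global level. This sketch assumes uniform-pattern binning (58 uniform codes plus one shared bin for all other codes, i.e., 59 bins for 8 neighbours), a square 3×3 neighbourhood instead of the interpolated circular neighbourhood used in [23], and an illustrative 7×7 grid; the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_3x3(image):
    """Basic 8-neighbour LBP code for every interior pixel."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (3, 3))
    centre = windows[:, :, 1, 1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (r, c) in enumerate(NEIGHBOURS):
        codes |= (windows[:, :, r, c] >= centre).astype(np.uint8) << bit
    return codes

def uniform_mapping(n_bits=8):
    """Lookup table from LBP code to histogram bin: one bin per uniform code
    (at most two circular 0/1 transitions) plus one shared bin for all others."""
    table = np.empty(2 ** n_bits, dtype=np.int64)
    n_uniform = 0
    for code in range(2 ** n_bits):
        rotated = (code >> 1) | ((code & 1) << (n_bits - 1))  # circular shift by one bit
        if bin(code ^ rotated).count("1") <= 2:                # number of bit transitions
            table[code] = n_uniform
            n_uniform += 1
        else:
            table[code] = -1
    table[table == -1] = n_uniform
    return table, n_uniform + 1

def face_descriptor(image, grid=(7, 7)):
    """Pixel level -> sub-region histograms -> concatenated global descriptor."""
    table, n_bins = uniform_mapping()
    labels = table[lbp_3x3(image)]                                   # pixel level
    row_groups = np.array_split(np.arange(labels.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(labels.shape[1]), grid[1])
    histograms = [
        np.bincount(labels[np.ix_(rows, cols)].ravel(), minlength=n_bins)  # sub-region level
        for rows in row_groups
        for cols in col_groups
    ]
    return np.concatenate(histograms)                                # global level

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(72, 72)).astype(float)  # stand-in for a face crop
table, n_bins = uniform_mapping()
descriptor = face_descriptor(face)  # 7 * 7 regions * 59 bins = 2891 values
```

New persons are enrolled by simply storing their descriptors; nothing in this pipeline is fitted to the gallery, which is the training-free property noted above.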
However, recent research has shown that local descriptors designed specifically for facial recognition can bring significant improvements. Examples of such face-specific descriptors are the recently developed Gabor phase coders (HGPP, LGPDP, LGXP). The second main research direction for this group of facial recognition methods is to explore novel ways of aggregating the local descriptions into the global description. In addition, a related task worth exploring is to improve the final decision making by basing it not only on the final description but also on the decisions rendered by local classifiers. This is motivated by the fact that the global descriptions produced by the described methods tend to be very long, and the curse of dimensionality limits the achievable classification accuracy. Classifier aggregation methods applied to information gathered from sub-regions processed by descriptors of different kinds may bring further improvements, as suggested by the successful applications of methods such as boosting or Borda count. The third research direction that could bring additional benefits is to study transformations, both linear and nonlinear, which, applied to the histogram-based description, would move the classification task into another feature space where it is more tractable. Preliminary results on the application of block-based LDA suggest that this approach may improve the results. One drawback of such a step is the requirement of a training set, which is not necessary for most methods based on local descriptors in their original formulation. This research direction also includes exploring different weighting schemes for the partial sub-region descriptions, which could further improve the classifier aggregation step.
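Borda count, mentioned above as a successful aggregation scheme, can fuse sub-region classifiers as follows: each sub-region ranks the gallery subjects by its own distance, each rank is converted into points, and the subject with the highest total wins. The scoring rule below (n − 1 points for first place down to 0 for last) is the textbook Borda variant and an assumption here; the way [66] applies it may differ in detail. The function name and the distances are hypothetical.

```python
import numpy as np

def borda_count(region_distances):
    """Fuse per-region nearest-neighbour decisions by Borda count.

    region_distances[r, g]: dissimilarity between the probe and gallery subject g,
    measured on sub-region r only. Each region ranks the gallery (closest first)
    and awards n_gallery - 1 points to its first choice, n_gallery - 2 to the
    second, ..., 0 to the last. Ties are broken by gallery index.
    """
    d = np.asarray(region_distances, dtype=float)
    n_gallery = d.shape[1]
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")  # 0 = closest gallery subject
    scores = (n_gallery - 1 - ranks).sum(axis=0)
    return int(np.argmax(scores)), scores

# Hypothetical distances for 3 sub-regions x 3 gallery subjects.
distances = np.array([
    [0.2, 0.9, 0.5],
    [0.4, 0.3, 0.8],  # this region alone would pick subject 1
    [0.1, 0.7, 0.6],
])
winner, scores = borda_count(distances)  # scores [5, 2, 2], winner 0
```

The second region alone would choose subject 1, but the combined vote still selects subject 0, which illustrates how voting can tolerate a locally misleading region (e.g., one affected by occlusion or expression).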

Acknowledgment

This work was supported in part by a Strategic Grant from the Natural Sciences and Engineering Research Council (NSERC) and the Canada Research Chair (CRC) program.


References

[1] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Computing Surveys 35 (4) (2003) 399–458.
[2] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1991) 71–86.
[3] J. Yang, D. Zhang, A.F. Frangi, J. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (1) (2004) 131–137.
[4] K. Etemad, R. Chellappa, Discriminant analysis for recognition of human face images, Journal of the Optical Society of America 14 (1997) 1724–1733.
[5] J.R. Beveridge, K. She, B.A. Draper, G.H. Givens, A nonparametric statistical comparison of principal component and linear discriminant subspaces for face recognition, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition I, 2001, pp. 535–542.
[6] P. Belhumeur, P. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[7] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002) 1450–1464.
[8] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1090–1104.
[9] T. Zhang, D. Tao, J. Yang, Discriminative locality alignment, in: D. Forsyth, P. Torr, A. Zisserman (Eds.), Computer Vision – ECCV 2008, Springer, Berlin Heidelberg, 2008, pp. 725–738.
[10] Y. Mu, D. Tao, X. Li, F. Murtagh, Biologically inspired tensor features, Cognitive Computation 1 (4) (2009) 327–341.
[11] W. Bian, D. Tao, Face subspace learning, in: S.Z. Li, A.K. Jain (Eds.), Handbook of Face Recognition, Springer London, London, 2011, pp. 51–77.
[12] L. Wiskott, J.M. Fellous, N. Krüger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[13] A. Albiol, D. Monzo, A. Martin, J. Sastre, A. Albiol, Face recognition using HOG-EBGM, Pattern Recognition Letters 29 (2008) 1537–1543.
[14] L. Shen, L. Bai, A review on Gabor wavelets for face recognition, Pattern Analysis and Applications 9 (2–3) (2006) 273–292.
[15] Á. Serrano, I.M. de Diego, C. Conde, E. Cabello, Recent advances in face biometrics with Gabor wavelets: a review, Pattern Recognition Letters 31 (2010) 372–381.
[16] H. Yanbin, Y. Jianqin, L. Jinping, Human face feature extraction and recognition based on SIFT, in: International Symposium on Computer Science and Computational Technology, ISCSCT ‘08, 2008, pp. 719–722.
[17] M. Bicego, A. Lagorio, E. Grosso, M. Tistarelli, On the use of SIFT features for face authentication, in: Conference on Computer Vision and Pattern Recognition Workshop, CVPRW ‘06, 2006, pp. 35–35.
[18] B. Zhang, S. Shan, X. Chen, W. Gao, Histogram of Gabor Phase Pattern (HGPP): a novel object representation approach for face recognition, IEEE Transactions on Image Processing 16 (1) (2007) 57–68.
[19] S. Xie, S. Shan, X. Chen, J. Chen, Fusing local patterns of Gabor magnitude and phase for face recognition, IEEE Transactions on Image Processing 19 (5) (2010) 1349–1361.
[20] Y. Guo, Z. Xu, Local Gabor phase difference pattern for face recognition, in: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, pp. 1–4.
[21] W. Zhang, S. Shan, W. Gao, X. Chen, H. Zhang, Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, in: Proceedings of the IEEE International Conference on Computer Vision, vol. I, 2005, pp. 786–791.
[22] T. Ahonen, A. Hadid, M. Pietikäinen, Face recognition with local binary patterns, in: Proceedings of the Eighth European Conference on Computer Vision, Lecture Notes in Computer Science, 3021, Springer, 2004, pp. 469–481.
[23] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: application to face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 2037–2041.
[24] G. Zhang, X. Huang, S.Z. Li, Y. Wang, X. Wu, Boosting local binary pattern (LBP)-based face recognition, in: Proceedings of the Advances in Biometric Person Authentication, Lecture Notes in Computer Science, 3338, Springer, 2004, pp. 179–186.
[25] Y. Rodriguez, S. Marcel, Face authentication using adapted local binary pattern histograms, in: Proceedings of the Ninth European Conference on Computer Vision, Lecture Notes in Computer Science, 3954, Springer, 2006, pp. IV321–IV332.
[26] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[27] S. Marcel, Y. Rodriguez, G. Heusch, On the recent use of local binary patterns for face authentication, International Journal on Image and Video Processing Special Issue on Facial Image Processing, Idiap-RR-34, 2006.
[28] L. Zhang, R. Chu, S. Xiang, S. Liao, S.Z. Li, Face detection based on multi-block LBP representation, in: S.-W. Lee, S.Z. Li (Eds.), ICB 2007, Lecture Notes in Computer Science, 4642, Springer, 2007, pp. 11–18.

[29] G. Zhao, M. Pietikäinen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 915–928.
[30] G. Heusch, Y. Rodriguez, S. Marcel, Local binary patterns as an image preprocessing for face authentication, in: IEEE International Conference on Automatic Face and Gesture Recognition, 2006, pp. 9–14.
[31] H. Jin, Q. Liu, H. Lu, X. Tong, Face detection using improved LBP under Bayesian framework, in: International Conference on Image and Graphics, Hong Kong, China, 2004, pp. 306–309.
[32] R. Zabih, J. Woodfill, Non-parametric local transforms for computing visual correspondence, in: J.-O. Eklundh (Ed.), Proceedings of the Third European Conference on Computer Vision (ECCV ’94), vol. 2, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1994, pp. 151–158.
[33] B. Fröba, A. Ernst, Face detection with the modified census transform, in: Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 91–96.
[34] X. Wang, H. Xu, H. Wang, H. Li, Robust real-time face detection with skin color detection and the modified census transform, in: International Conference on Information and Automation (ICIA 2008), 2008, pp. 590–595.
[35] D. Han, J. Choi, High-performance real-time face-detection architecture for HCI applications, in: International Symposium on Ubiquitous Virtual Reality (ISUVR 2010), 2010, pp. 48–51.
[36] W.-H. Yun, H.-S. Yoon, D.-H. Kim, S.-Y. Chi, Robust face recognition using the modified census transform, in: International Symposium on Communications and Information Technologies (ISCIT ‘07), 2007, pp. 749–752.
[37] S. Liao, X. Zhu, Z. Lei, L. Zhang, S.Z. Li, Learning multi-scale block local binary patterns for face recognition, in: S.-W. Lee, S.Z. Li (Eds.), ICB 2007, LNCS, vol. 4642, 2007, pp. 828–837.
[38] P. Viola, M. Jones, Robust real-time object detection, International Journal of Computer Vision 57 (2) (2004) 137–154.
[39] C.-H. Chan, J. Kittler, K. Messer, Multi-scale local binary pattern histograms for face recognition, in: S.-W. Lee, S.Z. Li (Eds.), ICB 2007, LNCS, vol. 4642, 2007, pp. 809–818.
[40] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, AMFG 2007, LNCS, vol. 4778, 2007, pp. 168–182.
[41] A. Bendada, M.A. Akhloufi, Multispectral face recognition in texture space, in: Canadian Conference on Computer and Robot Vision, 2010, pp. 101–106.
[42] A. Sapkota, B. Parks, W. Scheirer, T. Boult, FACE-GRAB: face recognition with general region assigned to binary operator, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010, pp. 82–89.
[43] A.A. Mohamed, R.V. Yampolskiy, Adaptive Extended Local Ternary Pattern (AELTP) for recognizing avatar faces, in: 11th International Conference on Machine Learning and Applications, 2012, pp. 57–62.
[44] L. Wolf, T. Hassner, Y. Taigman, Descriptor based methods in the wild, in: Faces in Real-Life Images Workshop in ECCV, 2008.
[45] G. Liu, Z. Lin, Y. Yu, Radon representation-based feature descriptor for texture classification, IEEE Transactions on Image Processing 18 (5) (2009) 921–928.
[46] S. Liao, M.W.K. Law, A.C.S. Chung, Dominant local binary patterns for texture classification, IEEE Transactions on Image Processing 18 (5) (2009) 1107–1118.
[47] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with center-symmetric local binary patterns, in: Fifth Indian Conference on Computer Vision, Graphics and Image Processing, 2006, pp. 58–69.
[48] S. Xie, S. Shan, X. Chen, W. Gao, V-LGBP: volume based local Gabor binary patterns for face representation and recognition, in: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, pp. 1–4.
[49] W. Zhang, S. Shan, H. Zhang, W. Gao, X. Chen, Multi-resolution histogram of local variation patterns (MHLVP) for robust face recognition, in: T. Kanade, A. Jain, N.K. Ratha (Eds.), AVBPA 2005, LNCS, vol. 3546, 2005, pp. 937–944.
[50] B. Jun, H.S. Lee, J. Lee, D. Kim, Statistical face image preprocessing and nonstatistical face representation for practical face recognition, in: IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2009, pp. 392–397.
[51] W. Zhang, S. Shan, X. Chen, W. Gao, Local Gabor binary patterns based on Kullback–Leibler divergence for partially occluded face recognition, IEEE Signal Processing Letters 14 (11) (2007) 875–878.
[52] Y. Guo, J. Chen, G. Zhao, M. Pietikäinen, Z. Xu, Multi-band Gradient Component Pattern (MGCP): a new statistical feature for face recognition, Lecture Notes in Computer Science 5575 (2009) 229–238.
[53] B. Zhang, Y. Gao, S. Zhao, J. Liu, Local derivative pattern versus local binary pattern: face recognition with high-order local pattern descriptor, IEEE Transactions on Image Processing 19 (2) (2010) 533–544.
[54] L. Qing, S. Shan, X. Chen, W. Gao, Face recognition under varying lighting based on the probabilistic model of Gabor phase, in: Proceedings of International Conference on Pattern Recognition (ICPR), vol. 3, 2006, pp. 1139–1142.
[55] W. Zhang, S. Shan, X. Chen, W. Gao, Are Gabor phases really useless for face recognition?, Pattern Analysis and Applications 12 (3) (2009) 301–307.
[56] J. Daugman, How iris recognition works, in: Proceedings of the International Conference on Image Processing, vol. 1, 2002, pp. 33–36.
[57] M. Yang, L. Zhang, D. Zhang, Monogenic Binary Pattern (MBP): a novel feature extraction and representation model for face recognition, in: 20th International Conference on Pattern Recognition, 2010, pp. 2680–2683.
[58] D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[59] S. Yan, H. Wang, X. Tang, T. Huang, Exploring feature descriptors for face recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. I, 2007, pp. 629–632.
[60] A. Stormer, G. Rigoll, Learning weighted similarity measurements for unconstrained face recognition, in: 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 61–64.
[61] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, W. Gao, WLD: a robust local image descriptor, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (9) (2010) 1705–1720.
[62] C.R. Huang, C.S. Chen, P.C. Chung, Contrast context histogram – an efficient discriminating local descriptor for object recognition and image matching, Pattern Recognition 41 (2008) 3071–3077.
[63] Z. Guo, L. Zhang, D. Zhang, Rotation invariant texture classification using LBP variance (LBPV) with global matching, Pattern Recognition 43 (2010) 706–719.
[64] W.T. Freeman, E.H. Adelson, The design and use of steerable filters, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (9) (1991) 891–906.
[65] M. Jacob, M. Unser, Design of steerable filters for feature detection using Canny-like criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (8) (2004) 1007–1019.
[66] J. Zou, Q. Ji, G. Nagy, A comparative study of local matching approach for face recognition, IEEE Transactions on Image Processing 16 (10) (2007) 2617–2628.
[67] R. Meir, G. Rätsch, An introduction to boosting and leveraging, in: Advanced Lectures on Machine Learning (LNAI 2600), 2003, pp. 118–183.
[68] R.E. Schapire, The boosting approach to machine learning: an overview, in: D.D. Denison, M.H. Hansen, C. Holmes, B. Mallick, B. Yu (Eds.), Nonlinear Estimation and Classification, Springer, 2003, pp. 149–172.
[69] B.J. Boom, R.T.A. van Rootseler, R.N.J. Veldhuis, Investigating the boosting framework for face recognition, in: Proceedings of the 28th Symposium on Information Theory in the Benelux, 24–25 May 2007, Enschede, The Netherlands, 2007, pp. 189–196.
[70] G. Zhao, M. Pietikäinen, Boosted multi-resolution spatiotemporal descriptors for facial expression recognition, Pattern Recognition Letters 30 (2009) 1117–1127.
[71] R. Verschae, J. Ruiz-del-Solar, M. Correa, Gender classification of faces using Adaboost, in: J.F. Martínez-Trinidad (Ed.), CIARP 2006, LNCS, vol. 4225, 2006, pp. 68–78.
[72] N. Sun, W. Zheng, C. Sun, C. Zou, L. Zhao, Gender classification based on boosting local binary pattern, in: J. Wang et al. (Eds.), ISNN 2006, LNCS, vol. 3972, 2006, pp. 194–201.
[73] J. Chen, X. Zhang, J. Li, Face verification based on AdaBoost learning for Histogram of Gabor Phase Patterns (HGPP) selection and samples synthesis with quotient image method, in: D.-S. Huang et al. (Eds.), ICIC 2008, LNCS, vol. 5226, 2008, pp. 430–437.
[74] P.J. Phillips, Support vector machines applied to face recognition, in: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, MIT Press, 1999, pp. 803–809.
[75] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1) (1994) 66–75.
[76] T. Jabid, M.H. Kabir, O. Chae, Facial expression recognition using Local Directional Pattern (LDP), in: IEEE International Conference on Image Processing, 2010, pp. 1605–1608.
[77] J. Kannala, E. Rahtu, BSIF: Binarized Statistical Image Features, in: 21st International Conference on Pattern Recognition (ICPR 2012), 2012, pp. 1363–1366.
[78] X. Meng, S. Shan, X. Chen, W. Gao, Local Visual Primitives (LVP) for face modelling and recognition, in: 18th International Conference on Pattern Recognition (ICPR ‘06), 2006, pp. 536–539.
[79] S. Xie, S. Shan, X. Chen, X. Meng, W. Gao, Learned local Gabor patterns for face representation and recognition, Signal Processing 89 (12) (2009) 2333–2344.
[80] Z. Cao, Q. Yin, X. Tang, J. Sun, Face recognition with learning-based descriptor, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 2707–2714.
[81] Z. Cui, S. Shan, X. Chen, L. Zhang, Sparsely encoded local descriptor for face recognition, Face and Gesture 2011 (2011) 149–154.