Face Recognition from Still Images and Video Sequences

Alaa Adnan Eleyan

Submitted to the Institute of Graduate Studies and Research in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy in Electrical & Electronic Engineering

Eastern Mediterranean University
July 2009
Gazimağusa, North Cyprus

Approval of the Institute of Graduate Studies and Research

________________________ Prof. Dr. Elvan Yılmaz Director (a)

I certify that this thesis satisfies the requirements as a thesis for the degree of Doctor of Philosophy in the Department of Electrical & Electronic Engineering.

______________________________________________ Assoc. Prof. Dr. Aykut Hocanın Chair, Department of Electrical & Electronic Engineering

We certify that we have read this thesis and that in our opinion it is fully adequate in scope and quality as a thesis for the degree of Doctor of Philosophy in Electrical & Electronic Engineering.

___________________________________ Assoc. Prof. Dr. Hasan Demirel Supervisor

Examining Committee

1. Prof. Dr. Aytül Erçil

_________________________________

2. Assoc. Prof. Dr. Hasan Demirel

_________________________________

3. Assoc. Prof. Dr. Hüseyin Özkaramanlı

_________________________________

4. Assoc. Prof. Dr. Erhan İnce

_________________________________

5. Assist. Prof. Dr. Önsen Toygar

_________________________________


ABSTRACT

Face Recognition from Still Images and Video Sequences

Alaa Adnan Eleyan Electrical & Electronic Engineering Department Eastern Mediterranean University

Supervisor: Assoc. Prof. Dr. Hasan Demirel

In this thesis, two of the most well-known statistical approaches, namely principal component analysis (PCA) and linear discriminant analysis (LDA), have been used for feature extraction and dimensionality reduction in the face recognition problem. Feedforward neural networks (FFNN) were utilized to improve the recognition performance by incorporating the discriminating power of neural networks in the classification process. Multiresolution face recognition using the discrete wavelet transform (DWT) was also investigated. Images at varying resolutions were generated using the DWT, and feature vectors were then extracted from the PCA and LDA spaces. Two data fusion methods have been proposed. The first, the multiresolution feature concatenation (MFC) approach, concatenates PCA- and LDA-projected feature vectors from different subbands for face recognition. The second, multiresolution majority voting (MMV), performs classification in each subband and fuses the decisions coming from each subband using majority voting to generate the overall decision.

Finally, in the context of still-image face recognition, we utilized complex approximately analytic wavelets, which possess Gabor-like characteristics. We employed the recently developed dual-tree complex wavelet transform (DT-CWT) and the single-tree complex wavelet transform (ST-CWT) for face recognition. Complex approximately analytic wavelets enjoy a much less redundant representation, which is computationally more efficient than Gabor wavelets, and provide a local multiscale description of images with good directional selectivity and invariance to shifts and in-plane rotations. Like Gabor wavelets, they are insensitive to illumination variations and facial expression changes. The computational complexity of both Gabor and complex wavelets was analyzed, and in this context the superiority of the complex wavelets over the Gabor wavelets has been shown. These findings indicate that the complex wavelet transform provides a strong alternative to the Gabor wavelet transform for face recognition. Furthermore, the newly introduced ST-CWT, having improved directional selectivity and shift invariance properties, has shown better face recognition performance than the DT-CWT.

In addition to face recognition in still images, two adaptive face recognition approaches for video sequences have been proposed. In the adaptive approach with an updating gallery set, the gallery is updated at each frame by discarding outlier images from the set. Discarding of the gallery images depends on a proposed novel fitness measure: a fitness value for each sample image in the gallery set is maintained and updated after the processing of each frame of the probe video sequence. The gallery images with the lowest accumulated fitness values are discarded at each frame. At the end of the probe video, the person with the highest accumulated fitness value is declared to be the identified person. The other approach employs a fixed gallery and accumulates the fitness without discarding images from the set. Both approaches benefit from the proposed novel fitness measure to recognize the subject in a given video sequence, and both have been compared with conventional PCA and LBP methods. The results demonstrate the superiority of the proposed approach over the compared methods.


To mom & dad
To Gulden and Munjid


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS / ABBREVIATIONS

CHAPTER 1  INTRODUCTION
  1.1 Introduction
  1.2 Contributions of the Dissertation
  1.3 Outline of the Dissertation

CHAPTER 2  FACE AS A BIOMETRIC RECOGNITION SYSTEM
  2.1 Introduction
  2.2 Pattern Recognition System
  2.3 Face Recognition from Still Images
    2.3.1 Factors that Affect Face Recognition Performance
    2.3.2 Literature Review
  2.4 Face Recognition from Videos
    2.4.1 Literature Review
  2.5 Face Databases Used in Performance Analysis
    2.5.1 Still Image Databases
    2.5.2 Video Databases
  2.6 Summary

CHAPTER 3  PCA & LDA BASED FACE RECOGNITION APPROACHES
  3.1 Introduction
  3.2 Literature Review
  3.3 Principal Component Analysis
  3.4 Linear Discriminant Analysis
  3.5 Neural Networks
    3.5.1 Feedforward Neural Networks (FFNN)
    3.5.2 Learning Algorithm (Backpropagation)
  3.6 Performance Analysis and Discussions
    3.6.1 Training and Testing of Neural Networks
    3.6.2 System Performance
  3.7 Summary

CHAPTER 4  DISCRETE WAVELET TRANSFORM FOR FACE RECOGNITION
  4.1 Introduction
  4.2 Literature Review
  4.3 Discrete Wavelet Transform
  4.4 Introduced Multiresolution Methods
    4.4.1 Multiresolution Feature Concatenation
    4.4.2 Multiresolution Majority Voting
  4.5 Performance Analysis and Discussions
  4.6 Summary

CHAPTER 5  GABOR & COMPLEX WAVELET TRANSFORMS FOR FACE RECOGNITION
  5.1 Introduction
  5.2 Literature Review
  5.3 Gabor Wavelet Transform
  5.4 Complex Wavelet Transform
    5.4.1 Dual-Tree Complex Wavelet Transform
    5.4.2 Single-Tree Complex Wavelet Transform
  5.5 Proposed Approach
  5.6 Simulation Results and Discussions
  5.7 Computational Complexity Analysis for Feature Extraction
  5.8 Summary

CHAPTER 6  ADAPTIVE APPROACH FOR FACE RECOGNITION FROM VIDEO SEQUENCES
  6.1 Introduction
  6.2 Literature Review
  6.3 Feature Extraction Process
    6.3.1 Principal Component Analysis
    6.3.2 Local Binary Patterns
  6.4 Video Face Database
  6.5 Proposed Adaptive Approach
  6.6 Simulation Results and Discussions
  6.7 Summary

CHAPTER 7  CONCLUSIONS AND FUTURE WORKS
  7.1 Conclusions
  7.2 Future Work

REFERENCES


LIST OF TABLES

Table 3.1: Performance of conventional PCA & LDA versus proposed PCA-NN & LDA-NN
Table 4.1: MFC and MMV using 3 wavelet subbands with PCA as in [6]
Table 4.2: MFC and MMV using 10 wavelet subbands with PCA
Table 4.3: MFC and MMV using 3 wavelet subbands with LDA as in [6]
Table 4.4: MFC and MMV using 10 wavelet subbands with LDA
Table 5.1: Filter coefficients of the conjugate symmetric two-band complex biorthogonal filterbank
Table 5.2: Aliasing energy ratio in dB
Table 5.3: Face recognition performance for the Gabor wavelet with different downsampling factors using the FERET database and three similarity measures: L1 distance δL1, L2 distance δL2, and cosine similarity δcos
Table 5.4: Face recognition performance for DT-CWT and ST-CWT with different downsampling factors using the FERET database and three similarity measures: L1 distance δL1, L2 distance δL2, and cosine similarity δcos
Table 5.5: Face recognition performance for different approaches using 200/400 features and the FERET database with three different similarity measures
Table 5.6: Face recognition performance for different approaches using 100/200 features and the ORL database with three different similarity measures
Table 5.7: Computational complexity analysis of feature extraction using the Gabor wavelet, DT-CWT and ST-CWT (N2: total image pixels)
Table 6.1: Performance of the adaptive approach with and without updating the gallery set against PCA and PCA majority voting
Table 6.2: Performance of the adaptive approach with and without updating the gallery set against LBP and LBP majority voting


LIST OF FIGURES

Figure 2.1: Examples of two patterns corresponding to (a) class X and (b) class Y
Figure 2.2: Plot of the feature vectors of the three features for a number of different images originating from class X (●) and class Y (■)
Figure 2.3: Typical modules of a general pattern recognition system
Figure 2.4: Examples of biometrics that can be used for human recognition
Figure 2.5: Typical modules of a face recognition system
Figure 2.6: (a) Frontal face image, (b) pose variation example
Figure 2.7: (a) Frontal face image, (b) illumination variation example
Figure 2.8: (a) Frontal face image, (b) facial expression example
Figure 2.9: (a) Frontal face image, (b) occlusion example
Figure 2.10: (a) Man face image, (b) woman face image
Figure 2.11: (a) Young woman, (b) same woman at old age
Figure 2.12: Face examples from the ORL database
Figure 2.13: Face examples from the FERET database
Figure 2.14: Example of the BANCA database videos. Left: controlled, middle: degraded, right: adverse scenarios
Figure 3.1: Sample face images from the ORL database
Figure 3.2: First 16 eigenfaces with highest eigenvalues
Figure 3.3: PCA approach for face recognition
Figure 3.4: (a) Points mixed when projected onto a line, (b) points separated when projected onto another line
Figure 3.5: (a) Good class separation, (b) bad class separation
Figure 3.6: First 16 Fisherfaces with highest eigenvalues
Figure 3.7: LDA approach for face recognition
Figure 3.8: Architecture of FFNN neural networks
Figure 3.9: Training phase of both neural networks
Figure 3.10: Recognition rate vs. number of training faces
Figure 4.1: 2-dimensional discrete wavelet decomposition
Figure 4.2: Discrete wavelet transform decomposition example: (a) Γj: original face image, (b) Γj+1: scaling component with low-frequency information, (c) Hj+1: horizontal component, (d) Vj+1: vertical component, (e) Dj+1: diagonal component
Figure 4.3: Multiresolution feature concatenation technique
Figure 4.4: Multiresolution majority voting technique
Figure 4.5: Performance comparison among MMV, MFC and conventional PCA, with 10 subbands used for MFC and MMV, using the ORL database
Figure 4.6: Performance comparison among MMV, MFC and conventional LDA, with 10 subbands used for MFC and MMV, using the ORL database
Figure 5.1: Gabor wavelet. (a) Real part of the Gabor kernels at four scales and six orientations, (b) magnitude of the Gabor kernels at four different scales
Figure 5.2: Frequency response of the 1-D Gabor wavelet (f = [0.5, 0.25, 0.125, 0.0625] and η = 1)
Figure 5.3: Gabor wavelet transformation of a sample image (top left face in Figure 5.12). (a) Magnitude of the transformation, (b) real part of the transformation
Figure 5.4: Impulse response of the dual-tree complex wavelet at 4 levels and 6 directions. (a) Real part, (b) magnitude
Figure 5.5: Frequency response of the 1-dimensional wavelet in the first 4 levels of the DT-CWT (first-level filters are from the Daubechies "db10" filterbank; subsequent levels use filters from [36])
Figure 5.6: DT-CWT transformation of a sample image (top left face in Figure 5.12). (a) Magnitude of the transformation, (b) real part of the transformation
Figure 5.7: Two-band critically downsampled complex biorthogonal filterbank (H0(z) and H1(z) are analysis filters; F0(z) and F1(z) are synthesis filters)
Figure 5.8: Impulse response of the single-tree complex wavelet at 4 levels and 6 directions. (a) Real part, (b) magnitude
Figure 5.9: ST-CWT transformation of a sample image (top left face in Figure 5.12). (a) Magnitude of the transformation, (b) real part of the transformation
Figure 5.10: Frequency response of the 1-dimensional wavelet in the first 4 levels of the ST-CWT (length-10 complex filters from Table 5.1)
Figure 5.11: Block diagram of the proposed approach
Figure 5.12: Example FERET images used in our experiments (cropped to 128 × 128 to extract the facial region); the top two rows show examples of training images and the bottom row shows examples of test images
Figure 5.13: Example ORL images used in our experiments (resized to 128 × 128); two subjects' images, with the first 2 rows used for training and the second 2 rows used for testing
Figure 5.14: Face recognition performance on the FERET database using PCA, Gabor+PCA, DT-CWT+PCA and ST-CWT+PCA for the δL1 (L1) similarity measure; the recognition rate is the accuracy of the top response being correct
Figure 5.15: Face recognition performance on the ORL database using PCA, Gabor+PCA, DT-CWT+PCA and ST-CWT+PCA for the δL1 (L1) similarity measure; the recognition rate is the accuracy of the top response being correct
Figure 6.1: 3×3 LBP basic operator
Figure 6.2: An example of a facial image divided into 4×4, 8×8 and 16×16 rectangular regions
Figure 6.3: Example of the BANCA database images. Left: controlled, middle: degraded, right: adverse scenarios
Figure 6.4: Example of using a face detection algorithm to crop the face region from the whole frame
Figure 6.5: Pseudocode for the adaptive approach using the fitness measure with updating of the gallery set (Nf = 50)
Figure 6.6: Pseudocode for the adaptive approach using the fitness measure without updating of the gallery set (Nf = 50)
Figure 6.7: Fitness accumulation through the frames for the 1st video sequence of person 1, with updating of the gallery set, using PCA
Figure 6.8: Fitness accumulation through the frames for the 1st video sequence of person 1, without updating of the gallery set, using PCA
Figure 6.9: Example of fitness accumulation through the frames for the 1st video sequence of person 1, with updating of the gallery set, using LBP
Figure 6.10: Example of fitness accumulation through the frames for the 1st video sequence of person 1, without updating of the gallery set, using LBP
Figure 6.11: Performance in the controlled scenario with 1 training image per video, with and without updating of the gallery set, using PCA
Figure 6.12: Performance in the controlled scenario with 1 training image per video, with and without updating of the gallery set, using LBP
Figure 6.13: Performance in the adverse scenario with 1 training image per video, with and without updating of the gallery set, using PCA
Figure 6.14: Performance in the adverse scenario with 1 training image per video, with and without updating of the gallery set, using LBP
Figure 6.15: Performance in the degraded scenario with 1 training image per video, with and without updating of the gallery set, using PCA
Figure 6.16: Performance in the degraded scenario with 1 training image per video, with and without updating of the gallery set, using LBP
Figure 6.17: Performance in the controlled scenario with 2 training images per video, with and without updating of the gallery set, using PCA
Figure 6.18: Performance in the controlled scenario with 2 training images per video, with and without updating of the gallery set, using LBP


LIST OF SYMBOLS / ABBREVIATIONS

α: Training rate
Γj: Face image
Γj+1: Scaling component with low-frequency information
δ̄: Mean of distance vector
δCos: Cosine distance
δL1: Manhattan distance, L1 norm
δL2: Euclidean distance, L2 norm
Δwij: Change in weights
Λ: Average image
λ: Eigenvalue
Ξ: Fitness measure
ϒ: Difference image
Φh(t): Scaling function
Φh(ω): Frequency response of the scaling function
Ψh(t): Wavelet function
Ψh(ω): Frequency response of the wavelet function
arg{}: Argument
C: Covariance matrix
D: Number of directions
Dj+1: Diagonal component
E: Error
f0(n): Dual scaling filter
f1(n): Dual wavelet filter
h0(n): Scaling filter
h1(n): Wavelet filter
Hj+1: Horizontal component
hm: Hidden layer output vector
K: Approximation order
L: Filter length
l: Number of histogram bins
max{.}: Maximum
n: Number of training images per person
N2: Image size
netk: Output layer input
netm: Hidden layer input
Nf: Number of frames per video
o: Output vector
ok: Output layer output
P(z): Halfband filter
S: Number of scales
Sb: Between-class scatter matrix
Sw: Within-class scatter matrix
t: Target vector
T: Transpose
U: Eigenvectors
Vj+1: Vertical component
*: Conjugate

1D: 1-Dimensional
2D: 2-Dimensional
3D: 3-Dimensional
ARMA: Autoregressive and Moving Average
BANCA: Biometric Access Control for Networked and E-Commerce Applications
Cos: Cosine
CoWT: Continuous Wavelet Transform
CWT: Complex Wavelet Transform
dB: Decibel
DCT: Discrete Cosine Transform
DT-CWT: Dual-Tree Complex Wavelet Transform
DWT: Discrete Wavelet Transform
EBGM: Elastic Bunch Graph Matching
FERET: Face Recognition Technology
FFT: Fast Fourier Transform
IFFT: Inverse Fast Fourier Transform
GWT: Gabor Wavelet Transform
HMM: Hidden Markov Models
ICA: Independent Component Analysis
IGFs: Independent Gabor Features
L1: Norm 1, Manhattan Distance
L2: Norm 2, Euclidean Distance
LBP: Local Binary Patterns
LDA: Linear Discriminant Analysis
NN: Neural Networks
ONPP: Orthogonal Neighborhood Preserving Projections
ORL: Olivetti Research Lab
PCA: Principal Component Analysis
PR: Pattern Recognition
SMQT: Successive Mean Quantization Transform
SNoW: Sparse Network of Winnows
ST-CWT: Single-Tree Complex Wavelet Transform
SWT: Stationary Wavelet Transform
WP: Wavelet Packet Transform

CHAPTER 1 INTRODUCTION

1.1 Introduction

After the 9/11 tragedy, governments all over the world started to look more seriously at the levels of security at their airports and borders. Countries' annual budgets were increased drastically to acquire the most recent technologies for the identification, recognition and tracking of suspects, and the growing demand for these applications helped researchers fund their research projects. One of the most common biometric recognition techniques is face recognition. Although face recognition is not as accurate as recognition methods using the iris or fingerprints, it still attracts the attention of many researchers in the field of computer vision. The main reason behind this attention is the fact that the face is the conventional and the fastest way people use to identify each other. Since the middle of the last century, researchers have been working extensively in the field of face recognition, trying to introduce new approaches or improve the performance of existing ones in order to build more robust and accurate recognition systems. In this dissertation we propose a number of new approaches for improving the performance of face recognition from still images and video sequences.

Face images are of high dimension and require substantial memory and processing power for face recognition. It is crucial to have a preprocessing stage that extracts the important information, or salient features, out of this huge dimension. Two of the well-known techniques for dimensionality reduction and feature extraction are principal component analysis (PCA) [1]-[3] and linear discriminant analysis (LDA) [4,5]. These two techniques are used throughout this dissertation as a preprocessing or postprocessing stage for feature extraction and dimensionality reduction.

Neural networks are known for their robustness as classifiers. In the first part of this dissertation, multilayer perceptron feedforward neural networks (FFNN) were utilized as a classifier which takes as input the feature (projection) vectors obtained from applying PCA and/or LDA to the face images. The results of this classifier, compared with conventional distance classifiers such as the Euclidean distance, were very promising.

Another approach to the face recognition problem was proposed in [6] using discrete wavelet transforms. In this dissertation the discrete wavelet transform was applied to the face images up to the 10th subband, and two techniques were used for recognition. The first method fused (concatenated) the resulting DC images from each subband originating from the same image and applied PCA or LDA to obtain the feature vector for the classification process. The second method applied PCA or LDA to the images in each subband, performed classification separately in each subband, and then fused all decisions to produce the final decision.

Although the standard DWT is a powerful tool for the analysis and processing of many real-world signals and images, it suffers from three major disadvantages:

(1) shift sensitivity, (2) poor directional selectivity, and (3) lack of phase information. These disadvantages severely restrict its scope for certain signal and image processing applications. Other extensions of the standard DWT, such as the wavelet packet transform (WP) [7] and the stationary wavelet transform (SWT), reduce only the first disadvantage, shift sensitivity, at the cost of very high redundancy and computational cost. Recent research suggests the possibility of reducing two or more of the above-mentioned disadvantages using different forms of complex wavelet transforms (CWT) [8,9] with only limited and controllable redundancy and moderate computational complexity.

The complex wavelet transform overcomes the previously mentioned drawbacks of the DWT. The CWT provides a multiscale representation of images with good directional selectivity, invariance to shifts and in-plane rotation, and phase information. The complex wavelets, moreover, are orthogonal and can be implemented with short 1-dimensional separable filters, which makes them computationally attractive. Three complex wavelet transforms, namely Kingsbury's dual-tree complex wavelet transform (DT-CWT) [10,11], the single-tree complex wavelet transform (ST-CWT) [12] and the Gabor wavelet transform [13-16], are compared for their capabilities in face recognition.

The Gabor wavelets extract directional features from images and find frequent application in the computer vision problems of face detection and face recognition. The transform involves convolving an image with a group of Gabor filters, or kernels, parameterized by scale and orientation. As a result, redundant image representations are obtained, where the number of transformed images is equal to the number of Gabor kernels used. However, repetitive convolution with 2-D Gabor kernels is a rather computationally intensive operation. The DT-CWT is a recently suggested transform which provides good directional selectivity in six different fixed orientations at dyadic scales, with the ability to distinguish positive and negative frequencies. It has limited redundancy for images and is computationally much cheaper than the Gabor wavelet transform. Therefore, it arises as a good candidate to replace the Gabor wavelet transform in applications where speed is a critical issue.

The objectives of research in this dissertation include:

1. Review of the pattern recognition problem: various techniques and properties, with a focus on the face recognition problem from still images and video sequences.
2. Review of feature extraction techniques such as PCA, LDA and LBP: literature review, theory, and applications for the face recognition problem.
3. Review of neural networks: literature review, theory and application for the face recognition problem.
4. Review of discrete wavelet transforms: literature review, theory, properties, and application for the multiresolution face recognition problem using data fusion.


5. Study of complex wavelet transforms (CWT): literature review, theory, various forms, properties, and investigations for application in the face recognition problem.
6. Comprehensive and collective analysis of the recently proposed ST-CWT, and a comparison with existing forms of CWT such as DT-CWT and GWT.
7. Practical realizations and simulations of ST-CWT, DT-CWT, GWT, and DWT for the face recognition problem.
8. Comparative study of computational complexity: critical evaluation of the computational complexity among ST-CWT, DT-CWT, and GWT.
9. Review of face recognition from video sequences: literature review.
10. Implementation of new adaptive approaches for video face recognition with a newly proposed fitness measure.
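As a concrete illustration of the multiresolution fusion mentioned above (the first of the two DWT-based methods), the sketch below concatenates low-frequency subband images from successive DWT levels into a single feature vector using the PyWavelets package. The wavelet choice ('haar'), the number of levels, and the use of only the approximation band are illustrative assumptions, not the exact configuration evaluated in this thesis.

```python
import numpy as np
import pywt  # PyWavelets (pip install PyWavelets)

def mfc_features(image, wavelet="haar", levels=3):
    """Multiresolution feature concatenation sketch: stack the
    low-frequency (approximation) subband from several DWT levels
    into one feature vector, to be projected with PCA/LDA later."""
    feats = [image.ravel()]                 # level-0: original image
    approx = image
    for _ in range(levels):
        # dwt2 returns (cA, (cH, cV, cD)); keep only the low-pass band.
        approx, _details = pywt.dwt2(approx, wavelet)
        feats.append(approx.ravel())
    return np.concatenate(feats)
```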

1.2 Contributions of the Dissertation

The dissertation discusses the face recognition problem in general and presents new approaches, as well as modifications of existing approaches, for improving the performance of available face recognition systems.

The first contribution of this work is shown in the third chapter, where neural networks were used as a classifier for the eigenfaces approach. The inputs to the neural networks were the projection vectors obtained from the face database after applying principal component analysis (PCA) and linear discriminant analysis (LDA) as feature extraction and dimensionality reduction techniques.


The use of the feedforward neural networks improved the recognition performance of the system.

Chapter 5 presents our second contribution, where the complex wavelet transform was used for face recognition. In this chapter, two techniques, namely the well-known dual-tree complex wavelet transform (DT-CWT) and the recently introduced single-tree complex wavelet transform (ST-CWT), have been used. The proposed techniques are compared with the Gabor wavelet transform (GWT). Moreover, a computational complexity analysis of the three techniques has been carried out, which suggests that ST-CWT and DT-CWT outperform the GWT in terms of computational complexity.

The last contribution is discussed in Chapter 6, where a new approach for face recognition from video sequences using a new metric called the fitness measure was introduced. The training images in the gallery set and the frames in the probe video were transformed to feature vectors using either PCA or LBP techniques. Two scenarios were conducted using this approach. In the first, the fitness value for each image in the gallery set was calculated, and the gallery images with the lowest fitness values were discarded after each processed frame; the remaining gallery images were used to form a new space before moving to the subsequent frame, and the same process was repeated, accumulating the fitness values until the last frame of the probe video. In the second scenario, the fitness values for the images in the gallery set were calculated and accumulated at each frame without updating the gallery set. In both experiments, upon reaching the last frame, the gallery image with the highest fitness value is declared to be the correct person in the probe video. In many experiments, the person was recognized correctly before reaching the last frame using the proposed fitness measure.
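The gallery-update loop just described can be summarized in a short sketch. The exact fitness formula is defined later in Chapter 6; here an inverse-distance score stands in for it, so the L2 distance, the score itself, and the one-image-per-frame discard policy should all be read as assumptions for illustration only.

```python
import numpy as np

def recognize_video(gallery, labels, frames, update_gallery=True):
    """Adaptive video recognition with an accumulated fitness measure.
    gallery: (n, d) feature vectors (PCA or LBP); frames: iterable of
    (d,) probe-frame feature vectors. The inverse-distance fitness
    below is only a stand-in for the thesis's measure."""
    gallery = np.asarray(gallery, dtype=float)
    labels = list(labels)
    fitness = np.zeros(len(gallery))
    for frame in frames:
        dists = np.linalg.norm(gallery - frame, axis=1)  # L2 distances
        fitness += 1.0 / (1e-8 + dists)                  # assumed fitness
        if update_gallery and len(gallery) > 1:
            worst = int(np.argmin(fitness))              # outlier image
            gallery = np.delete(gallery, worst, axis=0)
            fitness = np.delete(fitness, worst)
            del labels[worst]
    # Identity with the highest accumulated fitness wins.
    return labels[int(np.argmax(fitness))]
```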

1.3 Outline of the Dissertation

In this introductory chapter, we have defined the problem and outlined the contributions we make in this work.

Chapter 2 is about the general pattern recognition problem and introduces the methods used to extract the features that best represent an object. The modules of a general pattern recognition system are listed and the function of each module is explained. The chapter also discusses the factors that might affect the performance of a face recognition system, such as illumination, occlusion, facial expressions and aging.

Chapter 3 explains two of the well-known feature extraction and dimensionality reduction techniques, namely principal component analysis and linear discriminant analysis. A performance study of the two techniques on the face recognition problem is also included. Moreover, a neural network was added to the system as a classifier instead of a distance measure such as the Euclidean distance; the neural network takes the PCA and LDA feature vectors as its input.

Chapter 4 introduces the idea of multiresolution face recognition using the discrete wavelet transform together with PCA- and LDA-extracted feature vectors. The two approaches adopted in this chapter for combining the extracted feature vectors at different subbands are explained.

One of the most important contributions of this dissertation is introduced in Chapter 5. The chapter explains the complex wavelet transform together with its two main subcategories, the dual-tree complex wavelet transform (DT-CWT) and the single-tree complex wavelet transform (ST-CWT). A comparison among these two complex wavelet transforms and the Gabor wavelet transform is provided. Moreover, a computational complexity analysis showing the superiority of the DT-CWT and ST-CWT over the GWT is carried out.

Chapter 6 introduces our proposed adaptive approach for face recognition from video sequences using a new metric called the fitness measure. The fitness value calculation for each image in the gallery set (the training images) depends on the distance between the tested feature vector and the feature vectors from the gallery set. The chapter explains how recognition is improved by accumulating the fitness measure through the probe video frames, with or without updating the face space used during the recognition process.

At the end of the dissertation, conclusions and discussions on the introduced approaches and simulation results are presented in Chapter 7. Furthermore, possible future work that may help improve the performance of, or modify, some of the approaches is included and discussed.
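Since the comparison in Chapter 5 hinges on the cost of Gabor feature extraction, a minimal sketch of the Gabor filter-bank stage (four scales, six orientations, magnitude responses downsampled and concatenated, matching the layout of Figure 5.1) is given below using OpenCV. The kernel size and bandwidth parameters are placeholder values, not those tuned in the thesis.

```python
import cv2
import numpy as np

def gabor_features(img, scales=(4, 8, 16, 32), n_orient=6, down=4):
    """Convolve with a bank of Gabor kernels (scales x orientations),
    take the magnitude of each quadrature pair of responses, then
    downsample and concatenate. Parameter values are placeholders."""
    feats = []
    for lam in scales:                        # wavelength ~ scale
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            # Even/odd (cosine/sine phase) kernels form a quadrature pair.
            even = cv2.getGaborKernel((31, 31), lam / 2, theta, lam, 0.5, 0)
            odd = cv2.getGaborKernel((31, 31), lam / 2, theta, lam, 0.5,
                                     np.pi / 2)
            re = cv2.filter2D(img, cv2.CV_32F, even)
            im = cv2.filter2D(img, cv2.CV_32F, odd)
            mag = cv2.magnitude(re, im)[::down, ::down]  # downsampling
            feats.append(mag.ravel())
    return np.concatenate(feats)
```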


CHAPTER 2 FACE AS A BIOMETRIC RECOGNITION SYSTEM

2.1 Introduction

Pattern recognition is a scientific discipline whose aim is the classification of data, objects, or samples into different categories or classes. Depending on the application, these objects or samples can be images, signals, or measurements that need to be recognized and classified; they are referred to using the common term patterns. Pattern recognition was mostly the output of theoretical research in the area of statistics. As with everything else, the rapid development of computers increased the demand for practical applications of pattern recognition, which in turn placed new demands for further theoretical improvements. As society evolves from the industrial to the postindustrial phase, automation in product manufacturing becomes drastically more important. This tendency has helped put pattern recognition at the leading edge of present-day engineering applications and research.

Pattern recognition is an essential part of most intelligent systems built for classification and identification. In order to build a robust pattern recognition system, it is crucial to understand how humans actually differentiate among


different objects. Each object usually has features that differentiate it from others. As humans, it is easy for us to separate a car from a bus, or a cat from a dog, using, for example, their shapes. The more features used to classify an object, the better the performance that can be achieved.

A simple example of a classification task is separating apples from oranges on a conveyor belt in a factory. Figure 2.1 shows two images, one of a red apple and the other of an orange. The two patterns are visually different: the pattern of Figure 2.1(a) is an apple representing class X, and that of Figure 2.1(b) is an orange representing class Y. We assume that the conveyor belt carries many fruits, some of which are known to originate from class X and some from class Y. The first step is to identify the measurable quantities that make these two patterns distinct from each other. In our search for features, we can focus on the physical differences between oranges and apples. We might capitalize on the observation that an apple's color is typically red while an orange's color is orange. Another observation is the shape difference between an apple and an orange, and one more is the difference in surface smoothness between the two classes. We now have three features for classification: the color x1, the shape x2, and the surface smoothness x3. If we ignore how these features might be measured in practice, we realize that the feature extractor has thus reduced the image of each pattern to a point, or feature vector, x in a three-dimensional feature space, where

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

FIGURE 2.1: Examples of two patterns corresponding to (a) class X and (b) class Y.

Each point corresponds to a different image from the available database. It turns out that class X patterns tend to spread in a different area from class Y patterns, and a plane appears to be a good candidate for separating the two classes. Let us now assume that we are given a new image with a pattern in it and that we do not know to which class it belongs. We measure the values of the surface smoothness, shape and color of the pattern in the image and plot the corresponding point, shown by the asterisk (*) in Figure 2.2. It is then sensible to assume that the unknown pattern is more likely to be an apple than an orange.


FIGURE 2.2: Plot of the feature vectors of the three features (color, shape, surface smoothness) for a number of different images originating from class X (●) and class Y (■).

One important thing to consider when choosing features is their redundancy. It is important to avoid using redundant or correlated features, which will not improve the decisions the system makes but will instead slow it down. The preceding artificial classification task has outlined the rationale behind a large class of pattern recognition problems. The measurements used for the classification, in this case the surface smoothness, the shape value and the color, are known as features, and each feature vector uniquely identifies a single pattern (object). Features and feature vectors will be treated as random variables and vectors, respectively. This is natural, as the measurements resulting from different patterns exhibit random variation, due partly to the measurement noise of the measuring devices and partly to the different characteristics of each pattern [17].

The plane in Figure 2.2 is known as the decision boundary, and it constitutes the classifier, whose role is to divide the feature space into regions that correspond to either class X or class Y. If a feature vector x corresponding to an unknown pattern falls in the class X region, it is classified as class X; otherwise it is classified as class Y. This does not necessarily mean that the decision is correct: if it is not, a misclassification has occurred. In order to draw the decision boundary in Figure 2.2 we exploited the fact that we knew the label (class X or Y) of each point in the figure. The patterns or feature vectors whose true class is known, and which are used for the design of the classifier, are known as training patterns or training feature vectors.
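As a toy illustration of these ideas, the following sketch classifies a new pattern by its nearest class mean in the three-dimensional feature space. The training values are invented purely for illustration; the implied decision boundary is the plane equidistant from the two class means.

```python
import numpy as np

# Training feature vectors [color, shape, smoothness]; values invented.
X_train = np.array([[0.9, 0.7, 0.8],   # apples  (class X)
                    [0.8, 0.6, 0.9],
                    [0.2, 0.5, 0.3],   # oranges (class Y)
                    [0.3, 0.4, 0.2]])
y_train = np.array(["X", "X", "Y", "Y"])

def classify(x):
    """Nearest class mean: the implicit decision boundary is the plane
    equidistant from the two class means."""
    means = {c: X_train[y_train == c].mean(axis=0) for c in set(y_train)}
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

print(classify(np.array([0.85, 0.65, 0.75])))  # -> "X" (apple)
```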

2.2 Pattern Recognition System

A biometric pattern recognition system acquires biometric data from an individual, extracts a salient feature set from the data, compares this feature set against the feature sets stored in a database, and executes an action based on the result of the comparison. A typical biometric system can be viewed, as in Figure 2.3, as having five main modules: an acquisition module, a preprocessing module, a feature extraction module, a classification module, and a database module [18]. Each of these modules is described below.

1. Acquisition module: A suitable biometric reader, camera or scanner is required to acquire the raw biometric data of an individual. To obtain fingerprint images, for example, an optical fingerprint sensor may be used to acquire the friction ridge structure of the fingertip. The acquisition module defines the human-machine interface and is, therefore, crucial to the performance of the biometric system. Since most biometric data are acquired as images, the quality of the raw data is also affected by the characteristics of the camera technology that is used.

FIGURE 2.3: Typical modules of a general pattern recognition system (data acquisition, pre-processing, feature extraction, classification, database) [18].

2. Pre-processing module: The biometric data acquired by the system might not be of good quality, so it is first evaluated in order to determine its suitability for further processing. Moreover, the acquired data is subjected to signal enhancement in order to improve its quality. Some of the following preprocessing steps might be applied prior to passing the data to the feature extraction module.

• Size normalization: resizing all data to one default size.
• Illumination normalization: specific to image-based recognition systems; its general purpose is to decrease lighting effects when the observed images are captured in different environments.
• Histogram equalization: adjusting the image so that each intensity level contains an equal number of pixels.
• Median filtering: a simple and very effective noise removal filter, normally used to reduce noise in an image.
• High-pass filtering: removes low-frequency gradients in the image without affecting high-frequency gradients.
• Background removal: removing the background is important for face recognition systems in which the entire information contained in the image is used.
• Translational and rotational normalization: it is possible to work on a face image in which the head is shifted or rotated. The head plays a very important role in the determination of facial features, especially for face recognition systems based on frontal views of faces, so it may be desirable for the pre-processing module to determine and normalize shifts and rotations in the head position.

3. Feature extraction module: The biometric data is then processed and a set of salient discriminatory features is extracted to represent the underlying biometric trait. For example, the positions of and distances among the eyes, nose, and mouth in a face image are extracted by the feature extraction module in some face-based biometric systems. During training, this feature set is stored in the database and is commonly referred to as a template.

4. Classification module: The extracted features are compared against the stored templates to generate match scores. In a face-based biometric system, the minimum distance between the input and the template feature sets is determined and a match score reported. The match score may be moderated by the quality of the presented biometric data. The classification module also contains a decision-making module, in which the distances are used to either accept or deny a claimed identity.

5. Database module: The database acts as the repository of biometric information. During the training stage, the feature set extracted from the raw biometric sample is stored in the database and used later by the classification module for the classification or identification of input feature sets. The template of a user can be extracted from a single biometric sample, or generated by processing multiple samples; thus the minutiae points (ridges) of a finger, for example, may be extracted after mosaicing multiple samples of the same finger. Some systems store multiple templates in order to account for the within-class variations associated with a user; face recognition systems may store multiple templates of an individual, each corresponding to a different facial pose with respect to the camera.

In the face recognition literature, the raw biometric images stored in the database are often referred to as gallery, stored or training images, while those acquired during authentication are known as probe, query, input or test images. Figure 2.4 shows most of the known biometrics that can be used to authenticate an individual.
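Several of the pre-processing steps listed above chain naturally into a small pipeline. The sketch below uses OpenCV; the target size, the filter kernel width, and the particular choice and order of steps are illustrative assumptions, not a configuration prescribed by this thesis.

```python
import cv2

def preprocess(img, size=(128, 128)):
    """Chain a few pre-processing steps: size normalization, histogram
    equalization, and median filtering. Expects an 8-bit image
    (grayscale or BGR)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    gray = cv2.resize(gray, size)          # size normalization
    gray = cv2.equalizeHist(gray)          # histogram equalization
    gray = cv2.medianBlur(gray, 3)         # median noise filtering
    return gray
```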


FIGURE 2.4: Examples of biometrics that can be used for human recognition: keystroke dynamics, gait, palm vein, ear, speech, hand geometry, face, DNA, facial thermogram, iris, signature, fingerprint and palmprint.

2.3 Face Recognition from Still Images

Face recognition is a non-intrusive method, and facial characteristics are probably the most common biometric features used by humans to recognize one another. The applications of facial recognition range from static, controlled "mug-shot" authentication to dynamic, uncontrolled face identification against a complex background. The most popular approaches to face recognition [19] are based on either (i) the location and shape of facial characteristics, such as the eyes, eyebrows, nose, lips, and chin, and their spatial relationships, or (ii) the overall (global) analysis of the face image, which represents a face as a weighted combination of a number of canonical faces. While the authentication performance of commercially available face recognition systems is reasonable [20], they impose a number of restrictions on how the facial images are obtained, often requiring a fixed and simple background with controlled illumination. These systems also have difficulty in matching face images captured from two different views, under different illumination conditions, and at different times. It is questionable whether the face itself, without any contextual information, is a sufficient basis for recognizing a person from a large number of identities with an extremely high level of confidence. In order for a facial recognition system to work well in practice, it should automatically (i) detect whether a face is present in the acquired image, (ii) locate the face if there is one, and (iii) recognize the face from a general viewpoint (i.e., from any pose) under different ambient conditions. Figure 2.5 shows the general modules of a face recognition system.
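The first two of these requirements, detection and localization, are routinely handled with an off-the-shelf detector before the recognition stage. A minimal sketch using OpenCV's bundled Haar cascade (the model file path assumes a standard opencv-python install):

```python
import cv2

# Haar cascade shipped with OpenCV; path assumes a standard install.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_and_crop(frame):
    """Detect whether a face is present and, if so, return the cropped
    face region (the detection + localization steps)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                        # no face present
    x, y, w, h = faces[0]                  # take the first detection
    return gray[y:y + h, x:x + w]
```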


FIGURE 2.5: Typical modules of a face recognition system (acquisition, pre-processing, feature extraction, classification, face database).

2.3.1 Factors that affect Face Recognition Performance

For faces to be a useful biometric, the facial features used for face recognition should remain invariant to factors unrelated to person identity that modify face image appearance. While theory and some data suggest that many of these factors are difficult to handle, it is not clear where exactly the difficulties lie and what their causes may be. In this section, we quantify the difficulties in face recognition as a function of variation in the factors that influence face image acquisition and of individual differences among subjects. These difficulties can be represented by changes in the following factors:

• Pose variation: As the camera pose changes, the appearance of the face can change due to (a) projective deformation, which leads to stretching and foreshortening of different parts of the face, and (b) occlusion of parts of the face. If we have seen faces only from one viewing angle, it is in general difficult to recognize them from other angles.

FIGURE 2.6: (a) Frontal face image, (b) pose variation example.

• Illumination: Just as with pose variation, illumination variation is inevitable. Ambient lighting changes greatly within and between days and between indoor and outdoor environments. Due to the 3D shape of the face, a direct lighting source can produce strong shadows and shading that emphasize or diminish certain facial features. While this problem is less noticeable for humans, it can cause major problems for computer vision.

FIGURE 2.7: (a) Frontal face image, (b) illumination variation example.

• Facial expression: The face is a non-rigid object. Facial expressions of emotion and paralinguistic communication, along with speech acts, can and do produce large variation in facial appearance; the number of possible changes in facial expression is reportedly in the thousands. Because facial expression affects the apparent geometrical shape and position of the facial features, its influence on recognition may be greater for geometry-based algorithms than for holistic algorithms.

FIGURE 2.8: (a) Frontal face image, (b) facial expression example.

• Occlusion: The face may be occluded by other objects in the scene or by sunglasses or other paraphernalia. Occlusion may be unintentional or intentional; under some conditions individuals may be motivated to thwart recognition efforts by covering portions of their face.

FIGURE 2.9: (a) Frontal face image, (b) occlusion example.

• Individual factors: Algorithms may be more or less sensitive for men or women or for members of different ethnic groups. Intuitively, females might be harder to recognize because of greater use of makeup and day-to-day variation in structural facial features. Male and female faces differ in both local features and shape: men's faces on average have thicker eyebrows and greater texture in the beard region, while in women's faces the distance between the eyes and eyebrows is greater, the protuberance of the nose smaller, and the chin narrower than in men's.

FIGURE 2.10: (a) Man's face image, (b) woman's face image.

• Facial aging: While studying the role played by the external factors above is crucial, it is also important to study the role played by natural phenomena such as facial aging. Aging effects on human faces manifest in different forms at different ages: they appear more as changes in the shape of the cranium during one's younger years, and more as wrinkles and other skin artifacts during one's older years.

FIGURE 2.11: (a) Young woman, (b) the same woman at old age.

2.3.2 Literature Review

Over the last few decades, many researchers gave up working on the face recognition problem because of the inefficiency of the methods used to represent faces. Face representation has been performed using two categories of approaches. The first category is the global, or appearance-based, approach, which uses holistic texture features and is applied to the whole face or to a specific region of it. The second category is the feature-based, or component-based, approach, which uses the geometric relationships among facial features such as the mouth, nose, and eyes. Wiskott et al. [21] implemented a feature-based approach with a geometrical model of the face given by a 2-D elastic graph. Another feature-based example independently matched templates of three facial regions (eyes, mouth and nose); the configuration of the features was unconstrained, since the system did not include a geometrical model [22].

The principal component analysis (PCA) method [1,2], also called eigenfaces [3,23,24], is an appearance-based technique that is widely used for dimensionality reduction and has recorded great performance in face recognition. PCA-based approaches typically include two phases: training and classification. In the training phase, an eigenspace is established from the training samples using PCA, and the training face images are mapped to the eigenspace for classification. In the classification phase, an input face is projected to the same eigenspace and classified by an appropriate classifier. In contrast to PCA, which encodes information in an orthogonal linear space, the linear discriminant analysis (LDA) method [4,5,25], also known as the fisherfaces method, is another appearance-based technique; it encodes discriminatory information in a linearly separable space whose bases are not necessarily orthogonal.

Recently, wavelets have attracted many researchers' attention in the field of face recognition [26,27]. Ekenel & Sankur [6] proposed an approach to multiresolution face recognition using the discrete wavelet transform (DWT) to extract multiple subband face images. Building on the approach in [6], researchers in [28,29] used the multiresolution idea to fuse faces at different subbands using the DWT, applying PCA and LDA for dimensionality reduction prior to recognition. A dynamic link architecture framework of the Gabor wavelet for face recognition was used in [30]. Wiskott et al. [21] subsequently developed a Gabor wavelet-based elastic bunch graph matching (EBGM) method to label and recognize human faces. Zhang et al. [31] introduced an object descriptor based on the histogram of Gabor phase patterns for face recognition. Another method was proposed to determine the optimal positions for extracting Gabor features such that the number of feature points is minimized while the representation capability is maximized [32]. An independent Gabor features (IGFs) method for face recognition [13], based on independent component analysis [33], was also introduced. Recently, [34] and [35] used the dual-tree complex wavelet transform (DT-CWT) and the Gabor wavelet for facial feature extraction; in both papers the authors report comparable performance for the DT-CWT with more efficient computational complexity. In [36] the DT-CWT was applied on a spectral histogram PCA space for face detection. In [37] and [38] the authors used orthogonal neighborhood preserving projections (ONPP) and supervised kernel ONPP with the DT-CWT for face recognition; their preliminary results indicate that KONPP produces superior performance.

2.4 Face Recognition from Videos

Generally speaking, face recognition from video can be simplified to face recognition from still images. The probe set, consisting of a video sequence, is converted into frames, which in turn are fed to the recognition system to extract their feature sets; these are compared with the feature sets of the still face images in the gallery/training set, already computed at an earlier stage. Though significant research has been conducted on still-to-still recognition, research efforts on still-to-video recognition are relatively fewer due to the following challenges [39] in typical surveillance applications: poor video quality, significant illumination and pose variations, and low image resolution.

Most existing video-based recognition systems [40] attempt the following: the face is first detected and then tracked over time, and only when a frame satisfying certain criteria (size and pose) is acquired is recognition performed using a still-to-still recognition technique. To this end, the face region is cropped from the frame and transformed or registered using appropriate transformations. This tracking-then-recognition approach attempts to resolve the uncertainties in tracking and recognition sequentially and separately. Several issues in the tracking-then-recognition approach remain unresolved: the criteria for selecting good frames and the estimation of parameters for registration. Moreover, still-to-still recognition does not effectively exploit temporal information. A common strategy is to select several good frames, perform recognition on each frame, and then vote on these recognition results for a final decision [41], as in the sketch below.
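As a minimal illustration of this voting strategy, consider the following sketch; the callables `quality_ok` (frame size/pose criteria) and `recognize_frame` (any still-image recognizer) are hypothetical placeholders, not part of any cited system.

```python
from collections import Counter

def recognize_from_video(frames, recognize_frame, quality_ok):
    """Vote-based still-to-video recognition: classify each acceptable
    frame independently, then return the majority identity."""
    votes = [recognize_frame(f) for f in frames if quality_ok(f)]
    if not votes:
        return None  # no frame satisfied the size/pose criteria
    identity, _ = Counter(votes).most_common(1)[0]
    return identity
```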

2.4.1 Literature Review

The literature on face recognition from video or multiple images is relatively small compared to that on face recognition from still images. One reason for this scarcity is the difficulty of obtaining high-quality video; another is the challenge of performing feature extraction over video sequences. Zhou et al. [41] systematically investigated still-to-video and video-to-video face recognition using a probabilistic framework. Their system differs significantly from [42], which performs simultaneous object tracking and verification using a generic approach: instead of parameterizing only the tracking motion vector as in [42], the authors of [41] parameterized both the tracking motion vector and the identity variables in the state-space model. Radial basis function (RBF) networks were also used for tracking and recognition, in an automatic person authentication system that successively processes video images until a high recognition confidence is reached [43]. A system called person-spotter, described in [44], used an elastic graph matching scheme to recognize people; it was able to capture, track, and identify a person walking towards or passing a CCD camera. Y. Zhang et al. [41] extended the formulation of the probabilistic appearance-based face recognition approach for still images to work with multiple images and video sequences; the algorithm they proposed was robust to partial occlusions and to orientation and expression changes.

2.5 Face Databases Used in Performance Analysis

2.5.1 Still image databases

a) Olivetti/AT&T face database

The ORL database [45] consists of 400 images acquired from 40 persons (i.e., ten different images of each of 40 distinct subjects of both genders), taken over a period of two years with variations in facial expression and facial details. All images were taken against a dark background, with the subjects in an upright frontal position, allowing tilting and rotation of up to 20 degrees and scale variation of up to about 10%. All images are grayscale with a resolution of 92×112 pixels.

FIGURE 2.12: Face examples from the ORL database.


b) FERET database

A common scenario from the FERET database [46] was used, in which 600 frontal face images from 200 subjects were selected, with all subjects in an upright, frontal position. The 600 face images were acquired under varying illumination conditions and facial expressions. Each subject has three images of size 256×384 with 256 gray levels.

FIGURE 2.13: Face examples from the FERET database.

2.5.2 Video databases

a) BANCA database

The BANCA database and an associated experimental protocol were developed by a group of European universities [47]. BANCA is a bimodal database containing face video and speech samples captured simultaneously. 208 subjects were recorded (52 subjects in each of four European languages). Each subject participated in 12 sessions, of which four represented a controlled scenario, four a degraded scenario, and four an adverse scenario. A high-quality camera was used in the controlled and adverse scenarios, and a webcam in the degraded scenario. Each session contained both a genuine identity claim and an impostor claim.

FIGURE 2.14: Examples from the BANCA database videos. Left: controlled, middle: degraded, right: adverse scenarios.

In all of the listed databases, the face images were cropped from the whole image or frame using the face detection algorithm in [76]. The resulting images were resized to 128×128 pixels and stored for later use in the proposed system. Whenever detection failed, the face image was cropped and stored manually.
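As an illustrative sketch of this preprocessing step only, the code below uses OpenCV's Haar cascade detector as a stand-in for the detection algorithm in [76], which is not reproduced here; the fallback mirrors the manual-cropping rule above.

```python
import cv2

# Stand-in detector; the thesis itself uses the algorithm in [76].
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_and_resize(gray_image, size=(128, 128)):
    """Detect the largest face, crop it, and resize it to 128x128.
    Returns None when detection fails, so the caller can crop manually."""
    faces = cascade.detectMultiScale(gray_image,
                                     scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest box
    return cv2.resize(gray_image[y:y + h, x:x + w], size)
```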

2.6 Summary

Pattern recognition is an operation in which the system's task is to classify input data into different categories or classes according to predefined salient features of each category. In this chapter we discussed biometric recognition systems in general: the block diagram of a general biometric system was shown, and the modules within the system were discussed, with a focus on the problem of face recognition as a biometric system. The history of related work on face recognition from both still images and video sequences was reviewed. We also listed and discussed the main factors that usually affect face recognition performance, such as illumination variations, facial expressions, and pose variations. Example face images from some well-known face databases, which are used in this work for the simulation and evaluation of the proposed methods, were provided together with a brief explanation of each database.


CHAPTER 3

PCA & LDA BASED FACE RECOGNITION APPROACHES

3.1 Introduction

In this chapter, two of the most well-known statistical approaches for face recognition are introduced: principal component analysis (PCA) and linear discriminant analysis (LDA). Both were implemented, and their projected feature vectors were fed as input to a feedforward neural network. The results of this system were compared with those obtained by using PCA or LDA directly with a distance classifier such as the Euclidean distance.

3.2 Literature Review

The principal component analysis (PCA) method [1,2], or eigenfaces [3,23], is an appearance-based technique widely used for dimensionality reduction that has recorded strong performance in face recognition. PCA-based approaches typically include two phases: training and classification. In the training phase, an eigenspace is established from the training samples using PCA, and the training face images are mapped to the eigenspace. In the classification phase, an input face is projected to the same eigenspace and classified by an appropriate classifier. In contrast to PCA, which encodes information in an orthogonal linear space, the linear discriminant analysis (LDA) method [4,5], or fisherfaces, is another appearance-based technique; it encodes discriminatory information in a linearly separable space whose bases are not necessarily orthogonal.

In this chapter, two face recognition systems are explained: one based on PCA followed by a feedforward neural network (FFNN), called PCA-NN, and the other based on LDA followed by an FFNN, called LDA-NN. Both systems consist of two phases: the PCA or LDA feature extraction phase and the neural network classification phase. The introduced systems improve the recognition performance over the conventional PCA and LDA face recognition systems.

Neural networks are among the most successful decision-making systems and can be trained to perform complex functions in various fields of application, including pattern recognition, optimization, identification, classification, speech, vision, and control systems. In an FFNN, the neurons are organized in layers. The FFNN requires a training procedure in which the weights connecting the neurons in consecutive layers are calculated from the training samples and target classes. After generating the eigenvectors using the PCA or LDA method, the projection vectors of the face images in the training set are calculated and then used to train the neural network; these architectures are called PCA-NN and LDA-NN for the eigenfaces and fisherfaces methods, respectively (a minimal sketch of this pipeline is given at the end of this section).

The first part of the chapter introduces the PCA and LDA techniques and provides theoretical and practical implementation details of the systems. Both techniques are explained using a wide range of illustrations, including graphs, flowcharts, and face images. The second part introduces neural networks in general and the FFNN in particular; the training and test phases of the FFNN are explained in detail. Finally, the PCA-NN and LDA-NN face recognition systems are described, and their performance is compared with that of conventional PCA and LDA based face recognition systems.
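As a minimal sketch of the PCA-NN pipeline under assumed tooling (scikit-learn, with an arbitrary hidden-layer size; the thesis does not prescribe either), projected feature vectors are used to train the network:

```python
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

def train_pca_nn(train_faces, train_labels, n_components=50):
    """PCA-NN sketch: map faces to the eigenspace, then train a
    feedforward network on the projection coefficients."""
    pca = PCA(n_components=n_components)
    features = pca.fit_transform(train_faces)  # rows: flattened faces
    net = MLPClassifier(hidden_layer_sizes=(60,), max_iter=1000)
    net.fit(features, train_labels)
    return pca, net

def classify(pca, net, face):
    """Project a probe face into the same eigenspace and classify it."""
    return net.predict(pca.transform(face.reshape(1, -1)))[0]
```

The LDA-NN variant is analogous, with `LinearDiscriminantAnalysis` taking the place of PCA in the feature extraction phase.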

3.3 Principal Component Analysis

Principal component analysis (PCA), or the Karhunen-Loève transformation [48], is a standard technique used in statistical pattern recognition and signal processing for data reduction and feature extraction [49]. Since a pattern often contains redundant information, mapping it to a feature vector removes this redundancy while preserving most of the intrinsic information content of the pattern. The extracted features play a major role in distinguishing input patterns.

A 2D face image of size N × N can be considered as a one-dimensional vector of dimension N². For example, a face image from the ORL (Olivetti Research Labs) database of size 112 × 92 can be considered as a vector of dimension 10,304, or equivalently a point in a 10,304-dimensional space. An ensemble of images maps to a collection of points in this huge space. Images of faces, being similar in overall configuration, are not randomly distributed in this huge image space and can therefore be described by a relatively low-dimensional subspace. The main idea of principal component analysis is to find the vectors that best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call the "face space". Each of these vectors is of length N², describes an N × N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces".

Let the training set of face images be Γ_1, Γ_2, …, Γ_M; then the average of the set is defined by

    \Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i                                  (3.1)

Each face differs from the average by the vector

    \Phi_i = \Gamma_i - \Psi, \quad i = 1, \ldots, M                            (3.2)

This set of very large vectors is then subjected to principal component analysis, which seeks a set of M orthonormal vectors U_k that best describes the distribution of the data. The k-th vector, U_k, is chosen such that

    \lambda_k = \frac{1}{M} \sum_{i=1}^{M} \left( U_k^T \Phi_i \right)^2        (3.3)

is a maximum, subject to

    U_i^T U_k = \delta_{ik} = \begin{cases} 1, & \text{if } i = k \\ 0, & \text{otherwise} \end{cases}    (3.4)

The vectors U_k and scalars λ_k are the eigenvectors and eigenvalues, respectively, of the covariance matrix

    C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = A A^T                      (3.5)

where the matrix A = [Φ_1, Φ_2, …, Φ_M]. The covariance matrix C, however, is an N² × N² real symmetric matrix, and calculating its N² eigenvectors and eigenvalues is an intractable task for typical image sizes. A computationally feasible method to find these eigenvectors is needed. Consider the eigenvectors ν_i of AᵀA such that

    A^T A \nu_i = \mu_i \nu_i                                                   (3.6)

Pre-multiplying both sides by A, we have

    A A^T A \nu_i = \mu_i A \nu_i                                               (3.7)

which shows that Aν_i are the eigenvectors and μ_i the corresponding eigenvalues of C = AAᵀ. Following this analysis, we construct the M × M matrix Σ = AᵀA, where Σ_mn = Φ_mᵀΦ_n, and find the M eigenvectors ν_i of Σ. These vectors determine linear combinations of the M training set face images that form the eigenfaces U_i:

    U_i = \sum_{k=1}^{M} \nu_{ik} \Phi_k, \quad i = 1, \ldots, M                (3.8)
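Before turning to the practical implications of this reduction, a short NumPy sketch of Eqs. (3.1)-(3.8) may help; it is an illustrative implementation of the M × M trick, not the thesis code.

```python
import numpy as np

def compute_eigenfaces(images, num_components):
    """Eigenfaces via the M x M matrix of Eq. (3.6) instead of the
    N^2 x N^2 covariance matrix of Eq. (3.5).

    images: array of shape (M, N*N), one flattened face per row.
    """
    mean_face = images.mean(axis=0)     # Psi, Eq. (3.1)
    A = (images - mean_face).T          # columns are Phi_i, Eq. (3.2)

    Sigma = A.T @ A                     # M x M, Sigma_mn = Phi_m^T Phi_n
    eigvals, V = np.linalg.eigh(Sigma)  # eigenvalues in ascending order

    order = np.argsort(eigvals)[::-1][:num_components]
    U = A @ V[:, order]                 # U_i = sum_k v_ik Phi_k, Eq. (3.8)
    U /= np.linalg.norm(U, axis=0)      # re-normalize columns to unit length
    return mean_face, U.T               # eigenfaces as rows

def project(face, mean_face, eigenfaces):
    """Coordinates of a flattened face in the face space."""
    return eigenfaces @ (face - mean_face)
```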

With this analysis, the calculations are greatly reduced, from the order of the number of pixels in the images (N²) to the order of the number of images in the training set (M). In practice, the training set of face images will be relatively small (M