Multimodal Feature Level Fusion based on Particle Swarm Optimization with Deep Transfer Learning

Pedro H. Silva*‡, Eduardo Luz*‡, Luiz A. Zanlorensi Jr.†, David Menotti† and Gladston Moreira*

* Computing Department, Universidade Federal de Ouro Preto, Ouro Preto, MG, Brazil 35400-000
† Department of Informatics, Federal University of Paraná, Curitiba, PR, Brazil 81531-990
‡ Both authors contributed equally to this work

Abstract—Many biometric systems rely on a single modality, most often face, iris, or fingerprint. Despite the good accuracies obtained with single modalities, such systems are more susceptible to attacks (e.g., spoofing) and to noise of all kinds, especially in non-cooperative (in-the-wild) environments. Since non-cooperative environments are becoming more and more common, approaches involving multimodal biometrics have received increasing attention. One challenge in multimodal biometric systems is how to integrate the data from different modalities. First, we propose a deep transfer learning approach, fine-tuned from a model trained for face recognition, that achieves an outstanding representation for the iris modality alone. We then perform feature-level fusion by feature selection, using Particle Swarm Optimization (PSO) over a pool composed of the proposed fine-tuned iris representation and a periocular representation from our previous work. We compare this feature-level fusion against three basic rules for matching at score level: sum, multiplication, and minimum. Results are reported for the iris and periocular modalities on the NICE.II competition database, in an open-world scenario. The experiments on the NICE.II competition database show that our transfer learning representation for the iris modality achieves a new state of the art, i.e., a decidability of 2.22 and an EER of 14.56%. The feature-level fusion of the periocular and iris modalities by PSO also yields a new state-of-the-art result, i.e., a decidability of 3.45 and an EER of 5.55%.

I. INTRODUCTION

Biometric-based person recognition systems have developed rapidly in recent years and have been employed in several applications, such as border-crossing control, access to controlled environments, personal computers, and even smartphones. Face recognition is a well-studied problem that reaches high accuracies [20], [25]. In controlled environments and at short distances, i.e., constrained environments, there are very efficient (almost perfect) methods for face recognition; however, under adverse conditions and in non-cooperative environments, performance declines significantly, as demonstrated on YouTube Faces in the Wild (YFW; [33]) and Labeled Faces in the Wild (LFW; [13]). In non-cooperative environments, the face can be intentionally obscured by accessories, and the ocular region becomes an alternative for subject recognition. To consider an ocular-based biometric system in non-cooperative environments, a.k.a. in the wild, it is still necessary to achieve the high robustness and accuracy that have been reported for the face modality with deep learning. Among ocular modalities, the iris is considered the most

promising, reliable, and accurate biometric trait, providing high discrimination among subjects. Furthermore, the iris is stable as individuals age [6]. In controlled environments and at short distances (constrained environments), it is possible to find iris recognition approaches that are almost perfect [3], [5]. The periocular region has emerged as an option for situations in which the iris is compromised and, consequently, periocular-based subject recognition has gained attention [19]. Several popular techniques, recognized for their efficiency in diverse image recognition tasks, have been evaluated for recognizing people through the periocular region, such as local binary patterns (LBP), SIFT, SURF, HOG, and Gabor filters [2], [4], [12], [14], [27], [34]. However, under adverse conditions and in non-cooperative environments, performance declines significantly, as demonstrated in the NICE.II competition [21]. With the exception of the face modality, few investigations have addressed the use of deep learning to represent other biometric modalities [9], such as those based on the ocular region. Ghosh [9] argues that a number of issues remain open, among them whether the lack of large databases is the main obstacle to the use of deep learning in multimodal systems, since deep learning demands a large volume of data to successfully train very deep convolutional networks (CNs) [20], [25]. Deep network architectures often perform poorly when trained on small/reduced databases due to over-fitting, and transfer learning is one of the techniques explored in the literature to overcome this problem [10], [35]. Recent studies have shown outstanding results using transfer learning and convolutional networks on reduced databases in several computer vision tasks [8], [18]. Bringing the gains achieved with deep learning on face recognition to other modalities may enable robust multimodal systems with real potential for use in surveillance, as already shown in [17]. In this context, the aim of this work is to investigate how to merge deep representations of two modalities (iris and periocular region) by means of feature selection with a Particle Swarm Optimization (PSO) [15] algorithm. Our experiments show that the proposed fusion outperforms the state-of-the-art results on an uncontrolled/non-cooperative competition database: NICE.II. This work is organized as follows. In Section II, we describe the multimodal database used in this work, namely NICE.II, and review the works directly applied to it. The deep learning model [17] is briefly

described, together with the proposed method, in Section III. The experiments are presented and discussed in Section IV and, finally, the conclusions are drawn in Section V.

II. DATABASES AND RELATED WORKS

In this section, we describe NICE.II (UBIRIS.v2), the non-cooperative database considered in this work, and the current state-of-the-art methods.

1) The database: The Noisy Iris Challenge Evaluation (NICE) was the first competition created specifically to investigate the impact of ocular images acquired in uncontrolled environments on subject recognition. All images used in this competition come from the UBIRIS.v2 database [22]. The database used in the NICE.II competition was proposed by the same authors as UBIRIS.v2 [22] and therefore follows the same acquisition protocol. The UBIRIS.v2 database [22] has 11,102 images from 261 subjects. The images were acquired to mimic an uncontrolled scenario, at different distances, angles, and lighting, in order to simulate real noise conditions. The images have 800x700 resolution, 72 dpi, and 24-bit color. As the UBIRIS.v2 database is balanced with respect to the number of images per subject, we used it for the transfer learning process.

The NICE competition was held in two phases: one for the iris segmentation challenge, NICE.I (2008), and another for iris classification, NICE.II (2010). For this work, we are interested in the second phase, in which feature extraction and classification techniques were evaluated. Participants in the second phase (NICE.II) used the iris images segmented by the best performing method in NICE.I [21]. The competition was attended by 67 participants from 30 countries. The training database consists of 1,000 images from 171 subjects, poorly distributed. A selection of images was reserved exclusively for the official evaluation, also with 1,000 images, in this case from 141 subjects.

The official competition metric was decidability (d), which measures how distant the intra-class (genuine) and inter-class (impostor) score distributions are from each other [7]. The decidability can be defined as

$$d = \frac{|\mu_E - \mu_I|}{\sqrt{\frac{1}{2}\left(\sigma_I^2 + \sigma_E^2\right)}} \qquad (1)$$

where µI and µE are the means, and σI and σE the standard deviations, of the intra-class and inter-class distributions, respectively.
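For concreteness, a minimal sketch of Equation (1) in Python (our illustration; NumPy is assumed and the array names are hypothetical):

```python
import numpy as np

def decidability(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Decidability d (Eq. 1): separation between the intra-class (genuine)
    and inter-class (impostor) score distributions."""
    mu_i, mu_e = genuine.mean(), impostor.mean()
    sigma_i, sigma_e = genuine.std(), impostor.std()
    return abs(mu_e - mu_i) / np.sqrt(0.5 * (sigma_i**2 + sigma_e**2))

# Example: the iris-only means/stds shown in the legends of Fig. 5 (a)
# (0.46/0.21 genuine vs. 0.94/0.23 impostor) give d close to 2.2.
```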

Fig. 1. Images used in the NICE.II competition.

2) State-of-the-art methods: The method proposed in [29] achieved the best result in the NICE.II competition, reporting a decidability of 2.57. This method consists of preprocessing techniques and four approaches for feature extraction. To extract features from the iris, ordinal measures and color histograms were applied. Periocular features were extracted

with Texton Histograms and semantic information. All four outputs were then fused at score level. The best performance in the NICE.II competition considering only the iris modality was achieved by the approach proposed in [32], reporting a decidability of 1.82. In this approach, the iris is normalized [6] and partitioned into several segments, with the partitioning scheme depending on the quality of the iris segmentation. After partitioning, features are extracted with Gabor filters and an adaptive boosting algorithm (AdaBoost) is used to select the best features and compute the similarity. Another work that considered only iris images for recognition is presented in [28]. For feature extraction, two methodologies were used, one based on Log-Gabor filters and the other on Zernike moments. The analysis is performed only on the images reserved for the training phase of the NICE.II competition, that is, 1,000 images from 171 subjects. Images from the first 19 subjects were used for Log-Gabor parameter estimation, and the remaining 864 images, associated with 151 individuals, were used for evaluation. Although the reported result (decidability = 2.57) cannot be compared directly with the methods evaluated on the official NICE competition test set, the authors claim that the results are comparable. In [24], the authors used five techniques for feature extraction from iris and periocular data: SIFT, LBP, 1-D Wavelet Zero-Crossing, 2-D Dyadic Wavelet Zero-Crossing, and Comparison Maps. For each feature vector, a different matching approach was applied. The outputs of all methodologies are then merged by a logistic regression model; the result in the NICE competition is a decidability of 1.78 with an EER of 18.48%. The methodology proposed in [21] consists of combining information from the iris and the periocular region at the matching score level. The authors did not report results on the official NICE.II competition dataset; instead, they used 2,340 images from UBIRIS.v2 [22], which is a similar database. The technique used to extract iris features favors a robust representation of the texture by convolving the normalized iris image with a Multi-Lobe Differential Filter (MLDF) bank. For the periocular region, hand-crafted features are proposed to represent sclera color and geometry, as well as the shape and texture of the eyebrow. The reported result is an average decidability of 2.97. The authors of [1] used only iris texture information for unconstrained recognition with images obtained in visible wavelengths. The methodology consists of segmentation and

normalization of iris images, feature extraction, and weighted score-level fusion based on the sum rule. For the experimental evaluation, two subsets from the UBIRIS.v2 and MobBIO databases were used, composed of 2,250 images of 100 eyes from UBIRIS.v2 and 800 images of 200 eyes from MobBIO. In total, 8 features were extracted with wavelet transforms, keypoint-based descriptors, generic texture descriptors, and color information. These features were then combined to improve iris recognition. The best reported EERs were 22.04% on UBIRIS.v2 and 10.32% on MobBIO.

The state of the art for eye recognition in the visible spectrum in the open-world scenario, i.e., when not all classes to be recognized are present in the training set, is the methodology proposed in [16]. It consists of feature extraction with deep learning and matching using distance metrics. The authors took a model trained for face recognition (VGG) and performed transfer learning to the periocular region. They also demonstrated that data augmentation by translating, rotating, and cropping images, together with images generated by Generative Adversarial Networks (GANs) [11], improves the EER when the VGG is trained from scratch. The experiments reported EERs of 5.92% and 5.42%, and decidabilities of 3.47 and 3.53, on the NICE.II and MobBIO databases, respectively.

The most recent work on the UBIRIS.v2 and FRGC databases presents an approach using a CNN for periocular recognition [23]. The proposed methodology showed that excluding the ocular area (the sclera and the iris) and using only information from the region surrounding the eye (periocular) improves recognition performance. To demonstrate this, the authors compared recognition with features only from the ocular region, from the periocular area without the ocular region, and from the entire image (periocular and ocular). It is important to note that all images from the UBIRIS.v2 and FRGC databases were used. Experiments were performed in the closed-world scenario, i.e., all classes to be recognized are known at training time. The result, in our opinion, is the state of the art for eye recognition in the visible spectrum in the closed-world scenario, with an EER of 1.9% and a Rank-1 of 88% on UBIRIS.v2, and an EER of 1.1% and a Rank-1 of 92% on the FRGC database.

III. APPROACH

In this section, we present the proposed method, which aims at subject recognition with deep learning on two modalities: iris and periocular region. We describe the feature extraction process, how the data from both modalities are merged, and how classification is performed. After feature extraction, all features extracted from the periocular region and the iris are stored in a gallery. This process is illustrated in Figure 4 (a).

A. Feature Extraction

1) Periocular: To the best of our knowledge, the model proposed in [17] is the state of the art for periocular representation on the NICE database under the open-world evaluation scenario. The methodology proposed in [17] consists of performing transfer learning from a model trained for face recognition (VGG) to the periocular region. The model is publicly available at http://www.decom.ufop.br/csilab/ (CSILAB), and we used it to extract features from the periocular region. Figure 3 (a) illustrates this model, which we call the Deep Eye Descriptor (DED).

Fig. 2. Iris normalization process (find the iris mask, isolate the iris and remove noise, apply the rubber sheet), represented in polar coordinates.

2) Iris: To represent the iris image, we followed the same methodology proposed in [17]: we fine-tune a VGG model, trained for face recognition, to the iris recognition domain by means of transfer learning. Figure 3 (b) illustrates the process, and we call the new model the Deep Iris Descriptor (DID). Transfer learning takes place when a model trained for one domain or task is used to accelerate or enhance learning in another domain/task [10]. Which layers are transferred is controlled by freezing their learning rates. As proposed in [17], we did not freeze any layer of the VGG network, thus transferring all layers except the last one, which is replaced to match the number of classes in the training database. After this modification of the VGG architecture, the pre-trained model is fine-tuned with normalized iris images. The iris normalization, also known as the Daugman rubber sheet [6], represents the iris in polar coordinates (rectangular shape), as illustrated in Figure 2. In this manner, the iris image can easily be shaped to fit the input constraints of the modified VGG model (224x224 RGB image). For fine-tuning, stochastic gradient descent is used to optimize the weights. For this work, we decided not to use automatic segmentation algorithms for iris localization, since errors in this step would propagate to the classification stage. Thus, we only use the ground-truth masks supplied with the UBIRIS.v2 and NICE.II databases. It is worth noting that we only have iris masks for 104 subjects of the UBIRIS.v2 database, and our transfer learning process is limited to them.
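The implementation in this paper uses MatConvNet (Section IV); purely as an illustration of the recipe just described (replace the last layer, freeze nothing, fine-tune everything with SGD), here is a minimal PyTorch-style sketch. The torchvision VGG-16 stands in for the VGG-Face model, and the class count follows FC9 in Fig. 3; both are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 261  # number of output classes, following FC9 in Fig. 3

# Start from a pre-trained VGG-16 (stand-in for the VGG-Face model used here).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the last classifier layer so the output matches the new task.
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, NUM_CLASSES)

# No layer is frozen: all weights are updated during fine-tuning,
# so the whole network adapts to the normalized iris images (224x224 RGB).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(batch: torch.Tensor, labels: torch.Tensor) -> float:
    """One SGD step of fine-tuning on a batch of normalized iris images."""
    optimizer.zero_grad()
    loss = criterion(model(batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, the 256-d activations of the penultimate layer (FC8 in
# Fig. 3) are used as the deep descriptor, not the softmax output.
```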

Fig. 3. Deep Eye Descriptor (DED) and Deep Iris Descriptor (DID): feature extraction processes for the periocular [17] and iris modalities, respectively. (a) DED; (b) DID. Both are modified VGG networks (CONV1/64 through CONV5/512 with pooling, FC6 7x7/4096, FC7 1x1/4096, FC8 1x1/256, FC9 1x1/261, softmax loss); after fine-tuning, the 256-d FC8 activations are used as the deep feature descriptor.

B. Feature Fusion

One challenge in multimodal systems is how to integrate (or merge) information from the various modalities. In this work, we investigate three basic fusion rules at the matching score level (sum, multiplication, and minimum) and compare them against our proposed feature-level fusion via the evolutionary PSO algorithm.

1) Matching score level: Matching score level fusion is the most popular fusion method in the literature [30]. In matching score level fusion, the similarity scores provided by the classification step are merged with the aid of an operator, as shown in Figure 4 (b). In this work, the scores are first normalized and then combined by means of three operators: SUM, MULTIPLICATION, and MINIMUM VALUE.
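A minimal sketch of these three rules (our illustration; the min-max normalization is an assumption, as the text only states that the scores are normalized first):

```python
import numpy as np

def minmax_norm(scores: np.ndarray) -> np.ndarray:
    """Rescale a score vector to [0, 1] before fusion."""
    return (scores - scores.min()) / (scores.max() - scores.min())

def fuse_scores(iris: np.ndarray, periocular: np.ndarray, rule: str) -> np.ndarray:
    """Fuse two per-pair score vectors with one of the basic rules."""
    a, b = minmax_norm(iris), minmax_norm(periocular)
    if rule == "sum":
        return a + b
    if rule == "mult":
        return a * b
    if rule == "min":
        return np.minimum(a, b)
    raise ValueError(f"unknown rule: {rule}")
```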

2) Feature level: Fusion at the feature extraction level offers the greatest potential for improving multimodal biometric systems, since more information is available to the subsequent methods. The most direct approach to feature-level fusion is a simple concatenation of the feature vectors: the vectors are normalized, concatenated, and the resulting vector is used in the classification step, as shown in Figure 4 (c). For this work, however, we propose a more sophisticated fusion by means of feature selection, performed by a binary Particle Swarm Optimization algorithm.
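The direct concatenation reduces to a few lines (a sketch; per-modality L2 normalization is our assumption):

```python
import numpy as np

def concat_features(ded: np.ndarray, did: np.ndarray) -> np.ndarray:
    """Normalize each modality descriptor and concatenate:
    256-d periocular (DED) + 256-d iris (DID) -> 512-d vector."""
    ded = ded / np.linalg.norm(ded)
    did = did / np.linalg.norm(did)
    return np.concatenate([ded, did])
```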

C. Feature Selection

Once the features are extracted from the images using the deep descriptors, a wrapper feature selection is performed, as illustrated in Figure 4 (c). One option for finding the best trade-off between the number of features and the greatest decidability would be to search all combinations; however, this is not feasible. The alternative is to use PSO to search part of all possible combinations, a guided search. We use a binary PSO, that is, each feature is either used or not, which in effect creates a fusion rule. We set the initial population so that it includes all features, and the selected vector size is gradually reduced. The response of the PSO is the selection of features with the greatest decidability on the training image features; the decidability on the test deep descriptors is then computed over the same selection found on the training descriptors.

1) Particle swarm optimization: Particle Swarm Optimization (PSO), proposed by Kennedy & Eberhart [15], is an evolutionary algorithm that differs from a genetic algorithm in that it uses group behavior, together with each individual's own history, to update positions. Like a genetic algorithm, PSO has an initial population; in the case of PSO, the individuals are particles. The population is updated according to a velocity function that takes into account the particle's own best position (pbest) and the best particle in the population (gbest); that is, it combines local and global search to move toward the optimal solution. There are several variations of PSO in the literature; we adopt a PSO with an inertia factor, which reduces the speed over time so that, in the end, the search focuses more on local refinement.

The velocity of a particle is defined as

$$v_i^k(t) = w(t)\,v_i^k(t-1) + c_1\gamma_{1i}\left(\mathrm{pbest}_i^k - x_i^k(t-1)\right) + c_2\gamma_{2i}\left(\mathrm{gbest}_i^k - x_i^k(t-1)\right)$$

where the inertia weight w(t) decreases linearly from 0.9 to 0.4, $v_i^k(t)$ is the i-th component of the velocity of the k-th particle at step t of the algorithm, and $x_i^k(t)$ is the i-th component of the k-th particle. c1 and c2 are the acceleration constants and, finally, γ1 and γ2 are positive random numbers drawn from a uniform distribution between 0 and 1. The new particle position is calculated from the velocity and the previous position:

$$x_i^k(t) = x_i^k(t-1) + v_i^k(t)$$

The fitness function used is the decidability of Equation (1).

Several distance functions could be used to compare descriptors; we adopted the Spearman distance metric, defined as

$$d_s(A, B) = 1 - \frac{6\sum_{i=1}^{n}\left(r_{A_i} - r_{B_i}\right)^2}{n\left(n^2 - 1\right)}$$

where $r_{A_i}$ and $r_{B_i}$ are the ranks of $A_i$ and $B_i$, and A and B are the feature vectors. The choice of this metric comes from the work of Luz et al. [17], where the Spearman distance performed better for small feature vectors than the cosine and Euclidean distances.
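A direct transcription of this metric as a sketch (SciPy's rankdata is assumed; with tied values the simplified rank formula above is only approximate):

```python
import numpy as np
from scipy.stats import rankdata

def spearman_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman distance between two feature vectors, following the
    equation above (assumes no ties in the component ranks)."""
    ra, rb = rankdata(a), rankdata(b)
    n = a.size
    return 1.0 - 6.0 * np.sum((ra - rb) ** 2) / (n * (n**2 - 1))
```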

A binary version of particle swarm optimization is used in this work: a particle is a vector of zeros and ones, where zero means a feature is not used and one means it is. The particle's position is updated according to

$$x_i^k = \begin{cases} 1, & \mathrm{rand}() < s(v_i^k) \\ 0, & \text{otherwise} \end{cases}$$

where rand() is a positive random number drawn from a uniform distribution over [0.0, 1.0] and $s(v_i^k) = \frac{1}{1 + e^{-v_i^k}}$.
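Putting the pieces together, a compact sketch of the binary PSO feature selector under the external parameters reported in Section IV (c1 = c2 = 2.05, maximum velocity 2.0, 100 particles, 100 iterations, inertia decaying linearly from 0.9 to 0.4); the fitness callback, which should return the decidability obtained with a given feature mask on the training descriptors, is left abstract:

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=100, n_iter=100,
               c1=2.05, c2=2.05, v_max=2.0):
    """Binary PSO: each particle is a 0/1 mask over the feature vector.
    `fitness(mask)` must return the decidability for that selection."""
    rng = np.random.default_rng(0)
    # Initial population includes all features, as described in Section III-C.
    x = np.ones((n_particles, n_features), dtype=int)
    v = rng.uniform(-v_max, v_max, size=(n_particles, n_features))
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_fit.argmax()].copy()

    for t in range(n_iter):
        w = 0.9 - (0.9 - 0.4) * t / (n_iter - 1)  # linear inertia decay
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        # Sigmoid transfer function: bit i is set with probability s(v_i).
        x = (rng.random(v.shape) < 1.0 / (1.0 + np.exp(-v))).astype(int)
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest
```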

Fig. 4. Method overview for the two modalities, iris and periocular: (a) enroll stage; (b) fusion at matching score level; (c) fusion at feature level. (DID - Deep Iris Descriptor; DED - Deep Eye Descriptor.)

D. Classification

The proposed method is evaluated in biometric verification mode. In this mode, the system is treated as an open-gallery problem (open-world scenario), and each instance (image) is represented by a feature vector. A distance metric is then used to compute the similarity scores among all image pairs (intra-class and inter-class) in a one-against-all scheme. From these similarity scores, the EER is determined from the Detection Error Trade-off (DET) curve, and the decidability from Equation (1). This evaluation follows the protocol adopted in the NICE.II competition [21].
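A sketch of this one-against-all protocol (our illustration; the EER computation below uses a simple threshold sweep over the observed distances, an assumption since the text only states that the EER is read from the DET curve):

```python
import numpy as np
from itertools import combinations

def verification_eval(features, labels, dist):
    """All-pairs verification: returns genuine/impostor distance arrays and EER."""
    genuine, impostor = [], []
    for i, j in combinations(range(len(features)), 2):
        d = dist(features[i], features[j])
        (genuine if labels[i] == labels[j] else impostor).append(d)
    genuine, impostor = np.array(genuine), np.array(impostor)
    # EER: threshold where the false accept rate equals the false reject rate
    # (a pair is accepted as genuine when its distance is below the threshold).
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor <= t).mean() for t in thresholds])
    frr = np.array([(genuine > t).mean() for t in thresholds])
    k = np.argmin(np.abs(far - frr))
    return genuine, impostor, (far[k] + frr[k]) / 2
```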

IV. EXPERIMENTS AND RESULTS

In this section, we detail the feature fusion process and the experiments performed on the NICE database. The NICE database does not provide face images, only a bounding box around the eye region; thus, we split the NICE analysis into periocular and iris, evaluating the impact on each modality separately. The experiments were performed on an Intel(R) Core i7-5820K CPU @ 3.30GHz (12 cores), 64GB of DDR4 RAM, and a GeForce GTX TITAN X GPU. The implementation is based on the MatConvNet toolbox [31] linked with NVIDIA cuDNN. The external parameters used for the PSO algorithm are as follows: the weights used to update the velocity (c1 and c2) are set to 2.05 [26], and the maximum velocity is 2.0. The number of particles ("individuals") and the number of iterations are both set to 100 for all approaches.

A. Recognition modalities

The results reported here are obtained on the official test set of the NICE.II competition, for comparison purposes. The number of intra-class (genuine) pairs is 4,634 and the number of inter-class (impostor) pairs is 494,866. We conducted experiments with the iris alone and with the fusion of iris and periocular region.

1) Iris: The PSO algorithm was applied to select the most relevant features extracted with the DID model, a combination we call DID-PSOFS. In Table I, we compare the state of the art against our approaches for iris recognition on the NICE.II database. We improve the decidability and reduce the EER, establishing a new state of the art. Of the 256 features, 180 are selected, resulting in a gain of 0.02 in decidability and a reduction of 0.12% in EER.

TABLE I. RESULTS OF THE APPROACHES FOR IRIS RECOGNITION ON THE NICE.II DATABASE.

Method              | Decidability | EER
Wang et al. (2012)  | 1.82         | 19.00%
DID model           | 2.20         | 14.68%
DID-PSOFS model     | 2.22         | 14.56%

The gain from using the PSO feature selector lies not only in decidability and EER but also in computational cost. The space necessary to store 256 features is larger than for 180; the difference (76 features) may seem negligible, but in a universe of millions of individuals it impacts the extra space needed to store the features, and the matching cost grows with the gallery size. Evidently, some features are not useful and can be discarded, since even with the reduction of the deep descriptor size there is no loss in decidability.

2) Fusion Iris + Periocular: In this section, we fuse the two modalities (iris + periocular) at feature level, using the PSO algorithm as feature selector and the DED and DID models as feature extractors; we call the result the Multimodal Iris and Periocular Recognition (MIPR) model. For comparison, we performed experiments with three basic fusion rules at matching score level and with a simple concatenation of the features without feature selection. The results are shown in Table II. In Figure 5, we present the intra-class and inter-class distribution curves computed from the Spearman distance metric.

TABLE II. RESULTS OF THE APPROACHES FOR IRIS + PERIOCULAR RECOGNITION ON THE NICE.II DATABASE. *FS = FEATURE SELECTOR.

Fusion Rule          | # of Features | Modalities        | Decidability | EER
Basic function rules | 512           | iris + periocular | 3.22         | 5.35%
MIPR without FS      | 512           | iris + periocular | 3.43         | 5.74%
MIPR model           | 443           | iris + periocular | 3.45         | 5.55%

In Table II, we compare our different approaches, while in Table III we present the state of the art for iris + periocular recognition against our approach on the NICE.II database in an open-world scenario. With our best approach, we achieve an improvement of 0.10 in decidability and of 1.11 percentage points in EER. The basic function rule reported in Table II is the multiplication rule.

As shown in Table III, we surpass the result presented by Luz et al. [17]. The MIPR approach outperforms the three basic fusion rules and also the simple concatenation of the features, probably because some redundant features were removed, along with negative dependencies (features that together reduce accuracy).

TABLE III. SUMMARY OF RESULTS OF METHODS EVALUATED UNDER THE OPEN-WORLD SCENARIO SCHEME.

Methods            | Database  | Modalities        | Decidability | EER
Proença (2012)     | UBIRIS.V2 | iris + periocular | 2.97         | -
Tan et al. (2012)  | NICE      | iris + periocular | 2.57         | 12%
Wang et al. (2012) | NICE      | iris              | 1.82         | 19%
Luz et al. (2017)² | NICE      | periocular        | 3.35         | 6.66%
MIPR               | NICE      | iris + periocular | 3.45         | 5.55%

The decidability of the simple concatenation of the feature vectors from the iris and the eye region (periocular) and that of the PSO approach are almost equal, with only a slight difference of 0.02 in decidability. However, while the former uses 512 features, the latter uses only 443 and has a better EER. We evaluated the computational cost of one-against-all matching (1 sample against the other 999) for both feature sizes, ten times each. For the 512-d descriptors, the time needed was 0.36 seconds on average (standard deviation of 0.008 seconds), while for the 443-d descriptors it was only 0.33 seconds on average (standard deviation of 0.019 seconds). This is for a small dataset; with more samples, the matching time grows accordingly, since each probe is compared against all gallery entries.

V. CONCLUSION

In this paper, a fusion of the features extracted from the periocular region and the iris was presented. In the open-gallery problem, the proposed approach surpasses the state-of-the-art decidability results both for the fusion of the iris and periocular modalities and for the iris modality alone. We presented three approaches for fusion at matching score level and one for fusion at feature level, called MIPR. Multimodal systems are a promising research field that helps in the creation of robust and more trustworthy systems. Together with multimodal systems, PSO (or any algorithm based on the same principle) is a good approach to reduce high-dimensional feature vectors and to enhance the quality of the final feature vector.

ACKNOWLEDGMENTS

The authors thank UFOP and the Brazilian funding agencies CNPq, FAPEMIG, and CAPES. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

² We reproduce the experiments described by Luz et al. [17] with a fixed seed to generate the CNN model. It is worth noting that the results presented in [17] are an average of 15 executions, each run with a different seed.

Fig. 5. Intra-class and inter-class distribution curves on the NICE database (Spearman distance metric). (a) Iris modality (intra-class: mean=0.46, std=0.21; inter-class: mean=0.94, std=0.23). (b) Periocular modality (intra-class: mean=0.43, std=0.17; inter-class: mean=0.96, std=0.14). (c) MIPR (intra-class: mean=0.35, std=0.19; inter-class: mean=0.95, std=0.17).

REFERENCES

[1] E. Andersen-Hoppe, C. Rathgeb, and C. Busch, "Combining multiple iris texture features for unconstrained recognition in visible wavelengths," in 5th International Workshop on Biometrics and Forensics (IWBF), 2017.
[2] S. Bharadwaj, H. S. Bhatt, M. Vatsa, and R. Singh, "Periocular biometrics: When iris recognition fails," in Biometrics: Theory, Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010, pp. 1–6.
[3] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, "Image understanding for iris biometrics: A survey," Computer Vision and Image Understanding, vol. 110, no. 2, pp. 281–307, 2008.
[4] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognition Letters, vol. 33, no. 14, pp. 1860–1869, 2012.
[5] J. G. Daugman, "New methods in iris recognition," Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 37, no. 5, pp. 1167–1175, 2007.
[6] ——, "High confidence visual recognition of persons by a test of statistical independence," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1148–1161, 1993.
[7] J. G. Daugman and G. O. Williams, "A proposed standard for biometric decidability," in Proc. CardTech/SecureTech Conference, 1996, pp. 223–234.
[8] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," in ICML, vol. 32, 2014, pp. 647–655.
[9] S. Ghosh, "Challenges in deep learning for multimodal applications," in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. ACM, 2015, pp. 611–615.
[10] I. Goodfellow, Y. Bengio, and A. Courville, "Deep learning," 2016, book in preparation for MIT Press. [Online]. Available: http://www.deeplearningbook.org
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[12] K. Hollingsworth, K. W. Bowyer, and P. J. Flynn, "Identifying useful features for recognition in near-infrared periocular images," in Biometrics: Theory, Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010, pp. 1–8.
[13] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Technical Report 07-49, University of Massachusetts, Amherst, 2007.
[14] F. Juefei-Xu, K. Luu, M. Savvides, T. D. Bui, and C. Y. Suen, "Investigating age invariant face recognition based on periocular biometrics," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–7.
[15] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," in Systems, Man, and Cybernetics, 1997 IEEE International Conference on, vol. 5. IEEE, 1997, pp. 4104–4108.
[16] E. Luz and D. Menotti, "An x-ray on methods aiming at arrhythmia classification in ECG signals," in International Conference on Bioinformatics and Computational Biology (BIOCOMP'2011), 2011.
[17] E. Luz, G. Moreira, L. A. Z. Junior, and D. Menotti, "Deep periocular representation aiming video surveillance," Pattern Recognition Letters, 2017.
[18] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724.
[19] U. Park, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum: A feasibility study," in Biometrics: Theory, Applications, and Systems, 2009. BTAS'09. IEEE 3rd International Conference on. IEEE, 2009, pp. 1–6.
[20] O. M. Parkhi, A. Vedaldi, and A. Zisserman, "Deep face recognition," in British Machine Vision Conference, vol. 1, no. 3, 2015, p. 6.
[21] H. Proença and L. A. Alexandre, "Toward covert iris biometric recognition: Experimental results from the NICE contests," IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 798–808, 2012.
[22] H. Proença, S. Filipe, R. Santos, J. Oliveira, and L. A. Alexandre, "The UBIRIS.v2: A database of visible wavelength iris images captured on-the-move and at-a-distance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 8, p. 1529, 2010.
[23] H. Proença and J. C. Neves, "Deep-PRWIS: Periocular recognition without the iris and sclera using deep learning frameworks," IEEE Transactions on Information Forensics and Security, vol. 13, no. 4, 2018.
[24] G. Santos and E. Hoyle, "A fusion approach to unconstrained iris recognition," Pattern Recognition Letters, vol. 33, no. 8, pp. 984–990, 2012.
[25] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[26] Y. Shi et al., "Particle swarm optimization: developments, applications and resources," in Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, vol. 1. IEEE, 2001, pp. 81–86.
[27] C. W. Tan and A. Kumar, "Towards online iris and periocular recognition under relaxed imaging constraints," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3751–3765, 2013.
[28] ——, "Accurate iris recognition at a distance using stabilized iris encoding and Zernike moments phase features," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3962–3974, 2014.
[29] T. Tan, X. Zhang, Z. Sun, and H. Zhang, "Noisy iris image matching by using multiple cues," Pattern Recognition Letters, vol. 33, no. 8, pp. 970–977, 2012.
[30] H. Vajaria, T. Islam, P. Mohanty, S. Sarkar, R. Sankar, and R. Kasturi, "Evaluation and analysis of a face and voice outdoor multi-biometric system," Pattern Recognition Letters, vol. 28, no. 12, pp. 1572–1580, 2007.
[31] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015, pp. 689–692.
[32] Q. Wang, X. Zhang, M. Li, X. Dong, Q. Zhou, and Y. Yin, "AdaBoost and multi-orientation 2D Gabor-based noisy iris recognition," Pattern Recognition Letters, vol. 33, no. 8, pp. 978–983, 2012.
[33] L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 529–534.
[34] J. Xu, M. Cha, J. L. Heyman, S. Venugopalan, R. Abiantun, and M. Savvides, "Robust local binary pattern feature sets for periocular biometric identification," in Biometrics: Theory, Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010, pp. 1–8.
[35] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.