Master Thesis

Face Recognition using depth information

Pierre-Etienne MARTIN Image Processing and Computer Vision master student

Supervised by Dr. Mikhail TOKAREV Institute of Thermophysics SB RAS Novosibirsk, Russia

Referent teacher Dr. Vincent LEPETIT Université de Bordeaux Bordeaux, France

June 25, 2017

Abstract

In the scope of the Image Processing and Computer Vision master (IPCV), my internship addressed face recognition using grayscale images and depth information of faces. The internship took place at the Institute of Thermophysics in Novosibirsk, Russia, and was supervised by Dr. M. TOKAREV. It began on 25 January and finished on 25 June 2017. This Master thesis is the report of my internship and has to be sent to my referent teacher in France before 23 June 2017. The defense takes place on 29 June 2017 at the University of Bordeaux. The first choice concerned the depth map computation, which is described in the Acquisition chapter, and from this choice we built our own dataset from the people working at the institute. One sample consists of one grayscale image of the face and one depth map associated with this image. We thus produced a dataset of 19 people (15 men and 4 women) with up to 400 pairs of data per person. The second choice concerned the input and the architecture of our deep learning model. We built the input as a 2-channel matrix, the first channel being the grayscale image (texture) of the face and the second channel the depth map linked to that image. The model was built and trained with Keras on Ubuntu 14.06 in GPU mode on a GeForce GT 740M NVIDIA graphics card. Other models using only the texture or only the depth were also trained to assess the efficiency of the joint model. From the results, we can tell whether the joint model of depth and texture is better than a model using only one of these data.

Contents

Acknowledgments
Introduction
1 Related work
  1.1 2D Approaches
    1.1.1 Gabor Wavelets
    1.1.2 Principal Component Analysis: PCA
    1.1.3 Face Recognition using Convolutional Neural Network (CNN)
  1.2 3D Approaches
    1.2.1 Expression Invariant 3D
    1.2.2 3D Recognition using Depth
2 Acquisition
  2.1 3D Light Field Camera [19]
  2.2 3D Scanner
  2.3 Binocular stereoscopic system
    2.3.1 Calculation of the depth map
    2.3.2 Segmentation
    2.3.3 Filling
    2.3.4 Organization of the data set
3 Deep Learning Model
  3.1 Architecture
  3.2 Data Augmentation
  3.3 Difference of Gaussian
4 Results
  4.1 Combination of variables
  4.2 Comparison of the models
Conclusion
References
List of Figures
List of Tables


Acknowledgments

I would like to thank all the contributors to the dataset working at the Thermophysics Institute of Novosibirsk, Russia, who allowed us to train our models: Vladimir ANTIPIN, Artur BILSKIY, Andrey CHERDANTSEV, Mikhail CHERDANTSEV, Leonide CHIKISHEV, Florian DELAPLACE, Vladimir DULIN, Tania DULINA, Oleg GOBYZOV, Maria KLYKOVA, Maxim KOROTKOV, Alexander NEBUCHINOV, Mikhail RYAGOV, Maxim SHESTAKOV, Anna YAGODNITSYNA, Alexandra ZUEVICH. Many thanks to Alexander KVON for the acquisition equipment he lent me. I also thank Alexander SEREDKIN for his help in understanding the depth map calculation software, for his time during the calibration process and for his contribution to the dataset. And finally a great thank you to Mikhail TOKAREV for being part of the dataset and being a thoughtful tutor during my internship, but most importantly for his hospitality and his willingness to show me Russia.


Introduction

The identification of persons is a topic increasingly developed in the security domain to prevent attacks and identity usurpation. With the recent terrorist events, the need for reliable biometric recognition has drastically increased, in order to track suspects effectively and avoid these kinds of catastrophes. Many biometric methods can be used; face recognition is not the most reliable, but it is one of the least intrusive and most accepted methods. See Table 1 from [1, p. 36].

Table 1: Comparison of several biometric technologies (assessments based on authors' perceptions of "Handbook of Fingerprint Recognition" [2]).

Furthermore, the market is also relevant to this development (see figure 1). Also, according to the survey guide [4] from GIGYA, a company specialized in the identification of customers, the password for everyday identification is going to disappear and give way to biometric identification. Such a change comes from the growing number of hacks, due to the bad habit of users of choosing passwords that are easy to remember or identical across different security systems. That is why a biometric security system needs to be efficient enough, easy to collect and also user friendly.

Figure 1: Biometrics market share by system type from [3].

As we can see on this diagram (figure 2), despite its low performance, the facial biometric share is the third most important because of its acceptability and its collectability (see table 1), which allows developers to use deep learning models which, well trained, should be able to increase the performance. On the FindBiometrics website [5] we can observe the same trends: the users of biometric systems are most excited about fingerprint, facial and combinations of modalities (see figure 2). Due to this effervescence in the facial domain, this Master thesis will go through different facial recognition methods and then suggest a new method based on grayscale images of the face and their depth.

Figure 2: Survey from FindBiometrics website [5].


Chapter 1

Related work

The main challenge of a facial recognition model is to extract useful features which efficiently separate the subject faces of the dataset. These descriptors can be obtained using two kinds of models: linear or non-linear. Different methods are described in "2D and 3D face recognition: A survey" [6]. We will first describe a few 2D approaches and then some 3D approaches. Each subsection is related to one or two articles where the methods are described more precisely. These methods use approaches that are described in the "Handbook of Face Recognition" [7, Part 1].

1.1 2D Approaches

In this section we will see some general methods to extract the features of 2D images.

1.1.1 Gabor Wavelets

The Gabor wavelets, which have been used for recognition in [8] and [9], extract spatial and orientation features in the spatial and frequency domains. They have been found appropriate for texture representation. The features are extracted by convolving the image with the Gabor filters, which can be described as follows:

g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right)    (1.1)

with:

x' = x\cos\theta + y\sin\theta
y' = -x\sin\theta + y\cos\theta

where λ is the wavelength of the sinusoidal factor, θ the orientation, ψ the phase offset, σ the Gaussian standard deviation and γ the spatial aspect ratio. In figure 1.1, for demonstration, we used 20 Gabor filters of size 39 x 39 with 4 scales and 5 orientations (1.1a and 1.1b). By convolving an input face (1.1c) with the Gabor filters, we obtain features in the real domain (1.1d) and in the imaginary domain (see the magnitude, figure 1.1e).


Figure 1.1: Gabor features from the convolution between a face and the Gabor filters: (a) real part of the Gabor filters, (b) magnitude of the Gabor filters, (c) input face, (d) real part of the features, (e) magnitude of the features.

A model based on Gabor features becomes more robust to illumination and orientation as the number of scales and orientations used to build the filters increases, so the more Gabor filters are used, the more efficient the model. The main issues of Gabor features are the size of the features and the computation time, due to the linearity of the method. That is why they are often combined with another method, for their discriminative strength.
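For illustration, the following Python sketch builds a small Gabor filter bank with OpenCV and convolves a grayscale face with it, in the spirit of the demonstration of figure 1.1. The scale values and the sigma-to-wavelength heuristic are assumptions made for this example, not the exact parameters used above.

import cv2
import numpy as np

def gabor_features(face, ksize=39, wavelengths=(4, 8, 12, 16), n_orient=5):
    """Convolve a grayscale face with a 4-scale x 5-orientation Gabor bank."""
    real_parts, magnitudes = [], []
    for lam in wavelengths:                 # wavelength of the sinusoidal factor
        for k in range(n_orient):
            theta = k * np.pi / n_orient    # orientation
            sigma = 0.56 * lam              # assumed link between sigma and lambda
            k_re = cv2.getGaborKernel((ksize, ksize), sigma, theta, lam, 0.5, 0)
            k_im = cv2.getGaborKernel((ksize, ksize), sigma, theta, lam, 0.5, np.pi / 2)
            re = cv2.filter2D(face, cv2.CV_32F, k_re)
            im = cv2.filter2D(face, cv2.CV_32F, k_im)
            real_parts.append(re)
            magnitudes.append(np.sqrt(re ** 2 + im ** 2))
    return real_parts, magnitudes

Each pair of kernels differs only by the phase offset ψ, which gives the real and imaginary responses of equation (1.1).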

1.1.2 Principal Component Analysis: PCA

Principal Component Analysis has been used in [8] jointly with Gabor filter features. It computes the N principal components which extract the common features of a dataset. It is a method to reorganize the data so as to find the best possible characteristics of the dataset. It can also be used for image compression or voice separation. The main theory is as follows. Let x be the N images of a subject, with x = [x_1, x_2, ..., x_N] and x_i the vector image of length K. The PCA is then the transformation:

y = A(x - m_x)    (1.2)

with:

m_x = \frac{1}{K} \sum_{k=1}^{K} x_k

The matrix A is determined by the covariance matrix C_x:

C_x = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T - m_x m_x^T    (1.3)

A being an orthonormal matrix, the inversion is possible, proving that PCA is a linear method:

x = A^T y + m_x    (1.4)

Figure 1.2 is an example of the first eight principal components over a data set of 155 images of the same subject which would be in our case yi with i = 1 to 8 and y = [y1 , y2 , ..., yN ].

Figure 1.2: PCA example: the first eight principal components, (a) to (h).

To obtain good results with PCA for recognition, the method must be modified so as to obtain components that are not similar to those of the other subjects, and thus increase the inter-personal variation. Without this modification the method is not discriminant enough, and another method such as Gabor features must then be combined with it to make a good identification.
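A minimal NumPy sketch of equations (1.2) to (1.4), assuming the images are stacked into an array of shape (K, H, W); the SVD of the centred data is used here instead of an explicit eigen-decomposition of C_x, which yields the same principal directions:

import numpy as np

def pca_faces(images, n_components=8):
    """PCA of a stack of face images following equations (1.2)-(1.4)."""
    K = images.shape[0]
    x = images.reshape(K, -1).astype(np.float64)   # one vector x_k per image
    m_x = x.mean(axis=0)                           # mean face m_x
    xc = x - m_x
    # Right singular vectors of the centred data = eigenvectors of C_x.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    A = vt[:n_components]                          # rows of A = principal components
    y = xc @ A.T                                   # projection y = A(x - m_x)
    x_rec = y @ A + m_x                            # reconstruction x = A^T y + m_x
    return A, y, x_rec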


1.1.3 Face Recognition using Convolutional Neural Network (CNN)

In this paper [10], Yi Sun et al. compare several methods with a deep learning model called DeepID2 (see figure 1.3). They show that this model increases the inter-personal variation and decreases the intra-personal variation better than common methods. The DeepID2 features are extracted through several convolutional and max pooling layers.

Figure 1.3: DeepID2 model.

They trained this model using a loss for identification and a loss for verification. For the identification process they used the cross-entropy loss:

L_{identif}(x_i) = -\sum_{k=1}^{n} \hat{p}_k \log p_k = -\log p_i    (1.5)

with x the DeepID2 feature, i the subject and p_k the predicted probability for each subject. The target \hat{p}_k being equal to 0 for k ≠ i and to 1 for k = i, we obtain the simplification. For the verification process they modified the usual loss function, using a loss based on the L2 norm introduced by R. Hadsell et al. [11]:

L_{verif}(x_i, x_j, y_{i,j}) = \frac{1}{2} \| x_i - x_j \|_2^2                     if y_{i,j} = 1
L_{verif}(x_i, x_j, y_{i,j}) = \frac{1}{2} \max(0, m - \| x_i - x_j \|_2)^2        if y_{i,j} = -1    (1.6)

with x_i and x_j the DeepID2 features of two different faces. If i and j are the same person then y_{i,j} = 1, otherwise y_{i,j} = -1. They found that combining the verification and the identification signals increases the inter-personal variation and decreases the intra-personal variation, outperforming other deep learning methods. Their final model is a joint Bayesian method taking as input the PCA of the DeepID2 features.
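A small NumPy sketch of the two signals, assuming p is the vector of predicted probabilities and the margin m is a free parameter (its value here is only illustrative):

import numpy as np

def identification_loss(p, i):
    """Cross-entropy identification loss (1.5): -log p_i for the true subject i."""
    return -np.log(p[i])

def verification_loss(x_i, x_j, y_ij, m=1.0):
    """Contrastive verification loss (1.6) on two feature vectors."""
    d = np.linalg.norm(x_i - x_j)
    if y_ij == 1:                       # same person: pull the features together
        return 0.5 * d ** 2
    return 0.5 * max(0.0, m - d) ** 2   # different persons: push them beyond the margin m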


1.2 3D Approaches

In this section we describe the techniques used to build the 3D models and how the features are then extracted to perform the identification.

1.2.1 Expression Invariant 3D

In the paper "Expression-Invariant 3D Face Recognition" [12], the authors compare four approaches to building an identification model: eigen decomposition of range images, combination of texture and range images in the eigenfaces scheme [13], eigen decomposition of canonical images, and their eigenforms algorithm, i.e. eigen decomposition of flattened textures and canonical images. To do so they first built their own data using a range camera: a grayscale FireWire CCD camera with a resolution of 640 pixels, 8 bit. They then extract the features they are interested in from the recorded data: see figure 1.4.

Figure 1.4: Different features obtained from range data: (a) texture mapping on the facial surface, (b) texture mapping on the canonical form, (c) the resulting flattened texture, (d) the resulting canonical image.

Their method uses as features the eigen decomposition of the flattened textures (1.4c) and of the canonical images (1.4d). They find that the method using eigen decomposition of range images, the one using only eigen decomposition of canonical images, and the one using the range images directly in the eigenfaces scheme [13] are not as good as theirs. Their method, based on a bending-invariant canonical representation, has shown to be robust to facial expression and becomes even more accurate when the texture is used in addition.


1.2.2 3D Recognition using Depth

In the paper "Three-Dimensional Model Based Face Recognition" [14], Xiaoguang Lu et al. acquired their 2.5D models with the Minolta VIVID 910 3D scanner. The goal of their work is to match 2.5D models (a frontal shot and a semi-profile shot) with a 3D model built from 5 shots of the same subject with a neutral expression. For this purpose they first detect the feature points (eyes and mouth) of the 2.5D model: figure 1.5.

Figure 1.5: Detection of the feature points: (a) frontal texture, (b) frontal depth, (c) semi-profile texture, (d) semi-profile depth.

They then perform a coarse alignment with a rigid transform, using a least-squares fit between the triangles formed by the feature points. Next they do a fine alignment with the Iterative Closest Point method [15], searching for the translation and rotation that minimize the error between the 2.5D model and the 3D model. This alignment is done on the parts of the face whose shape should not change with expression (nose, eyes...) and avoids the mouth because of its great variation across facial expressions. They then use the error computed during the fine alignment to compute the matching: figure 1.6.

Figure 1.6: Results of the matching procedure: (a) 2.5D model and (b) its correct best match; (c) 2.5D model and (d) its incorrect best match.

This method proves efficient and robust to facial expressions and poses, with an error rate of 3.5%. The main issue in their work is that the method is not completely automatic: the feature points are not always detected. Also, their dataset is composed of 18 subjects, and the matching process has to be run against all the subjects, which might take some time (timing data not available).


These methods all find different ways to extract the features that should help identification. We decided to focus on a 3D approach because of the security that depth data can add to a biometric system. Indeed, it has been shown that security systems using 2D face recognition can be fooled with a simple printed 2D photo [16]. The paper "LFHOG: A discriminative descriptor for live face detection from light field image" [17] also underlines this problem and uses a linear SVM based on HoG features along the depth direction for live face detection. Interested in the field of deep learning, we decided to use a non-linear approach with a deep learning model which takes as input the texture of the face and the depth information and gives as output the identity of the person. We will compare it with a model using only the depth and another using only the texture, to check whether the combination improves the results. We will also run some experiments with the depth set to zero, or with a depth different from the one computed from the texture, to check whether our model can recognize a hacking attempt.


Chapter 2

Acquisition

To build our dataset we tried different methods of acquisition using range cameras. All these methods use the principle of stereopsis [18, chap. 7], which lets us perceive the depth of a scene through the difference between the images recorded by our two eyes. The devices compute the depth using different methods [18, chap. 14]. Our goal is to obtain accurate data of faces with a fast acquisition process, so as to capture the face even with motion. Also, to be accepted by the public for a future application, this acquisition process should not be invasive.

2.1 3D Light Field Camera [19]

The 3D Light Field Camera developed by the Raytrix company is a Charge-Coupled Device (CCD) camera produced by SVS-VISTEK (Germany). It has been used by my colleague Alexander SEREDKIN for 3D velocity measurements in a slot jet [20]. Raytrix also promotes this camera for face depth measurement, which is why we tried the acquisition on different subjects.

Description of the camera (figure 2.1):
- Model: R11 M GigE
- Output: 12-bit grayscale images
- Frame rate: up to 10 fps (frames per second)
- Resolution: 10.7 Mp (megapixels)
- Pixel size: 9 µm
- Exposure time: from 202 µs to 8 s
- Micro-lenses in the raster: 192x168
- Diameter of the micro-lenses: 216 µm (24 pixels)

Figure 2.1: Raytrix camera and its optic.

The lens array of the camera is composed of 3 types of micro-lenses with complementary focal lengths, which gives the user a depth of field six times larger than with a standard 2D camera. An example of the raw output image of the Raytrix camera is presented in figure 2.2.

Figure 2.2: Raw image data (left: entire image, right: zoom on the right eye).

The depth map can then be computed (figure 2.3c) with the help of the calibration (figure 2.3a) and of the stereopsis principle, illustrated for raw data with a synthetic example (figure 2.3b) from [20]: the virtual depth decreases while the number of times the same point is seen (without mirroring effect) increases. The texture is finally added to the depth map to obtain a 2.5D model (figure 2.3d).

Figure 2.3: Depth map computation process: (a) calibration image, where each color corresponds to a type of lens, (b) simulated data (left) and raw sub-aperture images for different virtual depths (right), figure from [20], (c) computed depth map, (d) 2.5D model: depth map + texture.


Because the Raytrix software (see figure 2.4) does everything by itself (image preprocessing, depth calculation, selection of the measurement points and filling), the number of parameters to set is huge. To obtain good results we therefore have to spend time trying different combinations of parameters, which differ according to the illumination conditions and sometimes from one person to another. Furthermore, this software is sold with the camera and cannot be modified, which prevents any improvement of the calculation.

Figure 2.4: Raytrix software.

The Raytrix camera is a promising range device which can give reliable depth data for a highly contrasted object, after careful optimization of the preprocessing and depth calculation parameters. It is also easy to handle and simple to use after the geometrical calibration of the lens array. In our case, in order to reach the required depth accuracy of 1-2 mm, we used structured light illumination via a multimedia projector, in the form of periodic black and white stripes with a period of about 5 mm in the object plane. This made the subject's face more contrasted and increased the quality of the depth data; however, the periodic intensity structures were visible on the recovered texture. The Raytrix camera also looks like a normal camera and would be easily accepted by the public. Still, the accuracy is not perfect. It could be improved with a higher-end Raytrix camera having more types of micro-lenses and a higher spatial resolution, but such a camera costs more and was not available in our work environment. The main issue, however, was the acquisition process, which was too long because of the search for optimized parameters. Since these constraints prevented us from building a large dataset for each person in a short time, we decided not to focus on this acquisition method.


2.2 3D Scanner

At the same time as we were testing the Raytrix device, we decided to try a 3D scanner from the Range Vision company [21]. The 3D scanner is mainly composed of a projector mounted on a tripod and 2 cameras (figure 2.5) that have to be focused on the face location (figure 2.6). The scanner is based on the active structured light illumination principle.

Description of the 3D scanner:
- Model: PRO 2M
- Output: 3D model
- Scanning time: 12 s (7 s for the Standard Plus model, but less accurate)
- Resolution of the cameras: 2 Mp (megapixels)
- 3D resolution per operating distance D (and scanning area in mm):
  - D = 2 m (850x530x530): 0.3 mm
  - D = 0.9 m (460x345x345): 0.19 mm
  - D = 0.27 m (66x50x50): 0.03 mm

Figure 2.5: The Range Vision 3D scanner and its components.

After installing the Range Vision software that controls the 3D scanner and making all the connections from the computer to the projector and the cameras, we need to perform the calibration. For this we first place the cameras on the same axis, at a distance d from each other (set according to the scanning area). We then swivel the cameras towards the face position, and with a calibration plate we record different positions between the distant plane and the near plane. The whole calibration process takes around 40 minutes, so it is very important not to move the scanner, to avoid having to repeat the entire process.

Figure 2.6: Scheme of the acquisition.

During the acquisition the subject has to stay still for 12 s while the projector emits different light patterns onto the subject (see figure 2.7). These patterns, called structured light and coded light, allow us to obtain a maximum of measurement points through differences of contrast and deformations of the patterns.

Figure 2.7: The different light patterns during the acquisition process: (a) thick lines, (b) color squares, (c) thin lines.

The Range Vision software then processes the data recorded by the cameras using triangulation. As we can see in figure 2.8, the results are quite impressive: the accuracy is excellent and the number of points is sufficient to extrapolate a full depth map. However, for subjects with beards (see figure 2.8b) the 3D scanner fails to record measurement points on the beard, because the camera distance D, between 1 and 2 meters, lowers the 3D resolution to between 0.19 and 0.3 mm.

Figure 2.8: 3D models obtained with the 3D scanner: (a) subject without beard, (b) subject with beard.

Before the installation of the 3D scanner we were expecting better results, and we obtained them, but shortly after the installation we realized that this type of scanner cannot be used on the public because of the warning which clearly specifies: "Do not aim the scanner at people and animals in order to avoid contact of the eyes with the harmful bright light of the projector." This is why all the subjects tested with this device had to keep their eyes closed during the whole acquisition. Furthermore, the acquisition time is a serious issue for our application: staying still for several seconds to obtain one depth map is very invasive for the public and too long for an identity check. Also, because our purpose is to train a deep learning model, obtaining a decent amount of data at 12 s per depth map is impossible in our time window. So we decided to try another acquisition method.


2.3 Binocular stereoscopic system

The binocular stereoscopic system we use is composed of 2 ImperX IGV-B2020 CCD cameras mounted on tripod heads, with SIGMA 50mm 1:2.8 DG MACRO lenses, which give 14-bit grayscale images of resolution 2056x2060 with a pixel size of 7.4 µm. The cameras are connected to a homemade pulse generator with 8 TTL outputs, controlled by an external PC through an RS-232 interface, which allows the two shots to be taken at the same time. The software controlling the shots and calculating the depth map, called 'ActualFlow', has been developed by my tutor Mikhail TOKAREV and used for optical flow diagnostics and surface motion analysis [22]. It is written in C++ using the Microsoft Foundation Classes (MFC), a library for developing desktop applications, and the OpenCV library. The cameras should be able to reach a frame rate of 16 fps in dual tap mode but, due to the limited data storage throughput and our choice of using only one ADC (Analog to Digital Converter) to keep a monotonous image intensity, we could only reach 6.7 fps for a 15-second acquisition without missing images. Before acquisition the cameras need to be calibrated. The aperture and the focus are set so that the dots on the calibration plate (the foreground plate in figure 2.9) can be clearly distinguished. The software then saves images of the calibration plate at several depth positions (-50 mm, 0 mm and 50 mm), using a PI miCos VT-80 translation stage controllable via a Corvus PCI (Peripheral Component Interconnect) extension board. The software then detects the dots and saves in a text file their position in the images and their 3D location, determined from the distance between the points and the plate position. We thus obtain a .txt calibration file for each camera.

Figure 2.9: The Range Camera installation for acquisition.

2.3.1 Calculation of the depth map

The software developed by my tutor uses a stereo correspondence method ([18, Algorithm 7.2] and [23, pp. 438-445]) with block matching to compute the depth map. Once we have the calibration files, we use the OpenCV StereoMatcher class to compute the disparity image (the difference between the projection points) from the images of the two cameras (figure 2.10).

Figure 2.10: Images obtained by the two cameras.

Then, from the disparity image and the calibration files, we use reprojectImageTo3D from the OpenCV library to obtain the true depth map. We then have to look for the combination of parameters giving as many true measurement points as possible (see figure 2.11). There is a trade-off between a smoothed depth map without holes and an accurate but sparse depth map: with too few measurement points we cannot reconstruct a full depth map, and with bad measurement points the full depth map will be wrong. Conveniently, the same combination of parameters can be used for all the subjects as long as the lighting conditions stay approximately the same. The computation time for one depth map is approximately 3 s.
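As an illustration of this processing chain, the sketch below computes a depth map with OpenCV block matching and reprojectImageTo3D; the matcher parameters and the reprojection matrix Q (normally produced by the stereo calibration) are placeholders, not the values used in ActualFlow:

import cv2
import numpy as np

def depth_from_stereo(img_left, img_right, Q):
    """Block-matching disparity followed by reprojection to metric depth."""
    matcher = cv2.StereoBM_create(numDisparities=128,  # must be a multiple of 16
                                  blockSize=9)         # size of the matched blocks
    disparity = matcher.compute(img_left, img_right).astype(np.float32) / 16.0
    points_3d = cv2.reprojectImageTo3D(disparity, Q)   # (H, W, 3) coordinates
    depth = points_3d[:, :, 2]
    depth[disparity <= 0] = -1                         # mark points without a measurement
    return depth

Note that StereoBM expects 8-bit single-channel inputs, so the 14-bit camera images would first have to be rescaled.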

Figure 2.11: Depth computed with the best combination of variables.

As we can see in figure 2.11, the result obtained is correct, with enough measurement points and good accuracy. The big advantage of this acquisition method is the amount of data that can be collected in a short time: up to 200 images in 15 seconds. In addition, the parameters do not need to be changed for each subject; they only need to be changed if the configuration of the installation changes: angle between the cameras, face location closer or further away, or large changes of lighting conditions. Furthermore, because the software was written by my tutor M. TOKAREV, I had access to it, which allowed me to understand its inner functions and what I could modify to improve the results. This is why we decided to use the binocular stereoscopic system to build our dataset, with some post-processing to improve the results for our application. We did a first acquisition in one day with 16 subjects, recording 50 images per person per camera, and then a second acquisition with 19 people, recording 100 images per person per camera. Three extra people in the second acquisition did the recording twice, with some physical differences (see figure 2.12), to obtain a maximum of variation for the same subject. This is also why we asked the subjects during the acquisition to move their face from right to left and up to down and to make different expressions, while still keeping the face between the two panels.

Figure 2.12: Variation of the physique and the poses for Anna Y. and Maxim S.

2.3.2 Segmentation

First of all, to get rid of the measurement points outside the range we are interested in, we placed two panels between which the face should stay: the foreground and background panels in figure 2.9. At the beginning of each acquisition we initially set the background and foreground distances by dynamically selecting on each panel an area where the mean depth is computed. Since these distances turned out to be approximately the same every time, we fixed them for all acquisitions: background = -100 mm and foreground = 50 mm. We then keep only the values of the depth map between these two distances [18, chap. 9.2.1]. See figure 2.13.

Figure 2.13: Segmentation of the depth map: (a) depth before segmentation, (b) depth after segmentation.


Because we are interested only in the face, we also perform a spatial segmentation [18, chap. 9.2.3] with face detection using a cascade classifier. The classifier detects the face in the image at multiple scales, based on the maximum face width that we set on the first pair of images of each sequence. We keep only the best detection as the face position. Then, to avoid including the depth of the panels and of the structure in the final depth map, we also set at the beginning of each sequence a search window (the red window in figure 2.14). If the center of the detected face is outside this window, the face is discarded. If it is inside the window, we draw an ellipse according to the maximum width. The height is defined by the height_factor, set to 1.4, so the face image has size (max_width, max_width * height_factor). If the ellipse goes out of the search window (see the gray ellipse in figure 2.14), it is moved to fit inside the window (white ellipse). Finally, to keep only the depth of the face, we set all the depth values outside the ellipse to 0 (see figure 2.15c).

Figure 2.14: Face detection process.

Figure 2.15: Face detection result: (a) face detected, (b) face obtained, (c) depth obtained.

Now that the segmentation is done, we check whether there are enough measurement points to fill the depth map. If the number of measurement points is more than a third of the total number of points, the depth map is filled; otherwise it is discarded.
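The following sketch summarizes the segmentation step (depth range thresholding, cascade face detection and elliptical mask). The Haar cascade file and the simplified handling of the search window are assumptions made for this example:

import cv2
import numpy as np

HEIGHT_FACTOR = 1.4

def segment_face_depth(gray, depth, background=-100.0, foreground=50.0):
    """Keep the depth between the two panels and inside the face ellipse."""
    depth = np.where((depth > background) & (depth < foreground), depth, 0.0)

    cascade = cv2.CascadeClassifier(cv2.data.haarcascades
                                    + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                  # no face: the pair is discarded
    x, y, w, h = max(faces, key=lambda f: f[2])      # keep the widest detection
    axes = (w // 2, int(w * HEIGHT_FACTOR) // 2)     # ellipse half-axes (width, height)

    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.ellipse(mask, (x + w // 2, y + h // 2), axes, 0, 0, 360, 255, -1)
    return np.where(mask == 255, depth, 0.0)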


2.3.3 Filling

To fill the depth map we tested 2 methods: a Gaussian window [18, chap. 4.1] that uses the measurement points to fill the empty areas, and the built-in inpaint function of OpenCV. To make the Gaussian window efficient, we take only the positions of the points that need to be filled (set to -1), considering the measurement points obtained as exact measures. First of all, we fill the empty areas to 0 using a mask, see figure 2.16. The mask (2.16d) is the convolution of the no-measurement-points mask (2.16b) with the boundary mask (2.16c). When filling an empty area, we make sure that there is no measurement point in the window, with w_fillempty = 15 (total_length = 2 * w_fillempty + 1 = 31). This empty-area filling greatly speeds up the global filling process for people with a small face.

Figure 2.16: Empty area filling: (a) computed depth, (b) mask without data before filling, (c) boundary mask, (d) mask for the empty area, (e) mask after filling.

Then we fill the depth using the measurement points around the empty positions. We set a window size w_fill = 50 (total_length = 2 * w_fill + 1 = 101) and consider only the points with a large enough neighborhood: we fill only the points with at least (w_fill * w_fill)/4 neighbors, approximately 1/16 of the total number of elements in the Gaussian matrix. The Gaussian matrix is built with the OpenCV function getGaussianKernel, using the default σ: σ = 0.3 * (total_length * 0.5 - 1) + 0.8. We repeat the process until all the data points are filled (figure 2.17).


Figure 2.17: Depth filling, iterations 1 to 8.

Finally, we smooth the result, only for the points that were missing before the filling process (using mask 2.16b), with the same method and a window w_smooth = 60 (total_length = 2 * w_smooth + 1 = 121). We can now compare our result with the inpaint function: figure 2.18.

Figure 2.18: Results of the different filling methods: (a) Gaussian window, (b) inpaint function.

The result obtained with the inpaint function (2.18b) is actually noisier, with some artifacts on the boundary of the face that are less pronounced with our method (2.18a). Furthermore, the computation time of the whole filling process for the subject in figure 2.17 was 7 s with the Gaussian method and 50 s with the inpaint function. We therefore decided to keep our method to process all the data.
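A simplified sketch of the Gaussian-window filling: the missing points (marked -1) are repeatedly replaced by a Gaussian-weighted average of the valid points around them until no hole with enough neighbours remains. This is a coarse approximation of the procedure described above, not the exact code used:

import cv2
import numpy as np

def fill_depth(depth, w_fill=50):
    """Iterative Gaussian-window filling of the holes marked with -1."""
    size = 2 * w_fill + 1
    g = cv2.getGaussianKernel(size, -1)       # negative sigma -> OpenCV default formula
    kernel = g @ g.T                          # separable 2D Gaussian window
    ones = np.ones((size, size), dtype=np.float64)
    filled = depth.astype(np.float64).copy()
    while np.any(filled == -1):
        valid = (filled != -1).astype(np.float64)
        values = np.where(filled == -1, 0.0, filled)
        num = cv2.filter2D(values, -1, kernel, borderType=cv2.BORDER_REFLECT)
        den = cv2.filter2D(valid, -1, kernel, borderType=cv2.BORDER_REFLECT)
        counts = cv2.filter2D(valid, -1, ones, borderType=cv2.BORDER_REFLECT)
        target = (filled == -1) & (counts >= (w_fill * w_fill) / 4)
        if not np.any(target):
            break                             # remaining holes lack neighbours
        filled[target] = num[target] / den[target]
    return filled

For comparison, the OpenCV alternative mentioned above can be called as cv2.inpaint(depth_8u, hole_mask, 3, cv2.INPAINT_TELEA), keeping in mind that inpaint works on 8-bit images.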

2.3.4 Organization of the data set

Once the filling is done, we obtain an 8-bit grayscale face image and a 12-bit depth map to save: see figure 2.19. We use the size 200 * (200 * height_factor) with the height_factor set to 1.4, so the final saving size is 200 x 280.

Figure 2.19: Final results: (a) face, (b) depth.

We then need to save the results according to the subject, the acquisition, the camera (right or left) and the type of data, for the training of the model. The results are named according to the number of images processed and saved in PNG format, so a path looks like this: Results/Subject_name/Acquisition_number/Right_or_Left_camera/Face_or_Depth/Proceed_image_number.png. We processed 5019 images out of 5502. The images not processed were rejected because of the bad quality of their depth maps, caused by people moving too fast and going out of the capture range. We also had to remove one subject (Sasha in figure 2.20) from acquisition 2: the second acquisition lasted several days and the installation slightly moved, making the depth calculation impossible for the last subject because the old calibration no longer fit the latest optical configuration.
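As a purely illustrative helper (the function name and the exact folder spellings are hypothetical), a pair can be loaded back from this layout and stacked into the 2-channel input used later for training:

import os
import cv2
import numpy as np

def load_pair(root, subject, acquisition, camera, index):
    """Load one face/depth pair and stack it into a 2-channel array."""
    base = os.path.join(root, subject, str(acquisition), camera)
    face = cv2.imread(os.path.join(base, "Face", f"{index}.png"), cv2.IMREAD_GRAYSCALE)
    depth = cv2.imread(os.path.join(base, "Depth", f"{index}.png"), cv2.IMREAD_UNCHANGED)
    return np.dstack([face.astype(np.float32), depth.astype(np.float32)])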

Figure 2.20: Data repartition.

After the preparation of the dataset we can train our deep learning model.

Chapter 3

Deep Learning Model

A deep learning model is a succession of layers, which is called the architecture of the model. As represented in figure 3.1, the model is trained with training data. Because training a deep learning model requires a lot of data, we use data augmentation. While the training progresses, at each epoch we also check the loss and the accuracy of the model on validation data, which are different from the training data. We then test the trained model on test data (which here are the same as the validation set, because of the lack of data). The output is a probability, for each data sample, of being each subject present in the dataset.

Figure 3.1: The deep learning model.


3.1 Architecture

The architecture has been inspired by the DeepID2 model [10, figure 1]. Our model (see figure 3.2) takes as input the texture and the depth as if they were 2 channels, which are processed together. The model first consists of 2 convolution layers with 20 filters of kernel size K1 = 3 x K2 = 3, using the mirror method at the boundary for the calculation:

B(i, j) = \sum_{k1=1}^{K1} \sum_{k2=1}^{K2} w_{k1,k2} \, A(f(i + 1 - k1), f(j + 1 - k2))    (3.1)

with:

f(x) = x if x > 0
f(x) = -x + 1 if x ≤ 0

They are followed by a max pooling layer with a kernel size of N1 = 2 x N2 = 2:

B(i, j) = \max_{k1 \in [(N1-1)i, N1 i], \; k2 \in [(N2-1)j, N2 j]} A(k1, k2)    (3.2)

Then come 2 other convolution layers with the same kernel size and 40 filters, another max pooling layer with the same parameters, and 2 last convolution layers, still with the same kernel size and 60 filters. Finally, a dense layer:

X_i = \sum_{k=1}^{M} w_k^i \, x_k + b^i    (3.3)

followed by the softmax function:

P(id_{data} = id_k \mid X) = \frac{e^{X^T w_k}}{\sum_{j=1}^{N} e^{X^T w_j}}    (3.4)

gives as output the probability of each subject being the identity of the input data.

Figure 3.2: Architecture of the model.

We then train our model with Kingma and Ba's Adam algorithm [24], using the categorical cross-entropy loss:

L = -\frac{1}{N} \sum_{k=1}^{N} \left[ P_k^{id} \log(P_k) + (1 - P_k^{id}) \log(1 - P_k) \right]    (3.5)

with P_k^{id} = 0 when k ≠ id and P_k^{id} = 1 when k = id. We thus obtain:

L = -\frac{1}{N} \left( \log(P_{id}) + \sum_{k=1, k \neq id}^{N} \log(1 - P_k) \right)    (3.6)

Figure 3.3 shows the output that Keras gives once the architecture is defined. We set the size of the input images to 50 * 70, which keeps the same proportions as the saved images (see section 2.3.4). The number of channels becomes 1 when we use only the face or only the depth information, in order to evaluate our model that uses both. Finally, the output size of the dense layer is the number of classes in our dataset; it can also change according to how the training is done (see Chapter 4, Results).
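A minimal Keras sketch of this architecture, assuming ReLU activations and zero 'same' padding (the thesis uses a mirror padding at the boundaries, which the stock Keras convolution layer does not provide):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model(n_classes, input_shape=(70, 50, 2)):
    """Three blocks of two 3x3 convolutions (20, 40, 60 filters) with 2x2 max pooling."""
    model = Sequential()
    model.add(Conv2D(20, (3, 3), padding="same", activation="relu", input_shape=input_shape))
    model.add(Conv2D(20, (3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(40, (3, 3), padding="same", activation="relu"))
    model.add(Conv2D(40, (3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(60, (3, 3), padding="same", activation="relu"))
    model.add(Conv2D(60, (3, 3), padding="same", activation="relu"))
    model.add(Flatten())
    model.add(Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

Calling model.summary() produces the kind of layer-by-layer listing shown in figure 3.3.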

Figure 3.3: Detailed architecture of the model.


3.2 Data Augmentation

Data augmentation is a technique used when we do not have enough data to properly train a model. We use the same data at each epoch but modify them with translations and rotations: figure 3.4. It is also useful to avoid over-fitting the training data.

Figure 3.4: Example of data augmentation: (a) Tania face, (b) Tania depth, (c) Artur face, (d) Artur depth.

The Keras library did not allow the same transformation to be applied to both channels of a 2-channel sample, so we had to modify the library to generate the data correctly. The rotation range has been set to 45° and the shift along the horizontal and vertical axes to 20%.
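With a stock Keras installation, the generator below expresses the chosen ranges; as noted above, the Keras version used for the thesis had to be patched so that the face channel and the depth channel of one sample always receive exactly the same transformation:

from keras.preprocessing.image import ImageDataGenerator

# Rotation up to 45 degrees, 20% shifts along both axes.
datagen = ImageDataGenerator(rotation_range=45,
                             width_shift_range=0.2,
                             height_shift_range=0.2)

# Typical use with a (N, 70, 50, 2) training array x_train and one-hot labels y_train:
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=515), ...)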


3.3 Difference of Gaussian

We also tried to replace the texture by its difference of Gaussians, which is the difference between the convolutions of the image with Gaussian kernels of different standard deviations:

Gaussian kernel:          G_\sigma(x, y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2 + y^2}{2\sigma^2}}    (3.7)
First convolution:        s_1 = I * G_\sigma
Second convolution:       s_2 = I * G_{k\sigma}
Difference of Gaussians:  dog(k, \sigma) = s_1 - s_2

It is used to enhance the features, acting as a band-pass filter that dismisses the frequencies we are not interested in. We tried multiple combinations of the variables k and σ to train our models.

Figure 3.5: Examples of Difference of Gaussian features.
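Equation (3.7) can be sketched with OpenCV as follows; the image is converted to float so the subtraction keeps negative values:

import cv2
import numpy as np

def difference_of_gaussian(image, sigma, k):
    """dog(k, sigma) = I * G_sigma - I * G_{k*sigma}, acting as a band-pass filter."""
    img = image.astype(np.float32)
    s1 = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)        # first convolution
    s2 = cv2.GaussianBlur(img, (0, 0), sigmaX=k * sigma)    # second convolution
    return s1 - s2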


Chapter 4

Results

Our models have been trained using different datasets, image processing and classes. We used three types of models: one using only the depth information ('Depth'), another using only the texture information ('Face') and the last one using both ('Both'). First of all, we quickly realized that we needed to mix the data from the first and second acquisitions to obtain good results. So we ran different trainings using a dataset we called the "Disjoint dataset" (see figure 4.1a) and another called the "Joint dataset" (see figure 4.1b). The Disjoint dataset uses the second acquisition as training data and the first acquisition as validation and test data. The Joint dataset uses the first 75% of both acquisitions as training data and the last 20% of both acquisitions as validation and test data. The 5% in between are unused: they represent one second of acquisition and give us some difference between the training and the validation/test sets. As we can notice, the number of subjects also differs according to the dataset used: as explained in chapter 2, Acquisition, we were unable to compute the depth maps of the subject Sasha, so we cannot include him in the Disjoint dataset. In the first section below we analyze the results obtained with different DoG (Difference of Gaussians) parameters, and in the second section we compare the results obtained with some modifications of the model or the datasets. The results were obtained with a training of 100 epochs for the Disjoint dataset and 50 epochs for the Joint dataset. We used a batch size of 515 images, and the training time for 50 epochs is around 13 minutes. We used Keras on Ubuntu 14.06 in GPU mode with a GeForce GT 740M NVIDIA graphics card to train our models. Keras uses the Python language and our script was also developed in Python. Each prediction was saved and the statistics computed with the same script. The statistics were usually global, but our code is also able to compute them per class for deeper analysis. Our script also saves the filters of the convolutional layers for a random sample of the validation set, for each model.


Figure 4.1: Distribution of the data for each subject: (a) Disjoint dataset, (b) Joint dataset.

4.1 Combination of variables

Figures 4.2 and 4.4 show the accuracy obtained for the 'Both' model and the 'Face' model using different sets of σ and k for the Difference of Gaussians. The parameters take the values σ_list = [0.1, 0.2, 0.3, 0.5, 1, 2, 5, 10] and k_list = [0.1, 0.5, 0.9, 1.1, 1.5, 2, 3, 5, 10], which gives a total of 72 measurement points. We also computed the accuracy of the models without the Difference of Gaussians, using the texture as it is.


Figure 4.2: Accuracy based on the variation of the DoG parameters using the Disjoint dataset: (a) Both model, (b) Face model.

For both models using the Disjoint dataset, the best combination of parameters is σ = 0.2 and k = 10 (see figure 4.3 for a DoG example). It can boost the performance by up to 30% compared to the model which does not use the Difference of Gaussians.

Figure 4.3: DoG obtained with σ = 0.2 and k = 10.

Figure 4.4: Accuracy based on the variation of the DoG parameters using the Joint dataset: (a) Both model, (b) Face model.


Using the Joint dataset, however, the Difference of Gaussians is not relevant: we obtain very similar performances. This difference in performance gain is linked to the datasets and to the training. In the first case we always train our model on the same acquisition, which induces not only the learning of the face features but also the learning of the background and of the scene conditions, which do not change. With the DoG we enhance the face features and get rid of the noise created by the scene conditions. In the second case the DoG is not relevant because the model did not learn the scene conditions, thanks to the use of two different acquisitions: the model learned by itself that the noise induced by illumination and scene was not useful for the classification, and did not learn it.

4.2 Comparison of the models

After obtaining the first results, we decided to test our models on handmade data to see how they would react. We used data with the depth map set to zero, which corresponds to someone trying to cheat an authentication system with a photo (class "Photos"). We also used data with the texture set to zero, to see how important the texture is in our identification model. And finally we used textures linked to the depth of someone else, which corresponds to someone who would have printed a face on a mask in order to cheat the authentication system (class "Printed masks"). We then built two other models based on the Joint dataset. One uses a depth map that is no longer normalized uniformly:

Old model (16-bit normalization):     depth = depth / 65025    (4.1)
New model (individual normalization): depth = depth / max(depth)

The other one uses the same modification and also incorporates the two additional classes, "Photos" and "Printed masks", in its training and validation process, which sets the output of the dense layer to 21. These classes are built from the training and validation datasets, which triples the amount of data, and the training time triples as well. In table 4.1 we compare the efficiency of our models through the best prediction and the top-3 predictions, and through the mean probabilities obtained for each category. These probabilities reflect the confidence of the models. Globally, we notice that the models are more confident for the true best matches than for the false best matches. We can also analyze the probability of the true match when it is in the top 3 without being the best: a high probability there is also a good sign. By looking at these probabilities we can set thresholds in the verification system to check the reliability of the identification and trigger an action if the reliability is low; the action could be an alarm or a new acquisition. This can also be useful when searching for a group of people: we can look at the top N matches whose probabilities are higher than this threshold, and the true match is then more likely to be one of these suggestions.
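The two normalizations of equation (4.1), written out for clarity (a sketch, assuming the depth maps are stored as 16-bit arrays):

import numpy as np

def normalize_global(depth):
    """Old model: every depth map is divided by the same 16-bit constant."""
    return depth.astype(np.float32) / 65025.0

def normalize_individual(depth):
    """New model: each depth map is divided by its own maximum,
    so the relief of the face is kept regardless of its absolute range."""
    d = depth.astype(np.float32)
    return d / d.max() if d.max() > 0 else d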

Table 4.1: Results obtained for the different models, in %.

As we can see in table 4.1, our disjoint models do not achieve good scores. This is due to the lack of training data, which leads to a very low intra-personal variation and then to an overfitting of the training data. The overfitting is clearly visible in the evolution of the loss in figure 4.5.

Figure 4.5: Evolution of the accuracy and the loss for the Both model using the Disjoint dataset.

other parameters as the illumination or the cloths considered as noise. Also the depth gets the worst score, this is due to the data which are even more noisy than the other models because there is no Difference of Gaussian made on the depth. Using the disjoint dataset, the model using only the face texture has the best performance in all the aspects. However, by using the joint dataset we see how the scores increase. That is because we have more intra personal variation within the training dataset and also because the validation set is quite close to the training set. The evolution of the loss takes then an other turn (see figure 4.6) and the models achieve more than twice better accuracy than the previous results.

Figure 4.6: Evolution of the accuracy and the loss for the Both model using Joint dataset With the use of the Joint dataset, the model using both information has the best scores with a high confidence, a low probability when it is wrong and a correct probability for false negative. To go deeper into the analyze we can check what happens if we set the depth to 0 or the texture to 0 or if we use the texture with a depth of an other subject. The Both model performs 98% of accuracy when the depth is set to 0, 5% of accuracy when the texture is set to 0 and with switch depths it recognizes correctly the subject according only to the texture. It means that this model does not use the depth data and is similar to the Face model. This is why we introduced two other models using both data with the depth normalized individually (see equation 4.1) and we added 2 classes to one of them modifying, in the same time, its training and validation datasets.

34

Table 4.2: Results obtained for the two new models, in %.

In table 4.2, looking at the first model, we can notice that the influence of the face has decreased and that the influence of the depth, normalized individually, has been enhanced. This is reflected by the lower score when the depth is set to 0, compared to the previous results with the depth set to 0. However, when switching the faces, we notice that the model is more likely to identify the subject from the texture, or to give a wrong identification, than to identify the subject from the depth. This shows that the model extracts better features from the texture than from the depth. Looking at the second model, we see the effects of the different training. It easily identified the data with the depth set to 0 as printed pictures (score of 100%), but on the other hand was totally unable to identify the data with the texture set to 0. This inefficiency is due to its training, which does not take that sort of data into account. Also, looking deeper into the results with the per-class statistics, we notice that the model achieves a 97% score over the 168 samples of the Printed mask class: it is able to identify when the texture does not correspond to the depth. This last model is what we wanted to achieve, because it takes both the depth and the texture into account and can use them for the identification of persons as well as for the detection of a threat or usurpation. The last figure (4.7) shows the filters of each convolutional layer of our final model, the Texture + Depth model using individual normalization with 2 more classes for "Printed Photos" and "Printed Masks". We can see how the texture and the depth are combined to obtain the final features.


Figure 4.7: Filters of Vladimir A. obtained from our final model, for convolution layers 1 to 6.


Conclusion

The goal we set at the beginning has been achieved. We have been able to record good-quality depth and texture data on a group of people within a short time window. The deep learning model has proven efficient and can detect when someone is trying to cheat an authentication system, solving the problem raised in the article by Kelly Jackson Higgins [16], whereas the Face model had similar scores but is unable to detect identity usurpation, and the Depth model is not accurate enough for identification. This internship gave me the opportunity to work with a new team, in a new country and in an environment I was not used to. My choices were listened to and I could lead this project as I wanted, with the advice of my tutor Mikhail T. I could use devices from the institute to try different ways of obtaining the data, and the team of the institute was willing to help me by being part of the dataset. To thank the team, they all received a 2.5D avatar of themselves as a GIF. As future work, the acquisition installation could be turned into an automated device with a fixed calibration; the depth parameters could also be set automatically. The device could then be installed in any area and record data for surveillance, identification, authentication or the search for people. The deep learning model could be improved by adding other classes or modifying the architecture. The depth and the texture could also be processed separately and their features merged for the final decision, as done in [25]. It would be interesting to see how the identification would evolve with a large number of classes and how the inter-personal variation would behave. Facial biometrics has many possibilities, as deep learning does; both are expanding and I hope I will work in these fields again.


References

[1] S. Prabhakar, S. Pankanti, and A. Jain. "Biometric recognition: security and privacy concerns." In: IEEE Security & Privacy Magazine 1.2 (2003), pp. 33-42.
[2] Davide Maltoni et al. Handbook of Fingerprint Recognition. Springer, 2003.
[3] Arun Mani and Mark Nadeski. Processing solutions for biometric systems. Texas Instruments, July 2015.
[4] GIGYA. Survey Guide: Businesses Should Begin Preparing for the Death of the Password. www.gigya.com, 2016.
[5] FindBiometrics. Leading industry resource for all information on biometrics modalities. findbiometrics.com.
[6] Andrea F. Abate et al. "2D and 3D face recognition: A survey." In: Pattern Recognition Letters 28.14 (2007), pp. 1885-1906.
[7] Stan Z. Li and Anil K. Jain. Handbook of Face Recognition. Second edition. ISBN 978-0-85729-931-4. Springer, 2011.
[8] Faten Bellakhdhar, Kais Loukil, and Mohamed Abid. "Face recognition approach using Gabor Wavelets, PCA and SVM." In: IJCSI International Journal of Computer Science Issues 10(2).3 (Mar. 2013). www.IJCSI.org, pp. 201-207.
[9] Martin Lades et al. "Distortion Invariant Object Recognition in the Dynamic Link Architecture." In: IEEE Transactions on Computers 42.2 (Apr. 1993), pp. 300-310.
[10] Yi Sun, Xiaogang Wang, and Xiaoou Tang. "Deep Learning Face Representation by Joint Identification-Verification." In: (June 2014). arXiv:1406.4773 [cs.CV].
[11] R. Hadsell, S. Chopra, and Y. LeCun. "Dimensionality reduction by learning an invariant mapping." In: Proc. CVPR (2006).
[12] Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. "Expression-Invariant 3D Face Recognition." In: (2003).
[13] N. Mavridis et al. "The HISCORE face recognition application: Affordable desktop face recognition based on a novel 3D camera." In: Proc. Int'l Conf. on Augmented Virtual Environments and 3D Imaging (ICAV3D). Mykonos, Greece, 2001.
[14] Xiaoguang Lu, Dirk Colbry, and Anil K. Jain. "Three-Dimensional Model Based Face Recognition." In: (May 2004).
[15] P. Bergström and O. Edlund. "Robust registration of point sets using iteratively reweighted least squares." In: Computational Optimization and Applications 58.3 (2014). doi: 10.1007/s10589-014-9643-2, pp. 543-561.
[16] Kelly Jackson Higgins. "Researchers Hack Faces in Biometric Facial Authentication Systems." In: Online Community for security professionals (Feb. 2009).
[17] Zhe Ji, Hao Zhu, and Qing Wang. "LFHOG: A discriminative descriptor for live face detection from Light Field image." In: ICIP (2016), pp. 1474-1478.
[18] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Second edition. Pearson, 2012.
[19] Raytrix. The Raytrix Light Field Technology provides 3D capture and high resolution with different one shot cameras. www.raytrix.de.
[20] A.V. Seredkin, M.V. Shestakov, and M.P. Tokarev. "An industrial light-field camera applied for 3D velocity measurements in a slot jet." In: AIP Conference Proceedings. Vol. 1770. 030025. 2016.
[21] Range Vision 3D Scanners. RangeVision is a company developing structured-light 3D scanners since 2010. rangevision.com.
[22] Mikhail Tokarev et al. "DIC for Surface Motion Analysis Applied to Displacement of a Stent Graft for Abdominal Aortic Repair in a Pulsating Flow." In: 11th International Symposium on Particle Image Velocimetry (PIV15). Santa Barbara, California, Sept. 2015.
[23] Gary Bradski and Adrian Kaehler. Learning OpenCV. ISBN 978-0-596-51613-0. Sebastopol, CA: O'Reilly Media, 2008.
[24] Diederik P. Kingma and Jimmy Lei Ba. "Adam: A Method for Stochastic Optimization." In: ICLR (2015).
[25] Yuancheng Lee et al. "Accurate and robust face recognition from RGB-D images with a deep learning approach." In: (2016). Computer Vision Lab, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.


List of Figures

1 Biometrics market share by system type from [3].
2 Survey from FindBiometrics website [5].
1.1 Gabor features from convolution between face and Gabor filters.
1.2 PCA example.
1.3 DeepID2 model.
1.4 Different features obtained from range data.
1.5 Detection of the feature points.
1.6 Results of the matching procedure.
2.1 Raytrix camera and its optic.
2.2 Raw image data (left: entire image, right: zoom on the right eye).
2.3 Depth map computation process.
2.4 Raytrix software.
2.5 The Range Vision 3D scanner and its components.
2.6 Scheme of the acquisition.
2.7 The different light patterns during the acquisition process.
2.8 3D models obtained with the 3D scanner.
2.9 The Range Camera installation for acquisition.
2.10 Images obtained by the two cameras.
2.11 Depth computed with the best combination of variables.
2.12 Variation of the physique and the poses for Anna Y. and Maxim S.
2.13 Segmentation of the depth map.
2.14 Face detection process.
2.15 Face detection result.
2.16 Empty area filling.
2.17 Depth filling.
2.18 Results of the different filling methods.
2.19 Final results.
2.20 Data repartition.
3.1 The deep learning model.
3.2 Architecture of the model.
3.3 Detailed architecture of the model.
3.4 Example of data augmentation.
3.5 Examples of Difference of Gaussian features.
4.1 Distribution of the data for each subject.
4.2 Accuracy based on the variation of the DoG parameters using the Disjoint dataset.
4.3 DoG obtained with σ = 0.2 and k = 10.
4.4 Accuracy based on the variation of the DoG parameters using the Joint dataset.
4.5 Evolution of the accuracy and the loss for the Both model using the Disjoint dataset.
4.6 Evolution of the accuracy and the loss for the Both model using the Joint dataset.
4.7 Filters of Vladimir A. obtained from our final model.

List of Tables

1 Comparison of several biometric technologies (assessments based on authors' perceptions of "Handbook of Fingerprint Recognition" [2]).
4.1 Results obtained for different models in %.
4.2 Results obtained for the two new models in %.