2017 2nd International Conference on Computer Engineering, Information Science and Internet Technology (CII 2017) ISBN: 978-1-60595-504-9

Age Estimation Method Based on Generative Adversarial Networks

XIN NING, WEIJUN LI and LINJUN SUN

ABSTRACT

Face age estimation has become a hot topic in computer vision and man-machine interaction. To address the unbalanced distribution of age classes in age databases, we use generative adversarial networks to learn the distribution of human face images and to generate a large number of face images of different ages. We then build an age estimation model based on a convolutional neural network. Finally, to obtain better model performance, we train the network with knowledge transfer. The experimental results demonstrate that the proposed age estimation method achieves higher classification accuracy and a smaller age error.

KEYWORDS

Age estimation, Convolutional neural network, Generative adversarial networks.

Xin Ning, [email protected], Linjun Sun, [email protected]. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; Wave United laboratory for cognitive computing, Beijing 100083, China. Corresponding author: Weijun Li, w [email protected], Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.

INTRODUCTION

The human face is one of the most important biometric characteristics, and it plays an important role in identity recognition and in conveying emotion. In recent years, face detection, identity authentication and the analysis of face attributes (gender, age, expression and race) have become hot research topics in computer vision and man-machine interaction. Age estimation in particular is in demand in many man-machine interaction applications, and automatic face age estimation has attracted growing attention.

Over the past decades, a great number of studies have been carried out on age estimation. Stable and effective age features can greatly improve the performance of an age estimation system. Pitanguy [1] measured the size of facial organs and bones and selected parameters that represent how the face changes with age, in order to quantify the effect of aging on the face. Cootes [2] put forward the active appearance model (AAM) to represent the features of the human face; it adds a model of overall texture information to a face shape model. Lanitis [3] adopted the AAM to extract age features of the human face and established the function between face features and age by regression analysis to realize age estimation. Geng [4] put forward a subspace model that assigns face images of different ages to different subspaces to reflect the age features of a person. Guo [5] applied a manifold-learning model to face feature extraction and tested it on the YGA database with an SVM classifier and an SVR regressor. Liu [6] proposed AgeNet, an end-to-end learning approach for robust apparent age estimation. These methods have solved the problem of age estimation to some extent, but limitations remain, and age estimation is still a complex problem. This paper proposes a new age estimation method based on generative adversarial networks.

GENERATIVE ADVERSARIAL NETWORKS

The generative adversarial network (GAN) is a generative model put forward by Goodfellow et al. [7] in 2014, and it has become a hot research direction in artificial intelligence. The idea of the GAN originates from the Nash equilibrium of game theory. A GAN consists of a generator and a discriminator; through competitive training it learns the underlying distribution of the training samples and generates new, similar samples. The structured flowchart of the GAN is shown in Fig.1. We use G and D to denote the generator and the discriminator respectively; each can be any neural network, such as a multilayer perceptron. The input of the G network is a random variable z, and G(z) is a sample generated by the G network. The input of the D network is real data x or generated data G(z). For the discriminator, an input that comes from the real data x is labeled 1, which indicates real data, and an input that comes from the generated data G(z) is labeled 0, which indicates generated (false) data.
The purpose of D is to distinguish real data from false data as well as possible, while the purpose of the G network is to make its generated samples G(z) fool the D network, so that the D network cannot tell that they are generated.

AGE ESTIMATION METHOD BASED ON GAN

Face images contain complicated backgrounds that affect the performance of age estimation models, so age estimation methods generally include the following steps: face detection, face alignment, feature extraction, model training and age estimation. The process is shown in Fig.2. This paper regards age estimation as a multi-class classification problem and uses deep convolutional neural networks for face feature extraction and classification model training. By learning from a large number of samples, the network extracts face features and trains a classification model that predicts face age.


[Diagram: random noise z -> generative model G -> G(z); real data x and G(z) -> discrimination model D -> real/false?]

Figure 1. The structured flowchart of GAN.

Figure 2. The process of face age estimation.

Table 1. Distribution of different age groups.

Age group   0~7    8~15   16~25   26~35    36~45    46~60    >60
Images      5000   5500   85000   180000   160000   110000   15000

Sample expansion of face age data based on GAN

Training deep convolutional neural networks requires a large number of labeled data samples. As research on image-based age estimation has deepened, face image databases dedicated to age research have been established. In deep learning, the performance of a network model is closely tied to its training samples; in general, a large and well-balanced set of samples yields a better model. To address the shortage of samples, we collected several popular open face age databases, including the FGNET database, MORPH database, CACD database and IMDB-Wiki database, which have 1002, 52099, 17423, 157362 and 397969 face images respectively. By combining these databases, an age database with more than 600 thousand face images can be obtained, which meets the demand of deep learning algorithms for a large number of training samples.

However, all of the open databases are unevenly distributed, and the statistics are shown in Table 1. The images in the combined age database are concentrated in the ages from 26 to 60, while samples of children, teenagers and old people are very few. Training with this data gives the network an unbalanced classification ability across age classes and, in particular, reduces classification accuracy on test sets. To solve the problem of uneven data distribution, we used data generated by generative adversarial networks to expand the samples. We adopted a generative adversarial model based on convolutional neural networks; that is, both the generative network and the discrimination network are convolutional neural networks instead of traditional multilayer perceptrons. The network structure is shown in Fig. 3.
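To make the imbalance in Table 1 concrete, the following minimal Python sketch computes how many generated images each age group would need in order to reach a common size. The function name `generation_shortfall` and the choice of the largest group as the default target are assumptions made for illustration; the paper does not state the balancing target it used.

```python
# Image counts per age group, copied from Table 1.
TABLE_1_COUNTS = {
    "0~7": 5000,
    "8~15": 5500,
    "16~25": 85000,
    "26~35": 180000,
    "36~45": 160000,
    "46~60": 110000,
    ">60": 15000,
}


def generation_shortfall(counts, target=None):
    """Return how many generated images each group needs to reach `target`.

    Assumption: `target` defaults to the size of the largest group; the
    paper does not state the balancing target it used.
    """
    if target is None:
        target = max(counts.values())
    # Groups already at or above the target need no generated images.
    return {group: max(0, target - n) for group, n in counts.items()}


shortfall = generation_shortfall(TABLE_1_COUNTS)
for group, missing in shortfall.items():
    print(f"{group:>6}: {TABLE_1_COUNTS[group]:>7} real, {missing:>7} to generate")
```

Under this assumed target, the youngest and oldest groups would consist almost entirely of generated images, which is why the quality of the generated faces matters for the final classifier.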


Figure 3. The structure of generative network and discrimination network.

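The optimization function of generative adversarial networks [7] is as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$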
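As a rough numeric illustration of this objective, the sketch below estimates the standard GAN value function of [7] on a small batch: the mean of log D(x) over real samples plus the mean of log(1 - D(G(z))) over generated samples. The function name `gan_value`, the `eps` clamp and the example probabilities are all hypothetical; the code only evaluates the value for given discriminator outputs and does not train any network.

```python
import math


def gan_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    eps:    hypothetical clamp that keeps log() away from zero.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates the two sources well gives a value near 0.
confident = gan_value(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])
# A discriminator that is fooled (outputs 0.5 everywhere) gives 2 * log(0.5).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, fooled)
```

The discriminator pushes this value up toward 0, while the generator pushes it down by raising D(G(z)), which is the alternating game described in the text.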

Here x is training data that follows the data distribution p_data, and z is a random vector that follows a specified distribution p_z. D(x) and G(z) stand for the outputs of the discrimination network and the generative network for inputs x and z respectively. Network D is trained to classify samples correctly as far as possible: when the input comes from real samples, log D(x) should be maximized, and when the input comes from the generative network, log(1 - D(G(z))) should be maximized. Network G is trained so that D classifies the generated samples it receives as real, so the output probability D(G(z)) should be as large as possible; in other words, log(1 - D(G(z))) is minimized. During training, one network is fixed while the parameters of the other are updated, and the two alternate so that each maximizes the error of the other. In the end, network G learns the distribution of the original data.

Image pre-processing

To make learning easier for the convolutional neural network, the face images used for training generally go through face detection, key point location and face alignment, which remove the complicated background and correct the facial pose. We adopted the FuSt face detection method for face detection [8], ERT key point location to locate the key points of face images [9], and a similarity transformation to realize face alignment.

Design of network structure

Exact age estimation can be regarded as a multi-class classification problem. We treated each year of age as a class, which gives 101 age classes from 0 to 100. Compared with classification into age groups, exact age prediction is a much harder classification problem. Since age estimation is very complex, it requires a network model with strong expressive ability and classification capacity. Improving classification capacity calls for deeper network structures, because shallow conventional convolutional neural networks have difficulty reaching a good solution. Thus, we adopted the convolutional neural network VGG-16 [10] for face age feature extraction and age classification. The VGG network was put forward by the Visual Geometry Group of the University of Oxford; in ILSVRC 2014 it ranked first in the localization task and second in the classification task. It performs very well in classification and has been widely used in computer vision tasks such as image segmentation and face recognition. The network combines a deep structure with a strong feature representation ability. VGG-16 has 16 weight layers: 13 convolutional layers and 3 fully connected layers. The first 15 layers are used for feature extraction and the last fully connected layer is used for classification.

Design of network training methods

We trained the network using the idea of transfer learning. The network is first pre-trained on the ImageNet data set, which contains about 1.5 million images; at that stage the last fully connected layer of the network outputs a 1,000-dimensional vector, which corresponds to 1,000 classes. Although a network pre-trained on ImageNet has no age classification capacity, it provides initial values for the age classification network, and a good initial value is generally critical to network training. We then fine-tuned the network on the face age database, which includes both real face images and generated face images.
Since the age class distribution of the expanded database is balanced, the network models obtained from this training are more accurate.

EXPERIMENT AND ANALYSIS

Expansion effect of age samples

To generate face images of a certain age group, we sampled a large number of noise vectors from the uniform distribution on [-1,1]; each noise vector has 100 dimensions. Adding a slight disturbance to each dimension of each noise vector yields more noise vectors. Feeding these noise vectors into the generative network produces the corresponding face images. Fig.5 gives an example: the left side shows real images used for training and the right side shows generated images.
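The noise sampling step above can be sketched in a few lines of Python. The helper names `sample_noise` and `perturb`, the disturbance size `scale=0.05` and the clipping back to [-1, 1] are assumptions for illustration; the paper only states that the vectors are 100-dimensional, drawn from [-1,1], and slightly disturbed in each dimension.

```python
import random


def sample_noise(num_vectors, dim=100, rng=None):
    """Draw noise vectors with every component uniform on [-1, 1]."""
    if rng is None:
        rng = random.Random()
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_vectors)]


def perturb(vector, scale=0.05, rng=None):
    """Add a small uniform disturbance to each dimension, clipped to [-1, 1].

    Assumption: `scale` and the clipping are illustrative; the paper only
    says that a slight disturbance is added to each dimension.
    """
    if rng is None:
        rng = random.Random()
    return [min(1.0, max(-1.0, v + rng.uniform(-scale, scale))) for v in vector]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
base = sample_noise(num_vectors=4, dim=100, rng=rng)
# Derive three perturbed copies of every base vector to enlarge the noise set.
expanded = base + [perturb(v, rng=rng) for v in base for _ in range(3)]
print(len(expanded), len(expanded[0]))
```

Each vector in `expanded` would then be passed to the trained generator to obtain one face image; that step is omitted here because it depends on the trained network.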


Figure 5. Results of image generation: real images (left) and generated images (right).

It can be seen that the generative adversarial network learned the distribution of the original data and could generate nearly realistic face images. The generated images are not copies of the original training data; they are new, varied images sampled from the learned distribution. Expanding the face image samples of different ages in this way forms an evenly distributed face age database.

Age estimation results

This paper regards age prediction as a multi-class classification task and uses convolutional neural networks to train classification models for age estimation. Performance evaluation is an important part of age estimation research. At present, the common evaluation measures for age prediction are the mean absolute error (MAE) and the mean accuracy. MAE is the average of the absolute errors between the estimated age and the real age, and it can be expressed as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_k - S_k \right| \tag{2}$$

where $S_k$ is the real age, $\hat{S}_k$ is the estimated age, and N is the number of test images. The smaller the MAE, the smaller the error range and the higher the accuracy of the algorithm. Mean accuracy is another common evaluation measure for age estimation. It is the proportion of correctly estimated images among all test images, and it can be expressed as:

$$\mathrm{accuracy} = \frac{N_{correct}}{N} \times 100\% \tag{3}$$

Table 2. Age estimation performance before data expansion.

DataSet   MAE      Top10 accuracy   Top5 accuracy
FGNET     1.5375   0.9375           0.8000
MORPH     2.6076   0.8828           0.6184

Table 3. Age estimation performance after data expansion.

DataSet   MAE      Top10 accuracy   Top5 accuracy
FGNET     1.4531   0.9422           0.8125
MORPH     2.469    0.8889           0.6253
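The three measures reported in Tables 2 and 3 can be computed as in the following minimal Python sketch. MAE follows Eq. (2), and a Top-k prediction counts as correct when the real age is among the k most probable classes. The function names, the toy probabilities, the use of the most probable class as the estimated age, and the convention that the class index equals the age in years are assumptions for illustration, not the authors' evaluation code.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """MAE as in Eq. (2): mean of |estimated age - real age| over all images."""
    pairs = list(zip(true_ages, predicted_ages))
    return sum(abs(pred - true) for true, pred in pairs) / len(pairs)


def top_k_accuracy(true_ages, class_probs, k):
    """Fraction of images whose real age is among the k most probable classes.

    class_probs[i][a] is the predicted probability that image i has age a
    (assumption: class index equals age in years).
    """
    hits = 0
    for true_age, probs in zip(true_ages, class_probs):
        ranked = sorted(range(len(probs)), key=lambda a: probs[a], reverse=True)
        if true_age in ranked[:k]:
            hits += 1
    return hits / len(true_ages)


# Toy example with 5 age classes instead of 101 to keep it readable.
true_ages = [1, 3, 4]
class_probs = [
    [0.10, 0.60, 0.20, 0.05, 0.05],  # most probable age is 1 (correct)
    [0.50, 0.30, 0.10, 0.06, 0.04],  # real age 3 ranks fourth
    [0.05, 0.05, 0.10, 0.45, 0.35],  # real age 4 ranks second
]
# Assumption: the estimated age is the most probable class.
predicted = [max(range(len(p)), key=lambda a: p[a]) for p in class_probs]
mae = mean_absolute_error(true_ages, predicted)
top2 = top_k_accuracy(true_ages, class_probs, k=2)
print(mae, top2)
```

In the toy data the estimated ages are 1, 0 and 3, so the absolute errors are 0, 3 and 1, and two of the three real ages fall within the two most probable classes.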

Here N is the number of all test images and $N_{correct}$ is the number of correctly estimated images. A higher accuracy indicates a better age model, and this index is often used to evaluate age group models. For exact age estimation, we used the Top10 and Top5 accuracy indexes: among the 101 classes, if the 10 (or 5) classes with the highest predicted probability include the real class, the prediction is regarded as correct. Table 2 and Table 3 give the MAE, Top10 accuracy and Top5 accuracy for exact age estimation on the FGNET data set and the MORPH data set.

The tables show that the age prediction model based on deep convolutional networks achieves high accuracy in exact age classification, with an average error of no more than 3 years. In practical tests, the method also performed well in real scenes. In addition, expanding the age training samples with generative adversarial networks further improved overall performance: MAE dropped from 1.5375 to 1.4531 on FGNET and from 2.6076 to 2.469 on MORPH, and both Top10 and Top5 accuracy rose on both data sets.

CONCLUSION

In the current open age databases, samples are concentrated in the ages from 26 to 60, and data for low and high age groups are lacking, which is a disadvantage for model training. Therefore, we used generative adversarial networks to generate samples for the underrepresented age groups and so solve the problem of uneven age distribution. We then used deep convolutional networks trained with transfer learning to realize feature extraction, classification and age estimation for face images. The experimental results indicate that the proposed age estimation method achieves higher classification accuracy and a smaller age error.

ACKNOWLEDGEMENTS

This research was financially supported by the National Science Foundation.
This work was supported by the National Natural Science Foundation of China (Grant No. 61572458 and No. 90920013).

REFERENCES

1. Pitanguy I, Leta F, Pamplona D, et al. Defining and measuring aging parameters [J]. Applied Mathematics and Computation, 1996, 78(2): 217-227.
2. Cootes T F, Edwards G J, Taylor C J. Active Appearance Models [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2001, 23(6): 681-685.
3. Lanitis A, Taylor C J, Cootes T F. Toward automatic simulation of aging effects on face images [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002, 24(4): 442-455.
4. Geng X, Zhou Z H, Smith-Miles K. Automatic age estimation based on facial aging patterns [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2007, 29(12): 2234-2240.
5. Guo G, Fu Y, Dyer C R, et al. Image-based human age estimation by manifold learning and locally adjusted robust regression [J]. IEEE Transactions on Image Processing, 2008, 17(7): 1178-1188.
6. Liu X, Li S, Kan M, et al. AgeNet: Deeply Learned Regressor and Classifier for Robust Apparent Age Estimation [C]// IEEE International Conference on Computer Vision Workshop. IEEE, 2015: 258-266.
7. Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative Adversarial Nets [J]. Advances in Neural Information Processing Systems, 2014, 3: 2672-2680.
8. Wu S, Kan M, He Z, et al. Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness [J]. Neurocomputing, 2016.
9. Kazemi V, Sullivan J. One millisecond face alignment with an ensemble of regression trees [C]// Computer Vision and Pattern Recognition. IEEE, 2014: 1867-1874.
10. Zhong S, Li K, Feng R. Deep convolutional hamming ranking network for large scale image retrieval [C]// Intelligent Control and Automation. IEEE, 2014: 1018-1023.