Fourth International Conference on Ubi-Media Computing

A Real-Time Hand Gesture Recognition System for Daily Information Retrieval from Internet

Sheng-Yu Peng, CSIE, Tamkang University, Taiwan, [email protected]
Kanoksak Wattanachote, CSIE, Asia University, Taiwan, [email protected]
Hwei-Jen Lin, CSIE, Tamkang University, Taiwan, [email protected]
Kuan-Ching Li, CSIE, Providence University, Taiwan, [email protected]

Abstract—Nowadays people obtain daily information such as weather conditions, news, and financial updates from the Internet. However, retrieving this information requires repeating the same mouse and keyboard actions, which wastes time and is inconvenient. To improve this situation, we propose in this paper a system that retrieves daily information without mouse or keyboard input, making everyday life more convenient. In the proposed system, users operate the interface with hand movements; once a function is selected by hand gesture, the system reports the requested information through synthesized speech. Since each member of a typical family has different requirements and needs, the system uses face recognition to identify each user and provide personalized services. We use the PCA method to recognize both faces and hand gestures, and a set of hand gestures and the corresponding system controls are stored in the system. Results from a set of experiments indicate that the proposed system performs well for small-scale face recognition in a family environment and also achieves good hand gesture recognition accuracy.

Keywords-face recognition; hand detection; hand tracking; hand gesture recognition; PCA

I. INTRODUCTION

Nowadays people obtain daily information from newspapers, television, and the Internet. As the Internet becomes more developed and widely available, it is likely to become the major source of access to this vast amount of information. Weather, news, and fortune information are among the most commonly requested items. However, retrieving such information requires repeating the same mouse and keyboard actions, which wastes time and causes much inconvenience. To improve this situation, we present in this paper the design of a system that provides easy access to daily information without mouse or keyboard input, making people's lives more convenient. The proposed system is built for a home environment and provides daily information retrieval; a camera and a speaker are attached to the system as its major hardware components.

Each family member can log in to the system through face recognition and obtain information from it. Each user receives personalized services and can configure them individually; for example, the type of news parents want to receive may differ from that of children, as may each person's fortune or entertainment preferences. Based on these considerations, the system provides personalized services. It continuously detects the hand region and recognizes hand gestures so that users can control it by hand. When a user selects a function, the system reports the information through synthesized speech, which makes the system feel intuitive and convenient, although it does not replace the monitor display. Since spoken reports should be brief, the system condenses the captured information, making the executed commands easier to understand.

In this paper, face detection uses AdaBoost with Haar-like features, and face recognition is mainly based on principal component analysis and eigenfaces. Since the proposed system is built for family use, only a small number of face records are stored, and good results are achieved for this small-scale face recognition task. Hand detection uses skin color detection in the YCbCr color space, CamShift is used for tracking, and hand gesture recognition again uses the PCA method.

Many methods are available for face recognition, such as principal component analysis (PCA), neural networks, and support vector machines (SVM). Among them, PCA has a fast processing time, so we use principal component analysis to recognize users [1][2]. A study that uses hand gestures to control a robotic hand is presented in [3], and other approaches to hand gesture recognition are discussed in [4][5]. In this research, we use principal component analysis to recognize hand gestures, and to detect the hand position continuously we use the continuously adaptive mean-shift (CamShift) algorithm for tracking [6][7]. In terms of applications, a face recognition system designed for the smart home is described in [8].

The system in [8] applies face recognition to identify the user and provide personalized services, but interaction is limited to face recognition and the system cannot be actively controlled by the user. Another study uses speech recognition and speech synthesis to deliver services [9]. After considering these options, we find hand gesture control to be faster, more intuitive, and more accurate, and we therefore use hand gestures to control the system in this research.

The remainder of this paper is organized as follows. Section 2 describes the system architecture and flow chart, together with face recognition and speech synthesis. Section 3 presents hand detection, tracking, and gesture recognition. Section 4 discusses experimental results, and Section 5 presents concluding remarks and future work.

II. FACE RECOGNITION AND SPEECH SYNTHESIS

Figure 1. System architecture

The proposed system consists of face and hand detection and recognition, daily information retrieval from the Internet, and speech synthesis; Figure 1 depicts the system architecture. Each user can easily control the system and obtain daily information. When a user enters the camera capture zone, the system starts the detection and recognition process. Within that area, users control the system by hand: when a user selects a function by hand gesture, the system retrieves the corresponding information from the Internet. After condensing the information, it uses speech synthesis to report it to the user as a voice message broadcast through a speaker.

Figure 2. System flow chart

Figure 2 shows the flow chart of the proposed system. First, the system continuously performs face detection. When a detected face remains in view for an additional three seconds, user identification is completed. If the user is a family member, the user is logged in and can use the system's services. The third step is hand detection and tracking, and the fourth step is hand gesture recognition. The fifth and last step is daily information retrieval from the Internet: when the user selects a function by hand gesture, the system retrieves the information and reports it through speech synthesis. Details of face detection and recognition are presented in the next subsection.

A. Face Detection and Recognition

For face detection, the most widely used method is AdaBoost, as proposed by Paul Viola and Michael Jones. AdaBoost with Haar-like features is used to detect the facial image, yielding the face region and its position. After detecting the face, we apply some preprocessing for face recognition. First, the facial image is segmented and converted from a color image to grayscale. Next, we resize the facial image to a smaller size, since a smaller, lower-resolution image achieves better recognition results than a large, high-resolution one. Finally, we apply illumination normalization.

Several computer vision approaches have been developed for face recognition, such as PCA, neural networks, and SVM. Among them, we select PCA because it is fast. PCA is, however, easily affected by lighting, so illumination normalization is needed: we use histogram equalization to standardize the brightness of the images used for recognition.

Face recognition has two phases: a training phase and a recognition phase. The training steps are as follows:
Step 1. Load the grayscale training images.
Step 2. Perform PCA on the training faces.
Step 3. Project the training images onto the PCA subspace.
Step 4. Save the training data.

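As a concrete illustration of this training flow, the following is a minimal sketch in Python with OpenCV and NumPy, assuming the standard Haar cascade shipped with OpenCV; the 64x64 face size, the training directory layout, and the number of principal components are our own placeholder choices, not values from the paper.

```python
import glob
import cv2
import numpy as np

FACE_SIZE = (64, 64)  # assumed size; the paper only says smaller images work better
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(img):
    """Detect a face, crop it, convert to grayscale, resize, and
    equalize the histogram (illumination normalization)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], FACE_SIZE)
    return cv2.equalizeHist(face)

# Steps 1-4: load grayscale faces, run PCA, project, save the training data.
faces, labels = [], []
for path in glob.glob("training_faces/*.jpg"):  # hypothetical layout: <user>_<n>.jpg
    face = preprocess(cv2.imread(path))
    if face is not None:
        faces.append(face.flatten().astype(np.float32))
        labels.append(path.split("/")[-1].split("_")[0])
data = np.vstack(faces)
mean, eigenvectors = cv2.PCACompute(data, mean=None, maxComponents=20)
projections = cv2.PCAProject(data, mean, eigenvectors)
np.savez("train_data.npz", mean=mean, eig=eigenvectors,
         proj=projections, labels=np.array(labels))
```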

After training, the system has a training data file that stores the features of all training images. During recognition, it compares the features of the input image with those of the training images and outputs the result. The recognition steps are as follows:
Step 1. Load the input image.
Step 2. Load the saved training data.
Step 3. Project the input image onto the PCA subspace.
Step 4. Calculate the distance between the input image and each training sample.
Step 5. Output the result.
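Continuing the sketch above (reusing its preprocess function and saved training data), recognition projects the probe face into the same PCA subspace and returns the nearest training sample by Euclidean distance; the rejection threshold is a hypothetical value, not taken from the paper.

```python
import cv2
import numpy as np

def recognize(img, mean, eigenvectors, projections, labels, threshold=3000.0):
    """Steps 1-5: preprocess the input, project it onto the PCA subspace,
    and return the label of the nearest training face (or None if too far)."""
    face = preprocess(img)  # defined in the training sketch above
    if face is None:
        return None
    probe = cv2.PCAProject(face.flatten().astype(np.float32)[None, :],
                           mean, eigenvectors)
    dists = np.linalg.norm(projections - probe, axis=1)
    best = int(np.argmin(dists))
    return labels[best] if dists[best] < threshold else None  # assumed threshold
```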


After a user is successfully identified, the user is logged in and can use the system's services. The complete login flow is shown in Section 4.


B. Speech Synthesis

Speech synthesis technologies have been developed over many years, and nowadays speech synthesis is used in a number of products such as telephones and train and bus announcements. In this paper, text-to-speech technology is used to report daily information and make the system intuitive and convenient; the language implemented for speech synthesis is Chinese.

People can get a great deal of information from the Internet. Our system provides several commonly requested kinds of daily information, which it captures from the Internet. Before reporting to the user, the information must be condensed so that it is brief and easy to understand. The basic functions include weather, news, and fortune. The functions are implemented as modules, so the system can be extended to provide more services.
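The paper does not name its text-to-speech engine, so the following sketch uses the third-party pyttsx3 library purely as a stand-in; the reported string is an invented example, and the available voices (including Chinese ones) depend on the host system.

```python
import pyttsx3  # stand-in TTS engine; not the synthesizer used in the paper

def report(text):
    """Speak a condensed information string to the user."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

report("Today: cloudy, 24 degrees, 30 percent chance of rain.")
```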

III. HAND DETECTION AND RECOGNITION

Figure 3. Hand gesture recognition flow chart

This section describes hand gesture recognition in detail; Figure 3 shows the flow chart. First, skin detection extracts the skin region from the input image. If the tracking trigger conditions are not met, the system keeps detecting the skin region; once they are met, it starts tracking with the CamShift algorithm. The third step is segmentation and normalization, the fourth step is PCA, and finally the projected features are analyzed and the gesture is recognized.

A. Detection and Tracking

In hand detection and tracking, we locate the hand region and position by skin detection. Skin detection is an important step that affects the final result, so its performance is one of the key points. This paper uses the YCbCr color space to detect the skin region. In the YCbCr color space, skin pixels usually cluster in a roughly elliptical region, so we use an elliptical template to decide whether each pixel of the image is skin or non-skin. Because the cluster shifts with the brightness of the light, we use two elliptical templates for different lighting conditions: when the luminance is greater than 128 we use the first template, and when it is less than 128 we use the second. This yields a better skin region result (a sketch of the test is given after the conditions below).

In a normal situation, one frame may contain two hands and a face, two hands, only one hand, and so on. We use object tracking to handle this: a tracking window is placed at the center of the camera capture window, and tracking starts once the trigger conditions are satisfied. The three tracking conditions are as follows:
1. More than one skin region is detected.
2. A skin region lies within the tracking window.
3. The skin area is larger than forty percent of the tracking window and remains so for more than two seconds.
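The paper does not publish the ellipse parameters, so the following Python/OpenCV sketch only shows the structure of the test: a pixel is skin if its (Cb, Cr) value falls inside one of two elliptical regions, selected by the pixel's luminance. All center and axis values below are assumed placeholders.

```python
import cv2
import numpy as np

# Assumed (center_cb, center_cr, axis_cb, axis_cr); not the paper's values.
ELLIPSE_BRIGHT = (113.0, 155.6, 23.4, 15.2)  # used when Y >= 128
ELLIPSE_DARK = (110.0, 152.0, 20.0, 12.0)    # used when Y < 128

def skin_mask(bgr):
    """Return a binary mask marking pixels whose (Cb, Cr) lies inside the
    elliptical skin region chosen by the pixel's luminance."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    y, cr, cb = ycrcb[..., 0], ycrcb[..., 1], ycrcb[..., 2]

    def inside(p):
        c_cb, c_cr, a, b = p
        return ((cb - c_cb) / a) ** 2 + ((cr - c_cr) / b) ** 2 <= 1.0

    skin = np.where(y >= 128, inside(ELLIPSE_BRIGHT), inside(ELLIPSE_DARK))
    return (skin * 255).astype(np.uint8)
```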

This paper uses an open palm (five) as the default hand gesture. In our experiments, an open palm covers about fifty percent of the tracking window, so we use forty percent as a safe threshold. To increase stability, we use a timer for the tracking trigger; after several tests, we settled on a two-second timer, which does not make users feel they are waiting too long.

We use the CamShift method to track the hand position and then segment it. Figure 4 shows our normalization steps; the capture device produces frames at 640x480 resolution. Figure 4(a) is an image downsampled from the original; we detect and binarize the skin region and find the biggest skin region, as shown in Figure 4(b). Finally, we resize the image to 100x100 resolution, as shown in Figure 4(c).
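A sketch of one tracking-and-normalization step under the same assumptions, feeding the skin mask from the previous sketch to OpenCV's CamShift as the probability image (the paper does not state what probability image it uses), then keeping the biggest connected skin component and resizing it to 100x100 for PCA.

```python
import cv2
import numpy as np

def track_and_normalize(frame, window):
    """Run one CamShift step on the skin mask, then crop the tracked window,
    keep the biggest connected skin region, and resize it to 100x100."""
    mask = skin_mask(frame)  # defined in the skin detection sketch above
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, window = cv2.CamShift(mask, window, criteria)
    x, y, w, h = window
    if w == 0 or h == 0:
        return None, window
    crop = mask[y:y + h, x:x + w]
    n, comp, stats, _ = cv2.connectedComponentsWithStats(crop)
    if n > 1:  # label 0 is background; keep the largest foreground component
        biggest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        crop = np.where(comp == biggest, 255, 0).astype(np.uint8)
    return cv2.resize(crop, (100, 100)), window
```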

Figure 4. Normalization steps: (a) downsampled image, (b) the biggest skin region, (c) normalized image

B. Hand Gesture Recognition

For hand gesture recognition we use the PCA method. Before recognizing hand gestures, we must train on them; the saved training data is then used for recognition. The training and recognition steps are similar to those described in Section 2.


Figure 5. Six defined hand gestures: (a) zero, (b) one, (c) two, (d) five, (e) seven, (f) closed palm

Figure 5 shows the six hand gestures defined in our system, on which we train. We resize each hand gesture image to a small size to make recognition faster; the resized images are 100x100 resolution. Every hand gesture has two types, without and with the wrist, and each type has five images: front, left-roll rotation, right-roll rotation, left-yaw rotation, and right-yaw rotation, as shown in Figure 6. Each hand gesture is therefore represented by ten images in total. After training, we use the saved training data to recognize hand gestures.

Figure 6. Five images of one hand gesture: (a) front, (b) left-roll rotation, (c) right-roll rotation, (d) left-yaw rotation, (e) right-yaw rotation

IV. EXPERIMENTAL RESULTS

In this section we describe the equipment and the results of our experiments. The equipment includes a computer, a camera, and a speaker: the computer is a PC with an AMD dual-core 2.3 GHz CPU and 2 GB of memory, and the camera is a Logitech Webcam C120.

Figure 7. System interface

Our system interface provides five icons that can be chosen by hand gesture, as shown in Figure 7. The leftmost icon is weather information retrieval, followed by news, fortune, settings, and log out. Before using these functions, users need to log in to the system. Until the user is identified, the system keeps detecting faces, as shown in Figure 8(a); once identified, the user is logged in and can use the services, as shown in Figure 8(b).

Figure 8. User login: (a) face not yet identified, user not logged in; (b) face identified, user logged in


Figure 9. Original images and normalized images

Figure 10. Hand gesture recognition result

Figure 9 shows all of the original hand gesture images and their normalized versions. The normalized images are produced by downsampling the original image, applying skin detection and binarization, finding the biggest skin region, and resizing it to 100x100 resolution. Figure 10 shows our six defined hand gestures and the recognition results. The gestures are zero, one, two, five, seven, and closed palm. A user can use any hand gesture to start tracking and then use the defined gestures to choose functions: the one, two, seven, five, and zero gestures correspond to weather, news, fortune, settings, and log out, respectively, while the closed palm gesture serves an auxiliary role.

In our experiments, we first tested the recognition accuracy with hand gesture pictures in a static environment. After obtaining good results there, we tested the recognition accuracy in a real-time environment, in which every hand gesture was recognized over 1000 frames. Table 1 shows the real-time hand recognition results; the total accuracy rate is 93.1%. The total processing time for a single frame is between 0.1 and 0.3 seconds: 50 ms for face detection, 50 ms for face recognition, 50 ms for hand detection, and 10 ms for hand gesture recognition.

TABLE 1. HAND GESTURE RECOGNITION ACCURACY RATE

Gesture         Zero    One     Two     Five    Seven   Closed palm   Total
Accuracy rate   93.5%   96.9%   86.1%   94.1%   97.8%   90.3%         93.1%

V. CONCLUSIONS AND FUTURE WORK

This paper implements a system that provides daily information retrieval from the Internet through an interface controlled by hand gesture recognition. The system can be applied not only in a family environment but also in public spaces, where every user can obtain information by hand gesture at a lower cost than a touch panel. The system is also suitable for people unfamiliar with computers, who only need to learn how to pose the hand gestures; it can be very helpful for this population. Our hand gesture recognition can be integrated with other applications such as interactive games, smart homes, assistive equipment, and industrial control. In our experiments, the hand gesture recognition accuracy rate is 93.1%, each frame takes between 0.1 and 0.3 seconds to process, and the system can be controlled fluently.

As future work, our primary targets are increasing the hand gesture recognition accuracy and improving the overall processing speed, leaving time for further computation. We plan to add more hand gestures and a two-handed operation mechanism to diversify control, to let users define their own intuitive hand gestures, and to add more information retrieval services so that users have more choices and greater convenience.

ACKNOWLEDGEMENT

This paper is partially supported by the National Science Council (NSC), Taiwan, under research grant NSC 99-2218-E-126-002.

REFERENCES

[1] Neerja and Ekta Walia, "Face Recognition Using Improved Fast PCA Algorithm", in Proceedings of the Congress on Image and Signal Processing (CISP '08), pp. 554-558, 2008.
[2] Manan Ahmad, Abdullah Bade and Luqman Al-Hakim Zainal Abidin, "Using Principal Component Analysis and Hidden Markov Model for Hand Recognition Systems", in Proceedings of the International Conference on Information and Multimedia Technology, pp. 323-326, 2009.
[3] Jagdish Lal Raheja, Radhey Shyam, Umesh Kumar and P. Bhanu Prasad, "Real-Time Robotic Hand Control Using Hand Gestures", in Proceedings of the International Conference on Machine Learning and Computing, pp. 12-16, 2010.
[4] Yikai Fang, Kongqiao Wang, Jian Cheng and Hanqing Lu, "A Real-Time Hand Gesture Recognition Method", in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 995-998, 2007.
[5] Liu Yun and Zhang Peng, "An Automatic Hand Gesture Recognition System Based on Viola-Jones Method and SVMs", in Proceedings of the Second International Workshop on Computer Science and Engineering, pp. 72-76, 2009.
[6] Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "Real-Time Tracking of Non-Rigid Objects Using Mean Shift", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 142-149, 2000.
[7] Pankaj Kumar, Anthony Dick and Tan Soo Sheng, "Real Time Target Tracking with Pan Tilt Zoom Camera", in Proceedings of Digital Image Computing: Techniques and Applications, pp. 492-497, 2009.
[8] Fei Zuo and Peter H. N. de With, "Real-time Embedded Face Recognition for Smart Home", IEEE Transactions on Consumer Electronics, pp. 183-190, 2005.
[9] Jun-Ren Ding, Chien-Lin Huang, Jin-Kun Lin, Jar-Ferr Yang and Chung-Hsien Wu, "Interactive Multimedia Mirror System Design", IEEE Transactions on Consumer Electronics, pp. 972-980, 2008.