Gloved and free hand tracking based hand gesture recognition

ICETACS 2013

Gloved and Free Hand Tracking based Hand Gesture Recognition

Dharani Mazumdar
Dept. of Electronics & Communication Engineering, Gauhati University, Guwahati-14, Assam, India
E-mail – [email protected]

Anjan Kumar Talukdar

Kandarpa Kumar Sarma

Dept. of Electronics & Communication Engineering, Gauhati University Guwahati-14, Assam, India E-mail – [email protected]

Dept. of Electronics & Communication Technology, Gauhati University Guwahati-14, Assam, India E-mail – [email protected]

Abstract — Hand gesture recognition systems can be used for human-computer interaction (HCI). The use of hand gestures provides an attractive alternative to cumbersome interface devices for HCI. Proper segmentation of the hand from the background and other body parts in the video is the primary requirement for the design of a hand-gesture based application. The video frames can be captured by a low cost webcam (camera) for use in a vision based gesture recognition technique. This paper discusses continuous hand gesture recognition. It reports a robust and efficient hand tracking and segmentation algorithm in which a new method, based on wearing a glove on the hand, is utilized. We have also focused on another tracking algorithm, which is based on the skin colour of the palm part of the hand, i.e. free hand tracking. A comparative study between the two tracking methods is presented in this paper. A finger tip can be segmented for proper tracking instead of the full hand, which allows the rest of the hand (excepting the segmented finger) to move freely during tracking. Problems such as skin colour detection, complexity from large numbers of people in front of the camera, complex background removal and variable lighting conditions are found to be efficiently handled by the system. Noise present in the segmented image due to a dynamic background can be removed with the help of this adaptive technique, which is found to be effective for the application conceived.

Keywords — Hand gesture recognition, Segmentation, Tracking, Complex background.

I. INTRODUCTION

Gesture recognition systems are increasingly becoming a significant part of human-computer interaction. Gestures can originate from any bodily motion or state but commonly originate from the face and/or hand. A gesture is a movement of the body parts that contains information and/or feelings [1][2]. Waving goodbye is a gesture. Pressing a key on a keyboard is not a gesture, because the motion of a finger on its way to hitting a key is neither observed nor significant; all that matters is which key was pressed. Many researchers have tried to define gestures, but their actual meaning is still arbitrary. S. Mitra and T. Acharya describe gestures as expressive, meaningful body movements involving physical motion of the fingers, hands, arms, head, face, or body with the intent of 1) conveying meaningful information or 2) interacting with the environment [3]. According to V. I. Pavlovic, R. Sharma and T. S. Huang, gestures originate as a mental concept, possibly in conjunction with speech. They

978-1-4673-5250-5/13/$31.00 ©2013 IEEE

are expressed through the motion of arms and hands, the same way speech is produced by air stream modulation through the human vocal tract. Also, observers perceive gestures as streams of visual images which they interpret using the knowledge they possess about those gestures [4]. Gesture recognition is pursued with the goal of interpreting human motions via mathematical algorithms. It can be seen as a way for a computer to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text interfaces or even graphical user interfaces (GUI), which still limit the majority of input to keyboard and mouse. Hence, gesture recognition is a process by which the system determines what is being performed by the gesturer. Gesture recognition can be conducted with techniques from computer vision and image processing, and it enables humans to interface with the machine and interact naturally without any mechanical devices [5]. Using gesture recognition, it is possible to point a finger at the computer screen so that the cursor moves accordingly; this could potentially make conventional input devices such as the mouse, keyboard and even touch-screen redundant.

This paper is organized as follows: basic theoretical considerations are presented in Section II. A comparative study on segmentation is presented in the literature review of Section III. Section IV covers two standard algorithms for the hand segmentation and tracking process. Experimental results and discussions are explained in Section V. Observations and conclusion are described in Sections VI and VII respectively.

II. BASIC THEORETICAL CONSIDERATIONS

A generalized block diagram of the gesture recognition process is shown in Figure 1: Image Frame Acquisition → Segmentation & Tracking → Feature Extraction → Classification → Output Gesture.

Figure 1: Block diagram for gesture recognition system


Image acquisition is the first process in the gesture recognition technique. A set of image frames is captured by a low cost web camera. Generally, the image acquisition stage involves preprocessing such as scaling. Hand segmentation is the pre-requisite to tracking the movement of the hand. Segmentation procedures partition an image into its constituent parts or objects; in general, autonomous segmentation is one of the most difficult tasks in digital image processing. In the field of image processing, hand tracking is a high-resolution technique that is employed to determine the consecutive positions of the hands of the user and hence represent objects in 3D. After successful tracking there is a need to extract the important feature points from the available data points of the track path. In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction: when the input data to an algorithm is too large to be processed and is suspected to be notoriously redundant (e.g. the same measurement in both feet and meters), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction [6]. After feature extraction, the classifier plays a vital role in the gesture recognition process. A classifier is a statistical method that takes a feature set as input and gives a class labelled output, which is the required output gesture [7]. There are various types of segmentation and tracking algorithms, which are based on color planes.

III. LITERATURE REVIEW

Many researchers have proposed numerous methods for segmentation; some representative papers are summarized below. In [8] a figure/ground segmentation method is presented for extraction of image regions that resemble the global properties of a model boundary structure. The shape representation, called the chordiogram, is based on geometric relationships of object boundary edges, while the perceptual saliency cues favor coherent regions distinct from the background. In [9], based on object detection and tracking results, a three-level hierarchical video object segmentation approach is presented. At the first level, an offline learned segmentor is applied to each object tracking result of the current frame to get a coarse segmentation. At the second level, the coarse segmentation is updated into an intermediate segmentation by a temporal model which propagates the fine segmentation of the previous frame to the current frame based on a discriminative feature points voting process. At the third level, the intermediate segmentation is refined by an iterative procedure which uses online collected color-and-shape information to get the final result. In [10] a new method for figure-ground image segmentation is proposed, based on a probabilistic learning approach to object shape. The segmentation process is composed of two parts: in the first part, object shape models are built using sets of object fragments; the second part starts by segmenting an image into homogeneous regions using the mean-shift algorithm.

IV. ALGORITHMS FOR HAND SEGMENTATION AND TRACKING

In our systems, we employ computer vision (OpenCV) software with a low cost web camera; the techniques have low computational complexity so that they can operate in real time. Locating and normalizing a hand is the process of tracking. Several methods can be used for tracking, each with different success and failure conditions. In this paper we describe two different types of hand tracking algorithms: one is developed for the free hand and is based on skin color, and the other uses a colored glove. Each has advantages as well as disadvantages over the other. The two algorithms are described one by one below.

A. Free Hand Tracking

In this algorithm, tracking is based on skin color. Hand segmentation is the pre-requisite to tracking the movement of the hand. As the skin color of human beings has a color distribution that differs significantly from those of background objects, we use a skin color segmentation method to extract the skin regions from the video frame [12]. Because skin color is clustered in a very small region of the color space, we can easily separate out skin regions with a proper threshold. By doing this, the hand and face parts are segmented, as they have almost the same color. Since we are interested only in the hand motion, the face region in the frame is then removed using a face detection algorithm from the OpenCV library [11]. The original color plane is converted to two separate HSV and YCbCr color planes, and a threshold is applied to the chrominance components of both color spaces. These image planes are then converted to binary form, and morphological operations like erosion and dilation are applied to extract the skin regions. A logical 'AND' operation between them gives the most probable skin region (the hand).
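The chrominance/hue thresholding and the logical 'AND' described above can be sketched as follows in Python/NumPy. The threshold ranges and the tiny test frame are illustrative assumptions, not the authors' exact values; a real implementation would use OpenCV's cvtColor, thresholding and morphology routines on full video frames.

```python
import colorsys

import numpy as np

def ycbcr_skin_mask(rgb):
    """Skin mask from YCbCr chrominance (threshold ranges are illustrative)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Standard ITU-R BT.601 chrominance conversion.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

def hue_skin_mask(rgb, h_lo=0, h_hi=25):
    """Skin mask from HSV hue on an OpenCV-style 0-179 hue scale."""
    h = np.empty(rgb.shape[:2], dtype=np.float32)
    for i in range(rgb.shape[0]):
        for j in range(rgb.shape[1]):
            r, g, b = (c / 255.0 for c in rgb[i, j])
            h[i, j] = colorsys.rgb_to_hsv(r, g, b)[0] * 179
    return (h >= h_lo) & (h <= h_hi)

# Tiny synthetic frame: one skin-toned pixel, one green background pixel.
frame = np.array([[[200, 120, 90], [0, 200, 0]]], dtype=np.uint8)
skin = ycbcr_skin_mask(frame) & hue_skin_mask(frame)  # logical 'AND' of the two planes
```

In the full system each binary plane would first be cleaned with erosion and dilation before the 'AND', as described above.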
After getting the skin color segmented regions, the largest connected segment corresponds to the palm region of the hand. Then the centroid of the palm is determined, and subsequently the centroids are used for making gesture trajectories. Figure 2 shows the flow chart of this algorithm, which can be summarized as below:

Step 1 - The face region is detected and subtracted from the input image frame.
Step 2 - The RGB frame is converted into two color planes, i.e. HSV and YCbCr.
Step 3 - Both image frames are converted to binary planes, and morphological operations are applied to extract the skin regions.
Step 4 - A logical 'AND' operation is performed between the planes, which gives the segmented hand region.
Step 5 - The centroid of the segmented region is calculated using moments.
Step 6 - Gesture trajectories are found by connecting the centroids of the segmented regions.

Fig 2: Process of Free Hand Tracking Algorithm

B. Gloved Hand Tracking

In this algorithm the palm wears a colored glove. The HSV color plane is the most suitable color plane for color based image segmentation [13], so we convert the color space of the original camera frame from RGB to HSV. By setting the hue value range from a lower threshold to an upper threshold for the particular color of the hand glove, we can easily eliminate all the other parts of the image frame. Finally, this output is converted to binary form and morphological operations like erosion and dilation are carried out. This results in a noiseless segmented hand, or region of interest (ROI), that can be used subsequently. Then the centroid of the segmented hand is determined, and subsequently the centroids are used for making gesture trajectories. Figure 3 shows the flow chart of this algorithm, which can be summarized as below:

Step 1 - The input image frame is first converted from the RGB plane to the HSV color plane.
Step 2 - The hue value range is set for the color of the hand glove.
Step 3 - The HSV plane is converted to a binary plane with proper thresholding.
Step 4 - Morphological operations such as erosion and dilation on the binary plane give a proper segmented result.
Step 5 - The centroid of the segmented region is calculated using moments.
Step 6 - Gesture trajectories are found by connecting the centroids of the segmented regions.

Fig 3: Process of Gloved Hand Tracking Algorithm

The gloved hand tracking algorithm can also be performed in another way. A change of even one pixel in the segmented region can deform the track path [14], so the full-glove method places some limitations on the degrees of freedom of the hand: if, at tracking time, the orientation or shape of the hand changes, the number of segmented pixels may change, and hence the position of the centroid changes. We have therefore modified the idea by segmenting a very small region of the hand. For this, we wear a cap of the same color as the glove on the finger tip.

C. Calculation of Centroids for Both the Algorithms

We measure the centroid of the segmented hand region in a simple way, which is the same for both algorithms. The centroid (x̄, ȳ) of the segmented hand region R is the arithmetic mean of the coordinates in the x and y directions, as given in equation (1):

x̄ = (1/|R|) Σ_{(u,v)∈R} u   and   ȳ = (1/|R|) Σ_{(u,v)∈R} v.   (1)
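Equation (1) can be sketched directly in Python/NumPy. The function below averages the pixel coordinates of a binary region, which is equivalent to the first-moment-over-zeroth-moment computation the paper refers to as moments calculation; the toy mask is only an example.

```python
import numpy as np

def centroid(mask):
    """Centroid (x_bar, y_bar) of a binary region R per equation (1):
    the arithmetic mean of the x and y coordinates of the pixels in R.
    Equivalent to image moments: x_bar = m10/m00, y_bar = m01/m00."""
    ys, xs = np.nonzero(mask)   # coordinates (u, v) of the pixels in R
    if xs.size == 0:            # empty region: no centroid
        return None
    return (float(xs.mean()), float(ys.mean()))

# Example: a 3x2 block of "hand" pixels inside a 5x5 frame.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 2:4] = True
print(centroid(mask))  # → (2.5, 2.0)
```

Connecting successive centroids across frames yields the gesture trajectory.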


V. RESULTS AND DISCUSSIONS

The resulting images for every step of the two algorithms described in Section IV are given below. Figure 4 shows the resulting images for the free hand tracking algorithm.

Fig 4: Output image planes for free hand tracking algorithm

Fig 5: Output image planes for gloved hand tracking algorithm

Figure 4b shows the detected face in the original input image shown in Figure 4a. Figures 4c and 4e show the HSV and YCbCr image planes respectively. Figure 4d shows the image plane after extracting the skin regions by applying morphological operations to Figure 4c, which is obtained after converting it to a binary plane. Similarly, Figure 4f shows the result of skin region extraction from Figure 4e. Figure 4g is the segmented image plane found from the logical 'AND' operation between Figures 4d and 4f. Finally, Figure 4h shows the tracking result for the gesture 'A'.

Fig 6: Tracking with finger tip

Figure 5a is the input RGB image plane with a red glove worn on the palm part of the hand. Figure 5b shows the HSV plane. Figure 5c shows the binary form after setting the hue value between the lower and upper threshold range. The hue value of red generally lies in the range 160-179; we found that the range 172-179 matched the hue values of our red glove. Applying morphological operations like erosion and dilation to the binary plane yields a well segmented hand, as seen in the image plane of Figure 5d. Finally, Figure 5e shows the tracking result for the gesture 'G'.
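The hue gate just described can be sketched as follows. We reuse the paper's reported 172-179 hue band (on OpenCV's 0-179 hue scale) for the red glove, but the conversion via the standard-library colorsys module and the test pixels are our own illustrative assumptions.

```python
import colorsys

import numpy as np

def glove_mask(rgb, h_lo=172, h_hi=179):
    """Binary mask of pixels whose hue falls in [h_lo, h_hi] on the
    OpenCV-style 0-179 hue scale (172-179 covered our red glove)."""
    h = np.empty(rgb.shape[:2], dtype=np.float32)
    for i in range(rgb.shape[0]):
        for j in range(rgb.shape[1]):
            r, g, b = (c / 255.0 for c in rgb[i, j])
            h[i, j] = colorsys.rgb_to_hsv(r, g, b)[0] * 179
    return (h >= h_lo) & (h <= h_hi)

# A magenta-leaning red pixel falls inside the band; a green one does not.
frame = np.array([[[255, 0, 20], [0, 255, 0]]], dtype=np.uint8)
mask = glove_mask(frame)
```

In practice the resulting binary plane is cleaned with erosion and dilation before the centroid is taken, as in the step list above.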


Figure 6 is the tracking result for the same gesture 'G' with a single finger tip. It is clear that tracking from the finger tip is better than from the gloved hand. The gloved hand tracking method limits the degrees of freedom of the hand for tracking purposes, but the finger tip offers higher degrees of freedom to the other parts of the hand, excepting the capped finger tip.

VI. OBSERVATIONS

A few samples were recorded by a webcam, and indoor-outdoor sets were created for use with these two algorithms. In this paper we have included the segmented output results found from the two algorithms. The samples consider illumination and background variation.

Figure 7a is the input to the free hand tracking technique at a static background with poor lighting, which gives a noisy output, as shown in Figure 7b. But the same sample, when applied as input (Figure 7c) to the gloved hand tracking technique, gives a better segmented output, as shown in Figure 7d.

Fig 7: Segmentation result at static background condition

We performed a number of trials by changing the lighting conditions, and the gloved hand technique was robust to all these variable conditions and gave the same output, because lighting variation has little effect on the hue value of the glove. Figure 8 shows some of the test results at dynamic background conditions; to test the robustness of the system, various environments are considered. Figure 8b is the output of the free hand tracking technique for the input shown in Figure 8a, and it gives a noisy segmented result. Figures 8c and 8d show the input and output of the gloved hand tracking technique respectively. Hence we can conclude that this technique is robust to lighting and complex background conditions. Figure 9 shows a snapshot of the calculated centroids of one of the segmented hand regions.

Fig 8: Segmentation result at dynamic background condition

Fig 9: Calculation of centroids of segmented hand

Table I shows the results of these techniques at static and dynamic background conditions. For the static background condition, we observed normal, complex background and dynamic lighting situations. We found that the gloved hand tracking algorithm is robust and effective under all conditions.

TABLE I: COMPARISONS OF SEGMENTED OUTPUT BETWEEN THE ALGORITHMS

Segmentation Models  | Normal  | Complex Background | Dynamic Lighting | Dynamic Background
Free Hand Tracking   | Working | Working            | Working          | Noisy output
Gloved Hand Tracking | Working | Robust             | Robust           | Robust

We calculated the PSNR (Peak Signal to Noise Ratio) values of the output results of these two algorithms, as shown in Table II. The PSNR is defined as

PSNR = 10 log10( 255² / MSE ),   MSE = (1 / (M × N)) Σ_i Σ_j [f(i, j) − g(i, j)]²   (2)

where f is the output image, g is the input image, (M × N) is the size of the image, and (i, j) are the pixel indices.
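Equation (2) can be evaluated with a few lines of NumPy. This sketch assumes 8-bit images (peak value 255), which matches the 255² numerator used above.

```python
import numpy as np

def psnr(output_img, input_img, peak=255.0):
    """PSNR per equation (2): 10*log10(peak^2 / MSE), where MSE is the
    mean squared error between the M x N output and input images."""
    f = output_img.astype(np.float64)
    g = input_img.astype(np.float64)
    mse = np.mean((f - g) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: two 2x2 images differing by 16 gray levels everywhere.
a = np.zeros((2, 2), dtype=np.uint8)
b = np.full((2, 2), 16, dtype=np.uint8)
value = psnr(b, a)   # MSE = 256, so PSNR = 10*log10(255^2/256) ≈ 24.05 dB
```

A higher PSNR indicates a less noisy segmented output, which is the basis of the comparison in Table II.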




The PSNR values further establish the advantages of the gloved hand tracking algorithm over the free hand tracking algorithm.

TABLE II: COMPARISONS OF PSNR VALUE BETWEEN THE TWO TRACKING ALGORITHMS

Models               | Normal (PSNR) | Complex Background (PSNR) | Dynamic Lighting (PSNR) | Dynamic Background (PSNR)
Input                | 27.5342       | 27.5342                   | 27.5342                 | 27.5342
Free Hand Tracking   | 28.1697       | 28.2862                   | 27.8681                 | 27.9334
Gloved Hand Tracking | 30.1923       | 30.2398                   | 30.9188                 | 31.0234

VII. CONCLUSION

The free hand tracking algorithm is very sensitive to varying lighting conditions and dynamic background situations. Apart from these drawbacks, there are some limitations related to the gesturer, as below:
1. The gesturer should always wear full sleeves so that the palm part can easily be detected and separated from the rest of the body.
2. It is difficult to detect the correct gesturer when more than one gesturer is present (Figure 10).
3. When the hand comes in front of the face, it is difficult to separate it from the face (Figure 10).

Figure 10: Problems related to free hand tracking algorithm

Compared to free hand tracking, the gloved hand tracking algorithm does not depend on background or lighting conditions. On a hot day, there is no need to wear full sleeves for the gesture recognition process using the gloved hand tracking algorithm, because it is not sensitive to body colors (for this, the color of the glove or finger cap should not be the same as the body color). In continuation of this algorithm, we can easily track the back side (top part) of the gesturer's hand as well, by wearing gloves of two different colors. Real time recognition with no time delay is one of the features offered by this algorithm, and it can be used for real time purposes in any environment.

REFERENCES

[1] F. Florez, J. M. Garcia, J. Garcia and A. Hernandez, "Hand gesture recognition following the dynamics of a topology preserving network", Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 318-323, 2002.
[2] T. Ishihara and N. Otsu, "Gesture Recognition Using Auto-Regressive Coefficients of Higher-Order Local Auto-Correlation Features", Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 583-588, 2004.
[3] S. Mitra and T. Acharya, "Gesture Recognition: A Survey", IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 37, no. 3, pp. 311-324, 2007.
[4] V. I. Pavlovic, R. Sharma and T. S. Huang, "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 677-695, 1997.
[5] E. S. Nielsen, L. A. Canalis and M. H. Tejera, "Hand Gesture Recognition for Human-Machine Interaction", Journal of WSCG, vol. 12, no. 1-3, 2004.
[6] S. L. Phung, A. Bouzerdoum and D. Chai, "Skin Segmentation Using Color Pixel Classification: Analysis and Comparison", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148-151, 2005.
[7] L. Gu and D. Bone, "Skin Colour Region Detection in MPEG Video Sequences", in Proc. IEEE International Conference on Image Analysis and Processing, Italy, pp. 898-903, 1999.
[8] A. Toshev, B. Taskar and K. Daniilidis, "Object Detection Via Boundary Structure Segmentation", IEEE International Conference on Computer Vision and Pattern Recognition, San Francisco, pp. 950-957, 2010.
[9] J. Xing, H. Ai and S. Lao, "Hierarchical Video Object Segmentation", First Asian Conference on Pattern Recognition, Beijing, pp. 67-71, 2011.
[10] G. Lariviere and M. S. Allili, "A Learning Probabilistic Approach for Object Segmentation", Ninth Conference on Computer Vision and Robotics, Toronto, pp. 86-93, 2012.
[11] Y. Wang, K. F. Loe and J. K. Wu, "A Dynamic Conditional Random Field Model for Foreground and Shadow Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 279-286, 2006.
[12] G. R. Bradski, S. Clara, "Computer Vision Face Tracking For Use in a Perceptual User Interface", Intel Technology Journal Q2'98, 1998.
[13] M. H. Yang, N. Ahuja and M. Tabb, "Extraction of 2D Motion Trajectories and its Application to Hand Gesture Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1061-1074, 2002.
[14] G. R. S. Murthy and R. S. Jadon, "A Review of Vision Based Hand Gestures Recognition", International Journal of Information Technology and Knowledge Management, vol. 2, no. 2, pp. 405-410, 2009.