2011 International Conference on Communication Systems and Network Technologies

Gesture Recognition using Modified HSV Segmentation

Mokhtar M. Hasan

Pramod K. Mishra

Computer Science Department Banaras Hindu University Uttar Pradesh, India e-mail: [email protected]

Computer Science Department Banaras Hindu University Uttar Pradesh, India e-mail: [email protected]

Abstract—A gesture recognition system is an important link between humans and machines. In this paper we present a new gesture system used to control human-made machines by teleoperation. We apply the HSV color model for the segmentation operation, with some modification to overcome the problem of incomplete segmentation; the input gesture is divided into blocks, and the features are extracted using our suggested brightness feature vector. We study all odd block sizes, starting from the single-pixel block size of 1x1 and going up to 23x23, in order to reveal the performance of the gesture system across different block sizes, and we achieve an average recognition time of 1.676 seconds per gesture. We focus on hand gestures rather than other organs because the hand can express much meaning, which is why it is commonly used by deaf people; the system handles static gestures.

Keywords: Brightness Calculation, HSV Color Model, Gesture Recognition, Image Segmentation, Laplacian Edge Detection, Image Saturation, Normalization, Feature Vector, Image Smoothing.

I. INTRODUCTION

Human gestures can express a lot of meaning and need a tool to manipulate and exploit that meaning; the algorithms applied throughout years of research do not fully employ these features, so research continues to invent new algorithms that may exploit them more completely. A gesture system can be one of three types, as classified in [1]: glove based, vision based, and drawing gestures. Vision-based systems are the most promising and a good alternative to glove-based systems [2], which depend on sensors and wires and are considered costly; a vision-based system needs one camera attached to a robot, and the gesture recognition algorithm is responsible for translating human gestures into commands to be carried out by the machine or robot. All gesture systems try to use more of the rich information that human gestures provide; the process of employing extra information, or refining the features against subtle noise, leads to new algorithms, and each algorithm suits certain applications. For example, a device used to count products on an assembly line does not need very accurate classification; on the contrary, a device used for motherboard manufacturing requires more accuracy. Fig. 1 shows the employment and discovery of better features with better tools throughout years of research. Gestures can be static or dynamic [3]: a static gesture is a single pose for a single command, while a dynamic gesture requires a sequence of poses for a single command.

978-0-7695-4437-3/11 $26.00 © 2011 IEEE DOI 10.1109/CSNT.2011.75

II. HSV SEGMENTATION

In our suggested algorithm, we apply the segmentation operation using the HSV color model. One advantage of this color model is skin-color searching [3]: the model seeks the human skin pigment and segments that color component accordingly. By using a range of predefined values for each of the HSV color components, we can achieve a good segmentation; Xingyan Li [4] decided a good range for image segmentation and neglected the V component specification. We noticed a problem after applying the HSV segmentation algorithm, namely the existence of black spots in the resulting image, which has several causes:
• The original input image has wrinkles at the hand palm, which cause the wrinkled area to be classified as non-hand region under the specific predefined values of the Hue, Saturation, and Value components.
• Differences in lighting conditions due to the free hand position at the time of gesturing.
• The existence of human hair on the back side of the hand.
• Noise, the inevitable element.
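As an illustration, a minimal sketch of HSV-range skin segmentation follows. The numeric H, S, V bounds below are placeholder assumptions for demonstration only, not the paper's predefined values, and would need tuning for a given skin tone and lighting.

```python
import colorsys
import numpy as np

# Placeholder skin-tone bounds (assumed for illustration; the paper's
# actual predefined H, S, V ranges would replace these).
H_RANGE = (0.0, 0.14)   # hue, as a fraction of the color wheel
S_RANGE = (0.20, 0.70)  # saturation
V_RANGE = (0.30, 1.00)  # value (brightness)

def segment_skin(rgb_image):
    """Return a binary mask: True where a pixel's HSV falls in all ranges."""
    h, w, _ = rgb_image.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            r, g, b = rgb_image[y, x] / 255.0
            hu, sa, va = colorsys.rgb_to_hsv(r, g, b)
            mask[y, x] = (H_RANGE[0] <= hu <= H_RANGE[1]
                          and S_RANGE[0] <= sa <= S_RANGE[1]
                          and V_RANGE[0] <= va <= V_RANGE[1])
    return mask

# A 1x2 test image: one skin-like pixel, one blue background pixel.
img = np.array([[[200, 140, 110], [20, 40, 200]]], dtype=np.uint8)
mask = segment_skin(img)  # skin pixel -> True, background -> False
```

In a real pipeline the mask would then be post-processed (filling, smoothing) before edge detection, as the following sections describe.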

Figure 1. Features Employing throughout the Decades. (Timeline of presented gestures and classification algorithms across the 1970s, 1980s, and 1990s.)

Due to the above reasons, the segmented image can appear as in Fig. 2. In order to overcome these gross edges, which would produce false features and affect the recognition decision, we applied a segment-filling strategy to fill the segment before applying the edge detection method. The filling can be summarized as saturating the inner segmented hand gesture up to the hand perimeter; this vanishes all the black spots caused by the segmentation operation and leads to better edges, as Fig. 3 shows. We can now compare visually which edges are more ready for feature extraction between the edges of Fig. 2 and those of Fig. 3.

III.

Algorithm 1: Feature Vector Calculation F(n)
Input: Edged Hand Image I(width, height), block size bs
Output: Feature Vector F(n) with n features
Method:
Step 1: [initialize] x = 0, y = 0, sum = 0
Step 2: [feature value calculation] sum = Σ(j = y to y+bs-1) Σ(i = x to x+bs-1) I(i, j)
Step 3: [update variables] F(next location) = sum; set sum to zero; increment x by bs; return to step 2 if x + bs ≤ width; otherwise set x = 0, increment y by bs, and return to step 2 if y + bs ≤ height
Step 4: [output] F(n)
end

Algorithm 2: Matched Blocks Calculation mb
Input: Testing Feature Vector V(n), Database Feature Vector F(n), Threshold
Output: Number of matched blocks mb
Method:
Step 1: [initialize] i = 0, black = 0, white = 0
Step 2: [compare] if V(i) ≤ Threshold and F(i) ≤ Threshold then increment black by one; if V(i) > Threshold and F(i) > Threshold then increment white by one
Step 3: [update] Increment i by one. Return to step 2 if i < n.
Step 4: [output] mb = black + white
end
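The block-sum feature extraction of Algorithm 1 can be sketched as follows, assuming the edged image is a 2-D array of brightness values and its dimensions are multiples of the block size bs:

```python
import numpy as np

def feature_vector(edged, bs):
    """Sum the brightness of each bs x bs block, scanning left-to-right,
    top-to-bottom as in Algorithm 1 (assumes dimensions divisible by bs)."""
    h, w = edged.shape
    feats = []
    for y in range(0, h - bs + 1, bs):
        for x in range(0, w - bs + 1, bs):
            feats.append(int(edged[y:y + bs, x:x + bs].sum()))
    return feats

img = np.arange(16).reshape(4, 4)   # toy 4x4 "edged" image
print(feature_vector(img, 2))       # [10, 18, 42, 50]
```

With bs = 1 on a 128x128 image this yields the 16384-entry vector discussed in the block-size study; larger blocks shrink the vector quadratically.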

Figure 3. Application of Filling Operation. (Panels: segmented, filled, and edge images.)
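The segment filling illustrated in Fig. 3 can be sketched as a border flood fill: any dark region not reachable from the image border is treated as a hole inside the hand and turned on. This is one plausible implementation of the filling idea, not necessarily the paper's exact procedure.

```python
from collections import deque
import numpy as np

def fill_segment(mask):
    """Fill black holes enclosed by a binary hand mask.

    Flood-fills the background (False pixels) from the image border;
    any False pixel never reached is an enclosed hole and is set True.
    """
    h, w = mask.shape
    reach = np.zeros((h, w), dtype=bool)
    q = deque()
    # Seed the flood fill with every background pixel on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x]:
                reach[y, x] = True
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not reach[ny, nx]:
                reach[ny, nx] = True
                q.append((ny, nx))
    return mask | ~reach   # holes = background pixels never reached

# A ring-shaped "hand" with a black spot inside: the spot gets filled.
ring = np.array([[0,0,0,0,0],
                 [0,1,1,1,0],
                 [0,1,0,1,0],
                 [0,1,1,1,0],
                 [0,0,0,0,0]], dtype=bool)
filled = fill_segment(ring)   # interior hole at (2, 2) becomes True
```

Running edge detection on the filled mask then yields only the outer hand contour, without the spurious edges around interior black spots.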


As a final step, to determine which database gesture the newly presented gesture belongs to, we assume V(n) is the testing feature vector of size n, and F(m, n) holds the n database features for each of the m different stored database gestures; the following equation is used:

winner = min(j = 1..m) Σ(i = 1..n) |V(i) - F(j, i)|

The database gesture that corresponds to mb is the winning gesture.

IV. BLOCK SIZE CONSIDERATION

We employed different block sizes in the testing operation in order to study the behavior of our algorithm. We started from a single-point block size of 1x1, then moved through all the odd block sizes up to 23x23. The first impression about the expected feature vector size is disappointing, because a 1x1 block size leads to 16384 features, which is difficult to manipulate and difficult for the computer to handle since we have 60 different gesture images; but after edge detection and discarding the false features, we obtained a feature vector of size 875, meaning roughly 5% of the features are true features. Table 1 lists the true features calculated for the various block sizes. Another remarkable observation is that as the block size employed by the system increases, the proportion of true features also increases, gradually leading to better features in the feature vector. We considered every all-black block in the edge-detected image to be a false feature, while blocks with some brightness (edge information) are true features that can be used in the classification algorithm and stored along with their indices in the database, which can significantly reduce the feature vector size.

TABLE 1. TRUE FEATURES PRODUCED BY OUR SYSTEM.

Block Size | Total Features    | True Features
1x1        | 128 x 128 = 16384 | 875
3x3        | 42 x 42 = 1764    | 198
5x5        | 25 x 25 = 625     | 102
7x7        | 18 x 18 = 324     | 71
9x9        | 14 x 14 = 196     | 50
11x11      | 11 x 11 = 121     | 40
13x13      | 9 x 9 = 81        | 33
15x15      | 8 x 8 = 64        | 26
17x17      | 7 x 7 = 49        | 24
19x19      | 6 x 6 = 36        | 19
21x21      | 6 x 6 = 36        | 12
23x23      | 5 x 5 = 25        | 16

V. SYSTEM IMPLEMENTATION

We tested our algorithm using 24 different gestures covering six different poses, 4 samples per pose; the figures hereinafter show the performance of our system, and Table 2 shows the recognition percentages achieved for the different block sizes. We did not exceed block size 23x23 because we noticed that the recognition rate drops when going beyond this block size. We calculated another parameter that helps in understanding the behavior of the suggested algorithm: the average of recognition. This parameter tells us how far the non-recognized gestures are from the recognized one. For example, consider block size 11x11: in the testing phase we tested 24 different gestures, each with a recognition percentage representing its matching criterion, and the average of those 24 matching criteria is the output shown in Fig. 5. When this parameter is high, it is a bad indication of recognition overlap and unclear separation; when it is low, it indicates that the recognized gesture is well separated. This concept links with Fig. 6, which plots, for each block size, the mean distance of the recognized gesture from all other database gestures: a higher mean indicates a good separation of the recognized gesture, meaning the other non-recognized gestures are far away from it, while a lower value shows some overlap between the recognized and non-recognized gestures.

TABLE 2. RECOGNITION PERCENTAGES.

Block Size | Recognition %
1x1        | 71
3x3        | 75
5x5        | 79
7x7        | 92
9x9        | 92
11x11      | 92
13x13      | 92
15x15      | 96
17x17      | 92
19x19      | 96
21x21      | 92
23x23      | 83

Figure 4. Database Gestures.
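The minimum-distance matching rule used to pick the winning database gesture can be sketched as follows; representing the feature vectors as plain lists is an assumption made for illustration.

```python
def classify(v, database):
    """Return the index j of the database feature vector F(j) minimizing
    the sum of absolute differences sum(|V(i) - F(j, i)|) against v."""
    def distance(f):
        return sum(abs(a - b) for a, b in zip(v, f))
    return min(range(len(database)), key=lambda j: distance(database[j]))

db = [[10, 20, 30], [12, 19, 33], [50, 5, 0]]   # m = 3 stored gestures
print(classify([11, 20, 31], db))               # 0 (closest to db[0])
```

With the true-feature indexing of Table 1, only the stored true-feature positions would be compared, shrinking the per-gesture work considerably.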

Figure 5. Average of Recognition (average recognition % of the 24 tested gestures, per block size from 1x1 to 25x25).

Figure 6. Distance Mean (the mean distance of all the database gestures from the tested gesture, per block size).

Figure 7. Recognized Gesture Behavior with Single Block Size (recognition % against the database stored poses).

Figure 8. Unrecognized Gesture Behavior with Single Block Size (recognition % against the database stored poses).

As an illustrating example for Fig. 6, suppose we have three database training gestures d1, d2, and d3, and the recognition percentages are 70%, 40%, and 90% respectively when compared with a presented gesture t1. Clearly, t1 matches d3 with a good percentage; the distances from d1 and d2 are abs(70-90) = 20 and abs(40-90) = 50 respectively, giving (20+50)/2 = 35 as the mean distance. A good separation and recognition algorithm gives a high mean. We also plotted another piece of tested data: a single recognized gesture along with its percentages against each of the six poses. Pose six has the largest recognition percentage, which classifies the presented gesture as pose six; Fig. 7 shows this recognition. In the case of a non-recognized gesture, there is more than one peak point, or the peak point does not exceed 85%, which means the gesture is not recognized, as in Fig. 8.
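The worked example above can be checked with a short sketch; the numbers are those from the text.

```python
def mean_separation(percentages):
    """Mean absolute distance of the non-best matches from the best match;
    a higher mean indicates a better-separated recognition."""
    best = max(percentages)
    rest = list(percentages)
    rest.remove(best)                  # drop one occurrence of the best match
    return sum(best - p for p in rest) / len(rest)

print(mean_separation([70, 40, 90]))   # (20 + 50) / 2 = 35.0
```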

VI. SUMMARY & CONCLUSION

A gesture system is another challenge for humans to communicate with human-made machines as they do with deaf people daily. Human gestures embody a lot of information that can be used to correct and perfect the understanding of human commands, and all algorithms that use this gesture information utilize it from different views. We have applied a new gesture system with a brightness value calculated from the blocks of the input gesture, used our suggested algorithm to recognize and classify newly presented gestures, and employed different block sizes in order to achieve our goal. We applied our algorithm using HSV segmentation with predefined values for the H, S, and V components; these components fit a white skin, and in the case of application to a different skin color we have to adjust these components to suit it, because the most striking property of HSV is skin color extraction.


Considering the segment filling, we can apply one more step before the filling approach, namely a mean (smoothing) operation, which helps diminish the openings that exist at the perimeter of the hand gesture; then we can apply the filling algorithm.
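The suggested mean step can be sketched as a simple 3x3 box average over the binary segment; the exact filter is not specified in the text, so this majority-vote form is an illustrative assumption.

```python
import numpy as np

def mean_smooth(mask):
    """3x3 box average over a binary mask: a pixel becomes foreground when
    at least half of its 3x3 neighborhood is foreground, closing small
    perimeter openings before the filling algorithm runs."""
    h, w = mask.shape
    padded = np.pad(mask.astype(float), 1)
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 3, x:x + 3].mean() >= 0.5
    return out

# A one-pixel gap fully surrounded by foreground is closed by the filter.
gap = np.array([[1, 1, 1],
                [1, 0, 1],
                [1, 1, 1]], dtype=bool)
print(mean_smooth(gap)[1, 1])   # True
```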

REFERENCES

[1] Mokhtar M. Hasan, Pramod K. Mishra, HSV Brightness Factor Matching for Gesture Recognition System, International Journal of Image Processing, vol. 4(5):456-467, Malaysia, 2010.
[2] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, X. Twombly, Vision-Based Hand Pose Estimation: A Review, Computer Vision and Image Understanding, Elsevier, 108:52-73, 2007.
[3] Mokhtar M. Hasan, Pramod K. Mishra, Robust Gesture Recognition using Euclidian Distance, IEEE International Conference on Computer and Computational Intelligence, IEEE Catalog Number: CFP1059L-PRT, ISBN: 978-1-4244-8948-0, vol. 3:38-46, China, 2010.
[4] Xingyan Li, Vision Based Gesture Recognition System with High Accuracy, Department of Computer Science, The University of Tennessee, Knoxville, TN 37996-3450, 2005.
[5] S. Naidoo, C. W. Omlin, M. Glaser, Vision-Based Static Hand Gesture Recognition Using Support Vector Machines, Department of Computer Science, University of the Western Cape, South Africa, 1999.
[6] K. Symeonidis, Hand Gesture Recognition using Neural Networks, Master Thesis, School of Electronic and Electrical Engineering, 2000.
[7] R. Brunelli, T. Poggio, Face Recognition: Features versus Templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15(10), October 1993.
[8] J. Wachs, U. Kartoun, H. Stern, Y. Edan, Real-Time Hand Gesture Telerobotic System using Fuzzy C-Means Clustering, Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, 1999.
[9] J. Triesch, C. Malsburg, Robust Classification of Hand Postures Against Complex Backgrounds, IEEE Computer Society, Second International Conference on Automatic Face and Gesture Recognition, 1996.
[10] T. Yang, Y. Xu, Hidden Markov Model for Gesture Recognition, The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, 1994.
[11] Jiong J. Phu, Yong H. Tay, Computer Vision Based Hand Gesture Recognition using Artificial Neural Network, Faculty of Information and Communication Technology, University Tunku Abdul Rahman, Malaysia, 2006.
[12] Heni B. Amor, S. Ikemoto, T. Minato, H. Ishiguro, Learning Android Control using Growing Neural Networks, Department of Adaptive Machine Systems, Osaka University, Osaka, Japan, 2003.
[13] M. Swain, D. Ballard, Indexing via Color Histograms, International Journal of Computer Vision, vol. 7, pp. 11-32, 1991.
[14] S. Venkataraman, V. Gunaseelan, Hidden Markov Models in Computational Biology, lectures in HMM.
[15] The AF Research Laboratory, Neural Networks, Language and Cognition, Elsevier, Neural Networks, 22:247-257, 2009.
[16] H. Gunes, M. Piccardi, T. Jan, Face and Body Gesture Recognition for a Vision-Based Multimodal Analyzer, Computer Vision Research Group, University of Technology, Sydney (UTS), 2007.
[17] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, Susan J. Brown, A Fast Genetic K-Means Clustering Algorithm, Wayne State University, Kansas State University Manhattan, USA, 2000.
[18] B. Heisele, P. Ho, T. Poggio, Face Recognition with Support Vector Machines: Global versus Component-based Approach, Massachusetts Institute of Technology, Center for Biological and Computational Learning, Cambridge, 2001.
[19] Anil K. Jain, Robert P. W. Duin, Jianchang Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22(1):4-35, January 2000.
[20] S. Sathiya Keerthi, O. Chapelle, D. DeCoste, Building Support Vector Machines with Reduced Classifier Complexity, Journal of Machine Learning Research, 8:1-22, 2006.
[21] Abin-Roozgard, Convolutional Neural Networks, lectures in Neural Networks.
[22] Y. P. Lew, A. R. Ramli, S. Y. Koay, A. Ali, V. Prakash, A Hand Segmentation Scheme using Clustering Technique in Homogeneous Background, Student Conference on Research and Development Proceedings, Shah Alam, Malaysia, 2002.
[23] Chi-Chun Lo, Shuenn-Jyi Wang, Video Segmentation using a Histogram-Based Fuzzy C-Means Clustering Algorithm, Institute of Information Management, National Chiao-Tung University, Computer Standards & Interfaces, 23:429-438, 2001.
[24] S. Marcel, O. Bernier, J. Viallet, D. Collobert, Hand Gesture Recognition using Input-Output Hidden Markov Models, France Telecom Cnet, 2 Avenue Pierre Marzin, 22307 Lannion, France, 1999.
[25] C. Karlof, D. Wagner, Hidden Markov Model Cryptanalysis, Computer Science Division (EECS), University of California Berkeley, California 94720, 2004.
[26] http://commons.wikimedia.org.
[27] William T. Freeman, M. Roth, Orientation Histograms for Hand Gesture Recognition, Mitsubishi Electric Research Laboratories, Cambridge, MA 02139 USA, 1994.
[28] X. Li, Gesture Recognition based on Fuzzy C-Means Clustering Algorithm, Department of Computer Science, The University of Tennessee, Knoxville, 2003.
[29] A. Just, S. Marcel, A comparative study of two state-of-the-art sequence processing techniques for hand gesture recognition, Switzerland, 2009.
[30] V. I. Pavlovic, R. Sharma, T. S. Huang, Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19(7), July 1997.
[31] S. Malassiotis, M. G. Strintzis, Real-time hand posture recognition using range data, Image and Vision Computing, 26:1027-1037, 2008.
