Real-time hand tracking with a colored glove

Gert Geebelen∗, Tom Cuypers†, Steven Maesen†, Prof. dr. Philippe Bekaert†
∗† Hasselt University - tUL - Expertise Centre for Digital Media, Belgium
∗ Email: [email protected]
† Email: [email protected]

Abstract—We present a real-time hand tracking technique using a cloth glove with color markers placed at the movable hand bone positions and two webcams at different, fixed positions. After calibrating the webcams in a preprocessing step, linear triangulation reconstructs, for each match of the color blobs tracked in both images, the corresponding point in three-dimensional space. The necessary transformations are then computed to manipulate a three-dimensional virtual hand mesh according to the reconstructed hand points of the color markers.

Index Terms—Stereo vision, calibration, tracking, triangulation

I. INTRODUCTION

People commonly use their hands to manipulate objects, to point at things, and so on. Mapping this natural way of handling physical objects to virtual objects gives the user intuitive interaction possibilities, especially when the application lives in 3D space, for instance a 3D user interface or a 3D game. Another advantage of using a hand as input device is the large number of degrees of freedom it provides. Taking into account the requirements of using a hand as input device, we have investigated an inexpensive small-baseline webcam setup with a resolution of 640 × 480 pixels, capturing at 30 FPS. During development we further optimized the glove design to satisfy real-time limits while still maintaining reasonable accuracy.

II. RELATED WORK

A. Bare hand tracking
Real-time bare hand tracking remains an active topic of research, in particular when it comes to finger interpretation [5] and to supporting the variety of human skin colors, which often requires an extra step in which the user manually selects the hand region to obtain skin training data [8].

B. Instrumented hand glove tracking
More accurate results are obtained with gloves instrumented with sensors [2], [4] or gloves that emit or reflect infrared light [6]. These techniques also give real-time results, but they are expensive and may put constraints on the possible hand movements.

C. Color glove tracking
In [7] a real-time data-driven pose estimation technique using one or two webcams is proposed, based on a fully colored hand glove with a specific pattern. A large number of poses of a virtual hand mesh coated with the colored pattern are simulated, and for each hand pose a reduced, rasterized image is saved along with the virtual hand orientation. When capturing input from an appropriately colored physical hand glove, the image is reduced and rasterized and a nearest-neighbour lookup is performed against the images in the database. When the closest match is found, the corresponding saved hand pose and orientation are shown. Since an approximation is used, achieving high accuracy requires saving more database samples, which increases the lookup time needed to find the closest match and risks making the technique no longer applicable in real time. In [1] another approach is used, with one camera and a hand glove with color-coded rings at the joints of each finger. These joints form the input variables of a nonlinear quasi-Newton optimization method that tries to minimize the distance between the joints of a virtual hand mesh projected on a plane and the joints tracked in the image. Once the positions of the joints are found, the virtual mesh is updated according to inverse kinematics. The slow convergence of the quasi-Newton method, due to the many input variables, made a real-time implementation hard.

III. OVERVIEW

We will now give an overview of the major steps of our research and show some results.

A. Calibration
Camera calibration is done using a checkerboard with Zhang's calibration method [9]. The extrinsic calibration parameters describe a transformation from 3D world coordinates to 3D camera coordinates, and the intrinsic calibration parameters describe a projection from the 3D camera coordinates to 2D homogeneous pixel coordinates. Once these parameters are computed, we can compose the camera projection matrix, which transforms 3D world coordinates to 2D homogeneous pixel coordinates, as follows. Let x be an image point in homogeneous pixel coordinates and let X be a point in homogeneous world coordinates. Then a 3 × 4 matrix P describes the projection relationship

x = PX    (1)

[Figure: Found calibration pattern in images from (a) our first camera and (b) our second camera.]

\[
\underbrace{\begin{pmatrix} x \\ y \\ w \end{pmatrix}}_{x}
=
\underbrace{
\underbrace{\begin{pmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}}_{K}
\underbrace{\begin{pmatrix} r_{00} & r_{01} & r_{02} & t_{00} \\ r_{10} & r_{11} & r_{12} & t_{10} \\ r_{20} & r_{21} & r_{22} & t_{20} \end{pmatrix}}_{[R|T]}
}_{P}
\underbrace{\begin{pmatrix} X \\ Y \\ Z \\ W \end{pmatrix}}_{X}
\tag{2}
\]

K is the 3 × 3 intrinsic matrix containing the intrinsic parameters, and [R|T] is the 3 × 4 extrinsic matrix composed of a 3 × 3 rotation matrix and a 3 × 1 translation vector. We assume one camera is positioned at the origin with R = I(3×3) and T = 0(3×1). The second camera's position and rotation relative to the first can be derived from the essential and fundamental matrices, which are also computed.
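For concreteness, a minimal calibration sketch using OpenCV, whose calibrateCamera routine implements Zhang's method [9], could look as follows. The checkerboard dimensions, square size, and file naming are illustrative assumptions, not the values from our setup:

```python
# Minimal sketch: intrinsic/extrinsic calibration of one webcam with a
# checkerboard via OpenCV's implementation of Zhang's method.
# Board size, square size and file names are assumptions for illustration.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)        # inner corners per row/column (assumed)
SQUARE_SIZE = 0.025     # checkerboard square size in meters (assumed)

# 3D coordinates of the checkerboard corners in the board's own frame (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, img_points = [], []
for path in glob.glob("camera0_*.png"):   # assumed file naming
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix; rvecs/tvecs give [R|T] per view.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Compose the 3x4 projection matrix P = K [R|T] of equation (2), first view.
R, _ = cv2.Rodrigues(rvecs[0])
P = K @ np.hstack([R, tvecs[0]])
```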
B. Triangulation
If we reconstruct the projection rays of two matching image points, from a theoretical point of view the rays will intersect in the original point in 3D space. In practice this is almost never the case, due to discretization and noise. Assume x and x′ are matching homogeneous 2D points found in each image, and P and P′ their corresponding projection matrices. Then their reprojection rays must intersect in the same 3D point X:

x = PX    (3)
x′ = P′X    (4)

These equations can be combined into a homogeneous system of equations AX = 0 with

\[
A = \begin{pmatrix}
x\,P_{\mathrm{row}2} - P_{\mathrm{row}0} \\
y\,P_{\mathrm{row}2} - P_{\mathrm{row}1} \\
x'\,P'_{\mathrm{row}2} - P'_{\mathrm{row}0} \\
y'\,P'_{\mathrm{row}2} - P'_{\mathrm{row}1}
\end{pmatrix}
\tag{5}
\]

Linear triangulation means solving this system of equations. In practice it must be solved with an approximation method such as the SVD: as figure 1 illustrates, the reprojected rays of imperfectly matching points do not intersect exactly, so the only exact solution of AX = 0 is the trivial zero solution.

Fig. 1: Reprojecting rays of imperfect matching points
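To make the linear triangulation concrete, a small NumPy sketch that builds A from equation (5) and solves AX = 0 with the SVD might look like this; the function name is illustrative:

```python
# Sketch of linear triangulation, eq. (5): solve A X = 0 in the least-squares
# sense by taking the right singular vector of A with the smallest singular value.
import numpy as np

def triangulate(P, P2, x, x2):
    """P, P2: 3x4 projection matrices; x, x2: matching 2D points (pixels)."""
    A = np.vstack([
        x[0]  * P[2]  - P[0],
        x[1]  * P[2]  - P[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Noisy data leaves A without an exact null space, so instead of the
    # trivial zero solution we take the unit vector minimizing ||A X||.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize to a 3D point
```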

C. Tracking
During our research we investigated several hand gloves; two of them are shown in figure 2. The glove of figure 2b has only one color for each blob. This enables faster tracking, because fewer per-pixel color border tests in HSV space are needed. Note that we can extract the same information from figure 2b as from figure 2a by looking at the background colors only in a 1D direction near the orange color blobs. Besides the speed advantage, our current model also gives more robust results, because only one blob color border must be precisely defined to find the blobs at the phalanx positions, whereas our first glove had problems tracking all colors equally successfully. To further optimize speed we downsampled each input image twice, decreasing the number of pixels, and before iterating over the pixels we localize the hand region in the downsampled image. Tracking is done within a single frame, not between frames: when tracking between frames is used, recovery after a wrong interpretation is very hard [3]. Once the colors are tracked we search for matches between both images at the blob level. This minimizes the problem of pixel occlusion and enables us to use a wider baseline in the future without additional information loss. Results of color tracking are shown in figure 3.

Fig. 2: Hand gloves. (a) First model. (b) Current model.

Fig. 3: Results of tracking the orange color blob. (a) Tracking glove from camera 0. (b) Tracking glove from camera 1.
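As an illustration of this tracking step, a minimal per-frame blob detector could be sketched as follows with OpenCV; the HSV borders and the minimum blob area are placeholder assumptions, not the values tuned for our glove:

```python
# Sketch: find the orange color blobs of the glove within a single frame.
# The HSV borders are illustrative placeholders; in practice they must be
# tuned per camera and per recording, as discussed in the conclusion.
import cv2
import numpy as np

LOWER = np.array([5, 120, 120])    # assumed lower HSV border for orange
UPPER = np.array([20, 255, 255])   # assumed upper HSV border for orange

def find_blobs(frame_bgr, min_area=20):
    # Downsample the input image twice to reduce the number of pixels.
    small = cv2.pyrDown(cv2.pyrDown(frame_bgr))
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)   # one color border test per pixel
    # Group surviving pixels into blobs and return their mean points.
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    return [centroids[i] for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
```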

D. Matching
Blob matches between both images are found by combining three different matching methods that build a matching set:
• Build a matching set using the epipolar geometry constraint, which implies that the match of a specific point in one image must lie on the corresponding epipolar line in the other image, and conversely. Results are shown in figures 4 and 5, and a sketch of this step is given below.
• Build a set for each finger according to the background color near each blob.
• Estimate the hand structure by finding the hand mean blob, the one not located on the fingers. The hand is built in a hierarchical way using distance calculations, and the finger sets built above are searched to verify found matches and estimate outliers.

Fig. 4: Epipolar geometry. (a) Epipolar points in blue; epipolar lines corresponding to points in the other image in green. (b) Epipolar points in green; epipolar lines corresponding to points in the other image in blue.

Fig. 5: Mutual matching of the color blobs using the epipolar geometry; notice blob 6 being an outlier. (a) Hand region and matches found in image from camera 0. (b) Hand region and matches found in image from camera 1.
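The epipolar matching step could be sketched as follows: given the fundamental matrix F from calibration, a blob centroid in one image is matched to the candidate in the other image closest to its epipolar line. The pixel threshold and function name are assumptions for illustration:

```python
# Sketch: match blob centroids across the two views with the epipolar
# constraint. F is the fundamental matrix; the threshold is an assumed value.
import numpy as np

def epipolar_matches(F, blobs0, blobs1, max_dist=3.0):
    matches = []
    for i, b0 in enumerate(blobs0):
        p = np.array([b0[0], b0[1], 1.0])
        l = F @ p                      # epipolar line of p in the other image
        l /= np.hypot(l[0], l[1])      # normalize so |l . q| is in pixels
        # Distance of every candidate blob in the other image to this line.
        d = [abs(l @ np.array([b1[0], b1[1], 1.0])) for b1 in blobs1]
        j = int(np.argmin(d))
        if d[j] <= max_dist:           # outliers such as blob 6 fail this test
            matches.append((i, j))
    return matches
```

Running the same test in the opposite direction and keeping only pairs that agree would give the mutual matching shown in figure 5.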

E. 3D reconstruction
Reconstruction is done by triangulating the mean points of the matching blobs. Results are shown in figure 6.

IV. CONCLUSION
We presented a real-time hand tracking technique that tracks the front of the left hand. The technique is quite accurate when most of the color blob pixels are found. Across different recordings the pixel values may vary, so the borders for tracking the colors may have to be retuned, which is further complicated by the use of different cameras.

V. FUTURE WORK
Future work may concentrate on extending frontal tracking to global hand tracking by applying the same method with different blob colors for each of the four types of hand views. Robustness under different lighting conditions also needs further investigation.

Fig. 6: 3D point cloud of reconstructed points

REFERENCES
[1] B. Dorner. Chasing the colour glove: visual hand tracking. 1994.
[2] A. Olwal, H. Benko, and S. Feiner. SenseShapes: Using statistical geometry for object selection in a multimodal augmented reality system. http://graphics.cs.columbia.edu/projects/SenseShapes/index.htmll.
[3] M. de La Gorce, N. Paragios, and D. Fleet. Model-based hand tracking with texture, shading and self-occlusions. Computer Vision and Pattern Recognition, pages 1-8, 2008.
[4] Immersion. Cyberglove 2 wireless data glove. http://www.metamotion.com/hardware/motion-capture-hardware-gloves-Cybergloves.htm.
[5] S. Malik. Real-time hand tracking and finger tracking for interaction.
[6] MetaMotion. Optical motion capture systems. http://www.metamotion.com/motion-capture/optical-motion-capture-1.htm.
[7] R. Y. Wang and J. Popović. Real-time hand-tracking with a color glove. ACM Transactions on Graphics, 28(3), 2009.
[8] M. Yuan, F. Farbiz, C. M. Manders, and T. K. Yin. Robust hand tracking using a simple color classification technique. The International Journal of Virtual Reality, 8(2), 2009.
[9] Z. Zhang. A flexible new technique for camera calibration. Microsoft Research, 1998.