


Marker-Based Surgical Instrument Tracking Using Dual Kinect Sensors

Hongliang Ren, Member, IEEE, Wei Liu, and Andy Lim

Abstract—This paper presents a passive-marker-based optical tracking system utilizing dual Kinect sensors and additional custom optical tracking components. To obtain submillimeter tracking accuracy, we introduce robust calibration of dual infrared sensors and point-correspondence establishment in a stereo configuration. The 3D localization is subsequently accomplished using multiple back-projection lines. The proposed system extends existing inexpensive consumer electronic devices, implements tracking algorithms, and shows the feasibility of applying the proposed low-cost system to surgical training for computer-assisted surgeries.

Note to Practitioners—This paper is motivated by the scarcity of tracking devices for the training of novices in computer-assisted and navigated surgeries. Reliable and low-cost tracking systems are important to ensure easily available surgical training outside operating rooms. Current off-the-shelf tracking systems have been employed in operating rooms for tracking surgical instruments, typically with submillimeter accuracy, but they are highly expensive and require extensive preparation prior to surgery. Hence, the proposed cost-effective tracking system aims to bridge the gap and enable general trainees to learn computer-assisted surgery, which is otherwise only available in a limited number of labs or operating rooms.

Index Terms—Kinect sensor, navigated surgery, optical tracking, tracking.

I. INTRODUCTION

INCORPORATING accurate surgical tracking devices and integrated navigation systems, Computer-Assisted Surgery is emerging as a viable paradigm shift in surgeries and interventions. Surgical assistance systems aim to extend the capability of surgeons to plan and carry out surgical interventions more accurately and less invasively with imaging, tracking, and positioning modules, especially when the surgeons cannot see where the surgical instruments are inside the human body. Computer-assisted systems typically register the intraoperative pose (position and orientation) information of surgical instruments with preoperative 3D models of the patients, which are typically derived from Computed Tomography scans or Magnetic Resonance Imaging. Such a computer assistance and training system helps increase surgical precision and identify unnecessary or imperfect surgical manipulations, and thus effectively increases the success rate of surgeries. Among the widely used tracking technologies, Electromagnetic Tracking and Optical Tracking are able to achieve submillimeter accuracy nowadays. Electromagnetic tracking systems use theoretical

Manuscript received July 05, 2013; revised September 06, 2013; accepted September 21, 2013. This paper was recommended for publication by Associate Editor T. Kawahara and Editor D. Tilbury upon evaluation of the reviewers’ comments. This work was supported by the Singapore Academic Research Fund under Grant R397000139133, Grant R397000157112, and the NUS Teaching Enhancement Grant C397000039001 awarded to Dr. Hongliang Ren. The authors are with the Department of Biomedical Engineering, National University of Singapore, Singapore 117575, Singapore (e-mail: [email protected]. sg). This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org provided by the authors. The Supplementary Material includes an MP4 audio/video file (.mp4) for system setup and tracking experiment. This material is 1.61 MB in size. System setup and tracking experiment for the article of "Marker-Based Surgical Instrument Tracking Using Dual Kinect Sensors" by Hongliang Ren, Wei Liu, and Andy Lim. Digital Object Identifier 10.1109/TASE.2013.2283775

Fig. 1. The setup of the instrument tracking system includes a pair of Kinect sensors, IR light sources, IR filters, and a supporting tripod.

model fitting to determine the pose of the sensing coils within an excited magnetic field. Electromagnetic tracking systems are susceptible to distortion errors if the working magnetic field is interfered with by ferromagnetic materials [1], [2]. Optical tracking systems triangulate the position of surgical instruments using active or passive markers and photosensors [3]–[5]. Typical systems, such as the Polaris and Vicra from Northern Digital Inc., are claimed to have a root-mean-square tracking accuracy of 0.25 mm [6]. It is critical for novices to have hands-on training in computer-assisted surgical procedures, which are typically navigated surgeries. However, this is constrained by the costly commercial surgical tracking devices involved, which are typically used in clinical procedures but are not widely available for training purposes. Therefore, the main objective and primary contribution of this paper is to develop a cost-effective surgical instrument tracking system with submillimeter accuracy to support extensive surgical training, and we resort to the off-the-shelf, inexpensive Microsoft Kinect sensors [7]. This is enabled by both hardware development based on multiple sensing modules and software implementations for calibration, correspondence establishment, and 3D point reconstruction, as illustrated in the following sections.

II. METHODS

A. Materials

As shown in Fig. 1, the proposed marker-based surgical tracking system exploits dual Kinect sensors, which are placed side by side with overlapping fields of view. Each Kinect sensor consists of an infrared (IR) projector, an IR camera, an RGB camera, a four-microphone array, and a motor to adjust the tilt [8]. A pseudo-random, non-repeating pattern of infrared points can be projected onto surfaces by the IR projector [7]. The Kinect by default produces depth video at 640 × 480 pixels and 30 Hz. It is also capable of capturing images at 1280 × 960 pixels, twice the default resolution; images of this resolution were used in the calibration procedures to obtain better accuracy. Earlier studies focused mainly on marker-less tracking of objects using Kinect sensors. Noonan et al. [7] performed head tracking using the RGB-D data obtained from the Kinect, iteratively matching them against a template extracted from a head-phantom CT to determine the pose of the head. To segment transparent objects from the background, Lysenkov et al. [9] made use of the fact that the Kinect is unable to estimate depth where transparent objects are present. As a result, the silhouette of a transparent object can be observed with ease from the depth image and then iteratively matched with a silhouette generated from a prior. Oikonomidis et al. [10] developed a Kinect-based tracking system capable of tracking complex hand articulations. Their system matches the observed hand configuration with that of a prior template by using an optimization algorithm known as particle swarm





Fig. 2. Images of a checkerboard captured (a) under no IR illumination, (b) under illumination from the built-in IR projector (image obtained from the ROS wiki; this speckled illumination is not desirable for tracking purposes), and (c) under illumination from an LED IR light source.

Fig. 3. Rigid bodies to which passive markers are attached for tracking. The dimensions of the markers are given by the manufacturer, and the true distances $d_{12}$, $d_{13}$, $d_{14}$, $d_{23}$, $d_{24}$, and $d_{34}$ (in mm) between the marker pairs $m_i$ and $m_j$ are known.

optimization (PSO). These tracking methods utilize the depth-sensing capability of the Kinect sensor to observe 3D objects, together with a calibration module in ROS (www.ros.org). Towards accurate tracking, the approach proposed in this paper differs from earlier studies in the following respects. First, dedicated markers are employed to provide robust and accurate tracking compared with marker-less approaches; the marker-based approach is also the general practice in commercial surgical tracking systems. Second, custom active illuminators and IR filters are employed to further improve tracking performance. Third, the expected application of the system is the surgical environment, which demands higher accuracy than conventional scene understanding, where accuracy is not the top priority. Moreover, the proposed approach is new in that it employs dual Kinect sensors in order to obtain a larger field of view within the surgical environment. The overlapping field of view of the cameras forms the working volume of the tracking system. The viewing angle of the Kinect is approximately 43° along the vertical axis and 57° along the horizontal axis. Stereo triangulation between the two RGB cameras and between the two infrared (IR) cameras of the two Kinect sensors produces two sets of 3D coordinates for the localization of a marker. Unlike previous studies such as [7], [9], and [10], only IR images are used to determine the positions of the 3D passive markers. The depth-sensing information of the Kinect is not utilized in this paper, but can be fused in the future for instrument tracking. It was noted that under ambient conditions there was insufficient IR illumination to produce images with discernible contrast [Fig. 2(a)]. The built-in IR projector on the Kinect sensor is a potential IR light source, but it produces speckled patterns that result in a grainy and noisy IR image [Fig. 2(b)]. Hence, the IR projector on the Kinect was physically obscured, and two 850 nm IR LED light sources were mounted on top of each Kinect sensor to provide adequate illumination of the reflective marker spheres. Meanwhile, 850 nm IR pass filters are used to enhance the image perception, and Fig. 2(c) is an example of an image acquired through the filtered lens under this illumination. These components, including IR illuminators (e.g., thorlabs.de) and IR filters (e.g., naturalpoint.com), are commercially available and inexpensive. Furthermore, more hardware options are available for building such systems, with competitors to Microsoft's Kinect sensor such as the ASUS Xtion Pro Live (ASUS Inc.). This makes it feasible to build a low-cost surgical tracking system. Fiducial markers can be placed on a Polaris Passive 4-Marker Probe as well as on rigid bodies (Fig. 3). The next step is to implement a tracking method based on the stereo configuration and fiducial markers.

B. Calibration of the Kinect RGB and IR Camera

The Kinect comes with factory calibration, and its distortion is not visibly evident to the naked eye. However, for surgical applications, further calibration is essential in order to enhance the tracking accuracy. This calibration is performed using the Camera Calibration Toolbox based on [11]. The pinhole camera model is assumed in this calibration. The calibration yields the intrinsic parameters of the cameras, which relate the position of a 3D point to its pixel position. An initial calibration is done with the IR and RGB cameras of the Kinect sensors. Subsequently, the extrinsic camera parameters are obtained, i.e., the relative transformation of the cameras with respect to each other. The extrinsic camera parameters are important for the determination of the epipolar geometry. A stereo calibration between the two cameras is then performed using the same calibration toolbox. The left camera is taken as the world coordinate frame, and the camera coordinate frame of the right camera is determined with respect to the left one.

C. Algorithm for Marker Detection

The reflective spherical markers appear as bright circles in the IR images. In each image frame, the circles are detected using the seeded region growing (SRG) algorithm, which segments intensity images rapidly and robustly without the need to tune parameters [12]. In SRG, the regions of the image with pixel intensities below a threshold are excluded, and only the regions with intensities above the threshold are used as seeds. Subsequently, the seeded regions grow. An analogy for the growing of regions is that the seeds are points where water is poured and pixels of similar intensity are flat ground. As water pours from the seeds, regions of similar intensity are flooded and become homogenized [12]. A region stops growing when the water encounters a boundary, where the pixel intensities are significantly different. Eventually, only the homogeneous regions remain to represent the reflective marker spheres. In our experiments, the markers are taken to be regions with radii ranging from 3 to 12 pixels. Finally, the centers of the markers are found from the centers of mass of the regions.

D. Determination of Correspondence

Following the determination of the centers of the four markers in the images from both the left and the right cameras, the correspondence between these points has to be determined (Fig. 4). This is done using the fundamental matrix, which represents the intrinsic projective geometry between the two views. It can be calculated from the camera matrices of both cameras and the known position of one camera with respect to the other. The fundamental matrix $F$ between two cameras, a $3 \times 3$ matrix of rank 2 [13], is defined by
$$\mathbf{l}' = F\mathbf{x} \qquad (1)$$
where $\mathbf{x}$ is the projection of a 3D point $\mathbf{X}$ on the image plane of one of the cameras and $\mathbf{l}'$ is the epipolar line of $\mathbf{x}$ in the other camera's image plane. $\mathbf{x}$ and $\mathbf{l}'$ are in homogeneous coordinates. The line $\mathbf{l}'$ is constructed such that the point $\mathbf{x}'$, which is the projection of $\mathbf{X}$ on the image plane of the second camera, is incident on it. The opposite of this relation holds as well, and $\mathbf{x}$ has to lie on the epipolar line $F^{\top}\mathbf{x}'$. In addition, it is known that when a point is incident on a line, the dot product between the homogeneous coordinates of the point and the line is equal to zero. Therefore
$$\mathbf{x}'^{\top}\mathbf{l}' = 0 \qquad (2)$$
and it follows that
$$\mathbf{x}'^{\top} F \mathbf{x} = 0. \qquad (3)$$
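To make the calibration step concrete, the following is a minimal sketch of an equivalent checkerboard-based stereo calibration using OpenCV rather than the toolbox of [11]; the image file names, board size, and square size are placeholder assumptions, not values from the paper.

```python
import glob
import cv2
import numpy as np

# Checkerboard geometry (placeholder values, not from the paper).
BOARD_SIZE = (9, 6)        # inner corners per row and column
SQUARE_SIZE = 25.0         # square side length in mm

# 3D coordinates of the checkerboard corners in the board frame.
objp = np.zeros((BOARD_SIZE[0] * BOARD_SIZE[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD_SIZE[0], 0:BOARD_SIZE[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, left_points, right_points = [], [], []
image_size = None
for lf, rf in zip(sorted(glob.glob("left_ir_*.png")), sorted(glob.glob("right_ir_*.png"))):
    left = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    image_size = left.shape[::-1]                     # (width, height)
    ok_l, corners_l = cv2.findChessboardCorners(left, BOARD_SIZE)
    ok_r, corners_r = cv2.findChessboardCorners(right, BOARD_SIZE)
    if ok_l and ok_r:
        obj_points.append(objp)
        left_points.append(corners_l)
        right_points.append(corners_r)

# Intrinsic calibration of each IR camera (pinhole model with distortion).
_, K_l, dist_l, _, _ = cv2.calibrateCamera(obj_points, left_points, image_size, None, None)
_, K_r, dist_r, _, _ = cv2.calibrateCamera(obj_points, right_points, image_size, None, None)

# Stereo calibration: R, T give the right camera's pose relative to the left
# (the left camera is taken as the world frame), and F is the fundamental matrix.
_, K_l, dist_l, K_r, dist_r, R, T, E, F = cv2.stereoCalibrate(
    obj_points, left_points, right_points, K_l, dist_l, K_r, dist_r, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```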
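For the marker detection step, the sketch below uses a simple global threshold followed by connected-component analysis as a lightweight stand-in for the full seeded region growing of [12]; the intensity threshold is an assumed placeholder to be tuned for the actual illumination, while the 3 to 12 pixel radius gate follows the paper.

```python
import cv2
import numpy as np

def detect_marker_centers(ir_image, threshold=200, min_radius=3, max_radius=12):
    """Locate bright, roughly circular marker blobs in a grayscale IR image
    and return their centers of mass as (x, y) pixel coordinates."""
    _, binary = cv2.threshold(ir_image, threshold, 255, cv2.THRESH_BINARY)
    n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

    centers = []
    for label in range(1, n_labels):              # label 0 is the background
        area = stats[label, cv2.CC_STAT_AREA]
        radius = np.sqrt(area / np.pi)            # radius of a circle of equal area
        if min_radius <= radius <= max_radius:    # keep only marker-sized regions
            centers.append(tuple(centroids[label]))
    return centers
```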
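The pairing rule that follows from (3), and is elaborated with (4) below, can be sketched as follows; the fundamental matrix is assumed to come from the stereo calibration, and the function name and the simple minimum-residual matching are illustrative.

```python
import numpy as np

def match_markers(left_centers, right_centers, F):
    """Pair marker centers between the left and right IR images by choosing,
    for each left point, the right point with the smallest epipolar residual
    |x_r^T F x_l|; a minimal sketch without occlusion handling."""
    matches = []
    for xl in left_centers:
        xl_h = np.array([xl[0], xl[1], 1.0])                   # homogeneous left point
        residuals = [abs(np.array([xr[0], xr[1], 1.0]) @ F @ xl_h)
                     for xr in right_centers]                  # epipolar residuals
        matches.append((xl, right_centers[int(np.argmin(residuals))]))
    return matches
```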



Fig. 4. Correspondence between marker points in images captured by the left and right IR cameras of Kinect sensors.

Suppose that $\mathbf{x}_l$ and $\mathbf{x}_r$ are the images of a 3D point in the left and right cameras, respectively; it can then be derived that
$$\mathbf{x}_r^{\top} F \mathbf{x}_l = 0. \qquad (4)$$

Fig. 5. RMS error against the average distance from the left camera in the $z$-direction.

This relation holds true in the absence of noise, but in actual images the term $\mathbf{x}_r^{\top} F \mathbf{x}_l$ has a nonzero value. However, it is postulated that it should still yield a reasonably small value. Hence, the correspondence can be determined by computing $\mathbf{x}_r^{\top} F \mathbf{x}_l$ between a point in one image and each of the four points in the other, and choosing the pairing with the lowest value.

E. 3D-Point Reconstruction (3DPR)

3D point reconstruction plays a key role in the system, and a fast and robust reconstruction method is desirable for tracking purposes. Triangulation-based reconstruction methods have been used for binocular systems. In ideal cases, the 3D coordinates of the target can be triangulated from its 2D corresponding points in each image plane according to the pinhole model. However, due to model error, image noise, lens distortion, etc., traditional optimization-based reconstruction methods are time-consuming and sensitive to the initial guess. We adopt the geometry-based method of finding the midpoint of the common perpendicular between two back-projection lines (BPLs) [14]. A BPL is a line in 3D space that projects from the camera center towards the 3D point. This method acknowledges the fact that, in practice, the BPLs from the two cameras are unlikely to meet in 3D space. Hence, the objective is to find the perpendicular feet (PF) on the two back-projected lines between which the distance between the two BPLs is minimal [14]. The midpoint between the two PFs is a good estimate of the reconstructed 3D point. Without involving iterative optimization, the algorithm is light on computational requirements, making it a good algorithm for real-time surgical applications. In the midpoint method, the BPL has to be determined first. The pinhole model can be expressed as
$$s\,\tilde{\mathbf{u}} = K\,[R \mid \mathbf{t}]\,\begin{bmatrix} X & Y & Z & 1 \end{bmatrix}^{\top} \qquad (5)$$
where $s$ is a nonzero constant, $\tilde{\mathbf{u}}$ is the homogeneous 2D pixel coordinate in the IR camera image, and $X$, $Y$, and $Z$ are the coordinates of a 3D point. The equation shows the relationship between the 3D point and the pixel position of its image. $R$ and $\mathbf{t}$ are the rotation matrix and the translation vector, respectively, that relate the camera center with the world coordinates, and $K$ is the intrinsic matrix of the camera. From (5), the equation of the BPL can be expressed as
$$\mathbf{L}(\lambda) = \mathbf{C} + \lambda\,\mathbf{d} \qquad (6)$$
where $\mathbf{L}(\lambda)$ is the back-projection line, i.e., any 3D point along the BPL with parameter $\lambda$; $\mathbf{d}$ is the vector representing the direction of the back-projection line, which can be determined from the 2D image coordinates and the projection matrix $K[R \mid \mathbf{t}]$; the parameter $\lambda$ is a real number larger than zero; and $\mathbf{C}$ is the 3D world coordinate of the camera center. The BPL passes through the camera center and its 2D image point. The parameter $\lambda$ describes the 3D line in a parameterized way: different values of $\lambda$, which can be calculated as in [14], correspond to different 3D points on the BPL.
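A minimal sketch of the midpoint reconstruction described above is given below. It assumes the left camera frame is the world frame and that $K$, $R$, and $\mathbf{t}$ for each IR camera come from the stereo calibration; the closed-form solution for the two perpendicular feet is a standard construction and is not taken verbatim from [14].

```python
import numpy as np

def back_projection_line(u, K, R, t):
    """Back-projection line (BPL) of pixel u = (x, y): returns the camera
    center C in world coordinates and the unit direction d of the ray, so
    that any point on the BPL is L(lam) = C + lam * d, as in (6)."""
    C = -R.T @ t                                         # camera center in the world frame
    d = R.T @ np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])
    return C, d / np.linalg.norm(d)

def midpoint_reconstruction(u_left, u_right, K_l, R_l, t_l, K_r, R_r, t_r):
    """Midpoint method: the reconstructed 3D point is the midpoint of the
    shortest segment joining the two BPLs (their perpendicular feet)."""
    C1, d1 = back_projection_line(u_left, K_l, R_l, t_l)
    C2, d2 = back_projection_line(u_right, K_r, R_r, t_r)
    w0 = C1 - C2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                                # near zero only for parallel BPLs
    lam1 = (b * e - c * d) / denom                       # parameter of the foot on the left BPL
    lam2 = (a * e - b * d) / denom                       # parameter of the foot on the right BPL
    P1, P2 = C1 + lam1 * d1, C2 + lam2 * d2              # perpendicular feet
    return 0.5 * (P1 + P2)                               # reconstructed 3D point
```

With the left camera as the world frame, `R_l` is the identity and `t_l` is zero, while `R_r` and `t_r` are the stereo-calibration extrinsics of the right camera.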

Fig. 6. Marker-distance measurement error, that is, the normalized differences between measured and actual distances. $d_{12}$ stands for the distance between markers 1 and 2, $d_{23}$ for the distance between markers 2 and 3, etc.

III. EXPERIMENT RESULTS

The experimental data for the RMS error are obtained at varying distances away from the Kinect tracking system. 150 data points are obtained for each distance so that the mean and the standard deviation can be determined. The average time taken for acquiring each data point is approximately 6 ms. The RMS error is plotted against the $z$-distance away from the left camera (Fig. 5). The RMS error is calculated as follows. A normal vector to the best-fit plane is computed from the 3D positions of the four markers recovered by the tracking system. Since three points define a plane, four points result in a redundancy; due to the presence of noise in the data, it is unlikely that a single plane contains all four detected marker positions. After the determination of the best-fit plane, the root-mean-square (RMS) of the perpendicular distances between the markers and the best-fit plane is computed and taken to be the error. As the average distance from the markers to the left camera increases from 0.4 to 1.2 m, the RMS error shows a generally increasing trend. The average error decreases from 0.17 to 0.052 mm as the probe is moved from 0.49 to 0.64 m, but subsequently increases to 0.13 mm at 0.74 m. At a distance of 1.0 m, the average error is approximately 0.30 mm. This suggests that around 0.7 m is an optimal working distance for this tracking system.
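The coplanarity-based RMS error described above can be reproduced with a small SVD-based plane fit; this sketch assumes the four reconstructed marker positions are supplied as a 4 × 3 NumPy array.

```python
import numpy as np

def coplanarity_rms_error(markers):
    """RMS of the perpendicular distances from the four reconstructed marker
    positions (a 4x3 array) to their least-squares best-fit plane."""
    centroid = markers.mean(axis=0)
    centered = markers - centroid
    # The best-fit plane normal is the right singular vector associated
    # with the smallest singular value of the centered point set.
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    distances = centered @ normal          # signed point-to-plane distances
    return np.sqrt(np.mean(distances ** 2))
```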



In addition, compliance with the coplanarity constraint alone is not a sufficient estimate of the accuracy of the tracking system. The true distances $d_{12}$, $d_{13}$, $d_{14}$, $d_{23}$, $d_{24}$, and $d_{34}$ between the markers are known, where $d_{ij}$ stands for the distance between markers $i$ and $j$. The percentage difference of the measured distances with respect to the actual distances is shown in Fig. 6. It is observed that the measured dimensions are all within 2% of the actual dimensions.

IV. CONCLUSION AND FUTURE WORK

The proposed surgical tracking system uses the images captured by the infrared cameras of a pair of Kinect sensors with additional illumination sources. Proof-of-concept experiments were performed to validate the proposed dual-Kinect tracking system and showed submillimeter tracking accuracy, which is deemed feasible for applications in surgical training. This paper focused on using the IR cameras of two Kinect sensors for the tracking of retroreflective markers. The next step is to combine the data from the IR cameras with data from other sensing modalities, such as the RGB camera of the Kinect sensor, to enhance the robustness and accuracy of the tracking system.

REFERENCES

[1] H. Ren and P. Kazanzides, "Investigation of attitude tracking using an integrated inertial and magnetic navigation system for hand-held surgical instruments," IEEE/ASME Trans. Mechatronics, vol. 17, no. 2, pp. 210–217, Apr. 2012.
[2] H. Ren, D. Rank, M. Merdes, J. Stallkamp, and P. Kazanzides, "Multi-sensor data fusion in an integrated tracking system for endoscopic surgery," IEEE Trans. Inform. Technol. Biomed., vol. 16, no. 1, pp. 106–111, Jan. 2012.

[3] J. Stoll, H. Ren, and P. E. Dupont, "Passive markers for tracking surgical instruments in real-time 3D ultrasound imaging," IEEE Trans. Med. Imaging, vol. 31, no. 3, pp. 563–575, Mar. 2012.
[4] H. Ren, W. Liu, and S. Song, "Towards occlusion-resistant surgical instrument tracking," Bone Joint J. Orthopaedic Proc. Supplement, vol. 95-B, no. SUPP 28, p. 52, 2013.
[5] H. Ren and P. Kazanzides, "A paired-orientation alignment problem in a hybrid tracking system for computer assisted surgery," J. Intell. Robot. Syst., vol. 63, pp. 151–161, 2011.
[6] K. Cleary and T. M. Peters, "Image-guided interventions: Technology review and clinical applications," Annu. Rev. Biomed. Eng., vol. 12, pp. 119–142, 2010.
[7] P. Noonan, T. Cootes, W. Hallett, and R. Hinz, "The design and initial calibration of an optical tracking system using the Microsoft Kinect," in Proc. IEEE Nuclear Sci. Symp. Med. Imaging Conf. (NSS/MIC), 2011, pp. 3614–3617.
[8] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
[9] I. Lysenkov, V. Eruhimov, and G. Bradski, "Recognition and pose estimation of rigid transparent objects with a Kinect sensor," in Proc. Robotics: Sci. Syst. Conf., 2012.
[10] I. Oikonomidis, N. Kyriazis, and A. Argyros, "Efficient model-based 3D tracking of hand articulations using Kinect," in Proc. British Machine Vision Conf., 2011, pp. 1–11.
[11] J. Heikkila, "Geometric camera calibration using circular control points," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 10, pp. 1066–1077, Oct. 2000.
[12] R. Adams and L. Bischof, "Seeded region growing," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, no. 6, pp. 641–647, 1994.
[13] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000, vol. 2.
[14] Q. He, C. Hu, W. Liu, N. Wei, M.-H. Meng, L. Liu, and C. Wang, "Simple 3-D point reconstruction methods with accuracy prediction for multiocular system," IEEE/ASME Trans. Mechatronics, vol. 18, no. 1, pp. 366–375, 2013.