Photorealistic Visualization for Virtual and Augmented Reality in Minimally Invasive Surgery

Danail Stoyanov1, Mohamed ElHelw1, Benny P Lo1, Adrian Chung1, Fernando Bello2 and Guang-Zhong Yang1,2

1 Visual Information Processing Group, Department of Computing, Imperial College, London, UK
{dvs, me, benlo, ajchung, rmd99, gzy}@doc.ic.ac.uk
2 Department of Surgical Oncology and Technology, Imperial College, St. Mary's Hospital, London, UK
{[email protected]}

Abstract

In surgery, virtual and augmented reality are increasingly being used as new ways of training, preoperative planning, diagnosis and surgical navigation. Further development of virtual and augmented reality in medicine is moving towards photorealistic rendering and patient specific modeling, permitting high fidelity visual examination and user interaction. This coincides with the current development in Computer Vision and Graphics where image information is used directly to render novel views of a scene. These techniques require extensive use of geometric information about the scene, and the purpose of this paper is to provide a comprehensive review of the underlying techniques required for building patient specific models with photorealistic rendering. It also highlights some of the opportunities that image based modeling and rendering techniques can offer in the context of minimally invasive surgery.

1. Introduction

One of the most promising applications for Virtual Reality (VR) and Augmented Reality (AR) technologies is medicine. Computer generated environments can be used in all stages of healthcare delivery, from the education of medical staff to the provision of effective therapy and rehabilitation methods. In surgery, the computerized visualization of patient data models has both pre- and intra-operative purposes. Pre-operatively, VR simulators may be used to train practitioners in basic surgical tasks as well as complete interventions. More excitingly, patient specific models built from tomographic data promise to allow complex procedures to be practiced prior to working with the patient directly. Similarly, VR may also be used for effective

diagnosis and procedural planning methods. Meanwhile, intra-operatively, AR presents opportunities in navigation. In Image Guided Surgery (IGS), the surgeon's view of the operating field is augmented by the addition of computer-generated data. This may serve in conjunction with the surgeon's perception of the patient's anatomy and allow the flexible intra-operative use of pre-operative tomographic data. The use of computer-generated three-dimensional models of human anatomy makes full utilization of all the available patient data before and during a procedure. Combined with modern robotics, such use of technology in the operating theatre is known as Computer Assisted Surgery (CAS). This is reforming surgical practice by setting new standards in accuracy and patient treatment. The first work in medical VR during the early 1990s was restricted by the limitations of the available technology, and the concept of CAS was still only an interesting research prospect. VR based simulators struggled to achieve realism due to compromises between the realistic representation of appearance and the accurate modeling of tissue and instrument behaviors. In contrast, VR is now applied in a spectrum of trainers, while the prospects of AR in the operating theater are becoming reality [1]. A field of medicine that is particularly suited to VR and AR environments is Minimally Invasive Surgery (MIS). This is largely due to the operating theater set-up. A surgeon reaches target areas through small ports into which the instruments are inserted and observes the operating field on a screen. This minimizes the need for elaborate devices replicating the theater for VR simulation purposes. Meanwhile, AR technologies impose minimal intrusion on the surgical environment as any computerized models may be displayed directly on the existing screen.

In parallel to the investigation of CAS technologies, the surgical community has expanded the range of minimally invasive surgical procedures. Reducing the trauma caused by an operation through a minimally invasive approach directly improves the treatment of the patient and the chances of a prompt and successful recovery. However, the inherent difficulties of MIS techniques have traditionally imposed limitations on their applicability. Reduced instrumental control and freedom, combined with unusual hand-to-eye coordination and a limited view of the operating field, enforce restrictions on the surgeon not common to conventional surgery. Advances in video imaging, instrumentation and robotics have played a key role in resolving such fundamental limitations and bringing forth the widespread use of MIS in a variety of procedures in cardiothoracic surgery, gynecologic surgery, general surgery, neurosurgery, orthopaedic surgery, urologic surgery and oncologic surgery. Due to technological limitations, early VR simulation systems for MIS only addressed hand-to-eye coordination and basic skill training. More recently, commercial VR simulation systems for complete MIS procedural training have emerged (Figures 1 and 2). Such training tools will be a major part of surgical education and skills assessment. However, the availability of patient specific trainers will be of significantly greater value. Patient specific systems will not only be useful for training but will instigate new approaches to computer assisted diagnosis and procedure planning. In the intra-operative scenario, IGS systems have proved more difficult to develop. This is largely due to the added complication of registering data from different modalities. Although IGS is widely used in areas such as neurosurgery [2] and spinal surgery [3], in MIS the complications for accurate registration are greater. This is due to the actively deforming operating field and the large changes in anatomical structure between the acquisition of tomographic data and the performance of an intervention. Therefore, methods of acquiring the structure of the operating field are desirable, since tomographic data alone is not enough to build reliable patient specific models. Methods for structure recovery using intra-operative radiographic imaging are undesirable due to the high radiation exposure of both surgical staff and patient and the high cost of suitable instruments. Other methods using laser range finders are applicable but require additional equipment in the operating theater; in the case of laparoscopic surgery, they are impractical and go against the drive for minimal intrusion. Hence, the most suitable methods lie in the utilization of the camera equipment that is already available. The problem of recovering three-dimensional structure from two-dimensional images has been a key subject of research in Computer Vision for many years. Although the problem is fundamentally ill-posed, solutions based on the use of multiple images of a scene have recently had relative success. Meanwhile, the graphics community has focused on achieving photo-realistic

rendering of three-dimensional computer models. However, recent approaches have investigated the use of image information directly to generate novel views of a scene. Such techniques require geometric information about the scene's structure. In a sense, although Computer Graphics and Computer Vision both work with images, they have worked in opposite directions. Due to this recent merger of goals in Computer Vision and Computer Graphics, a young field is emerging which promises to advance the current state of photo-realism in VR and AR: Image Based Modeling and Rendering (IBMR). An important feature of IBMR methods is that they use only the most basic information available, images. This makes them a powerful tool in many applications and, in the context of this paper, in MIS. For the purposes of in vivo patient specific model acquisition, photo-realistic VR simulation and AR based IGS, IBMR holds unparalleled potential. In the remainder of this paper we discuss the current state of the art in Minimally Invasive Computer Assisted Surgery (MICAS) technologies and the major challenges in the field. We then provide a brief review of the IBMR techniques most likely to make a significant impact on improving visualization in MICAS.

2. State of the Art in CAS Visualization

The use of image-based computer technology in MIS may be broadly divided into four categories:

• Training, Simulation and Skills Assessment
• Computer Assisted Diagnosis
• Pre-operative Planning
• Intra-operative Guidance

In the following sections we briefly review the current status of each area.

2.1. Training, Simulation and Skills Assessment

The development of Computer Graphics techniques and hardware to allow real-time, photo-realistic rendering of computer environments has opened a new door in the education and training of medical staff. Virtual reality simulators offer potentially cost effective methods of training surgeons by reducing the required practice on cadavers and actual patients, whilst allowing unlimited preparation on a virtual subject. The computer simulation environment also allows easier and prospectively automatic skills assessment of the trainee through the analysis of movements and events throughout the procedure. Preliminary results demonstrate that the time transfer rate (the time spent on a simulation compared to that spent operating on a cadaver or patient) is 25%-28%, and it may be expected to rise to 50%, judging by other areas where virtual simulation is employed, such as flight pilot training [5]. A rise in transfer rate driven by

technological advance will be of added value to surgical training. Technological limitations forced early virtual reality simulations into a trade-off between visual realism and accurate mechanical modeling of soft tissue, instruments and their interaction. This meant that systems either provided basic skills training, such as instrument control and co-ordination, or near photo-realistic anatomical models, which were useful in developing a good understanding of body regions. However, modern simulation systems bridge the gap between the two and provide full procedural training, not only offering visually adequate rendering but allowing the practitioner to interact with the training environment.

Figure 1. Simulation device for colonoscopy and bronchoscopy

Figure 2. The operating field during virtual stitching practice

MIS approaches are better suited to virtual simulation than most, due to the nominal intrusion upon the surgical environment. In these schemes, the practitioner looks at a screen whilst performing the operation on the virtual subject as they would in a real procedure. The key areas of development required in current simulators for MIS procedures are photo-realistic representation of the operating field, haptic interface devices to simulate force feedback, and accurate behavioral models of soft tissue and instruments. Various methods have been proposed and implemented with the aim of achieving real-time simulation of soft tissue deformation, the most popular being Mass-Spring Models (MSM) [6,7] and the Finite Element Method (FEM) [8,9,10]. MSM have the potential of reaching real-time performance but lack accuracy when precise modeling of physiological behaviour is needed. FEM, on the other hand, is more accurate in reproducing the mechanical characteristics of human organs, but it is more computationally expensive. While the ability to interactively simulate the accurate deformation of soft tissue is important, a system that does not include the capability to modify the simulated tissue has limited utility. Considerable efforts have been made by the research community to improve on the modeling and interactivity of soft tissue. Most of these efforts have concentrated on pre-computing deformations or simplifying the calculations. There have been several attempts at locally adapting a model in and around the regions of interest [11,12,13]. However, these models rely on a pre-processing phase strongly dependent on the geometry of the object, with the major drawback that performing dynamic structural modification becomes a challenging issue, if not impossible. More recently, a model which adapts the resolution of the mesh by retesselating it on-the-fly in and around the region of interaction was presented in [14]. This adaptive model does not rely on any pre-processing, thus allowing structural modifications (Figure 3).

Figure 3. Adaptive soft tissue model with cutting plane
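To make the MSM side of this trade-off concrete, the following minimal sketch (ours, not taken from any of the cited systems) performs one explicit-Euler time step of a mass-spring model; the array layout and parameter names are illustrative assumptions.

```python
import numpy as np

def mass_spring_step(x, v, springs, rest, k, damping, mass, dt):
    """One explicit-Euler step of a simple mass-spring soft-tissue model.

    x, v    : (N, 3) vertex positions and velocities
    springs : (S, 2) index pairs of connected vertices
    rest    : (S,)   rest lengths; k, damping, mass, dt are scalars
    """
    f = np.zeros_like(x)
    i, j = springs[:, 0], springs[:, 1]
    d = x[j] - x[i]                                  # spring vectors
    length = np.linalg.norm(d, axis=1, keepdims=True)
    dirs = d / np.maximum(length, 1e-9)              # guard zero-length springs
    fs = k * (length - rest[:, None]) * dirs         # Hooke's law along springs
    np.add.at(f, i, fs)                              # equal and opposite forces
    np.add.at(f, j, -fs)
    f -= damping * v                                 # simple viscous damping
    v = v + dt * f / mass
    x = x + dt * v
    return x, v
```

Each step costs O(S) in the number of springs, which is why MSM can reach real-time rates; the simple linear springs are also exactly what limits physiological accuracy relative to FEM.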

Although advances in soft tissue deformation, haptics integration and photo-realistic representation will increase the impact of VR systems in surgical education, a more exciting prospect lies in the use of patient specific data models. Moreover, the use of IBMR techniques can significantly improve the realism of traditional deformation models.

2.2. Computer Assisted Diagnosis

Computer technologies have reformed the various forms of modern patient diagnosis, improving efficiency and accuracy as well as helping to control healthcare costs. Image processing and computer vision techniques are already widely used in radiological diagnosis to enhance images and provide some automation in data analysis. In MIS procedures, similar possibilities are available to improve the practitioner's view by removing undesirable effects such as lens distortion, specular reflections and shadows. More comprehensive analysis in minimally invasive diagnosis may be performed but

requires in vivo structural information. For example, in [15] shape recovery from images is used to recover the structure of the stomach wall and automatically detect abnormalities. Recent developments in medical VR have instigated the pursuit of a form of non-invasive diagnosis, Virtual Endoscopy (VE). The idea behind VE is to use tomographic data to build a patient specific model, which may then be used for examination. A significant advantage of such an approach is not only its non-intrusive nature but also the opportunity to evaluate anatomical structures previously too small or delicate to investigate, such as the eyes and inner ears. In [16], image based rendering is applied to virtual colonoscopy in order to minimize inspection time without loss of diagnostic sensitivity. Arguably, images are a more manageable medium for communication between physicians than 3D geometry. With further developments in IBMR techniques, the realism of VE may be improved and an environment realistically replicating a real minimally invasive procedure may be created.

2.3. Pre-operative Planning

Planning is an important pre-operative step in an MIS intervention. Due to the restricted movement of the instruments, care must be taken to position instrument ports with maximum access to target areas. In the past, surgeons have had to visualize the three-dimensional structure of the operating area by examining two-dimensional slices of X-Ray, CT or MRI data. Now, however, the avoidance of delicate areas, optimal route planning and the forecasting of potential problems are aided by volume visualization of the data. As such, the practitioner is in a better position to prepare the surgical procedure before its execution [17]. Computer algorithms for finding the optimal port placements in an MIS procedure may also be utilized. The dominant factor in path planning is the accurate reconstruction of the patient's anatomical structures. However, the realism of virtual models is also important and may be improved by IBMR techniques. In [2,18], images are captured from a tracked endoscope inserted into the cranial cavity. Using image based modeling techniques, several endoscopic images are painted onto the surface of a three-dimensional model derived from MRI or CT scans of the brain. The resulting combined dataset can be used in surgical planning as well as in training and intra-operative guidance.

2.4. Intra-operative Guidance

Image Guided Surgery improves the intra-operative use of available patient data from different modalities by augmenting the surgeon's view of the operating field. Traditionally, after a pre-operative examination of patient data, the surgeon was required to mentally maintain an idea of the specific anatomical structures during the operation. Recently, systems have emerged which

combine the various sets of data for intra-operative visualization in an AR framework. However, tracking the surgical instruments and registering pre-operative data sets to the intra-operative state of the patient's anatomy are challenging tasks. Nevertheless, IGS methods are becoming common in many areas requiring high precision, such as neurosurgery, spinal surgery, orthopaedic surgery and ENT (Ear, Nose and Throat) procedures [3,19,20]. Current guidance systems for non-MIS procedures have focused on using markers (fiducials) for the tracking of instruments and tissue deformations. Markers aid the registration between pre-operative tomographic data and the intra-operative structure of the operating field. However, the introduction of markers in an MIS procedure is more difficult and preferably avoided. Hence, methods of intra-operative structure recovery are required to allow registration. Another important part of IGS systems is the appropriate visualization of the augmented operating field. Image-based rendering (IBR) techniques are suitable for this purpose due to the availability of image data and of possible structural information.

2.5. Major Challenges and Clinical Needs

The basic clinical needs driving the use of computerized technology arise at the very foundations of the surgical training of practitioners and follow the surgical process all the way to the actual intervention. Underlying the use of image based computer assistance is the aspiration to achieve comprehensive data utilization in both the pre- and intra-operative scenarios. The challenges involved in realizing this goal are related to the realism of data representation (visualization), the robust attainment of patient specific models and the registration between different data modalities. A key feature of virtual patient models used for training, planning and diagnosis is the realism of visualization. IBMR may be used in two ways to improve existing systems. Firstly, images captured from actual MIS operations may be used to enhance the realism of surface appearances and effects such as bleeding. Secondly, image based modeling during procedures may give additional information that improves aspects of realism related to haptic devices providing force feedback, instrument-to-tissue interactions and tissue deformations. The composition of patient specific models also requires intra-operative image based modeling. During a surgical procedure it is unwise to rely entirely on pre-operatively acquired data. There are often scanner-based geometric distortions, due to irregular magnetic fields, imprecise table speeds and gantry tilts [19]. Additionally, patient anatomy tends to shift and deform significantly when the patient is moved to the operating table. These have to be corrected prior to any intervention by active registration with intra-operative data. In image-guided surgery, a fundamental problem is the registration of the intra-operative operating field to

pre-operatively obtained data. The registration of 2D and 3D data is a common problem in augmented reality applications [20]. In the medical field, the volumetric data (MRI, CT, PET) must be scaled, rotated and translated to present a view consistent with the intra-operatively captured images (fluoroscope, angiogram). Multi-modal images can be combined to provide complementary information in a single view, thus reducing cognitive effort on the part of the surgeon. Registration can be complicated by the tissue deformations that take place during the procedure, which is often the case in neurosurgery [21] and even more so in upper GI and colorectal surgery. This necessitates non-rigid transformations to make the pre-operative data consistent with the current anatomy [22,23]. Image-based methods aim to achieve a less invasive, non-contact and automated method for 2D-3D registration. Segmentation based techniques extract curves, surfaces [24] and higher-level features [25,26] from the images and volumes to be registered. These anatomically related features are used as the sole input for aligning the datasets. A drawback of these methods is that the accuracy of the segmentation step is a limiting factor on the accuracy of the registration [27]. IBMR is a young field, which has found popularity when dealing with scenes containing regular shapes and surfaces modeled by simple reflectance functions, for example architectural scenes. In the MIS imaging environment, scenes contain less well defined surfaces with complex reflectance, and the actual image acquisition is different since the geometric relationships between the light source, the camera and the scene are on a much smaller scale. Therefore, thorough investigation of these issues is required before IBMR methods may be utilized in the clinical environment.
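As a rough illustration of the rigid core of the registration problem discussed above, the sketch below aligns two sets of corresponding 3D features (e.g. segmented curves or fiducial points) with the classical SVD-based least-squares method of Arun et al. It is a deliberate simplification, since clinical registration must additionally handle the non-rigid deformations noted earlier; function and variable names are ours.

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid alignment (R, t) such that R @ src_i + t ~= dst_i.

    src, dst : (N, 3) corresponding 3D points, e.g. segmented features
    """
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t
```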

3. Image Based Rendering

Image-based rendering techniques have gained considerable attention over the last few years in the computer graphics community. Their potential to create high quality output images and the possibility of integrating them into existing computer graphics systems make them attractive routes to photorealism. Conventional Computer Graphics techniques generate images by simulating the interaction of light with a scene described using geometric primitives and surface properties. Computer vision, on the other hand, works in the opposite direction: starting with a collection of images, the target is to calculate the depth of image points and generate 3D geometric models. In general, image-based rendering uses images to generate novel views of a synthetic or real environment. The geometric models of the scene are replaced solely by images, or by images enhanced with geometry or depth information. In its broader definition, image-based rendering combines both computer graphics and computer vision in order to achieve the following advantages:

• Efficient modelling, representation and rendering of complex objects;
• Sustained or enhanced overall system performance and increased frame rate;
• Increased rendering realism by accounting for global illumination and inter-object reflectance.

Image-based rendering approaches can be classified into three main categories: rendering with no geometry, rendering with implicit geometry and rendering with explicit geometry [28]. While no-geometry systems take a large number of images as their input, explicit geometry systems use few images but rely on having a full scene description in terms of geometry or depth information in order to render new views of the scene. Implicit geometry systems employ point correspondences among images to generate in-between views. Even though many image-based rendering techniques exist, very few are being used in medical simulation. In the following subsections we describe significant image-based rendering techniques within each category and discuss their suitability for surgical simulation.

3.1 Lightfield/Lumigraph Rendering

A light field is defined as "the radiance at a point in a given direction" [29]. Light field rendering is a no-geometry image-based rendering technique based on capturing light reflected off illuminated static surfaces. A set of images of an object from many different viewing points, recording the flow of light around the object, is used to generate new views. Light field/Lumigraph techniques can be described generally as generating and storing all possible views of an object or a scene. No depth or other image related information is needed. Because both methods depend on having a huge database representing almost all possible images of the objects, a compression method was introduced to handle the large database size. However, the storage requirements of scenes made from many objects are still prohibitive for surgical applications. Other limiting factors are the constrained lighting conditions, along with the static non-penetrating geometry required when acquiring scene images.
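As an illustrative sketch of how such a technique queries its image database, the following interpolates a radiance sample from a discretised two-plane light field; the 4D array layout is an assumption made for illustration, and the coordinates are assumed to lie strictly inside the array.

```python
import numpy as np

def sample_lightfield(L, u, v, s, t):
    """Quadrilinearly interpolate a radiance sample from a discretised
    two-plane light field L[u, v, s, t]: (u, v) indexes the camera plane,
    (s, t) the image plane; all coordinates are continuous array indices."""
    u0, v0, s0, t0 = (int(np.floor(c)) for c in (u, v, s, t))
    fu, fv, fs, ft = u - u0, v - v0, s - s0, t - t0
    out = 0.0
    # weighted sum over the 16 neighbouring samples of the 4D lattice
    for du, wu in ((0, 1 - fu), (1, fu)):
        for dv, wv in ((0, 1 - fv), (1, fv)):
            for ds, ws in ((0, 1 - fs), (1, fs)):
                for dt_, wt in ((0, 1 - ft), (1, ft)):
                    out += wu * wv * ws * wt * L[u0 + du, v0 + dv, s0 + ds, t0 + dt_]
    return out
```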

3.2 View interpolation and view morphing

View interpolation techniques [30] combine two reference images acquired at different viewing points in order to generate in-between views. They are based on the observation that small viewpoint movements cause only small variations in the generated images. A pre-calculated correspondence, or morph map, between the reference images is used to guide image generation. New images are generated when moving along the line connecting the two viewing points by linearly interpolating between the two reference viewing points and the new intermediate viewing point.

Using two input images, view morphing generates intermediate views as a linear interpolation between the two images by means of projective transformations to preserve major image features such as lines and conics. Both view morphing and view interpolation are implicit geometry techniques that are of limited use in surgery simulation.
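A minimal sketch of the forward-warping step behind such view interpolation is given below, assuming a pre-computed dense morph map from image A to image B (names and array shapes are ours). Disoccluded pixels are simply left empty here, which is exactly where hole-filling would be applied in a full system.

```python
import numpy as np

def interpolate_view(imgA, imgB, flowAB, alpha):
    """Generate an in-between view by forward-warping image A along a dense
    correspondence (morph map) and cross-dissolving with image B.

    imgA, imgB : (H, W) grayscale images
    flowAB     : (H, W, 2) displacement of each A pixel to its match in B
    alpha      : 0 -> view A, 1 -> view B
    """
    H, W = imgA.shape
    out = np.zeros((H, W), dtype=float)
    ys, xs = np.mgrid[0:H, 0:W]
    # position of each A pixel in the intermediate view
    xw = np.clip(np.round(xs + alpha * flowAB[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + alpha * flowAB[..., 1]).astype(int), 0, H - 1)
    # colour of the matching B pixel, for cross-dissolving
    xb = np.clip(np.round(xs + flowAB[..., 0]).astype(int), 0, W - 1)
    yb = np.clip(np.round(ys + flowAB[..., 1]).astype(int), 0, H - 1)
    out[yw, xw] = (1 - alpha) * imgA + alpha * imgB[yb, xb]
    return out  # disocclusions remain zero (holes) in this sketch
```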

3.3 Texture Mapping

The most familiar explicit geometry IBR method is texture mapping, which greatly improves the quality of computer-generated imagery [31]. By mapping 2D images onto planar polygons, texture mapping was the first technique to represent complex materials that are otherwise hard to model and render. In addition, it is used to enhance scene realism by simulating global illumination effects with less computation. Texture mapping can change the way a computer graphics model looks in many different ways [32]:

• Colour: using a texture image to colour the object surface;
• Specular Colour: using an environment map to represent a specular object reflecting the environment;
• Normal Vector Perturbation: perturbing surface normals according to corresponding values stored in the map so that 3D-looking bumps appear on the surface;
• Displacement Mapping: instead of modulating the shading equation, the geometry itself is modified along the normals of each surface element;
• Transparency: the opacity of transparent objects is controlled using a map.

Texture mapping is the de facto technique used in almost every surgical simulation to achieve realism. Environment mapping [31,33] is a texture mapping technique that captures most of the light incident at a scene point and uses the generated image to approximate environment reflections on scene objects. It can also be used to represent the environment as seen from a viewpoint. Cubic environment mapping has been used in medical simulation to allow VE on low-end machines. Data sets are acquired from computed tomography (CT) or magnetic resonance imaging (MRI). A polygonal or volume-based rendering technique is used to generate the images corresponding to the views of a virtual camera at regular intervals along a central path from the point of entry to the point of interest [34]. Each view is made up of the six images corresponding to the projections of the scene onto the faces of a cube enclosing the current sampling point. These images are used at runtime to texture map the faces of the cube enclosing the viewing point, providing the user with four degrees of freedom. This technique is widely used in virtual bronchoscopy and virtual colonoscopy.
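For illustration, the face selection and texture-coordinate computation at the heart of a cubic environment map lookup can be sketched as follows; the face ordering follows a common graphics-API convention and is an assumption of this sketch rather than part of the cited systems.

```python
def cubemap_lookup(d):
    """Map a 3D view direction d = (x, y, z) to (face, u, v), where face
    indexes the cube faces in the order +X, -X, +Y, -Y, +Z, -Z (an assumed
    convention) and u, v are texture coordinates in [0, 1]."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:            # dominant X axis
        face, sc, tc, ma = (0, -z, -y, ax) if x > 0 else (1, z, -y, ax)
    elif ay >= az:                       # dominant Y axis
        face, sc, tc, ma = (2, x, z, ay) if y > 0 else (3, x, -z, ay)
    else:                                # dominant Z axis
        face, sc, tc, ma = (4, x, -y, az) if z > 0 else (5, -x, -y, az)
    # remap from [-1, 1] on the chosen face to [0, 1] texture coordinates
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)
```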

Using the previous method for inspecting wrinkled surfaces, such as the colon, introduces the problem of inter-surface occlusions, which decreases the effectiveness of the whole procedure. To alleviate this problem, some methods unfold the colon so that its inner wall is spread onto a flat plane displayed as a panoramic view, while other techniques solve the problem by calculating a number of extra viewing points along the navigation path so that most of the surface becomes visible [35].
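As a toy illustration of the unfolding idea, the sketch below flattens tubular surface samples around an idealised straight centerline into a panorama of wall radius. Real systems trace a curved centerline through the segmented colon, so this is only a schematic with assumed inputs.

```python
import numpy as np

def unfold_colon(points, n_theta=256, n_z=512):
    """Unfold tubular surface samples around a straight centerline (taken
    as the z-axis in this toy sketch) into a 2D panorama of wall radius.

    points : (N, 3) surface samples; returns an (n_z, n_theta) radius image.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(y, x)                              # angle around axis
    ti = ((theta + np.pi) / (2 * np.pi) * (n_theta - 1)).astype(int)
    zi = ((z - z.min()) / np.ptp(z) * (n_z - 1)).astype(int)
    pano = np.zeros((n_z, n_theta))
    pano[zi, ti] = np.hypot(x, y)                         # wall radius
    return pano
```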

3.4 3D Image Warping

3D image warping methods are explicit geometry approaches that rely on having depth information for all image pixels, thus turning them into 3D samples. Using this information, source image points can be re-projected from a reference image plane onto a target image plane. The 3D warping process can cause occlusion and disocclusion problems. Disocclusions appear as empty holes in the target image when previously hidden parts of the warped scene become visible; hole filling algorithms are used to alleviate this problem. Occlusion problems are caused by more than one point projecting to the same target image point; a visibility resolving solution based on the painter's algorithm was introduced in [36]. The usefulness of 3D image warping in real life applications has not yet been proven. However, 3D image warping can be used within conventional geometry-based graphics systems to enhance realism or performance. One example is relief texture mapping [37], an extension to texture mapping where each texture image pixel is associated with a depth value measured perpendicular to the image plane at the observed pixel. At runtime, pixels are projected to the plane of the polygon to dynamically generate the desired texture. The resulting images therefore have enhanced 3D details along with correct view motion parallax. Using a cylindrical projection manifold, cylindrical relief texture mapping [38] can be used to improve the realism of textured closed surfaces used in endoscopic simulations.
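The per-pixel re-projection at the core of 3D image warping can be sketched as follows, assuming known camera intrinsics and relative pose (the matrix names are ours); the occlusion and hole handling discussed above is deliberately omitted.

```python
import numpy as np

def warp_3d(depth, K_ref, K_tgt, R, t):
    """Re-project every pixel of a reference depth image into a target view:
    unproject with intrinsics K_ref and the per-pixel depth, apply the
    relative pose (R, t), then project with K_tgt.

    Returns the (H, W, 2) target pixel coordinates of each source pixel."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K_ref) @ pix              # unprojected pixel rays
    X = rays * depth.reshape(1, -1)                # 3D points in reference frame
    Xt = R @ X + t.reshape(3, 1)                   # points in target camera frame
    p = K_tgt @ Xt
    p = p[:2] / p[2]                               # perspective divide
    return p.T.reshape(H, W, 2)
```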

Figure 4. A close-up view of a tissue rendered using conventional polygonal modelling.

Figure 5. A close-up view of the same tissue as in Figure 4 rendered using the image-based method.

Image-based rendering can be used to generate in-between frames for slow or network-based surgery simulation systems. Post-rendering 3D warping [39] uses IBR to increase the output frame rate or to make up for system latency. The scene is rendered at two instances in time, producing two frames along with their depth information. Employing the 3D image warping technique, intermediate frames are generated by warping the two reference images using in-between viewing positions. Path prediction techniques are used to determine the in-between positions for the generated frames.

4. Structure Recovery from Images

As we have seen, some IBR techniques rely heavily on structural information about the scene. Extracting this information from images alone is a cost effective solution, important in MIS as it imposes little intrusion on the surgical environment. Researchers of the human visual system believe that we use a number of cues to perceive the structure of the world around us: stereopsis, shading, shadows, motion, texture, focus and object recognition are used cooperatively to process the information received from the eyes. Computer Vision researchers have borrowed these ideas about biological vision to form methods of reconstruction. However, in contrast to human vision, Computer Vision researchers have predominantly focused on exploiting each cue individually, resulting in easier problem formulations and avoiding conflicting assumptions. This has also imposed a lack of robustness in existing reconstruction methods, because the problem is difficult and fundamentally ill posed. Images are time-frozen, framed, two-dimensional representations of a frameless, continuous, three-dimensional world. Additionally, imaging devices fail to capture the full dynamic range and color of our environment. Collectively, methods of inferring the geometric properties of a scene from images are known as Shape-from-X (X standing for different visual cues). The cues that have been used in Computer Vision are shading, texture, focus and motion. The majority of successful

research effort in the area has been in the use of multiple views of a scene [40,41] (Structure-from-Motion, stereopsis), where the task of structure recovery is well posed through triangulation. Monocular cues such as shading and texture are less well defined and do not recover the position of scene points but the shape of surfaces. Techniques using focus as a cue are in a sense analogous to Structure-from-Motion (SFM) in that they use multiple images, but with varied camera focus rather than varied position and orientation; however, a measure of the focus at scene regions is ambiguous and difficult to quantify. In the context of MIS, the inherent characteristics of the images in question immediately limit the application of some reconstruction techniques. The inner body organs have complex reflectance properties, irregular shapes and variably textured surfaces, and they may deform over time. Both probabilistic and texture gradient based Shape-from-Texture approaches produce erroneous results in regions of little or non-uniform texture. Similarly, Shape-from-Focus methods, which often use image variance measurements to estimate changes in camera geometry, also fail in such regions. Meanwhile, the surgical environment also imposes restrictions on the reconstruction approaches which may be employed. For example, methods of structure recovery that project a pattern of light onto the scene to improve the correspondence process, or to help recover structure by monitoring the distortion of the projected patterns (structured lighting), are undesirable. Structured light in the visible spectrum will alter the surgeon's view of the operating field and also requires the introduction of additional equipment. The reconstruction methods which are most naturally suited to the operating theater and the characteristics of MIS images are SFM and Shape-from-Shading (SFS). For SFM, multiple images may be acquired by a single moving camera or by a stereo endoscope, and the structure of the scene reconstructed up to an unknown scale factor with reference to the camera position. SFS, on the other hand, uses single images to infer the shape (surface gradient or normal) of surfaces in the scene. The SFM method of reconstruction is trivial given accurate correspondence across different views and camera calibration. Determining corresponding image primitives (projections of the same real world feature) is a widely studied problem in Computer Vision known as stereo matching or stereo correspondence [42,43]. Difficulties arise when the assumptions commonly used to constrain the problem fail. For example, similarity metrics, such as Normalized Cross Correlation, perform poorly in homogeneous image regions, while smoothness restrictions (assumptions that the scene is formed of piecewise smooth surfaces) may fail to preserve object boundaries. Camera calibration, on the other hand, deals with inferring the intrinsic and extrinsic camera parameters. Offline methods using calibration objects of known geometry are accurate and well documented [44,45,46]. However, for many applications the parameters may vary during image acquisition (for

example, the surgeon may wish to change focus during an MIS procedure). Techniques performing calibration without calibration objects (called self-calibration or auto-calibration) rely heavily on scene rigidity, constant intrinsic parameters and a wide range of available camera motion [47]. Once correspondence and calibration are given, back-projecting rays to find their intersection in space yields the scene's structure. This process is known as triangulation [48]. Unlike SFM, which recovers the position of scene features, SFS techniques recover the shape of the surfaces in the scene. The foundations of the SFS problem formulation lie in the image irradiance equation, which describes image brightness as a function of the projected surface, the light source and the camera properties [49]. However, finding a solution for shape from the image irradiance equation is not feasible considering the number of unknown variables and the fact that the only source of information for each image point is its intensity. Therefore, SFS techniques proposed in the literature seek a solution to a simplified problem. Almost universally used restrictions are those of Lambertian surface reflectance, an infinitely distant light source and orthogonal projection. Even so, the only source of information is the image brightness, while the description of a surface requires more than one parameter. Hence, further constraints are typically incorporated into the problem formulation to resolve the ambiguity [50].
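Returning to the multiple-view route: once correspondence and calibration are available, the linear (DLT) variant of triangulation [48] reduces to a small least-squares problem, sketched below with illustrative names.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one scene point from two views.

    P1, P2 : (3, 4) camera projection matrices
    x1, x2 : (2,) corresponding image points
    Returns the 3D point as the least-squares solution of A X = 0."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                           # right singular vector of smallest
    return X[:3] / X[3]                  # singular value; de-homogenise
```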

The difficulties associated with MIS images lie in the violation of the assumptions used to formulate structure recovery techniques. Multiple view approaches experience difficulties in establishing correspondence due to possible deformation of the operating field. For the same reason, self-calibration methods are difficult to use when dealing with varying camera parameters. On the other hand, SFS techniques suffer from the complex reflectance properties of surfaces, which do not allow the easy simplification of the image irradiance equation. The restriction often imposed on the light source position was changed for endoscopic images by [15], who assumed coinciding light source and camera center positions. This work was further developed to incorporate camera distortion and a term for specular reflection in [51].

It has long been recognized that in order to apply structure recovery techniques to complex practical situations, ways of integrating or assimilating different methods must be considered [52,53]. The complementary nature of SFM and SFS was first noted by [53], and since then approaches based on Bayesian networks [54,55], Markov Random Fields [56], game theory [57], lattice theory [58] and variational frameworks [59] have been documented. The main ideas that have been explored are the interaction between individual modules through explicit and implicit data exchanges, the combination of results from different cues at different frequencies, and the segmentation of images by some criteria to determine the suitability of each reconstruction method for specific regions. However, since the task of integration is difficult in a general formulation, there is no approach that works well for a wide range of test images. It may well be envisaged that different problem formulations are better suited to specific applications than others. For endoscopic images, integrating the stereo and shading cues appears to be almost essential. SFM is difficult for images where correspondence is unreliable, which is the case when large homogeneous regions are present. Meanwhile, SFS techniques typically require starting solutions, smooth surfaces and simple reflectance. These may all be aided by SFM, while SFS contributes to recovering shape in texture-less regions. This is the complementary nature of SFS and SFM, and an integration system that effectively utilizes the strengths of each cue would prove an indispensable tool for accurate structure computation.

Figure 6. Structure computed using SFS source code described in [42]

5. Conclusion

Medical simulation is an actively pursued subject that requires extensive collaboration between the technical and medical communities. It offers unique opportunities and challenges to emerging IBMR techniques. It is likely that, as they develop, certain branches of IBMR techniques will evolve such that they are particularly suited to soft tissue deformation and modeling - a fundamental challenge to current VR based simulation systems, where the lack of high fidelity in visual and tactile feedback prohibits their more widespread use in advanced skills training and assessment. In this review we have highlighted some of the important techniques in IBMR with regard to improving visual realism in surgical simulation systems. Such techniques are also of benefit to surgical planning, diagnosis and especially IGS. In particular, structure recovery methods from images offer unparalleled opportunities in improving the registration of different modalities and the visualization of AR in IGS.

References

[1]. Virtual Reality in Medicine: A Survey of the State of the Art, http://www.informatik.umu.se/~jwworth/
[2]. D. Dey, P. J. Slomka, D. G. Gobbi and T. M. Peters. Mixed reality merging of endoscopic images and 3-D surfaces. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2000.
[3]. M. de Waal. Image-guided surgery of the spine. Medica Mundi, vol. 42, 1998.
[4]. M. J. Mack. Minimally Invasive and Robotic Surgery. The Journal of the American Medical Association, vol. 285, pp. 568-572, 2001.
[5]. R. M. Satava and S. B. Jones. Current and future applications of virtual reality for medicine. Proceedings of the IEEE, vol. 86, pp. 484-489, 1998.
[6]. X. Provot. Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior. In W. A. Davis and P. Prusinkiewicz (eds.), Proceedings of Graphics Interface 1995, pp. 147-154, 1995.
[7]. U. Kühnapfel, H. K. Çakmak and H. Maaß. 3D Modeling for Endoscopic Surgery. Oct. 1999.
[8]. S. Roth, M. Gross, S. Turello and F. Carls. A Bernstein-Bézier based approach to soft tissue simulation. In Proceedings of Eurographics 1998, Sept. 1998.
[9]. M. Bro-Nielsen and S. Cotin. Real-time volumetric deformable models for surgery simulation using finite elements and condensation. Computer Graphics Forum, vol. 15, pp. 57-66, Sept. 1996.
[10]. D. Bielser and M. Gross. Open surgery simulation. In Proceedings of Medicine Meets Virtual Reality, 2002.
[11]. P. Cignoni, F. Ganovelli and R. Scopigno. Introducing Multiresolution Representation in Deformable Modeling. In SCCG, pp. 149-158, Apr. 1999.
[12]. G. Debunne, M. Desbrun, M. P. Cani and A. H. Barr. Adaptive Simulation of Soft Bodies in Real-Time. In CA, pp. 15-20, 2000.
[13]. X. M. Wu. Adaptive Nonlinear Finite Elements for Deformable Body Simulation Using Dynamic Progressive Meshes. In Proceedings of Eurographics 2001, pp. 439-448, 2001.
[14]. C. Paloc, F. Bello, R. I. Kitney and A. Darzi. Online multiresolution volumetric mass spring model for real time soft tissue deformation. In MICCAI 2002, pp. 219-226, 2002.
[15]. T. Okatani and K. Deguchi. Shape Reconstruction from an Endoscope Image by Shape from Shading Technique for a Point Light Source at the Projection Centre. Computer Vision and Image Understanding, vol. 66, pp. 119-131, 1997.
[16]. I. Serlie, F. M. Vos, R. E. van Gelder, J. Stoker, R. Truyen, Y. Nio and F. H. Post. Improved visualization in virtual colonoscopy using image based rendering. In Proceedings of the Joint Eurographics - IEEE TCVG Symposium on Visualization (VisSym-01), 2001.
[17]. E. Coste-Manière, L. Adhami, R. Severac-Bastide, A. Lobontiu, J. K. Salisbury, J.-D. Boissonnat, N. Swarup, G. Guthart, E. Mousseaux and A. Carpentier. Optimized Port Placement for the Totally Endoscopic Coronary Artery Bypass Grafting using the da Vinci Robotic System. In Proceedings of ISER, 2000.
[18]. D. Dey, D. Gobbi, P. Slomka, K. Surry and T. Peters. Automatic fusion of freehand endoscopic brain images to three-dimensional surfaces: Creating stereoscopic panoramas. IEEE Transactions on Medical Imaging, vol. 21, pp. 23-30, 2000.
[19]. F. Gerritsen, M. Breeuwer, H. de Bliek, J. Buurman and P. Desmedt. Image-guided surgery - the EASI project. Presented at The Surgery Room of the 21st Century, Moat House Hotel, Glasgow, Scotland, 1999.
[20]. A. P. King, P. J. E. M. R. Pike, G. L. G. Hill and D. J. Hawkes. An analysis of calibration and registration errors in an augmented reality system for microscope-assisted guided interventions. Medical Image Understanding and Analysis, 1999.
[21]. A. Tei. Multi-modality image fusion by real-time tracking of volumetric brain deformation during image guided neurosurgery. MIT, 2002.
[22]. D. Rueckert, C. Hayes, C. Studholme, P. Summers, M. Leach and D. J. Hawkes. Non-rigid registration of breast MR images using mutual information. Lecture Notes in Computer Science, vol. 2208, 1998.
[23]. T. Gaens, F. Maes, D. Vandermeulen and P. Suetens. Non-rigid multimodal image registration using mutual information. Lecture Notes in Computer Science, vol. 1496, pp. 1099-, 1998.
[24]. J. Feldmar, N. Ayache and F. Betting. 3D-2D projective registration of free-form curves and surfaces. INRIA (Institut National de Recherche en Informatique et en Automatique) Technical Report RR-2434.
[25]. A. Liu, E. Bullitt and S. M. Pizer. Registration via skeletal near projective invariance in tubular objects. Lecture Notes in Computer Science, vol. 1496, pp. 952-, 1998.
[26]. C. M. Cyr, T. Sebastian and B. B. Kimia. 2D-3D registration based on shape matching. In IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, 2000.
[27]. J. B. A. Maintz. An overview of medical image registration methods. In Symposium of the Belgian Hospital Physicists Association (SBPH-BVZF), vol. 12, pp. 1-22, 1997.
[28]. H.-Y. Shum and S. B. Kang. A Review of Image-based Rendering Techniques. In Visual Communications and Image Processing, IEEE/SPIE, pp. 2-13, 2000.
[29]. M. Levoy and P. Hanrahan. Light Field Rendering. In Proceedings of SIGGRAPH 1996, pp. 31-42, 1996.
[30]. S. Chen and L. Williams. View interpolation for image synthesis. Computer Graphics (SIGGRAPH '93), pp. 279-288, 1993.
[31]. J. F. Blinn and M. E. Newell. Texture and Reflection in Computer Generated Images. CACM, vol. 19, pp. 542-547, 1976.
[32]. A. Watt. 3D Computer Graphics. 2000.
[33]. N. Greene. Environment mapping and other applications of world projections. IEEE Computer Graphics and Applications, vol. 6, pp. 21-29, 1986.
[34]. D. Wagner, R. Wegenkittl and E. Gröller. Endoview: A phantom study of a tracked virtual bronchoscopy. Journal of WSCG, vol. 10, 2002.
[35]. I. W. O. Serlie, F. M. Vos, R. van Gelder et al. Improved visualization in virtual colonoscopy using image-based rendering. In Proceedings of ACM/Eurographics VisSym, 2001.
[36]. L. McMillan. An Image-Based Approach to Three-Dimensional Computer Graphics. UNC, 1997.
[37]. M. Oliveira, G. Bishop and D. McAllister. Relief Texture Mapping. In Proceedings of SIGGRAPH 2000, pp. 231-242, 2000.
[38]. M. ElHelw and G.-Z. Yang. Cylindrical Relief Texture Mapping. Journal of WSCG, vol. 11, 2003.
[39]. W. Mark. Post-Rendering 3D Image Warping: Visibility, Reconstruction, and Performance for Depth-Image Warping. University of North Carolina at Chapel Hill, 1999.
[40]. O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. 1993.
[41]. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2000.
[42]. U. Dhond and J. Aggarwal. Structure from stereo - a review. IEEE Transactions on Systems, Man and Cybernetics, vol. 19, pp. 1489-1510, 1989.
[43]. D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. International Journal of Computer Vision, vol. 47, pp. 7-42, 2002.
[44]. R. Y. Tsai. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE Journal of Robotics and Automation, vol. RA-3, pp. 323-344, 1987.
[45]. R. Y. Tsai. An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1986.
[46]. Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1330-1334, 2000.
[47]. A. Fusiello. Uncalibrated Euclidean reconstruction: a review. Image and Vision Computing, vol. 18, pp. 555-563, 2000.
[48]. R. Hartley and P. Sturm. Triangulation. In Proceedings of ARPA IUW, 1994.
[49]. B. K. P. Horn and M. J. Brooks. Shape from Shading. 1989.
[50]. R. Zhang, P.-S. Tsai, J. E. Cryer and M. Shah. Shape from Shading: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, 1999.
[51]. C. H. Q. Forster and C. Tozzi. Towards 3D Reconstruction of Endoscope Images Using Shape from Shading. In Proceedings of the XIII Brazilian Symposium on Computer Graphics and Image Processing, 2000.
[52]. D. Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. 1982.
[53]. A. Blake, A. Zisserman and G. Knowles. Surface descriptions from stereo and shading. Image and Vision Computing, vol. 3, pp. 183-191, 1985.
[54]. S. Sarkar and K. L. Boyer. Perceptual organisation using Bayesian networks. In IEEE Conference on Computer Vision and Pattern Recognition, 1992.
[55]. S. Pankanti and A. K. Jain. A uniform Bayesian framework for integration. In International Symposium on Computer Vision, 1995.
[56]. J. J. Little. Integrating Vision Modules at Discontinuities. In 12th Canadian Symposium on Remote Sensing / IGARSS '89, vol. 3, pp. 1260-1263, 1989.
[57]. H. I. Bozma and J. S. Duncan. Integration of Vision Modules: A Game-Theoretic Framework. In IEEE Conference on Computer Vision and Pattern Recognition, 1991.
[58]. A. Jepson and W. Richards. A Lattice Framework for Integrating Vision Modules. IEEE Transactions on Systems, Man and Cybernetics, vol. 22, 1992.
[59]. J. Shah, H. H. Pien and J. M. Gauch. Recovery of Surfaces with Discontinuities by Fusing Shading and Range Data Within a Variational Framework. IEEE Transactions on Image Processing, vol. 5, pp. 1243-1251, 1996.