Robust Single-Shot Structured Light

Christoph Schmalz, Siemens AG, CT GTF NDE, Otto-Hahn-Ring 6, 81739 Munich, Germany

Elli Angelopoulou University of Erlangen-Nuremberg Martenstrasse 3, 91074 Erlangen, Germany


Abstract
Structured Light is a well-known method for acquiring 3D surface data. Single-shot methods are restricted to the use of only one pattern, but make it possible to measure even moving objects with simple and compact hardware setups. However, they typically operate at lower resolutions and are less robust than multi-shot approaches. This paper presents an algorithm for decoding images of a scene illuminated by a single-shot color stripe pattern. We solve the correspondence problem using a region adjacency graph, which is generated from a watershed segmentation of the input image. The algorithm runs in real time on input images of 780x580 pixels and can generate up to 10^5 data points per frame. Our methodology gives accurate 3D data even under adverse conditions, i.e. for highly textured or volume-scattering objects and low-contrast illumination. Experimental results demonstrate the improvement over previous methods.

(a) Dragon without texture (b) Dragon with texture (c) DP result without texture (d) DP result with texture (e) Our result without texture (f) Our result with texture

1. Introduction
Structured Light is a general term for many different methods of measuring 3D surface data. The underlying idea is to project a known illumination pattern onto a scene. Shape information is then extracted from the observed deformation of the pattern (figure 1). The most basic hardware setup consists of one camera and one projector, but the details of the implementations vary widely. A good overview of the various proposed pattern designs can be found in [13]. Range data is generated by triangulating the camera ray and the projection ray to a point in the scene. This requires solving the correspondence problem: determining which pairs of rays belong together. One way to address this issue is temporal coding. Typical examples of this approach are the Gray code and phase shifting techniques (e.g. [14]). In these methods, disambiguating information is embedded into a sequence of patterns that can be evaluated for each camera pixel separately. This imposes the limitation that the object may not move while the sequence is acquired. Another approach

Figure 1: Example input images and color coded depthmap results. DP is the typical decoding method based on Dynamic Programming [17].

to solve the correspondence problem is through the use of spatial coding, where the necessary information is encoded in a spatial neighborhood of pixels. This requires that the neighborhood stays connected, which means that the object must be relatively smooth. Nonetheless, spatial coding offers the advantage that a single pattern suffices to generate 3D data. This makes it particularly applicable to moving scenes. It allows the use of simpler hardware, which in turn allows easy miniaturization. This can then lead to structured light-based 3D video endoscopes for medical as well as industrial applications.

We propose a new Single-Shot Structured Light method that offers a combination of high speed, high resolution and high robustness. Our contributions are:
• a superpixel representation of the input image
• construction of a region adjacency graph
• pattern decoding on the region adjacency graph
• quantitative results on synthetic reference scenes with ground truth
• qualitative results on real images

2. Single-Shot Structured Light
The design of Single-Shot Structured Light systems involves a fundamental tradeoff between resolution and robustness. For high resolution, small pattern primitives are needed, but the smaller the primitives, the harder it is to detect them reliably. Therefore, many different single-shot patterns have been proposed. Most designs are based on pseudorandom sequences or arrays [10], [7]. They have the property that a given window of size N or N×M occurs at most once. This is known as the window uniqueness property: observing such a window suffices for deducing its position in the pattern. Another design decision is the alphabet size of the code, i.e. the number of different symbols that are used. Ideally one wants a large alphabet, which allows a long code with a small window size. However, the smaller the distance between individual code letters, the less robust the code.

A well-known single-shot 3D scanning system is the one by Zhang et al. [17]. The color stripe pattern used in that system is based on pseudorandom de Bruijn sequences [1]. The decoding algorithm works per scanline and is based on Dynamic Programming. Its largest drawback seems to be the high processing time of one minute per frame. However, our re-implementation and testing of this method (see figure 1 and section 5) revealed deficits in robustness as well. Another pattern type is based on so-called M-arrays [9], but it results in comparatively low resolution with depthmaps of only 45x45 pixels. Koninckx et al. [6] present an interesting approach based on a black-and-white line pattern with one oblique line for disambiguation. It runs at about 15 Hz and the resolution is given as 10^4 data points per frame, which is also relatively low. A recent paper by Kawasaki et al. [5] uses a pattern of vertical and horizontal lines. It is one of the few articles containing quantitative data about the accuracy of the methodology, which is given as 0.52mm RMS error on a simple test scene. The runtime is 1.6 s per frame, but there is no information on the number of points reconstructed per frame. The paper

by Forster [4] also reports concrete performance figures. It uses a color stripe pattern with scanline-based decoding, running at 20 Hz and with up to 10^5 data points per frame. The RMS error of a plane measured at a distance of 1043mm is given as 0.28mm. The Forster system offers a good compromise of resolution and speed. However, the scanline-based decoding leads to suboptimal robustness: it decomposes a 2D problem into a series of 1D problems and thus loses a considerable amount of spatial information. Consider for example the scene shown in figure 3. Although the order of the stripes is clearly visible, it cannot be determined in a 1D scan because of the holes in the object. In comparison, we apply full 2D decoding using a region adjacency graph. Hence, our approach combines high robustness with high resolution and high speed.

In general, color stripe patterns offer a good compromise between robustness and resolution. Their primitives are one-dimensional, so resolution is lost only in one direction. Color checkerboard patterns seem to double the resolution, but their use requires triangulation in two dimensions, which significantly increases the complexity of the system. Additionally, triangulation at the intersections of edges is less precise, so the potentially higher resolution is mostly lost again. We therefore chose to project color stripes.
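To make the window uniqueness property mentioned above more concrete, the following sketch generates a textbook de Bruijn sequence and verifies that every window of length n occurs exactly once. This is only an illustration of the property; it is not the authors' pattern construction, which additionally uses color stripes with two intensity levels per channel and further constraints on neighboring letters.

```python
# Sketch: window uniqueness of a de Bruijn sequence B(k, n) (illustration only).
def de_bruijn(k, n):
    """Cyclic de Bruijn sequence over k symbols: every length-n window occurs exactly once."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

def windows_unique(seq, n):
    """Check that no length-n window occurs twice (cyclic reading)."""
    ext = seq + seq[:n - 1]
    windows = [tuple(ext[i:i + n]) for i in range(len(seq))]
    return len(windows) == len(set(windows))

s = de_bruijn(3, 3)          # 3 symbols, window length 3 -> 27 letters
assert windows_unique(s, 3)  # observing any 3 consecutive letters fixes the position
```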

3. Superpixel Representation
Our goal now is to identify the color stripes in the camera image. We observe that the pixel representation is redundant: many pixels have approximately the same color. We can, therefore, reduce the complexity of the image representation by using superpixels instead. The Watershed Transform offers a computationally efficient way to achieve this. It was popularized in image processing by Vincent and Soille [16], and a number of variations of the basic algorithm have been proposed [12]. The basic idea is that pixel values (usually the magnitude of the gradient of the original image) are treated as height values. As this surface is gradually flooded, water collects in valleys separated by ridges. An image can thus be segmented into a set of distinct catchment basins, also known as regions. The advantages of the watershed transform are that it is unsupervised, parameter-free and fast. It is, however, a low-level method that produces severe oversegmentation: an area perceived as homogeneous will usually be broken up into many small individual regions due to noise. For our system this is immaterial; our goal is to represent the image with superpixels that are internally uniform and thus significantly reduce image complexity. This, in turn, allows us to use graph-based decoding algorithms in real time. An additional advantage of superpixel-based representations is that properties like color are defined as statistics over an ensemble of 'simple' pixels, thus reducing the effects of defocus blur and noise. Figures 2 and 3 show the input and output of the watershed transform.

Figure 2: Example plot of the gradient magnitude of a stripe pattern. It is the input to the watershed transform.
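As a minimal sketch of this stage (not the authors' implementation), the following code computes an oversegmentation with scikit-image's watershed applied to the gradient magnitude of an assumed RGB input image; the file name, the Gaussian smoothing sigma and the per-region mean color (a simplification of the paper's robust rank filter) are illustrative assumptions.

```python
# Sketch: watershed oversegmentation of a stripe-pattern image (assumed setup).
import numpy as np
from skimage import io, color, filters, segmentation

img = io.imread("stripe_scene.png")              # hypothetical RGB input image
gray = color.rgb2gray(img)

# Gradient magnitude acts as the topographic surface for the watershed.
gradient = filters.sobel(filters.gaussian(gray, sigma=1.0))

# Unseeded watershed floods every local minimum -> severe oversegmentation,
# which is acceptable here: each catchment basin becomes one superpixel.
labels = segmentation.watershed(gradient)        # label image, regions numbered 1..n

# Per-superpixel mean color as a simple stand-in for the paper's rank filter.
n_regions = labels.max()
mean_colors = np.zeros((n_regions + 1, 3))
for ch in range(3):
    sums = np.bincount(labels.ravel(), weights=img[..., ch].ravel(),
                       minlength=n_regions + 1)
    counts = np.bincount(labels.ravel(), minlength=n_regions + 1)
    mean_colors[:, ch] = sums / np.maximum(counts, 1)
```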

(a) Input image

(b) Resulting superpixels

Figure 3: Example of a watershed segmentation performed on a scene illuminated by a stripe pattern.

4. Graph-Based Pattern Decoding
Once the image is segmented into superpixels, we can perform the core pattern decoding algorithm. It consists of the following steps; details are given in the subsequent subsections, and a small sketch of step 1 follows the list.

1. Build the region adjacency graph.
2. Calculate the color change over each edge. Assign edge symbols and error estimates.
3. Find a unique path of edges.
4. Recursively visit all neighbors in a best-first search, while the edge error is sufficiently low.
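A minimal sketch of step 1, assuming a watershed label image `labels` like the one from the segmentation sketch above; the 4-neighborhood scan and the dictionary-of-sets layout are illustrative choices, not the paper's data structures.

```python
# Sketch: build a region adjacency graph from a label image (assumed input).
import numpy as np
from collections import defaultdict

def build_rag(labels):
    """Return {region_id: set(neighbor_ids)} and the boundary pixels of every edge."""
    adjacency = defaultdict(set)
    edge_pixels = defaultdict(list)   # (min_id, max_id) -> list of (y, x) boundary pixels
    h, w = labels.shape
    # Compare each pixel with its right and lower neighbor (4-connectivity).
    for dy, dx in ((0, 1), (1, 0)):
        a = labels[:h - dy, :w - dx]
        b = labels[dy:, dx:]
        ys, xs = np.nonzero(a != b)
        for y, x in zip(ys, xs):
            r1, r2 = int(a[y, x]), int(b[y, x])
            adjacency[r1].add(r2)
            adjacency[r2].add(r1)
            edge_pixels[(min(r1, r2), max(r1, r2))].append((y, x))
    return adjacency, edge_pixels
```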

4.1. Vertices
The region adjacency graph has one vertex for every superpixel. A one-megapixel image of a scene illuminated by our color stripe pattern is typically represented by a graph with 50000 vertices. Each superpixel has an associated color, determined by a robust nonlinear rank filter over all the original image pixels that belong to the superpixel (figure 3). Since color is a vector, we use marginal ordering [11]. The color is additionally corrected for the color crosstalk that occurs in the camera, using a technique similar to [3].

In general, the observed superpixel color is not the original projected color, but rather a color distorted by the object's reflectivity. Therefore, we cannot use the observed color directly, but only color changes between two neighboring regions. If the surface color is constant across the two regions, its influence is cancelled out. If it is not, spurious color changes may be detected; however, our decoding algorithm is explicitly designed to handle them. A second effect is that the surface color influences the relative response of the different color channels. For example, the blue channel will appear very weak compared to green and red on a yellow surface, so a change of 10 digits in blue may be more significant than a change of 20 digits in red. Our pattern is designed so that each color channel changes at least at every second stripe. We know, therefore, that each valid superpixel must have at least one neighbor where a given channel changes. Thus, we can define the color range per channel as the maximum absolute change over all neighbors and use it to normalize the color change:

$d_i^{\mathrm{inv}} = \frac{1}{\max_k |c_i^k|}$   (1)

where $c_i$ denotes the color change in the individual channels and $k$ iterates over all neighbors of a superpixel. The key assumption for this equalization is that each region is as wide as a stripe in the camera image and thus directly borders both neighboring stripes. This is a reasonable conjecture since in our pattern the stripes are packed very densely in order to produce a high-resolution depth map; empirical evidence shows that the condition is nearly always met. Recovering the specific position of a stripe in the pattern can then be performed via graph traversal once the edges and their weights are appropriately set up.

4.2. Edges
The edges of the graph describe how the color changes between two adjacent superpixels. The raw color change $\hat{C}$ is a three-element vector. The scalar edge weight $w$ is defined to be its $L_\infty$ norm:

$\hat{C} = [\hat{c}_r\ \hat{c}_g\ \hat{c}_b]^T \in \mathbb{R}^3$   (2)

$w = \|\hat{C}\|_\infty = \max_i |\hat{c}_i|$   (3)

We would like to assign each element of $\hat{C}$ to one of three categories: channel rising, constant or falling. Since we use an alphabet with two intensity levels per channel, there are only three possible labels. We denote triples of labels by symbols, e.g. the symbol for red rising, green falling, blue constant is $S = [+1\ -1\ 0]^T$. An alternative representation is R+G-. The actual assignment of these symbols involves some preparation. To equalize the response across the three channels, we first multiply each component of $\hat{C}$ by its corresponding inverse range $d_i^{\mathrm{inv}}$ from eq. 1. The equalized color change $\tilde{C}$ is then normalized so that the maximum absolute channel value is 1:

$C = \frac{\tilde{C}}{\|\tilde{C}\|_\infty} = \frac{\hat{C} \otimes D^{\mathrm{inv}}}{\|\hat{C} \otimes D^{\mathrm{inv}}\|_\infty}$   (4)

Here $\otimes$ denotes the elementwise multiplication of two vectors and $D^{\mathrm{inv}}$ is the vector of inverse ranges $d_i^{\mathrm{inv}}$.
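To make eqs. 1-4 concrete, here is a small sketch for a single superpixel and one of its edges; the raw color changes toward all neighbors are assumed to be given, and the small epsilon guard against a constant channel is an added safeguard, not part of the paper.

```python
# Sketch: per-channel equalization of a raw color change (eqs. 1-4, assumed inputs).
import numpy as np

def equalize_color_change(c_hat, neighbor_changes):
    """c_hat: raw color change (3,) toward one neighbor.
    neighbor_changes: (k, 3) raw color changes toward all k neighbors of the superpixel."""
    w = np.max(np.abs(c_hat))                                   # eq. 3: L-infinity edge weight
    channel_range = np.max(np.abs(neighbor_changes), axis=0)    # max change per channel
    d_inv = 1.0 / np.maximum(channel_range, 1e-6)               # eq. 1 (epsilon guard added)
    c_eq = c_hat * d_inv                                        # elementwise product, C-tilde
    c_norm = c_eq / np.max(np.abs(c_eq))                        # eq. 4: max abs channel = 1
    return c_norm, w
```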

We define the symbol match error $E_{\mathrm{match}}$ associated with assigning symbol $S$ to $C$ as

$E_{\mathrm{match}}(C, S) = \sqrt{\sum_i e_t(c_i, s_i)^2}$   (5)

with the single-channel error $e_t$ defined as

$e_t(c_i, s_i) = \begin{cases} (1+c_i)/(1-t) & \text{if } s_i = -1 \\ |c_i|/t & \text{if } s_i = 0 \\ (1-c_i)/(1-t) & \text{if } s_i = +1 \end{cases}$   (6)

where $t$ is a threshold value below which color changes are considered insignificant. One could use $t = 1/3$ for an even partitioning of the interval $[-1, +1]$. We can now define the optimal edge symbol, i.e. the one with the lowest possible error, by

$s_i(c_i, t) = \begin{cases} -1 & \text{if } c_i \le -t \\ 0 & \text{if } -t < c_i < t \\ +1 & \text{if } c_i \ge t \end{cases}$   (7)

Note that there can be several symbols that fit almost equally well. In later steps the algorithm can also assign a suboptimal symbol (in the sense that it has a higher match error than the optimal symbol) if necessary. The match error $E_{\mathrm{match}}$ alone cannot correctly capture color transitions that occur at occlusion boundaries. Consider, for example, the case shown in figure 4. The blue, yellow, magenta and green stripes appear to be vertically continuous, but are not. At the border of the two objects a high secondary gradient occurs, even though the colors of the regions above and below the boundary are very similar.

Figure 4: Importance of the secondary gradient. The upper and the lower half of the image belong to different objects. Some stripes appear to be continuous in the secondary direction when in fact they are not. The window size of the code is 5.

To capture this information, we define

$E_{\mathrm{gradient}} = \frac{\sum_e |\frac{d}{ds} I|}{\sum_e |\frac{d}{dp} I|}$   (8)

where $\frac{d}{ds}I$ and $\frac{d}{dp}I$ denote the components of the image gradient in the secondary and primary direction, as shown in figure 4, and the index $e$ iterates over all edge pixels. The total edge error for a given graph edge and symbol is then

$E = E_{\mathrm{match}} + \alpha E_{\mathrm{gradient}}$   (9)

with a suitable proportionality constant $\alpha$, typically set to 0.2.
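The following sketch evaluates the single-channel error, the symbol match error, the optimal symbol and the combined edge error of eqs. 5-9; the threshold t = 1/3 and weight alpha = 0.2 follow the values given in the text, and the gradient term is assumed to be precomputed from the edge pixels.

```python
# Sketch: symbol assignment and edge error (eqs. 5-9); e_gradient is assumed precomputed.
import numpy as np

T = 1.0 / 3.0      # threshold t below which a channel change counts as "constant"
ALPHA = 0.2        # weight of the secondary-gradient term (eq. 9)

def channel_error(c, s, t=T):
    """Single-channel error e_t (eq. 6) for normalized change c and label s in {-1, 0, +1}."""
    if s == -1:
        return (1.0 + c) / (1.0 - t)
    if s == 0:
        return abs(c) / t
    return (1.0 - c) / (1.0 - t)

def optimal_symbol(c_norm, t=T):
    """Lowest-error label per channel (eq. 7)."""
    return [-1 if c <= -t else (1 if c >= t else 0) for c in c_norm]

def edge_error(c_norm, symbol, e_gradient, alpha=ALPHA):
    e_match = np.sqrt(sum(channel_error(c, s) ** 2        # eq. 5
                          for c, s in zip(c_norm, symbol)))
    return e_match + alpha * e_gradient                    # eq. 9
```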

Furthermore, each such edge has to be classified according to its direction in relation to the primary direction of the pattern. The edge direction can be forward, backward or parallel to the pattern. We perform a line fitting over all the pixels associated with the image edge. The position of the superpixel centroids relative to the line allows us to label the directions. The edges consist of only a few pixels each, so the line approximation works well.

Figure 5: Illustration of edge direction assignment. The location of the region centroids relative to the fitted lines gives the edge direction. In this case, from the viewpoint of the red region, the edge to the cyan region is “backward” and the edge to the blue region is “forward”.
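One plausible realization of this direction test is sketched below, assuming vertical stripes (horizontal primary direction): fit a line to the shared boundary pixels and check on which side of it the two region centroids fall; the "parallel" tolerance and the exact classification rule are assumptions, not the authors' implementation.

```python
# Sketch: classify an edge as forward / backward / parallel (assumed vertical stripes).
import numpy as np

def edge_direction(edge_pixels, centroid_own, centroid_nb, parallel_tol=0.5):
    pts = np.asarray(edge_pixels, dtype=float)          # (n, 2) boundary pixels as (y, x)
    mean = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - mean)                # fit a line through the edge pixels
    normal = np.array([-vt[0, 1], vt[0, 0]])            # unit normal of the fitted line
    # Signed distances of the two region centroids from the fitted line.
    d_own = float(np.dot(np.asarray(centroid_own, float) - mean, normal))
    d_nb = float(np.dot(np.asarray(centroid_nb, float) - mean, normal))
    # Step across the edge, projected onto the primary (x) axis of the pattern.
    step_x = (d_nb - d_own) * normal[1]
    if abs(step_x) < parallel_tol:
        return "parallel"
    return "forward" if step_x > 0 else "backward"
```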

4.3. Graph Traversal
In our methodology, decoding the pattern is equivalent to finding the correspondence between the vertices of the region adjacency graph and the pattern primitives of the projected pattern. The window uniqueness property of the pattern makes this association possible. Identifying the position of a subsequence within a longer code can be done analytically [8]. However, it is easier and faster to simply use precomputed lookup tables which store all the locations where a given primitive occurs. In our case the primitives are colored stripes, and the windows used for identification are represented as sequences of edge symbols.
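A sketch of such a precomputed lookup table: it maps every window of consecutive edge symbols to the pattern positions where it occurs. The representation of the projected pattern as a list of per-stripe colors and the window length are assumptions for illustration.

```python
# Sketch: lookup table from edge-symbol windows to pattern positions (assumed pattern).
from collections import defaultdict

def edge_symbol(c_prev, c_next):
    """Per-channel rising/constant/falling label between two consecutive projected stripes."""
    return tuple((+1 if b > a else (-1 if b < a else 0)) for a, b in zip(c_prev, c_next))

def build_lookup(pattern_colors, window=5):
    """pattern_colors: list of projected stripe colors, e.g. [(1, 0, 0), (1, 1, 0), ...]."""
    symbols = [edge_symbol(pattern_colors[i], pattern_colors[i + 1])
               for i in range(len(pattern_colors) - 1)]
    table = defaultdict(list)
    for pos in range(len(symbols) - window + 1):
        table[tuple(symbols[pos:pos + window])].append(pos)
    return table
```

If the pattern has the window uniqueness property, every value list in the table has exactly one entry, so a decoded window of edge symbols identifies its position directly.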

To find the correspondences between graph paths and stripes in the pattern, we first sort all the edges in the adjacency graph by their edge error E (eq. 9). The lowest-error edge is selected as the starting point of a new graph path. The set of its possible positions in the pattern is determined by its optimal symbol S. These positions, in turn, determine the next edge symbols that could be used to extend the path. If one of the two end-vertices of the path has a qualifying edge, we add it to the path. To qualify for extending the path, an edge must a) have the correct direction and b) have an error E for the desired symbol below a user-defined threshold A, which controls the 'aggressiveness' of the decoding algorithm. The value of A depends on t from eq. 6 and α from eq. 9; it is typically 0.5. Adding a given edge invalidates all candidate positions that would have required different neighboring edge symbols to continue the path. This process is repeated until only one possible position remains, which happens, at the latest, when the path length equals the unique window length of the pattern. When there is more than one edge that can be added, the one with the lowest error is selected. If there are no valid edges to add, we start again with a new edge.

Once a unique path has been found, the associated pattern positions are uniquely determined as well. We pick an arbitrary seed vertex on the path and propagate the pattern position to its neighbors. The neighbors are visited in a best-first search as follows. If the direction of the edge between two vertices is correct and the edge error is smaller than A, we add the neighboring vertex to an 'open' heap. The edge symbol used to calculate the edge error may be different from the optimal symbol, as long as the error is below the threshold. Additionally, we maintain a token bucket that can be used to temporarily exceed the error threshold. If the bucket is not full, tokens are collected whenever an edge with an error lower than A is encountered. Available tokens are subtracted from the edge error when it exceeds A. This allows us to tolerate isolated bad edges. When all neighbors of a vertex have been visited, we continue the identification process with the best neighbor on the heap, i.e. the one with the lowest edge error. When the heap is empty, the propagation stops. If there are unused edges left, we begin a new pass and try to assemble a new unique path starting with the best unused edge.

The quality of a position assignment is captured in the position error, defined as

$Q = \min_k \left( \frac{\beta E_k^{\mathrm{match}}}{w_k} + E_k^{\mathrm{gradient}} \right)$   (10)

where β is a suitable scale factor (typically 255, since that is the maximum edge weight with 8-bit colors) and k is the neighbor index as before. The inclusion of the edge weight w reflects the fact that high-weight edges are less likely to be disturbed by noise. At occlusion boundaries two conflicting pattern positions may occur; in that case, the one with the lower position error is chosen. Note that, because of the normalization in the edge symbol calculation, every edge gets a symbol, even if it connects two regions of equal color. The color change in that case is only noise, but this is not known beforehand. In a stripe pattern there are many such null edges, which have to be

detected indirectly. This is illustrated in figure 6. The edge between regions b and c has a low weight relative to the edge between a and b. In such a case we calculate the optimal symbol of the (possibly virtual) edge between a and c. If it is identical to the edge symbol assigned between a and b, region c has the same color and the same pattern position as b. This scheme is independent of local contrast. Our decoding algorithm is local: it starts at a certain edge and recursively visits the neighboring vertices in a best-first search. We also experimented with an MRF-based graph-cut optimization algorithm [2] for finding the globally optimal labeling of the vertices. However, the results were less accurate because of the difficulty of reliably modeling long-range interactions. Furthermore, the runtime of the MRF method was much higher due to the large number of possible labels, which is typically more than 100. An example subgraph with assigned edge symbols is shown in figure 6; the bold edges could actually be used, the dashed ones were ignored. In the supplemental material we show an animation of the decoding algorithm at work. There may be shadow areas in the image where there is no pattern to decode. It is statistically possible that a valid edge sequence is still detected there, but it is extremely unlikely that the growth process will lead far. We can, therefore, easily suppress these false positives by rejecting seeds for which an insufficient number of valid neighbors is found.

Figure 6: An example subgraph of the region adjacency graph
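A condensed sketch of the propagation phase described above, assuming the graph, the edge errors and a seed vertex with a known pattern position are already available. For brevity, the expected pattern position of a neighbor is assumed to be supplied by the `neighbors` callable (in the actual method it follows from the edge symbol and direction), and the token-bucket parameters are illustrative.

```python
# Sketch: best-first propagation of pattern positions with a token bucket (assumed inputs).
import heapq

def propagate(seed, seed_position, neighbors, edge_error, threshold=0.5,
              bucket_capacity=2.0):
    """neighbors(v) -> iterable of (u, expected_position); edge_error(v, u) -> float."""
    position = {seed: seed_position}
    heap = [(0.0, seed)]          # best-first: lowest edge error expanded first
    tokens = 0.0
    while heap:
        err, v = heapq.heappop(heap)
        for u, pos_u in neighbors(v):
            if u in position:
                continue
            e = edge_error(v, u)
            if e < threshold:
                tokens = min(bucket_capacity, tokens + (threshold - e))
            elif tokens >= e - threshold:
                tokens -= e - threshold        # spend tokens on an isolated bad edge
            else:
                continue                       # edge too bad, do not cross it
            position[u] = pos_u
            heapq.heappush(heap, (e, u))
    return position
```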

5. Results
The proposed decoding method is robust because of the rank filtering used to assign the superpixel colors and because of the 2D decoding algorithm, which allows it to cope with non-smooth

surfaces and textured materials. The edges used during the stripe identification process need not conform to scanlines, which makes it possible to overcome disruptions in the observed pattern. We evaluate the performance quantitatively on synthetic images and qualitatively on real objects. When generating depth data we work on the original images; thus, in the following, edges refer to edges in the image, not in the region adjacency graph, unless explicitly stated. We iterate over the segmented image looking for regions that belong to consecutive pattern positions. If a pair of such regions is found, we calculate the exact location of the edge between them with subpixel precision by interpolating the gradient. Depth values are then determined via ray-plane intersection.

Precision depends on several factors like camera resolution, triangulation angle and the quality of the projector. We found a best-case standard deviation from the plane of 0.12mm with the following setup: projector resolution 1024x768, camera resolution 1388x1038 with a pixel size of 6.45 microns square, baseline 366mm, triangulation angle 19.2 degrees, working distance roughly 1000mm. The standard deviation was calculated over 2700 samples, i.e. on a small patch of the image, excluding possible calibration errors.

Doing a comparative evaluation of the performance of a Structured Light system is difficult because different methodologies use different patterns; there are thus no standardized test images as there are, for example, for stereo algorithms. To test the performance of our method, we therefore created a number of synthetic test scenes with known ground truth in POV-Ray. There are also no publicly available implementations of other algorithms. Thus, for comparison purposes, we re-implemented [17], one of the most widely known Single-Shot Structured Light methods. The virtual objects are located 1000mm from the camera. The images were corrupted with different levels of additive white Gaussian noise. One test scene is shown in figure 7; it is heavily textured and non-planar. Other synthetic test scenes are available on the internet [15]. No smoothing was used on the depth data. Outliers are defined as deviations from the ground truth by more than 10mm. The precision of the depth data is shown in figure 8. For acceptable noise levels and contrast, the standard deviation is around 1/1000 of the working distance. For the simulated geometry and camera resolution, an edge localization error of 1 pixel results in a depth error of about 4mm, so the given standard deviations correspond to about 1/4 of a pixel. This is quite good considering the significant amount of noise in the images. There is a small systematic error of about 0.2mm or 1/5000 in the depth values, corresponding to about 1/20 of a pixel. The exact reason for this has yet to be determined; it is probably an artifact of the simulation, which is sensitive to aliasing. For the simulated test scenes the calibration data of the sensor is exactly known.
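For the triangulation step mentioned above, a minimal ray-plane intersection sketch; the back-projected camera ray and the plane parameters of each projected stripe are assumed to come from the system calibration and to be expressed in the same camera coordinate frame.

```python
# Sketch: depth by intersecting a camera ray with a calibrated stripe plane (assumed inputs).
import numpy as np

def intersect_ray_plane(ray_dir, plane_normal, plane_d):
    """Ray through the camera center: X = s * ray_dir.
    Plane: plane_normal . X + plane_d = 0. Returns the 3D point or None if nearly parallel."""
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-9:
        return None
    s = -plane_d / denom
    return s * np.asarray(ray_dir, dtype=float) if s > 0 else None
```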

The comparison of our results with the Dynamic Programming-based decoding shows both a large increase in inliers and a marked decrease in the number of outliers. Under less-than-perfect image conditions the DP approach [17] produces unreliable results, especially considering that the results shown in figure 9 are for the lowest noise level. Note that we could not confirm the authors' claim that violations of the ordering constraint can be solved by multi-pass dynamic programming: many edges were falsely identified in the first pass and were therefore not eligible for a second pass. Our method, in contrast, is truly independent of the ordering constraint. The runtime of the DP algorithm was given as 1 min per frame in the original paper. Our implementation takes 5 seconds (on a faster processor), but that is still considerably more than the 100ms required by the proposed algorithm.

(a) Full view

(b) Detail view

Figure 7: Test scene ’sun’ at medium noise level

(a) Mean error for various levels of noise and contrast

(b) Standard deviation for various levels of noise and contrast

Figure 8: Evaluation of measurement error on the test scene ’sun’ To demonstrate the scalability of the proposed 3D scanning approach, we show results obtained with a desktop

(a) Number of inliers for various levels of noise and contrast

(b) Number of outliers for various levels of noise and contrast.

Figure 9: Comparison of the results of our algorithm and the DP approach on the test scene 'sun'. The reference is the performance of the DP decoding algorithm at the lowest noise level.

system (figure 10) and results from a prototype miniature design based on a projection slide (figure 12). Again, no postprocessing was applied. The DP decoding method has problems with the saturated colors and texture of the book cover and the doll’s eye, but the proposed algorithm works almost everywhere. Skin has no strong texture but is a difficult surface to measure because of its subsurface scattering and the frequent specularities. Nevertheless, the pattern could be completely decoded and the accuracy of the depth data is good enough to discern the ridge lines of the fingerprint. The fan in figure 11 is also a challenging object because it violates the local smoothness assumption. Two-dimensional decoding still succeeds in most areas.

6. Conclusion and Future Work
We presented a robust algorithm for the decoding of Single-Shot Structured Light patterns that outperforms the previous method [17] in the number of data points generated and in the number of outliers that have to be rejected. These improvements are due to a) the superpixel representation of the image, which allows robust filtering of the color, and b) the use of a true 2D decoding algorithm that does not have to work along scanlines. The method works well down to a pattern contrast of only 0.3 and can run at 15 Hz with input images of 780x580 pixels on a 3 GHz machine, generating up to 10^5 data points per frame. The typical accuracy is 1/1000 of the working distance; the best-case accuracy is 1/10000.

(a) Input image ’book’ with strong texture

(b) Input image ’doll’ with strong texture

(c) Color coded depth map of the book, generated with the proposed algorithm. Range is 35 mm.

(d) Color coded depth map of the doll, generated with the proposed algorithm. Range is 40 mm.

(e) Depthmap of the book, generated with the DP approach

(f) Depthmap of the doll, generated with the DP approach.

Figure 10: Book and doll results. DP decoding results in large holes and erroneous depth values.

The system works with almost arbitrary scenes and is scalable from working areas of less than 10mm square to 1000mm square. We have also demonstrated its miniaturization potential via 3D fingerprinting. We are already working on adapting our method to endoscopic imaging.

References
[1] F. Annexstein. Generating de Bruijn sequences: An efficient implementation. IEEE Transactions on Computers, 46(2):198–200, 1997.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, Nov. 2001.
[3] D. Caspi, N. Kiryati, and J. Shamir. Range imaging with adaptive color structured light. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):470–480, May 1998.

(a) Input image ’fan’ with non-smooth geometry

(a) Input image ’fingertip’. Volume scattering and specularities make skin a difficult surface.

(b) Color coded depth map of the fan, generated with the proposed algorithm. Range is 200 mm.

Figure 11: Fan results
[4] F. Forster. A high-resolution and high accuracy real-time 3D sensor based on structured light. In 3D Data Processing, Visualization and Transmission, International Symposium on, pages 208–215, Los Alamitos, CA, USA, 2006. IEEE Computer Society.
[5] H. Kawasaki, R. Furukawa, R. Sagawa, and Y. Yagi. Dynamic scene shape reconstruction using a single structured light pattern. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, June 2008.
[6] T. P. Koninckx and L. Van Gool. Real-time range acquisition by adaptive structured light. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3):432–445, March 2006.
[7] C. J. Mitchell. Aperiodic and semi-periodic perfect maps. IEEE Transactions on Information Theory, 41(1):88–95, Jan. 1995.
[8] C. J. Mitchell, T. Etzion, and K. G. Paterson. A method for constructing decodable de Bruijn sequences. IEEE Transactions on Information Theory, 42(5):1472–1478, Sept. 1996.
[9] R. A. Morano, C. Ozturk, R. Conn, S. Dubin, S. Zietz, and J. Nissano. Structured light using pseudorandom codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):322–327, March 1998.
[10] K. G. Paterson. Perfect maps. IEEE Transactions on Information Theory, 40(3):743–753, May 1994.
[11] I. Pitas and P. Tsakalides. Multivariate ordering in color image filtering. IEEE Transactions on Circuits and Systems for Video Technology, 1(3):247–259, 295–296, Sept. 1991.

(b) Deviation of the interpolated depth data from the plane. The ridges of the fingerprint are clearly visible.

Figure 12: Fingertip results

[12] J. B. T. M. Roerdink and A. Meijster. The watershed transform: definitions, algorithms and parallelization strategies. Fundamenta Informaticae, 41(1-2):187–228, 2000.
[13] J. Salvi, J. Pagès, and J. Batlle. Pattern codification strategies in structured light systems. Pattern Recognition, 37:827–849, 2004.
[14] G. Sansoni, M. Carocci, and R. Rodella. Three-dimensional vision based on a combination of gray-code and phase-shift light projection: Analysis and compensation of the systematic errors. Applied Optics, 38(31):6565–6573, 1999.
[15] Structured Light Survey Website. http://www.structuredlightsurvey.de, 2009.
[16] L. Vincent and P. Soille. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6):583–598, June 1991.
[17] L. Zhang, B. Curless, and S. M. Seitz. Rapid shape acquisition using color structured light and multi-pass dynamic programming. In Proc. First International Symposium on 3D Data Processing Visualization and Transmission, pages 24–36, June 2002.