
Comparison of Structured Lighting Techniques with a View for Facial Reconstruction

Da An, Alexander Woodward, Patrice Delmas and Chia-Yen Chen
CITR, University of Auckland, Dept. Computer Science, New Zealand.

Abstract
A variety of Structured Lighting techniques are explored as a means of providing accurate and algorithmically efficient 3-D data acquisition. The investigation is motivated by the goal of highly accurate 3-D facial reconstruction. The practical implementation of a Structured Lighting test-bed is presented, the algorithms pursued are described, and a comparative analysis of their performance is undertaken.

Keywords: Structured lighting, Gray code, pattern codification, stereo vision.

1 Introduction
This work investigates the implementation of a practical Structured Lighting test-bed for the acquisition of high-accuracy 3-D data. It is motivated by 3-D human facial reconstruction, which is important in a wide range of applications. 2-D image processing techniques have been widely used to reconstruct 3-D objects from photographs, but techniques such as Stereo Vision and Photometric Stereo find it difficult to reconstruct the 3-D points of a scene object accurately, or to do so without a complex formulation. Active vision methods such as Structured Lighting aim to solve the correspondence problem more quickly and easily. A variety of Structured Lighting patterns are explored and implemented, with a focus on binocular Structured Lighting techniques. The results of each technique are compared against known depth information to analyse its efficacy, with the aim of a quantitative evaluation of which pattern codification scheme provides the best results.

2 Employed Techniques
2.1 Overview
Structured Lighting is a well-established approach for acquiring 3-D data. It is classed as an active vision technique because it projects structured light into a scene; the active illumination aims to simplify the surface reconstruction problem. This paper focuses solely on techniques that use stereo cameras as part of the system geometry. With a rectified stereo image pair, the correspondence search takes place in one dimension, along corresponding scan-lines, and all computations used rectified pairs. The projected light structure encodes pixel positions, which aids stereo correspondence. There is a trade-off between the number of images required (which grows with more complex coding strategies) and the resulting uniqueness of each pixel's code. This matters for spatio-temporal 3-D acquisition of dynamic objects, where every additional image lengthens acquisition. No assumptions are made about the reflection model of an object. However, Structured Lighting faces challenges with shadows and with very bright or very dark object regions.

Structured Lighting is becoming more viable and attractive as the cost of projector and camera hardware falls.

The different lighting structures that can be projected are known collectively as 'pattern codification strategies' [1].

The implemented techniques are described first, followed by the practical setup of the lab environment. Finally, a set of visualisations is provided, along with a quantitative analysis of and commentary on the results.

Each Structured Lighting technique considered here outputs a disparity map in disparity space (x, y, d), where x and y are spatial coordinates in the image frame and d is a measure of disparity.

Disparity is the positional difference between the projections of a scene point into the image frames of the two stereo cameras. When a disparity map is visualised by rescaling it into a valid image intensity range, nearer points appear brighter, because disparity increases as depth decreases (see Section 4.2). The following subsections present the Structured Lighting techniques implemented in this paper.
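The rescaling used for such visualisations can be sketched in a few lines of Python. This is a minimal illustration, assuming a disparity map stored as a list of rows and a linear min-max mapping to 0-255; the paper does not state which mapping was used, and the `invalid` marker for unmatched pixels is hypothetical.

```python
def disparity_to_intensity(disparity, invalid=None):
    """Linearly rescale a disparity map (a list of rows) to 8-bit intensities.

    The largest valid disparity (the nearest point) maps to 255 and the
    smallest to 0. Pixels equal to `invalid` (a hypothetical marker for
    unmatched pixels) are also drawn as 0.
    """
    valid = [d for row in disparity for d in row if d != invalid]
    if not valid:
        return [[0] * len(row) for row in disparity]
    d_min, d_max = min(valid), max(valid)
    span = d_max - d_min
    image = []
    for row in disparity:
        out_row = []
        for d in row:
            if d == invalid:
                out_row.append(0)
            elif span == 0:
                out_row.append(255)  # flat map: every valid pixel equally near
            else:
                out_row.append(round(255 * (d - d_min) / span))
        image.append(out_row)
    return image
```

For example, `disparity_to_intensity([[0, 1], [5, None]])` maps the nearest point (disparity 5) to 255 and the unmatched pixel to 0.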

2.2 Time-multiplexed Structured Lighting with Gray Code
This classical technique [2] uses a binary code to identify projected light planes, which reduces the number of images required to uniquely identify the planes in an image. A bit plane stack is created by time-multiplexing a set of binary images. The binary code used for this procedure is the Gray code, and the resulting set of binary images requires log2(n) patterns (rounded up to a whole number) to distinguish among n light planes. The light planes are projected onto the object and appear as profiles over its surface.

Figure 2: Example of the first 6 Gray code bit planes.
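The log2(n) relationship and the one-bit adjacency property of Gray codes can be illustrated with the standard binary-reflected Gray code. This is a sketch only: the paper does not state which Gray code variant or bit ordering was used, so the most-significant-bit-first ordering and the function names here are assumptions.

```python
import math


def gray_code_patterns(n_planes):
    """Bit-plane stack for the binary-reflected Gray code.

    Returns ceil(log2(n_planes)) patterns; patterns[k][x] is 1 when light
    plane x is lit in the k-th projected image (k = 0 is the most
    significant bit).
    """
    n_bits = max(1, math.ceil(math.log2(n_planes)))
    codes = [x ^ (x >> 1) for x in range(n_planes)]  # Gray code of each plane
    return [[(c >> (n_bits - 1 - k)) & 1 for c in codes] for k in range(n_bits)]


def gray_to_binary(g):
    """Recover a light-plane index from its Gray code by cumulative XOR."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b
```

With 512 light planes, `gray_code_patterns(512)` yields 9 patterns, the same count as the nine Gray code images used in this work, and the codes of neighbouring planes differ in exactly one bit.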

2.3 Direct Coding with a Colour Gradation Pattern
A colour pattern with a gradation between various hues is used (see Figure 3). The aim is to provide more unique locations for the stereo correspondence process, which is performed with a traditional stereo algorithm (presented in Section 2.6). Because of the pattern's relatively low spatial frequency, the technique performs better on objects with uniform albedo.
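A gradation pattern of this kind can be generated as a hue ramp across the projector columns. The paper does not specify the hues, their range or the pattern resolution, so this is a hypothetical sketch assuming a full-saturation HSV hue ramp built with Python's standard `colorsys` module.

```python
import colorsys


def colour_gradation_row(width, hue_start=0.0, hue_end=5.0 / 6.0):
    """One row of a hypothetical hue-gradation pattern as 8-bit RGB triples.

    Hue varies linearly across the projector columns at full saturation and
    value; every projector row repeats this row. Stopping at 5/6 (magenta)
    rather than 1.0 keeps the two ends from wrapping back to the same red.
    """
    row = []
    for x in range(width):
        t = x / (width - 1) if width > 1 else 0.0
        r, g, b = colorsys.hsv_to_rgb(hue_start + t * (hue_end - hue_start), 1.0, 1.0)
        row.append((round(255 * r), round(255 * g), round(255 * b)))
    return row
```

Because the hue changes only gradually, neighbouring columns have very similar colours, which is consistent with the low spatial frequency noted above.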

2.4 Direct Coding with a Colour Strip Pattern
A colour strip pattern with high-frequency colour changes was created, with the aim of projecting a more distinctive (and hopefully more unique) pattern onto the object than the Colour Gradation Pattern. The pattern is shown in Figure 3. As with the Colour Gradation Pattern, a traditional stereo algorithm is used (see Section 2.6).
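The paper does not give the strip width, colours or ordering, so the following is only a hypothetical sketch of a high-frequency strip pattern. It assumes a six-colour palette, a fixed strip width, and a pseudo-random sequence in which no two adjacent strips share a colour.

```python
import random

# Hypothetical six-colour palette of saturated RGB primaries and secondaries.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
           (255, 255, 0), (0, 255, 255), (255, 0, 255)]


def colour_strip_row(width, strip_width=4, seed=0):
    """One row of a hypothetical high-frequency colour strip pattern.

    Columns are grouped into strips of `strip_width` pixels; each strip takes
    a pseudo-random palette colour different from its left neighbour, so
    every strip boundary is a visible colour edge.
    """
    rng = random.Random(seed)  # fixed seed: the same pattern on every run
    row, previous = [], None
    for start in range(0, width, strip_width):
        colour = rng.choice([c for c in PALETTE if c != previous])
        row.extend([colour] * min(strip_width, width - start))
        previous = colour
    return row
```

Compared with the gradation, every few columns now carry a colour edge, giving a window-based matcher more distinctive local structure.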

Figure 1: The geometry of Structured Lighting with Gray Code.

In the Gray Code technique (Section 2.2), eleven images were used for codification. Nine of these are Gray code patterns, which distinguish 2^9 = 512 light planes in an image. One fully illuminated image and one unilluminated image are additionally taken; these give estimates of the 'on' and 'off' states used for thresholding and the subsequent determination of each pixel's binary Gray code. The projected patterns are composed at a resolution of 512 by 512 pixels. Gray codes have the advantage that spatially adjacent profiles have codes which differ by only one bit, which can help detect erroneous code estimates in the resulting data.

2.5 Gray Code with Colour Strip Pattern
Experiments showed that a purely Gray Code solution (Section 2.2) had difficulty recovering certain object regions. This appeared in the result as incorrectly deduced codes (i.e. spike noise) for the pixels in those problem areas. Once these areas of incorrect code are located, the standard Gray Code algorithm is improved by running a standard stereo correspondence algorithm for those areas on an image pair with a projected colour strip pattern. The following situations pose problems for recovery:
1. The surface region is at an oblique angle to the projector's optical axis.
2. Image intensity is attenuated at the edge between the 'on' and 'off' states of the code. A remedy would be to also project the inverse Gray code patterns, but this requires more images.
3. The region is in shadow.
4. The object is very dark (low albedo) or very light (high albedo), has strong specular reflection components, or exhibits interreflections.
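The thresholding and decoding steps described for the Gray Code technique can be sketched per pixel as follows. The per-pixel midpoint threshold between the 'on' and 'off' images, and the `min_contrast` test used to flag unreliable pixels (as in cases 3 and 4 above), are assumptions; the paper does not specify how the threshold is computed or how problem regions are located. All function names are hypothetical.

```python
def gray_to_binary(g):
    """Recover the light-plane index from a Gray code by cumulative XOR."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def decode_pixel(samples, on, off, min_contrast=10):
    """Decode one pixel's light-plane index from its Gray code images.

    samples: this pixel's intensity in each Gray code image, MSB first.
    on, off: its intensity in the fully lit and the unlit image.
    Returns None when on - off < min_contrast (a hypothetical threshold),
    e.g. in shadow or on very dark surfaces; such pixels are candidates for
    the colour strip stereo fallback instead.
    """
    if on - off < min_contrast:
        return None
    threshold = (on + off) / 2.0  # assumed midpoint between 'on' and 'off'
    code = 0
    for s in samples:
        code = (code << 1) | (1 if s > threshold else 0)
    return gray_to_binary(code)


def decode_image(pattern_images, on_image, off_image, min_contrast=10):
    """Apply decode_pixel to every pixel of images stored as lists of rows."""
    return [[decode_pixel([img[y][x] for img in pattern_images],
                          on_image[y][x], off_image[y][x], min_contrast)
             for x in range(len(on_image[0]))]
            for y in range(len(on_image))]
```

A pixel observed as bright, dark, bright in three patterns reads as Gray code 101, i.e. light plane 6, while a pixel with almost no on/off contrast is left undecoded.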

Figure 4: Colour Gradation and Colour Strip patterns projected onto a test subject.

Occluded regions should be marked as occlusions rather than handled by stereo matching with the colour strip pattern. The Gray code approach could be extended in a manner such as that presented by Caspi et al. [3].

3 Lab Setup
3.1 Cameras

A pair of Canon EOS 10D cameras was used for high-resolution acquisition. These cameras operate at an effective resolution of 8.2 megapixels, which allows very dense disparity maps and, accordingly, a larger disparity range. The cameras have a measured focal length of 52 mm, and the baseline separation between them is 200 mm.
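Under the standard stereo geometry used for this rig (parallel optical axes, rectified images, pinhole model, lens distortion ignored), depth Z and disparity d are related by Z = f·B/d. The sketch below works in millimetres on the sensor; converting a disparity in pixels would also need the sensor pixel pitch, which the paper does not give, so it is left out. The function names are illustrative.

```python
def sensor_disparity_mm(depth_mm, focal_mm=52.0, baseline_mm=200.0):
    """Disparity on the sensor for standard stereo geometry: d = f * B / Z."""
    return focal_mm * baseline_mm / depth_mm


def depth_mm(disparity_mm, focal_mm=52.0, baseline_mm=200.0):
    """Inverse relation: Z = f * B / d."""
    return focal_mm * baseline_mm / disparity_mm


def depth_uncertainty_mm(depth, disparity_error_mm, focal_mm=52.0, baseline_mm=200.0):
    """First-order depth error: |dZ| ~= Z**2 / (f * B) * |dd|."""
    return depth ** 2 / (focal_mm * baseline_mm) * disparity_error_mm
```

At the subject distance of approximately 1300 mm given for this setup, `sensor_disparity_mm(1300)` gives 52 × 200 / 1300 = 8 mm, and `depth_uncertainty_mm(1300, 0.001)` shows that a 1 µm disparity error on the sensor corresponds to about 0.16 mm in depth.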

Figure 3: Left, the Colour Gradation pattern, and right, the Colour Strip pattern.

The main concern when using these cameras is the slow acquisition time, which belies a potentially fast process when appropriate hardware is available: with these cameras acquisition takes on the order of tens of seconds, whereas experiments using video cameras instead gave acquisition times of approximately one to two seconds.

Figure 5: Views of the stereo rig with projector, the calibration object, and a test subject.

The standard stereo geometry is used, allowing a simplified reconstruction formula: once the cameras have been calibrated, only the baseline separation between the two cameras is additionally needed to reconstruct the true 3-D surface points of an object. The test subject is placed approximately 1300 mm horizontally away from the cameras.

2.6 Traditional Stereo Correspondence Algorithms
A set of traditional dense two-frame stereo correspondence algorithms was used for the techniques described in Sections 2.3, 2.4 and 2.5. For the sake of comparison, these algorithms were also applied in the standard manner to a passively illuminated stereo image pair of the test object. The following algorithms were used:
1. SAD (Winner Takes All using the Sum of Absolute Differences): the standard window-based local matching algorithm. For each pixel, the disparity that gives the minimal sum of absolute differences over a window along the scan-line is chosen.
2. SSD (Winner Takes All using the Sum of Squared Differences): a variant of the SAD algorithm using a squared-difference cost.
3. DPM (Dynamic Programming Method): a global algorithm that uses dynamic programming to find optimal 1-D disparity profiles along scan-lines.
For a complete description and overview of these traditional stereo algorithms, refer to [4].
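The SAD and SSD winner-takes-all matchers of Section 2.6 can be sketched in pure Python. This minimal version assumes rectified grey-level images stored as lists of rows, the left image as reference and a square window; `radius` and `max_disp` are illustrative parameters, not values from the paper, and the DPM method is not shown.

```python
def wta_disparity(left, right, max_disp, radius=1, cost="sad"):
    """Winner-takes-all dense stereo on a rectified grey-level image pair.

    Images are lists of rows with the left image as reference. For each left
    pixel (x, y), candidate disparities d = 0..max_disp are scored against
    the right pixel (x - d, y) over a (2*radius + 1)^2 window using the SAD
    or SSD cost, and the lowest-cost d wins (ties keep the smaller d).
    Pixels too close to the border for a full window stay None; near the
    left border the search range is truncated.
    """
    if cost not in ("sad", "ssd"):
        raise ValueError("cost must be 'sad' or 'ssd'")
    diff = (lambda a, b: abs(a - b)) if cost == "sad" else (lambda a, b: (a - b) ** 2)
    h, w = len(left), len(left[0])
    disp = [[None] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_d, best_cost = None, float("inf")
            # Search only along the same scan-line (1-D, thanks to rectification).
            for d in range(min(max_disp, x - radius) + 1):
                c = sum(diff(left[y + dy][x + dx], right[y + dy][x + dx - d])
                        for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1))
                if c < best_cost:
                    best_d, best_cost = d, c
            disp[y][x] = best_d
    return disp
```

Switching `cost="ssd"` penalises large intensity differences more heavily, which is the only difference between the two matchers here; both remain purely local, unlike the scan-line optimisation of DPM.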

3.2 Light Projector
An Acer LCD projector (model PL111) was used to project Structured Lighting into the scene. The device can project at a resolution of 800 by 600 pixels and has a focal length of 21.5 to 28 mm.

3.3 Calibration
A cube-shaped calibration object was used to calibrate the cameras. The object has 63 circular calibration markings distributed evenly over two of its sides. The classic Tsai calibration technique [5] was chosen because it had proven effective in prior experiments. Refer to Figure 1 for the system geometry used for experimentation.

4 Results
4.1 3-D Visualisations
As an example, Figure 6 presents a set of visualisations of the results acquired with the Gray Code with Colour Strip technique. These visualisations were produced using the OpenGL graphics API.

The resulting disparity maps are compared with a ground truth of the two test objects (Figure 7) obtained by laser scanner. The ground truth is aligned with the acquired data using the Iterative Closest Point (ICP) algorithm and then reprojected into disparity space for faster comparison. The measures of surface difference are shown in Tables 1, 2, 3 and 4; these give quantitative insight into the overall 'correctness in surface shape' of each Structured Lighting technique. The data show quantitatively that Gray Code with Colour Strip projection gives the best results overall. The Colour Strip technique performs better than the Colour Gradation technique, because the higher spatial frequency of the strip makes surface regions more visibly unique. The Colour Gradation technique is less beneficial for human faces, where skin texture is non-homogeneous, unlike the uniform-albedo surface of the mannequin. All projected light techniques perform better than a traditional stereo approach under passive illumination.
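The per-pixel statistics of the kind reported in Tables 1 and 2 can be sketched as follows, assuming both maps are already in the same disparity space (i.e. after ICP alignment and reprojection) and using absolute differences with the population variance. The paper does not state whether differences are signed, how unmatched pixels are treated, or which difference bins Table 2 uses, so the `None` convention and the `thresholds` values here are illustrative.

```python
import math


def disparity_error_stats(estimated, ground_truth, thresholds=(1, 2, 4)):
    """Compare two disparity maps (lists of rows) pixel by pixel.

    Pixels where either map is None are skipped. Returns the max, mean,
    variance and standard deviation of the absolute difference, plus the
    percentage of pixels whose difference is at most each threshold.
    """
    diffs = [abs(e - g)
             for e_row, g_row in zip(estimated, ground_truth)
             for e, g in zip(e_row, g_row)
             if e is not None and g is not None]
    if not diffs:
        raise ValueError("no overlapping valid pixels")
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / n  # population variance
    return {
        "max": max(diffs),
        "mean": mean,
        "variance": var,
        "std_dev": math.sqrt(var),
        "pct_within": {t: 100.0 * sum(d <= t for d in diffs) / n for t in thresholds},
    }
```

The standard deviation is simply the square root of the variance, which is also how the two columns of Table 1 relate (e.g. 23.0 and 4.8).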

Figure 7: The two test subjects forming data sets 'MANNEQUIN' and 'ANDY' respectively.

Figure 6: Example visualisations of the paper results.

4.2 Comparative Analysis
Two test objects were used to gather results for all implemented algorithms: a static mannequin (data set 'MANNEQUIN') and a human test subject (data set 'ANDY'). Figures 8 and 9 show the resulting disparity maps. Visual inspection shows that the Gray Code based techniques provide the smoothest maps. Pure Gray Code struggles in the dark regions around the eyes and in the hair, where Gray Code with a Colour Strip pattern copes better than Gray Code alone.

5 Conclusion

This paper has presented an investigation and implementation of a set of Structured Lighting techniques with the goal of providing high-accuracy 3-D data. This work contributes to a growing body of implemented active vision algorithms for 3-D reconstruction. Various techniques were described, and an analysis was presented to elucidate their performance on both a mannequin and a human test subject. 3-D human facial reconstruction was chosen as a point of interest because of its growing range of applications. Experimentation showed that time-multiplexing techniques, such as binary encoding using Gray Code, provide the highest accuracy. Future goals are to further improve accuracy and concurrently explore spatio-temporal 3-D acquisi-

Table 1: Statistics for dataset 'MANNEQUIN' in comparison with ground truth.

Method                         Max difference   Mean difference   Variance   Std. dev.
Gray Code with Colour Strip    16               1.3               23.0       4.8
Gray Code                      16               1.5               25.2       5.0
SAD with Colour Gradation      16.4             2.9               31.9       5.6
SAD with Colour Strip          15.2             3.3               51.7       7.2
SAD normal image pair          14.8             6.5               121.4      11.0
SSD with Colour Gradation      16.8             3.0               31.7       5.6
SSD with Colour Strip          16               3.3               42.4       6.5
SSD normal image pair          14               7.1               125.6      11.2
DPM with Colour Gradation      15.2             3.0               36.0       6.0
DPM with Colour Strip          15.2             3.2               41.7       6.5
DPM normal image pair          16               4.5               56.0       7.5

Table 2: Percentages of disparity differences for dataset 'MANNEQUIN' in comparison with ground truth.
Method Gray Code with Colour Strip Gray Code SAD with Colour Gradation SAD with Colour Strip SAD normal image pair SSD with Colour Gradation SSD with Colour Strip SSD normal image pair DPM with Colour Gradation DPM with Colour Strip DPM normal image pair