Image and Vision Computing 18 (2000) 659–671 www.elsevier.com/locate/imavis

Matching color uncalibrated images using differential invariants

P. Montesinos a,*, V. Gouet a, R. Deriche b, D. Pelé c

a EMA/LGI2P, Site EERIE, Parc scientifique Georges Besse, 30035 Nîmes Cedex 1, France
b INRIA, route des Lucioles, 06560 Sophia-Antipolis Cedex, France
c France Télécom CCETT/CNET/DIH, 4 rue du Clos Courtel, BP 59, 35512 Cesson-Sévigné Cedex, Rennes, France

Received 9 December 1998; received in revised form 31 July 1999; accepted 26 October 1999

Abstract

This paper presents a new method for matching points of interest in stereoscopic, uncalibrated color images. It consists in characterizing color points using differential invariants. We define additional first order invariants using color information, and we show that the first order is sufficient to make the characterization accurate. The characterization thus obtained is invariant to orthogonal image transformations; in addition, we make it robust to affine illumination transformations. We go on to present a generalization of a gray-level corner detector to the case of color images. Thirdly, we propose a robust and fast incremental technique for matching points of interest in uncalibrated cases, which works whatever the number of points to be matched. Our matching scheme is evaluated on stereo color images containing many points, with viewpoint and illumination variations. The results obtained clearly show the relevance of our approach. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Uncalibrated color images; Point of interest detector; Differential invariants; Stereo matching

1. Introduction

This work addresses the problem of matching stereoscopic, uncalibrated color images, with the aim of doing transfer, i.e. computing synthetic images having intermediate points of view. Primitive matching is one of the main tasks in Computer Vision, and research in this domain is still active; see for example Refs. [1-5]. Many techniques to find candidate matches in a pair of images have been studied; they can be separated into two classes. First, feature matching approaches extract salient primitives from images, such as edge segments or contours, and match them. These approaches do not seem suited to the recovery of an unknown epipolar geometry, which requires very precise point correspondences, so we will not use them here. Iconic approaches are better adapted to our work: they use the signal information directly for matching points. One of the most popular techniques is the correlation method [1,5,6]. Other more recent methods, working on gray-level images and usually used for image indexing, consist in characterizing points using differential invariants of the signal [7-10]. In this paper, we revisit point characterization with differential invariants, whose main interest is invariance to orthogonal image transformations. The invariants have

* Corresponding author.

been defined up till now only for gray-level images and used to be computed up to order 3 to characterize points efficiently. We show here that using color images can markedly improve this characterization, not only by multiplying the gray-level features but also by providing new invariants of the first order. We show that color information allows us to reduce the characterization to the first order. The point descriptions thus obtained are more robust than the existing ones, but almost as rich. In addition, we present a method to make the characterization invariant to variations in illumination. As we use only first order derivatives for characterization, we propose a precise and stable first order detector of points of interest; it corresponds to a generalization of a gray-level corner detector (the Harris detector) to the case of color images. We then propose our method of matching using these color points and this color characterization. The main drawback of this matching method, and of the majority of the existing ones, is that it becomes unworkable with large sets of points because of its complexity; computing dense maps between images, for instance, becomes problematic. So we enrich our matching process by proposing an incremental version. The idea is to use geometric constraints independent of the main transformations allowed between images. We show that this improvement efficiently reduces the complexity of the matching method and reduces the number of possible ambiguities.

0262-8856/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0262-8856(99)00070-0


Fig. 1. Feature vectors of a point (the center of the rotation) in Lizard images differing by a rotation of 35°. We see clearly that v_col is invariant to rotation.

Our matching scheme decomposes into four stages: in Section 2, we define a new differential characterization of points of interest for color images, which is made invariant to orthogonal transformations of the image and to affine transformations of the illumination. In Section 3, we present our corner detector for color images. In Section 4, we introduce a new point matching process working with our color invariants, and in Section 5 an incremental technique for matching large sets of points rapidly and efficiently. In Section 6, the matching results we obtain demonstrate the validity of our point characterization and matching technique on images which differ by some of the usual image transformations; results on large sets of points also show the relevance of our incremental matching technique.

2. Characterization with differential invariants

Among iconic methods, correlation methods produce very good results but are very time consuming and very restrictive, because of the small transformations they allow between images. In contrast, methods based on differential invariants are more robust to rotations between images and are also faster to compute. But in gray-level images it is often necessary to consider invariants up to the third order [4,9,10] to obtain a good characterization of the primitives, and these invariants are difficult to estimate in a stable fashion. Our work consists in using differential invariants for point characterization. We revisit them in Section 2.1 and introduce new first order invariants specific to color. In Section 2.2, we present a characterization which overcomes the main drawbacks of the classical invariants by considering only first order derivatives and color information. This new description is robust and invariant to orthogonal transformations of the image. We also make it invariant to linear illumination variations in Section 2.3.

2.1. Invariant image attributes

For our color-based stereo application, we are interested in using gray-level image attributes which are invariant with respect to as large a group of transformations as the orthogonal and affine ones. In this article, we will mainly consider orthogonal transformations of the image coordinates and affine transformations of the image intensities. As has been shown by Hilbert [8], any invariant of finite order can be expressed as a polynomial function of a set of irreducible invariants. Considering a scalar image, these invariants form the fundamental set of image primitives that have to be used in order to describe all local intrinsic properties. This set is well known for first and second order properties [4,11] and is best expressed in a system of coordinates that is not tied to the image orientation, the well-known gauge coordinates, as follows:

L, L_η, L_ηη, L_ξη, L_ξξ    (1)

where η is the unit vector given by η = ∇I/|∇I| and ξ ⊥ η. (Note that within this system of coordinates, we have L_ξ = 0.) Sets of higher order are increasingly complicated. If one wants to use these attributes in a matching process, it is worth noting that it might be better, from a geometrical and/or numerical point of view, to consider some other combination of these five invariants rather than just the ones given by Eq. (1). For example, the five following invariants used at several scales perform quite

Fig. 2. An example of local normalization. The images in the top line differ by the affine intensity transformations of our model, and the second line shows them after normalization (using a window diameter of 21; the corresponding window is drawn in red on the keyboard in both images).
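The local normalization illustrated in Fig. 2 can be sketched as follows. This is a hypothetical implementation, not the authors' code: the paper specifies a circular window, while a square one is used here for simplicity, and the function name and [0, 1] target interval are assumptions.

```python
# Sketch of the local normalization of Section 2.3 (hypothetical code;
# the paper uses a circular window, a square one is used here for
# simplicity). Each color channel is processed independently.

def local_normalize(channel, diameter=21):
    """Rescale every pixel into [0, 1] using the extrema found in a
    window centered on it, which makes the channel invariant to an
    affine intensity change x' = d*x + t (for d > 0)."""
    h, w = len(channel), len(channel[0])
    r = diameter // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [channel[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            lo, hi = min(window), max(window)
            out[y][x] = (channel[y][x] - lo) / (hi - lo) if hi > lo else 0.0
    return out
```

The invariance follows from (d·x + t − (d·lo + t)) / ((d·hi + t) − (d·lo + t)) = (x − lo)/(hi − lo): the diagonal gain d and translation t of Eq. (4) cancel out inside each window.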


Fig. 3. Two matched points on Lizard images differing by a linear change of intensities with D11 = 0.5, D22 = 0.4, D33 = 0.3, T1 = 0.3, T2 = 0.2 and T3 = 0.1. The two left-hand diagrams represent v_col for two matched points computed on non-normalized images, while the ones on the right use local normalization. We see clearly that the pre-processing makes the feature vector invariant.

well for matching two gray-level images:

  Intensity:           I
  Laplacian:           I_xx + I_yy
  Gradient magnitude:  I_η = (I_x² + I_y²)^{1/2}
  Isophote curvature:  I_ξξ/I_η = (−I_x² I_yy + 2 I_x I_y I_xy − I_y² I_xx) / (I_x² + I_y²)^{3/2}
  Flowline curvature:  I_ξη/I_η = (I_x I_y (I_yy − I_xx) + I_xy (I_x² − I_y²)) / (I_x² + I_y²)^{3/2}    (2)

Considering a set of color images {R, G, B} and the group of rotations (specified by just one parameter, the rotation angle), the set of invariants for first and second order will include 6 × 3 − 1 = 17 invariants. These may include the five invariants above for each color channel, plus two additional invariants chosen from the following set:

∇R·∇G,  ∇R·∇B,  ∇G·∇B

2.2. Characterization using first order invariants and color information

Our idea is to use Hilbert invariants which involve only derivatives up to the first order. The characterization obtained with gray-level images alone would be too poor, but we show here that this is largely compensated by the color information. We obtain a vector of invariants under translation and rotation containing 2 × 3 + 2 = 8 components. We call it v_col:

v_col(x, σ) = (R, |∇R|², G, |∇G|², B, |∇B|², ∇R·∇G, ∇R·∇B)^T    (3)

Using only first order invariants presents two main advantages:

• It allows a very robust characterization with regard to noise (only first order derivatives are used).
• The complexity of the method is low, since only 14 images are computed (R, R_x, R_y, |∇R|, G, G_x, G_y, |∇G|, B, B_x, B_y, |∇B|, ∇R·∇G, ∇R·∇B). The use of second order derivatives would require many more image computations, which would prove problematic because we use a subpixel Gaussian filtering [12].
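The eight components of Eq. (3) at a single pixel can be sketched as below. This is hypothetical illustration code, not the paper's implementation: the paper computes derivatives with subpixel Gaussian filtering [12], whereas plain central differences are used here, and the function names are assumptions.

```python
# Sketch of the 8-component invariant vector of Eq. (3) at one pixel
# (hypothetical code; the paper uses subpixel Gaussian derivatives [12],
# plain central differences stand in for them here).

def gradient(channel, x, y):
    """Central-difference gradient (dx, dy) of one color plane."""
    dx = (channel[y][x + 1] - channel[y][x - 1]) / 2.0
    dy = (channel[y + 1][x] - channel[y - 1][x]) / 2.0
    return dx, dy

def v_col(R, G, B, x, y):
    """(R, |grad R|^2, G, |grad G|^2, B, |grad B|^2,
    grad R . grad G, grad R . grad B) at pixel (x, y)."""
    gr, gg, gb = gradient(R, x, y), gradient(G, x, y), gradient(B, x, y)
    dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
    return (R[y][x], dot(gr, gr),
            G[y][x], dot(gg, gg),
            B[y][x], dot(gb, gb),
            dot(gr, gg), dot(gr, gb))
```

All eight components are built from channel values, squared gradient norms and dot products of gradients, which is why the vector is insensitive to a rotation of the image axes.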

In Fig. 1, the reader can compare the feature vectors v_col obtained for two matched points in images differing by a 2D rotation.

2.3. Color constancy

A scene can be submitted to two types of lighting change. First, the intensity of each hue of the emitted light can be modified; we call this an internal change of the light. Secondly, the light source can be moved in the scene; we call this an external light alteration. In this article, we take into account only the internal changes. For the modeling of external changes, the reader can consult Ref. [13]. In order to obtain a robust color characterization of an image, we make our feature vector v_col presented above invariant to internal changes of intensity. This requires first and foremost a good model of these transformations. Following Finlayson's [13] recent work on color constancy, we use his diagonal model with an additional translation vector to get linear invariance to intensity. We obtain:

x' = D x + T    (4)

where x = (r, g, b)^T is the pixel color and x' = (r', g', b')^T its linear transformation, D (3 × 3) is a diagonal matrix and T a translation 3-vector. Other more complex models exist (fully linear with translation, for example), but this one seems to provide the best quality/complexity ratio. According to this model, there are two methods for making the characterization invariant. The first solution is to modify our vector v_col to make it invariant to these


transformations. The model proposed here has six degrees of freedom (D11, D22, D33, T1, T2, T3). With this solution, our eight-invariant vector would be reduced to one having 8 − 6 = 2 invariants, which would of course leave it too poor to characterize points efficiently. This solution will not be considered. The second approach takes into account the six parameters of the model without losing the richness of v_col. It consists in normalizing the images in order to make them independent of the affine model; the entire vector v_col is then computed from these normalized images. For each color plane {R, G, B}, the gray value of each pixel is re-expressed in a given interval, with a linear transformation using the extrema of all the gray values in the plane in question. This makes the image independent of the D and T parameters of our linear model. The main drawback of this normalization process is that if it is applied globally to the whole image of each channel, it is sensitive to changes in image content, as for instance when the two images have been taken from different viewpoints. As shown in Fig. 2, we propose a more local solution for tackling this problem. For each pixel of the processed channel, the extrema are computed locally in a circular window centered on the pixel. With this improvement, the local properties of the pixels are preserved. This approach needs just one parameter, the diameter of the window: the more the images differ, the smaller this parameter must be. We have estimated three correlation scores on some points between the color channels of images before and after this local normalization. The results show that the normalization process does not modify the correlation scores obtained, because each channel is normalized locally and independently of the other two. Consequently, the eight invariants of v_col remain meaningful after normalization. The reader can see in Fig. 3 an example of v_col with and without normalization.

3. Extraction of points of interest

We present in this section a generalization of a gray-level corner detector to the case of color images. As we use only first order derivatives of the images for matching, we need a precise and stable first order detector. We have chosen the Harris detector [14,15] because it is one of the most stable point detectors, as Ref. [14] showed, and we generalize it to color data. Consider the matrix M:

M = | S_σ(R_x² + G_x² + B_x²)           S_σ(R_x R_y + G_x G_y + B_x B_y) |
    | S_σ(R_x R_y + G_x G_y + B_x B_y)  S_σ(R_y² + G_y² + B_y²)          |

where S_σ(·) stands for Gaussian smoothing. The Harris color operator is defined by the positive local extrema of the following operator:

Det(M) − K Trace²(M),  with K = 0.04

Fig. 4. Harris color points of images differing in viewpoint and illumination (natural change of intensity).
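The operator above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: a 3 × 3 box filter stands in for the Gaussian smoothing S_σ, the gradients are assumed precomputed, and the function name and data layout are assumptions.

```python
# Sketch of the Harris color operator of Section 3 (hypothetical code:
# a 3x3 box filter stands in for the Gaussian smoothing S_sigma).

K = 0.04

def harris_color(Rg, Gg, Bg):
    """Rg, Gg, Bg: per-channel gradient fields [[(dx, dy), ...], ...].
    Returns the response Det(M) - K*Trace(M)^2 at every inner pixel."""
    h, w = len(Rg), len(Rg[0])
    # Entries of the matrix M before smoothing.
    a = [[sum(g[y][x][0] ** 2 for g in (Rg, Gg, Bg)) for x in range(w)] for y in range(h)]
    b = [[sum(g[y][x][0] * g[y][x][1] for g in (Rg, Gg, Bg)) for x in range(w)] for y in range(h)]
    c = [[sum(g[y][x][1] ** 2 for g in (Rg, Gg, Bg)) for x in range(w)] for y in range(h)]
    def smooth(f, x, y):  # 3x3 box smoothing, stand-in for S_sigma
        return sum(f[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)) / 9.0
    resp = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            A, Bv, C = smooth(a, x, y), smooth(b, x, y), smooth(c, x, y)
            det, tr = A * C - Bv * Bv, A + C
            resp[(x, y)] = det - K * tr * tr
    return resp
```

As in the gray-level Harris detector, the response is strongly positive where the smoothed matrix M has two large eigenvalues (a corner) and negative along an edge, where one eigenvalue dominates.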

Fig. 4 shows an example of Harris Color points. The three color channels are equally weighted in the matrix M because this matrix comes directly from the Di-Zenzo multispectral tensor, which allows the computation of exact first order differential forms of vectorial functions. Of course, different weights and also different color spaces could be considered, but we have not yet explored this line of investigation. In recent work, we have defined two criteria for evaluating the Harris Color detector: repeatability and location. The measurements were taken on color images differing in 2D rotation, intensity and viewpoint. The results obtained were compared to a well-tested gray-value detector; they show that this color detector is more stable than its gray-value counterparts.

4. Matching

When point characterization is achieved, the matching method consists in comparing each feature vector of the first image with its counterpart in the second image in


order to find the points which look the most similar. The most complete comparison method seems to be the Mahalanobis distance [4], which involves the covariance matrix of the vector components. This distance incorporates a model of noise, but it is complex and difficult to implement accurately, because of the large number of samples its estimation requires on various images. We are currently working on its estimation; in the meantime, we propose in the next section an intermediate solution for computing the likeness of two vectors.

4.1. A method for comparing feature vectors

To compute the likeness of two vectors, we define a suboptimal method inspired by the Mahalanobis distance. As each component of v_col has its own range of values, the Euclidean distance cannot be used directly for the comparison. Our solution is first to rescale each vector component into a given interval, using the extrema of this component over all the vectors computed in all the images. The normalized components thus obtained are comparable to each other, and we compare the new feature vectors pairwise using the Euclidean distance. This model accounts only for magnitude differences of each vector component and so does not take noise into account. But we compute only first order derivatives, and the Gaussian filtering minimizes possible noise in pixel intensity. So we assume that noise is reduced in the estimate of v_col, and consequently this model seems to be sufficient.

4.2. Matching

Using these distances, the most natural way of doing the matching is to select the pairs (m_i^1, m_j^2) whose distance is lower than that of all the possible pairs (m_i^1, m_j'^2) and (m_i'^1, m_j^2). But this technique may eliminate some, or even many, matches which could have been good. So we keep all the matches associated with small distances, and we improve the matching using a relaxation technique in order to eliminate the possible ambiguities.
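The rescale-then-compare scheme of Section 4.1 can be sketched as below. This is hypothetical code, with assumed function names; the target interval [0, 1] is an assumption, and the extrema are taken over the pooled vectors of both images as the text describes.

```python
# Sketch of the vector comparison of Section 4.1 (hypothetical code):
# each component is rescaled into [0, 1] using its extrema over all
# vectors of both images, then a plain Euclidean distance is used.

import math

def normalize_vectors(vectors):
    """Min-max rescale each component over the whole population."""
    n = len(vectors[0])
    lo = [min(v[k] for v in vectors) for k in range(n)]
    hi = [max(v[k] for v in vectors) for k in range(n)]
    return [tuple((v[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                  for k in range(n)) for v in vectors]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Without the rescaling, a component with a large numeric range (such as a raw intensity) would dominate one with a small range (such as a gradient dot product), biasing the distance.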
Our relaxation solution is an iterative algorithm which uses semi-local constraints, more precisely neighborhood constraints: a pair of points is considered a good match if there are many good matches in its neighborhood. Some heuristics can be integrated into the computation, for example comparing the distances between the points at issue; see Refs. [4,5,16] for further details. In Section 5, we show that this matching method can be notably improved, in particular for matching large sets of points.

5. An incremental algorithm for constrained matching

The main drawback of the comparison method described in Section 4 is first of all its complexity. If we do not have any information about the disparity between the two images


(for example when the cameras are not calibrated), the complexity of the comparison method is O(m × n) for m points in the first image and n in the second. A second disadvantage is the number of ambiguities, which rises with the number of points to be matched; this makes the relaxation process slower and may finally generate more bad matches. To sum up, the matching process is efficient up to 200 or 300 points but becomes unworkable above that. Yet it is necessary to match a large number of points in order, for example, to compute dense depth maps between two images. So our idea is to make the search area for the corresponding point in the other image as small as possible. If the disparity map between the two images is unknown, we must localize this area ourselves. Let us suppose that we have at our disposal a set M of good matches between the images. We show in the next section how this data can give us useful geometric information about the search area.

5.1. The available geometric information

In this paper, we assume a pinhole camera model, modeling the projection as a pure perspective one.

5.1.1. Using the epipolar geometry

If M contains at least eight matches, the epipolar geometry of the two-camera system can be estimated [2,5]. It is characterized by a fundamental matrix F (3 × 3) which satisfies m_2^T F m_1 = 0 for two matched points m_1 and m_2. This equation expresses that the point m_2 of the second image corresponding to the point m_1 of the first image lies on the line F m_1, and equally that the point of the first image corresponding to m_2 lies on F^T m_2. This represents all the information that can be estimated from two images when the cameras are not calibrated. Once F is estimated, the complexity of the matching process is clearly much lower: the search area has become a line. In what follows, the fundamental matrix F computed from the matches M will be called F_M.

5.1.2. Using a Delaunay triangulation

Let us consider a 3D point P belonging to a triangle T, (p1, p2) its projections on the two images and (t1, t2) the projections of the triangle. It is easily shown that a triangle is transformed into a triangle by a projective transformation, so t1 and t2 are triangles too. The point p1 is necessarily located in t1 and necessarily has its corresponding point p2 in t2. If P does not belong to T, the position of p2 relative to t2 is a function of the disparity. Practical experiments have shown that it is enough to consider only the triangles closest to t2. Indeed, we have observed that disparity is inversely proportional to the number of points that can be matched, and so proportional to the width of the associated triangles. This is the reason why the combination of triangle t2 and its closest neighbors represents an area which


Fig. 5. (a, a'), (b, b') and (c, c') are correct matches. A point m of Image 1 located in the triangle (a, b, c) has its corresponding point in Image 2 on the epipolar line F·m and in the triangle (a', b', c'). The point m' has its corresponding point in Image 1 on F^T·m' and in (a, b, c).
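The epipolar constraint used above can be sketched as follows. This is hypothetical illustration code: the fundamental matrix shown corresponds to an ideal rectified pair (pure horizontal camera motion, so corresponding points share the same row), assumed only for the sake of a concrete example.

```python
# Sketch of the epipolar constraint of Section 5.1.1 (hypothetical
# code). F_rect below is the fundamental matrix of an ideal rectified
# stereo pair, assumed for illustration.

import math

def epipolar_line(F, m1):
    """Coefficients (a, b, c) of the line F.m1 in the second image."""
    x, y, w = m1
    return tuple(F[i][0] * x + F[i][1] * y + F[i][2] * w for i in range(3))

def distance_to_line(line, m2):
    """Euclidean distance of the homogeneous point m2 to the line."""
    a, b, c = line
    x, y, w = m2
    return abs(a * x + b * y + c * w) / (w * math.hypot(a, b))

# Rectified pair: corresponding points share the same row.
F_rect = ((0.0, 0.0, 0.0),
          (0.0, 0.0, -1.0),
          (0.0, 1.0, 0.0))
```

A candidate match (m1, m2) satisfying m_2^T F m_1 = 0 has zero distance to the epipolar line, so thresholding this distance reduces the search area from the whole image to a narrow band around the line.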

has every chance of containing p2. So we are able to define an area inside which the point corresponding to a given point can be found. We compute a triangulation on the matched points of the first image, which induces the triangulation of the second image: a triangle of the second image has its three vertices matched to the vertices of the same triangle in the first image. For the first image, we opted for a Delaunay triangulation because it produces the most equiangular triangles possible [17]. The triangulation of the second image obtained from it is not necessarily a Delaunay triangulation. In the sections below, we call T_M the pair of triangulations based on the matches M. By combining these two constraints, the search for the corresponding point is reduced to a segment, provided the point in question belongs to a triangle of the triangulation, as shown in Fig. 5. If the point in question does not belong to any triangle, the search is reduced either to a few segments or to the entire epipolar line. The geometric constraints just outlined allow us to reduce the search area of the point to be matched, even if we do not have any information about the disparity between the

Fig. 7. Lizard image (on the left) with three linear transformations of gray values. The graph shows the corresponding matching results.

images and whatever the images are. These constraints are integrated into the incremental algorithm described in the next section.

5.2. Our incremental algorithm

The constrained matching method presented above supposes that we have a set of matched points at our disposal to initiate the matching using the geometric constraints. So we define an incremental method which computes at iteration i the set of matches M^i from the

Fig. 6. Matching results on the Lizard image, which has been submitted to six rotations in the image plane.


Fig. 8. Two room images differing in viewpoint and illumination: 155 matches were found using the basic matching technique. The superposed epipolar geometry was computed using all the matches.

geometric constraints associated with the matches M^{i−1} of iteration i − 1. The algorithm proceeds as follows:

(i) Extraction (addition) of points of interest in the images: sets P_1^i and P_2^i.
(ii) Characterization of each point using the feature vector v_col (see Section 2.2 for the characterization method).
(iii) For each point p_k^1 ∈ P_1^i and p_l^2 ∈ P_2^i, estimation of the search areas A_k^1 and A_l^2 in the other image, using F_{M^{i−1}} and T_{M^{i−1}} if they exist.
(iv) Comparison of the feature vectors of the points p_k^1 and p_l^2 which satisfy p_k^1 ∈ A_l^2 and p_l^2 ∈ A_k^1. A set of matches M^{i*} with possibly reduced ambiguities is obtained. The vector comparison method has been described in Section 4.1.
(v) Relaxation on M^{i*} (see Section 4.2): a set of matches M^i without ambiguities is obtained.

(vi) Computation of the geometric constraints: the triangulation T_{M^i} and the fundamental matrix F_{M^i} associated with the matches M^i.
(vii) While there are not enough matches, go back to step (i).

Because the matches M^i are computed using those of iteration i − 1, it is very important to obtain the highest possible rate of good matches at each iteration. This condition is satisfied most of the time, because the geometric constraints efficiently eliminate many mismatches. Indeed, the epipolar geometry constraint makes it easy to eliminate all the matches which are outliers. We know that some mismatches will remain (they are inliers), but experience has shown that the geometric constraint based on the triangulation goes further, eliminating more mismatches than classical matching approaches do. It is interesting to

Fig. 9. The same two room images: 170 good matches were found using the incremental matching algorithm and the geometric constraints. The final epipolar geometry F_{M^2}, represented by some epipolar lines in blue on the two images, is very precise. The epipolar lines correspond to matches {161,29,60,147,170}.


Fig. 10. Matching of (367, 269) Harris color points with and without the incremental algorithm. The computation was done on a Sun Ultra 5, 333 MHz, with 256 MB of memory. The rate of good matches at the first iteration is only 62%, because we kept only a small percentage (40%) of the inliers, to be sure of getting a set M^0 consisting entirely of exact matches. The reader can notice that the relaxation time decreases with the iterations; this is because the search area becomes smaller and smaller (there are more and more triangles), so the number of ambiguous matches decreases. Without the incremental method, it is the relaxation process which is time consuming (47 min), because of the many ambiguous matches.

notice too that the average width of the search segments decreases as the number of iterations increases; so the matching process goes progressively faster, with progressively fewer ambiguities.
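The triangle constraint that drives this speed-up can be sketched as below. This is hypothetical code, not the authors' implementation: a barycentric point-in-triangle test restricts the candidate set, the closest-neighbor-triangle widening described in Section 5.1.2 is omitted for brevity, and the function names are assumptions.

```python
# Sketch of the triangulation constraint of Section 5.1.2 (hypothetical
# code): a point lying in a triangle of the first image is compared
# only against the points lying in the matched triangle of the second.

def in_triangle(p, a, b, c):
    """Barycentric point-in-triangle test (inclusive of edges)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:  # degenerate triangle
        return False
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    l3 = 1.0 - l1 - l2
    return l1 >= 0 and l2 >= 0 and l3 >= 0

def candidates(p1, triangle1, triangle2, points2):
    """Points of image 2 worth comparing with p1: those inside the
    triangle matched with the one containing p1 (if any)."""
    if not in_triangle(p1, *triangle1):
        return list(points2)  # no constraint available for this point
    return [p for p in points2 if in_triangle(p, *triangle2)]
```

Combined with the epipolar line, this shrinks each comparison from O(n) candidates to the few points lying on a short segment inside the matched triangle.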

The main difficulty remains obtaining a good estimate of the first set of matches M^0, for which geometric constraints are not yet available. Our solution is to estimate F_{M^0} at the end of the relaxation process and

Fig. 11. Results of matching M^2 on two views differing in viewpoint and illumination (natural change), realized in three iterations. The information in blue added to the images represents the geometric constraints computed at iteration 1 (the second iteration) and used to do the matching at the last iteration. In the top line, the blue lines represent some epipolar lines of the epipolar geometry F_{M^1}; they correspond to matches {70,107,144,117,140}. In the second line, the superposed information in blue represents the triangulation T_{M^1}.


Fig. 12. Harris Color points to be matched on two images of a scene differing in viewpoint and illumination (natural change). 1170 points were found in the left image and 1035 in the right one.

to eliminate all the matches which are not consistent with F_{M^0}. A few mismatches may remain, so we keep only a small percentage of the inliers (in practice approximately 40% of the matches), consisting of the ones with the best matching scores. Experience shows

that these represent a good basis for computing the matches M^1 of the first iteration, using the geometric constraints T_{M^0} and F_{M^0} now available (F_{M^0} can then be re-estimated from the new set of memorized matches M^0).
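The bootstrap filtering just described can be sketched as follows. This is hypothetical code: the residual function, the inlier threshold and the function names are assumptions; only the keep-the-best-40% rule comes from the text.

```python
# Sketch of the bootstrap filtering of M^0 (hypothetical code): reject
# matches inconsistent with the estimated epipolar geometry, then keep
# only the best-scoring 40%, as suggested in the text.

def bootstrap_matches(matches, residual, threshold=1.5, keep=0.40):
    """matches: list of (m1, m2, score); residual(m1, m2): distance of
    m2 to the epipolar line of m1; a lower score means a better match."""
    inliers = [m for m in matches if residual(m[0], m[1]) <= threshold]
    inliers.sort(key=lambda m: m[2])
    return inliers[:max(1, int(len(inliers) * keep))]
```

Keeping only the best-scoring fraction trades quantity for purity: a smaller but nearly mismatch-free M^0 is what the incremental iterations need to stay reliable.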


Fig. 13. Matching result. The iterative matching process has produced 403 matches.

6. Matching results

The points of interest to be matched, P_1^i and P_2^i, are computed using the Harris Color detector introduced in Section 3. We characterize them with the feature vector v_col of Eq. (3). The derivatives are computed using a subpixel Gaussian filtering [12].

6.1. Invariance to rotation

Fig. 6 presents a color image which has been submitted to rotations in the image plane. The graph shows clearly that the characterization with v_col is robustly invariant to rotation. We obtain matches which are 90% correct whatever the rotation applied.
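The rotation invariance observed here can be checked numerically: an in-plane rotation of the image rotates every gradient by the same angle, which leaves the squared norms and dot products entering Eq. (3) unchanged. The sketch below is hypothetical illustration code, not part of the paper's experiments.

```python
# Numerical check (hypothetical code) of why the components of the
# feature vector are rotation invariant: a 2D rotation applied to all
# gradients leaves squared norms like |grad R|^2 and dot products like
# grad R . grad G unchanged.

import math

def rotate(v, theta):
    """Rotate the 2D vector v by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]
```

Since all eight components of v_col are built from channel values, squared gradient norms and inter-channel gradient dot products, none of them changes under the rotation.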


Fig. 14. Epipolar geometry estimated using the last set of matches obtained. This time, the estimation was done with a non-linear approach to obtain an exact geometry. We have selected four corners of the roof of the house at the right of each image, the bottom of the tent at the top of the image and a corner of the panel at the bottom of the image, and we have drawn the corresponding epipolar lines on the other image.

6.2. Invariance to change of intensities

Fig. 7 shows three linear intensity transformations applied to a color image which has been locally normalized (see Section 2.3), so the characterization is invariant to both rotation and change of intensities. The graph shows the importance of the image normalization: without it, only 10% of the matches found are exact, whereas with it the rate jumps to 90%.


6.3. Matching under different viewpoints

The matching is realized with and without the incremental method described in Section 5. Various algorithms exist to estimate the fundamental matrices F_{M^i}, the RANSAC algorithm [18] being one of them. We estimate them using a robust linear method based on Least Median of Squares regression (LMedS) [5,16]. Because we just want to eliminate the mismatches which are not consistent with the estimated geometry, we do not need an exact geometry; consequently our approach is linear, runs fast and gives excellent results far from the epipoles. The estimation of the F_{M^i} is fast and reliable, since the considerable thresholding of the first iteration and the geometric constraints applied in the later ones produce few outliers. The triangulation of the first image used to compute T_{M^i} is a semi-dynamic Delaunay triangulation [17]. It is implemented in an incremental fashion in order to improve its efficiency within our incremental matching process: at iteration i, the triangulation is computed by inserting the points of P_1^i belonging to M^i into the triangulation T_{M^{i−1}} computed during the previous iteration.

Figs. 8 and 9 show the results of the matching process on (220, 231) points of interest. Fig. 8 shows the matches (called M) computed only with the basic matching method described in Section 4, whereas Fig. 9 shows the matches (called Minc) obtained with the incremental method. These two results are easy to compare:

• Speed: half an hour was necessary to obtain the set M, whereas only three iterations and a few minutes were needed to obtain Minc. This is because the complexity of the feature vector comparison is smaller, since the search area is limited; the other reason is that the rate of ambiguous matches is lower at the beginning of the constrained relaxation algorithm, which consequently runs faster.
• Quality of results: we obtain better results with the incremental method (170 matches, all exact) than with the basic one (155 matches, a few of them bad). Let us examine the results more precisely. The red epipolar line in Fig. 8 is associated with match 142. This match is wrong, and being an outlier it can easily be eliminated using the epipolar geometry. But using the incremental process with geometric constraints (see Fig. 9), the same points are correctly matched (matches 29 and 60). This example demonstrates that the geometric constraints of the incremental process allow more points to be matched. Now consider the green epipolar line in Fig. 8. It corresponds to match 97, which is an inlier but wrong! Using the incremental process, the same points have been correctly matched (match 86, drawn in blue in Fig. 9). This example demonstrates that the geometric constraints of the incremental process produce more good matches.

Fig. 11 shows the last step of the incremental matching on

images differing in viewpoint and illumination. 367 and 269 Harris Color points were detected and 164 points were matched in three iterations. Only 175 points could have been matched—the points of view are very different. The rate of good matches is 93% and the final epipolar geometry obtained is very precise, in spite of the differences between the images. The points have been matched roughly in 100 s. The reader can see more precisely in Fig. 10 how the time is consumed for each iteration. For comparison, the basic algorithm has taken roughly 48 min to obtain a solution having a higher rate of false matches! Figs. 12–14 present matching results on two images of a scene differing in viewpoint and intensity. The matches have been obtained using the iterative method. There were 1170 Color Harris points found on the left image and 1035 on the right image. The iterative matching process has found 403 matches. The two last images show the final epipolar geometry estimated at the end of the matching. A few minutes were necessary to carry out this matching. The reader can easily predict how much time would have been necessary to try to match 1170 points with 1035 other ones, using the basic algorithm! At the end of the matching process, we have a set of matched points M. In addition we have got the epipolar geometry of the camera system characterized by FM and a triangulation TM of the matched points. These geometric constraints have been very useful in obtaining efficient matching but it is interesting to note that their utility does not end there; they prove useful allies in methods of dense matching, images transfer and 3D scene reconstruction.
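The robust linear estimation of a fundamental matrix with LMedS can be sketched as follows. This sketch is ours, not the authors' implementation: it combines the normalized 8-point algorithm with LMedS sampling and a one-sided epipolar distance, and the sample count and inlier threshold are illustrative assumptions.

```python
import numpy as np

def hartley_normalize(x):
    """Translate points to their centroid and scale the mean distance
    to sqrt(2); returns normalized points and the 3x3 transform."""
    c = x[:, :2].mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(x[:, :2] - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return x @ T.T, T

def eight_point_F(x1, x2):
    """Linear 8-point estimate of F from homogeneous points (N, 3),
    solving x2^T F x1 = 0 in the least-squares sense."""
    n1, T1 = hartley_normalize(x1)
    n2, T2 = hartley_normalize(x2)
    A = np.stack([np.kron(p2, p1) for p1, p2 in zip(n1, n2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # null vector of A
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt     # enforce rank 2
    return T2.T @ F @ T1                        # undo the normalization

def epipolar_residuals(F, x1, x2):
    """Squared distance from each x2 to its epipolar line F @ x1."""
    l = x1 @ F.T
    return (np.sum(x2 * l, axis=1) ** 2) / (l[:, 0] ** 2 + l[:, 1] ** 2)

def lmeds_F(x1, x2, n_samples=200, thresh=2.5, rng=None):
    """Least Median of Squares: keep the 8-point estimate whose median
    squared residual is smallest, then flag matches off their line."""
    rng = rng or np.random.default_rng(0)
    best_F, best_med = None, np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(x1), 8, replace=False)
        F = eight_point_F(x1[idx], x2[idx])
        med = np.median(epipolar_residuals(F, x1, x2))
        if med < best_med:
            best_F, best_med = F, med
    inliers = epipolar_residuals(best_F, x1, x2) < thresh ** 2
    return best_F, inliers
```

Matches flagged as outliers are simply discarded before the next iteration; since the goal is only to reject inconsistent matches, such a rough linear estimate suffices, as the text notes.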

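The semi-dynamic triangulation step can be mimicked with SciPy's incremental Delaunay interface. This is only an analogy of the insertion scheme (SciPy wraps Qhull rather than the hierarchical Delaunay tree of Ref. [17]), and the point counts are arbitrary.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)

# Iteration 1: triangulate the first batch of matched points.
tri = Delaunay(rng.uniform(0, 100, (40, 2)), incremental=True)

# Iterations 2 and 3: insert the newly matched points into the
# existing triangulation instead of re-triangulating from scratch.
for _ in range(2):
    tri.add_points(rng.uniform(0, 100, (25, 2)))

tri.close()  # free Qhull resources once matching has converged
```

After each `add_points` call, `tri.simplices` reflects the updated triangulation, which plays the role of TMi in the matching loop.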
7. Conclusion

In this paper, we have defined a new method for matching stereoscopic, uncalibrated color images. First, we have presented new differential invariants specific to color information. We have shown that adding these invariants to Hilbert's invariants computed for each color channel gives sufficient information to consider only first order invariants. In addition, this description can easily be made invariant to affine transformations of intensities. Secondly, we have implemented an incremental technique for point matching, based on this differential characterization, which works robustly and rapidly. Indeed, the results of the last section show the relevance of our approach: the percentage of correct matches is very high (approximately 95%) and the recovered epipolar geometry very accurate, in spite of the differences between the images (differences in points of view and illumination) and the large number of points to be matched. Using this matching scheme, we are able to match a large number of points rapidly, and we are consequently able to implement efficient methods for dense depth map computation. In parallel, we are already working on matching images involving changes of scale, for instance when the camera moves nearer to the scene between images. A method derived from the one described above is producing promising results.

Acknowledgements

This work was supported by CCETT grant 96-ME-24.

References

[1] Z.D. Lan, R. Mohr, Robust matching by partial correlation, Technical Report RR-2643, INRIA, France, 1995.
[2] Q.T. Luong, Matrice fondamentale et calibration visuelle sur l'environnement: vers une plus grande autonomie des systèmes robotiques, PhD Thesis, Université de Paris-Sud, centre d'Orsay, France, 1992.
[3] P.P.N. Rao, D.H. Ballard, Object indexing using an iconic sparse distributed memory, Proceedings of the Fifth International Conference on Computer Vision, 1995, pp. 24–31.
[4] C. Schmid, Appariement d'images par invariants locaux de niveaux de gris, application à l'indexation d'une base d'objets, PhD Thesis, INPG, France, 1996.
[5] Z. Zhang, R. Deriche, O. Faugeras, Q.T. Luong, A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry, Technical Report RR-2273, INRIA Sophia-Antipolis, France, 1994.
[6] K. Nishihara, PRISM: a practical real-time imaging stereo matcher, Proceedings of the Third International Conference on Robot Vision and Sensory Controls, 1993.
[7] W.T. Freeman, E.H. Adelson, The design and use of steerable filters, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (9) (1991) 891–906.


[8] D. Hilbert, Theory of Algebraic Invariants, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 1890.
[9] J.J. Koenderink, A.J. van Doorn, Representation of local geometry in the visual system, Biological Cybernetics 55 (1987) 367–375.
[10] A.H. Salden, B.M. ter Haar Romeny, L.M.J. Florack, M.A. Viergever, J.J. Koenderink, A complete and irreducible set of local orthogonally invariant features of 2-dimensional images, Proceedings of the 11th International Conference on Pattern Recognition, 1992, pp. 180–184.
[11] L.M.J. Florack, B. ter Haar Romeny, J.J. Koenderink, M.A. Viergever, General intensity transformations and differential invariants, Journal of Mathematical Imaging and Vision 4 (1994) 171–187.
[12] P. Montesinos, S. Dattenny, Sub-pixel accuracy using recursive filtering, Proceedings of the 10th Scandinavian Conference on Image Analysis, vol. 1(10), 1997.
[13] G.D. Finlayson, M.S. Drew, B. Funt, Color constancy: generalized diagonal transforms suffice, Journal of the Optical Society of America A 11 (11) (1994) 3011–3019.
[14] C. Bauckhage, C. Schmid, Evaluation of keypoint detectors, Technical report, INRIA, 1996.
[15] C. Harris, M. Stephens, A combined corner and edge detector, Proceedings of the Fourth Alvey Vision Conference, 1988, pp. 147–151.
[16] S. Laveau, Géométrie d'un système de N caméras: théorie, estimation et applications, PhD Thesis, École Polytechnique, France, 1996.
[17] J.D. Boissonnat, M. Teillaud, A hierarchical representation of objects: the Delaunay tree, Proceedings of the Second ACM Symposium on Computational Geometry, Yorktown Heights, 1986.
[18] P.H.S. Torr, D.W. Murray, The development and comparison of robust methods for estimating the fundamental matrix, International Journal of Computer Vision 24 (3) (1997) 271–300.