Section 1: Basic Theory and Methods by Tat-Jun Chin

Outline

- Applications
- Parameter estimation
- Robust criteria
- Random sampling and heuristics
- Multiple structures

Trivia

Trivia (cont.)

Figure from [16].

Computer vision

Computer vision (cont.)

Computer vision (cont.)

Figure from Hartley and Zisserman, Multiple View Geometry.

Computer vision (cont.)

Second figure from Hannuna and Mirmehdi, Bristol University.

Outline

- Applications
- Parameter estimation
- Robust criteria
- Random sampling and heuristics
- Multiple structures

Parameter estimation

We are given a set of noisy data points $X$ in $D$-dimensional space, and we want to estimate the parameters $\theta$ of a model $M$ that best fits the data. Simplest example is line fitting, where $X = \{(x_i, y_i)\}_{i=1}^{N}$ and $\theta = [m\; c]^T$ consists of the slope and intercept.

Ordinary least squares

Standard approach to estimate $\theta$ from noisy data:
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} r_i(\theta)^2, \qquad r_i(\theta) = m x_i + c - y_i,$$
i.e., find the line that minimises the sum of squared residuals.

Assuming residuals are distributed normally, the OLS estimate is the maximum likelihood solution.

Ordinary least squares (cont.)

In matrix form,
$$\theta^* = \arg\min_{\theta} \|X\theta - y\|^2$$
where
$$X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}, \qquad
\theta = \begin{bmatrix} m \\ c \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}.$$

The solution is $\theta^* = (X^T X)^{-1} X^T y$. Extending to hyperplanes is trivial.
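As a minimal sketch of the above, the line-fitting case can be solved directly from the normal equations; here NumPy's least-squares routine is used, which is numerically safer than forming $(X^T X)^{-1}$ explicitly.

```python
import numpy as np

def fit_line_ols(x, y):
    """Fit y = m*x + c by ordinary least squares."""
    X = np.column_stack([x, np.ones_like(x)])   # design matrix with rows [x_i, 1]
    # Solves the same problem as theta = (X^T X)^{-1} X^T y, but via an SVD-based routine
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta                                 # [m, c]

# Example: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.3, size=x.size)
m, c = fit_line_ols(x, y)
```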

Implicit models

In many cases the model is defined as an implicit condition, e.g., a fundamental matrix
$$(\tilde{x}')^T F \tilde{x} = 0$$
where $\tilde{x} = [x\; y\; 1]^T$ and $\tilde{x}' = [x'\; y'\; 1]^T$.

Given data $X = \{(x_i, y_i), (x'_i, y'_i)\}_{i=1}^{N}$ we are interested in finding
$$F^* = \arg\min_{F} \sum_{i=1}^{N} r_i(F)^2,$$
where the user decides on the specific residual function, e.g.,

Algebraic error:
$$r_i(F) = (\tilde{x}'_i)^T F \tilde{x}_i$$

Symmetric transfer error:
$$r_i(F) = d(\tilde{x}'_i, F\tilde{x}_i) + d(\tilde{x}_i, F^T \tilde{x}'_i)$$

Usually no analytic solutions exist.

Implicit models (cont.)

To bring to bear the framework of OLS, we need to linearise and "dehomogenise" the implicit condition [9, 8]. First, multiply through the implicit condition:
$$\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix} f = 0$$
where $f = [F_{11}\; F_{12}\; \ldots\; F_{33}]^T$ is a vector with the 9 elements of $F$.

Choose one element of $f$ to fix at 1, and designate the corresponding measurement as the "dependent" variable. E.g., fix $F_{11} = 1$, then arrange
$$X = \begin{bmatrix}
x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x'_N y_N & x'_N & y'_N x_N & y'_N y_N & y'_N & x_N & y_N & 1
\end{bmatrix}, \quad
\theta = \begin{bmatrix} F_{12} \\ \vdots \\ F_{33} \end{bmatrix}, \quad
y = \begin{bmatrix} -x'_1 x_1 \\ \vdots \\ -x'_N x_N \end{bmatrix},$$
and solve $\theta^* = (X^T X)^{-1} X^T y$.
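A sketch of this dehomogenised linear estimate, assuming $F_{11} = 1$ as above. A practical eight-point method would also normalise the image coordinates and enforce the rank-2 constraint on $F$, both of which this sketch omits.

```python
import numpy as np

def fit_fundamental_dehomogenised(pts, pts_prime):
    """Linear estimate of F from matches (x_i <-> x'_i), fixing F11 = 1.

    pts, pts_prime: (N, 2) arrays of (x, y) and (x', y') coordinates.
    Returns a 3x3 matrix (no coordinate normalisation, no rank-2 enforcement).
    """
    x, y = pts[:, 0], pts[:, 1]
    xp, yp = pts_prime[:, 0], pts_prime[:, 1]
    # Columns correspond to [F12, F13, F21, F22, F23, F31, F32, F33]
    X = np.column_stack([xp * y, xp, yp * x, yp * y, yp, x, y, np.ones_like(x)])
    b = -xp * x                                   # dependent variable after fixing F11 = 1
    theta, *_ = np.linalg.lstsq(X, b, rcond=None)
    return np.concatenate([[1.0], theta]).reshape(3, 3)
```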

Outline

- Applications
- Parameter estimation
- Robust criteria
- Random sampling and heuristics
- Multiple structures

The need for robustness Least squares has a breakdown point of 0 — the existence of a single outlier may bias the estimate arbitrarily.

The breakdown point of an estimator is the proportion of outlying observations it can handle before giving arbitrarily large results.

Robust criteria

The least squares objective gives undue influence to the outlier residuals. The key to robustness is to limit their influence, e.g.,

Least absolute deviation:
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} |r_i(\theta)|$$
"More robust" than least squares, but still breaks down with significant outliers. May not be stable since the derivative of $|r_i(\theta)|$ is discontinuous at $r_i(\theta) = 0$.

Figure from [19].

Robust criteria (cont.)

More general, non-convex norms, e.g., Tukey's biweight:
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} \rho(r_i(\theta)), \qquad
\rho(r) = \begin{cases}
(\sigma^2/6)\left(1 - \left[1 - (r/\sigma)^2\right]^3\right) & \text{if } |r| \le \sigma \\
\sigma^2/6 & \text{if } |r| > \sigma
\end{cases}$$

Figure from [19].
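Such non-convex M-estimator objectives are commonly minimised by iteratively reweighted least squares (IRLS). The following is a minimal sketch for the line-fitting case; the fixed tuning constant $\sigma$, the OLS initialisation, and the absence of scale estimation or convergence checks are assumptions of the sketch, not part of the slides.

```python
import numpy as np

def fit_line_tukey_irls(x, y, sigma, iters=20):
    """Line fit with Tukey's biweight via IRLS (sketch only)."""
    X = np.column_stack([x, np.ones_like(x)])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS initialisation
    for _ in range(iters):
        r = X @ theta - y
        # Biweight weights: (1 - (r/sigma)^2)^2 inside the threshold, 0 outside
        w = np.where(np.abs(r) <= sigma, (1 - (r / sigma) ** 2) ** 2, 0.0)
        Xw = X * w[:, None]                               # row-weighted design matrix
        theta = np.linalg.solve(Xw.T @ X, Xw.T @ y)       # weighted normal equations
    return theta
```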

Robust criteria (cont.)

Maximum consensus:
$$\theta^* = \arg\max_{\theta} \sum_{i=1}^{N} I\big(|r_i(\theta)| \le \sigma\big)$$
where $I(\cdot)$ is the indicator function that returns 1 if the input proposition is true and 0 otherwise.

Figure adapted from [7].

Robust criteria (cont.)

Least median squares (LMedS):
$$\theta^* = \arg\min_{\theta} \operatorname{med}_i\, r_i(\theta)^2 = \arg\min_{\theta}\, r_{[N/2]}(\theta)^2$$
where $r_{[k]}(\theta)^2$ is the $k$-th smallest value among the squared residuals $r_i(\theta)^2$. LMedS has a breakdown point of 0.5.

Least $k$-th order (LKO):
$$\theta^* = \arg\min_{\theta}\, r_{[k]}(\theta)^2$$
where $k$ is a user-set parameter. LKO has a breakdown point of $\min(k/N, 1 - k/N)$.

Least trimmed squares (LTS):
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{k} r_{[i]}(\theta)^2,$$
i.e., minimise the sum of the $k$ smallest squared residuals.
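For concreteness, here is a small sketch (not from the slides) that scores a candidate line $\theta = [m\; c]^T$ under the criteria above; the threshold $\sigma$ and the coverage $k$ are user-set, as in the text.

```python
import numpy as np

def residuals(theta, x, y):
    """Signed residuals r_i(theta) = m*x_i + c - y_i of a candidate line."""
    m, c = theta
    return m * x + c - y

def consensus(theta, x, y, sigma):
    """Maximum-consensus objective: number of points within threshold sigma."""
    return int(np.sum(np.abs(residuals(theta, x, y)) <= sigma))

def lmeds(theta, x, y):
    """LMedS objective: median of the squared residuals."""
    return float(np.median(residuals(theta, x, y) ** 2))

def lts(theta, x, y, k):
    """LTS objective: sum of the k smallest squared residuals."""
    r2 = np.sort(residuals(theta, x, y) ** 2)
    return float(np.sum(r2[:k]))
```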

Outline

- Applications
- Parameter estimation
- Robust criteria
- Random sampling and heuristics
- Multiple structures

Trivia

Approx. 50 years before least squares, R. J. Boscovich was trying to determine the meridian arc near Rome from 5 measurements.

He solved for the arc using all $\binom{5}{2} = 10$ pairings of the data, but rejected the results from 2 pairings which he thought were odd. The remaining estimates were then averaged for his final result [17].

The joy of sampling...

Random sampling (RANSAC, PROGRESS, Monte Carlo, ...) is a very useful "general purpose" tool for robust estimation:

1. Sample a $p$-tuple, where $p$ is the minimum number of points required to define $\theta$, e.g., $p = 2$ for lines, $p = 8$ for the fundamental matrix.
2. Instantiate $\theta$ from the $p$-tuple and evaluate the robust criterion.
3. Repeat $T$ times and output the best $\theta$.

Figure adapted from [1].

The joy of sampling... (cont.)

Has complexity $O(TN)$, and total run time depends on:

1. Does a fast solver for the minimal cases exist? E.g., matrix inversion or SVD using the $p$-tuple.
2. How to determine $T$? Usually, assume that the goal is to "hit" one $p$-tuple composed entirely of inliers. If $\eta$ is the proportion of inliers, then
$$T = \frac{\log(1 - Y)}{\log(1 - \eta^p)}$$
to ensure with prob. $Y$ that one clean $p$-tuple is retrieved.
3. Adjust $T$ on-the-fly? In practice, true $\eta$ is unknown. Revise $T$ based on the number of inliers $C$ of the best model found thus far:
$$T = \frac{\log(1 - Y)}{\log(1 - (C/N)^p)}$$
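A minimal sketch of the scheme above for line fitting ($p = 2$), using maximum consensus as the robust criterion and the adaptive $T$ from step 3. The inlier threshold sigma and the iteration cap are assumptions of the sketch, not prescribed by the slides.

```python
import numpy as np

def ransac_line(x, y, sigma, Y=0.99, max_iters=10000, rng=None):
    """Minimal RANSAC-style line fit (p = 2) with the adaptive stopping number T."""
    rng = np.random.default_rng() if rng is None else rng
    N, p = x.size, 2
    best_theta, best_C = None, 0
    T, t = max_iters, 0
    while t < min(T, max_iters):
        i, j = rng.choice(N, size=p, replace=False)        # 1. sample a p-tuple
        if x[i] == x[j]:                                    #    skip degenerate (vertical) pairs
            t += 1
            continue
        m = (y[j] - y[i]) / (x[j] - x[i])                   # 2. instantiate theta from the p-tuple
        c = y[i] - m * x[i]
        C = int(np.sum(np.abs(m * x + c - y) <= sigma))     #    evaluate the consensus size
        if C > best_C:
            best_theta, best_C = np.array([m, c]), C
            eta = min(C / N, 1 - 1e-9)                      # 3. revise T from the inlier count
            T = np.log(1 - Y) / np.log(1 - eta ** p)
        t += 1
    return best_theta, best_C
```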

Heuristics to speed up random sampling

$T$ is very large if $\eta$ is low and $p$ is moderately high. Numerous methods have been proposed.

Proximity sampling: assume inliers form a cluster (relative to outliers) and sample points that are close together [12, 10, 13].

Figure from [10].

Heuristics to speed up random sampling (cont.)

Use prior inlier probabilities: in two-view geometry estimation, feature matching scores (e.g., SIFT, NCC) can be used as proxies for the inlier probabilities [18, 3].

Figure from [18].
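As a rough illustration only (this is not PROSAC [3] or Guided-MLESAC [18], whose sampling schedules are more elaborate), a $p$-tuple can be biased towards high-scoring matches by treating the matching scores as unnormalised sampling weights.

```python
import numpy as np

def sample_ptuple_by_score(scores, p, rng=None):
    """Draw a p-tuple with probability proportional to feature-matching scores (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                                   # treat scores as sampling weights
    return rng.choice(w.size, size=p, replace=False, p=w)
```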

Heuristics to speed up random sampling (cont.)

Quick "bail-out" from the verification of an unpromising $p$-tuple, e.g., compute residuals of only a subset of points [11], or apply the sequential probability ratio test [4].

Image credit to D. Nister.

Score $p$-tuples in a breadth-first manner [14, 15].

Image credit to D. Nister.

... many other heuristics. See talk by J. Matas in CVPR 2011 tutorial on "Tools and Methods for Image Registration".

Avoiding degeneracies

Iteration count $T$ is predicted based on the probability of hitting a single clean $p$-tuple. Not all clean $p$-tuples are good...

Image from [7].

More generally, p-tuples with small “span” yield degenerate models.

Right image from [6].

Avoiding degeneracies (cont.)

$T$ is just the lower bound on the number of iterations required.

Introduce an inner loop of random sampling on the inliers of the current model to refine the estimate [5].

Detect and recover from degenerate estimates, e.g., use the plane-and-parallax method for fundamental matrices [6].

Outline

- Applications
- Parameter estimation
- Robust criteria
- Random sampling and heuristics
- Multiple structures

Multiple structures

What difference does it make?

1. The effective outlier rate faced by a particular structure is contributed by gross outliers and by the inliers of the other structures. Here, there are 113 outliers and, respectively, 69, 68 and 29 inliers, so the inlier ratio of the largest structure is only ≈ 25%. For $p = 8$, we need to draw $T \approx 300{,}000$ $p$-tuples to have a $Y = 99\%$ chance of hitting one clean $p$-tuple (see the worked calculation below).

2. A clean $p$-tuple needs to contain inliers from the same structure; feature matches with high matching scores may come from different structures!
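Plugging the numbers into the stopping criterion from the random-sampling section:
$$T = \frac{\log(1 - 0.99)}{\log(1 - 0.25^{8})} = \frac{\log(0.01)}{\log(1 - 1.5 \times 10^{-5})} \approx 3 \times 10^{5}.$$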

What difference does it make? (cont.)

New method: Multi-GS [2]

Let $\{\theta_1, \ldots, \theta_M\}$ be a set of parameters fitted on $p$-tuples generated thus far. Let the list of residuals (arranged in a vector) of the $i$-th point be $\mathbf{r}_i = [r_i(\theta_1)\; r_i(\theta_2)\; \ldots\; r_i(\theta_M)]$. For each $i$, sort $\mathbf{r}_i$ and let the resulting index of sorting be $\mathbf{a}_i = [a_{i1}\; a_{i2}\; \ldots\; a_{iM}]$.

Given points $i$ and $j$, we compute their preference correlation as
$$f_{ij} = \frac{|\mathbf{a}_i(1{:}K) \cap \mathbf{a}_j(1{:}K)|}{K},$$
i.e., the normalised intersection count among the top-$K$ items.

New method: Multi-GS [2] (cont.)

Inliers from the same structure tend to have higher preference correlation.

New method: Multi-GS [2] (cont.)

We can draw $p$-tuples according to preference correlation, i.e., sample a point into a $p$-tuple based on its correlation with the points already in the $p$-tuple.
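A sketch of the two ingredients above, assuming a residual matrix R holding the residuals of the N points against the M hypotheses generated so far. The window K is user-set, and the product rule used to weight candidate points against the current p-tuple members is an assumption in the spirit of [2], not a verbatim transcription of the Multi-GS algorithm (which, e.g., updates these quantities incrementally).

```python
import numpy as np

def preference_correlation(R, K):
    """Pairwise preference correlation f_ij from a residual matrix R (N points x M hypotheses)."""
    N, M = R.shape
    order = np.argsort(np.abs(R), axis=1)        # a_i: hypotheses sorted by preference
    topK = order[:, :K]
    f = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            f[i, j] = np.intersect1d(topK[i], topK[j]).size / K
    return f

def sample_ptuple_by_correlation(f, seed_idx, p, rng=None):
    """Grow a p-tuple: draw each next point with probability proportional to its
    preference correlation with the points already selected (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    chosen = [seed_idx]
    while len(chosen) < p:
        w = f[chosen].prod(axis=0)               # joint affinity with current members
        w[chosen] = 0.0                          # never pick the same point twice
        if w.sum() == 0:                         # fall back to uniform if all weights vanish
            w = np.ones(f.shape[0])
            w[chosen] = 0.0
        w = w / w.sum()
        chosen.append(int(rng.choice(f.shape[0], p=w)))
    return chosen
```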

Comparisons

Bibliography

[1] A. Bab-Hadiashar and R. Hoseinnezhad. Bridging parameter and data spaces for fast robust estimation in computer vision. In Digital Image Computing: Techniques and Applications (DICTA), 2008.
[2] T.-J. Chin, J. Yu, and D. Suter. Accelerated hypothesis generation for multi-structure data via preference analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 34(4):625–638, 2012.
[3] O. Chum and J. Matas. Matching with PROSAC – progressive sample consensus. In Computer Vision and Pattern Recognition (CVPR), 2005.
[4] O. Chum and J. Matas. Optimal randomized RANSAC. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(8):1472–1482, 2008.
[5] O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Deutsche Arbeitsgemeinschaft für Mustererkennung (DAGM), 2003.
[6] O. Chum, T. Werner, and J. Matas. Two-view geometry estimation unaffected by a dominant plane. In Computer Vision and Pattern Recognition (CVPR), 2005.
[7] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.
[8] F. Kahl and R. Hartley. Multiple-view geometry under the $L_\infty$-norm. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(9):1603–1617, 2008.
[9] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstruction problems. In International Conference on Computer Vision (ICCV), 2005.
[10] Y. Kanazawa and H. Kawakami. Detection of planar regions with uncalibrated stereo using distributions of feature points. In British Machine Vision Conference (BMVC), 2004.
[11] J. Matas and O. Chum. Randomized RANSAC with $T_{d,d}$ test. Image and Vision Computing, 2004.
[12] D. R. Myatt, P. H. S. Torr, S. J. Nasuto, J. M. Bishop, and R. Craddock. NAPSAC: high noise, high dimensional robust estimation - it's in the bag. In British Machine Vision Conference (BMVC), 2002.
[13] K. Ni, H. Jin, and F. Dellaert. GroupSAC: Efficient consensus in the presence of groupings. In International Conference on Computer Vision (ICCV), 2009.
[14] D. Nister. Preemptive RANSAC for live structure and motion estimation. In International Conference on Computer Vision (ICCV), 2003.
[15] R. Raguram, J.-M. Frahm, and M. Pollefeys. A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. In European Conference on Computer Vision (ECCV), 2008.
[16] S. M. Stigler. Gauss and the invention of least squares. The Annals of Statistics, 9(3):465–474, 1981.
[17] S. M. Stigler. The History of Statistics: The Measurement of Uncertainty before 1900. The Belknap Press of Harvard University Press, 1986.
[18] B. J. Tordoff and D. W. Murray. Guided-MLESAC: Faster image transform estimation by using matching priors. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(10):1523–1535, 2005.
[19] Z. Zhang. Parameter estimation techniques: a tutorial with application to conic fitting. Image and Vision Computing, 15(1):59–76, 1997.