Solving for Relative Pose with a Partially Known Rotation is a Quadratic Eigenvalue Problem

Chris Sweeney, University of California, Santa Barbara
John Flynn, Google, Inc.
Matthew Turk, University of California, Santa Barbara

Abstract

We propose a novel formulation of minimal case solutions for determining the relative pose of perspective and generalized cameras given a partially known rotation, namely, a known axis of rotation. An axis of rotation may be easily obtained by detecting vertical vanishing points with computer vision techniques, or with the aid of sensor measurements from a smartphone. Given a known axis of rotation, our algorithms solve for the angle of rotation around the known axis along with the unknown translation. We formulate these relative pose problems as Quadratic Eigenvalue Problems, which are very simple to construct. We run several experiments on synthetic and real data to compare our methods to the current state-of-the-art algorithms. Our methods provide several advantages over alternative methods, including efficiency and accuracy, particularly in the presence of image and sensor noise, as is often the case for mobile devices.

Figure 1: The minimal problems of determining (a) the relative pose for two perspective cameras and (b) the relative pose for two generalized cameras. Generalized cameras contain image rays that do not require a common ray origin. Our algorithms utilize partial knowledge of the rotation to simplify the pose estimation and require only 3 and 4 points instead of 5 and 6 points, respectively.

1. Introduction

The problem of determining a camera's position and orientation is a major area of focus in computer vision [7, 18, 19]. Robust solutions to pose problems are of particular interest to the SLAM and Structure from Motion communities, where accuracy and efficiency are especially important. This paper focuses on solving pose problems when a partial rotation is known. Specifically, we focus on the case when the axis of rotation is known, which reduces the degrees of freedom in the pose problems by two. This leaves us to solve for four degrees of freedom: an unknown angle of rotation about the known rotation axis and the unknown translation. The angle-axis rotation parameterization is equivalent to other parameterizations such as Euler angles (all rotations may be represented by both parameterizations); however, an axis of rotation is more flexible to use because it allows for any type of partial rotation to be used as a prior instead of only pitch and roll rotations. A partial rotation in the form of a rotation axis can be accurately determined without knowing the full rotation through a variety of methods [3, 5, 9, 12, 14]. Vertical vanishing points can be detected using computer vision techniques [16] and aligned to the "up" direction (e.g., [0 1 0]^T). After aligning the "up" direction, only the rotation angle about the "up" direction and the translation remain unknown. In mobile robotics, imaging from a fixed tripod, or acquiring panoramas from cars (e.g., Google Street View), the cameras are oriented vertically and it is common to assume that motions are planar. In these cases, the "up" direction corresponds to the vertical image axis. Alternatively, modern

smartphone devices are equipped with inertial measurement unit (IMU) sensors capable of providing partial rotation information with high accuracy (smartphone sensors have an operating noise of about 0.5 degrees, while the noise of high-accuracy sensors is around 0.06 degrees [9]). It is not typically possible to recover a full rotation estimate from IMU sensors because the compass sensor is extremely unreliable [23], so only some of the sensor measurements may be used confidently. However, most phones provide a direct measurement of the gravity direction relative to the phone. This measurement can be used to align cameras with the "down" direction (similarly to the alignment of vanishing points to the "up" direction), and the gravity vector can thus be used as the known axis of rotation (cf. Fig. 2). Gyroscope measurements can be used in a similar manner to recover a known partial rotation.

The ease with which partial rotations can be extracted makes efficient and accurate pose algorithms that utilize all sensor measurements especially important in computer vision, SLAM, and robotics. Methods that utilize partial knowledge of the rotation require fewer correspondences, resulting in greater efficiency within a RANSAC scheme [2]. Finding an outlier-free sample in RANSAC depends on the number of correspondences needed to form a minimal sample, and reducing this number increases the convergence rate of RANSAC exponentially. For an inlier probability of 0.5, for instance, reducing the number of required correspondences from 5 to 3 provides a 2-4x speed increase for convergence (illustrated in the sketch below). Much work has been done to increase the efficiency and quality of RANSAC-like methods [1, 28], and further speed increases can likely be observed with a more intelligent RANSAC scheme. Reducing the number of required correspondences is a key advantage of our approach.

In this paper we propose two new methods for estimating relative pose when an axis of rotation is known: a 3-point algorithm for determining the relative pose between two calibrated cameras, and a 4-point algorithm for determining the relative pose between two generalized cameras, i.e., cameras whose image rays need not share a common origin. We formulate these relative pose problems as Quadratic Eigenvalue Problems (QEPs), which leads to increased efficiency compared to state-of-the-art algorithms. Further, our formulations are simple and produce constraints that solve directly for the relative rotation and translation instead of, for instance, essential matrices. Synthetic experiments show that our methods lead to increased robustness to image and IMU noise compared to alternative methods. Additionally, we demonstrate that our algorithms are well-suited for mobile devices such as smartphones by conducting a localization experiment with real data from an iPhone 4. We provide efficient C++ implementations of all proposed methods as open-source software within the Theia Multiview Geometry Library [26] (http://cs.ucsb.edu/~cmsweeney/theia).
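To make the convergence claim above concrete, here is a minimal sketch (function names are ours) of the standard RANSAC iteration count N = log(1 - p) / log(1 - w^s), where p is the desired confidence, w the inlier ratio, and s the minimal sample size:

```cpp
#include <cmath>
#include <cstdio>

// Sketch: number of RANSAC iterations needed to draw at least one all-inlier
// minimal sample with confidence p, given inlier ratio w and sample size s.
int RansacIterations(double p, double w, int s) {
  return static_cast<int>(std::ceil(std::log(1.0 - p) /
                                    std::log(1.0 - std::pow(w, s))));
}

int main() {
  // With w = 0.5 and p = 0.99: a 5-point solver needs ~145 iterations while a
  // 3-point solver needs ~35 (Table 2 reports 34; rounding conventions vary).
  std::printf("5-pt: %d, 3-pt: %d\n",
              RansacIterations(0.99, 0.5, 5), RansacIterations(0.99, 0.5, 3));
  return 0;
}
```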

Figure 2: Smartphone devices are commonly equipped with sensors that can detect the direction of gravity (green line). With the knowledge that gravity always corresponds to the down direction, we can rotate each image so that the vertical direction of the image is aligned to the gravity vector. In this way, the gravity direction may be used as the known rotation axis, and all that remains is to solve for the angle of rotation around this axis and the unknown translation.

2. Related Work

Our work builds on much previous work in the domain of pose estimation aided by sensor measurements. Recent Structure from Motion work focuses on first calculating global rotations for each camera, then solving for the translations [22]. The robotics and computer vision communities have made extensive use of non-visual sensors to aid pose estimation and motion tracking [3, 6, 12, 15]. The proliferation of smartphone devices allows for convenient access to cameras equipped with IMUs. The limited computational power of these devices (and other mobile devices, such as micro air vehicles) stresses the importance of using as much sensory information as is available to limit the computational resources needed for visual tracking.

When an axis of rotation is known, the number of correspondences needed to determine the essential matrix relating two cameras is reduced from the 5 of the standard 5-point algorithm [18] to 3. Our 3-point algorithm is closely related to that of Fraundorfer et al. [3], which creates a simplified essential matrix composed of three angle-axis rotation matrices corresponding to measurements of yaw, pitch, and roll. The simplified essential matrix is used to solve for the unknown yaw angle and translation (given relative pitch and roll from a gyroscope measurement) by considering the determinant and trace constraints, yielding 4 candidate essential matrices (each of which yields 4 possible rotations and translations). Our method directly solves for up to 4 relative rotations and translations and yields more stable results in the presence of image noise. More complex formulations also solve for the essential matrix given a

partially known rotation [5, 17] but they are slower and require constructing an action matrix that is 4-10 times larger than the matrices used in our method. Generalized cameras were first introduced by Grossberg and Nayar [4] and have since become a common camera model for multi-camera and panoramic-camera setups [6, 20]. In particular, generalized cameras have become the standard for multi-camera systems on moving vehicles [12]. Generalized camera models can also be used for localization tasks by using nearest neighbor correspondences from multiple posed images (thus providing multiple image rays with different origins) to determine the query image’s camera pose. Generalized cameras can produce highly stable motion estimates because of their potentially large visual coverage. Lee et al. [13] propose a solution to the 4-point relative pose problem for generalized cameras by using Euler angles to rewrite the generalized epipolar constraint [20] in terms of the one unknown Euler angle and the three unknown translation parameters. By using the hidden variable resultant technique, the yaw angle can be determined from the roots of an eight-degree univariate polynomial and the translation can be easily extracted. Lee et al. [13] also propose a linear 8-point algorithm; however, it is quite sensitive to image and IMU noise. Kukelova et al. [8] describe how to formulate various relative pose problems as Polynomial Eigenvalue Problems (PEPs), which are a generalization of QEPs. This method is later generalized to provide guidelines for transforming other problems into PEPs that can be solved efficiently [10]. These solutions, however, assume no partial information about the relative pose is available and do not apply to generalized camera models.

3. Estimating Relative Pose with a Known Rotation Axis

In this section we formulate our solutions to the relative pose problem with a known rotation axis. If the transformation from the coordinate system of image 1 to the coordinate system of image 2 is given by a rotation R and a translation t, then the standard epipolar constraint relating a normalized point p_1 in image 1 to the corresponding normalized image point p_2 in image 2 is:

p_2 \cdot (t \times R p_1) = 0,    (1)

which can be rewritten as

-(p_2 \times R p_1) \cdot t = 0.    (2)

Following Stewénius et al. [25], we use quaternions to represent the rotation. A quaternion is a 4-vector [v_1 v_2 v_3 s]^T, which can be written as [v^T s]^T. When an axis of rotation is known, we set v to the known axis (normalized to unit length), and a scaled version of the rotation can be written as:

R \sim 2(vv^T + s[v]_\times) + (s^2 - 1)I,    (3)

where [v]_\times is the skew-symmetric cross-product matrix. Subsequent equations are homogeneous in R, so the norm of R does not affect the results of our algorithms. This parameterization of the rotation allows for very simple formulations of the 3 and 4-point relative pose problems.
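As an illustration, a minimal sketch (using Eigen; the function name is ours) of this scaled-rotation construction:

```cpp
#include <Eigen/Dense>

// Sketch: build the scaled rotation of Eq. (3) from a unit axis v and the
// quaternion scalar s. The result equals (1 + s^2) times a true rotation,
// which is harmless here because the constraints are homogeneous in R.
Eigen::Matrix3d ScaledRotation(const Eigen::Vector3d& v, double s) {
  Eigen::Matrix3d v_cross;  // skew-symmetric cross-product matrix [v]x
  v_cross <<     0, -v.z(),  v.y(),
             v.z(),      0, -v.x(),
            -v.y(),  v.x(),      0;
  return 2.0 * (v * v.transpose() + s * v_cross) +
         (s * s - 1.0) * Eigen::Matrix3d::Identity();
}
```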

3.1. A 3-Point algorithm for the essential matrix

By substituting the parameterization of the rotation in Eq. (3) into the epipolar constraint (2), we have

-(p_2 \times (2(vv^T + s[v]_\times) + (s^2 - 1)I) p_1) \cdot t = 0.    (4)

By expanding Eq. (4) and collecting coefficient vectors of s, the epipolar constraint can be rewritten as

0 = ((p_2 \times p_1)s^2 + 2(p_2 \times (v \times p_1))s + 2(v \cdot p_1)(p_2 \times v) - p_2 \times p_1) \cdot t.    (5)

Each correspondence gives a constraint in this form. We can stack the minimal three correspondences to obtain a 3 x 3 constraint matrix that is a function of s:

(s^2 M + sC + K) \, t = 0.    (6)
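A minimal sketch (Eigen; helper names are ours) of how each correspondence contributes one row to M, C, and K following Eq. (5):

```cpp
#include <Eigen/Dense>

// Sketch: stack three correspondences (p1[i], p2[i]) into the 3x3 coefficient
// matrices of the QEP (s^2 M + s C + K) t = 0 of Eq. (6), using the expansion
// in Eq. (5). v is the known (unit-length) rotation axis.
void BuildQepMatrices(const Eigen::Vector3d p1[3], const Eigen::Vector3d p2[3],
                      const Eigen::Vector3d& v, Eigen::Matrix3d* M,
                      Eigen::Matrix3d* C, Eigen::Matrix3d* K) {
  for (int i = 0; i < 3; ++i) {
    const Eigen::Vector3d p2_x_p1 = p2[i].cross(p1[i]);
    M->row(i) = p2_x_p1.transpose();                            // s^2 term
    C->row(i) = (2.0 * p2[i].cross(v.cross(p1[i]))).transpose();  // s term
    K->row(i) = (2.0 * v.dot(p1[i]) * p2[i].cross(v) -            // constant
                 p2_x_p1).transpose();
  }
}
```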

This is a Quadratic Eigenvalue Problem, a well-studied problem in linear algebra [27]. The QEP of Eq. (6) can be solved by first converting it to a Generalized Eigenvalue Problem of the form

\begin{bmatrix} C & K \\ -I & 0 \end{bmatrix} z = s \begin{bmatrix} -M & 0 \\ 0 & -I \end{bmatrix} z,    (7)

where z = [s t^T \; t^T]^T is the eigenvector and s is the eigenvalue. This can be converted to a standard eigenvalue problem of the form Av = \lambda v by multiplying the left-hand side of Eq. (7) with the inverse of the right-hand matrix. In our case, the inverse is particularly simple:

\begin{bmatrix} -M & 0 \\ 0 & -I \end{bmatrix}^{-1} = \begin{bmatrix} -M^{-1} & 0 \\ 0 & -I \end{bmatrix}.

Inversion of a small matrix (3 x 3 in this case, or 4 x 4 for the 4-point algorithm described in Section 3.2) is generally very efficient, and we have not found it to be prohibitive in our implementation. The Generalized Eigenvalue Problem of Eq. (7) is thus reduced to a standard eigenvalue problem,

\begin{bmatrix} -M^{-1}C & -M^{-1}K \\ I & 0 \end{bmatrix} z = sz,

which can be solved with standard methods. This produces 6 eigenvalues corresponding to solutions for s; two of these are always the spurious complex pair \pm i (see Section 3.3), leaving up to 4 real solutions. This can be verified using symbolic mathematics software such as Maple. These solutions can be inserted into Eq. (3) to determine the rotation, and the corresponding eigenvector z contains the translation vector as its last three entries. The translation vectors are only known up to scale. From the candidate rotations and translations, the essential matrix can be recovered: E = [t]_\times R.
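A minimal sketch (Eigen; names are ours) of this linearization and eigendecomposition; each returned pair (s, t) yields a rotation via Eq. (3) and a translation up to scale:

```cpp
#include <cmath>
#include <utility>
#include <vector>
#include <Eigen/Dense>

// Sketch: solve (s^2 M + s C + K) t = 0 by linearizing the QEP into the 6x6
// standard eigenvalue problem of Section 3.1 and keeping the (near-)real
// eigenvalues. t is read, up to scale, from the last three eigenvector entries.
std::vector<std::pair<double, Eigen::Vector3d>> SolveQep(
    const Eigen::Matrix3d& M, const Eigen::Matrix3d& C,
    const Eigen::Matrix3d& K) {
  Eigen::Matrix<double, 6, 6> A = Eigen::Matrix<double, 6, 6>::Zero();
  const Eigen::Matrix3d M_inv = M.inverse();
  A.block<3, 3>(0, 0) = -M_inv * C;
  A.block<3, 3>(0, 3) = -M_inv * K;
  A.block<3, 3>(3, 0) = Eigen::Matrix3d::Identity();

  Eigen::EigenSolver<Eigen::Matrix<double, 6, 6>> eig(A);
  std::vector<std::pair<double, Eigen::Vector3d>> solutions;
  for (int i = 0; i < 6; ++i) {
    // Discard complex eigenvalues (including the spurious pair +/- i).
    if (std::abs(eig.eigenvalues()[i].imag()) > 1e-9) continue;
    const double s = eig.eigenvalues()[i].real();
    const Eigen::Vector3d t = eig.eigenvectors().col(i).tail<3>().real();
    solutions.emplace_back(s, t);
  }
  return solutions;
}
```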

3.2. A 4-Point algorithm for the generalized essential matrix

The generalized epipolar constraint between two images maps a set of rays in one coordinate system to a set of rays in a second coordinate system [20]. We represent image rays in Plücker coordinates:

L = \begin{bmatrix} p \\ q \end{bmatrix},    (8)

where p and q are 3-vectors formed from the image ray origins and unit directions. For a thorough review of Plücker coordinates, we refer the reader to [20] or [21]. By using Plücker coordinates, we are able to derive a set of equations for the epipolar constraint similar to the 3-point algorithm. For two corresponding image points in Plücker coordinates L_1 and L_2, the generalized epipolar constraint is:

p_2 \cdot R q_1 + p_1 \cdot R^{-1} q_2 - (p_2 \times R p_1) \cdot t = 0,    (9)

which can be written in terms of the dot product of two 4-vectors:

(-p_2 \times R p_1, \; p_2 \cdot R q_1 + p_1 \cdot R^{-1} q_2) \cdot \hat{t} = 0,    (10)

where \hat{t} is the 4-vector [t^T \; 1]^T. In the same manner as the 3-point pose algorithm, R can be expanded in terms of s and the known rotation axis v, and the constraint can be written as a quadratic in s. Using the minimal four correspondences, a 4 x 4 constraint matrix is created:

(s^2 M + sC + K) \, \hat{t} = 0.    (11)

This QEP can be solved in the same way as the 3-point algorithm. Unlike the QEP for the 3-point algorithm, the scale of the translation can be recovered by dividing the extracted 4-vector by its homogeneous component w. 8 eigenvalues can be extracted, of which up to 6 are real, leading to up to 6 candidate solutions for the rotation and translation. The generalized essential matrix from Pless [20] is then:

G = \begin{bmatrix} [t]_\times R & R \\ R & 0 \end{bmatrix}.    (12)
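A minimal sketch (names ours) of recovering the metric translation from a 4-point eigenvector, assuming the same linearization as the 3-point case so that z = [s \hat{t}^T \; \hat{t}^T]^T is 8-dimensional:

```cpp
#include <Eigen/Dense>

// Sketch: the last four entries of an 8-vector eigenvector hold t_hat = [t^T w]^T.
// Dividing by the homogeneous component w recovers the metric translation,
// unlike the 3-point case where t is defined only up to scale.
Eigen::Vector3d ExtractTranslation(const Eigen::Matrix<double, 8, 1>& z) {
  const Eigen::Vector4d t_hat = z.tail<4>();
  return t_hat.head<3>() / t_hat(3);
}
```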

3.3. An efficient solution to the QEP

In order to produce non-trivial solutions, the constraint matrices of Eqs. (6) and (11) must be singular, and thus their determinant must be zero:

\det(s^2 M + sC + K) = 0.    (13)

Expanding this constraint for the 3-point algorithm gives a six-degree polynomial in s which, when created with the coefficients from Eq. (5), is always divisible by s^2 + 1, leading to a quartic polynomial that can be solved in closed form. For each real root found, the constraint matrix can be created according to Eq. (6) and the corresponding null vector t can be extracted using standard methods. For the 4-point algorithm, the determinant from Eq. (11) results in an eight-degree polynomial in s, the unknown rotation angle. Expanding the singularity constraint yields an equation that is divisible by s^2 + 1, leading to a six-degree polynomial with a maximum of 6 solutions. This solution is similar to the method of Lee et al. [13], which also reduces the problem to an eight-degree polynomial. However, they parameterize the univariate polynomial as a function of the tangent of an unknown Euler angle, which is not as robust to image and IMU noise, as we will demonstrate in Section 4.
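A minimal sketch (names ours; it assumes the degree-6 coefficients of Eq. (13) have already been expanded, stored lowest-degree first) of deflating by s^2 + 1 and solving the resulting quartic via its companion matrix:

```cpp
#include <array>
#include <cmath>
#include <vector>
#include <Eigen/Dense>

// Sketch: given coefficients a[0..6] of the degree-6 polynomial of Eq. (13)
// (a[k] multiplies s^k), divide out the factor s^2 + 1 to obtain a quartic
// b[0..4], then read its real roots from the 4x4 companion matrix.
std::vector<double> SolveDeflatedDeterminant(const std::array<double, 7>& a) {
  // Exact quotient by (s^2 + 1); the remainder vanishes by construction.
  std::array<double, 5> b;
  b[4] = a[6];
  b[3] = a[5];
  b[2] = a[4] - b[4];
  b[1] = a[3] - b[3];
  b[0] = a[2] - b[2];

  // Companion matrix of the monic quartic b(s) / b[4].
  Eigen::Matrix4d companion = Eigen::Matrix4d::Zero();
  companion.block<3, 3>(1, 0) = Eigen::Matrix3d::Identity();
  for (int k = 0; k < 4; ++k) companion(k, 3) = -b[k] / b[4];

  const Eigen::Vector4cd eigenvalues = companion.eigenvalues();
  std::vector<double> roots;
  for (int k = 0; k < 4; ++k) {
    if (std::abs(eigenvalues[k].imag()) < 1e-9)
      roots.push_back(eigenvalues[k].real());
  }
  return roots;
}
```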

4. Experiments

We conducted several experiments using synthetic and real data to measure the performance of our algorithms. We first tested the numerical stability of our solutions over 10^5 noise-free trials. In each trial, scenes were generated using 3D points randomly distributed in a cube. The points were projected by cameras with random feasible orientations and positions to establish correspondences. We measured the angular rotation error and the translation error for our 3 and 4-point algorithms as well as for the 3-point algorithm of [3] and the 4-point algorithm of [13]. For all algorithms, the errors were at the level of machine precision and thus too small to plot meaningfully. These findings are consistent with [9] and [8].

In the following sections we test the performance of our algorithms compared to state-of-the-art methods under increasing levels of pixel noise and increasing levels of IMU noise, respectively. An analysis of degenerate scenarios follows. We then demonstrate the practical use of our algorithms on a real dataset composed of image and IMU measurements from an iPhone 4 with ground truth poses available for comparison. Finally, we give a runtime analysis of each of our algorithms. For all experiments, we compare our algorithms to the state-of-the-art 3-point algorithm by Fraundorfer et al. [3] and the 4-point algorithm by Lee et al. [13]. All algorithms were implemented in C++. We use our own implementations of [13] and [3], ported from Matlab implementations provided by the authors.

[Figure 3 contains four plots: mean rotation error (deg) and mean translation error versus image noise standard deviation (0-5 pixels), comparing "Our 3pt" to "Fraun 3pt" and "Our 4pt" to "Lee 4pt".]

Figure 3: We measured the pose error for increasing amounts of pixel noise. Our algorithms are plotted in red, alternative algorithms in blue. Our algorithms perform slightly better than the alternative approaches for all levels of image noise.

4.1. Robustness to image noise

The high numerical stability of the 3 and 4-point algorithms means that we should expect the performance of our minimal solvers under noise to be similar, on average, to the state-of-the-art algorithms. Indeed, we found this to be the case. However, the different rotation parameterizations (i.e., Euler angles instead of quaternions) and the different solution methods lead to slightly different distributions of errors. To best demonstrate how these error distributions affect the performance of our algorithms, we use each algorithm in a RANSAC [2] scheme. This is more indicative of the performance that can be expected when using the algorithms in real-world applications.

To measure the accuracy and robustness of our methods, we performed an experiment on synthetic data with Gaussian pixel noise ranging from 0 to 5 pixels. We performed 1,000 trials (running RANSAC once per trial) at each level of pixel noise. For each trial, we generate a relative pose with a rotation in the range of [-30, 30] degrees about each axis, and a translation in a random direction with a baseline length of 1 unit. We generate 1,000 random 3D points with depths between 2 and 6 units, and reproject them into each camera to establish the correspondences that are randomly sampled in RANSAC. For the 4-point algorithms, we use the same random feasible multi-camera setup consisting of 4 cameras for each generalized camera. The results are plotted in Figure 3. For the 3-point algorithms, we measure the translation error as an angular error because the translation is only defined up to scale.

Both of our algorithms perform slightly better than the alternative algorithms at all levels of image noise. It should be noted, though, that the errors for all algorithms are small at all noise levels. Robustness to image noise indicates our algorithms are suitable for use on mobile devices, which often have low-resolution cameras with large pixel noise.

4.2. Robustness to IMU noise

In real scenarios, estimating the rotation axis is an imperfect process. To measure the robustness of our algorithms to errors in the rotation axis, we simulate noisy IMU measurements corresponding to three gyroscopes and determine an axis of rotation from two of those measurements. We increase the level of IMU noise within the range of expected operating noise of sensors found on common devices today (smartphone IMUs typically have noise of about 0.5 degrees). We use the same camera and point configuration as in the image noise experiment and plot the mean errors from RANSAC over 1,000 trials at each noise level. All trials were run with 0.5 pixels of image noise.

As shown in Figure 4, our algorithms perform better than the state-of-the-art algorithms at all levels of IMU noise. This is due in part to our chosen representation of the rotation. The known axis of rotation is generally not orthogonal to the noise in the rotation, so the optimal angle about the rotation axis is able to minimize the effect of the noise on the solution rotation. The performance under IMU noise indicates that our algorithms are well-suited for mobile devices and other common cameras equipped with IMU sensors within the common range of operating noise (less than 1 degree).
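To make the simulation concrete, a minimal sketch (names ours) of one way to perturb the true rotation axis by a small random rotation of a given angular magnitude:

```cpp
#include <random>
#include <Eigen/Dense>

// Sketch: simulate IMU noise by rotating the true axis v by noise_deg degrees
// about a random direction, mimicking an imperfect gravity/gyro estimate.
Eigen::Vector3d PerturbAxis(const Eigen::Vector3d& v, double noise_deg,
                            std::mt19937& rng) {
  const double kPi = 3.14159265358979323846;
  std::normal_distribution<double> gauss(0.0, 1.0);
  Eigen::Vector3d rand_dir(gauss(rng), gauss(rng), gauss(rng));
  rand_dir.normalize();  // isotropic random rotation axis
  return Eigen::AngleAxisd(noise_deg * kPi / 180.0, rand_dir) * v;
}
```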

4.3. Degenerate cases

Our 3-point algorithm is able to handle some cases that are degenerate for the standard 5-point algorithm [18], such as coplanar points. Our 3-point algorithm can also handle collinear points, similar to [3], unless the line is parallel to the camera translation. The 5-point algorithm is known to be unstable for forward motion, but our algorithm handles this motion without issue. Additionally, we compute rotation and translation for the 3 and 4-point algorithms directly, and our formulation allows for more stable computation of translation when there is no rotation. While this is not a degenerate configuration, it is one that is often unstable for standard methods.

The 4-point algorithm additionally suffers from two well-known degeneracies of generalized cameras. First, for generalized cameras with very small camera-to-camera distances (leading to nearly coincident image ray origins), the 4-point algorithm is unstable. The second degenerate case occurs

when all cameras in a multi-camera setup lie along a line and all points are observed by the same camera in the first and second observations [25]. Then the solution set for the generalized essential matrix consists of all solutions along a particular 1D line in space; however, it is straightforward to detect this scenario [24, 25].

[Figure 4 contains four plots: mean rotation error (deg) and mean translation error versus IMU noise standard deviation (0-1 degrees), comparing "Our 3pt" to "Fraun 3pt" and "Our 4pt" to "Lee 4pt".]

Figure 4: Pose error as noise in the IMU measurement used to compute the axis of rotation is increased. Our algorithms perform better than the alternative algorithms in all cases. This indicates that our algorithms are well-suited for smartphone devices, which commonly contain sensors with less than 1 degree of IMU noise.

4.4. Real data with an Apple iPhone 4

We demonstrate that our algorithms are suitable for current mobile devices by conducting experiments with real image and sensory data from the Metaio Outdoor Dataset [11] (http://www.metaio.com/research). This dataset consists of nearly 800 images and corresponding sensor measurements from an iPhone 4 with 6 degree of freedom ground truth poses. Some example images from this dataset are shown in Figure 5. We obtain the gravity vector with measurements taken directly from the iPhone 4 and align that vector to the corresponding vector in world coordinates: [0 -1 0]^T. After aligning each image such that the gravity direction corresponds to [0 -1 0]^T, we use this vector as the known axis of rotation for our algorithms. SIFT features were extracted and matched between images to form feature correspondences. For the 3-point algorithm, we chose the image with the highest number of correspondences and used 2D-2D correspondences inside a RANSAC loop to determine the essential matrix. For the 4-point algorithm, we treat the database images as a single generalized camera with ray origins and directions corresponding to the ground truth camera and feature positions. We create a second generalized camera from two images that are not in the database, and localize this camera with respect to the database generalized camera in a RANSAC loop. We only consider symmetric feature matches that pass the ratio test.

Table 1 shows the results for 100 localized images. All algorithms were able to successfully localize all images. Our algorithms gave accuracy comparable to or better than the state-of-the-art solutions, while performing more efficiently. The speedup of 1.2-2.1x is due to our faster pose solvers as well as the smaller number of candidate solutions.

Table 1: We performed localization using the 3 and 4-point algorithms with image and sensor data from an iPhone 4. The mean position error and median localization timings for 100 images are given. (3-point errors are angular because the translation is only recovered up to scale.) Our algorithms provide comparable or better accuracy compared to state-of-the-art methods while performing up to 2x faster.

Solver         Position Error   Time (ms)
3-point QEP    0.56°            2.12
3-point fast   0.53°            1.63
3-point [3]    0.59°            3.48
4-point QEP    5.21 cm          5.11
4-point fast   5.29 cm          4.69
4-point [13]   5.22 cm          5.55

Table 2: Our methods produce fewer candidate solutions than alternative methods, resulting in better performance in RANSAC. We compare the number of iterations required for RANSAC to obtain an all-inlier sample with 0.99 confidence when assuming an inlier ratio of 0.5, along with the number of candidate solutions per sample and the resulting total.

Solver          RANSAC Its.   # Soln.   Total
3-point (ours)  34            4         126
3-point [3]     34            16        544
4-point (ours)  71            6         426
4-point [13]    71            8         568

4.5. Computational complexity

Our algorithms are solved as QEPs, for which efficient solvers exist. Our quaternion representation avoids costly trigonometric functions, unlike other approaches that rely on Euler angles. Over 10^5 trials, the 3-point algorithm of [3] had a mean execution time of 21.6 µs, compared to 16.7 µs for our QEP method and 12.3 µs for our fast method described in Sec. 3.3. The 4-point algorithm of [13] required 27.5 µs, compared to 26.2 µs for our QEP method and 23.7 µs for our fast method.

Figure 5: Example images from the Metaio Outdoor Localization dataset [11]. Images have varying lighting, position, and rotation.

Additionally, our algorithms produce fewer solutions than the alternative state-of-the-art algorithms. The method of Fraundorfer et al. [3] produces 4 candidate essential matrices. Each of these essential matrices must be decomposed into 4 relative rotations and translations, yielding 16 possible relative poses in total, compared to the 4 relative poses that our 3-point algorithm produces. Similarly, the method of [13] "gives up to 8 real solutions" while our method produces at most 6 real solutions. The reduced number of possible solutions helps limit the time required for RANSAC operations (cf. Table 2), which makes the speed gains in practical applications noteworthy.

5. Conclusion

In this paper we have presented two new algorithms for determining the relative pose of a camera with a known rotation axis: a 3-point algorithm for central cameras and a 4-point algorithm for generalized cameras. With more devices being equipped with IMUs and gravity sensors, these algorithms are of increasing relevance. By formulating rotations as quaternions, the pose constraints lead to simple and extremely efficient algorithms. We solve the relative pose problems as QEPs that are more efficient than alternative algorithms. Additionally, our QEP methods are slightly more robust to image noise and produce high-accuracy results in the presence of IMU noise. These algorithms are particularly suited for use in RANSAC schemes, as requiring fewer correspondences leads to fewer samples needed for convergence. Given their accuracy, simplicity, and efficiency, our methods are extremely well-suited and practical for use on mobile devices.

Acknowledgements

We would like to thank Petri Tanskanen and Gim Hee Lee for providing source code for alternative methods, as well as Henrik Stewénius for his insightful feedback. This work was supported in part by NSF Grant IIS-1219261 and NSF Graduate Research Fellowship Grant DGE-1144085.

References

[1] O. Chum and J. Matas. Matching with PROSAC – progressive sample consensus. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2005.
[2] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[3] F. Fraundorfer, P. Tanskanen, and M. Pollefeys. A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles. In Proc. of the European Conference on Computer Vision, pages 269–282. Springer, 2010.
[4] M. D. Grossberg and S. K. Nayar. A general imaging model and a method for finding its parameters. In Proc. of IEEE Intn'l. Conf. on Computer Vision, 2001.
[5] M. Kalantari, A. Hashemi, F. Jung, and J.-P. Guedon. A new solution to the relative orientation problem using only 3 points and the vertical direction. Journal of Mathematical Imaging and Vision, 39(3):259–268, 2011.
[6] J.-H. Kim, H. Li, and R. Hartley. Motion estimation for nonoverlapping multicamera rigs: Linear algebraic and L∞ geometric solutions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1044–1059, 2010.
[7] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2011.
[8] Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems. In Proc. of the British Machine Vision Conference, 2008.
[9] Z. Kukelova, M. Bujnak, and T. Pajdla. Closed-form solutions to minimal absolute pose problems with known vertical direction. In Proc. of the Asian Conference on Computer Vision, pages 216–229. Springer, 2011.
[10] Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigenvalue solutions to minimal problems in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1381–1393, 2012.
[11] D. Kurz, P. G. Meier, A. Plopski, and G. Klinker. An outdoor ground truth evaluation dataset for sensor-aided visual handheld camera localization. In Proc. of the Intn'l. Symposium on Mixed and Augmented Reality, 2013.
[12] G. H. Lee, F. Fraundorfer, M. Pollefeys, P. Furgale, U. Schwesinger, M. Rufli, W. Derendarz, H. Grimmett, P. Muhlfellner, S. Wonneberger, et al. Motion estimation for self-driving cars with a generalized camera. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2013.
[13] G. H. Lee, M. Pollefeys, and F. Fraundorfer. Relative pose estimation for a multi-camera system with known vertical direction. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2014.
[14] B. Li, L. Heng, G. H. Lee, and M. Pollefeys. A 4-point algorithm for relative pose estimation of a calibrated camera with a known relative rotation angle. In Proc. of IEEE/RSJ Intn'l. Conf. on Intelligent Robots and Systems, 2013.
[15] L. Meier, P. Tanskanen, L. Heng, G. H. Lee, F. Fraundorfer, and M. Pollefeys. PIXHAWK: A micro aerial vehicle design for autonomous flight using onboard computer vision. Autonomous Robots, 33(1-2):21–39, 2012.
[16] B. Micusik and H. Wildenauer. Minimal solution for uncalibrated absolute pose problem with a known vanishing point. In Proc. of IEEE Intn'l. Conf. on 3DTV, 2013.
[17] O. Naroditsky, X. S. Zhou, J. Gallier, S. I. Roumeliotis, and K. Daniilidis. Two efficient solutions for visual odometry using directional correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):818–824, 2012.
[18] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):756–770, 2004.
[19] D. Nistér and H. Stewénius. A minimal solution to the generalised 3-point pose problem. Journal of Mathematical Imaging and Vision, 27(1):67–79, 2007.
[20] R. Pless. Using many cameras as one. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2003.
[21] J. Plücker. On a new geometry of space. Philosophical Transactions of the Royal Society of London, 155:725–791, 1865.
[22] S. N. Sinha, D. Steedly, and R. Szeliski. A multi-stage linear approach to structure from motion. In Trends and Topics in Computer Vision, pages 267–281. Springer, 2012.
[23] B. Steder, G. Grisetti, S. Grzonka, C. Stachniss, A. Rottmann, and W. Burgard. Learning maps in 3D using attitude and noisy vision sensors. In Proc. of IEEE/RSJ Intn'l. Conference on Intelligent Robots and Systems, pages 644–649, 2007.
[24] H. Stewénius and K. Åström. Structure and motion problems for multiple rigidly moving cameras. In Proc. of the European Conference on Computer Vision, page 238, May 2004.
[25] H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to minimal generalized relative pose problems. In Workshop on Omnidirectional Vision, 2005.
[26] C. Sweeney. Theia Multiview Geometry Library: Tutorial & Reference. University of California Santa Barbara.
[27] F. Tisseur and K. Meerbergen. The quadratic eigenvalue problem. SIAM Review, 43(2):235–286, 2001.
[28] P. H. Torr and A. Zisserman. MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78(1):138–156, 2000.