
IEEE TRANSACTIONS ON ROBOTICS, VOL. 22, NO. 2, APRIL 2006

Extensions of Plane-Based Calibration to the Case of Translational Motion in a Robot Vision Setting

Henrik Malm and Anders Heyden, Member, IEEE

Abstract—In this paper, a technique for calibrating a camera using a planar calibration object with known metric structure, when the camera (or the calibration plane) undergoes pure translational motion, is presented. The study is an extension of the standard formulation of plane-based camera calibration, in which the translational case is considered degenerate. We derive a flexible and straightforward way of using different amounts of knowledge of the translational motion for the calibration task. The theory is mainly applicable in a robot vision setting, and the calculation of the hand–eye orientation and the special case of stereo head calibration are also addressed. Results of experiments on both computer-generated and real image data are presented. The paper covers the most useful instances of applying the technique to a real system and discusses the degenerate cases that need to be considered. The paper also presents a method for calculating the infinite homography between the two image planes in a stereo head, using the homographies estimated between the calibration plane and the image planes. Its possible usage and usefulness for simultaneous calibration of the two cameras in the stereo head are discussed and illustrated using experiments.

Index Terms—Infinite homography estimation, intrinsic camera calibration, plane-based calibration, robotic vision, stereo head calibration, translational motion.

Fig. 1. Robot with a hand–eye configuration.

I. INTRODUCTION

CAMERA calibration is a fundamental task in the field of computer vision. A vast number of the vision algorithms in practical use assume precalibrated cameras. In this paper, we will primarily focus on the calibration of robot-mounted cameras and especially on hand–eye configurations, as shown in Figs. 1 and 3. Apart from the single camera case, we will also consider rigid stereo head configurations (Fig. 2). The main problem addressed in this paper is intrinsic camera calibration, i.e., estimating the so-called intrinsic parameters of the camera model, such as the focal length and the principal point (the intersection point of the focal axis and the image plane). However, we will also comment on the calculation of the extrinsic configuration of the camera(s), such as the orientation of the camera(s) in relation to the robot hand, i.e., the hand–eye orientation, and the relative positions of the cameras in a stereo head.

Manuscript received February 19, 2004; revised March 13, 2005. This paper was recommended for publication by Associate Editor H. Zhuang and Editor F. Park upon evaluation of the reviewers' comments. This work was supported in part by the Swedish Research Council for Engineering Sciences (TFR) under Project 95-64-222 and in part by the Royal Swedish Academy of Sciences, Claes Adelskölds medalj och minne. This paper was presented in part at the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, December 2001, in part at the Workshop on Machine Vision Applications, Nara, Japan, December 2002, and in part at the International Conference on Intelligent Robots and Systems, Las Vegas, NV, October 2003. H. Malm is with the Department of Cell and Organism Biology, Lund University, S-221 00 Lund, Sweden (e-mail: [email protected]). A. Heyden is with the Applied Mathematics Group, School of Technology and Society, Malmö University, S-205 06 Malmö, Sweden (e-mail: [email protected]). Digital Object Identifier 10.1109/TRO.2005.862477

Fig. 2. Stereo head configuration.

Fig. 3. Relation between the robot hand coordinate system and the camera coordinate system.




The traditional methods for camera calibration are based on using a known reference object or calibration object. Features, such as corners on the calibration object, are then matched and tracked between the images in the obtained image sequences. Numerous techniques for camera calibration using a three-dimensional (3-D) calibration object have been developed; see, e.g., the book by Faugeras [1] or any other general book on computer vision. Zhang [2] and Sturm and Maybank [3] have independently introduced practically identical theories on calibration using a planar calibration object. The use of a 2-D calibration object greatly simplifies its construction, and high precision of the object points is easy to achieve. The current paper extends the theory of these two papers by deriving the special constraints that appear when we are using pure translational motions and when we are considering a stereo head configuration. These two cases are mainly of interest in a robot vision setting. Calibration from a plane using translational motions has earlier been addressed by Tsai [4], who presented a rigorous calibration theory with extensive accuracy evaluations. However, the theory presented in the current paper is considerably simpler and more direct than the work of Tsai. The motivation for the current study has been to extend the theory of [2], [5], and [3] to find simple, flexible relationships that fit nicely with the framework presented in those papers.

When discussing camera calibration, one should also mention the multitude of so-called self-calibration algorithms that have been presented during the last decade [6]–[11]. These methods do not use a calibration object and rely only on the rigidity of the 3-D scene and on different assumptions on the intrinsic parameters, such as fixed parameters throughout the image sequence. In [12], a self-calibration algorithm is presented that, in part similar to this paper, uses knowledge of the translation direction of the camera, which leads to a linear least-squares problem. Methods for self-calibration from images of planar objects have also been proposed [13]–[15]. These methods use constraints involving the homographies between the image planes to solve the nonlinear problem. However, in applications requiring high-precision measurements, e.g., most industrial robot vision systems, the use of a carefully constructed calibration grid is still the most reliable approach.

Concerning the calculation of the orientation and position of the camera in relation to the robot hand, i.e., hand–eye calibration, the traditional methods rely on first moving the robot hand, and, consequently, the camera, to different locations and orientations (different poses) and calculating the relative poses of the camera in relation to a calibration object using a camera calibration method. Two early papers that introduced the basic constraints of the hand–eye calibration problem are the works by Tsai and Lenz [16] and Shiu and Ahmad [17]. The task here, in essence, boils down to solving a matrix equation of the form AX = XB. Further developments in the same tradition include [18]–[22]. Methods that do not rely on the use of a calibration object have also been proposed. Examples of this are the works by Andreff et al. [23], Ma [24], and Malm and Heyden [25].

In the case of stereo head calibration, recent research has been focused mainly on self-calibration in this setting. Several authors


have proposed new methods and examined the geometrical relations [26]–[31]. In [32], Demirdjian et al. present a method for self-calibration of a stereo head using a planar scene. That paper describes a stratified approach to calibration, without the use of metric information, which involves quadratic constraints in its different stages. The work presented in this paper, in contrast, uses a calibration plane with known metric structure and concerns the relations that appear in this case.

As mentioned above, the basic problem discussed in this paper is the case of calibration from a planar calibration object using translational motions. This special case of calibration could, for example, be of interest for simple, effective, and unsupervised recalibration of an industrial robot vision system, especially when the number of degrees of freedom (DOFs) of the robot is severely limited. As will be shown in the theoretical development and in the experimental evaluation, the use of the derived calibration constraints seems especially reliable and effective in the special case of stereo head calibration. A discussion on the estimation of the infinite homography and its significance in stereo head calibration is also included in this paper. The study presented here is a collection and extension of results previously presented in [33]–[35].

We begin our presentation of the calibration theory in Section II by presenting the camera model in use and by reviewing the results of Zhang [2] and Sturm and Maybank [3], which constitute the foundation of the developments presented in the subsequent sections. In Section III, the basic constraints for plane-based calibration using translational motions are derived, together with a presentation of the degenerate cases for some interesting applications of the constraints. It is also shown what additional calculations and motions of the robot hand are needed to obtain the hand–eye orientation in a robot vision setting. Then, in Section IV, we turn to stereo head calibration and explore the relations that arise in this setting, both for the case of pure translational motions and general motions, including the estimation of the infinite homography. In Section V, results of experiments on different realizations of the theory are presented, both for computer-generated and real image data. Section VI then presents some concluding remarks and ideas for future work.

II. PRELIMINARIES

A. Camera Model

Throughout this paper, the coordinates of a point in the image plane are represented by small letters (x, y) and the coordinates in the 3-D world by capital letters (X, Y, Z). The pinhole camera model is used as our projection model. This means that the projection is governed by the following equation, where the coordinates are expressed in homogeneous form:

λ (x, y, 1)ᵀ = [f, sf, x0; 0, γf, y0; 0, 0, 1] [R | t] (X, Y, Z, 1)ᵀ.   (1)

Here, f denotes the focal length, γ and s are the aspect ratio and the skew, respectively, and (x0, y0) is the principal point. These are the intrinsic parameters. The upper triangular matrix that contains the intrinsic parameters in (1) is often called the calibration matrix and will, in the forthcoming sections, be denoted by K. Furthermore, R and t denote the relation between the world coordinate system and the camera coordinate system, where R denotes a rotation matrix and t a translation vector, i.e., a Euclidean transformation. For readability, a bar separates the leftmost 3 × 3 matrix from the rightmost 3-vector in the 3 × 4 transformation matrix in (1). The matrix R and the vector t are sometimes called the extrinsic parameters of the camera. Note that, when formulating the projection as in (1), t denotes the translation from the focal point of the camera to the origin of the world coordinate system, expressed in the coordinate frame of the camera.

In the case of a robot-mounted camera, the problem of finding the extrinsic parameters can be formulated as finding the so-called hand–eye transformation T_X = (R_X, t_X) of the robot–camera system. The hand–eye transformation is the rigid Euclidean transformation between the robot hand coordinate system and the camera coordinate system (Fig. 3). The task of finding this transformation is usually called hand–eye calibration. In the general case, the transformation has six DOFs: three for the position, defined by the 3-vector t_X, and three for the orientation, defined by the rotation matrix R_X.
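As a purely numerical illustration of the projection equation (1), the following sketch (in Python with NumPy, our own choice of language; all parameter values are invented for this example and are not taken from the paper) projects a world point on the calibration plane into the image:

    import numpy as np

    # Illustration of the pinhole model (1); all values are example choices.
    f, gamma, s = 800.0, 1.0, 0.0          # focal length, aspect ratio, skew
    x0, y0 = 160.0, 120.0                  # principal point
    K = np.array([[f, s * f, x0],
                  [0.0, gamma * f, y0],
                  [0.0, 0.0, 1.0]])        # calibration matrix

    R = np.eye(3)                          # camera aligned with the world frame
    t = np.array([0.0, 0.0, 100.0])        # world origin 100 l.u. in front

    X = np.array([5.0, -5.0, 0.0])         # a point on the plane Z = 0
    x_h = K @ (R @ X + t)                  # homogeneous image point; the scale
    x = x_h[:2] / x_h[2]                   # factor lambda is divided out here
    print(x)                               # -> [200.  80.]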

B. Basic Plane-Based Calibration Theory

We will here review the basic theory of plane-based calibration as presented by Zhang [5] and Sturm and Maybank [3]. For the following derivation, we choose the orientation and origin of the world coordinate system so that the plane of the calibration object is the plane Z = 0. We then obtain

λ (x, y, 1)ᵀ = K [r1  r2  r3 | t] (X, Y, 0, 1)ᵀ = K [r1  r2 | t] (X, Y, 1)ᵀ   (2)

where ri is the ith column of the rotation matrix R. In this way, an object point is related to the corresponding image point by a homography H, i.e., a 2-D projective transformation [36], [37],

λ (x, y, 1)ᵀ = H (X, Y, 1)ᵀ.   (3)

This homography between the object plane and the image plane can easily be estimated for each image [38], [37]. Let hi be the ith column in H. Then

[h1  h2  h3] = λ K [r1  r2  t]   (4)

and, in particular,

h1 = λ K r1   (5)

h2 = λ K r2.   (6)

Now we introduce ω = (K Kᵀ)⁻¹ = K⁻ᵀ K⁻¹. Since r1 and r2 are orthonormal vectors, the following constraints, involving ω, h1, and h2, can be derived from (5) and (6):

h1ᵀ ω h1 = λ²   (7)

h2ᵀ ω h2 = λ²   (8)

h1ᵀ ω h2 = 0.   (9)

From these equations, the unknown scale factor λ can be eliminated:

h1ᵀ ω h1 = h2ᵀ ω h2   (10)

h1ᵀ ω h2 = 0.   (11)

The matrix ω actually represents the image of the object from projective geometry called the absolute conic [37], and we now have two linear constraints on this symmetric matrix from each image of the plane. By using three different views of the plane, there are enough constraints to solve for ω. The intrinsic camera matrix K can then be obtained by Cholesky factorization [39] and matrix inversion.
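A minimal sketch of this procedure, in the spirit of [2], [5], is given below: the constraints (10) and (11) are stacked into a homogeneous linear system in the six distinct entries of ω, which is solved by singular value decomposition, after which K is recovered by Cholesky factorization and inversion as described above. The function names and the use of NumPy are our own assumptions, and at least three general views are assumed so that ω is fully determined and positive definite.

    import numpy as np

    def v(H, i, j):
        # Row of the linear system corresponding to h_i^T omega h_j,
        # with omega parameterized by its six distinct entries.
        a, b = H[:, i], H[:, j]
        return np.array([a[0]*b[0],
                         a[0]*b[1] + a[1]*b[0],
                         a[1]*b[1],
                         a[0]*b[2] + a[2]*b[0],
                         a[1]*b[2] + a[2]*b[1],
                         a[2]*b[2]])

    def calibrate_from_homographies(Hs):
        # Stack constraints (10) and (11) for each plane-to-image homography.
        rows = []
        for H in Hs:
            rows.append(v(H, 0, 1))               # h1^T omega h2 = 0
            rows.append(v(H, 0, 0) - v(H, 1, 1))  # h1^T omega h1 = h2^T omega h2
        _, _, Vt = np.linalg.svd(np.asarray(rows))
        w = Vt[-1]                                # null vector = entries of omega
        omega = np.array([[w[0], w[1], w[3]],
                          [w[1], w[2], w[4]],
                          [w[3], w[4], w[5]]])
        if omega[0, 0] < 0:                       # fix the arbitrary sign
            omega = -omega
        L = np.linalg.cholesky(omega)             # omega = L L^T = K^{-T} K^{-1}
        K = np.linalg.inv(L).T                    # hence K = inv(L)^T
        return K / K[2, 2]                        # scale so that K[2,2] = 1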

MALM AND HEYDEN: EXTENSIONS OF PLANE-BASED CALIBRATION

325

III. PLANE-BASED CALIBRATION USING TRANSLATIONS

In this section, we will examine the problem of calibration from images of a planar object when the relative orientation between the object and the camera does not change, i.e., the motion of the camera between successive images is a pure translation. This case is degenerate in the standard formulation from the previous section. This is evident from the fact that the column vectors h1 and h2 only depend on K and on r1 and r2, respectively. Consequently, the constraints (10) and (11) remain unchanged if the orientation does not change. However, by using some knowledge about the translational motion, new constraints can be formulated. Zhang briefly addresses this possibility in [38, App. D]. This special case of plane-based calibration is interesting for robotic systems with a few DOFs.

We start by examining the number of constraints available for the translational calibration problem, to see what can be calculated assuming different amounts of knowledge. Assume that two images are obtained by the camera and that it has performed a pure translational motion between the imaging instants. From each of the two images, a homography from the calibration plane to the image plane is estimated. Denote these homographies by H1 and H2, respectively. Since the matrix built up by the first two columns of H1 is proportional to the matrix built up by the first two columns of H2 [see (4)], and since the homographies are known only up to scale, we get 11 distinct elements from the two homographies. There are 6 DOFs for the pose of the camera in the first image. Thus, there are 11 − 6 = 5 constraints left for the intrinsic parameters and the translational motion. That is, to calculate all five intrinsic parameters, we need to know the translation completely. If, for example, the skew is assumed to be zero, s = 0, the camera can be calibrated even if the length of the translation is unknown. If, additionally, it is assumed that the aspect ratio is equal to 1, then the camera can be calibrated when the direction of the translation is unknown but the length is known.

The constraints for the translational case will now be formulated. If H1 is expressed as in (4), the estimated homography after a translation can be written as

H2 = λ2 K [r1  r2  t2]   (12)

t2 = t1 + R t̃   (13)

where t̃ is the translation vector expressed in the coordinate system of the calibration object, i.e., in the world coordinate system. When estimating the homographies, the knowledge of the special structure of these homographies should be used in order to get more robust estimations. When using translational motions, the images are obtained with the same orientation of the camera, and the first two columns in the associated homographies should be parallel. Since the scalings of the homographies are arbitrary, the homographies for all images with the same orientation can be estimated simultaneously, so that the first two columns are the same for every estimated homography. That is, the scale factors are chosen to be equal, λ1 = λ2 = λ. This drastically reduces the number of parameters to be estimated and makes the succeeding calculations more accurate. In the continuation, there will thus only be one scale factor λ present.

Consider the third columns in the matrices H1 and H2

h3⁽¹⁾ = λ K t1   (14)

h3⁽²⁾ = λ K t2 = λ K (t1 + R t̃).   (15)

Let

Δh = h3⁽²⁾ − h3⁽¹⁾.   (16)

Then, using (4), we obtain

Δh = λ K R t̃   (17)

and consequently

K⁻¹ Δh = λ R t̃.   (18)

In search for new calibration constraints containing ω and t̃, scalar products which include K⁻¹Δh = λRt̃ are written down. Taking the scalar product of λRt̃ and the orthonormal vector r1 = (1/λ)K⁻¹h1, and using (7) and (9), gives the first component t̃x of t̃. This implies that

h1ᵀ ω Δh = λ² t̃x.   (19)

Similarly, the scalar product of λRt̃ and r2 yields

h2ᵀ ω Δh = λ² t̃y.   (20)

It remains to consider the scalar product of λRt̃ with itself, i.e., the norm condition ‖K⁻¹Δh‖² = λ²‖t̃‖², implying that

Δhᵀ ω Δh = λ² ‖t̃‖².   (21)

When applying the constraints derived in this section, it is convenient to separate the length and the direction of the translation in the notation. To this end, let n̂ = t̃/‖t̃‖ represent the normalized version of the translation vector t̃. Further, reduction of the number of constraints by one and elimination of the scale factor λ by fusing the equations, in the same manner that (7) and (8) were reduced to (10), would in this case lead to quadratic constraints. Instead, by letting μ = λ², the complete set of constraints arising from two images of a plane when the camera undergoes a translational motion t̃ = ‖t̃‖ n̂ is

h1ᵀ ω h1 = μ   (22)

h2ᵀ ω h2 = μ   (23)

h1ᵀ ω h2 = 0   (24)

h1ᵀ ω Δh = μ ‖t̃‖ n̂x   (25)

h2ᵀ ω Δh = μ ‖t̃‖ n̂y   (26)

Δhᵀ ω Δh = μ ‖t̃‖².   (27)

After solving the system (22)–(27) in the unknowns ω, μ, and t̃, a Cholesky factorization of ω is performed. Inverting and scaling the resulting matrix gives us the intrinsic calibration matrix K. The scale factor is easily found since K has an element equal to 1 in the bottom right position. Note that, if ‖t̃‖ is an unknown in the calculations, the problem becomes quadratic. However, by letting ν = μ‖t̃‖ and ξ = μ‖t̃‖², the equations become linear in ω, μ, ν, and ξ. Once these new unknowns are estimated, ‖t̃‖ can be recovered easily. Since the solutions for μ and ‖t̃‖ should be nonnegative, the valid solution is found directly. If ‖t̃‖ is a known quantity, the equation system is already linear in the unknowns.
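As a sketch of how (22)–(27) can be used in practice, the following code solves the system for the case where the translation t̃ is completely known (Case 3 below), treating the six distinct entries of ω together with μ = λ² as a homogeneous unknown vector. The helper q and all names are illustrative assumptions; the two homographies are assumed to have been estimated with a common scale, as described above.

    import numpy as np

    def q(a, b):
        # Coefficients of a^T omega b in the six distinct entries of omega.
        return np.array([a[0]*b[0],
                         a[0]*b[1] + a[1]*b[0],
                         a[1]*b[1],
                         a[0]*b[2] + a[2]*b[0],
                         a[1]*b[2] + a[2]*b[1],
                         a[2]*b[2]])

    def calibrate_translation(H1, H2, t):
        # Solve (22)-(27) for omega and mu = lambda^2 when the translation
        # t (direction and length, in calibration-plane coordinates) is
        # fully known. Unknown vector: [omega (6 entries), mu].
        h1, h2 = H1[:, 0], H1[:, 1]
        dh = H2[:, 2] - H1[:, 2]                 # Delta h, see (16)
        length = np.linalg.norm(t)               # ||t||
        n = t / length                            # direction n
        A = np.vstack([
            np.append(q(h1, h1), -1.0),           # (22)
            np.append(q(h2, h2), -1.0),           # (23)
            np.append(q(h1, h2),  0.0),           # (24)
            np.append(q(h1, dh), -length * n[0]), # (25)
            np.append(q(h2, dh), -length * n[1]), # (26)
            np.append(q(dh, dh), -length**2),     # (27)
        ])
        _, _, Vt = np.linalg.svd(A)
        u = Vt[-1]
        w = u[:6] if u[0] > 0 else -u[:6]         # fix the arbitrary sign
        omega = np.array([[w[0], w[1], w[3]],
                          [w[1], w[2], w[4]],
                          [w[3], w[4], w[5]]])
        L = np.linalg.cholesky(omega)             # omega = K^{-T} K^{-1}
        K = np.linalg.inv(L).T
        return K / K[2, 2]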

A. Interesting Cases and Their Singularities

Some interesting instances of applying (22)–(27) will be discussed here. For successful application of the calibration constraints, one needs to be aware of the singularities of the problem at hand. This helps avoiding unsolvable situations where the result can be expected to be unreliable. We will assume that two or more images are used for the calibration, with pure translational motion of the camera between each imaging instance. Using only one image of the calibration plane, the aspect ratio and the focal length can be calculated using merely equations (10) and (11), if zero skew and a known principal point are assumed. For the singularities appearing in this case, see [3]. Three different cases of calibration using translational motions will be considered.

Case 1) The length ‖t̃‖ of the translational motion is known, but the direction is unknown. The aspect ratio and the skew are assumed to be 1 and 0, respectively. This case can, e.g., be implemented by placing the calibration plane straight ahead of the camera (nonparallel to the image plane, see below) and translating the camera straight forward toward the plane. This setup makes it easy to obtain a large number of images with good views of the plane.

Case 2) The direction of t̃ is known, but the length is unknown. The skew is assumed to be 0. This case can be put into practice by, e.g., placing the calibration plane on the floor and translating the camera orthogonally toward the floor, i.e., in the known direction n̂. However, nontrivial degenerate configurations reduce its applicability, see below.

Case 3) Both the length and the direction of t̃ are known. All intrinsic parameters are calculated. This case actually amounts to using a full 3-D calibration object, i.e., to performing classical calibration.

We will first comment on the choice of translation direction. There is only one class of translations that gives rise to singularities, and that is translation parallel to the calibration plane. Intuitively, this type of motion only adds more feature points to the same plane in space. No information is given in the direction orthogonal to the plane. However, every other choice of translation works well. Thus, translation along the focal axis of the camera or translation orthogonal to the calibration plane are reasonable choices of translation direction. In addition to the singularities mentioned in each case dealt with below, the degenerate situation of translation parallel to the calibration plane should also be considered in each case. All degenerate configurations mentioned below result in linearly dependent equations in the system (22)–(27).

1) Case 1: In the case of known length of the translation, i.e., known ‖t̃‖, but unknown direction, the only degenerate case is when the calibration plane is parallel to the image plane of the camera. That is, a simple tilting of the camera in relation to the calibration plane suffices to make the calibration feasible.

2) Case 2: In this case, where the direction of the translation is known but the length is unknown, the calibration problem becomes singular when either the x or the y axis of the image plane is parallel to the calibration plane. This case needs some caution, since this degenerate configuration is a quite natural setup. For example, rotating the image plane first around the x axis of the calibration plane and then around the z axis (the normal of the plane) gives rise to a degenerate case, since the x axis of the image plane is still kept parallel to the calibration plane. However, rotating the image plane first around the z axis and then around the x axis of the calibration plane gives a solvable calibration problem.

3) Case 3: There are no degenerate configurations in this case, where the translation is completely known; i.e., the image plane could even be parallel to the calibration plane. The only thing to be aware of is that the translation direction should not be parallel to the calibration plane, as mentioned above.
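The degeneracy conditions above can be summarized in a small diagnostic, sketched below under the convention that R maps calibration-plane (world) coordinates to camera coordinates, so that the rows of R are the camera axes expressed in world coordinates; the tolerance and the function itself are assumptions made for this illustration.

    import numpy as np

    def check_degeneracies(R, t_dir, tol=1e-3):
        # R: rotation of the pinhole model (world to camera coordinates);
        # t_dir: translation direction in world coordinates.
        warnings = []
        if abs(t_dir[2]) < tol * np.linalg.norm(t_dir):
            warnings.append("translation parallel to the calibration plane "
                            "(degenerate in all cases)")
        # A vanishing third component of a row of R means that the
        # corresponding camera axis is parallel to the plane Z = 0.
        if abs(R[2, 2]) > 1.0 - tol:
            warnings.append("image plane parallel to the calibration plane "
                            "(degenerate in Case 1)")
        if abs(R[0, 2]) < tol or abs(R[1, 2]) < tol:
            warnings.append("x or y axis of the image plane parallel to the "
                            "calibration plane (degenerate in Case 2)")
        return warnings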

IEEE TRANSACTIONS ON ROBOTICS, VOL. 22, NO. 2, APRIL 2006

B. Calculating the Hand–Eye Orientation

Considering a robot vision system with a hand–eye configuration, it will now be discussed what is needed to extend the calculations to include computation of the rotational part of the hand–eye transformation. For calculation of the translational part, rotational motions need to be used [16]. Since this section concerns calibration from translational motions, we will focus on estimating the hand–eye orientation R_X.

The calculation of R_X is very direct and can be based on a minimum number of two translations. Consider two translations t̃ and t̃′ in the coordinate system of the calibration object. From the second translation t̃′, a new set of equations of the form (22)–(27) is obtained. If Δh′ is the analog of Δh for the second translation, one of the three new constraints is

Δh′ᵀ ω Δh′ = μ ‖t̃′‖².   (28)

To calculate the hand–eye orientation, it is assumed that the translation vectors are known in the robot hand coordinate system. Since the lengths of these vectors are equal to the lengths of the corresponding translation vectors in the coordinate system of the calibration object, ‖t̃‖ and ‖t̃′‖ are known quantities. In this way, we have five constraints on ω, and the camera can be calibrated if zero skew, s = 0, is assumed.

For the estimation of R_X, we want to find the representation t_c, in the camera coordinate system, of the corresponding translation t_h in the robot hand coordinate system, which in turn corresponds to the translation t̃ in the coordinate system of the calibration plane. The homography after the translation can be represented as

H2 = λ K [r1  r2  (t1 + R t̃)].   (29)

Using the vector Δh introduced earlier (16), we obtain

Δh = λ K t_c   (30)

where t_c = R t̃ is the translation expressed in the camera coordinate system. For the calculation of R_X, the length of t_c is not of importance and

t_c ∼ K⁻¹ Δh.   (31)

The hand–eye orientation is calculated by relating the translation vectors in the camera coordinate system to the corresponding translation vectors in the robot hand coordinate system. After calculating t_c1 and t_c2, corresponding to t_h1 and t_h2, respectively, a third corresponding pair of translations can be constructed using t_c3 = t_c1 × t_c2 and t_h3 = t_h1 × t_h2. After normalizing the translation vectors, R_X can be calculated using

R_X = [t_c1  t_c2  t_c3] [t_h1  t_h2  t_h3]⁻¹.   (32)

When using (32) numerically, the matrix R_X will not be exactly orthogonal.



To find the nearest orthogonal matrix in the Frobenius norm, a singular value decomposition can be performed on R_X. The orthogonal factor of the decomposition is then chosen as the rotation matrix.
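A compact sketch of this procedure, combining (31), (32), and the SVD-based orthogonalization, could look as follows; the function name, the normalization details, and the reflection guard are our own choices.

    import numpy as np

    def hand_eye_orientation(tc1, tc2, th1, th2):
        # tc1, tc2: translation directions recovered in the camera frame
        # via (31); th1, th2: the same motions in the robot hand frame.
        # A third pair is constructed with cross products, as in the text.
        def unit(v):
            return v / np.linalg.norm(v)
        Tc = np.column_stack([unit(tc1), unit(tc2), unit(np.cross(tc1, tc2))])
        Th = np.column_stack([unit(th1), unit(th2), unit(np.cross(th1, th2))])
        X = Tc @ np.linalg.inv(Th)          # relation (32)
        U, _, Vt = np.linalg.svd(X)         # nearest rotation in Frobenius norm
        R_X = U @ Vt
        if np.linalg.det(R_X) < 0:          # guard against a reflection
            U[:, -1] = -U[:, -1]
            R_X = U @ Vt
        return R_X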

IV. PLANE-BASED STEREO HEAD CALIBRATION

In this section, the relations that arise when calibrating a stereo head configuration using a planar calibration object will be studied. The discussion is divided into two subsections. First, in Section IV-A, the theory on using pure translational motions from Section III will be applied to the stereo case. Second, in Section IV-B, general motions of the stereo head will be considered, and the relations that can be derived in this case will be discussed. Throughout this section, entities corresponding to the right camera in the stereo head will be denoted by an r and entities corresponding to the left camera will be denoted by an l.

A. Stereo Head Calibration Using Translations

One interesting application of the theory presented in Section III is the calibration of a stereo head using pure translational motions. If the stereo head is translated, the translation vector t̃ [see (13)] will be the same for the two cameras in the coordinate system of the calibration object. This can be used to make a simultaneous calibration of the two cameras.

Assume that the length ‖t̃‖ of the translational motion of the stereo head is known. If, additionally, zero skew is assumed, s = 0, the following equations can be used to calibrate the cameras:

(h1^r)ᵀ ω^r h1^r = μ_r   (33)

(h2^r)ᵀ ω^r h2^r = μ_r   (34)

(h1^r)ᵀ ω^r h2^r = 0   (35)

(h1^l)ᵀ ω^l h1^l = μ_l   (36)

(h2^l)ᵀ ω^l h2^l = μ_l   (37)

(h1^l)ᵀ ω^l h2^l = 0   (38)

(Δh^r)ᵀ ω^r Δh^r = μ_r ‖t̃‖²   (39)

(Δh^l)ᵀ ω^l Δh^l = μ_l ‖t̃‖²   (40)

μ_l (h1^r)ᵀ ω^r Δh^r = μ_r (h1^l)ᵀ ω^l Δh^l   (41)

μ_l (h2^r)ᵀ ω^r Δh^r = μ_r (h2^l)ᵀ ω^l Δh^l   (42)

where (41) and (42) express that the two cameras see the same components t̃x and t̃y of the common translation [cf. (25) and (26)]. In the case of unknown length, but known direction n̂, of the translation, (39)–(42) can be replaced by

(h1^r)ᵀ ω^r Δh^r = ν_r n̂x   (43)

(h2^r)ᵀ ω^r Δh^r = ν_r n̂y   (44)

(h1^l)ᵀ ω^l Δh^l = ν_l n̂x   (45)

(h2^l)ᵀ ω^l Δh^l = ν_l n̂y   (46)

ν_l (Δh^r)ᵀ ω^r Δh^r = ν_r (Δh^l)ᵀ ω^l Δh^l   (47)

where ν_r = μ_r‖t̃‖ and ν_l = μ_l‖t̃‖, and (47) expresses that the length of the translation is the same for the two cameras [cf. (27)].

After calculation of the camera matrices K^r and K^l, the pose of the cameras, i.e., the orientation and position of the cameras in relation to the calibration object, can easily be obtained from relation (4). The following formulas are obtained for the columns in R and for t:

r1 = λ K⁻¹ h1   (48)

r2 = λ K⁻¹ h2   (49)

r3 = r1 × r2   (50)

t = λ K⁻¹ h3   (51)

where λ = 1/‖K⁻¹h1‖ [5]. From this, the relative rigid transformation between the two cameras in the stereo head can be obtained, and the stereo head is then completely calibrated. Thus, this can be done from one single translation, if zero skew is assumed and either the direction or the length of the translation is known.
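A sketch of the pose computation (48)–(51) is given below, with an added SVD re-orthogonalization of the estimated rotation to compensate for noise (an addition not prescribed by the text); the convention used for composing the relative transformation is stated in the trailing comment and matches (52) in the next subsection.

    import numpy as np

    def pose_from_homography(K, H):
        # Recover the camera pose relative to the calibration plane from a
        # plane-to-image homography, following (48)-(51).
        Kinv = np.linalg.inv(K)
        lam = 1.0 / np.linalg.norm(Kinv @ H[:, 0])
        r1 = lam * Kinv @ H[:, 0]
        r2 = lam * Kinv @ H[:, 1]
        r3 = np.cross(r1, r2)
        t = lam * Kinv @ H[:, 2]
        R = np.column_stack([r1, r2, r3])
        U, _, Vt = np.linalg.svd(R)          # re-orthogonalize under noise
        return U @ Vt, t

    # The relative transformation of the stereo pair then follows from the
    # two poses: R_rl = R_l @ R_r.T and t_rl = t_l - R_rl @ t_r, cf. (52).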

B. Stereo Head Calibration Using General Motions

The case of general motions of the stereo head will now be considered. It will be studied how the rigid transformation between the two cameras can be included into the representation of the homographies. Following the notation in the last section, an r and an l will be used to denote the right and left cameras, respectively. In essence, our discussion will result in a way of calculating the so-called infinite homography H∞ [37] between the image planes of the two cameras. This will be done by a similar methodology and notation as in the previous derivations, by writing down equations for the homographies H^r and H^l from the calibration plane to the right and left image planes, respectively. The calculation of H∞ for a stereo head has previously been addressed by Zisserman et al. [26] in the case of self-calibration.

By using the rigid transformation T_rl = (R_rl, t_rl) between the two cameras, the orientations and positions of the left and right cameras in relation to the calibration plane can be related in the following way:

R^l = R_rl R^r,   t^l = R_rl t^r + t_rl.   (52)

It then follows that

H^l = λ_l K^l [r1^l  r2^l  t^l]   (53)

= λ_l K^l [R_rl r1^r  R_rl r2^r  R_rl t^r + t_rl]   (54)

= λ_l K^l R_rl [r1^r  r2^r  t^r] + λ_l [0  0  K^l t_rl]   (55)

and accordingly

H^l = (λ_l/λ_r) K^l R_rl (K^r)⁻¹ H^r + λ_l [0  0  K^l t_rl].   (56)

For the right camera we have [see (3)]

H^r = λ_r K^r [r1^r  r2^r  t^r].   (57)



By eliminating r1^r, r2^r, and t^r between (56) and (57), the following relations between the columns in H^l and H^r are obtained:

h1^l = (λ_l/λ_r) K^l R_rl (K^r)⁻¹ h1^r   (58)

h2^l = (λ_l/λ_r) K^l R_rl (K^r)⁻¹ h2^r   (59)

h3^l = (λ_l/λ_r) K^l R_rl (K^r)⁻¹ h3^r + λ_l K^l t_rl.   (60)

Using images from at least two stereo views (two positions of the stereo head), the matrix K^l R_rl (K^r)⁻¹ can be obtained from (58) and (59), up to a scale factor. This follows from the fact that this matrix is independent of the pose of the stereo head, while the homographies from the calibration plane to the image planes change for each position. The calculated matrix is denoted by

Fig. 4. Setup used for the experiments connected to Table I.

TABLE I EXPERIMENTS WITH CHANGING TRANSLATION DIRECTION ACCORDING TO FIG. 4

H̃∞ ≅ K^l R_rl (K^r)⁻¹   (61)

since it actually is an expression for the infinite homography between the image planes of the cameras in the stereo head, i.e., the homography between the image planes induced by the plane at infinity (see [37]). The infinite homography can be used for relating the matrices ω^r and ω^l in the following way:


ω^l ≅ (H̃∞)⁻ᵀ ω^r (H̃∞)⁻¹.   (62)

Using (62) together with the single-camera constraints for each camera, (10) and (11), the two cameras in the stereo head can be calibrated simultaneously. In [26], in the case of self-calibration, it is argued that (62) does not provide any additional constraints on the calibration problem. Further, we have observed that two motions, which is the minimum number for single-camera calibration, are still needed for calibrating the stereo head after adding (62) into the calculations. However, in the experiments presented in Section V-B, it is shown how the addition of (62) influences the calculations under noisy conditions.

1) A Note on Determining T_rl: At the end of Section IV-A, we commented on how to calculate the rigid transformation between the left and right cameras of the stereo pair. However, T_rl can be obtained without first calculating the transformations from the calibration plane to the cameras. If H̃∞ has been calculated, R_rl can be obtained from (61) as

R_rl ≅ (K^l)⁻¹ H̃∞ K^r.   (63)
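The following sketch estimates H̃∞ from the column correspondences (58) and (59) with a homogeneous least-squares (DLT-style) solution and then recovers R_rl according to (63); the cross-product elimination of the unknown scale and all function names are our own implementation choices.

    import numpy as np

    def estimate_H_inf(col_pairs):
        # Each pair (hr, hl) satisfies hl ~ H_inf hr up to scale, see
        # (58)-(59); col_pairs holds the first two homography columns from
        # each stereo view, so two views give four pairs.
        A = []
        for hr, hl in col_pairs:
            # Cross product with hl eliminates the unknown scale.
            A.append(np.concatenate([hr * 0, -hl[2] * hr,  hl[1] * hr]))
            A.append(np.concatenate([hl[2] * hr, hr * 0, -hl[0] * hr]))
        _, _, Vt = np.linalg.svd(np.asarray(A))
        return Vt[-1].reshape(3, 3)

    def relative_rotation(K_l, K_r, H_inf):
        # Sketch of (63): the product is a rotation only up to scale, so
        # the result is projected onto the rotation group via SVD.
        M = np.linalg.inv(K_l) @ H_inf @ K_r
        U, _, Vt = np.linalg.svd(M)
        R = U @ Vt
        if np.linalg.det(R) < 0:
            R = -R
        return R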

V. EXPERIMENTS

Results of experiments, using different implementations of the theory derived in this paper, will be presented here. We will start by studying the practical performance of plane-based calibration from translations in Section V-A. Then, in Section V-B, the practical effect of relation (62) on the calibration of a stereo head from a planar object is analyzed. All homographies were estimated by an initial least-squares estimation and a subsequent nonlinear minimization of the projection errors using the Levenberg–Marquardt algorithm.
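A minimal sketch of the initial least-squares (DLT) homography estimation step could look as follows; the subsequent Levenberg–Marquardt refinement of the reprojection error is only indicated in a comment, and the normalization choice is an assumption.

    import numpy as np

    def estimate_homography(obj_pts, img_pts):
        # Linear least-squares (DLT) estimate of the plane-to-image
        # homography; obj_pts holds planar object points (X, Y) and
        # img_pts the corresponding image points (u, v).
        A = []
        for (X, Y), (u, v) in zip(obj_pts, img_pts):
            p = np.array([X, Y, 1.0])
            A.append(np.concatenate([p * 0, -p, v * p]))
            A.append(np.concatenate([p, p * 0, -u * p]))
        _, _, Vt = np.linalg.svd(np.asarray(A))
        H = Vt[-1].reshape(3, 3)
        return H / np.linalg.norm(H[:, 0])   # a convenient normalization

    # In the paper, this linear solution is refined by minimizing the
    # reprojection error with Levenberg-Marquardt, e.g., using
    # scipy.optimize.least_squares with method="lm".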

A. Experiments Using Translational Motions

For the experiments on calibration using translational motions, both synthetic image data and images from a real robot vision system have been used. We will begin with the synthetic data set.

1) Synthetic Data: Calibration has been performed using computer-generated data, to be able to analyze different aspects of the calibration technique under exactly known conditions. The different parameters that have been analyzed include the number of images used, the direction of the translational motion in relation to the calibration plane, and the sensitivity to noise in the input data.

Projections of a grid with 9 × 7 points were calculated. The distances between the grid points, both horizontally and vertically, were five length units (l.u.). This is to be compared to the distance between the center of the grid and the focal point of the camera in the initial position, which was 100 l.u. in all experiments in this section. The length of translation between each imaging instance was always chosen as ‖t̃‖ = 5 l.u. Gaussian noise with a standard deviation of σ pixels was added to the coordinates of the image points. This standard deviation is to be compared to the artificial image size, which was approximately 300 × 240 pixels. For each setting, the calibration was performed 100 times, and the means and the standard deviations of the different parameters were calculated.

We started by studying Case 1 from Section III-A, i.e., where the length of the translational motion is known but the direction is unknown. One natural setup for this case is that the camera is directed toward the calibration plane and that the camera then is translated toward the plane, keeping it in the field of view during the whole sequence. As mentioned in Section III-A, calibration can be performed in this case as long as the calibration plane is not too close to being parallel to the image plane of the camera, and as long as the translational motion is not parallel to the plane.



TABLE II EXPERIMENTS USING DIFFERENT NUMBERS OF IMAGES OBTAINED ALONG THE TRANSLATION DIRECTION

As illustrated in Fig. 4, the orientation of the camera was changed in relation to the calibration plane by rotating the camera θ degrees around the y axis, where the center of rotation is located at the origin of the calibration grid. The camera was then translated toward the center of the grid, and five images were obtained along each translation direction. In the camera matrix used for these experiments, the aspect ratio was set to 1 and the skew to 0, since we are only able to estimate three different parameters when the direction of translation is unknown. The calibration results using different θ are shown in Table I. There is a quite natural decrease in the quality of the estimation of the x coordinate of the principal point when increasing θ, since the spread of the projections of the grid points in the x direction gets more and more narrow with increasing θ. A slight decrease in quality can also be noticed in the estimation of the focal length f, while the estimation of y0 seems totally unaffected by increasing θ. The focal length also seems to be consistently underestimated in this setting. Another thing to be noticed is that the standard deviation of the estimations rises when the setup is close to the degenerate configuration of parallel image and calibration planes, at θ close to 0.

TABLE III EXPERIMENTS WITH DIFFERENT AMOUNTS OF ADDED GAUSSIAN NOISE WITH STANDARD DEVIATION σ, MEASURED IN PIXELS

TABLE IV EXPERIMENTS WHEN PERFORMING TRANSLATIONS WHICH DEVIATE FROM THE PRESUMED TRANSLATION LENGTH ‖t̃‖ = 5 l.u.

Next, we turn to Case 2 from Section III-A, i.e., where the length of translation is unknown but the direction is known. Since the translation direction is given in the coordinate system of the calibration plane, a setting with known direction is easily achieved when the calibration plane is placed on the floor and the camera is translated in the direction orthogonal to the floor. However, for the setup to be nondegenerate, neither the x nor the y axis of the image plane can be parallel to the calibration plane. Therefore, in these experiments, the camera is tilted 30 degrees around the x axis and 30 degrees around the y axis in relation to the calibration plane. The camera matrix used for these experiments was chosen with an aspect ratio different from 1. Table II shows the influence of the number of images obtained along the translation direction on the calibration results, using this setup. It can be seen that there is a clear decrease in the standard deviation of the estimations when more data is used for the computations.

We also tested the sensitivity to different amounts of added noise, using the same setup with five images obtained for each estimation. Results of calibration using different σ are displayed in Table III. It can be seen that the standard deviations of the estimations quite naturally increase with increasing noise. A slight positive bias can be observed for all the estimates, but it is kept at a relatively low, acceptable level.

Another interesting aspect to investigate is the sensitivity to deviations from the expected performed motion. Both Case 1, with errors added in the translation length, and Case 2, with errors added in the translation direction, have been investigated. The setup is the same as that used in the preceding experiment (Table III), with the exception that the aspect ratio is set to 1 for Case 1, as in the first presented experiment (Table I). Starting with Case 1, the results when performing increasingly longer translations, with respect to the presumed length of 5 l.u., are shown in Table IV. Turning directly to Case 2, the results when performing translations that deviate from the presumed direction of translation, with a rotation of δ degrees around the x axis, are presented in Table V. As could be expected, the accuracy of the results is directly affected by the accuracy in the motion parameters. This means that high precision in the robot motion is of paramount importance for accurate results.


TABLE V EXPERIMENTS WHERE THE PERFORMED TRANSLATION DEVIATES WITH A ROTATION OF δ DEGREES AROUND THE X AXIS FROM THE EXPECTED TRANSLATION DIRECTION


2) Real Data: The robot used for the experiments on a real robot vision system is a modified ABB IRB2000 (see Fig. 5), which is capable of moving in all 6 DOFs. The camera is mounted by a ball-head camera holder on the hand of the robot, so that the orientation of the camera can be changed with respect to the orientation of the robot hand coordinate system. Fig. 6 shows an image from one of the used image sequences, picturing the calibration plane. In the following experiments, approximately ten images were used in each sequence. Experiments have been performed for both a single camera and a stereo head. In the stereo head case, the head was simulated by making two identical translational motions, starting from two different locations using different orientations and settings of the camera. The two different camera settings will be referred to as the left and the right cameras, respectively.

Fig. 5. Robot used in the experiments on real image data.

Fig. 6. Image from one of the real data sequences.

As a reference, the cameras were calibrated using general motions and the basic constraints for plane-based calibration, (10) and (11), which gave reference calibration matrices for the left and the right cameras.

To test the practical performance of calibration from pure translations, the left camera was first translated in the direction orthogonal to the floor, i.e., n̂ = (0, 0, 1)ᵀ. By using (22)–(26), the camera was calibrated assuming zero skew, s = 0. The value of the focal length in the resulting intrinsic camera matrix is clearly underestimated compared to the reference value. As seen in Fig. 6, the tilting of the camera was rather limited in this experiment. A stronger tilt might have given a more accurate estimate.

The left camera was also calibrated using a motion sequence where the translation direction was changed after each image. Here, the directions of the translations were unknown, and instead we used a known fixed length of the translations. Hence, (22)–(24) and multiple instances of (27) were used for the calculations. Concerning the focal length, the resulting matrix seems to be a better estimate than the one from the first sequence. However, comparing the estimated aspect ratios in the different cases, the estimate from the first sequence seems to be more accurate concerning this quantity.

The stereo head configuration of the left and right cameras was also calibrated, using the theory presented in Section IV-A. The camera pair was translated in the direction orthogonal to the floor, but this information was actually not used. Instead, a known length of the translations was used and (33)–(42) were applied, which gave one calibration matrix for each camera. The result obtained here is very close to the reference method, where general motions of the cameras were used, at least with respect to the focal length and the aspect ratio. It seems that the added camera view constrains the problem in an effective way.


Fig. 7. Mean of the relative errors in the estimation of f for the left camera. The solid line corresponds to the use of (62). The dashed line corresponds to calculation without using this relation.


Fig. 8. Analog to Fig. 7 for the right camera.

When, as in the last experiment, the lengths of the translations are known, and it is known that the translation direction is unchanged through the whole image sequence, additional constraints can be used in the estimation of the homographies. Indeed, by using the known translation lengths in the estimation, only 11 parameters need to be estimated for all the homographies together for each camera. In general, by using as much a priori information about the structure of the homographies as possible, the calibration process becomes more stable and accurate.

B. Experiments Using a Stereo Head and General Motions

To test the influence of relation (62) on the calibration of a stereo head, experiments on synthetic data have been performed. This allowed for a study of the calibration procedure in the presence of different amounts of added noise. Our input data in this case consisted of a planar grid with 10 × 14 points. Six different stereo views of the plane were used. Following [38], we describe the rotations by a vector that is parallel to the rotation axis and has a length equal to the angle of rotation measured in degrees; the translations are described by a 3-vector. The six orientations and positions of the right camera in the stereo pair were specified in this representation.

The left camera was then rotated and translated by a fixed rigid transformation with respect to the right camera. The two simulated cameras were calibrated both separately, using only the standard constraints (10) and (11), and simultaneously, using the addition of (62). Noise was added to the projected image points, from a standard deviation of 0.1 pixels up to 0.8 pixels in steps of 0.1. In Figs. 7 and 8, the mean of the relative error from 100 simulations of estimating f, i.e., the upper leftmost element in the calibration matrix K, is plotted for the left and right cameras, respectively. The solid line corresponds to the use of (62) and the dashed line to estimation without using this relation. It can be seen that (62) gives an improvement of the calibration for the left camera, while it hardly affects the right camera. The result is similar for the other intrinsic parameters. Probably, the views of the calibration grid by the left camera are more sensitive to the added noise, and the right camera helps the left camera to achieve better estimations through the simultaneous calibration.

Fig. 9. Mean of the relative error when estimating x0 for the left camera, when this camera is disturbed by twice the amount of noise as the right camera.

One application of the findings presented here would be the case of calibrating two cameras when it is known that one of the cameras is more difficult to calibrate than the other one. In this case, it should be a good idea to first calibrate the superior camera on its own and then calibrate the two cameras simultaneously. This would probably lead to more accurate estimates of the intrinsic parameters for the more error-prone camera. An example of this is illustrated in Fig. 9, where results are shown



from the calibration of two cameras when the left camera is disturbed by twice the amount of noise as the right camera. The plot shows the mean error in pixels of 50 simulations when estimating x0 for the left camera. The noise level in this camera is twice as high, as is indicated on the horizontal axis. The dashed line represents single-camera calibration and the solid line represents simultaneous calibration of the two cameras. The same effect is seen for the other intrinsic camera parameters.

VI. CONCLUSION AND FUTURE WORK

Extensions of the plane-based calibration theory due to Zhang [5] and Sturm and Maybank [3] have been presented in this paper. It has been shown what relations can be derived if the motion is assumed to be purely translational. This was done in a manner that adequately connects the constraints for calibration from general motions to the new constraints for translational motions. This theory is also considered in the case of stereo head calibration and for the calculation of the hand–eye orientation in a robot vision setting.

Experiments on computer-generated data have been performed in order to analyze how different parameters, such as the number of images used and the amount of added noise, influence the calibration results. The obtained results mostly follow the expectations, and the method seems to be usably accurate. However, a slight negative bias in the estimates could be noticed in the experiments with known translation length but unknown direction. A thorough error analysis of the calibration method might give insight into how this bias could be compensated for.

A thoroughly tested bundle adjustment algorithm, i.e., a nonlinear iterative scheme minimizing the reprojection errors, has been applied using the result from the presented linear method as an initial value. The algorithm was modified to respect the constraints of the particular settings presented in this paper. However, this did not give any notable improvement of the results. The reprojection errors were already so small in the initial state that there was hardly any room for improvement.

As might be expected, the method depends on good accuracy in the robot's translational movement. Experiments where errors were added to the translation length or translation direction, which made them deviate from the presumed values, showed that these errors were transferred to errors in the camera parameters and resulted directly in poorer estimations.

Experiments were also performed using a real robot vision system. For both of the camera settings calibrated in these experiments, one camera at a time, the calculated value of the focal length was lower than the value obtained by the reference method. One application that seems to work very well is the simultaneous calibration of the two cameras in a stereo head using translational motions. The results obtained in this setting on the real system show strong agreement with the results obtained using the basic constraints for plane-based calibration and general motions. It seems as though the additional view of the plane constrains the calibration of both of the cameras in an effective way. The critical configurations of the relative orientation of the two image planes and the calibration plane, as well as the choice of translation direction, have not yet been investigated. A complete study of these degenerate configurations is another idea for future work.

A method for calculating the infinite homography between the two image planes in a stereo head, using the homographies estimated between the calibration plane and the image planes, has also been derived. Its possible usage and usefulness for simultaneous calibration of the two cameras in the stereo head have been discussed and illustrated using experiments.

REFERENCES

[1] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.
[2] Z. Zhang, "Flexible camera calibration by viewing a plane from unknown orientations," in Proc. Int. Conf. Comput. Vis., vol. 1, Kerkyra, Greece, Sep. 1999, pp. 666–679.
[3] P. Sturm and S. Maybank, "On plane-based camera calibration: A general algorithm, singularities, applications," in Proc. Conf. Comput. Vis. Pattern Recogn., vol. 1, 1999, pp. 432–437.
[4] R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE J. Robot. Automat., vol. RA-3, no. 4, pp. 323–344, Aug. 1987.
[5] Z. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, Nov. 2000.
[6] S. Maybank and O. Faugeras, "A theory of self-calibration of a moving camera," Int. J. Comput. Vis., vol. 8, no. 2, pp. 123–152, Aug. 1992.
[7] O. Faugeras, T. Luong, and S. Maybank, "Camera self-calibration: Theory and experiments," in Proc. Eur. Conf. Comput. Vis., Stockholm, Sweden, May 1994, pp. 471–478.
[8] A. Heyden and K. Åström, "Euclidean reconstruction from constant intrinsic parameters," in Proc. Int. Conf. Pattern Recogn., Vienna, Austria, 1996, pp. 339–343.
[9] Q.-T. Luong and O. Faugeras, "Self-calibration of a moving camera from point correspondences and fundamental matrices," Int. J. Comput. Vis., vol. 22, no. 3, pp. 261–289, 1997.
[10] B. Triggs, "Autocalibration and the absolute quadric," in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., Puerto Rico, 1997, pp. 609–614.
[11] T. Brodsky and C. Fermüller, "Self-calibration from image derivatives," Int. J. Comput. Vis., vol. 48, no. 2, pp. 91–114, 2002.
[12] L. Dron, "Dynamic camera self-calibration from controlled motion sequences," in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., Jun. 1993, pp. 501–506.
[13] B. Triggs, "Autocalibration from planar scenes," in Proc. Eur. Conf. Comput. Vis., vol. 1, Freiburg, Germany, Jun. 1998, pp. 89–105.
[14] E. Malis and R. Cipolla, "Camera self-calibration from unknown planar structures enforcing the multiview constraints between collineations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1268–1272, Sep. 2002.
[15] P. Gurdjos and P. Sturm, "Methods and geometry for plane-based self-calibration," in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., vol. 1, Jun. 2003, pp. 491–496.
[16] R. Y. Tsai and R. K. Lenz, "A new technique for fully autonomous and efficient 3D robotics hand/eye calibration," IEEE Trans. Robot. Automat., vol. 5, no. 3, pp. 345–358, Jun. 1989.
[17] Y. C. Shiu and S. Ahmad, "Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX = XB," IEEE Trans. Robot. Automat., vol. 5, no. 1, pp. 16–29, Feb. 1989.
[18] H. Zhuang, Z. S. Roth, and R. Sudhakar, "Simultaneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the form AX = YB," IEEE Trans. Robot. Automat., vol. 10, no. 4, pp. 549–554, Aug. 1994.
[19] R. Horaud and F. Dornaika, "Hand-eye calibration," Int. J. Robot. Res., vol. 14, no. 3, pp. 195–210, 1995.
[20] F. Dornaika and R. Horaud, "Simultaneous robot-world and hand-eye calibration," IEEE Trans. Robot. Automat., vol. 14, no. 4, pp. 617–622, Aug. 1998.
[21] S. Rémy, M. Dhome, J. Lavest, and N. Daucher, "Hand-eye calibration," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 1997, pp. 1057–1065.
[22] K. Daniilidis, "Hand-eye calibration using dual quaternions," Int. J. Robot. Res., vol. 18, no. 3, pp. 286–298, 1999.
[23] N. Andreff, R. Horaud, and B. Espiau, "Robot hand-eye calibration using structure-from-motion," Int. J. Robot. Res., vol. 20, no. 3, pp. 228–248, 2001.


[24] S. Ma, "A self-calibration technique for active vision systems," IEEE Trans. Robot. Automat., vol. 12, no. 1, pp. 114–120, Feb. 1996.
[25] H. Malm and A. Heyden, "Hand-eye calibration from image derivatives," in Proc. Eur. Conf. Comput. Vis., vol. II, Dublin, Ireland, Jun. 2000, pp. 493–507.
[26] A. Zisserman, P. Beardsley, and I. Reid, "Metric calibration of a stereo rig," in Proc. Workshop Representation of Visual Scenes, Cambridge, MA, Jun. 1995, pp. 93–100.
[27] Z. Zhang, Q.-T. Luong, and O. Faugeras, "Motion of an uncalibrated stereo rig: Self-calibration and metric reconstruction," IEEE Trans. Robot. Automat., vol. 12, no. 1, pp. 103–113, Feb. 1996.
[28] F. Devernay and O. Faugeras, "From projective to Euclidean reconstruction," in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., San Francisco, CA, 1996, pp. 264–269.
[29] L. de Agapito, D. Huynh, and M. Brooks, "Self-calibrating a stereo head: An error analysis in the neighborhood of degenerate configurations," in Proc. 6th Int. Conf. Comput. Vis., Bombay, India, 1998, pp. 747–753.
[30] R. Horaud and G. Csurka, "Self-calibration and Euclidean reconstruction using motions of a stereo rig," in Proc. Int. Conf. Comput. Vis., Bombay, India, 1998, pp. 96–103.
[31] F. Dornaika and C. R. Chung, "Stereo geometry from 3-D ego-motion streams," IEEE Trans. Syst., Man, Cybern. B, vol. 33, no. 2, pp. 308–323, Apr. 2003.
[32] D. Demirdjian, A. Zisserman, and R. Horaud, "Stereo autocalibration from one plane," in Proc. Eur. Conf. Comput. Vis., vol. 2, Dublin, Ireland, 2000, pp. 625–639.
[33] H. Malm and A. Heyden, "Stereo head calibration from a planar object," in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., vol. 2, Kauai, HI, Dec. 2001, pp. 657–662.
[34] H. Malm and A. Heyden, "Plane-based camera calibration: The case of pure translation," in Proc. Workshop Mach. Vis. Applicat., Nara, Japan, Dec. 2002, pp. 146–149.
[35] H. Malm and A. Heyden, "Simplified intrinsic camera calibration and hand-eye calibration for robot vision," in Proc. Int. Conf. Intell. Robots Syst., vol. 1, Las Vegas, NV, Oct. 2003, pp. 1037–1043.
[36] J. G. Semple and G. T. Kneebone, Algebraic Projective Geometry. Oxford, U.K.: Clarendon, 1952.
[37] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[38] Z. Zhang, "A flexible new technique for camera calibration," Microsoft Research, Tech. Rep. MSR-TR-98-71, Dec. 1998.
[39] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1996.


Henrik Malm received the M.Sc. degree in computer science and technology and the Ph.D. degree in applied mathematics from Lund Institute of Technology, Lund University, Lund, Sweden, in 1998 and 2003, respectively. In the autumn of 2000, he was a Guest Researcher with the Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park. He is currently a Postdoctoral Fellow with the Lund Vision Group at the Department of Cell and Organism Biology, Lund University. He has authored or coauthored a significant number of papers in international journals and conference proceedings on robot vision, camera calibration, hand-eye calibration, visual illusions, and nonlinear diffusion filtering. He is currently working on biologically inspired night-vision systems.

Anders Heyden (M'01) received the M.Sc. degree in engineering physics and the Ph.D. degree in applied mathematics from Lund Institute of Technology, Lund University, Lund, Sweden, in 1989 and 1995, respectively. He became a Reader at Lund University in 1999 and was promoted to Professor of mathematics in 2002. In 2001, he became a full Professor of applied mathematics with Malmö University, Malmö, Sweden, where he is currently leading research in applied mathematics. He has made significant contributions within the fields of multiple view geometry, autocalibration, and reconstruction methods. His current research interests are mainly computer vision and image analysis, especially surface reconstruction and image segmentation based on variational methods, dynamic vision, and auto-calibration, but also mathematical biology. He has authored or coauthored more than 100 papers published in international journals and conference proceedings and is the inventor of 12 patents. He has been involved in the start-up of several companies within the area of image analysis: CellaVision AB, Precise Biometrics AB, WeSpot AB, and Ludesi AB. He is currently a Member of the Editorial Board of the International Journal of Computer Vision and a Member of the Conference Board of the European Conference on Computer Vision. Prof. Heyden was the recipient of an honorable mention for the Marr Prize, has been an invited speaker at the Asian Conference on Computer Vision, and was the recipient of the Best Paper Award at the International Conference on Automation, Robotics, and Vision.