3 DOF Modular Eye For Smart Car

Jin Xu, Kok Sin Tay, Guang Chen and Ming Xie
School of Mechanical & Production Engineering
Nanyang Technological University
Singapore 639798

Abstract
This paper presents our work on the design of a 3 DOF modular eye for a smart car. The design is specially guided by the requirement of facilitating the cancellation of the camera frame (i.e., the merging of the camera frame with the pan-tilt frame). The design solution and the vision principle of canceling the camera frame are discussed in detail, with real results.

1. Introduction
The guidance of a smart car needs an active vision system. We propose the solution of a modular eye for the following reasons:
• Re-configurability: a modular eye can be mounted at different places for different purposes, and a group of modular eyes can be re-configured for a changed situation.
• Expandability: as many modular eyes as needed can be added for a special purpose.
During the modular design, we also focus on the ability to merge the camera frames with the pan-tilt frames in real time, in order to solve the following two problems:
Problem 1: large number of coordinate systems. For a vision-guided smart car, the car alone has 3 DOF (one for rotation and two for translation on the road plane), and there are an additional 2*N DOF for the pan-tilt motions of the cameras (N is the number of cameras). Each DOF must be assigned a coordinate system to support kinematic and dynamic modeling. If the camera frames are not merged with the pan-tilt frames, there are an additional N frames for the cameras. This large number of coordinate systems is quite troublesome.
Problem 2: requirement for pure rotation of the camera. There are many cases where we need a camera to perform a pure rotation, since there is a quite simple transformation between the images taken by such a camera. For example, it is easy to build a panoramic view from an image sequence by applying this transformation to the images, provided the images are captured by a camera undergoing a pure rotation (a sketch is given at the end of this section).
If the camera frame is merged with the motion frame, we solve the second problem and remove N DOF from the first problem. In our solution, the design part and the vision

principle part are tightly combined in order to facilitate this merging. We discuss these two aspects respectively in the following sections.
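Problem 2 can be made concrete with a short sketch. The following Python fragment is a minimal sketch using OpenCV and numpy; the image filenames, the choice of ORB features and the canvas size are our own assumptions, not part of the paper. It estimates the projective transform between two images from a purely rotating camera and pastes them onto a common canvas:

```python
# Sketch: stitching two images from a purely rotating camera. Under pure
# rotation the images are related by a single 3x3 projective transform.
import cv2
import numpy as np

img1 = cv2.imread("before.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("after.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# At least four matches determine the transform; RANSAC rejects outliers.
M, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the first image into the second image's frame and paste both onto
# a wider canvas to form a simple two-image panorama.
h, w = img2.shape
pano = cv2.warpPerspective(img1, M, (2 * w, h))
pano[:, :w] = np.maximum(pano[:, :w], img2)
```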

2. DESIGN OF MODULAR EYE OF 3 DOF
2.1. Problem statement
Researchers have developed many kinds of active eyes, but few of them seem to have realized the importance of a modular design. Normally their eyes are just one part of a whole system; it would be quite difficult to add one more eye to such a system or to change the configuration of the eyes. This paper presents a new design solution in which the eye is a module by itself. The advantages of being modular are the flexibility to re-configure and expand a system based on the design. Another advantage is that modular eyes are interchangeable: when one eye fails, another eye can replace it without problems of incompatibility. Some applications of the modular eyes are:

• Guidance for a smart car

Figure 1 Different configurations on a car

Different configurations of our modular eyes on a smart car are demonstrated in Figure 1, where V stands for the viewing area of the cameras.

• Body Scanner for Clothes Sizing

Figure 2 Body scanner

For the body scanner shown in Figure 2, modular eyes can be placed at any positions needed to get a full view of the clothes. Besides being modular, another distinct characteristic of our design is the ability to adjust the camera in real time to realize a pure rotation. For a vision system, this ability is really important. To achieve it, we must make sure that the optical center coincides with the pan-tilt center. For a high-resolution camera, the offset between the camera system and the pan-tilt system needs real-time adjustment whenever the visual parameters of the camera change: if a zoom in/out action, which affects the position of the optical center, is performed, the optical center may no longer coincide with the pan-tilt center and an adjustment is needed.


In this part we give our design solution for this requirement; in the following part, we propose an algorithm for performing the adjustment.

Such a modular eye with the adjustment mechanism is quite suitable for real-time applications such as a smart car. Figure 4 shows such an eye mounted on our car.

Figure 4 Modular eye on a smart car

2.2 Design solution
To satisfy the pure-rotation condition, the pan and tilt axes of the mechanical parts have to intersect each other. Besides the pan/tilt DOF, our design enables the camera to translate along the optical axis. With this additional DOF, we can adjust the position of the camera to guarantee a pure rotation. A rack and pinion implemented on the side of the camera generates this movement.

Figure 3 Design solution (front, plan and side views, with Axis A and Axis B marked)

Figure 3 shows the three-view drawings of our design, where Axis A represents the pan axis and Axis B represents the tilt and zooming axis.

2.3 Results and conclusion
Our design has proven to be a rigid and stable product when operating under standard conditions. Standard items used in the design are servomotors, timing belts and harmonic drives, which provide good transmission of torque and speed. Sensors are installed to detect the limit positions and to set the eye to its home position. Orientations and placement of items are based on their stability and the compactness of the design.

3. Camera Frame Cancellation

3.1. Problem statement
The modular eye we designed is shown in Figure 3 in Section 2. Figure 5 shows the assignment of all the coordinate systems. They include:

Fw: W-XwYwZw is a global 3-D world reference coordinate system. It is a fixed reference frame.

Fet: Et-XetYetZet is the 3-D coordinate system assigned to the tilt motion of the eye (up/down), and Xet is the tilt axis.

Fep: Ep-XepYepZep is the 3-D coordinate system assigned to the pan motion of the eye, and Yep is the pan axis.


Fc: C-XYZ is the 3-D coordinate system assigned to each camera, with origin at the optical center C. The Z axis coincides with the optical axis, and the X and Y axes are parallel to the pixel row and column directions (the x and y axes of the image coordinate system Fi, defined below).

Fi: o-xy is the 2-D coordinate system assigned to the image plane. The x and y axes are parallel to the pixel row and column directions respectively.

The problem here is how to determine whether a camera undergoes a pure rotation when the pan and tilt axes of the corresponding motion frame rotate. If this can be solved, the camera frame can be merged with the motion frame controlling the pan/tilt motion (this problem is also mentioned in [6, 7, 8, 9], but none of them proposes a practical solution). In the following parts, we propose a solution to this problem.
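To make the role of these frames concrete, the following sketch (plain numpy; the angles and the 2 cm offset are hypothetical values, not measurements from our eye) composes the kinematic chain from Fw to Fc and shows how merging the camera frame with the tilt frame removes one constant transform:

```python
# Sketch: why merging Fc with the pan-tilt frames removes a transform from
# the kinematic chain. Pan is taken about Y and tilt about X, matching the
# frame definitions above; all numeric values are hypothetical.
import numpy as np

def rot_y(a):  # pan: rotation about the vertical Y axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def rot_x(a):  # tilt: rotation about the horizontal X axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def trans(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

pan, tilt = np.radians(10.0), np.radians(-5.0)

# Unmerged: a residual offset between the tilt frame and the camera frame
# (optical center not at the pan-tilt center), here 2 cm along Z.
T_et_c = trans(0.0, 0.0, 0.02)
T_w_c_unmerged = rot_y(pan) @ rot_x(tilt) @ T_et_c

# Merged: the camera frame coincides with the tilt frame, so T_et_c drops
# out and the chain is a pure rotation about the pan-tilt center.
T_w_c_merged = rot_y(pan) @ rot_x(tilt)

# Nonzero residual: the optical center translates as the eye pans/tilts.
print(T_w_c_unmerged[:3, 3] - T_w_c_merged[:3, 3])
```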

Figure 5 Coordinate systems assignment (the frames Fw, Fep, Fet and Fc with their axes)

3.2. Geometrical principle of merging
In computer vision, it is well known that two consecutive images are related to each other by a 2-D projective transformation matrix if the camera has undergone a pure rotation about any axis that goes through the origin of the camera frame [3]. This relationship can be described as follows:

    [s·x2]   [m11 m12 m13]   [x1]
    [s·y2] = [m21 m22 m23] • [y1]    (Eq.1)
    [ s  ]   [m31 m32 m33]   [ 1]

where (x1, y1) and (x2, y2) are the coordinates of the same object point in two consecutive images taken by a camera that undergoes a pure rotation, and the mij (i, j = 1, 2, 3) are constants that can be determined once four pairs of matches are found. In practice, we apply the above equation to two images taken by a camera that is moved by the pan and tilt motions of the corresponding motion frame. Then, we define the following error function:

    ε = ||I(t2) − M3×3 • I(t1)||²    (Eq.2)

If ε is smaller than a pre-defined threshold value, the camera frame is merged with the motion frame. Otherwise, the camera needs to be adjusted. A sketch of the estimation of M3×3 is given below.
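The constants mij can be estimated by the standard direct linear transform (DLT). The following is a minimal numpy sketch; the function name, the example coordinates and the normalization to m33 = 1 are our own choices, and a real implementation would also normalize the point coordinates for numerical stability:

```python
# Sketch: DLT estimate of the 3x3 matrix in Eq.1 from >= 4 correspondences.
import numpy as np

def estimate_m(pts1, pts2):
    """pts1, pts2: (N, 2) arrays of matched (x, y) image coordinates."""
    A = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        # Each pair contributes two rows of the homogeneous system A m = 0.
        A.append([x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1, -x2])
        A.append([0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1, -y2])
    # The solution is the right singular vector with the smallest singular
    # value, reshaped to 3x3 and scaled so that m33 = 1.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    M = vt[-1].reshape(3, 3)
    return M / M[2, 2]

# usage with four hand-picked correspondences (hypothetical coordinates):
pts1 = np.array([[10, 10], [200, 15], [205, 180], [12, 175]], float)
pts2 = np.array([[32, 12], [222, 20], [224, 186], [30, 178]], float)
M = estimate_m(pts1, pts2)
```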

3.3. Implementation
In this part, we present the algorithm for merging the camera frame with the motion frame. The algorithm involves two steps:


Step 1: To determine whether the rotation is a pure one:



The procedure is as follows: we first acquire one image, and then rotate the motion frame by either a pan rotation or a tilt rotation. After the motion of the motion frame, we acquire the second image. Subsequently, we establish the correspondence of at least four pairs of matched points. From these matches, we estimate the matrix M3×3. Knowing this matrix, we can calculate the error between the second image and the transformed image M3×3 • I(t1) by a certain method f:

    ε = f( ||I(t2) − M3×3 • I(t1)||² )
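As a sketch of this test (OpenCV again; the mean-squared-intensity error and the threshold are our own choices for the unspecified method f):

```python
# Sketch of the Step 1 test: warp I(t1) by the estimated M and compare it
# with I(t2). The error metric and threshold are hypothetical choices.
import cv2
import numpy as np

def pure_rotation_error(img_t1, img_t2, M):
    """Error between I(t2) and the warp of I(t1) by the estimated M."""
    h, w = img_t2.shape
    warped = cv2.warpPerspective(img_t1, M, (w, h))
    # Compare only where the warped image carries data, i.e. inside the
    # region covered by the transformed first image.
    mask = cv2.warpPerspective(np.ones_like(img_t1), M, (w, h)) > 0
    diff = img_t2.astype(float) - warped.astype(float)
    return float(np.mean(diff[mask] ** 2))

# usage, with M estimated from matches as in the previous sketches:
# if pure_rotation_error(img_t1, img_t2, M) < EPS:  # EPS: chosen threshold
#     ...  # pure rotation: camera frame merged with the motion frame
```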


By testing the error function, we have the following two cases:
(1) If the error is less than a pre-defined threshold value, the camera has undergone a pure rotation and the camera frame is merged with the corresponding motion frame controlling the pan and tilt motion.
(2) If the error is greater than the threshold value, the camera has undergone a general motion and is not merged with the motion frame. In this case, we need to adjust the offset of the camera in order to minimize the error function. The rest of this section discusses how to do the camera adjustment.

Step 2: To adjust the camera to realize a pure rotation:
For our design, the only parameter we can adjust is the camera offset along the optical axis, and we adjust this offset to realize a pure rotation as follows:
(1) First we fix the pan axis and let the camera perform a rotation around the tilt axis. After the rotation, we calculate the error value defined in Eq.2 above. Then we methodically vary the camera's offset over a certain range; what we expect to get is a curve like C1 in Figure 6 below. A minimum point can be found on the curve (the minimum value may not be exactly zero); call the offset at this point x0.

Figure 6 Error curve (the error ε against the camera offset: curves C1 and C2 with minima at x0 and x1, the final offset xf, and the range Δx marked on the offset axis)

(2) Then we fix the tilt axis. At each offset value, the camera performs a pan motion of a chosen angle. Again we calculate the value of the error function and obtain a curve like C2, with a minimum at some offset x1.

(3) The values of x0 and x1 may not be exactly equal, so we choose their average as the final offset xf. At this offset, the combined error should be at or near its minimum, as shown in Figure 6. If the error is small enough, our method has reached a pure rotation, i.e., the merging of the coordinate systems. A sketch of this search procedure is given below.
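The following sketch outlines the search. The motion and grabbing calls (move_camera_to, capture, rotate_tilt, rotate_pan) and the matcher match_points are hypothetical stand-ins for the eye's control and grabbing interfaces; estimate_m and pure_rotation_error are the functions sketched earlier, and the step count and rotation angle are arbitrary choices:

```python
# Sketch of the Step 2 offset search; hardware calls are hypothetical.
import numpy as np

def best_offset(offsets, rotate, angle=np.radians(10.0)):
    """Scan candidate offsets; return the one minimizing the Eq.2 error."""
    errors = []
    for x in offsets:
        move_camera_to(x)        # hypothetical: set the rack-and-pinion offset
        img1 = capture()         # hypothetical: grab an image
        rotate(angle)            # pan or tilt rotation of the motion frame
        img2 = capture()
        rotate(-angle)           # return to the home orientation
        pts1, pts2 = match_points(img1, img2)  # hypothetical: matching,
        M = estimate_m(pts1, pts2)             # as in the earlier sketches
        errors.append(pure_rotation_error(img1, img2, M))
    return offsets[int(np.argmin(errors))]

offsets = np.linspace(0.0, 0.055, 23)   # 5.5 cm adjustable range (Sec. 3.4)
x0 = best_offset(offsets, rotate_tilt)  # pan axis fixed  -> curve C1
x1 = best_offset(offsets, rotate_pan)   # tilt axis fixed -> curve C2
xf = 0.5 * (x0 + x1)                    # final offset: the average of x0, x1
```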

3.4. Results
We carried out our experiments on the smart car in our Intelligent Vehicle Lab. The cameras are Sony color CCD cameras (model CVI-371D). Without loss of generality, we set the image resolution to 256*240. The results are shown in Figure 7.

Figure 7 Result images

In the figure, there are two triples: the upper triple contains three intensity images and the lower triple three edge images. The three images in the upper triple are: the image taken before the rotation of the motion frame, the image taken after the rotation, and the transformed image. The transformed image is the result of applying the estimated M3×3 (computed from 16 pairs of matches) to the first image. If the camera frame is merged with the motion frame, the transformed image must coincide with a sub-part of the second image. The three edge images in the lower triple are: the edge image extracted from the first intensity image, the edge image extracted from the second intensity image, and the superimposition of the first edge image and the edge image extracted from the transformed image.

For our camera, the adjustable range of the offset is 5.5 cm. When the image resolution is low, the adjustment is not needed, because the error caused by the offset is so small that the motion can be regarded as a pure rotation. But as camera resolution increases (resolutions as high as 10000*10000 have been reported), we believe real-time adjustment along the camera axis is needed to keep the camera performing a pure rotation while zooming in/out. In fact, using the pin-hole model, we derived an approximate relation between the error in image space (E) and the offset between the camera frame and the motion frame (Δx):

    E = (f / u) • Δx • tan 2θ • sin θ

where f stands for the focal length, u for the distance from the object to the camera, and θ for the angle of the rotation. Setting aside the effect of the term tan 2θ • sin θ, we can see that the allowable range of Δx is nearly the same as the smallest detectable size difference of an object at distance u for the same camera.

Since the error over the whole image grows quadratically with the resolution, the allowable Δx (the range within which the image-space error stays below one pixel) for a pure rotation decreases rapidly as the resolution increases. Our solution is well suited to the adjustment needed to guarantee a pure rotation in this case.
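To illustrate the relation numerically, here is a small worked example with our own hypothetical numbers (an 8 mm lens, a 5 m object distance, a 10-degree rotation and a 10 µm pixel pitch; the paper specifies none of these, and the exact form of the trigonometric term is our reading of the garbled original):

```python
# Worked example of E = (f/u) * dx * tan(2*theta) * sin(theta), converting
# the image-space error to pixels via an assumed pixel pitch.
import numpy as np

f = 0.008                  # focal length: 8 mm (assumed)
u = 5.0                    # object distance: 5 m (assumed)
theta = np.radians(10.0)   # rotation angle (assumed)
pixel_pitch = 1e-5         # 10 micrometre pixels (assumed)

for dx in (0.001, 0.01, 0.055):   # offsets up to the 5.5 cm range
    E = (f / u) * dx * np.tan(2 * theta) * np.sin(theta)
    print(f"offset {dx * 100:4.1f} cm -> image error {E / pixel_pitch:6.3f} px")
```

With these numbers even the full 5.5 cm offset produces an error of roughly half a pixel, while a sensor with ten times finer pitch (i.e., much higher resolution at the same format) would push the same offset well past one pixel, which is the paper's point about high-resolution cameras.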

4. Conclusion

We have presented in this paper the design of a modular eye and an algorithm for adjusting it in real time so that the camera frame is merged with the pan-tilt motion frame. This special design and the adjustment make the modular eye well suited to the guidance of a smart car. Results on real images have also been presented. Future work includes the control of the modular eye and real-time vision for the guidance of the smart car shown in Figure 8.

Figure 8 Picture of a smart car

5. REFERENCES
[1] J. Aloimonos, I. Weiss, and A. Bandyopadhay, "Active vision", International Journal of Computer Vision, vol. 1, no. 4, pp. 333-356, 1988.
[2] A. Blake and A. Yuille, editors, Active Vision, MIT Press, 1992.
[3] Richard P. Paul, Robot Manipulators, MIT Press, 1981.
[4] M. Li, "Kinematic calibration of an active head-eye system", IEEE Transactions on Robotics and Automation, vol. 14, no. 2, 1997.
[5] H. Zhuang and Z. S. Roth, "A note on 'Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX=XB'", IEEE Transactions on Robotics and Automation, vol. 7, pp. 877-878, Dec. 1991.
[6] S. Asaad, M. Bishay, D. M. Wilkes, and K. Kawamura, "A low-cost, DSP-based, intelligent vision system for robotic applications", Proceedings of the 1996 IEEE International Conference on Robotics and Automation, Minneapolis, Minnesota, pp. 1651-1661, April 1996.
[7] José Santos-Victor, Franc van Trigt and João Sentieiro, "Medusa - a stereo head for active vision", VisLab-TR xxx/94, International Workshop on Intelligent Robotic Systems (IRS94), Grenoble, France, July 1994.
[8] Brian C. Madden and Ulf M. Cahn von Seelen, "PennEyes - a binocular active vision system", Technical Report MS-CIS-95-37, GRASP Lab 396, University of Pennsylvania.
[9] J. C. Fiala, R. Lumia, K. J. Roberts, and A. J. Wavering, "TRICLOPS: a tool for studying active vision", International Journal of Computer Vision, vol. 12, no. 2-3, pp. 231-250, April 1994.
[10] M. Xie, "New development of stereo vision: a solution for motion stereo correspondence", 3rd Asian Conference on Computer Vision, Hong Kong, vol. 1, pp. 280-287, Jan 8-11, 1998.
[11] Olivier Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge, Mass., 1993.
[12] P. M. Sharkey, D. W. Murray, S. Vandevelde et al., "A modular head/eye platform for real-time reactive vision", Mechatronics, vol. 3, no. 4, pp. 517-535, 1993.