Real-time Monocular Vision-Based Object Tracking with Object Distance and Motion Estimation

H. Firouzi and H. Najjaran

Abstract - This paper presents a real-time vision-based object tracking system consisting of a camera on a 2-DOF manipulator, for example a pan-tilt (PT) camera unit. The main novelties of the proposed tracking system are the ability to i) reduce the image processing load by relying on the object position as the only feature extracted from the images acquired by the camera, and ii) estimate the distance and motion of the object without the need for an active rangefinder. The object tracking system controls the manipulator using a feedback loop based on the object position. The control rule of the feedback system is to minimize the distance between the object position in the camera image and the center point of the image. The proposed method can be readily adopted in dynamic environments, and achieves better tracking accuracy and efficiency than traditional methods. The formulation of the distance and motion estimation is presented in the paper. The efficiency and robustness of the proposed method in dealing with noise are verified by simulation using a PT camera.

I. INTRODUCTION

AUTOMATED robotic systems have increasingly been desirable for many real-world applications such as automatic surveillance, vehicle control and robotic assembly. Vision sensors provide a wide range of information in both unstructured and structured environments; therefore, they are powerful tools for the automatic control of robotic systems. In the literature, visual servoing refers to the use of computer vision for the control of robot motion using a camera that can be stationary or moving with the robot. In visual servoing, different areas ranging from image processing to kinematics and control theory are integrated to control a robot that not only observes its environment but also manipulates it [1]. From the control viewpoint, there are two basic visual servo control approaches, namely position-based visual servo control (PBVS) and image-based visual servo control (IBVS). A thorough review of these approaches is available in [2], [3].

Manuscript received February 14, 2010. This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Hadi Firouzi is a PhD student at the Okanagan School of Engineering, the University of British Columbia (email: [email protected]). *Homayoun Najjaran is an Assistant Professor at the Okanagan School of Engineering at the University of British Columbia, 3333 University Way, Kelowna, BC, V1V 1V7, Canada (phone: 250-807-8713, fax: 250-807-9850, email: [email protected]).

In the PBVS, the reference input is the 3D relative position and orientation between the object and the robot end-effector, obtained from 3D measurements expressed in Cartesian space. The control strategies of the PBVS approach are well established, and state-of-the-art research on this topic focuses mainly on robustness and computational efficiency for real-time applications. However, the main problem with this approach is that small measurement errors can compromise the servoing accuracy significantly when only a camera is used. In the IBVS approach, the reference input is defined with respect to the 2D object position expressed in the image plane and typically measured in pixels. In general, the IBVS approach is more robust to calibration errors and image noise than the PBVS. However, the IBVS approach will ultimately fail without an accurate estimation of object distance and motion, especially in dynamic environments where the objects can move without any limitations. Many researchers have recently proposed different methods for object distance and motion estimation to improve the efficiency of visual servo control. Fang and Lin (2001) proposed the observabilized damped least-squares method as a feature-based visual servo control scheme for a robot manipulator, based on a modified EKF for depth estimation [4]. Lippiello et al. (2007) used an adaptive version of the EKF to enhance the real-time estimation of the position and orientation of a moving object by taking advantage of a pre-selection of the features to be extracted from the image at each sample time [5]. Ma et al. (2007) presented an approach motivated by adaptive control techniques, using a non-linear observer for range identification of a moving object with unknown motion parameters [6]. The object motion is assumed to follow a known linear model with unknown coefficients. Other researchers focused on the use of nonlinear observers for the state estimation of unknown object distance or relative object motion parameters. In order to identify the range of a moving object through two-dimensional images obtained from a single camera, Karagiannis and Astolfi (2005) gave a new solution based on a nonlinear reduced-order observer with assignable convergence by applying immersion and invariance methods [7]. De Luca et al. (2007) proposed an online distance estimation method by interpreting the distance as a continuous unknown state with known dynamics in a nonlinear observer framework [8]. They eliminated unnecessary hypotheses on the camera motion and configured their algorithm without the need for the image motion characteristics required by previous distance estimation methods. Fioravanti et al. (2008) implemented IBVS for a 6-DOF single eye-in-hand robotic system for the execution of positioning tasks [9]. The positioning task was developed using depth estimation and image path planning following a helical path trajectory. Image path planning in their approach avoids the singularity and local minimum problems of IBVS systems and gives the possibility to choose the 3D shape of the camera trajectory.

Published work in visual servo control has typically focused on methods that rely on finding a set of matching features in two or more consecutive images. However, this can be a strong, and often invalid, assumption for many real-world applications, especially where the robot environment is subject to rapid change. Also, matching a set of features in streaming images is computationally expensive and impractical for real-time applications. This article describes the formulation of an IBVS method capable of estimating both object distance and motion using only the 2D target object position in the image plane. The simulation results show that even using distance estimation alone can considerably enhance the performance of the visual servoing system. Moreover, the use of object motion estimation together with distance estimation further improves the tracking efficacy of the proposed IBVS.

Fig 1. 2-DOF eye-in-hand configuration – system overview (base frame x_b, y_b, z_b; camera frame x_c, y_c, z_c; joint variables q1, q2; link lengths L1, L2)

Fig 2. Object position in the camera frame (P_c) and the image plane (p), with focal length f

II. SYSTEM CONFIGURATION AND MODELING

A. System Configuration

Fig 1 shows the configuration of the proposed system, which consists of a motorized two-degree-of-freedom manipulator with a CCD camera on its end effector. The system tracks an object by keeping the object at the center of the images captured by the camera. In Fig 1, P_b and P_c are the object positions in the base frame and the camera frame, respectively. The joint variable vector q = [q1, q2]^T as well as the lengths of the manipulator links L1 and L2 are known.

B. System Modeling

Fig 2 shows the camera model. The object position p = [u, v]^T can be defined in an image using the camera parameters, including the focal length f, the ratio of pixel dimensions α, and the coordinates (u0, v0) of the camera principal point [10], [11]. In the following, however, the object position p = [x, y]^T is expressed in normalized coordinates, without reference to the camera parameters, to simplify the formulation. Thus,

    x = x_c / z_c ,   y = y_c / z_c                                                  (1)

where P_c = [x_c, y_c, z_c]^T is the object position in the camera frame. The time derivative of (1) is given by

    ẋ = (ẋ_c − x ż_c) / z_c ,   ẏ = (ẏ_c − y ż_c) / z_c                              (2)

The velocity of the object relative to the camera can be found as follows,

    Ṗ_c = −ν − ω × P_c                                                               (3)

where ν and ω are the linear and angular velocity vectors of the camera, and P_c is the object 3D position in the camera coordinate system. Substituting (3) into (2) and solving for the object velocity in the image plane, we have

    ẋ = −ν_x / z_c + x ν_z / z_c + x y ω_x − (1 + x^2) ω_y + y ω_z
    ẏ = −ν_y / z_c + y ν_z / z_c + (1 + y^2) ω_x − x y ω_y − x ω_z                   (4)

or,

    ṗ = L_p V_c                                                                      (5)

Thus, the camera spatial velocity V_c and the interaction matrix L_p related to the object position p are

    V_c = [ν ; ω]

    L_p = [ −1/z_c     0       x/z_c    x y        −(1 + x^2)    y
              0      −1/z_c    y/z_c    1 + y^2     −x y        −x ]                 (6)

Using the Denavit-Hartenberg convention [14], the position P_c0 and orientation Θ_c0 of the camera are obtained from the forward kinematics of the two-link mechanism,

    [P_c0 ; Θ_c0] = k(q)                                                             (7)

where the entries of k(q) are composed of the link lengths L1, L2 and the sine and cosine functions s(·) and c(·) of the joint variables. Thus, the camera spatial velocity can be found as follows,

    V_c = J(q) q̇                                                                     (8)

where the Jacobian matrix J(q) is obtained by differentiating the camera pose (7) with respect to the joint variables,

    J(q) = ∂k(q) / ∂q                                                                (9)
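To make the mapping in (5)-(6) concrete, the following Python sketch (an illustration only, not the authors' implementation; variable names and the use of NumPy are assumptions) builds the interaction matrix for a normalized image point and a depth estimate, and predicts the resulting image-plane velocity for a given camera twist.

```python
import numpy as np

def interaction_matrix(x, y, z_c):
    """2x6 interaction matrix L_p of Eq. (6) for a normalized image point
    (x, y) and object depth z_c along the optical axis."""
    return np.array([
        [-1.0 / z_c, 0.0,        x / z_c, x * y,      -(1.0 + x**2), y],
        [0.0,        -1.0 / z_c, y / z_c, 1.0 + y**2, -x * y,       -x],
    ])

# Example: predict the image velocity p_dot = L_p V_c of Eq. (5)
x, y, z_c = 0.1, -0.05, 8.0           # normalized image point and depth (arbitrary values)
V_c = np.array([0.0, 0.0, 0.1,        # linear velocity  (vx, vy, vz)
                0.02, -0.01, 0.0])    # angular velocity (wx, wy, wz)
p_dot = interaction_matrix(x, y, z_c) @ V_c
print(p_dot)
```

Only the depth z_c is unknown at run time; the remaining entries follow directly from the measured image point.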

III. CONTROLLER MODEL

The aim of the proposed system is to minimize the distance between the current position p and the desired position p* of the object in the image plane, so the error function e is defined as

    e = p − p*                                                                      (10)

Taking the time derivative of (10), we obtain

    ė = ṗ − ṗ*                                                                      (11)

where ṗ* accounts for the image-plane motion induced by the object's own movement (estimated in Section IV-D). The control law is defined to ensure an exponentially decoupled decrease of the error, that is,

    ė = −λ e                                                                        (12)

Combining (5), (11), and (12), we obtain

    L_p V_c − ṗ* = −λ e                                                             (13)

Joint velocities are commonly used to control industrial manipulators; therefore, we consider q̇ as the control signal. Substituting the camera spatial velocity (8) into (13), defining L_q = L_p J(q), and solving for the joint velocity, we obtain

    q̇ = L̂_q^−1 ( −λ e + ṗ* )                                                        (14)

where L̂_q^−1 is the inverse of the interaction matrix L_q expressed with respect to the joint variable vector q. As L_q is a 2-by-2 matrix and there are two independent variables (i.e., the joint variables) to control, the interaction matrix L_q is full rank and its inverse can be found [2]. However, the object distance z_c needs to be estimated to compute L̂_q; let ẑ_c denote the estimate of z_c. We also assume that the object is not stationary, so ṗ* is not zero and has to be estimated as well.
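A minimal sketch of the control update in (14), under the assumptions that a depth estimate ẑ_c and an image-plane object motion estimate are available and that J_q is the 6-by-2 manipulator Jacobian of (9); the gain, the example Jacobian, and all names are illustrative, not taken from the paper.

```python
import numpy as np

def joint_velocity_command(p, p_star, z_c_hat, J_q, p_star_dot_hat, lam=0.5):
    """Joint-velocity control law of Eq. (14): q_dot = Lq^-1 (-lam * e + p*_dot)."""
    x, y = p
    e = np.asarray(p) - np.asarray(p_star)         # image-plane error, Eq. (10)
    L_p = np.array([                               # interaction matrix, Eq. (6)
        [-1.0 / z_c_hat, 0.0, x / z_c_hat, x * y, -(1.0 + x**2), y],
        [0.0, -1.0 / z_c_hat, y / z_c_hat, 1.0 + y**2, -x * y, -x],
    ])
    L_q = L_p @ J_q                                # 2x2 interaction matrix in joint space
    return np.linalg.solve(L_q, -lam * e + np.asarray(p_star_dot_hat))

# Example: an assumed pan-tilt Jacobian mapping q1 -> rotation about the camera
# z-axis and q2 -> rotation about the camera x-axis (purely illustrative).
J_q = np.zeros((6, 2)); J_q[5, 0] = 1.0; J_q[3, 1] = 1.0
q_dot = joint_velocity_command([0.1, 0.2], [0.0, 0.0], 8.0, J_q, [0.001, 0.0])
print(q_dot)
```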

IV. OBJECT MOTION AND POSITION ESTIMATION

The object position in the base frame and the camera frame at time t+1 is

    P_b^(t+1) = R^t P_c^t + T^t + ΔP_b^t ,   P_c^(t+1) = (R^(t+1))^−1 ( P_b^(t+1) − T^(t+1) )      (15)

where P_b^t and P_b^(t+1) are the object positions in the base frame at times t and t+1, respectively; P_c^t and P_c^(t+1) are the object positions in the camera frame at times t and t+1, respectively; R^t, T^t and R^(t+1), T^(t+1) are the rotational and translational transformations from the camera frame to the base frame at times t and t+1, respectively; and ΔP_b^t is the object motion in the base frame over the interval [t, t+1]. Based on (1), the object position in the camera frame can be written as

    P_c = z_c [x, y, 1]^T                                                                           (16)

Grouping (16) and (15) we obtain

    z_c^(t+1) [x^(t+1), y^(t+1), 1]^T = (R^(t+1))^−1 ( R^t z_c^t [x^t, y^t, 1]^T + T^t + ΔP_b^t − T^(t+1) )      (17)

Considering the third row of (17), we obtain a scalar relation between the object distances z_c^t and z_c^(t+1) and the object motion ΔP_b^t (18). Substituting (18) back into (17) gives a set of linear constraints on the object distance and motion over the interval [t, t+1] (19); a single interval, however, does not provide enough equations to determine all of the unknowns.

Similar to (15), the object position at time t+2 is

    P_b^(t+2) = R^(t+1) P_c^(t+1) + T^(t+1) + ΔP_b^(t+1)                                            (20)

where the object motion in the base frame at time t+1 is

    ΔP_b^(t+1) = ΔP_b^t + δP_b^t ≅ ΔP_b^t ,   so that   P_b^(t+2) ≅ P_b^t + 2 ΔP_b^t                (21)

Due to the fact that the target objects are common objects, such as humans or mobile robots, and that the sampling rate (e.g., 30 frames per second) is high relative to the object motion, we can assume that the change of the object motion over two consecutive intervals is negligible (i.e., δP_b ≅ 0). Therefore, the same process can be applied to the interval [t, t+2], and based on (15)-(19) we obtain a second set of linear constraints (22). Combining (19) and (22), the constraints can be stacked into a single linear system

    A [ ΔP_b ; z_c^(t+2) ] = D                                                                      (23)

where the first three columns of A multiply the object motion ΔP_b and the fourth column multiplies the object distance. Having the manipulator joint variables q, the matrices A and D can be calculated from the camera poses and the observed image positions. As a result, the object distance z_c and motion ΔP_b are

    [ ΔP_b ; z_c^(t+2) ] = A^−1 D                                                                   (24)
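The idea behind (15)-(24) can be sketched as a small linear solve: under the constant-motion assumption, the image observations at t+1 and t+2, together with the known camera poses, constrain the unknown depth and per-sample object motion. The exact elimination and the 4-column form of A used in the paper are not reproduced here; the sketch below stacks the raw constraints instead and is an assumption-laden illustration.

```python
import numpy as np

def estimate_depth_and_motion(m, poses):
    """Solve for the object depth at time t and the per-sample object motion
    in the base frame, assuming the motion is constant over [t, t+2].

    m     : three normalized image points [x, y, 1] observed at t, t+1, t+2
    poses : three camera-to-base poses (R, T) at t, t+1, t+2
    Returns (z_t, dP) with dP the 3D object motion per sample.
    """
    (R0, T0), (R1, T1), (R2, T2) = poses
    m0, m1, m2 = (np.asarray(v, dtype=float) for v in m)

    # Unknowns: [z_t, z_{t+1}, z_{t+2}, dPx, dPy, dPz]
    A = np.zeros((6, 6)); b = np.zeros(6)
    # Rows 0-2: R1^T (R0 m0 z_t + T0 + dP - T1) = z_{t+1} m1
    A[0:3, 0] = R1.T @ (R0 @ m0)
    A[0:3, 1] = -m1
    A[0:3, 3:6] = R1.T
    b[0:3] = R1.T @ (T1 - T0)
    # Rows 3-5: R2^T (R0 m0 z_t + T0 + 2 dP - T2) = z_{t+2} m2
    A[3:6, 0] = R2.T @ (R0 @ m0)
    A[3:6, 2] = -m2
    A[3:6, 3:6] = 2.0 * R2.T
    b[3:6] = R2.T @ (T2 - T0)

    sol = np.linalg.solve(A, b)
    return sol[0], sol[3:6]
```

Degenerate configurations (e.g., no effective camera translation between samples) can make the system singular, which is one reason the estimates are further filtered in the subsections below.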

A. Object Motion Estimation in Base-Frame

Based on (21), if δP_b is negligible, the preliminary estimate of the object motion at time t+2 is

    ΔP̃_b^(t+2) = ΔP̂_b^(t+1)                                                        (25)

Using the Kalman filter method [12], this preliminary estimate can be improved. The filter consists of two consecutive steps: i) a prediction step and ii) a correction step. In the prediction step, the value is predicted from the process model, and in the correction step the predicted value is corrected using the Kalman gain and the noisy observation ΔP̃_b.

Object motion prediction step:

    ΔP̄_b^(t+2) = ΔP̂_b^(t+1) ,   Σ̄_Δ^(t+2) = Σ̂_Δ^(t+1) + Q_Δ                        (26)

where ΔP̂_b and Σ̂_Δ are the estimated object motion and the estimated object motion covariance, respectively, and Q_Δ is the process noise covariance [12].

Object motion correction step:

    K_Δ = Σ̄_Δ^(t+2) ( Σ̄_Δ^(t+2) + R_Δ )^−1
    ΔP̂_b^(t+2) = ΔP̄_b^(t+2) + K_Δ ( ΔP̃_b^(t+2) − ΔP̄_b^(t+2) )
    Σ̂_Δ^(t+2) = ( 1 − K_Δ ) Σ̄_Δ^(t+2)                                              (27)

where K_Δ is the Kalman updating weight and R_Δ is the estimated measurement error covariance [12].
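A minimal constant-motion Kalman update corresponding to (25)-(27), treating each axis independently with scalar variances; the noise values and names are placeholders rather than the paper's tuning.

```python
import numpy as np

def kalman_motion_update(dP_est, var_est, dP_obs, q_noise=1e-4, r_noise=1e-2):
    """One predict/correct cycle for the object-motion state (Eqs. 25-27).

    dP_est, var_est : previous motion estimate (3-vector) and its per-axis variance
    dP_obs          : preliminary (noisy) motion observation for the new sample
    """
    # Prediction: constant-motion process model
    dP_pred = dP_est
    var_pred = var_est + q_noise
    # Correction: blend prediction and observation with the Kalman gain
    K = var_pred / (var_pred + r_noise)
    dP_new = dP_pred + K * (dP_obs - dP_pred)
    var_new = (1.0 - K) * var_pred
    return dP_new, var_new

# Example usage with arbitrary numbers
dP, var = np.array([0.01, 0.0, -0.02]), np.ones(3) * 0.05
dP, var = kalman_motion_update(dP, var, np.array([0.012, 0.001, -0.019]))
```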

B. Object Position Estimation in Base-Frame

Using (24) and (27), the preliminary estimate of the object distance in the camera frame at time t+2 is

    z̃_c^(t+2) = (1/4) Σ_i ( D_i − A_(i,1:3) ΔP̂_b^(t+2) ) / A_(i,4)                  (28)

where A_(i,1:3) is row i and columns 1 to 3 of the matrix A, A_(i,4) is row i and column 4 of the matrix A, and D_i is row i of the vector D. Having the preliminary estimate of the object distance in the camera frame at time t+2, the preliminary estimate of the object position in the base frame is

    P̃_b^(t+2) = R^(t+2) z̃_c^(t+2) [x^(t+2), y^(t+2), 1]^T + T^(t+2)                 (29)

Similar to the object motion estimation, this preliminary estimate of the object position can be improved using the Kalman filter approach [12].

Object position prediction step:

    P̄_b^(t+2) = P̂_b^(t+1) + ΔP̂_b^(t+2) ,   Σ̄_P^(t+2) = Σ̂_P^(t+1) + Q_P             (30)

where P̂_b and Σ̂_P are the estimated object position and the estimated object position covariance, respectively, and Q_P is the process noise covariance [12].

Object position correction step:

    K_P = Σ̄_P^(t+2) ( Σ̄_P^(t+2) + R_P )^−1
    P̂_b^(t+2) = P̄_b^(t+2) + K_P ( P̃_b^(t+2) − P̄_b^(t+2) )
    Σ̂_P^(t+2) = ( 1 − K_P ) Σ̄_P^(t+2)                                               (31)

where K_P is the Kalman updating weight and R_P is the estimated measurement error covariance [12].

C. Object Distance Estimation

Using the estimated object position and motion in the base frame, the object distance at time t+2 is calculated as the component of the estimated position along the optical axis of the camera,

    ẑ_c^(t+2) = [ (R^(t+2))^−1 ( P̂_b^(t+2) − T^(t+2) ) ]_z                           (32)

D. Object Motion Estimation in Image-Plane

Having the object position in the base frame at time t+1, the object motion in the image plane at time t+2 is estimated as

    ṗ* = p^(t+2) − p̃^(t+2)                                                           (33)

where p̃^(t+2) is the object position in the image plane at time t+2 computed regardless of the object motion at time t+1 (i.e., as if the object had not moved). Thus, subtracting the expected object position from the observed object position in the image plane at time t+2 gives the object motion in the image plane, ṗ*.
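A short sketch of the image-plane motion estimate in (33), assuming camera-to-base poses (R, T) and the normalized projection of (1); function and variable names are illustrative.

```python
import numpy as np

def project_to_image(P_b, R, T):
    """Project a base-frame point into normalized image coordinates (Eq. 1),
    given the camera-to-base pose (R, T)."""
    P_c = R.T @ (P_b - T)
    return P_c[:2] / P_c[2]

def image_plane_object_motion(p_obs_t2, P_b_hat_t1, R_t2, T_t2):
    """Eq. (33): observed image position minus the position expected if the
    object had not moved during [t+1, t+2]."""
    p_expected = project_to_image(P_b_hat_t1, R_t2, T_t2)
    return p_obs_t2 - p_expected
```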

V. SIMULATION RESULTS

To illustrate the accuracy and efficiency of the proposed object tracking with object distance and motion estimation, three simulations are performed. In these simulations, an eye-in-hand configuration is employed to track an object that moves along a triangle in 3D space, see Fig 3. The lengths of the manipulator links (i.e., L1 and L2 in Fig 1) are 10 and 2 units, respectively. The object velocity is almost constant along each side of the triangle; a white noise signal (± 0.001 unit/sample) is added to the object velocity.
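For illustration, the simulated target path described above could be generated as follows; the triangle vertices are read approximately from the data cursors in Fig 3, and the per-side sample count is an assumption.

```python
import numpy as np

def triangular_trajectory(vertices, samples_per_side=200, noise=0.001, seed=0):
    """Object path along a closed triangle with (nearly) constant velocity per
    side, plus white velocity noise of +/- `noise` units/sample."""
    rng = np.random.default_rng(seed)
    points = []
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        v = (np.asarray(b, float) - np.asarray(a, float)) / samples_per_side
        p = np.asarray(a, float)
        for _ in range(samples_per_side):
            p = p + v + rng.uniform(-noise, noise, 3)   # noisy constant velocity
            points.append(p.copy())
    return np.array(points)

# Vertices read from the data cursors in Fig 3 (approximate)
path = triangular_trajectory([(6, 2, 22), (7, 16, 3), (5, -15, 4)])
```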

Fig 3. Object trajectory in the base-frame (triangle vertices near (6, 2, 22), (7, 16, 3), and (5, -15, 4); the pan-tilt camera mount position is also marked)

As shown in Fig 7 and Fig 8, the camera velocity and the manipulator joint velocities change smoothly and within a limited range, which is desirable in many practical cases.


Fig 7. Camera linear and angular velocity – object tracking with range and object motion estimation


Fig 4. Noisy object velocity in the base-frame


Fig 8. 2-DOF manipulator joint velocities during object tracking


Fig 5. Estimated object velocity using the proposed approach


Fig 9. 2-DOF manipulator joint values during object tracking


Fig 6. Object distance estimation using the proposed method

Due to noise, there are some peaks in Fig 5, especially in the estimated object velocity along the Z axis; however, as shown in Fig 6, the final estimated object distance is less sensitive to the noise and is comparatively accurate and reliable.

To compare the efficiency and accuracy of object tracking with and without object distance and motion estimation, three different situations are simulated: 1) object tracking without distance and object motion estimation, 2) object tracking with distance estimation but without motion estimation, and 3) object tracking with both object distance and motion estimation. Fig 5 shows that the proposed approach can estimate the object velocity even from a noisy signal. In spite of the abrupt changes in the object velocity (i.e., at samples 200 and 400), the proposed approach is able to adapt itself and estimate the actual values. Although the object motion estimation is not perfect at some samples (e.g., at sample 257), the estimated object distance is very robust and reliable, see Fig 6.


Fig 10. Object position in the image plane; dashed line: without object distance / motion estimation, dotted line: with object distance estimation, solid line: with object distance / motion estimation. The desired object position is at (0, 0).

Table 1. Object tracking mean square error (MSE) under different conditions

    Object tracking                                        MSE
    With no estimation                                     0.0877
    With object distance estimation                        0.0508
    With object distance and motion estimation             0.0298

Based on Fig 10, object distance and motion estimation can improve the object tracking accuracy and efficiency, especially in terms of reach time and steady-state error. Table 1 shows that the mean square error (MSE) is reduced when object distance and motion estimation are used.

VI. CONCLUSIONS

In this paper, an eye-in-hand monocular image-based visual servo control (IBVS) system is proposed. The main contribution of this work is to estimate the object distance and motion using a single feature (i.e., the 2D object position in the image plane) extracted from the camera image. The latter is a significant advantage since extracting and matching a set of visual features is computationally expensive and sometimes impossible, especially in unstructured and dynamic environments. The estimation of object distance and motion plays a crucial role in IBVS systems. More precisely, first the object distance is necessary for the calculation of the interaction matrix (L_q), and second the object motion is required to cancel the tracking steady-state error due to the unwanted extra term (ṗ*) in the control command. Simulation results show the superiority of the proposed object tracking method in tracking and estimating object characteristics (i.e., distance and motion) in comparison with a traditional object tracking algorithm with no object distance and motion estimation.

REFERENCES

[1] K. Hashimoto, "A review on vision-based control of robot manipulators", Advanced Robotics, vol. 17, no. 10, pp. 969-99, 2003.

[2] F. Chaumette and S. Hutchinson, "Visual servo control part I: Basic approaches", IEEE Robotics & Automation Magazine, 2006.
[3] F. Chaumette and S. Hutchinson, "Visual servo control part II: Advanced approaches", IEEE Robotics & Automation Magazine, no. 1, pp. 109-118, 2007.
[4] C.-J. Fang and S.-K. Lin, "A performance criterion for the depth estimation with application to robot visual servo control", Journal of Robotic Systems, vol. 18, no. 10, pp. 609-622, 2001.
[5] V. Lippiello, B. Siciliano, and L. Villani, "Adaptive extended Kalman filtering for visual motion estimation of 3D objects", Control Engineering Practice, no. 15, pp. 123-134, 2007.
[6] L. Ma, C. Cao, N. Hovakimyan, W. E. Dixon, and C. Woolsey, "Range identification in the presence of unknown motion parameters for perspective vision systems", Proceedings of the 2007 American Control Conference, New York City, USA, July 11-13, 2007.
[7] D. Karagiannis and A. Astolfi, "A new solution to the problem of range identification in perspective vision systems", IEEE Transactions on Automatic Control, vol. 50, no. 12, 2005.
[8] A. De Luca, G. Oriolo, and P. R. Giordano, "On-line estimation of feature depth for image-based visual servoing schemes", IEEE International Conference on Robotics and Automation, Roma, Italy, April 10-14, 2007.
[9] D. Fioravanti, B. Allotta, and A. Rindi, "Image-based visual servoing for robot positioning tasks", Meccanica, no. 43, pp. 291-305, 2008.
[10] D. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Upper Saddle River, NJ: Prentice Hall, 2003.
[11] Y. Ma, S. Soatto, J. Kosecka, and S. Sastry, "An Invitation to 3-D Vision: From Images to Geometric Models", New York: Springer-Verlag, 2003.
[12] G. Welch and G. Bishop, "An introduction to the Kalman filter", UNC-Chapel Hill, TR 95-041, July 24, 2006.
[13] R. Kelly, R. Carelli, O. Nasisi, B. Kuchen, and F. Reyes, "Stable visual servoing of camera-in-hand robotic systems", IEEE/ASME Transactions on Mechatronics, vol. 5, no. 1, pp. 39-48, 2000.
[14] L. Sciavicco and B. Siciliano, "Modelling and Control of Robot Manipulators", London: Springer-Verlag, 2000.