Model-based Vehicle Tracking using Bayesian Inference Model under Uncalibrated Camera Views

Shuang Wang, Bei Li, Lijuan Cui, Robert C. Huck and Samuel Cheng
School of Electrical and Computer Engineering
University of Oklahoma-Tulsa
Tulsa, OK 74135
Tel: 918-660-3234
{shuangwang, bei.li, lj.cui, rchuck, samuel.cheng}@ou.edu

1. Introduction

Vehicle tracking is an essential Computer Vision (CV) problem with wide applications in areas such as transportation control, security surveillance, and accident detection on highways. Model-based approaches have been widely used in object (car) tracking. Some approaches [1]-[3] model the target by a wire-frame model with multiple points; the algorithms then minimize the squared sum of distances from these points to the matching target. For each model feature, a candidate scene feature is located by searching in a certain direction, usually perpendicular to the projection of the model feature. However, these methods work only when the model features are close to the true matching scene features. To address this, voting-based schemes for estimating the pose parameters have been proposed [4][5][6]. Although voting-based algorithms are more accurate and robust, they are time-consuming and thus not suitable for real-time applications.

In this paper, we propose a model-based vehicle tracking framework for uncalibrated camera views. Two key components, vehicle tracking and vehicle model registration, are included in our proposed framework. The vehicle tracking part implements FG/BG detecting, blob detecting/tracking, and motion direction and speed detecting. We then apply the vehicle model registration scheme to the detected blobs. Here, a Bayesian inference model is used to fit the vehicle model to the moving vehicle in each video frame. Compared with existing methods, our method is more robust, since it does not require a calibrated camera, and the Bayesian inference model offers information for correcting the projection of the vehicle model. Moreover, we propose a hidden line removal algorithm that is tailored to the vehicle model and has lower complexity than traditional hidden line removal algorithms.

This paper is organized as follows. Section 2 presents the design of our proposed vehicle tracking scheme. Section 3 presents the vehicle model registration based on the Bayesian inference model. Finally, we present the experiments and results of our proposed scheme and draw the concluding remarks in Section 4 and Section 5, respectively.

2. Vehicle tracking

In this section, we introduce our proposed vehicle tracking scheme. Its four main components, shown in Figure 1, are Foreground/Background (FG/BG) detecting, blob detecting, blob tracking, and moving direction and speed detecting. We discuss the details of each component in the following subsections.

Figure 1. The workflow of the proposed vehicle tracking scheme.

2.1. Foreground / background detecting

FG/BG detection is the first and most important step in our vehicle tracking scheme, since the accuracy of FG/BG estimation directly influences all subsequent steps. Many FG/BG detection methods have been proposed [1]-[9]. In [1], Ridder et al. model each pixel with a Kalman filter, which makes their algorithm robust to lighting variance; however, their method does not offer adaptive thresholding. Pfinder [7] implements a multiclass statistical model for the tracked objects, while the background is modeled as a single Gaussian per pixel; this method does not work well for outdoor scenes. More recently, a pixel-wise EM framework for vehicle detection was proposed by Friedman and Russell [8]; however, the pixels in this system are constrained to three separate, predetermined distributions. In this paper, we use the approach proposed by Stauffer et al. [9], which provides an adaptive background mixture model for real-time tracking by modeling the values of a particular pixel as a mixture of Gaussians. This method is robust to lighting changes, tracking through cluttered regions, slow-moving objects, and so on. Figure 2 (a) shows a video frame containing a moving vehicle, and Figure 2 (b) shows the background subtraction result obtained with the algorithm of [9].
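A minimal sketch of this step using OpenCV's Gaussian-mixture background subtractor, a descendant of the Stauffer-Grimson model [9]; the file name and parameter values below are illustrative assumptions, not taken from the paper:

import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)     # per-pixel mixture-of-Gaussians update
    fg_mask = cv2.medianBlur(fg_mask, 5)  # suppress speckle noise in the mask
    cv2.imshow("FG mask", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:      # Esc to quit
        break
cap.release()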

Figure 2. FG/BG detection example: (a) a video frame containing a moving vehicle; (b) the background subtraction result.

2.2. Blob detecting

Blob detection locates blobs newly entering the scene in each frame, using the result of the FG/BG estimation module; our implementation is based on [17]. First, the connected components of each FG mask are calculated. Each foreground component is described by a blob, whose pixels belong to the foreground. Then, in each successive frame, every existing blob is tracked, and each successfully detected new component is added to the blob list as a new blob.
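A minimal sketch of blob detection from a binary FG mask via connected components; the area threshold is a hypothetical value for suppressing noise, not a parameter from [17]:

import cv2
import numpy as np

MIN_AREA = 400  # hypothetical: ignore tiny noise components

def detect_blobs(fg_mask):
    # Return bounding boxes (x, y, w, h) of foreground blobs.
    mask = (fg_mask > 0).astype(np.uint8)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask,
                                                                   connectivity=8)
    blobs = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= MIN_AREA:
            blobs.append((x, y, w, h))
    return blobs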

2.3. Blob tracking

Given the blobs found in the preceding detection step, the blob tracking component tracks the position of each blob. It consists of two subcomponents. The first uses the mean-shift algorithm [16]: it creates a confidence map for the new image based on the color histogram of the object in the previous image, and uses mean shift to find the peak of this confidence map close to the old position of the object. The second uses a Kalman filter to predict the position of the blob in the next frame.
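A minimal sketch of the Kalman-filter subcomponent, using a generic constant-velocity model for the blob center; the matrix values below are textbook defaults, not parameters reported in the paper:

import numpy as np

F = np.array([[1, 0, 1, 0],   # state: [x, y, vx, vy], constant velocity
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],   # we observe only the blob center (x, y)
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2          # process noise covariance (assumed)
R = np.eye(2) * 1.0           # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    # One predict/update cycle given the measured blob center z.
    x = F @ x                          # predict state
    P = F @ P @ F.T + Q                # predict covariance
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)            # correct with measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P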



2.4. Moving direction and speed detecting

Moving direction and speed detection is accomplished by optical flow estimation [11], which calculates the motion between two video frames at times t and t + τ. In our scheme, we use the blobs with the same index in different video frames to calculate the optical flow. In Figure 2 (a), the red arrow indicates the moving direction and the length of the arrow indicates the speed of the vehicle; a longer arrow means faster motion.
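A minimal sketch of this step, averaging a dense optical-flow field over a blob's mask to obtain a direction and speed. Farneback's estimator stands in here for the optical-flow computation (the paper cites [11] for the underlying theory), and the parameter values are illustrative:

import cv2
import numpy as np

def blob_motion(prev_gray, gray, blob_mask):
    # Dense optical flow between two consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    # Average the flow vectors over the blob's pixels.
    dx = flow[..., 0][blob_mask > 0].mean()
    dy = flow[..., 1][blob_mask > 0].mean()
    speed = np.hypot(dx, dy)                   # pixels per frame interval
    direction = np.degrees(np.arctan2(dy, dx)) # heading of the red arrow
    return direction, speed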


3. Vehicle model registration

In this section, we introduce our proposed Bayesian inference model for vehicle model registration, the vehicle location having been found by the schemes of the previous section. The section contains two subsections: the first introduces our Bayesian inference model; in the second, exploiting the characteristics of our vehicle model, we propose a low-complexity hidden line removal algorithm.

3.1. Bayesian Inference Model


3.1.1. Vehicle model

In this paper, we use a parameterized 3-D model with 12 length parameters to represent different types of vehicles. Figure 3 shows four different types of vehicles generated by the proposed 3-D model. In our implementation, we use the matching algorithm proposed by Koller et al. [18] to select among the different vehicle models.

Figure 3. Four types of vehicles (sedan, hatchback, minivan, and pickup) generated by the proposed parameterized 3-D model.

3.1.2. 3-D model projection

In this paper, we suppose the camera is uncalibrated. The only parameter we need to assume is the distance between the vehicle and the camera center; our experiments show that this assumption does not affect the accuracy of the projection as long as the distance between the camera center and the vehicle is larger than 20 meters. We can then calculate the X, Y, Z coordinates of the vehicle in real space from the position of the blob in the video frame, using the standard camera projection model. Given the location of the vehicle in real space, we calculate the rotation angles using the direction information obtained from optical flow: the three unknown rotation angles are obtained by solving the equations formed from the rotation matrices in the three dimensions. We assume that every car model is located at the origin of its own coordinate system before rotation. This natural characteristic makes it possible to reduce the number of unknown variables from 3 to 1, which makes the equations solvable.

After the rotation and projection, we scale the projected 2-D vehicle model to fit the blob size, and then apply the Bayesian inference model to refine the projected 2-D model. After the projection of the 3-D model, some lines on the 2-D plane are invisible because they are shaded by the front surfaces of the 3-D model, so these invisible lines must be removed. This procedure is also essential for the Bayesian inference model, since remaining hidden lines would affect the vehicle model registration.
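A minimal sketch of the projection step under the stated assumptions. The focal length and principal point below are hypothetical pinhole parameters introduced for illustration (the paper assumes only the camera-to-vehicle distance), and the single-angle rotation reflects the reduction from 3 unknowns to 1 described above:

import numpy as np

f, cx, cy = 800.0, 320.0, 240.0  # hypothetical pinhole parameters
D = 25.0                         # assumed camera-to-vehicle distance (> 20 m)

def backproject(u, v, depth=D):
    # Pinhole back-projection of a blob position (u, v) at a fixed depth.
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return np.array([x, y, depth])

def yaw_from_flow(flow_dx, flow_dy):
    # Heading angle of the vehicle from its optical-flow direction.
    return np.arctan2(flow_dy, flow_dx)

def rotation_z(theta):
    # Rotation about the vertical axis; with the model centered at the
    # origin of its own coordinate system, this is the single remaining
    # unknown angle.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])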

Figure 4. The proposed Bayesian inference model, drawn on the hatchback vehicle model: balls denote variable nodes, while cubes and square boxes denote factor nodes.

3.1.3. Bayesian Inference Model

In Figure 4, balls denote variable nodes, while cubes and square boxes denote factor nodes. Each variable node represents one line of the vehicle model. A straight-line segmentation algorithm generates N candidate lines, which are treated as the N labels of each variable node. Each cube factor node connects all its adjacent lines and constrains the end points of adjacent variable-node lines to lie close to each other. The square-box factor nodes provide the initial estimates of the probabilities between a variable-node line and all the candidate straight lines in the video frame. Here we consider three different factors: the angle difference, the length difference, and the distance difference between a given variable-node line and each candidate label (straight-line segment). The factors are assigned different weights, whose sum equals 1. With these assumptions, we can perform the belief propagation (BP) algorithm to solve the Bayesian inference model; the details of the BP algorithm are given in [10].
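A minimal sketch of the initial (square-box) factor computation described above; the specific weight values and the exponential scoring are assumptions, since the paper only states that the three weights sum to 1:

import numpy as np

W_ANGLE, W_LENGTH, W_DIST = 0.4, 0.3, 0.3  # assumed weights, summing to 1

def line_features(p0, p1):
    # Return (angle, length, midpoint) of a 2-D segment.
    d = p1 - p0
    return np.arctan2(d[1], d[0]), np.linalg.norm(d), (p0 + p1) / 2.0

def unary_potentials(model_line, candidates):
    # Score each candidate segment (shape (N, 2, 2)) against one projected
    # model line (shape (2, 2)); segments are given by their end points.
    a0, l0, m0 = line_features(model_line[0], model_line[1])
    scores = []
    for seg in candidates:
        a, l, m = line_features(seg[0], seg[1])
        d_angle = abs(np.arctan2(np.sin(a - a0), np.cos(a - a0)))  # wrapped
        d_len = abs(l - l0)
        d_mid = np.linalg.norm(m - m0)
        # Smaller weighted difference -> larger potential.
        scores.append(np.exp(-(W_ANGLE * d_angle + W_LENGTH * d_len + W_DIST * d_mid)))
    scores = np.array(scores)
    return scores / scores.sum()  # normalized initial label probabilities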

3.2. Hidden line removal

Much of the initial work in the field of vector hidden line removal (HLR) was done by Appel [12][13]. Appel's algorithm implements HLR by keeping a cumulative count of the number of obscuring front-facing faces along each projected curve. Other commonly used algorithms are z-buffering [14] and BSP trees [15]. Z-buffering is a hidden-surface removal algorithm and is therefore slower, since all the interior points of faces are compared. The painter's algorithm is very simple, but it solves the visibility problem at the cost of painting redundant areas of distant objects. A car model has on average 12 faces, which can be decomposed into around 20 triangles; with the z-buffer or painter's algorithm the complexity is unnecessarily high, so we developed a simple algorithm that leverages the features of car models.

The algorithm performs one preprocessing step that decomposes the object into two lists: one containing the triangles composing the object, and one containing the lines composing the object. The algorithm then goes through each line in the list and tries to cover it with every triangle. If a line is partially covered by a triangle, the visible parts form new lines, which in turn are covered by the remaining triangles. After a line has been covered by all triangles, whatever remains is visible. Processing every line of the object in this way leaves only the visible lines. There are two key functions:

These cases are illustrated in Figure 5.


y = f(t, x),

where t denotes the set of triangles used to cover the lines, x denotes the original lines, and the return value y denotes the lines left visible.

y' = g(t', x'),

where t' is a single triangle used to cover a line, x' denotes the line, and y' denotes the newly generated lines that are the visible parts of x'.

The function f is briefly:

for each line x' in x
    fragments = { x' }
    for each triangle t' in t
        a) apply function g to each fragment with parameter t'
        b) replace fragments with the visible parts returned by g
    end
    c) append the remaining fragments to the output list
end

The function g is briefly:

if the line is outside of the triangle
    return x'
end
if the line is closer (to the viewpoint) than the triangle
    return x'
end
if the line has 0 intersections with the triangle
    return the empty set
end
if the line has 1 intersection with the triangle
    return 1 new line
end
if the line has 2 intersections with the triangle
    return 2 new lines
end
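The covering loop f can be written compactly; a minimal sketch in Python, with the geometric function g passed in as a callable (its tests are described below):

def hidden_line_removal(lines, triangles, g):
    # Return the visible fragments of `lines` after covering by all triangles.
    visible = []
    for seg in lines:
        fragments = [seg]
        for tri in triangles:
            # keep only the parts of each fragment not hidden by `tri`
            fragments = [piece for frag in fragments for piece in g(tri, frag)]
        visible.extend(fragments)
    return visible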

Figure 5. The five cases of function g: the line (a) is outside the triangle; (b) is closer to the viewpoint than the triangle; (c) has 2 intersections with the triangle; (d) has 1 intersection; (e) has 0 intersections.

Three basic mathematical procedures are required in function g: detecting whether a line is closer to the viewpoint than the triangle, whether a line has any intersections with the triangle, and whether the line is inside the triangle.

For the first question, if a line is closer to the viewpoint, its end points must be closer to the viewpoint, so the question narrows down to detecting whether a point is closer to the viewpoint than a triangle. One way to solve this is as follows. First, take the cross product of two edges A and B of the triangle to obtain its normal vector P:

P = A × B.

Then take the dot product of P with a vector S going from a vertex to the point:

PS = P · S.

Then take the dot product of P with a vector O going from the same vertex to the focal point:

PO = P · O.

If sgn(PS) is equal to sgn(PO), the point is closer to the focal point; otherwise it is farther. For the second question, if a line has any intersections with a triangle, it must intersect the edges of the triangle, so this is actually a line intersection problem.
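A minimal sketch of this plane-side test, with the triangle given by its corner points as NumPy arrays (the two edge vectors are taken as B − A and C − A, equivalent to the two edges used above):

import numpy as np

def closer_than_triangle(point, focal_point, A, B, C):
    # True if `point` lies on the same side of triangle ABC's plane as the
    # focal point, i.e., is closer to the viewpoint.
    P = np.cross(B - A, C - A)       # normal of the triangle's plane
    PS = np.dot(P, point - A)        # side of the plane containing the point
    PO = np.dot(P, focal_point - A)  # side containing the focal point
    return np.sign(PS) == np.sign(PO)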

For the third question, if both end points of a line are inside a triangle, then the line is inside the triangle as well. So the question becomes how to detect whether a point is inside a triangle. To do this, connect each vertex to the point. If sgn(AB × AP) is equal to sgn(BC × BP) and to sgn(CA × CP), where A, B, and C denote the corners of the triangle and P denotes the point, then the point is inside the triangle; otherwise it is outside. Applying this algorithm to the car model gives the result shown in Figure 6.
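A minimal sketch of the point-in-triangle test just described, using the signs of 2-D cross products around the triangle (points as NumPy arrays):

import numpy as np

def cross2(u, v):
    # z-component of the 2-D cross product.
    return u[0] * v[1] - u[1] * v[0]

def point_in_triangle(P, A, B, C):
    # True if P lies inside (or on) triangle ABC in the projection plane.
    s1 = cross2(B - A, P - A)   # sgn(AB x AP)
    s2 = cross2(C - B, P - B)   # sgn(BC x BP)
    s3 = cross2(A - C, P - C)   # sgn(CA x CP)
    has_neg = (s1 < 0) or (s2 < 0) or (s3 < 0)
    has_pos = (s1 > 0) or (s2 > 0) or (s3 > 0)
    return not (has_neg and has_pos)  # all signs agree -> inside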

Figure 6. Car models after applying the hidden line removal algorithm: (a) the original car model (left) and the model after hidden line removal (right); (b) more car models after applying the algorithm.

4. Experiments and results

In this section, we present experimental results obtained by running our proposed framework on a video sequence. Figure 7 (a)-(c) shows our experimental results on different video frames, where the red box indicates the detected moving vehicle, the blue vehicle model is the projected car model, and the red arrow shows the moving direction. We can see that the projected vehicle model fits the moving vehicle well across the different frames. In Figure 8 (a), we present the straight-line edge segments extracted from the image together with the projected vehicle model on a video frame; in total, 44 straight lines were extracted from the vehicle blob. In Figure 8 (b), we present the matching results between the straight lines and the vehicle model. We can see that the proposed Bayesian inference model finds a nearly true correspondence between the model and the candidate straight-line segments, which offers the information needed to correct the projection of the vehicle model.

Figure 7. Experimental results of our proposed scheme on different video frames: (a) frame 229; (b) frame 239; (c) frame 249.

Figure 8. (a) The straight-line segments extracted from frame 229 (a minivan with 44 input lines) together with the projected vehicle model; (b) the matching results between the straight-line segments and the vehicle model.

5. Conclusions

In this paper, we presented a model-based vehicle tracking framework for uncalibrated camera views. There are two key components in our proposed framework: vehicle tracking and vehicle model registration. In the vehicle tracking part, we implemented FG/BG detecting, blob detecting, blob tracking, and moving direction and speed detecting. We then applied the vehicle model registration scheme to the detected blobs, using a Bayesian inference model to fit the vehicle model to the moving vehicle in each video frame; the inference model is solved using the BP algorithm. Good results for our proposed framework were obtained on the experimental video. Compared with existing methods, our method is more robust, since it does not require a calibrated camera, and the Bayesian inference model can offer the information for correcting the projection of the vehicle model. Moreover, we proposed a hidden line removal algorithm that is tailored to the vehicle model and has lower complexity than traditional hidden line removal algorithms.

6. Future work

In this paper, we showed that the Bayesian inference model can offer information for correcting the projection of the vehicle model. However, due to time limitations, we did not yet use this information to refine the projected vehicle model; we leave this for future work.

7. References


[1] C. Ridder, O. Munkelt, and H. Kirchner, "Adaptive background estimation and foreground detection using Kalman-filtering," Proc. Int'l Conf. on Recent Advances in Mechatronics (ICRAM'95), UNESCO Chair on Mechatronics, pp. 193-199, 1995.
[2] F. Martin and R. Horaud, "Multiple-camera tracking of rigid objects," Int'l J. Robotics Research, vol. 21, no. 2, pp. 97-113, Feb. 2002.
[3] T. Drummond and R. Cipolla, "Real-time visual tracking of complex structures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, July 2002.
[4] T. N. Tan, G. D. Sullivan, and K. D. Baker, "Pose determination and recognition of vehicles in traffic scenes," Proc. 1994 European Conf. Computer Vision, pp. 501-506, May 1994.
[5] E. Marchand, P. Bouthemy, and F. Chaumette, "A 2D-3D model-based approach to real-time visual tracking," Image and Vision Computing, vol. 19, no. 13, pp. 941-955, Nov. 2001.
[6] F. Jurie, "Tracking objects with a recognition algorithm," Pattern Recognition Letters, vol. 19, no. 3-4, pp. 331-340, 1998.
[7] C. R. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: real-time tracking of the human body," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, July 1997.
[8] N. Friedman and S. Russell, "Image segmentation in video sequences: a probabilistic approach," Proc. 13th Conf. on Uncertainty in Artificial Intelligence (UAI), Aug. 1-3, 1997.
[9] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, 1999.
[10] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," Int'l J. Computer Vision, vol. 70, no. 1, pp. 41-54, 2006.
[11] A. Verri and T. Poggio, "Motion field and optical flow: qualitative properties," IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 490-498, 1989.
[12] A. Appel, "The notion of quantitative invisibility and the machine rendering of solids," IBM Research Report RC 1618, May 20, 1966.
[13] A. Appel, "The notion of quantitative invisibility and the machine rendering of solids," Proc. ACM National Conference, Thompson Books, Washington, DC, pp. 387-393, 1967.
[14] E. Catmull, "A hidden-surface algorithm with anti-aliasing," Proc. 5th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '78), ACM, New York, NY, pp. 6-11, 1978.
[15] H. Fuchs, Z. M. Kedem, and B. F. Naylor, "On visible surface generation by a priori tree structures," Proc. 7th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '80), ACM, New York, NY, pp. 124-133, 1980.
[16] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 142-149, June 2000.
[17] A. Senior et al., "Appearance models for occlusion handling," Image and Vision Computing, vol. 24, no. 11, pp. 1233-1243, 2006.
[18] D. Koller, K. Daniilidis, and H.-H. Nagel, "Model-based object tracking in monocular image sequences of road traffic scenes," Int'l J. Computer Vision, vol. 10, no. 3, pp. 257-281, 1993.