A Bayesian Approach to Object Contour Tracking Using Level Sets

Alper Yilmaz
Computer Science Dept., University of Central Florida

Xin Li
Mathematics Dept., University of Central Florida

Mubarak Shah
Computer Science Dept., University of Central Florida

Abstract

High level vision tasks (recognition, understanding, etc.) for video processing require tracking of the complete contour of the objects. In general, objects undergo non-rigid deformations, which limits the applicability of motion models (e.g. affine, projective) that impose rigidity constraints on the objects. In this paper, we propose a contour tracking algorithm for video captured from moving cameras of different modalities. The proposed tracking algorithm takes a Bayesian approach using the probability density functions (PDFs) of texture and color features. These feature PDFs are fused in an independent opinion polling strategy, where the contribution of each feature is defined by its discrimination power. We formulate the evolution of the object contour as a variational calculus problem and solve the system using level sets. The associated energy functional combines region-based and boundary-based object segmentation approaches into one framework for object tracking in video, where the search space for the solution is reduced. In this regard, it can be viewed as a generalization of formerly proposed methods in which the shortcomings of other methods (color, shape, gradient constraints, etc.) are overcome. The robustness of the proposed algorithm is demonstrated on real sequences. Several video sequences are given in the supplementary media file.

1 Introduction

In recent years, a great deal of effort has been expended on object tracking. There are two broad classes of methods for tracking multiple objects:

- Detection-based trackers: Detect the objects in every frame, then find the correspondence.
- Transformation-based trackers: Detect the object only in the first frame, then track it in consecutive frames.

For detecting objects, Wren et al. [1] used a single Gaussian model for each pixel in the reference (background) image to find moving pixels. Stauffer and Grimson [2] generalized this scheme by modeling the pixel color in the reference image using a Gaussian mixture model. However, the most obvious limitation of these approaches is the fact that the camera has to be stationary.

Another approach to detect the objects is to segment the frames. Although there are various ways to perform segmentation, due to the relevance to our approach, we will discuss only active contours (snakes). Snakes define the contour of the object by a number of points [3]. Generally, they segment the object by minimizing a gradient-based boundary condition. Compared to the background difference schemes, active contours produce tighter object boundaries and are more suitable for performing higher level tasks such as action recognition, object retrieval and surveillance. The main problems of these methods are contour initialization and their reliance on image gradients alone. Recently, detection schemes that fuse snakes with background subtraction and change detection were proposed by Besson et al. [4], and Paragios and Deriche [5], respectively. These methods overcome the initialization problem by evolving the contour to the moving regions. Specifically, in [5], after initialization, a gradient-based approach similar to [6] is applied to obtain the true object region. In contrast, in [4] the contour is evolved based on the flow of the centroid of the same object in two consecutive frames. Although this approach overcomes the problems of the image gradient, its performance for non-rigid objects degrades due to the centroid flow. Note that both of these methods are suitable only for tracking a single object.

In the detection-based trackers, tracking is accomplished by finding the corresponding objects in consecutive frames. The simplest case is point correspondence, where each object is represented by a point, and a cost function based on the spatial positions of the objects is minimized to determine the correspondence. Point correspondence can be generalized to region correspondence, where in addition to the centroid of a region, other properties such as shape, color and texture can be employed in the cost function. On the other hand, transformation-based trackers require object detection only in the first frame. Once the object is detected, the task is to find the motion of the object based on the object representation. The object representations used by this class of trackers are circles, rectangles or elliptical object patches, or the full object contours. According to the object representation chosen, different transformation models are used. Commonly used transformations are translation, translation+scaling, affine, projective, etc.

Patch representations impose rigidity constraints on the object. For instance, Comaniciu and Meer [7] use a circular region around the object and compute the translation of the object by maximizing the likelihood between the model and the object density estimates. Similarly, Jepson et al. [8] used an ellipsoidal patch for object representation and computed the affine motion parameters using a probabilistic model that captures stable features, outliers and object structure. Although the performance of patch-based trackers is acceptable, they cannot provide much information beyond the centroid or the orientation of the region.

Contour representation, on the other hand, relaxes the rigidity constraint and allows the objects to undergo non-rigid deformations. In contour tracking, the contour is initialized with the previous position and is evolved into the correct position by minimizing the associated energy functional. There are several issues in the active contour framework: selection of the energy functional, representation of the object boundary, and the boundary evolution method. Traditional energy functionals are based on the image gradients, and the curve is represented and evolved in a Lagrangian methodology. Although we postpone the detailed discussion to Section 4, it is worth mentioning that the Lagrangian approach does not allow topology changes (splitting or merging), and thus is not suitable for tracking non-rigid objects [9].

To overcome the limitations of the Lagrangian approach, Caselles et al. [6] used an Eulerian approach called the level set, which allows topology changes. A detailed discussion of level sets will be given in Section 4. In their approach, the object contour converges to the object boundary, which is defined by the gradient magnitude. Bertalmio et al. [9] proposed a tracking scheme using the temporal gradient of consecutive images using level set partial differential equations. However, their approach requires small motion and no changes in the local intensities. They also mention that the performance of the method degrades for objects with non-uniform intensities on a complex background. In order to overcome the shortcomings of the gradient information, Zhu and Yuille proposed a region-based segmentation method using an active contour framework [10]. They selected a number of seed regions and performed a region growing step defined by an intensity homogeneity criterion. In contrast to [4], [5], [6], and [9], the topology change of the segmented objects is obtained offline by merging regions. Although the paper represents pioneering work in this area, it is not suitable for tracking objects due to the region growing step and the offline region merging step.

In a recent paper, Mansouri proposed a Bayesian approach to contour tracking [11]. He modeled the possible transformations between the two regions in two frames by pixel-based translations. Every pixel in the image is allowed to move in a circular window in which the probability of object or background membership is computed. Mansouri implemented the energy functional using level sets. The performance of this algorithm degrades in the presence of illumination changes and self-shadowing. The manually selected local window size drastically affects the tracking performance: a small neighborhood permits only small motion, whereas for a large neighborhood, similar object and background intensities result in undesired tracking results.

In the context of segmentation, Paragios and Deriche [12] proposed a Bayesian approach based on region competition. Their approach uses a mixture model for the magnitude of the Gabor responses. The algorithm computes a combination of boundary and region-based forces to evolve the contour. The boundary term used by the authors is essentially the region term defined in four directional support regions around the boundary pixel. One problem with this approach is that the authors use the magnitude of complex-valued Gabor filter outputs, which do not constitute an orthonormal basis, as required by the independence constraint imposed on the features. Also, as discussed in [8], the magnitude does not provide a unique set of features.

In this paper, we focus on tracking regions using a variational framework, which can be viewed as a generalization of [10], [11], and [12]. The proposed framework contributes to the field in several aspects. First, the objective functional is motivated by the Bayesian framework and is derived without imposing any constraints on the shape, the color or the gradient of the objects. Second, an independent opinion polling strategy is used to fuse different sources of information computed from the regions (the object and the background). Finally, the method uses a free band size parameter, which builds a bridge between the boundary-based and region-based variational tracking schemes, and is a generalized version of the previously proposed methods. Our approach differs from the generic active contour models of [3], [4], [5], [6] and [13] in terms of the object boundary definition: we define the boundary in a multi-model Bayesian framework instead of by the image gradients. In contrast to [10] and [11], the curve is evolved to the object boundary by a region-motivated boundary force which uses higher order statistics. The proposed method fuses texture and color information intelligently (in contrast to active contour methods including [5], [11], [12]). Due to modeling of the region instead of individual pixels, and due to the use of texture cues, it is less prone to lighting changes (in contrast to [11]). Also, the proposed method uses more reliable modeling of both color and texture information (compared to [12]).

The paper is organized as follows: Section 2 outlines how the texture and color features are modeled. In Section 3, a Bayesian approach to object tracking is outlined and the energy functional associated with the current boundary hypothesis is derived. Section 4 deals with the curve evolution strategies and the proposed speed function for evolving the boundary. Finally, the experimental results and conclusions are presented in Sections 5 and 6, respectively.

2 Feature Selection

The performance of tracking and segmentation approaches depends on the features used. During the last two decades, two classes of features have been widely considered in the field of computer vision for tracking and segmentation purposes.

• Color features are obtained from the raw color values in an image. In general, their performance degrades due to changes in illumination; however, in practice, color-based approaches perform reasonably well.

• Texture features code the details in an image which have repetitive structure. To compute texture features, an appropriate representation needs to be used. The most popular texture representations are filter banks.

The color histogram, which is the most widely used color representation, performs better than parametric models in the context of skin detection [14]. Histograms can be generalized to multivariate kernel density estimates with kernel K(x) and bandwidth h for d-dimensional data:

f̃(x) = (1 / nh^d) Σ_{i=1}^{n} K(x − x_i),

where the uniform kernel reduces to a histogram. Among various kernels, the Epanechnikov kernel,

K(x) = (1/2) c_d^{−1} (d + 2)(1 − ‖x‖²) if ‖x‖ < 1, and K(x) = 0 otherwise,

where c_d is the volume of the unit d-dimensional sphere, yields the minimum average global error between the estimate and the true density [7].

In contrast to color, texture features require a preprocessing step to generate texture representations. In this framework, we select a multi-scale and multi-oriented linear basis (specifically, steerable pyramids [15]) for representing textures. Filter selection for subband analysis in steerable pyramids plays a critical role in statistical modeling. In order to have a disjoint feature space for creating independent PDFs based on texture analysis, we adopt Gabor wavelets, which create an orthonormal subband basis in the steerable pyramid representation. Gabor wavelets are Gaussian modulated sine and cosine gratings:

G_i(x, y) = e^{−π[x²/α² + y²/β²]} · e^{−2πi[u₀x + v₀y]},   (1)

where α and β specify the width and the height, while u₀ and v₀ specify the modulation of the filter.

Similar to the modeling of the color information, PDFs for orthonormal texture features can be generated by kernel density estimates or parametric models. For the filter-based texture representation, a non-parametric density estimate is not efficient for two main reasons: the size of the non-parametric model is not known, and post-normalization of the filter responses degrades the performance. Instead, we use a Gaussian mixture model similar to [8] and [12]. In a mixture model, the probability of observing a value x is computed using the model parameters (μ_k and σ_k for a Gaussian) and the a priori probabilities P_k by:

p(x|Θ) = Σ_{k=1}^{C_N} P_k p_k(x|μ_k, σ_k),   (2)

where Θ = {P_k, μ_k, σ_k : k = 1 … C_N}. Fixing the number of components to C_N = 3, the unknowns P_k, μ_k and σ_k in Eq. 2 are computed using the Expectation Maximization (EM) algorithm. In Fig. 1, a sample texture and the contour plot of its Gaussian mixture model are shown.

We believe an ideal tracking system should use both color and texture features. In Figs. 2a and 2b, we present selected frames from two synthetic sequences, which show the importance of the color and texture features, respectively. In (a), a synthetic object which has the same texture pattern as the background but a different color is tracked using color (row i) and texture (row ii). In contrast to the color-based tracker, the texture-based tracker fails due to texture similarity around the boundary. In part (b), the object and the background have different textures but similar colors. For this case, the texture-based approach (row i) correctly tracks the object, whereas the color-based tracker (row ii) fails.

2.1 Fusing Color and Texture Features

Fusing different sources of information can be achieved in various ways: cascaded integration (one cue at a time), supervised late integration (linear opinion polling: a weighted linear combination), or unsupervised early integration, which is evaluated prior to object membership (independent opinion polling) [16]. Among these, the independent opinion polling strategy is the only scheme that is statistically stable [17]. In independent opinion polling, the likelihood of the observations for the object or the background model is given by

p(x|M_α) = Π_β p_β(x|M_α,β),
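As a concrete illustration of the kernel density estimate and the Epanechnikov kernel of Section 2, the following sketch evaluates f̃ at a query point (it assumes the standard scaled kernel form K((x − x_i)/h); the function and variable names are our own, not the paper's):

```python
import numpy as np
from math import pi, gamma

def epanechnikov_kde(samples, h, query):
    """Kernel density estimate f~(x) = 1/(n h^d) sum_i K((x - x_i)/h)
    with the Epanechnikov kernel K(u) = 0.5 c_d^{-1} (d+2)(1 - ||u||^2)
    for ||u|| < 1, and 0 otherwise."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n, d = samples.shape
    c_d = pi ** (d / 2.0) / gamma(d / 2.0 + 1.0)  # volume of the unit d-sphere
    u = (np.asarray(query, dtype=float) - samples) / h
    r2 = np.sum(u * u, axis=1)                    # squared norms of the offsets
    k = np.where(r2 < 1.0, 0.5 / c_d * (d + 2) * (1.0 - r2), 0.0)
    return k.sum() / (n * h ** d)

# With all samples at 0 and h = 1 (d = 1, c_1 = 2), the estimate at 0
# is K(0) = 0.75, the peak of the 1-D Epanechnikov profile.
```

Setting K to the uniform kernel instead recovers the histogram, as noted in the text.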


Figure 1: Sample texture (left) and contour plots of 2-dimensional (magnitude and phase) PDFs of Gabor wavelets in 4 directions (0◦ , 45◦ , 90◦ , 135◦ ). As seen, for each direction, only two Gaussians are fitted.
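The Gabor grating of Eq. 1 can be sampled directly on a grid; a minimal sketch follows (the kernel size and the parameter values α, β, u₀, v₀ below are illustrative choices, not the paper's):

```python
import numpy as np

def gabor_kernel(size, alpha, beta, u0, v0):
    """Complex Gabor wavelet of Eq. 1 on a size x size grid:
    G(x, y) = exp(-pi[x^2/alpha^2 + y^2/beta^2]) * exp(-2*pi*1j*(u0*x + v0*y))."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-np.pi * (x ** 2 / alpha ** 2 + y ** 2 / beta ** 2))
    carrier = np.exp(-2j * np.pi * (u0 * x + v0 * y))
    return envelope * carrier

# Filtering an image with this kernel (e.g. via FFT) and taking the
# magnitude and phase of the response gives the two feature dimensions
# whose joint PDF is shown in Fig. 1.
g = gabor_kernel(15, 3.0, 3.0, 0.25, 0.0)
```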

Figure 2: Necessity of using the color cue (a) and the texture cue (b). (a) Results of i) the color-based tracker (correct tracking) and ii) the texture-based tracker (incorrect tracking due to texture similarity) for a green object that has the same texture pattern as a brown background. (b) Results of i) the texture-based tracker (correct tracking) and ii) the color-based tracker (incorrect tracking due to color similarity) for an object whose color is similar to the background. We recommend viewing this figure in color.

where x is the spatial variable, α ∈ {object, background} and β ∈ {color, {steerable subbands}}. Using Bayes' rule, the a posteriori estimate of membership can be computed:

p(α|x) = Π_β p_β(x|M_α,β) p(α) / Σ_γ Π_β p_β(x|M_γ,β) p(γ),   (3)
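Eq. 3's fusion-by-product can be sketched numerically; the toy likelihood values below are hypothetical, and the example shows how a non-discriminative feature, being equal under both models, cancels out of the posterior:

```python
import numpy as np

def membership_posterior(likelihoods, priors):
    """A posteriori membership of Eq. 3: the per-feature likelihoods
    p_beta(x | M_{alpha,beta}) are multiplied (independent opinion
    polling) and normalized over the classes gamma."""
    joint = {a: priors[a] * float(np.prod(likelihoods[a])) for a in likelihoods}
    z = sum(joint.values())                     # normalizer over gamma
    return {a: v / z for a, v in joint.items()}

# Feature 1 is discriminative (0.9 vs 0.1); feature 2 is not (0.5 vs 0.5)
# and therefore contributes equally to both memberships.
post = membership_posterior(
    {"object": [0.9, 0.5], "background": [0.1, 0.5]},
    {"object": 0.5, "background": 0.5})
```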

where γ ∈ {object, background}, and α and β are as defined above. It can easily be observed from Eq. 3 that features which do not have discrimination power will have equal membership probabilities, whereas the discriminant features will have higher probabilities.

3 Bayesian Approach to Tracking

Tracking an object in an image sequence I^i : 0 < i < ∞ can be treated as a discriminant analysis of pixels into two non-overlapping classes, where the classes correspond to the object region (foreground), R_obj, and the background region, R_bck. In the spatial domain, R = R_obj ∪ R_bck. The boundary (front), ∂Γ, between these two regions is defined by the intersection of the directional curves which define each region, such that ∂Γ = ∂Γ_obj ∩ ∂Γ_bck, where ∂Γ_obj and ∂Γ_bck are, respectively, the borders of the foreground and background regions in the counterclockwise direction. Considering contour tracking or segmentation methods, the likelihood of observing the front between two regions is equal to the likelihood of partitioning the regions:

P(∂Γ) = P(ϕ(R) = {R_obj, R_bck}),   (4)

where ϕ is the partitioning operator [10]. The a posteriori probability for the boundary (given on the left side of Eq. 4) can be used interchangeably with the a posteriori probability of partitioning the space. Thus, for frame n, we formulate the object tracking problem in terms of the boundary probability, P_∂Γ, defined using the probabilities of the regions given the current image, I^n, and the previous object boundary, ∂Γ^{n−1}, as P_∂Γ = P(ϕ(R^n) | I^n, ∂Γ^{n−1}). Using Bayes' rule and discarding the constant terms, the probability of the object boundary is approximated as:

P_∂Γ ≈ P(I^n | R^n_obj, ∂Γ^{n−1}) P(I^n | R^n_bck, ∂Γ^{n−1}) P(ϕ(R^n) | ∂Γ^{n−1}).   (5)

The last term in Eq. 5 is the probability measure of partitioning in the current image based only on the previous information, and can be dropped. Thus, the front probability becomes P_∂Γ = P(I^n | R^n_obj, ∂Γ^{n−1}) P(I^n | R^n_bck, ∂Γ^{n−1}).

Let there be two subregions R^∂Γ_obj and R^∂Γ_bck defined in the neighborhood of the curve, such that R^∂Γ_obj ⊂ R^n_obj and R^∂Γ_bck ⊂ R^n_bck. Following the independent opinion polling strategy and the noise process present in the observations, the front probability, P_∂Γ, can also be bounded in terms of the probabilities of these subregions R^∂Γ_obj and R^∂Γ_bck using

P_∂Γ ≤ P(I^n | R^∂Γ_obj, ∂Γ^{n−1}) P(I^n | R^∂Γ_bck, ∂Γ^{n−1}).   (6)

Throughout the paper, we assume that each pixel is independent, so that the region probabilities are products of the probabilities of the individual pixels:

P(I^n | R_α) = Π_{x∈R_α} P(I^n(x)),

where R_α = {x | x ∈ R_α, α ∈ {obj, bck}}. The probability conditioned on ∂Γ^{n−1} in Eq. 5 can be written in terms of the probability of the subregion, R_A, defined by the curve:

P_∂Γ(I^n | R_A) = Π_{x₁∈∂Γ} Π_{x₂∈R_A(x₁)} P_{R_A}(I^n(x₂)),   (7)

where R_A(x₁) denotes the square region around the boundary point x₁. Thus, the a posteriori front probability becomes:

P_∂Γ = Π_{x₁} ( Π_{x₂} P_{R_obj}(I^n(x₂)) · Π_{x₃} P_{R_bck}(I^n(x₃)) ),   (8)

where x₁ ∈ ∂Γ, x₂ ∈ R_obj(x₁) and x₃ ∈ R_bck(x₁); the first inner product corresponds to P(I^n | R_obj, ∂Γ) and the second to P(I^n | R_bck, ∂Γ).

The maximum a posteriori (MAP) estimate, ∂Γ̂^n, of the front ∂Γ^n is found by maximizing the probability P_∂Γ over the subsets ∂Γ ⊂ Ω, where Ω is the space of all possible object boundaries. Based on Eq. 8, the MAP estimate for ∂Γ can be written as the MAP estimate of the front based on the subregions. Thus, we have the following tracking scheme,

∂Γ̂^n = arg max_{∂Γ⊂Ω} Π_{x₁} ( Π_{x₂∈R^∂Γ_obj(x₁)} P_{R^∂Γ_obj}(I^n(x₂)) · Π_{x₃∈R^∂Γ_bck(x₁)} P_{R^∂Γ_bck}(I^n(x₃)) ),   (9)

where x₁ ∈ ∂Γ, x₂ ∈ R^∂Γ_obj(x₁) and x₃ ∈ R^∂Γ_bck(x₁). In Eq. 9, R^∂Γ_α(x₁) : x₁ ∈ ∂Γ defines the band region with respect to the front variable x₁, such that R^∂Γ_α(x₁) ⊂ R^∂Γ_α ⊂ R_α. This new region definition based on the front serves both as a boundary constraint (as in the boundary-based schemes [3], [6], [13]) and as a region constraint (similar to [10], [12]). The advantages of using the band around the boundary, compared to using the complete region, are:

- The boundary search space is reduced;
- Since the region boundary is a closed curve, pixels that do not belong to the object but lie inside the boundary are not considered in the estimation;
- The effect of the random noise process in the front estimation step is minimized; and
- It intelligently fuses the boundary-based and region-based methodologies into one framework.

3.1 Energy Functional

A convenient way of converting a MAP estimation into a minimization problem is by computing the negative log-likelihood of the probabilities. Thus, the tracking scheme proposed in Eq. 9 can be formulated as a minimization problem with the energy functional:

E(∂Γ) = ∮_{x₁∈∂Γ} ( ∫∫_{x₂∈R^∂Γ_obj(x₁)} Ψ_obj(x₂) dx₂ + ∫∫_{x₂∈R^∂Γ_bck(x₁)} Ψ_bck(x₂) dx₂ ) dx₁,   (10)

where Ψ_obj(x) = −log P_{R_obj}(I^n(x)) and Ψ_bck(x) = −log P_{R_bck}(I^n(x)); we denote the object term by E_A and the background term by E_B. However, the integrals defined on the plane by the subregion around the front ∂Γ are not definite. For the sake of practicality and implementability, the subregion where these integrals are defined is denoted by

∪_{x_i∈∂Γ} R^∂Γ(x_i) : R^∂Γ(x_i) ∈ R_obj ∪ R_bck,

and the R^∂Γ(x_i) are selected to be square regions [−m, m] × [−m, m] centered around x_i, as shown in Fig. 3, where m is the limit of the squares. The pixels inside the square region are defined for each front position (f(s), g(s)) by x = x̃ + f(s) and y = ỹ + g(s), where f and g are the parametric curve functions with s being the parameter, x̃, ỹ ∈ [−m, m] and x, y ∈ R^∂Γ_obj ∪ R^∂Γ_bck.

Figure 3: Subregions around the front ∂Γ defined by rectangles.

To compute the object and background probabilities inside the rectangular region, we define an indicator function 1^∂Γ_α{x ∈ R_α} of the form:

1^∂Γ_α{x ∈ R_α} = 1 if x ∈ R_α, and 0 otherwise.

Using the definitions outlined above, the functional E_A in Eq. 10 can be converted to its new form by changing the variables from (x₁, y₁) and (x₂, y₂) to s and (x̃, ỹ), respectively:

E_A(s) = ∫_{−m}^{m} ∫_{−m}^{m} Ψ_obj(x̃ + f(s), ỹ + g(s)) 1^∂Γ_obj{x̃ + f(s), ỹ + g(s)} J dx̃ dỹ,

where J is the Jacobian introduced by the change of variables. Due to the translation of the square window around the boundary, the Jacobian becomes 1 and is dropped. Once E_B is written following the same idea, Eq. 10 results in the tracking functional:

E = − ∮_0^l ∫_{−m}^{m} ∫_{−m}^{m} log P_{R_obj}(I(x)) 1^∂Γ_obj{x} dx̃ dỹ ds − ∮_0^l ∫_{−m}^{m} ∫_{−m}^{m} log P_{R_bck}(I(x)) 1^∂Γ_bck{x} dx̃ dỹ ds,   (11)

where the first term is the a posteriori object log-likelihood, the second term is the a posteriori background log-likelihood, l is the length of the front, x = (x, y)^T and 1^∂Γ_bck = 1 − 1^∂Γ_obj. The functional proposed in Eq. 11 has similarities with the functionals proposed by Zhu and Yuille [10], Mansouri [11], and Paragios and Deriche [5], [12]:

- The proposed objective function can be generalized to the two-region case of [10] by increasing m such that the square window covers the object and background regions. In this case, the front-dependent subregions R^∂Γ_α of Eq. 10 are changed to regional terms R_α.

- Replacing the probabilities in Eq. 11 with the Gaussian of the temporal gradients results in the motion detection step of [5] (similar to the tracking framework of [4]). For the tracking step, the same probabilities can be replaced by the Gaussian of the image gradient (similar to the segmentation framework of [6]).

- Limiting the discriminant analysis to the pixels inside the region, setting the probabilities in Eq. 11 to

P_α(x) = max_{z:‖z‖≤m} e^{−(I^{n−1}(x) − I^n(x+z))² / 2σ²},

and dropping the plane integrals (due to the max operation) results in the tracking scheme proposed in [11].

- Similarly, in Eq. 11, the convex combination of a boundary force (defined in a small boundary neighborhood) and a regional force (similar to [10]) proposed in [5] is unified through the regional probabilities attracted by the front which defines the regions.

4 Curve Evolution

An initial curve, ∂Γ₀, can be defined as an evolving front in the direction of its normal vector field with speed F by introducing an additional parameter t, which constructs a family of curves [18]. The curve ∂Γ can be represented by its parametric form, an implicit representation, etc. The parametric form has problems during the evolution phase due to the stability of the solution, singularity problems, and the inability to handle topology changes. These limitations can be dealt with by using implicit representations. Among various implicit representations, the level set is numerically the most stable [18]. In a level set scheme, the curve is implicitly represented on a fixed discrete grid φ : R² → R, using φ(∂Γ(s, t), t) = 0, and the regions inside and outside the curve are defined by φ(x, t) < 0 and φ(x, t) > 0, respectively. Thus, evolving the curve corresponds to updating the level set function, φ, which defines new zero crossings. The deformation of φ as the front evolves is obtained by

∂φ(∂Γ(s, t), t) / ∂t = F(x, t) |∇φ(∂Γ(s, t), t)|,   (12)

where F is the speed in the normal direction, n⃗, of the curve.

4.1 Minimizing the Energy Functional

The evolution of the curve using the Bayesian approach given in Eq. 11 can be related to Eq. 12 by defining the speed F in terms of maximizing the front probability. This leads to moving the front in the steepest ascent direction with respect to the front ∂Γ. Taking the functional derivative of the energy functional (computing the Euler–Lagrange equations and using Green's theorem [6], [10]), the motion equation for the front is given by:

d∂Γ/dt = − ∫_{−m}^{m} ∫_{−m}^{m} log P_{R_obj}(I(x̃)) 1^∂Γ_obj{x̃} dx̃ dỹ n⃗_obj − ∫_{−m}^{m} ∫_{−m}^{m} log P_{R_bck}(I(x̃)) 1^∂Γ_bck{x̃} dx̃ dỹ n⃗_bck,   (13)

where n⃗_obj is the normal of the object region and n⃗_bck is the normal of the background region. Note that the counterclockwise curves of the object and background regions have normals in opposite directions, so n⃗_obj = −n⃗_bck. An interesting observation about the composition of the speed function proposed in Eq. 13 is that it is a differential equation which contains integral terms. For instance, if we follow a

different approach for defining the neighborhood, the resulting energy functional may lead to unstable schemes.

A closer look at Eq. 13 shows that it indeed includes the speed of the front point, x, in the normal direction, n⃗_obj. Thus the deformation of φ given in Eq. 12, after discretization, is φ^n_{x,y} = φ^{n−1}_{x,y} − Δt F_{x,y} |∇φ^{n−1}_{x,y}|, where

F_{x,y} = − Σ_{i=−m}^{m} Σ_{j=−m}^{m} log P_{R_obj}(I_{x′}) 1^∂Γ_obj{x′} + Σ_{i=−m}^{m} Σ_{j=−m}^{m} log P_{R_bck}(I_{x′}) 1^∂Γ_bck{x′},   (14)

where x′ = (x + i, y + j). Note that the negative and positive terms correspond to the shrinking and expanding forces, and are due to the opposite normal directions discussed in Eq. 13. We can interpret this speed function as follows:

- When the boundary hypothesis for a pixel x_∂Γ is correct, such that the square region defined by x_∂Γ is correctly partitioned, the motion of the front pixels will be ≈ 0.
- If the front is not correctly positioned, the background (object) probability of the front pixel will be higher than the object (background) probability, and the speed in the normal direction will be negative (positive).

5 Experiments

To demonstrate the performance of the proposed tracking approach, we have tested our algorithm on various sequences captured with infrared and electro-optical cameras. We have implemented Eq. 14 using the narrow band level set implementation outlined in [18]. The algorithm is initialized with the boundaries of the objects in the first frame. The selection of m (the limit of the square region in Eq. 14) is not sequence dependent and is fixed to 6 for all sequences. In contrast to [11], the maximum range of motion is not constrained. The algorithm runs at different frame rates based on the motion and the size of the objects. For the video sequences, we refer the reader to the media file submitted with this paper.

In Fig. 4a, the results are shown for a tennis player sequence captured with a camcorder. The sequence is composed of 940 frames, and the player has high non-rigid motion. The texture of the player and the background are very similar in this particular case, and the resulting inside and outside probabilities in Eq. 14 are both very close to .5, so only the color cue is effectively used. The performance of the algorithm throughout the sequence is very good and the player is correctly tracked, in contrast to the object and background segmentation method proposed in [17], where the interframe object segmentation had problems.

In another experiment, shown in Fig. 4b, the approach is tested on a low quality sequence obtained from a stationary surveillance camera. The sequence is composed of 300 frames. The tracked person in the sequence undergoes high non-rigid motion. Due to the similarities of the background and the object models, both features (color and texture) contribute to the tracking function given in Eq. 14. The object contour is correctly tracked throughout the sequence.

As shown in Fig. 4c, we also extended the method to track multiple objects. To the best of our knowledge, we propose the first approach to track multiple objects in the context of active contours. However, due to space limitations, the discussion is excluded. Currently, we do not handle occlusions, but for the given sequence the objects are correctly tracked after occlusion.

The algorithm was also tested on IR sequences. In addition to the problems due to the intensity and texture similarity of the targets, the targets are sometimes not visible or distinguishable due to the background clutter. In Fig. 5, we show two different closing sequences, where the target size increases. Since the imagery is of low quality and atmospheric effects cause changes in the features, in contrast to the EO imagery, we used dynamic models that reflect the change in the scene. The dynamic models are obtained by updating the models for the color and texture features at every frame using the a priori models.

6 Conclusions

We proposed a contour tracking method for non-rigid objects motivated by a Bayesian framework. The color and texture features of the objects are modeled using kernel density estimates and mixture models, respectively. The color and texture features are then combined in an independent opinion polling strategy, where the features contribute to the energy functional based on their discrimination power. The energy functional is minimized by the level set method, which represents the contour implicitly. The level set implementation of the contour evolution supports topology changes for objects, which can split or merge. The results presented show the robustness of the tracking algorithm for EO and IR imagery. More results are presented in the media file accompanying this paper.
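The discretized level set update of Section 4.1, φ^n = φ^{n−1} − ΔtF|∇φ|, can be sketched on a dense grid as follows. This is a minimal illustration, not the paper's narrow band implementation: a constant positive speed stands in for the log-likelihood speed of Eq. 14, and with the inside-negative convention of Section 4 a positive F moves the front outward:

```python
import numpy as np

def evolve(phi, speed, dt, steps):
    """Explicit update of Eq. 12 on a dense grid:
    phi <- phi - dt * F * |grad phi|  (inside of the curve: phi < 0)."""
    for _ in range(steps):
        gy, gx = np.gradient(phi)
        phi = phi - dt * speed * np.sqrt(gx ** 2 + gy ** 2)
    return phi

# Signed distance to a circle of radius 20 on a 64 x 64 grid.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
phi = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2) - 20.0
area_before = int((phi < 0).sum())
phi = evolve(phi, speed=np.ones_like(phi), dt=0.5, steps=10)
area_after = int((phi < 0).sum())  # the outward-moving front encloses more pixels
```

In the tracker itself, the speed at each band pixel would instead be the difference of summed object and background log-probabilities of Eq. 14, so the front expands where the object model fits better and shrinks where the background model does.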

References

[1] C.R. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-time tracking of the human body," PAMI, 19/7, pp. 780–785, 1997.
[2] C. Stauffer and W.E.L. Grimson, "Learning patterns of activity using real time tracking," PAMI, 22/8, 2000.

Figure 4: Contour tracking results: (a) a tennis player in a sequence captured by a moving camera; (b) a person from a stationary surveillance camera; (c) occluding multiple objects. We suggest viewing the results in color.

Figure 5: Contour tracking of targets in closing infrared sequences taken from an airborne vehicle. (a) Sequence RNG14 15, every 30th frame; (b) sequence RNG16 18, every 30th frame. Video sequences are available in the accompanying media file.

[3] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: active contour models," IJCV, vol. 1, pp. 321–332, 1988.

[4] S. Besson, M. Barlaud, and G. Aubert, "Detection and tracking of moving objects using a level set based method," in ICPR, 2000, v. 3, pp. 1100–1105.
[5] N. Paragios and R. Deriche, "Geodesic active contours and level sets for the detection and tracking of moving objects," PAMI, 22/3, pp. 266–280, 2000.
[6] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," in ICCV, 1995, pp. 694–699.
[7] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," in CVPR, 2000, v. 2.
[8] A.D. Jepson, D.J. Fleet, and T.F. El-Maraghi, "Robust online appearance models for visual tracking," in CVPR, 2001.
[9] M. Bertalmio, G. Sapiro, and G. Randall, "Morphing active contours," PAMI, 22/7, pp. 733–737, 2000.
[10] S.C. Zhu and A. Yuille, "Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation," PAMI, 18/9, pp. 884–900, 1996.
[11] A.R. Mansouri, "Region tracking via level set PDEs without motion computation," PAMI, 24/7, pp. 947–961, 2002.
[12] N. Paragios and R. Deriche, "Geodesic active regions and level set methods for supervised texture segmentation," IJCV, 46/3, pp. 223–247, 2002.
[13] F. Leymarie, "Tracking deformable objects in the plane using an active contour model," PAMI, 15/6, pp. 617–634, 1993.
[14] M.J. Jones and J.M. Rehg, "Statistical color models with application to skin detection," IJCV, 46/1, pp. 81–96, 2002.
[15] W.T. Freeman and E.H. Adelson, "The design and use of steerable filters," PAMI, 13/9, pp. 891–906, 1991.
[16] P. Torr, R. Szeliski, and P. Anandan, "An integrated Bayesian approach to layer extraction from image sequences," PAMI, 23/3, pp. 297–303, 2001.
[17] E. Hayman and J. Eklundh, "Probabilistic and voting approaches to cue integration for figure-ground segmentation," in ECCV, 2002, pp. 469–486.
[18] J.A. Sethian, Level Set Methods: Evolving Interfaces in Geometry, Fluid Mechanics, Computer Vision, and Materials Science, Cambridge University Press, 1999.
