Bayesian network for multiple hypothesis tracking

Wojciech Zajdel

Ben Kröse

University of Amsterdam, Faculty of Science, Computer Science Institute
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
{wzajdel,krose}@science.uva.nl

Abstract

For a flexible camera-to-camera tracking of multiple objects we model the object's behavior with a Bayesian network and combine it with the multiple hypothesis framework that associates observations with objects. Bayesian networks offer a possibility to factor complex, joint distributions into a product of intuitive conditional densities describing and predicting the object's path. Yet, these models do not distinguish unambiguously between the objects' observations. The resulting uncertainty is explored by a multiple hypothesis evaluation. The paper provides experimental evidence of the performance of the proposed method.

1 Introduction

An object (a person or a robot) is the subject of tracking as it moves across an area monitored with cameras, distributed in such a way that their viewing fields do not overlap. In this context, tracking primarily involves object identification ([1, 8]), that is, deciding whether a new observation corresponds to one of the previously found objects or requires introducing a new object. The difficulties are the noisy measurement process and the unpredictability of the object's movements.

Huang & Russell studied the problem of object identification in [1], where vehicles were monitored on a motorway with two cameras. Their identification uses a probabilistic appearance model, i.e., the density of the object's appearance at the second camera given what was first observed. The observations are matched in a way that maximizes the product of these densities over all matched pairs. The method is limited to only two cameras and objects moving in a fixed direction.

In [2] Pasula & Russell present a method for the same application, where the observations come from more than two cameras. Their method describes a vehicle with a 1st-order hidden Markov model (HMM) ([4]). The HMM is used to find the probability that a proposed sequence of observations came from one object. The search for the best partition of all observations into sequences is based on random building (sampling) of sequences according to the Markov chain Monte Carlo algorithm. In this application the sampling is effective due to the assumption that all cars move in the same direction and visit most of the camera locations. Moreover, the sampled assignments do not provide a single solution.

A somewhat different task is studied by Nicholson & Brady in [3], who monitor a robot's position with movement detectors in a building. The number of robots is known beforehand, so for each robot a Bayesian network (BN) is constructed to model the probabilistic dependencies between the unobserved robot's position, heading, state (moving or stationary) and the observed sensory information that indicates that some movement occurred. After the inference step, the position node in the BN for each robot gives a posterior distribution over the regions of the building in which the robot might be. The method does not take into account the visual description of the object, and assumes that the starting positions and the number of objects are known.

2 Our Method

Our identification method is again based on the premise that a moving object is best described by a hidden process, yielding an observation when the object enters a camera viewing field. Since our objects (people) have unlimited freedom to choose their speed and path inside a building, the simple linear Gaussian system (typical for other tracking methods, [8]) is not a sufficient representation. Instead we propose a custom-built dynamic Bayesian network (DBN) that captures the dependencies between the object's appearance, position, heading and the observations.

DBNs are a generalization of the well-known probabilistic models that describe time-series processes, like the discrete-data hidden Markov models or Kalman filters. The common feature is the assumption that what we observe depends probabilistically only on a limited number of past hidden variables and observations. In 1st-order Markov processes and Kalman filters only the last hidden state influences a new observation. DBNs generalize this to any number of past hidden states and observations. For tracking applications the most useful part of DBNs is the prediction of a new observation from the observations up to date.

In a high-noise environment, or when there are similar objects, the probabilistic models are not able to point uniquely to the right object that might have been the source of an observation. In our method, the resulting ambiguity is resolved by the Multiple Hypothesis Tracking (MHT) technique. MHT works by trying different associations of a new observation to the existing tracks. It has been shown successful for the motion correspondence problem, where points (corners or other features) are tracked frame-to-frame at a single camera ([7, 6]). In our application, when a new observation is reported we try to assign it to the existing trajectories in every possible way, also creating a new trajectory (thereby introducing a new object). The different associations make different hypotheses that explain the data. For each hypothesis we compute its probability, and to limit their number we keep only a few with the highest likelihoods.
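As a concrete illustration, the association-and-pruning loop just described can be sketched as a beam search over hypotheses. This is only a minimal sketch, not the authors' implementation; `track_likelihood` is a hypothetical stand-in for the probabilistic model developed in section 2.1, returning a log-likelihood.

```python
def extend_hypotheses(hypotheses, obs, track_likelihood, beam=5):
    """Extend each hypothesis with a new observation in every possible way,
    then keep only the `beam` most likely hypotheses.

    A hypothesis is a pair (log_weight, tracks); tracks is a tuple of
    tuples of observations. `track_likelihood(track, obs)` is a
    hypothetical stand-in returning the log-probability that `obs`
    continues `track` (an empty track means a brand-new object).
    """
    candidates = []
    for log_w, tracks in hypotheses:
        # Try appending the observation to every existing track ...
        for i, track in enumerate(tracks):
            extended = tracks[:i] + (track + (obs,),) + tracks[i + 1:]
            candidates.append((log_w + track_likelihood(track, obs), extended))
        # ... and also try starting a new track (introducing a new object).
        candidates.append((log_w + track_likelihood((), obs), tracks + ((obs,),)))
    # Prune: keep only the `beam` highest-scoring hypotheses.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:beam]
```

Calling the function once per incoming observation keeps the hypothesis set bounded, at the price of possibly discarding the globally best association.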

2.1 Modeling an object
The camera reports an observation in the form y = (o, p, t, s), where o is the vector describing the appearance, p is the location, and t is the (real-world) time of the observation. The last element, s, informs about the side of the camera frame through which the object left the camera location. The key part of tracking is associating a new observation to a sequence of observations already believed to come from a single physical object. The decision can be based on the probability that the new observation came



[Figure 1 graph: the hidden appearance node f, the observation nodes o_1 … o_i, and the chains of time nodes t_1 … t_i, position nodes p_1 … p_i, and frame-side nodes s_1 … s_i.]
Figure 1: Object as a hidden process. The gray node (variable f) describes an object's hidden, time-invariant appearance. The arcs show dependencies between the variables. When a new observation is associated, new nodes are added.

from the given sequence. As the sequence gets larger this probability model becomes more difficult to evaluate. We can keep it simple by introducing a latent variable f (it can be called a true object descriptor), and assuming that f, together (for convenience) with the last observation y_{i-1}, generated y_i. The conditioning term is then limited to f and y_{i-1}. Since f is hidden we need to estimate it from the observations up to date and integrate it out. The model is split into two terms:

p(y_i | y_1, …, y_{i-1}) = ∫ p(y_i | f, y_{i-1}) p(f | y_1, …, y_{i-1}) df    (1)

p(y_i | f, y_1, …, y_{i-1}) = p(y_i | f, y_{i-1})    (2)

where the first term under the integral in (1) is responsible for generating observation y_i. The assumption written in (2) is similar to the 1st-order Markov assumption in an HMM, but we assume that to generate a new observation both the hidden state f and the last observation y_{i-1} are needed. The second term in (1), p(f | y_1, …, y_{i-1}), collects our knowledge about the object. It still needs conditioning on the full sequence, but using (2) it can be factored into a product and evaluated in an iterative way after each observation is assigned to the object:

p(f | y_1, …, y_i) = (1/c) p(y_i | f, y_{i-1}) p(f | y_1, …, y_{i-1})    (3)

Realizing that y_i is a set of variables, the two terms in (1) still may be difficult to write explicitly. Bayesian networks allow us to factor complex, joint distributions of many variables into a product of simpler conditional models by exploiting independences among the variables. Figure 1 shows a factorization that is very convenient for tracking purposes. The latent term f now describes an object's hidden, time-invariant appearance. The arcs connecting the nodes in the graph represent probabilistic dependence between the nodes. To complete the model of figure 1 we need the following conditional distributions:

- p(o_i | f, p_i): Observation model. It tells how the camera noise at position (location) p_i affects the true appearance f to yield observation o_i at that location.
- p(p_i | p_{i-1}, s_{i-1}): Distribution of the possible next position p_i given that the object was last at position p_{i-1} and left the frame through side s_{i-1}.
- p(t_i | t_{i-1}, p_{i-1}, p_i): Distribution of the arrival time t_i at location p_i when the object departed from p_{i-1} at time t_{i-1}.

For the nodes that do not have any incoming arcs (i.e., that do not depend on any other variables) prior distributions are provided: p(f), the prior distribution on the hidden appearance; p(p_1), the distribution of the first location at which we can see an object; p(t_1), the distribution of the time when an object can first appear at any camera; and p(s_i), the distribution of the side of the frame through which an object typically disappears from the camera view. The conditional models of the BN are much easier to find than the terms in (1). The first term of (1) now becomes:

p(y_i | f, y_{i-1}) = p(o_i | f, p_i) p(p_i | p_{i-1}, s_{i-1}) p(t_i | t_{i-1}, p_{i-1}, p_i) p(s_i)    (4)
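To make the three conditional models and the priors concrete, the sketch below parameterizes them for a scalar appearance feature. All numerical values, the corridor topology, and the function names are illustrative assumptions, not the models used in the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# p(o_i | f, p_i): the camera at location p blurs the true appearance f with
# its own Gaussian noise (per-camera variances are made-up values).
CAM_VAR = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.30}

def p_obs(o, f, p):
    return gaussian_pdf(o, f, CAM_VAR[p])

# p(p_i | p_{i-1}, s_{i-1}): next camera given the last camera and the exit
# side, encoded as a lookup table over a hypothetical corridor topology.
NEXT_POS = {(1, "right"): {3: 0.9, 2: 0.1},
            (3, "right"): {4: 0.5, 5: 0.5}}

def p_next_pos(p_new, p_old, side):
    return NEXT_POS.get((p_old, side), {}).get(p_new, 0.0)

# p(t_i | t_{i-1}, p_{i-1}, p_i): travel time modeled as a Gaussian around a
# nominal corridor transit time (means and variances again made up).
TRAVEL = {(1, 3): (20.0, 25.0), (3, 4): (10.0, 9.0)}

def p_arrival(t_new, t_old, p_old, p_new):
    mean, var = TRAVEL.get((p_old, p_new), (30.0, 100.0))
    return gaussian_pdf(t_new - t_old, mean, var)
```

Any parameterization with the same conditional structure would fit the factorization (4); tables for the discrete nodes and Gaussians for the continuous ones are merely the simplest choice.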

When there are no previous observations, we use:

p(y_1 | f) = p(o_1 | f, p_1) p(p_1) p(t_1) p(s_1)    (5)

From the graphical model we also find the aggregating component (3):

p(f | y_1, …, y_i) = (1/c) p(o_i | f, p_i) p(f | y_1, …, y_{i-1})    (6)

In the above equation the last expression is our previous posterior distribution of f, and c is a normalizing constant. The recursive updating starts when there is only one observation, and the prior p(f) replaces p(f | y_1, …, y_{i-1}).
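When both the prior on f and the camera noise are Gaussian, this recursion has the familiar conjugate closed form. The sketch below assumes a scalar appearance and known per-camera noise variances; these modeling choices are ours, for illustration only.

```python
def update_appearance(mu, var, o, cam_var):
    """One recursive update of the appearance posterior, assuming a
    Gaussian prior N(mu, var) over f and an observation o = f + camera
    noise with variance cam_var (standard Gaussian-Gaussian update:
    precisions add, means are precision-weighted)."""
    new_var = 1.0 / (1.0 / var + 1.0 / cam_var)
    new_mu = new_var * (mu / var + o / cam_var)
    return new_mu, new_var

# Fold in observations one by one, starting from a (hypothetical) broad prior.
mu, var = 0.0, 10.0
for o, cam_var in [(1.2, 0.5), (0.9, 0.3)]:
    mu, var = update_appearance(mu, var, o, cam_var)
```

Each observation tightens the posterior: the variance shrinks monotonically, and the mean moves toward the noise-weighted observations.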


2.2 Building a hypothesis

To describe each object by a trajectory, i.e. a sequence of observations, one needs to deal with the uncertainty regarding the number of objects and the association of observations to objects. Evaluating every possible arrangement is not feasible when the number of observations gets large.

2.2.1 Hypothesis probability

A hypothesis ω is a proposed partitioning of all observations up-to-date into trajectories (tracks). Let the set of all observations be Y. A particular ω defines K tracks Y_1, …, Y_K, so that Y_1 ∪ … ∪ Y_K = Y and Y_k ∩ Y_l = ∅ for k ≠ l. Assuming track independence, we find the probability of ω given our observations Y:

P(ω | Y) = (1/c) ∏_{k=1}^{K} P_θ p(Y_k)    (7)

where P_θ is the prior probability of creating a trajectory. Recalling that Y_k, the track of the k-th object, is a sequence of observations y_1, …, y_n of that object, we need to find the joint pdf of the sequence:

p(Y_k) = p(y_1, y_2, …, y_n)    (8)

The Bayesian graph of figure 1 allows us to condition the observations on the hidden variable, use the independences to factor the joint distribution, and integrate out the unknown term:

p(Y_k) = ∫ p(f) ∏_{i=1}^{n} p(y_i | f, y_{i-1}) df    (9)

where p(y_i | f, y_{i-1}) is given by (4), and p(y_1 | f) by (5).

2.2.2 Extending a hypothesis

Assume that at some point there are H hypotheses ω^1, …, ω^H explaining the current observations. Each ω^h defines a number of objects. A new observation y can be attached to any of the objects under any of the hypotheses, but it can also create a new track in any ω^h. The hypotheses are extended and maintained in the following fashion. For every ω^h:

- Find at most B tracks that have the highest probability of generating the new observation y (and also try adding y to an empty track).
- Extend the current hypothesis to at most B new hypotheses by adding y to each of the tracks found previously.

From the resulting hypotheses keep only the H with the highest likelihood P(ω | Y).
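Putting (7) and (9) together, the score of one hypothesis can be sketched as below. The integral over the hidden appearance f is approximated by a coarse grid; `log_obs_model`, `log_prior_f`, and the grid itself are hypothetical stand-ins that keep only the appearance part of the model.

```python
import math

def track_loglik(track, log_obs_model, log_prior_f, f_grid):
    """Grid approximation of (9), keeping only the appearance factors:
    log sum_f  p(f) * prod_i p(o_i | f), computed with log-sum-exp."""
    vals = [log_prior_f(f) + sum(log_obs_model(o, f) for o in track)
            for f in f_grid]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def hypothesis_loglik(tracks, log_p_new, **models):
    """Log of (7) up to the normalizing constant: every track pays the
    creation log-prior log_p_new (log P_theta) once."""
    return sum(log_p_new + track_loglik(t, **models) for t in tracks)
```

Under this score, grouping two nearly identical observations into one track beats splitting them, because the split pays the creation prior twice.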

3 Demonstration

The performance of the proposed method is demonstrated on a computer-simulated set of observations. The simulation involves first defining the number of objects and their true appearance. In this work an object is defined by its color and size. Next, for every object we set its path in terms of a sequence of camera locations to visit. The observation process at each camera is simulated with a linear Gaussian scheme that perturbs the true feature to generate an observation. Every camera has different parameters of the added noise. Our simulated objects move in a network of corridors with five cameras, as in figure 2. The locations of cameras 1 and 2 are the entry points, which is reflected by a higher prior probability of the first observation in a track occurring at these cameras.
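The simulation step can be sketched as follows; the one-dimensional feature (standing in for colour and size), the example paths, and the noise levels are all illustrative assumptions.

```python
import random

def simulate_observations(objects, cam_sigma, seed=0):
    """Generate a simulated observation set as described above.
    `objects` maps object id -> (true_feature, camera_path);
    `cam_sigma` maps camera id -> Gaussian noise std. deviation.
    Each visit to a camera yields one noisy appearance measurement."""
    rng = random.Random(seed)
    observations = []
    for obj_id, (feature, path) in objects.items():
        for cam in path:
            observations.append({
                "object": obj_id,   # ground truth, used only for evaluation
                "camera": cam,
                "feature": rng.gauss(feature, cam_sigma[cam]),
            })
    return observations
```

The ground-truth object ids are carried along purely so that the recovered assignment can later be scored against the original one.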

[Figure 2: five camera locations (cam 1 – cam 5) connected by corridors, with the possible paths marked.]

Figure 2: Simulated environment. All trajectories were set between the five cameras. Locations of cameras 1 and 2 were the starting points. After passing camera 5 or 4 an object may move back toward the entry points.

To evaluate the performance of matching it is important how unique the objects look and how much this look is blurred by a camera. For very similar objects, even without noise it is difficult to distinguish between them. Likewise, large noise can make dissimilar objects appear the same. We describe the matching difficulty by the ratio δ/σ. The numerator δ is the distance between the true features describing the objects, averaged over every pair of objects; the more the objects differ, the higher this term becomes. The denominator σ is the average variance of the Gaussian noise over the cameras.

For the evaluation we took only the hypothesis with the highest probability. The effectiveness of the method is measured by the percentage of correctly identified matches relative to the total number of matches. If two observations belong to the same track in the original setting and they appear in the same track in the recovered assignment, the pair is considered correctly matched. Pairs of observations that originally belonged to different objects but occur in the same track in the recovered assignment are called false alarms. The ground-truth assignments are taken from the setup of the experiment.

[Figure 3: two plots of the correct-match and false-alarm percentages against the ratio of average object distance δ to average noise variance σ (range 2–5), for P_θ = 0.15 (left) and P_θ = 0.07 (right).]

Figure 3: The results of simulated matching for two different settings of the algorithm.

Figure 3 shows the performance of our matching method against the difficulty of the problem (expressed by the ratio δ/σ). The left plot was obtained for the prior probability of creating a new track P_θ = 0.15, and the right for P_θ = 0.07. The method was run without using knowledge of the parameters of the Gaussian noise added while generating the observations; for matching, the camera noise models were set randomly. The plots of figure 3 confirm the expected trend that results are better when the objects differ more and the observations are less noisy. The results are comparable with the experiments from [1]. When it is easier to create new tracks (higher P_θ) the number of false alarms (superfluous matches) decreases: for P_θ = 0.15 it ranges from 40% to 5%, and for P_θ = 0.07 it stays between 45% and 10% (depending on the noise). A similar effect is observed in the pattern recognition literature, usually shown as an ROC (receiver operating characteristic) plot.
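The correct-match and false-alarm counts used in this evaluation can be computed by pair counting over the ground-truth and recovered partitions; the sketch below is our reading of the evaluation described above, not the authors' code.

```python
from itertools import combinations

def match_statistics(true_tracks, recovered_tracks):
    """Count correct matches and false alarms between two partitions of the
    same observations (tracks given as collections of observation ids):
    a pair is a correct match if it shares a track in both partitions, and
    a false alarm if it shares a recovered track but not a true one."""
    def co_track_pairs(tracks):
        return {frozenset(pair) for t in tracks for pair in combinations(t, 2)}
    true_pairs = co_track_pairs(true_tracks)
    recovered_pairs = co_track_pairs(recovered_tracks)
    return len(true_pairs & recovered_pairs), len(recovered_pairs - true_pairs)
```

Dividing the two counts by the number of ground-truth pairs and of recovered pairs, respectively, gives the percentages plotted in figure 3.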
4 Discussion

The proposed method realizes the general scheme of identifying algorithms. First, it defines a probabilistic framework to infer an object's properties given the observations associated to that object. Second, it assigns the observations to an object in a way that keeps the most likely associations. In the most related approach, by Pasula & Russell, these two steps are realized by the HMM structure describing an object and by random building of possible assignments. The practical advantages over their method are that we can dispatch the observations iteratively as they arrive (without resampling); allow an object to move in any direction and visit a camera location an unlimited number of times; and do not limit the trajectories to only those of equal length. The current subject of research is how the multiple hypothesis framework extends to learning the probabilistic models from the assigned observations. The camera noise models are of special interest, as they should follow the changing environmental conditions affecting the observation process.

Acknowledgment

This research is supported by the Technology Foundation STW (project no. ANN5312), applied science division of NWO, and the technology programme of the Dutch Ministry of Economic Affairs.

References

[1] T. Huang and S. Russell. Object identification: A Bayesian analysis with application to traffic surveillance. Artificial Intelligence, 103:1–17, 1998.
[2] H. Pasula, S. Russell, M. Ostland, and Y. Ritov. Tracking many objects with many sensors. In Proceedings of the Int. Joint Conference on Artificial Intelligence, Stockholm, 1999.
[3] A. E. Nicholson and J. M. Brady. The data association problem when monitoring robot vehicles using dynamic belief networks. In Proc. of the 10th European Conf. on Artificial Intelligence (ECAI-92), 689–693, 1992.
[4] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, 1989.
[5] Z. Ghahramani. Learning dynamic Bayesian networks. In C. L. Giles and M. Gori (eds), Adaptive Processing of Sequences and Data Structures. Berlin: Springer-Verlag.
[6] I. J. Cox. A review of statistical data association techniques for motion correspondence. International Journal of Computer Vision, 10(1):53–66, 1993.
[7] I. J. Cox and S. L. Hingorani. An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18(2):138–150, 1996.
[8] Y. Bar-Shalom and T. E. Fortmann. Tracking and Data Association. Academic Press, 1988.