Tracking with Random Finite Sets

Many tracking algorithms use the Bayes recursion to propagate uncertain state estimates over time from uncertain observations. This is a well-established probabilistic framework for single-target tracking. A problem arises when we extend the Bayes framework to multiple objects and must also estimate the number of targets in the scene. This estimate, like the position of each target, is uncertain. However, the standard form of the Bayes recursion does not provide the mathematical machinery to model this uncertainty. To cope with this problem, many methods model target arrival and departure events separately from single-target state estimation.
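For reference, the single-target Bayes recursion mentioned above is usually written as a prediction step followed by an update step. The notation is an assumption, not taken from this page: x_k is the target state at time k, z_k the observation, f the motion (transition) model and g the observation likelihood.

```latex
% Prediction: propagate the previous posterior through the motion model
p(x_k \mid z_{1:k-1}) = \int f(x_k \mid x_{k-1}) \, p(x_{k-1} \mid z_{1:k-1}) \, \mathrm{d}x_{k-1}

% Update: correct the prediction with the new observation z_k
p(x_k \mid z_{1:k}) = \frac{g(z_k \mid x_k) \, p(x_k \mid z_{1:k-1})}
                           {\int g(z_k \mid x) \, p(x \mid z_{1:k-1}) \, \mathrm{d}x}
```

Both equations describe exactly one target, and nothing in them represents how many targets are present.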

Random Finite Sets (RFSs) provide a solution to this problem. A random finite set is a set of random variables (or vectors) whose cardinality is also a random variable. In a tracking algorithm, an RFS can be used to model the multi-target state: each element of the set models the state estimate of one target, while the random variable associated with the set cardinality models the uncertainty about target presence. The mathematical framework of Finite Set Statistics (FISST) then provides the tools for manipulating RFSs, such as the computation of moments and marginals. Using operations similar to those of the single-target case, FISST allows one to derive a multi-target Bayes recursion that is analogous to the single-target one. Unfortunately, the multi-target Bayes recursion is computationally intractable. Nevertheless, the first-order moment of the multi-target posterior can be propagated very efficiently; the resulting algorithm is known as the PHD filter.
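To make the definition concrete, here is a minimal sketch that draws realisations of one simple kind of RFS, a Poisson RFS, in which the cardinality is Poisson distributed and the elements are independent. The function names, the uniform spatial distribution and the 640x480 scene size are assumptions chosen for illustration, not part of the method described here.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_rfs(expected_targets, rng, width=640.0, height=480.0):
    """Draw one realisation of a Poisson RFS of 2-D target positions.

    Both the number of elements and the value of each element are random.
    """
    cardinality = sample_poisson(expected_targets, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(cardinality)]


rng = random.Random(0)
realisations = [sample_poisson_rfs(3.0, rng) for _ in range(5)]
for targets in realisations:
    # The set size changes from draw to draw, unlike a fixed-length state vector.
    print(len(targets), "targets:", [(round(x), round(y)) for x, y in targets])
```

Each draw may contain a different number of positions, including none at all, which is what lets an RFS represent uncertainty about target presence.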

The video above shows the result of the PHD filter used for face tracking. On the right side, the video shows the evolution over time of the Probability Hypothesis Density (PHD). The PHD includes information on the number of targets in the scene: the integral of the function over a region of the state space gives the expected number of targets in that region. As new data becomes available, the estimate is filtered over space and over time.
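The integral property can be illustrated with a small sketch. It assumes a one-dimensional state (say, the horizontal image coordinate) and a PHD represented as a weighted sum of Gaussians; the component values are made up for the example, and the Monte Carlo approximation shown in the video uses particles instead.

```python
import math


def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def expected_targets(phd_components, lower, upper):
    """Integrate a Gaussian-mixture PHD over the interval [lower, upper].

    phd_components is a list of (weight, mean, std) tuples. Unlike a
    probability density, the weights do not have to sum to one.
    """
    return sum(w * (gaussian_cdf(upper, m, s) - gaussian_cdf(lower, m, s))
               for w, m, s in phd_components)


# Hypothetical PHD: two confident peaks and one weak, broad peak.
phd = [(0.9, 100.0, 5.0), (0.8, 300.0, 5.0), (0.3, 500.0, 20.0)]

whole_scene = expected_targets(phd, -math.inf, math.inf)
left_half = expected_targets(phd, 0.0, 320.0)
print(f"expected targets in the whole scene: {whole_scene:.2f}")
print(f"expected targets for x in [0, 320]:  {left_half:.2f}")
```

Here the whole-scene integral is the sum of the weights (about 2.0), while the left half of the scene holds about 1.7 expected targets.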

The computational complexity of the filter is O(1) in the number of targets in the scene (yes, you read that correctly: O(1)). In practice, a Monte Carlo approximation (like the one in the video) requires a number of particles proportional to the number of targets, i.e., linear complexity. The video below shows some face tracking results where the PHD filter is used to pre-process the detections before the data association task.
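As a rough illustration of where the cost goes in a Monte Carlo approximation, the sketch below applies the standard PHD measurement update to a set of weighted particles in one dimension. It is not the implementation used for the videos: the Gaussian detection model, the detection probability, the clutter density and the particle layout are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def phd_update(particles, weights, detections,
               p_detect=0.9, meas_std=2.0, clutter_density=1e-3):
    """One measurement update of a particle approximation of the PHD.

    The cost is (number of particles) x (number of detections); no
    target-to-detection association hypotheses are enumerated.
    """
    # Missed-detection term: mass kept in case a target was not detected.
    new_weights = [(1.0 - p_detect) * w for w in weights]
    for z in detections:
        likelihoods = [p_detect * gaussian_pdf(z, x, meas_std) for x in particles]
        # Clutter competes with the targets to explain detection z.
        normaliser = clutter_density + sum(l * w for l, w in zip(likelihoods, weights))
        for i, (l, w) in enumerate(zip(likelihoods, weights)):
            new_weights[i] += l * w / normaliser
    return new_weights


# Hypothetical prior: two expected targets spread uniformly over 100 positions.
particles = [float(i) for i in range(100)]
weights = [2.0 / len(particles)] * len(particles)

updated = phd_update(particles, weights, detections=[20.0, 70.0])
print(f"expected number of targets after the update: {sum(updated):.2f}")
```

After the update, most of the mass concentrates around the two detections, and the sum of the weights is the updated estimate of the number of targets.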

Related publications

Maggio, Emilio; Cavallaro, Andrea (2011): Video tracking: theory and practice. Wiley, 2011.

Maggio, Emilio; Cavallaro, Andrea (2009): Learning scene context for multiple object tracking. In: IEEE Transactions on Image Processing, 18 (8), pp. 1873–1884, 2009.

Maggio, Emilio; Taj, Murtaza; Cavallaro, Andrea (2008): Efficient multi-target visual tracking using Random Finite Sets. In: IEEE Transactions on Circuits and Systems for Video Technology, 18 (8), pp. 1016–1027, 2008.

Maggio, E.; Piccardo, E.; Regazzoni; Cavallaro (2007): Particle PHD filter for multi-target visual tracking. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, USA, 2007.