Image Segmentation in Video Sequences: A Probabilistic Approach (1302.1539v1)

Published 6 Feb 2013 in cs.CV and cs.AI

Abstract: "Background subtraction" is an old technique for finding moving objects in a video sequence, for example, cars driving on a freeway. The idea is that subtracting the current image from a time-averaged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image; it fails with slow-moving objects and does not distinguish shadows from moving objects. The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. We learn a mixture-of-Gaussians classification model for each pixel using an unsupervised technique: an efficient, incremental version of EM. Unlike the standard image-averaging approach, this automatically updates the mixture component for each class according to likelihood of membership; hence slow-moving objects are handled perfectly. Our approach also identifies and eliminates shadows much more effectively than other techniques such as thresholding. Application of this method as part of the Roadwatch traffic surveillance project is expected to result in significant improvements in vehicle identification and tracking.

Authors (2)

Nir Friedman (29 papers)
Stuart Russell (98 papers)


Summary

The paper introduces a pixel-wise probabilistic model using a mixture-of-Gaussians to enhance the detection of moving objects in video sequences.
It employs an incremental Expectation-Maximization algorithm that updates the model as each frame arrives and does not store past pixel values, which keeps memory requirements low.
The approach separates shadows from vehicles and handles slow-moving objects better than traditional background subtraction in traffic surveillance footage.

Image Segmentation in Video Sequences: A Probabilistic Approach

In their paper "Image Segmentation in Video Sequences: A Probabilistic Approach," Nir Friedman and Stuart Russell propose a method for identifying moving objects in video sequences by classifying each pixel with a probabilistic model. The motivation is the limitations of traditional background subtraction, which handles slow-moving objects poorly and cannot distinguish shadows from moving objects.

The authors employ a mixture-of-Gaussians (MoG) approach in which a separate probabilistic model is learned for each pixel. The model is updated online using an efficient, incremental version of the Expectation-Maximization (EM) algorithm. Because each update is cheap, the method suits real-time applications such as traffic surveillance, the setting of the Roadwatch project.

Key Methodological Details

Pixel Modeling:
- Each pixel in the image is modeled as a mixture of three Gaussians corresponding to the road, shadow, and vehicle.
- Parameters for these models are learned using an unsupervised, incremental version of the EM algorithm.
Incremental EM Algorithm:
- This variant of EM allows for real-time updates without the need for storing historical pixel values, thus reducing memory requirements.
- The algorithm incrementally updates the sufficient statistics for the model parameters, ensuring that the system adapts to new data efficiently.
Heuristic Component Labeling:
- Heuristics are applied to label the Gaussian components correctly. Generally, the darkest component is labeled as shadow, and among the remaining components, the one with the larger variance is labeled as vehicle, while the other is labeled as road.
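
The model and update described in the bullets above can be written out as follows. This is a sketch in assumed notation: the symbols w_k, mu_k, sigma_k^2, gamma_{t,k}, N_k, M_k, and Z_k are chosen here for illustration and are not taken from the paper. It uses the standard sufficient-statistics form of online EM for a one-dimensional Gaussian mixture, without any forgetting factor.

```latex
% Per-pixel model: the intensity x is generated by one of three classes k
p(x) = \sum_{k \in \{\mathrm{road},\, \mathrm{shadow},\, \mathrm{vehicle}\}}
       w_k \, \mathcal{N}(x;\, \mu_k, \sigma_k^2)

% E-step for the new observation x_t: posterior class membership
\gamma_{t,k} = \frac{w_k \, \mathcal{N}(x_t;\, \mu_k, \sigma_k^2)}
                    {\sum_{j} w_j \, \mathcal{N}(x_t;\, \mu_j, \sigma_j^2)}

% Incremental sufficient statistics: no past values of x are stored
N_k \leftarrow N_k + \gamma_{t,k}, \qquad
M_k \leftarrow M_k + \gamma_{t,k}\, x_t, \qquad
Z_k \leftarrow Z_k + \gamma_{t,k}\, x_t^2

% M-step: parameters recovered from the statistics
w_k = \frac{N_k}{\sum_j N_j}, \qquad
\mu_k = \frac{M_k}{N_k}, \qquad
\sigma_k^2 = \frac{Z_k}{N_k} - \mu_k^2
```

Only three numbers per component are kept for each pixel, which is why the update needs no frame history.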

Empirical Evaluation

Friedman and Russell demonstrate the efficacy of their method using traffic surveillance images. They compare their approach to traditional background subtraction techniques, highlighting substantial improvements in identifying slow-moving vehicles and effectively eliminating shadows.

The described method encompasses the following steps:

- Initialize mixture models for each pixel with a weak prior.
- Update models for each new frame using incremental EM.
- Heuristically label mixture components and classify pixels accordingly.
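
The three steps above can be sketched for grayscale frames as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PixelMixture`, `init_models`, and `process_frame`, the initial means and variance, the pseudo-count used as a weak prior, and the variance floor `MIN_VAR` are all hypothetical choices made for the example.

```python
import math

MIN_VAR = 1.0  # assumed variance floor that keeps densities finite


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class PixelMixture:
    """Three-component mixture over one pixel's intensity (illustrative)."""

    def __init__(self, init_means=(40.0, 120.0, 200.0), init_var=900.0, prior_count=1.0):
        # Step 1: weak prior, stored as pseudo-observations in the statistics.
        self.n = [prior_count for _ in init_means]        # soft counts
        self.m = [prior_count * mu for mu in init_means]  # weighted sum of x
        self.z = [prior_count * (init_var + mu * mu) for mu in init_means]  # sum of x^2

    def params(self):
        """Return (weight, mean, variance) per component from the statistics."""
        total = sum(self.n)
        result = []
        for n, m, z in zip(self.n, self.m, self.z):
            mean = m / n
            var = max(z / n - mean * mean, MIN_VAR)
            result.append((n / total, mean, var))
        return result

    def posterior(self, x):
        """Posterior probability of each component given intensity x."""
        scores = [w * gaussian_pdf(x, mu, var) for w, mu, var in self.params()]
        total = sum(scores)
        if total == 0.0:  # every density underflowed; fall back to uniform
            return [1.0 / len(scores)] * len(scores)
        return [s / total for s in scores]

    def update(self, x):
        """Step 2: fold one observation into the running statistics."""
        for k, g in enumerate(self.posterior(x)):
            self.n[k] += g
            self.m[k] += g * x
            self.z[k] += g * x * x

    def labels(self):
        """Step 3a: darkest mean is shadow; larger remaining variance is vehicle."""
        p = self.params()
        shadow = min(range(3), key=lambda k: p[k][1])
        rest = [k for k in range(3) if k != shadow]
        vehicle = max(rest, key=lambda k: p[k][2])
        road = rest[0] if rest[1] == vehicle else rest[1]
        return {shadow: "shadow", vehicle: "vehicle", road: "road"}

    def classify(self, x):
        """Step 3b: assign x to the label of its most probable component."""
        post = self.posterior(x)
        best = max(range(3), key=lambda k: post[k])
        return self.labels()[best]


def init_models(height, width):
    """One independent mixture per pixel."""
    return [[PixelMixture() for _ in range(width)] for _ in range(height)]


def process_frame(models, frame):
    """Update every pixel model with the new frame and return a label map."""
    label_map = []
    for model_row, pixel_row in zip(models, frame):
        row = []
        for model, x in zip(model_row, pixel_row):
            model.update(x)
            row.append(model.classify(x))
        label_map.append(row)
    return label_map
```

A caller would create the grid once with `init_models(height, width)` and then pass each incoming frame, as a list of rows of intensities, to `process_frame`. The pure-Python loops are chosen for readability; a practical version would vectorize the per-pixel arithmetic.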

Results and Practical Implications

The reported results indicate that the MoG-based approach detects vehicles more cleanly and accurately than standard background subtraction. In particular, it remains reliable in the presence of shadows and slow-moving vehicles, both of which typically corrupt the background model in traditional methods.
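
The corruption of the background model mentioned above can be reproduced with a small baseline. The sketch below assumes the time-averaged background is an exponential running average with rate `alpha` and that foreground is detected by a fixed `threshold`; both parameters, and the function names, are illustrative choices rather than details from the paper.

```python
def detect_and_update(background, frame, alpha=0.1, threshold=30.0):
    """Flag pixels that differ from the background, then blend the frame in."""
    foreground = [abs(x - b) > threshold for x, b in zip(frame, background)]
    new_background = [(1.0 - alpha) * b + alpha * x for x, b in zip(frame, background)]
    return foreground, new_background


def run_baseline(frames, initial_background, alpha=0.1, threshold=30.0):
    """Run background subtraction over a sequence of frames."""
    background = list(initial_background)
    masks = []
    for frame in frames:
        mask, background = detect_and_update(background, frame, alpha, threshold)
        masks.append(mask)
    return masks, background


# A single road pixel (intensity 100) covered by a slow vehicle (intensity 200)
# for 20 consecutive frames.
frames = [[200.0] for _ in range(20)]
masks, background = run_baseline(frames, [100.0])
detected = [mask[0] for mask in masks]
```

With these settings the vehicle is flagged for the first 12 frames and then missed, because the averaged background has drifted to within the threshold of the vehicle's intensity. Once the vehicle leaves, the same drift would cause bare road to be flagged as foreground until the average recovers.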

Theoretical Implications and Future Directions

The utilization of pixel-wise MoG models represents a natural probabilistic generalization of classical deterministic methods. This innovation potentially opens avenues for more sophisticated models integrating higher-level spatial and temporal context, possibly through Markov networks or dynamic belief networks.

Future research could explore:

- Enhanced initialization strategies for model parameters to improve performance under extreme lighting conditions.
- Expansion of the approach to incorporate RGB values rather than intensity, reducing the chances of vehicles being undetected due to similar intensity to the background.
- Development of spatial and temporal contiguity models to ensure smoother and more consistent classification over sequences of frames.
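
To illustrate why RGB values could help, the sketch below compares a gray road pixel with a colored vehicle pixel of identical intensity. It is only an illustration of the suggested direction: the diagonal-covariance Gaussian, the definition of intensity as the channel mean, and all numeric values are assumptions made for the example.

```python
import math


def intensity(rgb):
    """Grayscale intensity, assumed here to be the mean of the three channels."""
    return sum(rgb) / 3.0


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with independent channels (diagonal covariance)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


road_mean = (100.0, 100.0, 100.0)   # hypothetical learned road color
road_var = (100.0, 100.0, 100.0)    # hypothetical per-channel variances

road_pixel = (100.0, 100.0, 100.0)
vehicle_pixel = (160.0, 100.0, 40.0)  # same intensity, different color

same_intensity = intensity(road_pixel) == intensity(vehicle_pixel)
log_ratio = (log_gaussian_diag(road_pixel, road_mean, road_var)
             - log_gaussian_diag(vehicle_pixel, road_mean, road_var))
```

An intensity-only model sees the two pixels as identical, while the road component of the RGB model assigns the vehicle pixel a log-density that is lower by 36, so the pixel would no longer be explained as road.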

In conclusion, Friedman and Russell contribute a probabilistic pixel classification model that improves moving-object detection in video sequences. Applied to tasks such as traffic surveillance, the approach is expected to improve real-time tracking and analysis.
