Mask-Guided Attention Network for Occluded Pedestrian Detection (1910.06160v2)

Published 14 Oct 2019 in cs.CV

Abstract: Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes on visible pedestrian regions while suppressing the occluded ones by modulating full body features. Second, we empirically demonstrate that coarse-level segmentation annotations provide reasonable approximation to their dense pixel-wise counterparts. Experiments are performed on CityPersons and Caltech datasets. Our approach sets a new state-of-the-art on both datasets. Our approach obtains an absolute gain of 9.5% in log-average miss rate, compared to the best reported results on the heavily occluded (HO) pedestrian set of CityPersons test set. Further, on the HO pedestrian set of Caltech dataset, our method achieves an absolute gain of 5.0% in log-average miss rate, compared to the best reported results. Code and models are available at: https://github.com/Leotju/MGAN.

Citations (190)

Summary

  • The paper introduces a Mask-Guided Attention module that highlights visible pedestrian regions and suppresses occluded areas.
  • It integrates with Faster R-CNN and reports an absolute gain of 9.5% in log-average miss rate on the heavily occluded (HO) set of the CityPersons test set, and 5.0% on the HO set of Caltech.
  • The module is lightweight and fits into existing detection pipelines, which makes it relevant to applications such as autonomous driving, video surveillance, and smart urban planning.

Detailed Analysis of Mask-Guided Attention Network for Occluded Pedestrian Detection

The paper presents an approach to occluded pedestrian detection, a significant challenge in computer vision. Occlusions are either intra-class (pedestrians hiding other pedestrians, typical of crowded scenes) or inter-class (other objects such as cars and bicycles hiding pedestrians), and both leave the pedestrian only partially visible. The authors introduce the Mask-Guided Attention Network (MGAN), which integrates with popular pedestrian detection pipelines to improve performance under these conditions.

Contributions and Methodology

The primary contribution of this research is the development of a Mask-Guided Attention (MGA) module that enhances pedestrian detection by focusing on visible regions of the pedestrian while suppressing occluded areas. This is achieved by modulating features extracted from full-body detections. The MGA module generates spatial attention maps using visible-region information (obtained from coarse-level segmentation annotations) to refine the full-body feature representation.
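As a rough illustration of what a coarse-level annotation could look like, the sketch below rasterizes a visible-region box onto a small grid aligned with the full-body box. It assumes the coarse annotation is an axis-aligned visible box; the function name coarse_visible_mask, the (x1, y1, x2, y2) box format, and the 7x7 grid size are illustrative choices, not details taken from the paper.

```python
def coarse_visible_mask(full_box, visible_box, out_size=7):
    """Rasterize a visible-region box onto an out_size x out_size grid.

    Boxes are (x1, y1, x2, y2) in image coordinates (an assumed format).
    The grid is aligned with the full-body box; a cell is marked 1.0 when
    its center falls inside the visible box, and 0.0 otherwise.
    """
    fx1, fy1, fx2, fy2 = full_box
    vx1, vy1, vx2, vy2 = visible_box
    cell_w = (fx2 - fx1) / out_size
    cell_h = (fy2 - fy1) / out_size

    mask = []
    for row in range(out_size):
        cy = fy1 + (row + 0.5) * cell_h  # vertical center of this grid row
        mask_row = []
        for col in range(out_size):
            cx = fx1 + (col + 0.5) * cell_w  # horizontal center of this cell
            inside = vx1 <= cx <= vx2 and vy1 <= cy <= vy2
            mask_row.append(1.0 if inside else 0.0)
        mask.append(mask_row)
    return mask


# Hypothetical pedestrian whose lower body is hidden behind a car.
mask = coarse_visible_mask(full_box=(0, 0, 70, 140), visible_box=(0, 0, 70, 60))
for row in mask:
    print("".join("#" if v else "." for v in row))
```

A mask like this is much cheaper to obtain than a dense pixel-wise segmentation, which is the trade-off the paper argues is acceptable.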

The MGA branch operates by applying a convolutional network to process Region of Interest (RoI) features, outputting a pixel-wise probability map indicative of visible pedestrian regions. These maps are used to modulate the RoI features, essentially re-weighting them to emphasize visible regions and suppress occlusion, thus enhancing discriminative feature learning for occluded pedestrians.
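The re-weighting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the attention branch emits one logit per spatial location, that a sigmoid turns logits into probabilities, and that the same spatial map multiplies every feature channel. The names sigmoid and modulate_roi_features are hypothetical, and the convolutional layers that would produce the logits are omitted.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def modulate_roi_features(features, logits):
    """Re-weight RoI features with a spatial attention map.

    features: nested list shaped [channels][height][width]
    logits:   nested list shaped [height][width], assumed to come from the
              attention branch (the conv layers are not modeled here)
    Returns (modulated_features, attention_map).
    """
    attention = [[sigmoid(v) for v in row] for row in logits]
    modulated = [
        [
            [f * a for f, a in zip(feat_row, att_row)]
            for feat_row, att_row in zip(channel, attention)
        ]
        for channel in features
    ]
    return modulated, attention


# Two channels over a 2x2 RoI grid; the top-left cell is confidently visible,
# the top-right cell is confidently occluded, the bottom row is uncertain.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
logits = [[10.0, -10.0], [0.0, 0.0]]
modulated, attention = modulate_roi_features(features, logits)
print(attention)
print(modulated)
```

Features at locations the map considers visible pass through almost unchanged, while features at occluded locations are scaled toward zero before they reach the classification head.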

The architecture comprises two branches: a standard pedestrian detection (SPD) branch built on the Faster R-CNN framework, and the MGA branch described above. The MGA branch is a lightweight addition that fits into existing detection architectures.

Experimental Evaluation

The authors conducted extensive experiments on two prominent datasets, CityPersons and Caltech, and report a new state of the art on both, with the largest gains under heavy occlusion. On the heavily occluded (HO) set of the CityPersons test set, MGAN achieves an absolute gain of 9.5% in log-average miss rate over the best previously reported results; on the HO set of Caltech, the absolute gain is 5.0%. These findings point to the model's robustness in scenes with significant occlusion.
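For readers unfamiliar with the metric, the sketch below shows the usual way log-average miss rate is computed in pedestrian detection benchmarks: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are combined with a geometric mean. Lower is better, so a "gain" means a lower value. The function name and the toy curve are illustrative assumptions; the official evaluation toolkits may differ in details such as how they handle curves that do not reach the lowest FPPI.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rate sampled at log-spaced FPPI reference points.

    fppi and miss_rate describe a detector's curve, sorted by increasing FPPI.
    For each reference point we take the miss rate at the largest FPPI that
    does not exceed it; if the curve starts above the reference point, the
    miss rate is taken to be 1.0 (an assumption of this sketch).
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
            else:
                break
        samples.append(max(mr, 1e-10))  # avoid log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Toy curve (made-up numbers): 80% of pedestrians are missed at very low
# FPPI, dropping to 20% once FPPI reaches 0.05.
value = log_average_miss_rate(fppi=[0.001, 0.05], miss_rate=[0.8, 0.2])
print(f"log-average miss rate: {100 * value:.1f}%")
```

An absolute gain of 9.5% in this metric means the reported value dropped by 9.5 percentage points, not by 9.5% of the previous value.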

The ablation studies further illustrate the contribution of each component of the MGAN, showing the effectiveness of the mask-guided attention mechanism and the occlusion-sensitive loss. The results indicate the MGA branch's ability to generalize across varying degrees of pedestrian visibility, offering a compelling improvement over baseline models that focus solely on full-body annotations.

Theoretical and Practical Implications

The paper offers notable theoretical advancements in pedestrian detection by devising an attention mechanism closely tied to the varying degrees of occlusion, a largely underexplored area. The integration of coarse-level segmentation annotations is an intelligent simplification that provides practical feasibility without sacrificing detection performance, aligning with the increasing demand for rapid annotation strategies in expansive datasets.

Practically, the proposed MGAN model is particularly advantageous for applications requiring precise pedestrian detection in environments such as autonomous driving, video surveillance, and smart urban planning, where pedestrian safety is paramount. The developed network addresses a critical gap in current pedestrian detection systems by delivering superior performance in scenarios with clutter and occlusions.

Future Perspectives

The advancement initiated by this work opens several avenues for further exploration. Future research can explore the adaptation of the MGAN framework to other object detection tasks involving occlusions, as well as integration with more advanced backbone networks for enhanced feature extraction. Additionally, extending the model's utility in real-time systems and edge computing scenarios could be of significant interest, making pedestrian detection more ubiquitous in smart systems.

In summary, the Mask-Guided Attention Network is a meaningful contribution to pedestrian detection, improving detection accuracy in occlusion-dominated scenes. Its straightforward integration with existing frameworks and the reported performance gains make MGAN a useful reference point for future work on occlusion handling.
