- The paper introduces a novel Boundary-Matching mechanism that densely evaluates confidence scores for all candidate temporal proposals.
- It presents a unified framework by jointly training boundary prediction and proposal scoring to enhance detection performance.
- Empirical results on THUMOS-14 and ActivityNet-1.3 demonstrate BMN's superior quality and efficiency in temporal action proposal generation.
Boundary-Matching Network for Temporal Action Proposal Generation
The paper "BMN: Boundary-Matching Network for Temporal Action Proposal Generation" introduces a method for generating temporal action proposals in videos using the Boundary-Matching Network (BMN). Unlike prior methods that predict boundaries and score proposals in separate stages, BMN produces precise boundary predictions and reliable confidence scores simultaneously, improving both effectiveness and efficiency.
Key Contributions
- Boundary-Matching Mechanism: The authors propose the Boundary-Matching (BM) mechanism, which addresses the inefficiency of previous bottom-up methods that evaluate proposals one by one. Each proposal is represented as a matching pair of its start and end boundaries, and all pairs are arranged into a two-dimensional BM confidence map, so confidence scores for densely distributed proposals can be computed in a single pass.
- Unified Framework: BMN operates within a fully integrated framework where two branches responsible for boundary prediction and proposal evaluation are trained jointly. This contrasts with previous multi-stage approaches, ensuring that temporal boundaries and confidence scores are generated in parallel.
- Empirical Validation: The paper demonstrates BMN's capabilities through experiments on the THUMOS-14 and ActivityNet-1.3 datasets, showing substantial improvements in proposal quality and temporal action detection performance while maintaining computational efficiency.
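The idea of the 2D BM confidence map can be made concrete with a minimal sketch: each map entry indexes a proposal by its start position and duration, so enumerating valid entries enumerates all candidate proposals at once. The names below (`T_snippets`, `D_max`, `conf_map`) and the random scores are illustrative, not from the paper.

```python
import numpy as np

# Illustrative sizes: a video split into 8 temporal snippets, with
# proposals of duration 1..4 snippets considered.
T_snippets = 8
D_max = 4

# conf_map[d, i] = confidence that the proposal starting at snippet i
# with duration d + 1 contains a complete action instance.
rng = np.random.default_rng(0)
conf_map = rng.random((D_max, T_snippets))

# Enumerate every valid (start, end, confidence) proposal from the map.
proposals = [
    (i, i + d + 1, conf_map[d, i])
    for d in range(D_max)
    for i in range(T_snippets)
    if i + d + 1 <= T_snippets  # proposal must end inside the video
]
print(len(proposals))  # 8 + 7 + 6 + 5 = 26 valid proposals
```

Entries in the upper-right region of the map (proposals running past the end of the video) are invalid, which is why the map is evaluated densely but filtered to valid start/duration pairs.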
Methodology
BMN uses a unique approach to encode video features and evaluate proposals:
- Feature Encoding: Visual features are extracted using a two-stream network, which processes both spatial and temporal information. The extracted features inform the generation of proposal candidates.
- BM Layer and Confidence Map: The BM layer efficiently generates proposal features via dot products with pre-defined sampling masks, producing a BM feature map. This map is processed through convolutional layers to yield a comprehensive BM confidence map.
- Scoring: Each proposal's final score fuses its boundary probabilities (the start and end probabilities from boundary prediction) with its confidence value from the BM confidence map, so that both local boundary evidence and global proposal-level context contribute to ranking.
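The BM layer described above can be sketched as a single tensor contraction: the snippet feature sequence is multiplied by a pre-computed sampling mask, yielding features for every proposal at once. This is a simplified sketch, assuming a uniform-weight mask of shape (T, D, T) rather than the paper's full interpolated sampling; `C`, `T`, `D`, `S`, and `W` are illustrative names.

```python
import numpy as np

C, T, D = 4, 6, 3  # feature channels, temporal snippets, max duration
rng = np.random.default_rng(1)
S = rng.random((C, T))  # snippet-level temporal feature sequence

# W[t, d, i]: weight with which snippet t contributes to the feature of
# proposal (start=i, duration=d+1). Here each proposal simply averages
# the snippets it covers; the paper uses denser interpolated sampling.
W = np.zeros((T, D, T))
for d in range(D):
    for i in range(T - d):
        W[i:i + d + 1, d, i] = 1.0 / (d + 1)

# One contraction realizes the dense dot product for all proposals,
# producing a BM feature map of shape (C, D, T).
bm_feature_map = np.einsum('ct,tdi->cdi', S, W)
print(bm_feature_map.shape)  # (4, 3, 6)
```

Because the mask is fixed, this step is a plain matrix multiplication at inference time, which is what makes dense proposal evaluation efficient; subsequent convolutions over the (D, T) axes would then produce the BM confidence map.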
Experimental Results
- ActivityNet-1.3: The BMN achieves an AR@100 of 75.01% and an AUC of 67.10%, outperforming previous methods by noticeable margins.
- THUMOS-14: BMN surpasses existing models in average recall (AR) across different numbers of retrieved proposals, highlighting its robustness and generalization capability.
- Temporal Action Detection: When integrated into action detection pipelines, BMN contributes to higher mAP scores on both datasets, confirming the practical utility of the generated proposals.
Implications and Future Directions
BMN's architecture demonstrates significant advances in temporal action proposal generation by efficiently leveraging deep learning constructs such as convolutional operations applied to temporal features. This research broadens the potential for applications in video analysis tasks such as smart surveillance and content recommendation systems.
For future research, BMN opens avenues for richer context-based proposal evaluation and for adaptive boundary mechanisms that handle more diverse video scenarios. Integrating BMN with emerging neural architectures may yield further gains in both speed and precision.
In conclusion, the BMN presents a promising path forward in the ongoing challenge of temporal action localization, setting a benchmark for future studies in the domain.