A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos (1811.08496v2)
Abstract: Existing approaches for spatio-temporal action detection in videos struggle when actions occupy a small spatial region or vary widely in temporal duration. In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos. We propose a two-stage approach. The first stage generates dense spatio-temporal proposals by applying hierarchical clustering and temporal jittering to frame-wise object detections. The second stage is a Temporal Refinement I3D (TRI-3D) network that performs action classification and temporal refinement on the generated proposals. Generating proposals from object detections helps detect actions that occur in a small spatial region of a video frame, while temporal jittering and refinement help detect actions of variable length. Experimental results on the DIVA spatio-temporal action detection dataset demonstrate the effectiveness of our system. For comparison, we also evaluate our system on the THUMOS14 temporal action detection dataset.
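The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the distance function (box-center distance plus a weighted frame gap), the use of single-linkage agglomerative clustering, and the jitter scales and shifts are all hypothetical choices made for illustration. Detections are grouped into clusters, each cluster becomes a spatio-temporal proposal (union box plus first and last frame), and each proposal is expanded into several temporally jittered windows.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


@dataclass
class Detection:
    """A single frame-wise object detection (hypothetical format)."""
    frame: int
    x1: float
    y1: float
    x2: float
    y2: float


def _distance(a: Detection, b: Detection, time_weight: float) -> float:
    """Hypothetical metric: box-center distance plus a weighted frame gap."""
    ax, ay = (a.x1 + a.x2) / 2, (a.y1 + a.y2) / 2
    bx, by = (b.x1 + b.x2) / 2, (b.y1 + b.y2) / 2
    return math.hypot(ax - bx, ay - by) + time_weight * abs(a.frame - b.frame)


def cluster_detections(dets: List[Detection], threshold: float,
                       time_weight: float = 1.0) -> List[List[int]]:
    """Single-linkage agglomerative clustering cut at `threshold`.

    Cutting a single-linkage dendrogram at a threshold yields the connected
    components of the graph that links every pair within `threshold`, so a
    union-find over all pairs gives the same clusters.
    """
    parent = list(range(len(dets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(dets)), 2):
        if _distance(dets[i], dets[j], time_weight) <= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(dets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_to_proposal(dets: List[Detection], idx: Sequence[int]
                        ) -> Tuple[int, int, Tuple[float, float, float, float]]:
    """Turn a cluster into (start_frame, end_frame, union_box)."""
    members = [dets[i] for i in idx]
    start = min(d.frame for d in members)
    end = max(d.frame for d in members)
    box = (min(d.x1 for d in members), min(d.y1 for d in members),
           max(d.x2 for d in members), max(d.y2 for d in members))
    return start, end, box


def temporal_jitter(start: int, end: int, num_frames: int,
                    scales: Sequence[float] = (0.5, 1.0, 2.0),
                    shifts: Sequence[float] = (-0.5, 0.0, 0.5)
                    ) -> List[Tuple[int, int]]:
    """Rescale the proposal's span around its center and shift the center by a
    fraction of the span; windows are clipped to [0, num_frames - 1]."""
    span = end - start
    center = (start + end) / 2
    windows = set()
    for s in scales:
        half = s * span / 2
        for sh in shifts:
            c = center + sh * span
            a = max(0, math.floor(c - half + 0.5))            # round half up
            b = min(num_frames - 1, math.floor(c + half + 0.5))
            if a < b:
                windows.add((a, b))
    return sorted(windows)
```

For example, three overlapping detections in frames 10 to 12 and one distant detection in frame 50 form two clusters; the first yields a proposal spanning frames 10 to 12, which `temporal_jitter` expands into nine candidate windows for a second-stage classifier to score and refine. Single linkage is used here only because it reduces to a simple union-find; other linkage criteria are equally plausible.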
- Joshua Gleason
- Rajeev Ranjan
- Steven Schwarcz
- Carlos D. Castillo
- Jun-Chen Cheng
- Rama Chellappa