Boundary-Aware Proposal Generation Method for Temporal Action Localization (2309.13810v1)

Published 25 Sep 2023 in cs.CV

Abstract: The goal of Temporal Action Localization (TAL) is to find the categories and temporal boundaries of actions in an untrimmed video. Most TAL methods rely heavily on action recognition models that are sensitive to action labels rather than temporal boundaries. More importantly, few works consider the background frames that are similar to action frames in pixels but dissimilar in semantics, which also leads to inaccurate temporal boundaries. To address the challenge above, we propose a Boundary-Aware Proposal Generation (BAPG) method with contrastive learning. Specifically, we define the above background frames as hard negative samples. Contrastive learning with hard negative mining is introduced to improve the discrimination of BAPG. BAPG is independent of the existing TAL network architecture, so it can be applied plug-and-play to mainstream TAL models. Extensive experimental results on THUMOS14 and ActivityNet-1.3 demonstrate that BAPG can significantly improve the performance of TAL.
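The mechanism the abstract hinges on is contrastive learning with hard negative mining, where background frames that look like action frames in pixels are treated as hard negatives. As a rough, self-contained sketch only (not the authors' implementation; the function name, tensor shapes, and hyperparameters such as `temperature` and `num_hard` are assumptions for illustration), the snippet below shows one common way to realize this idea in PyTorch:

```python
import torch
import torch.nn.functional as F


def hard_negative_contrastive_loss(anchor, positives, negatives,
                                    temperature=0.1, num_hard=16):
    """InfoNCE-style loss where the background frames most similar to the
    anchor are mined as hard negatives (illustrative sketch only).

    anchor:    (D,)   feature of an action frame
    positives: (P, D) features of frames from the same action instance
    negatives: (N, D) features of background frames
    """
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = positives @ anchor   # (P,) cosine similarities
    neg_sim = negatives @ anchor   # (N,)

    # Hard negative mining: keep only the most anchor-like background frames.
    k = min(num_hard, neg_sim.numel())
    hard_neg_sim, _ = neg_sim.topk(k)

    logits = torch.cat([pos_sim, hard_neg_sim]) / temperature
    log_prob = logits - torch.logsumexp(logits, dim=0)

    # Multi-positive InfoNCE: average log-probability over all positives.
    return -log_prob[: pos_sim.numel()].mean()


if __name__ == "__main__":
    # Toy usage with random 256-dim frame features.
    anchor = torch.randn(256)
    positives = torch.randn(4, 256)     # frames inside the same action
    negatives = torch.randn(100, 256)   # background frames, incl. near-boundary ones
    print(hard_negative_contrastive_loss(anchor, positives, negatives).item())
```

In a plug-and-play setting like the one the abstract describes, a loss of this kind would typically be added to an existing TAL model's training objective so that proposal features near boundaries are pushed away from visually similar background frames.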

References (29)
  1. MS-TCT: Multi-scale temporal ConvTransformer for action detection. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20009–20019, 2022.
  2. TriDet: Temporal action detection with relative boundary modeling. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18857–18866, 2023.
  3. Temporal action detection with structured segment networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2933–2942, 2017.
  4. ActionFormer: Localizing moments of actions with transformers. In 2022 European Conference on Computer Vision (ECCV), pages 492–510, 2022.
  5. BSN: Boundary sensitive network for temporal action proposal generation. In 2018 European Conference on Computer Vision (ECCV), pages 3–19, 2018.
  6. BMN: Boundary-matching network for temporal action proposal generation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3888–3897, 2019.
  7. Gaussian temporal awareness networks for action localization. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 344–353, 2019.
  8. Learning salient boundary feature for anchor-free temporal action localization. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3320–3329, 2021.
  9. End-to-end temporal action detection with transformer. IEEE Transactions on Image Processing, 31:5427–5441, 2022.
  10. Multi-granularity generator for temporal action proposal. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  11. TALLFormer: Temporal action localization with a long-memory transformer. In 2022 European Conference on Computer Vision (ECCV), pages 503–521, 2022.
  12. Action-aware masking network with group-based attention for temporal action localization. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6047–6056, 2023.
  13. SSD: Single shot multibox detector. In 2016 European Conference on Computer Vision (ECCV), pages 21–37, 2016.
  14. You only look once: Unified, real-time object detection. In 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
  15. YOLO9000: Better, faster, stronger. In 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7263–7271, 2017.
  16. Single shot temporal action detection. In Proceedings of the 25th ACM international conference on Multimedia, pages 988–996, 2017.
  17. Learning spatiotemporal features with 3D convolutional networks. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 4489–4497, 2015.
  18. Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  19. Quo vadis, action recognition? a new model and the kinetics dataset. In 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6299–6308, 2017.
  20. Category-specific video summarization. In 2014 European Conference on Computer Vision (ECCV), pages 540–555, 2014.
  21. Kernel change-point analysis. In 2008 Advances in Neural Information Processing Systems (NeurIPS), volume 21, 2008.
  22. Retrospective multiple change-point estimation with kernels. In 2007 IEEE/SP 14th Workshop on Statistical Signal Processing (SSPW), pages 768–772, 2007.
  23. Franklin C Crow. Summed-area tables for texture mapping. In Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 207–212, 1984.
  24. THUMOS challenge: Action recognition with a large number of classes, 2014.
  25. Graph convolutional module for temporal action localization in videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:6209–6223, 2022.
  26. Activitynet: A large-scale video benchmark for human activity understanding. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 961–970, 2015.
  27. Multi-shot temporal event localization: a benchmark. In 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 12596–12606, 2021.
  28. Revisiting training strategies and generalization performance in deep metric learning. In International Conference on Machine Learning, pages 8242–8252. PMLR, 2020.
  29. Revisiting anchor mechanisms for temporal action localization. IEEE Transactions on Image Processing, 29:8535–8548, 2020.
Authors (5)
  1. Hao Zhang (948 papers)
  2. Chunyan Feng (12 papers)
  3. Jiahui Yang (10 papers)
  4. Zheng Li (326 papers)
  5. Caili Guo (41 papers)
