ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization (2311.15916v1)

Published 27 Nov 2023 in cs.CV

Abstract: This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set. Self-training aims to provide supplementary supervision for the training process by generating pseudo-labels (action proposals) from a base model. However, most current methods generate action proposals by applying manually designed thresholds to action classification probabilities and treating adjacent snippets as independent entities. As a result, these methods struggle to generate complete action proposals, exhibit sensitivity to fluctuations in action classification scores, and generate redundant and overlapping action proposals. This paper proposes a novel framework termed ADM-Loc, which stands for Actionness Distribution Modeling for point-supervised action Localization. ADM-Loc generates action proposals by fitting a composite distribution, comprising both Gaussian and uniform distributions, to the action classification signals. This fitting process is tailored to each action class present in the video and is applied separately for each action instance, ensuring the distinctiveness of their distributions. ADM-Loc significantly enhances the alignment between the generated action proposals and ground-truth action instances and offers high-quality pseudo-labels for self-training. Moreover, to model action boundary snippets, it enforces consistency in action classification scores during training by employing Gaussian kernels, supervised with the proposed loss functions. ADM-Loc outperforms the state-of-the-art point-supervised methods on THUMOS14 and ActivityNet-v1.2 datasets.
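As a rough illustration of the proposal-generation idea described in the abstract, the sketch below fits a flat (uniform) plateau with Gaussian tails to a 1-D per-snippet score signal around an annotated point, then reads a proposal off the fitted shape. This is not the authors' implementation: the abstract does not give the parameterization or the optimizer, so the profile shape, the mean-squared-error objective, the grid search, and all names (`composite_profile`, `fit_composite`, `proposal_from_fit`, `tail`) are assumptions made for illustration only.

```python
import numpy as np


def composite_profile(t, center, half_width, sigma):
    """Hypothetical composite shape: a uniform plateau of height 1 spanning
    [center - half_width, center + half_width], with Gaussian tails outside."""
    dist = np.maximum(np.abs(t - center) - half_width, 0.0)
    return np.exp(-0.5 * (dist / sigma) ** 2)


def fit_composite(scores, point, half_widths, sigmas):
    """Grid-search the plateau half-width and tail sigma that minimise the
    mean squared error to `scores`, keeping the center at the annotated point.
    (Assumed objective and search strategy, not taken from the paper.)"""
    t = np.arange(len(scores), dtype=float)
    best = None
    for hw in half_widths:
        for s in sigmas:
            err = np.mean((composite_profile(t, point, hw, s) - scores) ** 2)
            if best is None or err < best[0]:
                best = (err, hw, s)
    return best[1], best[2]


def proposal_from_fit(point, half_width, sigma, n_snippets, tail=1.0):
    """Turn fitted parameters into a (start, end) snippet interval: the
    plateau plus `tail` standard deviations on each side (assumed rule)."""
    start = max(0, int(round(point - half_width - tail * sigma)))
    end = min(n_snippets - 1, int(round(point + half_width + tail * sigma)))
    return start, end


# Synthetic single-class score signal for one action instance.
n_snippets, point = 50, 20
t = np.arange(n_snippets, dtype=float)
scores = composite_profile(t, point, half_width=5, sigma=2.0)

hw, sigma = fit_composite(scores, point, np.arange(0, 11), [0.5, 1.0, 2.0, 4.0])
start, end = proposal_from_fit(point, hw, sigma, n_snippets)
```

Fitting one shape per annotated point yields a single contiguous proposal per action instance, in contrast to thresholding each snippet's score independently, which is the behaviour the abstract criticizes in prior methods.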
