Two-stream joint matching method based on contrastive learning for few-shot action recognition (2401.04150v1)

Published 8 Jan 2024 in cs.CV

Abstract: Although few-shot action recognition based on the metric-learning paradigm has achieved significant success, it fails to address the following issues: (1) inadequate modeling of action relations and underutilization of multi-modal information; (2) difficulty matching videos of different lengths and speeds, and videos whose sub-actions are misaligned. To address these issues, we propose a Two-Stream Joint Matching method based on contrastive learning (TSJM), which consists of two modules: a Multi-modal Contrastive Learning Module (MCL) and a Joint Matching Module (JMM). The MCL extensively investigates mutual-information relationships between modalities, thoroughly extracting modal information to enhance the modeling of action relationships. The JMM addresses both of the aforementioned video-matching problems simultaneously. The effectiveness of the proposed method is evaluated on two widely used few-shot action recognition datasets, SSv2 and Kinetics. Comprehensive ablation experiments further substantiate the efficacy of the proposed approach.
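The abstract does not specify the exact form of the MCL objective, but cross-modal contrastive learning that maximizes mutual information between modalities is typically instantiated as a symmetric InfoNCE loss: matching clips across two modality streams (e.g., RGB and motion) are positives, all other pairings in the batch are negatives. The sketch below is a minimal NumPy illustration under that assumption; the function name, modality pairing, and `temperature` value are illustrative, not taken from the paper.

```python
import numpy as np

def info_nce_loss(rgb_feats, flow_feats, temperature=0.1):
    """Symmetric InfoNCE loss between two modality embeddings.

    rgb_feats, flow_feats: (N, D) arrays; row i of each encodes the
    same clip in a different modality. Matching rows are positives;
    every other pairing in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    flow = flow_feats / np.linalg.norm(flow_feats, axis=1, keepdims=True)

    logits = rgb @ flow.T / temperature   # (N, N) similarity matrix
    labels = np.arange(len(rgb))          # positives lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average both directions: RGB -> flow and flow -> RGB.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls the two modality embeddings of the same clip together while pushing apart embeddings of different clips, which is one standard way to realize the inter-modal mutual-information objective the MCL describes.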
