
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective (2305.15699v3)

Published 25 May 2023 in cs.CV

Abstract: Understanding action recognition in egocentric videos has emerged as a vital research topic with numerous practical applications. Because egocentric data collection is limited in scale, learning robust deep learning-based action recognition models remains difficult. Transferring knowledge learned from large-scale exocentric data to egocentric data is challenging due to the differences in videos across views. Our work introduces a novel cross-view learning approach to action recognition (CVAR) that effectively transfers knowledge from the exocentric to the egocentric view. First, we introduce a novel geometric-based constraint into the self-attention mechanism of the Transformer, based on analyzing the camera positions between the two views. Then, we propose a new cross-view self-attention loss, learned on unpaired cross-view data, to enforce that the self-attention mechanism learns to transfer knowledge across views. Finally, to further improve the performance of our cross-view learning approach, we present metrics to effectively measure the correlations in videos and attention maps. Experimental results on standard egocentric action recognition benchmarks, i.e., Charades-Ego, EPIC-Kitchens-55, and EPIC-Kitchens-100, show our approach's effectiveness and state-of-the-art performance.
