GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition (2401.13414v1)

Published 24 Jan 2024 in cs.CV

Abstract: Current datasets for action recognition tasks face limitations stemming from traditional collection and generation methods, including a constrained range of action classes, the absence of multi-viewpoint recordings, limited diversity, poor video quality, and labor-intensive manual collection. To address these challenges, we introduce GTAutoAct, an innovative dataset generation framework leveraging game engine technology to facilitate advancements in action recognition. GTAutoAct excels in automatically creating large-scale, well-annotated datasets with extensive action classes and superior video quality. Our framework's distinctive contributions encompass: (1) it innovatively transforms readily available coordinate-based 3D human motion into a rotation-oriented representation better suited to multiple viewpoints; (2) it employs dynamic segmentation and interpolation of rotation sequences to create smooth and realistic action animations; (3) it offers extensively customizable animation scenes; (4) it implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera with auto-trimming and labeling functionalities. Experimental results underscore the framework's robustness and highlight its potential to significantly improve action recognition model training.
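The abstract's second contribution, interpolating rotation sequences to produce smooth in-between animation frames, is commonly done with spherical linear interpolation (slerp) of unit quaternions. The sketch below is an illustrative assumption, not the paper's actual implementation; the function and variable names are hypothetical.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # Take the shorter arc: flip one quaternion if the dot product is negative.
    if dot < 0.0:
        q1 = [-c for c in q1]
        dot = -dot
    # For nearly parallel quaternions, fall back to normalized linear interpolation.
    if dot > 0.9995:
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in out))
        return [c / norm for c in out]
    theta = math.acos(dot)                      # angle between the quaternions
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s        # weight on the start rotation
    w1 = math.sin(t * theta) / s                # weight on the end rotation
    return [w0 * a + w1 * b for a, b in zip(q0, q1)]

# Interpolate 5 frames from the identity to a 90-degree rotation about the z-axis.
q_start = [1.0, 0.0, 0.0, 0.0]
q_end = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]
frames = [slerp(q_start, q_end, i / 4) for i in range(5)]
```

Unlike naive per-component linear interpolation, slerp keeps each intermediate frame a valid unit rotation at constant angular velocity, which is what makes the resulting joint animation look smooth.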

