Aria Everyday Activities Dataset (2402.13349v2)
Abstract: We present the Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each recording contains multimodal sensor data captured through the Project Aria glasses. In addition, AEA provides machine perception data including high-frequency globally aligned 3D trajectories, scene point clouds, per-frame 3D eye gaze vectors, and time-aligned speech transcriptions. In this paper, we demonstrate several exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open-source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We also provide open-source implementations and examples of how to use the dataset in Project Aria Tools: https://github.com/facebookresearch/projectaria_tools.