Aria Everyday Activities Dataset (2402.13349v2)

Published 20 Feb 2024 in cs.CV, cs.AI, and cs.HC

Abstract: We present the Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each recording contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data including high frequency globally aligned 3D trajectories, scene point cloud, per-frame 3D eye gaze vector and time aligned speech transcription. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We are also providing open-source implementations and examples of how to use the dataset in Project Aria Tools https://github.com/facebookresearch/projectaria_tools.


Summary

  • The paper presents a multimodal open dataset for egocentric AI research, comprising 143 sequences of daily activities recorded across five geographically diverse indoor locations.
  • Beyond raw sensor streams such as high-resolution RGB video, it provides machine perception outputs: globally aligned 3D trajectories, semi-dense point clouds, and calibrated per-frame eye gaze.
  • The dataset supports research in neural scene reconstruction and prompted segmentation, and all recordings are anonymized to protect personally identifiable information.

Comprehensive Overview of the Aria Everyday Activities (AEA) Dataset

Introduction to AEA Dataset

Rapid advances in augmented reality (AR) and AI are bringing AR devices and personal wearable AI into daily life. The Aria Everyday Activities (AEA) Dataset is a pivotal resource for researchers who want to work with the unique multimodal data streams these wearable devices can offer. The dataset provides egocentric recordings from Project Aria glasses, covering a wide variety of daily activities captured in five distinct indoor locations. Unlike most previous datasets, AEA spans a broad set of sensor modalities, including high-resolution RGB and monochrome video, eye tracking, spatial audio, and more, creating a robust foundation for egocentric AI research.

Dataset Overview

AEA comprises 143 sequences of daily activities recorded across five indoor environments. Each sequence pairs raw multimodal sensor data with machine perception outputs: globally aligned 3D trajectories, semi-dense point clouds, per-frame eye gaze vectors, and time-aligned speech transcriptions, filling a gap left by traditional egocentric datasets, which rarely combine 3D spatial context with multimodal streams. AEA is also distinctive in offering 4D (spatial plus temporal) longitudinal data, letting researchers study the temporal dynamics and spatial structure of everyday activities from a first-person perspective.
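
To make this concrete, the machine perception outputs ship as standalone files that can be read with the open-source projectaria_tools Python package. The sketch below is a minimal example assuming the package's mps reader functions and a typical AEA file layout; the exact file names and signatures should be verified against the installed release.

```python
# Minimal sketch: loading AEA machine perception outputs with
# projectaria_tools. Function names follow the open-source package;
# the file paths are placeholders for a downloaded AEA sequence.
from projectaria_tools.core import mps

# Globally aligned 6DoF device trajectory (one pose per timestamp).
trajectory = mps.read_closed_loop_trajectory("mps/slam/closed_loop_trajectory.csv")

# Semi-dense point cloud of the recording location.
points = mps.read_global_point_cloud("mps/slam/semidense_points.csv.gz")

# Per-frame 3D eye gaze estimates.
gaze = mps.read_eyegaze("mps/eye_gaze/general_eye_gaze.csv")

print(f"{len(trajectory)} poses, {len(points)} points, {len(gaze)} gaze records")
```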

Technological and Methodological Contributions

Key innovations brought forward by AEA include:

  • Updated Machine Perception Data: Generated with the latest Machine Perception Services, the dataset provides refined pose estimates, semi-dense point clouds, and calibrated eye gaze data, giving AI models richer spatial and attentional context.
  • Data Collection and Anonymization: Following Meta's Responsible Innovation Principles, all personally identifiable information in the recordings is anonymized, setting a precedent for ethical data handling in research.
  • Dataset Tools: The open-source Project Aria Tools have been updated alongside the AEA release to handle multimodal sensor data and machine perception outputs, including utilities for visualizing synchronized activities; a brief usage sketch follows this list.
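
As an illustration of the tooling, the sketch below reads an RGB frame from a recording through the projectaria_tools data provider. The calls follow the package's published quickstart (create_vrs_data_provider, get_stream_id_from_label, get_image_data_by_index), but treat them as assumptions to check against the installed version; recording.vrs is a placeholder path.

```python
# Minimal sketch: reading an RGB frame from an AEA recording with
# projectaria_tools. API names follow the package quickstart; verify
# against the installed release.
from projectaria_tools.core import data_provider

# Each AEA sequence is distributed as a VRS file (placeholder path here).
provider = data_provider.create_vrs_data_provider("recording.vrs")

# Look up the RGB camera stream by its label.
rgb_stream_id = provider.get_stream_id_from_label("camera-rgb")

# Fetch the first frame and its record (which carries the timestamp).
image_data, image_record = provider.get_image_data_by_index(rgb_stream_id, 0)
frame = image_data.to_numpy_array()
print(frame.shape, image_record.capture_timestamp_ns)
```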

Exemplar Research Applications Enabled by AEA

The richness of the AEA dataset supports a wide range of AI research. The paper highlights two directions:

  • 3D Neural Scene Reconstruction: The paper demonstrates reconstruction of high-quality 3D scenes from egocentric data, using the dataset's closed-loop trajectories and point clouds as inputs toward immersive AR/VR experiences.
  • Prompted Segmentation: By combining eye gaze and speech prompts with foundation models for object segmentation, AEA enables contextual object recognition and opens avenues for research in interactive AI systems; a sketch of the idea follows this list.
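
To make the prompted-segmentation idea concrete, the sketch below feeds a projected 2D gaze point to the Segment Anything (SAM) predictor as a point prompt. This is an illustrative pipeline, not the authors' exact implementation: the SAM calls follow the segment-anything package, while the gaze-to-pixel projection (which would use the per-frame gaze vector and camera calibration) is abstracted into placeholder coordinates.

```python
# Illustrative gaze-prompted segmentation with Segment Anything (SAM).
# This sketches the idea from the paper; it is not the authors' code.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (placeholder path).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# `frame` stands in for an RGB frame (H, W, 3) from a recording, and
# (gx, gy) for the eye gaze direction projected into that frame's pixel
# coordinates (projection via camera calibration omitted for brevity).
frame = np.zeros((1408, 1408, 3), dtype=np.uint8)  # placeholder frame
gx, gy = 704, 704                                  # placeholder gaze point

predictor.set_image(frame)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[gx, gy]]),
    point_labels=np.array([1]),  # 1 = foreground point prompt
)
best_mask = masks[np.argmax(scores)]  # segmentation of the gazed-at object
```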

Future Directions and Impact

The AEA dataset expands what is possible in egocentric AI research while setting a high bar for dataset utility and ethical practice. By providing a comprehensive suite of tools alongside the data, the authors lower the barrier to entry and encourage widespread adoption within the research community.

Looking forward, the AEA dataset has the potential to foster innovation in personalized assistive AI technologies, augmented reality applications, and beyond. It emphasizes the need for rich, contextual data in developing AI systems that understand and predict user intent and interaction with their surroundings, marking a significant step forward in creating more intuitive and immersive AI experiences.

Conclusion

The Aria Everyday Activities Dataset is a rich resource for research on personalized and contextualized AI. With its diverse sensor modalities, spatio-temporal alignment, and ethical data practices, AEA is well positioned to drive the development of intelligent systems that enhance human-computer interaction in daily life. As AR and AI become more deeply integrated into everyday experience, datasets like AEA will play a crucial role in realizing that vision.
