BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Introduction
Creating and evaluating computer vision models often requires large datasets tailored to specific research needs. However, real-world datasets often fall short because of high acquisition costs, inaccurate labels, and fixed configurations that cannot be changed after collection. Synthetic data generation offers an alternative, but existing tools often lack quality and diversity. The BEHAVIOR Vision Suite (BVS) aims to address these challenges with a toolkit for generating fully customized synthetic datasets.
What is BEHAVIOR Vision Suite (BVS)?
BVS is built on the BEHAVIOR-1K benchmark and consists of two main components:
- Extended BEHAVIOR-1K Assets: A diverse collection of over 8,000 object models and 1,000 scene instances. These assets span a wide range of categories and support articulated joints and fluid simulation, enabling realistic physical interactions.
- Customizable Dataset Generator: A software tool that uses these assets to create tailored datasets. The generator exposes a wide variety of parameters at the scene, object, and camera levels and relies on a physics engine to keep generated configurations physically plausible.
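To make the three parameter levels more concrete, here is a minimal sketch of how a per-sample configuration might be organized and randomly drawn from user-chosen ranges. All class names, field names, and ranges (`SceneParams`, `ObjectParams`, `CameraParams`, `sample_config`, `light_intensity`, `joint_openness`, and so on) are hypothetical illustrations, not the actual BVS or OmniGibson API, and the physics-settling and rendering steps are deliberately left out.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


# Hypothetical parameter groups mirroring the scene / object / camera levels.
# These names are illustrative only; they are NOT the BVS or OmniGibson API.
@dataclass
class SceneParams:
    scene_id: str
    light_intensity: float  # relative scale, 1.0 = default lighting


@dataclass
class ObjectParams:
    category: str
    joint_openness: float  # 0.0 = fully closed, 1.0 = fully open (articulated objects)


@dataclass
class CameraParams:
    height_m: float
    pitch_deg: float
    fov_deg: float


@dataclass
class SampleConfig:
    scene: SceneParams
    objects: list[ObjectParams] = field(default_factory=list)
    camera: CameraParams | None = None


def sample_config(rng: random.Random, scene_ids, categories, n_objects=3) -> SampleConfig:
    """Draw one random configuration from fixed, user-specified ranges.

    In a real generator, each configuration would then be loaded into the
    simulator and settled by the physics engine before rendering (omitted here).
    """
    scene = SceneParams(
        scene_id=rng.choice(scene_ids),
        light_intensity=rng.uniform(0.2, 1.5),
    )
    objects = [
        ObjectParams(category=rng.choice(categories), joint_openness=rng.uniform(0.0, 1.0))
        for _ in range(n_objects)
    ]
    camera = CameraParams(
        height_m=rng.uniform(0.5, 2.0),
        pitch_deg=rng.uniform(-45.0, 10.0),
        fov_deg=rng.uniform(50.0, 90.0),
    )
    return SampleConfig(scene=scene, objects=objects, camera=camera)
```

For example, `sample_config(random.Random(0), ["scene_a", "scene_b"], ["cabinet", "fridge"])` returns one reproducible configuration; seeding the generator is one simple way to make a synthetic dataset regenerable.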
Key Features
Here's what makes BVS special:
- Comprehensive Labels: Generates labels at image, object, and pixel levels (e.g., scene graphs, point clouds, segmentation masks).
- Diverse and Photorealistic: Covers a wide array of indoor scenes and objects with high visual and physical fidelity.
- Customizability: Users can adjust parameters like object poses, semantic states, lighting conditions, and camera settings.
- User-friendly Tooling: Includes utilities for generating data tailored to specific research needs.
Applications and Experiments
BVS's utility is demonstrated through three primary applications that showcase its versatility:
1. Parametric Model Evaluation
In this application, BVS is used to test model robustness against controlled variations in parameters such as lighting, occlusion, and object articulation. The dataset includes up to 500 video clips for each parameter, and the evaluation revealed significant performance differences among current state-of-the-art (SOTA) models. For instance, models generally struggled to detect objects under low-light conditions or when objects were partially occluded. This kind of systematic evaluation is difficult with real-world datasets, where such factors cannot be varied one at a time, but is straightforward with BVS.
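A parametric evaluation of this kind boils down to grouping per-clip scores by the value of the controlled parameter and comparing the averages. The sketch below assumes each clip has a single recorded parameter value (for example, a light intensity) and a single scalar score (for example, detection AP); the function name and binning scheme are illustrative choices, not the procedure used by BVS.

```python
from collections import defaultdict
from statistics import mean


def performance_by_bin(param_values, scores, bin_edges):
    """Average a model's per-clip scores within bins of one controlled parameter.

    param_values: the parameter setting used for each clip (e.g., light intensity)
    scores: the model's metric on each clip (e.g., detection AP), same length
    bin_edges: sorted edges; bin i covers [bin_edges[i], bin_edges[i + 1])
    Returns {(lo, hi): mean score} for every non-empty bin. Values that fall
    outside all bins are ignored.
    """
    if len(param_values) != len(scores):
        raise ValueError("param_values and scores must have the same length")
    grouped = defaultdict(list)
    for value, score in zip(param_values, scores):
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= value < hi:
                grouped[(lo, hi)].append(score)
                break
    return {b: mean(s) for b, s in sorted(grouped.items())}
```

Plotting these bin averages for several models side by side is one simple way to expose the kind of gaps described above, such as a sharp drop in the lowest-light bin.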
2. Holistic Scene Understanding
BVS was used to generate a large-scale dataset of over 266,000 frames, each annotated with labels such as segmentation masks and depth maps. This dataset was used to benchmark SOTA models on several tasks, including object detection, segmentation, depth estimation, and point cloud reconstruction. Notably, the relative performance of these models on the synthetic dataset closely matched their relative performance on real-world datasets, supporting the photorealism and usefulness of BVS-generated data.
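One common way to quantify whether two benchmarks rank models consistently is Spearman rank correlation between the models' scores on each benchmark. This is an assumed illustration of the idea, not necessarily the metric used in the BVS study; it takes one score per model on the synthetic data and one on real data, and a value near 1.0 means the two benchmarks order the models the same way.

```python
def ranks(values):
    """Return 1-based ranks of values, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied positions i..j share the mean rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (vx * vy) ** 0.5
```

Given, for instance, a list of per-model synthetic scores and a list of the same models' real-world scores (in the same order), `spearman(synthetic, real)` summarizes ranking agreement in a single number. Because it compares only orderings, it tolerates a uniform shift in absolute scores between synthetic and real data.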
3. Object States and Relations Prediction
This application focuses on a novel vision task: predicting object states and the relations between objects. BVS generated 12,500 images labeled with states and relations such as "open," "closed," "on top of," and "inside." A model trained solely on this synthetic dataset achieved high accuracy when tested on real-world images, highlighting BVS's potential for sim2real transfer. It also outperformed zero-shot models such as CLIP, suggesting that task-specific training on synthetic data is effective for this task.
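Labels for this task can be thought of as small facts: a unary state attached to one object ("open") or a binary relation between two objects ("inside"). The sketch below assumes a hypothetical label format of this kind, with each queried fact mapped to whether it holds, and scores predictions per predicate. The `Relation` type, the object identifiers, and the predicate spellings are illustrative assumptions, not the BVS label schema.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """Hypothetical label: a unary state (obj is None) or a binary relation."""
    subject: str
    predicate: str  # e.g. "open", "closed", "on_top_of", "inside"
    obj: Optional[str] = None


def per_predicate_accuracy(ground_truth, predictions):
    """Accuracy per predicate over the queried labels in ground_truth.

    ground_truth / predictions: dict mapping Relation -> bool (holds or not).
    Only keys in ground_truth are scored; a missing prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rel, truth in ground_truth.items():
        total[rel.predicate] += 1
        if predictions.get(rel) == truth:
            correct[rel.predicate] += 1
    return {p: correct[p] / total[p] for p in total}
```

Reporting accuracy per predicate rather than as a single pooled number makes it easier to see whether a model handles some states (for example, "open") better than spatial relations (for example, "inside").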
Implications and Future Directions
The capabilities of BVS offer practical and theoretical benefits:
- Practical: Researchers can create large-scale, customized datasets for specific tasks, reducing the reliance on costly and inflexible real-world data.
- Theoretical: The ability to systematically vary parameters and observe model performance can help identify weaknesses and guide improvements in computer vision models.
Future developments could include expanding the range of customizable parameters and enhancing the photorealism of generated datasets. This would make BVS even more valuable for diverse applications in computer vision research and beyond.
This overview of the BEHAVIOR Vision Suite highlights its potential to change how computer vision datasets are created and used. With its extensive customization options and high-quality outputs, BVS is a powerful tool for advancing computer vision research.