BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation (2405.09546v1)
Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: https://behavior-vision-suite.github.io/
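The abstract describes three levels of adjustable generation parameters (scene, object, and camera) that researchers can vary independently for controlled experiments. As a minimal sketch of what such a parameter sweep could look like, the hypothetical Python config below varies one continuous axis (lighting intensity) while holding the others fixed; all names, fields, and defaults are illustrative assumptions, not the actual BVS API.

```python
# Hypothetical sketch of a BVS-style parameter sweep; names and defaults
# are illustrative assumptions, not the actual BVS API.
from dataclasses import dataclass
from typing import List

@dataclass
class GenerationConfig:
    # Scene-level parameters
    lighting_intensity: float = 1.0   # relative brightness scale
    object_density: float = 0.5       # fraction of candidate placements used
    # Object-level parameters
    joint_openness: float = 0.0       # articulation: 0 = closed, 1 = fully open
    filled: bool = False              # unary state, e.g. a cup filled with water
    # Camera-level parameters
    field_of_view_deg: float = 60.0   # horizontal FOV in degrees
    camera_height_m: float = 1.5      # camera height above the floor, in meters

def lighting_sweep(values: List[float]) -> List[GenerationConfig]:
    """Vary one continuous axis (lighting) while holding all other
    parameters fixed, as in a controlled domain-shift experiment."""
    return [GenerationConfig(lighting_intensity=v) for v in values]

if __name__ == "__main__":
    for cfg in lighting_sweep([0.25, 0.5, 1.0, 2.0]):
        print(cfg)
```

Sweeping each axis independently in this way yields image sets that differ along exactly one factor, which is what enables the per-axis robustness evaluation described in the paper's first application scenario.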
Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang