Iterative Model Improvement Through Open-Vocabulary Data Selection in the Mcity Data Engine
The paper "Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection" presents a framework for selecting and labeling data in machine learning applications, particularly in Intelligent Transportation Systems (ITS). Vehicle fleets and roadside perception systems produce an ever-growing volume of raw data, which makes it difficult for researchers to find and select samples of rare and novel classes. Long-tail data distributions exacerbate the problem.
The authors introduce the Mcity Data Engine (MDE), an open-source tool that supports the complete data-driven model development cycle: data acquisition, storage, selection, labeling, training, validation, and deployment. Its open-vocabulary approach to data selection is intended to make the most of datasets that contain rare and novel instances and to support rare-class detection.
Methodological Highlights
The MDE supports a variety of data sources, with priority given to vision datasets and to data from the University of Michigan's Smart Intersections Project in Ann Arbor, Michigan. It converts datasets into a unified Voxel51 format, which simplifies storage and processing, and it integrates access schemes such as AWS S3.
A key methodological innovation is the MDE's ensemble of open-vocabulary object detection models for data selection, which lets users describe the classes of interest in natural language. The ensemble draws on models such as OWL-ViT, Grounding-DINO, and OmDet-Turbo, and evaluating candidates on a seed dataset indicates which of them suit a given task. A consensus-based selection step then filters noise in the detection outputs, mitigating false positives and false negatives.
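The consensus idea can be illustrated with a minimal sketch. It is not the MDE's implementation: the summary does not state the exact agreement rule, so the code assumes that a detection counts as confirmed when at least `min_models` distinct models report the same label with overlapping boxes. The `Detection` type, the function names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    model: str    # name of the detector that produced the box
    label: str    # class name matched from the text query
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    score: float  # detector confidence


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consensus_detections(detections, min_models=2, iou_threshold=0.5):
    """Keep detections that at least `min_models` distinct models agree on.

    Assumed rule: two detections agree when they carry the same label and
    their boxes overlap with IoU >= iou_threshold.
    """
    kept = []
    for det in detections:
        supporters = {det.model}
        for other in detections:
            if (other.model != det.model
                    and other.label == det.label
                    and iou(det.box, other.box) >= iou_threshold):
                supporters.add(other.model)
        if len(supporters) >= min_models:
            kept.append(det)
    return kept


def select_sample(detections, min_models=2, iou_threshold=0.5):
    """Select an image if any of its detections survives the consensus filter."""
    return bool(consensus_detections(detections, min_models, iou_threshold))


# Illustrative detections for one image; model names are placeholders.
example = [
    Detection("owl_vit", "cyclist", (10, 10, 50, 90), 0.41),
    Detection("grounding_dino", "cyclist", (12, 11, 52, 88), 0.38),
    Detection("omdet_turbo", "cyclist", (200, 200, 240, 260), 0.30),
]
confirmed = consensus_detections(example)
```

In this example the first two boxes overlap strongly and confirm each other, while the third box is reported by a single model and is dropped. Raising `min_models` trades recall for fewer false positives, which is why the threshold would normally be tuned on the seed dataset.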
For data labeling, the MDE integrates with CVAT for human-assisted labeling and supports automated labeling with pre-trained models for segmentation and depth estimation. It also uses zero-shot classification models to align datasets whose class label schemes differ.
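The alignment step can be pictured as scoring each source label against the target class list and keeping the best match above a threshold. The summary does not detail how the zero-shot classifier is applied, so the sketch below only assumes that some score between a source label and a target class is available. The scoring function is pluggable, and `string_similarity` is merely a stand-in built on Python's `difflib` so that the example runs without a model. All function names and the `min_score` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(source_label, target_label):
    """Stand-in scorer (NOT a zero-shot model): character-level similarity."""
    return SequenceMatcher(None, source_label.lower(), target_label.lower()).ratio()


def align_labels(source_labels, target_classes, score_fn=string_similarity,
                 min_score=0.6):
    """Map each source label to its best-scoring target class.

    Labels whose best score falls below `min_score` map to None so that
    they can be reviewed manually instead of being forced into a class.
    """
    mapping = {}
    for label in source_labels:
        best = max(target_classes, key=lambda target: score_fn(label, target))
        mapping[label] = best if score_fn(label, best) >= min_score else None
    return mapping


# Illustrative label sets; both are placeholders.
mapping = align_labels(
    ["Pedestrian", "bicyclist", "traffic cone"],
    ["pedestrian", "cyclist", "car"],
)
```

A real setup would pass a `score_fn` backed by a zero-shot classifier in place of the string-based stand-in. Keeping unmatched labels as `None` avoids silently merging unrelated classes.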
Evaluation and Results
The paper evaluates the MDE on a real-world application: detecting Vulnerable Road Users (VRUs) in fisheye camera data, with an emphasis on how effectively the engine identifies and labels VRUs. The evaluation uses a seed dataset containing VRU instances to assess the open-vocabulary models systematically and to configure the ensemble. According to the paper, the resulting ensemble with its consensus process achieves high recall and high true-positive detection rates, which supports the use of open-vocabulary models for selecting data that contains rare instances.
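A seed-dataset evaluation of this kind can be sketched as follows. The summary does not give the paper's exact metric definitions, so the code assumes a sample-level view: a model "selects" an image when it reports a class of interest, and recall is the share of truly positive images that were selected. The function names, model names, and image identifiers are hypothetical.

```python
def selection_metrics(selected_ids, positive_ids):
    """Sample-level recall and precision of a data selection."""
    selected, positives = set(selected_ids), set(positive_ids)
    true_positives = len(selected & positives)
    recall = true_positives / len(positives) if positives else 0.0
    precision = true_positives / len(selected) if selected else 0.0
    return {"recall": recall, "precision": precision}


def rank_models(selections, positive_ids):
    """Rank models by recall on the seed dataset, breaking ties by precision."""
    scored = {
        name: selection_metrics(ids, positive_ids)
        for name, ids in selections.items()
    }
    return sorted(
        scored.items(),
        key=lambda item: (item[1]["recall"], item[1]["precision"]),
        reverse=True,
    )


# Images in the seed dataset that actually contain a VRU (placeholder ids).
positives = {"img_01", "img_04", "img_07", "img_09"}

# Images each candidate model selected (placeholder model names).
selections = {
    "model_a": ["img_01", "img_04", "img_05"],
    "model_b": ["img_01", "img_04", "img_07", "img_02", "img_03"],
    "model_c": ["img_09"],
}

ranking = rank_models(selections, positives)
```

Ranking by recall first reflects the goal of data selection for rare classes, where missing a rare instance usually costs more than reviewing an extra false positive.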
Implications and Future Directions
The open-source release of the MDE has implications for both research and practice. For academia, its end-to-end support for data-driven model development offers a path to improving existing ITS models. In practice, its deployment capabilities align well with real-world systems and can support applications such as near-miss detection and roadside assistance systems.
The paper points to future work on making model predictions more robust and on integrating the engine with additional operational design domains and real-world roadside perception systems. Overall, the work shows how current object detection models can be combined with data-centric development processes to improve ITS.
In conclusion, the work contributes a toolset for iterative model improvement in ITS and related fields. Its open-vocabulary approach to data selection addresses the challenges of long-tail data distributions and makes large datasets more accessible and useful for refining machine learning models.