
Unveiling Objects with SOLA: An Annotation-Free Image Search on the Object Level for Automotive Data Sets (2312.01860v2)

Published 4 Dec 2023 in cs.RO and cs.CV

Abstract: Huge image data sets are the fundament for the development of the perception of automated driving systems. A large number of images is necessary to train robust neural networks that can cope with diverse situations. A sufficiently large data set contains challenging situations and objects. For testing the resulting functions, it is necessary that these situations and objects can be found and extracted from the data set. While it is relatively easy to record a large amount of unlabeled data, it is far more difficult to find demanding situations and objects. However, during the development of perception systems, it must be possible to access challenging data without having to perform lengthy and time-consuming annotations. A developer must therefore be able to search dynamically for specific situations and objects in a data set. Thus, we designed a method which is based on state-of-the-art neural networks to search for objects with certain properties within an image. For the ease of use, the query of this search is described using natural language. To determine the time savings and performance gains, we evaluated our method qualitatively and quantitatively on automotive data sets.

Citations (2)

Summary

  • The paper presents SOLA, a novel annotation-free system using panoptic segmentation and CLIP embeddings for object-level search in automotive datasets.
  • It demonstrates superior performance over traditional methods by efficiently retrieving challenging objects such as stop signs and stretch limousines.
  • The method enhances automated driving system development by improving data diversity and precision in large-scale image repositories.

Introduction

The development of automated driving systems (ADS) hinges on the ability to train neural networks with diverse image datasets capturing various driving conditions and scenarios. However, while capturing a large trove of unlabeled images is relatively straightforward, identifying and utilizing images that contain specific challenging situations or objects—the type of data critical for honing the sensitivity of ADS—is not as simple. To address this challenge, researchers have designed a method leveraging state-of-the-art neural networks to enable dynamic search for specific objects according to their properties within a dataset, using natural language as input.

The ability to manage and effectively search huge datasets is a vital component of automotive systems engineering. Traditional image retrieval strategies for specific objects have relied on manual tagging or on context metadata such as recording locations or timestamps. While technologies have emerged to facilitate object retrieval using sketches and natural language, they often require reprocessing every image for each search, a slow and computationally expensive task. Notably, the CLIP neural network has been employed for its feature-embedding capabilities, enabling object-level search across datasets while minimizing processing time.
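The time savings come from precomputing the image-side embeddings once per dataset; each new query then reduces to a similarity lookup instead of a full reprocessing pass. A minimal NumPy sketch of that lookup (the embeddings and their dimensionality here are illustrative stand-ins for actual CLIP outputs):

```python
import numpy as np

def rank_by_similarity(object_embeddings: np.ndarray, query_embedding: np.ndarray) -> np.ndarray:
    """Return object indices sorted by descending cosine similarity to the query.

    object_embeddings: (N, D) matrix, precomputed once for the whole dataset.
    query_embedding:   (D,) vector, computed once per search query.
    """
    objs = object_embeddings / np.linalg.norm(object_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = objs @ q             # cosine similarity per object
    return np.argsort(-scores)    # best match first

# Toy example: three precomputed "object" embeddings, one query vector.
objects = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(rank_by_similarity(objects, query))  # → [0 2 1]
```

Because ranking is a single matrix-vector product, adding a new query is cheap even for very large datasets; only new images require the expensive embedding step.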

Methodology

The newly proposed system, SOLA, operates without annotations and follows a two-stage strategy: preprocessing and image retrieval. During preprocessing, a panoptic segmentation model detects the objects within each image, and every detected object is mapped to a vector representation using the image encoder of the CLIP model. For image retrieval, the user's natural-language query is translated into the same vector space by CLIP's text encoder and matched against the stored object vectors, scoring each object's relevance to the query. This allows developers to retrieve the challenging scenarios they anticipate needing during ADS development without prior annotation.
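Put together, the two stages could be structured as below. The segmentation and encoder calls (`segment_objects`, `encode_image`, `encode_text`) are hypothetical stand-ins passed in as callables, so this is a sketch of the dataflow rather than the authors' implementation:

```python
import numpy as np
from typing import Callable, List, Tuple

def preprocess(images, segment_objects: Callable, encode_image: Callable) -> List[Tuple[int, int, np.ndarray]]:
    """Stage 1 (run once per dataset): segment each image, embed every detected object."""
    index = []  # entries of (image_id, object_id, embedding)
    for img_id, image in enumerate(images):
        for obj_id, crop in enumerate(segment_objects(image)):
            index.append((img_id, obj_id, encode_image(crop)))
    return index

def retrieve(index, query: str, encode_text: Callable, top_k: int = 5):
    """Stage 2 (run per query): embed the text, score all objects, return best hits."""
    q = encode_text(query)
    q = q / np.linalg.norm(q)
    scored = []
    for img_id, obj_id, emb in index:
        e = emb / np.linalg.norm(emb)
        scored.append((float(e @ q), img_id, obj_id))
    scored.sort(reverse=True)  # highest cosine similarity first
    return scored[:top_k]
```

The index produced by stage 1 can be reused for arbitrarily many natural-language queries, which is what makes the search dynamic in practice.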

Evaluation and Conclusion

To assess the efficiency of the SOLA method, qualitative and quantitative experiments were conducted on two automotive data sets, one with broad diversity and another featuring specific car types. SOLA's performance was benchmarked against traditional image retrieval approaches and showed superior results, particularly for specific, less common objects such as stop signs and stretch limousines. While acknowledging that searches are limited to the object classes recognized by the segmentation model and that objects may be overlooked (false negatives), the research highlighted SOLA's speed and efficiency in scanning massive image datasets. Future research is directed towards prompt optimization to support users in crafting more precise searches and towards further reducing false positives.

In summary, SOLA emerges as a powerful and intuitive tool for parsing extensive automotive image repositories, enabling developers to fine-tune ADS through focused image retrieval. Despite its limitations, SOLA demonstrates a marked improvement in performance and practicality, aiding the more rapid development of sophisticated and nuanced automated driving technologies.
