TARO: Toward Semantically Rich Open-World Object Detection (2510.09173v1)

Published 10 Oct 2025 in cs.CV

Abstract: Modern object detectors are largely confined to a "closed-world" assumption, limiting them to a predefined set of classes and posing risks when encountering novel objects in real-world scenarios. While open-set detection methods aim to address this by identifying such instances as 'Unknown', this is often insufficient. Rather than treating all unknowns as a single class, assigning them more descriptive subcategories can enhance decision-making in safety-critical contexts. For example, identifying an object as an 'Unknown Animal' (requiring an urgent stop) versus 'Unknown Debris' (requiring a safe lane change) is far more useful than just 'Unknown' in autonomous driving. To bridge this gap, we introduce TARO, a novel detection framework that not only identifies unknown objects but also classifies them into coarse parent categories within a semantic hierarchy. TARO employs a unique architecture with a sparsemax-based head for modeling objectness, a hierarchy-guided relabeling component that provides auxiliary supervision, and a classification module that learns hierarchical relationships. Experiments show TARO can categorize up to 29.9% of unknowns into meaningful coarse classes, significantly reduce confusion between unknown and known classes, and achieve competitive performance in both unknown recall and known mAP. Code will be made available.

Summary

  • The paper introduces TARO, a novel framework that enriches open-world object detection by categorizing unknown objects using a hierarchical taxonomy.
  • It combines a sparsemax-based objectness head, a hierarchy-aware classification activation, and hierarchy-guided relabeling, and reports up to 29.9% hierarchy accuracy (the share of unknowns placed in a meaningful coarse category) alongside competitive unknown recall and known-class mAP.
  • The study shows that adding a semantic hierarchy to an open-world detector yields more descriptive labels for unknowns and less confusion between unknown and known classes, and it points to multi-modal extensions as future work.

An Expert Review of "TARO: Toward Semantically Rich Open-World Object Detection"

Introduction

The paper "TARO: Toward Semantically Rich Open-World Object Detection" proposes a new approach to open-world object detection (OWOD). Existing object detectors predominantly operate under a "closed-world" assumption, restricting the model to a predefined class set and posing significant challenges when they encounter novel objects. Previous open-set detection methods have concentrated on flagging all novel objects with a single "Unknown" label; TARO instead aims to enrich semantic understanding by categorizing unknown objects into coarse parent categories within a hierarchical taxonomy.

Methodology

TARO extends the capabilities of Deformable-DETR (D-DETR) by incorporating a taxonomy-based structure for classifying unknown objects into broader categories. The framework consists of three key components:

  1. Sparsemax-based Objectness Head: The objectness score is modeled using sparsemax, which yields a sparse probability distribution over queries in which low-scoring queries can receive exactly zero probability. This encourages competition among queries, letting the model concentrate on queries likely to be objects without forcefully suppressing plausible object queries.
  2. Hierarchical-Aware Activation: This component reinforces the hierarchical relationship between classes. Each child class probability is multiplied by its parent's probability, with this multiplication governed by a learnable parameter. The coupling keeps predictions consistent across hierarchical levels, discouraging, for example, a confident child-class prediction paired with a low parent-class score.
  3. Hierarchy-Guided Relabeling: High-confidence signals from the classification head guide the objectness head, providing auxiliary supervision. Unmatched queries (those not assigned to a ground-truth box during matching) with strong activations on non-leaf, i.e. coarse, classes are relabeled as potential objects, refining how the model learns object-like characteristics.

    Figure 1: Overall pipeline of the proposed method. Key aspects include the sparsemax-based objectness head, hierarchical-aware activation for classification, and hierarchy-guided relabeling strategy.
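To make the sparsemax-based objectness head more concrete, here is a minimal pure-Python sketch of the standard sparsemax projection (Martins & Astudillo, 2016) applied across a handful of per-query scores, with softmax for contrast. The example scores and the helper names are illustrative assumptions; the summary does not say how TARO computes the per-query logits or how sparsemax enters its loss.

```python
import math
from typing import List


def sparsemax(z: List[float]) -> List[float]:
    """Euclidean projection of scores z onto the probability simplex.

    Unlike softmax, scores below the threshold tau receive exactly 0.
    """
    if not z:
        raise ValueError("sparsemax needs at least one score")
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_star, cumsum_star = 1, zs[0]
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # Support condition: 1 + k * z_(k) > sum_{j<=k} z_(j).
        # It holds for a contiguous prefix, so the last k that passes is k(z).
        if 1.0 + k * zk > cumsum:
            k_star, cumsum_star = k, cumsum
    tau = (cumsum_star - 1.0) / k_star  # threshold subtracted from every score
    return [max(zi - tau, 0.0) for zi in z]


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax: every entry stays strictly positive."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.5, 0.1, -1.0]  # hypothetical objectness logits, one per query
    print(sparsemax(scores))  # [0.75, 0.25, 0.0, 0.0]
    print(softmax(scores))    # every query keeps some probability mass
```

With these scores only two queries share the objectness mass under sparsemax, while softmax spreads some mass over all four, which illustrates the query competition the first component relies on.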

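The summary gives no exact formula for the hierarchical-aware activation or the relabeling rule, so the following is only a sketch under loudly stated assumptions. The two-level taxonomy, the form p(child) = sigmoid(z_child) * p(parent) ** gamma (with gamma standing in for the learnable parameter), and the 0.5 relabeling threshold are all hypothetical choices, not TARO's published definitions. The sketch shows how parent probabilities can gate child probabilities and how strong coarse (non-leaf) scores on an unmatched query could be turned into an auxiliary objectness target.

```python
import math
from typing import Dict, List, Optional

# Hypothetical two-level taxonomy: node -> parent (None marks a top-level node).
PARENT: Dict[str, Optional[str]] = {
    "animal": None, "vehicle": None,
    "dog": "animal", "cat": "animal", "car": "vehicle",
}
# Non-leaf (coarse) classes are the nodes that appear as some node's parent.
NON_LEAF = {p for p in PARENT.values() if p is not None}


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hierarchical_probs(logits: Dict[str, float], gamma: float) -> Dict[str, float]:
    """ASSUMED form: p(node) = sigmoid(logit) * p(parent) ** gamma.

    gamma = 0 removes the parent coupling; gamma = 1 is a plain product,
    in which case a child's probability never exceeds its parent's.
    """
    probs: Dict[str, float] = {}

    def prob(node: str) -> float:
        if node not in probs:
            parent = PARENT[node]
            scale = 1.0 if parent is None else prob(parent) ** gamma
            probs[node] = sigmoid(logits[node]) * scale
        return probs[node]

    for node in PARENT:
        prob(node)
    return probs


def relabel_unmatched(query_probs: List[Dict[str, float]], matched: List[bool],
                      threshold: float = 0.5) -> List[int]:
    """HYPOTHETICAL rule: an unmatched query whose strongest non-leaf (coarse)
    probability exceeds `threshold` gets objectness target 1, else 0."""
    targets = []
    for probs, is_matched in zip(query_probs, matched):
        if is_matched:
            targets.append(1)  # already a positive through ground-truth matching
        else:
            coarse = max(probs[n] for n in NON_LEAF)
            targets.append(1 if coarse > threshold else 0)
    return targets


if __name__ == "__main__":
    # Query 0: unmatched but confidently "animal"; query 1: unmatched, low scores.
    confident = hierarchical_probs({"animal": 3.0, "vehicle": -2.0, "dog": 2.0,
                                    "cat": -1.0, "car": 0.0}, gamma=1.0)
    weak = hierarchical_probs({k: -2.0 for k in PARENT}, gamma=1.0)
    print(relabel_unmatched([confident, weak], matched=[False, False]))  # [1, 0]
```

Multiplying by the parent probability means a query cannot look like a confident "dog" unless it also looks like an "animal", which is the kind of cross-level consistency the second component targets; the relabeling step then reuses those coarse scores as extra supervision for objectness.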
Experimental Evaluation

The paper evaluates TARO on the OWOD and OW-DETR benchmark splits. Key metrics are U-R (unknown recall), mAP (mean Average Precision on known classes), AOSE (Absolute Open-Set Error, the number of unknown objects misclassified as known classes), and HAcc (Hierarchy Accuracy). TARO achieves competitive unknown recall and known-class mAP, substantially reduces confusion between unknown and known classes, and reaches up to 29.9% HAcc, i.e. it assigns up to 29.9% of unknowns to a meaningful coarse category.
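The sketch below shows how U-R, AOSE and HAcc could be computed once predictions have already been matched to ground-truth unknown objects; box matching, IoU thresholds, score cutoffs and mAP are omitted. U-R and AOSE follow their usual definitions in the OWOD literature, but the HAcc definition here is an assumption, since the summary only says it measures how many unknowns receive a meaningful coarse category. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

UNKNOWN = "unknown"


@dataclass
class UnknownGT:
    """One ground-truth unknown object, after boxes have been matched (not shown)."""
    pred_label: Optional[str]   # None = missed, "unknown", or a known class name
    pred_parent: Optional[str]  # coarse parent predicted for it, if any
    true_parent: str            # its coarse parent in the taxonomy


def _fraction(hits: int, total: int) -> float:
    if total == 0:
        raise ValueError("no unknown ground-truth objects")
    return hits / total


def unknown_recall(objs: List[UnknownGT]) -> float:
    # U-R: share of unknown ground-truth objects detected as "unknown".
    return _fraction(sum(o.pred_label == UNKNOWN for o in objs), len(objs))


def absolute_open_set_error(objs: List[UnknownGT]) -> int:
    # AOSE: absolute count of unknown objects detected as some known class.
    return sum(o.pred_label not in (None, UNKNOWN) for o in objs)


def hierarchy_accuracy(objs: List[UnknownGT]) -> float:
    # ASSUMED HAcc: share of unknown objects that are detected as "unknown"
    # AND assigned the correct coarse parent category.
    hits = sum(o.pred_label == UNKNOWN and o.pred_parent == o.true_parent for o in objs)
    return _fraction(hits, len(objs))


if __name__ == "__main__":
    objs = [
        UnknownGT("unknown", "animal", "animal"),   # found, right parent
        UnknownGT("unknown", "vehicle", "animal"),  # found, wrong parent
        UnknownGT("dog", None, "animal"),           # confused with a known class
        UnknownGT(None, None, "debris"),            # missed entirely
    ]
    print(unknown_recall(objs), absolute_open_set_error(objs), hierarchy_accuracy(objs))
    # 0.5 1 0.25
```

The toy case shows why the metrics are complementary: recall credits both detected unknowns, AOSE counts the one unknown mistaken for a known class, and the assumed HAcc credits only the unknown that also received the correct parent category.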

Ablation Studies

Ablation studies validated the impact of each design choice in TARO. Notably, the sparsemax-based objectness head significantly improved U-R and mAP compared with a softmax-based head, by focusing on positive queries and reducing the suppression of plausible object queries as background. The hierarchy-aware activation contributed to hierarchical consistency, while the relabeling strategy further refined the objectness estimates.

Implications and Future Work

TARO suggests that integrating hierarchies into object detection frameworks can improve the handling of unknown objects by providing structured, interpretable categorizations. While the framework advanced unknown recall and reduced confusion between known and unknown objects, challenges remain in consistently localizing and categorizing novel instances. Future research could integrate vision-language models (VLMs) and multi-modal data to further improve detection and classification in open-world scenarios.

Conclusion

This work extends the conventional boundaries of OWOD with a semantically rich framework that categorizes unknowns into coarse categories. By combining sparsemax-based objectness, hierarchy-aware activation, and hierarchy-guided relabeling, TARO moves toward a more nuanced understanding of the open world, balances known- and unknown-object detection, and lays groundwork for further exploration of semantic hierarchies in object detection.
