Open-Set & Object-Agnostic Mapping

Updated 19 January 2026

Open-set and object-agnostic mapping is a framework that generalizes environmental recognition by allowing models to detect both known and unseen objects.
Techniques leverage embedding-based detection, prototype methods, and unsupervised clustering to classify objects without a fixed semantic taxonomy.
Reported evaluations show faster semantic rendering and improved robustness, while challenges remain in dynamic, streaming settings and in efficient deployment.

Open-set and object-agnostic mapping refers to a family of recognition, detection, and mapping frameworks designed to operate without prior constraints on the types, numbers, or semantic classes of objects present in the environment. Unlike conventional closed-set approaches, which presuppose a fixed, known taxonomy of object classes, open-set and object-agnostic methods are architected to generalize to novel, unseen categories and to support flexible semantic querying and mapping. This area has become central to robotics, computer vision, and machine perception, especially with the increasing prevalence of vision-language foundation models, self-supervised embedding spaces, and multi-modal data sources.

1. Formal Definitions and Problem Setups

Open-set recognition formalizes the challenge of classifying known classes while simultaneously identifying inputs that belong to previously unseen, or "unknown," classes. In the most general instantiations, the set of possible labels may be unbounded, and models should assign an "unknown" label or open-set similarity score to out-of-distribution samples. Object-agnostic mapping—sometimes termed class-agnostic or category-agnostic mapping—eschews pre-defined semantic categories altogether, instead representing objects and regions in a generic feature or similarity space.

For example, in Few-Shot Open-Set Recognition (FSOSR), the label space is partitioned into a set of known (closed-set) classes $\mathcal{C}$ and an unknown (open-set) space $\bar{\mathcal{O}}$, with $\mathcal{C} \cap \bar{\mathcal{O}} = \emptyset$. The task involves, for a query input $x$, returning both a closed-set class score $p(y=k \mid x)$ for $k \in \{1, \dots, K\}$ and an "outlierness" score $p(y \notin \mathcal{C} \mid x)$, using only a small support set and possibly a frozen feature extractor (Boudiaf et al., 2022).

In open-set 3D scene understanding, the requirement is to construct explicit, queryable maps annotated with objects and relationships that are not limited by a fixed label ontology, typically by leveraging embeddings from large-scale vision-language models (Koch et al., 2024). In SLAM and mapping, object-agnostic approaches may represent each object as a latent feature centroid plus geometric coordinates, avoiding reliance on semantic classifiers (Singh et al., 2024).
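The "latent feature centroid plus geometric coordinates" representation can be illustrated with a small data structure. This is a minimal sketch under assumptions: the class name `LatentObject`, its fields, and the running-mean update are hypothetical and are not taken from LOSS-SLAM or any other cited system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LatentObject:
    """Hypothetical map landmark: a feature centroid plus a 3D position, no class label."""
    feature: List[float]   # running mean of observed embeddings
    position: List[float]  # running mean of observed 3D positions
    count: int = 1         # number of observations merged so far

    def update(self, feature: List[float], position: List[float]) -> None:
        """Fold one new observation into both running means."""
        self.count += 1
        w = 1.0 / self.count
        self.feature = [f + w * (x - f) for f, x in zip(self.feature, feature)]
        self.position = [p + w * (x - p) for p, x in zip(self.position, position)]


obj = LatentObject(feature=[1.0, 0.0], position=[0.0, 0.0, 0.0])
obj.update(feature=[0.0, 1.0], position=[2.0, 0.0, 0.0])
print(obj.feature, obj.position, obj.count)
```

Because the record stores no class label, two observations are associated purely by feature and geometric proximity, which is what makes the representation object-agnostic.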

2. Core Methodologies and Algorithms

The methodologies for open-set/object-agnostic mapping fall into several architectural paradigms:

a. Embedding-Based Detection and Assignment

Many approaches employ pre-trained, frozen feature extractors ($\phi: X \to \mathcal{Z}$) and perform downstream detection and recognition in the embedding space. For example, model-agnostic FSOSR uses a combination of cosine-similarity prototypical classifiers for inlier classification and $k$-nearest-neighbor detectors for open-set detection, operating exclusively in $\mathcal{Z}$ without retraining or architecture-specific tuning (Boudiaf et al., 2022).
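A minimal sketch of this two-part scheme, assuming embeddings have already been extracted by a frozen backbone and are given as plain Python lists. The function names, the toy data, and the specific outlier score (one minus the mean cosine similarity to the $k$ nearest support embeddings) are illustrative assumptions, not the cited method's exact formulation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def prototypes(support):
    """support: dict label -> list of embeddings; returns the per-class mean embedding."""
    protos = {}
    for label, feats in support.items():
        dim = len(feats[0])
        protos[label] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return protos


def classify(query, protos):
    """Closed-set decision: label of the most cosine-similar prototype."""
    return max(protos, key=lambda label: cosine(query, protos[label]))


def knn_outlier_score(query, support, k=1):
    """Higher means more outlier-like: 1 - mean cosine similarity to the k nearest support embeddings."""
    sims = sorted((cosine(query, f) for feats in support.values() for f in feats), reverse=True)
    top = sims[:k]
    return 1.0 - sum(top) / len(top)


# Toy 2D embeddings standing in for the output of a frozen feature extractor.
support = {"mug": [[1.0, 0.1], [0.9, 0.0]], "chair": [[0.0, 1.0], [0.1, 0.9]]}
protos = prototypes(support)
inlier, outlier = [0.95, 0.05], [-1.0, -1.0]
print(classify(inlier, protos), knn_outlier_score(inlier, support), knn_outlier_score(outlier, support))
```

Both functions only read embeddings, so the backbone can be swapped without touching this code, which is the sense in which such inference is model-agnostic.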

b. Prototypical and Outlier Prototype Methods

These methods use class prototypes for the closed-set classes and add a mechanism for handling unknowns, such as hallucinated or implicit outlier prototypes. In OSTIM, the outlier prototype is defined as the negative mean of the inlier prototypes, and classification is a $(K+1)$-way softmax over the $K$ classes plus the outlier, optimized with a mutual-information-maximization objective (Boudiaf et al., 2022).
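The outlier-prototype construction can be sketched as follows. The sketch assumes features are already centered and normalized, uses dot-product logits with a hypothetical `temperature` parameter, and leaves out the transductive mutual-information optimization entirely; it shows only the $(K+1)$-way scoring step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def k_plus_one_probs(query, protos, temperature=0.1):
    """protos: list of K inlier prototypes. Returns K+1 probabilities; the last entry is the outlier."""
    dim = len(query)
    k = len(protos)
    # Outlier prototype: negative mean of the inlier prototypes.
    outlier = [-sum(p[i] for p in protos) / k for i in range(dim)]
    logits = [sum(q * w for q, w in zip(query, proto)) / temperature
              for proto in protos + [outlier]]
    return softmax(logits)


protos = [[1.0, 0.0], [0.0, 1.0]]
p_in = k_plus_one_probs([1.0, 0.0], protos)     # close to class 0
p_out = k_plus_one_probs([-0.7, -0.7], protos)  # opposite to both classes
print(p_in, p_out)
```

A query pointing away from every inlier prototype aligns with the outlier prototype and therefore receives most of the probability mass in the last slot.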

c. Category-Agnostic Clustering

Several frameworks augment standard recognition by incorporating unsupervised or self-supervised clustering over the target data, producing category-agnostic clusters which serve as latent structure indicators. In SE-CC for domain adaptation, k-means-derived centroids provide a "soft" cluster structure preserved by a clustering KL loss, guiding the network to discover and separate target-domain unknowns (Pan et al., 2020).
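One way to picture a "soft" cluster structure preserved by a KL term is sketched below. It assumes centroids are already available (for example from k-means), turns distances into a soft assignment with a softmax, and compares two assignments with a KL divergence. The function names and the `temperature` parameter are assumptions for illustration and do not reproduce the SE-CC loss exactly.

```python
import math


def soft_assign(feature, centroids, temperature=1.0):
    """Softmax over negative squared distances from the feature to each centroid."""
    neg = [-sum((f - c) ** 2 for f, c in zip(feature, cen)) / temperature
           for cen in centroids]
    m = max(neg)
    exps = [math.exp(v - m) for v in neg]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


centroids = [[0.0, 0.0], [4.0, 4.0]]             # assumed output of k-means
target = soft_assign([0.5, 0.5], centroids)      # assignment in the original feature space
close = soft_assign([0.6, 0.4], centroids)       # learned feature that keeps the structure
drifted = soft_assign([3.5, 3.5], centroids)     # learned feature that switched cluster
print(kl_divergence(target, close), kl_divergence(target, drifted))
```

The divergence stays near zero when the learned feature keeps its original cluster membership and grows when it drifts to another cluster, which is the behavior a clustering-preservation loss penalizes.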

d. Vocabulary- or Prototype-Informed Embedding

Unification across supervised, zero-shot, and open-set recognition can be achieved by embedding samples into a semantic space indexed by a large open vocabulary of prototypes (e.g., word2vec/GloVe). The weighted maximum-margin framework (WMM-Voc) imposes constraints to project samples closer to their true prototype than to any of the near-neighbor vocabulary items, enabling reasoning with up to 310K classes and handling open-set queries (Fu et al., 2023).
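A generic form of such a margin constraint can be written as follows. The notation is assumed here for illustration and is not taken from the cited paper: $g(x)$ is the embedded sample, $u_y$ its true prototype, $\mathcal{N}(y)$ a set of near-neighbor vocabulary prototypes, and $\gamma > 0$ a margin.

$$\|g(x) - u_y\|_2^2 + \gamma \;\le\; \|g(x) - u_v\|_2^2 \quad \forall\, v \in \mathcal{N}(y)$$

Violations can be penalized with a hinge term $\max\!\left(0,\; \gamma + \|g(x) - u_y\|_2^2 - \|g(x) - u_v\|_2^2\right)$, optionally weighted per vocabulary item. Restricting the constraint to $\mathcal{N}(y)$ rather than the whole vocabulary is what keeps the number of constraints manageable when the vocabulary is very large.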

e. Instance- and Object-Centric Representations

Object-agnostic detectors (e.g., SSCOD) learn to produce class-agnostic "objectness" maps and group object-like regions via contrastive and metric learning in embedding space. The same idea extends to mapping: FindAnything maintains segment-level CLIP embeddings over volumetric submaps, while OpenGS-SLAM uses explicit label assignments derived from foundation model segmenters, supporting fast per-object relabeling (Nguyen et al., 2021, Laina et al., 11 Apr 2025, Yang et al., 3 Mar 2025).

Methods such as ConceptFusion and Open3DSG perform fusion of pixel- or point-level open-set features into 3D geometric representations (surfels, Gaussians, TSDFs) and associate each region with a feature embedding. These embeddings support semantic queries (text, image, audio, click) through similarity search, and some systems generalize to open-set relationship prediction using LLMs (Jatavallabhula et al., 2023, Koch et al., 2024).
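The fuse-then-query pattern described above can be sketched in a few lines. The sketch assumes each map element keeps a confidence-weighted mean of the features observed for it and that queries arrive as embeddings in the same space (for instance from a text encoder). `FusedPoint`, `fuse`, and `query` are hypothetical names, not the API of ConceptFusion or Open3DSG.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


class FusedPoint:
    """Hypothetical map element holding a confidence-weighted mean feature."""

    def __init__(self, dim):
        self.feature = [0.0] * dim
        self.weight = 0.0

    def fuse(self, feature, confidence=1.0):
        """Merge one per-view feature into the weighted running mean."""
        total = self.weight + confidence
        self.feature = [(self.weight * f + confidence * x) / total
                        for f, x in zip(self.feature, feature)]
        self.weight = total


def query(points, query_embedding, top_k=1):
    """Indices of the top_k points by cosine similarity; every point must have been fused at least once."""
    q = normalize(query_embedding)
    scores = [sum(a * b for a, b in zip(normalize(p.feature), q)) for p in points]
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:top_k]


p0, p1 = FusedPoint(2), FusedPoint(2)
p0.fuse([1.0, 0.0])
p0.fuse([0.8, 0.2])   # second view of the same region
p1.fuse([0.0, 1.0])
print(query([p0, p1], [0.9, 0.1]))
```

Since the map stores embeddings rather than labels, the same `query` function serves any modality whose encoder shares the embedding space.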

3. Notable Architectures and Systems

Several key systems and frameworks exemplify contemporary approaches:

Model/System	Main Objective	Key Innovations
OSTIM (Boudiaf et al., 2022)	FSOSR (Few-Shot Open-Set)	Implicit outlier prototype; transductive InfoMax; model-agnostic inference
SSCOD (Nguyen et al., 2021)	Class-agnostic common object detection	Single-stage detection; embedding head; cosine-based matching
SE-CC (Pan et al., 2020)	Domain adaptation with open set	Student–teacher self-ensembling; category-agnostic clustering; MI maximization
ConceptFusion (Jatavallabhula et al., 2023)	Open-set multimodal 3D mapping	Pixel-aligned open-set features; SLAM fusion; multi-modal querying
OpenGS-SLAM (Yang et al., 3 Mar 2025)	Open-set dense semantic SLAM	3D Gaussian splatting; explicit 2D label voting; dynamic consensus
FindAnything (Laina et al., 11 Apr 2025)	Open-vocabulary, object-centric mapping for robots	Volumetric submaps; CLIP aggregation at segment-level; open-vocabulary index/query
Open3DSG (Koch et al., 2024)	Open-vocabulary 3D scene graphs	2D→3D embedding distillation; LLM-driven open-set relationship prediction
LOSS-SLAM (Singh et al., 2024)	Lightweight open-set semantic mapping for SLAM	Foundation model patch embeddings; clustering; factor graph association on features
WMM-Voc (Fu et al., 2023)	Unified open-set/zero-shot/supervised recognition	Maximum-margin semantic embedding against large open vocabularies

These systems typically combine object-centric or region-centric representations, explicit (or implicit) semantic embedding, and open-set detection capability.

4. Performance, Evaluation, and Practical Impact

Open-set/object-agnostic mapping methods are evaluated with a range of metrics reflecting both generalization to novel classes and retention of accuracy for known classes. Common metrics include closed-set accuracy (Acc), open-set AUROC, AUPR, mean Intersection-over-Union (mIoU) in 2D/3D, R@k for retrieval, and localization or drift errors in SLAM.
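Two of these metrics are simple enough to state directly in code. The sketch below computes open-set AUROC through its pairwise-ranking definition and mIoU over flat label lists; the function names and toy inputs are illustrative, and production evaluations would normally rely on an established metrics library.

```python
def auroc(scores, is_unknown):
    """Probability that a random unknown sample outscores a random known one (ties count 0.5)."""
    pos = [s for s, u in zip(scores, is_unknown) if u]
    neg = [s for s, u in zip(scores, is_unknown) if not u]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_iou(pred, target, labels):
    """Mean of per-label intersection-over-union, skipping labels absent from both lists."""
    ious = []
    for label in labels:
        inter = sum(1 for p, t in zip(pred, target) if p == label and t == label)
        union = sum(1 for p, t in zip(pred, target) if p == label or t == label)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Outlier scores for two known and two unknown samples.
auc = auroc([0.1, 0.8, 0.2, 0.9], [False, False, True, True])
# Per-point predicted and ground-truth labels.
miou = mean_iou(["a", "a", "b", "b"], ["a", "b", "b", "b"], labels=["a", "b"])
print(auc, miou)
```

AUROC is threshold-free, which is why it is commonly paired with closed-set accuracy: the former measures how well unknowns are ranked above knowns, the latter how well knowns are told apart.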

Notable performance highlights include:

OSTIM reaches roughly 70–85% closed-set accuracy and 75–88% AUROC across vision benchmarks, with AUROC improvements of about 10% over previous transductive methods; the improvement is observed across 10 architectures with no hyperparameter tuning (Boudiaf et al., 2022).
OpenGS-SLAM achieves faster semantic rendering and halves storage versus prior 3D methods, with an open-set mIoU of 61.9% on Replica (Yang et al., 3 Mar 2025).
ConceptFusion demonstrates a gain of more than 40% mIoU on long-tailed objects and supports multi-modal semantic querying without any re-training (Jatavallabhula et al., 2023).
FindAnything sets a new state of the art on Replica closed-set f-mIoU (62.9%) and supports real-time, on-drone, language-driven exploration (Laina et al., 11 Apr 2025).
LOSS-SLAM achieves trajectory errors competitive with closed-set or geometric-only baselines while providing open-set, object-level semantic mapping, with memory that scales linearly with the number of objects (Singh et al., 2024).

Practical impact includes the capacity for real-time semantic understanding on resource-constrained robots (Laina et al., 11 Apr 2025), robust scene graph extraction from single point clouds without curated labels (Koch et al., 2024), and scalability to hundreds of thousands of unseen object categories (Fu et al., 2023). These capabilities are fundamental for autonomous exploration, manipulation, surveillance, discovery in web-scale collections, and cross-modal reasoning.

5. Theoretical Insights and Open Challenges

The effectiveness of open-set/object-agnostic approaches is attributed to several theoretical and empirical observations:

Frozen foundation model embeddings cluster seen-class instances, but open-set data induces blurred boundaries. Transductive mutual information objectives that explicitly allocate a prototype or feature mass for unknowns—such as in OSTIM (implicit outlier prototype) or SE-CC (category-agnostic clusters)—preserve discrimination among inliers while pushing outliers apart (Boudiaf et al., 2022, Pan et al., 2020).
Embedding-based clustering (category-agnostic or semantic space) enables "unknown" regions to naturally emerge as separate clusters or low-similarity targets, bypassing the need for dedicated outlier detection modules (Pan et al., 2020, Fu et al., 2023).
Open-vocabulary prototypes or segment-level embeddings reuse the capabilities of vision-language models, supporting free-form querying and language-conditioned mapping without ever collapsing to a small fixed taxonomy (Laina et al., 11 Apr 2025, Jatavallabhula et al., 2023, Koch et al., 2024).
Consistency mechanisms such as multi-view fusion, label-voting, and post-hoc consensus steps are key for robust open-set mapping across varying viewpoints and acquisition frames (Yang et al., 3 Mar 2025, Laina et al., 11 Apr 2025).
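Of the consistency mechanisms listed above, label voting is the easiest to illustrate. The sketch below accumulates per-view label observations for each mapped object and returns the majority label together with its vote share. `LabelVoter` and the identifiers in the example are hypothetical, and the cited systems may weight or filter votes differently.

```python
from collections import Counter, defaultdict


class LabelVoter:
    """Accumulates per-view label votes for each mapped object."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def observe(self, object_id, label):
        """Record that one view assigned `label` to `object_id`."""
        self.votes[object_id][label] += 1

    def consensus(self, object_id):
        """Majority label and its share of votes; the object must have at least one vote."""
        label, count = self.votes[object_id].most_common(1)[0]
        return label, count / sum(self.votes[object_id].values())


voter = LabelVoter()
for view_label in ["seg_3", "seg_5", "seg_3"]:  # labels from three viewpoints
    voter.observe(7, view_label)
label, share = voter.consensus(7)
print(label, share)
```

The vote share doubles as a cheap confidence signal: objects with a low share are candidates for re-observation or for being split into separate instances.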

Limitations remain: open-set mapping approaches often assume a static scene, require the presence of object-like regions in all target domains, and can be sensitive to clustering hyperparameters, object granularity, or mutual information objectives. Transductive approaches can be less flexible in streaming or highly imbalanced support/query regimes (Boudiaf et al., 2022). Extension to additional modalities (e.g., audio, text, continuous video streams) is feasible provided a suitable embedding space, but requires further validation (Fu et al., 2023, Jatavallabhula et al., 2023).

6. Directions for Future Research

Dynamic and streaming open-set mapping: Existing approaches largely require batch or transductive access to the query/sample set, raising challenges for online, continuous mapping and lifelong learning scenarios (Boudiaf et al., 2022).
Hierarchical and generative unknown modeling: Beyond geometric prototypes, integrating generative schemes or hierarchical latent spaces may improve fine-grained discrimination and unknown detection (Boudiaf et al., 2022).
Cross-domain and cross-modal adaptation: Robustness to severe domain shifts, and leveraging unsupervised or self-supervised methods to extract category-agnostic structure, will be critical for general-purpose deployment (Pan et al., 2020, Jatavallabhula et al., 2023).
Resource efficiency and deployment: Further compression, model pruning, and system optimization will enable open-set mapping on edge devices and intra-robot communication settings (Laina et al., 11 Apr 2025, Singh et al., 2024).
Interaction and affordance querying: Language-conditioned or action-conditioned open-set mapping (e.g., supporting queries about relationships, affordances, and scene context) is beginning to be explored through integration with LLMs and scene graph methodologies (Koch et al., 2024).

Open-set and object-agnostic mapping will continue to grow in importance as intelligent systems operate in ever more semantically complex, unpredictable, and dynamic environments; the maturation of foundation models and scalable embedding-based reasoning provides increasingly powerful mechanisms for tackling the unboundedness of the real world.

References (9)

Model-Agnostic Few-Shot Open-Set Recognition (2022)

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships (2024)

LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping (2024)

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation (2020)

Vocabulary-informed Zero-shot and Open-set Learning (2023)

Single Stage Class Agnostic Common Object Detection: A Simple Baseline (2021)

FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment (2025)

OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding (2025)

ConceptFusion: Open-set Multimodal 3D Mapping (2023)
