
Open Domain Generalization (ODG)

Updated 26 November 2025
  • Open Domain Generalization (ODG) is a machine learning paradigm that addresses both domain shift and category shift by unifying domain generalization and open-set recognition.
  • Recent methods employ semantic-aware prompt engineering, contrastive losses, and diffusion-based generative augmentation to enhance robust feature alignment and unknown class rejection.
  • Empirical results demonstrate significant gains in accuracy and open-set metrics, while challenges remain in handling ambiguous cases and noisy labels, motivating adaptive meta-learning strategies.

Open Domain Generalization (ODG) is a machine learning paradigm addressing the simultaneous challenge of domain shift and category shift when deploying models in real-world, open-world environments. In ODG, a model is trained exclusively on labeled data from multiple source domains, each with its own data distribution and often a common, but not necessarily identical, set of class labels. At deployment, the model is tasked with generalizing to an entirely unseen target domain that may contain different data distributions (domain shift) and new class labels not present during training (open-set recognition). ODG thus unifies the objectives of domain generalization (DG) and open-set recognition (OSR)—requiring both robust in-domain classification and the principled rejection of unknown-category instances (Wang et al., 21 Nov 2025).

1. Formal Problem Definition and Generalization Bounds

Let $\mathcal{D}_1, \dots, \mathcal{D}_M$ be $M$ labeled source domains, each $\mathcal{D}_k = \{ (x_i^s, y_i^s) \}_{i=1}^{n_k}$ over a label set $\mathcal{Y}^s = \{1,\dots,C\}$. The model is deployed on an unlabeled target domain $\mathcal{D}^t = \{ x_j^t \}_{j=1}^{n_t}$, drawn from a shifted distribution and possibly containing both known ($\mathcal{Y}^s$) and unknown ($\mathcal{Y}^u$) classes. The ODG learner $h$ seeks to minimize:

  • Structural risk on known classes: $R^s(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}^t \cap \mathcal{Y}^s}\left[\ell(h(x),y)\right]$
  • Open-space risk: $R^{OS}(h) = \mathbb{E}_{x\sim\mathcal{D}^t \cap \mathcal{Y}^u}\left[I\bigl(h(x)\in\mathcal{Y}^s\bigr)\right]$

A generalization bound in OSDG (Open-Set Domain Generalization), as established in SeeCLIP, is
$$R^t(h) \leq \sum_{i=1}^M \pi_i^* R^i(h) + \frac{\gamma+\rho}{2} + \lambda + \pi^{unk} R^{OS}(h),$$
where $\pi^*$ minimizes the $\mathcal{H}$-divergence between the convex hull of source distributions and the target, $\gamma$ and $\rho$ represent domain shifts, $\lambda$ is the joint ideal risk, and $\pi^{unk} = \Pr(y^t \in \mathcal{Y}^u)$ is the target proportion of unknown-class samples (Wang et al., 21 Nov 2025). The core challenge is thus to balance structural risk (for known classes/domains) and open-space risk (for unknown classes under domain shift).
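Both risk terms admit simple plug-in estimates once a labelled, target-style evaluation split is available. The sketch below (numpy; the reject convention and names are illustrative assumptions, not taken from the cited papers) computes the empirical structural risk and open-space risk for a classifier that either predicts a known label or rejects a sample as unknown.

```python
import numpy as np

def empirical_risks(y_true, y_pred, unknown_label=-1):
    """Plug-in estimates of structural risk and open-space risk.

    y_true: ground-truth labels; unknown-class samples carry `unknown_label`.
    y_pred: model decisions; a known-class label, or `unknown_label` when the
            sample is rejected as unknown.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    known = y_true != unknown_label

    # Structural risk: 0-1 loss on target samples whose true label is a known class.
    structural_risk = float(np.mean(y_pred[known] != y_true[known]))

    # Open-space risk: fraction of unknown-class samples assigned to any known class.
    open_space_risk = float(np.mean(y_pred[~known] != unknown_label))

    return structural_risk, open_space_risk

# Toy usage: four known-class samples (classes 0-2) and three unknown samples.
y_true = [0, 1, 2, 2, -1, -1, -1]
y_pred = [0, 1, 1, 2, -1, 0, -1]
print(empirical_risks(y_true, y_pred))  # (0.25, 0.333...)
```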

2. Core Methodological Advances

Semantic-Aware Prompt Engineering and Fine-Grained Alignment

Modern ODG methods often leverage pretrained vision-language models (e.g., CLIP) and prompt learning with explicit integration of semantic features. For instance, SeeCLIP introduces a semantic-aware prompt enhancement module: from image patch embeddings $f_i$, a set of $K$ learnable query vectors $q^{(k)}$ extracts discriminative semantic tokens via attention pooling:
$$\omega_i^{(k)} = \frac{\exp\bigl(q^{(k)} \cdot f_i\bigr)}{\sum_{j=1}^N \exp\bigl(q^{(k)} \cdot f_j\bigr)}, \qquad v_{sem}^{(k)} = \sum_{i=1}^N \omega_i^{(k)} f_i.$$
These tokens are concatenated with domain tokens in the construction of class and unknown prompts, supporting granular, context-rich vision–language alignment (Wang et al., 21 Nov 2025).
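As a concrete illustration of the attention-pooling equations above, the following PyTorch sketch extracts $K$ semantic tokens from patch embeddings. It covers only this pooling step; the module name and initialization are assumptions rather than SeeCLIP's released code.

```python
import torch
import torch.nn as nn

class SemanticTokenPooling(nn.Module):
    """Extract K semantic tokens from N patch embeddings via attention pooling.

    Implements: w_i^(k) = softmax_i(q^(k) . f_i),  v_sem^(k) = sum_i w_i^(k) f_i
    """
    def __init__(self, dim: int, num_queries: int):
        super().__init__()
        # K learnable query vectors q^(k), one per semantic token.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) / dim ** 0.5)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, d) image patch embeddings f_i.
        scores = torch.einsum('kd,bnd->bkn', self.queries, patch_feats)   # q^(k) . f_i
        weights = scores.softmax(dim=-1)                                  # w_i^(k), sums to 1 over patches
        v_sem = torch.einsum('bkn,bnd->bkd', weights, patch_feats)        # weighted sum over patches
        return v_sem  # (B, K, d) semantic tokens, ready to concatenate with prompt tokens

# Example: 2 images, 49 patches of dimension 512, K = 4 semantic tokens.
pool = SemanticTokenPooling(dim=512, num_queries=4)
tokens = pool(torch.randn(2, 49, 512))
print(tokens.shape)  # torch.Size([2, 4, 512])
```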

Contrastive Losses and Open-Space Calibration

Explicit contrastive alignment losses are used to align image and prompt features, enforced both for known and unknown class prompts. Two contrastive mechanisms are prominent:

  • Repulsion: Pushing the unknown prompt embedding away from all known-class embeddings (margin-based).
  • Cohesion: Ensuring the unknown prompt is not arbitrarily far from the centroid of known prompt embeddings.

This duplex loss design directly tackles the trade-off between misclassifying unknowns as known (open-space risk) and over-rejecting ambiguous knowns (structural risk); ablation studies attribute substantial gains in accuracy and H-score to it (Wang et al., 21 Nov 2025).
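The repulsion/cohesion pair can be rendered as a small regularizer on the unknown-class prompt embedding. The PyTorch sketch below is schematic only: the margins, the cosine/Euclidean distance choices, and the function name are illustrative assumptions, not SeeCLIP's exact loss.

```python
import torch
import torch.nn.functional as F

def duplex_unknown_loss(unknown_emb, known_embs, repel_margin=0.5, cohere_margin=2.0):
    """Schematic repulsion + cohesion regularizer on an unknown-class prompt embedding.

    unknown_emb: (d,) embedding of the learnable 'unknown' prompt.
    known_embs:  (C, d) embeddings of the known-class prompts.
    """
    unknown_emb = F.normalize(unknown_emb, dim=-1)
    known_embs = F.normalize(known_embs, dim=-1)

    # Repulsion: keep the unknown prompt at least `repel_margin` away
    # (in cosine distance) from every known-class prompt.
    cos_sim = known_embs @ unknown_emb                      # (C,)
    repulsion = F.relu(cos_sim - (1.0 - repel_margin)).mean()

    # Cohesion: do not let the unknown prompt drift arbitrarily far from the
    # centroid of the known-class prompts (Euclidean distance with a slack margin).
    centroid = known_embs.mean(dim=0)
    cohesion = F.relu(torch.norm(unknown_emb - centroid) - cohere_margin)

    return repulsion + cohesion

# Toy usage with random prompt embeddings.
loss = duplex_unknown_loss(torch.randn(512), torch.randn(10, 512))
print(float(loss))
```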

Diffusion-Driven and Generative Augmentation for Hard Negatives

Instead of random or ad-hoc mixup, several leading ODG approaches synthesize hard pseudo-unknowns near class boundaries using generative models:

  • SeeCLIP perturbs semantic tokens and samples from a diffusion network conditioned on dual prompts (“a [domain] image of an unknown class” and negative lists of known classes).
  • ODG-CLIP and OSLoPrompt generate proxy images for the unknown class using Stable Diffusion, with text prompts designed to exclude known-class semantics and include visually similar “fine-grained” unknowns (Wang et al., 21 Nov 2025, Singha et al., 31 Mar 2024, C et al., 20 Mar 2025).

These synthetic samples are essential for learning tight, operationally meaningful known/unknown decision boundaries.
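As a rough illustration of the dual-prompt idea, the sketch below uses the Hugging Face diffusers library to sample proxy unknown-class images while steering away from known-class semantics via the negative prompt. The checkpoint id, prompt templates, and PACS-style class list are assumptions; the conditioning used by the cited methods differs in detail.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical setup: known classes and domains for a PACS-like benchmark.
known_classes = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]
domains = ["photo", "sketch", "cartoon", "art painting"]

# Checkpoint id is a placeholder; any Stable Diffusion checkpoint works here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate_pseudo_unknowns(domain: str, n_images: int = 4):
    """Sample proxy 'unknown-class' images for one domain.

    The positive prompt asks for an object of an unspecified class in the target
    style; the negative prompt lists all known classes so sampling is steered
    away from their semantics (a rough rendering of the dual-prompt idea).
    """
    prompt = f"a {domain} image of an unidentifiable, unusual object, unknown category"
    negative_prompt = ", ".join(known_classes)
    out = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_images_per_prompt=n_images,
        num_inference_steps=30,
        guidance_scale=7.5,
    )
    return out.images  # list of PIL images, to be labelled as the unknown class

pseudo_unknowns = generate_pseudo_unknowns("sketch")
```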

3. Meta-Learning, Scheduling, and Robustness

ODG studies have established the value of meta-learning frameworks that simulate domain shift and category shift at training time:

  • Bi-level meta-learning objectives partition source domains and classes into meta-train/test splits, updating parameters to maximize generalization across domains and classes.
  • The Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) adaptively prioritizes the hardest domains (by lowest reliability as measured by evidential confidence), focusing meta-updates where domain and open-set risk are most severe (Peng et al., 26 Sep 2024).
  • Dualistic gradient-matching, as in MEDIC, aligns both domain and class gradients to find parameter regions that yield balanced boundaries—critical for effective open-set rejection (Wang et al., 2023).
  • Reliability-aware, evidential approaches quantify uncertainty and improve rejection of unknowns by leveraging Dirichlet-based evidence modeling (Peng et al., 26 Sep 2024, Khoshbakht et al., 11 Jun 2025).

Ablation and visualization studies demonstrate that hardest-domain scheduling and evidential regularization yield more compact known-class clusters, clearer separation of unknowns, and superior open-set metrics (e.g., OSCR, H-score).
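A generic rendering of reliability-aware scheduling is sketched below: per-domain uncertainty is computed from Dirichlet evidence (standard evidential deep learning), and the least reliable domain is routed to the meta-test split of the next episode. This illustrates the general idea only; it is not the exact EBiL-HaDS algorithm, and the loader/model interfaces are assumptions.

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Evidential uncertainty from classifier logits (Dirichlet evidence model).

    evidence = softplus(logits), alpha = evidence + 1,
    uncertainty = K / sum(alpha); values in (0, 1], higher = less reliable.
    """
    evidence = F.softplus(logits)
    alpha = evidence + 1.0
    num_classes = logits.shape[-1]
    return num_classes / alpha.sum(dim=-1)

def pick_hardest_domain(model, held_out_loaders):
    """Return the source domain with the lowest evidential reliability.

    held_out_loaders: dict mapping domain name -> iterable of (inputs, labels)
    drawn from that domain's held-out split.
    """
    scores = {}
    for name, loader in held_out_loaders.items():
        uncertainties = []
        with torch.no_grad():
            for x, _ in loader:
                uncertainties.append(dirichlet_uncertainty(model(x)))
        scores[name] = torch.cat(uncertainties).mean().item()
    # Highest mean uncertainty == hardest / least reliable domain.
    return max(scores, key=scores.get)

# Episodic (bi-level) use, schematically:
#   hardest    = pick_hardest_domain(model, held_out_loaders)
#   meta_test  = {hardest}
#   meta_train = set(held_out_loaders) - meta_test
#   ... inner update on meta_train, outer update evaluated on meta_test ...
```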

4. Extensions: Label Noise, Multimodality, and Specialized Settings

Noisy Labels

OSDG under noisy labels (OSDG-NL) requires frameworks that detect and correct mislabeled samples while maintaining open-set generalization:

  • HyProMeta introduces hyperbolic prototype learning for robust label-noise detection/correction and prompt-based agnostic augmentation to construct open/out-of-distribution examples (Peng et al., 24 Dec 2024).
  • EReLiFM further adds two-stage evidential loss clustering for obtaining clean/noisy splits, and domain-category-conditioned residual flow matching to synthesize diverse, reliability-aware augmentations (Peng et al., 14 Oct 2025).

Empirically, these frameworks maintain competitive or superior open-set metrics at high label noise rates compared to previous methods.
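A common loss-based heuristic for the clean/noisy partition step is to fit a two-component Gaussian mixture to per-sample training losses (the small-loss criterion). The sketch below shows that generic stand-in; it is not EReLiFM's evidential clustering or HyProMeta's hyperbolic prototype procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_losses, clean_prob_threshold=0.5):
    """Partition samples into 'probably clean' and 'probably noisy' index sets.

    Fits a two-component 1-D Gaussian mixture to per-sample training losses and
    treats the low-loss component as clean (small-loss criterion).
    """
    losses = np.asarray(per_sample_losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))      # low-mean component = clean
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    clean_idx = np.where(p_clean >= clean_prob_threshold)[0]
    noisy_idx = np.where(p_clean < clean_prob_threshold)[0]
    return clean_idx, noisy_idx

# Toy usage: mislabelled samples tend to incur larger losses.
losses = np.concatenate([np.random.gamma(2.0, 0.1, 900),    # mostly clean
                         np.random.gamma(2.0, 1.5, 100)])   # likely noisy
clean_idx, noisy_idx = split_clean_noisy(losses)
print(len(clean_idx), len(noisy_idx))
```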

Single-Source and Multimodal ODG

  • Domain expansion plus boundary growth (DEBUG, SODG-Net) addresses single-source ODG using style and background augmentation combined with margin-based and edge-based classifier expansions (Jiao et al., 5 Nov 2024, Bele et al., 2023); a generic style-perturbation sketch appears after this list.
  • Multimodal extensions (MOOSA) learn joint representations via cross-modal self-supervised pretext tasks (masked cross-modal translation, jigsaw puzzles), combined with entropy-based modality weighting for improved open-set and cross-domain transfer (Dong et al., 1 Jul 2024).
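One widely used realization of feature-level style augmentation is to perturb per-channel feature statistics within a batch (MixStyle-style mixing of means and standard deviations). The sketch below illustrates that idea as a stand-in; it is not the specific DEBUG or SODG-Net procedure.

```python
import torch

def style_perturb(features: torch.Tensor, alpha: float = 0.1, eps: float = 1e-6):
    """MixStyle-like augmentation: mix per-instance channel statistics across a batch.

    features: (B, C, H, W) intermediate CNN feature maps from the single source domain.
    Returns features whose per-channel mean/std are interpolated with those of a
    randomly permuted batch, synthesising novel 'styles' without new data.
    """
    B = features.size(0)
    mu = features.mean(dim=(2, 3), keepdim=True)
    sigma = features.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    normalized = (features - mu) / sigma

    perm = torch.randperm(B)
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return normalized * sigma_mix + mu_mix

# Example on random feature maps.
aug = style_perturb(torch.randn(8, 64, 14, 14))
print(aug.shape)  # torch.Size([8, 64, 14, 14])
```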

Simpler Baselines

Recent studies confirm that even simple DG methods (CORAL, MMD) extended with Dirichlet mixup and basic ensemble strategies are strong ODG baselines, achieving performance close to meta-learning methods like DAML but at lower computational cost (Noguchi et al., 2023).
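Dirichlet mixup generalizes pairwise mixup by drawing mixing weights for several samples, e.g., one per source domain, from a Dirichlet distribution. The sketch below is a minimal illustration under that reading and does not reproduce the cited baseline study's exact recipe.

```python
import torch

def dirichlet_mixup(xs, ys, num_classes, alpha=1.0):
    """Mix one sample per source domain with Dirichlet-distributed weights.

    xs: list of M tensors with identical shape (one input per source domain).
    ys: list of M integer class labels.
    Returns the mixed input and the correspondingly mixed soft label.
    """
    M = len(xs)
    weights = torch.distributions.Dirichlet(torch.full((M,), alpha)).sample()  # (M,), sums to 1
    x_mix = sum(w * x for w, x in zip(weights, xs))
    y_soft = torch.zeros(num_classes)
    for w, y in zip(weights, ys):
        y_soft[y] += w
    return x_mix, y_soft

# Toy usage: three source domains, 10 known classes.
xs = [torch.randn(3, 224, 224) for _ in range(3)]
ys = [2, 7, 2]
x_mix, y_soft = dirichlet_mixup(xs, ys, num_classes=10)
print(x_mix.shape, y_soft)
```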

5. Empirical Evaluation: Protocols and Benchmarks

ODG performance is typically measured using:

  • Closed-set (known-class) accuracy (ACC)
  • Open-set accuracy (ACC for unknown samples, or open recall)
  • H-score: the harmonic mean $H = \dfrac{2 \cdot \mathrm{ACC}_{\mathrm{known}} \cdot \mathrm{ACC}_{\mathrm{unknown}}}{\mathrm{ACC}_{\mathrm{known}} + \mathrm{ACC}_{\mathrm{unknown}}}$
  • OSCR: open-set classification rate (ROC-style, threshold independent)
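Both H-score and a threshold-sweep approximation of OSCR can be computed from per-sample confidences and predictions, as sketched below. Using maximum softmax confidence as the known-vs-unknown score is an assumption; published OSCR implementations may differ in details.

```python
import numpy as np

def h_score(acc_known: float, acc_unknown: float) -> float:
    """Harmonic mean of known-class accuracy and unknown-class accuracy (rejection rate)."""
    if acc_known + acc_unknown == 0:
        return 0.0
    return 2 * acc_known * acc_unknown / (acc_known + acc_unknown)

def oscr(conf, pred, y_true, unknown_label=-1, num_thresholds=200):
    """Threshold-sweep approximation of the Open-Set Classification Rate.

    conf:   per-sample confidence that the sample belongs to a known class.
    pred:   predicted known-class label for every sample.
    y_true: ground truth; unknown samples carry `unknown_label`.
    OSCR is the area under the curve of correct-classification rate on knowns (CCR)
    versus false-positive rate on unknowns (FPR) as the acceptance threshold varies.
    """
    conf, pred, y_true = map(np.asarray, (conf, pred, y_true))
    known = y_true != unknown_label
    ccr, fpr = [], []
    for t in np.linspace(conf.min(), conf.max(), num_thresholds):
        accepted = conf >= t
        ccr.append(np.mean(accepted[known] & (pred[known] == y_true[known])))
        fpr.append(np.mean(accepted[~known]))
    # Integrate CCR over FPR (sorted by FPR) with the trapezoidal rule.
    order = np.argsort(fpr)
    return float(np.trapz(np.array(ccr)[order], np.array(fpr)[order]))

print(h_score(0.9, 0.8))  # ~0.847
```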

Standard protocols are leave-one-domain-out or leave-one-task-out, using datasets such as Office-Home (65 classes, 4 domains), PACS (7 classes, 4 domains), VLCS (5 classes), Mini-DomainNet (126 classes), Multi-Dataset, and variants for label noise and open/closed splits (Wang et al., 21 Nov 2025, Singha et al., 31 Mar 2024, Noguchi et al., 2023).

SeeCLIP reports state-of-the-art ODG results: across five benchmarks, ACC ≈ 97.05% and H-score ≈ 95.66%, a gain of about +3% ACC and +5% H-score over ODG-CLIP and up to +28% over earlier CNN/DG baselines. Improvements in discriminating subtle unknowns are particularly marked on fine-grained benchmarks (Wang et al., 21 Nov 2025).

6. Key Challenges and Open Directions

Key limitations and challenges identified in the contemporary ODG literature include:

  • The trade-off between rejecting unknowns that closely resemble known classes ("hard unknowns") and correctly classifying ambiguous known-class samples; fine-grained semantic modeling and synthetic negative construction are vital here but computationally intensive.
  • The need for adaptive, data-driven scheduler designs over rigid or random meta-learning splits.
  • Handling severe label noise and category imbalance, especially when few clean samples exist—necessitating robust prototype and meta-learning strategies (Peng et al., 24 Dec 2024, Peng et al., 14 Oct 2025).
  • Extension to other modalities (video, audio), more difficult open-world tasks (segmentation, detection), and single-source regimes (Dong et al., 1 Jul 2024, Jiao et al., 5 Nov 2024).
  • Addressing the limitations of pre-trained encoders and prompt designs; research continues into more sophisticated and adaptive semantic tokenizations and generative augmentations (Wang et al., 21 Nov 2025, Singha et al., 31 Mar 2024).

7. Theoretical Guarantees and Historical Context

PAC-style theory establishes that domain generalization (with potential support and label-shifts) is feasible under mild structural assumptions, given polynomially many source domains and per-domain samples. These include domain-wise Massart noise reduction, efficient decision-tree assembly from per-domain supports, and robust feature selection using domain stability (Garg et al., 2020). Although these results justify the plausibility of ODG under certain conditions, bridging from such formalism to high-dimensional, open-world neural architectures remains an ongoing challenge.


In summary, Open Domain Generalization constitutes a rapidly evolving intersection of domain generalization, open-set recognition, and meta-learning. Recent advances—semantic-enhanced prompt architectures, adaptive meta-training schedules, evidential risk quantification, and strong generative augmentation—define the state of the art, yet significant theoretical and practical frontiers remain open for robust, scalable ODG in ever more realistic settings (Wang et al., 21 Nov 2025, Peng et al., 26 Sep 2024, Peng et al., 24 Dec 2024, C et al., 20 Mar 2025, Singha et al., 31 Mar 2024, Bele et al., 2023).
