
Info Theoretic Open Set Recognition

Updated 20 October 2025
  • Information Theoretic Open Set Recognition is a field that harnesses entropy, mutual information, and KL divergence to reliably distinguish known from unknown samples.
  • It formulates open-space risk by regulating classifier certainty through the Information Bottleneck, promoting robust rejection of unsupported data.
  • Practical implementations, including entropy-regularized deep models and contrastive methods, support adaptive learning and continual system improvements.

Information Theoretic Open Set Recognition (OSR) is the study and design of machine learning systems that explicitly quantify, manage, and minimize the risk of misclassifying unknown samples (those outside the closed set of training classes) by leveraging the mathematical principles of information theory. Core to this research is the use of entropy, mutual information, and information-theoretic divergences (notably Kullback–Leibler (KL) divergence) to formalize and regulate uncertainty, risk, and knowledge transfer in open-world recognition and adaptive learning systems (Wang, 17 Oct 2025).

1. Core Information Theoretic Quantities in OSR

Information Theoretic OSR is fundamentally grounded in three mathematical constructs:

  • Entropy (H): Captures the uncertainty in predictions for a given sample. In OSR, entropy is used to measure the classifier’s confidence. Low entropy in output distributions signals certainty (typical for known classes), while high entropy is expected for unfamiliar or unknown samples:

H(Y|X) = -\sum_y P(Y=y|X) \log P(Y=y|X)

  • Mutual Information (I): Quantifies the shared information between input representations and target labels. The Information Bottleneck (IB) objective, I(Z; X) - \beta I(Z; Y), is adapted for OSR by maximizing I(Z; Y_known) for discrimination while minimizing information leakage into I(Z; Y_unknown) to suppress overconfident responses to unknowns.
  • Kullback–Leibler (KL) Divergence: Measures how the model’s beliefs deviate from priors or from distributions supported by known data:

D_{\mathrm{KL}}(P \parallel Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}

In OSR, KL terms arise in latent representation regularization (as in variational methods), in novelty/outlier detection, and in bounding distributional drift during continual learning.

When combined in the learning objective, these constructs yield a controllable trade-off:

\min\; I(Z; X) - \beta I(Z; Y_{\text{known}}) + \gamma I(Z; Y_{\text{unknown}})

where the last term penalizes unwanted information about unknowns leaking into the latent space (Wang, 17 Oct 2025).
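
In practice these mutual-information terms are not computed exactly; they are optimized through standard variational surrogates. The sketch below, assuming a Gaussian encoder, a softmax classifier head, and access to auxiliary outlier samples (all illustrative assumptions rather than components prescribed by the cited work), shows one plausible PyTorch instantiation of the three-term objective.

```python
import torch
import torch.nn.functional as F

def ib_osr_loss(mu, logvar, logits_known, logits_unknown, labels,
                beta=1.0, gamma=0.1):
    """Variational surrogate for  min I(Z;X) - beta*I(Z;Y_known) + gamma*I(Z;Y_unknown).

    mu, logvar     : Gaussian encoder q(z|x) parameters for known-class inputs
    logits_known   : classifier outputs on known-class inputs
    logits_unknown : classifier outputs on auxiliary/outlier inputs (assumed available)
    labels         : ground-truth labels for the known-class inputs
    """
    # I(Z;X) upper bound: KL( q(z|x) || N(0, I) ), averaged over the batch.
    kl_zx = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))

    # -I(Z;Y_known) surrogate: cross-entropy of the label decoder (MI lower bound up to a constant).
    ce_known = F.cross_entropy(logits_known, labels)

    # I(Z;Y_unknown) surrogate: negative predictive entropy on outlier inputs,
    # so minimizing the total loss pushes unknowns toward uniform, high-entropy outputs.
    p_unknown = F.softmax(logits_unknown, dim=1)
    ent_unknown = -torch.sum(p_unknown * torch.log(p_unknown + 1e-12), dim=1).mean()

    return kl_zx + beta * ce_known - gamma * ent_unknown
```

The coefficients beta and gamma play the same roles as in the objective above: beta trades compression against discriminability on known classes, while gamma controls how strongly confident responses to outliers are suppressed.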

2. Information Theoretic Formulations of Open Set Risk

Information-theoretic OSR extends classical risk decompositions by integrating explicit penalization of “open space” errors via entropy and mutual information. The risk of making confident predictions in unsupported regions (open-space risk) is closely linked to the model’s uncertainty profile—effectively, its entropy in these regions.

Key frameworks described in the literature include:

  • OSR via Information Bottleneck: Retain high mutual information for known labels, and force low mutual information or high entropy for unknowns. This simultaneous alignment and suppression creates a margin in the representation space (Sun et al., 2021).
  • Open-space Risk Theoretic Formulation:

R_{open}(f) \leq \hat{R}_{known}(f) + \sqrt{\frac{2\, I(Z; X_{known})}{n}} + \gamma\, D_{\mathrm{KL}}(P_{unknown} \Vert P_{known})

This structure bounds expected risk on unknown data by penalizing mutual information in regions not supported by the training data and using distributional divergence as measured by KL to capture deviation caused by unknowns (Wang, 17 Oct 2025).

  • Information-driven Outlier Detection: Models maximize entropy or minimize confidence for inputs with low I(Z; Y_known), i.e., those that cannot be well explained by the support of the known-class representations (a minimal thresholding sketch follows this list).
  • Information Bottleneck in Continual Learning: Ensuring robust retention of prior knowledge via mutual information constraints:

\max I(Z_t; Z_{t-1}) \quad \text{and} \quad \max I(Z_t; Y_t) - \lambda\, D_{\mathrm{KL}}(P(Z_t) \Vert P(Z_{t-1}))

This guards against catastrophic forgetting and anchors adaptation in the information-theoretic regime.
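
As referenced in the outlier-detection item above, the simplest operational consequence of these formulations is an entropy-thresholded reject rule. The following minimal NumPy sketch (the threshold tau and the reject label -1 are illustrative choices, not prescribed by the cited works) scores each input by its predictive entropy and rejects it as unknown when that entropy is too high.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy H(Y|x) for each row of a (batch, num_classes) probability matrix."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def open_set_predict(probs, tau):
    """Entropy-thresholded decision rule: return the argmax known class when H(Y|x) <= tau,
    otherwise reject the sample as unknown (label -1).  In practice tau would be calibrated
    on held-out known-class data, e.g. a high percentile of their entropies."""
    entropy = predictive_entropy(probs)
    preds = np.argmax(probs, axis=1)
    preds[entropy > tau] = -1
    return preds
```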

3. Algorithmic Instantiations and Practical Implementations

A variety of OSR algorithmic frameworks explicitly or implicitly encode information-theoretic principles:

  • Maximum Mutual Information Feature Extractors: Methods such as M2IOSR maximize I(X; Z) at both global and multi-scale local levels while imposing class-conditionality in latent encodings, forcing low entropy outside known clusters and enhancing class separation via mutual information maximization and KL regularization to Gaussian priors (Sun et al., 2021).
  • Entropy-regularized Deep Models: Approaches like OpenMax and contrastive loss variants explicitly adjust output entropy such that known samples yield confident (low entropy) outputs, and samples far from known clusters are assigned high-entropy (uncertain) predictions, promoting robust rejection (Wang, 17 Oct 2025).
  • Contrastive and Generative Models with Information Constraints: Dual contrastive learning and supervised contrastive formulations (e.g., as in DCTAU) shape representations to maximize inter-class KL divergence and mutual information, increasing entropy for outliers and enhancing margin-based rejection boundaries (Li et al., 31 Jan 2024, Xu, 16 Apr 2024).
  • Distance-based and Nearest Centroid Classifiers: Several frameworks draw on information-theoretic arguments for minimum “open space risk,” using density estimation in latent/projection spaces to tie entropy and KL divergence directly to rejection thresholds (Cao et al., 2020).

These methods either directly optimize information-theoretic quantities or are supported by analytical/theoretical results linking their loss structures and thresholds to entropy, mutual information, or distributional divergence.
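
As one concrete illustration of the distance-based family above, the sketch below implements a nearest-centroid classifier with a reject option: samples whose latent embeddings are far from every known-class centroid are treated as open space. The Euclidean metric and the radius threshold are illustrative assumptions, not the specific constructions of the cited works.

```python
import numpy as np

def fit_centroids(z_train, y_train, num_classes):
    """Per-class centroids of training embeddings (z_train: (N, d), y_train: (N,))."""
    return np.stack([z_train[y_train == c].mean(axis=0) for c in range(num_classes)])

def centroid_open_set_predict(z_test, centroids, radius):
    """Nearest-centroid rule with rejection: a sample whose distance to every centroid
    exceeds `radius` is labelled -1 (unknown).  `radius` would typically be set from a
    high quantile of known-class distances on a validation split."""
    dists = np.linalg.norm(z_test[:, None, :] - centroids[None, :, :], axis=2)
    preds = dists.argmin(axis=1)
    preds[dists.min(axis=1) > radius] = -1
    return preds
```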

4. Theoretical Connections to Generalization and Risk Bounds

Recent work explores the alignment of information-theoretic OSR objectives with provable bounds:

  • PAC-Bayes Generalization: Mutual information between the learned parameters and the training data bounds the generalization error, and entropy and mutual information minimization are linked directly to the expected excess risk in open-world scenarios.

\mathbb{E}\left[ R(f_W) - \hat{R}(f_W) \right] \leq \sqrt{ \frac{2\, I(W; D) }{n} }

By minimizing mutual information through the information bottleneck and maximizing entropy for unknowns, the model tightens this bound and can achieve provable safety criteria under open-world conditions (Wang, 17 Oct 2025); a small numeric evaluation of the bound follows this list.

  • Open-space Information Risk: The notion of “information risk” is being developed as a theoretical criterion, measuring the expected information leakage or error from open classes, via mutual information and KL divergence. It is conjectured that this form of risk could be regulated to guarantee safe rejection and adaptive learning across nonstationary environments (Wang, 17 Oct 2025).
  • Boundaries and Causal Information Flow: Recent theory integrates causal information flow and energy-based models, with information-theoretic concepts governing both the formation of stable boundaries for known classes and adaptive updating when novel classes or causal factors are detected (Wang, 17 Oct 2025).
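
As promised above, here is a small numeric check of the PAC-Bayes-style bound; the values of I(W; D) and n are arbitrary illustrative choices.

```python
import math

def mi_generalization_gap(mutual_info_nats, n):
    """Information-theoretic bound sqrt(2*I(W;D)/n) on the expected generalization gap."""
    return math.sqrt(2.0 * mutual_info_nats / n)

# Illustrative values: a model retaining I(W;D) = 10 nats about n = 50,000 training samples
# has its expected gap bounded by sqrt(2*10/50000) = 0.02.
print(mi_generalization_gap(10.0, 50_000))  # 0.02
```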

5. Interplay with Novelty Detection and Continual Learning

Information-theoretic OSR naturally extends into tasks of novelty detection and continual learning:

  • Novelty Discovery: Novelty detection is formulated as maximizing the information gain (reduction in entropy) when incorporating previously unseen data:

\max_\theta \mathbb{E}_{x \in \mathcal{D}_{unknown}} \left[ H(P_{prior}(Y)) - H(P_\theta(Y|X)) \right]

This explicitly encodes the process of knowledge acquisition as information-theoretic learning.

  • Continual/Incremental Learning: By maximizing mutual information between successive latent representations, models can simultaneously acquire new knowledge and retain prior information, avoiding “catastrophic forgetting.” KL penalties constrain deviation from prior representations, further ensuring stable adaptation.

The convergence of these principles creates a pathway toward information-retentive, risk-aware, and self-adaptive open-world learning systems.
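
The information-gain criterion in the novelty-discovery objective above can be estimated per sample once a label prior is fixed. The sketch below assumes a uniform prior over the current known classes (an illustrative choice); one plausible use is to flag samples whose gain remains low, i.e. whose residual predictive entropy stays high, as candidates for novel classes to be incorporated into the label set.

```python
import numpy as np

def information_gain(probs, num_classes):
    """Per-sample gain H(P_prior(Y)) - H(P_theta(Y|x)) under a uniform prior (in nats).

    probs: (batch, num_classes) posterior predictive probabilities P_theta(Y|x).
    """
    prior_entropy = np.log(num_classes)  # entropy of the uniform prior over known labels
    posterior_entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return prior_entropy - posterior_entropy
```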

6. Current Challenges and Future Research Directions

Several open problems mark the frontier of information-theoretic OSR:

  • Quantifiable Information Risk: There is an emerging need to define and measure “open information risk,” formalizing the effect of unknown data distributions on open set classification error and providing guidance for robust model rejection thresholds.
  • Dynamic Mutual Information Bounds: Current information-theoretic bounds are largely static. Advanced research is investigating temporal mutual information and dynamic entropy regulation to govern learning in nonstationary, evolving environments, especially crucial for real-world continual and open-world learning (Wang, 17 Oct 2025).
  • Multimodal Fusion and Causal Integration: The integration of information-theoretic objectives across multimodal (image, text, sensor) streams, as well as their fusion with causal inference frameworks, is considered essential for provably interpretable and safe open-world learning, particularly when encountering complex or adversarial novelty.
  • Self-Adaptive, Information-Aware Agents: A future avenue is the design of learning agents that actively monitor and regulate their own internal “information state,” dynamically adjusting exploration and retention in response to changing information boundaries and environmental uncertainty.

Advances in these directions are expected to unify the information-theoretic analysis and regulation of open set recognition, novelty detection, and continual learning, leading toward theoretically robust, self-adaptive machine intelligence.


This overview details the foundational role of information theory in the design, analysis, and future advancement of open set recognition in open-world learning systems, as synthesized from (Wang, 17 Oct 2025).
