SAMOSA: Sharpness-Aware Open Set Active Learning
- The paper introduces a dual-model approach, training one model with SGD and one with SAM, to surface atypical samples by quantifying how the two models' losses diverge near decision boundaries.
- SAMOSA ranks unlabeled samples by the L1 discrepancy between the two models' predicted probabilities (the SAMIS-P score), effectively identifying decision-boundary-adjacent points and outperforming traditional uncertainty sampling methods.
- Empirical results demonstrate up to a 3% accuracy boost on CIFAR datasets, validating its robustness and enhanced generalization in open set conditions.
Sharpness-Aware Minimization for Open Set Active Learning (SAMOSA) is a querying strategy designed to address the challenges of sample selection in open set scenarios, where the large pool of unlabeled data contains both relevant (target) and irrelevant (unknown) classes. By leveraging the theoretical properties of Sharpness-Aware Minimization (SAM), SAMOSA actively identifies atypical, decision-boundary-adjacent samples that maximize the informativeness of the annotation process and improve generalization to both in-distribution and out-of-distribution data.
1. Theoretical Foundations
SAMOSA is built upon recent theoretical findings concerning the differential generalization effects of Stochastic Gradient Descent (SGD) and SAM with respect to sample typicality. In this context:
- Typical samples (high signal, common class instances) produce similar generalization error for both SGD- and SAM-trained models.
- Atypical samples (low signal, rare or confusing class instances) see diverging generalization errors, with SAM substantially outperforming SGD.
The paper presents an informal theorem: the generalization error of SAM- and SGD-trained models is comparable on typical samples but diverges on atypical samples, where SAM achieves substantially lower error. This quantifies the advantage of sharpness-aware updates for boundary and rare examples. SAMOSA operationalizes this by using the discrepancy in model outputs as a proxy for atypicality.
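Schematically, this dichotomy can be written as follows (our notation, not the paper's exact statement; $\mathrm{err}$ denotes a model's expected generalization error on sample $x$):

```latex
% Schematic SGD/SAM generalization dichotomy (illustrative notation)
\[
\Delta(x) := \mathrm{err}_{\mathrm{SGD}}(x) - \mathrm{err}_{\mathrm{SAM}}(x),
\qquad
\Delta(x) \approx 0 \ \text{for typical } x,
\qquad
\Delta(x) \gg 0 \ \text{for atypical } x.
\]
```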
2. Algorithmic Structure
The SAMOSA algorithm proceeds in iterative query rounds:
- Dual Model Training: On the labeled sample pool, two models are maintained (a minimal SAM update sketch follows this list):
  - $f_{\mathrm{SGD}}$: trained with vanilla stochastic gradient descent.
  - $f_{\mathrm{SAM}}$: trained with sharpness-aware minimization.
- Atypicality Scoring: For each candidate sample $x$ in the unlabeled pool, SAMOSA computes the SAMIS-P score $\mathrm{SAMIS\text{-}P}(x) = \lVert p_{\mathrm{SGD}}(x) - p_{\mathrm{SAM}}(x) \rVert_1$, the L1 discrepancy between the two models' predicted class probabilities, with higher scores indicating samples near the model's ambiguity regions.
- Open Set Filtering: Before ranking, a $(K+1)$-class classifier ("distinguisher") is deployed to identify and exclude samples predicted to belong to unknown or irrelevant classes (the $(K+1)$-th output).
- Sample Selection: Samples are sorted in descending order of their SAMIS-P scores; the top-scoring samples, up to the per-round annotation budget, are selected for annotation, directly prioritizing atypical, boundary-adjacent points. This process is repeated for a fixed number of query rounds (see the query-round sketch below).
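To ground the dual-model component, here is a minimal sketch of the standard two-step SAM update (per Foret et al.'s formulation), as it could be used to train the SAM model; `sam_step`, `rho`, and the surrounding names are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of one SAM training step (standard two-step scheme).
# Assumes a PyTorch model, a differentiable loss, and a base optimizer
# (e.g., torch.optim.SGD); `rho` is the neighborhood radius.
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    # Step 1: ascend to an adversarial weight perturbation of norm rho.
    loss_fn(model(x), y).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            e = rho * p.grad / (grad_norm + 1e-12) if p.grad is not None else None
            if e is not None:
                p.add_(e)  # move weights toward higher loss
            eps.append(e)
    model.zero_grad()

    # Step 2: gradient at the perturbed point, then undo the perturbation
    # and apply the base optimizer step with that sharpness-aware gradient.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)  # restore original weights
    base_opt.step()
    base_opt.zero_grad()
```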
Optional variants include SAMOSA-L, which instead selects low-scoring (typical) examples to expand coverage of valid known classes, maximizing recall.
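Putting the steps together, one SAMOSA query round might look like the following PyTorch-style sketch; `samis_p`, `select_queries`, `distinguisher`, and `budget` are hypothetical names illustrating the procedure above, not the authors' implementation.

```python
# Hypothetical sketch of one SAMOSA query round: score, filter, select.
import torch
import torch.nn.functional as F

@torch.no_grad()
def samis_p(model_sgd, model_sam, x):
    """SAMIS-P: L1 discrepancy between the two models' predicted probabilities."""
    p_sgd = F.softmax(model_sgd(x), dim=-1)
    p_sam = F.softmax(model_sam(x), dim=-1)
    return (p_sgd - p_sam).abs().sum(dim=-1)  # higher => more atypical

@torch.no_grad()
def select_queries(model_sgd, model_sam, distinguisher, unlabeled_x, budget, num_known):
    # Open set filtering: drop samples the (K+1)-class distinguisher
    # assigns to its last ("unknown") output.
    preds = distinguisher(unlabeled_x).argmax(dim=-1)
    known_mask = preds < num_known
    idx_known = known_mask.nonzero(as_tuple=True)[0]

    # Rank the remaining candidates by SAMIS-P, descending, and query
    # the top-`budget` samples. (SAMOSA-L would sort ascending instead.)
    scores = samis_p(model_sgd, model_sam, unlabeled_x[known_mask])
    top = scores.argsort(descending=True)[:budget]
    return idx_known[top]  # pool indices to send for annotation
```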
3. Informativeness and Decision Boundary Localization
Empirical results confirm that the high-SAMIS-P samples selected by SAMOSA populate regions near the model's decision boundaries within the embedding manifold. These samples are inherently more informative for improving the discrimination power of the learned classifier, especially under open set conditions where ambiguity is compounded by unknown class interference.
This approach contrasts strongly with standard uncertainty-based active learning, which may overselect redundant samples from well-understood regions, and open set strategies that rely solely on unknown-class filtering.
4. Performance Analysis
SAMOSA demonstrates clear empirical benefits:
- Classification Accuracy: Up to 3% improvement over contemporary open set active learning baselines (e.g., EOAL, MQNet) on CIFAR10, CIFAR100, and TinyImageNet with mismatch ratios up to 40%.
- Sample Effectiveness: By targeting atypical samples, SAMOSA improves generalization and robustness over baselines that optimize only precision (the fraction of known-class samples among those queried).
- Computational Overhead: SAMOSA's resource requirements are comparable to leading open set active learning methods; running times on CIFAR100 match those of EOAL and are shorter than MQNet's, despite the dual-model paradigm.
- Decision Boundary Coverage: The sampled data points are well-distributed along critical regions for classifier refinement—particularly areas likely to impact out-of-distribution recognition.
5. Comparison to Related Approaches
SAMOSA distinguishes itself from prior strategies by explicitly exploiting the differential generalization performance between SGD and SAM, rather than relying on entropy or margin sampling. Notably:
- Other methods may increase precision by focusing on easy (typical) samples, but SAMOSA demonstrates superior end-to-end effectiveness by querying harder, more ambiguous regions.
- The SAMIS-P metric is theoretically justified by the loss gap observed in learning atypical samples.
- Dual-model infrastructure allows SAMOSA to adaptively balance between precision and informativeness, as further evidenced by the introduction of the SAMOSA-L variant for specialized situations.
6. Significance for Open Set Active Learning
SAMOSA operates under the realistic regime where annotation budgets are limited and class coverage is incomplete:
- Filtering out unknown-class data prevents annotation resources from being wasted on irrelevant samples.
- By targeting atypical, decision-boundary-adjacent samples, the annotated set achieves higher information density—improving classifier robustness against both in-distribution and open set (out-of-distribution) challenge samples.
- The theoretical basis ensures that the strategy targets samples where SAM generalizes much better than SGD, optimizing the impact of each labeling round.
A plausible implication is that SAMOSA may be particularly suitable when downstream deployments face frequent class shifts, label noise, or distributional drift, given its capacity to proactively bolster boundary understanding and generalize reliably.
7. Practical Considerations and Future Directions
While SAMOSA has minimal computational overhead relative to leading baselines, maintaining two models may introduce system complexity. However, this design is essential for extracting atypicality signals via the SAMIS-P metric.
Extensions may include:
- Integration of SAMOSA with adaptive or hybrid scoring mechanisms, leveraging calibration measures (Tan et al., 29 May 2025) or variance suppression (Li et al., 2023).
- Applying switching schemes to prevent convergence to hallucinated minimizers, as discussed in recent analysis (Park et al., 26 Sep 2025).
- Non-gradient-based variants (e.g., ZEST (Gong et al., 17 Oct 2025)) for resource-constrained or privacy-sensitive scenarios.
In summary, SAMOSA is an open set active learning strategy grounded in sharpness-aware generalization theory, with demonstrated accuracy gains, a principled informativeness measure, and practical scalability for large, heterogeneous unlabeled pools.