Sharpness-Aware & Reliable Entropy Minimization (SAR)

Updated 5 May 2026
  • SAR is a training and adaptation paradigm that integrates sharpness-aware minimization with entropy-based corrections to enhance generalization and confidence calibration.
  • The method employs a min-max optimization approach to encourage convergence to flat minima while explicitly penalizing overconfident predictions through calibrated entropy regularization.
  • Test-time adaptation with SAR² and additional feature regularizers mitigates representation collapse and stabilizes performance under dynamic, shifted data distributions.

Sharpness-Aware and Reliable Entropy Minimization (SAR) refers to a set of training and adaptation paradigms that combine the generalization benefits of sharpness-aware minimization (SAM) with explicit, primarily entropy-based mechanisms for controlling confidence calibration and preventing representation collapse. SAR arises in both standard supervised training and online test-time adaptation. Its core principle is to produce neural networks whose predictive probabilities are robust under distribution shift and whose confidence estimates reliably track true accuracy.

1. Theoretical Foundations and Formulation

Sharpness-Aware Minimization (SAM) is formulated as a min-max optimization problem that seeks parameter vectors $\theta$ that are robust to worst-case weight perturbations of magnitude $\rho$. In contrast to standard stochastic gradient descent (SGD), which minimizes the empirical risk,

$$\min_{\theta} L_S(\theta) = \frac{1}{n} \sum_{i=1}^n \ell_\theta(z_i),$$

SAM instead solves

$$\min_{\theta} \max_{\|\epsilon\|_2 \leq \rho} L_S(\theta+\epsilon).$$

At each step, the inner maximization is approximated by a first-order Taylor expansion, yielding

$$\epsilon^* = \rho \frac{\nabla L_S(\theta)}{\|\nabla L_S(\theta)\|_2},$$

and the update for step $k$ is

$$\tilde\theta_k = \theta_k + \rho \frac{\nabla L_{\Omega_k}(\theta_k)}{\|\nabla L_{\Omega_k}(\theta_k)\|_2}, \qquad \theta_{k+1} = \theta_k - \eta \nabla L_{\Omega_k}(\tilde\theta_k).$$
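The closed form for the perturbation follows from a short, standard argument, sketched here for completeness. It assumes the gradient is nonzero and that the linearization of $L_S$ is adequate inside the ball of radius $\rho$.

```latex
\begin{aligned}
\max_{\|\epsilon\|_2 \le \rho} L_S(\theta + \epsilon)
  &\approx \max_{\|\epsilon\|_2 \le \rho}
     \bigl[ L_S(\theta) + \epsilon^\top \nabla L_S(\theta) \bigr], \\
\epsilon^\top \nabla L_S(\theta)
  &\le \|\epsilon\|_2 \, \|\nabla L_S(\theta)\|_2
   \le \rho \, \|\nabla L_S(\theta)\|_2 .
\end{aligned}
```

By the Cauchy-Schwarz inequality, both bounds are tight exactly when $\epsilon$ points along the gradient and has norm $\rho$, which gives $\epsilon^* = \rho \, \nabla L_S(\theta) / \|\nabla L_S(\theta)\|_2$.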

This mechanism encourages convergence to flat minima, improving generalization and robustness.
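The two-step update can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the quadratic toy loss, the function names `loss`, `grad`, and `sam_step`, and the values of `rho` and `lr` are all assumptions chosen for the example.

```python
import math

def loss(theta):
    # Toy quadratic with one sharp and one flat direction (illustrative assumption).
    return 0.5 * (10.0 * theta[0] ** 2 + 0.1 * theta[1] ** 2)

def grad(theta):
    # Analytic gradient of the toy loss above.
    return [10.0 * theta[0], 0.1 * theta[1]]

def sam_step(theta, rho=0.05, lr=0.05, eps=1e-12):
    g = grad(theta)
    norm = math.sqrt(sum(gi * gi for gi in g)) + eps
    # Ascent step: move to the first-order worst-case point on the rho-ball.
    theta_tilde = [t + rho * gi / norm for t, gi in zip(theta, g)]
    # Descent step: apply the gradient taken at the perturbed point
    # to the original (unperturbed) parameters.
    g_tilde = grad(theta_tilde)
    return [t - lr * gi for t, gi in zip(theta, g_tilde)]

theta = [1.0, 1.0]
initial_loss = loss(theta)
for _ in range(50):
    theta = sam_step(theta)
final_loss = loss(theta)
print(initial_loss, final_loss)
```

Each step needs two gradient evaluations, which is the main computational overhead of SAM relative to plain SGD.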

A key theoretical finding is that, for the standard cross-entropy loss, SAM not only discourages sharp minima but also implicitly regularizes the negative entropy of predictive distributions. Specifically, after the SAM ascent step, the model's confidence in the correct class is strictly reduced, which has the effect of "softening" predictions and thereby combats overconfidence—a prevalent defect in neural classification models (Tan et al., 29 May 2025).

2. Calibration, Implicit Entropy Regularization, and CSAM

Overconfident predictions and miscalibration are addressed in SAR by exploiting SAM's entropy-regularizing effect. For the cross-entropy loss $\ell_\theta(x, y) = -\log p_y(\theta)$, under mild conditions,

$$\tilde p_y = [f_{\tilde\theta}(x)]_y \leq e^{-\rho/2} p_y,$$

which forces lower confidence on the correct class after the weight perturbation.
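The qualitative effect can be checked numerically. The sketch below makes a strong simplifying assumption: the logits themselves are treated as the trainable parameters, so the gradient of the cross-entropy loss is simply the softmax output minus the one-hot label. It shows only that the ascent step lowers the correct-class probability; it does not verify the $e^{-\rho/2}$ bound, whose conditions are not restated here. The names `softmax` and `ascent_perturb` and the numeric values are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ascent_perturb(logits, y, rho):
    p = softmax(logits)
    # Gradient of -log p_y with respect to the logits is p - onehot(y).
    g = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    # SAM-style ascent step of length rho along the normalized gradient.
    return [z + rho * gi / norm for z, gi in zip(logits, g)]

logits, y, rho = [2.0, 0.5, -1.0], 0, 0.5
p_y = softmax(logits)[y]
p_y_tilde = softmax(ascent_perturb(logits, y, rho))[y]
print(p_y, p_y_tilde)
```

Because the cross-entropy loss is convex in the logits, a step along its gradient always increases the loss in this simplified setting, so `p_y_tilde` is smaller than `p_y`.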

The entropy bound formalizes this effect: the worst-case perturbed loss is bounded below by the standard loss plus an explicit negative entropy penalty,

$$\ell_{\tilde\theta}(x, y) \geq -\log p_y - \lambda H(p_y) + H(\tilde p_y),$$

where ρ\rho0 and ρ\rho1.

The Calibrated SAM (CSAM) variant refines this by adapting the loss to further disincentivize overconfidence, especially for high-confidence predictions, via

ρ\rho2

which yields an amplified entropy regularization for overconfident predictions (Tan et al., 29 May 2025).

3. Test-Time Adaptation: SAR and SAR² Algorithms

Sharpness-Aware and Reliable Entropy Minimization (SAR) as a Test-Time Adaptation (TTA) method applies these principles to online domain shift. At test time, SAR combines entropy minimization, sharpness-aware robustification, and explicit sample filtering:

  • The per-sample entropy loss,

$$H\bigl(f_\theta(x)\bigr) = -\sum_{c} [f_\theta(x)]_c \log [f_\theta(x)]_c,$$

is minimized only on samples retained by a mask that filters out examples whose entropy is above a critical threshold (indicating unreliable or noisy predictions).

  • A sharpness-aware penalty,

ρ\rho5

biases the parameter update toward flatter minima by evaluating gradients at a locally perturbed point.

  • The total batch loss is

ρ\rho6

where ρ\rho7 is a balancing hyperparameter.
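The entropy filter in the first bullet can be sketched as follows. This is a minimal illustration that covers only the reliability mask and the masked entropy average; the sharpness-aware term and the balancing weight are omitted. The function names and the threshold of `0.4 * log(num_classes)` are assumptions made for the example, not values taken from the text above.

```python
import math

def entropy(probs):
    # Shannon entropy of one predictive distribution (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def reliable_entropy_loss(batch_probs, threshold):
    entropies = [entropy(p) for p in batch_probs]
    # Keep only low-entropy (reliable) samples; high-entropy ones are filtered out.
    mask = [e < threshold for e in entropies]
    kept = [e for e, keep in zip(entropies, mask) if keep]
    # Average entropy over retained samples; zero if nothing passes the filter.
    loss = sum(kept) / len(kept) if kept else 0.0
    return loss, mask

num_classes = 3
threshold = 0.4 * math.log(num_classes)  # hypothetical threshold choice
batch = [
    [0.97, 0.02, 0.01],  # confident prediction, retained
    [0.34, 0.33, 0.33],  # near-uniform prediction, filtered out
    [0.90, 0.05, 0.05],  # moderately confident, retained
]
loss, mask = reliable_entropy_loss(batch, threshold)
print(loss, mask)
```

In an actual adaptation loop the retained entropies would be differentiated with respect to the model parameters; here they are plain floats so the masking logic can be read in isolation.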

The SAR² algorithm further prevents representational collapse in wild test streams with two additional centroid-based feature-space regularizers:

  • The redundancy regularizer minimizes correlations among class prototypes (computed via feature centroids),
  • The inequity regularizer encourages balanced class assignments by maximizing the entropy of the global centroid prediction.

The sharpness-aware analogs of these regularizers are constructed as maxima over local perturbations, analogous to entropy sharpness.

4. Calibration and Feature Regularization Metrics

SAR's effectiveness is assessed with calibration metrics that quantify the match between predicted confidences and empirical correctness. The principal metrics are:

  • Expected Calibration Error (ECE): Test examples are binned by confidence; for each bin $B_m$, the absolute difference between prediction accuracy and average maximum softmax confidence is weighted by the fraction of examples in the bin and summed over $M$ bins and $n$ test examples:

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|$$

  • Adaptive ECE (AdaECE): Uses adaptive bin boundaries so that each bin holds roughly the same number of examples, giving a finer evaluation where confidences cluster.
  • Temperature-Scaled Calibration Error (TCE): ECE measured after temperature scaling.
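As a concrete illustration of the first metric, the sketch below computes ECE with equal-width confidence bins. The function name, the choice of ten bins, and the toy inputs are assumptions for the example; `confidences` holds each example's maximum softmax probability and `correct` holds 1 for a correct prediction and 0 otherwise.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    n = len(confidences)
    ece = 0.0
    for b in range(num_bins):
        lo, hi = b / num_bins, (b + 1) / num_bins
        # Bins are (lo, hi]; the first bin also accepts a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c <= hi) or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        # Weight the accuracy-confidence gap by the fraction of examples in the bin.
        ece += len(idx) / n * abs(acc - conf)
    return ece

confidences = [0.95, 0.95, 0.65, 0.65]
correct = [1, 1, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(ece)
```

In this toy case the high-confidence bin contributes 0.5 × |1.0 − 0.95| and the lower bin contributes 0.5 × |0.5 − 0.65|, for a total of 0.1.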

Feature space regularization is evaluated by:

  • Redundancy (mean off-diagonal squared correlation in feature covariance) and
  • Inequity (entropy deviation from uniform for pooled centroids).
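One plausible reading of these two quantities is sketched below. The exact definitions are not given in the text, so both functions are assumptions: `redundancy` averages the squared off-diagonal entries of the correlation matrix across feature dimensions, and `inequity` measures how far the entropy of a pooled prediction falls below the uniform maximum $\log C$.

```python
import math

def redundancy(features):
    # features: list of N vectors of dimension D (for example, class centroids).
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - means[j] for j in range(d)] for f in features]
    stds = [math.sqrt(sum(row[j] ** 2 for row in centered) / n) for j in range(d)]
    total, count = 0.0, 0
    for a in range(d):
        for b in range(d):
            if a == b:
                continue
            cov = sum(row[a] * row[b] for row in centered) / n
            corr = cov / (stds[a] * stds[b] + 1e-12)  # small constant avoids 0/0
            total += corr ** 2
            count += 1
    return total / count

def inequity(pooled_probs):
    # Gap between the maximum possible entropy log(C) and the observed entropy.
    h = -sum(p * math.log(p) for p in pooled_probs if p > 0.0)
    return math.log(len(pooled_probs)) - h

correlated = redundancy([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
decorrelated = redundancy([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
balanced = inequity([0.5, 0.5])
collapsed = inequity([1.0, 0.0])
print(correlated, decorrelated, balanced, collapsed)
```

Under these assumed definitions, perfectly correlated feature dimensions give a redundancy near 1, and a pooled prediction concentrated on one class gives the largest inequity.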

5. Empirical Evaluation

Empirical results on canonical benchmarks demonstrate the effect and stability of SAR:

| Approach | Dataset | ECE (%) | Additional Notes |
|---|---|---|---|
| SGD | CIFAR-100 | ≈3.95 | Baseline |
| SAM | CIFAR-100 | ≈2.11 | ~50–70% reduction over SGD |
| SAM | ImageNet-1K (ViT) | 9.72 → 1.76 | ECE reduction, Top-1 acc gain 0.3–4% |
| CSAM | CIFAR-100 | ≈1.93 | Outperforms SAM, maintains accuracy |
| SAR² (TTA) | ImageNet-C, CIFAR-C | variance ↓, acc ↑ | Stable under wild distribution shifts |

On CIFAR-10 with WideResNet-28-10, CSAM achieves ECE ≈ 0.50% versus 0.86% for SAM and ≈1.71% for SGD with temperature scaling. Under test-time adaptation on heavily shifted or imbalanced batches, SAR² yields orders-of-magnitude lower variance and mitigates catastrophic representation collapse compared to prior TTA schemes (Tan et al., 29 May 2025, Niu et al., 5 Sep 2025).

6. Implementation and Deployment Considerations

Several best practices ensure the robust deployment of SAR and SAR²:

  • Batch-agnostic normalization (GroupNorm/LayerNorm) is essential for stability under small, mixed, or imbalanced batches; BatchNorm often introduces instability in TTA (Niu et al., 5 Sep 2025).
  • Stable defaults: fixed values are used for the entropy filter threshold and the sharpness radius, with redundancy and inequity weights scaled to feature dimension and class count, respectively.
  • Online monitoring of the moving-average entropy enables model recovery by resetting parameters if collapse is detected.
  • Hyperparameter choices generalize across benchmarks; tuning can be performed coarsely.
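The monitoring-and-reset practice can be sketched as a small helper. Everything here is an illustrative assumption: the class name, the exponential moving average with `momentum`, the `reset_threshold`, and the convention that a very low moving-average entropy signals collapse (a collapsed model tends to predict one class with near-certainty).

```python
class EntropyCollapseMonitor:
    """Tracks a moving average of batch entropy and flags likely collapse."""

    def __init__(self, momentum=0.9, reset_threshold=0.2):
        self.momentum = momentum
        self.reset_threshold = reset_threshold
        self.ema = None

    def update(self, batch_entropy):
        # The first observation initializes the average; later ones are blended in.
        if self.ema is None:
            self.ema = batch_entropy
        else:
            self.ema = self.momentum * self.ema + (1.0 - self.momentum) * batch_entropy
        # True means the caller should restore the source-model parameters.
        return self.ema < self.reset_threshold

    def reset(self):
        # Call after restoring parameters so stale statistics are discarded.
        self.ema = None

monitor = EntropyCollapseMonitor(momentum=0.5, reset_threshold=0.2)
flags = [monitor.update(e) for e in [1.0, 0.8, 0.05, 0.05, 0.05]]
print(flags)
```

In a deployment loop, a `True` flag would trigger reloading the original weights followed by `monitor.reset()`; the moving average keeps a single unusual batch from triggering a reset on its own.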

7. Open Questions and Directions

Research directions include extending SAR analysis beyond cross-entropy objectives, clarifying the temporal emergence of calibration under SAM (early versus late training), and developing single-step or computationally efficient approximations preserving both generalization and calibration. Another unresolved topic is the scalable and robust deployment of SAR-based adaptation in safety-critical or real-time settings (Tan et al., 29 May 2025, Niu et al., 5 Sep 2025).
