
Semantic Orthogonal Calibration (SoC)

Updated 20 January 2026
  • Semantic Orthogonal Calibration (SoC) is a margin-aware regularizer for test-time prompt tuning that uses Huber loss to maintain semantic proximity while reducing overconfidence.
  • It leverages semantic margins derived from cosine similarities to cap prototype repulsion, ensuring smoother separation compared to traditional quadratic penalties.
  • Empirical results on multiple benchmarks show SoC reduces Expected Calibration Error by up to 2.3% and enhances accuracy, making it valuable for robust vision-language applications.

Semantic Orthogonal Calibration (SoC) is a Huber-based regularizer designed for test-time prompt tuning (TPT) in vision-language models (VLMs), with the principal aim of improving the calibration of uncertainty estimates while maintaining discriminative performance. SoC enforces smooth prototype separation that respects semantic proximity, addressing a shortcoming of previous orthogonality-based regularization techniques, which induce overconfidence by artificially separating semantically related classes. The method has demonstrated state-of-the-art calibration across diverse classification benchmarks (Fillioux et al., 13 Jan 2026).

1. Theoretical Motivation and Problem Statement

In the context of TPT, uncertainty calibration is critical for robust deployment in sensitive domains such as healthcare and autonomous driving. Standard prompt tuning minimizes the softmax entropy of class predictions, frequently causing the model to become overconfident. Recent approaches, notably O-TPT (Sharifdeen et al., CVPR'25), introduce a quadratic full-orthogonality regularizer:

\mathcal{L}_{\rm ortho} = \big\| S - I_K \big\|_F^2 = \sum_{i \neq j} s_{ij}^2

where $s_{ij} = t_i^\top t_j$ is the $(i,j)$ entry of the Gram matrix $S$, i.e., the cosine similarity between text prompt prototypes $t_i$ and $t_j$. This penalty enhances class separability but unduly pushes apart even those classes that are semantically similar (e.g., "dog" vs. "puppy"). For nearly collinear prototypes ($s_{ij} \approx 1$), the gradient step induced by this loss is proportional to $-4\eta s_{ij}$, amplifying semantic drift and inflating prediction confidence, particularly for ambiguous samples.
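As a concrete illustration (a minimal NumPy sketch, not the paper's implementation), the full-orthogonality penalty can be computed directly from the row-normalized prototype matrix; orthonormal prototypes incur zero penalty, while collinear prototypes are penalized maximally:

```python
import numpy as np

def ortho_penalty(T):
    """Full-orthogonality penalty ||S - I_K||_F^2 for a (K, D) matrix of
    prototypes; S = T T^T is the cosine-similarity Gram after row L2-normalization."""
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    S = T @ T.T
    K = S.shape[0]
    return float(np.sum((S - np.eye(K)) ** 2))

# Orthonormal prototypes: penalty is zero.
print(ortho_penalty(np.eye(3)))  # 0.0
# Two collinear prototypes (s_12 = 1): penalty is 2 * s_12^2 = 2.
print(ortho_penalty(np.array([[1.0, 0.0], [1.0, 0.0]])))  # 2.0
```

The second case shows why the quadratic penalty is most aggressive exactly where classes are most similar, which is the overconfidence mechanism SoC targets.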

2. SoC Regularizer: Mathematical Formulation

SoC introduces a Huber-based regularization that caps prototype repulsion based on semantic margins derived from class-name similarities. The Huber function with threshold $\delta > 0$ is defined as:

H_\delta(x) = \begin{cases} \tfrac{1}{2}\,x^2, & |x| \le \delta \\ \delta \left( |x| - \tfrac{\delta}{2} \right), & |x| > \delta \end{cases}

The class margin $m_{ij}$ is set via pre-computed semantic similarities $\tilde{s}_{ij}$ (such as cosine similarities between frozen CLIP class-name embeddings):

m_{ij} = 1 - \tilde{s}_{ij}

For prompt prototype similarities $S_{ij} = t_i^\top t_j$, SoC applies the pairwise regularizer:

\mathcal{L}_{\rm SoC} = \sum_{i \neq j} H_\delta\big( (1 - S_{ij}) - m_{ij} \big)

which penalizes the deviation of the prototype distance $1 - S_{ij}$ from the semantic margin $m_{ij}$.

If two classes are highly similar ($\tilde{s}_{ij}$ is large), their semantic margin $m_{ij}$ is small, so their embeddings are allowed to remain close: minor deviations incur only a mild quadratic penalty, while excessive collapse or trivial separation triggers the linear repulsion branch.
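The margin-capped repulsion above can be sketched in NumPy (an illustrative reconstruction under the stated definitions, not the authors' code; `s_name` stands for the pre-computed class-name similarities):

```python
import numpy as np

def huber(x, delta):
    """Elementwise Huber function H_delta(x)."""
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * x**2, delta * (ax - 0.5 * delta))

def soc_penalty(T, s_name, delta):
    """Margin-aware SoC-style penalty: Huber loss on the gap between the
    prototype distance (1 - S_ij) and the semantic margin m_ij = 1 - s_name_ij."""
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    S = T @ T.T
    K = S.shape[0]
    off = ~np.eye(K, dtype=bool)                  # exclude the diagonal
    deviation = (1.0 - S) - (1.0 - s_name)        # equals s_name_ij - S_ij
    return float(huber(deviation[off], delta).sum())
```

Prototypes whose pairwise similarity matches the class-name similarity incur no penalty; small deviations are penalized quadratically and large ones only linearly, which is what caps the repulsion between semantically close classes.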

3. Combined Prompt-Tuning Objective

At test time, the SoC regularizer is incorporated alongside the cross-entropy loss on pseudo-labels:

\mathcal{L}_{\rm total} = \mathcal{L}_{\rm CE} + \lambda\, \mathcal{L}_{\rm SoC}

where $\lambda$ governs the trade-off between discriminative adaptation and semantic calibration.

4. Implementation and Algorithmic Workflow

Test-time SoC tuning proceeds as follows (a single AdamW step per test sample):

  • Generate a batch of augmented views of the test image and encode them with the frozen image encoder.
  • Encode the current prompt to obtain class prototypes $t_1, \dots, t_K$ and compute class predictions for each view.
  • Form the cross-entropy loss on pseudo-labels from confident views and add the SoC penalty weighted by $\lambda$.
  • Update the prompt parameters with one AdamW step, keeping both encoders frozen.

Notable implementation aspects include the use of ViT-L/14 or ViT-B/16 encoder backbones; learning rate 0.005; batch size 64 (with heavy augmentation); the temperature $\tau$ fixed by CLIP; $\delta$ set to the 20th percentile of zero-shot cosine distances; and typical $\lambda$ values around 30 (or 14 under distribution shift).
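The single-step workflow can be mimicked end to end on toy data. The sketch below is self-contained NumPy: plain gradient descent with a finite-difference gradient stands in for AdamW, a random unit vector stands in for an encoded test view, and all shapes and values are illustrative:

```python
import numpy as np

def huber(x, delta):
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * x**2, delta * (ax - 0.5 * delta))

rng = np.random.default_rng(0)
K, D = 3, 8                                    # toy: 3 classes, 8-dim embeddings
img = rng.normal(size=D)
img /= np.linalg.norm(img)                     # one encoded test view (stand-in)
s_name = np.array([[1.0, 0.8, 0.1],            # illustrative class-name cosines
                   [0.8, 1.0, 0.2],
                   [0.1, 0.2, 1.0]])

def total_loss(theta, lam=30.0, delta=0.2, tau=1.0):
    T = theta.reshape(K, D)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    logits = (img @ T.T) / tau
    p = np.exp(logits - logits.max()); p /= p.sum()
    ce = -np.log(p[int(np.argmax(p))] + 1e-12)          # CE on the pseudo-label
    off = ~np.eye(K, dtype=bool)
    S = T @ T.T
    soc = huber((1.0 - S[off]) - (1.0 - s_name[off]), delta).sum()
    return ce + lam * soc

theta = rng.normal(size=K * D)                 # "prompt" parameters (toy)
eps, lr = 1e-5, 0.005
grad = np.array([(total_loss(theta + eps * e) - total_loss(theta - eps * e)) / (2 * eps)
                 for e in np.eye(theta.size)])
loss_before = total_loss(theta)
theta = theta - lr * grad                      # one descent step (AdamW in the paper)
loss_after = total_loss(theta)
```

A single small step reduces the combined objective, illustrating how the prompt adapts per test sample while the SoC term keeps prototype similarities anchored to class-name similarities.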

5. Empirical Evaluation

SoC was validated across eleven classification benchmarks—including ImageNet, Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, SUN397, FGVC-Aircraft, DTD, UCF101, EuroSAT—and four distribution-shift variants (ImageNet-A, v2, R, Sketch). The following Table summarizes accuracy and Expected Calibration Error (ECE) for ViT-L/14:

Method      Avg. Acc. (%)   Avg. ECE (%)
Zero-Shot   71.1            5.1
TPT         72.0            14.9
C-TPT       72.1            10.0
O-TPT       71.4            7.7
SoC         72.3 (+0.9)     5.4 (–2.3)

Qualitative reliability diagrams indicate that SoC narrows the confidence–accuracy gap relative to O-TPT, particularly on high-similarity class pairs (e.g., EuroSAT). Under natural distribution shifts, SoC matches O-TPT’s accuracy while lowering ECE by approximately 1.5%, and it is less sensitive to multi-step prompt updates. The approach is robust to variations in the CLIP prompt template and to supervised CoOp prompt initializations. Post-hoc SaLS calibration further reduces ECE for all methods, with SoC maintaining the lowest error.

6. Core Insights and Practical Recommendations

SoC’s Huber-based, margin-aware repulsion preserves semantic alignment learned by the VLM backbone, mitigating the confidence inflation and semantic drift induced by quadratic orthogonality penalties. Theoretical results demonstrate that SoC’s margin-respecting separation yields smoother reduction in worst-case cosine similarity, which translates to improved calibration empirically.

Hyperparameter Guidance

  • δ (Huber threshold): Set to a low percentile (10–30%) of zero-shot cosine distances; decrease for fine-grained tasks.
  • λ (regularizer weight): Typically 20–50; increasing λ enhances calibration at the expense of minor accuracy drops. Cross-validate using held-out, unlabeled batches (monitoring ECE vs. accuracy).
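The first guideline can be implemented directly (a small NumPy helper, with `name_embeddings` standing in for frozen CLIP class-name embeddings):

```python
import numpy as np

def pick_delta(name_embeddings, q=20):
    """Huber threshold: the q-th percentile of pairwise zero-shot cosine
    distances between L2-normalized class-name embeddings."""
    E = name_embeddings / np.linalg.norm(name_embeddings, axis=1, keepdims=True)
    S = E @ E.T
    K = S.shape[0]
    dists = (1.0 - S)[~np.eye(K, dtype=bool)]   # off-diagonal cosine distances
    return float(np.percentile(dists, q))

# Mutually orthogonal names: every pairwise distance is 1, so delta = 1.
print(pick_delta(np.eye(4)))  # 1.0
```

Lowering `q` (fine-grained tasks) shrinks delta, so the quadratic region of the Huber loss narrows and repulsion between close classes is capped earlier.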

Extensions

Potential extensions include learning adaptive semantic margins from external knowledge graphs (e.g., WordNet), leveraging a small labeled seed, multi-step TPT with dynamic λ or δ scheduling, and integration with complementary post-hoc calibration schemes.

7. Relation to Prior Work and Future Perspectives

SoC builds upon O-TPT and related orthogonality- and cross-modal prompt-tuning literature. It specifically challenges the assumption that full prototype separation always benefits calibration, offering a theoretically and empirically grounded alternative that respects semantic structure. Future work may explore adaptive margin learning, integration with external knowledge sources, or joint optimization with supervised prompt initializations.

In sum, Semantic Orthogonal Calibration (SoC) provides a margin-aware, semantically-conscious regularization strategy for test-time prompt tuning. It achieves state-of-the-art calibration—measured by reduced ECE and improved reliability—across standard VLM benchmarks, without compromising discriminative accuracy (Fillioux et al., 13 Jan 2026).
