Confidence-Aware Asymmetric Learning (CAL)
- Confidence-Aware Asymmetric Learning (CAL) is a framework that adapts to class and modality imbalance by dynamically weighting samples by confidence and pruning prototypes.
- It employs techniques like Confidence-Aware Consistency Regularization and Asymmetric Confidence Reinforcement to boost accuracy in open-world deepfake attribution.
- CAL extends to robust multimodal fusion by dynamically assessing modality contributions and applying asymmetric gradient scaling, effectively handling noise and imbalance.
Confidence-Aware Asymmetric Learning (CAL) is a family of machine learning strategies designed to address scenarios characterized by class or modality imbalance, low-confidence predictions, unknown or novel categories, and noisy or heterogeneous data. Two representative contexts where CAL is extensively developed are (1) open-world deepfake attribution, where the challenge is to attribute both known and unknown forged images, and (2) multimodal fusion under challenging modality imbalances and noise. The unifying theme is the adaptive and asymmetric treatment of confidence and contribution, both at the sample and modality level, enabling calibration across uncertainty and heterogeneity (Zheng et al., 14 Dec 2025, Xu et al., 30 Oct 2025).
1. Motivation and Problem Settings
CAL emerged to counteract fundamental limitations of conventional pseudo-label-based open-world learning and standard multimodal fusion approaches, manifesting notably in:
- Open-World Deepfake Attribution (OW-DFA): The objective is to attribute a facial forgery to either a known generation method (with labeled data for known classes) or one of possibly many unknown, unlabeled novel classes. Two pervasive challenges arise:
- Confidence skew—models are biased towards known classes, and pseudo-labels for novel samples are unreliable due to low softmax confidence.
- Unknown novel-class cardinality—practical deployments cannot assume the number of novel forgery types is known a priori (Zheng et al., 14 Dec 2025).
- Robust Multimodal Fusion: Here, modalities differ in information value and are variably affected by noise and imbalance. Classical approaches suppress strong modalities for balance or treat all modalities symmetrically, but this neglects the reality that some modalities are inherently more informative or trustworthy than others (Xu et al., 30 Oct 2025).
CAL frameworks address these structural issues through confidence-aware, dynamically asymmetric mechanisms at the sample, class, and modality level.
2. Core Principles and Theoretical Formulation
CAL’s methodology is built upon confidence-aware weighting, asymmetric regularization, and adaptive pruning/compression mechanisms, which are realized differently in the OW-DFA and multimodal contexts.
2.1. Confidence-Aware Sample Weighting
In OW-DFA, CAL employs Confidence-Aware Consistency Regularization (CCR). The pseudo-label-based consistency loss (inspired by FixMatch) is dynamically reweighted per sample with a schedule of the form

$$\mathcal{L}_{\mathrm{CCR}} = \frac{1}{B}\sum_{i=1}^{B} w_i \, \ell_{\mathrm{cons}}(x_i), \qquad w_i = \Big(1 - \frac{t}{T}\Big)\, c_i + \frac{t}{T}\,(1 - c_i),$$

where $c_i$ is the normalized (softmax) confidence of sample $x_i$, $t$ is the current epoch, and $T$ is the total number of epochs. Early epochs prioritize high-confidence (usually known) samples; later epochs shift focus to low-confidence (typically novel) samples.
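A minimal PyTorch sketch of this weighting scheme. The linear interpolation between $c_i$ and $1 - c_i$ follows the behavior described above, and all function and variable names are illustrative rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def ccr_weights(logits: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    """Confidence-aware sample weights: early epochs track c_i, late epochs 1 - c_i."""
    conf = F.softmax(logits, dim=-1).max(dim=-1).values  # c_i in [0, 1]
    alpha = epoch / total_epochs                         # schedule t / T
    return (1.0 - alpha) * conf + alpha * (1.0 - conf)

def ccr_loss(weak_logits: torch.Tensor, strong_logits: torch.Tensor,
             epoch: int, total_epochs: int) -> torch.Tensor:
    """FixMatch-style consistency loss, reweighted per sample by ccr_weights."""
    pseudo = weak_logits.argmax(dim=-1).detach()
    w = ccr_weights(weak_logits.detach(), epoch, total_epochs)
    per_sample = F.cross_entropy(strong_logits, pseudo, reduction="none")
    return (w * per_sample).mean()
```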
2.2. Asymmetric Pseudo-label Selection
The Asymmetric Confidence Reinforcement (ACR) component sets distinct pseudo-label confidence thresholds for known and novel predictions, parameterized by dynamically evolving average confidences:
- For known classes: a strict threshold $\tau_k$ (e.g., $0.9$).
- For novel classes: a relaxed threshold $\tau_n = \rho \cdot \tau_k$, where $\rho = \bar{c}_{\mathrm{novel}} / \bar{c}_{\mathrm{known}}$ (i.e., the ratio of average novel-class to known-class confidence).
This enforces stricter reliability for known-class pseudo-labels and relaxes the criterion for the less confident novel-class samples, with the novel-class threshold tightening as model calibration improves (Zheng et al., 14 Dec 2025).
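A sketch of the selection rule under these definitions. The multiplicative form $\tau_n = \rho \tau_k$ follows the description above; names and the batch-level interface are illustrative:

```python
import torch

def acr_mask(conf: torch.Tensor, is_known_pred: torch.Tensor,
             avg_conf_known: float, avg_conf_novel: float,
             tau_known: float = 0.9) -> torch.Tensor:
    """Accept pseudo-labels using class-type-specific thresholds.

    Known-class predictions must clear the strict tau_known; novel-class
    predictions use tau_novel = rho * tau_known, which tightens as the
    average novel-class confidence approaches the known-class average.
    """
    rho = avg_conf_novel / max(avg_conf_known, 1e-8)
    tau_novel = rho * tau_known
    thresholds = torch.where(is_known_pred,
                             torch.full_like(conf, tau_known),
                             torch.full_like(conf, tau_novel))
    return conf >= thresholds  # boolean mask over the batch
```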
2.3. Adaptive Structure Discovery and Pruning
To handle unknown novel class cardinality, CAL introduces Dynamic Prototype Pruning (DPP):
- An over-complete set of class prototypes is maintained.
- Each epoch, prototypes are pruned by usage statistics and similarity: high-confidence prototypes are retained, while low-confidence, rarely used prototypes are merged into their nearest high-confidence sibling.
- This process coarsely estimates, and adapts to, the true number of novel classes without grid search or prior knowledge; a minimal sketch of one pruning epoch follows.
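The sketch below assumes per-prototype usage counts and mean assignment confidence are tracked during the epoch. The specific thresholds and the running-mean merge are illustrative choices, not the paper's exact rule:

```python
import torch

def prune_prototypes(protos: torch.Tensor, usage: torch.Tensor,
                     mean_conf: torch.Tensor, conf_thresh: float = 0.5,
                     min_usage: int = 10) -> torch.Tensor:
    """One epoch of dynamic prototype pruning over an over-complete bank.

    protos:    (P, d) prototype vectors
    usage:     (P,)  assignment counts this epoch
    mean_conf: (P,)  mean confidence of samples assigned to each prototype
    """
    keep = (mean_conf >= conf_thresh) | (usage >= min_usage)
    kept, dropped = protos[keep], protos[~keep]
    if dropped.numel() == 0 or kept.numel() == 0:
        return protos
    # Merge each pruned prototype into its nearest surviving sibling.
    nearest = torch.cdist(dropped, kept).argmin(dim=1)
    kept = kept.clone()
    for j, i in enumerate(nearest):
        kept[i] = 0.5 * (kept[i] + dropped[j])  # simple running-mean merge
    return kept  # surviving count approximates the novel-class cardinality
```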
2.4. Contribution-Guided Modality Adaptation (Multimodal Fusion)
In multimodal settings, Contribution-Guided Asymmetric Learning quantifies each modality's value through a compound score of the form

$$C_m = I(z_m; z_f) \cdot \phi_m \cdot \delta_m,$$

where $I(z_m; z_f)$ is the mutual information between modality $m$'s representation $z_m$ and the fused representation $z_f$, and $\phi_m \cdot \delta_m$ is the product of the Shapley-inspired marginal contribution $\phi_m$ and the recent relative accuracy improvement $\delta_m$ for modality $m$.

Asymmetric gradient acceleration adjusts per-modality parameter update magnitudes according to $C_m$ via a softmax scaling, while a contribution-aware Asymmetric Information Bottleneck (AIB) regularizer applies stronger compression (via a modality-dependent weight $\beta_m$) on noise-prone, low-contribution modalities (Xu et al., 30 Oct 2025).
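A minimal sketch of these mechanisms under the notation above. The Gaussian-posterior KL form of the bottleneck term, the softmax temperature, and the linear mapping from contribution to $\beta_m$ are illustrative choices rather than the paper's exact formulation:

```python
import torch

def contribution_scores(mi: torch.Tensor, shapley: torch.Tensor,
                        acc_gain: torch.Tensor) -> torch.Tensor:
    """Compound contribution C_m = I(z_m; z_f) * phi_m * delta_m, shape (M,)."""
    return mi * shapley * acc_gain

def gradient_scales(contrib: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Asymmetric gradient acceleration: softmax over C_m gives per-modality
    scale factors, boosting updates for high-contribution modalities."""
    return torch.softmax(contrib / temperature, dim=0)

def aib_weights(contrib: torch.Tensor, beta_max: float = 1e-2) -> torch.Tensor:
    """Map contributions to compression weights, inversely proportional."""
    c = contrib / contrib.sum().clamp_min(1e-8)
    return beta_max * (1.0 - c)

def aib_penalty(mu: torch.Tensor, logvar: torch.Tensor, beta_m: float) -> torch.Tensor:
    """Bottleneck term for one modality: KL(q(z_m|x) || N(0, I)) scaled by beta_m.
    Low-contribution modalities receive a larger beta_m, i.e. stronger compression."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()
    return beta_m * kl
```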
3. Algorithmic Workflow
3.1. OW-DFA: Pseudo-labeling and Pruning
The CAL workflow for OW-DFA is summarized as:
- Forward pass with both labeled and unlabeled data, obtaining per-sample confidence scores.
- Compute the CCR and ACR losses: scale the consistency loss by the dynamic sample weight $w_i$ and select pseudo-labels using the class-type-specific thresholds $\tau_k$ and $\tau_n$.
- Update model parameters with weighted gradients.
- If class count is unknown, apply DPP to prune/merge prototypes, updating the class set.
- Joint optimization combines the supervised loss, a uniformity regularizer, CCR, and ACR; a toy skeleton follows.
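The self-contained skeleton below ties the steps together, reusing the `ccr_weights` and `acr_mask` sketches above. The model, data, and hyperparameters are placeholders; the uniformity regularizer is omitted, and `prune_prototypes` would run at the end of each epoch when the class count is unknown:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
NUM_KNOWN, NUM_SLOTS, EPOCHS = 5, 8, 10       # 8 = known classes + over-complete novel slots
model = nn.Linear(16, NUM_SLOTS)              # stand-in for the real attribution network
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(EPOCHS):
    # Labeled (known-class) batch, plus weak/strong augmented views of an unlabeled batch.
    x_l, y_l = torch.randn(32, 16), torch.randint(0, NUM_KNOWN, (32,))
    x_u = torch.randn(32, 16)
    x_w, x_s = x_u + 0.01 * torch.randn_like(x_u), x_u + 0.1 * torch.randn_like(x_u)

    logits_l, logits_w, logits_s = model(x_l), model(x_w), model(x_s)
    sup = F.cross_entropy(logits_l, y_l)                       # supervised loss

    conf = F.softmax(logits_w.detach(), dim=-1).max(dim=-1).values
    pseudo = logits_w.argmax(dim=-1).detach()
    is_known = pseudo < NUM_KNOWN
    avg_k = conf[is_known].mean().item() if is_known.any() else 1.0
    avg_n = conf[~is_known].mean().item() if (~is_known).any() else avg_k
    mask = acr_mask(conf, is_known, avg_k, avg_n).float()      # ACR selection
    w = ccr_weights(logits_w.detach(), epoch, EPOCHS)          # CCR weighting

    cons = (w * mask * F.cross_entropy(logits_s, pseudo, reduction="none")).mean()
    loss = sup + cons                                          # + uniformity reg. in the full method
    opt.zero_grad()
    loss.backward()
    opt.step()
```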
3.2. Multimodal Fusion: Modality Contribution and Compression
The CAL multimodal training routine proceeds as:
- Compute uni-modal and fusion predictions.
- Calculate the contribution score $C_m$ for each modality (mutual information, confidence, and performance improvement).
- Use a softmax over $\{C_m\}$ to scale each modality's gradient.
- Apply modality-specific AIB compression, with compression weight $\beta_m$ inversely proportional to contribution.
- Optimize the joint loss: fusion cross-entropy, per-modality cross-entropy, and the AIB term (Xu et al., 30 Oct 2025); a toy sketch of the gradient-scaling step follows.
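A toy two-modality sketch of the asymmetric gradient-scaling step. Encoder sizes, the placeholder contribution values, and the rescaling convention are illustrative; `contribution_scores` above would supply `contrib` in practice, and the uni-modal CE and AIB terms are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
enc_a, enc_v = nn.Linear(20, 8), nn.Linear(30, 8)         # audio / visual encoders
head = nn.Linear(16, 4)                                   # fusion classifier
modules = (enc_a, enc_v, head)
opt = torch.optim.SGD([p for m in modules for p in m.parameters()], lr=0.1)

x_a, x_v = torch.randn(32, 20), torch.randn(32, 30)
y = torch.randint(0, 4, (32,))

z_a, z_v = enc_a(x_a), enc_v(x_v)
fused = head(torch.cat([z_a, z_v], dim=-1))
loss = F.cross_entropy(fused, y)                          # fusion CE only, for brevity
opt.zero_grad()
loss.backward()

# Asymmetric gradient acceleration: scale each encoder's gradients by the
# softmax of its contribution score (placeholder values here).
contrib = torch.tensor([0.7, 0.3])
scales = torch.softmax(contrib, dim=0) * contrib.numel()  # keep the mean scale near 1
for s, enc in zip(scales, (enc_a, enc_v)):
    for p in enc.parameters():
        if p.grad is not None:
            p.grad.mul_(s)
opt.step()
```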
4. Empirical Performance and Benchmarks
Experimental results establish CAL as state-of-the-art in open-world deepfake attribution and robust multimodal fusion.
4.1. OW-DFA Results
- On the OW-DFA-40 benchmark:
- CAL (with known novel-class count $K$): All ACC = 88.3% (+5.7% over CPL), Novel ACC = 76.5% (+12.2%)
- CAL (unknown $K$): All ACC = 87.5%, Novel ACC = 74.5%
- Ablation studies show that adding CCR raises Novel ACC by over 21 percentage points versus baseline; adding ACR and additional features provide incremental gains.
- DPP reduces novel-class count estimation error to less than 10% (e.g., estimating 39 vs. true 41), outperforming GCD-style grid search (error > 50%).
4.2. Multimodal Fusion Results
- On CREMA-D: CAL achieves 79.30% ACC (+2.69% vs ARL).
- On KS and AVE: CAL outperforms ARL by +0.54% and +1.32%, respectively.
- Under significant “test-only” and “train+test” noise (e.g., salt–pepper and Gaussian noise) on MVSA-Single and NYUD2, CAL yields up to +8.7% ACC improvement over prior methods.
- Ablations confirm that the contribution-weighted AIB and the compound contribution score are essential for best performance.
4.3. Efficiency
- CAL’s training and inference time is comparable to baseline methods. Enabling DPP in OW-DFA adds less than 2% computational overhead (Zheng et al., 14 Dec 2025).
5. Comparative Advantages and Analysis
CAL frameworks provide mutually reinforcing mechanisms that:
- Correct the persistent bias towards known classes or modalities by adaptively emphasizing low-confidence or low-contribution cases as training progresses.
- Offer principled, dynamic calibration—tightening or relaxing thresholds and compression as the model’s confidence evolves.
- Remove strong prior requirements (such as the number of novel classes), promoting scalability and applicability in real-world open-set scenarios.
- Are modular and easily portable to new datasets and tasks, requiring only per-modality encoders and contribution computation.
A plausible implication is that future research in open-world learning and robust fusion will integrate confidence- and contribution-aware asymmetry as a default, rather than assuming static, balanced architectures or uniform pseudo-labeling regimes.
6. Connections, Limitations, and Extensions
The CAL approach for OW-DFA contrasts with prior methods such as CPL, GCD, and FixMatch-inspired pseudo-labeling, mainly by targeting confidence bias and structural uncertainty directly. In the multimodal context, it improves over symmetric regularizers and unimodal suppression by explicitly quantifying and acting on information and trustworthiness.
CAL’s dependency on mutual information and Shapley-like metrics may introduce extra computational requirements for complex, high-dimensional modalities, though empirical results indicate these costs are modest. The framework is agnostic to encoder architectures, supporting vision, text, audio, and sensor data, with reported success in VQA and audio-visual speech separation transfer experiments.
Current limitations include the need to tune hyperparameters (e.g., the confidence thresholds, the softmax temperature, and the AIB weight $\beta_m$) and the difficulty of robust mutual information estimation for highly nonlinear dependencies in feature spaces.
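One standard choice for estimating $I(z_m; z_f)$ in feature space is a MINE-style Donsker-Varadhan lower bound, sketched below; the network shape and names are illustrative assumptions, not the estimator used in the cited work:

```python
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Donsker-Varadhan lower bound on I(z_m; z_f): E_joint[T] - log E_marg[exp(T)]."""
    def __init__(self, dim_m: int, dim_f: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_m + dim_f, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z_m: torch.Tensor, z_f: torch.Tensor) -> torch.Tensor:
        joint = self.net(torch.cat([z_m, z_f], dim=-1)).mean()
        shuffled = z_f[torch.randperm(z_f.size(0))]      # break the pairing
        marg = self.net(torch.cat([z_m, shuffled], dim=-1)).exp().mean().log()
        return joint - marg                              # maximize to tighten the bound
```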
7. Summary Table: CAL Variants and Application Domains
| CAL Variant | Application Domain | Primary Mechanism |
|---|---|---|
| Confidence-Aware Asymmetric Learning (CCR+ACR+DPP) (Zheng et al., 14 Dec 2025) | Open-world Deepfake Attribution | Dynamic consistency, asymmetric pseudo-labels, prototype pruning |
| Contribution-Guided Asymmetric Learning (AIB+gradient accel.) (Xu et al., 30 Oct 2025) | Robust Multimodal Fusion | Modality contribution, gradient scaling, information bottleneck |
Both frameworks leverage dynamic, asymmetric weighting of confidence or contribution, and each integrates mechanisms to adapt to the realities of class or modality imbalance, label uncertainty, and noise, establishing CAL as a foundational technique for robust, scalable learning in open and heterogeneous environments.