Active Passive Loss (APL) in Noisy Learning

Updated 24 July 2025
  • Active Passive Loss (APL) is a robust optimization framework that combines complementary active and passive loss functions to tackle noisy labels in deep neural networks.
  • The active component focuses on maximizing true class probability while the passive component suppresses erroneous class predictions, ensuring noise tolerance and convergence.
  • APL has demonstrated practical success across domains like multiagent networks and material science, yielding improved performance under high-noise conditions.

Active Passive Loss (APL) is a framework for robust optimization, predominantly in the context of learning with noisy labels in deep neural networks. APL, in its modern usage, refers to a principled method that combines two complementary robust loss functions—one “active,” emphasizing correct class probability assignment, and one “passive,” suppressing incorrect class probabilities. This joint structure is motivated by both theoretical guarantees for noise tolerance and practical considerations regarding learning sufficiency and convergence in challenging, noisy regimes. The term also appears in multiagent networks and material science, where it denotes performance differentials induced by active and passive system components.

1. Conceptual Foundations of Active Passive Loss

Active Passive Loss arises from the need to create loss functions that are robust to label noise yet retain strong learning capacity. Traditional loss functions like Cross-Entropy (CE) can overfit noisy labels, while robust alternatives such as Mean Absolute Error (MAE) or Reverse Cross Entropy (RCE) may underfit due to symmetry constraints that weaken discriminative learning (Ma et al., 2020). APL addresses these challenges by explicitly combining loss components:

  • Active Loss: Drives up the probability of the annotated (possibly noisy) label, concentrating the gradient on the labeled class.
  • Passive Loss: Explicitly pushes down the probabilities of non-target classes, providing additional regularization.

Mathematically, the general form of APL is:

$$\mathcal{L}_{\mathrm{APL}} = \alpha \cdot \mathcal{L}_{\mathrm{Active}} + \beta \cdot \mathcal{L}_{\mathrm{Passive}},$$

with non-negative weights $\alpha$ and $\beta$, and each component chosen for theoretical robustness, often via normalization.

2. Active and Passive Loss Functions: Definitions and Properties

The separation into “active” and “passive” components is a distinctive contribution of APL (Ma et al., 2020):

  • Active Loss Functions: These assign nonzero loss only to the sampled label $y$, such that for all $k \neq y$, the per-class loss $\ell(f(x), k) = 0$. Cross Entropy (CE) is archetypal.
  • Passive Loss Functions: These have nonzero components for at least one $k \neq y$, encouraging suppression of all non-annotated class confidences. MAE and normalized RCE are canonical passive losses.

This decomposition captures both direct supervision toward the label and global pressure against spurious class assignments—a duality leveraged by APL for both robustness and learning capacity.
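
To make the distinction concrete, the short sketch below (an illustrative example with toy numbers, not code from the cited papers) computes the per-class components $\ell(f(x), k)$ of CE and MAE for a single softmax output: CE places all of its loss on the labeled class, while MAE also exerts pressure on the non-target classes.

import numpy as np

# Toy softmax output over 4 classes and a (possibly noisy) label y = 2.
p = np.array([0.1, 0.2, 0.6, 0.1])
y = 2
q = np.eye(len(p))[y]              # one-hot target distribution

# Per-class components l(f(x), k) in the sense of Ma et al. (2020):
ce_per_class = -q * np.log(p)      # active: nonzero only at k = y
mae_per_class = np.abs(p - q)      # passive: nonzero for k != y as well

print(ce_per_class)   # ~[0, 0, 0.51, 0]    -> all loss on the labeled class
print(mae_per_class)  # [0.1, 0.2, 0.4, 0.1] -> penalizes non-target classes too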

3. Theoretical Guarantees: Robustness to Noisy Labels

The APL framework builds on results for normalized loss functions (Ma et al., 2020). A loss $\mathcal{L}$ satisfying the symmetric (constant-sum) condition

$$\sum_{k} \mathcal{L}(f(x), k) = C \quad \forall\, x, f,$$

is provably tolerant to symmetric label noise and, under additional conditions on the noise rates, to class-conditional (asymmetric) noise. Normalization enforces this condition for any base loss, since the normalized variant

$$\mathcal{L}_{\mathrm{norm}}(f(x), y) = \frac{\mathcal{L}(f(x), y)}{\sum_j \mathcal{L}(f(x), j)}$$

satisfies it by construction. By choosing appropriately normalized active and passive components, $\mathcal{L}_{\mathrm{APL}}$ therefore inherits noise tolerance, while its non-trivial passive term counteracts the underfitting observed when normalized losses are used alone.
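
Written out, the constant-sum property of any normalized loss follows by summing over the label argument:

$$\sum_{k} \mathcal{L}_{\mathrm{norm}}(f(x), k) = \sum_{k} \frac{\mathcal{L}(f(x), k)}{\sum_{j} \mathcal{L}(f(x), j)} = 1,$$

so the symmetric condition holds with $C = 1$, independent of the base loss.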

4. Practical Implementations and Empirical Results

To instantiate APL in deep learning, typical combinations include:

  • Normalized Cross Entropy (NCE, active) + MAE (passive)
  • Normalized Focal Loss (NFL, active) + RCE or MAE (passive)

A minimal PyTorch-style implementation for integrating APL into a deep learning pipeline, using NCE as the active term and MAE as the passive term, is:

import torch.nn.functional as F

def apl_loss(logits, labels, alpha=1.0, beta=1.0):
    # logits: [batch_size, num_classes]; labels: [batch_size]
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
    # Active term: Normalized Cross Entropy, CE(y) / sum_k CE(k)
    nce_loss = ((one_hot * -log_probs).sum(dim=1) / (-log_probs).sum(dim=1)).mean()
    # Passive term: MAE between the softmax output and the one-hot target
    mae_loss = (probs - one_hot).abs().sum(dim=1).mean()
    return alpha * nce_loss + beta * mae_loss

Empirical evaluation on benchmarks (CIFAR-10/100, WebVision) under high noise rates (e.g., 60-80%) showed that APL consistently achieves higher test accuracy and better convergence than both vanilla and prior robust loss methods (Ma et al., 2020). The weights $\alpha$ and $\beta$ can be tuned to dataset complexity; for example, heavier emphasis on the active term is beneficial on more challenging datasets.
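
As a usage sketch (illustrative only; the weights below are placeholders, not values prescribed by the papers), the apl_loss function above drops into a standard training step:

import torch

model = torch.nn.Linear(32, 10)                      # stand-in for any classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 32)                              # dummy feature batch
y = torch.randint(0, 10, (64,))                      # possibly noisy labels

loss = apl_loss(model(x), y, alpha=10.0, beta=1.0)   # heavier active weight (illustrative)
optimizer.zero_grad()
loss.backward()
optimizer.step()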

5. Extensions: Beyond Symmetric Passive Losses

APL's original passive losses are “symmetric,” relying on a constant sum condition. Recent advances generalize the framework:

  • Active Negative Loss (ANL): Replaces MAE with Normalized Negative Loss Functions (NNLFs), constructed via label complementarity and normalization (Ye et al., 3 Dec 2024). NNLFs adjust the passive gradient adaptively, allowing the model to focus more on clean examples and boosting convergence on large or complex data.
  • Joint Asymmetric Loss (JAL): Introduces the Asymmetric Mean Square Error (AMSE) as a passive loss, breaking the symmetry constraint for greater expressivity and improved fitting (Wang et al., 23 Jul 2025). AMSE reweights the standard mean square error to preferentially penalize the ground-truth class and incorporates theoretical conditions to guarantee the “asymmetric” property, which is associated with improved performance in high-noise regimes.

The generalization to asymmetric passive losses allows more powerful and adaptive noise-robust learning, a property empirically validated on datasets with both symmetric and instance-dependent corruption.
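
Structurally, these extensions retain the two-term APL combination and swap in a different passive (and sometimes active) component. A sketch of that pattern, using a hypothetical passive_fn hook (the exact NNLF and AMSE formulations are defined in the respective papers and not reproduced here):

def apl_loss_generic(logits, labels, active_fn, passive_fn, alpha=1.0, beta=1.0):
    # Same two-term structure as apl_loss above; the passive term is pluggable,
    # so MAE can be replaced by an NNLF- or AMSE-style loss without other changes.
    return alpha * active_fn(logits, labels) + beta * passive_fn(logits, labels)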

6. Application Domains and Broader Contexts

While APL is predominantly referenced in noisy-label learning, similar ideas appear in other research areas:

  • Multiagent Networks: In control theory, “active” and “passive” agents describe networked systems with and without exogenous input, respectively (Yucelen, 2014). The analysis quantifies robustness and convergence dynamics in the presence of agent heterogeneity, and the distributed controller design ensures consensus regardless of the active/passive mix—effectively controlling the “active passive loss” in network performance.
  • Physics and Communications: In dielectric composite materials, “active/passive loss” denotes gain or loss enhancement in engineered media, relying on the constructive interplay of components with positive or negative imaginary permittivity (Mackay et al., 2015). Similarly, in smart grid communications, the monetary impact of passive (eavesdropping) and active (jamming) attacks is analyzed as a quantifiable “active passive loss” in system reliability and estimation cost (1711.02168).

7. Implications, Limitations, and Future Directions

APL and its extensions advance the state of robust learning by balancing theoretical guarantees of noise tolerance with practical considerations of capacity and convergence. Nevertheless, persistent challenges remain:

  • Symmetric passive terms can unduly constrain fitting, leading to underfitting in difficult regimes.
  • Adaptive passive terms (NNLFs, AMSE) introduce new hyperparameters and require careful calibration; their behavior in extremely high-noise or open-set regimes requires further study.
  • The theoretical framework for joint active/asymmetric passive loss is evolving, with papers establishing necessary and sufficient conditions for the preservation of robust minimization properties.

Ongoing research is expanding the APL paradigm to domains such as image segmentation, large-scale weak supervision, and multimodal learning. The flexibility of the approach, in allowing any robust loss to be decomposed and recombined, indicates its potential as a general toolbox for learning under uncertainty and heterogeneity.


Overall, Active Passive Loss (APL) and its derivatives represent a foundational methodology for constructing robust losses by leveraging the complementary strengths of active and passive (symmetric or asymmetric) supervision, underpinned by theoretical analysis and validated in challenging real-world scenarios.