Mass-Decorrelated Discriminant

Updated 24 August 2025
  • Mass-decorrelated discriminants are machine learning classifiers whose outputs separate signal from background while remaining uncorrelated with the invariant mass, preventing mass sculpting in resonance searches.
  • They leverage techniques such as adversarial neural networks, conditional normalizing flows, and transformer-based classifiers with specialized losses to maintain robust signal separation.
  • These methods minimize systematic uncertainties and enhance discovery significance by preserving the smooth background mass spectrum in high energy physics analyses.

A mass-decorrelated discriminant is a statistical or machine learning construct that distinguishes between signal and background events while remaining uncorrelated with the invariant mass of the event, or more generally, with a protected attribute. Techniques for mass decorrelation in discriminant construction are crucial in high energy physics searches for resonances, as they preserve the smooth background shape necessary for reliable data-driven background estimation and minimize systematic uncertainties. This article surveys methodologies, architectures, theoretical principles, empirical implications, and comparative studies related to mass-decorrelated discriminants, focusing on adversarial neural networks, conditional normalizing flows, and transformer-based classifiers.

1. Rationale for Mass Decorrelation

The need for mass-decorrelated discriminants arises in resonance searches, where machine learning classifiers are deployed to separate signal events from background based on event and substructure features. Typical discriminants may inadvertently learn correlations with the invariant mass, causing selection-induced "mass sculpting"—a distortion of the background mass spectrum in regions enriched with signal. This complicates the modeling of backgrounds using sideband data and introduces substantial systematic uncertainties in signal significance estimation, especially when background rates are uncertain.

Mass decorrelation ensures that the classifier output is statistically independent of mass, thereby maintaining the integrity of the background mass shape and enhancing discovery potential, particularly when background model uncertainties are high (Shimmin et al., 2017, Klein et al., 2022, Kim, 2023).

2. Adversarial Neural Network Architecture

A dominant strategy uses adversarial neural networks, consisting of a classifier and an adversary trained jointly via the loss function:

L_{\text{tagger}} = L_{\text{classification}} - \lambda \cdot L_{\text{adversary}}

  • $L_{\text{classification}}$: Standard binomial cross-entropy for signal/background separation.
  • $L_{\text{adversary}}$: Multi-class cross-entropy measuring the adversary’s ability to infer the binned jet mass from the classifier output.
  • $\lambda$: Trade-off hyperparameter controlling the balance between discrimination and decorrelation (values such as $\lambda = 100$ are effective).

The classifier is penalized when its output reveals mass information; the adversary trains to extract mass from the classifier score, which forces the classifier to encode information orthogonal to mass. Training employs stochastic gradient descent, with the adversary given a learning "head start" and an increased learning rate for rapid convergence relative to the classifier. The adversary typically receives the classifier output as input, passing through hidden layers to a softmax output over mass bins. Parameters such as the signal mass hypothesis $m_{Z'}$ may also be supplied as parametric inputs to enable interpolation across multiple signal hypotheses (Shimmin et al., 2017).
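To make the alternating update concrete, the following is a minimal PyTorch sketch of this adversarial scheme. The network sizes, learning rates, number of mass bins, and batch handling are illustrative assumptions for the example, not the configuration of Shimmin et al. (2017).

```python
import torch
import torch.nn as nn

# Illustrative adversarial decorrelation setup; sizes and rates are assumptions.
n_features, n_mass_bins, lam = 10, 20, 100.0   # lambda = 100, as in the text

classifier = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
# The softmax over mass bins is applied implicitly by CrossEntropyLoss.
adversary = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, n_mass_bins),
)

bce, xent = nn.BCELoss(), nn.CrossEntropyLoss()
# The adversary gets a larger learning rate, mirroring its "head start".
opt_clf = torch.optim.SGD(classifier.parameters(), lr=1e-3)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)

def train_step(x, y, mass_bin):
    """x: (B, n_features); y: (B, 1) in {0, 1}; mass_bin: (B,) long tensor."""
    # 1) Adversary update: learn to infer the binned mass from the score.
    score = classifier(x).detach()
    loss_adv = xent(adversary(score), mass_bin)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) Classifier update: L_tagger = L_classification - lambda * L_adversary.
    #    The minus sign rewards scores from which mass cannot be recovered.
    score = classifier(x)
    loss_tagger = bce(score, y) - lam * xent(adversary(score), mass_bin)
    opt_clf.zero_grad(); loss_tagger.backward(); opt_clf.step()
    return loss_tagger.item()
```

In practice the decorrelation term is often evaluated on background events only, so that the mass distribution of the signal class is not flattened away; that bookkeeping is omitted here for brevity.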

3. Conditional Normalizing Flow Approaches

An alternative technique employs conditional normalizing flows (CNFs) to transform discriminant outputs $s(x)$ into mass-decorrelated representations. CNFs model the conditional probability $p_\theta(s(x) \mid m)$ via an invertible neural network $f_\theta$:

\log p_\theta(s(x) \mid m) = \log p(f_\theta(s(x), m)) + \log \left|\det J_{f_\theta}(s(x), m)\right|

where $J_{f_\theta}$ is the Jacobian of the transformation and the base density $p$ is chosen independent of $m$. The mapping $f_\theta$ is conditioned explicitly on $m$, and post-training reflection techniques can enforce monotonicity, preserving the event ordering and thus the signal-background separation at fixed mass. This invertibility guarantees that separation power is unchanged at each value of $m$, as confirmed by constant AUC within mass bins (Klein et al., 2022).
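As an illustration of the construction, the sketch below implements a one-dimensional conditional flow whose invertible, monotone map is a Gaussian-mixture CDF composed with the inverse normal CDF. The mixture parameterization and hypernetwork are assumptions for the example and not the architecture of Klein et al. (2022), but the monotonicity and change-of-variables structure match the description above.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class ConditionalFlow1D(nn.Module):
    """Monotone conditional flow decorrelating a score s from mass m.

    f(s | m) = Phi^{-1}( F_mix(s; theta(m)) ) maps the score to a standard
    normal base variable; monotonicity in s preserves event ordering, so
    signal/background separation at fixed m is unchanged.
    """

    def __init__(self, n_components: int = 8):
        super().__init__()
        # Hypernetwork: mass -> mixture weights, means, log-scales (assumed sizes).
        self.hyper = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(),
            nn.Linear(64, 3 * n_components),
        )
        self.base = D.Normal(0.0, 1.0)

    def _mixture(self, m: torch.Tensor) -> D.MixtureSameFamily:
        logits, mu, log_sig = self.hyper(m.unsqueeze(-1)).chunk(3, dim=-1)
        return D.MixtureSameFamily(
            D.Categorical(logits=logits),
            D.Normal(mu, log_sig.exp()),
        )

    def forward(self, s: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        """Decorrelated score z = f(s, m): ~N(0,1) independent of m when trained."""
        u = self._mixture(m).cdf(s).clamp(1e-6, 1 - 1e-6)
        return self.base.icdf(u)

    def log_prob(self, s: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        # For this CDF construction, log p(f(s,m)) + log|det J| reduces exactly
        # to the conditional mixture log-likelihood of s given m.
        return self._mixture(m).log_prob(s)

# Training: maximize log p_theta(s | m) on background events so that the
# transformed score z is (approximately) independent of mass.
flow = ConditionalFlow1D()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
s, m = torch.rand(512), torch.rand(512)        # placeholder background batch
loss = -flow.log_prob(s, m).mean()
opt.zero_grad(); loss.backward(); opt.step()
```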

4. Transformer-Based Classifiers with DisCo Regularization

Transformer architectures, prominent in natural language processing, have been adapted for event classification with mass decorrelation using specialized losses and regularization. Notably, the following losses and techniques are combined:

  • Extreme Loss $E(\hat{y}, y)$, penalizing confident misclassifications more aggressively than binary cross-entropy, with the explicit formula:

E(\hat{y}, y) = -y \left[-\frac{1}{\hat{y}} + \ln(\hat{y}) - \ln(1 - \hat{y})\right] - (1 - y)\left[\frac{1}{1 - \hat{y}} + \ln(1 - \hat{y}) - \ln(\hat{y})\right]

with $\hat{y}$ constrained to $[0.001, 0.999]$ for stability.

  • Distance Correlation (DisCo) Regularization: a term added to the loss that is zero if and only if the classifier output and the mass are statistically independent, penalizing any residual correlation:

\text{Loss} = \text{Loss}_{\text{classifier}}(\hat{y}, y) + \lambda \cdot \text{DisCo}(m, \hat{y})

  • Data Scope Training: Classifier loss is computed only in a narrow mass window around the signal peak; DisCo penalty is computed over a wider window encompassing much of the background. This dual-scope training forces background mass shape preservation while maximizing signal separation.
  • Significance-Oriented Model Selection: Rather than minimizing loss, model selection is based on maximizing physics significance:

\text{Significance} = \sqrt{2\left[(N_S + N_B) \ln\left(1 + \frac{N_S}{N_B}\right) - N_S\right]}

with bin-wise aggregation (these components are sketched in code below).
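The following is a minimal PyTorch sketch of these loss components and the significance criterion. The dual-window bookkeeping, the value $\lambda = 10$, and the quadrature aggregation across bins are illustrative assumptions rather than the exact recipe of Kim (2023).

```python
import math
import torch

def extreme_loss(y_hat, y, eps=1e-3):
    """Extreme loss E(y_hat, y); y_hat is clipped to [0.001, 0.999] for stability."""
    y_hat = y_hat.clamp(eps, 1.0 - eps)
    pos = -1.0 / y_hat + torch.log(y_hat) - torch.log(1.0 - y_hat)
    neg = 1.0 / (1.0 - y_hat) + torch.log(1.0 - y_hat) - torch.log(y_hat)
    return (-y * pos - (1.0 - y) * neg).mean()

def distance_correlation(a, b):
    """Sample distance correlation of 1-D tensors; vanishes iff a, b independent."""
    A = torch.cdist(a.view(-1, 1), a.view(-1, 1))      # pairwise |a_i - a_j|
    B = torch.cdist(b.view(-1, 1), b.view(-1, 1))
    A = A - A.mean(0, keepdim=True) - A.mean(1, keepdim=True) + A.mean()
    B = B - B.mean(0, keepdim=True) - B.mean(1, keepdim=True) + B.mean()
    dcov2 = (A * B).mean()                             # squared distance covariance
    return dcov2 / ((A * A).mean() * (B * B).mean()).sqrt().clamp_min(1e-12)

def dual_scope_loss(y_hat_narrow, y_narrow, y_hat_wide, m_wide, lam=10.0):
    """Classification in the narrow signal-mass window, DisCo penalty in the
    wide (background-dominated) window; lam = 10 is an illustrative value."""
    return extreme_loss(y_hat_narrow, y_narrow) + lam * distance_correlation(m_wide, y_hat_wide)

def significance(n_s, n_b):
    """sqrt(2 [ (N_S + N_B) ln(1 + N_S / N_B) - N_S ])."""
    return math.sqrt(2.0 * ((n_s + n_b) * math.log1p(n_s / n_b) - n_s))

def binned_significance(ns_bins, nb_bins):
    """Bin-wise aggregation, here in quadrature (a common convention; assumption)."""
    return math.sqrt(sum(significance(ns, nb) ** 2 for ns, nb in zip(ns_bins, nb_bins)))
```

Model checkpoints would then be ranked by `binned_significance` on a validation set rather than by validation loss, matching the significance-oriented selection described above.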

Transformer-based classifiers trained in this way have been shown to outperform conventional feed-forward neural networks and boosted decision trees, both in expected significance and background mass decorrelation (Kim, 2023).

5. Trade-offs and Comparative Performance

The introduction of mass decorrelation constraints typically yields a trade-off: pure signal-background separation (as measured by ROC AUC) can be slightly reduced, but background modeling uncertainties are greatly diminished due to preserved mass shapes. For example, in scenarios with 50% background uncertainty, decorrelated discriminants maintain higher discovery significance compared to non-decorrelated counterparts, which suffer from unreliable sideband extrapolation (Shimmin et al., 2017, Klein et al., 2022, Kim, 2023).

Empirical studies demonstrate that methods such as adversarial networks and CNFs produce classifier outputs that are nearly flat with respect to mass for background events, directly addressing mass sculpting issues, while maintaining discriminative power within mass slices.

| Method | Output-Mass Correlation | Separation Power | Robustness to Background Uncertainty |
|---|---|---|---|
| Traditional NN/BDT | High | High | Poor |
| Adversarial NN | Low | Slightly reduced | Superior |
| Conditional Normalizing Flow | Very low | Preserved within mass bins | Superior |
| Transformer (DisCo, Extreme Loss) | Low | Comparable | Superior |

6. Generalization and Prospects

While mass decorrelation techniques are predominantly developed for jet tagging and resonance searches in high energy physics, the methodology is generalizable to other protected attributes and domains, such as fairness and data anonymity tasks. The use of invertible mappings and attribute-conditioned decorrelation is applicable wherever undesirable correlations compromise analysis integrity or induce systematic bias.

A plausible implication is that further development of conditional decorrelation frameworks may improve anomaly detection, clarify causal inference, and extend to unsupervised settings as needed. Extensions to decorrelate multiple attributes and to integrate model selection based on cross-domain significance metrics are suggested (Klein et al., 2022).

7. Practical Implementation and Benchmarking

Mass-decorrelated discriminants have been benchmarked in specific scenarios, such as tagging boosted $Z' \to q\bar{q}$ decays produced in association with photons, using large-radius jets ($R = 1.0$) and multivariate input features (jet kinematics, $\tau_{21}$, energy correlation functions, photon information). Discriminants constructed with adversarial, normalizing-flow, or specialized transformer networks achieve flatter background mass spectra and improved discovery significance.

Implementation typically involves:

  • Simultaneous training of classifier and decorrelation components (adversary or CNF), using stochastic optimization and careful hyperparameter selection (e.g., the $\lambda$ regularization strength).
  • Event-based selection and loss computation in mass and classifier output windows matched to analysis requirements.
  • Post-training application in analysis regions defined by classifier score bins, using expected significance for model selection.
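As one concrete diagnostic of decorrelation quality (a common convention in sculpting studies, not prescribed by the papers cited here), one can compare the background mass shape before and after a cut on the classifier score. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def sculpting_check(mass_bkg, score_bkg, cut=0.9, bins=40):
    """Compare the background mass shape before and after a score cut.
    A Jensen-Shannon divergence near zero indicates little mass sculpting,
    i.e. a well-decorrelated discriminant; cut and bins are illustrative."""
    rng = (mass_bkg.min(), mass_bkg.max())
    h_all, _ = np.histogram(mass_bkg, bins=bins, range=rng, density=True)
    h_cut, _ = np.histogram(mass_bkg[score_bkg > cut], bins=bins, range=rng, density=True)
    return jensenshannon(h_all, h_cut) ** 2   # squared distance = JS divergence
```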

Continued comparative studies of decorrelated discriminant methods reinforce their impact on minimizing systematic uncertainties and enhancing resonance search sensitivity. The decorrelation paradigm now forms a foundational aspect of high energy physics machine learning analysis workflows.
