Mass-Decorrelated Discriminant

Updated 24 August 2025
  • Mass-decorrelated discriminants are machine learning classifiers whose outputs separate signal from background while remaining uncorrelated with the invariant mass, preventing mass sculpting in resonance searches.
  • They leverage techniques such as adversarial neural networks, conditional normalizing flows, and transformer-based classifiers with specialized losses to maintain robust signal separation.
  • These methods minimize systematic uncertainties and enhance discovery significance by preserving the smooth background mass spectrum in high energy physics analyses.

A mass-decorrelated discriminant is a statistical or machine learning construct that distinguishes between signal and background events while remaining uncorrelated with the invariant mass of the event, or more generally, with a protected attribute. Techniques for mass decorrelation in discriminant construction are crucial in high energy physics searches for resonances, as they preserve the smooth background shape necessary for reliable data-driven background estimation and minimize systematic uncertainties. This article surveys methodologies, architectures, theoretical principles, empirical implications, and comparative studies related to mass-decorrelated discriminants, focusing on adversarial neural networks, conditional normalizing flows, and transformer-based classifiers.

1. Rationale for Mass Decorrelation

The need for mass-decorrelated discriminants arises in resonance searches, where machine learning classifiers are deployed to separate signal events from background based on event and substructure features. Typical discriminants may inadvertently learn correlations with the invariant mass, causing selection-induced "mass sculpting"—a distortion of the background mass spectrum in regions enriched with signal. This complicates the modeling of backgrounds using sideband data and introduces substantial systematic uncertainties in signal significance estimation, especially when background rates are uncertain.

Mass decorrelation ensures that the classifier output is statistically independent of mass, thereby maintaining the integrity of the background mass shape and enhancing discovery potential, particularly when background model uncertainties are high (Shimmin et al., 2017, Klein et al., 2022, Kim, 2023).

2. Adversarial Neural Network Architecture

A dominant strategy uses adversarial neural networks, consisting of a classifier and an adversary trained jointly via the loss function:

L_{\text{tagger}} = L_{\text{classification}} - \lambda \cdot L_{\text{adversary}}

  • $L_{\text{classification}}$: Standard binomial cross-entropy for signal/background separation.
  • $L_{\text{adversary}}$: Multi-class cross-entropy measuring the adversary’s ability to infer the binned jet mass from the classifier output.
  • $\lambda$: Trade-off hyperparameter controlling the balance between discrimination and decorrelation (values such as $\lambda = 100$ are effective).

The classifier is penalized when its output reveals mass information; the adversary trains to extract mass from the classifier score, which forces the classifier to encode information orthogonal to mass. Training employs stochastic gradient descent, with the adversary given a learning "head start" and an increased learning rate for rapid convergence relative to the classifier. The adversary typically receives the classifier output as input, passing through hidden layers to a softmax output over mass bins. Parameters such as the signal mass hypothesis $m_{Z'}$ may also be supplied as parametric inputs to enable interpolation across multiple signal hypotheses (Shimmin et al., 2017).
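To make the alternating update concrete, the following is a minimal PyTorch sketch of this adversarial scheme. The network sizes, learning rates, number of mass bins, and batch handling are illustrative assumptions for the example, not the configuration of Shimmin et al. (2017).

```python
import torch
import torch.nn as nn

# Illustrative adversarial decorrelation setup; sizes and rates are assumptions.
n_features, n_mass_bins, lam = 10, 20, 100.0   # lambda = 100, as in the text

classifier = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
# The softmax over mass bins is applied implicitly by CrossEntropyLoss.
adversary = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, n_mass_bins),
)

bce, xent = nn.BCELoss(), nn.CrossEntropyLoss()
# The adversary gets a larger learning rate, mirroring its "head start".
opt_clf = torch.optim.SGD(classifier.parameters(), lr=1e-3)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)

def train_step(x, y, mass_bin):
    """x: (B, n_features); y: (B, 1) in {0, 1}; mass_bin: (B,) long tensor."""
    # 1) Adversary update: learn to infer the binned mass from the score.
    score = classifier(x).detach()
    loss_adv = xent(adversary(score), mass_bin)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) Classifier update: L_tagger = L_classification - lambda * L_adversary.
    #    The minus sign rewards scores from which mass cannot be recovered.
    score = classifier(x)
    loss_tagger = bce(score, y) - lam * xent(adversary(score), mass_bin)
    opt_clf.zero_grad(); loss_tagger.backward(); opt_clf.step()
    return loss_tagger.item()
```

In practice the decorrelation term is often evaluated on background events only, so that the mass distribution of the signal class is not flattened away; that bookkeeping is omitted here for brevity.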

3. Conditional Normalizing Flow Approaches

An alternative technique employs conditional normalizing flows (CNFs) to transform discriminant outputs $s(x)$ into mass-decorrelated representations. CNFs model the conditional probability $p_\theta(s(x) \mid m)$ via an invertible neural network $f_\theta$:

\log p_\theta(s(x) \mid m) = \log p(f_\theta(s(x), m)) + \log \left|\det J_{f_\theta}(s(x), m)\right|

where $J_{f_\theta}$ is the Jacobian of the transformation and the base density $p$ is chosen independent of $m$. The mapping $f_\theta$ is conditioned explicitly on $m$, and post-training reflection techniques can enforce monotonicity, preserving the event ordering and thus the signal-background separation at fixed mass. This invertibility guarantees that separation power is unchanged at each value of $m$, as confirmed by constant AUC within mass bins (Klein et al., 2022).
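As an illustration of the construction, the sketch below implements a one-dimensional conditional flow whose invertible, monotone map is a Gaussian-mixture CDF composed with the inverse normal CDF. The mixture parameterization and hypernetwork are assumptions for the example and not the architecture of Klein et al. (2022), but the monotonicity and change-of-variables structure match the description above.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class ConditionalFlow1D(nn.Module):
    """Monotone conditional flow decorrelating a score s from mass m.

    f(s | m) = Phi^{-1}( F_mix(s; theta(m)) ) maps the score to a standard
    normal base variable; monotonicity in s preserves event ordering, so
    signal/background separation at fixed m is unchanged.
    """

    def __init__(self, n_components: int = 8):
        super().__init__()
        # Hypernetwork: mass -> mixture weights, means, log-scales (assumed sizes).
        self.hyper = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(),
            nn.Linear(64, 3 * n_components),
        )
        self.base = D.Normal(0.0, 1.0)

    def _mixture(self, m: torch.Tensor) -> D.MixtureSameFamily:
        logits, mu, log_sig = self.hyper(m.unsqueeze(-1)).chunk(3, dim=-1)
        return D.MixtureSameFamily(
            D.Categorical(logits=logits),
            D.Normal(mu, log_sig.exp()),
        )

    def forward(self, s: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        """Decorrelated score z = f(s, m): ~N(0,1) independent of m when trained."""
        u = self._mixture(m).cdf(s).clamp(1e-6, 1 - 1e-6)
        return self.base.icdf(u)

    def log_prob(self, s: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        # For this CDF construction, log p(f(s,m)) + log|det J| reduces exactly
        # to the conditional mixture log-likelihood of s given m.
        return self._mixture(m).log_prob(s)

# Training: maximize log p_theta(s | m) on background events so that the
# transformed score z is (approximately) independent of mass.
flow = ConditionalFlow1D()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
s, m = torch.rand(512), torch.rand(512)        # placeholder background batch
loss = -flow.log_prob(s, m).mean()
opt.zero_grad(); loss.backward(); opt.step()
```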

4. Transformer-Based Classifiers with DisCo Regularization

Transformer architectures, prominent in natural language processing, have been adapted for event classification with mass decorrelation using specialized losses and regularization. Notably, the following losses and techniques are combined:

  • Extreme Loss $E(\hat{y}, y)$, penalizing confident misclassifications more aggressively than binary cross-entropy, with the explicit formula:

E(\hat{y}, y) = -y \left[-\frac{1}{\hat{y}} + \ln(\hat{y}) - \ln(1 - \hat{y})\right] - (1 - y)\left[\frac{1}{1 - \hat{y}} + \ln(1 - \hat{y}) - \ln(\hat{y})\right]

with $\hat{y}$ constrained to $[0.001, 0.999]$ for stability.

  • Distance Correlation (DisCo) Regularization: a term added to the loss that is zero if and only if the classifier output and the mass are statistically independent, penalizing any residual correlation:

\text{Loss} = \text{Loss}_{\text{classifier}}(\hat{y}, y) + \lambda \cdot \text{DisCo}(m, \hat{y})

  • Data Scope Training: Classifier loss is computed only in a narrow mass window around the signal peak; DisCo penalty is computed over a wider window encompassing much of the background. This dual-scope training forces background mass shape preservation while maximizing signal separation.
  • Significance-Oriented Model Selection: Rather than minimizing loss, model selection is based on maximizing physics significance:

\text{Significance} = \sqrt{2\left[(N_S + N_B) \ln\left(1 + \frac{N_S}{N_B}\right) - N_S\right]}

with bin-wise aggregation (these components are sketched in code below).
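The following is a minimal PyTorch sketch of these loss components and the significance criterion. The dual-window bookkeeping, the value $\lambda = 10$, and the quadrature aggregation across bins are illustrative assumptions rather than the exact recipe of Kim (2023).

```python
import math
import torch

def extreme_loss(y_hat, y, eps=1e-3):
    """Extreme loss E(y_hat, y); y_hat is clipped to [0.001, 0.999] for stability."""
    y_hat = y_hat.clamp(eps, 1.0 - eps)
    pos = -1.0 / y_hat + torch.log(y_hat) - torch.log(1.0 - y_hat)
    neg = 1.0 / (1.0 - y_hat) + torch.log(1.0 - y_hat) - torch.log(y_hat)
    return (-y * pos - (1.0 - y) * neg).mean()

def distance_correlation(a, b):
    """Sample distance correlation of 1-D tensors; vanishes iff a, b independent."""
    A = torch.cdist(a.view(-1, 1), a.view(-1, 1))      # pairwise |a_i - a_j|
    B = torch.cdist(b.view(-1, 1), b.view(-1, 1))
    A = A - A.mean(0, keepdim=True) - A.mean(1, keepdim=True) + A.mean()
    B = B - B.mean(0, keepdim=True) - B.mean(1, keepdim=True) + B.mean()
    dcov2 = (A * B).mean()                             # squared distance covariance
    return dcov2 / ((A * A).mean() * (B * B).mean()).sqrt().clamp_min(1e-12)

def dual_scope_loss(y_hat_narrow, y_narrow, y_hat_wide, m_wide, lam=10.0):
    """Classification in the narrow signal-mass window, DisCo penalty in the
    wide (background-dominated) window; lam = 10 is an illustrative value."""
    return extreme_loss(y_hat_narrow, y_narrow) + lam * distance_correlation(m_wide, y_hat_wide)

def significance(n_s, n_b):
    """sqrt(2 [ (N_S + N_B) ln(1 + N_S / N_B) - N_S ])."""
    return math.sqrt(2.0 * ((n_s + n_b) * math.log1p(n_s / n_b) - n_s))

def binned_significance(ns_bins, nb_bins):
    """Bin-wise aggregation, here in quadrature (a common convention; assumption)."""
    return math.sqrt(sum(significance(ns, nb) ** 2 for ns, nb in zip(ns_bins, nb_bins)))
```

Model checkpoints would then be ranked by `binned_significance` on a validation set rather than by validation loss, matching the significance-oriented selection described above.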

Transformer-based classifiers trained in this way have been shown to outperform conventional feed-forward neural networks and boosted decision trees, both in expected significance and background mass decorrelation (Kim, 2023).

5. Trade-offs and Comparative Performance

The introduction of mass decorrelation constraints typically yields a trade-off: pure signal-background separation (as measured by ROC AUC) can be slightly reduced, but background modeling uncertainties are greatly diminished due to preserved mass shapes. For example, in scenarios with 50% background uncertainty, decorrelated discriminants maintain higher discovery significance compared to non-decorrelated counterparts, which suffer from unreliable sideband extrapolation (Shimmin et al., 2017, Klein et al., 2022, Kim, 2023).

Empirical studies demonstrate that methods such as adversarial networks and CNFs produce classifier outputs that are nearly flat with respect to mass for background events, directly addressing mass sculpting issues, while maintaining discriminative power within mass slices.

| Method | Output-Mass Correlation | Separation Power | Robustness to Background Uncertainty |
|---|---|---|---|
| Traditional NN/BDT | High | High | Poor |
| Adversarial NN | Low | Slightly reduced | Superior |
| Conditional Normalizing Flow | Very low | Preserved within mass bins | Superior |
| Transformer (DisCo, Extreme Loss) | Low | Comparable | Superior |

6. Generalization and Prospects

While mass decorrelation techniques are predominantly developed for jet tagging and resonance searches in high energy physics, the methodology is generalizable to other protected attributes and domains, such as fairness and data anonymity tasks. The use of invertible mappings and attribute-conditioned decorrelation is applicable wherever undesirable correlations compromise analysis integrity or induce systematic bias.

A plausible implication is that further development of conditional decorrelation frameworks may improve anomaly detection, clarify causal inference, and extend to unsupervised settings as needed. Extensions to decorrelate multiple attributes and to integrate model selection based on cross-domain significance metrics are suggested (Klein et al., 2022).

7. Practical Implementation and Benchmarking

Mass-decorrelated discriminants have been benchmarked in specific scenarios, such as tagging boosted $Z' \to q\bar{q}$ decays produced in association with photons, using large-radius jets ($R = 1.0$) and multivariate input features (jet kinematics, $\tau_{21}$, energy correlation functions, photon information). Discriminants constructed with adversarial, normalizing-flow, or specialized transformer networks achieve flatter background mass spectra and improved discovery significance.

Implementation typically involves:

  • Simultaneous training of classifier and decorrelation components (adversary or CNF), using stochastic optimization and careful hyperparameter selection (e.g., the $\lambda$ regularization strength).
  • Event-based selection and loss computation in mass and classifier output windows matched to analysis requirements.
  • Post-training application in analysis regions defined by classifier score bins, using expected significance for model selection.
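As one concrete diagnostic of decorrelation quality (a common convention in sculpting studies, not prescribed by the papers cited here), one can compare the background mass shape before and after a cut on the classifier score. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def sculpting_check(mass_bkg, score_bkg, cut=0.9, bins=40):
    """Compare the background mass shape before and after a score cut.
    A Jensen-Shannon divergence near zero indicates little mass sculpting,
    i.e. a well-decorrelated discriminant; cut and bins are illustrative."""
    rng = (mass_bkg.min(), mass_bkg.max())
    h_all, _ = np.histogram(mass_bkg, bins=bins, range=rng, density=True)
    h_cut, _ = np.histogram(mass_bkg[score_bkg > cut], bins=bins, range=rng, density=True)
    return jensenshannon(h_all, h_cut) ** 2   # squared distance = JS divergence
```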

Continued comparative studies of decorrelated discriminant methods reinforce their impact on minimizing systematic uncertainties and enhancing resonance search sensitivity. The decorrelation paradigm now forms a foundational aspect of high energy physics machine learning analysis workflows.
