Multiplicity-Based Selection Methods
- Multiplicity-based selection is a framework of tools that quantifies and controls redundancy in high-volume, complex environments such as collider experiments and statistical model spaces.
- It employs ranking algorithms, veto procedures, and sparsity-inducing Bayesian priors to enhance signal purity, reduce false positives, and improve overall inference quality.
- This approach is applied in jet background correction, nuclear collision centrality assessment, and fairness in predictive modeling to mitigate biases from combinatorial complexity.
Multiplicity-based selection refers to a class of methodologies, criteria, and formal tools designed to navigate, quantify, and control the consequences of redundancy, combinatorial complexity, or model indeterminacy that arise whenever scientific or statistical selection is performed over highly repeated, redundant, or combinatorially large structures. It appears both in experimental physics—where multiplicities count numbers of final-state particles, jets, or event classes—and in statistical or machine-learning settings—where multiplicity refers to the number of hypothesis tests, model candidates, or near-optimal solutions. Multiplicity-based selection addresses the challenges posed by these high-multiplicity environments, delivering greater purity, more robust inference, or controlled selection bias relative to naive or unadjusted approaches.
1. Multiplicity in High-Energy Physics: Combinatorial Selection
High-multiplicity environments are characteristic of high-energy collider experiments, where each event produces dozens of final-state particles. Signal reconstruction (such as the identification of meson candidates) is hampered by a combinatorial background from random pairings of decay products, leading to low-purity, high-fake-rate selections under simple cut-based approaches.
Multiplicity-based selection tackles this via ranking-based algorithms, in which candidate decays are prioritized according to a likelihood or classifier score—such as a probability density estimator (PDE) or a boosted decision tree (BDT)—built from kinematic observables, with functional forms empirically fit to maximize signal purity (Bingül et al., 2018).
After scoring, a veto procedure enforces exclusive assignment: candidates sharing a daughter are resolved in favor of the highest-ranked, ensuring each decay product appears in at most one candidate. This "multiplicity-based selection" significantly suppresses combinatorial background and overlap, systematically improving purity by a factor of ≈2 (from 8–10% up to 18–20%) and the figure-of-merit by 16–27%, trading some efficiency for a much cleaner signal (Bingül et al., 2018). This approach generalizes to any two-body neutral final state in high-multiplicity environments.
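The rank-and-veto procedure above can be sketched as follows. This is a minimal illustration: the candidate scores and daughter indices are invented, and in practice the score would come from the fitted PDE or BDT.

```python
def select_candidates(candidates):
    """Greedy multiplicity-based selection: rank candidates by score,
    then veto any candidate sharing a daughter with a higher-ranked one,
    so each decay product appears in at most one selected candidate."""
    used = set()        # daughter indices already assigned
    selected = []
    # Highest score first (score would come from a PDE or BDT in practice)
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        daughters = set(cand["daughters"])
        if daughters & used:
            continue    # veto: a daughter already belongs to a better candidate
        used |= daughters
        selected.append(cand)
    return selected

# Toy event: three pairings built from four photons (indices 0-3)
event = [
    {"daughters": (0, 1), "score": 0.92},
    {"daughters": (1, 2), "score": 0.85},   # vetoed: shares photon 1
    {"daughters": (2, 3), "score": 0.40},
]
print([c["daughters"] for c in select_candidates(event)])
# → [(0, 1), (2, 3)]
```

The greedy pass over the sorted list guarantees exclusivity in a single sweep, at the cost of some efficiency when a vetoed candidate was in fact the true signal.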
2. Multiplicity-Based Centrality Selection and Subtraction in Nuclear Collisions
Event activity or centrality in heavy-ion and small-system collisions is typically assigned via charged-particle multiplicity in specified rapidity regions (e.g., "refmult-3" and "refmult-2" in Au+Au for midrapidity, or V0A/V0C in ALICE p+p). The selection of multiplicity estimators and the exclusion of specific species (e.g., protons, to avoid autocorrelation if protons are signal) critically impacts the bias, background, and resolution of fluctuation measurements (Chatterjee et al., 2019, Hushnud et al., 2023).
Multiplicity-based background subtraction in jet analyses leverages constituent counting, rather than geometric quantities alone, to estimate and subtract background on an event-by-event basis. The corrected jet $p_T$ using multiplicity reads
$$p_T^{\text{corr}} = p_T^{\text{raw}} - \langle p_T \rangle \left( N_{\text{const}} - \langle N_{\text{signal}} \rangle \right),$$
with the mean background $\langle p_T \rangle$ per particle estimated per event and the expected signal multiplicity $\langle N_{\text{signal}} \rangle$ taken from a pp reference. This method removes both the mean background and the leading-order fluctuations (Poissonian and flow-induced), outperforming canonical area-based subtraction especially at high multiplicities and low jet $p_T$. Minimal sufficient statistics—multiplicity and mean per-particle $p_T$—enable simple, interpretable implementations that rival deep-learning approaches for jet background correction (Mengel et al., 2024).
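A minimal sketch of this subtraction, assuming the corrected jet $p_T$ is the raw $p_T$ minus the per-event mean background $p_T$ per particle times the estimated number of background constituents; all numbers below are invented for illustration.

```python
def corrected_jet_pt(raw_pt, n_const, mean_bkg_pt, n_signal):
    """Multiplicity-based background subtraction (sketch):
    subtract the event's mean background pT per particle times the
    estimated count of background constituents in the jet."""
    n_bkg = max(n_const - n_signal, 0.0)   # clamp: never subtract negative background
    return raw_pt - mean_bkg_pt * n_bkg

# Hypothetical numbers: a 40 GeV raw jet with 25 constituents,
# ~8 signal constituents expected from the pp reference,
# and 0.7 GeV mean background pT per particle in this event.
print(corrected_jet_pt(40.0, 25, 0.7, 8))   # prints roughly 28.1 (GeV)
```

Because both inputs are per-event, the correction tracks event-by-event background fluctuations that a fixed, area-based average would miss.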
3. Multiplicity in Variable and Model Selection: Statistical and Bayesian Perspectives
Multiplicity arises in statistical variable/model selection due to exponentially large hypothesis/model spaces. Uncorrected search over models inflates false positives (the "multiplicity problem"). Bayesian variable selection addresses this by imposing priors on model spaces that penalize large, complex, or non-sparse models as the number of candidate predictors $p$ grows—automatically suppressing over-selection and controlling the family-wise error rate as dimensionality increases (Scott et al., 2010, Ghosh, 27 Dec 2025).
Standard model-space priors include the Beta–Binomial and, more recently, the matryoshka-doll (MD) prior. For a specific model with $k$ nonzero regressors out of $p$ candidates, these yield:
- Beta–Binomial: under a Beta$(a, b)$ hyperprior on the common inclusion probability, $p(\gamma) = B(a+k,\, b+p-k)/B(a, b)$, so the prior mass of any given model shrinks rapidly with its size.
- MD prior: defined directly on a nested sequence of models and remaining proper even as $p \to \infty$, penalizing model size without fixing the number of candidates in advance.
Independent-Bernoulli approximations allow efficient, scalable inference without sacrificing limiting behavior. Simulation studies show that strong sparsity-inducing priors (e.g., Beta$(1, b)$ with large $b$, or MD) dramatically improve inclusion recovery in sparse regimes (low RMSE), while weak priors (uniform) fail to control false discoveries (Ghosh, 27 Dec 2025). Empirical Bayes strategies make the multiplicity penalty adaptive but can collapse to degenerate (all-in/all-out) solutions, whereas fully Bayesian hyperpriors ensure global multiplicity adjustment and coherent inference even as $p$ increases (Scott et al., 2010).
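The size penalty induced by a Beta–Binomial model-space prior can be checked numerically. The sketch below uses the standard Beta-function identity with a Beta(1, 1) default; the choice $k = 5$ is purely illustrative.

```python
from math import lgamma, exp

def log_beta(a, b):
    """log B(a, b) via log-gamma, numerically stable for large arguments."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_model_prior(k, p, a=1.0, b=1.0):
    """Log prior mass of one specific model with k of p variables included,
    under a Beta(a, b) hyperprior on the common inclusion probability."""
    return log_beta(a + k, b + p - k) - log_beta(a, b)

# Multiplicity penalty: the prior odds of a specific size-(k+1) model
# versus a specific size-k model shrink as p grows (here k = 5).
for p in (10, 100, 1000):
    odds = exp(log_model_prior(6, p) - log_model_prior(5, p))
    print(p, round(odds, 4))   # odds fall toward zero as p grows
```

With $a = b = 1$ the odds reduce to $(k+1)/(p-k)$, making explicit how the penalty for adding one more variable strengthens automatically with dimensionality.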
4. Multiplicity-Based Selection in Predictive Modeling and Machine Learning
The Rashomon effect in machine learning identifies the existence of large sets of statistically indistinguishable models (the Rashomon set), each making different decisions for substantial subsets of the data. Multiplicity-based selection in this context involves:
- Quantifying the extent of predictive disagreement (e.g., via discrepancy and obscurity rates—maximal and mean per-example disagreement, respectively);
- Assessing how data-preprocessing methods such as class balancing and filtering (variable selection) modulate multiplicity.
Large, imbalanced, and high-complexity datasets exacerbate predictive multiplicity, with certain synthetic balancing strategies (e.g., ANSMOTE or Near Miss) further elevating the risk of arbitrary outcomes (median discrepancy up to 0.75; obscurity up to 0.22), while filtering reduces multiplicity and leads to more robust, trustworthy predictions. Multiplicity mitigation requires complexity-aware data handling, a careful balance between accuracy and stability, and ensemble or Rashomon-aware validation strategies (Cavus et al., 2024).
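One plausible way to compute such disagreement statistics over a Rashomon set, treating discrepancy as the maximum and obscurity as the mean pairwise disagreement rate; this reading of the exact definitions is an assumption, and the toy predictions are invented.

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of examples on which two models disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def discrepancy_obscurity(model_preds):
    """Discrepancy = max pairwise disagreement across the model set;
    obscurity = mean pairwise disagreement (assumed reading of the
    definitions in the text)."""
    rates = [disagreement(a, b) for a, b in combinations(model_preds, 2)]
    return max(rates), sum(rates) / len(rates)

# Three near-equally-accurate models' binary predictions on 5 examples
preds = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
]
print(discrepancy_obscurity(preds))
```

High discrepancy with low obscurity indicates one outlier model; high values of both indicate pervasive arbitrariness across the whole set.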
5. Model Multiplicity, Arbitrariness, and Fairness
Multiplicity-based selection in the presence of many nearly-optimal models introduces arbitrariness: the final choice among them is rarely principled. This arbitrariness unevenly burdens individuals and protected groups; group-wise differences in self-consistency (the probability that different Rashomon models agree on a given outcome) reveal disparate vulnerability to arbitrary model choice (Ganesh et al., 2024).
Formally, suppose $m$ models are drawn from the Rashomon set, with $m_0$ of them assigning outcome 0 to an instance $x$; the self-consistency of $x$ is
$$\mathrm{SC}(x) = \frac{m_0(m_0 - 1) + (m - m_0)(m - m_0 - 1)}{m(m - 1)},$$
the probability that two models drawn without replacement agree on $x$. Measures of self-consistency can be calculated for prediction, robustness, and privacy outcomes. Empirical studies show that enforcing aggregate fairness constraints often reduces ensemble size but increases individual-level arbitrariness, especially for minority groups. In legal contexts (e.g., under Canadian antidiscrimination law), higher multiplicity for protected groups is argued to constitute "adverse impact" and thus a prima facie case for discrimination unless justified through necessity (Ganesh et al., 2024).
Selection criteria that incorporate multiplicity (such as selecting models with both high accuracy and high self-consistency for the most vulnerable individuals) are critical for ensuring decision-level fairness and legal compliance.
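Self-consistency as described above can be computed directly. The function below implements the standard "probability that two models drawn without replacement agree" estimator; the counts in the example are invented.

```python
def self_consistency(m0, m):
    """Probability that two models drawn without replacement from a set
    of m models (m0 of which predict outcome 0 for this example) agree
    on the example's outcome."""
    m1 = m - m0
    return (m0 * (m0 - 1) + m1 * (m1 - 1)) / (m * (m - 1))

# 100 bootstrap models: an example classified 0 by exactly half is
# maximally contested; one classified 0 by 95 is far more stable.
print(self_consistency(50, 100))   # ≈ 0.495 (near-coin-flip)
print(self_consistency(95, 100))   # ≈ 0.904 (stable outcome)
```

Auditing the distribution of this quantity per group, rather than aggregate accuracy alone, is what exposes the group-wise vulnerability to arbitrary model choice discussed above.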
6. Resource-Constrained and Multi-Target Multiplicity
In resource-constrained settings (e.g., allocation of scarce treatments), multiplicity-based selection quantifies the sensitivity of the top-$k$ selected individuals to modeling choices, including the choice of the predictive target itself. With multiple plausible outcome indices $j = 1, \dots, T$, reweighting or combining targets yields even larger swings in allocation than varying the model alone: the top-$k$ sets $S_k^{(j)}$, where $S_k^{(j)}$ is the set of top-$k$ units under outcome index $j$, can overlap only weakly across targets (Watson-Daniels et al., 2023). This multi-target multiplicity can be leveraged to reduce group disparities, but it first requires rigorous measurement and an explicit audit of model, data, and target choices.
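A simple audit of multi-target multiplicity, assuming it is measured via the overlap of top-$k$ sets across plausible targets; the scores and the overlap metric below are illustrative, not the paper's exact measure.

```python
def top_k(scores, k):
    """Indices of the k highest-scoring units under one outcome index."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def topk_overlap(score_lists, k):
    """Fraction of the k slots occupied by units that are selected under
    EVERY plausible outcome index -- a crude multi-target audit."""
    sets = [top_k(scores, k) for scores in score_lists]
    return len(set.intersection(*sets)) / k

# Hypothetical risk scores for 6 individuals under two plausible targets
target_a = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
target_b = [0.2, 0.9, 0.1, 0.8, 0.7, 0.3]
print(topk_overlap([target_a, target_b], k=3))   # ≈ 0.33: only 1 of 3 slots stable
```

An overlap near 1 means the allocation is robust to the target definition; an overlap near 0 means the choice of target, not the individuals' merit, is driving who is selected.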
7. Multiplicity-Based Selection in Algebra: Integral Dependence
In commutative algebra, multiplicity-based selection appears as a tool for identifying when two ideals have the same integral closure: the equivalence holds if and only if their entire multiplicity sequences agree. Thus, in a family of ideals, the Zariski-open locus where all terms of the multiplicity sequence are constant constitutes the correct selection subset with integrally closed fibers. This forms the algebraic analog of the robust, multiplicity-vetted selection criteria used in the physical and statistical sciences (Polini et al., 2020).
References
- "A Ranking Method For Selection Of Mesons In High Multiplicity Events" (Bingül et al., 2018)
- "Centrality selection effect on higher-order cumulants of net-proton multiplicity distributions in relativistic heavy-ion collisions" (Chatterjee et al., 2019)
- "Multiplicity Based Background Subtraction for Jets in Heavy Ion Collisions" (Mengel et al., 2024)
- "On the Choice of Model Space Priors and Multiplicity Control in Bayesian Variable Selection: An Application to Streaming Logistic Regression" (Ghosh, 27 Dec 2025)
- "Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem" (Scott et al., 2010)
- "Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective" (Cavus et al., 2024)
- "The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity" (Ganesh et al., 2024)
- "Multi-Target Multiplicity: Flexibility and Fairness in Target Specification under Resource Constraints" (Watson-Daniels et al., 2023)
- "Multiplicity sequence and integral dependence" (Polini et al., 2020)
- "Effect of event classifiers on jet quenching-like signatures in high-multiplicity collisions at TeV" (Hushnud et al., 2023)
- "Apparent strangeness enhancement from multiplicity selection in high energy proton-proton collisions" (Loizides et al., 2021)