Photon–Pion Discrimination in Colliders

Updated 15 November 2025

Photon–pion discrimination is the process of distinguishing single prompt photons from neutral pion decay photons by analyzing overlapping electromagnetic showers in high-energy collider environments.
Machine-learning methods such as BDTs, DNNs, and ResNets operate on engineered shower-shape features or raw calorimeter cell energies, achieving AUC up to 0.965 and signal efficiencies of 84% at a false positive rate (FPR) of 10⁻³.
Techniques like soft scoring and auxiliary ΔR regression integrate physics insights into end-to-end CNN models, enhancing discrimination performance under high-pileup conditions.

Photon–pion ( $\gamma$ – $\pi^0$ ) discrimination refers to the classification task in high-energy physics of distinguishing single prompt photons from background photons originating from neutral pion ( $\pi^0$ ) decays. The challenge is acute in high-luminosity collider environments such as the LHC, where the high rate of $\pi^0\to\gamma\gamma$ decays in QCD jets leads to overlapping electromagnetic (EM) showers within the calorimeter granularity. The regime of collimated, merged showers (with $\Delta R_{\gamma\gamma}$ below the calorimeter cell size) is particularly difficult for traditional, hand-engineered variables, which motivates applying machine learning techniques to high-granularity detector data.

1. Detector Simulations and Calorimeter Segmentation

Photon–pion discrimination studies are grounded in full Geant4-based detector simulations with realistic geometry, as realized using the COCOA-HEP simulation framework to approximate the ATLAS EM calorimeter at $\sqrt{s}=14$ TeV. The EM calorimeter is structured into three longitudinal layers (EM1, EM2, EM3), each with distinct transverse segmentation:

EM1, EM2: $\Delta\eta\times\Delta\phi=0.025\times0.0245$
EM3: $\Delta\eta\times\Delta\phi=0.050\times0.0491$

The hadronic (HAD) calorimeter is more coarsely segmented, with three layers (HAD1–HAD3) ranging from $0.100\times0.0982$ up to $0.200\times0.1965$ in $\Delta\eta\times\Delta\phi$ . Each candidate shower is extracted as a region of interest (ROI) approximately $0.3\times0.295$ in $(\eta,\phi)$ and processed into tensor representations for the EM ( $3\times12\times12$ ) and HAD ( $3\times3\times3$ ) sections.

Signal events comprise prompt photons with $p_T\in[10,100]$ GeV ( $1.1\times10^6$ events), while the dominant background is QCD $\pi^0\to\gamma\gamma$ ( $4.2\times10^6$ events). About two-thirds of the $\pi^0$ decays have $\Delta R_{\gamma\gamma}<0.025$ , smaller than the EM1/EM2 cell size, so their showers overlap and cannot be resolved at cell granularity.
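The link between pion energy and shower merging can be checked roughly with the standard two-body result that the minimum $\gamma\gamma$ opening angle satisfies $\sin(\theta_{\min}/2)=m_{\pi^0}/E$. The sketch below is illustrative only: it assumes a central pion ( $\eta\approx0$ , so $E\approx p_T$ and $\Delta R\approx\theta$ ), and the function name and the chosen $p_T$ values are not taken from the study.

```python
import math

M_PI0 = 0.1349768  # neutral pion mass in GeV
EM_CELL = 0.025    # EM1/EM2 cell size in eta, as quoted in the text

def min_opening_angle(energy_gev: float) -> float:
    """Minimum gamma-gamma opening angle (rad) for a pi0 of the given energy.

    The symmetric decay gives the smallest angle: sin(theta/2) = m / E.
    """
    return 2.0 * math.asin(M_PI0 / energy_gev)

for pt in (10.0, 20.0, 50.0, 100.0):
    theta = min_opening_angle(pt)  # central pion assumed: E ~ pT
    print(f"pT = {pt:5.1f} GeV  theta_min = {theta:.4f}  "
          f"below cell size: {theta < EM_CELL}")
```

Under these assumptions the minimum separation falls below the 0.025 cell size once the pion energy exceeds roughly 11 GeV, so merged showers are expected over most of the 10–100 GeV range.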

2. High-Level Shower-Shape Variables

Traditional discrimination algorithms utilize engineered "shower-shape" variables based on integrations over specific geometrical windows of the calorimeter. Twenty such features are defined, following ATLAS conventions. Key variables include:

Hadronic leakage: $R_{\mathrm{had}} = \dfrac{E_T(\text{HAD1}+\text{HAD2}+\text{HAD3})}{E_T(\text{EM1}+\text{EM2}+\text{EM3})}$
Isolation fraction: $\mathrm{Iso} = \dfrac{\sum_{i\in 12\times 12} E_T(i)}{\sum_{j\in 20\times 20} E_T(j)}$
Layer energy fractions: $F_{\mathrm{EM}X} = \dfrac{\sum_{i\in\mathrm{EM}X} E_T(i)}{\sum_{\rm all\,layers} E_T(i)}$
Lateral shower widths (in EM2):

$W_{\eta2} = \sqrt{\frac{\sum_i E_i\,\eta_i^2}{\sum_i E_i} - \left(\frac{\sum_i E_i\,\eta_i}{\sum_i E_i}\right)^2}$

with analogous $W_{\phi2}$

Energy ratio (EM2): $E_{\mathrm{ratio}} = \dfrac{E_1-E_2}{E_1+E_2}$ (leading two cells)
Leading cell separation: $E_{dR} = \Delta R((\eta, \phi)_1, (\eta, \phi)_2)$

Additional analogous moments and ratios are constructed in both EM1 and EM3, resulting in a 20-dimensional feature vector per candidate.
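A few of the variables above can be sketched in NumPy to make the definitions concrete. This is a minimal illustration, not the study's feature code. It assumes tensors ordered as (layer, $\eta$, $\phi$) holding $E_T$ per cell, cell-centre coordinates measured from the ROI centre, and "all layers" in the layer fractions meaning EM plus HAD. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def shower_shapes(em, had, d_eta=0.025, d_phi=0.0245):
    """Compute selected shower-shape variables from EM (3,12,12) and HAD (3,3,3) tensors."""
    em_total = em.sum()
    r_had = had.sum() / em_total                          # hadronic leakage
    f_em = em.sum(axis=(1, 2)) / (em_total + had.sum())   # per-layer EM fractions

    em2 = em[1]                                           # second EM layer
    n_eta, n_phi = em2.shape
    eta = (np.arange(n_eta) - (n_eta - 1) / 2) * d_eta    # cell centres in eta
    phi = (np.arange(n_phi) - (n_phi - 1) / 2) * d_phi    # cell centres in phi
    eta_grid, _ = np.meshgrid(eta, phi, indexing="ij")

    # Energy-weighted lateral width in eta (clipped at zero against rounding).
    w = em2 / em2.sum()
    mean_eta = (w * eta_grid).sum()
    w_eta2 = np.sqrt(max((w * eta_grid**2).sum() - mean_eta**2, 0.0))

    # Two most energetic EM2 cells: energy ratio and their separation.
    top2 = np.argsort(em2, axis=None)[::-1][:2]
    (i1, j1), (i2, j2) = (np.unravel_index(k, em2.shape) for k in top2)
    e1, e2 = em2[i1, j1], em2[i2, j2]
    e_ratio = (e1 - e2) / (e1 + e2)
    e_dr = np.hypot(eta[i1] - eta[i2], phi[j1] - phi[j2])

    return {"R_had": r_had, "F_EM": f_em, "W_eta2": w_eta2,
            "E_ratio": e_ratio, "E_dR": e_dr}

# Toy candidate: two EM2 deposits two phi-cells apart, plus small leakage.
em = np.zeros((3, 12, 12))
had = np.zeros((3, 3, 3))
em[1, 5, 5], em[1, 5, 7], em[0, 5, 5], em[2, 5, 5] = 10.0, 5.0, 3.0, 2.0
had[0, 1, 1] = 1.0
print(shower_shapes(em, had))
```

In the toy candidate both EM2 deposits share the same $\eta$ row, so $W_{\eta2}$ vanishes while $E_{dR}$ equals two $\phi$ cell widths.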

3. Machine Learning Approaches: BDT, DNN, and ResNet Architectures

Three primary ML methods are benchmarked for photon–pion discrimination:

(a) Boosted Decision Tree (BDT) on Shower-Shape Variables

The BDT employs XGBoost configured with 500 trees, a maximum depth of 6, learning rate 0.1, and subsampling (0.8). Training utilizes 70% of the data (700k background, 682k signal); remaining events are reserved for testing. Binary cross-entropy (logistic loss) is used.
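The quoted hyperparameters map onto argument names of XGBoost's scikit-learn wrapper. The sketch below only collects them in a dictionary so that it runs without XGBoost installed. The variable name is hypothetical, and settings the text does not mention (random seed, evaluation metric, early stopping) are deliberately left out.

```python
# Hyperparameters as stated in the text, keyed by XGBClassifier argument names.
bdt_params = {
    "n_estimators": 500,             # number of boosted trees
    "max_depth": 6,                  # maximum depth per tree
    "learning_rate": 0.1,            # shrinkage applied to each tree
    "subsample": 0.8,                # row subsampling fraction per tree
    "objective": "binary:logistic",  # logistic loss (binary cross-entropy)
}

# With xgboost installed, a classifier could be built along these lines:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(**bdt_params)
#   model.fit(X_train, y_train)   # X_train: (n_events, 20) shower-shape features
print(bdt_params)
```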

Typical signal efficiency at a fixed false positive rate (FPR) of $10^{-3}$ is 60–65%. The area under the ROC curve (AUC), computed over the region with FPR above $10^{-3}$ , is approximately 0.90.
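The two figures of merit used throughout can be computed from classifier scores as sketched below. This is a generic NumPy illustration with toy Gaussian scores, not the evaluation code of the study. The function names are hypothetical, and the AUC here is the full-range rank-based estimate without special handling of ties.

```python
import numpy as np

def efficiency_at_fpr(sig_scores, bkg_scores, target_fpr=1e-3):
    """Signal efficiency at the threshold that passes target_fpr of background."""
    # The threshold is the (1 - target_fpr) quantile of the background scores.
    threshold = np.quantile(bkg_scores, 1.0 - target_fpr)
    return float(np.mean(np.asarray(sig_scores) > threshold))

def roc_auc(sig_scores, bkg_scores):
    """AUC from the Mann-Whitney rank statistic (ties are not averaged)."""
    scores = np.concatenate([sig_scores, bkg_scores])
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_s, n_b = len(sig_scores), len(bkg_scores)
    return float((ranks[:n_s].sum() - n_s * (n_s + 1) / 2) / (n_s * n_b))

# Toy scores: signal shifted to higher values than background.
rng = np.random.default_rng(0)
sig = rng.normal(2.0, 1.0, size=100_000)
bkg = rng.normal(0.0, 1.0, size=1_000_000)
print("efficiency at 1e-3 FPR:", efficiency_at_fpr(sig, bkg))
print("AUC:", roc_auc(sig, bkg))
```

A working point at $10^{-3}$ FPR is set by only one background event in a thousand, so a large background sample is needed for the threshold to be statistically stable.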

(b) Dense Neural Network (DNN) on Shower-Shape Variables

The DNN takes the 20 engineered variables as input and consists of four fully connected layers with [64 → 128 → 64 → 32] nodes, each block consisting of Linear, BatchNorm, ReLU, and Dropout ( $p=0.2$ ). The output node is sigmoid-activated. Training uses the Adam optimizer (lr = $10^{-3}$ , weight decay $10^{-5}$ ) with binary cross-entropy loss.

DNN performance moderately exceeds the BDT, yielding about 65–70% signal efficiency at $10^{-3}$ FPR; AUC ≈ 0.92.
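The DNN layout can be illustrated with an inference-mode forward pass in plain NumPy. This is a sketch under stated assumptions rather than the trained model. Weights are random, the BatchNorm running statistics are set to the identity, Dropout is inactive (as it would be at inference), and the reading of [64 → 128 → 64 → 32] as four hidden layers after a 20-dimensional input is an interpretation of the text.

```python
import numpy as np

LAYER_SIZES = [20, 64, 128, 64, 32]  # input width followed by four hidden widths

def init_params(rng):
    """Random weights for illustration; a trained model would load fitted values."""
    blocks = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        blocks.append({
            "W": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
            "b": np.zeros(n_out),
            # BatchNorm running statistics and affine parameters (identity here).
            "mean": np.zeros(n_out), "var": np.ones(n_out),
            "gamma": np.ones(n_out), "beta": np.zeros(n_out),
        })
    head = {"W": rng.normal(0.0, np.sqrt(1.0 / 32), size=(32, 1)), "b": np.zeros(1)}
    return blocks, head

def forward(x, blocks, head, eps=1e-5):
    """Linear -> BatchNorm -> ReLU per block, then a sigmoid output node."""
    h = x
    for p in blocks:
        h = h @ p["W"] + p["b"]
        h = p["gamma"] * (h - p["mean"]) / np.sqrt(p["var"] + eps) + p["beta"]
        h = np.maximum(h, 0.0)
    logit = h @ head["W"] + head["b"]
    return 1.0 / (1.0 + np.exp(-logit[:, 0]))

rng = np.random.default_rng(0)
blocks, head = init_params(rng)
scores = forward(rng.normal(size=(5, 20)), blocks, head)
n_linear = sum(p["W"].size + p["b"].size for p in blocks) + head["W"].size + head["b"].size
print(scores.shape, n_linear)  # (5,) 20033
```

With this layout the linear layers hold about 20k parameters, which is small compared with a convolutional network acting on cell-level inputs.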

(c) ResNet-Based Convolutional Neural Network on Raw Cell Energies

The ResNet is applied directly to the full-granularity calorimeter cell energies. A dual-branch architecture processes the EM ( $3\times12\times12$ ) and HAD ( $3\times3\times3$ ) tensors, each branch with three residual blocks composed of:

Conv2D layers ( $3\times3$ kernel, stride 1, padding 1), filters = [32, 64, 128]
BatchNorm, ReLU activations, skip connections, and global average pooling

Feature vectors (length 128 each) from the EM and HAD branches are concatenated, passed through several FC layers ([256→128→64], with ReLU and Dropout $p=0.3$ ), and fed to a final sigmoid output. The network is trained with AdamW (lr = $5\times10^{-4}$ , weight decay $10^{-4}$ ) and binary cross-entropy loss.
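The tensor shapes through the two branches can be checked with simple bookkeeping. The sketch below assumes one shape-preserving $3\times3$ convolution per residual block and no downsampling, as suggested by the quoted stride and padding. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def branch_feature_length(spatial, filters=(32, 64, 128)):
    """Follow (channels, height, width) through the residual blocks, then pool."""
    h, w = spatial
    channels = None
    for n_filters in filters:
        h, w = conv_out(h), conv_out(w)  # 3x3, stride 1, padding 1 keeps the size
        channels = n_filters
    # Global average pooling collapses (h, w) to one value per channel.
    return channels, (h, w)

em_len, em_hw = branch_feature_length((12, 12))
had_len, had_hw = branch_feature_length((3, 3))
print(em_hw, had_hw, em_len + had_len)  # (12, 12) (3, 3) 256
```

The concatenated length of 256 matches the input width of the first fully connected layer quoted above.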

ResNet achieves substantial gains: AUC ≈ 0.96 and 80% signal efficiency at $10^{-3}$ FPR.

4. Physics-Informed Training Strategies

Performance is further improved by physics-informed strategies:

Soft scoring (label smoothing): For "hard" background events where $\Delta R_{\gamma\gamma}<0.025$ , soft labels $y_s$ in $[0, 0.4]$ are assigned via a Fermi–Dirac shape:

$y_s = L / [1 + \exp((\Delta R - R_0)/T)]$

with $L=0.4$ , $R_0=0.025$ , $T=0.005$ , reducing the penalization for highly overlapping backgrounds that are fundamentally ambiguous.

Auxiliary $\Delta R$ regression head (multi-task): An additional regression branch predicts the opening angle $\widehat{\Delta R}$ after the main feature concatenation, using a regression loss

$L_{\mathrm{reg}} = \frac{1}{N}\sum_i (\widehat{\Delta R}_i - \Delta R_i)^2$

Combined loss: Total loss is $L_{\mathrm{total}} = L_{\mathrm{cls}} + \lambda L_{\mathrm{reg}}$ with $\lambda=0.5$ .
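The soft-label function and the combined loss can be written out in NumPy as a minimal sketch. The function names are hypothetical, and the choice of which events enter the regression term (presumably only $\pi^0$ candidates, since prompt photons have no $\Delta R_{\gamma\gamma}$ ) is an assumption left to the caller.

```python
import numpy as np

def soft_label(delta_r, L=0.4, r0=0.025, T=0.005):
    """Fermi-Dirac-shaped soft target: near L for delta_r << r0, near 0 for delta_r >> r0."""
    return L / (1.0 + np.exp((np.asarray(delta_r, dtype=float) - r0) / T))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy that also accepts soft (non-integer) targets."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def total_loss(pred, target, dr_pred, dr_true, lam=0.5):
    """Classification loss plus lambda times the mean-squared Delta R error."""
    l_reg = float(np.mean((np.asarray(dr_pred) - np.asarray(dr_true)) ** 2))
    return bce(pred, target) + lam * l_reg

# Soft targets for fully merged, threshold, and well-separated pi0 decays.
print(soft_label([0.0, 0.025, 0.05]))
```

With the quoted parameters the target equals $L/2=0.2$ at $\Delta R=R_0$ and approaches $L=0.4$ as $\Delta R\to0$ , so the most collimated pions are penalized least for looking photon-like.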

These refinements push the ResNet AUC to ≈0.965 and 84% signal efficiency at $10^{-3}$ FPR.

5. Quantitative Performance and Comparative Analysis

The following summarizes the discriminative power of the tested ML strategies for high-purity photon selection:

| Method | AUC | Signal efficiency at $10^{-3}$ FPR |
|---|---|---|
| BDT on shower-shape | 0.90 | 55–65% |
| DNN on shower-shape | 0.92 | 65–70% |
| ResNet on raw energies | 0.96 | 80% |
| ResNet + Soft scoring | 0.963 | 82% |
| ResNet + Soft + $\Delta R$ | 0.965 | 84% |

Additional performance characteristics:

Turn-on curve: ResNet variants show a rapid turn-on at low $p_T$ , reaching a plateau above 90% by 40 GeV and outperforming the BDT and DNN, which plateau at 60–70%.
$\Delta R$ dependence: For $\Delta R<0.025$ , ResNet+aux improves rejection by $\sim$ 20% over BDT/DNN. For $\Delta R\geq0.025$ , all methods perform better, but the ResNet maintains a 5–10% advantage in signal efficiency at fixed background mis-ID rate.

6. Mechanisms of ResNet Superiority

ResNet architectures achieve superior discrimination by exploiting the complete 2D $(\eta,\phi)$ shower topology across all longitudinal layers, learning subtle correlations and sub-cluster structure that fixed, high-level moments fail to capture. Residual connections enable deep feature extraction by mitigating vanishing gradients, and they support both local and global feature learning over the calorimeter image. Multi-task learning (auxiliary $\Delta R$ regression) and soft labeling embed physics knowledge directly into training, sharpening the network's focus on the most challenging, ambiguous cases.

7. Recommendations for Detector and ML Architecture Design

Prospective improvements for photon–pion discrimination include:

Finer transverse segmentation below $0.02\times0.02$ in the first EM layer, targeting resolution of highly collimated $\gamma\gamma$ pairs.
Increased longitudinal segmentation (beyond three EM layers) to utilize depth profile differences for distinguishing overlaps.
Hybrid models integrating tracker-based information (vertex, conversions) with calorimeter images for enhanced $\pi^0$ rejection.
Physics-informed loss functions (mass constraints, opening angle regression) and systematic adoption of multi-task setups.
Prioritization of end-to-end learning approaches that minimize reliance on manual feature engineering, as required for the high-pileup, high-overlap environments of future colliders.

A plausible implication is that continued evolution toward architectures directly operating on raw, high-granularity calorimeter data—with explicit physics guidance—will be central to robust photon identification at the $10^{-3}$ – $10^{-4}$ background rate across wide $p_T$ regimes.
