Boosted-Object Tagging Algorithms

Updated 16 October 2025

Boosted-object tagging algorithms are advanced classification methods that detect high-momentum particles by analyzing jet decay patterns and substructure features.
They leverage observables like N‑subjettiness and energy correlation functions to distinguish signal jets from overwhelming QCD backgrounds.
Hybrid taggers combining traditional cut-based techniques with machine learning improve precision and enable real‑time FPGA implementations in collider experiments.

A boosted-object tagging algorithm is a classification method that seeks to identify highly Lorentz-boosted Standard Model particles such as electroweak bosons, top quarks, and Higgs bosons via their characteristic decay topologies within hadronic jets. When these particles are produced with transverse momentum $p_T \gg 2m$, their hadronic decay products are collimated into a single, large-radius (“fat”) jet, whose internal structure can be exploited to distinguish signal from copious QCD backgrounds. Over the past decade, the field has evolved from cut-based taggers built on physically motivated jet substructure observables to sophisticated, interpretable hybrid frameworks incorporating machine learning and real-time hardware implementations. Advances in tagging algorithms are central to LHC new physics searches, precision Standard Model measurements, and detector and trigger design.
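The collimation condition can be made concrete with the common rule of thumb $\Delta R \approx 2m/p_T$ for the angular separation of the products of a two-body decay. The sketch below applies it to a $W$ boson; the function names, the approximate $W$ mass, and the jet radius $R = 1.0$ are illustrative assumptions rather than values taken from any experiment.

```python
def opening_angle(mass_gev, pt_gev):
    """Rule-of-thumb separation dR ~ 2 m / pT of two-body decay products."""
    return 2.0 * mass_gev / pt_gev


def is_merged(mass_gev, pt_gev, jet_radius=1.0):
    """True if the decay products are expected to fall inside one jet of this radius."""
    return opening_angle(mass_gev, pt_gev) < jet_radius


M_W = 80.4  # GeV, approximate W boson mass (assumed input)

for pt in (100.0, 200.0, 500.0, 1000.0):
    print(f"pT = {pt:6.0f} GeV  dR ~ {opening_angle(M_W, pt):.2f}  merged: {is_merged(M_W, pt)}")
```

The estimate is only a guide: the actual separation depends on the decay angle and on how the momentum is shared between the decay products.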

1. Core Principles of Boosted-Object Tagging

Boosted-object tagging leverages differences in jet substructure to separate hadronically decaying heavy particles from background QCD jets. The key principle exploits the "prongness" of the energy flow inside jets formed by the decay of a color singlet or triplet (e.g., $W$, $Z$, $H$, $t$):

Subjet Multiplicity: Boosted $W$, $Z$, or $H\to b\bar{b}$ decays yield a two-prong structure; top quark decays ($t \to bW \to bq\bar{q}'$) generate a three-prong pattern.
Jet Clustering and Grooming: Fat jets are reconstructed with clustering algorithms (commonly anti-$k_t$, Cambridge–Aachen, or $k_t$) and then subjected to jet grooming (trimming, pruning, filtering, SoftDrop, or dynamical grooming) to remove soft contamination from pileup and the underlying event.
Infrared/Collinear Safety: Observables and grooming procedures are constructed to be IR/collinear safe for calculability and stability under soft/collinear emissions.
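A minimal sketch of what IR/collinear safety means in practice, assuming constituents are given as hypothetical `(pT, rapidity, phi)` tuples and that the reference axis is held fixed: a collinear splitting or a very soft emission changes the constituent multiplicity (an unsafe observable) but leaves a $p_T$-weighted angular sum essentially unchanged.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane, azimuth difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def multiplicity(constituents):
    """Number of constituents: not IR/collinear safe."""
    return len(constituents)


def pt_weighted_width(constituents, axis=(0.0, 0.0)):
    """sum_k pT_k * dR(k, axis) / sum_k pT_k: linear in pT, so splittings cancel."""
    total_pt = sum(pt for pt, _, _ in constituents)
    return sum(pt * delta_r(y, phi, *axis) for pt, y, phi in constituents) / total_pt


def collinear_split(constituents, index, fraction):
    """Replace one constituent by two collinear pieces sharing its pT."""
    pt, y, phi = constituents[index]
    pieces = [(fraction * pt, y, phi), ((1.0 - fraction) * pt, y, phi)]
    return constituents[:index] + pieces + constituents[index + 1:]


# Toy jet (illustrative numbers only).
jet = [(100.0, 0.1, 0.0), (50.0, -0.2, 0.1), (10.0, 0.3, -0.4)]
split_jet = collinear_split(jet, 0, 0.6)   # collinear splitting of the hardest particle
soft_jet = jet + [(1e-9, 0.5, 0.5)]        # extra, arbitrarily soft emission

print("multiplicity:", multiplicity(jet), multiplicity(split_jet), multiplicity(soft_jet))
print("width:       ", pt_weighted_width(jet), pt_weighted_width(split_jet),
      pt_weighted_width(soft_jet))
```

Holding the axis fixed is a simplification; in a real observable the axis definition must itself be insensitive to soft and collinear emissions.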

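An archetypal observable is $N$-subjettiness (Thaler et al., 2010): $\tau_N = \frac{1}{d_0} \sum_k p_{\mathrm{T},k}\min\{\Delta R_{1,k},\ldots,\Delta R_{N,k}\}$ with $d_0 = (\sum_k p_{\mathrm{T},k})R_0$. Here the sum runs over the jet constituents $k$, $\Delta R_{J,k}$ is the angular distance between constituent $k$ and candidate subjet axis $J$, and $R_0$ is the jet radius. A small $\tau_N$ indicates radiation aligned with $N$ or fewer axes, so the discriminating power comes from ratios such as $\tau_{21} = \tau_2/\tau_1$ for two-prong decays or $\tau_{32} = \tau_3/\tau_2$ for top quarks.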
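The definition of $\tau_N$ can be sketched in a few lines of Python. The constituent format `(pT, rapidity, phi)`, the toy jet, and in particular the axis choice are assumptions made for illustration: here the axes are picked by brute force among the constituent directions, whereas practical implementations typically use exclusive subjet axes or a dedicated minimization.

```python
import math
from itertools import combinations


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane, azimuth difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def tau_n(constituents, axes, r0=1.0):
    """tau_N = (1/d0) * sum_k pT_k * min_J dR(J, k), with d0 = r0 * sum_k pT_k."""
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    total = 0.0
    for pt, y, phi in constituents:
        total += pt * min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
    return total / d0


def best_tau_n(constituents, n, r0=1.0):
    """Assumed axis choice: the n constituent directions that minimize tau_N."""
    directions = [(y, phi) for _, y, phi in constituents]
    return min(tau_n(constituents, axes, r0) for axes in combinations(directions, n))


# Toy two-prong jet: two hard cores near y = +0.3 and y = -0.3 plus soft radiation.
two_prong_jet = [
    (120.0, 0.30, 0.00),
    (30.0, 0.33, 0.04),
    (90.0, -0.30, 0.00),
    (20.0, -0.27, -0.05),
    (5.0, 0.00, 0.20),
]

tau1 = best_tau_n(two_prong_jet, 1)
tau2 = best_tau_n(two_prong_jet, 2)
tau21 = tau2 / tau1
print(f"tau1 = {tau1:.4f}, tau2 = {tau2:.4f}, tau21 = {tau21:.4f}")
```

For this toy jet $\tau_2$ is much smaller than $\tau_1$, which is the behaviour a cut on $\tau_{21}$ selects. The brute-force search scales combinatorially and is only suitable for small examples.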

2. Tagging Methodologies: Traditional Algorithms

Early and contemporary traditional taggers rely on physically motivated, high-level observables (Behr, 2014, Kasieczka, 2018, Rentala et al., 2014). The most common strategies include:

N-subjettiness taggers: Defined above, with cuts on $\tau_{21}$ (for $W,\,Z,\,H$) and $\tau_{32}$ (for $t$) applied after an invariant mass window. Typical working points yield about 40% efficiency for $W$ jets at a 1% mis-tag rate, and about 30% for top quarks at similar fake rates (Thaler et al., 2010).
Mass Drop and Symmetry Cuts: Based on the BDRS tagger (Butterworth et al.), requiring that the heavier subjet $j_1$ show a significant mass drop, $m_{j_1} < \mu m_j$, and that the splitting not be too asymmetric, $(\min\{p_{T,j_1}^2,\,p_{T,j_2}^2\}/m_j^2)\Delta R_{j_1,j_2}^2 > y_{\rm cut}$ (Rentala et al., 2014, Bose et al., 2 Aug 2024).
Groomed Jet Mass: Requiring the mass of the groomed jet to lie in a window around the target resonance ($m_W,\,m_Z,\,m_H,\,m_t$) reduces QCD backgrounds substantially (Mehtar-Tani et al., 2020, Thaler et al., 2010).
Energy Correlation Functions (ECF): Hierarchical correlators capturing angular and energy correlations, including $C_2$, $D_2$, $D_3$ (Bhattacherjee et al., 2022).
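The mass-drop and symmetry conditions listed above can be sketched as a single decision function. Subjets are passed as hypothetical `(pT, rapidity, phi, mass)` tuples; the default values $\mu = 0.67$ and $y_{\rm cut} = 0.09$ are commonly quoted BDRS choices and are an assumption here, not a recommendation from this article. The declustering loop that produces the two subjets is omitted.

```python
import math


def four_vector(pt, y, phi, m):
    """(E, px, py, pz) from transverse momentum, rapidity, azimuth and mass."""
    mt = math.sqrt(pt * pt + m * m)
    return (mt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y))


def invariant_mass(p):
    """Invariant mass of a four-vector, clipped at zero against rounding."""
    m2 = p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2
    return math.sqrt(max(m2, 0.0))


def mass_drop_tag(sub1, sub2, mu=0.67, y_cut=0.09):
    """Return True if the splitting passes both the mass-drop and the symmetry cut."""
    parent = tuple(a + b for a, b in zip(four_vector(*sub1), four_vector(*sub2)))
    m_j = invariant_mass(parent)
    if m_j <= 0.0:
        return False
    m_heavy = max(sub1[3], sub2[3])                      # mass of the heavier subjet
    dphi = (sub1[2] - sub2[2] + math.pi) % (2.0 * math.pi) - math.pi
    dr2 = (sub1[1] - sub2[1]) ** 2 + dphi ** 2
    symmetry = min(sub1[0], sub2[0]) ** 2 / m_j ** 2 * dr2
    return m_heavy < mu * m_j and symmetry > y_cut


# Toy inputs: a symmetric two-prong splitting and an asymmetric QCD-like one.
higgs_like = ((250.0, 0.25, 0.0, 10.0), (250.0, -0.25, 0.0, 10.0))
qcd_like = ((480.0, 0.0, 0.0, 60.0), (20.0, 0.5, 0.0, 1.0))

print("Higgs-like tagged:", mass_drop_tag(*higgs_like))
print("QCD-like tagged:  ", mass_drop_tag(*qcd_like))
```

In the full procedure a failed splitting is not the end: the softer subjet is discarded and the test is repeated on the heavier one until a splitting passes or the jet cannot be declustered further.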

Table: Key Traditional Tagging Strategies and Discriminants

Tagger/Observable	Discriminant	Typical signal efficiency / mistag rate (or remarks)
$N$-subjettiness	$\tau_{21}$, $\tau_{32}$	$40\%/1\%$ ($W$), $30\%/1\%$ (top)
BDRS mass drop	$\mu,\; y_{\rm cut}$	Used for $H\to b\bar{b}$, see (Bose et al., 2 Aug 2024)
Groomed mass	Jet mass window	Background reduction $>10\times$
ECF / D-variables	$D_2$, $D_3$ ratios	Robust to pileup, multi-prong sensitivity
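The ECF-based discriminants in the table can be illustrated with the standard normalized two- and three-point correlators, $e_2 = \sum_{i<j} z_i z_j \theta_{ij}^\beta$ and $e_3 = \sum_{i<j<k} z_i z_j z_k (\theta_{ij}\theta_{ik}\theta_{jk})^\beta$ with $z_i = p_{T,i}/\sum_k p_{T,k}$, from which $C_2 = e_3/e_2^2$ and $D_2 = e_3/e_2^3$. The sketch below assumes `(pT, rapidity, phi)` constituents and $\beta = 1$; the toy jets are invented for illustration.

```python
import math
from itertools import combinations


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane, azimuth difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def ecf_ratios(constituents, beta=1.0):
    """Normalized e2, e3 and the ratios C2 = e3/e2^2, D2 = e3/e2^3.

    Requires at least two separated constituents, otherwise e2 = 0.
    """
    total_pt = sum(pt for pt, _, _ in constituents)
    z = [pt / total_pt for pt, _, _ in constituents]
    n = len(constituents)

    def theta(i, j):
        return delta_r(constituents[i][1], constituents[i][2],
                       constituents[j][1], constituents[j][2])

    e2 = sum(z[i] * z[j] * theta(i, j) ** beta
             for i, j in combinations(range(n), 2))
    e3 = sum(z[i] * z[j] * z[k] * (theta(i, j) * theta(i, k) * theta(j, k)) ** beta
             for i, j, k in combinations(range(n), 3))
    return {"e2": e2, "e3": e3, "C2": e3 / e2 ** 2, "D2": e3 / e2 ** 3}


# Toy jets: two hard prongs with a nearby soft emission vs. one hard core
# with two wide-angle soft emissions.
two_prong = [(100.0, 0.3, 0.0), (100.0, -0.3, 0.0), (2.0, 0.32, 0.02)]
one_prong = [(190.0, 0.0, 0.0), (6.0, 0.4, 0.1), (6.0, -0.2, 0.35)]

d2_two = ecf_ratios(two_prong)["D2"]
d2_one = ecf_ratios(one_prong)["D2"]
print(f"D2 (two-prong toy) = {d2_two:.4f}")
print(f"D2 (one-prong toy) = {d2_one:.4f}")
```

The two-prong toy jet gives the smaller $D_2$, which is why a two-prong tagger selects low values of this variable. The triple sum scales as the cube of the constituent count, so production code typically relies on optimized library implementations.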

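Combining mass and substructure cuts improves performance multiplicatively: for example, S/B is enhanced by a factor $\sim (\epsilon_{\text{signal}}/\epsilon_\text{background})^2$ when cuts are applied to both leading jets in resonance searches (Thaler et al., 2010), where $\epsilon_{\text{signal}}$ and $\epsilon_\text{background}$ are the per-jet tagging efficiency and mis-tag rate.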
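As a worked illustration, take the $W$-jet working point quoted in Section 2 (40% efficiency at a 1% mis-tag rate) and assume, for simplicity, that the tagging decisions on the two leading jets are independent:

$\frac{(S/B)_\text{tagged}}{(S/B)_\text{untagged}} \sim \left(\frac{\epsilon_\text{signal}}{\epsilon_\text{background}}\right)^2 = \left(\frac{0.40}{0.01}\right)^2 = 1600, \qquad \epsilon_\text{signal}^2 = 0.40^2 = 0.16.$

Under these assumptions the purity gain of three orders of magnitude comes at the cost of keeping only 16% of the signal events, so the optimal working point depends on whether a search is limited by background or by statistics.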

3. Sensitivity to QCD Color Flow and Event Structure

Tagger performance depends crucially on the underlying color structure of the event (Joshi et al., 2012, Salam et al., 2016). Color singlet resonances (e.g., $Z' \to t\bar{t}$ via KK photon) generate different jet radiation patterns than octet resonances (e.g., via KK gluon):

Extra Radiation: Color-octet decays have more internal and external QCD radiation, modifying subjet kinematics (mass, $p_T$) and increasing mistag rates.
Tagger Dependence: Under tight cuts, efficiencies differ by 15–75% between color-singlet and color-octet signals, especially at the low mis-tag rates relevant for discovery (Joshi et al., 2012).
Mitigation: Minimize the jet radius $R$ to suppress soft radiation, tune mass windows, or use taggers with reduced color sensitivity (e.g., the HEPTopTagger, which has built-in jet grooming).

The use of dichroic subjettiness ratios, defined by measuring $\tau_2$ on the full jet and $\tau_1$ on the groomed/tagged jet,

$\tau_{21}^\text{dichroic} = \frac{\tau_2^\text{full}}{\tau_1^\text{tagged}},$

enhances background suppression by exploiting the difference in large-angle soft radiation between color-singlet signal jets and QCD backgrounds. Improvements of $\sim 25\%$ in signal significance and a reduction in non-perturbative effects by factors of 2–3 are observed relative to traditional ratios (Salam et al., 2016).
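The effect of mixing the two jet definitions can be seen in a toy calculation. The sketch below makes several assumptions that are not taken from the cited work: constituents are `(pT, rapidity, phi)` tuples, subjet axes are supplied by hand, and each $\tau_N$ is normalized by the $p_T$ sum of the constituents it is computed on. Two toy jets share the same tagged subjet, so a ratio built only from the tagged jet cannot separate them, while the dichroic ratio responds to the extra wide-angle emission.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane, azimuth difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def tau_n(constituents, axes, r0=1.0):
    """N-subjettiness for given axes, normalized to the pT sum of these constituents."""
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    return sum(pt * min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
               for pt, y, phi in constituents) / d0


def tau21_tagged(tagged, two_axes, one_axis):
    """Conventional ratio with both tau_2 and tau_1 measured on the tagged jet."""
    return tau_n(tagged, two_axes) / tau_n(tagged, [one_axis])


def tau21_dichroic(full, tagged, two_axes, one_axis):
    """Dichroic ratio: tau_2 on the full jet divided by tau_1 on the tagged jet."""
    return tau_n(full, two_axes) / tau_n(tagged, [one_axis])


# Hand-picked axes (hypothetical): one per prong, and the jet centre.
two_axes = [(0.3, 0.0), (-0.3, 0.0)]
one_axis = (0.0, 0.0)

tagged = [(120.0, 0.3, 0.0), (90.0, -0.3, 0.0), (10.0, 0.32, 0.03)]
signal_like_full = list(tagged)                   # no extra wide-angle radiation
qcd_like_full = tagged + [(8.0, 0.0, 0.7)]        # one wide-angle soft emission

standard = tau21_tagged(tagged, two_axes, one_axis)
dichroic_signal = tau21_dichroic(signal_like_full, tagged, two_axes, one_axis)
dichroic_qcd = tau21_dichroic(qcd_like_full, tagged, two_axes, one_axis)
print(f"tagged-only ratio: {standard:.4f} for both jets")
print(f"dichroic ratio:    {dichroic_signal:.4f} (signal-like), {dichroic_qcd:.4f} (QCD-like)")
```

This isolates only the mechanism; realistic performance depends on the grooming and tagging procedures that define the two constituent sets.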

4. Machine Learning and Hybrid Taggers

Contemporary developments have integrated deep learning and hybrid models (Paganini, 2017, Kasieczka, 2018, Macaluso et al., 2018, Bose et al., 2 Aug 2024, Bhattacherjee et al., 2022). Key elements include:

Neural Networks (CNNs, GNNs, RNNs): CNNs process jet images (calorimetric or tracking $p_T$, multiplicities), while graph neural networks treat jet constituents as nodes with spatial and kinematic features (Macaluso et al., 2018, Sahu et al., 13 Jan 2025).
Feature Engineering vs. End-to-End: Some taggers ingest only low-level inputs (e.g., four-vectors, images); others combine high-level variables (e.g., $\tau_{32}$, $D_2$) as features to enhance interpretability and performance (Bhattacherjee et al., 2022).
Hybrid Taggers: Combine traditional variables with ML outputs, e.g., GNN-derived class scores as inputs to a boosted decision tree (BDT) for event selection, yielding performance greater than either approach alone (Sahu et al., 13 Jan 2025, Bose et al., 2 Aug 2024).
Interpretability Tools: Shapley value analysis (SHAP) quantifies input variable importance in decision trees or ensemble models, elucidating which observables are most predictive (Bhattacherjee et al., 2022, Chowdhury et al., 2023).
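The idea behind Shapley-value attribution can be shown without any ML library. The sketch below computes exact Shapley values by enumerating feature subsets against a single baseline point; it is not the API of the SHAP package, which uses model-specific and sampling-based approximations, and the scoring function, feature names, and numbers are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (model(with_i) - model(without_i))
    return values


def toy_tagger_score(features):
    """Hypothetical tagger score from (mass_window_flag, tau21, D2)."""
    in_window, tau21, d2 = features
    return 2.0 * in_window - 1.5 * tau21 - 0.5 * d2 - 1.0 * tau21 * d2


feature_names = ["mass window", "tau21", "D2"]
jet_features = [1.0, 0.2, 0.8]        # the jet being explained
reference_jet = [0.0, 0.6, 2.0]       # baseline, e.g. a typical background jet

phi = shapley_values(toy_tagger_score, jet_features, reference_jet)
for name, value in sorted(zip(feature_names, phi), key=lambda item: -abs(item[1])):
    print(f"{name:12s} {value:+.3f}")
```

By construction the attributions sum to the difference between the score of the jet and the score of the baseline, which is the property that makes the ranking of observables interpretable.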

Table: Machine Learning Approaches and Inputs

ML Tagger Type	Input Features	Role
CNN/DeepTop	Jet images ($p_T$ maps etc.)	End-to-end jet classification ($W$, top, QCD, …)
XGBoost BDT	High-level observables	Variable ranking, hybrid taggers
GNN/LorentzNet	Jet constituent 4-vectors, graphs	Fat jet multiclassification (top/H/QCD)
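The jet-image input listed in the table can be sketched as a simple pixelation step. The grid size, the image half-width, the centring on the $p_T$-weighted centroid, and the normalization to unit total intensity are all assumptions chosen for illustration; published taggers apply their own preprocessing (for example rotations or flips) that is not reproduced here.

```python
import math


def jet_image(constituents, n_pixels=9, half_width=0.9):
    """Bin (pT, rapidity, phi) constituents into an n_pixels x n_pixels pT-fraction map.

    The image is centred on the pT-weighted centroid; constituents falling
    outside the window are dropped. Rows index azimuth, columns index rapidity.
    """
    total_pt = sum(pt for pt, _, _ in constituents)
    y0 = sum(pt * y for pt, y, _ in constituents) / total_pt
    # Circular mean of phi, so the centroid is correct across the +-pi boundary.
    phi0 = math.atan2(sum(pt * math.sin(phi) for pt, _, phi in constituents),
                      sum(pt * math.cos(phi) for pt, _, phi in constituents))
    pixel = 2.0 * half_width / n_pixels
    image = [[0.0] * n_pixels for _ in range(n_pixels)]
    for pt, y, phi in constituents:
        dy = y - y0
        dphi = (phi - phi0 + math.pi) % (2.0 * math.pi) - math.pi
        col = math.floor((dy + half_width) / pixel)
        row = math.floor((dphi + half_width) / pixel)
        if 0 <= row < n_pixels and 0 <= col < n_pixels:
            image[row][col] += pt / total_pt
    return image


# Toy two-prong jet (illustrative numbers only).
toy_jet = [(120.0, 0.30, 0.00), (90.0, -0.30, 0.00), (10.0, 0.32, 0.03), (5.0, 0.0, 0.2)]
for row in jet_image(toy_jet):
    print(" ".join(f"{value:4.2f}" for value in row))
```

The resulting two-dimensional array is the kind of fixed-size input a convolutional network expects, whereas graph-based taggers skip this step and work on the constituent list directly.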

Hybrid methods can handle both SM and BSM signatures, including rare flavor-violating decays, and extend coverage to previously unexplored boosted regimes (Chowdhury et al., 2023, Zhao et al., 28 Feb 2025).

5. Hardware, Real-Time Application, and Latency Constraints

With increasing trigger rates and HL-LHC luminosity, real-time, FPGA-based ML triggers are required for prompt event selection (Bileska, 8 May 2025):

FPGA Deployment: Models (e.g., WOMBAT) are distilled into quantized, resource-constrained versions that run on FPGA hardware at the Level-1 (L1) trigger, identifying $H\rightarrow b\bar{b}$ from calorimeter trigger primitives with a latency below 22 clock cycles (Bileska, 8 May 2025).
Latency and Rate Control: At a 1 kHz output rate, WOMBAT achieves comparable signal efficiency at an offline $p_T$ threshold of 146.8 GeV, which is 40.6 GeV lower than that of traditional single-jet triggers.
Knowledge Distillation: Separates high-fidelity master models (offline) from apprentice (quantized, real-time) models, enabling practical, efficient real-time inference with constrained resources.
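Two generic ingredients of this workflow, fixed-point quantization and a distillation loss, can be sketched in plain Python. Nothing below is taken from the WOMBAT implementation: the bit widths, the temperature, and the function names are hypothetical, and the loss shown is the widely used KL divergence between temperature-softened class probabilities of the large ("master") and small ("apprentice") models.

```python
import math


def quantize_fixed(value, total_bits=8, int_bits=2):
    """Round to signed fixed point (int_bits includes the sign bit) and saturate.

    Uses Python's round(), i.e. ties are rounded to the nearest even integer.
    """
    scale = 2 ** (total_bits - int_bits)          # number of steps per unit
    lowest = -(2 ** (total_bits - 1))
    highest = 2 ** (total_bits - 1) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(master_logits, apprentice_logits, temperature=2.0):
    """KL(master || apprentice) between temperature-softened class probabilities."""
    p = softmax(master_logits, temperature)
    q = softmax(apprentice_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


weights = [0.3, -1.27, 5.0]                        # toy full-precision weights
print("quantized:", [quantize_fixed(w) for w in weights])

master = [2.0, -1.0]                               # toy (signal, background) logits
apprentice = [quantize_fixed(v) for v in (1.7, -0.6)]
print("distillation loss:", distillation_loss(master, apprentice))
```

In practice the loss is minimized during training of the small model, usually together with a standard classification loss, and quantization is applied with hardware-aware tooling rather than by hand.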

FPGA-based ML triggers demonstrate that sophisticated jet substructure algorithms can operate within the tight timing and resource restrictions of LHC trigger systems, with significant implications for Phase-2 and beyond.

6. Practical Impact and Applications

Boosted-object tagging algorithms underpin a broad range of LHC applications:

Standard Model Measurements: Improved top tagging enhances $|V_{cb}|$ extractions using boosted $bc$ signatures with in-situ calibration, yielding $\sim 30\%$ improved precision under HL-LHC expectations (Zhao et al., 28 Feb 2025).
Exotic and BSM Searches: Sensitivity to rare topologies improves; for example, $H^\pm\to bc$ searches may gain a factor of 2–5 in reach via AI-based taggers (Zhao et al., 28 Feb 2025), and dedicated taggers probe rare flavor-violating top decays ($t\to cH$) in the boosted regime (Chowdhury et al., 2023).
Trigger and Data Acquisition: Real-time selection of boosted Higgs, top, and boson decays at low operation thresholds, crucial for Phase-2 trigger design (Bileska, 8 May 2025).
Interpretability: SHAP-assisted hybrid taggers rank variables by explanatory power; mass, $N$-subjettiness, and ECFs are consistently dominant contributors (Bhattacherjee et al., 2022).

Supported by a suite of methodologies, boosted-object tagging algorithms enable efficient selection, precision study, and physical understanding of heavy objects under challenging experimental conditions. Their interpretability, adaptability to hardware, and robustness to QCD and detector effects will remain essential in the HL-LHC and in future collider environments.