DeepTau Algorithm Overview
- DeepTau is a CNN-based tau lepton identification method that uses image-based particle-flow (PF) representations and domain adaptation to enhance discrimination of genuine taus from backgrounds.
- It achieves a 30–50% reduction in jet misidentification rates at fixed tau efficiency, while its domain adaptation improves the fidelity of simulation to collision data.
- Its architecture integrates multiple input grids, high-level features, and an adversarial branch, enabling robust calibration and effective performance in CMS analyses.
DeepTau is a tau lepton identification algorithm developed for the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), designed to discriminate hadronic tau decays ($\tau_h$) from backgrounds such as quark and gluon jets, electrons, and muons. Employing convolutional neural network (CNN) techniques and, in its version 2.5, domain adaptation by backpropagation, DeepTau v2.5 substantially improves both the fidelity of simulation to collision data and the overall identification performance, achieving a 30–50% reduction in jet-to-tau misidentification at fixed efficiency. Its design leverages “image-based” representations of particle-flow (PF) objects around each $\tau_h$ candidate and incorporates robust calibration strategies for direct use in physics analyses at $\sqrt{s} = 13$ and $13.6$ TeV.
1. Architecture and Input Representation
DeepTau v2.5 utilizes a multi-branch CNN architecture designed to exploit spatial and high-level feature representations of the particles surrounding each $\tau_h$ candidate:
- Input Construction:
- Two overlapping grids in the $\eta$–$\phi$ plane are centered on each HPS-reconstructed $\tau_h$ candidate:
- Inner grid: $11 \times 11$ cells, each of size $0.02 \times 0.02$ (corresponding to the signal cone with radius $\Delta R = 0.1$).
- Outer grid: $21 \times 21$ cells, each of size $0.05 \times 0.05$ (corresponding to the isolation cone with radius $\Delta R = 0.5$).
- Each cell encodes up to seven reconstructed particle types (PF electrons, muons, photons, charged hadrons, and neutral hadrons, plus electrons and muons from dedicated standalone reconstruction) with associated kinematic and identification features (such as $p_T$, $\eta$, $\phi$, electric charge, PUPPI weights, electromagnetic/hadronic calorimeter cluster properties, and track-to-vertex compatibility). A simplified grid-filling sketch appears after this list.
- Additional Features:
- 43 high-level variables summarize kinematics (e.g., $p_T$, $\eta$, charge, decay mode), isolation quantities, leading-track and secondary-vertex information, variables for discrimination between electromagnetic and hadronic showers, pileup characteristics, etc.
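As a rough illustration of the input construction, the following is a minimal sketch of filling one $\eta$–$\phi$ grid from PF candidates. The grid geometry follows the inner-grid numbers above, but the per-cell feature layout (here simply a $p_T$ sum per particle type) and all names are simplified assumptions, not the CMS implementation.

```python
# Minimal sketch: bin PF candidates around a tau candidate into an eta-phi grid.
import numpy as np

N_CELLS, CELL_SIZE = 11, 0.02                       # inner grid: 11x11 cells of 0.02x0.02
PARTICLE_TYPES = ["e", "mu", "gamma", "h+", "h0"]   # subset of types, for illustration

def fill_inner_grid(cands, tau_eta, tau_phi):
    """cands: iterable of (ptype, pt, eta, phi) near the tau candidate."""
    grid = np.zeros((len(PARTICLE_TYPES), N_CELLS, N_CELLS), dtype=np.float32)
    half = N_CELLS * CELL_SIZE / 2.0
    for ptype, pt, eta, phi in cands:
        deta, dphi = eta - tau_eta, phi - tau_phi   # assume dphi already wrapped to [-pi, pi]
        if abs(deta) >= half or abs(dphi) >= half:
            continue                                # outside the inner grid
        i = int((deta + half) / CELL_SIZE)
        j = int((dphi + half) / CELL_SIZE)
        grid[PARTICLE_TYPES.index(ptype), i, j] += pt   # sum pT per type per cell
    return grid
```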
- Network Structure:
- The architecture consists of three distinct branches (a minimal sketch follows this list):
- 1. The 43 high-level features are processed by fully connected (FC) layers.
- 2. The inner grid is processed by convolutional and pooling layers, yielding a flattened embedding.
- 3. The outer grid is processed similarly, providing a corresponding embedding for the wider isolation region.
- Outputs from all branches are concatenated and passed through a stack of FC layers and a final softmax layer producing per-class scores $y_{\tau}$, $y_{e}$, $y_{\mu}$, and $y_{\text{jet}}$.
- Parametric ReLU (PReLU) activation is used: $f(x) = x$ for $x > 0$ and $f(x) = a\,x$ otherwise, with the slope $a$ learned during training.
- Batch normalization and dropout are applied at the FC layers for regularization.
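Below is a minimal PyTorch sketch of this three-branch layout. It is not the CMS production code; the channel counts, embedding widths, number of per-cell features, and dropout rate are illustrative assumptions.

```python
# Sketch of a three-branch, DeepTau-style network: two grid branches plus a
# high-level-feature branch, concatenated into a shared classification head.
import torch
import torch.nn as nn

class GridBranch(nn.Module):
    """Convolutional branch that reduces an eta-phi grid to a flat embedding."""
    def __init__(self, in_channels: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.PReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim), nn.PReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool the grid down to 1x1
            nn.Flatten(),             # -> (batch, embed_dim)
        )

    def forward(self, x):
        return self.net(x)

class DeepTauLike(nn.Module):
    def __init__(self, n_highlevel=43, n_cell_feats=60, n_classes=4, p_drop=0.2):
        super().__init__()
        self.hl_branch = nn.Sequential(                 # branch 1: high-level features
            nn.Linear(n_highlevel, 128), nn.BatchNorm1d(128), nn.PReLU(),
            nn.Dropout(p_drop),
        )
        self.inner_branch = GridBranch(n_cell_feats, 128)  # branch 2: 11x11 inner grid
        self.outer_branch = GridBranch(n_cell_feats, 128)  # branch 3: 21x21 outer grid
        self.head = nn.Sequential(                      # shared FC stack
            nn.Linear(128 * 3, 256), nn.PReLU(), nn.Dropout(p_drop),
            nn.Linear(256, n_classes),                  # scores for tau, e, mu, jet
        )

    def features(self, hl, inner, outer):
        """Concatenated trunk features, also used by the adversarial branch."""
        return torch.cat(
            [self.hl_branch(hl), self.inner_branch(inner), self.outer_branch(outer)],
            dim=1,
        )

    def forward(self, hl, inner, outer):
        return self.head(self.features(hl, inner, outer))  # softmax applied in the loss
```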
2. Domain Adaptation via Backpropagation
To reduce data–simulation discrepancies, DeepTau v2.5 employs a domain adaptation strategy using a gradient reversal layer (GRL):
- Gradient Reversal Layer (GRL):
- Inserted between the shared feature-extracting layers and an adversarial branch tasked with classifying the source domain (simulation vs. real data).
- Forward pass: identity; backward pass: multiplies the incoming gradient of the domain loss by $-\lambda$, effectively reversing it. A minimal implementation sketch follows.
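A gradient reversal layer is only a few lines in an autograd framework. The sketch below uses PyTorch's `torch.autograd.Function`; it is a generic GRL in the style of Ganin and Lempitsky, not the DeepTau source.

```python
# Gradient reversal layer: identity on the forward pass, multiplies the
# gradient by -lambda on the backward pass.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)          # identity

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the feature trunk;
        # no gradient with respect to lambda itself.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam: float = 1.0):
    return GradReverse.apply(x, lam)
```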
- Loss Functions:
- Classification loss ($L_{\text{class}}$): combines categorical cross-entropy (for genuine $\tau_h$), focal loss components (for overall background discrimination), and targeted cross-entropy penalties for separating jets, electrons, and muons.
- Adversarial (domain) loss: a binary cross-entropy of the form $L_{\text{DA}} = -\left[\, d \log \hat{d} + (1 - d) \log (1 - \hat{d}) \,\right]$, where $d = 1$ for data, $d = 0$ for simulation, and $\hat{d}$ is the domain-branch output.
- Combined objective: $L = L_{\text{class}} + \lambda_{\text{DA}} L_{\text{DA}}$ (sketched in code after this list).
- The GRL ensures that the gradient from $L_{\text{DA}}$ is negated in the feature-extraction trunk, driving the trunk away from features that separate data from simulation.
- This leads to domain-invariant feature learning, especially in regions with high purity of genuine $\tau_h$ candidates.
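A compact sketch of this composite objective, with `F.cross_entropy` standing in for the full classification loss (the focal-loss and targeted penalty terms are omitted) and `lam_da` as the relative domain weight:

```python
# Combined objective L = L_class + lam_da * L_DA; the sign flip into the
# trunk is handled by the gradient reversal layer, not by this function.
import torch
import torch.nn.functional as F

def combined_loss(class_logits, class_labels, domain_logits, domain_labels, lam_da):
    l_class = F.cross_entropy(class_logits, class_labels)            # simplified stand-in
    l_da = F.binary_cross_entropy_with_logits(domain_logits, domain_labels.float())
    return l_class + lam_da * l_da
```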
3. Training Datasets, Workflow, and Hyperparameters
- Datasets:
- Simulation (2018 conditions): mix of Z+jets, W+jets, $t\bar{t}$, single-top, diboson, QCD multijet, and “$\tau$-gun” processes, selected to ensure uniform $p_T$ and $\eta$ distributions per class.
- Real data (2018, 13 TeV, about 60 fb$^{-1}$): a $Z \to \tau\tau$ control sample with high genuine-$\tau_h$ purity is used exclusively for domain adaptation.
- Workflow:
- Main optimizer (Adam/NAdam) updates the shared and classification layers.
- Separate optimizer (Adam) updates the domain branch.
- Domain loss weighting: a relative weight $\lambda_{\text{DA}}$ balances the adversarial term against the classification loss.
- This staged training decouples classification performance from simulation–data mismodeling, reducing systematic uncertainties associated with modeling detector effects. A sketch of the staged update follows.
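The following sketch shows one way to realize the staged two-optimizer update, reusing `DeepTauLike` and `grad_reverse` from the sketches above; the learning rates, `lam_da`, and the exact alternation scheme are placeholders rather than the tuned CMS settings.

```python
# Staged update: (1) classification on simulation, (2) adversarial domain
# step on a data/simulation mixture with the gradient reversed into the trunk.
import torch
import torch.nn.functional as F

model = DeepTauLike()                       # three-branch network from above
domain_head = torch.nn.Linear(128 * 3, 1)   # adversarial domain classifier

opt_main = torch.optim.NAdam(model.parameters(), lr=1e-3)        # placeholder lr
opt_domain = torch.optim.Adam(domain_head.parameters(), lr=1e-3) # placeholder lr

def training_step(sim_batch, mixed_batch, lam_da=1.0):
    # (1) classification update, simulated events only
    hl, inner, outer, labels = sim_batch
    opt_main.zero_grad()
    F.cross_entropy(model(hl, inner, outer), labels).backward()
    opt_main.step()

    # (2) adversarial update on mixed data (domain = 1) and simulation (0);
    #     grad_reverse flips the gradient entering the shared trunk, pushing
    #     it toward domain-invariant features while the domain head itself
    #     still learns to tell the two domains apart
    hl, inner, outer, domain = mixed_batch
    opt_main.zero_grad()
    opt_domain.zero_grad()
    feats = grad_reverse(model.features(hl, inner, outer), lam_da)
    logits = domain_head(feats).squeeze(1)
    F.binary_cross_entropy_with_logits(logits, domain.float()).backward()
    opt_domain.step()
    opt_main.step()  # trunk parameters receive the reversed gradient
```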
4. Performance Evaluation
- Metrics:
- $\tau_h$ identification efficiency: $\varepsilon_{\tau} = N_{\tau_h}^{\text{pass}} / N_{\tau_h}^{\text{total}}$, the fraction of genuine $\tau_h$ passing a given working point.
- Misidentification (fake) rate: $f = N_{\text{fake}}^{\text{pass}} / N_{\text{fake}}^{\text{total}}$, the fraction of jets (or electrons/muons) passing the same requirement. A small helper computing both is sketched below.
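For concreteness, a tiny helper that evaluates both metrics at a given score threshold; the array names are illustrative:

```python
import numpy as np

def eff_and_fake_rate(scores, is_genuine_tau, threshold):
    """scores: discriminator outputs; is_genuine_tau: boolean truth mask."""
    passed = scores > threshold
    eff = passed[is_genuine_tau].mean()    # N(genuine, passing) / N(genuine)
    fake = passed[~is_genuine_tau].mean()  # N(fake, passing) / N(fake)
    return eff, fake
```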
- Results at 13 TeV (2018 simulation):
- At fixed $\tau_h$ efficiency, DeepTau v2.5 achieves marked reductions in jet fake rates compared to v2.1, consistent with the 30–50% improvement quoted above.
- Electron misidentification is also reduced at the tightest working points, while muon misidentification remains low.
- Robustness at 13.6 TeV (2022 data):
- Despite being trained on 2018 samples, the domain adaptation reduces the data–simulation disagreement in regions of high discriminator score to the few-percent level, compared with substantially larger discrepancies before adaptation.
5. Calibration Strategies and Application in Analyses
- Tag-and-probe Calibration:
- Tag-and-probe methods in $Z \to \tau\tau$ events are used to fit visible mass ($m_{\text{vis}}$) distributions and extract:
- $\tau_h$ energy scale corrections (TES), typically within a few percent of unity.
- $\tau_h$ identification scale factors (SF), typically within $0.9$–$1.1$.
- Both individual (fix TES, fit SF) and combined (profile likelihood in both TES and SF) fitting procedures are implemented; a toy single-parameter fit is sketched below.
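As an illustration of the individual fit (TES fixed, SF floating), the following toy fits a single scale factor by minimizing a binned Poisson negative log-likelihood of the $m_{\text{vis}}$ distribution. The templates and histogram contents are invented, and the real procedure uses full profile-likelihood fits with systematic uncertainties.

```python
# Toy binned-likelihood fit: one SF scaling the Z->tautau signal template.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_id_sf(data_counts, sig_template, bkg_template):
    def nll(sf):  # Poisson NLL up to a constant term
        mu = np.clip(sf * sig_template + bkg_template, 1e-9, None)
        return np.sum(mu - data_counts * np.log(mu))
    res = minimize_scalar(nll, bounds=(0.5, 1.5), method="bounded")
    return res.x

sig = np.array([30.0, 80.0, 120.0, 70.0, 25.0])   # signal template (toy)
bkg = np.array([10.0, 15.0, 20.0, 15.0, 10.0])    # background template (toy)
data = np.random.poisson(0.97 * sig + bkg)        # pseudo-data with SF = 0.97
print(f"fitted SF = {fit_id_sf(data, sig, bkg):.3f}")
```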
- High-$p_T$ Calibration:
- In events with $p_T(\tau_h) > 100$ GeV, control regions and fits provide SFs in two $p_T$ bins: [100–200] GeV and > 200 GeV.
- Lepton-to-$\tau_h$ Misidentification Calibration:
- $Z \to ee$ (“$e$ probe”) and $Z \to \mu\mu$ (“$\mu$ probe”) tag-and-probe methods are used to determine mis-ID rate scale factors (SFs) as functions of $\eta$ and decay mode:
- The $e \to \tau_h$ SFs increase toward larger $|\eta|$.
- The $\mu \to \tau_h$ SFs typically deviate from unity by $5\%$ or more, depending on decay mode and detector region.
- Systematic uncertainties (including luminosity, trigger/isolation, background shaping, mis-ID energy scale, and PDF/scale variations) are catalogued for each DeepTau v2.5 working point, and correction factors are propagated to CMS physics analyses at $\sqrt{s} = 13$ and $13.6$ TeV. A toy application of such SFs as event weights is sketched below.
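To make the propagation step concrete, here is a minimal sketch of applying per-decay-mode, per-region mis-ID SFs as event weights downstream; the table values, the $|\eta| = 1.2$ barrel/endcap split, and all names are placeholders, not measured CMS values.

```python
# Placeholder SF table and lookup; values and the |eta| = 1.2 split are
# illustrative, not measured CMS scale factors.
MU_TO_TAU_SF = {                 # (decay_mode, region) -> SF
    (0, "barrel"): 1.05, (0, "endcap"): 1.10,
    (1, "barrel"): 0.98, (1, "endcap"): 1.07,
}

def misid_weight(decay_mode: int, abs_eta: float) -> float:
    """Event weight for a mu -> tau_h fake candidate."""
    region = "barrel" if abs_eta < 1.2 else "endcap"
    return MU_TO_TAU_SF.get((decay_mode, region), 1.0)
```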
6. Context, Significance, and Outlook
DeepTau v2.5's use of image-based PF encoding, advanced convolutional architectures, adversarial domain adaptation, and extensive calibration achieves significant improvements in distinguishing genuine $\tau_h$ from jets and other fakes. The 30–50% reduction in jet misidentification at fixed signal efficiency, combined with the reduction of data–simulation discrepancies to a few percent across both LHC Run 2 and early Run 3 datasets, enhances the reliability of CMS analyses involving $\tau_h$ signatures.
A plausible implication is that further developments could continue to target robustness to changing detector conditions and evolving pileup profiles, leveraging similar domain adaptation frameworks. The algorithm's modular, image-based structure provides a foundation for ongoing improvement and adaptation to future LHC datasets.