DeepTau Algorithm Overview
- DeepTau is a CNN-based tau lepton identification method that uses image-based particle-flow (PF) representations and domain adaptation to enhance discrimination of genuine taus from backgrounds.
- It achieves a 30–50% reduction in jet misidentification rates at fixed tau efficiency, while its domain adaptation improves the fidelity of simulation to collision data.
- Its architecture integrates multiple input grids, high-level features, and an adversarial branch, enabling robust calibration and effective performance in CMS analyses.
DeepTau is a tau lepton identification algorithm developed for the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), designed to discriminate hadronic tau decays ($\tau_h$) from backgrounds such as quark and gluon jets, electrons, and muons. Employing convolutional neural network (CNN) techniques and, in its version 2.5, domain adaptation by backpropagation, DeepTau v2.5 substantially improves both the fidelity of simulation to collision data and the overall identification performance, achieving a 30–50% reduction in jet-to-tau misidentification at fixed efficiency. Its design leverages “image-based” representations of particle-flow (PF) objects around each $\tau_h$ candidate and incorporates robust calibration strategies for direct use in physics analyses at $\sqrt{s} = 13$ and $13.6$ TeV.
1. Architecture and Input Representation
DeepTau v2.5 utilizes a multi-branch CNN architecture designed to exploit spatial and high-level feature representations of the particles surrounding each $\tau_h$ candidate:
- Input Construction:
- Two overlapping grids in the $\eta$–$\phi$ plane are centered on each HPS-reconstructed $\tau_h$ candidate:
- Inner grid: $11 \times 11$ cells, each of size $0.02 \times 0.02$ (corresponding to the signal cone with radius $\Delta R = 0.1$).
- Outer grid: $21 \times 21$ cells, each of size $0.05 \times 0.05$ (corresponding to the isolation cone with radius $\Delta R = 0.5$).
- Each cell encodes up to seven reconstructed particle types (PF electrons, muons, photons, charged hadrons, and neutral hadrons, plus electrons and muons from dedicated standalone reconstruction) with associated kinematic and identification features (such as $p_T$, $\eta$, $\phi$, electric charge, PUPPI weights, electromagnetic/hadronic calorimeter cluster properties, and track-to-vertex compatibility). A simplified grid-filling sketch appears after this list.
- Additional Features:
- 43 high-level variables summarize kinematics (e.g., $p_T$, $\eta$, charge, decay mode), isolation quantities, leading-track and secondary-vertex information, variables for discrimination between electromagnetic and hadronic showers, pileup characteristics, etc.
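As a rough illustration of the input construction, the following is a minimal sketch of filling one $\eta$–$\phi$ grid from PF candidates. The grid geometry follows the inner-grid numbers above, but the per-cell feature layout (here simply a $p_T$ sum per particle type) and all names are simplified assumptions, not the CMS implementation.

```python
# Minimal sketch: bin PF candidates around a tau candidate into an eta-phi grid.
import numpy as np

N_CELLS, CELL_SIZE = 11, 0.02                       # inner grid: 11x11 cells of 0.02x0.02
PARTICLE_TYPES = ["e", "mu", "gamma", "h+", "h0"]   # subset of types, for illustration

def fill_inner_grid(cands, tau_eta, tau_phi):
    """cands: iterable of (ptype, pt, eta, phi) near the tau candidate."""
    grid = np.zeros((len(PARTICLE_TYPES), N_CELLS, N_CELLS), dtype=np.float32)
    half = N_CELLS * CELL_SIZE / 2.0
    for ptype, pt, eta, phi in cands:
        deta, dphi = eta - tau_eta, phi - tau_phi   # assume dphi already wrapped to [-pi, pi]
        if abs(deta) >= half or abs(dphi) >= half:
            continue                                # outside the inner grid
        i = int((deta + half) / CELL_SIZE)
        j = int((dphi + half) / CELL_SIZE)
        grid[PARTICLE_TYPES.index(ptype), i, j] += pt   # sum pT per type per cell
    return grid
```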
- Network Structure:
- The architecture consists of three distinct branches (a minimal sketch follows this list):
- 1. The 43 high-level features are processed by fully connected (FC) layers.
- 2. The inner grid is processed by convolutional and pooling layers, yielding a flattened embedding.
- 3. The outer grid is processed similarly, providing a corresponding embedding for the wider isolation region.
- Outputs from all branches are concatenated and passed through a stack of FC layers and a final softmax layer producing per-class scores $y_{\tau}$, $y_{e}$, $y_{\mu}$, and $y_{\text{jet}}$.
- Parametric ReLU (PReLU) activation is used: $f(x) = x$ for $x > 0$ and $f(x) = a\,x$ otherwise, with the slope $a$ learned during training.
- Batch normalization and dropout are applied at the FC layers for regularization.
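Below is a minimal PyTorch sketch of this three-branch layout. It is not the CMS production code; the channel counts, embedding widths, number of per-cell features, and dropout rate are illustrative assumptions.

```python
# Sketch of a three-branch, DeepTau-style network: two grid branches plus a
# high-level-feature branch, concatenated into a shared classification head.
import torch
import torch.nn as nn

class GridBranch(nn.Module):
    """Convolutional branch that reduces an eta-phi grid to a flat embedding."""
    def __init__(self, in_channels: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.PReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim), nn.PReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool the grid down to 1x1
            nn.Flatten(),             # -> (batch, embed_dim)
        )

    def forward(self, x):
        return self.net(x)

class DeepTauLike(nn.Module):
    def __init__(self, n_highlevel=43, n_cell_feats=60, n_classes=4, p_drop=0.2):
        super().__init__()
        self.hl_branch = nn.Sequential(                 # branch 1: high-level features
            nn.Linear(n_highlevel, 128), nn.BatchNorm1d(128), nn.PReLU(),
            nn.Dropout(p_drop),
        )
        self.inner_branch = GridBranch(n_cell_feats, 128)  # branch 2: 11x11 inner grid
        self.outer_branch = GridBranch(n_cell_feats, 128)  # branch 3: 21x21 outer grid
        self.head = nn.Sequential(                      # shared FC stack
            nn.Linear(128 * 3, 256), nn.PReLU(), nn.Dropout(p_drop),
            nn.Linear(256, n_classes),                  # scores for tau, e, mu, jet
        )

    def features(self, hl, inner, outer):
        """Concatenated trunk features, also used by the adversarial branch."""
        return torch.cat(
            [self.hl_branch(hl), self.inner_branch(inner), self.outer_branch(outer)],
            dim=1,
        )

    def forward(self, hl, inner, outer):
        return self.head(self.features(hl, inner, outer))  # softmax applied in the loss
```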
2. Domain Adaptation via Backpropagation
To reduce data–simulation discrepancies, DeepTau v2.5 employs a domain adaptation strategy using a gradient reversal layer (GRL):
- Gradient Reversal Layer (GRL):
- Inserted between the shared feature-extracting layers and an adversarial branch tasked with classifying the source domain (simulation vs. real data).
- Forward pass: identity; backward pass: multiplies the incoming gradient of the domain loss by $-\lambda$, effectively reversing it. A minimal implementation sketch follows.
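A gradient reversal layer is only a few lines in an autograd framework. The sketch below uses PyTorch's `torch.autograd.Function`; it is a generic GRL in the style of Ganin and Lempitsky, not the DeepTau source.

```python
# Gradient reversal layer: identity on the forward pass, multiplies the
# gradient by -lambda on the backward pass.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)          # identity

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the feature trunk;
        # no gradient with respect to lambda itself.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam: float = 1.0):
    return GradReverse.apply(x, lam)
```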
- Loss Functions:
- Classification loss ($L_{\text{class}}$): combines categorical cross-entropy (for genuine $\tau_h$), focal loss components (for overall background discrimination), and targeted cross-entropy penalties for separating jets, electrons, and muons.
- Adversarial (domain) loss: a binary cross-entropy of the form $L_{\text{DA}} = -\left[\, d \log \hat{d} + (1 - d) \log (1 - \hat{d}) \,\right]$, where $d = 1$ for data, $d = 0$ for simulation, and $\hat{d}$ is the domain-branch output.
- Combined objective: $L = L_{\text{class}} + \lambda_{\text{DA}} L_{\text{DA}}$ (sketched in code after this list).
- The GRL ensures that the gradient from $L_{\text{DA}}$ is negated in the feature-extraction trunk, driving the trunk away from features that separate data from simulation.
- This leads to domain-invariant feature learning, especially in regions with high purity of genuine $\tau_h$ candidates.
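A compact sketch of this composite objective, with `F.cross_entropy` standing in for the full classification loss (the focal-loss and targeted penalty terms are omitted) and `lam_da` as the relative domain weight:

```python
# Combined objective L = L_class + lam_da * L_DA; the sign flip into the
# trunk is handled by the gradient reversal layer, not by this function.
import torch
import torch.nn.functional as F

def combined_loss(class_logits, class_labels, domain_logits, domain_labels, lam_da):
    l_class = F.cross_entropy(class_logits, class_labels)            # simplified stand-in
    l_da = F.binary_cross_entropy_with_logits(domain_logits, domain_labels.float())
    return l_class + lam_da * l_da
```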
3. Training Datasets, Workflow, and Hyperparameters
- Datasets:
- Simulation (2018 conditions): mix of Z+jets, W+jets, $t\bar{t}$, single-top, diboson, QCD multijet, and “$\tau$-gun” processes, selected to ensure uniform $p_T$ and $\eta$ distributions per class.
- Real data (2018, 13 TeV, about 60 fb$^{-1}$): a $Z \to \tau\tau$ control sample with high genuine-$\tau_h$ purity is used exclusively for domain adaptation.
- Workflow:
- Main optimizer (Adam/NAdam) updates the shared and classification layers.
- Separate optimizer (Adam) updates the domain branch.
- Domain loss weighting: a relative weight $\lambda_{\text{DA}}$ balances the adversarial term against the classification loss.
- This staged training decouples classification performance from simulation–data mismodeling, reducing systematic uncertainties associated with modeling detector effects. A sketch of the staged update follows.
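The following sketch shows one way to realize the staged two-optimizer update, reusing `DeepTauLike` and `grad_reverse` from the sketches above; the learning rates, `lam_da`, and the exact alternation scheme are placeholders rather than the tuned CMS settings.

```python
# Staged update: (1) classification on simulation, (2) adversarial domain
# step on a data/simulation mixture with the gradient reversed into the trunk.
import torch
import torch.nn.functional as F

model = DeepTauLike()                       # three-branch network from above
domain_head = torch.nn.Linear(128 * 3, 1)   # adversarial domain classifier

opt_main = torch.optim.NAdam(model.parameters(), lr=1e-3)        # placeholder lr
opt_domain = torch.optim.Adam(domain_head.parameters(), lr=1e-3) # placeholder lr

def training_step(sim_batch, mixed_batch, lam_da=1.0):
    # (1) classification update, simulated events only
    hl, inner, outer, labels = sim_batch
    opt_main.zero_grad()
    F.cross_entropy(model(hl, inner, outer), labels).backward()
    opt_main.step()

    # (2) adversarial update on mixed data (domain = 1) and simulation (0);
    #     grad_reverse flips the gradient entering the shared trunk, pushing
    #     it toward domain-invariant features while the domain head itself
    #     still learns to tell the two domains apart
    hl, inner, outer, domain = mixed_batch
    opt_main.zero_grad()
    opt_domain.zero_grad()
    feats = grad_reverse(model.features(hl, inner, outer), lam_da)
    logits = domain_head(feats).squeeze(1)
    F.binary_cross_entropy_with_logits(logits, domain.float()).backward()
    opt_domain.step()
    opt_main.step()  # trunk parameters receive the reversed gradient
```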
4. Performance Evaluation
- Metrics:
- $\tau_h$ identification efficiency: $\varepsilon_{\tau} = N_{\tau_h}^{\text{pass}} / N_{\tau_h}^{\text{total}}$, the fraction of genuine $\tau_h$ passing a given working point.
- Misidentification (fake) rate: $f = N_{\text{fake}}^{\text{pass}} / N_{\text{fake}}^{\text{total}}$, the fraction of jets (or electrons/muons) passing the same requirement. A small helper computing both is sketched below.
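For concreteness, a tiny helper that evaluates both metrics at a given score threshold; the array names are illustrative:

```python
import numpy as np

def eff_and_fake_rate(scores, is_genuine_tau, threshold):
    """scores: discriminator outputs; is_genuine_tau: boolean truth mask."""
    passed = scores > threshold
    eff = passed[is_genuine_tau].mean()    # N(genuine, passing) / N(genuine)
    fake = passed[~is_genuine_tau].mean()  # N(fake, passing) / N(fake)
    return eff, fake
```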
- Results at 13 TeV (2018 simulation):
- At fixed $\tau_h$ efficiency, DeepTau v2.5 achieves marked reductions in jet fake rates compared to v2.1, consistent with the 30–50% improvement quoted above.
- Electron misidentification is also reduced at the tightest working points, while muon misidentification remains low.
- Robustness at 13.6 TeV (2022 data):
- Despite being trained on 2018 samples, the domain adaptation reduces the data–simulation disagreement in regions of high discriminator score to the few-percent level, compared with substantially larger discrepancies before adaptation.
5. Calibration Strategies and Application in Analyses
- Tag-and-probe Calibration:
- Tag-and-probe methods in $Z \to \tau\tau$ events are used to fit visible mass ($m_{\text{vis}}$) distributions and extract:
- $\tau_h$ energy scale corrections (TES), typically within a few percent of unity.
- $\tau_h$ identification scale factors (SF), typically within $0.9$–$1.1$.
- Both individual (fix TES, fit SF) and combined (profile likelihood in both TES and SF) fitting procedures are implemented; a toy single-parameter fit is sketched below.
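As an illustration of the individual fit (TES fixed, SF floating), the following toy fits a single scale factor by minimizing a binned Poisson negative log-likelihood of the $m_{\text{vis}}$ distribution. The templates and histogram contents are invented, and the real procedure uses full profile-likelihood fits with systematic uncertainties.

```python
# Toy binned-likelihood fit: one SF scaling the Z->tautau signal template.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_id_sf(data_counts, sig_template, bkg_template):
    def nll(sf):  # Poisson NLL up to a constant term
        mu = np.clip(sf * sig_template + bkg_template, 1e-9, None)
        return np.sum(mu - data_counts * np.log(mu))
    res = minimize_scalar(nll, bounds=(0.5, 1.5), method="bounded")
    return res.x

sig = np.array([30.0, 80.0, 120.0, 70.0, 25.0])   # signal template (toy)
bkg = np.array([10.0, 15.0, 20.0, 15.0, 10.0])    # background template (toy)
data = np.random.poisson(0.97 * sig + bkg)        # pseudo-data with SF = 0.97
print(f"fitted SF = {fit_id_sf(data, sig, bkg):.3f}")
```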
- High-$p_T$ Calibration:
- In events with $p_T(\tau_h) > 100$ GeV, control regions and fits provide SFs in two $p_T$ bins: [100–200] GeV and > 200 GeV.
- Lepton-to-$\tau_h$ Misidentification Calibration:
- $Z \to ee$ (“$e$ probe”) and $Z \to \mu\mu$ (“$\mu$ probe”) tag-and-probe methods are used to determine mis-ID rate scale factors (SFs) as functions of $\eta$ and decay mode:
- The $e \to \tau_h$ SFs increase toward larger $|\eta|$.
- The $\mu \to \tau_h$ SFs typically deviate from unity by $5\%$ or more, depending on decay mode and detector region.
- Systematic uncertainties (including luminosity, trigger/isolation, background shaping, mis-ID energy scale, and PDF/scale variations) are catalogued for each DeepTau v2.5 working point, and correction factors are propagated to CMS physics analyses at $\sqrt{s} = 13$ and $13.6$ TeV. A toy application of such SFs as event weights is sketched below.
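To make the propagation step concrete, here is a minimal sketch of applying per-decay-mode, per-region mis-ID SFs as event weights downstream; the table values, the $|\eta| = 1.2$ barrel/endcap split, and all names are placeholders, not measured CMS values.

```python
# Placeholder SF table and lookup; values and the |eta| = 1.2 split are
# illustrative, not measured CMS scale factors.
MU_TO_TAU_SF = {                 # (decay_mode, region) -> SF
    (0, "barrel"): 1.05, (0, "endcap"): 1.10,
    (1, "barrel"): 0.98, (1, "endcap"): 1.07,
}

def misid_weight(decay_mode: int, abs_eta: float) -> float:
    """Event weight for a mu -> tau_h fake candidate."""
    region = "barrel" if abs_eta < 1.2 else "endcap"
    return MU_TO_TAU_SF.get((decay_mode, region), 1.0)
```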
6. Context, Significance, and Outlook
DeepTau v2.5's use of image-based PF encoding, advanced convolutional architectures, adversarial domain adaptation, and extensive calibration achieves significant improvements in distinguishing genuine $\tau_h$ from jets and other fakes. The 30–50% reduction in jet misidentification at fixed signal efficiency, combined with the reduction of data–simulation discrepancies to a few percent across both LHC Run 2 and early Run 3 datasets, enhances the reliability of CMS analyses involving $\tau_h$ signatures.
A plausible implication is that further developments could continue to target robustness to changing detector conditions and evolving pileup profiles, leveraging similar domain adaptation frameworks. The algorithm's modular, image-based structure provides a foundation for ongoing improvement and adaptation to future LHC datasets.