Real-Time Analysis in High-Energy Physics

Updated 18 January 2026
  • Real-time analysis in high-energy physics is a system of specialized algorithms and computing architectures that process massive detector data streams in microsecond to millisecond timescales.
  • Multi-tiered trigger hierarchies combine custom hardware (FPGAs/ASICs) and heterogeneous software (CPUs, GPUs) to achieve rapid event selection and significant data reduction.
  • This approach enhances physics reach by expediting event reconstruction, minimizing storage demands, and enabling near-immediate feedback for discoveries in experiments like the LHC and RHIC.

Real-time analysis in high-energy physics (HEP) refers to the suite of algorithms, computing architectures, and workflows that enable the online selection, partial or full reconstruction, and physics-quality characterization of particle collisions or rare phenomena while the data are still in flight, often within microsecond to millisecond timescales. Motivated by detector output rates reaching terabytes per second and raw event volumes far exceeding practical storage or transfer capacities, real-time analysis is the linchpin for reducing, selecting, and making analysis-quality decisions on data at every stage from specialized hardware triggers to heterogeneously accelerated software farms. Contemporary and next-generation HEP experiments at facilities such as the LHC, RHIC, FAIR, and CTA employ hierarchies of hardware, firmware, and software that optimize latency, throughput, and selection performance under extreme constraints (Gligorov et al., 2023, Gligorov, 2015, Betts et al., 5 Aug 2025).

1. System Architectures and Trigger Hierarchies

Modern high-energy physics experiments universally adopt multi-tiered data reduction pipelines:

  • Level-1 (L1) Hardware Triggers: Custom logic (FPGAs/ASICs) with fixed latency budgets in the O(1–12) μs regime, sampling data at bunch-crossing rates (e.g., 40 MHz at the LHC), using coarse calorimetric or muon-chamber features to yield reduction factors of 10²–10³ (Gligorov et al., 2023, Gligorov, 2015). Current LHC devices (e.g., Xilinx Kintex Ultrascale) process up to O(100) clusters per 25 ns interval on-chip, deploying simplified pattern-recognition algorithms and a small number of low-latency arithmetic pipelines (Iiyama et al., 2020).
  • High-Level Triggers (HLT): Large farms of CPUs, GPUs, and sometimes FPGAs, process the output of L1 at rates of O(1–10⁵) events/s, with per-event time budgets of O(10–200) ms (Gligorov et al., 2023, Betts et al., 5 Aug 2025). HLT stages perform partial or full event reconstruction, including track- and vertex-finding (using Cellular Automata or Kalman filters), particle identification, and multivariate inference (BDTs, DNNs), often with staged cascades and adaptive buffering strategies (Gligorov, 2015).
  • Triggerless and Software-only Streaming: Upgrades at LHCb and ALICE move toward "triggerless" architectures, where all collisions are read out and filtered by pure software on heterogeneous CPU/GPU farms, removing the hardware threshold (Aaij et al., 2019).

A summary table of performance targets and architectures:

| Stage | Latency Target | Throughput | Core Techniques |
| --- | --- | --- | --- |
| L1 (HW Trigger) | 1–12 μs | O(1) TB/s | FPGA/ASIC, fixed logic |
| HLT | 10–200 ms/event | >100 kHz | CPU/GPU/FPGA, SW farm |
| Real-Time Scouting | 1–10 ms/event | ≳50 GB/s | Buffer + selection |

(Gligorov et al., 2023, Iiyama et al., 2020)
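
The cascaded reduction through these stages can be sketched numerically; the input rate and per-stage reduction factors below are illustrative round numbers in the ranges quoted above, not measured values.

```python
# Sketch of the multi-tier trigger reduction chain described above.
# Rates and reduction factors are illustrative, not measured values.

def cascade(input_rate_hz, reductions):
    """Apply successive trigger reduction factors to an input event rate."""
    rates = [input_rate_hz]
    for r in reductions:
        rates.append(rates[-1] / r)
    return rates

# LHC-like chain: 40 MHz bunch crossings -> L1 (factor ~400) -> HLT (factor ~100)
rates = cascade(40e6, [400, 100])
for stage, rate in zip(["input", "after L1", "after HLT"], rates):
    print(f"{stage:>10}: {rate:,.0f} Hz")
```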

2. Dataflow, Reduction, and Online Event Models

Data rates at collider and astroparticle experiments necessitate aggressive multi-stage reduction:

  • Input Volumes: LHC Run 1/2: LHCb ~1.2 TB/s, ATLAS/CMS ~40 TB/s; CTA: O(4 GB/s) sustained camera data; STAR@RHIC: hundreds of millions of events per run (Gligorov, 2015, Zoli et al., 2015, Betts et al., 5 Aug 2025).
  • Reduction Factors: Total data reduction from detector to permanent store is O(10⁴–10⁵) (Gligorov et al., 2023, Gligorov, 2015).
  • Output Event Models: Schemes such as the LHCb "Turbo" model store only user-specified high-level physics objects (e.g., candidate tracks/vertices) instead of full raw detector data, achieving persistent size ≤10% of the full event, and multiplying the sustainable trigger output by the same factor (Aaij et al., 2019, Aaij et al., 2016). Selective persistence and per-trigger grooming further optimize inclusive trigger retention.

The sustainable trigger output rate scales with the reduction factor $R$ through the persisted event size:

f_{\text{max}} = \frac{B_{\text{max}}}{s}

where $B_{\text{max}}$ is the output bandwidth cap and $s$ is the event size after reduction; for a reduction factor $R$ applied to a raw event size $s_{\text{raw}}$, $s = s_{\text{raw}}/R$, so $f_{\text{max}}$ grows linearly with $R$ (Aaij et al., 2019).
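
A numerical illustration of this rate relation ($f_{\text{max}} = B_{\text{max}}/s$), under assumed round numbers for the bandwidth cap and for Turbo-style reduced versus full raw event sizes; none of these values are taken from the cited papers.

```python
# Illustration of f_max = B_max / s with assumed round numbers: a 10 GB/s
# output bandwidth cap, and a Turbo-style reduced event of ~70 kB versus a
# ~700 kB full raw event (all sizes are illustrative).

def max_trigger_rate(bandwidth_bytes_per_s, event_size_bytes):
    return bandwidth_bytes_per_s / event_size_bytes

b_max = 10e9                 # bandwidth cap in bytes/s (assumed)
full, reduced = 700e3, 70e3  # event sizes in bytes (assumed)

print(f"full events:    {max_trigger_rate(b_max, full):,.0f} Hz")
print(f"reduced events: {max_trigger_rate(b_max, reduced):,.0f} Hz")
# A 10x smaller persisted event sustains a 10x higher trigger output rate.
```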

3. Algorithms for Real-Time Reconstruction and Selection

Real-time analysis leverages specialized and sometimes highly parallelized algorithms:

  • Pattern Recognition and Tracking: Sliding-window, Hough transforms, Cellular Automata for O(N log N) complexity in calorimeter/track-finding (Gligorov et al., 2023, Betts et al., 5 Aug 2025). CA implementations in STAR HLT deliver ~2 kHz throughput with O(1 ms) per-event latency for >10⁴ hits/event (Betts et al., 5 Aug 2025).
  • Kalman-Filter–Based Reconstruction: For both trajectory optimizations and vertexing, enabling efficient short-lived particle (e.g., hypernuclei) clustering with update/predict cycles streaming across candidate chains (Betts et al., 5 Aug 2025).
  • Machine-Learning–Based Selection: FPGA-deployable BDTs or compressed GNNs are employed at the L1 or HLT stage, using quantized weights, precomputed nonlinearity (e.g., exp tables), and integer arithmetic to meet hard resource and latency constraints; e.g., GNN-based calorimeter cluster identification with <1 μs latency, AUC ≈ 0.98 at O(50%) DSP utilization (Iiyama et al., 2020).
  • Express Data Production and Fast Event Rejection: In STAR, express workflows pipeline hit clustering, tracking, vertexing, and candidate selection through modular, parallel schedulers, reaching physics-quality event selection within hours of data taking (Betts et al., 5 Aug 2025).
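
The Kalman predict/update cycle mentioned above can be sketched in one dimension; the constant-position model and noise parameters here are illustrative and far simpler than a real track fit.

```python
# Minimal 1D Kalman filter, sketching the streaming predict/update-per-hit
# pattern used in real-time track fitting. Constant-position model with
# process noise q and measurement noise r; all numbers are illustrative.

def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=100.0):
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by the process noise.
        p = p + q
        # Update: blend prediction with the new measurement via the gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

hits = [1.2, 0.9, 1.1, 1.05, 0.95]
print(kalman_1d(hits))  # estimates settle near the true position ~1.0
```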

4. Heterogeneous and Accelerated Computing

The need to meet extreme throughput and energy-efficiency requirements has guided a trend toward heterogeneous parallelism:

  • CPUs and Multithreading: Efficient software pipelines on O(10³–10⁴) cores, leveraging OpenMP, pthread, or MPI, as in the CBM FLES architecture for pure software triggers at rates up to 10 MHz (Singhal et al., 2018).
  • GPUs: Accelerate computationally dense, data-parallel workloads (track finding, pattern recognition, fit reduction) with per-event latencies down to O(1–400 μs) and throughput up to O(10⁶–10⁷) fits/s per card. Bottlenecks include PCIe I/O and deep execution pipelines, but tasks with >1 ms compute budget per event and regular access patterns are well-mapped to GPUs (Bruch, 2020).
  • FPGAs: Provide fixed, hard-real-time logic for the strictest latency stages (L1 triggers), particularly for fast clustering, GNN inference, and, more broadly, as programmable pipelines for cluster finding (e.g., real-time centroid calculation in LHCb’s VELO), programmable counters, and bespoke data path reductions (Iiyama et al., 2020, Cordova et al., 13 Mar 2025).
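
The precomputed-nonlinearity trick mentioned above for FPGA inference (replacing exp with a table lookup indexed by a fixed-point input) can be sketched as follows; the table range and fixed-point format are illustrative choices, not taken from any particular firmware.

```python
# Sketch of "precomputed nonlinearity + integer arithmetic": replace exp(x)
# with a small lookup table indexed by a fixed-point representation of x.
import math

FRAC_BITS = 6                      # fixed-point fractional bits (assumed)
TABLE_MIN, TABLE_MAX = -8.0, 0.0   # exp() input range covered by the table

# Precompute exp() over the quantized input range (done once, "offline").
steps = int((TABLE_MAX - TABLE_MIN) * (1 << FRAC_BITS)) + 1
EXP_TABLE = [math.exp(TABLE_MIN + i / (1 << FRAC_BITS)) for i in range(steps)]

def exp_lut(x):
    """Approximate exp(x) with a table lookup, as fixed-logic hardware would."""
    x = min(max(x, TABLE_MIN), TABLE_MAX)            # saturate to table range
    idx = round((x - TABLE_MIN) * (1 << FRAC_BITS))  # integer index = quantized x
    return EXP_TABLE[idx]

print(exp_lut(-1.0), math.exp(-1.0))  # close, at a fraction of the cost
```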

A representative table of energy-performance from CTA waveform extraction:

| Device | Threads/Kernels | Throughput (kevents/s) | kevents/s/W |
| --- | --- | --- | --- |
| CPU (OpenMP) | 56 cores | 164.4 | 0.861 |
| GPU (OpenCL) | Pixel-parallel | 36.97 | 0.389 |
| FPGA (OpenCL) | Unrolled pipeline | 10.93 | 0.520 |

(Zoli et al., 2015)
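
As a back-of-envelope check of the table above, dividing each device's throughput by its energy-efficiency figure recovers the implied power draw.

```python
# Implied power draw = throughput / energy efficiency, per device, using the
# CTA waveform-extraction figures quoted in the table above.

devices = {
    "CPU (OpenMP)":  (164.4, 0.861),   # (kevents/s, kevents/s/W)
    "GPU (OpenCL)":  (36.97, 0.389),
    "FPGA (OpenCL)": (10.93, 0.520),
}

for name, (throughput, efficiency) in devices.items():
    power_w = throughput / efficiency
    print(f"{name:14s} ~{power_w:6.1f} W")
```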

5. Workflow Management, Frameworks, and Monitoring

Achieving robust real-time analysis requires coordination frameworks integrating distributed compute, priority queueing, configuration management, and integrated monitoring:

  • Modular, Distributed Pipelines: STAR’s HLT and Express Data Production employ distributed, event-parallel and intra-event–vectorized modules on a dedicated cluster with HTCondor-managed express jobs and real-time calibration feedback (Betts et al., 5 Aug 2025).
  • Real-time Analysis Frameworks: rta-dp (ZeroMQ-based) supports priority queues, JSON workflow configuration, and hierarchical monitoring at the Supervisor, WorkerManager, and Worker levels; it has been observed to deliver process-dependent latencies in the sub-millisecond range (Bulgarelli et al., 5 Nov 2025). LHCb's Tesla application formats Turbo-stream HLT2 output into analysis-ready datasets for direct physics exploitation without offline reconstruction (Aaij et al., 2016).
  • Online Monitoring and Alignment: Integrated in event-filter and HLT chains, including in situ buffer-driven calibration/alignment, logging, and outlier detection, e.g., programmable counters updated per millisecond in FPGAs yield O(μm) beam-spot monitoring resolution with real-time feedback capabilities (Cordova et al., 13 Mar 2025).
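
The priority-queue dispatch pattern these frameworks rely on can be illustrated generically; this sketch uses Python's heapq and is not the rta-dp API.

```python
# Generic priority-queue dispatch: high-priority events (e.g., calibration
# triggers) preempt the normal backlog. Illustration only, not the rta-dp API.
import heapq

HIGH, NORMAL = 0, 1   # lower number = dequeued first

queue = []
seq = 0               # tie-breaker preserves FIFO order within a priority
for priority, event in [(NORMAL, "physics-1"), (HIGH, "calib-1"),
                        (NORMAL, "physics-2"), (HIGH, "calib-2")]:
    heapq.heappush(queue, (priority, seq, event))
    seq += 1

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # calibration events come out before the physics backlog
```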

6. Technical Challenges and Scaling Laws

Real-time analysis is constrained and shaped by several technical and resource boundaries:

  • Latency and Throughput: Each pipeline stage must respect a fixed per-event latency budget, e.g., L1 ≲ 1–12 μs and HLT ≲ 100–200 ms/event. Aggregate throughput depends sensitively on the achievable concurrency; for pipelined designs it follows $T = f_{\mathrm{clk}} / II$, where $II$ is the initiation interval in clock cycles (López et al., 2 Jan 2025, Iiyama et al., 2020).
  • Resource Stratification and Device Bottlenecks: The allocation law $C_i \approx R_i T_i/\varepsilon_i$, relating the processing units $C_i$ needed to sustain input rate $R_i$ given per-event processing time $T_i$ and utilization efficiency $\varepsilon_i$, governs parallel resource planning, with further trade-offs among LUTs, BRAM, DSPs, and other on-chip resources for FPGAs (Iiyama et al., 2020, Betts et al., 5 Aug 2025).
  • Algorithmic Complexity vs. Parallelization: Algorithmic complexity ($O(N \log N)$ for Cellular Automata, $\sim O(N)$ for optimized waveform extraction) constrains both scalability and achievable resource usage (Zoli et al., 2015, Betts et al., 5 Aug 2025). Memory and data movement (e.g., PCIe I/O, cache locality in query services) often dominate execution, necessitating multi-level caching and data-aware scheduling (Pivarski et al., 2017).
  • Scaling to Exascale/Next-Generation Experiments: Projected upgrades (FCC-hh, Muon Collider) anticipate raw rates of O(100 TB/s) and impose demands for L1 latency ≲10 μs and HLT event budgets ≲100 ms in exascale farms (Gligorov et al., 2023).
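
The two scaling relations above can be illustrated numerically, taking the pipelined throughput as $f_{\mathrm{clk}}/II$ (with $II$ the initiation interval in clock cycles); the clock frequency, rates, and efficiencies below are assumed round numbers.

```python
# Numerical illustrations of the pipeline-throughput and resource-allocation
# relations, with assumed parameters: a pipelined FPGA design at 200 MHz and
# CPU-farm sizing for an HLT-like stage (all inputs are round numbers).

def pipeline_throughput(f_clk_hz, initiation_interval_cycles):
    """Events/s sustained by a pipeline accepting one input every II cycles."""
    return f_clk_hz / initiation_interval_cycles

def cores_needed(rate_hz, time_per_event_s, efficiency=1.0):
    """C ~ R * T / eps: cores to process rate R with per-event time T."""
    return rate_hz * time_per_event_s / efficiency

print(f"{pipeline_throughput(200e6, 2):,.0f} events/s")          # 200 MHz, II=2
print(f"{cores_needed(1e6, 50e-3, efficiency=0.8):,.0f} cores")  # farm sizing
```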

7. Impact on Analysis, Discovery, and Operations

Well-designed real-time analysis architectures produce transformative gains for science programs:

  • Physics Reach: LHCb’s Turbo and STAR’s Express chains enable order-of-magnitude increases in trigger rates and event yields for flavour, QCD, and exotic searches while capping storage and offline compute budgets (Aaij et al., 2019, Betts et al., 5 Aug 2025).
  • Fast Feedback: Real-time monitoring, low-latency reconstruction, and immediate quality assessment (e.g., on-the-fly hypernuclei discovery, continuous beam-spot tracking) reduce time-to-publication from months to days or weeks (Betts et al., 5 Aug 2025, Cordova et al., 13 Mar 2025).
  • Operational Resilience: Algorithmic and architectural flexibility accommodate dynamic run conditions, calibration drift, and variable beam/DAQ rates, with strategies such as priority queues, dynamic resource addition, and live configuration reloads (Bulgarelli et al., 5 Nov 2025, Gligorov, 2015).

In conclusion, real-time analysis is a multifaceted technical domain at the intersection of algorithmic innovation, hardware–software co-design, and workflow engineering, fundamentally shaping the data-to-discovery pipeline of high-energy physics and empowering the efficient, robust extraction of physics results under extreme bandwidth and latency constraints (Gligorov et al., 2023, Gligorov, 2015, Aaij et al., 2019, Betts et al., 5 Aug 2025, Iiyama et al., 2020).
