RECAP Pipeline for CTA
- RECAP Pipeline is a modular, open-source event reconstruction stack for CTA that integrates calibration, cleaning, parameterization, and IRF generation.
- It utilizes a linear five-stage design combining ctapipe, protopipe, and pyirf to enable robust shower physics reconstruction and sensitivity optimization.
- The pipeline incorporates advanced techniques such as ML regression and deep-learning methods to enhance gamma/hadron discrimination and performance predictability.
The RECAP Pipeline is the current prototype open-source event-reconstruction pipeline for the Cherenkov Telescope Array (CTA), the next-generation ground-based gamma-ray observatory. Assembled in the protopipe framework, RECAP comprises multiple tightly integrated stages that transform raw telescope data into high-level science products. The pipeline orchestrates three major software frameworks (ctapipe, protopipe, and pyirf), run as a linear chain of calibration, image-processing, multivariate-analysis, and instrument-response modules. Its functionality spans calibration of digitized telescope signals, image cleaning and parameterization, shower physics reconstruction (direction, energy, and gamma/hadron discrimination), and the derivation of instrument response functions (IRFs) required for sensitivity forecasting and astrophysical interpretation (Nöthe et al., 2021).
1. Pipeline Architecture and Module Interactions
RECAP follows a linear five-stage design. Each stage is implemented as a YAML-configured Python Tool that chains core ctapipe routines with higher-level logic in protopipe and IRF calculation in pyirf:
- Stage 0: Calibration & Image Extraction (ctapipe). Converts raw digitized signals into calibrated pixel charges and times, extracting images from telescope data streams.
- Stage 1: Image Cleaning & Pixel Selection (ctapipe). Applies multi-level charge and time thresholds to remove noise and select physically meaningful pixels.
- Stage 2: Image Parameterization (ctapipe). Reduces cleaned images to classical and auxiliary moments (Hillas parameters, skewness, kurtosis, timing spread) for downstream analysis.
- Stage 3: Event-Level Reconstruction (ctapipe + protopipe.mva). Combines monoscopic (ML) and stereoscopic (geometric) shower reconstruction, including energy regression and particle identification.
- Stage 4: IRF Generation & Sensitivity Optimization (pyirf). Computes effective area, PSF, and energy dispersion histograms, optimizes gamma/hadron and angular cuts, and outputs GADF-compliant IRFs and sensitivity curves.
The overall data flow—managed via DIRAC/CTADIRAC grid orchestration—routes simulated or prototype-telescope events through ctapipe’s EventSource, calibration/image extraction, cleaning, parameterization, higher-level event reconstruction (DL1b, DL2, DL3), and IRF construction. Model training, configuration, and storage primitives are standardized throughout the toolchain.
2. Calibration and Image Extraction
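Calibration transforms the digitized waveform of each pixel $i$, $R_i(t)$, into an unbiased physical signal:
- Remove the pixel-wise pedestal $p_i$ and apply the gain $g_i$: $S_i(t) = (R_i(t) - p_i) / g_i$
- Integrate over a window $W$ (fixed, sliding, or fit-based): Pixel charge: $Q_i = \sum_{t \in W} S_i(t)$. Mean arrival time: $\bar{t}_i = \sum_{t \in W} t \, S_i(t) / Q_i$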
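The calibration steps can be illustrated with a minimal pure-Python sketch. It assumes a constant pedestal per pixel, a gain expressed in ADC counts per photoelectron, and arrival times measured in sample indices. The function names `calibrate_pixel` and `sliding_window` are hypothetical and are not part of the ctapipe API.

```python
def sliding_window(waveform, width):
    """Return (start, stop) of the window of `width` samples with the largest sum."""
    starts = range(len(waveform) - width + 1)
    best = max(starts, key=lambda s: sum(waveform[s:s + width]))
    return best, best + width


def calibrate_pixel(waveform, pedestal, gain, window):
    """Return (charge, mean_time) for one pixel.

    waveform: raw ADC samples
    pedestal: ADC baseline per sample (assumed constant)
    gain:     ADC counts per photoelectron (assumed known)
    window:   (start, stop) sample indices used for the integration
    """
    start, stop = window
    # Pedestal subtraction and gain correction, sample by sample.
    signal = [(adc - pedestal) / gain for adc in waveform]
    # Pixel charge: sum of the calibrated signal inside the window.
    charge = sum(signal[start:stop])
    if charge <= 0:
        # No positive signal: the weighted time is undefined.
        return charge, float("nan")
    # Charge-weighted mean arrival time, in units of sample index.
    mean_time = sum(t * signal[t] for t in range(start, stop)) / charge
    return charge, mean_time
```

A typical call would first pick the window with `sliding_window(waveform, width)` and then pass it to `calibrate_pixel`. A fit-based extractor would replace only the window-selection step.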
Window selection relies on pulse-shape fits and sliding-window maximum searches to maximize the signal-to-noise ratio. Cleaning employs a two-threshold system: a "picture" charge threshold (e.g. 5 p.e.) for core pixels and a lower "boundary" threshold (e.g. 3 p.e.) for their neighbors, plus constraints on the arrival-time difference between neighboring pixels. Advanced cleaning methods (e.g., wavelet or fractal) are also available.
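The two-threshold cleaning can be sketched as follows. This is an illustrative simplification, not the ctapipe implementation: the function name `two_threshold_clean` and the neighbor-list representation of the camera geometry are assumptions, and the time-difference constraint is omitted.

```python
def two_threshold_clean(charges, neighbors, picture_thresh=5.0, boundary_thresh=3.0):
    """Return a list of booleans marking the pixels that survive cleaning.

    charges:   pixel charges in photoelectrons
    neighbors: neighbors[i] is the list of pixel indices adjacent to pixel i
    """
    n_pixels = len(charges)
    # Core pixels: charge at or above the picture threshold.
    core = [q >= picture_thresh for q in charges]
    # Boundary pixels: charge at or above the lower threshold
    # and adjacent to at least one core pixel.
    boundary = [
        charges[i] >= boundary_thresh and any(core[j] for j in neighbors[i])
        for i in range(n_pixels)
    ]
    return [c or b for c, b in zip(core, boundary)]
```

In this simplified form an isolated pixel above the picture threshold is kept. A production cleaner would usually add an option to reject such pixels and would apply the timing cut as well.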
3. Image Parameterization
Parameterization reduces the selected pixel field to a compact vector:
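- Weighted centroid: $\langle x \rangle = \sum_i Q_i x_i / \sum_i Q_i$, $\langle y \rangle = \sum_i Q_i y_i / \sum_i Q_i$, where $Q_i$ is the charge of pixel $i$ at camera position $(x_i, y_i)$ and $\langle \cdot \rangle$ denotes the charge-weighted mean.
- Second central moments (Hillas): $\sigma_{xx} = \langle x^2 \rangle - \langle x \rangle^2$, $\sigma_{yy} = \langle y^2 \rangle - \langle y \rangle^2$, $\sigma_{xy} = \langle xy \rangle - \langle x \rangle \langle y \rangle$. Width: $\sqrt{\lambda_-}$. Length: $\sqrt{\lambda_+}$, where $\lambda_\pm = \frac{1}{2}\left[\sigma_{xx} + \sigma_{yy} \pm \sqrt{(\sigma_{xx} - \sigma_{yy})^2 + 4\sigma_{xy}^2}\right]$ are the eigenvalues of the covariance matrix. Principal axis rotation aligns length with the shower major axis.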
Higher moments and auxiliary features—such as skewness, kurtosis, and pixel-time spread—inform ML-based event discrimination and regression stages.
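As a worked illustration of the moment analysis, the sketch below computes the charge-weighted centroid, the second central moments, and from them the length, width, and orientation angle of the image. The function name `hillas_moments` and its return format are assumptions made for this example. It is not the ctapipe parameterization routine and ignores units and higher moments.

```python
import math


def hillas_moments(x, y, q):
    """Return centroid, length, width and orientation psi (radians) of a cleaned image.

    x, y: pixel coordinates in the camera plane
    q:    pixel charges (only pixels that survived cleaning)
    """
    total = sum(q)
    # Charge-weighted centroid.
    cx = sum(qi * xi for qi, xi in zip(q, x)) / total
    cy = sum(qi * yi for qi, yi in zip(q, y)) / total
    # Second central moments (elements of the 2x2 covariance matrix).
    sxx = sum(qi * (xi - cx) ** 2 for qi, xi in zip(q, x)) / total
    syy = sum(qi * (yi - cy) ** 2 for qi, yi in zip(q, y)) / total
    sxy = sum(qi * (xi - cx) * (yi - cy) for qi, xi, yi in zip(q, x, y)) / total
    # Eigenvalues of the covariance matrix: mean +/- half-difference term.
    mean = 0.5 * (sxx + syy)
    diff = math.hypot(0.5 * (sxx - syy), sxy)
    length = math.sqrt(mean + diff)
    width = math.sqrt(max(mean - diff, 0.0))  # guard against rounding below zero
    # Orientation of the major axis with respect to the x axis.
    psi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return {"x": cx, "y": cy, "length": length, "width": width, "psi": psi}
```

The returned dictionary corresponds to the compact feature vector described above. Skewness, kurtosis, and timing spread would be computed along the same principal axis.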
4. Event-Level Reconstruction
Two main regimes exist:
- Monoscopic: Uses ML regressors (e.g., random forest or AdaBoost, trained in protopipe.mva) on per-telescope Hillas plus timing features to estimate shower direction and energy.
- Stereoscopic: Geometric reconstruction via intersection of major axes from multiple telescopes, aggregating direction vectors in a brightness-weighted scheme.
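Energy estimation employs either lookup tables (typically indexed by image amplitude and impact distance) or multivariate regressors. Particle ID (gamma/hadron) relies on a random-forest classifier, which outputs a "gammaness" score in [0,1]. The ImPACT likelihood fit, which maximizes the likelihood of the observed pixel charges given simulated image templates, is used for template-based event reconstruction.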
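The stereoscopic case can be sketched as a weighted average of pairwise major-axis intersections. The sketch assumes that all image centroids and orientations have already been transformed into a common sky-aligned frame. The weight (product of image sizes times the sine of the angle between the two axes) is one common choice and is an assumption here, not necessarily the weighting used in the pipeline. The function name `intersect_axes` is hypothetical.

```python
import math
from itertools import combinations


def intersect_axes(images):
    """Estimate the source position from the major axes of several images.

    images: list of dicts with keys
        x, y  centroid in a common frame
        psi   major-axis orientation in radians
        size  total image charge, used as a brightness weight
    """
    sum_w = sum_x = sum_y = 0.0
    for a, b in combinations(images, 2):
        dax, day = math.cos(a["psi"]), math.sin(a["psi"])
        dbx, dby = math.cos(b["psi"]), math.sin(b["psi"])
        # 2D cross product of the axis directions: sine of the angle between them.
        det = dax * dby - day * dbx
        if abs(det) < 1e-6:
            continue  # nearly parallel axes do not constrain the position
        # Distance along axis a to the intersection point with axis b.
        t = ((b["x"] - a["x"]) * dby - (b["y"] - a["y"]) * dbx) / det
        px, py = a["x"] + t * dax, a["y"] + t * day
        # Bright image pairs with a large opening angle get more weight.
        w = a["size"] * b["size"] * abs(det)
        sum_w += w
        sum_x += w * px
        sum_y += w * py
    if sum_w == 0.0:
        raise ValueError("need at least two images with non-parallel axes")
    return sum_x / sum_w, sum_y / sum_w
```

With real images the pairwise intersections scatter around the true direction, and the weighting reduces the influence of faint images and shallow crossing angles.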
5. Instrument Response Function (IRF) Generation
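pyirf formalizes all IRFs in the GADF factorization $R(\hat{p}, \hat{E} \mid p, E) = A_\mathrm{eff}(p, E) \cdot \mathrm{PSF}(\hat{p} \mid p, E) \cdot E_\mathrm{disp}(\hat{E} \mid p, E)$, where $p$ and $E$ are the true direction and energy and $\hat{p}$ and $\hat{E}$ their reconstructed counterparts:
- Effective area: $A_\mathrm{eff}(E) = A_\mathrm{sim} \, N_\mathrm{sel}(E) / N_\mathrm{sim}(E)$
Quantifies event selection relative to simulated shower generation.
- PSF and Energy Dispersion: $\mathrm{PSF}(\theta \mid E)$ and $E_\mathrm{disp}(\mu \mid E)$, with $\mu = \hat{E} / E$
Binned in direction offset $\theta$, energy, and true/reconstructed bins.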
pyirf outputs standard FITS IRF files and runs sensitivity optimization by adjusting the "gammaness" and angular ($\theta$) cuts for specified exposure durations.
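The effective-area estimate can be illustrated with a short sketch: per true-energy bin, it is the fraction of simulated showers that survive all selection cuts, multiplied by the area over which showers were thrown. The helper names `histogram` and `effective_area` below are hypothetical and do not reproduce pyirf's interface. The sketch also assumes a list of all simulated energies is available, whereas large productions usually derive the simulated counts from the known simulated spectrum.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values falling into the bins defined by sorted bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def effective_area(selected_energies, simulated_energies, edges, area_simulated):
    """Return the effective area per true-energy bin.

    selected_energies:  true energies of events passing all cuts
    simulated_energies: true energies of all simulated showers
    edges:              true-energy bin edges (same unit as the energies)
    area_simulated:     area over which showers were thrown (e.g. in m^2)
    """
    n_sel = histogram(selected_energies, edges)
    n_sim = histogram(simulated_energies, edges)
    # Selection efficiency times thrown area; empty bins yield zero.
    return [area_simulated * s / n if n else 0.0 for s, n in zip(n_sel, n_sim)]
```

Tightening the gammaness or angular cut lowers the selected counts and therefore the effective area, which is the trade-off against background rejection that the sensitivity optimization balances.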
6. Performance, Validation, and Implementation
Benchmark studies for LST-1 report:
- Angular resolution (68% containment): 0.04°–0.1° above 100 GeV, matching legacy pipelines.
- Energy resolution: 10–15% over a broad energy range.
- Sensitivity: 5–10× improvement over previous-generation IACT systems.
protopipe leverages numba-accelerated inner loops, multiprocessing in ctapipe’s Map, and DIRAC-based grid execution, maintaining job memory footprints at a few GB.
7. Future Directions
Current priorities include:
- Complete integration of real and simulated data streams via unified ctapipe I/O.
- Production release of protopipe after universal camera and mode validation.
- Incorporation of deep-learning reconstruction methods (ctlearn, gammalearn) into ctapipe.
- Time- and spatial-dependence support in pyirf’s IRF production, for variable and extended sources.
- Enhanced parallel I/O and HPC compatibility for grid-scale workflows.
These directions reflect the pipeline’s role as a flexible, extensible backbone for both operational analysis and methodological research in CTA event reconstruction (Nöthe et al., 2021).