ShapePipe: Dual Scientific Pipelines

Updated 6 February 2026

ShapePipe names two separate modular pipelines: one for weak-lensing measurement in astronomy and one for differentiable, surrogate-based engineering shape optimization.
The astronomy pipeline automates modular processing, dependency management, and parallel execution to generate calibrated galaxy shear catalogs for large surveys.
The engineering pipeline uses neural surrogates and gradient-based optimization to tune design parameters, replacing repeated CFD runs with fast surrogate evaluations.

ShapePipe designates two unrelated scientific pipelines in the literature: (1) an open-source, modular astronomy software framework for weak-lensing shape measurement and analysis, and (2) a differentiable surrogate-based pipeline for engineering shape optimization. Both exhibit modular architectures supporting parallelism and reproducibility. However, their scientific aims, technical underpinnings, and fields of application are entirely distinct.

1. Modular Weak-Lensing Measurement and Analysis Pipeline

ShapePipe is an open-source, modular pipeline written in Python for weak-lensing data processing, analysis, and validation, developed for large photometric surveys, notably the UNIONS/CFIS collaboration. It orchestrates all steps required to generate calibrated galaxy shear catalogs from imaging data, with reproducibility and extensibility as core design goals (Farrens et al., 2022, Guinot et al., 2022).

ShapePipe is structured around two sub-packages: a “pipeline” core for configuration, job orchestration, and I/O, and a “modules” package, where each module encapsulates exactly one processing step. The pipeline core exposes a single command-line interface for chaining together arbitrary modules, which are executed sequentially as Python scripts or wrappers for external programs. Dependency management, FITS I/O, and logging are fully automated by the core, and new processing modules can be added without modifying the core codebase. Each module registers its required inputs/outputs, enabling the pipeline to route files transparently.
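The registration-and-routing idea can be illustrated with a minimal sketch. Everything below (`ModuleSpec`, `register`, `run_chain`, and the two toy modules) is hypothetical and written only for illustration; it is not ShapePipe's actual API, which should be taken from the project documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModuleSpec:
    """Hypothetical record of one processing step and its declared I/O."""
    name: str
    inputs: List[str]    # file types the module consumes
    outputs: List[str]   # file types the module produces
    run: Callable[[Dict[str, str]], Dict[str, str]]


# Hypothetical registry; names are illustrative, not ShapePipe's API.
REGISTRY: Dict[str, ModuleSpec] = {}


def register(name, inputs, outputs):
    """Decorator that records a module and its declared inputs/outputs."""
    def wrap(func):
        REGISTRY[name] = ModuleSpec(name, list(inputs), list(outputs), func)
        return func
    return wrap


@register("split_exposure", inputs=["exposure"], outputs=["ccd_image"])
def split_exposure(files):
    # Toy stand-in: tags the file name instead of touching FITS data.
    return {"ccd_image": files["exposure"] + ":ccd"}


@register("detect", inputs=["ccd_image"], outputs=["catalog"])
def detect(files):
    return {"catalog": files["ccd_image"] + ":cat"}


def run_chain(module_names, initial_files):
    """Run modules in order, routing each output to later inputs by type."""
    available = dict(initial_files)
    for name in module_names:
        spec = REGISTRY[name]
        missing = [k for k in spec.inputs if k not in available]
        if missing:
            raise KeyError(f"{name} is missing inputs: {missing}")
        available.update(spec.run({k: available[k] for k in spec.inputs}))
    return available


result = run_chain(["split_exposure", "detect"], {"exposure": "img.fits"})
```

Because each step declares what it needs and produces, adding a new step only requires a new decorated function; the chaining code itself is untouched.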

This modular paradigm allows arbitrary algorithmic variants for each task and facilitates “embarrassingly parallel” processing by dividing the sky into $\sim 0.5^\circ \times 0.5^\circ$ tiles. Joblib provides multi-core execution; mpi4py allows distributed runs on high-performance clusters, achieving near-linear scaling with computational resources (Farrens et al., 2022).
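The tile-level parallelism can be sketched as follows. This is a simplified illustration that uses the standard-library `concurrent.futures` as a stand-in for Joblib; `make_tiles` and `process_tile` are hypothetical helpers, the tiling uses a flat-sky approximation, and the tile identifiers are invented rather than the survey's naming scheme.

```python
from concurrent.futures import ThreadPoolExecutor

TILE_SIZE_DEG = 0.5  # tile side length quoted in the text


def make_tiles(ra_min, ra_max, dec_min, dec_max, size=TILE_SIZE_DEG):
    """Split a rectangular patch into lower-left (ra, dec) tile corners.

    Flat-sky approximation: ignores the cos(dec) shrinkage of RA.
    """
    n_ra = round((ra_max - ra_min) / size)
    n_dec = round((dec_max - dec_min) / size)
    return [(ra_min + i * size, dec_min + j * size)
            for i in range(n_ra) for j in range(n_dec)]


def process_tile(corner):
    """Placeholder for the per-tile module chain; returns an invented ID."""
    ra, dec = corner
    return f"tile_{ra:05.1f}_{dec:+05.1f}"


tiles = make_tiles(150.0, 152.0, 30.0, 31.0)

# Tiles do not depend on one another, so they can be mapped over workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_tile, tiles))
```

Since no tile needs another tile's output, the same map pattern carries over to process pools or to distributed workers.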

A summary of the main modules is provided below.

Module Category	Example Modules	Purpose
Pre-processing	uncompress_fits, split_exp	Prepares FITS inputs, splits exposures by CCD
Masking	mask	Bright star, artifact, and defect masking
Source Detection	sextractor, spread_model, setools	Star/galaxy separation, photometric cataloging
PSF Modeling	psfex, mccd_fit, psfex_interp	Per-CCD or focal-plane PSF reconstruction and interpolation
Shape Measurement	ngmix, galsim	Fitting galaxy shapes, PSF convolution, ellipticity
Shear Calibration	metacalibration (ngmix)	Shear response estimation via image shearing
Validation/Diagnostics	mccd_val, setools, TreeCorr	Residual analysis, null tests, two-point correlation stats
Catalog Generation	make_cat, pastecat, random_cat	Final output shears, random catalogs for clustering analyses

2. Key Algorithms and Processing Steps in Weak Lensing

The pipeline takes as input MegaCam r-band FITS images whose astrometric and photometric calibration has already been performed by upstream survey reduction pipelines (e.g., MegaPipe using Gaia DR2 for astrometry and PS1 for photometric scaling). The first pipeline stages decompress and split multi-extension exposures, build artifact masks using WeightWatcher (with catalog-driven star identification), and assemble the required weight maps (Farrens et al., 2022).

Source extraction uses SExtractor (v2.25.0), followed by the “spread_model” statistic for robust star/galaxy discrimination and setools for catalog-based selection cuts and diagnostics (e.g., size–magnitude diagrams). Each source is modeled using Gaussian mixtures, with postage stamps cut for shape measurement.

Point Spread Function (PSF) models are generated with either PSFEx (per-CCD, pixel basis, low-degree spatial polynomials) or MCCD (principal components spanning global and CCD-level features), then interpolated to arbitrary positions within the focal plane. PSF validation utilizes residual maps, ellipticity and size residuals, and two-point correlation (“ρ-statistics”) to verify systematics are subdominant to the lensing signal (Guinot et al., 2022).
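The ellipticity and size residuals used in PSF validation can be sketched as below. This is a minimal illustration, assuming second moments $(Q_{11}, Q_{22}, Q_{12})$ are already available for each star and for the PSF model at the same position. The function names are hypothetical, and the distortion-type ellipticity is only one of several conventions in use.

```python
def moments_to_shape(q11, q22, q12):
    """Ellipticity (e1, e2) and size T from second moments.

    Uses the convention e = (Q11 - Q22 + 2i Q12) / (Q11 + Q22), T = Q11 + Q22.
    """
    t = q11 + q22
    return (q11 - q22) / t, 2.0 * q12 / t, t


def psf_residuals(star_moments, model_moments):
    """Mean ellipticity and fractional-size residuals, star minus model."""
    de1 = de2 = dt = 0.0
    for star, model in zip(star_moments, model_moments):
        s1, s2, s_t = moments_to_shape(*star)
        m1, m2, m_t = moments_to_shape(*model)
        de1 += s1 - m1
        de2 += s2 - m2
        dt += (s_t - m_t) / s_t
    n = len(star_moments)
    return de1 / n, de2 / n, dt / n


stars = [(2.0, 1.0, 0.0), (1.5, 1.5, 0.3)]
perfect = psf_residuals(stars, stars)  # identical model gives zero residuals
```

The ρ-statistics extend this idea from mean residuals to two-point correlations of the residual fields as a function of angular separation.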

Galaxy shapes are measured using NGMIX, with multi-epoch model fitting and second-moment–based ellipticity calculation. Shear calibration is performed by metacalibration, which constructs the shear response matrix $R_{\alpha\beta} = \partial \langle e_\alpha \rangle / \partial g_\beta$, including both shear and selection-induced response terms, so that ensemble shear estimates can be corrected for bias. ShapePipe computes the $m, c$ bias corrections but does not apply them internally.
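A one-component toy example may help make the response concrete. It is a sketch under strong assumptions: `measure` is a hypothetical estimator whose hidden response factor `r` is invented for illustration, the artificial shear is applied to numbers rather than images, and the selection response is omitted entirely.

```python
def measure(e_int, g, r=0.8):
    """Toy estimator: observed ellipticity responds to shear with a hidden
    factor r (an assumption made only for this illustration)."""
    return r * (e_int + g)


def metacal_shear(e_intrinsic, g_true, dg=0.01):
    """Estimate the response by finite differences and correct the mean."""
    n = len(e_intrinsic)
    e_obs = [measure(e, g_true) for e in e_intrinsic]
    # Re-measure on artificially sheared copies (+dg and -dg).
    e_plus = [measure(e, g_true + dg) for e in e_intrinsic]
    e_minus = [measure(e, g_true - dg) for e in e_intrinsic]
    # Finite-difference response R = d<e>/dg.
    response = (sum(e_plus) - sum(e_minus)) / (2.0 * dg * n)
    g_est = (sum(e_obs) / n) / response
    return response, g_est


# Intrinsic ellipticities chosen in +/- pairs so they average to zero.
e_int = [0.2, -0.2, 0.05, -0.05, 0.1, -0.1]
R, g_hat = metacal_shear(e_int, g_true=0.02)
```

In this toy case the recovered response equals the hidden factor and the corrected mean ellipticity returns the input shear. Real data additionally require the full $2 \times 2$ matrix and the selection term.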

3. Performance, Validation, and Scientific Results in Weak Lensing

ShapePipe was deployed for the first weak-lensing analysis of 1700 deg$^2$ of CFIS r-band imaging, measuring $\sim$40 million galaxy shapes at $n_\mathrm{eff} \simeq 6.8$ arcmin$^{-2}$, a sample well suited to extragalactic lensing analyses (Guinot et al., 2022). Diagnostics confirmed that B-modes, which trace systematic residuals, are consistent with zero (COSEBIs E/B decomposition), and that PSF residual correlations are negligible compared to the lensing signal.

Shear measurements exhibit additive biases $|c| < 5 \times 10^{-4}$ and multiplicative calibration bias $|m| < 10^{-3}$, as validated using image simulations based on deconvolved COSMOS galaxy models with realistic noise and PSF perturbations. Kaiser–Squires mass mapping shows clear E-mode correlations with Planck cluster locations, and stacked tangential shear profiles are detected at $>4\sigma$ significance.
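The biases $m$ and $c$ are conventionally defined through the linear model $g^{\mathrm{obs}} = (1 + m)\, g^{\mathrm{true}} + c$ and estimated by regressing recovered shear against input shear in simulations. The sketch below fits that model by ordinary least squares. The data are synthetic, with bias values invented to lie inside the quoted bounds; it is not the validation code used for the published results.

```python
def fit_shear_bias(g_true, g_meas):
    """Ordinary least-squares fit of g_meas = (1 + m) * g_true + c."""
    n = len(g_true)
    mean_t = sum(g_true) / n
    mean_m = sum(g_meas) / n
    cov = sum((t - mean_t) * (o - mean_m) for t, o in zip(g_true, g_meas))
    var = sum((t - mean_t) ** 2 for t in g_true)
    slope = cov / var
    return slope - 1.0, mean_m - slope * mean_t  # (m, c)


# Synthetic, noise-free inputs with invented biases m = 5e-4, c = 2e-4.
g_in = [-0.03, -0.01, 0.0, 0.01, 0.03]
g_out = [(1 + 5e-4) * g + 2e-4 for g in g_in]
m, c = fit_shear_bias(g_in, g_out)
```

With noisy simulations the same fit applies, but the uncertainty on $m$ and $c$ then sets how many simulated galaxies are needed.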

ShapePipe processes a full tile ($0.25$ deg$^2$) in $\sim$4 hours on 8 cores. Large-area ($\sim 1700$ deg$^2$) processing requires weeks on $\sim$200 VMs, and the final 40M-galaxy catalog occupies $\sim$120 GB. The pipeline scales to larger data volumes by adding hardware resources (Farrens et al., 2022).
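A back-of-envelope calculation shows how these figures relate. It is only a rough sketch: it assumes non-overlapping tiles, one 8-core tile job per VM at a time, and no I/O, scheduling, or reprocessing overhead, none of which is stated in the source. The resulting wall-clock time is therefore an idealized floor, shorter than the weeks reported.

```python
AREA_DEG2 = 1700.0      # survey area from the text
TILE_AREA_DEG2 = 0.25   # tile area from the text
HOURS_PER_TILE = 4.0    # wall-clock hours per tile on 8 cores
CORES_PER_JOB = 8
N_VMS = 200
CATALOG_GB = 120.0
N_GALAXIES = 40e6

n_tiles = AREA_DEG2 / TILE_AREA_DEG2
core_hours = n_tiles * HOURS_PER_TILE * CORES_PER_JOB

# Assumption: every VM runs one 8-core tile job at a time with no overhead.
ideal_days = n_tiles * HOURS_PER_TILE / N_VMS / 24.0

# Decimal units: 1 GB = 1e6 kB.
kb_per_galaxy = CATALOG_GB * 1e6 / N_GALAXIES

print(f"{n_tiles:.0f} tiles, {core_hours:.0f} core-hours")
print(f"idealized wall-clock floor: {ideal_days:.1f} days")
print(f"catalog size: {kb_per_galaxy:.1f} kB per galaxy")
```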

4. Differentiable Pipeline for Surrogate-Based Shape Optimization

ShapePipe is also the name for a differentiable pipeline enabling gradient-based optimization of engineering shapes through modular replacement of non-differentiable CAE (Computer-Aided Engineering) components with differentiable neural surrogates (Rehmann et al., 13 Nov 2025). This approach is motivated by the need for rapid, high-dimensional exploration in design optimization, where conventional mesh generation and CFD solvers (such as OpenFOAM) are non-differentiable.

The pipeline consists of three phases (data generation, surrogate training, and gradient-based optimization) composed of “Tesseract” modules. Initially, a Geometry Tesseract maps design parameters $\mathbf{p} \in \mathbb{R}^4$ (e.g., for the cone example: two radii, length, Euler angle) to a signed distance function (SDF) representation $\phi(\mathbf{x}; \mathbf{p})$ and mesh. Downstream modules invoke Gmsh for CFD mesh creation and OpenFOAM to solve the Reynolds-averaged Navier–Stokes (RANS) equations. Simulation outputs are interpolated to regular grids for training a full-field 3D U-Net surrogate $S_\theta$. The surrogate is trained to minimize

$L_{\mathrm{sur}}(\theta) = \frac{1}{N}\sum_{i=1}^N \| S_\theta(\phi_i) - \mathbf{y}_i \|_2^2 + \lambda R(\theta)$

with $\lambda=0$ for baseline experiments and $\mathbf{y}_i$ the ground-truth velocity fields.
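The loss can be written out directly for flattened fields. The sketch below is a plain-Python illustration, not the training code: predictions and targets are lists of flattened velocity fields, and the regularizer $R(\theta)$ is assumed to be the sum of squared weights, a choice the source does not specify.

```python
def surrogate_loss(predictions, targets, weights=(), lam=0.0):
    """Mean over samples of the squared L2 error, plus lam * R(theta).

    R is taken to be the sum of squared weights (an assumed choice).
    """
    n = len(predictions)
    data_term = sum(
        sum((p - y) ** 2 for p, y in zip(pred, true))
        for pred, true in zip(predictions, targets)
    ) / n
    reg_term = sum(w * w for w in weights)
    return data_term + lam * reg_term


preds = [[1.0, 2.0], [3.0, 4.0]]
truth = [[1.0, 1.0], [1.0, 4.0]]
baseline = surrogate_loss(preds, truth)  # lam = 0, as in the baseline runs
regularized = surrogate_loss(preds, truth, weights=(1.0, 2.0), lam=0.1)
```

Note that the norm is summed over grid points within each sample and averaged only over samples, which matches the $\frac{1}{N}\sum_i \|\cdot\|_2^2$ form above.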

In the optimization phase, the surrogate enables fully differentiable mapping from design parameters to simulation outputs and final objectives (e.g., mean downstream velocity for drag minimization):

$J(\mathbf{p}) = \Theta(\mathbf{p}) = \frac{1}{|\Omega_\mathrm{grid}|} \sum_{\mathbf{x}} U_x(\mathbf{x}; \mathbf{p})$

Gradients for updating $\mathbf{p}$ are automatically propagated via autodiff through SDF evaluation, surrogate inference, and objective computation. Optimization utilizes the Method of Moving Asymptotes (MMA) with projection onto bounded parameter domains.
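The structure of the optimization loop, a gradient step followed by projection onto the parameter box, can be sketched with simpler ingredients. The example below swaps in plain projected gradient descent for MMA and central finite differences for autodiff, and it replaces the surrogate-based objective with a hypothetical quadratic `toy_objective`. The bounds and starting point follow the rounded-cone case study; everything else is invented for illustration.

```python
def project(p, lower, upper):
    """Clip each parameter to its box bounds."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(p, lower, upper)]


def grad_fd(f, p, h=1e-5):
    """Central finite-difference gradient; stands in for autodiff."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad


def projected_gradient_descent(f, p0, lower, upper, lr=0.2, steps=200):
    """Gradient step followed by projection (a simple stand-in for MMA)."""
    p = project(p0, lower, upper)
    for _ in range(steps):
        g = grad_fd(f, p)
        p = project([x - lr * gx for x, gx in zip(p, g)], lower, upper)
    return p


def toy_objective(p):
    """Hypothetical smooth stand-in for J(p). Its unconstrained minimum at
    (0.2, 0.2, 6.0, 0.0) lies outside the box in the first three parameters."""
    target = (0.2, 0.2, 6.0, 0.0)
    return sum((x - t) ** 2 for x, t in zip(p, target))


LOWER = [0.5, 0.5, 2.0, -0.5]
UPPER = [1.5, 1.5, 5.0, 0.5]
p_opt = projected_gradient_descent(
    toy_objective, [1.5, 1.5, 5.0, 0.5], LOWER, UPPER
)
```

The first three parameters end on their bounds while the fourth reaches its interior optimum, which shows why the projection step is needed whenever the unconstrained optimum falls outside the sampled domain.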

5. Case Study: Surrogate-Driven Shape Optimization Workflow

In the rounded-cone case study, parameterized as $\mathbf{p} = [r_a, r_b, L, \theta_z]$, the training set comprises 896 CFD simulations, with parameter values uniformly sampled over $[0.5,1.5]$ for the radii, $[2.0,5.0]$ for the length, and $[-0.5,0.5]$ for the orientation angle. All CFD outputs are resampled onto $80 \times 40 \times 40$ regular grids. The trained U-Net reaches a final validation MSE of around $3 \times 10^{-5}$ (normalized units); ablations indicate minimal effect from architectural variants (e.g., attention gates, mask smoothing).
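The sampling and geometry steps can be sketched as follows. This is a sketch rather than the paper's Geometry Tesseract: `round_cone_sdf` uses one common closed-form signed distance for a rounded cone, the placement of the axis (starting at the origin along $x$) is an arbitrary choice, and the angle is assumed to be in radians, which the source does not state.

```python
import math
import random

BOUNDS = {"r_a": (0.5, 1.5), "r_b": (0.5, 1.5),
          "L": (2.0, 5.0), "theta_z": (-0.5, 0.5)}


def sample_parameters(n, seed=0):
    """Draw n parameter vectors uniformly within the stated ranges."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in BOUNDS.values()]
            for _ in range(n)]


def round_cone_sdf(point, r_a, r_b, length, theta_z):
    """Signed distance to a rounded cone whose axis starts at the origin and
    is rotated by theta_z (radians, assumed) about z from the x-axis.

    Requires |r_a - r_b| < length, which holds for the ranges above.
    """
    x, y, z = point
    c, s = math.cos(theta_z), math.sin(theta_z)
    axial = c * x + s * y             # coordinate along the cone axis
    lateral = -s * x + c * y
    radial = math.hypot(lateral, z)   # distance from the axis
    b = (r_a - r_b) / length
    a = math.sqrt(1.0 - b * b)
    k = -b * radial + a * axial
    if k < 0.0:                       # nearest to the sphere at the base
        return math.hypot(radial, axial) - r_a
    if k > a * length:                # nearest to the sphere at the tip
        return math.hypot(radial, axial - length) - r_b
    return a * radial + b * axial - r_a   # nearest to the conical side


params = sample_parameters(896)
phi_center = round_cone_sdf((0.0, 0.0, 0.0), *params[0])
```

Evaluating `round_cone_sdf` at every node of a regular grid yields the kind of SDF volume a grid-based surrogate would take as input. Negative values mark the interior of the body.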

Gradient-based optimization proceeds from $\mathbf{p}_0 = [1.5, 1.5, 5.0, 0.5]$, converging in under 14 steps to an aligned, drag-minimized configuration. The pipeline performs each surrogate evaluation in $\lesssim 0.1$ s on GPU, compared to several minutes per direct CFD run. Although high-fidelity adjoint CFD was not implemented for this toy case, these results suggest that the surrogate approach can accelerate exploration of design spaces by orders of magnitude within the surrogate’s validity region (Rehmann et al., 13 Nov 2025).

6. Limitations and Extensions

Both incarnations of ShapePipe are subject to domain-specific limitations. For the weak-lensing pipeline, systematics such as residual PSF errors, selection biases, and contamination in crowded fields must be continually assessed as data volumes increase. The modular design mitigates these risks by supporting rapid integration of improved methods (e.g., deep-learning–based deblenders, multi-band photometric redshift modules), with ongoing development to incorporate Gaia-driven star masks and meta-detection engines (Farrens et al., 2022).

In the surrogate-based optimization pipeline, model risk arises from surrogate approximation error. Optimized shapes require validation against high-fidelity simulation/experiment before deployment in real engineering contexts. The up-front cost in generating training data and fitting the surrogate is substantial (hours–days, depending on problem complexity), and generalization is limited to the training domain; extrapolation beyond sampled parameters can fail catastrophically. Potential future work includes multi-fidelity training, richer physics models for turbulent flows, more general shape parametrizations (e.g., B-splines), and replacement of the U-Net surrogate with architectures such as graph neural solvers, Fourier neural operators, or transformers to capture long-range dependencies and mesh-level sensitivities (Rehmann et al., 13 Nov 2025).

7. Distinction Between ShapePipe Systems

The weak-lensing ShapePipe (Farrens et al., 2022, Guinot et al., 2022) and the surrogate-based optimization ShapePipe (Rehmann et al., 13 Nov 2025) are unrelated except by name. The former is oriented to astronomical data analysis, whereas the latter targets engineering design optimization. Each is supported by an independent codebase and community, and cross-domain application is neither reported nor implied in the literature.