Clustering Redshifts Technique

Updated 29 November 2025

Clustering redshifts are an observational method that reconstructs redshift distributions by cross-correlating photometric samples with spectroscopic references, bypassing traditional photo-z limitations.
The technique employs angular cross-correlation functions with optimized scale selection to mitigate non-linear biases, achieving sub-percent precision in tomographic redshift calibration.
Robust pipelines integrate simulation-driven mock catalogs, bias corrections, and inversion modelling to support high-precision surveys like Euclid, LSST, and DESI.

Clustering redshifts, often termed "clustering-z" or "clustering-based redshift inference," are an observational method for reconstructing the redshift distributions of extragalactic sources from their spatial cross-correlations with reference samples of known redshift. The approach avoids reliance on photometric redshift estimators, template SED assumptions, or training-set coverage. It rests instead on the well-established principle that only populations overlapping in redshift exhibit a non-zero angular cross-correlation. Clustering-redshift techniques are now critical for calibrating the tomographic bins of cosmological surveys, with demonstrated sub-percent precision and robust performance for next-generation experiments such as Euclid, LSST, and DESI.

1. Theoretical Formalism

Given a photometric ("unknown") sample $p$ with angular positions but unknown redshifts, and a spectroscopic ("reference") sample $s$ with secure redshifts sliced into narrow bins centered at $z_i$, the observable is the angular cross-correlation function,

$w_{ps}(\theta) \equiv \langle \Delta_{p}(\hat{n})\, \Delta_{s}(\hat{n}+\theta) \rangle$

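where $\Delta_{x}(\hat{n}) = (N_{x}(\hat{n}) - \bar{N}_{x}) / \bar{N}_{x}$ is the fractional overdensity of sample $x \in \{p, s\}$, with $N_{x}(\hat{n})$ the number density in direction $\hat{n}$ and $\bar{N}_{x}$ its mean.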

Under the Limber and flat-sky approximations, the cross-correlation can be expressed as

$w_{ps}(\theta) = \int dz\, b_{p}(z) n_{p}(z) b_{s}(z) n_{s}(z) \, \xi_{m}(r_{\perp} = \chi(z)\theta; z)$

where

$n_{p}(z)$ and $n_{s}(z)$ are the redshift distributions,
$b_{p}(z), b_{s}(z)$ are scale-averaged galaxy biases,
$\xi_{m}(r, z)$ is the matter correlation function,
$\chi(z)$ is the comoving distance.

For a spectroscopic slice narrow in redshift, $n_{s}(z) \approx \delta_D(z - z_i)$, leading to

$w_{ps}(\theta; z_i) \approx b_{p}(z_i)\, b_{s}(z_i)\, n_{p}(z_i)\, \xi_{m}(r_{\perp}; z_i)$

and thus

$n_{p}(z_i) \propto \frac{w_{ps}(\theta; z_i)}{b_{p}(z_i) b_{s}(z_i) \xi_{m}(r_{\perp}; z_i)}$

The estimator is typically implemented using the Landy–Szalay formula applied to data–data, data–random, and random–random pairs. To maximize S/N, measurements are integrated over an annulus in projected comoving separation, with weighting $W(r_{\perp}) \propto r_{\perp}^{-1}$ or similar.
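As a rough illustration of the estimator step, the sketch below evaluates a Landy–Szalay-type cross-correlation from pair counts and then collapses it to a single amplitude with $W(r_{\perp}) \propto r_{\perp}^{-1}$ weights. It assumes the pair counts have already been measured and normalised by the number of possible pairs, uses the four-term form commonly adopted for two distinct catalogues, and normalises the weights to unit sum; the function names and toy numbers are illustrative, not taken from any specific pipeline.

```python
import numpy as np

def landy_szalay_cross(d1d2, d1r2, r1d2, r1r2):
    """Landy-Szalay-type cross-correlation per separation bin.

    All inputs are pair counts normalised by the total number of possible
    pairs for that catalogue combination (D = data, R = randoms;
    1 = photometric sample, 2 = spectroscopic slice).
    """
    d1d2, d1r2, r1d2, r1r2 = (np.asarray(a, dtype=float)
                              for a in (d1d2, d1r2, r1d2, r1r2))
    return (d1d2 - d1r2 - r1d2 + r1r2) / r1r2

def annulus_amplitude(r_edges, w, power=-1.0):
    """Average w over the annulus with weights W(r) proportional to r**power."""
    r_edges = np.asarray(r_edges, dtype=float)
    r_mid = np.sqrt(r_edges[:-1] * r_edges[1:])  # geometric bin centres
    weights = r_mid**power * np.diff(r_edges)    # W(r) dr for each bin
    return np.sum(weights * np.asarray(w)) / np.sum(weights)

# Toy input: five logarithmic bins between 1.5 and 10 Mpc (made-up values).
r_edges = np.geomspace(1.5, 10.0, 6)
r_mid = np.sqrt(r_edges[:-1] * r_edges[1:])
w_true = 0.05 * (r_mid / 1.5) ** -0.8            # hypothetical power law
rr = np.full(5, 1.0e-3)                          # flat random-random counts
w_hat = landy_szalay_cross(rr * (1.0 + w_true), rr, rr, rr)
w_bar = annulus_amplitude(r_edges, w_hat)        # one amplitude per slice
```

Repeating this for every spectroscopic slice $z_i$ yields the amplitudes $w_{ps}(z_i)$ that enter the proportionality for $n_{p}(z_i)$.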

2. Calibrated Clustering-Redshift Pipeline

The pipeline consists of several key steps:

Mock Catalog Generation: Simulations such as Flagship2 are used to construct both the photometric sample (e.g., $i_E < 24.5$ for Euclid) and spectroscopic tracers (BOSS-like, DESI-like, Euclid NISP–S) coherently embedded in the same large-scale structure.
Tomographic and Spectroscopic Binning: The photometric sample is split into $N$ photo-z bins (for example, 10 bins with uniform $n$ over $0.2 < z_p < 1.6$). Each bin is cross-correlated with spectroscopic slices of width $\Delta z = 0.05$ in true redshift.
Angular Correlation Measurement and Small-Scale Cuts: Cross-correlations are measured over comoving projected radii, e.g., 0.5–10 Mpc. Scales below 1.5 Mpc are excluded to mitigate non-linear 1-halo contributions, which manifest as deviations in the correlation coefficient $r_{ps}(r_{\perp}; z)$ from unity.
Photometric Sample Bias Measurement (M3 Method): Each photo-z bin is subdivided into broader slices ($\Delta z_p = 0.1$), within which $b_p(z)$ is assumed constant. The photometric auto-correlation is measured and corrected using Limber-integrated predictions for the matter correlation (see Eq. 26 of Doumerg et al., 15 May 2025), and a low-order polynomial is fit to interpolate $b_p(z)$ across the bin. This controls systematic uncertainties in the photometric bias to ≤1% per bin.
Normalization and Inversion: The recovered $n_p(z_i)$ are normalized so that $\int n_p(z)\, dz$ either matches the total sample size or equals unity. Both parametric (shift/stretch) and non-parametric (Gaussian Process with suppression) models are then fit to $n_p(z)$ to quantify its mean and width, with uncertainties.
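The inversion and normalization step can be sketched in a few lines. The example below applies the narrow-slice relation $n_p(z_i) \propto w_{ps} / (b_p b_s \xi_m)$ slice by slice and normalizes the result to unit integral. The bias values, the matter-clustering amplitude, and the helper names are hypothetical; in a real analysis these inputs would come from the bias measurements and a matter-clustering model.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of samples y on grid x."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def invert_nz(z, w_ps, b_p, b_s, xi_m):
    """Return n_p(z_i), normalised to unit integral over the sampled range."""
    raw = np.asarray(w_ps, dtype=float) / (
        np.asarray(b_p) * np.asarray(b_s) * np.asarray(xi_m))
    return raw / trapezoid(raw, z)

# Toy example: slice centres spaced by 0.05, with made-up inputs.
z = np.linspace(0.225, 1.575, 28)
n_true = np.exp(-0.5 * ((z - 0.8) / 0.1) ** 2)   # unnormalised toy n_p(z)
b_p = 1.0 + 0.5 * z                              # hypothetical photometric bias
b_s = np.full_like(z, 1.5)                       # hypothetical reference bias
xi_m = 0.02 / (1.0 + z)                          # hypothetical matter amplitude
w_ps = b_p * b_s * n_true * xi_m                 # forward model, narrow-slice limit
n_rec = invert_nz(z, w_ps, b_p, b_s, xi_m)
```

In this noiseless toy case `n_rec` matches the normalized input distribution exactly; with real data the same division amplifies noise wherever $\xi_m$ or the biases are small, which is one motivation for the model fits mentioned in the last step.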

3. Achieved Statistical and Systematic Precision

On application to realistic survey mocks, the clustering-redshift pipeline achieves:

For $z_p < 1.6$, the mean redshift in each tomographic bin is constrained to

$\sigma(\langle z \rangle) \lesssim 0.002\;(1+z)$

meeting or exceeding the stringent calibration requirements for Stage-IV lensing and BAO analyses.

The fractional uncertainty in the standard deviation of each $n(z)$ is $\sigma(\sigma_z)/\sigma_z < 0.1$.
Systematic biases are dominated by:
- 1-Halo Effects: Below 1.5 Mpc, satellite-central galaxy pairs introduce non-linear bias. Excluding these scales restores $r_{ps} \approx 1$ .
- “m-bin” (Dirac-slice) Approximation: Neglecting neighboring slices introduces an offset of order 0.2–1% in $n_p(z)$. Matrix correction schemes can further reduce this effect.
- Magnification and RSD: Magnification shifts the mean redshift by $|\Delta\langle z \rangle| \lesssim 0.0005\,(1+z)$ for $\Delta z=0.05$; redshift-space distortions (RSD) have a smaller impact in Euclid-like bins.
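The precision figures quoted in this section are statements about the first two moments of the recovered $n(z)$. The following sketch shows one way such a check could be organised: it computes $\langle z \rangle$ and $\sigma_z$ from a sampled distribution and compares the scatter of $\langle z \rangle$ over a set of realisations with $0.002\,(1+z)$. The Gaussian toy distribution, the noise level, and the function names are assumptions made for illustration only.

```python
import numpy as np

def nz_moments(z, n):
    """Mean and standard deviation of a sampled n(z) (trapezoidal rule)."""
    z, n = np.asarray(z, dtype=float), np.asarray(n, dtype=float)
    dz = np.diff(z)
    integ = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * dz)
    norm = integ(n)
    mean = integ(z * n) / norm
    std = np.sqrt(integ((z - mean) ** 2 * n) / norm)
    return mean, std

def check_mean_requirement(mean_samples, tol=0.002):
    """Scatter of <z> over realisations, compared with tol * (1 + <z>)."""
    mean_samples = np.asarray(mean_samples, dtype=float)
    sigma = mean_samples.std(ddof=1)
    return sigma, bool(sigma <= tol * (1.0 + mean_samples.mean()))

# Toy n(z) sampled on slices of width 0.05, plus made-up measurement noise.
z = np.linspace(0.2, 1.6, 29)
n_fid = np.exp(-0.5 * ((z - 0.8) / 0.1) ** 2)
mean_fid, std_fid = nz_moments(z, n_fid)

rng = np.random.default_rng(1)
means = [nz_moments(z, n_fid + rng.normal(0.0, 0.002, z.size))[0]
         for _ in range(200)]
sigma_mean, passes = check_mean_requirement(means)
```

The same two functions can be applied to mock realisations of each tomographic bin; the analogous check for the width would compare the scatter of `std` with $0.1\,\sigma_z$.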

4. Galaxy Bias: Degeneracies, Mitigation, and Perspectives

A central limitation of clustering-redshift techniques is the perfect degeneracy between $n_p(z)$ and the photometric galaxy bias $b_p(z)$: the cross-correlation amplitude scales as $b_p\, b_s\, n_p$, so once $b_s$ is known only the product $b_p(z)\, n_p(z)$ is constrained. In the pipeline:

$b_s(z)$ is measured directly from the spectroscopic sample’s auto-correlation.
$b_p(z)$ is estimated within each sub-bin via clustering auto-correlations, assuming mild redshift evolution. Higher-order corrections, including a 3-bin bias matrix formulation, are under development.

Uncertainty in $b_p(z)$ propagates linearly to the normalized $n_p(z)$ and the mean redshift $\langle z \rangle$, making it the dominant systematic in most regimes. Mitigation relies on fine tomographic slicing, external bias constraints, and direct measurement from the survey itself.
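A small numerical experiment makes the linear propagation concrete. If the true bias varies across a bin as $b_p(z) \propto 1 + \epsilon\,(z - \bar{z})$ but is assumed constant, the estimator returns $n_p(z)\,[1 + \epsilon\,(z - \bar{z})]$, whose mean is offset by $\epsilon\,\sigma_z^2$. The sketch below verifies this for a Gaussian toy bin; the bin parameters and the slope $\epsilon$ are made-up values, not survey numbers.

```python
import numpy as np

def mean_z(z, n):
    """Mean redshift of a sampled n(z) (trapezoidal rule)."""
    dz = np.diff(z)
    integ = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * dz)
    return integ(z * n) / integ(n)

def mean_shift_from_bias_slope(eps, zbar=0.8, sigma_z=0.08):
    """Shift in <z> when a bias slope eps = dln(b_p)/dz is ignored."""
    z = np.linspace(zbar - 6 * sigma_z, zbar + 6 * sigma_z, 1201)
    n_true = np.exp(-0.5 * ((z - zbar) / sigma_z) ** 2)
    b_ratio = 1.0 + eps * (z - zbar)   # true bias divided by the assumed constant
    n_rec = n_true * b_ratio           # estimator output before normalisation
    return mean_z(z, n_rec) - mean_z(z, n_true)

eps = 0.5                              # hypothetical bias slope across the bin
shift = mean_shift_from_bias_slope(eps)
expected = eps * 0.08 ** 2             # analytic result: eps * sigma_z**2
```

For these toy numbers the shift is about 0.0032, comparable to a $0.002\,(1+z)$ budget at $z = 0.8$. Because the shift scales with $\sigma_z^2$, narrower tomographic slices reduce it quickly.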

5. Systematics, Validation, and Best Practices

Robust calibration requires careful treatment of several systematics:

Scale Selection: Exclusion of small, non-linear (1-halo) scales ensures unbiased large-scale clustering.
Spectroscopic Tracer Coverage: Sufficient sky density ($n_s \gtrsim 10^{-5}\ \mathrm{arcmin}^{-2}$ per $\delta z = 0.01$) is required for sub-percent mean-redshift errors over wide areas (Scottez et al., 2017).
Tomographic Slicing: Fine subdivisions (by photo-z or color) reduce bias variation within each bin.
Survey Masks and Systematics: Survey masks, random catalogs, and corrections for angular selection function and completeness are essential for unbiased estimators.
Model Fitting: Both parametric (shift/stretch) and non-parametric (e.g., GP) models should be fit to the measured $n_p(z)$ to avoid negative and noisy solutions and to provide robust mean/width extraction.

End-to-end validation on mocks and comparison against available spectroscopic samples are essential to demonstrate error control at the required level.
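To illustrate the parametric model fitting listed among the practices above, the sketch below fits a shift and a stretch of a fiducial $n(z)$ to measured points by brute-force $\chi^2$ minimisation on a grid. The parametrisation $n(z) = s^{-1}\, n_{\rm fid}\big((z - \bar{z} - \Delta)/s + \bar{z}\big)$ is one common choice and is assumed here rather than taken from the cited works; a production fit would typically use a sampler or optimiser and the full covariance instead of diagonal errors.

```python
import numpy as np

def shift_stretch_model(z, z_fid, n_fid, delta, stretch):
    """Fiducial n(z) shifted by delta and stretched about its mean."""
    dz = np.diff(z_fid)
    integ = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * dz)
    zbar = integ(z_fid * n_fid) / integ(n_fid)
    z_src = (np.asarray(z) - zbar - delta) / stretch + zbar
    return np.interp(z_src, z_fid, n_fid, left=0.0, right=0.0) / stretch

def fit_shift_stretch(z_obs, n_obs, n_err, z_fid, n_fid, deltas, stretches):
    """Brute-force chi-square minimisation over a (delta, stretch) grid."""
    best = (np.inf, None, None)
    for delta in deltas:
        for stretch in stretches:
            model = shift_stretch_model(z_obs, z_fid, n_fid, delta, stretch)
            chi2 = np.sum(((n_obs - model) / n_err) ** 2)
            if chi2 < best[0]:
                best = (chi2, delta, stretch)
    return best[1], best[2], best[0]

# Fiducial distribution (toy Gaussian) and noiseless toy "measurements".
z_fid = np.linspace(0.0, 2.0, 2001)
n_fid = np.exp(-0.5 * ((z_fid - 0.8) / 0.1) ** 2) / (0.1 * np.sqrt(2.0 * np.pi))
z_obs = np.linspace(0.225, 1.575, 28)
deltas = np.linspace(-0.05, 0.05, 21)
stretches = np.linspace(0.8, 1.2, 21)
n_obs = shift_stretch_model(z_obs, z_fid, n_fid, deltas[14], stretches[15])
n_err = np.full_like(z_obs, 0.05)     # made-up diagonal errors
delta_fit, stretch_fit, chi2_min = fit_shift_stretch(
    z_obs, n_obs, n_err, z_fid, n_fid, deltas, stretches)
```

Because the model is non-negative by construction, negative or noisy point estimates pull on the fit without producing unphysical distributions, and the fitted shift and stretch map directly onto the mean and width of the bin.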

6. Practical Applications and Extensions

Euclid, LSST, DESI: Clustering-redshift pipelines have been shown to calibrate tomographic redshift bins to within $\sigma(\langle z \rangle) < 0.002 (1+z)$ (Doumerg et al., 15 May 2025, Naidoo et al., 2022).
Cosmological Constraints: Clustering-z calibrated $n(z)$ distributions feed into weak lensing, galaxy-galaxy lensing, and large-scale structure analyses, restoring much of the constraining power of spectroscopic surveys (Kovetz et al., 2016).
Beyond $z = 1.6$: Extension to higher redshift will rely on QSO and Lyman-break galaxy tracers (e.g., DESI/eBOSS/4MOST–WST, LSST Deep) (Doumerg et al., 15 May 2025).
Joint Approaches: Hybrid inference combining clustering-z, self-calibration (auto/cross within photo-z), and photometric information further improves accuracy and robustness (Zheng et al., 18 Sep 2024, Sánchez et al., 2018).

7. Future Improvements and Research Directions

Advances underway include:

Higher-Order Corrections: Implementing full neighbor-slice (“m-bin”) matrix corrections to eliminate residual binning-induced bias.
Explicit Magnification/RSD Modeling: Measuring magnification slopes and including them in the estimator (e.g., $\alpha = 2.5s_\mu-1$ ); joint modeling with cosmological lensing and redshift-space distortions.
Covariance Modeling: Coherent modeling of full covariance matrices over large angular scales for robust cosmological parameter propagation.
Multi-Tracer Diagnostics: Simultaneous cross-correlation with multiple spectroscopic populations to detect small-scale conformity and tracer-dependent bias effects.
Individual-Galaxy PDFs: Employing clustering-z in color-magnitude–space cells for object-level $p(z)$ inference; hybrid schemes with machine learning and hierarchical Bayesian inference (Morrison et al., 2016, Sánchez et al., 2018).
Deeper and Fainter Samples: Adapting the methodology for surveys with unmatched photometric/spectroscopic depth, including radio, NIR, and dropout-selected samples (Rahman et al., 2015, Rahman et al., 2015).
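For the magnification item in the list above, the coefficient $\alpha = 2.5 s_\mu - 1$ can be estimated directly from a magnitude-limited catalogue. The sketch below assumes that $s_\mu$ denotes the logarithmic slope of the cumulative number counts at the magnitude limit, $s_\mu = d\log_{10} N(<m)/dm$, which is the usual convention but is an assumption about the notation here. The function names, the finite-difference step, and the toy catalogue are illustrative.

```python
import numpy as np

def count_slope(mag, m_lim, dm=0.1):
    """s = dlog10 N(<m)/dm at m_lim, from a centred finite difference."""
    mag = np.asarray(mag, dtype=float)
    n_hi = np.count_nonzero(mag < m_lim + dm)
    n_lo = np.count_nonzero(mag < m_lim - dm)
    return (np.log10(n_hi) - np.log10(n_lo)) / (2.0 * dm)

def magnification_alpha(mag, m_lim, dm=0.1):
    """Magnification coefficient alpha = 2.5 s - 1."""
    return 2.5 * count_slope(mag, m_lim, dm) - 1.0

# Deterministic toy catalogue whose cumulative counts follow 10**(0.4 m).
s_in, m_min, m_max = 0.4, 15.0, 25.0
u = (np.arange(200_000) + 0.5) / 200_000          # evenly spaced quantiles
lo, hi = 10.0 ** (s_in * m_min), 10.0 ** (s_in * m_max)
mag = np.log10(lo + u * (hi - lo)) / s_in         # inverse-CDF sampling
alpha = magnification_alpha(mag, m_lim=24.5)
```

A slope of 0.4 gives $\alpha \approx 0$, the case in which dilution and flux boosting cancel; steeper or shallower counts give a net magnification signal that the estimator would need to model.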

Clustering-redshift calibration, with rigorous bias control and end-to-end simulation validation, forms a cornerstone of high-precision calibration for cosmological large-scale structure and lensing experiments in the coming decade (Doumerg et al., 15 May 2025, Naidoo et al., 2022, Cawthon et al., 2020).