Photometric Redshift Ground Truth Dataset

Updated 8 January 2026

Photometric redshift ground truth datasets are curated collections that merge broadband imaging with high-precision spectroscopic measurements to benchmark redshift estimation methods.
They support diverse methodologies, including machine learning, template fitting, and hybrid approaches, and are crucial for calibrating surveys such as Euclid and LSST.
These datasets incorporate rigorous quality checks, detailed photometric calibration, and reproducible train/validation splits to minimize biases and improve redshift predictions.

Photometric redshift ground truth datasets are critical resources for training, benchmarking, and validating algorithms that estimate galaxy redshifts from imaging and photometry instead of direct spectroscopy. Each dataset provides both broadband photometry (or images) and highly accurate redshift measurements (usually spectroscopic, sometimes ultra-deep template-based photo-z’s) for the same objects, so that empirical machine learning, template fitting, and hybrid approaches can be rigorously evaluated. The systematic construction and documentation of these datasets underpins advances both in cosmological survey science and in the development of robust, scalable redshift estimation pipelines for upcoming facilities such as Euclid and LSST.

1. Key Survey Sources and Dataset Structures

Photometric redshift ground truth datasets originate from large, high-fidelity surveys combining multi-band imaging and spectroscopic follow-up. Central examples include:

SDSS DR12 Galaxy Sample: Used for deep learning benchmarking with about 1 million galaxies (ugriz images and PSF magnitudes, 0 ≤ z_spec < 1), selected from the SDSS DR12 catalog, with spectroscopic redshifts providing the ground truth (Henghes et al., 2021).
HSC–GalaxiesML: 286,401 galaxies from Hyper Suprime-Cam PDR2 cross-matched to ≈15 spec-z catalogs (zCOSMOS, DEEP2, VVDS, 3D-HST, etc.), covering z = 0.01–4 with images (g,r,i,z,y), photometry, precise spectroscopic redshifts, and structural parameters (Do et al., 2024).
TransferZ and Combo Multisource Catalogs: The TransferZ dataset, drawn from COSMOS2020, provides template-fitted photometric redshifts based on up to 35 bands (~0.03 precision) for 116,335 galaxies, while GalaxiesML contributes 286,401 objects with high-precision spec-z. Merging the two ground truth sets gives broad color–magnitude coverage and high-redshift fidelity (Soriano et al., 2024).
Euclid Challenge COSMOS Dataset: Calibration and validation sets comprising ~398k sources with Euclid-like 8-band photometry (g,r,i,z,Y,J,H,VIS), with spec-z and ultra-deep 30-band photo-z’s as ground truth, spanning z≈0–6 and tailored for weak lensing tomographic sample definitions (Euclid Collaboration et al., 2020).
Other notable releases: GOODS-North (deep NIR imaging plus spectroscopic redshifts, 93k catalog) (Hsu et al., 2018), Lockman Hole Deep Field (multiwavelength coverage from UV to MIR, 187k objects, AGN handling) (Fotopoulou et al., 2011), 2dFLenS+KiDS South (50k spec-z, random sampling over 700 deg² to r<19.5) (Wolf et al., 2016), and DEEP2/3+CFHTLS/EGS (9,200 galaxies with ugrizY photometry, matched-aperture corrections, and spec-z/grism-z) (Zhou et al., 2019).

Datasets are typically provided in ML-compatible formats: NumPy arrays for images, CSV/FITS/ASCII for catalog information, and explicit splits for train, validation, and test.

2. Sample Selection and Quality Criteria

Rigorous selection criteria ensure the integrity and applicability of ground truth sets:

Redshift coverage: Ranges from low-z (e.g., SDSS main sample peak at z≈0.1) up to z=4 (HSC–GalaxiesML, TransferZ; Lockman Hole AGN).
Spectroscopic precision: σ_z ∼ 2×10⁻⁴ for spec-z in HSC–GalaxiesML, SDSS DR12, DEEP2/3; template-based photo-z ground truths e.g. Laigle et al. (COSMOS) reach σ ~ 0.01–0.03.
Photometry completeness: Criteria such as non-missing ugriz (SDSS), all five HSC bands present, no NULL or flagged entries. Additional signal-to-noise and seeing constraints (e.g., HSC S/N>5, KiDS r<19.5, Euclid VIS S/N>10).
Quality flags: Outlier rejection via explicit spec-z uncertainty cuts, duplicate removal, and morphological quality checks (HSC flags, specz_flag_homogeneous, etc.). Further flagging for blends, saturated objects, and edge-affected sources (Do et al., 2024), plus X-ray/lensing AGN exclusions (Euclid Collaboration et al., 2020).
Train/test splits: Fixed splits for benchmarking (e.g., SDSS DR12 test set at 59,678 galaxies, HSC–GalaxiesML 60/20/20), clear separation of calibration and blinded validation sets (Euclid), and randomization with reproducibility seeds.
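The selection and splitting steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular release: the catalog rows, the field names (`z_err`, `mags`, `flagged`), the uncertainty threshold, and the seed are all hypothetical.

```python
import random

# Hypothetical catalog rows; field names and values are illustrative only.
catalog = [
    {"id": i, "z_spec": 0.01 * (i + 1), "z_err": 1e-4 * (1 + i % 7),
     "mags": [20.0 + 0.01 * i] * 5, "flagged": i % 10 == 0}
    for i in range(200)
]
# One object with a missing band, to exercise the completeness cut.
catalog[3]["mags"][2] = None


def passes_cuts(row, max_z_err=5.5e-4):
    """Keep rows with complete photometry, no flags, and a small spec-z error."""
    complete = all(m is not None for m in row["mags"])
    return complete and not row["flagged"] and row["z_err"] <= max_z_err


def split_catalog(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle with a fixed seed, then cut into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(fractions[0] * len(rows))
    n_val = int(fractions[1] * len(rows))
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])


clean = [row for row in catalog if passes_cuts(row)]
train, val, test = split_catalog(clean)
```

Because the shuffle uses its own seeded `random.Random` instance, calling `split_catalog(clean)` again returns the same partition, which is what makes a published split reproducible.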

Summary of sample sizes and coverage:

Dataset	N_galaxies	Redshift Range	Photometry Bands
SDSS DR12 (Henghes et al., 2021)	1,055,678	0–1	ugriz (images)
GalaxiesML (Do et al., 2024)	286,401	0.01–4	grizy
TransferZ (Soriano et al., 2024)	116,335	0–4	5–35 bands
Euclid Challenge (Euclid Collaboration et al., 2020)	~400,000	0–2.6 (tomography), 0–6 (full)	g,r,i,z,Y,J,H,VIS

3. Photometric Inputs, Imaging, and Calibration

Datasets provide deep, uniform photometry and/or imaging:

Imaging cubes: SDSS (32×32×5 pixel stamps, native 0.396″ pix), HSC–GalaxiesML (60×60 pix, 0.168″ pix⁻¹, up/down-sampled), LSST-depth examples (DEEP2/3+CFHTLS/EGS at i<25).
Filters: SDSS ugriz, HSC grizy, CFHTLS ugrizY, Euclid challenge eight bands, Lockman Hole Deep (up to 21 bands from FUV to MIR), GOODS-N JHKₛ.
Photometric calibration: PSF homogenization to the worst seeing (Euclid, GOODS-N, Lockman Hole), matched-aperture corrections (EGS/CFHTLS via Moffat fits), zero-point offsets referenced to external standards (PanSTARRS, 2MASS), and galactic extinction corrections (Zhou et al., 2019).
Additional features: Morphology (e.g., Sérsic index, axis ratios, R₁/₂), neighbor flags, segmentation maps, image backgrounds.
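Two of the catalog-level calibration steps named above, zero-point offsets and Galactic extinction corrections, reduce to simple per-band arithmetic on magnitudes. The sketch below assumes the usual form A_band = R_band × E(B−V); the offsets and coefficients are placeholders chosen for illustration and are not taken from any survey release.

```python
# Placeholder per-band values: NOT taken from any survey release.
ZP_OFFSET = {"g": 0.02, "r": -0.01, "i": 0.00}  # mag, added to observed mags
EXT_COEFF = {"g": 3.3, "r": 2.3, "i": 1.7}      # R_band = A_band / E(B-V)


def calibrate_mag(mag_obs, band, ebv):
    """Apply a zero-point offset, then remove Galactic extinction.

    mag_obs : observed magnitude in `band`
    ebv     : Galactic reddening E(B-V) along the line of sight
    """
    a_band = EXT_COEFF[band] * ebv  # extinction in magnitudes
    return mag_obs + ZP_OFFSET[band] - a_band


# Example: a g = 20.0 source behind E(B-V) = 0.1 of Galactic dust.
g_corrected = calibrate_mag(20.0, "g", 0.1)
```

Extinction makes sources appear fainter, so the correction subtracts A_band and the corrected magnitude is brighter (numerically smaller) than the observed one.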

Key imaging and photometry characteristics:

Survey	Image Size	Pixel Scale	Bands	Calibration
SDSS DR12	32×32×5	~0.396″/pix	ugriz (images)	Native PSF, raw pixel values
HSC–GalaxiesML	60×60, 64×64, 127×127	0.168″/pix	grizy	Sky-subtracted, PSF-matched
Euclid/COSMOS	—	0.27″–0.03″/pix	8 bands	PSF-matched to g-band

4. Ground Truth Redshift Measurement and Validation

The “ground truth” label is typically a spectroscopic redshift (spec-z), though ultra-deep template-fitted photo-z’s (e.g. COSMOS 30-band, σ ≈ 0.01) supplement it where spectroscopy is sparse.

Spec-z acquisition: SDSS pipeline, Keck/DEIMOS (DEEP2/3), aggregated from disparate surveys (HSC–GalaxiesML: DEEP2, VVDS, zCOSMOS, etc.), X-ray/AGN catalogs (Lockman Hole, GOODS-N), grism-based (3D-HST).
Precision: Usually σ_z < few×10⁻⁴; photometric ground truths (TransferZ, COSMOS Laigle) typically σ_z,photo ≈ 0.03.
Coverage limitations: Spectroscopic samples are biased toward bright, emission-line galaxies and low-z; faint/red/AGN samples require template photo-z supplementation. Catastrophic outlier fraction and error distribution must be quantified (details below).
Secondary redshift ground truth: Ultra-deep template photo-z’s from 30–35 bands (Laigle et al., COSMOS; TransferZ) plug gaps in color space, though they are significantly less precise than spectroscopy.
Selection priorities: Some datasets explicitly exclude AGN and stars for weak lensing calibration, while others provide specialized handling for AGN via hybrid templates and morpho–spectroscopic flagging (Lockman Hole, GOODS-N).
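The two-tier labeling described above (spec-z first, ultra-deep photo-z as a fallback) can be sketched as a small selection function. The function name, the quality flag, and the returned tuple layout are assumptions for illustration; the nominal uncertainties are the order-of-magnitude values quoted in the text, not per-object errors.

```python
# Nominal label uncertainties, taken as order-of-magnitude values from the text.
SPEC_SIGMA = 2e-4   # typical spec-z precision
PHOTO_SIGMA = 0.03  # typical multi-band template photo-z precision


def choose_ground_truth(spec_z=None, spec_quality_ok=False, photo_z=None):
    """Return (z_true, source, sigma), preferring a secure spec-z.

    Falls back to the template photo-z when no secure spec-z exists,
    and returns None when the object has no usable label at all.
    """
    if spec_z is not None and spec_quality_ok:
        return spec_z, "spec", SPEC_SIGMA
    if photo_z is not None:
        return photo_z, "photo", PHOTO_SIGMA
    return None


label = choose_ground_truth(spec_z=0.5, spec_quality_ok=True, photo_z=0.52)
fallback = choose_ground_truth(spec_z=0.5, spec_quality_ok=False, photo_z=0.52)
```

Keeping the `source` tag and a per-label uncertainty alongside each redshift lets downstream code weight or filter the two label types separately.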

5. Error Metrics, Benchmarking, and Model Validation

Benchmarks use standardized metrics, always referenced to ground-truth redshifts:

Mean Squared Error (MSE): $MSE(z, \hat z) = \frac{1}{n} \sum_{i=1}^n (z_i - \hat z_i)^2$
Mean Absolute Error (MAE): $MAE(z, \hat z) = \frac{1}{n} \sum_{i=1}^n |z_i - \hat z_i|$
Normalized Median Absolute Deviation (NMAD): $\sigma_{NMAD} = 1.48 \times \mathrm{median} \left(\left|\frac{z_{phot} - z_{spec}}{1+z_{spec}}\right|\right)$
Bias: $\langle z_{phot} - z_{spec} \rangle$ or normalized $\langle (z_{phot} - z_{spec})/(1+z_{spec}) \rangle$
Catastrophic Outlier Fraction: Fraction of objects with $|z_{phot} - z_{spec}| > 0.15(1+z_{spec})$ (Euclid/GOODS-N/Lockman Hole); a threshold of $0.10$ is also used (SDSS).
Coefficient of Determination ( $R^2$ ): $R^2 = 1 - \frac{\sum (z_i - \hat z_i)^2}{\sum (z_i - \bar z)^2}$
Precision: $1.48 \times$ the median absolute deviation of the redshift residuals; when the residuals are normalized by $1+z_{spec}$ it serves as an alternative to NMAD.
Probability Distribution Validation (PDZ metrics): Fraction of $P(z)$ enclosed within ±0.05(1+z) and ±0.15(1+z); Probability-Integral-Transform (PIT) histograms; Continuous Ranked Probability Score (CRPS) (Euclid Collaboration et al., 2020).
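The point-estimate metrics above translate directly into NumPy. The following is a minimal sketch that follows the formulas as written in this section; the function name `photoz_metrics` and the dictionary keys are hypothetical, and the 0.15 outlier threshold is just the default, since some benchmarks use 0.10.

```python
import numpy as np


def photoz_metrics(z_spec, z_phot, outlier_threshold=0.15):
    """Compute point-estimate photo-z metrics against ground-truth redshifts."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)

    resid = z_phot - z_spec               # raw residuals
    norm_resid = resid / (1.0 + z_spec)   # residuals scaled by (1 + z_spec)

    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((z_spec - z_spec.mean()) ** 2)

    return {
        "mse": float(np.mean(resid ** 2)),
        "mae": float(np.mean(np.abs(resid))),
        "bias": float(np.mean(resid)),
        "norm_bias": float(np.mean(norm_resid)),
        "nmad": float(1.48 * np.median(np.abs(norm_resid))),
        "outlier_fraction": float(np.mean(np.abs(norm_resid) > outlier_threshold)),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Toy example with one catastrophic outlier (z_spec = 2.0 predicted as 3.0).
metrics = photoz_metrics([0.1, 0.5, 1.0, 2.0], [0.12, 0.48, 1.0, 3.0])
```

In the toy example the single outlier dominates the MSE and bias, while the NMAD stays small because it is built on a median; this is why benchmarks usually report scatter and outlier fraction together.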

Canonical results (for test set performance):

Dataset/Model	Bias	Scatter (σ)	Outlier Fraction	MSE	Comments
Mixed-InceptionCNN (SDSS) (Henghes et al., 2021)	—	—	—	0.009	1M train, test z<1
GalaxiesML CNN (images+photometry) (Do et al., 2024)	$1.0 \times 10^{-4}$	0.0167	3.8%	0.0615	40,914 test
Photometry-only NN (Do et al., 2024)	$1.1 \times 10^{-3}$	0.0315	6.9%	0.1330	40,914 test; ~2× higher scatter
Euclid/COSMOS Challenge (Euclid Collaboration et al., 2020)	—	$\sigma_{NMAD}$ = 0.01–0.04	$\eta$ = 1.7–10%	—	13 codes, z=0.2–2.6
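The PDZ metrics listed in this section apply to a full redshift PDF rather than a point estimate. The sketch below evaluates them for one object, assuming the PDF is tabulated on a uniform redshift grid and approximating integrals by simple Riemann sums; the function name and return layout are hypothetical, and the CRPS is taken in its standard form, the integral of the squared difference between the predicted CDF and a step function at the true redshift.

```python
import numpy as np


def pdz_metrics(z_grid, pdf, z_true, width=0.05):
    """PIT, CRPS, and enclosed probability for one gridded P(z).

    Assumes a uniform grid; integrals are approximated by Riemann sums.
    """
    z_grid = np.asarray(z_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    dz = z_grid[1] - z_grid[0]

    pdf = pdf / (pdf.sum() * dz)   # normalize to unit area
    cdf = np.cumsum(pdf) * dz

    # PIT: predicted CDF evaluated at the true redshift.
    pit = float(np.interp(z_true, z_grid, cdf))

    # CRPS: integral of (CDF - step function at z_true)^2.
    step = (z_grid >= z_true).astype(float)
    crps = float(np.sum((cdf - step) ** 2) * dz)

    # Probability enclosed within +/- width * (1 + z_true) of the truth.
    inside = np.abs(z_grid - z_true) <= width * (1.0 + z_true)
    enclosed = float(np.sum(pdf[inside]) * dz)
    return pit, crps, enclosed


# Toy Gaussian P(z) centred on z = 0.5 with sigma = 0.05.
grid = np.linspace(0.0, 1.0, 1001)
gauss = np.exp(-0.5 * ((grid - 0.5) / 0.05) ** 2)
pit_good, crps_good, enc_good = pdz_metrics(grid, gauss, z_true=0.5)
pit_bad, crps_bad, enc_bad = pdz_metrics(grid, gauss, z_true=0.8)
```

For a well-calibrated set of PDFs the PIT values collected over many objects should be roughly uniform on [0, 1]; a single value near 0 or 1, as in the second call, indicates that the truth lies in the tail of that object's PDF.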

6. Dataset Integration and Generalization Strategies

Recent advances focus on integrating ground truths, combining deep photometric catalogs (broad and faint galaxy coverage) with limited yet high-precision spectroscopic labels:

Transfer Learning and Adapter Methods: Soriano et al. and Do et al. demonstrate that transfer learning (a base network trained on photometric ground truth, then fine-tuned on a spectroscopic sample) can reduce bias by a factor of 5–7 and scatter by a factor of 1.5, with a 1.3–1.5× reduction in catastrophic outliers (Soriano et al., 2024, Seenivasan et al., 1 Jan 2026). LoRA (Low-Rank Adaptation) further speeds up model adaptation and mitigates catastrophic forgetting (Seenivasan et al., 1 Jan 2026).
Mixed or Combo Training Sets: Neural networks trained on merged photometric+spectroscopic ground truth show improved generalization and minimal degradation when evaluated on either domain (Soriano et al., 2024).
Benchmarking Approaches: Random Forest regression, CNNs, and hybrid architectures that use both images and cataloged photometry are standard. Architectures are always tested on fixed, held-out ground-truth sets using the metrics described above.
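The two-stage idea behind transfer learning (pre-train on abundant but less precise labels, then fine-tune on scarce precise ones) can be illustrated with a deliberately simple linear model. This is a toy sketch on synthetic data, not the neural networks used in the cited works: the feature dimension, the weights, the noise levels, and the step count are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for colour features; every number here is made up.
W_SPEC = np.array([0.8, -0.3, 0.5])                # relation traced by spec-z
W_PHOTO = W_SPEC + np.array([0.05, 0.0, -0.05])    # slightly biased photo-z labels


def make_sample(n, weights, label_noise):
    """Draw n feature vectors and noisy redshift-like labels."""
    x = rng.normal(size=(n, 3))
    z = x @ weights + rng.normal(scale=label_noise, size=n)
    return x, z


x_photo, z_photo = make_sample(5000, W_PHOTO, 0.03)  # large, noisier labels
x_spec, z_spec = make_sample(200, W_SPEC, 2e-4)      # small, precise labels
x_test, z_test = make_sample(500, W_SPEC, 2e-4)      # held-out spec-z set


def mse(w, x, z):
    return float(np.mean((x @ w - z) ** 2))


# Stage 1: fit the base model on the photometric ground truth.
w_base, *_ = np.linalg.lstsq(x_photo, z_photo, rcond=None)


def fine_tune(w_init, x, z, n_steps=50):
    """Stage 2: gradient descent on the spec-z sample, starting from w_init."""
    w = w_init.copy()
    # Step size 1/L, where L = 2 * lambda_max(X^T X / n) bounds the curvature
    # of the MSE loss, so every step is guaranteed not to increase the loss.
    lam_max = np.linalg.eigvalsh(x.T @ x / len(z)).max()
    lr = 1.0 / (2.0 * lam_max)
    for _ in range(n_steps):
        grad = 2.0 * x.T @ (x @ w - z) / len(z)
        w -= lr * grad
    return w


w_tuned = fine_tune(w_base, x_spec, z_spec)
mse_base = mse(w_base, x_test, z_test)
mse_tuned = mse(w_tuned, x_test, z_test)
```

Here the base model inherits the small systematic offset built into the synthetic photo-z labels, and fine-tuning on the precise sample removes it; in a real network one would typically fine-tune only some layers or low-rank adapters rather than all weights.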

A plausible implication is that continuous augmentation and careful integration of complementary ground truths will be essential for meeting Stage IV cosmological systematics requirements (|bias| ≲0.003, scatter ≲0.02, outliers ≲10%).

7. Release Formats, Best Practices, and Ongoing Challenges

Photometric redshift ground truth datasets are released in reproducible, machine learning–ready formats:

Data products: FITS/ASCII/CSV tables with photometry, redshifts, flags; NumPy arrays for imaging; HDF5 for image cubes and splits.
Best practices: Use matched-aperture photometry (magnitude corrections per object size/profile), fix train/val/test splits with reproducible seeds, propagate photometric and model uncertainties for robust PDF estimates, perform k-fold or leave-group-out cross-validation for generalizability (Zhou et al., 2019, Do et al., 2024).
Limitations: Spectroscopic ground truths bias samples toward bright, emission-line galaxies; photometric ground truths expand coverage but reduce redshift precision. Coverage at faint magnitudes (i≳24) and high redshift (z>1.5) is often incomplete (Euclid Collaboration et al., 2020, Soriano et al., 2024). AGN and rare populations require tailored hybrid template libraries and flagging.
Data access: Publicly available online (SDSS DR12, HSC–GalaxiesML, Euclid challenge, Lockman Hole catalogs) with detailed column descriptions and update schedules.
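Among the best practices above, leave-group-out cross-validation is straightforward to sketch: each fold holds out every object from one group and trains on the rest. In this illustration the group is the source spectroscopic survey of each object, which is an assumption; the same generator works for any grouping, such as sky field.

```python
from collections import defaultdict


def leave_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for group in sorted(by_group):
        test_idx = by_group[group]
        train_idx = [i for i in range(len(groups)) if groups[i] != group]
        yield group, train_idx, test_idx


# Hypothetical per-object source labels for a merged spec-z catalog.
source_survey = ["DEEP2", "VVDS", "zCOSMOS", "DEEP2", "zCOSMOS", "VVDS", "DEEP2"]
folds = list(leave_group_out(source_survey))
```

Holding out a whole survey at a time tests whether a model generalizes across selection functions, which a purely random split does not probe.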

The field continues to evolve toward harmonized, multi-survey training sets and reproducible evaluation workflows, addressing both astrophysical and computational generalization challenges. These comprehensive ground truth datasets define the current limits and future directions of empirical photometric redshift estimation.