Flagship & GAEA Galaxy Mock Catalogues
- Flagship and GAEA galaxy mock catalogues are advanced synthetic datasets that combine gravity-only N-body simulations with semi-analytic models to reproduce realistic galaxy populations.
- They facilitate precision cosmology by enabling end-to-end survey validation, forward modelling of observables, and testing of cosmological models across cosmic time.
- These catalogues integrate detailed methodologies—from light-cone construction to baryonic process modelling—to support multiwavelength surveys such as Euclid and CSST.
Flagship and GAEA galaxy mock catalogues are major computational products designed to facilitate the interpretation, calibration, and exploitation of next-generation extragalactic surveys including Euclid and the China Space Station Telescope (CSST). The Flagship catalogue is a phenomenological, gravity-only mock built primarily for the Euclid mission with the aim of supporting precision cosmological and weak-lensing analyses on the survey scale. The GAEA-based catalogues are physically motivated, semi-analytic mocks constructed atop detailed merger trees from large N-body runs and tuned to model the baryonic physics central to galaxy formation. Together, these catalogues define the state of the art for synthetic galaxy populations in cosmological contexts, enabling rigorous end-to-end validation of survey pipelines, forward-modelling of observables, and testing of theoretical models across cosmic time.
1. Simulation Frameworks: N-body and Merger Trees
The Euclid Flagship mock catalogue is generated from the “Flagship 2” (FS2) N-body run, with dark matter particles within a Mpc periodic box, achieving a particle mass of and gravitational softening of kpc. Cosmological parameters are set to , , , , and . A full-sky lightcone is produced “on the fly” by recording particle crossings of the observer's past light cone, yielding 31 trillion records and 700 TB of data; the analysed octant contains 16 billion haloes up to (Collaboration et al., 2024).
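The particle mass of a gravity-only run follows directly from the box size, the particle count, and the matter density: $m_p = \Omega_m \, \rho_{\rm crit} \, L^3 / N$. The sketch below illustrates that relation only. The function name and the numbers in the example call are hypothetical placeholders, not the FS2 configuration.

```python
# Critical density today in units of h^2 Msun / Mpc^3 (standard value, rounded).
RHO_CRIT = 2.775e11


def particle_mass(box_size, n_per_side, omega_m):
    """Mass per particle in h^-1 Msun.

    box_size   : comoving box side in h^-1 Mpc
    n_per_side : cube root of the total particle number
    omega_m    : total matter density parameter
    """
    total_mass = omega_m * RHO_CRIT * box_size ** 3  # all matter in the box
    return total_mass / n_per_side ** 3              # shared equally among particles


# Illustrative placeholder values (NOT the Flagship 2 numbers).
example_mass = particle_mass(box_size=1000.0, n_per_side=1024, omega_m=0.3)
```

Because the mass scales as $L^3/N$, doubling the box side at fixed particle count makes each particle eight times heavier, which is the basic trade-off between volume and halo-mass resolution.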
GAEA-based catalogues, such as those for CSST, are constructed atop the Jiutian N-body runs. Two simulations are employed: Jiutian-1G (, ) and Jiutian-2G (, ), each with particles and Planck2018 cosmology (, , ). Halos are identified with a friends-of-friends (FOF) algorithm, and subhalos/merger trees are extracted using the HBT+ algorithm, which robustly tracks self-bound remnants across cosmic time and mitigates “overmerging” found in position-space-only approaches (Tan et al., 5 Nov 2025).
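As a rough illustration of the friends-of-friends step, the sketch below links any two particles closer than a chosen linking length and returns the connected groups. It is a minimal $O(N^2)$ toy using union-find, ignores periodic boundaries, and is not the halo finder used for Jiutian. The linking length is a free parameter here (a common convention is 0.2 times the mean interparticle separation, which is an assumption and not stated in the text above).

```python
import math


def friends_of_friends(points, linking_length):
    """Group 3D points; two points are 'friends' if closer than linking_length.

    Returns a list of groups (lists of point indices), largest group first.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pair search; fine for a toy, too slow for real simulations.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < linking_length:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Note that membership is transitive: two particles farther apart than the linking length still share a group if a chain of friends connects them. This is the position-space-only behaviour that can bridge distinct structures and that history-based trackers such as HBT+ are designed to mitigate at the subhalo level.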
2. Galaxy Population Assignment: HOD/Abundance Matching versus Semi-Analytic Models
The Flagship catalogue populates halos with galaxies through a combination of Halo Occupation Distribution (HOD) modelling and abundance matching. Centrals are assigned according to for ; satellites follow with and . The cumulative galaxy function is
Luminosities are assigned via de-scattered cumulative luminosity functions (GOODS/SDSS -band) and double power-law or Schechter-like conditional luminosity functions (CLFs), with explicit scatter for centrals (). Satellite luminosities are determined with a halo-dependent CLF. The approach is calibrated to reproduce low-redshift luminosity functions and two-point clustering, ensuring consistency with observed number densities and spatial correlations (Collaboration et al., 2024).
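The two ingredients of this kind of assignment can be sketched in a few lines: a halo occupation function that gives the expected number of centrals and satellites per halo, and a rank-order abundance match that pairs the brightest luminosities with the most massive haloes. The functional forms below (a step-function central and a power-law satellite term) and the parameter names `m_min`, `m1`, and `alpha` with their default values are illustrative assumptions, not the calibrated Flagship prescription. The match is also scatter-free, unlike the scheme described above.

```python
def mean_occupation(halo_mass, m_min=1e11, m1=2e12, alpha=1.0):
    """Return (<N_cen>, <N_sat>) for one halo.

    Hypothetical form: one central above m_min, satellites as (M / m1)^alpha.
    """
    if halo_mass < m_min:
        return 0.0, 0.0
    return 1.0, (halo_mass / m1) ** alpha


def abundance_match(halo_masses, luminosities):
    """Assign the k-th brightest luminosity to the k-th most massive halo.

    Returns a list of luminosities aligned with the input halo order.
    """
    if len(halo_masses) != len(luminosities):
        raise ValueError("need one luminosity per halo")
    # Halo indices sorted from most to least massive.
    order = sorted(range(len(halo_masses)),
                   key=lambda i: halo_masses[i], reverse=True)
    ranked = sorted(luminosities, reverse=True)
    assigned = [0.0] * len(halo_masses)
    for rank, halo_index in enumerate(order):
        assigned[halo_index] = ranked[rank]
    return assigned
```

Rank ordering preserves the input luminosity distribution exactly, which is why this family of methods reproduces the observed luminosity function by construction. Adding scatter means perturbing the ranks or luminosities, which is the reason a de-scattered luminosity function is needed as the starting point.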
In contrast, the GAEA-based mock employs the GAEA semi-analytic model to predict the internal and observable properties of galaxies from first principles. Physical processes encoded include radiative gas cooling, an -based star formation law (), chemical enrichment with delayed recycling, redshift-dependent stellar feedback, AGN radio/cold accretion, and satellite stripping. The relevant evolution equations include
The differential equations are parameterized and calibrated against local stellar mass functions (Li & White 2009), – scaling, and HI/quenched fractions, with explicit treatment of orphan galaxies after subhalo disruption and merging timescale estimation via dynamical friction (Tan et al., 5 Nov 2025).
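To give a feel for how a semi-analytic model couples such processes, the toy below integrates three mass reservoirs (hot gas, cold gas, stars) with forward-Euler steps: hot gas cools, cold gas forms stars, and feedback reheats cold gas in proportion to the star formation rate. This is a deliberately simplified sketch. The equations, the parameter names (`t_cool`, `t_sf`, `eta`), and the closed-box assumption are all hypothetical and are not the GAEA equations.

```python
def evolve_toy_sam(m_hot, t_cool, t_sf, eta, t_end, dt=0.001):
    """Evolve a closed-box toy galaxy and return (m_hot, m_cold, m_star).

    m_hot  : initial hot gas mass (arbitrary units)
    t_cool : cooling timescale, so cooling rate = m_hot / t_cool
    t_sf   : star formation timescale, so SFR = m_cold / t_sf
    eta    : mass-loading factor, reheating rate = eta * SFR
    """
    m_cold, m_star = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        cooling = m_hot / t_cool      # hot -> cold
        sfr = m_cold / t_sf           # cold -> stars
        reheating = eta * sfr         # cold -> hot (stellar feedback)
        m_hot += (reheating - cooling) * dt
        m_cold += (cooling - sfr - reheating) * dt
        m_star += sfr * dt
    return m_hot, m_cold, m_star
```

The three rates sum to zero, so total mass is conserved at every step. Raising `eta` slows stellar mass growth, which is the qualitative lever that calibration against observed stellar mass functions constrains in real models.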
3. Baryonic and Photometric Properties: SEDs, Dust, Emission Lines
Flagship assigns galaxy photometric and structural observables post hoc using template matching and empirical scaling relations. For each mock galaxy, apparent magnitudes are computed in 30 bands (including Euclid VIS/NISP) from SEDs interpolated among 136 COSMOS templates, with extinction applied via Prevot/Calzetti laws. Structural properties include Sérsic bulge and exponential disk parameters (e.g., ), concentration, bulge fraction, triaxial axis ratios, and color– relationships; SFRs are derived from UV-based conversions (Kennicutt 1998). Emission lines (H$\alpha$, H$\beta$, [OII], [OIII], [NII], [SII], [SIII]) are assigned using SFR and metallicity-based prescriptions, including BPT diagram evolution and scatter. Lensing parameters (convergence, shear, deflection) are assigned from all-sky HEALPix “onion” maps at (Collaboration et al., 2024).
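The Kennicutt (1998) calibrations give a compact example of how SFR links UV luminosity to line emission. The sketch below uses the two standard relations from that review (Salpeter IMF): SFR $= 1.4\times10^{-28}\,L_\nu$ for UV luminosity density in erg s$^{-1}$ Hz$^{-1}$, and SFR $= 7.9\times10^{-42}\,L_{\rm H\alpha}$ for H$\alpha$ luminosity in erg s$^{-1}$. Chaining them to predict an H$\alpha$ flux is an illustrative assumption. The Flagship prescriptions additionally involve dust, metallicity, and scatter, none of which is modelled here, and the function names are hypothetical.

```python
import math

MPC_IN_CM = 3.0857e24  # one megaparsec in centimetres


def sfr_from_uv(l_nu):
    """SFR in Msun/yr from UV luminosity density in erg/s/Hz (Kennicutt 1998)."""
    return 1.4e-28 * l_nu


def halpha_luminosity(sfr):
    """Intrinsic H-alpha luminosity in erg/s for a given SFR (Kennicutt 1998, inverted)."""
    return sfr / 7.9e-42


def line_flux(luminosity, d_lum_mpc):
    """Observed flux in erg/s/cm^2 for a luminosity distance given in Mpc."""
    d_cm = d_lum_mpc * MPC_IN_CM
    return luminosity / (4.0 * math.pi * d_cm ** 2)
```

For instance, `line_flux(halpha_luminosity(sfr_from_uv(l_nu)), d_lum_mpc)` turns a template UV luminosity into a dust-free H$\alpha$ flux, which can then be compared against a spectroscopic flux limit.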
The GAEA catalogue computes SEDs and magnitudes using StarDuster, a neural-network-based module trained on radiative transfer (SKIRT) simulations to model dust attenuation as a function of galaxy geometry, mass, and inclination. Key variables include dust mass and dust optical depths , , with “birth-cloud” attenuation for stellar populations Myr old. The attenuated luminosity is expressed as
The model supports full broadband SED prediction (UV through far-IR), including spatially resolved dust geometry effects. The GAEA mocks do not provide emission line predictions by default; these require further post-processing (Tan et al., 5 Nov 2025).
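A common way to write a two-component attenuation of this kind is to apply a diffuse-ISM optical depth to all stars and an additional birth-cloud optical depth only to populations younger than a threshold age. The sketch below implements that simple screen form. It is an assumed, simplified stand-in and not the StarDuster network, and the names `tau_ism`, `tau_bc`, and the 10 Myr default for `t_bc_myr` are hypothetical choices.

```python
import math


def attenuated_luminosity(populations, tau_ism, tau_bc, t_bc_myr=10.0):
    """Sum attenuated luminosities over stellar populations at one wavelength.

    populations : iterable of (age_myr, intrinsic_luminosity) pairs
    tau_ism     : diffuse ISM optical depth, applied to every population
    tau_bc      : extra birth-cloud optical depth for ages below t_bc_myr
    """
    total = 0.0
    for age_myr, luminosity in populations:
        tau = tau_ism + (tau_bc if age_myr < t_bc_myr else 0.0)
        total += luminosity * math.exp(-tau)
    return total


def attenuation_mag(tau):
    """Attenuation in magnitudes for optical depth tau: A = 2.5 log10(e) * tau."""
    return 2.5 * math.log10(math.e) * tau
```

Since $A \approx 1.086\,\tau$, an optical depth near unity corresponds to roughly one magnitude of dimming in the band considered.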
4. Light-Cone Construction and Survey Selection
In Flagship, galaxies are placed within the observer’s past lightcone over one octant to using dark matter particle outputs as they cross the lightcone surface. Halo identification is performed with ROCKSTAR on overlapping “bricks,” followed by galaxy assignment and full-sky projection. The magnitude-limited sample, , includes 3.4 billion galaxies, with completeness to (haloes down to a 10-particle limit), exceeding Euclid Wide Survey depth (, 10 point-source) (Collaboration et al., 2024).
The GAEA-based lightcone is constructed via the Blic code, which aligns simulation volumes along the line of sight and interpolates positions, velocities, stellar masses, and magnitudes between 128 output snapshots using cubic or linear schemes. The catalogue supports CSST survey footprints: deep ( deg$^2$), ultra-deep ($400$ deg$^2$), and variable band selection. Catalogue completeness in the deep survey (seven bands) reaches 90% cumulative at (extended to in ultra-deep); selection in alone raises to 2.1 (Tan et al., 5 Nov 2025).
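The core of snapshot-based lightcone building is finding where a galaxy's interpolated trajectory meets the shrinking lightcone radius between two outputs. The sketch below does this with linear interpolation and bisection. It assumes observer-centred comoving positions and a lightcone radius that varies linearly between the two snapshots, and it is an illustrative simplification, not the Blic implementation. All names are hypothetical.

```python
import math


def lightcone_crossing(pos_a, pos_b, chi_a, chi_b, tol=1e-10):
    """Find where a galaxy crosses the lightcone between two snapshots.

    pos_a, pos_b : positions at the earlier and later snapshot
    chi_a, chi_b : comoving lightcone radii at those snapshots
    Returns (fraction, position) at the crossing, or None if there is none.
    """
    def offset(f):
        # Signed distance from the lightcone surface at interpolation fraction f.
        pos = [a + f * (b - a) for a, b in zip(pos_a, pos_b)]
        chi = chi_a + f * (chi_b - chi_a)
        return math.hypot(*pos) - chi, pos

    g_start, _ = offset(0.0)
    g_end, _ = offset(1.0)
    if g_start * g_end > 0:
        return None  # same side of the surface at both snapshots

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid, _ = offset(mid)
        if g_start * g_mid <= 0:
            hi = mid  # sign change lies in the lower half
        else:
            lo = mid
    fraction = 0.5 * (lo + hi)
    return fraction, offset(fraction)[1]
```

The returned fraction can be reused to interpolate any other property (velocity, stellar mass, magnitude) to the crossing epoch, so that every quantity in the catalogue row refers to the same instant.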
5. Validation, Observational Agreement, and Cosmological Consistency
Validation in the Flagship mock is extensive, encompassing weak lensing, clustering, and internal galaxy statistics. The convergence power spectrum matches Halofit and the Euclid Emulator2 to better than 5% for ; shear 2-point and 3-point statistics agree with analytic predictions to 10%. Redshift-space multipoles and real-space 2PCFs are reproduced within $1\sigma$, enabling robust extraction of linear bias and Alcock–Paczynski parameters. Galaxy occupation and distribution in clusters reproduce HOD expectations and NFW profile stacking, while cluster luminosity functions and color–magnitude diagrams recover observed bimodality and faint-end slopes. Comparisons with the GAEA model (e.g., cluster velocity dispersions) show agreement at the 10% level for relevant observables (Collaboration et al., 2024).
GAEA mocks are validated against stellar mass functions (Li & White 2009, GAMA) at and , luminosity functions (SDSS ugriz), gas fractions (), half-mass size–mass relations (Shen et al. 2003), and projected 2PCFs versus SDSS. These are reproduced within calibration and systematic uncertainties, except for moderate underprediction at high stellar mass and (attributable to AGN feedback tuning). The mock’s photometric redshift distributions, SED dimming (up to 1 mag in -band from dust), and clustering amplitude are demonstrated to be convergent across simulation resolutions (Tan et al., 5 Nov 2025).
6. Data Accessibility, Applications, and Comparative Properties
The Euclid Flagship catalogue is publicly hosted on CosmoHub in Apache Parquet format, with 5.9 TB in total and 2.5 TB for the magnitude-limited galaxy sample. An SQL/Hive metadata interface, Python/ROOT/Parquet readers, and sample Jupyter notebooks are provided. The dataset supports precomputed covariance estimation (100 internal jackknife patches) and can be integrated with the Euclid Science Ground Segment pipeline for photometric redshift, shape measurement, and slitless spectroscopy validation. Wider applications include cosmological forecasts for DESI, LSST, Roman, and end-to-end survey systematics studies (Collaboration et al., 2024).
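Jackknife patches are typically turned into a covariance matrix with the delete-one estimator, $C_{ij} = \frac{N-1}{N}\sum_k (x^{(k)}_i - \bar{x}_i)(x^{(k)}_j - \bar{x}_j)$, where $x^{(k)}$ is the statistic measured with patch $k$ removed. The sketch below implements that standard estimator in plain Python. How the leave-one-out measurements are produced from the catalogue is outside its scope, and the function name is hypothetical.

```python
def jackknife_covariance(samples):
    """Delete-one jackknife covariance matrix.

    samples : list of N data vectors, where vector k is the statistic
              measured with patch k left out (all of equal length).
    Returns the covariance as a nested list of shape (dim, dim).
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    factor = (n - 1) / n  # jackknife prefactor; larger than the 1/(n-1) of a sample covariance
    return [
        [
            factor * sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
```

The $(N-1)/N$ prefactor compensates for the strong correlation between leave-one-out samples, each of which shares all but one patch with every other sample.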
The GAEA-based output is oriented toward forward-modelling galaxy evolution, calibration of CSST photometric and SFR indicators, and multiwavelength studies. It is particularly suited to studies requiring physically consistent SEDs, dust, gas, and sizes, enabling joint analyses with Flagship for lensing/clustering covariance. However, emission line predictions and observational systematics (e.g., blending, detection incompleteness) must be appended in downstream processing. Future extensions include high-redshift AGN feedback improvements, line emission modelling, and photometric error pipelines (Tan et al., 5 Nov 2025).
A summary of key differences appears in the table below:
| Catalogue | Simulation Volume / Resolution | Galaxy Assignment | Main Strengths |
|---|---|---|---|
| Flagship (FS2) | Mpc, | HOD + abundance matching | Lensing, clustering, completeness, lightcone fidelity |
| GAEA (Jiutian) | $1000$–, – | Semi-analytic (baryonic) | SED realism, physical gas/stars, merger history |
7. Scientific and Methodological Implications
The Flagship and GAEA catalogues represent complementary paradigms in large-scale galaxy mock construction. The Flagship approach—phenomenological, optimized for cosmological signal extraction—yields unparalleled volume, statistical power, and self-consistent lensing properties. The GAEA semi-analytic machinery enables explicit connection between observable quantities and underlying baryonic processes, at some cost in direct cosmological volume and resolution.
This contrast enables targeted usage: Flagship for precision cosmology, survey systematics, and end-to-end mock pipelines; GAEA for galaxy evolution, quenching pathways, scaling relation studies, and multiwavelength observables. Joint analyses, in which GAEA’s physical realism is used to interpret or reweight Flagship mocks, represent a promising avenue for fully leveraging multi-survey data. Future work will likely merge these approaches further through the inclusion of improved baryonic physics in large-volume, lightcone-enabled N-body backgrounds, and more sophisticated treatment of hydrodynamical and AGN feedback effects (Collaboration et al., 2024, Tan et al., 5 Nov 2025).