CAMELS Suite: Cosmological Simulations

Updated 3 September 2025

CAMELS Suite is a systematically designed library of over 4,000 simulations that vary key cosmological and baryonic feedback parameters in a controlled numerical environment.
The suite employs simulation codes such as AREPO, GIZMO, and MP-Gadget (for the ASTRID model) to model diverse feedback mechanisms affecting galaxy formation and halo structures.
By integrating machine learning, CAMELS enables rapid emulation of observables and robust parameter inference for precision cosmology and survey design.

The CAMELS Suite of Cosmological Simulations is a systematically designed and publicly available library of thousands of cosmological hydrodynamical and N-body simulations. Its defining objective is to enable robust joint constraints on fundamental cosmological parameters and baryonic feedback processes by varying them simultaneously in a controlled and reproducible numerical environment. CAMELS targets the intersection of galaxy formation theory, machine learning, and precision cosmology, aiming to marginalize over astrophysical uncertainties that can bias cosmological inferences and to provide versatile datasets for the development of ML-based emulators and analysis pipelines.

1. Structure, Scope, and Parameter Space

The suite comprises more than 4,000 cosmological simulations, divided roughly equally between hydrodynamic runs (magneto-hydrodynamic in some models) and dark matter–only (N-body) runs (Villaescusa-Navarro et al., 2020, Villaescusa-Navarro et al., 2022). Each simulation evolves a (25 h⁻¹ Mpc)³ periodic volume to z = 0, tracking the coupled evolution of dark matter, gas, stars, black holes, and, in some models, magnetic fields. Two main simulation codes are employed, AREPO (running the IllustrisTNG model) and GIZMO (running SIMBA), and a third suite based on the ASTRID model was added later, broadening the range of baryonic feedback prescriptions (Ni et al., 2023, Gebhardt et al., 2023).

All core CAMELS runs systematically sample a high-dimensional parameter space, including:

Cosmological parameters: Ωₘ (total matter density), σ₈ (amplitude of matter fluctuations), and, in some suites, Ω_b (baryon density parameter).
Four (or more) baryonic feedback parameters: typically labeled A_SN1 and A_SN2 for stellar feedback (e.g., wind energy or mass loading, and wind velocity) and A_AGN1 and A_AGN2 for AGN feedback (e.g., energy or momentum coupling, and jet speed or burstiness).

The simulations span wide ranges in each parameter, with Ωₘ ∈ [0.1, 0.5], σ₈ ∈ [0.6, 1.0], and feedback parameters generally varied by factors of a few around their fiducial values (Villaescusa-Navarro et al., 2020, Shao et al., 2022). Later expansion sets (e.g., TNG-SB28, which varies 28 parameters) probe an even larger cosmological and astrophysical model space (Ni et al., 2023, Lee et al., 15 Mar 2024).
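
A Latin hypercube, the kind of design behind the suite's LH sets, is one way to spread a fixed number of runs across such a space so that every one-dimensional parameter range is evenly covered. The NumPy-only sketch below assumes the ranges of the original six-parameter suites (Ωₘ and σ₈ sampled linearly; A_SN1, A_AGN1 ∈ [0.25, 4] and A_SN2, A_AGN2 ∈ [0.5, 2] sampled logarithmically). The function name and parameter table are illustrative and not part of any CAMELS tool.

```python
import numpy as np

# Assumed ranges for a six-parameter CAMELS-like design (illustrative only).
# Each entry: (name, low, high, sample_in_log_space)
PARAMS = [
    ("Omega_m", 0.1, 0.5, False),
    ("sigma_8", 0.6, 1.0, False),
    ("A_SN1", 0.25, 4.0, True),
    ("A_AGN1", 0.25, 4.0, True),
    ("A_SN2", 0.5, 2.0, True),
    ("A_AGN2", 0.5, 2.0, True),
]


def latin_hypercube(n_samples, params=PARAMS, seed=0):
    """Return an (n_samples, n_params) Latin hypercube design (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_params = len(params)
    # Unit-cube design: exactly one point in each stratum [i/n, (i+1)/n)
    # per dimension, with the strata shuffled independently per dimension.
    u = np.empty((n_samples, n_params))
    for j in range(n_params):
        strata = rng.permutation(n_samples)
        u[:, j] = (strata + rng.random(n_samples)) / n_samples
    # Map unit samples onto physical ranges, logarithmically where requested.
    out = np.empty_like(u)
    for j, (_, lo, hi, log_scale) in enumerate(params):
        if log_scale:
            out[:, j] = lo * (hi / lo) ** u[:, j]
        else:
            out[:, j] = lo + (hi - lo) * u[:, j]
    return out


design = latin_hypercube(1000)
print(design.shape)  # (1000, 6)
```

Sampling the feedback amplitudes in log space treats a doubling and a halving of the fiducial value symmetrically, which is why those columns use a geometric rather than linear map.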

2. Simulation Products, Catalogues, and Data Accessibility

CAMELS produces a wide variety of data products (Villaescusa-Navarro et al., 2022):

Catalogues: Dark matter halos, subhalos, galaxies (with stellar masses, star formation rates, black hole properties), voids.
Summary statistics: Matter and galaxy power spectra, bispectra, Lyman-α absorption spectra, probability distribution functions for density and temperature, halo radial profiles.
Photon lists: X-ray photon event lists for the intracluster/intrahalo medium.
Mock observations: 2D and 3D field maps (gas, HI, B-fields), synthetic photometry in multiple filters for more than 200 million sources (Lovell et al., 21 Nov 2024), and properties relevant for multi-wavelength and multi-messenger surveys.
Semi-analytic modeling: Integration with the Santa Cruz Semi-Analytic Model (CAMELS-SAM) for rapid exploration of galaxy formation via post-processed N-body runs (Perez et al., 2022).
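
Power spectra, listed among the summary statistics above, are also among the simplest statistics to recompute from raw outputs. As a minimal sketch, assuming the particles have already been deposited onto a periodic N³ grid of density contrast δ, a shell-averaged P(k) can be estimated with NumPy FFTs. Mass-assignment window corrections and shot-noise subtraction, which production estimators include, are omitted, and the function name is hypothetical.

```python
import numpy as np


def power_spectrum(delta, box_size):
    """Shell-averaged P(k) of a periodic overdensity grid (illustrative sketch).

    delta    : (N, N, N) array of density contrast on a periodic grid.
    box_size : side length of the box, e.g. in Mpc/h.
    Returns k (inverse length units) and P(k) (length^3).
    """
    n = delta.shape[0]
    k_fund = 2 * np.pi / box_size
    # Continuous-FT normalisation: delta_k = sum(delta * exp(-ikx)) * cell volume.
    delta_k = np.fft.rfftn(delta) * (box_size / n) ** 3
    power = np.abs(delta_k) ** 2 / box_size**3

    # Integer wavenumbers per axis (rfftn keeps only k_z >= 0).
    i_full = np.rint(np.fft.fftfreq(n) * n)
    i_half = np.arange(n // 2 + 1)
    ix, iy, iz = np.meshgrid(i_full, i_full, i_half, indexing="ij")
    k_int = np.sqrt(ix**2 + iy**2 + iz**2)

    # Unit-width shells in k_fund: shell s holds s - 0.5 <= |k|/k_fund < s + 0.5.
    shell = np.rint(k_int).astype(int).ravel()
    n_shells = n // 2  # stop at the Nyquist wavenumber
    keep = (shell >= 1) & (shell <= n_shells)
    counts = np.bincount(shell[keep], minlength=n_shells + 1)[1:]
    sums = np.bincount(shell[keep], weights=power.ravel()[keep],
                       minlength=n_shells + 1)[1:]
    k = k_fund * np.arange(1, n_shells + 1)
    return k, sums / counts
```

For a white-noise field of unit variance the estimator should return a flat P(k) equal to the cell volume (L/N)³, which makes a convenient sanity check.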

All data (~350 TB, 143,922 snapshots) is publicly available in HDF5, binary, and ASCII formats, with access instructions, detailed documentation, and companion analysis scripts at https://camels.readthedocs.io (Villaescusa-Navarro et al., 2022).

3. Physical Models and Parameter Interdependencies

Three principal hydrodynamical implementations anchor CAMELS:

Suite	Code	AGN Feedback	Stellar Feedback	Notable Property
IllustrisTNG	AREPO	Two-mode: kinetic & thermal	Wind energy, wind velocity	High total feedback energy
SIMBA	GIZMO	Radiative & jet (kinetic)	Mass loading, wind velocity	Large baryon spread via efficient jets
ASTRID	MP-Gadget	Mild, flexible AGN mode	Comparable to TNG	Weakest baryonic impact on clustering

Feedback energetics and couplings are controlled via the parameters listed above (A_SN1, etc.), which modulate the mass loading, wind speeds, jet energies, and burstiness of feedback mechanisms (Shao et al., 2022, Medlock et al., 21 Oct 2024). Notably, feedback effects are highly interdependent: increasing stellar feedback (A_SN1, A_SN2) delays the onset of AGN feedback by suppressing black hole growth, while the AGN feedback efficiency parameters (A_AGN1, A_AGN2) influence CGM properties at low halo masses, particularly in SIMBA. Non-linear and sometimes counter-intuitive parameter interactions are consistently observed, showing that the impact of feedback parameters cannot be captured by simple monotonic trends (Medlock et al., 21 Oct 2024, Gebhardt et al., 2023).

4. Baryonic Effects on Halo Structure, Clustering, and Observables

CAMELS enables detailed quantification of baryonic feedback on various cosmic structures and statistics:

Halo Concentration–Mass Relation (c_vir–M_vir): SN feedback (A_SN1, A_SN2) most strongly impacts c_vir at Milky Way–like masses (10¹¹–10¹² M_⊙/h), while AGN feedback affects group/cluster scales (>10¹³ M_⊙/h). Both cosmological (Ωₘ, σ₈) and feedback parameters control concentration non-linearly, with systematic differences between TNG and SIMBA (Shao et al., 2022).
Matter Power Spectrum and Baryon Spread: AGN feedback is the primary process ejecting baryons to large distances, resulting in significant suppression of the nonlinear matter power spectrum. The “baryon spread” metric, defined as the spatial displacement of baryons from their initial positions relative to dark matter, provides a robust predictor of this suppression. In SIMBA, typically ~40% of baryons are ejected beyond 1 Mpc/h, compared to ~10–15% in TNG and ASTRID (Gebhardt et al., 2023). Symbolic regression captures the scale dependence of the suppression as a function of spread, but its robustness across models remains a challenge.
Circumgalactic Medium (CGM) and X-ray Properties: Implementation details mediate the coupling efficiency between feedback and baryons. For a given feedback energy, SIMBA's bipolar kinetic outflows transport baryons further than TNG's more isotropic, thermal feedback, lowering CGM gas fractions and increasing the closure radius (the radius within which the enclosed baryon fraction reaches the cosmic value), even though TNG injects more cumulative energy (Medlock et al., 21 Oct 2024). X-ray surface brightness and Lₓ–M★ scaling relations are most sensitive to stellar (rather than AGN) feedback at group mass scales, with enhanced feedback required to match eROSITA observations; such tuning, however, tends to suppress star formation below constraints from stellar–halo mass relations, pointing to a tension between X-ray and stellar observables (Lau et al., 5 Dec 2024).
Baryon Effects on FRB and DM Statistics: CAMELS is used to model the cosmic dispersion measure (DM) of fast radio bursts, encapsulating its scatter in the parameter F ≡ σ_DM z^{1/2}, where σ_DM is the fractional scatter of the cosmic DM about its mean at redshift z. Strong feedback (SIMBA) produces a narrow, uniform DM distribution, whereas weak feedback (ASTRID) yields higher variance. However, the small CAMELS volumes limit direct comparison with current wide-field FRB observations (Medlock et al., 4 Mar 2024, Guo et al., 28 Jan 2025).
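
The baryon-spread idea in the Matter Power Spectrum item above can be illustrated with a simplified proxy. The sketch assumes each gas tracer is paired with a dark-matter particle that started at (nearly) the same position, so the spread is the present-day separation of the pair, measured with the minimum-image convention of a periodic 25 Mpc/h box. The pairing, the function names, and the random toy displacements are assumptions for illustration, not the exact estimator of Gebhardt et al. (2023).

```python
import numpy as np


def periodic_separation(a, b, box_size):
    """Minimum-image distance between matched rows of a and b in a periodic box."""
    d = np.abs(a - b)
    d = np.minimum(d, box_size - d)  # wrap separations across the box boundary
    return np.sqrt((d**2).sum(axis=1))


def spread_fraction(baryon_pos, dm_pos, box_size, r_min=1.0):
    """Fraction of baryons farther than r_min from their paired DM particle."""
    sep = periodic_separation(baryon_pos, dm_pos, box_size)
    return np.mean(sep > r_min)


# Toy example: 10,000 pairs in a 25 Mpc/h box with hypothetical Gaussian offsets.
rng = np.random.default_rng(42)
box = 25.0
dm = rng.uniform(0, box, size=(10_000, 3))
offset = rng.normal(scale=0.5, size=(10_000, 3))  # stand-in for feedback-driven displacement
gas = (dm + offset) % box                        # keep positions inside the periodic box
print(f"f(>1 Mpc/h) = {spread_fraction(gas, dm, box):.2f}")
```

Without the minimum-image step, pairs straddling a box face would appear to be separated by almost a full box length and would inflate the spread.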

5. Integration of Machine Learning and Emulator Methodologies

A key innovation of CAMELS is its role as the largest hydrodynamical simulation suite tailored explicitly for training, benchmarking, and validating machine learning emulators (Villaescusa-Navarro et al., 2020, Andrianomena et al., 16 Feb 2024).

Emulation and Parameter Inference: Neural networks, random forests, and principal component analysis (PCA)-based regressors are trained on field maps, power spectra, clustering statistics, and catalogues. These models predict observables (e.g., SFRD, clustering, X-ray scalings) as a function of the 6–28 input parameters with high accuracy and speed, facilitating rapid inference, which is critical for massive survey analysis and for marginalizing over baryonic uncertainties (Lee et al., 15 Mar 2024, Shao et al., 2022, Andrianomena et al., 16 Feb 2024, Delgado et al., 2023).
Data Synthesis and Symbolic Regression: Generative adversarial networks (GANs) generate multifield, multi-channel images consistent with the true statistics and cross-correlations of CAMELS outputs, and symbolic regression yields analytic models relating feedback, baryon displacement, and power suppression (Andrianomena et al., 16 Feb 2024, Gebhardt et al., 2023).
Diffusion Models and Probabilistic Reconstructions: Diffusion generative models reconstruct unbiased dark matter fields from observed stellar density maps, marginalizing over astrophysical uncertainties and generalizing to much larger volumes than seen in training (Ono et al., 15 Mar 2024).
Optimal Transport Analysis: Statistical OT quantifies the shift in the distribution of galaxy properties with parameter changes, revealing that Ωₘ induces the dominant displacement, with secondary effects from SN feedback. However, differences between simulation suites highlight the challenge of translating emulator predictions directly to observed data (Contardo et al., 28 Mar 2025).
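
The PCA-based route in the Emulation item above can be sketched in a few lines: compress the training spectra with PCA, fit a map from parameters to the leading PCA coefficients, and reconstruct spectra at new parameter values. This minimal version assumes a parameter design matrix and a matrix of log-power spectra, substitutes ordinary least squares on quadratic features for the neural networks or random forests used in the cited work, and runs on a synthetic toy model; the class and function names are hypothetical.

```python
import numpy as np


def quadratic_features(theta):
    """Design matrix with constant, linear, and quadratic (incl. cross) terms."""
    theta = np.atleast_2d(np.asarray(theta, dtype=float))
    n, d = theta.shape
    cols = [np.ones(n)] + [theta[:, i] for i in range(d)]
    cols += [theta[:, i] * theta[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


class PCAEmulator:
    """Hypothetical PCA + quadratic-regression emulator (illustrative sketch)."""

    def __init__(self, n_components=5):
        self.n_components = n_components

    def fit(self, theta, spectra):
        # spectra: (n_sims, n_k), e.g. log10 P(k) on a common k grid.
        self.mean = spectra.mean(axis=0)
        resid = spectra - self.mean
        # Rows of vt are the principal components of the training spectra.
        _, _, vt = np.linalg.svd(resid, full_matrices=False)
        self.basis = vt[: self.n_components]  # (n_comp, n_k)
        coeffs = resid @ self.basis.T         # (n_sims, n_comp)
        # Least-squares map from parameters to PCA coefficients.
        self.weights, *_ = np.linalg.lstsq(quadratic_features(theta), coeffs, rcond=None)
        return self

    def predict(self, theta):
        coeffs = quadratic_features(theta) @ self.weights
        return self.mean + coeffs @ self.basis


def toy_log_pk(om, s8, logk):
    # Toy stand-in for simulation output: amplitude ~ sigma_8^2, tilt set by Omega_m.
    return 2 * np.log10(s8) + om * logk - logk**2


rng = np.random.default_rng(1)
logk = np.linspace(-1, 1, 50)
theta = rng.uniform([0.1, 0.6], [0.5, 1.0], size=(200, 2))  # columns: Omega_m, sigma_8
spectra = toy_log_pk(theta[:, [0]], theta[:, [1]], logk)
emu = PCAEmulator(n_components=3).fit(theta, spectra)
pred = emu.predict([0.3, 0.8])  # shape (1, 50)
err = np.abs(pred[0] - toy_log_pk(0.3, 0.8, logk)).max()
print(f"max |error| at a held-out point: {err:.1e}")
```

Regressing a handful of PCA coefficients instead of every k-bin keeps the learned map low-dimensional, which is what makes such emulators cheap to train on a few hundred simulations.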

6. Applications to Cosmological Parameter Inference and Survey Science

CAMELS has enabled multiple lines of research directly impacting cosmological inference:

Parameter Constraints: Neural networks trained on clustering summary statistics from CAMELS-SAM constrain Ωₘ and σ₈ at the 3–8% level even after marginalizing over astrophysical uncertainties (Perez et al., 2022). Photometry and colour distributions from synthetic galaxy catalogues provide constraints on σ₈, owing to the connection between clustering, star formation history, and metallicity (Lovell et al., 21 Nov 2024). In simulation-based inference, Ωₘ can be recovered with ~10% precision from the properties of a single galaxy, a result explained by the strong displacement of galaxy property distributions with increasing Ωₘ. However, this finding is sensitive to differences between simulation suites and may not transfer straightforwardly to observational samples (Contardo et al., 28 Mar 2025).
Emulator-enabled Fisher Forecasts: Gaussian process regression (GPR) emulators (e.g., CARPoolGP) applied to zoom-in suites enable reduced-variance predictions for high-mass halo observables, tightening Fisher matrix constraints on feedback parameters by up to an order of magnitude (Lee et al., 15 Mar 2024, Moser et al., 2022).
Multi-wavelength Mock Observations and Survey Preparation: CAMELS supports the generation of realistic multifield maps for planned surveys (e.g., Euclid, LSST, SKA, eROSITA, CMB-S4), with ML models that can marginalize over baryonic systematics (Andrianomena et al., 16 Feb 2024, Lau et al., 5 Dec 2024). Constraints derived from cross-matching CAMELS predictions for CGM X-ray emission and FRB DMs with observations underscore the complex interplay between feedback implementations and observable statistics.
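
The Fisher-forecast step in the Emulator-enabled Fisher Forecasts item above reduces to a few matrix operations once an emulator supplies the mean observable μ(θ) and a covariance C: F_ij = (∂μ/∂θ_i)ᵀ C⁻¹ (∂μ/∂θ_j), with marginalized 1σ errors given by the square roots of the diagonal of F⁻¹. The sketch assumes a Gaussian likelihood with a parameter-independent covariance and central finite differences, and uses a hypothetical linear toy model in place of an emulator; it does not reproduce CARPoolGP itself.

```python
import numpy as np


def fisher_matrix(mean_fn, theta0, cov, step=1e-3):
    """Gaussian Fisher matrix F_ij = dmu_i^T C^-1 dmu_j via central differences."""
    theta0 = np.asarray(theta0, dtype=float)
    derivs = []
    for i in range(theta0.size):
        h = step * max(abs(theta0[i]), 1.0)
        up, down = theta0.copy(), theta0.copy()
        up[i] += h
        down[i] -= h
        derivs.append((mean_fn(up) - mean_fn(down)) / (2 * h))
    D = np.array(derivs)  # (n_params, n_data)
    return D @ np.linalg.inv(cov) @ D.T


def marginal_errors(fisher):
    """Marginalized 1-sigma errors: sqrt of the diagonal of F^-1."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))


# Hypothetical emulator: four observables, linear in two feedback parameters.
A = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.0, 0.3, 0.6, 0.9]])


def toy_emulator(theta):
    return theta @ A


cov = np.diag([0.1, 0.1, 0.2, 0.2]) ** 2
F = fisher_matrix(toy_emulator, [1.0, 1.0], cov)
print(marginal_errors(F))
```

Because F scales as C⁻¹, a variance-reduced emulator that shrinks the effective noise directly shrinks the forecast errors; halving every standard deviation halves the marginalized errors.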

7. Systematic Uncertainties, Limitations, and Future Prospects

Several challenges are inherent to CAMELS and its physical conclusions:

Volume Effects and Cosmic Variance: The relatively small simulation box size (25 h⁻¹ Mpc)³ limits direct comparison to wide-field/massive structures and can bias predictions for quantities sensitive to large-scale modes (e.g., F in FRB DMs) (Guo et al., 28 Jan 2025).
Subgrid and Model Dependence: Systematic differences between IllustrisTNG, SIMBA, and ASTRID produce divergent predictions for baryon spread, CGM properties, and scaling relations, complicating the translation from simulation-inferred to real-universe cosmological constraints (Gebhardt et al., 2023, Medlock et al., 21 Oct 2024, Busillo et al., 2023).
Emulator Generalization and Domain Shift: ML models often do not generalize well across simulation codes due to subgrid prescription differences, requiring careful domain adaptation or marginalization (Lovell et al., 21 Nov 2024, Contardo et al., 28 Mar 2025).
Tension between Observables: Parameter regimes that align X-ray and tSZ predictions with data may simultaneously break agreement with independent constraints such as the stellar–halo mass relation, revealing the need for models that can satisfy multiple observational diagnostics concurrently (Lau et al., 5 Dec 2024). This suggests that current feedback models may be incomplete or that observational systematics remain.
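
The box-size limitation in the Volume Effects item above can be quantified with one line of arithmetic. A periodic box of side L contains no Fourier modes longer than the box itself, so the smallest non-zero wavenumber it samples is the fundamental mode:

```latex
k_{\mathrm{f}} = \frac{2\pi}{L} = \frac{2\pi}{25\,h^{-1}\,\mathrm{Mpc}} \approx 0.25\,h\,\mathrm{Mpc}^{-1}
```

Fluctuations with k < k_f are therefore absent from every CAMELS realization, and the few modes just above k_f are sampled so sparsely that statistics on those scales carry large cosmic variance.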

Continuing development in the CAMELS ecosystem is focused on expanding box sizes, parameter spaces, and galaxy formation models (e.g., TNG-SB28, ASTRID); integrating more sophisticated (e.g., generative diffusion) ML tools; and linking simulation outputs directly to multi-wavelength observations to calibrate feedback physics more precisely (Lee et al., 15 Mar 2024, Ono et al., 15 Mar 2024, Lovell et al., 21 Nov 2024).

In summary, the CAMELS Suite constitutes a foundational numerical infrastructure for precision cosmology and galaxy formation physics. By offering thousands of simulations that systematically sample cosmological and baryonic physics parameters, and by publicly releasing both raw outputs and derived catalogues, it provides key resources for emulator construction, parameter inference, survey design, and the assessment of baryonic systematics in large-scale structure studies. Its comparative framework across multiple galaxy formation models, coupled with direct interfaces to ML methodologies, makes CAMELS a central benchmark for analyzing, marginalizing over, and interpreting baryonic effects in next-generation cosmological surveys.