Halo and Sub-halo Abundance Matching

Updated 16 October 2025

Halo and sub-halo abundance matching is an empirical technique linking observed galaxy properties (e.g., luminosity) to simulated dark matter halo characteristics through monotonic mapping.
It calibrates key parameters using galaxy clustering and photometric selection, highlighting differences between bright optical and HI-selected samples.
The method refines synthetic galaxy catalogs and deepens our understanding of the galaxy–halo connection by incorporating scatter and secondary halo properties.

Halo and sub-halo abundance matching (SHAM) is a class of empirical techniques designed to link the galaxy and dark matter halo populations by exploiting monotonic relationships between observable galaxy properties—such as stellar mass or luminosity—and simulated halo or subhalo properties, typically derived from cosmological N-body or hydrodynamical simulations. The approach is foundational for producing mock galaxy catalogues, interpreting large-scale structure surveys, and constraining models of galaxy formation. The implementation of SHAM and its variants, as well as its calibration and validation, are closely tied to definitions of halo properties, treatment of satellite galaxies, adopted photometric and sample selection criteria, and clustering calibration methods.

1. Formalism and SHAM Parameterization

SHAM operates under the principle that there exists a monotonic mapping between a galaxy property, 𝔤 (often luminosity or stellar mass), and a halo property, ℎ (commonly a function of present-day virial mass M₀, peak mass Mₚₑₐₖ, or circular velocity Vₚₑₐₖ). The classical abundance matching process matches the cumulative number densities of the galaxy and halo populations:

$N_{\text{gal}}(>\mathfrak{g}) = N_{\text{halo}}(> h)$
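The matching of cumulative number densities can be illustrated with a minimal rank-ordering sketch. It assumes that galaxies and halos are drawn from the same volume, so that equal rank implies equal cumulative number density; the function name `abundance_match` and the toy numbers are illustrative and do not come from the source.

```python
def abundance_match(galaxy_props, halo_props):
    """Assign galaxy properties to halos by rank order.

    The k-th largest halo proxy receives the k-th largest galaxy property.
    Halos left over once the galaxies run out stay unoccupied (None).
    Assumes both lists describe the same comoving volume.
    """
    if len(galaxy_props) > len(halo_props):
        raise ValueError("need at least as many halos as galaxies")
    gal_sorted = sorted(galaxy_props, reverse=True)
    # Halo indices ordered from the highest proxy value to the lowest.
    halo_order = sorted(range(len(halo_props)),
                        key=lambda i: halo_props[i], reverse=True)
    assigned = [None] * len(halo_props)
    for rank, halo_idx in enumerate(halo_order[:len(gal_sorted)]):
        assigned[halo_idx] = gal_sorted[rank]
    return assigned


# Toy inputs (hypothetical values): halo masses and galaxy luminosities.
halo_mass = [1e12, 5e13, 3e11, 8e12]
luminosity = [2.0e10, 9.0e10, 4.0e10]
matched = abundance_match(luminosity, halo_mass)
```

Here the most massive halo receives the brightest galaxy, and the least massive halo is left empty because there are fewer galaxies than halos.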

The paper introduces a flexible proxy for halo ranking that interpolates between present virial mass and peak historical mass:

$m_\alpha = M_0 \left( \frac{M_\text{peak}}{M_0} \right)^{\alpha}$

where α tunes the relative weight: α = 0 corresponds to M₀-only ranking, α = 1 to Mₚₑₐₖ-only ranking. A Gaussian scatter, σ_AM, in the log galaxy–halo relation is modeled to account for stochasticity and physical processes not captured in a purely monotonic mapping. The final assignment is performed such that the total number density is preserved, while scatter introduces up- and down-scattering in the rank order.
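A short sketch of the proxy and of one simple way to introduce scatter follows. The proxy function is a direct transcription of the formula above. The scatter step is an assumption made for illustration: it perturbs the log of the proxy with Gaussian noise of width σ_AM and re-ranks, which keeps the number density fixed while shuffling the rank order. The source does not specify its exact procedure, and the function names are hypothetical.

```python
import math
import random


def halo_proxy(m0, m_peak, alpha):
    """m_alpha = M0 * (M_peak / M0)**alpha.

    alpha = 0 recovers M0, alpha = 1 recovers M_peak.
    """
    return m0 * (m_peak / m0) ** alpha


def scattered_ranking(proxies, sigma_am, rng):
    """Return halo indices ordered from highest to lowest noisy proxy.

    Assumption: scatter is applied as Gaussian noise (in dex) on log10(proxy)
    before ranking. The set of halos is unchanged, only their order moves.
    """
    noisy = [math.log10(p) + rng.gauss(0.0, sigma_am) for p in proxies]
    return sorted(range(len(proxies)), key=lambda i: noisy[i], reverse=True)


# Toy example (hypothetical values): a subhalo that has lost mass since its peak.
m_half = halo_proxy(1e12, 4e12, 0.5)  # geometric mean of M0 and M_peak
order = scattered_ranking([1e12, 5e13, 3e11], 0.25, random.Random(42))
```

With `sigma_am = 0` the ranking reduces to the purely monotonic case; larger values let lower-proxy halos overtake higher-proxy ones, which is the up- and down-scattering described above.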

This formalism directly links the observed luminosity or stellar mass function—with its dependence on photometric choice—to the distribution of (sub)halo properties obtained from simulations.

2. Dependence on Photometric Pipeline and Selection Method

The SHAM results and optimal parameters are sensitive to galaxy property measurement and sample selection:

Photometric Definition: Sérsic magnitudes, derived from fits to a Sérsic profile, systematically include more low-surface-brightness light than Petrosian magnitudes, which truncate at the Petrosian radius. Catalogs derived from Sérsic photometry (e.g., NSA) tend to yield more galaxies at the high-luminosity end and fewer at the faint end, altering the shape of the luminosity and stellar mass functions compared to Petrosian-based catalogs (e.g., NYU). Because SHAM preserves cumulative number density, these differences propagate directly into the galaxy–halo assignments.
Sample Selection: Optical samples (from SDSS) are usually r-band selected and display strong large-scale clustering. HI-selected samples (e.g., from ALFALFA and matched to NSA) are dominated by gas-rich, blue galaxies and exhibit substantially weaker clustering. SHAM applied to HI samples requires distinct parameter choices, with fits indicating the need for much larger scatter and often a negative α.

This dependence underscores the necessity of aligning the SHAM parameterization and proxies to the specific photometric and selection attributes of the targeted galaxy sample.

3. Calibration and Inference via Galaxy Clustering

SHAM parameters are calibrated by fitting to observed clustering statistics, specifically the projected two-point galaxy correlation function, $w_p(r_p)$. The measured correlation function is computed as:

$w_p(r_p) = 2 \int_0^{\pi_\text{max}} \xi(r_p, \pi) d\pi$

where $\xi$ is estimated using the Landy–Szalay estimator:

$\xi(r_p, \pi) = \frac{DD - 2 DR + RR}{RR}$
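The two formulas above can be sketched numerically as follows. The sketch assumes that DD, DR, and RR are pair counts already normalized by the total number of pairs of each type, and that the line-of-sight integral is approximated by a sum over equal-width π bins from 0 to π_max. The function names and input numbers are illustrative only.

```python
def landy_szalay(dd, dr, rr):
    """Landy-Szalay estimate of xi in a single (r_p, pi) bin.

    dd, dr, rr: normalized data-data, data-random, random-random pair counts.
    """
    if rr == 0:
        raise ValueError("RR must be non-zero in every bin")
    return (dd - 2.0 * dr + rr) / rr


def projected_wp(xi_along_pi, d_pi):
    """Approximate w_p(r_p) = 2 * integral_0^pi_max xi(r_p, pi) d(pi).

    xi_along_pi: xi values in equal-width pi bins at fixed r_p.
    d_pi: width of each pi bin (same length units as w_p).
    """
    return 2.0 * sum(xi_along_pi) * d_pi


# Toy numbers (hypothetical): one bin of pair counts, then three pi bins.
xi_bin = landy_szalay(3.0, 1.5, 1.0)
wp = projected_wp([1.0, 0.5, 0.25], 10.0)
```

In a real measurement the pair counts come from the galaxy catalog and a random catalog with the same selection function; only the final arithmetic is shown here.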

Likelihoods are constructed to compare model and observed $w_p(r_p)$ in distinct bins, adjusting α and σ_AM (and potentially a pre-selection redshift cutoff, $z_\text{cut}$) to determine best-fit values. Results show that a universal set of SHAM parameters is not attainable; bright optical subsamples are best fit with low scatter and α near unity, whereas faint optical and HI samples require higher scatter and, for HI, negative or near-zero α.
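As a rough illustration of such a fit, the sketch below evaluates a Gaussian log-likelihood over a small grid in (α, σ_AM). Several things here are assumptions rather than the source's method: the errors are treated as independent per bin (no covariance matrix), the search is a plain grid rather than a sampler, and `toy_model` is a placeholder standing in for the expensive step of building a SHAM mock and measuring its $w_p(r_p)$.

```python
import itertools


def log_likelihood(wp_obs, wp_model, wp_err):
    """Gaussian log-likelihood (up to a constant), independent bins assumed."""
    return -0.5 * sum(((o - m) / e) ** 2
                      for o, m, e in zip(wp_obs, wp_model, wp_err))


def grid_fit(wp_obs, wp_err, model, alphas, sigmas):
    """Return the (alpha, sigma_am) grid point with the highest likelihood."""
    return max(itertools.product(alphas, sigmas),
               key=lambda p: log_likelihood(wp_obs, model(*p), wp_err))


def toy_model(alpha, sigma_am):
    """Placeholder model, NOT physical: returns w_p in two r_p bins.

    A real pipeline would populate a simulation with SHAM at these
    parameters and measure the projected correlation function.
    """
    return [100.0 * (1.0 + alpha) - 50.0 * sigma_am,
            10.0 * (1.0 + alpha) - 20.0 * sigma_am]


# Fake "observation" generated from the toy model at known parameters.
wp_obs = toy_model(1.0, 0.2)
best = grid_fit(wp_obs, [5.0, 1.0], toy_model,
                alphas=[0.0, 0.5, 1.0], sigmas=[0.2, 0.5])
```

The grid search recovers the input parameters by construction; with real data, the same likelihood would typically be passed to a sampler so that posteriors and evidences can be compared between samples.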

Bayesian evidence and Bayes factors reveal significant tension between best fits for different galaxy types and sample selections, indicating that the galaxy–halo connection is modulated by intrinsic and environmental properties tied to both galaxy photometry and selection.

4. Fitted Parameters Across Sample and Photometry Types

The extracted SHAM parameters display pronounced variations:

Sample Type	Optimal α	Scatter (σ_AM)
Bright SDSS Optical	1.1 – 1.25	0.21 – 0.27 dex
Faint SDSS Optical	0.5 – 1.0*	up to ~0.5 dex*
HI-selected (ALFALFA)	< 0	up to ~2 dex

*Banana-shaped posteriors for faint optical samples reflect poor constraint and tension with bright-sample fits.

For HI samples, the introduction of $z_\text{cut}$, a cutoff in the formation redshift of subhalos, lowers the required scatter ($z_\text{cut} \sim 0.22$, $\sigma_\text{AM} \sim 0.42$ dex), suggesting that physically motivated filtering in formation time can partially alleviate mismatches but cannot fully reconcile HI and optical selections.

The optimal photometric pipeline in clustering terms is the NYU Petrosian catalog; Sérsic-based results are less consistent across the dynamic range.

5. Goodness-of-Fit, Scatter, and Physical Interpretation

Fitted SHAM models reproduce $w_p(r_p)$ for bright optical samples with high fidelity and relatively low scatter ($\sim$0.2–0.3 dex). For fainter galaxies, a larger scatter is required, indicating increased stochasticity or additional degrees of freedom influencing halo occupation. HI-selected galaxies exhibit even larger scatter, effectively randomizing the galaxy–halo assignment for the most gas-rich systems.

This breakdown in the tightness of the monotonic mapping at low luminosity and for HI-selected samples signals a fundamental limit of mass-only SHAM. In these regimes, other halo properties (formation time, spin, environment) or baryonic processes play substantial roles, warranting incorporation of secondary parameters in the SHAM framework.

6. Mutual Exclusivity and Its Implications

The SHAM parameter regions derived for optically-selected and HI-selected galaxies are essentially non-overlapping: optical samples demand positive α near unity with modest scatter, while HI samples require negative α and large scatter. This mutual exclusivity implies that the galaxy–halo connection is controlled by fundamentally different physics in gas-rich versus optically-selected samples. For HI galaxies, efficient mapping requires accounting for secondary halo properties or formation history, rather than halo mass alone.

This result cautions against the universal application of a single SHAM parameterization across disparate samples and motivates the development of more nuanced, sample-aware models.

7. Systematic Error Control and Extended Applicability

Explicit calibration of SHAM across photometric definitions and selection criteria reveals clear systematic dependencies. For bright optical galaxies, a nearly universal SHAM parameter set is reliable (α ≈ 1.2, σ_AM ≈ 0.25 dex), but extension to faint or HI-rich samples necessitates recognition of increased scatter and shifts in the optimal proxy.

Careful choice of photometric pipeline (e.g., NYU Petrosian for SHAM with luminosity matching), explicit modeling of selection biases, and the possible use of halo formation time as a secondary variable are essential steps in mitigating systematic errors. Flexibility in the halo proxy (α, $z_\text{cut}$) extends the valid domain of SHAM and leads to more robust mapping for cosmological applications.

Continued recognition and quantitative modeling of these dependencies will reduce systematic bias in forward modeling of galaxy surveys and strengthen SHAM as a tool for constructing realistic synthetic catalogs tailored to specific survey and science goals.

In sum, subhalo abundance matching as calibrated in this work is highly sensitive to the choice of photometry, the selection function, and the physical modeling of the galaxy–halo connection. While highly predictive for bright, optically selected samples using monotonic mass-based rankings with modest scatter, its applicability to faint or gas-selected galaxies is limited unless additional variables, particularly those related to halo assembly, are incorporated. Exploring and calibrating SHAM across selection methods reduces systematic error and expands its utility for contemporary and future survey science.
