High-Res Dark Matter Merger Trees

Updated 23 January 2026

High-resolution N-body merger trees are detailed frameworks that capture the complete hierarchical assembly of dark matter halos with fine mass and temporal resolution.
They employ advanced algorithms, such as FoF, SUBFIND, and particle overlap metrics, to reliably identify halos and substructures in simulations.
Integrating these trees with semi-analytic models and machine learning approaches enhances predictions of galaxy formation and supports efficient cosmological analyses.

High-resolution N-body dark matter merger trees represent the evolutionary history of dark matter halos, capturing the full branching hierarchy of their assembly across cosmic time. These trees, generated from cosmological N-body simulations, encode the sequence of halo mergers, accretion events, and tidal interactions, providing a fundamental backbone for semi-analytic models (SAMs) of galaxy formation, mock survey generation, and the analysis of hierarchical structure growth. Precision in both mass and temporal resolution, robust subhalo tracking, and computational scalability are critical for faithfully reconstructing dark matter assembly and connecting simulation outputs to statistical and physical galaxy properties.

1. Simulation Requirements and Data Structures

High-resolution N-body merger trees demand simulations with fine force and mass resolution, so that both host halos and their substructure are resolved down to the scales relevant to baryonic processes and galaxy formation. For instance, the Q Continuum simulation achieves $m_p \sim 1.5 \times 10^8 M_{\odot}$ with $8192^3$ particles in a $(1300~\mathrm{Mpc})^3$ volume (Rangel et al., 2020), while MillenniumTNG uses $4320^3$ DM particles in $(740~\mathrm{Mpc})^3$ at $m_{\mathrm{cdm}} = 1.32 \times 10^8~h^{-1}~M_{\odot}$ (Hernández-Aguayo et al., 2022). Ultra-high-resolution studies of subhaloes use particle masses of $5 \times 10^3~h^{-1}~M_{\odot}$ and resolve halo masses down to $10^7~h^{-1}~M_{\odot}$ to suppress numerical and discreteness artefacts (Kazuno et al., 2024).
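The link between box size, particle count, and particle mass can be checked with a short calculation. The sketch below assumes the particles sample the total matter density, and the cosmological parameters passed in (`omega_m=0.265`, `h=0.71`) are illustrative WMAP-7-like assumptions rather than values quoted in the text above.

```python
# Critical density today in units of h^2 M_sun / Mpc^3 (standard value).
RHO_CRIT_H2 = 2.775e11


def particle_mass(box_mpc, n_per_side, omega_m, h):
    """Mean particle mass in M_sun for a cubic box of side box_mpc (Mpc, not Mpc/h).

    Assumes every particle carries an equal share of the total matter
    content of the box.
    """
    rho_mean = omega_m * RHO_CRIT_H2 * h**2  # mean matter density, M_sun / Mpc^3
    volume = box_mpc**3                      # box volume, Mpc^3
    return rho_mean * volume / n_per_side**3


# Q Continuum-like box; omega_m and h are assumed, not taken from the article.
m_p = particle_mass(box_mpc=1300.0, n_per_side=8192, omega_m=0.265, h=0.71)
print(f"particle mass ~ {m_p:.2e} M_sun")
```

Under these assumptions the result lands close to the quoted $m_p \sim 1.5 \times 10^8 M_{\odot}$; a different cosmology shifts it in proportion to $\Omega_m h^2$.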

The primary data structure is a directed acyclic graph (DAG) $G = (V,E)$, where $V$ indexes halo (subhalo) nodes at discrete snapshots and $E$ encodes progenitor-to-descendant links across time. Each node records properties such as masses ($M_{\mathrm{vir}}$, $M_{200}$), positions, velocities, concentrations, binding energies, and most-bound particle (MBP) indices; auxiliary arrays hold subsampled particle IDs (PIDs), orphan tracers, or subhalo hierarchy ladders (Ivkovic et al., 2018, Hernández-Aguayo et al., 2022).
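A minimal in-memory sketch of such a graph is given below. The class and field names (`HaloNode`, `MergerTree`, `m_vir`, `mbp_id`) are hypothetical and do not correspond to any particular catalogue format. For simplicity each node has at most one descendant, which is the tree-shaped special case of the DAG.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HaloNode:
    halo_id: int
    snapshot: int                     # index of the output time
    m_vir: float                      # virial mass (units left to the caller)
    mbp_id: int                       # ID of the most-bound particle
    descendant: Optional[int] = None  # halo_id at a later snapshot
    progenitors: List[int] = field(default_factory=list)


class MergerTree:
    def __init__(self) -> None:
        self.nodes: Dict[int, HaloNode] = {}

    def add_node(self, node: HaloNode) -> None:
        self.nodes[node.halo_id] = node

    def link(self, prog_id: int, desc_id: int) -> None:
        """Add a progenitor-to-descendant edge; edges must point forward in time."""
        prog, desc = self.nodes[prog_id], self.nodes[desc_id]
        if prog.snapshot >= desc.snapshot:
            raise ValueError("descendant must lie at a later snapshot")
        prog.descendant = desc_id
        desc.progenitors.append(prog_id)

    def main_branch(self, root_id: int) -> List[int]:
        """Follow the most massive progenitor backwards in time from root_id."""
        branch = [root_id]
        node = self.nodes[root_id]
        while node.progenitors:
            best = max(node.progenitors, key=lambda i: self.nodes[i].m_vir)
            branch.append(best)
            node = self.nodes[best]
        return branch


tree = MergerTree()
tree.add_node(HaloNode(halo_id=1, snapshot=0, m_vir=3.0e11, mbp_id=101))
tree.add_node(HaloNode(halo_id=2, snapshot=0, m_vir=1.0e11, mbp_id=202))
tree.add_node(HaloNode(halo_id=3, snapshot=1, m_vir=4.2e11, mbp_id=101))
tree.link(1, 3)
tree.link(2, 3)
print(tree.main_branch(3))  # main branch of halo 3: [3, 1]
```

Requiring every edge to point to a strictly later snapshot is what keeps the graph acyclic. Production catalogues usually store the same links as flat integer arrays rather than Python objects.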

2. Halo and Subhalo Identification

Haloes are typically identified via Friends-of-Friends (FoF) algorithms using a linking length of $b = 0.2$–$0.28$ times the mean interparticle spacing, supplemented by substructure finders that isolate self-bound objects by unbinding particles according to their binding energies (e.g., SUBFIND, ROCKSTAR, HBT+) (Jiang et al., 2013, Kazuno et al., 2024, Hernández-Aguayo et al., 2022). Subhaloes within FoF groups are represented as distinct, self-bound objects. For hierarchical histories, Dhalo construction groups subhaloes by spatial enclosure and recovers subhalo identities even in cases of artificial FoF merging, enforcing strict hierarchy and monotonic mass assembly (Jiang et al., 2013).
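The FoF step can be illustrated with a brute-force sketch: particles closer than $b$ times the mean interparticle spacing are joined with a union-find structure. This toy version is $O(N^2)$, ignores periodic boundaries, and performs no unbinding, so it only shows the linking logic. The function name and the test particles are made up for illustration.

```python
import math


def friends_of_friends(positions, b, box_size):
    """Group particles whose separation is below b * (mean interparticle spacing).

    positions: list of (x, y, z) tuples; box_size: side of the cubic volume.
    Returns lists of particle indices, largest group first.
    """
    n = len(positions)
    link = b * box_size / n ** (1.0 / 3.0)  # linking length
    parent = list(range(n))

    def find(i):
        # Find the group representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < link:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Eight toy particles in a box of side 8: mean spacing 4, linking length 0.8.
pts = [(1, 1, 1), (1.5, 1, 1), (2, 1, 1), (6, 6, 6),
       (6.5, 6, 6), (3, 7, 2), (7, 2, 5), (4, 4, 7)]
groups = friends_of_friends(pts, b=0.2, box_size=8.0)
print(groups[0], groups[1])
```

Particles 0 and 2 are farther apart than the linking length but still end up in the same group because both are linked to particle 1. This chaining behaviour is also what allows thin particle bridges to join physically distinct haloes.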

The identification of the most-bound particle and bound core tracking distinguishes between temporary and physical mergers, aids in orphan galaxy recovery, and provides robust markers for branch linking (Ivkovic et al., 2018, Rangel et al., 2020).

3. Merger Tree Construction Algorithms

Particle-Based and Tracer Algorithms

Particle-based methods calculate progenitor-descendant links by maximizing overlap in particle IDs between halo catalogs from adjacent snapshots. Merit functions, such as $M(A,B) = (n_{A \cap B})^2 / (n_{\max}~n_{\min})$, quantify the link strength (where $n_{A \cap B}$ counts shared tracer particles) (Ivkovic et al., 2018). Most algorithms process snapshots in reverse chronological order to address temporal linking ambiguities and fly-by events (Rangel et al., 2020, Bose et al., 2021).
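The merit function above translates directly into set operations on particle IDs. In the sketch below, $n_{\max}$ and $n_{\min}$ are taken to be the larger and smaller of the two tracer counts, which is an interpretation of the notation; `best_descendant` is a hypothetical helper name.

```python
def merit(ids_a, ids_b):
    """M(A, B) = n_shared^2 / (n_max * n_min) for two sets of particle IDs."""
    n_min = min(len(ids_a), len(ids_b))
    n_max = max(len(ids_a), len(ids_b))
    if n_min == 0:
        return 0.0
    shared = len(ids_a & ids_b)
    return shared**2 / (n_max * n_min)


def best_descendant(prog_ids, candidates):
    """Pick the candidate with the highest merit.

    candidates: dict mapping halo ID -> set of particle IDs at the later snapshot.
    Returns (halo_id, merit), or (None, 0.0) when nothing overlaps.
    """
    if not candidates:
        return None, 0.0
    scores = {hid: merit(prog_ids, ids) for hid, ids in candidates.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0.0 else (None, 0.0)


progenitor = set(range(100))
later = {
    10: set(range(20, 140)),  # shares 80 particles with the progenitor
    11: set(range(0, 15)),    # shares 15 particles
}
halo_id, score = best_descendant(progenitor, later)
print(halo_id, round(score, 3))
```

Squaring the shared count and dividing by both sizes penalizes matches where either object contributes only a small fraction of its particles, so a small fragment fully contained in the progenitor does not outrank the main descendant.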

Subsampled cores (the densest or most-bound $\sim 10\%$ of particles) are used in AbacusSummit and HACC to define robust ancestry associations. In high-performance implementations, MPI-based domain decomposition distributes AMR patches or sub-volumes and orchestrates local/global tracer list exchange (Ivkovic et al., 2018, Rangel et al., 2020).

Data Table: Merger Tree Linking Criteria

Method	Core Link Metric	Descendant Assignment
ACACIA/RAMSES	MBP overlap fraction	Maximize $M(A,B)$ merit function
AbacusSummit	Fractions $f_{\mathrm{donate}}$, $f_{\mathrm{match}}$	Largest fraction overlap
Q Continuum/HACC	Particle set intersection $C_{i,j}$	Argmax overlap, merit threshold

Temporary mergers, splits, and fly-bys are handled via non-monotonic mass history cleaning (merging of flagged haloes), orphan particle tracing, or core inheritance (Bose et al., 2021, Ivkovic et al., 2018, Jiang et al., 2013).

Monte Carlo and Machine Learning Approaches

Augmentation algorithms, such as the PCH (Parkinson, Cole & Helly) method, graft high-resolution Monte Carlo branches onto coarse N-body trees, matching progenitor masses within a fractional tolerance $\epsilon \sim 0.15$. The augmented trees reconstruct conditional mass functions with near-N-body fidelity, enabling convergence in the galaxy–halo–mass relation (Benson et al., 2016). Deep learning frameworks (GANs, normalizing-flow graph models) can emulate entire high-resolution merger tree populations, reproducing mass accretion histories, branching ratios, and merger statistics at a fraction of the traditional computational cost (Robles et al., 2022, Nguyen et al., 2025).

4. Temporal and Mass Resolution: Convergence and Accuracy

Simulation analysis indicates that mean galaxy property predictions (stellar and baryonic masses, mass functions, SFR histories) converge to $<5\%$ accuracy for $N \geq 128$ uniformly or logarithmically spaced snapshots over $z = 20$–$0$ (Benson et al., 2011). Finer temporal sampling improves the fidelity of merger timing. On the mass side, robust subhalo and substructure tracking requires a minimum of roughly $20$–$2000$ particles per halo; mass completeness and the suppression of numerical artefacts depend sensitively on the adopted threshold (Kazuno et al., 2024, Jiang et al., 2013, Rangel et al., 2020).

Snapshot grids spaced uniformly in $\ln a$ or $\ln \delta_c$ slightly accelerate convergence at high $z$ compared to uniform $a$ spacing, but all schemes yield comparable results at $z=0$ given sufficient $N$ (Benson et al., 2011).
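The difference between the spacing schemes can be seen by generating the grids directly. The sketch below covers only uniform-$a$ and uniform-$\ln a$ spacing; uniform $\ln \delta_c$ spacing needs a growth-factor calculation and is omitted. The function name and the $z > 5$ cut are illustrative choices.

```python
def scale_factors(n, z_start=20.0, z_end=0.0, spacing="log"):
    """Return n expansion factors between z_start and z_end (inclusive).

    spacing="linear": uniform steps in a
    spacing="log":    uniform steps in ln(a)
    """
    if n < 2:
        raise ValueError("need at least two snapshots")
    a0 = 1.0 / (1.0 + z_start)
    a1 = 1.0 / (1.0 + z_end)
    if spacing == "linear":
        return [a0 + (a1 - a0) * i / (n - 1) for i in range(n)]
    if spacing == "log":
        return [a0 * (a1 / a0) ** (i / (n - 1)) for i in range(n)]
    raise ValueError(f"unknown spacing: {spacing!r}")


def count_above_redshift(a_values, z_cut):
    """Number of snapshots at redshift higher than z_cut."""
    a_cut = 1.0 / (1.0 + z_cut)
    return sum(1 for a in a_values if a < a_cut)


lin = scale_factors(128, spacing="linear")
log = scale_factors(128, spacing="log")
n_lin = count_above_redshift(lin, 5.0)
n_log = count_above_redshift(log, 5.0)
print(n_lin, n_log)
```

For the same total number of outputs, the logarithmic grid places several times more snapshots at $z > 5$ than the linear one, which is in line with the faster high-redshift convergence noted above.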

Empirical validation requires comparing tree statistics (e.g., main branch length distributions, progenitor multiplicity, accretion rate histograms) against results from established codes such as ROCKSTAR, TreeMaker, SUBFIND, and AHF (Ivkovic et al., 2018, Bose et al., 2021). Cleaning and quality control (mass monotonicity, snapshot interval tuning, orphan re-linking) suppress nonphysical artefacts.

5. Substructure Tracking, Orphan Galaxies, and Tidal Effects

Particle-level and MBP-based tracking supports the recovery of orphan galaxies, that is, galaxies whose parent subhalo has dissolved due to over-merging or tidal disruption (Ivkovic et al., 2018, Hernández-Aguayo et al., 2022). Most algorithms retain orphan tracers (either as reserved MBPs or core particle sets) and attempt to re-link them to candidate hosts by binding energy or spatial proximity, improving the continuity of galaxy merger histories and satellite population modeling.
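As an illustration of proximity-based re-linking, the sketch below assigns each orphan tracer to the host whose centre is nearest in units of that host's radius, provided the tracer lies inside the radius. This criterion is an assumption chosen for simplicity; the cited codes may use binding energy or other rules, and `relink_orphans` is a hypothetical name.

```python
import math


def relink_orphans(orphans, hosts):
    """Assign orphan tracers to hosts by scaled distance.

    orphans: dict orphan_id -> (x, y, z) position of the tracked particle
    hosts:   dict host_id -> ((x, y, z) centre, radius)
    Returns dict orphan_id -> host_id, or None if the tracer is outside every host.
    """
    links = {}
    for oid, pos in orphans.items():
        best, best_ratio = None, 1.0  # only accept tracers inside the host radius
        for hid, (centre, radius) in hosts.items():
            ratio = math.dist(pos, centre) / radius
            if ratio < best_ratio:
                best, best_ratio = hid, ratio
        links[oid] = best
    return links


hosts = {
    "A": ((0.0, 0.0, 0.0), 2.0),
    "B": ((5.0, 0.0, 0.0), 1.0),
}
orphans = {
    "o1": (1.0, 0.0, 0.0),   # inside A (ratio 0.5)
    "o2": (4.8, 0.0, 0.0),   # inside B (ratio 0.2)
    "o3": (10.0, 10.0, 0.0), # outside both hosts
}
links = relink_orphans(orphans, hosts)
print(links)
```

Positions here are non-periodic and all hosts are scanned for every orphan; a real pipeline would use a spatial index and periodic wrapping.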

Tidal stripping effects and two-phase mass evolution (accretion at $z > z_{\mathrm{peak}}$, stripping at $z < z_{\mathrm{peak}}$) have been quantified in ultra-high-resolution trees, with stripping phases dominating subhalo trajectories. The transition redshift $z_{\mathrm{peak}}$ scales inversely with final subhalo mass, and $>80\%$ of subhaloes in Milky Way-like hosts experience significant mass loss (Kazuno et al., 2024). Orbital dynamics (eccentricity, pericentre distances) extracted from merger tree data align with observed satellite galaxy orbits, consistent with the standard CDM paradigm.
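Given the mass history along one subhalo branch, the transition redshift can be located as the epoch of maximum mass. The sketch below assumes the history is ordered from early to late times with the final snapshot last; the numbers in the example are invented for illustration.

```python
def split_mass_history(redshifts, masses):
    """Return (z_peak, m_peak, lost_fraction) for one subhalo branch.

    redshifts and masses are equal-length lists ordered from early to late
    times. lost_fraction is the fractional mass lost between the peak and
    the final snapshot.
    """
    if not masses or len(redshifts) != len(masses):
        raise ValueError("redshifts and masses must be non-empty and equal in length")
    i_peak = max(range(len(masses)), key=lambda i: masses[i])
    m_peak = masses[i_peak]
    lost_fraction = 1.0 - masses[-1] / m_peak
    return redshifts[i_peak], m_peak, lost_fraction


# Toy branch: growth until z = 2, then stripping inside the host.
z = [6.0, 4.0, 2.0, 1.0, 0.5, 0.0]
m = [1e8, 4e8, 9e8, 7e8, 4e8, 3e8]
z_peak, m_peak, lost = split_mass_history(z, m)
print(z_peak, m_peak, round(lost, 3))
```

Snapshots before the peak index belong to the accretion phase and those after it to the stripping phase. Noisy histories may need smoothing before the maximum is taken.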

6. Universal Merger Rate Functions and Semi-Analytic Model Integration

High-resolution trees reveal a universal specific merger rate function $f(\xi)$ for host mass $M$ and mass ratio $\xi$, invariant over redshift and cosmology for $10^{12} \lesssim M \lesssim 10^{14}~h^{-1}~M_{\odot}$ and $\xi \gtrsim 10^{-2}$ (Dong et al., 2021). The rate is described by the fitting form $f(\xi) = (a_1\,\xi^{b_1}+a_2\,\xi^{b_2})\,\exp(c\,\xi^{d})$, where $a_1$, $a_2$, $b_1$, $b_2$, $c$, and $d$ are calibrated coefficients. When combined with the universal mass accretion history $M(z)$, the un-evolved subhalo mass function and the time-dependent merger statistics are fixed, forming the statistical basis for galaxy population synthesis with subhalo abundance matching (SHAM), semi-analytic, and empirical modeling frameworks (Dong et al., 2021, Jiang et al., 2013).
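The fitting form is straightforward to evaluate and integrate numerically. The coefficient values below are placeholders chosen so that $f(\xi)$ reduces to $1/\xi$, which makes the integral easy to check by hand; they are not the calibrated values from Dong et al. (2021) and must be replaced before any physical use.

```python
import math


def merger_rate(xi, a1, b1, a2, b2, c, d):
    """f(xi) = (a1 * xi**b1 + a2 * xi**b2) * exp(c * xi**d)."""
    return (a1 * xi**b1 + a2 * xi**b2) * math.exp(c * xi**d)


def cumulative_rate(xi_min, params, xi_max=1.0, n_steps=2000):
    """Integral of f(xi) over [xi_min, xi_max], midpoint rule in ln(xi)."""
    lo, hi = math.log(xi_min), math.log(xi_max)
    step = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps):
        xi = math.exp(lo + (k + 0.5) * step)
        total += merger_rate(xi, **params) * xi * step  # d(xi) = xi * d(ln xi)
    return total


# PLACEHOLDER coefficients (not calibrated): they give f(xi) = 1 / xi.
placeholder = dict(a1=1.0, b1=-1.0, a2=0.0, b2=0.0, c=0.0, d=1.0)
rate = cumulative_rate(0.1, placeholder)
print(rate, math.log(10.0))  # the two numbers should agree closely
```

Integrating in $\ln \xi$ keeps the sampling dense at small mass ratios, where a double power law changes fastest.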

Strictly hierarchical merger trees (Dhaloes, branch-split algorithms) and orphan treatment are essential for the physically consistent modeling of ram-pressure stripping, satellite disruption, and dynamical friction, with direct consequences for the reliability of merger-driven star formation and black hole growth (Jiang et al., 2013, Nguyen et al., 2025, Robles et al., 2022).

7. Computational Scalability and Data Management

State-of-the-art simulations produce tens of billions of halo trees and hundreds of terabytes of analysis data (Rangel et al., 2020, Bose et al., 2021). Efficient construction leverages MPI parallelism (per-patch, sub-volume, or halo domains), sparse matrix intersection for particle linkage, and on-the-fly catalogue generation to mitigate I/O costs (Ivkovic et al., 2018, Rangel et al., 2020). For instance, ACACIA's on-the-fly approach within RAMSES adds a 5–15% CPU overhead relative to pure N-body evolution (Ivkovic et al., 2018).
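The sparse intersection step can be sketched with a dictionary keyed on halo pairs, so that only pairs sharing at least one particle are ever stored. This is a single-process toy version under assumed inputs (one particle-to-halo map per snapshot); it does not reproduce the distributed implementations cited above.

```python
from collections import Counter


def overlap_counts(pid_to_halo_a, pid_to_halo_b):
    """Count shared particles for every (halo in A, halo in B) pair.

    Each argument maps particle ID -> halo ID at one snapshot. Only non-zero
    entries of the overlap matrix are stored, which is what makes it sparse.
    """
    counts = Counter()
    for pid, halo_a in pid_to_halo_a.items():
        halo_b = pid_to_halo_b.get(pid)
        if halo_b is not None:  # particle belongs to a halo in both snapshots
            counts[(halo_a, halo_b)] += 1
    return counts


def argmax_descendants(counts):
    """For each halo in A, pick the halo in B that shares the most particles."""
    best = {}
    for (halo_a, halo_b), n in counts.items():
        if halo_a not in best or n > best[halo_a][1]:
            best[halo_a] = (halo_b, n)
    return {halo_a: halo_b for halo_a, (halo_b, _) in best.items()}


snap_a = {1: "a0", 2: "a0", 3: "a0", 4: "a1", 5: "a1"}
snap_b = {1: "b0", 2: "b0", 3: "b1", 4: "b0", 5: "b0", 6: "b1"}
counts = overlap_counts(snap_a, snap_b)
print(dict(counts))
print(argmax_descendants(counts))
```

The cost scales with the number of tracked particles rather than with the product of the two catalogue sizes, which is the property that matters once catalogues hold billions of haloes.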

Machine learning models, specifically adversarial and normalizing-flow architectures, generate realistic merger trees with a $10^2$–$10^3\times$ speed-up, requiring only modest GPU resources for both training and inference (Robles et al., 2022, Nguyen et al., 2025).

Merger tree catalogues are typically distributed in HDF5 or similar hierarchical data formats, with per-halo pointers, property arrays, and cross-reference indices for efficient SAM and mock survey integration (Hernández-Aguayo et al., 2022).

References:

"ACACIA: a new method to produce on-the-fly merger trees in the RAMSES code" (Ivkovic et al., 2018)
"Convergence of Galaxy Properties with Merger Tree Temporal Resolution" (Benson et al., 2011)
"Achieving Convergence in Galaxy Formation Models by Augmenting N-body Merger Trees" (Benson et al., 2016)
"The MillenniumTNG Project: High-precision predictions for matter clustering and halo statistics" (Hernández-Aguayo et al., 2022)
"A deep learning approach to halo merger tree construction" (Robles et al., 2022)
"Emulating Dark Matter Halo Merger Trees with Graph Generative Models" (Nguyen et al., 2025)
"Cosmological evolution of dark matter subhaloes under tidal stripping by growing Milky Way-like galaxies" (Kazuno et al., 2024)
"Constructing high-fidelity halo merger trees in AbacusSummit" (Bose et al., 2021)
"The Universal Specific Merger Rate of Dark Matter Halos" (Dong et al., 2021)
"N-body Dark Matter Haloes with simple Hierarchical Histories" (Jiang et al., 2013)
"Building Halo Merger Trees from the Q Continuum Simulation" (Rangel et al., 2020)