δ-core Subsampling in Random Trees

Updated 14 February 2026
  • δ-core subsampling is not an established term. Its closest rigorous counterpart is the extraction of a representative core subtree from large random trees, such as SARRTs and inhomogeneously grown recursive trees, to analyze depth, height, and scaling limits.
  • The relevant methods combine leaf-spanning subtree extraction with rescaling, and convergence is established in the Gromov–Hausdorff–Prokhorov topology.
  • Analytical tools such as Pólya urn couplings, Beta–Binomial models, and moment inequalities control the behavior of the core substructure.

δ-core subsampling does not appear as a term or method in the referenced literature. However, the models studied there contain rigorous methods for subsampling and analyzing "core" substructures of large random trees. These models are chiefly scaled-attachment random recursive trees (SARRTs) and inhomogeneously constructed recursive trees, together with their scaling limits, and the methods work through scaling limits, depth asymptotics, and subtree couplings. This entry reviews these connections within that established framework.

1. Definitions and Underlying Models

The scaled-attachment random recursive tree (SARRT) model, introduced by Devroye, Fawzi, and Fraiman, defines random trees on the vertex set $\{0,1,\dots,n\}$ using i.i.d. random multipliers $X_i \in [0,1)$ drawn from a law $\mathcal{L}(X)$. Each new vertex $i \ge 1$ attaches to vertex $\lfloor iX_i \rfloor$. The depth and height asymptotics of the resulting tree can be characterized precisely for broad classes of $X$ under minimal regularity conditions (Devroye et al., 2012).
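The following minimal Python sketch makes the attachment rule concrete by simulating a SARRT and recording vertex depths. The function name `sample_sarrt` and the use of `random.Random` as the randomness source are illustrative assumptions. Any sampler that returns values in $[0,1)$ can play the role of $X$.

```python
import math
import random


def sample_sarrt(n, sample_x, rng):
    """Simulate a SARRT on vertices 0..n (hypothetical helper).

    Vertex i (1 <= i <= n) attaches to floor(i * X_i), where each X_i is
    an independent draw in [0, 1) returned by sample_x(rng).
    Returns (parent, depth); parent[0] is None because 0 is the root.
    """
    parent = [None] * (n + 1)
    depth = [0] * (n + 1)
    for i in range(1, n + 1):
        x = sample_x(rng)
        if not 0.0 <= x < 1.0:
            raise ValueError("X must take values in [0, 1)")
        # floor(i * x) lies in {0, ..., i - 1}; min() guards against rounding.
        p = min(math.floor(i * x), i - 1)
        parent[i] = p
        depth[i] = depth[p] + 1  # the parent's depth is already known
    return parent, depth


rng = random.Random(2012)
parent, depth = sample_sarrt(10_000, lambda r: r.random(), rng)
print("height:", max(depth), "depth of vertex n:", depth[-1])
```

Passing `lambda r: r.random()` gives uniform $X$, which is the uniform random recursive tree. Other laws of $X$ only require a different `sample_x`.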

A structurally related but combinatorially distinct family is constructed by inhomogeneous recursive insertion of vertices and leaves. The process starts from a root $v_0$ and a single leaf. Each step selects an edge uniformly at random and inserts an interior vertex on it, and with period $\ell$ a new leaf is attached. This process generalizes Rémy’s algorithm (uniform full binary trees, the case $\ell = 1$) to richer inhomogeneous behavior for general integer $\ell$ (Ross et al., 2016).
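The insertion process can be sketched as follows. The exact conventions here are assumptions made for illustration and are not taken from the cited paper:

- the step counter starts at 1;
- a leaf is added exactly when the step number is a multiple of $\ell$;
- the new leaf hangs off the vertex just inserted.

The function `grow_tree` is hypothetical.

```python
import random


def grow_tree(steps, ell, rng):
    """Sketch of the edge-insertion process with leaf period ell.

    Assumed convention (hypothetical): start from root 0 joined to leaf 1;
    at step t = 1, ..., steps pick an edge uniformly at random, subdivide it
    with a new vertex w, and if t is a multiple of ell attach a new leaf to w.
    Returns a dict mapping each non-root vertex to its parent.
    """
    parent = {1: 0}
    edges = [(0, 1)]  # (parent, child) pairs, oriented away from root 0
    next_id = 2
    for t in range(1, steps + 1):
        idx = rng.randrange(len(edges))  # uniform edge choice
        u, v = edges[idx]
        w = next_id
        next_id += 1
        edges[idx] = (u, w)  # edge u-v becomes the path u-w-v
        edges.append((w, v))
        parent[w] = u
        parent[v] = w
        if t % ell == 0:  # periodic leaf attachment
            leaf = next_id
            next_id += 1
            edges.append((w, leaf))
            parent[leaf] = w
    return parent


rng = random.Random(2016)
tree = grow_tree(steps=1000, ell=2, rng=rng)
children = {}
for child, par in tree.items():
    children.setdefault(par, []).append(child)
leaves = [v for v in tree if v not in children]
print(len(tree) + 1, "vertices,", len(leaves), "leaves")
```

With `ell=1`, every inserted vertex receives a leaf, so every non-root internal vertex has exactly two children. This matches the full binary trees produced by Rémy’s algorithm.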

2. Subsampling and Core Subtree Construction

Within these stochastic tree models, the notion closest to a "δ-core subsample" arises in the scaling limit analyses via two main processes:

  • Leaf-spanning subtree extraction: For fixed $k$, the subtree $T_k^{(n)}$ of a random tree $T^{(n)}$ spanned by its first $k$ leaves is analyzed after appropriate rescaling. This subtree serves as a "core" sample of the larger structure.
  • Scaling and convergence: Under suitable scaling (graph distance multiplied by $c_\ell n^{-1/(\ell+1)}$), these subtrees converge almost surely in the Gromov–Hausdorff–Prokhorov topology to the corresponding subtrees of a continuum random tree, which is a real tree (Ross et al., 2016).

A "core" in this context is a subtree of fixed or slowly growing size. It captures the essential structure of the whole tree while remaining amenable to rigorous limit analysis.
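Here is a minimal sketch of leaf-spanning extraction. Given a rooted tree as a parent map, the subtree spanned by a list of leaves is the union of the geodesics from the first leaf to each of the others. The helper names and the toy tree are hypothetical. In an actual experiment, the parent map would come from a simulated $T^{(n)}$, and the edge count would be multiplied by $c_\ell n^{-1/(\ell+1)}$ to compare with the continuum length.

```python
def path_vertices(parent, u, v):
    """Vertices on the geodesic between u and v in a rooted tree.

    parent maps each non-root vertex to its parent; the root is absent.
    """
    ancestors_u = []
    x = u
    while x is not None:
        ancestors_u.append(x)
        x = parent.get(x)  # None once we step above the root
    index = {w: i for i, w in enumerate(ancestors_u)}
    upward_from_v = []
    x = v
    while x not in index:  # climb from v until we meet u's ancestor line
        upward_from_v.append(x)
        x = parent[x]
    # x is now the lowest common ancestor of u and v
    return ancestors_u[: index[x] + 1] + upward_from_v


def spanned_subtree(parent, leaves):
    """Vertex and edge sets of the subtree spanned by the given leaves."""
    vertices = {leaves[0]}
    for leaf in leaves[1:]:
        vertices.update(path_vertices(parent, leaves[0], leaf))
    # In a tree, a connected vertex set induces exactly the spanned subtree.
    edges = {(c, parent[c]) for c in vertices if parent.get(c) in vertices}
    return vertices, edges


# Toy tree (hypothetical): root 0, leaves 4, 5, 6 in order of arrival.
toy_parent = {1: 0, 2: 1, 3: 1, 4: 3, 5: 3, 6: 2}
for k in (2, 3):
    core, core_edges = spanned_subtree(toy_parent, [4, 5, 6][:k])
    print(k, sorted(core), len(core_edges))
```

To span the root as well as the leaves, pass the root as an extra entry in `leaves`. Use this when the convention calls for the subtree spanned by the root and the first $k$ leaves.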

3. Scaling Limits and Gromov-Hausdorff-Prokhorov Convergence

Let $c_\ell = (\ell+1)^{-1/(\ell+1)}$. Given the random tree $T^{(n)}$ with $n$ vertices, consider the rescaled sequence

$$\left(V(T^{(n)}),\ c_\ell\, n^{-1/(\ell+1)}\, d_{\mathrm{gr}},\ \mu_n\right)$$

This sequence converges almost surely, in the Gromov–Hausdorff–Prokhorov topology, to a real tree $(T, d_{\mathrm{len}}, \mu)$. The limit is constructed by an inhomogeneous Poisson line-breaking process with rate $(\ell+1)t^\ell\,dt$. Here $d_{\mathrm{gr}}$ is the graph distance, $d_{\mathrm{len}}$ is the intrinsic metric of $T$, and $\mu$ is the uniform leaf mass. In this limit, the core subtrees $T_k^{(n)}$ spanned by $k$ marked leaves approximate the continuum subtrees $T_k$ of $T$. The part of the tree outside these cores becomes negligible in both mass and height for slowly growing $k$ (Ross et al., 2016, Propositions 1.3–1.4).
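The cut points of the line-breaking construction can be simulated by a time change. The intensity $(\ell+1)t^\ell\,dt$ has cumulative intensity $t^{\ell+1}$. So if $\Gamma_1 < \Gamma_2 < \dots$ are the arrival times of a unit-rate Poisson process, the points $\Gamma_i^{1/(\ell+1)}$ form a Poisson process with the required rate. This sketch generates only the cut points and segment lengths. The rule for where each new segment is glued onto the existing tree is not reproduced here, and `cut_points` is a hypothetical name.

```python
import random


def cut_points(ell, t_max, rng):
    """Poisson points on [0, t_max] with intensity (ell + 1) t^ell dt.

    Time change: Lambda(t) = t^(ell + 1), so C_i = Gamma_i^(1 / (ell + 1))
    where Gamma_i are the arrival times of a unit-rate Poisson process.
    """
    points = []
    gamma = 0.0
    while True:
        gamma += rng.expovariate(1.0)  # next unit-rate arrival time
        c = gamma ** (1.0 / (ell + 1))
        if c > t_max:
            return points
        points.append(c)


rng = random.Random(2016)
pts = cut_points(ell=2, t_max=3.0, rng=rng)
# Lengths of the segments [0, C_1], [C_1, C_2], ... used in line-breaking.
segments = [b - a for a, b in zip([0.0] + pts, pts)]
print(len(pts), "cut points; expected count", 3.0 ** 3)
```

The expected number of cut points in $[0, t]$ is $t^{\ell+1}$. This gives a quick sanity check on the time change.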

4. Analytical Techniques for Subsample Control

Key analytic tools for controlling the behavior of core subtrees and their convergence include:

  • Pólya urn couplings: Time-inhomogeneous urn models track the number of discrete vertices mapping to specific continuum arcs, yielding sharp concentration for the "vertex mass" associated with a given arc-length portion of the limiting tree.
  • Dirichlet/Gamma algebra and Beta–Binomial couplings: These arise in the line-breaking and insertion processes, governing the distribution of subtree lengths and vertex allocations.
  • Tail bounds and moment inequalities (e.g., Bernstein–Hoeffding): Employed to control fluctuations in the embedding of discrete trees within the continuum limit.

Through these mechanisms, finite core substructures can be matched and analyzed precisely even as the ambient trees grow without bound (Ross et al., 2016).
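The source refers to time-inhomogeneous urns. As a simpler, plainly named stand-in, this sketch uses the classical Pólya urn to illustrate the Beta–Binomial mechanism mentioned above. The exact law of the number of red draws, computed by dynamic programming, matches the Beta–Binomial probability mass function. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def polya_red_draws(a, b, m):
    """Exact law of the number of red draws in m steps of a Polya urn.

    The urn starts with a red and b black balls; each drawn ball is put
    back together with one extra ball of the same colour.
    Returns a dict {r: P(r red draws)} with exact Fraction values.
    """
    dist = {0: Fraction(1)}
    for step in range(m):
        total = a + b + step  # balls in the urn before this draw
        new = {}
        for r, p in dist.items():
            p_red = Fraction(a + r, total)  # r earlier red draws added r balls
            new[r + 1] = new.get(r + 1, Fraction(0)) + p * p_red
            new[r] = new.get(r, Fraction(0)) + p * (1 - p_red)
        dist = new
    return dist


def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1)."""
    out = 1
    for i in range(n):
        out *= x + i
    return out


def beta_binomial_pmf(k, m, a, b):
    """Beta-Binomial(m, a, b) probability of k, for positive integers a, b."""
    return Fraction(comb(m, k) * rising(a, k) * rising(b, m - k), rising(a + b, m))


urn = polya_red_draws(2, 3, 6)
print(all(urn[k] == beta_binomial_pmf(k, 6, 2, 3) for k in range(7)))
```

The exact match reflects the exchangeability of the Pólya draw sequence. Correspondingly, the fraction of red balls converges almost surely to a Beta$(a, b)$ random variable.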

5. Connections to Depth, Height, and Minimal Subsamples

In SARRT models, the analogous depth-based statistics have precise asymptotics. These statistics are the typical node depth $D_n$, the height $H_n$, and the minimum depth $M_n$ among the newest half of the nodes:

  • Typical depth: $\mathbb{E}[D_n] \sim \mu^{-1}\log n$, with $\mu = \mathbb{E}[-\log X]$. This $\mu$ is unrelated to the leaf measure of Section 3.
  • Height: $H_n \sim \alpha_{\max}\log n$, for an explicit constant $\alpha_{\max}$ determined by a large-deviation rate function.
  • Minimum depth among late-arriving nodes: $M_n \sim \alpha_{\min}\log n$ (Devroye et al., 2012).

These results rest on large deviations, renewal theory, and coupling arguments. In the present framing, they indicate that the statistics of a core subsample (e.g., a $\delta$-fraction of the leaves or of the depth range) reflect the macroscopic limits of the entire tree.
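The depth asymptotics can be checked numerically with a sketch that samples only the ancestral path of vertex $n$. Each vertex on that path carries its own independent multiplier, so fresh draws suffice and the whole tree need not be built. For uniform $X$, $\mu = \mathbb{E}[-\log X] = 1$, and the uniform random recursive tree satisfies the exact identity $\mathbb{E}[D_n] = \sum_{k=1}^{n} 1/k = \log n + O(1)$. The helper `sample_depth` is hypothetical.

```python
import math
import random


def sample_depth(n, sample_x, rng):
    """Sample the depth of vertex n in a SARRT with multiplier law sample_x.

    Follows the ancestral path n -> floor(n X) -> ... -> 0. The vertices on
    this path are distinct, so each step can use a fresh independent X.
    """
    i, d = n, 0
    while i > 0:
        i = min(math.floor(i * sample_x(rng)), i - 1)  # parent of i
        d += 1
    return d


rng = random.Random(1)
n, reps = 10**6, 20_000
mean = sum(sample_depth(n, lambda r: r.random(), rng) for _ in range(reps)) / reps
harmonic = sum(1.0 / k for k in range(1, n + 1))  # exact E[D_n] for uniform X
print(f"mean depth {mean:.3f}, H_n {harmonic:.3f}, log n {math.log(n):.3f}")

# Hypothetical non-uniform law: X = U^2 gives mu = E[-2 log U] = 2.
mean_sq = sum(sample_depth(n, lambda r: r.random() ** 2, rng) for _ in range(reps)) / reps
print(f"X = U^2: mean depth {mean_sq:.3f}, log(n)/mu {math.log(n) / 2:.3f}")
```

The second run is only a first-order comparison. The asymptotic $\mathbb{E}[D_n] \sim \mu^{-1}\log n$ leaves an $O(1)$ correction that the printout does not remove.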

6. Special Cases and Extensions

For $X \sim \mathrm{Unif}[0,1)$ (the uniform random recursive tree), the SARRT model recovers the classic result $H_n \sim e \log n$ for the maximal depth. All associated constants are computable explicitly via Cramér transforms and the associated convex duals (Devroye et al., 2012). The methodology extends to power-of-choice random directed acyclic graphs (DAGs) and to more complex non-i.i.d. or atomically perturbed settings, generalizing the depth and height results.
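For uniform $X$, the constant $e$ can be recovered from a first-moment heuristic. It is sketched here under stated assumptions and is not the cited proof:

- Since $-\log X \sim \mathrm{Exp}(1)$, the depth of a vertex with label near $n$ behaves approximately like a Poisson count with mean $\log n$.
- The upper-tail Cramér rate function of that count is $I(\alpha) = \alpha\log\alpha - \alpha + 1$.
- A union bound over the $n = e^{\log n}$ vertices suggests that the height constant is the root $\alpha > 1$ of $I(\alpha) = 1$.
- That equation reduces to $\alpha\log\alpha = \alpha$, so $\alpha = e$.

The bisection below confirms this numerically. Its function names are hypothetical.

```python
import math


def poisson_rate(alpha):
    """Cramér rate function I(alpha) = alpha log alpha - alpha + 1."""
    return alpha * math.log(alpha) - alpha + 1.0


def height_constant(lo=1.0, hi=10.0, tol=1e-12):
    """Root alpha > 1 of I(alpha) = 1, found by bisection.

    I(1) - 1 = -1 < 0, I(10) - 1 > 0, and I is increasing on (1, inf),
    so the bracket [lo, hi] contains exactly one root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_rate(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(height_constant(), math.e)
```

For non-uniform $X$, the Poisson rate function would be replaced by the Cramér transform of the renewal count for $-\log X$. This sketch does not attempt that generalization.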

A plausible implication is that, by subsampling core subtrees in more general random tree models, one can obtain universal scaling laws and convergence properties, provided they fit within this renewal- and line-breaking-based analytic framework.

7. Relation to the Literature and Open Problems

The convergence of core subtrees to continuum analogues in the Gromov–Hausdorff–Prokhorov sense (Ross et al., 2016, Theorem 1.1) links these stochastic tree models to continuum random tree (CRT) theory. The links include Aldous’s Brownian CRT for the uniform case ($\ell = 1$) and its $\alpha$-stable generalizations. Further structural properties of the limit are studied by Curien and Haas (cited as [13] in Ross et al., 2016), extending the reach of these subsampling methods to multifractal and exchangeable structures.

The precise role and generalization of "core" subsampling within broader classes of random combinatorial and metric trees remain an open area, especially regarding universality outside the direct construction mechanisms presented in the referenced works.
