SpatialTree: Adaptive Indexing & Cognitive Benchmark

Updated 24 December 2025

SpatialTree is a dual-framework uniting dynamic self-balancing spatial data structures and a cognitive taxonomy for assessing AI spatial intelligence.
It employs locally adaptive octree algorithms to achieve logarithmic update and query times, significantly boosting performance in machine learning tasks.
Its cognitive benchmark categorizes spatial abilities in MLLMs, guiding targeted training and revealing asymmetric transfer across skill levels.

SpatialTree denotes two fundamentally distinct yet complementary frameworks in contemporary research: (1) a class of dynamic, self-balancing, memory-efficient spatial data structures for adaptive metric space management (Ellendula et al., 25 Apr 2025), and (2) a cognitive-science-inspired hierarchical taxonomy and benchmark for dissecting spatial abilities in multimodal LLMs (MLLMs), along with tools for diagnosis and training (Xiao et al., 23 Dec 2025). The term thus spans both efficient multidimensional data indexing for machine learning and a rigorous schema for quantifying, evaluating, and improving spatial intelligence in AI agents. The following sections systematically examine the data-structural and cognitive-ability meanings, situating them within the broader landscape of spatial trees.

1. Self-Balancing SpatialTree: Formal Definition and Algorithmics

The SpatialTree, in the context of adaptive spatial indexing (Ellendula et al., 25 Apr 2025), is a dynamic two-parameter octree supporting fast updates and queries in evolving metric spaces. Let $\mathcal{T}$ be an octree over $n$ points in $\mathbb{R}^d$ , parameterized by:

$K>0$ : node-capacity scale (integer)
$\alpha\ge1$ : balance factor (real)

A $(K,\alpha)$ -admissible octree satisfies, for every node $v$ :

$\frac{K}{\alpha} < |P(v)| \leq \alpha K,$

where $P(v)$ is the set of points in the subtree rooted at $v$ . Splitting occurs when a leaf exceeds $\alpha K$ points, spawning eight children (in 3D); merging occurs when an internal node's occupancy drops below $K/\alpha$ , collapsing it back to a leaf. Local repair is triggered only along update paths, yielding amortized $O(\log n)$ update and query times:

$T_{\rm update}(n) = O(\log n), \qquad T_{\rm query}(n) = O(\log n).$

Points are stored exactly once, with $O(1)$ additional storage per internal node, for total space $S(n) \leq (1+8/\alpha)n + O(1) = O(n)$ .

Insertion, deletion, and $k$ -nearest neighbor (kNN) queries are implemented via local recursion and top-down, bottom-up traversals, as codified in succinct pseudocode (Ellendula et al., 25 Apr 2025).

2. Self-Balancing and Memory Efficiency

The core self-balancing protocol is distinct from static or globally rebalanced trees. Upon each insertion, traversal descends to the relevant leaf, the point is inserted, and the ancestry is checked bottom-up for splits ( $|P(v)|> \alpha K$ ). For deletions, removal from the leaf may trigger bottom-up merges ( $|P(v)| < K/\alpha$ ). This strictly local repair guarantees that only a constant-depth subtree is affected per operation. The resulting tree height satisfies

$\text{height}(\mathcal{T}) = O(\log_\alpha n) = O(\log n),$

ensuring both operational and storage efficiency. Tuning $K$ and $\alpha$ allows for adaptive leaf sizing: large in sparse regions, small in dense regions, preventing both over-splitting and pathological chaining.

The total number of nodes is bounded via

$N_{\rm nodes} \leq \frac{n}{K/\alpha} + \frac{n}{\alpha K} = O\left(\frac{n}{K}\right),$

implying $S(n) = n + O(n/K) = O(n)$ . The amortized cost per split/merge is $O(1)$ , given random or localized updates.

3. Empirical Performance in Machine Learning and Generative Models

SpatialTree, as a dynamic structure, underpins exponential speedups in multiple machine learning domains:

Stein Variational Gradient Descent (SVGD): Complexity is reduced from $O(n^2)$ to $O(n\log n)$ , supporting up to 40 $\times$ faster execution at $n=1,000$ particles while preserving posterior accuracy (Ellendula et al., 25 Apr 2025).
Incremental kNN Classification: Update costs fall from $O(n^2)$ to $O(\log n)$ , with measured speedups of $5.6\times$ – $9.4\times$ for updates and $1.6\times$ – $1.9\times$ for queries, maintaining classification accuracy within 0.2% of static baselines.
Retrieval-Augmented Generation (RAG): Index updates become $O(\log n)$ (from $O(n)$ full rebuilds), enabling semantic search speedups (≈4.2 $\times$ ) versus FAISS batch approaches, with retrieval accuracy on par with industry standards.
Dual-Space Optimal Transport Flow: Enforcing neighborhood consistency in both input and latent spaces yields substantial improvements (reconstruction error reduced by up to 99%, trajectory curvature by 69%, neighborhood Jaccard index up by nearly 90%), with a minor 17% runtime overhead—demonstrating major gains in structural preservation during training.

These improvements are robust to continuous data evolution, obviating periodic tree rebuilds and supporting real-time or online learning workflows.

SpatialTree's design enables several generalizations to mitigate high-dimensional breakdown:

Hybrid KD-Octree: Employs axis-aligned (kd-tree) splits adaptively beyond the domain where octree equal splits become impractical.
R-Tree Extensions: The $(K,\alpha)$ -admissibility extends to rectangle-bounded hierarchies, analogous to classic bounding volume hierarchies.
Learned Parameter Control: Reinforcement learning agents dynamically allocate $(K,\alpha)$ in local tree regions for density- or access pattern-optimized balancing.
Metric-Aware Splits: Splitting directions may follow principal component axes in anisotropic or learned metrics, forming PCA-octrees.

SpatialTree thus unifies design principles from canonical spatial trees (octree, kd-tree, R-tree) but foregrounds strict local balancing and continuous, low-cost adaptability.

5. Cognitive-Science SpatialTree: Taxonomy and Benchmark

SpatialTree also denotes an explicitly hierarchical taxonomy and evaluation protocol for spatial ability in MLLMs (Xiao et al., 23 Dec 2025). This taxonomy, rooted in the cognitive-science traditions of Piaget, Tolman, and Kuipers, partitions spatial intelligence into four levels:

L1—Low-Level Perception: Fast, intuitive extraction of geometric, motion, orientation, relation, and localization cues, comprising 11 atomic sub-abilities.
L2—Mental Mapping: Linking spatial perception to language and memory, with sub-abilities in captioning, semantic labeling, perspective, affordance, and cognitive mapping.
L3—Mental Simulation: Internal reasoning (causal, sequential planning) about spatial-and-physical phenomena (6 sub-abilities).
L4—Agentic Competence: Embodied, interactive skills involving navigation and manipulation (3 sub-abilities).

Benchmarking proceeds via the SpatialTree-Bench, encompassing 41 task sets mapped to 27 sub-abilities, combining prior datasets and model-generated content. Metrics include MCQ accuracy, relative/numeric error, orientation kernel scores, and agentic success rates, aggregated in a bottom-up weighted schema reflecting the hierarchy.

6. Empirical Findings and Training Strategies in the Cognitive SpatialTree

Evaluation of state-of-the-art MLLMs using SpatialTree-Bench reveals structural and transfer properties:

L1 skills are largely orthogonal ( $\bar r \approx 0.1$ among sub-abilities), while L3–L4 are highly interdependent ( $\bar r \gtrsim 0.7$ ). This suggests modular perceptual foundations but tightly coupled high-level reasoning and action abilities.
Supervised fine-tuning (SFT) on single L1 abilities often yields negative transfer to other L1s but positive transfer to higher levels. For example, distance SFT improves geometry (+3.2) but hurts motion (–2.0) and relation (–5.8), while boosting L2 understanding (+2.0) and L4 goal execution (+3.4).
Naïve RL "think" paradigms that reward extended chain-of-thought length optimize L3–L4 (CausalReasoning +5.2) but compromise L1 accuracy (e.g., sequential planning –13.0).
Auto-think strategies—penalizing "over-thinking" on L1 tasks and rewarding it in L3–L4—yield consistent improvements across the hierarchy (e.g., Geometry +3.3, OpenExpl +8.3), preserving "fast" perception and "slow" reasoning capabilities simultaneously.

These findings indicate a fundamentally asymmetric transfer dynamic and motivate differentiated, hierarchy-aware training curricula.

SpatialTree in the data-structural sense can be contextualized among a broader class of spatial hierarchies:

Tree Type	Balancing Mechanism	Update Cost	Query Cost	Memory Efficiency	Tunability	Reference
SpatialTree	Local $(K,\alpha)$	$O(\log n)$	$O(\log n)$	$O(n)$	$K,\alpha$ (adaptive, learned, metric-aware)	(Ellendula et al., 25 Apr 2025)
Kd-Tree	Median/hyperplane	$O(\log n)$ *	$O(\log n)$ *	$O(n)$	Depth, split axis, leaf capacity	(Dmitrenok et al., 2016)
Octree	Fixed octant splits	$O(\log n)$	$O(\log n)$	$O(n)$	Leaf capacity, min cell size	(Dmitrenok et al., 2016, Zhu et al., 2023)
R-Tree	MBR heuristics	$O(\log_M n)$	$O(\log_M n)$	$O(n)$	Node fanout $M$ , split heuristics	(Dmitrenok et al., 2016)
b-Tree (SPH)	Adaptive branching	$O(n)$ – $O(n\log n)$	$O(n)$ – $O(n\log n)$	$O(n)$	Bucket size, rebalance params $\alpha, \beta$	(Cavelan et al., 2019)
Scaled Hilbert	Data-driven refinement	$O(n)$	$O(\mathrm{depth})$	$O(n)$	Bucket size, permutation type	(Jahn et al., 2019)

Performance degrades in very high-dimensions ( $d\gg 3$ ) as the octree's $2^d$ fan-out incurs exponential branching costs and deteriorates distance-based pruning. Remedies include hybridizing with kd-tree logic, R-tree heuristics, or employing metric-adaptive splits. In the cognitive domain, transfer is asymmetric and naive RL training can hurt performance in foundational skills, motivating nuanced, hierarchy-aware optimization.

SpatialTree, in both structural and cognitive senses, serves as an organizing principle: for data, it provides adaptive, logarithmic-complexity maintenance of metric neighborhoods; for ability, it delivers a ground-truth taxonomy and diagnostic tool for the emergence and scaling of spatial intelligence. Both applications foreground adaptability, local balance, and hierarchical composition as central to efficient spatial reasoning and representation (Ellendula et al., 25 Apr 2025, Xiao et al., 23 Dec 2025).