
Latent Landmark Graphs Overview

Updated 25 November 2025
  • Latent landmark graphs are structured representations where nodes denote key landmarks in a latent space and edges encode reachability, ordering, or causal relationships.
  • They integrate methodologies like farthest point sampling, spectral embedding, and contrastive learning to enhance planning, reinforcement learning, and manifold learning.
  • Empirical studies show significant improvements in RL success rates and clustering metrics, while also highlighting challenges in scalability, uncertainty, and dynamic adaptation.

Latent landmark graphs are structured representations wherein nodes correspond to discrete or abstracted “landmarks” in an underlying space—typically a latent, high-dimensional, or combinatorial space—while edges capture reachability, ordering, or transition relationships that are crucial for efficient planning, learning, or representation. Originating as a bridge between sample-efficient learning and scalable symbolic planning, latent landmark graphs are now integral to state abstraction in reinforcement learning, robust manifold embedding, and generalized planning across problem instances and domains.

1. Formal Definitions and Structural Principles

A latent landmark graph comprises two core components: a set of landmark nodes and a collection of edges encoding probabilistic, geometric, or causal relationships.

  • Landmarks as Informative Subsets or Abstract Prototypes: In continuous domains, such as model-based RL or graph embeddings, landmarks are typically points (or small regions) in a learned latent space that either maximize coverage (as in k-means++ or farthest point sampling) or capture statistically salient features (Zhang et al., 2020, Zhang et al., 2023). In planning, they correspond to parameterized predicate schemas that are critical for reaching goals (Pérez-Corral et al., 21 Sep 2025).
  • Graph Structure: Edges between landmarks encode diverse semantics: reachability costs or Q-values (in RL), weighted orderings or causal preconditions (planning), or spectral diffusion geometry (manifold learning).

Probabilistic and lifted formulations allow latent landmark graphs to generalize across problem instances by embodying domain-invariant structural knowledge rather than instance-specific facts (Pérez-Corral et al., 21 Sep 2025). Formal constructions frequently involve:

  • Latent embedding functions $\phi : S \to \mathbb{R}^d$ or domain-to-graph mappings
  • Edge weightings $w_{i,j}$ defined by Q-functions, transition models, or counting-based statistics
  • Subgraph selection criteria, computationally driven by scalability and task demands
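These components can be illustrated with a minimal sketch (the function and variable names here are illustrative, not drawn from any of the cited papers): an embedding produces latent points, a chosen subset becomes the nodes, and a pluggable weight function produces the edges.

```python
import numpy as np

def build_landmark_graph(latents, landmark_idx, edge_weight):
    """Assemble a latent landmark graph as (nodes, weighted edges).

    latents:      (n, d) array of latent embeddings phi(s)
    landmark_idx: indices of states selected as landmark nodes
    edge_weight:  callable (z_i, z_j) -> w_ij, standing in for a
                  Q-function, transition model, or count statistic
    """
    nodes = {i: latents[i] for i in landmark_idx}
    edges = {(i, j): edge_weight(latents[i], latents[j])
             for i in landmark_idx for j in landmark_idx if i != j}
    return nodes, edges
```

Swapping the `edge_weight` callable is what distinguishes the RL, planning, and manifold-learning instantiations described below.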

2. Construction Methodologies Across Domains

a) Latent Manifold and Diffusion Embeddings

Neumann eigenmaps, as introduced by Sule & Czaja (2024), extend diffusion map embeddings with landmark subgraphs to yield efficient, stable representations (Sule et al., 10 Feb 2025):

  • Landmark subset δS\delta S is sampled from data; residual subset SS defines the subgraph.
  • The Neumann Laplacian LSN=LSDBS(TSδ)1BSL_S^N = L_S^D - B_S^\top (T_S^\delta)^{-1} B_S incorporates boundary information, enabling a reflecting random walk interpretation.
  • Eigenmap coordinates ΨN(i)=(ϕ2(i),,ϕd+1(i))\Psi_N(i) = (\phi_2(i), \ldots, \phi_{d+1}(i)) recover diffusion distances via the spectral embedding of the subgraph, with extensions to held-out data via a Nyström-style out-of-sample condition.
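A simplified numerical sketch of the Neumann Laplacian $L_S^N = L_S^D - B_S^\top (T_S^\delta)^{-1} B_S$ follows, assuming $T_S^\delta$ is the diagonal matrix of landmark degrees restricted to edges into $S$ (the paper's exact definitions may differ):

```python
import numpy as np

def neumann_laplacian(W, landmark_idx):
    """Neumann Laplacian on the residual subgraph:
    L_S^N = L_S^D - B_S^T (T_S^delta)^{-1} B_S.

    W:            (n, n) symmetric affinity matrix of the full graph
    landmark_idx: indices forming the landmark set delta S; the
                  remaining indices form the residual set S
    """
    n = W.shape[0]
    S = np.setdiff1d(np.arange(n), landmark_idx)
    dS = np.asarray(landmark_idx)
    W_SS = W[np.ix_(S, S)]            # residual-residual affinities
    B = W[np.ix_(dS, S)]              # boundary block: delta S -> S
    D_S = np.diag(W[S].sum(axis=1))   # full degrees of residual nodes
    L_SD = D_S - W_SS                 # Dirichlet-style Laplacian on S
    T = np.diag(B.sum(axis=1))        # landmark-to-residual degrees
    return L_SD - B.T @ np.linalg.inv(T) @ B
```

With this choice of $T_S^\delta$, the rows of $L_S^N$ sum to zero, which is the algebraic counterpart of the reflecting random walk interpretation: probability mass leaving $S$ through the boundary is reflected back in.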

b) Model-Based Reinforcement Learning

In RL, latent landmark graphs encapsulate long-horizon exploration and planning by discretizing the latent space via subgoal prototypes (Zhang et al., 2020, Zhang et al., 2023):

  • Node Selection: Greedy Latent Sparsification or Farthest Point Sampling creates a set of landmarks $\mathcal{L}$ that maximizes latent-space coverage.
  • Edge Estimation: Edges are weighted by learned reachability Q-functions $Q_{\rm reach}(z, \ell_j)$, reflecting expected discounted hitting times or utility between subgoals.
  • Graph-Based Planning: Graph-search algorithms (Dijkstra, A*, soft-Floyd) over this structure enable global planning via intermediate subgoals, improving sample efficiency and long-horizon control robustness.
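The node-selection and planning steps can be sketched as farthest point sampling followed by Dijkstra search over edge costs (a minimal illustration; the cost semantics, e.g. distances standing in for learned reachability estimates, are an assumption):

```python
import heapq
import numpy as np

def farthest_point_sampling(latents, k, seed=0):
    """Greedily pick k landmark indices that maximize latent coverage."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(latents)))]
    d = np.linalg.norm(latents - latents[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))       # farthest from all chosen so far
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(latents - latents[nxt], axis=1))
    return chosen

def dijkstra(weights, src):
    """Shortest-path costs over landmark edges.

    weights: dict (i, j) -> cost, standing in for e.g. estimated
             discounted hitting times between subgoals.
    """
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for (a, b), w in weights.items():
            if a == u and d + w < dist.get(b, float("inf")):
                dist[b] = d + w
                heapq.heappush(pq, (d + w, b))
    return dist
```

The resulting shortest paths give sequences of intermediate subgoals for the low-level policy to pursue.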

c) Symbolic Planning and Problem Generalization

The probabilistic lifted ordering graph (p-LOG) framework generalizes landmark extraction across sets of planning tasks (Pérez-Corral et al., 21 Sep 2025):

  • Lifted Landmarks: Nodes are parameterized schemas (e.g., $p(x_1, \ldots, x_k)$); edges encode probabilistic precedence relations $w(L_i', L_j')$ calibrated over multiple task instances.
  • Instantiating to New Tasks: The two-phase procedure generates and aligns graphs from both initial state and goal state using variable-domain constraints and matching strategies. Probabilistic, reusable substructures yield high recall and precision when adapted to new tasks, setting a new baseline in generalized planning.
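A counting-based sketch of how probabilistic precedence weights might be estimated from a collection of training tasks (an illustration of the idea of calibrating edges over task instances, not the p-LOG algorithm itself):

```python
from collections import Counter

def precedence_weights(task_orderings):
    """Estimate w(L_i, L_j) as the fraction of training tasks in
    which landmark schema L_i was observed before L_j, among tasks
    containing both schemas.

    task_orderings: list of lists, each the landmark schemas
                    observed (in order) for one training task
    """
    pair_counts, presence = Counter(), Counter()
    for order in task_orderings:
        for i, a in enumerate(order):
            for b in order[i + 1:]:
                pair_counts[(a, b)] += 1   # a preceded b in this task
        for a in order:
            for b in order:
                if a != b:
                    presence[(a, b)] += 1  # both schemas co-occurred
    return {p: pair_counts[p] / presence[p] for p in presence}
```

Edges with weight below 1 then encode orderings that hold in some but not all training tasks, which is exactly the uncertainty discussed in Section 6.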

3. Learning, Optimization, and Algorithmic Frameworks

The learning of latent landmark graphs employs specialized objectives depending on the application domain:

Reinforcement Learning

  • Contrastive Subgoal Representations: Joint contrastive and regularization losses induce temporally coherent subgoal embeddings (as in HILL (Zhang et al., 2023)), promoting both local continuity and global discriminability.
  • Dual Novelty–Utility Measures: For exploration–exploitation balancing, novelty is computed from discounted latent occupancy; utility leverages UVFA-estimated values, enabling strategic subgoal selection along both axes.
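A hypothetical sketch of dual novelty–utility scoring for subgoal selection (the count-based novelty form and the mixing weight `alpha` are assumptions, not taken from HILL):

```python
import numpy as np

def landmark_scores(visit_counts, values, alpha=0.5):
    """Combine novelty and utility into one selection score.

    visit_counts: discounted latent occupancy per landmark
                  (higher = more familiar)
    values:       UVFA-style value estimates per landmark
    alpha:        exploration-exploitation mixing weight
    """
    counts = np.asarray(visit_counts, dtype=float)
    novelty = 1.0 / np.sqrt(1.0 + counts)   # count-based novelty bonus
    utility = np.asarray(values, dtype=float)
    # normalize each axis so the two measures are comparable
    novelty = novelty / (novelty.max() + 1e-8)
    utility = utility / (np.abs(utility).max() + 1e-8)
    return alpha * novelty + (1 - alpha) * utility
```

Rarely visited landmarks receive a higher score at equal value, steering the agent toward under-explored regions of the latent space.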

Manifold Learning

  • Spectral Decomposition: Computational efficiency stems from restricting eigenproblems to landmark subgraphs, with resulting embeddings interpolated to the remainder via boundary-aware Nyström extension (Sule et al., 10 Feb 2025).
  • Reflecting Random Walks: Both theoretical diffusion distances and empirical stability under dataset perturbation depend on the Neumann construction’s preservation of reflecting stochastic geometry.
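The interpolation step can be sketched with a generic Nyström-style out-of-sample extension (the standard diffusion-map form, not the boundary-aware NeuMaps variant):

```python
import numpy as np

def nystrom_extend(W_new, eigvecs, eigvals):
    """Embed new points from their affinities to in-sample points.

    W_new:   (m, n) affinities of m new points to the n embedded points
    eigvecs: (n, d) spectral coordinates phi_k of the in-sample points
    eigvals: (d,) corresponding nonzero eigenvalues
    """
    P = W_new / W_new.sum(axis=1, keepdims=True)  # row-stochastic walk
    return P @ eigvecs / eigvals                  # psi_k = (P phi_k)/lambda_k
```

If a "new" point duplicates an in-sample point, the extension exactly reproduces its in-sample coordinates, since the eigenvectors satisfy $P\phi_k = \lambda_k \phi_k$.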

Generalized Planning

  • Probabilistic and Lifted Relational Induction: Edges in p-LOG are weighted by empirical occurrence rates over collections of tasks, supporting scalable, probabilistically sound domain abstraction.

4. Empirical Advances and Performance Outcomes

Latent landmark graphs deliver measurable performance improvements across multiple domains:

| Task/Domain | Metric | Latent Landmark Graphs | Baseline (Best Prior) |
|---|---|---|---|
| FetchPush (RL) | Success rate | 78% (Zhang et al., 2020) | 33% (HER+SAC) |
| AntMaze (RL) | Success rate | 92% (Zhang et al., 2020) | 45% (HER+SAC) |
| UCI Digits (Emb.) | Clustering NMI | 0.85 (Sule et al., 10 Feb 2025) | 0.71 (Roseland) |
| Barman (Planning) | Landmark F1 (avg) | 0.93 (Pérez-Corral et al., 21 Sep 2025) | 0.72 (True Baseline) |
| Ant FourRooms (RL) | Sample eff. & asymptotic | HILL best (Zhang et al., 2023) | HIRO/HESS (worse) |

Latent landmark graphs in RL settings have demonstrated both faster convergence and higher asymptotic success rates. In symbolic planning, p-LOG and p-LGG representations deliver a significant increase in recall (>30 percentage points in some settings) with only minor losses in precision. NeuMaps in manifold learning preserve stability and clustering tightness even under substantial data pruning.

5. Integration with Planning, Learning, and Search Heuristics

Latent landmark graphs integrate flexibly into a wide range of AI pipelines:

  • Heuristic Extraction: Probabilistic landmark orderings or counts yield powerful admissible heuristics for search (e.g., $h_{LM}$, LM-cut with probabilistic weights) (Pérez-Corral et al., 21 Sep 2025).
  • Subgoal Planning and Curriculum: RL agents exploit landmark-based decompositions for hierarchical planning, global search, or dynamic curriculum construction, often without recourse to direct model prediction (Zhang et al., 2020, Zhang et al., 2023).
  • Diffusion Geometry and Out-of-Sample Extension: NeuMaps support robust, out-of-sample manifold embeddings and enable diffusion-based distances that mirror reflective random walks (Sule et al., 10 Feb 2025).
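The landmark-count flavor of heuristic extraction can be sketched as a weighted sum over unachieved landmarks (the probabilistic weighting is an assumed extension of the classic landmark-count heuristic):

```python
def h_lm(achieved, landmark_weights):
    """Weighted landmark-count heuristic sketch.

    achieved:         set of landmark schemas already achieved
    landmark_weights: dict schema -> weight (e.g., empirical
                      probability that the landmark is required)
    """
    return sum(w for lm, w in landmark_weights.items()
               if lm not in achieved)
```

As landmarks are achieved along a search path, the heuristic value decreases monotonically toward zero at the goal.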

6. Limitations, Open Challenges, and Future Directions

Despite their advantages, latent landmark graphs present critical open challenges:

  • Landmark Selection Tradeoffs: The choice of landmark budget $N$ affects coverage, computation, and robustness. Poor embeddings or suboptimal coverage can yield misleading planning graphs (Zhang et al., 2020).
  • Probabilistic Edge Semantics: Uncertainty in edge weights (e.g., p-LOG edges with $w < 1$) may produce false positives or negatives, weakening planning guarantees. Balancing precision and recall under uncertainty requires further study (Pérez-Corral et al., 21 Sep 2025).
  • Scalability and High-Arity Domains: Both symbolic and latent graphs can become complex in domains with large numbers of objects, predicates, or latent dimensions.
  • Extension to Continuous and Dynamic Graphs: Extending landmark graphs from discrete to continuous settings, or enabling dynamic adaptation over long horizons, remains a nascent research area.
  • Theoretical Guarantees: Rigorous analysis of planning optimality and error accumulation in RL settings with imperfect reachability or abstraction quality remains an open area (Zhang et al., 2020, Zhang et al., 2023).

7. Cross-Domain Impact and Theoretical Significance

Latent landmark graphs constitute a unifying abstraction for multi-resolution reasoning across RL, symbolic planning, and geometric machine learning. Empirical gains in both sample efficiency and policy robustness, along with enhanced stability in high-dimensional representations, underscore their utility as foundational structures for scalable reasoning. The hybridization of probabilistic lifting (for generalization), spectral geometry (for efficiency and stability), and learned utility/novelty (for goal-driven exploration) positions latent landmark graphs as a versatile paradigm at the intersection of deep learning and automated planning (Sule et al., 10 Feb 2025, Pérez-Corral et al., 21 Sep 2025, Zhang et al., 2020, Zhang et al., 2023).
