LM-A*: Dual-Landmark Heuristic for Large Graphs
- LM-A* is a dual-landmark heuristic framework that generalizes ALT by leveraging polygon inequalities to compute tighter lower bounds for shortest-path search.
- It reduces preprocessing storage from Θ(|V|·|L|) to Θ(|V|+|L|²), enhancing efficiency in memory-critical applications like large road networks.
- Empirical results show LM-A* can cut query times by 30–60% and reduce node expansions by 2–3×, especially in long-range search scenarios.
Landmark Progression (LM-A*)—formally denoted as ALP (A*, Landmarks, Polygon inequalities)—is a dual-landmark heuristic framework that generalizes the classic ALT (A*, Landmarks, Triangle inequalities) approach for shortest-path search on large graphs. By leveraging generalized polygon inequalities over pairs of landmarks, LM-A* achieves significantly tighter heuristic lower bounds while dramatically reducing the preprocessing storage requirements compared to ALT. This makes it especially effective for large-scale road networks and similar environments where memory efficiency and search speed are critical (Jr, 2016).
1. Classical ALT and Its Generalization via LM-A*
The ALT method augments A* search by preselecting a set of landmarks $L \subseteq V$ and precomputing the shortest-path distance $d(v, \ell)$ for every vertex $v \in V$ and every landmark $\ell \in L$. During search, for a source $s$ and target $t$, the ALT heuristic is defined as:
$$h_{\text{ALT}}(s, t) = \max_{\ell \in L} \left| d(s, \ell) - d(\ell, t) \right|$$
which, by the triangle inequality ($d(s, t) \ge |d(s, \ell) - d(\ell, t)|$), provides an admissible lower bound on $d(s, t)$. LM-A* generalizes this by employing pairs of landmarks and applying polygon inequalities. Instead of bounding by a single triangle, LM-A* forms the quadrilateral $(s, \ell_1, \ell_2, t)$ and extracts several lower-bound estimators, ultimately taking the maximum to define the dual-landmark heuristic.
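As a rough illustration of the classic ALT bound, the following minimal sketch precomputes landmark distances with Dijkstra's algorithm and evaluates the heuristic. The function names, the adjacency-dict graph format, and the toy path graph are assumptions made for this example, and the absolute-value form assumes an undirected graph.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph {u: {v: w}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def alt_heuristic(landmark_dist, v, t):
    """Max over landmarks of |d(v, l) - d(l, t)| (undirected graph assumed)."""
    return max(abs(d[v] - d[t]) for d in landmark_dist.values())

# Hypothetical toy graph: a path a - b - c - d with unit weights.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
# ALT stores one full distance table per landmark.
landmark_dist = {l: dijkstra(graph, l) for l in ("a", "d")}
print(alt_heuristic(landmark_dist, "b", "d"))  # 2.0
```

On this path graph the bound equals the true distance from b to d, because both landmarks lie on the line through the two endpoints.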
2. Dual-Landmark Lower Bound Formulation
Given two distinct landmarks $\ell_1$ and $\ell_2$, and a search node $v$ with target $t$, LM-A* (ALP) derives six admissible lower bounds using reverse triangle and Ptolemaic inequalities over the quadrilateral $(v, \ell_1, \ell_2, t)$. The explicit lower-bound expressions used are:
- (2a) 6
- (2b) 7
- (2c) 8
- (2d) 9
- (2e) 0
- (2f) 1
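Each bound is at most the true distance $d(v, t)$. The LM-A* heuristic at $v$ for target $t$ is defined as:
$$h_{\text{LM}}(v, t) = \max\{\text{(2a)}, \text{(2b)}, \text{(2c)}, \text{(2d)}, \text{(2e)}, \text{(2f)}\}$$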
This procedure yields a strictly tighter lower bound in many cases, particularly for long-range source-target pairs (Jr, 2016).
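To make the idea of a quadrilateral bound concrete, here is a minimal sketch of one polygon-inequality bound. It is not a reproduction of the six expressions (2a) to (2f); it is an illustrative bound that follows from applying the triangle inequality twice on an undirected graph: $d(\ell_1, \ell_2) \le d(\ell_1, v) + d(v, t) + d(t, \ell_2)$, hence $d(v, t) \ge d(\ell_1, \ell_2) - d(v, \ell_1) - d(t, \ell_2)$. The function names and sample distances are assumptions.

```python
def quad_lower_bound(d_v_l1, d_l1_l2, d_t_l2):
    """Polygon-inequality bound d(v,t) >= d(l1,l2) - d(v,l1) - d(t,l2), clamped at 0."""
    return max(0.0, d_l1_l2 - d_v_l1 - d_t_l2)

def dual_landmark_heuristic(bounds):
    """The maximum of admissible lower bounds is itself an admissible lower bound."""
    return max(bounds, default=0.0)

# Hypothetical distances: points on a line with l1 at 0, v at 1, t at 8, l2 at 10,
# so the true distance d(v, t) is 7.
print(quad_lower_bound(1.0, 10.0, 2.0))      # 7.0
# Landmarks that are close together relative to v and t give a vacuous bound.
print(quad_lower_bound(5.0, 3.0, 4.0))       # 0.0
print(dual_landmark_heuristic([7.0, 0.0]))   # 7.0
```

The bound uses only the distance from each endpoint to one landmark plus the landmark-to-landmark distance, and it is informative mainly when the two landmarks are far apart relative to how close each endpoint is to its landmark.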
3. Preprocessing Strategy and Space Complexity
LM-A* implements a distributed embedding preprocessing paradigm, which minimizes storage:
- Landmark Selection: The graph $G = (V, E)$ is partitioned by community detection or similar algorithms. Within each partition, a single landmark is chosen (randomly or per ALT heuristics).
- Distance Storage: For each landmark $\ell$, shortest-path distances $d(v, \ell)$ are stored only for vertices $v$ in its partition (i.e., one per vertex). Additionally, all-pairs shortest-path distances $d(\ell_i, \ell_j)$ are stored for all landmark pairs.
- Memory Complexity: This approach requires $\Theta(|V| + |L|^2)$ space: one per-vertex distance, plus a full $|L| \times |L|$ matrix for landmark–landmark distances. By contrast, ALT requires $\Theta(|V| \cdot |L|)$ entries (every vertex to every landmark), yielding orders-of-magnitude larger storage for moderately sized $|L|$.
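The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the partition is supplied as a ready-made mapping `landmark_of` (no community detection is performed), the graph is an undirected adjacency dict, and the names `preprocess`, `to_landmark`, and `lm_dist` are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph {u: {v: w}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def preprocess(graph, landmark_of):
    """Keep one distance per vertex (to its own landmark) plus an |L| x |L| table."""
    landmarks = sorted(set(landmark_of.values()))
    to_landmark = {}  # vertex -> distance to the landmark of its partition
    lm_dist = {}      # (landmark, landmark) -> distance
    for l in landmarks:
        dist = dijkstra(graph, l)
        for v, owner in landmark_of.items():
            if owner == l:
                to_landmark[v] = dist[v]
        for other in landmarks:
            lm_dist[(l, other)] = dist[other]
    return to_landmark, lm_dist

# Hypothetical toy graph: a path a - b - c - d - e - f with unit weights,
# split into two partitions with landmarks "a" and "f".
graph = {
    "a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0}, "e": {"d": 1.0, "f": 1.0}, "f": {"e": 1.0},
}
landmark_of = {"a": "a", "b": "a", "c": "a", "d": "f", "e": "f", "f": "f"}
to_landmark, lm_dist = preprocess(graph, landmark_of)
print(len(to_landmark), len(lm_dist))  # 6 4
```

The stored tables hold 6 per-vertex entries and 4 landmark-pair entries, matching the $|V| + |L|^2$ count for this toy instance.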
4. LM-A* Query Algorithm
Query-time search proceeds as follows:
- Given source $s$ and target $t$, initialize A* search with $g(s) = 0$, $f(s) = h_{\text{LM}}(s, t)$, using a priority queue on $f$-values.
- For each search node $v$, the heuristic $h_{\text{LM}}(v, t)$ is computed by considering all landmark pairs that include the partition landmark of $v$. For each such pair, the six lower bounds of (2a–2f) are computed from the stored vertex-to-landmark and landmark-to-landmark distances as required.
- The maximum over these bounds, across all pairs considered, is returned as $h_{\text{LM}}(v, t)$.
The query pseudocode in the original work formalizes these steps and notes that in practice, storing small landmark “neighborhoods” per partition can tighten bounds, though the core approach operates with one landmark per partition (Jr, 2016).
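The query loop can be sketched as an A* variant that re-opens nodes when a cheaper path is found, which keeps the search correct for an admissible but inconsistent heuristic. This is not the pseudocode from the original work. The heuristic below uses a single illustrative polygon-inequality bound over the pair (partition landmark of the node, partition landmark of the target) rather than the six bounds (2a) to (2f), and the tables are hard-coded hypothetical values for a six-vertex path graph.

```python
import heapq

def astar(graph, s, t, h):
    """A* without a closed set: a node is re-opened whenever its g-value improves."""
    g = {s: 0.0}
    heap = [(h(s), s)]
    expansions = 0
    while heap:
        f, u = heapq.heappop(heap)
        if u == t:
            return g[t], expansions
        if f > g[u] + h(u):
            continue  # stale entry: a cheaper path to u was found after this push
        expansions += 1
        for v, w in graph[u].items():
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng  # (re-)open v with the improved cost
                heapq.heappush(heap, (ng + h(v), v))
    return float("inf"), expansions

# Hypothetical toy data: path a - b - c - d - e - f, landmarks "a" and "f".
graph = {
    "a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0}, "e": {"d": 1.0, "f": 1.0}, "f": {"e": 1.0},
}
landmark_of = {"a": "a", "b": "a", "c": "a", "d": "f", "e": "f", "f": "f"}
to_landmark = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 2.0, "e": 1.0, "f": 0.0}
lm_dist = {("a", "a"): 0.0, ("a", "f"): 5.0, ("f", "a"): 5.0, ("f", "f"): 0.0}

def make_heuristic(t):
    """Illustrative bound: d(l_v, l_t) - d(v, l_v) - d(t, l_t), clamped at 0."""
    def h(v):
        pair = (landmark_of[v], landmark_of[t])
        return max(0.0, lm_dist[pair] - to_landmark[v] - to_landmark[t])
    return h

cost, expansions = astar(graph, "a", "f", make_heuristic("f"))
print(cost)  # 5.0
```

In this toy instance the heuristic drops from 3 at c to 0 at d across an edge of weight 1, so it is inconsistent, yet the search still returns the optimal cost because improved nodes are pushed back onto the queue.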
5. Theoretical Properties: Admissibility, Consistency, and Complexity
- Admissibility: The dual-landmark heuristic $h_{\text{LM}}(v, t)$ is proven to be admissible; it never overestimates the true shortest-path distance.
- Consistency: In LM-A*, the heuristic may be inconsistent under distributed embedding, i.e., for an edge $(u, v)$ with weight $w(u, v)$, $h_{\text{LM}}(u, t)$ can exceed $w(u, v) + h_{\text{LM}}(v, t)$. Nodes may therefore need to be re-opened during A*.
- Computation: Query time per node is A*’s baseline plus $O(|L|)$ or $O(|L|^2)$ heuristic computations, dependent on the landmark pair strategy.
- Space Complexity: The main data structure requires $\Theta(|V| + |L|^2)$ memory, plus overhead for A* search structures.
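The consistency condition can be checked edge by edge. The sketch below is a minimal illustration: the function name `inconsistent_edges` is hypothetical, and the heuristic values are made-up numbers for a six-vertex path graph with a fixed target, chosen so that exactly one edge violates the condition.

```python
def inconsistent_edges(graph, h, eps=1e-9):
    """Return the edges (u, v) where h[u] > w(u, v) + h[v], i.e. consistency fails."""
    return [
        (u, v)
        for u, nbrs in graph.items()
        for v, w in nbrs.items()
        if h[u] > w + h[v] + eps
    ]

# Hypothetical toy data: path a - b - c - d - e - f with unit weights, target f.
graph = {
    "a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0}, "e": {"d": 1.0, "f": 1.0}, "f": {"e": 1.0},
}
# Admissible values (true distances to f are 5, 4, 3, 2, 1, 0) that drop sharply
# when crossing from one partition to the next.
h = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 0.0, "e": 0.0, "f": 0.0}
print(inconsistent_edges(graph, h))  # [('c', 'd')]
```

A single violating edge is enough to require re-opening logic in the search, even though every value remains a valid lower bound.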
A comparison table summarizes core complexity properties:
| Approach | Preprocessing Storage | Query Heuristic Cost |
|---|---|---|
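| ALT | $\Theta(\lvert V \rvert \cdot \lvert L \rvert)$ | $O(\lvert L \rvert)$ |
| LM-A* | $\Theta(\lvert V \rvert + \lvert L \rvert^2)$ | $O(\lvert L \rvert)$–$O(\lvert L \rvert^2)$ |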
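The storage formulas in the table can be evaluated directly. The sketch below counts stored distance entries for both schemes; the graph size and landmark count are illustrative values chosen for this example, not the benchmark figures from the original work.

```python
def alt_entries(num_vertices, num_landmarks):
    """ALT stores one distance per (vertex, landmark) pair."""
    return num_vertices * num_landmarks

def lm_astar_entries(num_vertices, num_landmarks):
    """Distributed embedding: one distance per vertex plus an |L| x |L| table."""
    return num_vertices + num_landmarks ** 2

# Illustrative sizes (assumed): one million vertices, one hundred landmarks.
n, k = 1_000_000, 100
print(alt_entries(n, k))       # 100000000
print(lm_astar_entries(n, k))  # 1010000
print(round(alt_entries(n, k) / lm_astar_entries(n, k)))  # 99
```

For these assumed sizes the ratio is roughly two orders of magnitude, and it grows with the number of landmarks as long as $|L|^2$ stays small relative to $|V|$.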
6. Empirical Performance and Benchmarks
Benchmarks on DIMACS road networks and synthetic graphs (up to 3 nodes) demonstrate the following:
- For short path queries (<50 hops), ALT and LM-A* have similar performance.
- For longer distances (>100 hops), LM-A* reduces the number of nodes expanded by 2–3× and achieves query time reductions of 30–60%.
- With 8, ALT requires 9 distance entries for a 0 node graph, while LM-A* only needs approximately 1 entries, representing a two-order-of-magnitude reduction in memory footprint (Jr, 2016).
7. Limitations, Open Questions, and Extensions
Key limitations and avenues for further investigation include:
- Heuristic Inconsistency: Node re-opening due to inconsistency can occur; the impact on performance in very large graphs has not been fully characterized.
- Landmark Selection: Current landmark choices (random, one-per-partition) may not be optimal. The potential of farthest-landmark or other advanced selection methods within distributed embedding remains open.
- Extension to Higher-Order Polygons: While LM-A* applies quadrilateral inequalities, hypothetically employing pentagon or higher-order polygon inequalities could further tighten bounds, but would increase preprocessing and query overhead.
- Parameter Balancing: Determining the optimal $|L|$, which governs both the $\Theta(|L|^2)$ space and the heuristic evaluation cost, versus query speedup is left for future tuning.
A plausible implication is that further advances in the theory and practice of distributed embedding, polygonal inequalities, and landmark selection may yield even more effective preprocessing/search trade-offs while deepening our understanding of heuristic search on massive graphs (Jr, 2016).