LaMMA-P: Methods in Planning, Estimation, & Optimization
- The paper presents a novel integration of LLM-driven task decomposition with formal PDDL-based heuristic planning, achieving over 100% relative gain in multi-agent robotic coordination success rates.
- In statistical modeling, LaMMA-P introduces a likelihood-adaptively penalized estimator that generalizes elastic-net penalties by tuning to local curvature in exponential-family regressions.
- The modified augmented Lagrangian method in LaMMA-P reformulates quadratic-penalty problems using auxiliary variables to preserve convergence and mitigate ill-conditioning.
LaMMA-P denotes a family of methods and systems in several contemporary research domains, comprising: (1) a generalizable multi-agent long-horizon planning framework that combines LLM–driven translation of high-level instructions with classical heuristic search (Zhang et al., 2024), (2) a likelihood-adaptively penalized estimator for model selection and sparse regression (Feng et al., 2013), and (3) a modified augmented Lagrangian approach for unconstrained quadratic-penalty minimization (Neuenhofen, 2018). Each usage is context-specific and technically distinct.
1. LLM–Driven Multi-Agent PDDL Planner (LaMMA-P) (Zhang et al., 2024)
LaMMA-P is a modular long-horizon task planning framework integrating LLMs with automated heuristic PDDL planners for multi-robot instruction following. The system translates a single natural-language command into coordinated, parallelizable sub-plans for a heterogeneous robot team. The pipeline marries the semantic reasoning capacity of LLMs with the efficiency and executability guarantees of formal PDDL-based planners, supporting robust generalization across task categories.
The high-level architecture comprises:
- Precondition Identifier (P): LLM-driven module decomposing NL instructions into subtasks with formalized preconditions and effects.
- Task Allocator: Uses agent skill profiles and LLM guidance to assign subtasks to appropriate agents, maximizing expected execution success under skill-satisfaction constraints.
- Problem Generator (G): LLM-formulated PDDL problem files per agent, leveraging world state, agent domains, and subtask allocations.
- PDDL Validator (V): Syntax and logical consistency checker on PDDL output.
- Fast Downward/Heuristic Planner: Executes planning using relaxed-plan (FF) heuristics and A* over per-agent PDDL problems, with LM-driven repair if plans are invalid or high-cost.
- Sub-Plan Combiner: Integrates valid per-agent plans into a global schedule optimized for parallelism or explicit dependencies.
- Plan-to-Code/Execution: Translates plan schedules into executable commands for either a simulator (AI2-THOR) or real robots.
Prompt engineering relies on deterministic few-shot LLM prompts (temperature=0.0, top_p=1.0) to keep outputs reproducible across all reasoning and generation steps.
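To make the Problem Generator stage concrete, the sketch below renders a minimal per-agent PDDL problem file of the kind such a module might emit. The domain name, predicates, and objects are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch of a per-agent PDDL problem file, of the kind the
# Problem Generator (G) might emit. Domain, predicate, and object names
# are illustrative placeholders, not the paper's.

def make_pddl_problem(agent: str, objects: dict, init: list, goals: list) -> str:
    """Render a PDDL problem file for a single agent's subtask."""
    objs = "\n    ".join(f"{name} - {typ}" for name, typ in objects.items())
    init_facts = "\n    ".join(f"({fact})" for fact in init)
    goal_facts = "\n      ".join(f"({g})" for g in goals)
    return (
        f"(define (problem {agent}-subtask)\n"
        f"  (:domain household)\n"
        f"  (:objects\n    {objs})\n"
        f"  (:init\n    {init_facts})\n"
        f"  (:goal (and\n      {goal_facts})))\n"
    )

problem = make_pddl_problem(
    agent="robot1",
    objects={"robot1": "robot", "mug": "object", "sink": "location"},
    init=["at robot1 sink", "holding robot1 mug"],
    goals=["clean mug"],
)
print(problem)
```

A file in this shape can be handed directly to Fast Downward together with the matching domain file.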
2. Task Decomposition, Allocation, and Heuristic Planning
Task decomposition proceeds with LLM extraction of subtask sequences, each annotated by required skill sets. A mapping from subtasks to agents is constructed to maximize the probability of successful multi-agent execution. The assignment process prioritizes subtasks with the largest skill requirements and seeks agents with matching capabilities and minimum estimated workload.
Planning is then performed over the per-agent PDDL problems using Fast Downward with the relaxed-plan (FF) heuristic h_FF, which estimates remaining cost from a plan for the delete-relaxed problem, and A* search, which expands nodes in increasing order of f(n) = g(n) + h(n). Failures due to domain mismatch or high plan cost invoke an LLM-based fallback planner for prompt-level plan repair.
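The greedy allocation strategy described above (assign skill-heavy subtasks first, to a capable agent with minimum current workload) can be sketched as follows; the data shapes (sets of skill strings, unit workload per subtask) are assumptions for illustration, not the paper's interfaces.

```python
# Greedy allocation sketch: process subtasks with the largest skill
# requirements first, assigning each to a capable agent with the lowest
# current workload. Data shapes are illustrative assumptions.

def allocate(subtasks: dict, agents: dict) -> dict:
    """subtasks: name -> required skill set; agents: name -> skill set.
    Returns a subtask -> agent mapping, or raises if no agent qualifies."""
    workload = {a: 0 for a in agents}
    assignment = {}
    # Subtasks with larger skill requirements are assigned first.
    for task in sorted(subtasks, key=lambda t: len(subtasks[t]), reverse=True):
        capable = [a for a, skills in agents.items() if subtasks[task] <= skills]
        if not capable:
            raise ValueError(f"no agent can perform {task}")
        chosen = min(capable, key=lambda a: workload[a])
        assignment[task] = chosen
        workload[chosen] += 1
    return assignment

agents = {"arm_bot": {"pick", "place"}, "mobile_bot": {"navigate"}}
subtasks = {"fetch_mug": {"navigate"}, "stack_plates": {"pick", "place"}}
print(allocate(subtasks, agents))
# {'stack_plates': 'arm_bot', 'fetch_mug': 'mobile_bot'}
```

A workload tie-break by estimated subtask cost, rather than unit counts, would be a natural refinement.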
3. Benchmarking: MAT-THOR Evaluation Suite
MAT-THOR, constructed atop the AI2-THOR simulation environment, provides 70 tasks spanning three categories: compound (2–4 subtasks), complex (≥6, with heterogeneous dependencies), and vague (ambiguous high-level NL). Evaluation metrics include success rate (SR), goal condition recall (GCR), robot utilization (RU), executability (Exe), and efficiency (Eff). Results demonstrate marked improvements over SMART-LLM across all metrics and for all task categories:
| Metric | LaMMA-P (GPT-4o) | SMART-LLM (GPT-4o) | Relative Gain |
|---|---|---|---|
| SR | 0.72 | 0.35 | +105% |
| Eff | 0.74 | 0.54 | +36% |
| GCR | 0.75 | 0.40 | |
| RU | 0.82 | 0.63 | |
| Exe | 0.97 | 0.79 | |
Task breakdown reveals that LaMMA-P retains significantly higher SR even on fully “vague” instructions, unlike baseline systems which collapse to zero (Zhang et al., 2024).
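The "Relative Gain" column follows the usual definition, (new − baseline) / baseline; the small differences from a direct recomputation (e.g., +105% reported vs. ~+106% from the rounded scores) presumably reflect unrounded underlying values.

```python
# Relative gain as used in the table: (new - baseline) / baseline, in percent.
def rel_gain(new: float, baseline: float) -> float:
    return (new - baseline) / baseline * 100

print(round(rel_gain(0.72, 0.35)))  # SR: 106, close to the reported +105%
print(round(rel_gain(0.74, 0.54)))  # Eff: 37, close to the reported +36%
```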
4. Generalization and Failure Characteristics
LaMMA-P exhibits notable zero-shot generalization, retaining non-zero SR on instance types not seen in training. Limitations include:
- Static world assumption (full observability, static obstacles).
- Inability to handle instructions invoking out-of-domain skills ("unplug microwave") due to domain model limitations at precondition identification and PDDL generation stages.
- LLM hallucinations or prompt failures under instructions with extreme ambiguity or deep dependency chains.
A plausible implication is that future work must address partial observability and dynamic world adaptation, potentially via vision-language integration or adaptive online planning modules.
5. Likelihood Adaptively Modified Penalties (Statistical LaMMA-P) (Feng et al., 2013)
In the statistical modeling context, LaMMA-P refers to "Likelihood Adaptively Modified Penalties," a nonconvex penalization method for model selection within exponential-family regression:
- The penalty function is constructed from the cumulant function of the exponential family, so its shape adapts to the likelihood's local curvature.
- Three tuning parameters control the penalty level, the concavity, and the location of the penalty, respectively.
- The penalized likelihood estimator achieves (under regularity) estimation consistency, model selection consistency, and strong asymptotic stability.
- An efficient coordinate-descent algorithm exploits IRLS surrogates and soft-thresholding updates.
This LAMP penalty adapts to the local curvature of the log-likelihood, generalizing the behavior of elastic-net, truncated, and sigmoid-type penalties across different GLM settings.
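The coordinate-descent machinery referenced above is standard; the sketch below shows the soft-thresholding update for a plain lasso (L1) penalty on a Gaussian linear model, the simplest fixed-shape case. The LAMP-specific adaptive penalty and the IRLS reweighting for non-Gaussian families are omitted, so this is a degenerate stand-in, not the paper's estimator.

```python
import numpy as np

# Coordinate descent with soft-thresholding, shown for a plain lasso
# penalty on a Gaussian linear model. LAMP replaces the fixed L1 penalty
# with a likelihood-adapted one and wraps this loop in IRLS reweighting
# for non-Gaussian families; those pieces are omitted here.

def soft_threshold(z: float, t: float) -> float:
    return np.sign(z) * max(abs(z) - t, 0.0)

def cd_lasso(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r / n, lam) / (col_sq[j] / n)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(100)
beta_hat = cd_lasso(X, y, lam=0.1)
print(np.round(beta_hat, 2))  # nonzero near indices 0 and 2, others shrunk
```

Swapping the fixed threshold `lam` for a coefficient- and curvature-dependent one is the essential change an adaptive penalty of this kind introduces.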
6. Modified Augmented Lagrangian Method for Quadratic-Penalty Problems (Neuenhofen, 2018)
LaMMA-P also denotes the "Modified Augmented Lagrangian Method" for unconstrained minimization problems with large quadratic penalty terms, i.e., objectives of the form f(x) + (ω/2)‖c(x)‖² with penalty parameter ω.
To avoid the ill-conditioning that arises as ω → ∞, a lifted reformulation introduces auxiliary variables, allowing ALM convergence properties to be preserved at a fixed, moderate ω. The resulting method:
- Alternates between Newton-type root-finding for optimality and merit-function based globalization;
- Maintains flexibility to return unconstrained solutions at prescribed penalty levels, circumventing the signature “banana-valley” pathology of traditional penalty methods;
- Demonstrates competitive or superior performance to both classical ALM and pure penalty minimization in numerical benchmarks.
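A toy example makes the conditioning argument concrete: driving the constraint residual to zero with a pure quadratic penalty requires the penalty weight to grow without bound, and the inner Hessian's condition number grows with it, whereas a classical ALM multiplier update reaches the exact solution at a fixed moderate weight. This is a generic illustration of the motivation, not the paper's lifted method.

```python
import numpy as np

# Toy illustration (not the paper's method): minimize f(x) = ||x||^2
# subject to c(x) = x1 + x2 - 1 = 0. The pure quadratic penalty needs
# omega -> infinity and its Hessian condition number grows like 1 + omega,
# while ALM reaches the exact solution at a fixed, moderate omega by
# updating the multiplier lam <- lam + omega * c(x).

a = np.array([1.0, 1.0])

def penalty_min(omega):
    # argmin ||x||^2 + (omega/2)(a@x - 1)^2, solved in closed form
    H = 2 * np.eye(2) + omega * np.outer(a, a)
    return np.linalg.solve(H, omega * a), np.linalg.cond(H)

def alm(omega=10.0, iters=20):
    lam = 0.0
    for _ in range(iters):
        # inner minimization of ||x||^2 + lam*c(x) + (omega/2)c(x)^2
        H = 2 * np.eye(2) + omega * np.outer(a, a)
        x = np.linalg.solve(H, (omega - lam) * a)
        lam += omega * (a @ x - 1.0)   # multiplier update
    return x

x_pen, cond = penalty_min(omega=1e6)
print(cond)   # ~1e6: severely ill-conditioned inner problem
print(alm())  # close to the exact solution [0.5, 0.5] at omega = 10
```

The lifted reformulation in LaMMA-P pursues the same goal, bounded conditioning at fixed ω, through auxiliary variables rather than explicit multiplier updates.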
7. Future Directions
Future directions for the multi-agent planning LaMMA-P system include integrating vision-language perception to handle partial observability, developing online adaptive re-planning mechanisms for dynamic environments, and automating the expansion of domain models by learning new skills and operators through interaction. In the statistical and optimization LaMMA-P contexts, further analysis of theoretical properties and applications beyond the current regression and quadratic-penalty settings is warranted.
References
- “LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner” (Zhang et al., 2024)
- “Likelihood Adaptively Modified Penalties” (Feng et al., 2013)
- “Modified Augmented Lagrangian Method for the minimization of functions with quadratic penalty terms” (Neuenhofen, 2018)