GABAR: Graph Attention-Based Action Ranking for Relational Policy Learning (2412.04752v2)

Published 6 Dec 2024 in cs.LG

Abstract: We propose a novel approach to learn relational policies for classical planning based on learning to rank actions. We introduce a new graph representation that explicitly captures action information and propose a Graph Neural Network architecture augmented with Gated Recurrent Units (GRUs) to learn action rankings. Our model is trained on small problem instances and generalizes to significantly larger instances where traditional planning becomes computationally expensive. Experimental results across standard planning benchmarks demonstrate that our action-ranking approach achieves generalization to significantly larger problems than those used in training.

Summary

The paper introduces GABAR, a novel GNN-based method using graph attention and GRUs to directly learn action rankings, offering improved generalization in classical planning.
Experimental evaluation across six domains shows that GABAR achieves high coverage and substantial generalization, solving instances larger than those seen in training with competitive plan quality.
GABAR demonstrates that learning action rankings can enhance scalability and efficiency in classical planning, effectively balancing computational needs with the ability to handle complex domains.

GABAR: Graph Attention-Based Action Ranking for Relational Policy Learning

The paper introduces GABAR (Graph Attention-Based Action Ranking), an approach for learning action-ranking policies for classical planning. It centers on a Graph Neural Network (GNN) architecture combined with Gated Recurrent Units (GRUs), designed to improve generalization in relational policy learning, particularly on problem sizes where traditional planning becomes computationally expensive.
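To make the idea of an action-ranking policy concrete, here is a minimal sketch. It assumes each applicable action already has a scalar score (in GABAR these would come from the learned network) and uses a generic logistic pairwise ranking loss; the paper's actual loss, the function names, and the example actions are assumptions for illustration only.

```python
import math


def pairwise_ranking_loss(scores, best_index):
    """Generic logistic pairwise loss (an assumption, not necessarily the paper's loss).

    Penalizes every action whose score is close to or above the score of the
    action marked as best, so training pushes the best action to the top.
    """
    best = scores[best_index]
    others = [s for i, s in enumerate(scores) if i != best_index]
    if not others:
        return 0.0
    # log(1 + exp(-(best - other))) shrinks toward 0 as the margin grows.
    return sum(math.log1p(math.exp(-(best - s))) for s in others) / len(others)


def select_action(actions, scores):
    """Ranking policy: execute the highest-scored applicable action."""
    return max(zip(actions, scores), key=lambda pair: pair[1])[0]


# Hypothetical applicable actions in a Blocks World state.
actions = ["pickup(a)", "pickup(b)", "stack(a, b)"]
good_scores = [0.1, -0.3, 2.5]  # the desired action (index 2) is ranked first
bad_scores = [2.0, 0.0, 0.5]    # the desired action is ranked below another

chosen = select_action(actions, good_scores)
loss_good = pairwise_ranking_loss(good_scores, best_index=2)
loss_bad = pairwise_ranking_loss(bad_scores, best_index=2)
```

Only the relative order of the scores matters to the policy, which is why a ranking objective can be sufficient without estimating accurate state values.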

Key Contributions and Methodology

GABAR's central innovation lies in its ability to learn action rankings directly rather than computing complex value functions, which are often computationally intensive and less generalizable. This is executed through several notable components:

Graph Representation: A structured graph representation encodes action information distinctly, with nodes representing predicates, objects, action schemas, and a global node integrating graph-level knowledge. This novel representation encapsulates both the relationships among objects and actions and the semantics needed for efficient policy learning.
GNN Architecture: By integrating GNNs with GRUs, the model performs iterative message passing to update node and edge embeddings, capturing both local and global dependencies. This facilitates learning across various problem sizes, a critical aspect for scalable planning.
Action Decoder via GRUs: The model employs a sequential decoder leveraging GRUs to construct complete grounded actions. By using beam search during training, the model improves exploration of action sequences, enhancing its ability to generalize learned policies.
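The sequential decoding step can be illustrated with a small beam search that first picks an action schema and then fills in its arguments one object at a time. This is a sketch under stated assumptions: `score_step` is a hypothetical stand-in for the learned GRU decoder, the `schemas` format and `toy_score` are invented for the example, and the paper's exact procedure may differ.

```python
import heapq


def beam_decode(schemas, objects, score_step, beam_width=2):
    """Build grounded actions step by step: schema first, then one argument at a time.

    schemas: dict mapping schema name -> arity (hypothetical format).
    score_step: hypothetical callable (partial_action, candidate) -> log-score,
                standing in for a learned decoder.
    Returns completed actions as (total_score, (schema, arg1, ...)), best first.
    """
    # Step 0: score every action schema and keep the best few.
    beam = heapq.nlargest(
        beam_width, [(score_step((), name), (name,)) for name in schemas]
    )
    finished = []
    while beam:
        expanded = []
        for total, partial in beam:
            arity = schemas[partial[0]]
            if len(partial) - 1 == arity:
                # All arguments chosen: the grounded action is complete.
                finished.append((total, partial))
                continue
            for obj in objects:
                step = score_step(partial, obj)
                expanded.append((total + step, partial + (obj,)))
        # Prune to the beam width before decoding the next argument.
        beam = heapq.nlargest(beam_width, expanded)
    return sorted(finished, reverse=True)


def toy_score(partial, candidate):
    """Toy scorer (illustrative only): fixed preferences plus simple rules."""
    prefs = {"stack": 0.0, "pickup": -1.0, "a": 0.0, "b": -0.5, "c": -2.0}
    score = prefs[candidate]
    if candidate in partial:
        score -= 5.0   # discourage repeating an object
    if len(partial) == 1 and candidate == "a":
        score += 0.25  # prefer "a" as the first argument
    return score


schemas = {"pickup": 1, "stack": 2}
objects = ["a", "b", "c"]
ranked = beam_decode(schemas, objects, toy_score, beam_width=2)
best_score, best_action = ranked[0]
```

With a beam width of 2, lower-scored partial actions are pruned before they are completed, so the result contains only the hypotheses that survived every step. A wider beam trades extra computation for a more thorough search over grounded actions.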

Experimental Evaluation

The effectiveness of GABAR is validated across six classical planning domains: Blocks World, Gripper, Miconic, Logistics, Visitall, and Grid. These domains differ in structural complexity and in how they scale. The experimental results highlight strengths in two areas:

Generalization: GABAR demonstrates substantial generalization capability, solving problem instances significantly larger than those it was trained on. For example, in the Blocks World domain, GABAR achieved 100% coverage on test instances, far surpassing traditional methods.
Coverage and Plan Quality: GABAR achieves high coverage in terms of solving instances, with plan lengths often competitive with or better than those produced by satisficing planners. In practical domains like Miconic and Gripper, the ability to solve problems with a broad range of configurations indicates robust policy learning.
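As a reminder of how these two metrics are typically computed, the sketch below treats coverage as the percentage of test instances solved and compares plan lengths only on instances solved by both methods. The function names, the data layout, and all numbers are illustrative assumptions, not results from the paper.

```python
def coverage(plan_lengths):
    """Percentage of instances with a plan (None marks an unsolved instance)."""
    if not plan_lengths:
        return 0.0
    solved = [n for n in plan_lengths.values() if n is not None]
    return 100.0 * len(solved) / len(plan_lengths)


def mean_length_ratio(policy, baseline):
    """Average policy/baseline plan-length ratio over instances solved by both.

    Values below 1.0 mean the policy's plans are shorter on average.
    """
    common = [
        name for name, length in policy.items()
        if length is not None and baseline.get(name) is not None
    ]
    if not common:
        return None
    return sum(policy[name] / baseline[name] for name in common) / len(common)


# Made-up plan lengths for four hypothetical test instances.
policy_plans = {"p1": 10, "p2": 24, "p3": None, "p4": 40}
baseline_plans = {"p1": 10, "p2": 20, "p3": 30, "p4": 50}

policy_coverage = coverage(policy_plans)
length_ratio = mean_length_ratio(policy_plans, baseline_plans)
```

Restricting the plan-length comparison to commonly solved instances avoids rewarding a method for failing on hard problems that would otherwise inflate its average plan length.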

The paper includes an ablation study showing the critical role of global nodes in the GABAR architecture, particularly in complex domains like Logistics, where coverage declines significantly without them. This suggests that the global node matters for long-range planning and for keeping policy decisions coherent across otherwise weakly connected parts of the graph.

Implications and Future Directions

GABAR marks a step forward in relational policy learning by demonstrating that direct action ranking can outperform more complex value estimation processes. The proposed architecture effectively balances computational efficiency with scalability, integrating GNNs and GRUs in a manner applicable to larger problem instances and complex planning domains.

Future research could explore several pertinent areas:

Efficiency Improvements: Exploring ways to reduce the computational overhead associated with graph conversions and neural computations could make GABAR more applicable across broader real-time planning scenarios.
Enhanced Representations: Further refinement in capturing more compact but expressive representations could extend the applicability of GABAR to domains with even more intricate relational structures.
Transferability Across Domains: Investigating mechanisms to transfer learned policies across domains with similar structural properties could enhance GABAR’s utility in dynamic multi-environment applications.

In conclusion, GABAR offers a promising methodology for enhancing the scalability and efficiency of classical planning through learning generalizable action-ranking policies. The techniques presented in the paper align well with ongoing advancements in AI technologies aimed at overcoming the traditional limitations of planning in large-scale and complex systems.
