Overview of "Per-Domain Generalizing Policies: On Validation Instances and Scaling Behavior"
This paper addresses the scaling behavior of per-domain generalizing policies in PDDL planning. Prior work has shown that neural architectures can learn action policies that generalize within a planning domain; the focus here is on how well such policies scale, i.e., how effective they remain when trained on small instances and applied to much larger test instances.
Dynamic Validation of Policies
The authors propose to improve scaling behavior by generating validation instances dynamically. Whereas prior work relied on fixed validation sets, which cap the instance sizes at which policy quality can be assessed, the methodology in this paper generates validation instances on the fly: instance size is increased only as long as the instances remain feasible and continue to provide informative feedback about policy quality. Validation therefore keeps challenging the learned policies at the largest sizes on which they can still be meaningfully assessed.
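A minimal sketch of such a size-scaling loop is given below. It assumes linear size steps and two caller-supplied checks for feasibility and informativeness; the helper callables and the stopping criterion are illustrative placeholders, not the authors' actual implementation.

```python
# Hedged sketch of dynamic validation-set construction as described above.
# The helper callables and the linear size stepping are illustrative
# placeholders, not the authors' implementation.
from typing import Any, Callable, List

def build_dynamic_validation_set(
    generate_instance: Callable[[int], Any],  # maps a size parameter to a planning instance
    is_feasible: Callable[[Any], bool],       # instance solvable within the resource limits
    is_informative: Callable[[Any], bool],    # instance still discriminates between policies
    start_size: int,
    step: int,
    max_size: int,
) -> List[Any]:
    """Grow the validation-instance size for as long as the instances
    remain feasible and still yield informative feedback."""
    validation_set: List[Any] = []
    size = start_size
    while size <= max_size:
        instance = generate_instance(size)
        if not (is_feasible(instance) and is_informative(instance)):
            break  # stop scaling once feedback is no longer useful
        validation_set.append(instance)
        size += step  # systematic size-scaling scheme (linear here for simplicity)
    return validation_set
```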
Key factors in the dynamic validation approach include:
- A systematic size-scaling scheme, applied consistently within each domain to generate instances of increasing size.
- The use of CSP encodings to extract generator parameters for creating domain instances of increasing size (see the sketch after this list).
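To illustrate the CSP idea in the second bullet, the toy sketch below searches for parameters of a hypothetical logistics-style generator such that the resulting instance reaches a target object count. The parameter names, the size function, and the use of the python-constraint package are assumptions for illustration only, not the paper's encoding.

```python
# Toy illustration of a CSP over generator parameters: find parameter values
# for a hypothetical logistics-style generator whose instance reaches a target
# total object count. Uses the python-constraint package; parameter names and
# the size function are illustrative assumptions.
from constraint import Problem

def generator_params_for_size(target_size: int):
    problem = Problem()
    problem.addVariable("cities", range(1, 21))
    problem.addVariable("trucks", range(1, 21))
    problem.addVariable("packages", range(1, 51))
    # At least one truck per city.
    problem.addConstraint(lambda c, t: t >= c, ("cities", "trucks"))
    # Instance "size" = total number of objects.
    problem.addConstraint(
        lambda c, t, p: c + t + p == target_size,
        ("cities", "trucks", "packages"),
    )
    return problem.getSolution()  # None if no parameter setting reaches this size

print(generator_params_for_size(30))  # e.g. {'cities': 1, 'trucks': 1, 'packages': 28}
```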
Evaluation Methodology
A refined methodology for evaluating the scaling behavior of learned policies is introduced, addressing a shortcoming of prior IPC test sets: their lack of systematic variation in instance size. The proposed evaluation generates test instances systematically across a wide range of sizes, giving fine-grained insight into how policy performance changes as instances grow.
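A hedged sketch of such a size-systematic evaluation is shown below: instances are generated on an evenly spaced size grid and coverage (the fraction of instances solved) is reported per size. The helper callables and the linear size grid are illustrative assumptions.

```python
# Hedged sketch of a size-systematic evaluation: generate test instances on an
# evenly spaced size grid and report coverage (fraction solved) per size.
# generate_instance and run_policy are placeholder callables.
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable

def coverage_by_size(
    sizes: Iterable[int],
    instances_per_size: int,
    generate_instance: Callable[[int, int], Any],  # (size, seed) -> instance
    run_policy: Callable[[Any], bool],             # True iff the policy solves the instance
) -> Dict[int, float]:
    solved: Dict[int, int] = defaultdict(int)
    sizes = list(sizes)
    for size in sizes:
        for seed in range(instances_per_size):
            if run_policy(generate_instance(size, seed)):
                solved[size] += 1
    return {size: solved[size] / instances_per_size for size in sizes}

# Example grid: sizes 10, 20, ..., 100 with 5 instances per size:
# curve = coverage_by_size(range(10, 101, 10), 5, generate_instance, run_policy)
```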
The methodology is applied across several domains and shows consistent improvements in scaling behavior, particularly for Graph Neural Network (GNN) policies selected via dynamic validation.
Experimental Results
Empirical results across nine IPC'23 domains show that policies selected with the dynamic validation set outperform those selected with fixed validation sets, whether selection is based on validation loss or on validation coverage. In eight of the nine domains, dynamic validation yields substantial improvements over the alternatives.
In particular, dynamic validation improves the policies' ability to generalize to larger instances: it selects policies that consistently achieve higher coverage across a broader range of instance sizes than policies selected with conventional fixed sets.
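As an illustration of coverage-based model selection (as opposed to loss-based selection), the following sketch picks the checkpoint with the highest validation coverage and breaks ties by lower validation loss. The data layout and the tie-breaking rule are assumptions, not the paper's exact procedure.

```python
# Illustrative coverage-based model selection: pick the checkpoint with the
# highest validation coverage, breaking ties by (lower) validation loss.
# Field names and the tie-breaking rule are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Checkpoint:
    path: str
    val_coverage: float  # fraction of validation instances solved
    val_loss: float

def select_checkpoint(checkpoints: List[Checkpoint]) -> Checkpoint:
    return max(checkpoints, key=lambda c: (c.val_coverage, -c.val_loss))

best = select_checkpoint([
    Checkpoint("epoch_10.pt", val_coverage=0.62, val_loss=0.41),
    Checkpoint("epoch_20.pt", val_coverage=0.78, val_loss=0.45),
    Checkpoint("epoch_30.pt", val_coverage=0.78, val_loss=0.39),
])
print(best.path)  # -> epoch_30.pt (ties on coverage broken by lower loss)
```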
Implications and Future Directions
The findings challenge common practice in per-domain policy learning and argue for dynamic validation as a broadly applicable tool in AI planning systems. By adjusting the validation set dynamically, model selection is based on increasingly large and diverse instances, which translates into better scaling behavior of the selected policies.
Future work may feed insights gained during validation back into the training phase, potentially yielding even more robust policy generalization. Adapting these techniques to reinforcement learning frameworks is another promising avenue, extending the benefits beyond the supervised learning setting considered here.
In conclusion, this paper provides compelling evidence supporting the shift to dynamic validation processes in per-domain policy development, promising greater scalability and adaptability of learned policies in planning systems.