The 3rd International Planning Competition: Results and Analysis

Published 29 Jun 2011 in cs.AI | (1106.5998v1)

Abstract: This paper reports the outcome of the third in the series of biennial international planning competitions, held in association with the International Conference on AI Planning and Scheduling (AIPS) in 2002. In addition to describing the domains, the planners and the objectives of the competition, the paper includes analysis of the results. The results are analysed from several perspectives, in order to address the questions of comparative performance between planners, comparative difficulty of domains, the degree of agreement between planners about the relative difficulty of individual problem instances and the question of how well planners scale relative to one another over increasingly difficult problems. The paper addresses these questions through statistical analysis of the raw results of the competition, in order to determine which results can be considered to be adequately supported by the data. The paper concludes with a discussion of some challenges for the future of the competition series.



Summary

- The paper evaluates both fully-automated and hand-coded planners, using statistical tests to establish which comparisons the data actually support.
- The analysis covers the temporal and numeric constraints introduced by the PDDL2.1 language and the domain-specific challenges they raise.
- The study shows that planners scale differently, with specific observations on how TLPlan and LPG perform on the more complex problems.

An Analysis of the Third International Planning Competition

The Third International Planning Competition (IPC), held in conjunction with the International Conference on AI Planning and Scheduling (AIPS) in 2002, was a significant empirical examination of the state of automated planning. In this paper, Long and Fox evaluate the competition results in detail, examining comparative planner performance, domain-specific challenges and how planners scale as problems become harder.

The central innovation of the third competition was support for temporal and numeric constraints, expressed in PDDL2.1, an extension of the PDDL language. The aim was to advance research into planners that can reason about time and manage numeric resources. The competition had two tracks: fully-automated planners, and planners supplied with hand-coded domain control knowledge. Both were evaluated on benchmark problems from several domains. Each domain was offered at several levels of increasing expressiveness, such as STRIPS, Numeric, SimpleTime and Time.

Main Findings

Planner Performance:
- Among the fully-automated planners, LPG performed best overall. It was particularly strong in the temporal domains, a strength attributed to its local search over planning graphs.
- FF solved STRIPS and numeric problems exceptionally quickly, owing to its relaxed-plan heuristic.
- Among the hand-coded planners, TLPlan was both efficient and broad in coverage, making effective use of domain-specific control knowledge.
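FF's relaxed-plan heuristic can be illustrated with a minimal sketch for the STRIPS case only. It works in three steps:

- It ignores delete lists.
- It grows a relaxed planning graph until the goals appear.
- It extracts a relaxed plan backwards and counts its actions. When several actions could achieve a subgoal, it prefers the one whose preconditions appear earliest.

The `Action` record and the example actions are hypothetical. The sketch omits Metric-FF's numeric extension, FF's search procedure and some refinements of FF's plan extraction, so read it as an illustration of the idea rather than FF's implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Sequence, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS action record (action names assumed unique)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]  # never consulted: the relaxation ignores deletes


def relaxed_plan_length(state: Iterable[str], goals: Iterable[str],
                        actions: Sequence[Action]) -> Optional[int]:
    """Number of actions in a delete-relaxed plan from `state`, or None if the
    goals are unreachable even with delete effects ignored."""
    goals = set(goals)
    # Forward phase: record the first layer at which each fact/action appears.
    fact_layer: Dict[str, int] = {f: 0 for f in state}
    action_layer: Dict[str, int] = {}
    layer = 0
    while not all(g in fact_layer for g in goals):
        new_facts: Dict[str, int] = {}
        for a in actions:
            if a.name not in action_layer and all(p in fact_layer for p in a.pre):
                action_layer[a.name] = layer
                for f in a.add:
                    if f not in fact_layer:
                        new_facts[f] = layer + 1
        if not new_facts:
            return None  # fixpoint reached without the goals: relaxed dead end
        fact_layer.update(new_facts)
        layer += 1

    # Backward phase: bucket (sub)goals by the layer where they first appear.
    agenda: Dict[int, Set[str]] = {}
    for g in goals:
        agenda.setdefault(fact_layer[g], set()).add(g)
    chosen: Set[str] = set()
    for i in range(layer, 0, -1):
        for f in agenda.get(i, set()):
            # Pick an achiever first applicable at layer i-1 whose preconditions
            # have the smallest summed layer (FF's "difficulty" preference).
            achiever = min(
                (a for a in actions
                 if f in a.add and action_layer.get(a.name) == i - 1),
                key=lambda a: sum(fact_layer[p] for p in a.pre),
            )
            chosen.add(achiever.name)
            for p in achiever.pre:
                if fact_layer[p] > 0:  # layer-0 facts already hold in the state
                    agenda.setdefault(fact_layer[p], set()).add(p)
    return len(chosen)


if __name__ == "__main__":
    move_ab = Action("move-a-b", frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"}))
    move_bc = Action("move-b-c", frozenset({"at-b"}), frozenset({"at-c"}), frozenset({"at-b"}))
    print(relaxed_plan_length({"at-a"}, {"at-c"}, [move_ab, move_bc]))  # 2
```

Because deletes are ignored, the estimate is cheap to compute. However, it can undercount, for example when an action must undo and redo a fact.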
Domain Challenges:
- Fully-automated planners generally found the ZenoTravel and Satellite domains relatively easy across the various levels. Problems in the DriverLog and Rovers domains were harder, particularly at the levels combining temporal and numeric features.
- The HardNumeric variant of Satellite posed a distinctive challenge. Its logical goals were trivial to satisfy, so plan quality depended on how much data the plan collected. This highlights the advantage hand-coded planners often have in using domain knowledge to optimize the plan metric.
Scalability:
- Scaling behaviour varied across domains and problem levels. TLPlan consistently scaled well relative to the other hand-coded planners.
- Fully-automated planners such as FF and LPG scaled differently with problem difficulty, depending mainly on the domain and level.
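Relative scaling can be quantified in a simple way, sketched here as an illustration rather than as the paper's procedure:

1. Take the problems in order of increasing difficulty.
2. For each problem, compute the log of the ratio of two planners' solution times.
3. Fit a least-squares line to these log-ratios.

The function name is hypothetical. The sketch also assumes that problem indices reflect increasing difficulty and that both planners solved every problem included. A positive slope means the first planner slows down faster than the second as problems get harder.

```python
import math
from typing import Sequence


def relative_scaling_slope(times_a: Sequence[float], times_b: Sequence[float]) -> float:
    """Least-squares slope of log(time_a / time_b) against problem index.

    Assumes problems are ordered by increasing difficulty and that unsolved
    instances were removed beforehand (all times strictly positive).
    """
    if len(times_a) != len(times_b) or len(times_a) < 2:
        raise ValueError("need two equal-length sequences of at least two times")
    ys = [math.log(a / b) for a, b in zip(times_a, times_b)]
    n = len(ys)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic times: planner A doubles per problem, planner B stays flat.
    print(relative_scaling_slope([1, 2, 4, 8], [1, 1, 1, 1]))  # about 0.693 (ln 2)
```

Working on the log scale makes a constant speed-up factor show up as a shift in intercept rather than slope. The slope therefore isolates differences in how runtime grows.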

Implications and Future Directions

The statistical analyses in this study include a Wilcoxon matched-pairs rank test for comparing pairs of planners and multi-judge rank-correlation tests for measuring how far planners agree on problem difficulty. Together they give a principled basis for deciding which performance differences in the temporal and numeric domains the data actually support. The competition outcomes suggest several implications for the planning community:

- There is a clear need for more sophisticated reasoning mechanisms to handle temporal and numeric constraints in automated planning, building on the strengths of heuristic-guided search and plan graph-based methods.
- A promising avenue for future work is to quantify the effort required to encode control knowledge, and to measure the benefit of such hand-coded guidance relative to heuristic search alone.
- Structuring future IPC events around designed experiments and controlled conditions could better support scientific inquiry into planner capabilities.
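The two kinds of test named above can be illustrated with a minimal sketch. The input is assumed, hypothetically, to be lists of solution times on problems that all planners solved; the function names are illustrative.

The first test, the matched-pairs form of Wilcoxon's test, is the signed-rank procedure:

1. Rank the absolute paired differences, dropping zero differences.
2. Sum the ranks of the positive differences.
3. Compare that sum with its null distribution.

The sketch uses the large-sample normal approximation without a tie correction.

For agreement among several planners, the sketch uses Kendall's coefficient of concordance W, a standard multi-judge statistic. It is 1 when every planner ranks the problems in the same order of difficulty and 0 when there is no agreement. Whether it matches the paper's exact test is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_matched_pairs(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Signed-rank z statistic and two-sided p-value (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are tied")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def kendalls_w(times_by_planner: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m planners (judges) ranking n problems; no tie correction."""
    m, n = len(times_by_planner), len(times_by_planner[0])
    rank_rows = [average_ranks(row) for row in times_by_planner]
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    a = [float(i + 1) for i in range(10)]  # synthetic: A always slower than B
    b = [0.0] * 10
    print(wilcoxon_matched_pairs(a, b))  # z about 2.80, p < 0.01
    print(kendalls_w([[1, 2, 3], [10, 20, 30], [5, 6, 7]]))  # 1.0: full agreement
```

For small samples or many ties, an exact or tie-corrected version of both statistics would be more appropriate than this sketch.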

The paper subjects the competition results to statistical analysis, establishing which comparative claims the data actually support. It also sets out challenges for later competitions, giving the planning community a basis for developing both fully-automated and knowledge-assisted planning systems. Continued iterations of the IPC and related forums should help the field move toward more expressive, scalable and efficient planners.
