International Planning Competition (IPC)
- The International Planning Competition (IPC) is a benchmark platform that evaluates automated, domain-independent planning systems using standardized tasks and diverse real-world domains.
- It employs rigorous methodologies including evolving PDDL standards and advanced statistical tests to objectively assess planner performance and innovation.
- The IPC drives algorithmic advancement by challenging planners in classical, temporal, probabilistic, and hierarchical tracks across varied practical applications.
The International Planning Competition (IPC) is the primary empirical benchmark and evaluation platform for automated planning systems, especially those targeting domain-independent planning models. Since its establishment in 1998, the IPC series has continuously shaped research directions, benchmark methodology, language standards, and the practical assessment of planners ranging from purely classical STRIPS solvers to advanced temporal, numeric, probabilistic, and hierarchical systems.
1. Evolution and Objectives
The IPC was inaugurated at AIPS'98 to provide a controlled, domain-rich environment for competitive comparison of planning systems under strict, shared conditions. Initial editions focused on classical STRIPS domains (Logistics, Gripper, Movie, Assembly, etc.), quickly expanding to include ADL, numeric fluents, durative actions, and various forms of temporal and resource reasoning as challenges shifted towards more realistic problems (Fox et al., 2011). Across its editions, the IPC's guiding objectives have included:
- Establishing a scientific basis for empirical planner evaluation
- Advancing the modeling language (from PDDL to PDDL2.1, PDDL2.2, and beyond) to accommodate emerging modeling needs, such as time, concurrency, numeric state variables, and derived predicates (Fox et al., 2011, Edelkamp et al., 2011)
- Encouraging innovation in search algorithms, heuristic computation, expressivity, and plan metrics
- Providing a neutral, standardized setting for tracking planner progress and facilitating reproducible research
With its roughly biennial schedule and focus on benchmark-driven progress, the IPC has become the de facto point of reference for both academic and applied planning communities (Fox et al., 2011).
2. Benchmark Design and Language Standards
Each IPC features a carefully curated collection of benchmark domains designed both to reflect practical applications and to systematically stress different planner capabilities. Over its evolution, these domains have included:
- Classical domains: e.g., Blocksworld, Logistics, Gripper, Satellite, ZenoTravel
- Real-world inspired models: e.g., Airport ground operations, Oil pipeline (Pipesworld), Power Supply Restoration (PSR), UMTS call setup (Edelkamp et al., 2011)
- Safety verification and model checking: Promela-derived Dining Philosophers, Optical Telegraph (Edelkamp et al., 2011)
Key technical advances in the competition have paralleled extensions to the Planning Domain Definition Language (PDDL):
| IPC Edition | Major PDDL Version | Language Features Introduced |
|---|---|---|
| AIPS’98-00 | PDDL 1.x | STRIPS/ADL, basic typing |
| IPC-3 (2002) | PDDL2.1 | Numeric fluents, metric optimization, durative actions, concurrency (Fox et al., 2011, Gerevini et al., 2011) |
| IPC-4 (2004) | PDDL2.2 | Derived predicates, timed initial literals (Edelkamp et al., 2011) |
| IPC-2020 | HDDL (HTN track) | Hierarchical task networks, procedural decomposition (Pellier et al., 2021, Höller et al., 2019) |
Sophisticated compilation techniques, such as transforming ADL and derived predicates into STRIPS or flattening temporally extended constructs, have enabled broader participation from planners with narrower language support (Edelkamp et al., 2011).
3. Track Structure, Evaluation Criteria, and Statistical Methodology
The IPC typically features multiple tracks, often split by optimization guarantees (optimal vs. satisficing), problem expressivity (classical vs. temporal/numeric), representation paradigms (STRIPS/ADL, derived predicates, HTN), and, in recent editions, stochasticity (deterministic vs. probabilistic planning) (Edelkamp et al., 2011, Bonet et al., 2011, Pellier et al., 2021).
- Satisficing Track: Evaluates planners seeking any valid plan, often optimized for speed or approximate plan quality.
- Optimal Track: Assesses systems that guarantee optimality with respect to an explicit metric (makespan, cost, resource usage).
- Probabilistic Track: Requires solving MDPs or PPDDL-specified problems, stressing policy synthesis mechanisms (Bonet et al., 2011).
- HTN Track: Adds abstraction via methods and decomposition, evaluated with respect to task reduction and plan execution equivalence (Höller et al., 2019, Pellier et al., 2021).
Evaluation metrics have evolved with domain complexity:
- Coverage (instances solved over those attempted)
- Time to solution (under fixed hardware, timeout, and memory limits; e.g., 30 minutes and 1 GB per instance)
- Plan quality (domain-specific metrics: makespan, step count, metric cost, reward accumulation)
- For the HTN track, plan decomposition witnesses and correctness with respect to method libraries
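As a rough illustration of the first three metrics, the sketch below computes coverage and a ratio-based quality score from per-instance run records. The `Run` record layout, the instance names, and the numbers are hypothetical. Scoring quality as reference cost divided by plan cost is one common convention, assumed here for illustration rather than taken from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    """One planner run on one instance (hypothetical record layout)."""
    instance: str
    solved: bool
    seconds: Optional[float] = None  # time to solution, if solved
    cost: Optional[float] = None     # plan cost or makespan, if solved


def coverage(runs: List[Run]) -> float:
    """Fraction of attempted instances that were solved."""
    if not runs:
        return 0.0
    return sum(r.solved for r in runs) / len(runs)


def relative_quality(runs: List[Run], reference_cost: Dict[str, float]) -> float:
    """Sum of reference_cost / plan_cost over solved instances.

    Unsolved instances contribute 0, so the score rewards both coverage
    and plan quality. Lower cost is assumed to be better.
    """
    total = 0.0
    for r in runs:
        if r.solved and r.cost and r.cost > 0:
            total += reference_cost[r.instance] / r.cost
    return total


# Hypothetical results for one planner on three instances.
runs = [
    Run("p01", True, seconds=12.0, cost=20),
    Run("p02", True, seconds=300.0, cost=50),
    Run("p03", False),
]
reference = {"p01": 20, "p02": 40, "p03": 10}

print(coverage(runs))
print(relative_quality(runs, reference))
```

Here the planner solves two of three instances, matches the reference cost on `p01`, and is 25% more expensive than the reference on `p02`.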
The analysis of competition results relies on explicit statistical methodology, including the Wilcoxon matched-pairs signed-rank test for pairwise planner comparison, paired t-tests for estimating the magnitude of differences, bootstrapping for domain hardness estimation, and rank-correlation analyses of instance difficulty agreement and scaling properties (Fox et al., 2011).
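To make the matched-pairs Wilcoxon comparison concrete, here is a minimal pure-Python sketch that computes the rank sums of positive and negative paired differences and a normal-approximation z-score. The runtimes are invented for illustration. The function drops zero differences and applies no tie correction to the variance, which is a simplification; a statistics library would be preferable for real analyses.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, z) for paired samples xs and ys.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they occupy. z uses the normal approximation
    without tie correction.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - mean) / sd


# Hypothetical runtimes (seconds) of two planners on the same five instances.
planner_a = [10, 20, 30, 40, 50]
planner_b = [12, 18, 35, 33, 50]
print(wilcoxon_signed_rank(planner_a, planner_b))
```

A z-score near zero, as in this toy data, gives no evidence that either planner is consistently faster on the shared instances.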
4. Algorithmic Innovations and Planner Architectures
The competitive pressure and standardized testbeds of the IPC have catalyzed a wide array of algorithmic innovations:
- Heuristic forward search: FF (enforced hill climbing with relaxed-plan heuristics), Fast Downward (causal graph and SAS+ variable-based relaxations), LPG (stochastic local search in temporal action graphs) (Fox et al., 2011, Gerevini et al., 2011, Edelkamp et al., 2011)
- Macro-based planners: Macro-FF (off-line macro extraction), Marvin (online macro-action synthesis for plateau escape), and macro-operator learning for speedup (Coles et al., 2011, Muppasani et al., 2023)
- SAT and integer programming: Planning as SAT (Blackbox, SATPLAN), integer programming (Optiplan) (Kambhampati et al., 2011, Edelkamp et al., 2011)
- Hierarchical planners: SHOP2 (total/partial order HTN planning), TLPLAN and TALPLANNER (temporal logic control), HDDL-based systems (Kvarnström et al., 2011, Pellier et al., 2021)
- Probabilistic planners: mGPT (MDP solving via heuristic search over deterministic relaxations, using algorithms such as RTDP, LRTDP, and ASP) (Bonet et al., 2011)
Many planners now natively exploit features such as ADL, derived predicates, and timed initial literals, or build specialization layers (either via compilation or direct algorithms) to support richer representations.
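The enforced hill climbing strategy named in the heuristic forward search item can be sketched on a STRIPS-like state representation. This is a minimal illustration and not FF's implementation: the goal-count heuristic stands in for FF's relaxed-plan heuristic, and the tiny gripper-style domain, its fact names, and its action names are all made up for the example.

```python
from collections import deque


def enforced_hill_climbing(start, goal, actions, h):
    """Search for a plan with enforced hill climbing.

    actions: list of (name, preconditions, add_effects, delete_effects),
             each effect set a frozenset of facts.
    h:       heuristic mapping a state (frozenset of facts) to a number.
    Returns a list of action names, or None if some breadth-first phase
    finds no strictly better state (the strategy is incomplete).
    """
    state, plan = frozenset(start), []
    while not goal <= state:
        best = h(state)
        frontier = deque([(state, [])])
        seen = {state}
        found = None
        # Breadth-first search until a state with strictly lower h appears.
        while frontier and found is None:
            s, path = frontier.popleft()
            for name, pre, add, dele in actions:
                if not pre <= s:
                    continue
                t = (s - dele) | add
                if t in seen:
                    continue
                seen.add(t)
                if h(t) < best:
                    found = (t, path + [name])
                    break
                frontier.append((t, path + [name]))
        if found is None:
            return None
        state, plan = found[0], plan + found[1]
    return plan


# Hypothetical one-ball gripper-style task: carry the ball from room a to room b.
F = frozenset
actions = [
    ("pick", F({"at-a", "ball-a"}), F({"holding"}), F({"ball-a"})),
    ("move-ab", F({"at-a"}), F({"at-b"}), F({"at-a"})),
    ("move-ba", F({"at-b"}), F({"at-a"}), F({"at-b"})),
    ("drop", F({"at-b", "holding"}), F({"ball-b"}), F({"holding"})),
]
goal = F({"ball-b"})


def goal_count(state):
    """Number of goal facts not yet true (stand-in for a relaxed-plan heuristic)."""
    return len(goal - state)


print(enforced_hill_climbing({"at-a", "ball-a"}, goal, actions, goal_count))
```

Because the goal-count heuristic is uninformative on this task, the first breadth-first phase has to explore three steps before finding an improving state, which is the plateau behavior that macro-based planners such as Marvin aim to shortcut.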
5. Impact on Planning Research and Broader Lessons
The IPC's rigorous methodology and data dissemination have produced lasting impacts:
- Empirical rigor: Transparent, quantitative analysis enhances the replicability of claims and helps isolate causes of planner failures or strengths (Fox et al., 2011).
- Benchmark diversity and realism: Domains range from polynomial-time solvable to PSPACE-complete decision problems and cover varied search-space topologies and connection densities (Edelkamp et al., 2011).
- Community standards and toolchains: Shared languages, reference validators, grounders, and problem packages lower the barrier for entry and promote cumulative improvement.
- Ontology and knowledge integration: Recent work leverages data from multiple IPCs to construct ontologies, facilitating planner selection or macro extraction to accelerate planning (Muppasani et al., 2023).
- Evaluation of control knowledge: Direct comparison of hand-coded vs. automated strategies (e.g., TLPLAN vs. FF/LPG) clarifies tradeoffs between human effort and plan optimality or speed (Kvarnström et al., 2011).
- Challenges in expressivity: Emergent issues with derived predicates, timed literals, and the cost of compilation versus native support shape future language and tool development (Edelkamp et al., 2011, Pellier et al., 2021).
6. Hierarchical and Probabilistic Planning Extensions
Modern IPCs reflect the growing importance of representational generality:
- HTN Track: Introduced in 2020, this track evaluates planners on HDDL-based hierarchical domains, supporting both totally and partially ordered methods, and often requiring complex method selection and recursive decomposition management (Pellier et al., 2021, Höller et al., 2019). The formal competition schema encompasses extended PDDL semantics for abstract tasks, method declarations, and task networks. Benchmark design now explicitly demands domains exploiting HTN expressivity (e.g., procedural knowledge, grammar intersection).
- Probabilistic Planning: Tracks focused on PPDDL require planners to synthesize policies optimal under uncertainty in reward, transition, or outcome structure, using MDP solvers bootstrapped by admissible deterministic relaxations (Bonet et al., 2011). These tracks have highlighted scalability bottlenecks, the fragility of heuristic relaxations under ADL, and the need for precise domain fitness measures.
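The recursive decomposition that the HTN track exercises can be illustrated with a small depth-first, totally ordered decomposition procedure. This is a sketch under simplifying assumptions and does not follow HDDL syntax or any competition planner: tasks are plain strings without parameters, states are sets of facts, and the `deliver` task with its methods is a hypothetical example.

```python
def htn_plan(tasks, state, operators, methods):
    """Depth-first decomposition of a totally ordered task list.

    tasks:     list of task names still to accomplish, in order.
    operators: {name: (preconditions, add_effects, delete_effects)}
               for primitive tasks.
    methods:   {name: [(preconditions, [subtasks]), ...]} for abstract tasks,
               tried in the order listed.
    Returns a list of primitive task names, or None if no decomposition works.
    Recursive methods may loop forever on unsolvable inputs; no depth bound
    is applied in this sketch.
    """
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:
        pre, add, dele = operators[head]
        if not pre <= state:
            return None
        tail = htn_plan(rest, (state - dele) | add, operators, methods)
        return None if tail is None else [head] + tail
    for pre, subtasks in methods.get(head, []):
        if pre <= state:
            plan = htn_plan(subtasks + rest, state, operators, methods)
            if plan is not None:
                return plan
    return None


# Hypothetical domain: deliver a ball from room a to room b.
F = frozenset
operators = {
    "pick": (F({"at-a", "ball-a"}), F({"holding"}), F({"ball-a"})),
    "move-ab": (F({"at-a"}), F({"at-b"}), F({"at-a"})),
    "drop": (F({"at-b", "holding"}), F({"ball-b"}), F({"holding"})),
}
methods = {
    "deliver": [
        (F({"holding"}), ["move-ab", "drop"]),        # already carrying the ball
        (F({"at-a", "ball-a"}), ["pick", "deliver"]),  # pick up, then recurse
    ],
}

print(htn_plan(["deliver"], F({"at-a", "ball-a"}), operators, methods))
```

The second method is recursive: it picks up the ball and then re-poses `deliver`, which the first method resolves once `holding` is true. Managing this kind of recursion is part of what the track's benchmarks stress.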
7. Current Challenges and Future Directions
Major technical and organizational challenges for the IPC and the planning community include:
- Balancing benchmark diversity and continuity to track genuine progress
- Avoiding PDDL feature inflation and minimizing the semantic gap between real applications and standardized modeling
- Further developing plan validation and explanation tooling (especially for temporal, numeric, and HTN plans)
- Extending statistical methodology to address multi-objective and anytime scenarios
- Quantifying the trade-off between human effort (hand-written control knowledge) and fully automated behavior when comparing hand-crafted and domain-independent systems (Fox et al., 2011)
- Addressing scalability (state-space/representation blowup) in domains with complex topology or high fact/action-connectivity (Edelkamp et al., 2011)
- Driving innovation in planners that combine hierarchy, uncertainty, and continuous dynamics in a unified framework
The IPC remains central to these efforts, both as a technical crucible and as the primary driver of comparative empirical rigor in automated planning.