
Predicate Invention in Logic & AI

Updated 14 February 2026
  • Predicate invention is the process of automatically synthesizing new predicates to capture latent structures and abstractions, enhancing model compactness.
  • Methodologies range from symbolic ILP and anti-unification to neural and data-driven approaches that optimize planning efficiency and representation learning.
  • Applications in autonomous planning, program synthesis, and explainable AI demonstrate significant improvements in task success rates and model interpretability.

Predicate invention is the process of automatically synthesizing new, previously unrepresented predicates that capture latent structure, abstractions, or regularities in observed data or logic programs. It is a foundational operation in Inductive Logic Programming (ILP), Statistical Relational Learning (SRL), neuro-symbolic learning, and modern robot planning, enabling more compact, generalizable, and interpretable models. Predicate invention ranges from logic-based symbol synthesis to grounded, data-driven discovery of visual and physical relations and underlies key advances in autonomous planning, program synthesis, and explainable AI.

1. Formal Definitions and Theoretical Foundations

At its core, predicate invention extends a relational language by introducing new predicate symbols not present in the original background knowledge. Formally, given a set of observed predicate symbols $\mathcal{P}$ and data (typically structured as logic programs, demonstrations, or state–action traces), predicate invention aims to invent a set $\mathcal{Q}$ of new predicate symbols, along with definitions (intensions) in terms of $\mathcal{P}$ and/or other invented predicates. The invented predicates serve as abstractions over complex patterns in the observed data, supporting model compression, program induction, and higher-order generalization (Dumancic et al., 2016, Cropper et al., 2021).

Let $\mathcal{KB}$ be a knowledge base over constants $\mathcal{C}$ and predicates $\mathcal{P}$. With a language bias $\mathcal{L}$ (a set of formulas), the set of "true" formulas is $\mathcal{T} = \{\varphi \in \mathcal{L} \mid \mathcal{KB} \models \varphi\}$. Predicate invention introduces a disjoint vocabulary $\mathcal{Q}$, with definitions of the form $h(X_1,\dots,X_{k_h}) \Leftrightarrow \mathrm{Body}_h$, and seeks to optimize representational or procedural properties, such as compressive loss, coverage, or planning efficiency (Dumancic et al., 2016, Silver et al., 2022).

In ILP, a logic program $H$ contains invented predicates if $\mathrm{ps}(H) \setminus \mathrm{ps}(B \cup E^+ \cup E^-) \neq \emptyset$, i.e., if $H$ uses symbols not in the background or data (Cropper et al., 2021). The process is necessary if no non-inventive solution exists, and useful if invented-predicate hypotheses are no more complex than the best non-inventive ones.
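The $\mathrm{ps}(H)$ condition above can be checked mechanically. The following is a minimal illustrative sketch (not code from any cited system) using the classic kinship example: the hypothesis defines a helper `inv1/2` (effectively "parent") that appears in neither the background facts nor the examples.

```python
# Clauses are (head, body) pairs; atoms are tuples whose first element is
# the predicate symbol. All names here are illustrative.
def predicate_symbols(clauses):
    """Collect every predicate symbol appearing in a set of clauses."""
    syms = set()
    for head, body in clauses:
        syms.add(head[0])
        for literal in body:
            syms.add(literal[0])
    return syms

# Background knowledge B: father/2 and mother/2 facts (empty bodies).
B = [(("father", "abe", "homer"), ()), (("mother", "mona", "homer"), ()),
     (("father", "homer", "bart"), ()), (("mother", "marge", "bart"), ())]

# Examples E: grandparent/2 atoms.
E = [(("grandparent", "abe", "bart"), ()),
     (("grandparent", "mona", "bart"), ())]

# Hypothesis H: inv1/2 abstracts "parent-of", so grandparent is a chain
# of two inv1 steps.
H = [(("inv1", "X", "Y"), (("father", "X", "Y"),)),
     (("inv1", "X", "Y"), (("mother", "X", "Y"),)),
     (("grandparent", "X", "Z"), (("inv1", "X", "Y"), ("inv1", "Y", "Z")))]

# ps(H) \ ps(B ∪ E) is non-empty, so H contains an invented predicate.
invented = predicate_symbols(H) - predicate_symbols(B + E)
print(invented)  # {'inv1'}
```

Here invention is *useful* in the sense above: without `inv1`, the father/mother disjunction must be expanded in every position where a parent step occurs.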

In neuro-symbolic and representation-learning settings, a predicate can be construed as a latent invariant or composition, e.g., a function $p = \bigcap_{n=1}^N x^{(n)}$ derived by intersection across base representations $x^{(n)}$ (Martin et al., 2018).
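The intersection idea can be sketched concretely. Assuming binary feature vectors (a simplification; the cited work uses graded neural activations), the invariant that survives element-wise conjunction across instances of a relation is the invented predicate; the feature values below are invented for illustration.

```python
# Predicate formation by intersection: the relational invariant is what
# survives element-wise AND across N instance representations.
def intersect(representations):
    """Element-wise AND across binary feature vectors."""
    p = representations[0]
    for x in representations[1:]:
        p = [a & b for a, b in zip(p, x)]
    return p

# Three instances of the same relation: the first two bits encode the
# shared relational structure, the rest encode the specific objects.
x1 = [1, 1, 0, 1, 0, 0]
x2 = [1, 1, 0, 0, 1, 0]
x3 = [1, 1, 0, 0, 0, 1]

p = intersect([x1, x2, x3])
print(p)  # [1, 1, 0, 0, 0, 0]: only shared relational features remain
```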

2. Methodologies for Predicate Invention

2.1 Symbolic and Inductive Logic Programming Approaches

Classical symbolic approaches (e.g., Meta-Interpretive Learning, ILP, ASP-based systems) perform predicate invention by searching the space of possible logic programs, extending the language with new symbols and definitions. Techniques include:

  • Generate–test–constrain loops: Enumerate candidate logic programs (with constraints), test on examples, and iteratively add constraints to prune failure modes (generalization/specialization/redundancy) (Cropper et al., 2021). The Poppi system formulates predicate invention as an answer set programming (ASP) problem, systematically introducing invented predicates, supporting recursion and multi-clause programs.
  • Anti-unification and structural abstraction: Systems like Amao use anti-unification, inductive momentum pruning, and neural multi-space (NeMuS) graph representations to identify shared structure and define invented predicates on-the-fly, enabling efficient clause construction and recursion (Mota et al., 2019).
  • Autoencoder-theoretic formulations: Predicate invention as a form of autoencoder reconstruction over logical theories. An encoder maps the observed theory to a hidden vocabulary (invented predicates), a decoder reconstructs the original, and learning optimizes reconstruction loss plus regularization over latent predicates and their definitions. This formalism supports parallel invention of multiple predicates and extends to theory revision (Dumancic et al., 2016).
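The compression objective driving these approaches can be made concrete with a toy calculation. The sketch below (my own illustrative clauses, in the spirit of the autoencoder view) measures theory size as total literal count and shows how naming a repeated conjunction with an invented predicate shrinks the theory.

```python
# Theory size = total number of literals (head + body) across all clauses.
def size(theory):
    return sum(1 + len(body) for _, body in theory)

# Original theory: the 3-literal chain p, q, r recurs in every clause.
original = [
    ("a(X,W)", ["p(X,Y)", "q(Y,Z)", "r(Z,W)"]),
    ("b(X,W)", ["p(X,Y)", "q(Y,Z)", "r(Z,W)", "s(W)"]),
    ("c(X,W)", ["p(X,Y)", "q(Y,Z)", "r(Z,W)", "t(W)"]),
]

# Compressed theory: the invented predicate inv1/2 names the chain once,
# and each clause refers to it by a single literal.
compressed = [
    ("inv1(X,W)", ["p(X,Y)", "q(Y,Z)", "r(Z,W)"]),
    ("a(X,W)",    ["inv1(X,W)"]),
    ("b(X,W)",    ["inv1(X,W)", "s(W)"]),
    ("c(X,W)",    ["inv1(X,W)", "t(W)"]),
]

print(size(original), size(compressed))  # 14 12
```

The saving grows with every additional clause that reuses the pattern, which is why such objectives favor invented predicates that name frequently recurring structure.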

2.2 Data-Driven and Bilevel Planning Approaches

Modern robotic and planning systems increasingly require invention of predicates as lifted, data-grounded classifiers over continuous state spaces, supporting efficient symbolic planning and generalization (Silver et al., 2022, Shao et al., 2 Oct 2025, Yang et al., 22 Nov 2025). Key algorithms include:

  • Surrogate objective optimization: Predicate sets are optimized not directly for logical expressivity, but for costs such as abstract-plan search time, success probability, or refinement efficiency over demonstrations. Bilevel planners abstract continuous planning with symbolic operators and invented predicates, inducing operators (preconditions/effects) via clustering, grammar-based enumeration, and hill-climbing (Silver et al., 2022).
  • Gaussian mixture and cluster-based predicate learning: In frameworks like SymSkill, predicates are defined as indicator functions over ellipsoidal (Gaussian) regions in relative-pose spaces, learned via clustering of demonstration endpoints. The invented predicates serve as preconditions or goal conditions for skill compositions, enabling robust real-time planning (Shao et al., 2 Oct 2025).
  • Contrastive foundation-model-driven invention: SkillWrapper and UniPred leverage pretrained foundation models (LLMs, VLMs) to propose symbolic predicate templates, then empirically refine or validate them against observed transitions. Predicate invention is triggered by failures of explanation or by incongruities between operator preconditions and effects, and iteratively augments the symbolic language (Yang et al., 22 Nov 2025, Wang et al., 19 Dec 2025).
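The cluster-based predicates above admit a compact sketch. Assuming a diagonal-covariance Gaussian region in relative-pose space (in the spirit of SymSkill's ellipsoidal indicators; the means, standard deviations, and predicate name below are illustrative, and would in practice be learned by clustering demonstration endpoints):

```python
import math

def make_region_predicate(mean, std, threshold=3.0):
    """Return an indicator over relative poses: True when the pose lies
    inside the ellipsoid given by a Mahalanobis-distance cut on a
    diagonal-covariance Gaussian."""
    def holds(rel_pose):
        d2 = sum(((x - m) / s) ** 2
                 for x, m, s in zip(rel_pose, mean, std))
        return math.sqrt(d2) <= threshold
    return holds

# Hypothetical invented predicate "grasp_aligned(gripper, block)":
# relative position clustered near (0, 0, 0.05) m with ~1 cm spread.
grasp_aligned = make_region_predicate(mean=(0.0, 0.0, 0.05),
                                      std=(0.01, 0.01, 0.01))

print(grasp_aligned((0.005, -0.002, 0.055)))  # True: inside the region
print(grasp_aligned((0.10, 0.0, 0.05)))       # False: ~10 std devs off in x
```

Such an indicator can then serve directly as a symbolic precondition or goal condition for skill composition while remaining grounded in continuous state.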

2.3 Neural and Neuro-Symbolic Approaches

Predicate invention extends to the neural domain, where predicates are emergent representations of invariant structure, learned by mechanisms such as intersective comparison and representational binding (Martin et al., 2018). In DORA, predicates are learned as intersections across neural activation patterns and bound to arguments via oscillatory rhythms. These invented neural predicates enable composition and extrapolation, matching relational reasoning observed in humans.

2.4 Visual and Multimodal Predicate Invention

Recent advances use vision–language models (VLMs) to propose, score, or ground predicates directly from pixels, enabling zero-shot generalization in visual planning tasks (Athalye et al., 2024). The process involves:

  • Enumerating candidate natural-language-named predicates via GPT-style prompting.
  • Noisily labeling ground atoms with VLMs.
  • Hill-climbing or surrogate-loss-based selection (e.g., planning cost, abstract search nodes).
  • Integrating selected predicates into PDDL-like domains and learning operator schemas.
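The selection step in this pipeline can be sketched as greedy hill-climbing over predicate subsets under a surrogate objective. Everything below is a toy mock-up: the candidate names, usefulness scores, and cost function stand in for VLM labeling and planning-time estimates, which the real systems compute from data.

```python
CANDIDATES = ["on(a,b)", "clear(a)", "holding(a)", "shiny(a)", "near(a,b)"]

# Toy surrogate: abstraction error drops when useful predicates are kept,
# while each extra predicate enlarges the abstract search space.
USEFULNESS = {"on(a,b)": 5.0, "clear(a)": 3.0, "holding(a)": 2.0,
              "shiny(a)": 0.1, "near(a,b)": 0.2}

def surrogate_cost(subset, base_error=12.0, per_predicate=1.0):
    error = base_error - sum(USEFULNESS[p] for p in subset)
    return max(error, 0.0) + per_predicate * len(subset)

def hill_climb(candidates):
    """Greedily add the single predicate that most reduces the cost;
    stop when no addition improves it."""
    selected = set()
    cost = surrogate_cost(selected)
    while True:
        best_p, best_c = None, cost
        for p in set(candidates) - selected:
            c = surrogate_cost(selected | {p})
            if c < best_c:
                best_p, best_c = p, c
        if best_p is None:
            return selected
        selected.add(best_p)
        cost = best_c

print(sorted(hill_climb(CANDIDATES)))
# ['clear(a)', 'holding(a)', 'on(a,b)'] — noisy, low-value candidates
# ('shiny', 'near') never pay for their per-predicate cost.
```

This is why planning-centric surrogate objectives double as a noise filter: a hallucinated predicate that does not reduce abstraction error is simply never selected.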

3. Empirical Performance, Applications, and Impact

Predicate invention is critical for scaling symbolic and neuro-symbolic learning in complex domains, supporting:

  • Recursive and hierarchical program induction: In ILP, invented predicates enable learning recursive, multi-clause definitions with logarithmic scaling in task parameter size (e.g., $k$-step planning tasks, list indexing) (Cropper et al., 2021, Mota et al., 2019).
  • Robot planning and manipulation: Bilevel planners with invented predicates achieve 98–100% success in long-horizon robotic tasks, outperforming bisimulation, inverse planning, and hand-designed abstractions (Silver et al., 2022, Shao et al., 2 Oct 2025, Yang et al., 22 Nov 2025). Planning latency is reduced by factors of up to 10, and data efficiency improves (e.g., >90% success from only 50 demonstrations).
  • Neuro-symbolic RL agents: Predicate invention enables logic-based policies with full symbolic justification and interpretability, achieving or exceeding human-level performance in game environments with minimal background knowledge. Pruning of candidates based on necessity and sufficiency leads to compact, performance-aligned predicate sets (Sha et al., 2024).
  • Program verification and code synthesis: Inductive synthesis of separation-logic predicates yields property-rich specifications for heap data structures, extending to sorting, balanced trees, and facilitating correct-by-construction code (Yang et al., 20 Feb 2025).
  • Generalization from pixels and sensory streams: Invented visual predicates support solving tasks of much longer horizon and complexity than seen at training time, by enabling abstract search over symbolic models derived from raw images (Athalye et al., 2024).

4. Limitations, Open Problems, and Design Considerations

Despite substantial progress, predicate invention faces significant open challenges:

  • Expressivity vs. search complexity: Grammar-based and symbolic systems are bounded by the language bias or predicate template space. Current grammars may not support arbitrary physics, continuous constraints, or multi-object relations without substantial extension (Silver et al., 2022, Shao et al., 2 Oct 2025).
  • Data dependence and discovery guarantees: Many approaches rely on demonstration coverage or fixed clustering schemes (e.g., number of GMM modes), which may miss semantically essential predicates. Automatic determination of the correct predicate cardinality and expressivity remains an open question (Shao et al., 2 Oct 2025).
  • Robustness to noise and ambiguity: Vision–language model labeling is error-prone; robust subset selection and planning-centric objectives are needed to mitigate hallucinations and spurious predicate inclusion (Athalye et al., 2024, Wang et al., 19 Dec 2025).
  • Integration with controller and skill learning: Most frameworks assume a decoupled pipeline (predicates, operators, skills), precluding mutual shaping between symbolic abstractions and low-level skill policies (Silver et al., 2022).
  • Planner dependency and transfer: Invented predicates are often tailored to a specific planning algorithm or heuristic (e.g., A* + LMCut), potentially limiting transferability (Silver et al., 2022).

5. Advances in Predicate Invention across Domains

The field has seen a proliferation of new frameworks, architectures, and evaluation methodologies:

| System/Framework | Predicate Representation | Learning Modality |
| --- | --- | --- |
| Poppi (Cropper et al., 2021) | Symbolic logic programs | ASP-based ILP, learning from failures |
| Amao/NeMuS (Mota et al., 2019) | Weighted graphs, anti-unification | Inductive clause learning |
| Sippy (Yang et al., 20 Feb 2025) | Separation-logic heap predicates | Positive-only ILP |
| DORA (Martin et al., 2018) | Neural activations, intersections | Hebbian, oscillatory |
| Bilevel Planner (Silver et al., 2022) | Symbolic predicates & classifiers | Grammar, surrogate planning cost |
| SymSkill (Shao et al., 2 Oct 2025) | Gaussian clusters in SE(3) | Unsupervised from demos |
| SkillWrapper (Yang et al., 22 Nov 2025) | VLM-proposed predicates on images | Active, model-driven |
| UniPred (Wang et al., 19 Dec 2025) | LLM- and NN-proposed, grounded | Bilevel, LLM + SGD |
| pix2pred (Athalye et al., 2024) | VLM-labeled, hill-climb-selected | Visual planning cost |
| EXPIL (Sha et al., 2024) | Distance/direction, necessity/sufficiency | Replay-buffer, actor–critic RL |

These systems collectively highlight a trend toward modular, planner-aware, and foundation-model-augmented predicate invention, bridging purely symbolic, neural, and multimodal paradigms.

6. Theoretical Guarantees and Evaluation Metrics

Provable guarantees achieved in recent predicate invention frameworks include:

Evaluation metrics encompass:

7. Broader Implications and Future Directions

Predicate invention remains essential for scalable, explainable, and data-efficient learning in symbolic, neuro-symbolic, and robotics domains. Ongoing directions include:

  • Integrating predicate invention with active data collection and lifelong learning.
  • Generalizing from structured symbolic predicates to continuous, multimodal, or non-logical abstractions.
  • Extending foundation-model-driven hypothesis spaces to skills and higher-order operators.
  • Formalizing transferability of invented predicates across planners, domains, and tasks.
  • Developing more general frameworks for abstraction learning that unify language guidance, data-driven grounding, and planning-centric evaluation (Wang et al., 19 Dec 2025).

The continuing interplay between logic-based, neural, and multimodal approaches is driving significant progress, but key challenges of expressivity, efficiency, and robustness persist. Predicate invention will remain a central topic in computational abstractions and the construction of interpretable, transferable world models.
