- The paper introduces yILP, a fully differentiable ILP pipeline that grounds visual data into symbolic logic through differentiable clustering, invents predicates when relations are absent, and uses LLMs to translate the invented predicates into natural language.
- The paper reports precision and recall of 1 when inducing rules from relational image data with up to three variables, outperforming state-of-the-art multimodal LLMs (Gemini 2.5 Pro, GPT-5) on these tasks.
- The paper highlights advances in neuro-symbolic reasoning by enabling predicate invention from pure image data and outlining future directions for multimodal inputs.
Inductive First-Order Rule Learning from Visual Perceptual Data: The yILP Framework
Motivation and Problem Statement
Inductive rule learning is central to interpretable and explainable AI: it underpins trustworthy automated systems and strengthens reasoning in multimodal contexts. Traditional inductive logic programming (ILP) frameworks are designed for symbolic, relational data, yet in real-world settings images increasingly serve as the constants of knowledge graphs and other relational structures. A critical challenge in this setting is symbol grounding: mapping visual representations to symbolic variables for logical reasoning without explicit labels, which avoids label leakage. Furthermore, when relational descriptions are absent, predicate invention (generating novel relations that characterize image-based concepts) is required.
Methodological Framework: yILP
The paper introduces yILP, a fully differentiable ILP pipeline that operates on image constants, supporting both cases where relations are predefined (relational image datasets) and where they are absent (pure image datasets, e.g., Kandinsky patterns). The methodological advances are threefold:
- Latent Space Generalization via Differentiable Clustering: Image constants are embedded by pretrained encoders (ViT, VAE), with clustering serving as the generalization function that maps specific constants to cluster centroids corresponding to logical variables. Clustering is implemented through a differentiable approach, optimizing a soft assignment objective on GPU.
- Differentiable Substitution and Rule Induction: The substitution mechanism is also fully differentiable and runs at batch scale as tensor operations, making efficient use of the GPU. For predefined relations, substitution connects cluster centroids according to positive and negative examples, introducing language bias for forward chaining when necessary. For undefined relations, variable constraints tie the number of logic variables directly to the number of clusters, and body atoms are grounded using cluster centroids.
- Predicate Invention and Semantic Translation via LLMs: yILP invents predicates when relational structure is absent, representing these with placeholders in first-order logic rules. Semantic interpretation of these predicates is accomplished by querying LLMs, which translate the visual semantics underlying cluster variables into natural language format.
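The clustering-based generalization step can be illustrated with soft k-means, a standard differentiable relaxation of k-means. This is a minimal sketch under stated assumptions: the paper's exact soft-assignment objective, encoder, and GPU implementation are not reproduced, the helper names (`soft_assign`, `soft_kmeans`) are hypothetical, and the two-dimensional points stand in for ViT or VAE embeddings of image constants. Each centroid plays the role of a logical variable, and each row of the assignment matrix softly maps one image constant onto those variables.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(points: List[Vector], centroids: List[Vector], beta: float) -> Matrix:
    """Row i is softmax(-beta * ||x_i - c_k||^2) over clusters k.

    Every entry is a smooth function of embeddings and centroids, so in an
    autodiff framework gradients can flow through the assignment.
    """
    rows = []
    for p in points:
        logits = [-beta * sq_dist(p, c) for c in centroids]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def soft_kmeans(points: List[Vector], init: List[Vector],
                beta: float = 5.0, iters: int = 50) -> Tuple[List[Vector], Matrix]:
    """Alternate soft assignments and responsibility-weighted centroid updates."""
    centroids = [list(c) for c in init]
    dim = len(points[0])
    for _ in range(iters):
        resp = soft_assign(points, centroids, beta)
        for k in range(len(centroids)):
            weights = [row[k] for row in resp]
            mass = sum(weights)
            if mass > 0.0:  # keep the old centroid if a cluster receives no mass
                centroids[k] = [sum(w * p[d] for w, p in zip(weights, points)) / mass
                                for d in range(dim)]
    return centroids, soft_assign(points, centroids, beta)


# Toy "embeddings" of six image constants forming two well-separated groups.
embeddings = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centroids, resp = soft_kmeans(embeddings, init=[[1.0, 1.0], [4.0, 4.0]])
labels = [max(range(len(row)), key=row.__getitem__) for row in resp]
print(labels)  # [0, 0, 0, 1, 1, 1]: constant -> variable X0 or X1
```

The temperature `beta` is an assumed knob: larger values push assignments toward hard cluster membership, smaller values keep them soft and gradients smoother.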
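One way to picture batch-scale differentiable substitution is to lift ground facts over image constants to soft facts over cluster variables, using outer products of soft-assignment rows. This is an illustration under assumptions, not the paper's mechanism: `lift` and `best_variable_pair` are hypothetical names, the assignment matrix is hand-written, and scoring a variable pair by positive minus negative lifted mass is only one simple choice. Because the lifted matrix is built from products and sums of soft assignments, it stays differentiable and maps onto batched tensor operations (for example an `einsum` in a tensor library).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lift(pairs: List[Tuple[int, int]], resp: Matrix) -> Matrix:
    """Lift ground pairs (a, b) of image constants to a K x K matrix over variables.

    Entry [i][j] accumulates resp[a][i] * resp[b][j], the soft degree to which
    the ground pair instantiates the variable pair (X_i, X_j).
    """
    k = len(resp[0])
    lifted = [[0.0] * k for _ in range(k)]
    for a, b in pairs:
        for i in range(k):
            for j in range(k):
                lifted[i][j] += resp[a][i] * resp[b][j]
    return lifted


def best_variable_pair(pos: List[Tuple[int, int]], neg: List[Tuple[int, int]],
                       resp: Matrix) -> Tuple[Tuple[int, int], float]:
    """Pick the variable pair with the largest positive-minus-negative lifted mass."""
    p, n = lift(pos, resp), lift(neg, resp)
    k = len(resp[0])
    scores = {(i, j): p[i][j] - n[i][j] for i in range(k) for j in range(k)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Soft assignments of four image constants to two cluster variables X0, X1.
resp = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
positives = [(0, 2), (1, 3)]  # target(c0, c2), target(c1, c3)
negatives = [(2, 0)]          # not target(c2, c0)
pair, score = best_variable_pair(positives, negatives, resp)
print(pair, round(score, 2))  # (0, 1) 1.61
```

Read as a lifted atom, the winning pair suggests `target(X0, X1)`: constants grouped under X0 relate to constants grouped under X1, while the reverse direction is penalized by the negative example.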
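Placeholder predicates and their later translation can be sketched as plain string manipulation. Everything in this sketch is hypothetical: the `inv_k` naming scheme, the rule syntax, the `in(A, I)` membership atoms, the prompt wording, and the `names` mapping standing in for an LLM's answer are illustrative assumptions, and no particular LLM API is implied.

```python
import re
from typing import Dict, List


def render_rule(head: str, body: List[str]) -> str:
    """Render a first-order rule whose body may contain placeholder predicates."""
    return f"{head} :- {', '.join(body)}."


def translation_prompt(placeholder: str, examples: List[str]) -> str:
    """Build a text prompt asking an LLM what an invented predicate captures.

    `examples` stands in for whatever description of the grounded image
    constants is actually sent to the model.
    """
    shown = "\n".join(f"- {e}" for e in examples)
    return (f"The predicate {placeholder} holds for these object pairs:\n{shown}\n"
            "Describe in a few words the visual relation they share.")


def rename_predicates(rule: str, names: Dict[str, str]) -> str:
    """Replace placeholder predicates with translated names; unknown ones stay as-is."""
    return re.sub(r"\binv_\d+\b", lambda m: names.get(m.group(0), m.group(0)), rule)


rule = render_rule("target(I)", ["in(A, I)", "in(B, I)", "inv_0(A, B)"])
print(translation_prompt("inv_0", ["img_1, img_4", "img_2, img_7"]))
# Stand-in for the LLM's answer; in practice this mapping would come from the model.
names = {"inv_0": "same_shape_different_color"}
print(rename_predicates(rule, names))
# target(I) :- in(A, I), in(B, I), same_shape_different_color(A, B).
```

Keeping the placeholder in the rule until a translation arrives means the rule remains valid and executable even if the LLM step is skipped or fails.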
Experimental Evaluation
Relational Image Datasets
yILP was evaluated on classical ILP datasets adapted to relational image settings, with constants replaced by MNIST images and relations encoded in text or random strings to prevent label leakage. Encoders used included VAE and ViT. Strong numerical results were obtained: yILP achieves both precision and recall of 1 on standard tasks requiring up to three variables, outperforming state-of-the-art multimodal LLMs (Gemini 2.5 Pro, GPT-5), which failed to induce complete rules in certain tasks when the relation semantics were obfuscated.
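Precision and recall here measure how well the facts entailed by an induced rule match the labeled examples. A minimal sketch, assuming a hypothetical kinship-style chain rule `target(X, Z) :- parent(X, Y), parent(Y, Z)` and string identifiers standing in for MNIST image constants; neither the rule nor the identifiers are taken from the paper.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def apply_chain_rule(parent: Set[Pair]) -> Set[Pair]:
    """Heads entailed by target(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y1) in parent for (y2, z) in parent if y1 == y2}


def precision_recall(predicted: Set[Pair], positives: Set[Pair]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |positives| (0.0 for empty sets)."""
    tp = len(predicted & positives)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall


parent = {("img_a", "img_b"), ("img_b", "img_c"), ("img_b", "img_d")}
predicted = apply_chain_rule(parent)
print(precision_recall(predicted, {("img_a", "img_c"), ("img_a", "img_d")}))  # (1.0, 1.0)
print(precision_recall(predicted, {("img_a", "img_c"), ("img_a", "img_e")}))  # (0.5, 0.5)
```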
For tasks such as Fizz and Buzz, requiring rules with four or six variables, yILP's search space proved intractable under time constraints, resulting in incomplete rule induction.
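A rough count shows why adding variables can make rule search intractable. The numbers rest on a deliberately simplified assumption that is not the paper's search procedure: candidate body atoms are all groundings of two binary predicates over the rule's variables, and a body picks as many distinct atoms as there are variables. `num_bodies` is a hypothetical helper.

```python
from math import comb


def num_bodies(num_preds: int, num_vars: int, body_len: int, arity: int = 2) -> int:
    """Count distinct bodies of `body_len` atoms drawn from all variable groundings."""
    candidate_atoms = num_preds * num_vars ** arity
    return comb(candidate_atoms, body_len)


for v in (3, 4, 6):
    print(v, num_bodies(num_preds=2, num_vars=v, body_len=v))
# 3 816
# 4 35960
# 6 156238908
```

Even this crude count grows by more than five orders of magnitude from three to six variables, before accounting for recursion or any language bias a real system would apply.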
Pure Image Data: Kandinsky Patterns and Predicate Invention
yILP demonstrates robust predicate invention on Kandinsky pattern datasets, where relations among image constants are undefined and must be invented. Its classification accuracy surpasses that of vision-only and propositional rule learners, reaching up to 1.0 on the one-red and one-triangle tasks; accuracy on the two-pair pattern is slightly lower because of its greater combinatorial complexity. Using LLMs as translators, yILP recovers and interprets predicate semantics, for example "same shape, different color" for two-pair patterns, "color in red" for one-red, and "shape in triangle" for one-triangle.
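Once translated, the invented predicates read as ordinary executable tests. A minimal sketch, assuming objects already carry symbolic shape and color labels (in yILP these would come from cluster variables over image embeddings rather than given labels), reading one-red as "contains a red object" and using only the "same shape, different color" relation for two-pair; the full two-pair pattern may involve further conditions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    shape: str
    color: str


def same_shape_different_color(a: Obj, b: Obj) -> bool:
    return a.shape == b.shape and a.color != b.color


def has_pair(objs: List[Obj], rel: Callable[[Obj, Obj], bool]) -> bool:
    """True if some unordered pair of distinct objects satisfies a symmetric relation."""
    return any(rel(a, b) for a, b in combinations(objs, 2))


def one_red(objs: List[Obj]) -> bool:
    return any(o.color == "red" for o in objs)


def one_triangle(objs: List[Obj]) -> bool:
    return any(o.shape == "triangle" for o in objs)


scene = [Obj("circle", "red"), Obj("circle", "blue"), Obj("square", "yellow")]
print(one_red(scene), one_triangle(scene), has_pair(scene, same_shape_different_color))
# True False True
```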
Comparative Analysis
yILP outperforms propositional rule learners such as RIPPER-ViT and C4.5-ViT, whose propositional rules offer limited interpretability where invented relational predicates are needed. Reasoning-capable LLMs perform well on simpler tasks but struggle with rules involving more complex relational structure, such as two-pair patterns. yILP's performance is also stable across hyperparameter configurations and training seeds.
Practical and Theoretical Implications
The fully differentiable architecture of yILP enables end-to-end training and rule extraction from neural networks, scaling ILP to high-dimensional visual domains. By removing rule learning's dependence on symbolic labels and supporting unsupervised predicate invention, yILP lays a foundation for explainable neuro-symbolic reasoning in domains where explicit annotation is unavailable or impractical. Using LLMs to translate the semantics of invented predicates bridges perceptual and conceptual reasoning and makes the learned rules more interpretable in practice.
Theoretically, yILP offers a general template for multimodal ILP, illustrating the expressiveness and scalability of differentiable logic programming in settings where symbol grounding and predicate invention are paramount. The approach also exposes current limitations in inducing long rules and complex invented relations, especially as relational arity increases.
Future Directions
Future work should address spatial reasoning over images, extend yILP to multimodal inputs (e.g., text-image combinations), and introduce tailored language bias or logical templates to enable induction of longer or more complex rules. Dynamic clustering approaches that accommodate concepts evolving over time in image-rich domains also warrant further research.
Conclusion
yILP is a significant step in neuro-symbolic rule learning from visual perceptual data, combining differentiable clustering, differentiable substitution, and predicate invention. It achieves high precision and recall across diverse datasets, notably extending ILP to domains that lack symbolic labels. The framework’s scalability and the semantic interpretability provided by LLMs make it a solid foundation for future advances in explainable AI and multimodal reasoning (2604.07897).