- The paper introduces a differentiable forward chaining system that learns symbolic rules from noisy data by minimizing cross-entropy loss.
- The methodology combines rule templates with gradient descent to achieve data efficiency and maintain performance despite up to 20% mislabeling.
- The approach integrates ILP with convolutional networks to process ambiguous inputs, paving the way for advanced AI systems with explicit reasoning.
Learning Explanatory Rules from Noisy Data
The paper "Learning Explanatory Rules from Noisy Data" by Evans and Grefenstette introduces a novel approach in the domain of Inductive Logic Programming (ILP) by developing a differentiable inductive logic framework (∂ILP). Traditional ILP systems, while data-efficient, struggle with noisy or ambiguous inputs and cannot be applied directly to non-symbolic domains such as raw pixel data. In contrast, neural networks handle noise well but require extensive data, and their implicit models are often opaque. This work bridges the gap by integrating the robustness of neural networks with the explicit rule-learning of ILP.
Key Contributions
The central innovation is a differentiable forward chaining system that learns symbolic rules from noisy data. The framework recasts rule induction as a continuous optimization problem solved by gradient descent, minimizing a cross-entropy loss between predicted and target atom truth values, and offers:
- Data Efficiency: The system learns from few training examples because its rule templates sharply constrain the hypothesis space.
- Robustness to Noise: Unlike traditional ILP, this approach tolerates mislabeled examples, maintaining performance with up to 20% mislabeling.
- Handling Ambiguity: It processes unclear inputs, exemplified by tasks over raw pixels handled by stacking the framework on convolutional neural networks.
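The noise-tolerance property above comes directly from using cross-entropy as the training signal: a rule that fits most examples still scores better than a wrong rule, even when some labels are flipped. A minimal toy illustration (the setup and numbers are invented for exposition, not taken from the paper):

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-9):
    # Mean binary cross-entropy between soft predictions and 0/1 labels.
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# Five examples; the "good" rule predicts high truth values exactly where
# the clean labels are 1, the "bad" rule fires on the wrong examples.
good_rule = np.array([0.95, 0.95, 0.95, 0.95, 0.05])
bad_rule  = np.array([0.05, 0.95, 0.05, 0.95, 0.95])
clean     = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
noisy     = np.array([0.0, 1.0, 1.0, 1.0, 0.0])  # first label flipped (20% mislabeling)

# Even under the flipped label, the correct rule incurs lower loss,
# so gradient descent still steers toward it.
assert cross_entropy(good_rule, noisy) < cross_entropy(bad_rule, noisy)
```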
Methodology
The model employs a two-stage process: generating candidate clauses from templates, then learning weights over these clauses by backpropagation, as in neural network training. It thus combines the symbolic clarity of ILP with the robustness to error typical of neural nets.
- A language frame defines possible predicates and constants.
- A program template constrains rule generation while allowing recursive and auxiliary predicates.
- Each clause's weight determines its contribution during inference; inference is implemented through matrix operations so that the entire pipeline remains differentiable.
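The clause-weighting stage can be sketched in miniature as follows. This is a heavily simplified toy, not the paper's implementation: two candidate clauses for a predicate `q/1`, a softmax over their logits mixing the clauses' conclusions, and gradient descent on cross-entropy (gradients here are finite differences for brevity; ∂ILP uses backpropagation).

```python
import numpy as np

# Constants {a, b, c}; background facts p(a), p(b), r(c); target q = p.
p_val    = np.array([1.0, 1.0, 0.0])
r_val    = np.array([0.0, 0.0, 1.0])
target_q = np.array([1.0, 1.0, 0.0])

# Conclusions of one forward-chaining step for each candidate clause:
#   clause 0: q(X) :- p(X)     clause 1: q(X) :- r(X)
clause_conclusions = np.stack([p_val, r_val])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(logits):
    weights = softmax(logits)
    pred = weights @ clause_conclusions          # weighted mixture of conclusions
    pred = np.clip(pred, 1e-9, 1 - 1e-9)
    return -np.mean(target_q * np.log(pred) + (1 - target_q) * np.log(1 - pred))

logits = np.zeros(2)
for _ in range(300):
    grad = np.zeros_like(logits)
    for j in range(2):                           # finite-difference gradient
        d = np.zeros_like(logits)
        d[j] = 1e-5
        grad[j] = (loss(logits + d) - loss(logits - d)) / 2e-5
    logits -= 1.0 * grad

print(softmax(logits))  # weight on the correct clause (index 0) approaches 1
```

After training, nearly all the probability mass sits on clause 0, i.e. the system has "extracted" the symbolic rule q(X) :- p(X) from the weighted mixture.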
Successful solutions often require the model to synthesize complex, sometimes recursive, invented predicates, integrating both extensional and intensional learning. Example tasks include graph-based reasoning, such as computing the transitive closure of a graph, and numerical reasoning, such as even/odd number detection, both of which demand recursive predicate learning.
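To make the recursive case concrete, here is a sketch of how a recursive rule such as `path(X,Y) :- edge(X,Y)` together with `path(X,Y) :- edge(X,Z), path(Z,Y)` can be evaluated by repeated soft forward-chaining steps over truth values in [0, 1]. The fuzzy operators (product for conjunction, max for existential quantification and amalgamation) are one common choice; the concrete graph is invented for illustration.

```python
import numpy as np

# Soft adjacency matrix edge(X, Y) over constants {a, b, c, d}.
edge = np.array([
    [0.0, 0.9, 0.0, 0.0],   # edge(a, b) = 0.9
    [0.0, 0.0, 0.8, 0.0],   # edge(b, c) = 0.8
    [0.0, 0.0, 0.0, 1.0],   # edge(c, d) = 1.0
    [0.0, 0.0, 0.0, 0.0],
])

path = edge.copy()           # base clause: path(X,Y) :- edge(X,Y)
for _ in range(4):           # T forward-chaining steps bound derivation depth
    # Recursive clause: take max over Z of edge(X,Z) * path(Z,Y).
    derived = np.max(edge[:, :, None] * path[None, :, :], axis=1)
    path = np.maximum(path, derived)   # amalgamate old and new conclusions

print(path[0, 3])  # path(a, d) ≈ 0.9 * 0.8 * 1.0 = 0.72
```

Because each step is built from differentiable (or subdifferentiable) operations, gradients can flow through the entire chain of deductions back to the clause weights.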
Experimental Evaluations
The model was evaluated on 20 symbolic ILP benchmark tasks, including arithmetic operations and family-tree inferences that demand understanding recursive dependencies. The learning system demonstrated robust performance, consistently identifying correct logic structures despite noise and ambiguity.
Implications and Future Prospects
The ability to interface with perceptual systems like convolutional networks positions this framework well for future advancements in AI, especially in domains requiring symbolic reasoning integrated with sensory data processing.
Potential future developments include reducing the dependency on hand-engineered templates through automated template discovery, lowering memory consumption to scale to larger program spaces, and further extending recursive and multi-predicate learning capabilities.
By presenting a hybrid approach leveraging neural and symbolic benefits, this work significantly contributes to the field of AI by proposing a scalable, efficient, and robust framework for rule-based learning in noisy environments. This approach paves the way for future AI systems capable of both perceiving and reasoning within complex and dynamic environments.