
Differentiable FOL with LTNs in Neuro-Symbolic AI

Updated 25 February 2026
  • The paper introduces a differentiable framework for FOL with LTNs, integrating neural models with logical reasoning.
  • It leverages fuzzy logic and log-space techniques to ensure stable optimization and effective aggregation of logical constraints.
  • Applications include zero-shot image classification and medical segmentation, demonstrating improved performance in neuro-symbolic tasks.

Differentiable first-order logic with Logic Tensor Networks (LTNs) unifies learning and reasoning by grounding the syntax and semantics of first-order logic—constants, variables, functions, predicates, connectives, and quantifiers—as differentiable operations in neural architectures. Through fully differentiable "Real Logic" semantics, LTNs enable the optimization of logical knowledge bases and support applications in neuro-symbolic AI such as zero-shot image classification and medical semantic segmentation. PROTOtypical LTNs (PROTO-LTNs) extend this paradigm, providing parameter-efficient, prototype-based class groundings suitable for zero- and few-shot learning. This approach underpins a broad range of tasks where integrating symbolic knowledge and data-driven learning is essential.

1. Real Logic and LTNs: Semantics of Differentiable FOL

Logic Tensor Networks implement "Real Logic," mapping all elements of classical first-order logic to real-valued, continuous, and differentiable functions (Badreddine et al., 2020). Groundings are defined as follows:

  • Constants/Variables: Each symbol $x_i$ is mapped to $\mathcal{G}(x_i) \in \mathbb{R}^n$.
  • Function Symbols: Each $k$-ary function $f$ is mapped as $\mathcal{G}(f): (\mathbb{R}^n)^k \to \mathbb{R}^n$ (commonly a neural network).
  • Predicate Symbols: For each $k$-ary predicate $P$, the grounding is $\mathcal{G}(P): (\mathbb{R}^n)^k \to [0,1]$, typically realized by neural tensor networks or MLPs.

The connective layer replaces classical Boolean operators with differentiable fuzzy logic:

$$\begin{aligned} \mathcal{G}(\phi\land\psi) &= T(\mathcal{G}(\phi),\mathcal{G}(\psi)), \\ \mathcal{G}(\phi\lor\psi) &= S(\mathcal{G}(\phi),\mathcal{G}(\psi)), \\ \mathcal{G}(\neg\phi) &= 1 - \mathcal{G}(\phi), \\ \mathcal{G}(\phi\to\psi) &= S(1-\mathcal{G}(\phi),\,\mathcal{G}(\psi)). \end{aligned}$$

Commonly, $T(a,b) = ab$ (product t-norm) and $S(a,b) = a + b - ab$ (probabilistic sum) (Badreddine et al., 2020, Martone et al., 2022).
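Under the product-t-norm semantics above, the connectives reduce to simple scalar operations on truth degrees in $[0,1]$. A minimal sketch (plain Python, not the API of any LTN library):

```python
def t_and(a, b):
    """Product t-norm: fuzzy conjunction of truth degrees in [0, 1]."""
    return a * b

def s_or(a, b):
    """Probabilistic sum (S-norm): fuzzy disjunction."""
    return a + b - a * b

def neg(a):
    """Standard fuzzy negation."""
    return 1.0 - a

def implies(a, b):
    """Implication grounded as S(1 - a, b), per the table of connectives."""
    return s_or(neg(a), b)
```

At the Boolean corners these operators recover classical logic (e.g. `implies(1.0, 0.0)` is `0.0`), while interior values stay differentiable for gradient-based learning.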

Quantifiers are interpreted as aggregators:

  • $\forall$ ("for all") as a generalized $p$-mean close to a product,
  • $\exists$ ("there exists") as a generalized mean or smoothed max.

Letting $a_1, \dots, a_n$ be the groundings of $\phi$ over the quantification domain, quantifier aggregation can be:

$$\mathcal{G}(\forall x\,\phi(x)) = 1 - \left(\frac{1}{n}\sum_{i=1}^{n}(1-a_i)^p\right)^{1/p}, \qquad \mathcal{G}(\exists x\,\phi(x)) = \left(\frac{1}{n}\sum_{i=1}^{n} a_i^p\right)^{1/p}.$$
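These generalized-mean aggregators can be sketched directly (a minimal illustration of $p$-mean quantifier semantics; the parameter `p` and function names are chosen here for exposition):

```python
def forall(truths, p=2):
    """'For all' as a p-mean of errors: 1 - (mean((1 - a_i)^p))^(1/p)."""
    n = len(truths)
    return 1.0 - (sum((1.0 - a) ** p for a in truths) / n) ** (1.0 / p)

def exists(truths, p=2):
    """'There exists' as a generalized p-mean, a smooth stand-in for max."""
    n = len(truths)
    return (sum(a ** p for a in truths) / n) ** (1.0 / p)
```

Larger `p` makes `forall` stricter (closer to min) and `exists` sharper (closer to max), which is why `p` is often annealed during training.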

A knowledge base $\mathcal{K} = \{\phi_1, \dots, \phi_m\}$ of fuzzy formulas is globally satisfied to the extent

$$\mathrm{Sat}_{\mathcal{G}}(\mathcal{K}) = \mathrm{Agg}_{\phi\in\mathcal{K}}\,\mathcal{G}(\phi),$$

where $\mathrm{Agg}$ is a differentiable aggregator (e.g., a $p$-mean or t-norm over the formulas).

Learning is formulated as maximizing this satisfaction, with regularization (e.g., an $L^2$ penalty) on the parameters.

2. logLTN and Differentiable Fuzzy Logic in the Logarithm Space

logLTN (Badreddine et al., 2023) addresses numerical instability and gradient vanishing/exploding issues by grounding all connectives and quantifiers in the log domain:

  • Atoms: If $p(x)$ is a sigmoid/softmax output for predicate $P$, the log-truth $\log p(x)$ is used directly.
  • Conjunction: sum of log-truths, $\log\mathcal{G}(\phi\land\psi) = \log\mathcal{G}(\phi) + \log\mathcal{G}(\psi)$.
  • Disjunction: a LogSumExp-style smooth maximum over log-truths.
  • Negation: Atom-level with exact closed forms for log-sigmoid and log-softmax.
  • Universal quantifier: Mean of log-truths (ensuring batch-size invariance).
  • Existential quantifier: LogMeanExp (LME), providing a tight smooth approximation to max, distributing gradients stably.

The mean over log-truths for $\forall$ and LME for $\exists$ prevent underflow/overflow and maintain efficient backpropagation. Empirical benchmarks demonstrate significantly improved stability and tighter theoretical bounds compared to traditional product and sum-based aggregators.
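The two log-space aggregators can be sketched as follows (a self-contained illustration of mean-of-logs and LogMeanExp; not taken from the logLTN codebase):

```python
import math

def log_forall(log_truths):
    """Universal quantifier in log space: mean of log-truths.

    Dividing by n makes the value invariant to batch size, unlike a raw sum.
    """
    return sum(log_truths) / len(log_truths)

def log_exists(log_truths):
    """LogMeanExp (LME): a tight, smooth approximation of max in log space.

    Subtracting the max before exponentiating avoids overflow/underflow.
    """
    m = max(log_truths)
    return m + math.log(sum(math.exp(x - m) for x in log_truths) / len(log_truths))
```

Note that LME is sandwiched between `max - log(n)` and `max`, so its gradient is spread over all elements near the maximum rather than concentrated on a single one.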

3. LTNs in Practice: Model Architecture and Optimization

A typical LTN model graph includes:

  1. Grounding variables as real vectors/tensors for the given domain.
  2. Neural networks for predicate and function symbols.
  3. Fuzzy connective layers compute $\land$, $\lor$, $\neg$, $\to$ element-wise over outputs.
  4. Quantifier layers aggregate (via $p$-means or log-space operators) over variable axes.
  5. The global satisfaction is aggregated (e.g., via a t-norm or mean), and the final loss is typically:

$$\mathcal{L} = 1 - \mathrm{Sat}_{\mathcal{G}}(\mathcal{K}),$$

i.e., one minus the overall satisfaction of the knowledge base.

Gradients propagate through all layers, including logic operators and aggregators, enabling joint end-to-end learning of neural parameters and logical satisfaction (Badreddine et al., 2020, Badreddine et al., 2023).
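A toy end-to-end loop illustrating this idea: a predicate $P$ grounded as a sigmoid of a single learned weight, one axiom $\forall x\, P(x)$, and gradient ascent on its satisfaction. Finite differences stand in for autograd so the sketch stays self-contained; all names and hyperparameters are illustrative assumptions, not the reference implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def satisfaction(w, xs, p=2):
    """Satisfaction of the axiom: forall x. P(x), with P(x) = sigmoid(w * x)."""
    truths = [sigmoid(w * x) for x in xs]
    n = len(truths)
    return 1.0 - (sum((1.0 - a) ** p for a in truths) / n) ** (1.0 / p)

def train(xs, w=0.0, lr=0.5, steps=200, eps=1e-5):
    """Gradient ASCENT on satisfaction (finite-difference gradient)."""
    for _ in range(steps):
        grad = (satisfaction(w + eps, xs) - satisfaction(w - eps, xs)) / (2 * eps)
        w += lr * grad
    return w
```

Running `train([1.0, 2.0, 3.0])` drives `w` positive so that all three instances are classified as true, raising the axiom's satisfaction from 0.5 toward 1; in a real LTN the same loop runs over neural predicate parameters with backpropagation.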

4. PROTOtypical Logic Tensor Networks for Zero- and Few-Shot Learning

PROTO-LTN (Martone et al., 2022) introduces a paradigm for class-level reasoning in few- and zero-shot visual tasks:

  • Prototype-based Grounding: Each class $c$ is represented as a prototype vector $p_c$, occupying the same space as image embeddings.
  • Few-Shot: Prototypes are computed as means of support embeddings, $p_c = \frac{1}{|S_c|}\sum_{x\in S_c} f_\theta(x)$, where $f_\theta$ maps input images to embeddings and $S_c$ is the support set for class $c$.
  • Zero-Shot: Unseen-class prototypes are derived by embedding semantic attribute vectors through a learned mapping.

The isOfClass predicate, fundamental for semantic interpretation, is grounded by a Gaussian kernel over embedding distances, e.g. $\mathcal{G}(\mathrm{isOfClass})(x, c) = \exp\!\left(-\alpha\,\lVert f_\theta(x) - p_c\rVert^2\right)$. Alternatively, a parameterized similarity function can be learned.
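The prototype computation and the Gaussian-kernel grounding can be sketched as follows (pure Python on list-based vectors; the scale `alpha` is an illustrative assumption):

```python
import math

def prototype(support_embeddings):
    """Class prototype: element-wise mean of the support-set embeddings."""
    n = len(support_embeddings)
    dim = len(support_embeddings[0])
    return [sum(e[d] for e in support_embeddings) / n for d in range(dim)]

def is_of_class(embedding, proto, alpha=1.0):
    """Gaussian-kernel grounding of isOfClass: truth degree in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(embedding, proto))
    return math.exp(-alpha * sq_dist)
```

The kernel returns 1 exactly at the prototype and decays smoothly with squared distance, so the truth degree is differentiable with respect to both the embedding network and the prototype.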

Logical axioms enforce class membership and non-membership constraints for queries. For each episode:

  • Positive axioms: each query example must be an instance of its true class.
  • Negative axioms: each query must not be an instance of any other class (down-weighted by a factor $\lambda$).

The per-episode differentiable loss aggregates the satisfaction of these axioms, incorporating both data-driven and logical supervision.
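The episode-level aggregation can be sketched as one function (a toy, self-contained illustration: the Gaussian grounding, the down-weighting scheme, and the $p$-mean aggregator are simplified assumptions, not the paper's exact loss):

```python
import math

def is_of_class(x, proto, alpha=1.0):
    """Gaussian-kernel truth degree for class membership."""
    return math.exp(-alpha * sum((a - b) ** 2 for a, b in zip(x, proto)))

def episode_satisfaction(queries, labels, prototypes, lam=0.5, p=2):
    """Aggregate positive axioms and down-weighted negative axioms."""
    pos, neg = [], []
    for q, y in zip(queries, labels):
        for c, proto in enumerate(prototypes):
            t = is_of_class(q, proto)
            if c == y:
                pos.append(t)          # query IS an instance of its true class
            else:
                neg.append(1.0 - t)    # query is NOT an instance of class c

    def forall(truths):
        """p-mean-of-errors aggregator for a universally quantified axiom."""
        n = len(truths)
        return 1.0 - (sum((1.0 - a) ** p for a in truths) / n) ** (1.0 / p)

    # Weighted combination: negative axioms count with weight lam < 1.
    return (forall(pos) + lam * forall(neg)) / (1.0 + lam)
```

The corresponding training loss would be one minus this satisfaction, minimized per episode.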

5. Applications and Empirical Results

5.1 Semantic Image Interpretation and GZSL

In generalized zero-shot learning (GZSL), PROTO-LTN demonstrates performance at or above state-of-the-art embedding-based methods; on AWA2, its unseen-class accuracy and harmonic mean exceed those reported for DEM (Martone et al., 2022). Embedding visualizations via t-SNE confirm that instances and prototypes cluster semantically in the learned metric space.

5.2 Semantic Segmentation with Medical Knowledge

The integration of FOL constraints into medical image segmentation is exemplified by combining LTNs with a SwinUNETR backbone (Bergamin et al., 26 Sep 2025). Background knowledge rules—such as connectivity, non-nesting, and volume similarity—are expressed as FOL formulas and continuously relaxed to yield differentiable loss components. Empirically, LTNs act as soft, anatomically informed regularizers, yielding consistent improvements in Dice coefficient, notably in low-data settings: with only 5% of the training data, SwinUNETR + LTN attains a higher Dice score than SwinUNETR alone.
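As a toy illustration of such a relaxation (not the paper's implementation; the tolerance and the linear decay are assumptions), a volume-similarity rule between paired organs can be turned into a differentiable truth degree computed from predicted soft masks:

```python
def soft_volume(mask_probs):
    """Differentiable organ volume: sum of per-voxel foreground probabilities."""
    return sum(mask_probs)

def volume_similarity(mask_left, mask_right, tol=0.2):
    """Truth degree for 'paired organs have similar volume' (toy relaxation).

    Returns 1.0 when soft volumes match, decaying linearly to 0.0 as their
    relative difference reaches the tolerance.
    """
    vl, vr = soft_volume(mask_left), soft_volume(mask_right)
    rel_diff = abs(vl - vr) / max(vl + vr, 1e-8)
    return max(0.0, 1.0 - rel_diff / tol)
```

One minus this truth degree can then be added to the segmentation loss as a soft anatomical penalty.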

6. Significance, Limitations, and Directions

LTNs and their variants—including logLTN and PROTO-LTN—provide a unified, differentiable backbone for neuro-symbolic AI. Their chief strengths are parameter sharing, end-to-end optimization, and the ability to inject arbitrary, domain-specific, or commonsense background knowledge through differentiable logic. logLTN advances address prior instability and scaling issues, facilitating broader classes of formulas with improved convergence.

A plausible implication is that this approach can generalize seamlessly to new domains, provided predicates and axioms are expressible as neural modules and fuzzy-logic formulas. Nevertheless, constraints on expressiveness may arise if required logical properties cannot be adequately captured by fuzzy groundings. Moreover, practical performance depends on the differentiability and smoothness of chosen relaxations, as demonstrated in ablation studies comparing p-means, t-norms, and log-space semantics (Badreddine et al., 2023).

7. Summary Table: LTN Variants and Key Features

| Framework | Connective Semantics | Quantifier Aggregation | Notable Features |
|---|---|---|---|
| LTN ("Real Logic") (Badreddine et al., 2020) | Product t-norm, S-norm, min/max | Generalized $p$-mean for $\forall$, $\exists$ | General-purpose, modular |
| logLTN (Badreddine et al., 2023) | Log-domain addition, LME | Mean of logs, LME | Numerical stability, batch-invariant, tighter bounds |
| PROTO-LTN (Martone et al., 2022) | Gaussian kernel (prototype) | Regular $p$-mean | Prototype-based, parameter efficient, GZSL/few-shot |

The differentiable first-order logic formalized by LTNs and their extensions enables principled integration of data-driven learning with logical axioms, providing a robust neuro-symbolic framework that supports both knowledge-based reasoning and high-capacity function approximation.
