
Syllogism Completion Task Overview

Updated 19 November 2025
  • The syllogism completion task is defined as determining valid conclusions from quantified premises using formal logical inference and set-theoretic semantics.
  • Hybrid neural-symbolic models and diagrammatic methods enhance efficiency and accuracy in resolving syllogistic reasoning challenges.
  • The task extends to fuzzy, relational, and multi-premise syllogisms, providing insights into reasoning biases and cognitive heuristics in both humans and AI models.

A syllogism completion task is the formal challenge of determining, given a set of categorical or more general quantified premises, which conclusion(s)—if any—necessarily follow(s). This paradigm encompasses classical Aristotelian syllogistics, intermediate quantifiers, fuzzy generalizations, relational syllogistics, and diagrammatic/procedural approaches. Modern treatments extend to computational models, including neural-symbolic architectures, and empirical analyses of syllogistic reasoning in humans and LLMs.

1. Formal Foundations and Classical Categorical Syllogistics

Classically, syllogistic completion operates over syllogisms composed of two premises, each relating two of three terms (A, B, C) by one of four categorical quantifiers:

  • A (universal affirmative): ∀x(A(x)→B(x)),
  • E (universal negative): ∀x(A(x)→¬B(x)),
  • I (particular affirmative): ∃x(A(x)∧B(x)),
  • O (particular negative): ∃x(A(x)∧¬B(x)).

Syllogisms are classified into four figures, depending on the arrangement of the middle term. Of the 64 theoretical moods (4×4 quantifier choices × 4 figures), only 27 license a logically valid conclusion relating A and C; the remainder entail "no valid conclusion" (Eisape et al., 2023, Bertolazzi et al., 17 Jun 2024).

The completion task operates as follows: given two categorical premises (possibly instantiated with content words or pseudowords), enumerate all candidate conclusions relating the two end terms (those not shared between the premises, i.e., excluding the middle term), and select the conclusion(s), if any, entailed under monadic predicate logic or set-theoretic semantics (Eisape et al., 2023, Ozeki et al., 8 Aug 2024).

Formally, let Q = {A, E, I, O} and let P₁, P₂ be premises relating combinations of A, B, C. The inference rules exhaust the set-theoretic entailments and their negations or existential imports as needed. The logic is sound and complete for all classical moods (Eisape et al., 2023, Ozeki et al., 8 Aug 2024).
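
This semantics can be operationalized directly: the truth of any categorical statement depends only on which of the 2³ = 8 Venn regions over A, B, C are inhabited, so checking all 256 inhabitation patterns exhausts the relevant models. The following is a minimal brute-force checker in this spirit (illustrative code, not drawn from the cited papers; names such as `holds` and `entails` are chosen for illustration):

```python
from itertools import product

# A model is the set of inhabited Venn regions; a region is a triple of
# membership flags for the three terms A, B, C.
REGIONS = list(product([0, 1], repeat=3))
TERM_INDEX = {"A": 0, "B": 1, "C": 2}

def holds(statement, model):
    """Evaluate a categorical statement (mood, subject, predicate) in a model."""
    mood, s, p = statement
    si, pi = TERM_INDEX[s], TERM_INDEX[p]
    s_regions = [r for r in model if r[si] == 1]
    if mood == "A":   # universal affirmative: all s are p
        return all(r[pi] == 1 for r in s_regions)
    if mood == "E":   # universal negative: no s are p
        return all(r[pi] == 0 for r in s_regions)
    if mood == "I":   # particular affirmative: some s are p
        return any(r[pi] == 1 for r in s_regions)
    if mood == "O":   # particular negative: some s are not p
        return any(r[pi] == 0 for r in s_regions)
    raise ValueError(mood)

def entails(premises, conclusion, existential_import=False):
    """True iff every model of the premises also satisfies the conclusion."""
    for pattern in product([0, 1], repeat=len(REGIONS)):
        model = [r for r, keep in zip(REGIONS, pattern) if keep]
        if existential_import and not all(
                any(r[i] for r in model) for i in range(3)):
            continue  # skip models in which some term denotes the empty set
        if all(holds(q, model) for q in premises) and not holds(conclusion, model):
            return False
    return True

# Barbara (figure 1): "All B are C", "All A are B"  |=  "All A are C".
print(entails([("A", "B", "C"), ("A", "A", "B")], ("A", "A", "C")))  # True
```

Setting existential_import=True recovers the traditional Aristotelian reading on which every term denotes a non-empty set, which is what licenses the additional "weakened" moods.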

2. Generalizations: Intermediate, Fuzzy, and Relational Syllogistics

Intermediate quantifiers extend the categorical base to include expressions such as "most," "few," "many," and "almost all" (Iero et al., 2018). Such systems organize quantifiers into total orders (affirmative and negative chains) and extend Aristotelian procedures via a monotonicity calculus capable of inferring, for example,

  • from "Most humans are writers" and "All writers are communicators" to "Most humans are communicators," by upward and downward monotonicity schemes and transitivity of quantifier order (see rules MON, IMP). Soundness and completeness hold for the generalized square of opposition.

Fuzzy syllogistic systems utilize crisp, interval-valued, or fuzzy (e.g., trapezoidal) quantifiers, including existential, proportional, exception, and comparative quantifiers (Pereira-Fariña et al., 2014, Pereira-Fariña et al., 2014). The premises are cast as cardinality or linear-fractional constraints over the Boolean algebra of properties on a universe E:

  • Logical quantifiers: Q_all(Y₁, Y₂) ≡ Y₁ ⊆ Y₂,
  • Proportional: a ≤ |Y₁ ∩ Y₂| / |Y₁| ≤ b.

Completion is formulated as a mathematical optimization problem: introduce variables for each "Venn region" of atoms, transcribe premises into (in)equalities, and compute the tightest possible quantifier [a_C, b_C] for the conclusion by solving the induced linear (or fractional) program (Pereira-Fariña et al., 2014). Fuzzy systems reduce any multi-premise, arbitrary-quantifier syllogism to such programs, yielding a fuzzy quantifier by α-cut stacking.
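
For the crisp proportional case with two premises, this reduction can be reproduced with an off-the-shelf LP solver. In the sketch below (an illustration of the general idea, not the cited authors' implementation), region cardinalities are relaxed to non-negative reals and |Y₁| is normalized to 1, which is harmless because the constraints are scale-invariant; two LPs then give the tightest conclusion interval [a_C, b_C]:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

# Variables: relaxed cardinalities of the 8 Venn regions over (Y1, Y2, Y3),
# indexed by membership triples (m1, m2, m3).
REGIONS = list(product([0, 1], repeat=3))

def indicator(pred):
    """Row vector selecting the regions where pred(m1, m2, m3) holds."""
    return np.array([1.0 if pred(*r) else 0.0 for r in REGIONS])

def proportional_bounds(a1, b1, a2, b2):
    """
    Premise 1: a1 <= |Y1 ∩ Y2| / |Y1| <= b1
    Premise 2: a2 <= |Y2 ∩ Y3| / |Y2| <= b2
    Returns the tightest [a_C, b_C] with a_C <= |Y1 ∩ Y3| / |Y1| <= b_C.
    """
    y1  = indicator(lambda m1, m2, m3: m1)
    y2  = indicator(lambda m1, m2, m3: m2)
    y12 = indicator(lambda m1, m2, m3: m1 and m2)
    y23 = indicator(lambda m1, m2, m3: m2 and m3)
    y13 = indicator(lambda m1, m2, m3: m1 and m3)

    # Homogeneous premise constraints written as A_ub @ x <= 0.
    A_ub = np.vstack([a1 * y1 - y12,   # |Y1∩Y2| >= a1 |Y1|
                      y12 - b1 * y1,   # |Y1∩Y2| <= b1 |Y1|
                      a2 * y2 - y23,
                      y23 - b2 * y2])
    b_ub = np.zeros(4)
    A_eq, b_eq = y1.reshape(1, -1), np.array([1.0])   # normalize |Y1| = 1
    bounds = [(0, None)] * len(REGIONS)

    lo = linprog(y13, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=bounds, method="highs")
    hi = linprog(-y13, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=bounds, method="highs")
    return lo.fun, -hi.fun

# "At least 80% of Y1 are Y2" and "All Y2 are Y3" => at least 80% of Y1 are Y3.
print(proportional_bounds(0.8, 1.0, 1.0, 1.0))   # approximately (0.8, 1.0)
```

Under the normalization the fractional objective becomes linear; fully fuzzy quantifiers would require repeating this computation per α-cut, as described above.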

Relational syllogistics admit premises involving binary relations, e.g., "All students read some textbooks." The formal language contains Boolean algebras of set terms and relational terms, with primitives such as (A, B)[R] (“some A are R-related to some B”) and four quantifier patterns (some-some, all-some, some-all, all-all) (Ivanov et al., 2011). Validity is characterized axiomatically with classical propositional logic plus linking axioms/rules, and decidability corresponds to the complexity of Boolean modal logic with converse.

3. Diagrammatic, Algebraic, and Proof-Theoretic Engines

Diagrammatic calculi (SYLL, SYLL^*) encode categorical and De Morgan syllogisms as one-dimensional sequences of symbols: term-variables, directional arrows, and bullets (Pagnan, 2013). The four canonical propositions map as:

Proposition    Diagram (SYLL)
A(S,P)         S → P
E(S,P)         S → • ← P
I(S,P)         S ← • → P
O(S,P)         S ← • → • ← P

Premises are concatenated on common terms; composition deletes middle terms with aligned arrows, and specialized "star" rules handle complemented terms (non-A). The system is resource-sensitive, and it is sound and complete with respect to intuitionistic linear logic (RLL^⊥). Implementation admits efficient O(n²) reduction algorithms, and the approach generalizes to n-term syllogisms (Pagnan, 2010, Pagnan, 2013).
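
The reduction mechanism is easy to prototype. The sketch below is illustrative only (ASCII tokens '>', '<', 'o' stand for the arrows and the bullet, and the star rules for complemented terms are omitted); it glues two premise diagrams on the shared middle term and deletes that term when it sits between aligned arrows:

```python
# Token-list encoding of SYLL diagrams: terms are strings, '>' / '<' are
# arrows read left-to-right, 'o' is the bullet. Illustrative sketch only.
FLIP = {">": "<", "<": ">"}

def diagram(mood, s, p):
    return {"A": [s, ">", p],
            "E": [s, ">", "o", "<", p],
            "I": [s, "<", "o", ">", p],
            "O": [s, "<", "o", ">", "o", "<", p]}[mood]

def reverse(d):
    """Diagrams may be read in either direction (flip every arrow)."""
    return [FLIP.get(t, t) for t in reversed(d)]

def compose(d1, d2, middle):
    """Concatenate two premise diagrams on the shared middle term and delete
    it when it is traversed by aligned arrows ('> M >' or '< M <')."""
    if d1[-1] != middle:
        d1 = reverse(d1)
    if d2[0] != middle:
        d2 = reverse(d2)
    glued = d1 + d2[1:]                       # overlap the shared term
    i = glued.index(middle)
    if glued[i - 1] == ">" and glued[i + 1] == ">":
        return glued[:i] + glued[i + 2:]      # drop 'M >'
    if glued[i - 1] == "<" and glued[i + 1] == "<":
        return glued[:i - 1] + glued[i + 1:]  # drop '< M'
    return None                               # arrows clash: no reduction

# Ferio: "No M are P" + "Some S are M"  =>  "Some S are not P"
print(compose(diagram("I", "S", "M"), diagram("E", "M", "P"), "M"))
# -> ['S', '<', 'o', '>', 'o', '<', 'P']  (the O(S,P) diagram)
```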

Algebraic methods, following Boole (Burris, 2023), represent categorical premises as equations in Boolean indicator variables (x, y, z for subject, middle, and predicate), apply elimination of the middle term, and derive the solution for the subject variable in terms of the predicate. Case analysis over the "constituents" then yields the canonical forms of the valid conclusions, establishing the link to the classical moods.
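
Boole's elimination step is straightforward to replay symbolically. The fragment below (an illustration using sympy; variable names follow the convention above) derives Barbara by eliminating the middle indicator y from the combined premise equation f = 0 via the consequence f(y=0)·f(y=1) = 0:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")   # indicator variables: subject, middle, predicate

# Premises as Boolean equations (Boole's translation):
#   "All Y are Z"  ->  y*(1 - z) = 0
#   "All X are Y"  ->  x*(1 - y) = 0
# Over 0/1 values, both hold exactly when their sum vanishes.
f = y * (1 - z) + x * (1 - y)

# Boole's elimination of y from f = 0: the strongest y-free consequence
# is f(y=0) * f(y=1) = 0.
eliminated = sp.expand(f.subs(y, 0) * f.subs(y, 1))
print(eliminated)   # equal to x*(1 - z) up to rearrangement: "All X are Z"
```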

Proof-theoretic (e.g., sequent calculus) and computational engines formally validate syllogism completion:

  • Translate each premise into a formula (linear logic, set theory, predicate logic, etc.),
  • Perform resource-sensitive rewriting or cut-free derivation to infer the conclusion,
  • Return the completed quantifier or conclusion if, and only if, the deduction holds,
  • For fuzzy/interval-quantified and multi-premise syllogisms, solve the induced set of constraints programmatically (Pereira-Fariña et al., 2014, Pagnan, 2010, Pagnan, 2013).
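
Whatever the back-end, the completion layer itself is only a conclusion enumerator. A minimal driver (illustrative; here it reuses the brute-force `entails` checker sketched in Section 1, but any of the engines above could be substituted) is:

```python
def complete(premises, entails, end_terms=("A", "C")):
    """Return every categorical conclusion over the end terms entailed by the
    premises; an empty list encodes 'nothing follows'."""
    s, p = end_terms
    candidates = [(mood, subj, pred)
                  for mood in "AEIO"
                  for subj, pred in ((s, p), (p, s))]
    return [c for c in candidates if entails(premises, c)]

# Darii (figure 1): "All B are C", "Some A are B"  =>  "Some A are C"
# (and its converse), using the brute-force checker from Section 1.
print(complete([("A", "B", "C"), ("I", "A", "B")], entails))
# -> [('I', 'A', 'C'), ('I', 'C', 'A')]
```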

4. Syllogism Completion with Neural, Symbolic, and Hybrid Models

Syllogism completion has become a central benchmark for computational reasoning, particularly in evaluating LLMs and neuro-symbolic architectures.

LLMs: Evaluations reveal that raw LLMs (e.g., LLaMA-3, GPT-4) can perform the completion task in chain-of-thought or multiple-choice settings, but they typically display robust reasoning biases (catalogued in Section 6).

Supervised fine-tuning (SFT) on balanced, pseudo-worded syllogistic data eliminates many biases and achieves near-ceiling accuracy (e.g., LLaMA-3 8B achieves 94.9% on valid, 97.1% on invalid, 99.4% on "unbelievable" items after SFT) (Bertolazzi et al., 17 Jun 2024).

Hybrid Neural-Symbolic Systems: Integration strategies use fine-tuned neural "assistants" (premise selectors or contradiction-finders) to guide a symbolic prover (rule-based) (Guzmán et al., 10 Oct 2025). The symbolic component ensures completeness and transparency, while the neural assistant reduces the search space (from factorial to linear time in the number of premises or steps). Accuracy approaches 0.94–0.95, and hybrid inference achieves up to 1000× efficiency gain over purely symbolic search (Guzmán et al., 10 Oct 2025). Notably, recursive generalization (multi-step chain reasoning) is acquired more readily than compositional generalization (abstracting simple inference schemas from experience).
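
The division of labor can be sketched schematically; the loop below is not the cited authors' code, and `score_pairs` / `apply_rules` stand in for an arbitrary fine-tuned neural scorer and an arbitrary sound symbolic rule applier:

```python
from itertools import combinations

def hybrid_prove(premises, goal, score_pairs, apply_rules, max_steps=100):
    """Forward-chaining prover in which a neural assistant ranks candidate
    premise pairs while the symbolic rule engine guarantees soundness."""
    known, tried = list(premises), set()
    for _ in range(max_steps):
        pairs = [p for p in combinations(known, 2) if p not in tried]
        if not pairs:
            return False
        # Neural assistant: expand only the most promising pair instead of
        # searching all of them (linear rather than factorial exploration).
        best = max(pairs, key=lambda pair: score_pairs(pair, goal))
        tried.add(best)
        for conclusion in apply_rules(*best):      # symbolic, sound step
            if conclusion == goal:
                return True
            if conclusion not in known:
                known.append(conclusion)
    return False
```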

Circuit-level Interpretation in Transformers: Mechanistic studies identify "middle-term suppression" circuits (attention head clusters) that implement the logical composition required for AAA-1 syllogisms. Suppression heads remove middle-term vectors, while "mover" heads transfer the logical payload to the output position (Kim et al., 16 Aug 2024). Content contamination occurs when symbolic terms are replaced with world-knowledge-laden tokens, activating "belief heads" that introduce content-driven bias.

5. Task Design, Dataset Construction, and Evaluation Protocols

Completion datasets may use natural-language, pseudo-word, or formalized premises/conclusions. Core formats include multiple-choice selection among candidate conclusions, free-form (e.g., chain-of-thought) conclusion generation, and validity classification with an explicit "nothing follows" option.

Evaluation metrics:

  • Per-class accuracy (entailment, contradiction, neutral),
  • Consistency (contradiction rates in output sets),
  • Content-effect bias (accuracy differential on believable vs. unbelievable conclusions),
  • Human–model and per-schema correlation (e.g., Spearman's ρ),
  • Statistical tests (e.g., χ² on bias effects).
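
The statistical side of these metrics is routine to compute; the fragment below is an illustrative scipy-based sketch with made-up numbers, not results from any cited study:

```python
import numpy as np
from scipy.stats import spearmanr, chi2_contingency

def per_class_accuracy(y_true, y_pred):
    """Accuracy per gold label (e.g. entailment / contradiction / neutral)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {label: float(np.mean(y_pred[y_true == label] == label))
            for label in np.unique(y_true)}

# Human-model agreement: Spearman's rho over per-schema accuracies
# (the four values here are placeholders).
human_acc = [0.95, 0.60, 0.40, 0.88]
model_acc = [0.97, 0.55, 0.35, 0.91]
rho, rho_p = spearmanr(human_acc, model_acc)

# Content-effect bias: chi-squared test on correct/incorrect counts for
# believable vs. unbelievable conclusions (placeholder counts).
table = [[180, 20],    # believable:   correct, incorrect
         [140, 60]]    # unbelievable: correct, incorrect
chi2, chi_p, dof, expected = chi2_contingency(table)

print(per_class_accuracy(["E", "C", "N", "E"], ["E", "C", "E", "E"]), rho, chi2)
```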

Best practices for dataset design and modeling include using balanced schema coverage, explicit inclusion of "nothing follows" cases, pseudo-word substitution to minimize semantic leakage, and decoupled translation-reasoning prompt structures (Eisape et al., 2023, Ozeki et al., 8 Aug 2024, Bertolazzi et al., 17 Jun 2024).

6. Analysis of Error Types, Reasoning Biases, and Cognitive Heuristics

Empirical studies report that contemporary LLMs replicate human cognitive fallacies such as:

  • Conversion errors: illicit symmetric interpretation of quantifiers ("All A are B" ⇒ "All B are A"),
  • Atmosphere effect: selection of conclusion mood matching premise moods,
  • Existential import errors: inferring I/O-type conclusions from universal premises without existential warrants,
  • Belief bias: higher performance or deviation toward world-knowledge-plausible conclusions (Eisape et al., 2023, Bertolazzi et al., 17 Jun 2024, Ozeki et al., 8 Aug 2024).

Human–model patterns exhibit high per-schema correlation (ρ up to 0.87), but LLMs rarely volunteer "nothing follows" unless directly prompted or trained (Bertolazzi et al., 17 Jun 2024, Eisape et al., 2023). Supervised fine-tuning reliably eliminates content-effect biases and maintains near-zero contradiction rates (Bertolazzi et al., 17 Jun 2024).

7. Beyond Aristotelian Syllogistics: Extensions and Computational Complexity

Contemporary systems extend syllogism completion to:

  • Arbitrary n-term chains (diagrammatic rewriting: O(n²)-time normalization with guaranteed uniqueness) (Pagnan, 2010),
  • Boolean-relational settings (full propositional logic, arbitrary relations, set and relation term algebras) (Ivanov et al., 2011),
  • Fuzzy-quantified and multi-premise syllogisms (algorithmic reduction to linear/linear-fractional programming over region cardinalities) (Pereira-Fariña et al., 2014, Pereira-Fariña et al., 2014),
  • Intermediate quantifiers with explicit monotonicity profile propagation (Iero et al., 2018).

Complexity reflects the underlying logic: the relational systems are NExpTime-complete with an infinite relation vocabulary and ExpTime-complete with a finite one, while fuzzy or classical completion is polynomial-time for bounded syllogisms (via the induced LPs) and NP-complete in certain restricted fragments (Ivanov et al., 2011, Pereira-Fariña et al., 2014).


The syllogism completion task thus lies at the intersection of formal logic, cognition, NLP, and algorithmic reasoning, supporting robust theoretical frameworks, high-precision computational pipelines, and rich empirical typologies of reasoning artifacts across natural and synthetic agents.
