
How Artificial Intelligence Leads to Knowledge Why: An Inquiry Inspired by Aristotle's Posterior Analytics (2504.02430v1)

Published 3 Apr 2025 in cs.AI and cs.LO

Abstract: Bayesian networks and causal models provide frameworks for handling queries about external interventions and counterfactuals, enabling tasks that go beyond what probability distributions alone can address. While these formalisms are often informally described as capturing causal knowledge, there is a lack of a formal theory characterizing the type of knowledge required to predict the effects of external interventions. This work introduces the theoretical framework of causal systems to clarify Aristotle's distinction between knowledge that and knowledge why within artificial intelligence. By interpreting existing artificial intelligence technologies as causal systems, it investigates the corresponding types of knowledge. Furthermore, it argues that predicting the effects of external interventions is feasible only with knowledge why, providing a more precise understanding of the knowledge necessary for such tasks.

Summary

  • The paper introduces a causal systems framework that differentiates knowledge-that from knowledge-why using Aristotelian principles.
  • It applies the framework to both deterministic and probabilistic AI models, detailing how interventions and counterfactuals are explained.
  • The study unifies Bayesian Networks, LogLinear models, and Probabilistic Causal Models under a maximum entropy causal system for enhanced explainability.

This paper, "How Artificial Intelligence Leads to Knowledge Why: An Inquiry Inspired by Aristotle's Posterior Analytics" (2504.02430), introduces a formal framework called causal systems to distinguish between Aristotle's concepts of "knowledge-that" (knowing facts) and "knowledge-why" (understanding causes and explanations) within the context of artificial intelligence. The primary motivation is to formally characterize the type of knowledge required for AI systems, like Bayesian networks and causal models, to answer queries about external interventions and counterfactuals, tasks that go beyond simple probabilistic reasoning.

The paper begins by outlining Aristotle's theory of scientific knowledge from Posterior Analytics. Key concepts include:

  • Knowledge by Demonstration: Scientific explanation involves deducing facts from more fundamental causes (demonstration), distinct from mere logical deduction (syllogism).
  • Indemonstrable Knowledge: Demonstrations must start from fundamental facts known through insight ("nous") into essences, not derived from further demonstrations.
  • Knowledge-that vs. Knowledge-why: One can know that something is true (e.g., through observation) before knowing why it is true (understanding its cause). Explanations yielding only knowledge-that might not follow the true causal order.
  • Subordinate/Superordinate Sciences: Some sciences rely on principles explained by others (e.g., optics relies on geometry).

To apply these concepts to AI, the paper adopts Bochman's approach of extending propositional logic. While standard logic uses the binary provability operator \vdash for syllogisms (knowledge-that), a binary explainability operator \Rrightarrow is introduced for demonstrations (knowledge-why): \Phi \Rrightarrow \psi means that knowledge-that about \Phi leads to knowledge-why about \psi.
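As a small hypothetical illustration (the propositions are invented for this summary, not taken from the paper), provability only checks entailment, while \Rrightarrow additionally respects the direction of explanation:

```latex
% Syllogism (knowledge-that): material implication suffices.
\{\mathrm{rain},\; \mathrm{rain} \rightarrow \mathrm{wet}\} \vdash \mathrm{wet}
% Demonstration (knowledge-why): a causal rule in \Delta is required.
(\mathrm{rain} \Rightarrow \mathrm{wet}) \in \Delta
  \quad\text{yields}\quad
  \{\mathrm{rain}\} \Rrightarrow \mathrm{wet}
% No causal rule runs the other way, so \{\mathrm{wet}\} \Rrightarrow \mathrm{rain}
% fails even though wet grass is evidence for rain.
```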

Causal knowledge is represented by a causal theory \Delta, a set of causal rules \phi \Rightarrow \psi, signifying that knowledge-why about \phi leads to knowledge-why about \psi. An area of science also has external premises \mathcal{E}—facts assumed true without needing explanation within that science. Causal Foundation (Principle 3) states that explanations yielding knowledge-why must originate from these external premises.
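A minimal sketch of how such a theory might be encoded (a hypothetical propositional encoding for this summary, not the paper's implementation):

```python
# Causal rules phi => psi over atomic propositions, plus external
# premises E that are assumed true without explanation in this science.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    body: frozenset  # phi: the explaining atoms
    head: str        # psi: the atom being explained

delta = {
    Rule(frozenset({"low_pressure"}), "rain"),
    Rule(frozenset({"rain"}), "wet_grass"),
}
external = {"low_pressure"}  # E: external premises of this science
```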

These components are combined into a deterministic causal system CS := (\Delta, \mathcal{E}, \mathcal{O}), where \mathcal{O} represents observations (additional knowledge-that). The system reasons based on two core principles:

  • Natural Necessity (Aquinas, Principle 4): If the cause exists, the effect must follow.
  • Sufficient Causation (Leibniz, Assumption 1): Every effect has a cause (no unexplained events).

The semantics of a causal system are defined based on causal worlds \omega. A world \omega is a causal world of CS if it is consistent with observations \mathcal{O} and fully explained by its intersection with external premises \mathcal{E}, i.e., \mathcal{C}(\omega \cap \mathcal{E}) = \omega, where \mathcal{C} is the consequence operator derived from \Delta and defaults for \mathcal{E}.

  • CS has knowledge-that about \phi if \phi holds in all its causal worlds.
  • CS has knowledge-why about \phi if (\Delta, \mathcal{E}, \emptyset) (the system without observations) has knowledge-that about \phi.
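Under the simplified encoding above (atoms only, observations read as required atoms), these definitions admit a direct operational reading; the following sketch continues the hypothetical code from before:

```python
def consequences(facts, delta):
    """C: least fixpoint of the causal rules applied to `facts`."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in delta:
            if rule.body <= closed and rule.head not in closed:
                closed.add(rule.head)
                changed = True
    return closed

def is_causal_world(omega, delta, external, observations):
    # consistent with O, and fully explained by its external part
    return observations <= omega and consequences(omega & external, delta) == omega

# Knowledge-that about phi: phi holds in every causal world of CS.
# Knowledge-why about phi: knowledge-that in (Delta, E, {}), i.e. with O dropped.
```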

This framework addresses critiques of Bochman's original theory, particularly issues with cyclic dependencies leading to ungrounded explanations. The paper then interprets Pearl's Structural Causal Models (SCMs) as deterministic causal systems via a Bochman Transformation. It shows that for acyclic SCMs, their solutions correspond to the causal worlds of the transformed system, implying that acyclic SCMs represent knowledge-why. External interventions (Pearl's \mathrm{do} operator) are modeled by creating a modified causal system CS_i in which rules and premises related to the intervened variables are adjusted. Knowledge about the effect of an intervention i on \phi corresponds to knowledge-why about \phi in CS_i.
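A hypothetical worked example of this construction, continuing the sketch above with the two-equation SCM X := U_X, Y := X:

```python
delta_scm = {Rule(frozenset({"U_X"}), "X"), Rule(frozenset({"X"}), "Y")}
external_scm = {"U_X"}

# do(X): drop the mechanism that explains X and make X an external premise.
delta_do = {r for r in delta_scm if r.head != "X"}
external_do = external_scm | {"X"}

# In the modified system CS_i, the effect Y is explained from X alone:
assert consequences({"X"}, delta_do) == {"X", "Y"}
```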

The paper then extends the framework to handle uncertainty, moving from deterministic to probabilistic reasoning. It reviews probability theory, the Principle of Maximum Entropy (as a probabilistic analogue of deduction/syllogism), LogLinear models, Bayesian Networks (BNs), and Probabilistic Causal Models. It highlights Reichenbach's Common Cause Assumption and Williamson's Causal Irrelevance Principle, arguing that maximizing entropy greedily along a causal order (as implicitly done in BNs) is the probabilistic analogue of Aristotelian demonstration, thus yielding probabilistic knowledge-why.
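For orientation, the distribution such a greedy construction recovers is the familiar Bayesian network factorization along a causal order in which parents precede children (a standard fact, restated here):

```latex
P(X_1,\dots,X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```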

This leads to Weighted Causal Theories \Theta, containing rules (w, \phi \Rightarrow \psi) where weights quantify the belief in Natural Necessity for that rule. The Common Cause Semantics \pi_{\Theta} defines a distribution based on maximizing entropy along the causal structure derived from \Theta, assuming Causal Irrelevance holds.
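Since the \Sigma component in the next definition is a LogLinear model, it is worth recalling its standard maximum entropy form (standard background, not specific to this paper); the rule weights w in \Theta play an analogous role, quantifying graded belief in Natural Necessity:

```latex
\pi_{\Sigma}(\omega) \;=\; \frac{1}{Z}\,\exp\Bigl(\sum_{j} w_j f_j(\omega)\Bigr),
\qquad
Z \;=\; \sum_{\omega'} \exp\Bigl(\sum_{j} w_j f_j(\omega')\Bigr)
```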

Finally, Maximum Entropy Causal Systems CS := (\Theta, \mathcal{E}, \mathcal{O}, \Sigma, complete) are introduced. Here, \Sigma is a LogLinear model representing knowledge from a superordinate science, and the flag complete indicates whether the model assumes causal completeness (i.e., that Causal Irrelevance/Common Cause holds).

  • The a priori distribution \pi_{CS} combines information from \Theta (potentially via common cause semantics if complete=\top) and \Sigma.
  • Probabilistic knowledge-that \pi^{that}_{CS}(\phi) is the posterior probability \pi_{CS}(\phi | \mathcal{O}).
  • Probabilistic knowledge-why \pi^{why}_{CS}(\phi) is defined as \pi_{CS}(\phi | \mathcal{O}, \mathrm{sufficient}(CS)), where \mathrm{sufficient}(CS) is the event that all facts in a world are explainable by the system's explanatory component. This is accompanied by a confidence level \pi_{CS}(\mathrm{sufficient}(CS) | \mathcal{O}).
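A small numeric illustration of the two notions (the distribution, the observation, and the stand-in sufficiency event below are all invented for this summary):

```python
worlds = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"b"}), frozenset()]
pi = dict(zip(worlds, (0.4, 0.2, 0.3, 0.1)))  # a priori distribution pi_CS

def conditional(pi, event, given):
    """P(event | given) over a finite world space."""
    den = sum(p for w, p in pi.items() if w in given)
    return sum(p for w, p in pi.items() if w in event and w in given) / den

observed   = {w for w in pi if "b" in w}        # worlds satisfying O
sufficient = {w for w in pi if w != worlds[2]}  # stand-in for sufficient(CS)
phi        = {w for w in pi if "a" in w}

k_that     = conditional(pi, phi, observed)               # pi_CS(phi | O)
k_why      = conditional(pi, phi, observed & sufficient)  # pi_CS(phi | O, sufficient)
confidence = conditional(pi, sufficient, observed)        # pi_CS(sufficient(CS) | O)
```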

The paper shows how LogLinear models, BNs, and Probabilistic Causal Models can be interpreted as specific instances of Maximum Entropy Causal Systems:

  • LogLinear Models: Lack knowledge-why (\mathrm{sufficient}=\emptyset) and knowledge of intervention effects.
  • Bayesian Networks: Encode knowledge-why (confidence 1) and correctly predict intervention effects under the common cause assumption (see the truncated factorization after this list).
  • Probabilistic Causal Models (acyclic): Also encode knowledge-why (confidence 1) and intervention effects.
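For Bayesian Networks (and, analogously, acyclic Probabilistic Causal Models) this prediction takes the form of Pearl's truncated factorization, a standard result restated here for context:

```latex
P\bigl(x_1,\dots,x_n \mid \mathrm{do}(X_k = x_k')\bigr) \;=\;
\begin{cases}
  \prod_{i \neq k} P\bigl(x_i \mid \mathrm{pa}_i\bigr) & \text{if } x_k = x_k',\\
  0 & \text{otherwise.}
\end{cases}
```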

Interventions in Maximum Entropy Causal Systems are defined similarly to the deterministic case, modifying \Theta and \mathcal{E} but leaving \Sigma untouched due to modularity issues.

In conclusion, the paper provides a unified framework grounded in Aristotelian distinctions to analyze the type of knowledge AI formalisms capture, arguing that the ability to handle interventions fundamentally relies on possessing "knowledge-why", whether deterministic or probabilistic. It connects causal reasoning, probability, and logic programming concepts, suggesting future work in extending this framework to relational AI formalisms and characterizing the knowledge needed for counterfactual reasoning.
