- The paper introduces a causal systems framework that differentiates knowledge-that from knowledge-why using Aristotelian principles.
- It applies the framework to both deterministic and probabilistic AI models, detailing how interventions and counterfactuals are explained.
- The study unifies Bayesian Networks, LogLinear models, and Probabilistic Causal Models under a maximum entropy causal system for enhanced explainability.
This paper, "How Artificial Intelligence Leads to Knowledge Why: An Inquiry Inspired by Aristotle's Posterior Analytics" (2504.02430), introduces a formal framework called causal systems to distinguish between Aristotle's concepts of "knowledge-that" (knowing facts) and "knowledge-why" (understanding causes and explanations) within the context of artificial intelligence. The primary motivation is to formally characterize the type of knowledge required for AI systems, like Bayesian networks and causal models, to answer queries about external interventions and counterfactuals, tasks that go beyond simple probabilistic reasoning.
The paper begins by outlining Aristotle's theory of scientific knowledge from Posterior Analytics. Key concepts include:
- Knowledge by Demonstration: Scientific explanation involves deducing facts from more fundamental causes (demonstration), distinct from mere logical deduction (syllogism).
- Indemonstrable Knowledge: Demonstrations must start from fundamental facts known through insight ("nous") into essences, not derived from further demonstrations.
- Knowledge-that vs. Knowledge-why: One can know that something is true (e.g., through observation) before knowing why it is true (understanding its cause); in Aristotle's own example, the planets' not twinkling lets one infer that they are near, but only their nearness explains why they do not twinkle. Explanations yielding only knowledge-that might not follow the true causal order.
- Subordinate/Superordinate Sciences: Some sciences rely on principles explained by others (e.g., optics relies on geometry).
To apply these concepts to AI, the paper adopts Bochman's approach of extending propositional logic. While standard logic uses the binary provability operator $(\vdash)/2$ for syllogisms (knowledge-that), a binary explainability operator $(\Rrightarrow)/2$ is introduced for demonstrations (knowledge-why): $\Phi \Rrightarrow \psi$ means that knowledge-that about $\Phi$ leads to knowledge-why about $\psi$.
Causal knowledge is represented by a causal theory $\Delta$, a set of causal rules $\phi \Rightarrow \psi$ signifying that knowledge-why about $\phi$ leads to knowledge-why about $\psi$. An area of science also has external premises $\mathcal{E}$: facts assumed true without needing explanation within that science. Causal Foundation (Principle 3) states that explanations yielding knowledge-why must originate from these external premises.
These components are combined into a deterministic causal system $CS := (\Delta, \mathcal{E}, \mathcal{O})$, where $\mathcal{O}$ represents observations (additional knowledge-that). The system reasons based on two core principles:
- Natural Necessity (Aquinas, Principle 4): If the cause exists, the effect must follow.
- Sufficient Causation (Leibniz, Assumption 1): Every fact obtaining in a world has a cause (no unexplained events).
The semantics of a causal system are defined in terms of causal worlds. A world $\omega$ is a causal world of $CS$ if it is consistent with the observations $\mathcal{O}$ and fully explained by its intersection with the external premises $\mathcal{E}$, i.e., $\mathcal{C}(\omega \cap \mathcal{E}) = \omega$, where $\mathcal{C}$ is the consequence operator derived from $\Delta$ and the defaults for $\mathcal{E}$.
- $CS$ has knowledge-that about $\phi$ if $\phi$ holds in all of its causal worlds.
- $CS$ has knowledge-why about $\phi$ if $(\Delta, \mathcal{E}, \emptyset)$ (the system without observations) has knowledge-that about $\phi$.
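To make these definitions concrete, here is a minimal Python sketch of a deterministic causal system, under simplifying assumptions not made by the paper: rule bodies are sets of atoms, heads and observations are atoms, and worlds are subsets of a finite vocabulary. All names (`closure`, `causal_worlds`, etc.) are our own, not the paper's.

```python
from itertools import combinations

def closure(facts, rules):
    """Consequence operator C: close a set of atoms under the causal
    rules (body, head), i.e. compute the least fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

def causal_worlds(atoms, rules, externals, observations):
    """Worlds consistent with the observations and fully explained
    by their external part: C(w ∩ E) = w."""
    worlds = []
    for r in range(len(atoms) + 1):
        for combo in combinations(sorted(atoms), r):
            w = set(combo)
            if observations <= w and closure(w & externals, rules) == w:
                worlds.append(w)
    return worlds

def knows_that(atoms, rules, externals, observations, query):
    """Knowledge-that: the query holds in every causal world."""
    ws = causal_worlds(atoms, rules, externals, observations)
    return bool(ws) and all(query in w for w in ws)

def knows_why(atoms, rules, externals, query):
    """Knowledge-why: knowledge-that in the system without observations."""
    return knows_that(atoms, rules, externals, set(), query)

# Example: rain causes wet grass; rain is an external premise.
atoms = {"rain", "wet"}
rules = [({"rain"}, "wet")]
print(knows_that(atoms, rules, {"rain"}, {"wet"}, "wet"))  # True
print(knows_why(atoms, rules, {"rain"}, "wet"))            # False
```

In this simplified reading, observing the wet grass yields knowledge-that it is wet, but not knowledge-why: without observations the empty world is also causal, since under the fixed-point semantics an external premise need only explain the facts that actually obtain in a world.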
This framework addresses critiques of Bochman's original theory, particularly issues with cyclic dependencies leading to ungrounded explanations. The paper then interprets Pearl's Structural Causal Models (SCMs) as deterministic causal systems via a Bochman Transformation. It shows that for acyclic SCMs, their solutions correspond to the causal worlds of the transformed system, implying that acyclic SCMs represent knowledge-why. External interventions (the $\mathit{do}$ operator) are modeled by creating a modified causal system $CS_i$ in which the rules and premises related to the intervened variables are adjusted; knowledge about the effect of an intervention $i$ on $\phi$ then corresponds to knowledge-why about $\phi$ in $CS_i$ (see the sketch below).
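Continuing the example above (again our own simplification, not the paper's construction): an intervention that sets an atom true can be modeled by cutting the rules that would otherwise explain the atom and making it an unconditionally holding external premise.

```python
def intervene_true(rules, externals, atom):
    """Build the modified system CS_i for do(atom := true): remove the
    causal rules with the atom as head, add a rule with empty body so
    the atom holds in every causal world, and treat it as external."""
    new_rules = [(b, h) for (b, h) in rules if h != atom]
    new_rules.append((set(), atom))  # the atom now needs no further cause
    return new_rules, externals | {atom}

# Effect of do(rain := true) on "wet": knowledge-why in the modified system.
i_rules, i_ext = intervene_true(rules, {"rain"}, "rain")
print(knows_why(atoms, i_rules, i_ext, "wet"))  # True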
The paper then extends the framework to handle uncertainty, moving from deterministic to probabilistic reasoning. It reviews probability theory, the Principle of Maximum Entropy (as a probabilistic analogue of deduction/syllogism), LogLinear models, Bayesian Networks (BNs), and Probabilistic Causal Models. It highlights Reichenbach's Common Cause Assumption and Williamson's Causal Irrelevance Principle, arguing that maximizing entropy greedily along a causal order (as implicitly done in BNs) is the probabilistic analogue of Aristotelian demonstration, thus yielding probabilistic knowledge-why.
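As a standard illustration of this greedy construction (our paraphrase in textbook notation, not the paper's): if the only constraints fix each conditional $P(X_i \mid \mathit{pa}(X_i))$ along a causal order $X_1 < \dots < X_n$, then maximizing entropy stage by stage yields exactly the Bayesian Network factorization,

$$\pi(x_1, \dots, x_n) = \prod_{i=1}^{n} \pi\big(x_i \mid \mathit{pa}(x_i)\big),$$

with every conditional left unconstrained at its stage set to the uniform, i.e., maximum-entropy, choice.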
This leads to Weighted Causal Theories $\Theta$, containing rules $(w, \phi \Rightarrow \psi)$ whose weights quantify the belief in Natural Necessity for that rule. The Common Cause Semantics $\pi_{\Theta}$ defines a distribution by maximizing entropy along the causal structure derived from $\Theta$, assuming Causal Irrelevance holds (sketched below).
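In symbols (our hedged paraphrase, not a definition quoted from the paper), this can be read as a stage-wise constrained entropy maximization,

$$\pi_{\Theta} \in \arg\max \big\{\, H(\pi) \;\big|\; \pi \text{ satisfies the constraints of the weighted rules in } \Theta \,\big\}, \qquad H(\pi) = -\sum_{\omega} \pi(\omega) \log \pi(\omega),$$

where the maximization proceeds variable by variable along the causal order induced by $\Theta$ rather than globally, mirroring the greedy Bayesian Network construction above.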
Finally, Maximum Entropy Causal Systems $CS := (\Theta, \mathcal{E}, \mathcal{O}, \Sigma, \mathit{complete})$ are introduced. Here, $\Sigma$ is a LogLinear model representing knowledge from a superordinate science, and $\mathit{complete}$ indicates whether the model assumes causal completeness (i.e., that Causal Irrelevance/the Common Cause Assumption holds).
- The a priori distribution $\pi_{CS}$ combines information from $\Theta$ (potentially via the common cause semantics, if $\mathit{complete} = \top$) and $\Sigma$.
- Probabilistic knowledge-that $\pi^{that}_{CS}(\phi)$ is the posterior probability $\pi_{CS}(\phi \mid \mathcal{O})$.
- Probabilistic knowledge-why $\pi^{why}_{CS}(\phi)$ is defined as $\pi_{CS}(\phi \mid \mathcal{O}, \mathit{sufficient}(CS))$, where $\mathit{sufficient}(CS)$ is the event that all facts in a world are explainable by the system's explanatory component. This is accompanied by a confidence level $\pi(\mathit{sufficient}(CS) \mid \mathcal{O})$; a small numeric sketch follows this list.
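A minimal numeric sketch of these three quantities, under assumptions of our own: the a priori distribution is a plain LogLinear model over the tiny rain/wet vocabulary (roughly the case where all probabilistic mass is carried by $\Sigma$), with one feature that fires when a world satisfies the material implication rain $\rightarrow$ wet, and $\mathit{sufficient}(CS)$ is read as the event $\mathcal{C}(\omega \cap \mathcal{E}) = \omega$ from the deterministic semantics.

```python
import math

# closure() as in the deterministic sketch above, repeated for completeness.
def closure(facts, rules):
    known = set(facts)
    while True:
        new = {h for b, h in rules if b <= known} - known
        if not new:
            return known
        known |= new

rules = [({"rain"}, "wet")]
externals = {"rain"}
worlds = [set(), {"rain"}, {"wet"}, {"rain", "wet"}]

# LogLinear a priori distribution: pi(w) ∝ exp(weight * feature(w)).
weight = 2.0
def score(w):
    return math.exp(weight if (not {"rain"} <= w or "wet" in w) else 0.0)

z = sum(score(w) for w in worlds)

def prob(event):
    """pi_CS(event) for an event given as a predicate on worlds."""
    return sum(score(w) for w in worlds if event(w)) / z

obs  = lambda w: "wet" in w                          # observation O = {wet}
suff = lambda w: closure(w & externals, rules) == w  # world fully explained
rain = lambda w: "rain" in w                         # query

# Probabilistic knowledge-that: pi(rain | O).
know_that = prob(lambda w: obs(w) and rain(w)) / prob(obs)
# Probabilistic knowledge-why: pi(rain | O, sufficient(CS)) ...
know_why = (prob(lambda w: obs(w) and suff(w) and rain(w))
            / prob(lambda w: obs(w) and suff(w)))
# ... reported together with its confidence level pi(sufficient(CS) | O).
confidence = prob(lambda w: obs(w) and suff(w)) / prob(obs)

print(know_that, know_why, confidence)  # 0.5, 1.0, 0.5
```

In this toy case, conditioning on sufficiency ("every fact has a cause") turns the observed wetness into evidence-why for rain, at the price of only 0.5 confidence that the system's explanatory component covers the world at all.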
The paper shows how LogLinear models, BNs, and Probabilistic Causal Models can be interpreted as specific instances of Maximum Entropy Causal Systems:
- LogLinear Models: Lack knowledge-why ($\mathit{sufficient} = \emptyset$) and knowledge of intervention effects.
- Bayesian Networks: Encode knowledge-why (confidence 1) and correctly predict intervention effects under the common cause assumption.
- Probabilistic Causal Models (acyclic): Also encode knowledge-why (confidence 1) and intervention effects.
Interventions in Maximum Entropy Causal Systems are defined analogously to the deterministic case, modifying $\Theta$ and $\mathcal{E}$ but leaving $\Sigma$ untouched due to modularity issues; the numeric sketch below illustrates the observational/interventional gap that this machinery is meant to close.
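Here is a self-contained numpy sketch (our example, not taken from the paper) of a confounded Bayesian Network $C \to X$, $C \to Y$, $X \to Y$, where conditioning on $X$ and intervening on $X$ disagree:

```python
import numpy as np

# Binary confounder C -> X, C -> Y, plus X -> Y.
p_c = np.array([0.5, 0.5])                      # P(C)
p_x_c = np.array([[0.9, 0.1],                   # P(X | C=0)
                  [0.2, 0.8]])                  # P(X | C=1)
p_y_cx = np.array([[[0.8, 0.2], [0.6, 0.4]],    # P(Y | C=0, X=0/1)
                   [[0.5, 0.5], [0.1, 0.9]]])   # P(Y | C=1, X=0/1)

# Joint P(C, X, Y) via the Bayesian Network factorization.
joint = p_c[:, None, None] * p_x_c[:, :, None] * p_y_cx

# Observational: P(Y=1 | X=1), plain conditioning in the joint.
p_obs = joint[:, 1, 1].sum() / joint[:, 1, :].sum()

# Interventional: P(Y=1 | do(X=1)) by the truncated factorization;
# the mechanism generating X is replaced, C keeps its prior.
p_do = (p_c * p_y_cx[:, 1, 1]).sum()

print(round(p_obs, 3), round(p_do, 3))  # 0.844 vs 0.65 -- they differ
```

The gap exists because conditioning on $X = 1$ also shifts beliefs about the confounder $C$, while the intervention cuts that back-door path; a formalism that only encodes knowledge-that (e.g., a bare LogLinear joint) can compute the first number but has no licence to compute the second.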
In conclusion, the paper provides a unified framework grounded in Aristotelian distinctions to analyze the type of knowledge AI formalisms capture, arguing that the ability to handle interventions fundamentally relies on possessing "knowledge-why", whether deterministic or probabilistic. It connects causal reasoning, probability, and logic programming concepts, suggesting future work in extending this framework to relational AI formalisms and characterizing the knowledge needed for counterfactual reasoning.