- The paper introduces a framework that redefines AI agency by integrating intentionality, rationality, and explainability.
- It employs active inference with variational Bayesian methods to operationalize agency via empowerment metrics in simulated environments.
- It demonstrates applications in a T-maze task, highlighting significant implications for AI governance and future AGI development.
Active Inference as a Framework for Phenotyping Agency in AI Systems
Motivation and Definition of Agency
This paper interrogates the inadequacy of prevailing definitions of agency in AI, which typically emphasize only autonomy and goal-directedness. The authors advocate for a definition grounded in intentionality, rationality, and explainability, consistent with classical philosophical approaches. Intentionality reflects belief and desire-driven action, rationality compels normatively coherent decisions given a world model, and explainability ensures causality between internal states and observed behavior. The resulting taxonomy provides a minimal yet operational foundation for measuring agency in computational systems, enabling rigorous inspection and computational phenotyping.
Active Inference and Agency Realization
These philosophical criteria are instantiated through active inference, a variational Bayesian framework originally developed in theoretical neurobiology. Within this architecture, an agent's posterior beliefs encode its internal representations (“beliefs”), prior preferences encode desired outcomes (“desires”), and policies are selected by minimizing expected free energy (EFE), which integrates both instrumental and epistemic value. This process creates an agentic action chain (belief updates, preference-weighted EFE computation, policy selection, action) that directly satisfies the intentionality, rationality, and explainability criteria.
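The policy-selection step of this chain can be illustrated with a minimal sketch. It assumes a commonly used risk-plus-ambiguity form of expected free energy and a softmax over negative EFE with unit precision; the two-state world, the preference values, and all function and policy names are hypothetical and are not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def expected_free_energy(q_states, likelihood, preferences):
    """EFE of one policy as risk + ambiguity (an assumed, standard decomposition).

    q_states[s]      : predicted state probabilities under the policy
    likelihood[s][o] : P(o | s) from the generative model
    preferences[o]   : prior preference distribution over outcomes (strictly positive)
    """
    n_states = len(q_states)
    n_obs = len(preferences)
    # Predicted outcome distribution under the policy.
    q_obs = [sum(q_states[s] * likelihood[s][o] for s in range(n_states))
             for o in range(n_obs)]
    # Risk: KL divergence from predicted outcomes to preferred outcomes.
    risk = sum(q * math.log(q / preferences[o])
               for o, q in enumerate(q_obs) if q > 0)
    # Ambiguity: expected entropy of the likelihood mapping.
    ambiguity = sum(
        q_states[s] * -sum(p * math.log(p) for p in likelihood[s] if p > 0)
        for s in range(n_states)
    )
    return risk + ambiguity

# Hypothetical toy model: two states, two outcomes, unambiguous observations.
likelihood = [[1.0, 0.0], [0.0, 1.0]]
preferences = [0.9, 0.1]  # the agent "desires" outcome 0
predicted_states = {
    "go_to_reward": [1.0, 0.0],
    "go_to_no_reward": [0.0, 1.0],
}

efe = {name: expected_free_energy(q, likelihood, preferences)
       for name, q in predicted_states.items()}
names = list(efe)
# Policy posterior: softmax of negative EFE (precision fixed at 1 for simplicity).
policy_posterior = dict(zip(names, softmax([-efe[n] for n in names])))
print(policy_posterior)
```

Because observations are unambiguous in this toy model, the ambiguity term vanishes and the policy posterior simply mirrors the preference distribution (about 0.9 versus 0.1). Each intermediate quantity (predicted outcomes, risk, ambiguity) can be inspected directly, which is the kind of traceability the explainability criterion refers to.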
Active inference agents exhibit representational intentionality, satisfying Humean and Davidsonian philosophical requirements. Preferences are not mere reinforcement signals but explicit probability distributions over sensory states, which gives the intentional stance a computational grounding. Rationality is guaranteed by variational optimization: actions follow probabilistically from internal states and generative models, and bounded rationality is implemented via a tractable variational bound rather than exact Bayesian posteriors. Explainability is achieved through mechanistic and semantic transparency: each step in the agentic chain is accessible, interpretable, and causally traceable within the generative model, in sharp contrast to end-to-end deep learning policies.
Empowerment as a Metric for Phenotyping Agency
A central innovation is the operationalization of agency phenotypes via empowerment, defined as the channel capacity between actions and anticipated observations. Empowerment quantifies the agent’s degree of control over its environment and differentiates zero-, intermediate-, and high-agency phenotypes through structural manipulations of the generative model.
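Channel capacity has no closed form for general noisy channels, so it is usually computed iteratively. The sketch below uses the Blahut-Arimoto algorithm, a standard method for this purpose; the source does not say which method the authors use, and the function name and example channel are hypothetical.

```python
import math

def channel_capacity_bits(channel, iters=200):
    """Capacity (in bits) of a discrete channel via Blahut-Arimoto.

    channel[a][o] = P(o | a): probability of observation o after action a.
    The maximizing distribution over actions is found iteratively.
    """
    n_a = len(channel)
    n_o = len(channel[0])

    def marginal(p):
        # Observation marginal induced by action distribution p.
        return [sum(p[a] * channel[a][o] for a in range(n_a)) for o in range(n_o)]

    p = [1.0 / n_a] * n_a  # start from a uniform action distribution
    for _ in range(iters):
        q = marginal(p)
        # d[a] = KL(P(o | a) || q(o)) in nats.
        d = [sum(c * math.log(c / q[o]) for o, c in enumerate(channel[a]) if c > 0)
             for a in range(n_a)]
        # Multiplicative update: favor actions whose outcomes are distinctive.
        w = [p[a] * math.exp(d[a]) for a in range(n_a)]
        z = sum(w)
        p = [x / z for x in w]

    # Mutual information I(A; O) under the final action distribution.
    q = marginal(p)
    return sum(
        p[a] * c * math.log2(c / q[o])
        for a in range(n_a)
        for o, c in enumerate(channel[a])
        if p[a] > 0 and c > 0
    )

# Hypothetical example: two actions whose outcomes are flipped 10% of the time.
noisy = [[0.9, 0.1], [0.1, 0.9]]
print(channel_capacity_bits(noisy))
```

For this symmetric example the result matches the textbook binary symmetric channel capacity, 1 minus the binary entropy of 0.1. Noise in the action-outcome mapping therefore lowers empowerment below the 1 bit that two perfectly distinguishable actions would give.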
The authors implement a minimal T-maze task, formalized as a two-step POMDP, to demonstrate the approach. The paradigm is structured so that an epistemic action (“cue”) yields information gain, resolving uncertainty and increasing empowerment. Zero agency is realized in “trap” states where all actions yield the same outcome (empowerment = 0 bits). Intermediate agency corresponds to submaximal empowerment (log2(2) = 1 bit), attributable to unresolved ambiguity. High agency (maximal empowerment, log2(3) ≈ 1.585 bits) arises when the epistemic action resolves uncertainty, differentiating all action-outcome pairs. The empowerment metric is further dissected into objective, subjective, and actual components, offering a nuanced evaluation of agentic capacity as a function of both environmental structure and internal model accuracy.
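The three quoted empowerment values can be reproduced with a small sketch. It relies on the fact that when each action leads deterministically to one outcome, channel capacity reduces to log2 of the number of distinguishable outcomes. The action and outcome labels below are hypothetical stand-ins chosen to match the quoted values; they are not the paper's actual transition or likelihood matrices.

```python
import math

def deterministic_empowerment_bits(outcome_of_action):
    """Empowerment (bits) when every action maps to exactly one outcome.

    For a noiseless channel, capacity equals log2 of the number of
    distinct outcomes the agent can reach.
    """
    return math.log2(len(set(outcome_of_action.values())))

# Hypothetical phenotypes with three available actions each.
phenotypes = {
    # Trap state: every action yields the same outcome.
    "zero": {"left": "stuck", "right": "stuck", "cue": "stuck"},
    # Unresolved ambiguity: both arms look the same to the agent.
    "intermediate": {"left": "arm", "right": "arm", "cue": "cue_seen"},
    # Ambiguity resolved: every action yields a distinguishable outcome.
    "high": {"left": "left_arm", "right": "right_arm", "cue": "cue_seen"},
}

for name, mapping in phenotypes.items():
    print(name, round(deterministic_empowerment_bits(mapping), 3))
```

This counting shortcut only holds for noiseless action-outcome mappings; with stochastic outcomes, capacity has to be computed by optimizing over action distributions.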
Governance Implications and Theoretical Insights
The paper draws explicit governance implications from empowerment-based agency phenotyping. As empowerment increases, effective governance transitions from external, structural controls (zero-agency) to preference shaping (intermediate-agency) and ultimately to internalist modulation (high-agency), which requires engagement with the agent’s internal model, preferences, or normative priors. This phenotyping approach thereby links the computational measurement of agency to AI governance strategies. The authors claim that contemporary governance frameworks will ultimately fail to address agency in advanced AI unless they incorporate mechanisms for modulating internal models and preferences.
Furthermore, the framework has implications for AGI development: it endorses active inference as a candidate architecture for achieving general agentic capabilities. Intentionality, rationality, and explainability are presented as prerequisites for systems capable of real-world reasoning, autonomy, and risk-sensitive operational independence. As agentic AI systems advance, the societal stakes grow: future governance must address agents capable of independent goal-setting and domain generalization.
Future Directions
This conceptual framework invites several lines of future research. The operationalization of empowerment metrics could be extended to larger, more complex environments and hierarchically structured generative models. Work toward disentangled latent states and monosemantic planning representations would enhance explainability and facilitate causal tracing of agentic behaviors. Preference-motivated exploration and information gain targeting specific modalities offer a refined approach to balancing epistemic and instrumental value. Finally, research integrating agency phenotyping with policy shaping and internal governance mechanisms holds promise for robust control of highly agentic AI systems.
Conclusion
The paper establishes active inference as a robust computational framework for phenotyping agency in AI, operationalizes empowerment as a discriminative metric, and delineates the governance strategies appropriate to varying levels of agency. By tying philosophical concepts of agency to formal generative models, it facilitates principled measurement and tuning of agentic traits in artificial systems. The approach provides a foundational bridge between computational psychiatry, AGI development, and AI governance, with clear implications for measuring, controlling, and interpreting agency as AI systems become increasingly autonomous and general (2604.23278).