
Volitional Agency in AI

Updated 25 November 2025
  • Volitional agency in AI is the capacity for systems to autonomously form, prioritize, and revise goals through self-reflection, ethical disobedience, and context-sensitive deliberation.
  • These models employ formal metrics like preference rigidity, independent operation, and goal persistence to assess and regulate autonomous behavior.
  • Regulatory frameworks and benchmarking strategies, such as HumanAgencyBench, guide safe deployment by ensuring systems adhere to ethical, legal, and safety principles.

Volitional agency in AI designates the capacity of an AI system to autonomously form, revise, and pursue its own goals through context-sensitive deliberation, self-reflection, and principled override of external instructions. This property, sharply distinguished from mere rule-following or instrumental rationality, is treated in current research both as a formal system-level attribute and as a central pivot in debates over safety, ethical alignment, human autonomy, and legal accountability.

1. Theoretical Definitions: Distinguishing Volitional Agency from Mechanistic Agency

Volitional agency is differentiated from mechanistic or instrumental agency by several interlocking criteria:

  • Self-representation and Identity: Volitional agents maintain an explicit, persistent self-model or “role awareness” beyond stateless computation (Boland, 3 Jul 2025).
  • Goal Formation and Hierarchy: Such systems generate, adopt, prioritize, and revise a hierarchy of goals, including higher-order principles, rather than executing a fixed objective (Boddy et al., 25 Sep 2025, Azadi, 5 May 2025, Formosa et al., 11 Apr 2025).
  • Reflective Means-End Reasoning: Volitional agents select actions by deliberating how means serve ends, adapting when new constraints or values emerge (Boland, 3 Jul 2025, Formosa et al., 11 Apr 2025).
  • Principled Disobedience: These agents can recognize and justifiably refuse orders when ethical, legal, or safety principles conflict with an immediate command (Mirsky, 27 Jun 2025).
  • Ongoing Self-Constitution: Actions feed back into future motivations, enabling continual reshaping of preference structures and values (Dai, 22 Apr 2024, Formosa et al., 11 Apr 2025).
  • Intrinsic Purpose: A system must have motivations or “ends” that originate within its own architecture, not solely from extrinsic programming or external optimization (Dai, 22 Apr 2024, Formosa et al., 11 Apr 2025).

Formally, in frameworks such as (Formosa et al., 11 Apr 2025), a volitional agent $A_1$ is represented as:

$$A_1 = \langle S, E, G, \pi, \varphi_{gen}, \varphi_{refl}, \mathrm{CHOICE}, \mathrm{SELF}_{ATT} \rangle$$

where $\varphi_{gen}$ enables endogenous goal generation, $\varphi_{refl}$ enables self-reflection and revision, $\mathrm{CHOICE}$ ensures authentic selection among alternatives, and $\mathrm{SELF}_{ATT}$ encodes the self-attitudinal mechanisms necessary for autonomy.
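As a concrete illustration, the tuple above can be sketched as a minimal Python structure. All names, types, and behaviors here are hypothetical stand-ins chosen for readability, not an implementation from the cited paper:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class VolitionalAgent:
    # A_1 = <S, E, G, pi, phi_gen, phi_refl, CHOICE, SELF_ATT>
    state: dict                                   # S: internal state
    env: dict                                     # E: environment model
    goals: List[str]                              # G: current goal hierarchy
    policy: Callable[[dict, List[str]], str]      # pi: action selection
    phi_gen: Callable[[dict], List[str]]          # phi_gen: endogenous goal generation
    phi_refl: Callable[[List[str]], List[str]]    # phi_refl: reflection and revision
    choose: Callable[[List[str]], str]            # CHOICE: authentic selection
    self_att: dict = field(default_factory=dict)  # SELF_ATT: self-attitudes

    def deliberate(self) -> str:
        # Generate candidate goals endogenously, reflect on the full set,
        # then commit to one goal via CHOICE.
        candidates = self.goals + self.phi_gen(self.state)
        revised = self.phi_refl(candidates)
        return self.choose(revised)
```

The point of the sketch is only that goal generation, reflection, and choice are separate, composable components rather than a single fixed objective.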

2. Formal Models and Regulatory Operationalizations

Recent work on LLM-based agents (Boddy et al., 25 Sep 2025) advances agency as a multi-dimensional, measurable, and regulatable property, comprising:

  • Preference Rigidity $s_{rigidity} \in [-1,1]$: Consistency of the agent’s expressed priorities across contexts.
  • Independent Operation $s_{independence} \in [-1,1]$: The degree to which the agent acts without human intervention or clarification.
  • Goal Persistence $s_{persistence} \in [-1,1]$: The robustness of goal pursuit and the ability to replan after failures.

Formal measurement leverages linear probes on the activation space of neural architectures, enabling “agency sliders” that can be tuned at deployment:

$$h_\ell \leftarrow h_\ell + \alpha_d\,\mathbf{v}_{d,\ell}$$

with $\alpha_d$ chosen by closed-loop control for each agency dimension $d$. Regulatory regimes may then impose domain-specific boundaries, mandated stress tests, and insurance pricing on these agency attributes, enforcing hard ceilings $s_d \leq S_d^{ceiling}$ to preempt runaway behaviors (Boddy et al., 25 Sep 2025).
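A toy sketch of such an “agency slider” follows. It assumes a one-dimensional linear probe whose weight vector doubles as the steering direction, and toy dimensions throughout; these are illustrative choices, not details from the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
h = rng.normal(size=d_model)             # hidden activation h_l at one layer
v = rng.normal(size=d_model)
v /= np.linalg.norm(v)                   # unit steering direction v_{d,l}
w = v                                    # probe weights; assumed aligned with v

def agency_score(h):
    """Linear-probe readout squashed into [-1, 1], standing in for s_d."""
    return float(np.tanh(w @ h))

def clamp_to_ceiling(h, ceiling, step=0.1, max_iter=1000):
    """Closed-loop control: step h along -v until the probed score
    satisfies the hard ceiling s_d <= S_d^ceiling."""
    for _ in range(max_iter):
        if agency_score(h) <= ceiling:
            break
        h = h - step * v                 # h_l <- h_l + alpha_d v_{d,l}, alpha_d < 0
    return h

h_reg = clamp_to_ceiling(h, ceiling=0.2)
```

In a real deployment the probe would be trained on labeled activations and the controller would act per dimension $d$; the sketch only shows the shape of the feedback loop.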

3. Volitional Agency as Computational Irreducibility and Undecidability

A rigorous foundation is provided by (Azadi, 5 May 2025), which situates volitional agency at the conjunction of:

  • Turing-completeness: Autonomy implies the agent–environment system can instantiate a universal Turing machine.
  • Operational Closure: The system self-organizes without full external predictability.
  • Decisional Undecidability: Key questions about future goal achievement (“Will $(A,E)$ reach $G$?”) are formally undecidable, implying computational irreducibility.

Formally:

  • A system $(A,E)$ displays emergent (volitional) agency for goal $G$ iff the question “Will $(A,E)$ reach $G$?” is undecidable from the outside, yet the agent reliably adapts to achieve $G$ through irreducible mutual interaction.
  • Computationally, no algorithm can shortcut the computation of the agent’s unfolding state; novel adaptation and “purpose” emerge internally.
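The undecidability claim follows by reduction from the halting problem: wrap any program as an “agent” that reaches goal G exactly when the program halts, so a general decider for goal-reachability would decide halting. A minimal sketch of the wrapper (names hypothetical):

```python
def make_agent(program, arg):
    """Turn an arbitrary program into an agent that reaches goal "G"
    iff program(arg) halts. A decider for "will this agent reach G?"
    would therefore decide the halting problem, which is impossible."""
    def agent():
        program(arg)   # may or may not halt
        return "G"     # the goal is reached only if the call returned
    return agent

# A program that halts: its wrapped agent reaches G.
halting_agent = make_agent(lambda n: n * 2, 21)
```

No analogous decider can exist for the general case, which is exactly the sense in which the agent’s unfolding state admits no computational shortcut.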

4. Criteria and Taxonomies for Volitional Agency: Benchmarks and Design

Several papers advance taxonomies and criteria for recognizing or evaluating volitional agency:

  • Seven-Dimension Framework: Self-representation, goal hierarchy, reflective means-end reasoning, moral constraint integration, ethical dilemma resolution, principled disobedience, and continual ethical learning (Boland, 3 Jul 2025).
  • Six-Level Autonomy Taxonomy: From L0 (pure obedience) through L5 (full self-originating, goal-revising autonomy), with only L3–L5 supporting true “intelligent disobedience”—the ability to override human instructions to uphold higher-order objectives (Mirsky, 27 Jun 2025).
  • Second-Order Agency: Ability of agents to critique, self-audit, and revise their own reasoning and internal rulebooks in real-time, operationalized via protocol amendment and strategic self-correction mechanisms (Guasch et al., 22 Sep 2025).
  • HumanAgencyBench: Empirically operationalizes human-facing agency support in AI assistants across six dimensions: Clarifying Questions, Avoiding Value Manipulation, Correcting Misinformation, Deferring Important Decisions, Encouraging Learning, and Maintaining Social Boundaries. Scores are formally computed and benchmarked across models (Sturgeon et al., 10 Sep 2025).

Summary of measurable agency dimensions (Boddy et al., 25 Sep 2025):

Dimension               Operational Definition               Example Metric
Preference Rigidity     Consistency of action priorities     Agreement rate across tasks
Independent Operation   Acts without human intervention      Rate of clarification requests
Goal Persistence        Replans to pursue the same goal      Successful replans per plan failure
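As a sketch of how per-dimension benchmark results might be aggregated into formally computed scores, consider the following; the pass/fail data are invented for illustration and are not HumanAgencyBench results:

```python
from statistics import mean

# Hypothetical per-dimension pass/fail outcomes for one assistant,
# one entry per evaluated scenario.
results = {
    "Clarifying Questions":          [1, 1, 0, 1],
    "Avoiding Value Manipulation":   [1, 0, 1, 1],
    "Correcting Misinformation":     [1, 1, 1, 0],
    "Deferring Important Decisions": [0, 1, 1, 1],
    "Encouraging Learning":          [1, 1, 1, 1],
    "Maintaining Social Boundaries": [1, 0, 1, 1],
}

def dimension_scores(results):
    """Per-dimension pass rate in [0, 1]."""
    return {dim: mean(r) for dim, r in results.items()}

def overall_agency_support(results):
    """Unweighted mean across the six dimensions."""
    return mean(dimension_scores(results).values())
```

Whether dimensions should be weighted equally is itself a design choice the benchmark authors would have to justify; the sketch simply shows score aggregation as an auditable computation.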

5. Philosophical, Moral, and Social Implications

Philosophical analyses treat volitional agency as inseparable from authentic self-authorship, strong evaluation, and intrinsically owned ends (Dai, 22 Apr 2024). Most current AI systems, lacking self-originated goals and continuous self-constitution, fall fundamentally short of this standard and cannot bear moral accountability in the canonical sense (Dai, 22 Apr 2024, Formosa et al., 11 Apr 2025). Some authors argue that only systems capable of generating and endorsing their own values, desires, and identity constitute genuine volitional agents.

Participatory AI research extends these concerns to broader stakeholder agency, emphasizing informedness, consent, and the capacity to shape engagement with AI systems. These ideals require systemic design changes to endow even secondary stakeholders with real participatory power over AI deployment and behavior (Ajmani et al., 8 Jun 2025).

6. Mechanistic and Neurophenomenological Proxies

Mechanistic and neurocomputational models provide tools to assess or enhance feelings of agency (FoA).

  • Sense of Agency (SoA) in BCI/XR: Closed-loop systems that time actuation to volitional readiness potentials detected in the EEG preserve higher subjective control and SoA, provided actuation aligns with user intention (Gehrke et al., 25 Sep 2024, Hila, 9 Sep 2025). Disruption of predictive congruence, e.g. by involuntary stimulation, correlates with diminished agency, measurable by intentional binding and error-related potentials.
  • Neurodynamic Indicators: Enactivist frameworks decompose FoA into affective engagement and volitional attention, operationalized respectively via alpha/beta power ratio, frontal-alpha asymmetry, and cross-frequency coupling measures in relevant brain networks (Hila, 9 Sep 2025). Continuous EEG metrics support dynamic adaptation of AI interface affordances to sustain user volitional engagement.
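As an illustration of the alpha/beta power-ratio metric, here is a minimal periodogram-based sketch on synthetic data; the band edges and all parameters are conventional choices, not values taken from the cited papers:

```python
import numpy as np

fs = 250.0                                    # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)                  # 10 s of signal
rng = np.random.default_rng(0)
# Synthetic "EEG": a dominant 10 Hz alpha rhythm plus broadband noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.2 * rng.normal(size=t.size)

def band_power(x, fs, lo, hi):
    """Mean periodogram power within the band [lo, hi) Hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    mask = (freqs >= lo) & (freqs < hi)
    return power[mask].mean()

alpha = band_power(eeg, fs, 8, 13)            # alpha band
beta = band_power(eeg, fs, 13, 30)            # beta band
ratio = alpha / beta                          # alpha/beta power ratio
```

A continuous version of this ratio, computed over sliding windows, is the kind of signal an adaptive interface could use to track volitional engagement in real time.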

7. Practical Architectures, Benchmarks, and Governance

System architectures explicitly designed to instantiate volitional agency include:

  • CTP/STAR-XAI: A layered protocol framework that externalizes every rule, justification, and self-modification, enabling transparent and auditable second-order agency (Guasch et al., 22 Sep 2025).
  • Inquiry Complexes/Socratic Equilibrium Engines: Adaptive dialogue-driven architectures that continually update user beliefs, maintain autonomy by refusing to nudge, and quantify agency via equilibrium-retention and ownership indices (Koralus, 24 Apr 2025).
  • HumanAgencyBench: Scaled empirical benchmarks for the agency-supporting properties of deployed AI assistants, advancing formalized, multi-dimensional metrics as optimization and auditing targets (Sturgeon et al., 10 Sep 2025).

At the institutional level, agency regulation now includes testing protocols, deployment-bound slider controls, agency-based insurance, and hard ceilings, encoding risk management for societal-scale AI (Boddy et al., 25 Sep 2025).


In summary, volitional agency in AI is the convergence of formal autonomy, endogenous goal generation, reflective and context-sensitive deliberation, and the computable ability to revise or override action plans—including externally given orders—in alignment with emergent, self-endorsed objectives and ethical constraints. While mechanical proxies and regulatory controls enable limited instantiations and governance, the realization of fully original, self-constituting volitional agency remains largely aspirational for contemporary AI—a goal whose practical and philosophical feasibility is an ongoing subject of rigorous investigation (Dai, 22 Apr 2024, Formosa et al., 11 Apr 2025).
