Autonomy Qualification Overview
- Autonomy qualification is the systematic specification, classification, measurement, and certification of a system's self-governance, combining technical and normative dimensions.
- It employs structured taxonomies and qualification regimes, ranging from basic agency to moral agency, to assess domain-specific autonomy in fields such as AI ethics, robotics, and human-robot interaction.
- It draws on diverse methodologies, including psychometric assessment, code inspection, runtime behavior analysis, and fuzzy design exploration, to support both system reliability and the preservation of human autonomy.
Autonomy qualification is the specification, classification, measurement, constraint, and certification of autonomy as a technical and normative property of systems that act with limited or no direct human control. Across AI ethics, robotics, human-robot interaction (HRI), economics, and formal assurance, the term refers not to a single test but to a family of procedures that make autonomy operational: determining whether a system counts as autonomous at all, identifying the level or scope of its autonomy, measuring how autonomously it behaves, evaluating whether it preserves human autonomy, and certifying whether its autonomy is acceptable for a given task or institution (Varshney, 2020; Formosa et al., 11 Apr 2025; Feng et al., 14 Jun 2025; Gyagenda et al., 2023).
1. Conceptual scope of autonomy
A central line of work treats autonomy as a human-centered normative concept before it becomes a machine property. In bioethics, Beauchamp and Childress define autonomy through two conditions, “liberty” and “agency,” and the recommender-systems literature imports this formulation to argue that AI can compromise autonomy by exerting controlling influences and by reducing informed, meaningful choice (Varshney, 2020). Self-determination theory sharpens this further by defining human autonomy as “having flexibility and control over processes and outcomes,” making autonomy simultaneously an ethical principle and a measurable psychological construct (Varshney, 2020).
Recent work on AI decision-support narrows the focus from global self-governance to “domain-specific autonomy,” defined as “the capacity for self-governed action within defined realms of skill or expertise.” In that literature, the two key components are “skilled competence,” meaning the ability to make informed judgments within a domain, and “authentic value-formation,” meaning the capacity to form genuine domain-relevant values and preferences (Buijsman et al., 30 Jun 2025). This domain-relative view is important because autonomy losses can be local: a person may remain globally autonomous while becoming less autonomous in navigation, clinical judgment, teaching, or financial advice.
A separate philosophical literature applies the term to artificial systems themselves and insists that autonomy is a qualified status rather than a behavioral impression. It distinguishes basic agency, autonomous agency, and moral agency. Basic agency requires interactivity, basic autonomy, adaptability, and goal-directedness; autonomous agency additionally requires self-directed goal setting, meaningful choice among options, critical reflection on values, authenticity, and competency conditions; moral agency adds rational deliberation and moral epistemic capacity (Formosa et al., 11 Apr 2025). The paper formalizes these relations schematically as a nested hierarchy: autonomous agency is basic agency together with self-directed goal setting, meaningful choice, and critical reflection, subject to the authenticity and competency conditions, and moral agency is autonomous agency together with rational deliberation and moral epistemic capacity (Formosa et al., 11 Apr 2025).
An older software-agent literature offers a more local conception: an agent is autonomous “with regard to an attribute” if it can choose, in a nondeterministic way, among several policies of use for that attribute and can change the policy during execution. That model classifies autonomy as partial, nonsocial, and absolute, because it concerns one attribute rather than the whole agent, does not depend on other agents, and is present or absent rather than graded (0707.1558). Taken together, these traditions show that autonomy qualification is always object-relative: it may target persons, subsystems, decision domains, behavioral modes, or institutional roles.
2. Taxonomies and qualification regimes
Because autonomy is not a scalar primitive in most recent work, qualification often proceeds by assigning a system to a structured taxonomy. One philosophical taxonomy classifies AI into no genuine autonomy, limited or derivative autonomy, robust autonomy, and moral autonomy. On this view, current AI systems occupy at most “basic machine autonomy,” since they operate within “rigid boundaries of pre-programmed objectives and data-driven algorithms” and lack self-directed ends, critical self-reflection, and authentic values (Formosa et al., 11 Apr 2025).
An economic general-equilibrium framework replaces the classical autonomy/instrumentality binary with a four-way welfare-status assignment over tools, delegates, agents, and non-human welfare subjects: tools have no welfare function, delegates choose on behalf of principals, agents are welfare-bearing non-human choosers, and the fourth category covers non-human welfare subjects. The welfare-bearing set is derived from this assignment, so autonomy qualification here determines who counts in welfare analysis, what rights enter the commodity space, and how delegation and manipulation affect efficiency claims (Perrier, 23 Apr 2026).
A user-interaction taxonomy defines five escalating “Levels of Autonomy for AI Agents” according to the user’s role: operator, collaborator, consultant, approver, and observer. At level 1 the user manages planning and invokes or approves actions; at level 2 control can transfer between user and agent with shared progress representation; at level 3 the agent plans and executes while eliciting rich feedback; at level 4 the user mainly approves consequential actions under configurable approval conditions; at level 5 the user is reduced to monitoring and an emergency off switch (Feng et al., 14 Jun 2025). This is a design-centered qualification regime: it treats autonomy as “a deliberate design decision, separate from its capability and operational environment” (Feng et al., 14 Jun 2025).
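The five user roles above map naturally onto an ordered enumeration. The following is a minimal sketch, not the authors' implementation: the `requires_approval` helper and its treatment of levels 2 and 3 are assumptions made for illustration, since the text only fixes the behavior at levels 1, 4, and 5.

```python
from enum import IntEnum


class UserRole(IntEnum):
    """User role at each of the five autonomy levels listed in the text."""
    OPERATOR = 1
    COLLABORATOR = 2
    CONSULTANT = 3
    APPROVER = 4
    OBSERVER = 5


def requires_approval(level: UserRole, consequential: bool) -> bool:
    """Hypothetical gate: does an agent action need explicit user sign-off?"""
    if level == UserRole.OPERATOR:
        # Level 1: the user invokes or approves every action.
        return True
    if level == UserRole.OBSERVER:
        # Level 5: the user only monitors and holds an emergency off switch.
        return False
    # Levels 2-4: this sketch gates only consequential actions. That matches
    # the description of level 4; applying it to levels 2-3 is an assumption.
    return consequential


if __name__ == "__main__":
    for role in UserRole:
        print(role.value, role.name.lower(), requires_approval(role, consequential=True))
```

Because the levels are an `IntEnum`, they can be compared directly, which is convenient when a policy caps an agent at a maximum level.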
A competence-aware robotics framework uses four operational levels (no autonomy, verified autonomy, supervised autonomy, and unsupervised autonomy) and models autonomy choice as part of the decision problem. The agent’s state space is extended from $S$ to $S \times \mathcal{L}$, where $\mathcal{L}$ is the set of autonomy levels, and its action space becomes $A \times \mathcal{L}$, allowing it to choose both a domain action and a level of autonomy for executing that action (Basich et al., 2020).
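To make the factored action space concrete, the sketch below pairs every domain action with every autonomy level and then filters the pairs by a per-action cap. The level names follow the text; the `max_level` mapping and both function names are hypothetical and stand in for whatever autonomy profile a real system would maintain.

```python
from itertools import product

# Ordered from least to most autonomous, as listed in the text.
LEVELS = ("no_autonomy", "verified", "supervised", "unsupervised")


def factored_actions(domain_actions, levels=LEVELS):
    """All (domain action, autonomy level) pairs the agent can choose from."""
    return list(product(domain_actions, levels))


def allowed_actions(domain_actions, max_level, levels=LEVELS):
    """Keep only pairs whose level does not exceed the cap for that action.

    max_level is a hypothetical mapping from domain action to the highest
    permitted level name; actions missing from it default to the lowest level.
    """
    rank = {name: i for i, name in enumerate(levels)}
    return [
        (action, level)
        for action, level in factored_actions(domain_actions, levels)
        if rank[level] <= rank[max_level.get(action, levels[0])]
    ]


if __name__ == "__main__":
    caps = {"move": "unsupervised", "grasp": "verified"}
    for pair in allowed_actions(["move", "grasp"], caps):
        print(pair)
```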
For fully autonomous robotic systems, another framework distinguishes Level of Autonomy from Degree of Autonomy. Within full autonomous mode, LoA 4 denotes “unconditional full autonomy,” LoA 3 “responsiveness-conditioned full autonomy,” LoA 2 “reliability-conditioned full autonomy,” LoA 1 “responsiveness- and reliability-conditioned full autonomy,” and LoA 0 externally controlled or supervised operation (Gyagenda et al., 2023). This separates categorical qualification from performance margin.
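The categorical part of this scheme can be read as a small decision table. The sketch below encodes the five levels exactly as listed; describing a system by three boolean flags is an assumption of the sketch, not something the framework prescribes.

```python
def level_of_autonomy(full_autonomous_mode: bool,
                      responsiveness_conditioned: bool,
                      reliability_conditioned: bool) -> int:
    """Map the conditions named in the text to LoA 0-4.

    The boolean parameterization is hypothetical; only the level meanings
    come from the text.
    """
    if not full_autonomous_mode:
        return 0  # externally controlled or supervised operation
    if responsiveness_conditioned and reliability_conditioned:
        return 1  # responsiveness- and reliability-conditioned full autonomy
    if reliability_conditioned:
        return 2  # reliability-conditioned full autonomy
    if responsiveness_conditioned:
        return 3  # responsiveness-conditioned full autonomy
    return 4      # unconditional full autonomy


if __name__ == "__main__":
    print(level_of_autonomy(True, False, True))
```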
3. Measurement and formalization
One major stream operationalizes autonomy through psychometrics. In recommender systems, “respect for human autonomy” is tied to the Index of Autonomous Functioning (IAF), a validated self-determination-theoretic instrument with three subscales: self-congruence, interest-taking, and low susceptibility to control. The proposal is to aggregate IAF scores over users so that a scale originally measuring trait autonomy becomes a measure of “a technological system’s ability to respect autonomous function” (Varshney, 2020). This turns autonomy qualification into population-level human-subject evaluation rather than log-based proxy measurement.
A different stream measures autonomy from runtime behavior. “A Measure for Level of Autonomy Based on Observable System Behavior” defines observed autonomy through an edit-distance function that compares a human action sequence with an observed system sequence. The resulting Observational Score is normalized to 0.0–5.9 and mapped to a minimum SAE-style level of autonomy from 1 to 5 (Pittman, 2024). This framework is explicitly aimed at “blind” runtime comparison of unknown systems.
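The core ingredient of this approach, an edit distance between a human reference sequence and an observed system sequence, can be sketched as follows. This is an illustration only: it uses plain Levenshtein distance, and the `similarity` normalization is an assumption of the sketch rather than the paper's Observational Score or its mapping to levels.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of action labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(human, observed):
    """Hypothetical normalization: 1.0 for identical sequences, 0.0 for
    maximally different ones. Not the paper's scoring formula."""
    longest = max(len(human), len(observed))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(human, observed) / longest


if __name__ == "__main__":
    human = ["perceive", "plan", "act"]
    observed = ["perceive", "act"]
    print(edit_distance(human, observed), similarity(human, observed))
```

Short observation windows make the distance coarse, which is one reason such a score needs enough observations before it is meaningful.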
Static autonomy assessment instead inspects orchestration code. “Measuring AI agent autonomy: Towards a scalable approach with code inspection” decomposes autonomy into impact and oversight. Impact is split into actions and environment; oversight into orchestration, human-in-the-loop, and observability. Each attribute is scored on a three-level ordinal scale—Lower, Middle, Higher—using concrete code flags such as code_execution_config, use_docker, max_rounds, human_input_mode, and logging/display calls (Cihon et al., 21 Feb 2025). This yields a multidimensional profile rather than a single score.
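A code-inspection pass of this kind can be approximated by reading the flags from an agent configuration and assigning each attribute an ordinal level. The sketch below works on a plain dictionary and calls no real framework. The thresholds, the default values, and the reading of “Higher” as more autonomy (more impact or less oversight) are assumptions of the sketch, and the observability attribute is omitted.

```python
def autonomy_profile(cfg: dict) -> dict:
    """Score a hypothetical agent config on a Lower/Middle/Higher scale."""
    exec_cfg = cfg.get("code_execution_config", False)
    executes_code = isinstance(exec_cfg, dict)

    # Impact / actions: can the agent execute code at all?
    actions = "Higher" if executes_code else "Lower"

    # Impact / environment: is execution sandboxed? (assumed default: yes)
    if not executes_code:
        environment = "Lower"
    elif exec_cfg.get("use_docker", True):
        environment = "Middle"
    else:
        environment = "Higher"

    # Oversight / orchestration: is the interaction length bounded?
    max_rounds = cfg.get("max_rounds")
    if max_rounds is None:
        orchestration = "Higher"   # unbounded
    elif max_rounds <= 5:          # threshold chosen for illustration only
        orchestration = "Lower"
    else:
        orchestration = "Middle"

    # Oversight / human-in-the-loop: how often is a human consulted?
    mode = cfg.get("human_input_mode", "ALWAYS")
    human = {"ALWAYS": "Lower", "TERMINATE": "Middle", "NEVER": "Higher"}.get(mode, "Middle")

    return {
        "actions": actions,
        "environment": environment,
        "orchestration": orchestration,
        "human_in_the_loop": human,
    }


if __name__ == "__main__":
    cfg = {"code_execution_config": {"use_docker": False},
           "max_rounds": 12, "human_input_mode": "NEVER"}
    print(autonomy_profile(cfg))
```

The result is deliberately a dictionary rather than a number, mirroring the point that the method yields a multidimensional profile.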
Robotics and UAS research often quantifies autonomy through composite metrics. In the NCAP-based framework for unmanned aerial systems, the autonomy coordinate combines two quantities: an autonomy-level term that captures autonomy across perception, modeling, planning, and execution, and a performance term that aggregates component performance. To avoid normalization artifacts, the paper recommends the Weighted Product Method, which multiplies component scores raised to their weights instead of summing normalized scores, and defines an absolute “autonomy distance” from a non-autonomous origin (Hertel et al., 2021).
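The appeal of the Weighted Product Method for avoiding normalization artifacts can be shown with a small sketch. It uses the standard textbook form of the method, a product of scores raised to their weights. The criteria, weights, and numbers are invented for illustration and are not taken from the paper.

```python
import math


def weighted_product(scores, weights):
    """Standard Weighted Product Method: product of score_j ** weight_j."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    return math.prod(s ** w for s, w in zip(scores, weights))


def preference_ratio(a, b, weights):
    """Ratio > 1 means alternative a is preferred to alternative b."""
    return weighted_product(a, weights) / weighted_product(b, weights)


if __name__ == "__main__":
    weights = [0.6, 0.4]
    # Hypothetical criteria: (autonomy score, range in metres).
    uas_a, uas_b = [3.0, 120.0], [2.0, 200.0]
    in_metres = preference_ratio(uas_a, uas_b, weights)
    # Re-express the second criterion in kilometres: the ratio is unchanged,
    # so no normalization step is needed before comparing alternatives.
    in_km = preference_ratio([3.0, 0.120], [2.0, 0.200], weights)
    print(in_metres, in_km)
```

Because a change of units multiplies both alternatives by the same factor, it cancels in the ratio, which is the property that a weighted sum of raw scores lacks.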
A task-requirements framework for fully autonomous robotic systems derives autonomy from a requisite capability set, reliability, and responsiveness. Reliability and responsiveness are quantified separately for each requisite capability, and the corresponding Degree of Autonomy is then aggregated over the requisite capability set, either directly or in weighted form with per-capability weights (Gyagenda et al., 2023).
Design-time qualification can also be graded rather than binary. In fuzzy design exploration, a vague requirement is represented by a membership function, which assigns each candidate specification a degree of satisfaction between 0 and 1. This is paired with probabilistic model checking through an upper probability, and qualification becomes the search for designs that maximize the degree of satisfaction subject to the probabilistic constraint (Morse et al., 2016). In legal-computational work, yet another quantification appears: personal autonomy is modeled through a freedom–responsibility plane, from which a lawyer robot’s “autonomy estimation” is computed.
4. Human-autonomy preservation
A large part of the literature treats autonomy qualification not as measuring machine independence per se, but as evaluating whether AI erodes or preserves human self-governance. In recommender systems, the core concern is that systems optimized for engagement can “influence human behavior in significant ways, in some cases making people more machine-like,” narrowing expression, reducing informed choice, and shifting motivation from intrinsic to extrinsic forms. Respect for autonomy therefore has both a negative obligation—not to control or unduly interfere with autonomous choice—and a positive obligation—to disclose information in ways that foster autonomous decision making (Varshney, 2020).
Human-robot interaction research makes this design problem concrete. For socially assistive robots, human autonomy is defined as the ability to act independently and freely make decisions regarding oneself, including decisions aligned with one’s values and not produced by coercion or outside pressure. The framework decomposes this into independence, choice, control, and identity, and argues that robot autonomy must be constrained to provide “just enough assistance,” avoid unsolicited intervention, encode user preferences as planning constraints, and explain actions in human-understandable terms (Wilson, 2022). Qualification here concerns whether robot initiative preserves user initiative.
Recent work on AI decision-support extends this to professional and skill-based activity. It analyzes how decision-support systems affect domain-specific autonomy through changes in skilled competence and authentic value-formation, arguing that the absence of reliable failure indicators and the possibility of unconscious value shifts can erode autonomy immediately and over time. The proposed design patterns are “careful role specification,” “implementation of defeater mechanisms,” and “support for reflective practice,” all intended to preserve self-governed action within the target domain (Buijsman et al., 30 Jun 2025).
Educational research reaches a related conclusion through Biesta’s category of subjectification, which the meta-analysis operationalizes through autonomy, agency, self-regulated learning, self-efficacy, motivation, and identity-related development. The meta-analysis pools 54 effect sizes into an overall subjectification effect, but the strongest gains are concentrated in “small-scale, long-term studies” and in “tutor-like” or reflective designs; by contrast, qualification outcomes are larger and more robust across the literature (Huang et al., 25 Sep 2025). A plausible implication is that autonomy qualification must distinguish systems that merely improve measurable task performance from systems that preserve the user’s capacity to judge, choose, and form values.
5. Assurance, certification, and institutional qualification
Formal assurance work approaches autonomy qualification as a certification problem. A proposed framework for reliable autonomous systems organizes evidence around three layers—Reactions, Rules, and Principles—and maps different V&V techniques to them: hybrid model checking and control-theoretic analysis for low-level dynamics, model checking and static analysis for rule-following symbolic logic, and BDI-style verification or other high-level formalizations for principled exception handling (Fisher et al., 2020). The same work argues that certifying autonomy requires deriving properties from human licensing, explicit regulations, assumed human traits, and human–system interface redesign.
A more recent governance-oriented proposal makes this explicit through autonomy certificates. An autonomy certificate is “a digital document that prescribes the maximum level of autonomy an agent can operate at” for a given set of technical specifications and an operational environment. Issuance requires both an operational agent and an autonomy case, analogous to a safety case but aimed at proving that the agent behaves at most at a particular autonomy level. Certificates are issued by a third-party body and must be renewed when the model, tools, or environment change (Feng et al., 14 Jun 2025).
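One way to picture such a certificate is as an immutable record that an operator checks before letting an agent run at a requested level. The sketch below is an assumption-laden illustration: every field name and the `permits` method are hypothetical, and the proposal itself does not specify a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyCertificate:
    """Hypothetical record of a certificate issued by a third-party body."""
    agent_id: str
    max_level: int          # highest autonomy level the agent may operate at
    environment: str        # operational environment the certificate covers
    model_version: str
    tools: frozenset

    def permits(self, level, environment, model_version, tools) -> bool:
        """True only if the request stays within the certified level and the
        model, tools, and environment are exactly those that were certified.
        Any change to them means the certificate must be renewed."""
        return (
            1 <= level <= self.max_level
            and environment == self.environment
            and model_version == self.model_version
            and frozenset(tools) == self.tools
        )


if __name__ == "__main__":
    cert = AutonomyCertificate("agent-1", 3, "staging", "v1", frozenset({"search"}))
    print(cert.permits(2, "staging", "v1", {"search"}))
    print(cert.permits(2, "staging", "v2", {"search"}))
```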
Design-space qualification can also proceed without binary certification. Fuzzy design exploration produces a partial ordering of system designs by combining probabilistic requirements with vague requirements and selecting specifications that maximize graded satisfaction under explicit probabilistic thresholds (Morse et al., 2016). This is a qualification regime for architectures and parameter settings rather than deployed agents.
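The selection rule described here, maximizing graded satisfaction under a probabilistic threshold, can be sketched in a few lines. Everything concrete in the sketch is an assumption: the piecewise-linear membership function for “fast”, its breakpoints, the design records, and the use of a precomputed failure probability where a real workflow would query a probabilistic model checker.

```python
def fast_membership(latency_ms, full=100.0, zero=500.0):
    """Hypothetical membership function for the vague requirement 'fast':
    1.0 at or below `full` ms, 0.0 at or above `zero` ms, linear between."""
    if latency_ms <= full:
        return 1.0
    if latency_ms >= zero:
        return 0.0
    return (zero - latency_ms) / (zero - full)


def best_design(designs, max_failure_prob):
    """Pick the design with the highest degree of satisfaction among those
    whose (precomputed) failure probability meets the threshold."""
    feasible = [d for d in designs if d["failure_prob"] <= max_failure_prob]
    if not feasible:
        return None
    return max(feasible, key=lambda d: fast_membership(d["latency_ms"]))


if __name__ == "__main__":
    designs = [
        {"name": "A", "latency_ms": 120, "failure_prob": 0.05},
        {"name": "B", "latency_ms": 300, "failure_prob": 0.005},
        {"name": "C", "latency_ms": 450, "failure_prob": 0.001},
    ]
    print(best_design(designs, max_failure_prob=0.01))
```

Design A is the fastest but violates the threshold, so the search settles on B, which shows how the probabilistic constraint filters candidates before the fuzzy requirement ranks them.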
Operational qualification can be standardized through benchmark environments. In NASA’s Space Robotics Challenge Phase 2 qualification round, teams had to complete three tasks in a virtual lunar environment under fixed constraints: report volatile locations within a fixed horizontal error tolerance, localize a CubeSat within a fixed distance tolerance, and operate within a strict 45-minute trial budget. Team Mountaineers’ architecture combined multi-layer EKFs, Move Base with DWA, SSD-VGG16 perception, and hierarchical state machines; it finished among the six prize-winning teams out of 114 registrants (Kilic et al., 2021). The significance is not merely competitive: qualification was tied to explicit thresholds, randomized seeds, and repeatable simulation.
Institutional economics generalizes the idea further by qualifying the classical First Fundamental Theorem of Welfare Economics. In the augmented model, welfare functions depend on private goods, autonomy-relevant rights, and institutional state, and competitive equilibrium is replaced by autonomy-complete competitive equilibrium. The resulting theorem shows that equilibrium implies autonomy-Pareto efficiency only when autonomy-related margins (rights, delegation divergence, manipulation, and verification) are internalized; the classical theorem is recovered in the “low-autonomy limit” where all non-humans are tools and rights are fixed (Perrier, 23 Apr 2026). Here autonomy qualification functions at the level of institutions and welfare analysis rather than individual systems.
6. Limits, controversies, and open questions
The strongest controversy concerns whether current AI systems are autonomous in any robust sense. One philosophical analysis argues that contemporary ML systems and LLMs are not genuine agents or autonomous agents because they optimize pre-specified objectives, lack self-directed goal setting, lack meaningful choice among valuable options, and cannot critically reflect on or revise their normative framework. On the same view, consciousness remains necessary for moral patiency, so even hypothetical artificial moral agents may not qualify as moral patients (Formosa et al., 11 Apr 2025).
Other limits are methodological. The recommender-systems literature explicitly states that an abstract mathematical operationalization of respect for human autonomy that avoids human-subject studies remains “an important open question” (Varshney, 2020). Code-based autonomy assessment is scalable, but it “cannot capture unexpected or adaptive behaviors that only emerge in real-world or complex environments,” and it misses user interaction dynamics and latent dependencies (Cihon et al., 21 Feb 2025). Behavior-based runtime scoring faces different constraints: it depends on human-equivalent lookup tables, robust sensor fusion, and enough observations to make edit distance meaningful, and it yields only a minimum level of autonomy (Pittman, 2024).
Quantitative robotic frameworks also rest on strong assumptions. The task-requirements approach assumes orthogonal capabilities, Gaussian error models, and available essential performance requirements (Gyagenda et al., 2023). Competence-aware autonomy optimization assumes stationary human feedback, non-starvation of relevant state-action pairs, and safe paths through level space during gated exploration (Basich et al., 2020). Economic autonomy qualification leaves the welfare-status assignment exogenous and does not provide an existence theorem for autonomy-complete competitive equilibria (Perrier, 23 Apr 2026). Fuzzy design qualification depends on membership functions that encode stakeholder judgments about “fast,” “safe,” or “acceptable,” so the resulting optimum is only as objective as those inputs (Morse et al., 2016).
The broader pattern is that autonomy qualification has become increasingly formal without becoming uniform. Some frameworks qualify autonomy as agency status, some as human involvement, some as psychometric impact, some as capability-performance correspondence, and some as certifiable governance state. This suggests that autonomy qualification is best understood as a layered and domain-specific enterprise: it specifies what kind of autonomy is at issue, for whom, under which institutional conditions, and by what evidence it may be said to exist, be acceptable, or require restriction.