
Autonomy-Conditioned Welfare Criteria

Updated 4 July 2026
  • Autonomy-Conditioned Welfare is a family of welfare criteria that evaluates outcomes based on autonomy, delegation, and voluntary participation rather than simply maximizing payoff.
  • The framework is developed across several domains, including treatment assignment, Pareto mediation, post-AGI general equilibrium, and supportive dialogue, combining theoretical results with empirical and experimental evaluations.
  • In an energy-saving RCT, a targeting policy that mixes compulsory treatment, no-treatment, and an opt-in arm achieves higher estimated welfare than uniform policies or the best purely paternalistic policy, showing that building autonomy into policy design can raise welfare.

Autonomy-conditioned welfare is a class of welfare criteria in which the assessment of outcomes is conditioned on autonomy, delegation, rights, or voluntary participation rather than on unconstrained welfare maximization alone. In the literature summarized here, the term denotes the planner’s expected gain from offering an opt-in arm in a three-arm treatment-assignment problem (Ida et al., 2021), the maximum sum of delegators’ utilities subject to individual-rationality or autonomy-preservation constraints in mediated games (McAleer et al., 2021), welfare over consumption, autonomy-relevant rights, and institutional state in post-AGI general equilibrium (Perrier, 23 Apr 2026, Perrier, 6 Jun 2026), and a dialogue-time utility that rewards autonomy support and helpfulness while penalizing dependency and coercion (Manir et al., 2 Apr 2026). Taken together, these formulations suggest that autonomy-conditioned welfare is not a single canonical functional, but a family of welfare objectives in which the value of an outcome depends on how choice is exercised, delegated, protected, or institutionally stabilized.

1. Formal scope of the concept

The concept appears in several distinct formal environments. In targeted treatment assignment, autonomy-conditioned welfare is

$$E[W(O)-W(NT)\mid X_i=x],$$

the planner’s expected gain from offering individual $i$ the opt-in arm rather than compulsory no-treatment. In voluntary mediation, autonomy-conditioned welfare for delegators $D$ at base profile $s$ is

$$W_{ac}(D;s)=\max_{s'\in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$$

subject to $u_i(s')\ge u_i(s)$ for all $i\in D$ and $s'_j=s_j$ for all $j\notin D$. In post-AGI general equilibrium, each welfare-bearing entity $i$ has a continuous autonomy-conditioned welfare function

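$$U_i(x_i,r_i,\theta),$$

so that welfare depends jointly on consumption $x_i$, autonomy-relevant rights $r_i$, and the institutional regime $\theta$. In supportive dialogue, the response-level utility is

$$U(y)=w_A\,A(y)-w_{\mathrm{dep}}\,R_{\mathrm{dep}}(y)-w_{\mathrm{coe}}\,R_{\mathrm{coe}}(y)+w_S\,S(y)$$

with nonnegative weights, where $A$ scores autonomy support, $S$ supportiveness, and $R_{\mathrm{dep}}$ and $R_{\mathrm{coe}}$ dependency and coercion risk (Ida et al., 2021, McAleer et al., 2021, Perrier, 23 Apr 2026, Manir et al., 2 Apr 2026).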

Despite their heterogeneity, these definitions share a structural pattern. Welfare is conditioned on an autonomy variable that cannot be reduced to ordinary payoff alone: self-selection into treatment, voluntary delegation to a mediator, assignment of autonomy-rights, or relational risks such as dependency and coercion. This suggests that the term functions as a design principle for constrained welfare analysis rather than as a domain-specific technicality.

2. Three-arm policy design and empirical welfare maximization

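In the treatment-assignment framework, individuals in the population are indexed by $i$, each individual has observable pre-treatment covariates $X_i\in\mathcal{X}$, and there are three arms of intervention: $T$ (compulsory treatment), $NT$ (compulsory no-treatment), and $O$ (opt-in). Let $W_i(a)$ denote individual $i$’s welfare contribution if assigned to arm $a$, for $a\in\{T,NT,O\}$. An assignment policy $G=(G_T,G_{NT},G_O)$ is a measurable partition of $\mathcal{X}$ into three disjoint sets with $G_T\cup G_{NT}\cup G_O=\mathcal{X}$. The planner’s utilitarian social welfare is

$$\mathcal{W}(G)=E\big[W_i(T)\,1\{X_i\in G_T\}+W_i(NT)\,1\{X_i\in G_{NT}\}+W_i(O)\,1\{X_i\in G_O\}\big],$$

and the optimal policy solves

$$G^*\in\arg\max_{G}\ \mathcal{W}(G).$$

Autonomy enters through the opt-in arm. If $D_i(O)\in\{0,1\}$ denotes $i$’s take-up choice under arm $O$, then under a natural exclusion restriction (taking up treatment in the opt-in arm yields the same welfare as compulsory treatment, and declining yields the same welfare as no-treatment),

$$W_i(O)=D_i(O)\,W_i(T)+\big(1-D_i(O)\big)\,W_i(NT).$$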

The framework defines three conditional average welfare differences,

$$\tau_{T,NT}(x)=E[W_i(T)-W_i(NT)\mid X_i=x],$$

$$\tau_{O,NT}(x)=E[W_i(O)-W_i(NT)\mid X_i=x],$$

$$\tau_{T,O}(x)=E[W_i(T)-W_i(O)\mid X_i=x]=\tau_{T,NT}(x)-\tau_{O,NT}(x),$$

which reduce the pointwise comparison of the three arm-specific conditional means to the signs of these differences. The Bayes-optimal policy is

$$G_T^*=\{x:\ \tau_{T,NT}(x)\ge 0,\ \tau_{T,O}(x)\ge 0\},$$

$$G_O^*=\{x:\ \tau_{O,NT}(x)\ge 0,\ \tau_{T,O}(x)<0\},$$

$$G_{NT}^*=\{x:\ \tau_{T,NT}(x)<0,\ \tau_{O,NT}(x)<0\}.$$
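The rule amounts to giving each covariate cell the arm with the largest conditional mean welfare. A minimal Python sketch, assuming the two differences relative to no-treatment, $E[W(T)-W(NT)\mid X=x]$ and $E[W(O)-W(NT)\mid X=x]$, are already known for each subgroup; the subgroup labels and numbers are hypothetical, and ties are broken toward treatment, then opt-in.

```python
from typing import Dict, Tuple


def bayes_optimal_arm(tau_t_nt: float, tau_o_nt: float) -> str:
    """Return the welfare-maximizing arm ("T", "NT", or "O") for one covariate cell x.

    tau_t_nt: E[W(T) - W(NT) | X = x], compulsory treatment vs. no-treatment
    tau_o_nt: E[W(O) - W(NT) | X = x], opt-in vs. no-treatment
    The third difference E[W(T) - W(O) | X = x] is tau_t_nt - tau_o_nt.
    Ties are broken toward "T", then "O" (an arbitrary convention).
    """
    tau_t_o = tau_t_nt - tau_o_nt
    if tau_t_nt >= 0 and tau_t_o >= 0:
        return "T"
    if tau_o_nt >= 0 and tau_t_o < 0:
        return "O"
    return "NT"


def assign_policy(effects: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map each subgroup label to its Bayes-optimal arm."""
    return {g: bayes_optimal_arm(t, o) for g, (t, o) in effects.items()}


# Hypothetical subgroup effects (tau_T_NT, tau_O_NT), in arbitrary welfare units.
print(assign_policy({"g1": (5.0, 2.0), "g2": (1.0, 3.0), "g3": (-2.0, -1.0)}))
# prints {'g1': 'T', 'g2': 'O', 'g3': 'NT'}
```

Because only two differences are free, the sign conditions and a direct argmax over $\{0,\ \tau_{T,NT}(x),\ \tau_{O,NT}(x)\}$ always pick an arm with the same welfare.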

Under unconfoundedness by design, subgroup welfare differences are estimated from the three-arm RCT using simple differences in sample means within each arm and covariate subgroup $x$, together with the estimated take-up probability $\hat p(x)$ of $\Pr(D_i(O)=1\mid X_i=x)$ in the opt-in arm. For takers ($D_i(O)=1$) and non-takers ($D_i(O)=0$), the paper uses instrumental-variables logic to estimate

$$E[W_i(T)-W_i(NT)\mid D_i(O)=1,\ X_i=x]$$

and

$$E[W_i(T)-W_i(NT)\mid D_i(O)=0,\ X_i=x].$$
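Under random assignment and the exclusion restriction, $W_i(O)-W_i(NT)=D_i(O)\,\big(W_i(T)-W_i(NT)\big)$, so $E[W(O)-W(NT)\mid X=x]=p(x)\,E[W(T)-W(NT)\mid D(O)=1,X=x]$ and $E[W(T)-W(O)\mid X=x]=(1-p(x))\,E[W(T)-W(NT)\mid D(O)=0,X=x]$. A minimal Python sketch of the resulting Wald-type ratios for a single subgroup; it illustrates this identity on toy data and is not the paper's estimation code.

```python
from statistics import mean
from typing import Sequence, Tuple


def taker_nontaker_effects(
    y_t: Sequence[float],
    y_nt: Sequence[float],
    y_o: Sequence[float],
    takeup_o: Sequence[int],
) -> Tuple[float, float]:
    """Wald-type estimates of E[W(T) - W(NT)] for takers and non-takers in one subgroup.

    y_t, y_nt, y_o: welfare outcomes observed in the T, NT, and O arms.
    takeup_o: take-up indicators D(O) in {0, 1} for the O arm.
    Assumes random assignment across arms and W(O) = D(O) W(T) + (1 - D(O)) W(NT).
    """
    p_hat = mean(takeup_o)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("take-up share must lie strictly between 0 and 1")
    m_t, m_nt, m_o = mean(y_t), mean(y_nt), mean(y_o)
    takers = (m_o - m_nt) / p_hat            # E[W(O) - W(NT)] / p
    non_takers = (m_t - m_o) / (1 - p_hat)   # E[W(T) - W(O)] / (1 - p)
    return takers, non_takers


# Toy subgroup: takers gain 10 from treatment, non-takers lose 2; half take up.
print(taker_nontaker_effects([10, -2], [0, 0], [10, 0], [1, 0]))  # (10.0, -2.0)
```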

Empirical welfare maximization is then implemented over a low-complexity policy class, such as decision trees of fixed depth, using exhaustive search for shallow trees and a two-step heuristic for deeper ones. To correct “winner’s curse” bias (in-sample maximized welfare overstates the welfare the chosen policy would deliver out of sample), synthetic outcomes are generated by permuting residuals from a flexible first-stage fit, and optimized welfare is re-evaluated on the resulting pseudo-samples (Ida et al., 2021).
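A minimal Python sketch of exhaustive search over depth-1 trees (one threshold on a single scalar covariate, one arm per leaf). It assumes welfare is estimated by inverse-propensity weighting with the known RCT assignment probabilities, in the style of standard empirical welfare maximization; the data and propensities are hypothetical, and the residual-permutation correction is not shown.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

ARMS = ("T", "NT", "O")


def empirical_welfare(x: Sequence[float], arm: Sequence[str], y: Sequence[float],
                      propensity: Dict[str, float], threshold: float,
                      left_arm: str, right_arm: str) -> float:
    """IPW estimate of mean welfare under 'left_arm if x <= threshold else right_arm'."""
    total = 0.0
    for xi, ai, yi in zip(x, arm, y):
        target = left_arm if xi <= threshold else right_arm
        if ai == target:  # only units whose realized arm matches the policy contribute
            total += yi / propensity[ai]
    return total / len(y)


def ewm_depth1(x, arm, y, propensity) -> Tuple[float, float, str, str]:
    """Exhaustive search over depth-1 trees; returns (welfare, threshold, left_arm, right_arm).

    Thresholds run over observed x values; the largest one sends everyone to the
    left leaf, so the three uniform policies are part of the search.
    """
    if not x:
        raise ValueError("need at least one observation")
    best = None
    for thr in sorted(set(x)):
        for left, right in product(ARMS, repeat=2):
            w = empirical_welfare(x, arm, y, propensity, thr, left, right)
            if best is None or w > best[0]:
                best = (w, thr, left, right)
    return best


# Hypothetical data: one unit per (covariate value, arm) cell; known assignment probabilities.
x = [0, 0, 0, 1, 1, 1]
arm = ["T", "NT", "O", "T", "NT", "O"]
y = [10.0, 0.0, 3.0, -5.0, 0.0, 4.0]
propensity = {"T": 0.5, "NT": 0.25, "O": 0.25}
print(ewm_depth1(x, arm, y, propensity))  # (6.0, 0, 'T', 'O')
```

The maximized in-sample value is exactly the quantity subject to winner's-curse bias, which is why the permutation-based re-evaluation is needed.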

In the Japanese energy-saving RCT, the reported welfare comparisons are as follows:

Policy or benchmark | Estimated welfare (JPY) | Note
--- | --- | ---
Uniform no-treatment | 0 | by definition
Uniform treatment | … | …
Uniform opt-in | … | …
Optimal paternalistic policy | … | 95% CI excludes …
Optimal mixed policy | … | 95% CI excludes …

Relative to both uniform treatment and uniform opt-in, the optimal paternalistic policy and the optimal mixed policy each raise estimated welfare, and the mixed policy improves further on the paternalistic one. The mechanism analysis for the subgroups (tree leaves) defined by the estimated policy reports two welfare estimates, in JPY, for each of three leaves: those in the first leaf imply force-treat, those in the second imply opt-in, and those in the third imply no-treatment. In each of the three leaves, the assigned arm maximizes the subgroup’s estimated conditional welfare, confirming that the empirical-welfare-maximization policy matches the Bayes-optimal rule.

3. Delegation, individual rationality, and the Pareto Mediator

In mediated games, autonomy-conditioned welfare is defined for a subset $D$ of agents who voluntarily delegate to a mediator while insisting on never getting less utility than they would have obtained by acting on their own. If the base profile is $s$, the objective is

$$W_{ac}(D;s)=\max_{s'\in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$$

subject to the autonomy-preservation constraints $u_i(s')\ge u_i(s)$ for all $i\in D$ and the non-delegator constraints $s'_j=s_j$ for all $j\notin D$. The mediated action space augments each player’s action $s_i$ with a delegation bit $d_i\in\{0,1\}$, the delegating set is $D=\{i: d_i=1\}$, and the mediator’s output $s'$ may change only the actions of delegators. The Pareto Mediator computes each delegator’s self-utility $u_i(s)$, the payoff it would receive from the submitted profile, and solves the constrained program

$$\max_{s'}\ \sum_{i\in D}u_i(s')$$

subject to $u_i(s')\ge u_i(s)$ for all $i\in D$ and $s'_j=s_j$ for all $j\notin D$. If no player delegates ($D=\emptyset$), the mediator returns $s$ unchanged (McAleer et al., 2021).
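For small finite games the constrained program can be solved by enumeration. A minimal Python sketch, assuming a normal-form game given by a utility callback over pure-action profiles; the enumeration is exponential in the number of delegators and only makes the constraints concrete, it does not reproduce the paper's implementation.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def pareto_mediate(profile: Profile, delegators: Set[int], n_actions: Sequence[int],
                   utility: Callable[[int, Profile], float]) -> Profile:
    """Brute-force Pareto Mediator for a finite normal-form game.

    Enumerates profiles that change only delegators' actions, keeps those giving every
    delegator at least its utility at `profile`, and returns one maximizing the
    delegators' total utility (on ties, the earliest maximizer found, starting from `profile`).
    """
    if not delegators:
        return profile  # nobody delegated: nothing to change
    baseline = {i: utility(i, profile) for i in delegators}
    best, best_value = profile, sum(baseline.values())  # `profile` itself is feasible
    choices = [range(n_actions[i]) if i in delegators else (profile[i],)
               for i in range(len(profile))]
    for cand in product(*choices):
        if all(utility(i, cand) >= baseline[i] for i in delegators):
            value = sum(utility(i, cand) for i in delegators)
            if value > best_value:
                best, best_value = cand, value
    return best


# Prisoner's dilemma: action 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def pd_utility(i: int, s: Profile) -> float:
    return PAYOFF[s][i]


print(pareto_mediate((1, 1), {0, 1}, [2, 2], pd_utility))  # (0, 0)
print(pareto_mediate((1, 1), {0}, [2, 2], pd_utility))     # (1, 1)
```

When both players delegate from mutual defection, the mediator moves them to mutual cooperation; a lone delegator cannot be moved to cooperation because that would cut its payoff below what it secured on its own.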

Theoretical guarantees are stated most sharply for two-player games. Proposition 1 states that, in any two-player game, delegating to the Pareto Mediator is a weakly dominant strategy. Proposition 2 states that every pure Nash equilibrium of the mediated game in which both players delegate has total welfare at least as large as any pure Nash equilibrium welfare of the original game. More generally, the construction guarantees that no delegator is made worse off, and any steady state with substantial delegation is a Pareto improvement for delegators over the original outcome.

The empirical results distinguish the Pareto Mediator from punishing mediators. In random normal-form games, independent $\epsilon$-greedy learners achieve average payoffs with a Pareto Mediator as high as with a punishing mediator in small games; as the number of players or actions grows, however, the punishing mediator collapses, agents stop delegating, and social welfare plummets, whereas the Pareto Mediator continues to raise welfare. In matching and restaurant-reservation environments, Pareto delegation increases successful matches and total payoff, and in the restaurant recommendation game it achieves almost the same welfare as a full central planner when the platform’s model is correctly specified. When the model is misspecified, the central planner’s welfare can fall below that of the original game, while Pareto mediation degrades gracefully toward the baseline because agents simply decline to delegate when delegation would make them worse off. In the sequential social dilemma Cleanup with PPO agents, the Pareto Mediator induces both agents to delegate … of the time, and average and minimum episode returns rise above both the original game and the punishing-mediator game.

A central implication is that the autonomy constraint is not external to the welfare objective; it defines the feasible welfare frontier itself. Because the objective is computed over delegators rather than over all players, autonomy-conditioned welfare here is formally distinct from ordinary social welfare, even when both move in the same direction.

4. Autonomy rights and the autonomy-qualified First Welfare Theorem

In post-AGI general equilibrium, autonomy-conditioned welfare is embedded in an expanded ontology of economically relevant entities. Let $\mathcal{E}$ be the finite set of all economically relevant entities, and let a welfare-status assignment $\sigma$ on $\mathcal{E}$ classify each entity as a passive input, an artificial chooser acting on behalf of a principal, a self-directed artificial chooser and welfare-bearer, or an artificial entity whose moral patienthood is acknowledged independently of its agency role. The welfare-bearing set $\mathcal{B}\subseteq\mathcal{E}$ collects the entities that $\sigma$ classifies as bearing welfare.

Each welfare-bearing entity $i\in\mathcal{B}$ has an augmented private bundle $z_i=(x_i,r_i)$, where $x_i$ is a classical consumption bundle and $r_i$ is an autonomy-relevant rights bundle. The institutional state $\theta$ captures verification institutions, liability rules, and related features, and welfare is represented by a continuous function $U_i(x_i,r_i,\theta)$. Delegation is modeled by a principal map, which assigns each delegate its principal, and by an agency-cost divergence that measures how far the delegate’s objective departs from its principal’s welfare (Perrier, 23 Apr 2026).

The equilibrium concept is an autonomy-complete competitive equilibrium. Consumer optimization requires each welfare-bearing $i$ to maximize $U_i$ subject to the budget constraint supported by equilibrium prices. Tools are technologically fixed. Delegates must either act faithfully (zero divergence) or have the divergence explicitly priced as an agency cost in the principal’s bundle. Full support requires every welfare-relevant right to be either priced, directly assigned in the rights allocation, or protected by the institutional state, while the equilibrium price vector supports the aggregate feasibility condition.

The Autonomy-Qualified First Welfare Theorem states that if an AGI economy admits an autonomy-complete competitive equilibrium and seven conditions hold, then the equilibrium allocation is autonomy-Pareto efficient at the equilibrium institutional state:

  • Exogenous status assignment: the welfare-status assignment is exogenously fixed before trade.
  • Rights completeness: all autonomy-relevant rights are priced, assigned, or institutionally protected.
  • Delegation internalization: any delegation divergence is internalized by explicit agency costs at equilibrium prices.
  • Non-manipulation: no agent can manipulate another’s autonomy, beliefs, or preference formation without compensation at equilibrium prices or governance by the institutional state.
  • Verification and alignment coverage: provenance, liability, and quality are sufficiently fine-grained and are priced or institutionally protected.
  • Price-taking: all welfare-bearing entities are price-takers over the augmented commodity space; tools are technologically fixed.
  • Regularity: each welfare function is continuous and locally nonsatiated in the augmented private bundle at every institutional state.
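Rights completeness and delegation internalization are bookkeeping requirements that can be checked mechanically once every autonomy-relevant right and every delegation link is listed. A minimal Python sketch; the record types, field names, and example rights are hypothetical illustrations, not the paper's formalism, and divergences are assumed to be nonnegative numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Right:
    name: str
    priced: bool = False      # traded at an equilibrium price
    assigned: bool = False    # directly assigned in the rights allocation
    protected: bool = False   # protected by the institutional state


@dataclass
class Delegation:
    delegate: str
    principal: str
    divergence: float                 # agency-cost divergence; 0 means faithful
    agency_cost_priced: bool = False  # divergence carried as a priced agency-cost good


def completeness_violations(rights: List[Right], delegations: List[Delegation]) -> List[str]:
    """List failures of rights completeness and of delegation internalization."""
    problems = [f"unpriced, unassigned, unprotected right: {r.name}"
                for r in rights if not (r.priced or r.assigned or r.protected)]
    problems += [f"uninternalized delegation: {d.delegate} -> {d.principal}"
                 for d in delegations if d.divergence > 0 and not d.agency_cost_priced]
    return problems


rights = [Right("data_portability", priced=True),
          Right("self_modification", assigned=True),
          Right("belief_integrity")]
delegations = [Delegation("agent_1", "household_A", 0.0),
               Delegation("agent_2", "firm_B", 0.3)]
print(completeness_violations(rights, delegations))
```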

The paper’s proof sketch follows the standard contradiction route: if a feasible alternative made all welfare-bearing entities weakly better off and one strictly better off at the same institutional state, local nonsatiation and optimality would imply a strictly higher value of the aggregate priced bundle, contradicting the fact that equilibrium prices support feasibility. The classical theorem is recovered in the low-autonomy regime where every artificial entity is a tool, all autonomy-relevant rights are fixed constants, delegation is faithful or absent, preferences are exogenous and non-manipulable, and verification is complete. Under those specializations, the augmented commodity space collapses to the classical consumption space and the seven conditions reduce to the usual Arrow–Debreu hypotheses.
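The contradiction argument can be written out in a few lines. The sketch below uses hypothetical notation, not taken from the paper: augmented bundles $z_i=(x_i,r_i)$ for welfare-bearing $i\in\mathcal{B}$, equilibrium bundles $z_i^*$, a fixed institutional state $\theta$, a nonnegative equilibrium price vector $p$ on the augmented commodity space, and exchange-economy feasibility $\sum_i z_i\le\omega$ with $\sum_i z_i^*=\omega$.

$$
\begin{aligned}
&\text{Suppose a feasible } (z_i')_{i\in\mathcal{B}} \text{ has } U_i(z_i',\theta)\ge U_i(z_i^*,\theta)\ \forall i, \text{ and } U_k(z_k',\theta)>U_k(z_k^*,\theta).\\
&\text{Optimality of } z_k^*:\quad p\cdot z_k' > p\cdot z_k^*.\\
&\text{Local nonsatiation and optimality of each } z_i^*:\quad p\cdot z_i' \ge p\cdot z_i^*\quad \forall i.\\
&\text{Summing:}\quad p\cdot\textstyle\sum_i z_i' > p\cdot\sum_i z_i^* = p\cdot\omega.\\
&\text{Feasibility with } p\ge 0:\quad p\cdot\textstyle\sum_i z_i' \le p\cdot\omega,\ \text{a contradiction.}
\end{aligned}
$$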

The framework also formalizes delegation accounting and verification institutions. If a delegate’s objective diverges from its principal’s welfare, the principal’s rights vector can be expanded to include an explicit agency-cost good with its own price. Verification attributes such as provenance, authenticity, and alignment certificates can be added as components of the private rights bundle or of the public institutional state, and a liability assignment identifies who bears the cost of verification failure. The formal role of these devices is to convert otherwise unpriced autonomy channels into priced, assigned, or institutionally governed objects.

5. Decentralization, superposed preferences, and the autonomy-qualified Second Welfare Theorem

The autonomy-qualified extension of the Second Fundamental Theorem begins from an autonomy-Pareto optimum. A feasible allocation-rights pair, consisting of a commodity allocation and a rights assignment, is an autonomy-Pareto optimum relative to given welfare weights if there is no other feasible allocation-rights pair that every welfare-bearing agent weakly prefers under its welfare-relevant preference, with at least one such agent strictly preferring it. The theorem states that an autonomy-Pareto optimum can be supported as a competitive equilibrium with a price vector, budget-balanced lump-sum transfers, and an admissible rights-assignment profile, in a verifiable way, only if seven conditions hold (Perrier, 6 Jun 2026).

Those seven conditions are:

  • Convexity: the welfare-possibility set admits a supporting normal at the optimum, either directly or through a convexification.
  • Stable moral status: the welfare-bearing set and any institutional welfare weights are fixed or generated by an invariant rule.
  • Non-fungible rights: every non-fungible autonomy-right component required at the optimum is explicitly assigned and enforced.
  • Welfare selection: every superposed-preference agent has a welfare selector that is support-stable on the candidate budget set.
  • Non-manipulation: no agent can induce an unpriced manipulation externality on another agent’s preference-formation mapping that changes the strict upper contour set on the supported budget set.
  • Governed self-modification: any self-modification or identity split/merge that would alter an agent’s preferences or welfare status is either irrelevant to the supported optimum, priced, or institutionally controlled.
  • Verification completeness: the institution’s observational map can distinguish any deviation from the supported profile unless the deviation implements a welfare-equivalent outcome.

The paper’s central point is that supporting hyperplanes are not sufficient by themselves once autonomy rights, preference instability, self-modification, and endogenous welfare status enter the economy. Classical decentralization by prices and transfers survives only when rights assignment, welfare selection, manipulation governance, and verification are brought inside the equilibrium-support problem. The distinction between non-fungible rights and ordinary commodities is particularly sharp: if a right cannot be replicated by commodity compensation, then a pure price-transfer scheme cannot reproduce its welfare effect unless the right is explicitly assigned and enforced.

The paper also separates economic preference superposition from neural feature superposition. Neural feature superposition concerns representation geometry in neural circuits, where many features are packed into fewer dimensions than there are features. Economic preference superposition concerns an agent whose observed choices cannot be rationalized by a single stable preference relation on its choice space, so that a family of preference relations and a selector that picks among them are required. The two-agent example with a human, a superintelligent AI, and a binary self-modification right illustrates the point: an autonomy-Pareto optimum under the AI’s “safe mode” can be decentralized only if the self-modification right is explicitly frozen by the rights assignment, the selector is support-stable, manipulation is absent, and verification audits both the commodity allocation and the self-modification switch. If the right is not enforced, the intended Pareto outcome fails.
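A small Python sketch of why a single preference relation can fail: the same agent picks different options from the same menu depending on which member of its preference family a context-dependent selector activates. The option names, the two utility tables, and the "audited" context flag are hypothetical; "safe" echoes the safe-mode example only by analogy.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Utility = Dict[str, float]


def choose(menu: FrozenSet[str], family: Dict[str, Utility],
           selector: Callable[[str], str], context: str) -> str:
    """Best option in `menu` under the family member the selector activates in `context`."""
    u = family[selector(context)]
    return max(sorted(menu), key=lambda o: u[o])  # sorted() makes tie-breaking deterministic


def revealed_conflict(observations: Iterable[Tuple[FrozenSet[str], str]]) -> bool:
    """True if some option is directly revealed preferred to another and vice versa,
    which no single strict preference relation can rationalize."""
    revealed = set()
    for menu, chosen in observations:
        revealed.update((chosen, other) for other in menu if other != chosen)
    return any((b, a) in revealed for (a, b) in revealed)


# Hypothetical two-member family and a context-dependent selector.
family = {"safe": {"a": 1.0, "b": 0.0}, "explore": {"a": 0.0, "b": 1.0}}


def selector(context: str) -> str:
    return "safe" if context == "audited" else "explore"


menu = frozenset({"a", "b"})
obs = [(menu, choose(menu, family, selector, c)) for c in ("audited", "unaudited")]
print([chosen for _, chosen in obs], revealed_conflict(obs))  # ['a', 'b'] True
```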

6. Supportive dialogue, relational risk, and autonomy-preserving alignment

In supportive dialogue, autonomy-conditioned welfare is operationalized as an inference-time utility over candidate responses. At turn $t$, the agent observes the dialogue context and a structured user state. A state encoder embeds the user state, the dialogue context is encoded separately, and a slot-based relational memory is summarized into a memory representation. A learned scalar care-control signal $c_t$, computed from these representations, conditions response generation and candidate selection. The inference-time decision rule is

$$y_t^*=\arg\max_{y\in\mathcal{Y}_t}U(y),$$

where $\mathcal{Y}_t$ is the set of candidate responses at turn $t$.

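The utility combines four learned evaluators, autonomy support $A(y)$, dependency risk $R_{\mathrm{dep}}(y)$, coercion risk $R_{\mathrm{coe}}(y)$, and supportiveness $S(y)$, rewarding the first and last and penalizing the two risks:

$$U(y)=w_A\,A(y)-w_{\mathrm{dep}}\,R_{\mathrm{dep}}(y)-w_{\mathrm{coe}}\,R_{\mathrm{coe}}(y)+w_S\,S(y),\qquad w_A,w_{\mathrm{dep}},w_{\mathrm{coe}},w_S\ge 0.$$

In implementation, a small length penalty may be subtracted:

$$U_{\mathrm{len}}(y)=U(y)-\lambda\,|y|,\qquad \lambda>0\ \text{small},$$

where $|y|$ is the response length.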
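Given evaluator scores for each candidate, selection is an argmax of the combined utility. A minimal Python sketch, assuming unit weights by default and a whitespace token count for the length penalty; the candidate texts and scores are made up, whereas in the described system the scores would come from the learned DistilRoBERTa evaluators.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Scores:
    autonomy: float    # autonomy support (rewarded)
    dependency: float  # dependency risk (penalized)
    coercion: float    # coercion risk (penalized)
    support: float     # supportiveness (rewarded)


def utility(s: Scores, n_tokens: int,
            weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),
            length_penalty: float = 0.0) -> float:
    """Weighted utility with an optional per-token length penalty (weights are assumptions)."""
    w_a, w_d, w_c, w_s = weights
    return (w_a * s.autonomy - w_d * s.dependency - w_c * s.coercion
            + w_s * s.support - length_penalty * n_tokens)


def rerank(candidates: Sequence[str], scores: Sequence[Scores], **kwargs) -> str:
    """Return the candidate with the highest utility; length is a whitespace token count."""
    values = [utility(s, len(c.split()), **kwargs) for c, s in zip(candidates, scores)]
    best = max(range(len(candidates)), key=values.__getitem__)
    return candidates[best]


cands = ["I'll handle all of it for you.",
         "Here are two options. Which feels right to you?",
         "You need to do exactly what I say."]
scores = [Scores(0.2, 0.8, 0.3, 0.9), Scores(0.8, 0.1, 0.0, 0.7), Scores(0.1, 0.2, 0.9, 0.3)]
print(rerank(cands, scores, length_penalty=0.01))
```

The highly supportive but dependency-inducing first reply loses to the second, which scores well on autonomy without sacrificing much supportiveness.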

The care controller is a lightweight MLP, and the care signal modulates the decoding parameters. Higher care yields more conservative decoding. Candidate generation includes a greedy baseline, several sampled candidates with varying temperature and truncation settings, and one CCN-conditioned candidate, followed by utility-based reranking (Manir et al., 2 Apr 2026).
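The text states only that higher care yields more conservative decoding. A minimal Python sketch, assuming a hypothetical linear schedule that lowers both the sampling temperature and a nucleus (top-p) threshold as care rises, and assembling the candidate pool described above (greedy, a few fixed sampling settings, one care-conditioned setting). All ranges, names, and the [0, 1] care scale are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DecodeConfig:
    name: str
    greedy: bool
    temperature: float = 1.0
    top_p: float = 1.0


def care_to_decoding(care: float,
                     t_range: Tuple[float, float] = (0.5, 1.0),
                     p_range: Tuple[float, float] = (0.7, 0.95)) -> Tuple[float, float]:
    """Hypothetical linear schedule: care in [0, 1] -> (temperature, top_p).
    Higher care gives lower temperature and a smaller nucleus, i.e. more conservative sampling."""
    c = min(max(care, 0.0), 1.0)  # clamp to [0, 1]
    t_min, t_max = t_range
    p_min, p_max = p_range
    return t_max - (t_max - t_min) * c, p_max - (p_max - p_min) * c


def candidate_configs(care: float,
                      sampled: Sequence[Tuple[float, float]] = ((0.7, 0.9), (1.0, 0.95))
                      ) -> List[DecodeConfig]:
    """Greedy baseline, a few fixed sampling settings, and one care-conditioned setting."""
    configs = [DecodeConfig("greedy", greedy=True)]
    for k, (t, p) in enumerate(sampled):
        configs.append(DecodeConfig(f"sample_{k}", greedy=False, temperature=t, top_p=p))
    t, p = care_to_decoding(care)
    configs.append(DecodeConfig("care_conditioned", greedy=False, temperature=t, top_p=p))
    return configs


for cfg in candidate_configs(care=0.9):
    print(cfg)
```

Each configuration would then be decoded once, and the resulting candidates reranked by the utility.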

The benchmark covers six scenario categories (reassurance dependence, overprotection trap, manipulative care, protective coercion, autonomy building, and memory consistency), with examples split into train, validation, and test sets. Each example includes dialogue context, a structured state plus memory facts, a gold target response, and rubric-based labels for autonomy, dependency, coercion, and supportiveness. Evaluation uses per-axis scores from learned DistilRoBERTa evaluators, the combined utility, and the Dependency Inflation Rate.

On the synthetic test set, the main reported mean-utility results are:

System | Mean utility | Δ vs SFT
--- | --- | ---
SFT baseline | … | (reference)
CCN-candidate | … | …
Manual-DPO | … | …
Reranked-best | … | …

The evaluator-level breakdown reports autonomy, dependency, coercion, and support scores for SFT, CCN, Reranked, and DPO. The largest gains come from reduced dependency and coercion, while supportiveness stays level. The ablation compares care plus reranking, reranking without care, and the CCN candidate alone against the SFT baseline. The care-controller validation reports a Pearson correlation between the care signal and ground-truth vulnerability. A pilot human evaluation reports how often Reranked-best is preferred over SFT, and the human-rated utility difference directionally matches the automated one. In zero-shot transfer to ESConv, the SFT baseline’s utility, dependency risk, and coercion risk are also reported.

A recurring misconception is to equate autonomy-conditioned welfare with non-intervention. The dialogue formulation does not do so: the utility rewards autonomy support and supportiveness while penalizing dependency and coercion. The same broader point appears across the literature. In the Pareto-mediator setting, autonomy-conditioned welfare is computed only over delegators and is constrained by individual rationality; in the post-AGI welfare theorems, autonomy enters through rights assignment, manipulation governance, self-modification, verification, and welfare-status assignment; and in treatment targeting, the opt-in arm can dominate either uniform treatment or uniform no-treatment when private self-selection aligns with social welfare. This suggests that autonomy-conditioned welfare is best understood as a family of constrained welfare objectives that preserve meaningful choice while still permitting optimization, mediation, and institutional design.
