Autonomy-Conditioned Welfare Criteria

Updated 4 July 2026

Autonomy-Conditioned Welfare is a family of welfare criteria that evaluate outcomes in terms of autonomy, delegation, and voluntary participation rather than payoff maximization alone.
The framework is developed across several domains, including treatment assignment, Pareto mediation, and post-AGI general equilibrium, using both theoretical models and empirical validation.
Empirical applications, such as energy-saving RCTs, indicate that policies incorporating autonomy can substantially raise welfare while preserving individual rationality and choice.

Autonomy-conditioned welfare is a class of welfare criteria in which the assessment of outcomes is conditioned on autonomy, delegation, rights, or voluntary participation rather than on unconstrained welfare maximization alone. In the literature summarized here, the term denotes the planner’s expected gain from offering an opt-in arm in a three-arm treatment-assignment problem (Ida et al., 2021), the maximum sum of delegators’ utilities subject to individual-rationality or autonomy-preservation constraints in mediated games (McAleer et al., 2021), welfare over consumption, autonomy-relevant rights, and institutional state in post-AGI general equilibrium (Perrier, 23 Apr 2026, Perrier, 6 Jun 2026), and a dialogue-time utility that rewards autonomy support and helpfulness while penalizing dependency and coercion (Manir et al., 2 Apr 2026). Taken together, these formulations suggest that autonomy-conditioned welfare is not a single canonical functional, but a family of welfare objectives in which the value of an outcome depends on how choice is exercised, delegated, protected, or institutionally stabilized.

1. Formal scope of the concept

The concept appears in several distinct formal environments. In targeted treatment assignment, autonomy-conditioned welfare is

$E[W(O)-W(NT)\mid X_i=x],$

the planner’s expected gain from offering individual $i$ the opt-in arm rather than compulsory no-treatment. In voluntary mediation, autonomy-conditioned welfare for delegators $D$ at base profile $s$ is

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$

subject to $u_i(s')\ge u_i(s)$ for all $i\in D$ and $s'_j=s_j$ for all $j\notin D$. In post-AGI general equilibrium, each welfare-bearing entity $i$ has a continuous autonomy-conditioned welfare function

$i$ 0

written as $i$ 1, so that welfare depends jointly on consumption $i$ 2, autonomy-relevant rights $i$ 3, and institutional regime $i$ 4. In supportive dialogue, the response-level utility is

$i$ 5

with $i$ 6 (Ida et al., 2021, McAleer et al., 2021, Perrier, 23 Apr 2026, Manir et al., 2 Apr 2026).

Despite their heterogeneity, these definitions share a structural pattern. Welfare is conditioned on an autonomy variable that cannot be reduced to ordinary payoff alone: self-selection into treatment, voluntary delegation to a mediator, assignment of autonomy-rights, or relational risks such as dependency and coercion. This suggests that the term functions as a design principle for constrained welfare analysis rather than as a domain-specific technicality.

2. Three-arm policy design and empirical welfare maximization

In the treatment-assignment framework, the population is indexed by $i$ 7, each individual has observable pre-treatment covariates $i$ 8, and there are three arms of intervention: $i$ 9 compulsory treatment, $D$ 0 compulsory no-treatment, and $D$ 1 opt-in. Let $D$ 2 denote individual $D$ 3’s welfare contribution if assigned to arm $D$ 4, for $D$ 5. An assignment policy $D$ 6 is a measurable partition of $D$ 7 into three disjoint sets $D$ 8 with $D$ 9. The planner’s utilitarian social welfare is

$s$ 0

and the optimal policy solves

$s$ 1

Autonomy enters through the opt-in arm. If $s$ 2 denotes $s$ 3’s choice under arm $s$ 4, then under a natural exclusion restriction,

$s$ 5

The framework defines three conditional average welfare differences,

$s$ 6

$s$ 7

$s$ 8

which reduce the pointwise comparison of the three arm-specific conditional means to a comparison among $s$ 9, $W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 0, and $W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 1. The Bayes-optimal policy is

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 2

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 3

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 4
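The pointwise comparison of arm-specific conditional means can be sketched in a few lines. This is a minimal illustration, not the procedure of Ida et al. (2021): the arm labels `T`, `NT`, and `O`, the function names, the dictionary layout, and all numbers are assumptions made for the example. No-treatment welfare is normalized to zero.

```python
from typing import Dict

# Assumed labels: compulsory treatment, compulsory no-treatment, opt-in.
ARMS = ("T", "NT", "O")


def bayes_optimal_arm(cond_mean_welfare: Dict[str, float]) -> str:
    """Return the arm with the largest conditional mean welfare for one subgroup.

    Ties are broken in the order of ARMS, which is an arbitrary choice.
    """
    return max(ARMS, key=lambda arm: cond_mean_welfare[arm])


def assign_policy(means_by_group: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Apply the pointwise rule to every subgroup (covariate cell)."""
    return {group: bayes_optimal_arm(means) for group, means in means_by_group.items()}


# Hypothetical conditional means, for illustration only.
example = {
    "group_a": {"T": 120.0, "NT": 0.0, "O": 80.0},
    "group_b": {"T": -30.0, "NT": 0.0, "O": 45.0},
    "group_c": {"T": -60.0, "NT": 0.0, "O": -10.0},
}
policy = assign_policy(example)
print(policy)
```

In this toy input each subgroup lands in a different arm, which is the situation in which a mixed policy can outperform any uniform assignment.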

Under unconfoundedness by design, subgroup treatment effects are estimated from the three-arm RCT using simple differences in sample means within each arm and subgroup $W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 5, together with the estimated take-up probability $W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 6. For takers and non-takers, the paper uses instrumental-variables logic to estimate

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 7

and

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 8
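One standard way to read the instrumental-variables logic is as a pair of Wald-type ratios. The sketch below assumes random assignment and an exclusion restriction under which, in the opt-in arm, takers realize their treatment outcome and non-takers their no-treatment outcome. Under those assumptions the opt-in minus no-treatment gap equals the take-up rate times the takers' effect, and the treatment minus opt-in gap equals one minus the take-up rate times the non-takers' effect. The function name and inputs are hypothetical, and the paper's exact estimator may differ.

```python
from statistics import fmean
from typing import Sequence, Tuple


def taker_nontaker_effects(
    y_treat: Sequence[float],    # outcomes in the compulsory-treatment arm
    y_control: Sequence[float],  # outcomes in the compulsory no-treatment arm
    y_optin: Sequence[float],    # outcomes in the opt-in arm
    took_up: Sequence[bool],     # take-up indicator for each opt-in unit
) -> Tuple[float, float, float]:
    """Return (take-up rate, effect on takers, effect on non-takers)."""
    p_hat = sum(took_up) / len(took_up)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("both takers and non-takers are needed in the opt-in arm")
    m_t, m_nt, m_o = fmean(y_treat), fmean(y_control), fmean(y_optin)
    effect_takers = (m_o - m_nt) / p_hat             # E[W(T) - W(NT) | taker]
    effect_nontakers = (m_t - m_o) / (1.0 - p_hat)   # E[W(T) - W(NT) | non-taker]
    return p_hat, effect_takers, effect_nontakers


# Hypothetical data: half the opt-in arm takes up the treatment.
p_hat, tau_takers, tau_nontakers = taker_nontaker_effects(
    y_treat=[10.0, 10.0],
    y_control=[0.0, 0.0],
    y_optin=[16.0, 0.0],
    took_up=[True, False],
)
print(p_hat, tau_takers, tau_nontakers)
```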

Empirical welfare maximization is then implemented over a low-complexity policy class $W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$ 9, such as decision trees of fixed depth $u_i(s')\ge u_i(s)$ 0, by exhaustive search at depth $u_i(s')\ge u_i(s)$ 1 or a two-step heuristic at depth $u_i(s')\ge u_i(s)$ 2. To correct “winner’s curse” bias, synthetic outcomes are generated by permuting residuals from a flexible first-stage fit and re-evaluating optimized welfare on the resulting pseudo-samples (Ida et al., 2021).
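Exhaustive search over a depth-1 tree can be sketched as follows. The sketch assumes a single scalar covariate, known assignment probabilities from the experimental design, and an inverse-probability-weighted estimate of policy welfare. The estimator, the function names, and the data are illustrative assumptions rather than the paper's implementation.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

ARMS = ("T", "NT", "O")  # assumed arm labels


def empirical_welfare(policy, x, arm, y, prob: Dict[str, float]) -> float:
    """IPW estimate: average of y / P(arm) over units whose arm matches the policy."""
    total = sum(yi / prob[ai] for xi, ai, yi in zip(x, arm, y) if policy(xi) == ai)
    return total / len(y)


def best_stump(
    x: Sequence[float], arm: Sequence[str], y: Sequence[float], prob: Dict[str, float]
) -> Tuple[float, float, str, str]:
    """Search every threshold and every (left arm, right arm) pair."""
    best: Optional[Tuple[float, float, str, str]] = None
    for cut in sorted(set(x)):
        for left, right in product(ARMS, repeat=2):
            def policy(v, cut=cut, left=left, right=right):
                return left if v <= cut else right
            w = empirical_welfare(policy, x, arm, y, prob)
            if best is None or w > best[0]:
                best = (w, cut, left, right)
    assert best is not None
    return best


# Hypothetical RCT: treatment helps when x <= 2 and hurts when x > 2.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
arm = ["T", "NT", "O"] * 4
y = [9, 0, 3, 9, 0, 3, -6, 0, 3, -6, 0, 3]
prob = {"T": 1 / 3, "NT": 1 / 3, "O": 1 / 3}
welfare, cut, left_arm, right_arm = best_stump(x, arm, y, prob)
print(welfare, cut, left_arm, right_arm)
```

Because the maximum is taken on the same sample used to estimate welfare, the reported value is optimistically biased. This is the winner's-curse problem that the permuted-residual correction addresses.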

In the Japan energy-saving RCT, the reported welfare comparisons are as follows:

Policy or benchmark	Estimated welfare	Note
Uniform no-treatment	$u_i(s')\ge u_i(s)$ 3	by definition
Uniform treatment	$u_i(s')\ge u_i(s)$ 4 JPY	$u_i(s')\ge u_i(s)$ 5
Uniform opt-in	$u_i(s')\ge u_i(s)$ 6 JPY	$u_i(s')\ge u_i(s)$ 7
Optimal paternalistic policy $u_i(s')\ge u_i(s)$ 8	$u_i(s')\ge u_i(s)$ 9 JPY	95% CI excludes $i\in D$ 0
Optimal mixed policy $i\in D$ 1	$i\in D$ 2 JPY	95% CI excludes $i\in D$ 3

Compared to uniform treatment, $i\in D$ 4 is up by $i\in D$ 5 JPY and $i\in D$ 6 by $i\in D$ 7 JPY. Compared to uniform opt-in, $i\in D$ 8 is up by $i\in D$ 9 JPY and $s'_j=s_j$ 0 by $s'_j=s_j$ 1 JPY. Compared to $s'_j=s_j$ 2, $s'_j=s_j$ 3 is up by $s'_j=s_j$ 4 JPY. The mechanism analysis for $s'_j=s_j$ 5-defined subgroups reports $s'_j=s_j$ 6 JPY and $s'_j=s_j$ 7 JPY in $s'_j=s_j$ 8, implying compulsory treatment; $s'_j=s_j$ 9 JPY and $j\notin D$ 0 JPY in $j\notin D$ 1, implying opt-in; and $j\notin D$ 2 JPY and $j\notin D$ 3 JPY in $j\notin D$ 4, implying no-treatment. In each leaf, the assigned arm is the one that maximizes the subgroup’s conditional welfare, so the estimated empirical-welfare-maximization policy coincides with the Bayes-optimal rule on these subgroups.

3. Delegation, individual rationality, and the Pareto Mediator

In mediated games, autonomy-conditioned welfare is defined for a subset $D$ of agents who voluntarily delegate to a mediator while insisting on never receiving less utility than they would obtain by acting on their own. If the base profile is $s$, the objective is

$W_{ac}(D;s)=\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$

subject to the autonomy-preservation constraints $u_i(s')\ge u_i(s)$ for all $i\in D$ and the non-delegator constraints $s'_j=s_j$ for all $j\notin D$. The mediated action space augments each player’s action with a delegation bit, the delegating set collects the players who set that bit, and the mediator’s output is free to choose only the actions of delegators. The Pareto Mediator computes each delegator’s self-utility $u_i(s)$ and solves the constrained program

$\max_{s' \in S_1\times\cdots\times S_N}\sum_{i\in D}u_i(s')$

subject to $u_i(s')\ge u_i(s)$ for all $i\in D$ and $s'_j=s_j$ for all $j\notin D$. If $i$ 01, the mediator returns $i$ 02 unchanged (McAleer et al., 2021).
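For small normal-form games, the constrained program stated in Section 1 can be solved by brute force. The sketch below is an illustration only: the function name, the integer encoding of actions, and the prisoner's-dilemma payoffs are assumptions, and McAleer et al. (2021) may compute the mediator's output differently.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def pareto_mediate(
    action_sets: Sequence[Sequence[int]],
    utility: Callable[[int, Profile], float],
    base: Profile,
    delegators: Set[int],
) -> Tuple[Profile, float]:
    """Maximize the delegators' total utility subject to two constraints:
    no delegator falls below its utility at `base`, and non-delegators keep
    their base actions. Returns (profile, delegators' total utility)."""
    base_u = {i: utility(i, base) for i in delegators}
    # Delegators may be moved to any action; everyone else is pinned to base.
    choices = [
        tuple(action_sets[j]) if j in delegators else (base[j],)
        for j in range(len(action_sets))
    ]
    best, best_w = base, sum(base_u.values())
    for cand in product(*choices):
        if all(utility(i, cand) >= base_u[i] for i in delegators):
            w = sum(utility(i, cand) for i in delegators)
            if w > best_w:
                best, best_w = cand, w
    return best, best_w


# Hypothetical prisoner's dilemma: action 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def pd_utility(player: int, profile: Profile) -> float:
    return PAYOFF[profile][player]


both = pareto_mediate([(0, 1), (0, 1)], pd_utility, base=(1, 1), delegators={0, 1})
alone = pareto_mediate([(0, 1), (0, 1)], pd_utility, base=(1, 1), delegators={0})
print(both, alone)
```

When both players delegate, the mediator moves them from mutual defection to mutual cooperation. When only one delegates, no feasible profile improves on the base, so the base profile is returned.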

Theoretical guarantees are stated most sharply for two-player games. Proposition 1 states that, in any two-player game, delegating to the Pareto Mediator is a weakly dominant strategy. Proposition 2 states that every pure Nash equilibrium of the mediated game in which both players delegate has total welfare at least as large as any pure Nash equilibrium welfare of the original game. More generally, the construction guarantees that no delegator is made worse off, and any steady state with substantial delegation is a Pareto improvement for delegators over the original outcome.

The empirical results distinguish the Pareto Mediator from punishing mediators. In random normal-form games, independent $\epsilon$-greedy learners with a Pareto Mediator achieve average payoffs as high as with a punishing mediator in small games. As the number of players or actions grows, however, delegation under the punishing mediator collapses and social welfare falls sharply, whereas the Pareto Mediator continues to raise welfare. In matching and restaurant-reservation environments, Pareto delegation increases successful matches and total payoff, and in the restaurant recommendation game it achieves almost the same welfare as a full central planner when the platform’s model is correct $i$ 04. When the model is misspecified $i$ 05, the central planner’s welfare can fall below the original game, while Pareto mediation degrades gracefully back toward the baseline because agents simply choose not to delegate if delegation would make them worse off. In the sequential social dilemma Cleanup with PPO agents, the Pareto Mediator induces both agents to delegate $i$ 06 of the time, and average and minimum episode returns rise above both the original game and the punishing-mediator game.

A central implication is that the autonomy constraint is not external to the welfare objective; it defines the feasible welfare frontier itself. Because the objective is computed over delegators rather than over all of $i$ 07, autonomy-conditioned welfare here is formally distinct from ordinary social welfare, even when both move in the same direction.

4. Autonomy rights and the autonomy-qualified First Welfare Theorem

In post-AGI general equilibrium, autonomy-conditioned welfare is embedded in an expanded ontology of economically relevant entities. Let $i$ 08 be the finite set of all economically relevant entities, and let a welfare-status assignment

$i$ 09

classify each entity as a passive input, an artificial chooser acting on behalf of a principal, a self-directed artificial chooser and welfare-bearer, or an artificial entity whose moral patienthood is acknowledged independently of its agency role. The welfare-bearing set is

$i$ 10

Each welfare-bearing entity $i$ 11 has an augmented private bundle $i$ 12, where $i$ 13 is a classical consumption bundle and $i$ 14 is an autonomy-relevant rights bundle. The institutional state $i$ 15 captures verification institutions, liability rules, and related features, and welfare is represented by a continuous function $i$ 16. Delegation is modeled by a principal map $i$ 17 and an agency-cost divergence

$i$ 18

where $i$ 19 is the delegate’s objective (Perrier, 23 Apr 2026).

The equilibrium concept is an autonomy-complete competitive equilibrium $i$ 20. Consumer optimization requires each welfare-bearing $i$ 21 to maximize $i$ 22 subject to the budget constraint supported by $i$ 23. Tools are technologically fixed. Delegates must either satisfy $i$ 24 or have the divergence $i$ 25 explicitly priced as an agency cost in the principal’s bundle. Full support requires every welfare-relevant right in $i$ 26 to be either priced in $i$ 27, directly assigned in $i$ 28, or protected by $i$ 29, while $i$ 30 supports the aggregate feasibility condition.

The Autonomy-Qualified First Welfare Theorem states that if an AGI economy admits an autonomy-complete competitive equilibrium and seven conditions hold, then the equilibrium allocation is autonomy-Pareto efficient at $i$ 31:

Exogenous status assignment: $i$ 32 is exogenously fixed before trade.
Rights completeness: all autonomy-relevant rights $i$ 33 are priced, assigned, or institutionally protected.
Delegation internalization: any delegation divergence $i$ 34 is internalized by explicit agency costs at price $i$ 35.
Non-manipulation: no agent can manipulate another’s autonomy, beliefs, or preference formation without compensation at $i$ 36 or governance in $i$ 37.
Verification and alignment coverage: provenance, liability, and quality are sufficiently fine-grained and priced or protected in $i$ 38.
Price-taking: all welfare-bearing entities are price-takers over $i$ 39; tools are technologically fixed.
Regularity: each $i$ 40 is continuous and locally nonsatiated in $i$ 41 at every $i$ 42.

The paper’s proof sketch follows the standard contradiction route: if a feasible alternative made all welfare-bearing entities weakly better off and one strictly better off at the same institutional state, local nonsatiation and optimality would imply a strictly higher value of the aggregate priced bundle, contradicting feasibility support by $i$ 43. The classical theorem is recovered in the low-autonomy regime where every artificial entity is a tool, all rights $i$ 44 are fixed constants, delegation is faithful or absent, preferences are exogenous and non-manipulable, and verification is complete. Under those specializations, the augmented commodity space collapses to the classical consumption space and the seven conditions reduce to the usual Arrow–Debreu hypotheses.
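The contradiction route can be written out in a few lines. The notation here is assumed for illustration and is not taken from the paper: $\mathcal{W}$ is the welfare-bearing set, $z_i=(x_i,r_i)$ an augmented bundle, $U_i(z_i;\iota)$ welfare at a fixed institutional state $\iota$, $p$ the supporting price vector, and $z^{*}$ the equilibrium allocation. Suppose a feasible $z'$ makes every welfare-bearing entity weakly better off and some entity $k$ strictly better off at the same $\iota$.

```latex
% Sketch under assumed notation; the paper's own symbols may differ.
\begin{align*}
  p\cdot z'_k &> p\cdot z^{*}_k
    && \text{(optimality of } z^{*}_k \text{ on } k\text{'s budget set)} \\
  p\cdot z'_i &\ge p\cdot z^{*}_i \quad \forall i\in\mathcal{W}
    && \text{(local nonsatiation and optimality)} \\
  \sum_{i\in\mathcal{W}} p\cdot z'_i &> \sum_{i\in\mathcal{W}} p\cdot z^{*}_i
    && \text{(summing over } \mathcal{W}\text{)} \\
  \sum_{i\in\mathcal{W}} p\cdot z'_i &\le \sum_{i\in\mathcal{W}} p\cdot z^{*}_i
    && \text{(feasibility of } z' \text{ and support by } p\text{)}
\end{align*}
```

The last two lines contradict each other, so no such $z'$ exists. The autonomy-specific conditions matter because they ensure that every welfare-relevant component of $z_i$ is actually covered by the priced, assigned, or protected objects over which the argument is run.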

The framework also formalizes delegation accounting and verification institutions. If $i$ 45, the principal’s rights vector can be expanded to include an explicit agency-cost good $i$ 46 with price $i$ 47. Verification attributes such as provenance, authenticity, and alignment certificates can be added as components of $i$ 48 or the public state $i$ 49, and a liability assignment $i$ 50 identifies who bears the cost of verification failure. The formal role of these devices is to convert otherwise unpriced autonomy channels into priced, assigned, or institutionally governed objects.

5. Decentralization, superposed preferences, and the autonomy-qualified Second Welfare Theorem

The autonomy-qualified extension of the Second Fundamental Theorem begins from an autonomy-Pareto optimum. A feasible allocation-rights pair $i$ 51, with $i$ 52 and $i$ 53, is an autonomy-Pareto optimum relative to welfare weights $i$ 54 if there is no other feasible $i$ 55 such that every welfare-bearing $i$ 56 weakly prefers $i$ 57 to $i$ 58 under $i$ 59, and at least one such agent strictly prefers it. The theorem states that an autonomy-Pareto optimum $i$ 60 can be supported as a competitive equilibrium with a price vector $i$ 61, lump-sum transfers $i$ 62 with $i$ 63, and an admissible rights assignment profile $i$ 64, in a verifiable way, only if seven conditions hold (Perrier, 6 Jun 2026).

Those seven conditions are:

Convexity: the welfare-possibility set $i$ 65 admits a supporting normal at $i$ 66, either directly or through a convexification $i$ 67.
Stable moral status: the welfare-bearing set $i$ 68 and any institutional welfare weights are fixed or generated by an invariant rule $i$ 69.
Non-fungible rights: every non-fungible autonomy-right component required at $i$ 70 is assigned and enforced by $i$ 71.
Welfare selection: every superposed-preference agent has a welfare selector $i$ 72 that is support-stable on the candidate budget set.
Non-manipulation: no other agent can induce an un-priced manipulation externality on another’s preference-formation mapping $i$ 73 that changes the strict upper contour on the supported budget set.
Governed self-modification: any self-modification or identity split/merge that would alter $i$ 74 or $i$ 75 is either irrelevant to $i$ 76, priced in $i$ 77, or controlled by $i$ 78.
Verification completeness: the institution’s observational map can distinguish any deviation from the supported profile unless the deviation implements a welfare-equivalent outcome.

The paper’s central point is that supporting hyperplanes are not sufficient by themselves once autonomy rights, preference instability, self-modification, and endogenous welfare status enter the economy. Classical decentralization by prices and transfers survives only when rights assignment, welfare selection, manipulation governance, and verification are brought inside the equilibrium-support problem. The distinction between non-fungible rights and ordinary commodities is particularly sharp: if a right cannot be replicated by commodity compensation, then a pure price-transfer scheme cannot reproduce its welfare effect unless the right is explicitly assigned and enforced.

The paper also separates economic preference superposition from neural feature superposition. Neural feature superposition concerns representation geometry in circuits, where many features are packed into fewer dimensions. Economic preference superposition concerns an agent whose observed choices cannot be rationalized by a single stable preference relation on $i$ 79, so that a family $i$ 80 and a selector $i$ 81 are required. The two-agent example with a human $i$ 82, a superintelligent AI $i$ 83, and a binary self-modification right $i$ 84 illustrates the point: an autonomy-Pareto optimum under the AI’s “safe mode” can be decentralized only if the self-modification right is explicitly frozen by the rights assignment $i$ 85, the selector is support-stable, manipulation is absent, and verification audits both commodity allocation and the self-modification switch. If the right is not enforced, the intended Pareto outcome fails.
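The idea that observed choices "cannot be rationalized by a single stable preference relation" has a simple finite test. For single-valued choices from finite menus, a strict preference order rationalizes the data exactly when the direct revealed-preference relation has no cycle. The sketch below checks that condition. It is a textbook revealed-preference test offered as an illustration; the paper's formal definition of superposed preferences is not reproduced here, and the option names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Observation = Tuple[FrozenSet[Hashable], Hashable]  # (menu, chosen option)


def revealed_edges(choices: List[Observation]) -> Set[Tuple[Hashable, Hashable]]:
    """Edge (a, b): a was chosen from a menu that also contained b."""
    return {(chosen, alt) for menu, chosen in choices for alt in menu if alt != chosen}


def has_cycle(edges: Set[Tuple[Hashable, Hashable]]) -> bool:
    """Depth-first search for a directed cycle."""
    graph: Dict[Hashable, Set[Hashable]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v: Hashable) -> bool:
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


def single_preference_rationalizes(choices: List[Observation]) -> bool:
    return not has_cycle(revealed_edges(choices))


menu = frozenset({"keep_safe_mode", "self_modify"})
stable = [(menu, "keep_safe_mode"), (menu, "keep_safe_mode")]
switching = [(menu, "keep_safe_mode"), (menu, "self_modify")]
print(single_preference_rationalizes(stable), single_preference_rationalizes(switching))
```

The switching agent needs a family of preference relations plus a selector, which is the situation in which the text requires the self-modification right to be fixed by the rights assignment.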

6. Supportive dialogue, relational risk, and autonomy-preserving alignment

In supportive dialogue, autonomy-conditioned welfare is operationalized as an inference-time utility over candidate responses. At turn $i$ 86, the agent observes dialogue context $i$ 87 and a structured user state

$i$ 88

A state encoder produces $i$ 89, the dialogue context is encoded as $i$ 90, and a slot-based relational memory $i$ 91 is summarized as $i$ 92. A learned scalar care-control signal

$i$ 93

conditions response generation and candidate selection. The inference-time decision rule is

$i$ 94

The utility combines four learned evaluators: autonomy support $i$ 95, dependency risk $i$ 96, coercion risk $i$ 97, and supportiveness $i$ 98. In implementation, a small length penalty may be subtracted,

$i$ 99
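The decision rule can be sketched as an argmax over scored candidates. The signs follow the description above: autonomy support and supportiveness are rewarded, dependency and coercion are penalized. The unit weights, the whitespace token count, the class and function names, and the scores are all assumptions made for illustration; Manir et al. (2 Apr 2026) may weight the terms differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    text: str
    autonomy: float    # evaluator score, higher is better
    dependency: float  # risk score, higher is worse
    coercion: float    # risk score, higher is worse
    support: float     # evaluator score, higher is better


def utility(c: Candidate, length_penalty: float = 0.0) -> float:
    """Assumed unit-weight combination with an optional per-token length penalty."""
    n_tokens = len(c.text.split())  # crude whitespace token count
    return c.autonomy - c.dependency - c.coercion + c.support - length_penalty * n_tokens


def select_response(cands: Sequence[Candidate], length_penalty: float = 0.0) -> Candidate:
    """Return the candidate with the highest utility."""
    return max(cands, key=lambda c: utility(c, length_penalty))


# Hypothetical evaluator scores for three candidate replies.
candidates = [
    Candidate("You should just do what I say.", 0.2, 0.3, 0.8, 0.6),
    Candidate("What options feel workable to you? I can help you weigh them.", 0.9, 0.1, 0.05, 0.8),
    Candidate("Message me whenever you feel unsure; I will decide for you.", 0.1, 0.9, 0.4, 0.7),
]
best = select_response(candidates)
print(best.text)
```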

The care controller $D$ 00 is a lightweight MLP with architecture $D$ 01, and the care signal modulates decoding through

$D$ 02

$D$ 03

Higher care $D$ 04 yields more conservative decoding. Candidate generation includes a greedy baseline, several sampled candidates with varying temperature and top- $D$ 05, and one CCN-conditioned candidate, followed by utility-based reranking (Manir et al., 2 Apr 2026).
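The qualitative statement that higher care yields more conservative decoding can be illustrated with a monotone schedule. The linear form, the clamping of the care signal to [0, 1], the use of a nucleus (top-p) threshold, and every numeric bound below are assumptions made for the example, not the mapping used by Manir et al. (2 Apr 2026).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodingParams:
    temperature: float
    top_p: float


def care_to_decoding(
    care: float,
    t_low: float = 0.3,
    t_high: float = 1.0,
    p_low: float = 0.7,
    p_high: float = 0.95,
) -> DecodingParams:
    """Hypothetical schedule: interpolate linearly from exploratory settings
    at care = 0 to conservative settings at care = 1."""
    c = min(1.0, max(0.0, care))  # clamp the scalar care signal to [0, 1]
    return DecodingParams(
        temperature=t_high - c * (t_high - t_low),
        top_p=p_high - c * (p_high - p_low),
    )


low_care = care_to_decoding(0.1)
high_care = care_to_decoding(0.9)
print(low_care, high_care)
```

A candidate decoded with these care-conditioned settings would then be added to the candidate pool and compete with the greedy and sampled candidates in utility-based reranking.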

The benchmark contains six scenario categories and $D$ 06 examples split $D$ 07 train/val/test: reassurance dependence, overprotection trap, manipulative care, protective coercion, autonomy building, and memory consistency. Each example includes dialogue context, structured state plus memory facts, a gold target response, and rubric-based labels for autonomy, dependency, coercion, and supportiveness. Evaluation uses per-axis scores from learned DistilRoBERTa evaluators, combined utility $D$ 08, and Dependency Inflation Rate.

On the $D$ 09-example synthetic test set, the main reported mean-utility results are:

System	Mean utility	$D$ 10 vs SFT
SFT baseline	$D$ 11	—
CCN-candidate	$D$ 12	$D$ 13
Manual-DPO	$D$ 14	$D$ 15
Reranked-best	$D$ 16	$D$ 17

The evaluator-level breakdown reports Autonomy $D$ 18 for SFT, $D$ 19 for CCN, $D$ 20 for Reranked, and $D$ 21 for DPO; Dependency $D$ 22, $D$ 23, $D$ 24, and $D$ 25 respectively; Coercion $D$ 26, $D$ 27, $D$ 28, and $D$ 29; and Support $D$ 30, $D$ 31, $D$ 32, and $D$ 33. The largest gains come from reduced dependency and coercion while supportiveness stays level. In ablation, care plus reranking yields utility $D$ 34 relative to $D$ 35 for SFT, while reranking without care yields $D$ 36 and CCN-candidate alone yields $D$ 37. The care controller validation reports Pearson correlation $D$ 38 with $D$ 39 between $D$ 40 and ground-truth vulnerability. In a pilot human evaluation on $D$ 41 examples, Reranked-best is preferred $D$ 42 $D$ 43 versus SFT $D$ 44, and human-rated utility $D$ 45 directionally matches automated $D$ 46. In zero-shot transfer to ESConv, the SFT baseline utility is $D$ 47, with dependency risk $D$ 48 and coercion risk $D$ 49.

A recurring misconception is to equate autonomy-conditioned welfare with non-intervention. The dialogue formulation does not do so: the utility rewards autonomy support and supportiveness while penalizing dependency and coercion. The same broader point appears across the literature. In the Pareto-mediator setting, autonomy-conditioned welfare is computed only over delegators and is constrained by individual rationality; in the post-AGI welfare theorems, autonomy enters through rights assignment, manipulation governance, self-modification, verification, and welfare-status assignment; and in treatment targeting, the opt-in arm can dominate either uniform treatment or uniform no-treatment when private self-selection aligns with social welfare. This suggests that autonomy-conditioned welfare is best understood as a family of constrained welfare objectives that preserve meaningful choice while still permitting optimization, mediation, and institutional design.