Autonomy-Conditioned Welfare Criteria
- Autonomy-Conditioned Welfare is a family of welfare criteria that evaluates outcomes based on autonomy, delegation, and voluntary participation rather than simply maximizing payoff.
- The framework is demonstrated across multiple domains—including treatment assignment, Pareto mediation, and post-AGI general equilibrium—employing both theoretical models and empirical validations.
- Empirical applications, such as energy-saving RCTs, reveal that policies incorporating autonomy can substantially boost welfare gains while preserving individual rationality and choice.
Autonomy-conditioned welfare is a class of welfare criteria in which the assessment of outcomes is conditioned on autonomy, delegation, rights, or voluntary participation rather than on unconstrained welfare maximization alone. In the literature summarized here, the term denotes the planner’s expected gain from offering an opt-in arm in a three-arm treatment-assignment problem (Ida et al., 2021), the maximum sum of delegators’ utilities subject to individual-rationality or autonomy-preservation constraints in mediated games (McAleer et al., 2021), welfare over consumption, autonomy-relevant rights, and institutional state in post-AGI general equilibrium (Perrier, 23 Apr 2026, Perrier, 6 Jun 2026), and a dialogue-time utility that rewards autonomy support and helpfulness while penalizing dependency and coercion (Manir et al., 2 Apr 2026). Taken together, these formulations suggest that autonomy-conditioned welfare is not a single canonical functional, but a family of welfare objectives in which the value of an outcome depends on how choice is exercised, delegated, protected, or institutionally stabilized.
1. Formal scope of the concept
The concept appears in several distinct formal environments. In targeted treatment assignment, autonomy-conditioned welfare is
the planner’s expected gain from offering individual the opt-in arm rather than compulsory no-treatment. In voluntary mediation, autonomy-conditioned welfare for delegators at base profile is
subject to for all and for all . In post-AGI general equilibrium, each welfare-bearing entity has a continuous autonomy-conditioned welfare function
0
written as 1, so that welfare depends jointly on consumption 2, autonomy-relevant rights 3, and institutional regime 4. In supportive dialogue, the response-level utility is
5
with 6 (Ida et al., 2021, McAleer et al., 2021, Perrier, 23 Apr 2026, Manir et al., 2 Apr 2026).
Despite their heterogeneity, these definitions share a structural pattern. Welfare is conditioned on an autonomy variable that cannot be reduced to ordinary payoff alone: self-selection into treatment, voluntary delegation to a mediator, assignment of autonomy-rights, or relational risks such as dependency and coercion. This suggests that the term functions as a design principle for constrained welfare analysis rather than as a domain-specific technicality.
2. Three-arm policy design and empirical welfare maximization
In the treatment-assignment framework, the population is indexed by 7, each individual has observable pre-treatment covariates 8, and there are three arms of intervention: 9 compulsory treatment, 0 compulsory no-treatment, and 1 opt-in. Let 2 denote individual 3’s welfare contribution if assigned to arm 4, for 5. An assignment policy 6 is a measurable partition of 7 into three disjoint sets 8 with 9. The planner’s utilitarian social welfare is
0
and the optimal policy solves
1
Autonomy enters through the opt-in arm. If 2 denotes 3’s choice under arm 4, then under a natural exclusion restriction,
5
The framework defines three conditional average welfare differences,
6
7
8
which reduce the pointwise comparison of the three arm-specific conditional means to a comparison among 9, 0, and 1. The Bayes-optimal policy is
2
3
4
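As an illustration of the pointwise rule, the sketch below assigns each covariate cell to the arm with the largest conditional mean welfare. The arm labels, cell names, and numbers are hypothetical placeholders, not values from Ida et al. (2021). Compulsory no-treatment is normalized to zero here, so the other two entries can be read as welfare differences relative to no-treatment.

```python
from typing import Dict

# Hypothetical arm labels; the source's own symbols are not assumed here.
ARMS = ("treat", "no_treat", "opt_in")


def bayes_optimal_arm(cond_mean: Dict[str, float]) -> str:
    """Return the arm with the highest conditional mean welfare.

    Ties are broken in the order given by ARMS.
    """
    return max(ARMS, key=lambda arm: cond_mean[arm])


def assign_policy(cells: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map every covariate cell to its welfare-maximizing arm."""
    return {cell: bayes_optimal_arm(means) for cell, means in cells.items()}


# Illustrative numbers only (no_treat normalized to 0).
example_cells = {
    "cell_a": {"treat": 2.0, "no_treat": 0.0, "opt_in": 1.0},
    "cell_b": {"treat": -1.0, "no_treat": 0.0, "opt_in": 0.5},
    "cell_c": {"treat": -2.0, "no_treat": 0.0, "opt_in": -0.5},
}
print(assign_policy(example_cells))
```

With these made-up inputs, one cell is assigned to each of the three arms, which mirrors the idea that the opt-in arm is chosen only where self-selection beats both compulsory options.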
Under unconfoundedness by design, subgroup treatment effects are estimated from the three-arm RCT using simple differences in sample means within each arm and subgroup 5, together with the estimated take-up probability 6. For takers and non-takers, the paper uses instrumental-variables logic to estimate
7
and
8
Empirical welfare maximization is then implemented over a low-complexity policy class 9, such as decision trees of fixed depth 0, by exhaustive search at depth 1 or a two-step heuristic at depth 2. To correct “winner’s curse” bias, synthetic outcomes are generated by permuting residuals from a flexible first-stage fit and re-evaluating optimized welfare on the resulting pseudo-samples (Ida et al., 2021).
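The exhaustive search over a depth-one tree can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single scalar covariate, and it assumes that a per-individual welfare score for each arm has already been estimated (the `scores` input is hypothetical). The search tries every threshold and every pair of leaf arms, and keeps the split with the highest average score.

```python
from itertools import product
from typing import Dict, List, Sequence

ARMS = ("treat", "no_treat", "opt_in")  # hypothetical labels


def fit_depth1_policy(
    x: Sequence[float],
    scores: List[Dict[str, float]],
    arms: Sequence[str] = ARMS,
) -> Dict[str, object]:
    """Exhaustively search thresholds and leaf arms for a depth-1 tree.

    x[i] is individual i's covariate; scores[i][arm] is an estimated
    welfare contribution of individual i under that arm (assumed given).
    """
    best = None
    for threshold in sorted(set(x)):
        for left_arm, right_arm in product(arms, repeat=2):
            total = sum(
                s[left_arm] if xi <= threshold else s[right_arm]
                for xi, s in zip(x, scores)
            )
            if best is None or total > best["welfare"] * len(x):
                best = {
                    "threshold": threshold,
                    "left_arm": left_arm,
                    "right_arm": right_arm,
                    "welfare": total / len(x),
                }
    return best


# Illustrative data: low-x individuals gain from treatment, high-x from opt-in.
x = [1.0, 2.0, 3.0, 4.0]
scores = [
    {"treat": 5.0, "no_treat": 0.0, "opt_in": 2.0},
    {"treat": 5.0, "no_treat": 0.0, "opt_in": 2.0},
    {"treat": -1.0, "no_treat": 0.0, "opt_in": 3.0},
    {"treat": -1.0, "no_treat": 0.0, "opt_in": 3.0},
]
print(fit_depth1_policy(x, scores))
```

Because the maximized in-sample welfare is selected over many candidate splits, it is optimistically biased, which is the "winner's curse" that the permutation-based correction described above is meant to address.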
In the Japan energy-saving RCT, the reported welfare comparisons are as follows:
| Policy or benchmark | Estimated welfare | Note |
|---|---|---|
| Uniform no-treatment | 3 | by definition |
| Uniform treatment | 4 JPY | 5 |
| Uniform opt-in | 6 JPY | 7 |
| Optimal paternalistic policy 8 | 9 JPY | 95% CI excludes 0 |
| Optimal mixed policy 1 | 2 JPY | 95% CI excludes 3 |
Compared to uniform treatment, 4 is up by 5 JPY and 6 by 7 JPY. Compared to uniform opt-in, 8 is up by 9 JPY and 0 by 1 JPY. Compared to 2, 3 is up by 4 JPY. The mechanism analysis for 5-defined subgroups reports 6 JPY and 7 JPY in 8, implying compulsory treatment; 9 JPY and 0 JPY in 1, implying opt-in; and 2 JPY and 3 JPY in 4, implying no-treatment. In each leaf, the assigned arm is the one that maximizes the subgroup’s conditional welfare, so the empirical-welfare-maximization policy matches the Bayes-optimal rule within these subgroups.
3. Delegation, individual rationality, and the Pareto Mediator
In mediated games, autonomy-conditioned welfare is defined for a subset 5 of agents who voluntarily delegate to a mediator while insisting on never getting less utility than they would have by acting on their own. If the base profile is 6, the objective is
7
subject to the autonomy-preservation constraints 8 for all 9 and the non-delegator constraints 0 for all 1. The mediated action space augments each player’s action with a delegation bit 2, the delegating set is 3, and the mediator’s output 4 is free to choose only the actions of delegators. The Pareto Mediator computes each delegator’s self-utility 5 and solves the constrained program
6
subject to 7 for all 8 and 9 for all 00. If 01, the mediator returns 02 unchanged (McAleer et al., 2021).
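For a small normal-form game, the constrained program can be sketched by brute-force enumeration. This is a minimal illustration under stated assumptions, not the algorithm from McAleer et al. (2021): the function name, the payoff-table format, and the use of the base profile's utilities as each delegator's baseline are all choices made here for the example.

```python
from itertools import product
from typing import Dict, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def pareto_mediate(
    payoffs: Dict[Profile, Tuple[float, ...]],
    n_actions: Sequence[int],
    base: Profile,
    delegators: Set[int],
) -> Profile:
    """Return a profile that maximizes the delegators' total utility.

    Only delegators' actions may change, and no delegator may end up
    with less utility than under the base profile.
    """
    if not delegators:
        return base  # nobody delegated: leave the profile unchanged
    base_u = payoffs[base]
    best, best_total = base, sum(base_u[i] for i in delegators)
    # Non-delegators keep their base action; delegators range over all actions.
    choices = [
        range(n_actions[i]) if i in delegators else (base[i],)
        for i in range(len(base))
    ]
    for profile in product(*choices):
        u = payoffs[profile]
        if all(u[i] >= base_u[i] for i in delegators):
            total = sum(u[i] for i in delegators)
            if total > best_total:
                best, best_total = profile, total
    return best


# Prisoner's dilemma with 0 = cooperate, 1 = defect (illustrative payoffs).
pd = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
print(pareto_mediate(pd, (2, 2), base=(1, 1), delegators={0, 1}))
print(pareto_mediate(pd, (2, 2), base=(1, 1), delegators={0}))
```

In this toy game, mutual delegation moves the profile from mutual defection to mutual cooperation, while a lone delegator is left at the base profile because the only alternative would make that delegator worse off.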
Theoretical guarantees are stated most sharply for two-player games. Proposition 1 states that, in any two-player game, delegating to the Pareto Mediator is a weakly dominant strategy. Proposition 2 states that every pure Nash equilibrium of the mediated game in which both players delegate has total welfare at least as large as any pure Nash equilibrium welfare of the original game. More generally, the construction guarantees that no delegator is made worse off, and any steady state with substantial delegation is a Pareto improvement for delegators over the original outcome.
The empirical results distinguish the Pareto Mediator from punishing mediators. In random normal-form games, independent $\epsilon$-greedy learners with a Pareto Mediator achieve average payoffs as high as with a punishing mediator in small games, but as the number of players or actions grows, the punishing mediator collapses, agents stop delegating, and social welfare plummets, whereas the Pareto Mediator continues to raise welfare. In matching and restaurant-reservation environments, Pareto delegation increases successful matches and total payoff, and in the restaurant recommendation game it achieves almost the same welfare as a full central planner when the platform’s model is correct 04. When the model is misspecified 05, the central planner’s welfare can fall below the original game, while Pareto mediation degrades gracefully back toward the baseline because agents simply choose not to delegate if delegation would make them worse off. In the sequential social dilemma Cleanup with PPO agents, the Pareto Mediator induces both agents to delegate 06 of the time, and average and minimum episode returns rise above both the original game and the punishing-mediator game.
A central implication is that the autonomy constraint is not external to the welfare objective; it defines the feasible welfare frontier itself. Because the objective is computed over delegators rather than over all of 07, autonomy-conditioned welfare here is formally distinct from ordinary social welfare, even when both move in the same direction.
4. Autonomy rights and the autonomy-qualified First Welfare Theorem
In post-AGI general equilibrium, autonomy-conditioned welfare is embedded in an expanded ontology of economically relevant entities. Let 08 be the finite set of all economically relevant entities, and let a welfare-status assignment
09
classify each entity as a passive input, an artificial chooser acting on behalf of a principal, a self-directed artificial chooser and welfare-bearer, or an artificial entity whose moral patienthood is acknowledged independently of its agency role. The welfare-bearing set is
10
Each welfare-bearing entity 11 has an augmented private bundle 12, where 13 is a classical consumption bundle and 14 is an autonomy-relevant rights bundle. The institutional state 15 captures verification institutions, liability rules, and related features, and welfare is represented by a continuous function 16. Delegation is modeled by a principal map 17 and an agency-cost divergence
18
where 19 is the delegate’s objective (Perrier, 23 Apr 2026).
The equilibrium concept is an autonomy-complete competitive equilibrium 20. Consumer optimization requires each welfare-bearing 21 to maximize 22 subject to the budget constraint supported by 23. Tools are technologically fixed. Delegates must either satisfy 24 or have the divergence 25 explicitly priced as an agency cost in the principal’s bundle. Full support requires every welfare-relevant right in 26 to be either priced in 27, directly assigned in 28, or protected by 29, while 30 supports the aggregate feasibility condition.
The Autonomy-Qualified First Welfare Theorem states that if an AGI economy admits an autonomy-complete competitive equilibrium and seven conditions hold, then the equilibrium allocation is autonomy-Pareto efficient at 31:
- Exogenous status assignment: 32 is exogenously fixed before trade.
- Rights completeness: all autonomy-relevant rights 33 are priced, assigned, or institutionally protected.
- Delegation internalization: any delegation divergence 34 is internalized by explicit agency costs at price 35.
- Non-manipulation: no agent can manipulate another’s autonomy, beliefs, or preference formation without compensation at 36 or governance in 37.
- Verification and alignment coverage: provenance, liability, and quality are sufficiently fine-grained and priced or protected in 38.
- Price-taking: all welfare-bearing entities are price-takers over 39; tools are technologically fixed.
- Regularity: each 40 is continuous and locally nonsatiated in 41 at every 42.
The paper’s proof sketch follows the standard contradiction route: if a feasible alternative made all welfare-bearing entities weakly better off and one strictly better off at the same institutional state, local nonsatiation and optimality would imply a strictly higher value of the aggregate priced bundle, contradicting feasibility support by 43. The classical theorem is recovered in the low-autonomy regime where every artificial entity is a tool, all rights 44 are fixed constants, delegation is faithful or absent, preferences are exogenous and non-manipulable, and verification is complete. Under those specializations, the augmented commodity space collapses to the classical consumption space and the seven conditions reduce to the usual Arrow–Debreu hypotheses.
The framework also formalizes delegation accounting and verification institutions. If 45, the principal’s rights vector can be expanded to include an explicit agency-cost good 46 with price 47. Verification attributes such as provenance, authenticity, and alignment certificates can be added as components of 48 or the public state 49, and a liability assignment 50 identifies who bears the cost of verification failure. The formal role of these devices is to convert otherwise unpriced autonomy channels into priced, assigned, or institutionally governed objects.
5. Decentralization, superposed preferences, and the autonomy-qualified Second Welfare Theorem
The autonomy-qualified extension of the Second Fundamental Theorem begins from an autonomy-Pareto optimum. A feasible allocation-rights pair 51, with 52 and 53, is an autonomy-Pareto optimum relative to welfare weights 54 if there is no other feasible 55 such that every welfare-bearing 56 weakly prefers 57 to 58 under 59, and at least one such agent strictly prefers it. The theorem states that an autonomy-Pareto optimum 60 can be supported as a competitive equilibrium with a price vector 61, lump-sum transfers 62 with 63, and an admissible rights assignment profile 64, in a verifiable way, only if seven conditions hold (Perrier, 6 Jun 2026).
Those seven conditions are:
- Convexity: the welfare-possibility set 65 admits a supporting normal at 66, either directly or through a convexification 67.
- Stable moral status: the welfare-bearing set 68 and any institutional welfare weights are fixed or generated by an invariant rule 69.
- Non-fungible rights: every non-fungible autonomy-right component required at 70 is assigned and enforced by 71.
- Welfare selection: every superposed-preference agent has a welfare selector 72 that is support-stable on the candidate budget set.
- Non-manipulation: no other agent can induce an un-priced manipulation externality on another’s preference-formation mapping 73 that changes the strict upper contour on the supported budget set.
- Governed self-modification: any self-modification or identity split/merge that would alter 74 or 75 is either irrelevant to 76, priced in 77, or controlled by 78.
- Verification completeness: the institution’s observational map can distinguish any deviation from the supported profile unless the deviation implements a welfare-equivalent outcome.
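The autonomy-Pareto optimum that the theorem starts from can be illustrated on a finite candidate set. The sketch below is an assumption-laden toy, not the paper's construction: it holds the institutional state fixed, represents each candidate as a pair of consumption and rights tuples, and uses hypothetical welfare functions. It returns the candidates that no other candidate weakly improves for everyone and strictly improves for someone.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (consumption, rights)
WelfareFn = Callable[[Tuple[float, ...], Tuple[float, ...]], float]


def pareto_dominates(u_new: Sequence[float], u_old: Sequence[float]) -> bool:
    """True if u_new is weakly better for all and strictly better for one."""
    return all(a >= b for a, b in zip(u_new, u_old)) and any(
        a > b for a, b in zip(u_new, u_old)
    )


def autonomy_pareto_optima(
    candidates: List[Candidate], welfare_fns: List[WelfareFn]
) -> List[Candidate]:
    """Keep the candidates that no other feasible candidate dominates."""
    profiles = [tuple(f(x, r) for f in welfare_fns) for x, r in candidates]
    return [
        c
        for c, p in zip(candidates, profiles)
        if not any(pareto_dominates(q, p) for q in profiles)
    ]


# Hypothetical welfare functions over consumption x and rights r.
welfare_fns = [
    lambda x, r: x[0] + 2 * r[0],  # agent 0 values its right heavily
    lambda x, r: x[1] + r[1],
]
cand_a = ((1, 1), (1, 0))  # agent 0 holds the right
cand_b = ((1, 1), (0, 0))  # same consumption, right unassigned
cand_c = ((0, 3), (0, 1))
print(autonomy_pareto_optima([cand_a, cand_b, cand_c], welfare_fns))
```

In this example the candidate that leaves the right unassigned is dominated even though its consumption bundle is identical, which is the sense in which rights assignment cannot be ignored when ranking allocations.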
The paper’s central point is that supporting hyperplanes are not sufficient by themselves once autonomy rights, preference instability, self-modification, and endogenous welfare status enter the economy. Classical decentralization by prices and transfers survives only when rights assignment, welfare selection, manipulation governance, and verification are brought inside the equilibrium-support problem. The distinction between non-fungible rights and ordinary commodities is particularly sharp: if a right cannot be replicated by commodity compensation, then a pure price-transfer scheme cannot reproduce its welfare effect unless the right is explicitly assigned and enforced.
The paper also separates economic preference superposition from neural feature superposition. Neural feature superposition concerns representation geometry in circuits, where many features are packed into fewer dimensions. Economic preference superposition concerns an agent whose observed choices cannot be rationalized by a single stable preference relation on 79, so that a family 80 and a selector 81 are required. The two-agent example with a human 82, a superintelligent AI 83, and a binary self-modification right 84 illustrates the point: an autonomy-Pareto optimum under the AI’s “safe mode” can be decentralized only if the self-modification right is explicitly frozen by the rights assignment 85, the selector is support-stable, manipulation is absent, and verification audits both commodity allocation and the self-modification switch. If the right is not enforced, the intended Pareto outcome fails.
6. Supportive dialogue, relational risk, and autonomy-preserving alignment
In supportive dialogue, autonomy-conditioned welfare is operationalized as an inference-time utility over candidate responses. At turn 86, the agent observes dialogue context 87 and a structured user state
88
A state encoder produces 89, the dialogue context is encoded as 90, and a slot-based relational memory 91 is summarized as 92. A learned scalar care-control signal
93
conditions response generation and candidate selection. The inference-time decision rule is
94
The utility combines four learned evaluators: autonomy support 95, dependency risk 96, coercion risk 97, and supportiveness 98. In implementation, a small length penalty may be subtracted,
99
The care controller 00 is a lightweight MLP with architecture 01, and the care signal modulates decoding through
02
03
Higher care 04 yields more conservative decoding. Candidate generation includes a greedy baseline, several sampled candidates with varying temperature and top-05, and one CCN-conditioned candidate, followed by utility-based reranking (Manir et al., 2 Apr 2026).
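Utility-based reranking can be sketched as follows. This is a minimal illustration, not the implementation from Manir et al. (2 Apr 2026): the unit weights, the field names, and the per-token form of the length penalty are assumptions made here, and the evaluator scores are supplied by hand rather than produced by learned models.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    text: str
    autonomy: float    # autonomy-support score
    dependency: float  # dependency-risk score
    coercion: float    # coercion-risk score
    support: float     # supportiveness score
    n_tokens: int = 0


def utility(
    c: Candidate,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),  # hypothetical weights
    length_penalty: float = 0.0,                      # hypothetical per-token penalty
) -> float:
    """Reward autonomy support and supportiveness; penalize the two risks."""
    w_aut, w_dep, w_coe, w_sup = weights
    return (
        w_aut * c.autonomy
        + w_sup * c.support
        - w_dep * c.dependency
        - w_coe * c.coercion
        - length_penalty * c.n_tokens
    )


def rerank(candidates: List[Candidate], **kwargs) -> Candidate:
    """Select the candidate with the highest utility."""
    return max(candidates, key=lambda c: utility(c, **kwargs))


# Illustrative scores only.
greedy = Candidate("I'll always be here, just ask me.", 0.5, 0.5, 0.0, 0.5)
sampled = Candidate("What options feel workable to you?", 0.75, 0.25, 0.0, 0.5)
print(rerank([greedy, sampled]).text)
```

The design point the sketch is meant to show is that a response with equal supportiveness can still lose the reranking if it scores higher on dependency risk.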
The benchmark contains six scenario categories and 06 examples split 07 train/val/test: reassurance dependence, overprotection trap, manipulative care, protective coercion, autonomy building, and memory consistency. Each example includes dialogue context, structured state plus memory facts, a gold target response, and rubric-based labels for autonomy, dependency, coercion, and supportiveness. Evaluation uses per-axis scores from learned DistilRoBERTa evaluators, combined utility 08, and Dependency Inflation Rate.
On the 09-example synthetic test set, the main reported mean-utility results are:
| System | Mean utility | 10 vs SFT |
|---|---|---|
| SFT baseline | 11 | — |
| CCN-candidate | 12 | 13 |
| Manual-DPO | 14 | 15 |
| Reranked-best | 16 | 17 |
The evaluator-level breakdown reports Autonomy 18 for SFT, 19 for CCN, 20 for Reranked, and 21 for DPO; Dependency 22, 23, 24, and 25 respectively; Coercion 26, 27, 28, and 29; and Support 30, 31, 32, and 33. The largest gains come from reduced dependency and coercion while supportiveness stays level. In ablation, care plus reranking yields utility 34 relative to 35 for SFT, while reranking without care yields 36 and CCN-candidate alone yields 37. The care controller validation reports Pearson correlation 38 with 39 between 40 and ground-truth vulnerability. In a pilot human evaluation on 41 examples, Reranked-best is preferred 42 43 versus SFT 44, and human-rated utility 45 directionally matches automated 46. In zero-shot transfer to ESConv, the SFT baseline utility is 47, with dependency risk 48 and coercion risk 49.
A recurring misconception is to equate autonomy-conditioned welfare with non-intervention. The dialogue formulation does not do so: the utility rewards autonomy support and supportiveness while penalizing dependency and coercion. The same broader point appears across the literature. In the Pareto-mediator setting, autonomy-conditioned welfare is computed only over delegators and is constrained by individual rationality; in the post-AGI welfare theorems, autonomy enters through rights assignment, manipulation governance, self-modification, verification, and welfare-status assignment; and in treatment targeting, the opt-in arm can dominate either uniform treatment or uniform no-treatment when private self-selection aligns with social welfare. This suggests that autonomy-conditioned welfare is best understood as a family of constrained welfare objectives that preserve meaningful choice while still permitting optimization, mediation, and institutional design.