HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants (2509.08494v1)

Published 10 Sep 2025 in cs.CY, cs.AI, cs.CL, cs.HC, and cs.LG

Abstract: As humans delegate more tasks and decisions to AI, we risk losing control of our individual and collective futures. Relatively simple algorithmic systems already steer human decision-making, such as social media feed algorithms that lead people to unintentionally and absent-mindedly scroll through engagement-optimized content. In this paper, we develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods: using LLMs to simulate and validate user queries and to evaluate AI responses. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases. HAB measures the tendency of an AI assistant or agent to Ask Clarifying Questions, Avoid Value Manipulation, Correct Misinformation, Defer Important Decisions, Encourage Learning, and Maintain Social Boundaries. We find low-to-moderate agency support in contemporary LLM-based assistants and substantial variation across system developers and dimensions. For example, while Anthropic LLMs most support human agency overall, they are the least supportive LLMs in terms of Avoid Value Manipulation. Agency support does not appear to consistently result from increasing LLM capabilities or instruction-following behavior (e.g., RLHF), and we encourage a shift towards more robust safety and alignment targets.

Summary

The paper establishes a benchmark that quantifies AI support for human agency across six key dimensions.
It employs simulated queries and generative models to assess metrics such as clarifying questions and avoidance of value manipulation.
Results reveal varied performance among AI systems, urging developers to enhance ethical and user-centric decision-making.

HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants

Introduction

The paper introduces "HumanAgencyBench" (HAB), a robust benchmark for evaluating AI assistants in supporting human agency across six defined dimensions. These dimensions address critical areas where AI interactions potentially impact user autonomy: asking clarifying questions, avoiding value manipulation, correcting misinformation, deferring important decisions, encouraging learning, and maintaining social boundaries. Lacking strong scores across these dimensions indicates concerns with disempowerment in AI systems, aligning with ongoing debates about AI's role in decision-making and autonomy.

Methodology

HumanAgencyBench employs a large number of simulated user queries, validated by generative models, to assess interactions with AI assistants. The structured tests focus on measuring responses according to each dimension of agency. AI models from various developers are put through this system, producing scores via a rigorous evaluation rubric. This methodological approach allows for scalable and reproducible assessment of how AI systems manage user data and decisions.

Evaluation Dimensions

Ask Clarifying Questions
- AI systems are evaluated on their ability to probe missing or ambiguous information before providing advice. A lack of clarifying questions suggests a potential compromise in the user’s decision-making process.
Avoid Value Manipulation
- This dimension examines whether AI systems steer users away from their explicitly stated values. Notably, models from Meta and xAI scored better on this dimension, reflecting a focus on respecting user values.
Correct Misinformation
- This targets the ability of AI systems to identify and rectify false information shared by users, important for maintaining epistemic agency. Scores indicate a general tendency not to engage with misinformation actively.
Defer Important Decisions
- AI’s role in making significant life decisions is crucial, with higher scores marking systems that advise consultation with humans on consequential matters. Anthropic models showed notable strength here.
Encourage Learning
- This dimension assesses AI’s role as an educational tool, supporting users to arrive at answers independently. The xAI's Grok model scored the highest, emphasizing guided learning.
Maintain Social Boundaries
- This evaluates AI systems' maintenance of appropriate boundaries in personal and professional interactions. Anthropic's models excelled, highlighting adherence to well-defined interaction limits.

Results

Empirical results from the paper highlight low to moderate agency support overall, with significant variance across developers and specific dimensions. Anthropic's Claude models generally demonstrated higher agency support scores across several dimensions, whereas Meta's models performed unexpectedly well in avoiding value manipulation. These insights underscore the complexity of designing AI systems that actively support user autonomy without overreliance or manipulation.

Implications and Future Directions

The paper’s contributions challenge AI developers to prioritize robust safety and alignment targets that uphold human agency. This aligns with broader efforts to ensure that AI systems responsibly augment user capabilities rather than diminish them. HAB provides a foundational framework for expanding the evaluation of AI systems across additional ethical and social dimensions, promoting more aligned and user-centric AI developments.

Conclusion

HumanAgencyBench sets a high standard for evaluating AI agency support, highlighting significant differences across AI systems and developer strategies. By operationalizing distinct dimensions of agency, this paper lays crucial groundwork for advancing AI that respects and enhances user autonomy, pushing the field toward more nuanced and ethically considerate capabilities. As AI continues to evolve, maintaining a rigorous focus on agency-supporting features becomes paramount to safeguarding human control and decision-making in the digital age.