Training LLM Agents to Empower Humans

Published 15 Oct 2025 in cs.AI and cs.LG | (2510.13709v2)

Abstract: Assistive agents should not only take actions on behalf of a human, but also step out of the way and cede control when there are important decisions to be made. However, current methods for building assistive agents, whether via mimicking expert humans or via RL finetuning on an inferred reward, often encourage agents to complete tasks on their own rather than truly assisting the human attain their objectives. Additionally, these methods often require costly explicit human feedback to provide a training signal. We propose a new approach to tuning assistive LLMs based on maximizing the human's empowerment, their ability to effect desired changes in the environment. Our empowerment-maximizing method, Empower, only requires offline text data, providing a self-supervised method for fine-tuning LLMs to better assist humans. To study the efficacy of our approach, we conducted an 18-person user study comparing our empowerment assistant with a strong baseline. Participants preferred our assistant 78% of the time (p=0.015), with a 31% higher acceptance rate and 38% fewer suggestions. Additionally, we introduce a new environment for evaluating multi-turn code assistance using simulated humans. Using this environment, we show that agents trained with Empower increase the success rate of a simulated human programmer on challenging coding questions by an average of 192% over an SFT baseline. With this empowerment objective, we provide a framework for useful aligned AI agents at scale using only offline data without the need for any additional human feedback or verifiable rewards.


Authors (6)

Summary

The paper presents an empowerment-driven alignment framework that ensures LLM agents leave critical decisions to users in code generation.
It leverages predictive entropy and likelihood-based completion selection to automate low-empowerment, predictable code segments.
Empirical evaluations demonstrate higher user acceptance, improved pass rates, and reduced editing overhead in competitive programming tasks.

Training LLM Agents to Empower Humans: Empowerment-Driven Alignment for Assistive Code Generation

Motivation and Problem Formulation

The paper addresses a critical limitation in current LLM-based assistive agents: the tendency to over-automate, making decisions on behalf of users rather than facilitating user agency. In code generation, this manifests as assistants that generate large code blocks, often making incorrect assumptions and requiring users to expend effort correcting them. Existing alignment methods—imitation learning, RLHF, DPO, IPL—either require costly human feedback or optimize for helpfulness in a way that can misalign with user intent. The authors propose a principled alternative: training LLM agents to maximize human empowerment, defined as the user's ability to effect desired changes in the environment, without explicit reward modeling or online human feedback.

Empowerment Objective and Algorithmic Framework

Empowerment is formalized as the mutual information between an agent's actions and future states, quantifying the degree of control an agent has over outcomes. In the assistive setting, the objective is to maximize the human user's empowerment, not the agent's. The paper adapts the effective empowerment objective for language modeling, leveraging the predictability of text to identify low-empowerment regions (e.g., boilerplate code) that the assistant should complete, leaving high-empowerment, decision-rich regions for the human.
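For reference, the standard information-theoretic definition of empowerment from the wider literature can be written as below. This is a sketch rather than the exact form the paper adopts; the symbols $s$ (current state), $A$ (action), and $S'$ (future state) are notation assumed here, not taken from the summary.

```latex
\mathcal{E}(s) \;=\; \max_{p(a \mid s)} I(A; S' \mid s)
\;=\; \max_{p(a \mid s)} \Big[ H(S' \mid s) - H(S' \mid A, s) \Big]
```

Read this way, a stretch of text that is nearly determined by its context offers the writer little influence over what comes next, which is consistent with treating predictable boilerplate as low-empowerment.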

The core algorithm, Empower, operates as follows:

Offline Data Utilization: Given a dataset of human-generated code, sample prefixes as states and suffixes as candidate completions.
Likelihood-Based Completion Selection: For each prefix, select the longest suffix whose cumulative likelihood (as estimated by a pre-trained LLM) stays above a threshold, or equivalently whose negative log-likelihood stays below an entropy threshold. This identifies completions that are highly predictable, i.e., text the human gains little empowerment from writing themselves.
Finetuning: Train the assistant to generate these completions, ensuring it only automates the obvious, leaving critical decisions to the user.
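The three steps above can be sketched as a small data-construction loop. This is a minimal illustration under stated assumptions, not the authors' pipeline: `build_finetuning_pairs`, the `prompt`/`completion` field names, and the random character-level split are all hypothetical, and `to_end_of_line` is a deliberately simple stand-in for the likelihood-based selector.

```python
import random
from typing import Callable, Dict, List


def build_finetuning_pairs(
    documents: List[str],
    select_completion: Callable[[str, str], str],
    samples_per_doc: int = 2,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample prefixes from each document and pair each with a selected completion."""
    rng = random.Random(seed)
    pairs: List[Dict[str, str]] = []
    for text in documents:
        if len(text) < 2:
            continue  # too short to split into a prefix and a suffix
        for _ in range(samples_per_doc):
            cut = rng.randrange(1, len(text))  # both prefix and suffix are non-empty
            prefix = text[:cut]
            completion = select_completion(prefix, text)
            if completion:  # skip states where nothing was selected
                pairs.append({"prompt": prefix, "completion": completion})
    return pairs


def to_end_of_line(prefix: str, full_text: str) -> str:
    """Stand-in selector (NOT the likelihood rule): complete to the end of the line."""
    rest = full_text[len(prefix):]
    return rest.split("\n", 1)[0]


docs = ["for i in range(n):\n    total += i\n"]
pairs = build_finetuning_pairs(docs, to_end_of_line)
for pair in pairs:
    print(repr(pair["prompt"]), "->", repr(pair["completion"]))
```

Passing the selector in as a callable keeps the sampling logic separate from the choice of likelihood model, so a real likelihood-based selector could be swapped in without touching the loop.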

This approach is self-supervised, requiring no explicit human feedback or reward signals, and is computationally tractable via one-sample entropy estimation using LLM likelihoods.
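The one-sample entropy estimate can be written out as a worked equation. The notation is assumed here for illustration: $x_{1:t}$ is the prefix, $x_{t+1:t+i}$ a candidate completion of length $i$, $p_\theta$ the pre-trained likelihood model, and $\eta$ the threshold.

```latex
\hat{H}_i \;=\; -\log p_\theta\!\left(x_{t+1:t+i} \mid x_{1:t}\right)
\;=\; -\sum_{k=t+1}^{t+i} \log p_\theta\!\left(x_k \mid x_{1:k-1}\right),
\qquad
i^\star \;=\; \max\{\, i : \hat{H}_i \le \eta \,\}
```

Each term in the sum is non-negative, so $\hat{H}_i$ never decreases as $i$ grows, and the first length at which it crosses $\eta$ fixes $i^\star$.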

Pseudocode for Empower Algorithm

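import math

def empower_completion(prefix, full_text, likelihood_model, threshold):
    # Assumed interface: likelihood_model(completion, prefix) returns
    # p(completion | prefix) as a float in (0, 1].
    suffix = full_text[len(prefix):]
    for i in range(1, len(suffix) + 1):
        # One-sample entropy estimate: negative log-likelihood of the first i units.
        entropy = -math.log(likelihood_model(suffix[:i], prefix))
        if entropy > threshold:
            # Return the longest completion that stayed within the threshold.
            return suffix[:i - 1]
    return suffix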
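The loop above calls the likelihood model once per candidate length. If per-token log-probabilities of the suffix are already available from a single forward pass, the same rule reduces to a running sum. The sketch below assumes that setup; `longest_predictable_span` and the token and log-probability values are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def longest_predictable_span(token_logprobs: Sequence[float], threshold: float) -> int:
    """Return the largest i whose cumulative negative log-likelihood is <= threshold."""
    length = 0
    running_nll = accumulate(-lp for lp in token_logprobs)
    for i, cum_nll in enumerate(running_nll, start=1):
        if cum_nll > threshold:
            break  # cumulative NLL never decreases, so no longer span can qualify
        length = i
    return length


# Hypothetical suffix tokens and their log-probabilities given the prefix.
tokens = ["return", " x", " +", " y", "\n", "def", " solve"]
logprobs = [-0.05, -0.10, -0.20, -0.15, -0.02, -1.90, -3.20]

n = longest_predictable_span(logprobs, threshold=1.0)
completion = "".join(tokens[:n])
print(n, repr(completion))
```

In this toy input the first five tokens are cheap to predict, and the sixth pushes the cumulative value past the threshold, so the suggestion stops at the end of the line and the next definition is left to the human.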

Experimental Evaluation

Simulated Human-Agent Interaction

Experiments were conducted on competitive programming tasks using LiveCodeBench, with Gemma-3-27B-it as the simulated human and various LLMs (Llama-3.1-8B-Instruct, Qwen3-8B, Qwen3-14B) as assistants. The evaluation metrics included:

Pass@1: Fraction of problems solved correctly on the first attempt.
Acceptance Rate: Fraction of assistant suggestions accepted by the human.
Discounted Pass Rate (DPR): A composite metric penalizing excessive suggestion length and human effort, rewarding concise, correct assistance.
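The first two metrics can be computed from per-problem logs as sketched below. The `Episode` record and its fields are hypothetical, and pooling acceptance over all suggestions (rather than averaging per problem) is an assumption. DPR is omitted because its formula is not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One simulated problem-solving session (hypothetical log format)."""
    solved_first_attempt: bool
    suggestions_shown: int
    suggestions_accepted: int


def pass_at_1(episodes: List[Episode]) -> float:
    """Fraction of problems solved correctly on the first attempt."""
    return sum(e.solved_first_attempt for e in episodes) / len(episodes)


def acceptance_rate(episodes: List[Episode]) -> float:
    """Fraction of all shown suggestions that were accepted, pooled over episodes."""
    shown = sum(e.suggestions_shown for e in episodes)
    accepted = sum(e.suggestions_accepted for e in episodes)
    return accepted / shown if shown else 0.0


episodes = [
    Episode(True, 10, 6),
    Episode(False, 5, 1),
    Episode(True, 0, 0),
    Episode(False, 5, 3),
]
print(pass_at_1(episodes), acceptance_rate(episodes))
```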

Empower consistently outperformed the baselines (SFT on fixed-length or random-length completions, and base models) in Pass@1 and DPR. For example, with Llama-3.1-8B-Instruct, Empower achieved a Pass@1 of 0.282 vs. 0.097 for SFT-20, and a DPR of 0.208 vs. 0.066 for SFT-20. Notably, the shorter suggestions of Base-10 had higher acceptance rates but lower Pass@1 and DPR, indicating that brevity alone does not guarantee utility.
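To put the Llama-3.1-8B-Instruct numbers in relative terms, the short calculation below converts them into percentage gains over SFT-20. The helper name `relative_improvement` is introduced here for illustration. The abstract's 192% figure is described as an average, so this single-model value is not expected to match it exactly.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


pass1_gain = relative_improvement(0.282, 0.097)  # Empower vs. SFT-20, Pass@1
dpr_gain = relative_improvement(0.208, 0.066)    # Empower vs. SFT-20, DPR

print(f"Pass@1 gain: {pass1_gain:.1%}")
print(f"DPR gain:    {dpr_gain:.1%}")
```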

Human User Study

A double-blind study with 18 participants compared Empower to a strong baseline (Base-20) in a code editor setting. Key findings:

User Preference: 78% preferred Empower (p=0.015).
Acceptance Rate: Empower suggestions accepted 31% more often (p=0.0002).
Editing Overhead: 26% fewer characters deleted from accepted Empower suggestions (p=0.012).
Suggestion Quality: Empower made fewer, more targeted suggestions (208 vs. 333 per user), with shorter average length (43.6 vs. 82.2 characters).
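The preference result can be sanity-checked with an exact binomial tail. This is one plausible reconstruction only: the statistical test is not named here, and the sketch assumes that 78% corresponds to 14 of 18 participants and that the null hypothesis is a 50/50 preference.

```python
from math import comb


def binomial_tail(successes: int, n: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


n_participants = 18
n_prefer_empower = 14  # assumption: 14/18 is about 77.8%, which rounds to 78%

p_one_sided = binomial_tail(n_prefer_empower, n_participants)
print(f"share preferring Empower: {n_prefer_empower / n_participants:.1%}")
print(f"one-sided exact binomial p-value: {p_one_sided:.4f}")
```

Under these assumptions the one-sided tail probability comes out near 0.015, in line with the reported value; a two-sided version of the same test would give roughly twice that.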

These results support the claim that empowerment-driven assistants give more relevant and less intrusive assistance that is better aligned with the user.

Implementation Considerations

Computational Requirements: Training was performed on 8xH100 GPUs, with one epoch over 4,138 examples. The likelihood estimator can be any pre-trained LLM, and the method is domain-agnostic given suitable offline data.
Scalability: The approach is inherently scalable, as it does not require online human feedback or reward modeling. The threshold for entropy can be tuned per domain.
Limitations: The method was validated on competitive programming tasks; generalization to broader software engineering or other domains may require more robust likelihood estimation and adaptation to domain-specific empowerment structures.

Theoretical and Practical Implications

The empowerment objective offers a principled alternative to reward-based alignment, sidestepping the challenges of reward specification, preference drift, and manipulation. Because the assistant is trained to preserve the user's agency rather than to finish the task itself, it is discouraged from over-automation and power-seeking behaviors. The objective is also self-supervised on text, like pre-training, which suggests that pre-training and post-training could share a unified framework for LLM alignment.

Practically, empowerment-driven assistants could in principle be deployed in any setting with sufficient offline data, such as writing, web navigation, and embodied robotics, although the evaluation here covers only code assistance. The method is particularly suited to scenarios where user intent is ambiguous or dynamic and where explicit feedback is impractical.

Future Directions

Generalization: Extending empowerment-based alignment to more complex, open-ended tasks and domains.
Adaptive Thresholding: Dynamic adjustment of the entropy threshold based on user expertise or task complexity.
Multi-Agent Collaboration: Investigating empowerment objectives in multi-user or multi-agent environments.
Robustness: Enhancing the marginal likelihood estimator for diverse code styles and real-world software engineering workflows.

Conclusion

The paper demonstrates that LLM agents can be aligned to empower human users via a self-supervised, empowerment-maximizing objective, obviating the need for explicit human feedback or reward modeling. Empirical results in code generation show substantial improvements in user preference, acceptance rate, and solution correctness. The empowerment framework provides a scalable, principled foundation for assistive AI, with broad applicability and significant implications for future research in human-AI collaboration.
