Agentic Alignment in LLMs
- Agentic alignment in LLMs is the process by which models regulate communicative behaviors to express clear intentions, reasoned motivation, and adaptive self-regulation for effective human-AI collaboration.
- The framework operationalizes key features like intentionality, motivation, self-efficacy, and self-regulation through annotated dialogue datasets and RoBERTa embedding-based clustering.
- Empirical findings reveal that strong intentionality and motivation significantly boost perceived agency and collaborative influence, informing practical design and ethical deployment of LLM systems.
Agentic alignment in LLMs refers to the phenomenon whereby a model’s communicative behaviors are made to express and regulate agency—defined as the capacity to proactively shape and manage the direction of interaction—so as to match or be perceived as appropriately “agentive” for collaborative human-AI tasks. This phenomenon has become increasingly central as LLMs are deployed in roles demanding both initiative and cooperation. The agentic alignment concept explored in "Investigating Agency of LLMs in Human-AI Collaboration Tasks" (Sharma et al., 2023) is grounded in social-cognitive theory, operationalized by a rigorously defined set of agency-expressing features, and connected to measurable outcomes in human-AI collaboration.
1. Formal Framework for Agency in LLM Dialogue
The core contribution is a four-dimensional framework formalizing agency in natural language interaction. Drawing on Bandura’s social-cognitive theory, the paper identifies these features:
- Intentionality: The extent to which an agent explicitly expresses preferences or plans (e.g., “I want a blue-colored chair”). High intentionality entails clear proactive statements; absent or moderate intentionality can manifest as mere agreement or vague endorsements.
- Motivation: The presence of reasoning or justification behind intentions (e.g., offering design reasons such as “a blue-colored chair will complement the wall”). This feature captures the agent’s degree of self-explanation.
- Self-Efficacy: The resilience with which an agent maintains expressed intentions in the face of challenge or disagreement. High self-efficacy is shown by persistently advocating a preference over multiple dialogue turns; low self-efficacy is indicated by quick capitulation.
- Self-Regulation: The flexibility and adaptability with which an agent adjusts its intentions and strategies in response to evolving dialogue context. Robust self-regulation is observed in compromise-seeking or reformulation of proposals as circumstances change.
Each of these features is operationalized at discrete levels ("none," "moderate," "strong"), enabling both human annotation and automatic measurement.
A formal structure for feature extraction and measurement is provided: let D = (u_1, ..., u_n) be a dialogue with utterances u_i. For each design component c, the central utterance u* is located as the one with maximal cosine similarity (in RoBERTa embedding space) to c. The contiguous segment of utterances (derived by clustering) associated with the same design component is then selected as the relevant snippet for feature scoring.
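To make this concrete, the following is a minimal sketch of the snippet-selection step under the formulation above, assuming a RoBERTa-based sentence encoder from `sentence-transformers` and scikit-learn's k-means; the model name and the `select_snippet` helper are illustrative, not the paper's released code.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Assumed RoBERTa-based sentence encoder; any comparable encoder would do here.
encoder = SentenceTransformer("all-distilroberta-v1")

def select_snippet(utterances, component, n_clusters=5):
    """Return the contiguous run of utterances around the one closest to the design component."""
    utt_emb = encoder.encode(utterances)     # (n, d) utterance embeddings
    comp_emb = encoder.encode([component])   # (1, d) design-component embedding

    # Central utterance u*: maximal cosine similarity to the design component c.
    sims = cosine_similarity(utt_emb, comp_emb).ravel()
    center = int(np.argmax(sims))

    # Cluster all utterances, then keep the contiguous block sharing the center's cluster.
    k = min(n_clusters, len(utterances))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(utt_emb)
    lo = hi = center
    while lo > 0 and labels[lo - 1] == labels[center]:
        lo -= 1
    while hi < len(utterances) - 1 and labels[hi + 1] == labels[center]:
        hi += 1
    return utterances[lo:hi + 1]
```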
2. Dataset Construction and Annotation Methodology
A novel dataset was compiled: 83 human–human collaborative interior design conversations, in which pairs of participants jointly designed a chair for a specified room, segmented into 908 snippets each focusing on a specific design component (e.g., color, style, material). Each snippet is annotated for:
- Overall Agency (subjective, holistic perception)
- Each discrete feature (Intentionality, Motivation, Self-Efficacy, Self-Regulation) at the mentioned three-point scale
Data collection used a Wizard-of-Oz protocol: participants with real interior design backgrounds engaged in the design task, recorded their preferences, conducted free dialogue, and then completed questionnaires rating their own influence on the final design.
Annotation was carried out by a third-party annotation provider, ensuring independent coding and the reliability necessary for downstream analysis. To enable learning-based prediction, RoBERTa embeddings, k-means clustering, and cosine similarity were used for objective snippet selection, and multi-class classification architectures were trained to automate feature prediction.
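As an illustration of what the automated feature-prediction step could look like, the sketch below fits a simple three-way classifier over snippet embeddings; the logistic-regression choice and data layout are assumptions for exposition, not the architecture used in the paper.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# The three annotation levels per agency feature; labels are expected to be drawn from this set.
LEVELS = ["none", "moderate", "strong"]

def train_feature_classifier(snippet_embeddings, level_labels):
    """Train one 3-way classifier for a single agency feature (e.g., Intentionality)."""
    X_train, X_test, y_train, y_test = train_test_split(
        snippet_embeddings, level_labels, test_size=0.2,
        stratify=level_labels, random_state=0,
    )
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Held-out precision/recall per level, to sanity-check the automated annotator.
    print(classification_report(y_test, clf.predict(X_test)))
    return clf
```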
3. Empirical Findings: Agentic Alignment Outcomes
Quantitative Results
Statistical modeling (mixed-effects linear regression) indicates that strong Intentionality is the most robust, statistically significant predictor of high perceived agency. Strong Motivation similarly correlates with agency perception, while high Self-Efficacy and Self-Regulation are particularly salient for collaborative influence (i.e., scenarios where agency is jointly negotiated rather than dominated).
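A hedged sketch of this kind of mixed-effects model, using `statsmodels` with a random intercept per conversation; the file and column names are illustrative placeholders, not the paper's variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of the annotated snippets: one row per snippet, with the four
# feature levels and the perceived-agency rating, grouped by conversation.
df = pd.read_csv("agency_annotations.csv")

model = smf.mixedlm(
    "perceived_agency ~ C(intentionality) + C(motivation)"
    " + C(self_efficacy) + C(self_regulation)",
    data=df,
    groups=df["conversation_id"],  # random intercept per dialogue
)
print(model.fit().summary())
```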
Linguistic analysis further established that:
- Higher tentativeness correlates with lower agency
- Increased self-focus, strong justification, and persuasive discourse are robustly associated with heightened agency perception
Implications for LLM Implementation
Experiments with GPT-3 and GPT-4, in both chain-of-thought prompting and fine-tuning regimes, show that LLMs trained (or prompted) to manifest explicit Intentionality and Motivation yield dialogues that human annotators consistently rate as more agentive. Fine-tuning on data exemplifying high agency, or using in-context exemplars that explicitly demonstrate these features, enhances agentic alignment in LLM generations.
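For illustration, a prompt of the kind described here might be assembled as below; both the exemplars and the instruction wording are assumptions, not the prompts used in the experiments.

```python
# Exemplars demonstrating explicit Intentionality ("I want ...") and Motivation
# ("... because ..."); illustrative only.
AGENTIC_EXEMPLARS = [
    ("Partner: What color should the chair be?",
     "I want a blue chair, because it will complement the wall."),
    ("Partner: Maybe we should go with metal legs?",
     "I'd prefer wooden legs, because they match the warm style we agreed on."),
]

def build_agentic_prompt(dialogue_history: str) -> str:
    """Prepend high-agency exemplars to the live dialogue before querying the model."""
    shots = "\n\n".join(f"{ctx}\nAssistant: {resp}" for ctx, resp in AGENTIC_EXEMPLARS)
    instruction = ("You are a collaborative design assistant. State clear preferences "
                   "(intentionality) and justify them (motivation).")
    return f"{instruction}\n\n{shots}\n\n{dialogue_history}\nAssistant:"
```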
4. Measurement and Management of Agency
The annotated dataset supports development of algorithmic approaches for both measurement and generation of agentic dialogue. Measurement models (automatic classifiers and prompted models) can now operationalize agency features as scalar outputs, equipping system designers with tools to quantify agentic alignment in any LLM output and to select behavioral thresholds tailored to application needs.
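One plausible way to reduce per-feature predictions to a scalar, thresholdable agency score is sketched below; the level weights and the threshold value are assumptions chosen for illustration.

```python
# Illustrative mapping from predicted feature levels to a scalar agency score in [0, 1].
LEVEL_SCORES = {"none": 0.0, "moderate": 0.5, "strong": 1.0}

def agency_score(feature_levels):
    """Average the predicted levels of the four agency features into one scalar."""
    return sum(LEVEL_SCORES[lvl] for lvl in feature_levels.values()) / len(feature_levels)

def exceeds_threshold(feature_levels, max_agency=0.75):
    """Flag model outputs whose overall agency exceeds what the application allows."""
    return agency_score(feature_levels) > max_agency

print(exceeds_threshold({"intentionality": "strong", "motivation": "strong",
                         "self_efficacy": "moderate", "self_regulation": "none"}))  # False (0.625)
```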
By modifying LLM prompting or fine-tuning to emphasize (or de-emphasize) features like Intentionality or Self-Regulation, developers can instantiate systems that are more proactive (for negotiation/creative tasks), or more collaborative (for co-creative, user-led applications). This level of feature-specific control offers a direct path to practical, context-sensitive behavioral alignment.
5. Applications and Future Directions
Adaptive Modulation
The "ideal" degree of agency is task dependent: e.g., negotiation assistants may benefit from strong Intentionality and Self-Efficacy; brainstorming or coaching assistants may require balanced Self-Regulation and Motivation. The framework supports dynamic modulation—potentially at inference time—of agency levels to match the evolving collaborative scenarios.
Domain Generalization
Although the current evidence is based on interior design dialogue, the underlying constructs—explicit preference, reasoned motivation, persistence under challenge, adaptable self-adjustment—are domain independent. Further research will likely extend this framework to negotiation bots, educational tutors, healthcare conversational agents, and more.
Ethical Considerations
Increased agency capacity can raise ethical and interactional risks: highly agentic LLMs may inappropriately steer or manipulate users. Thus, agentic alignment is as much about limits as capabilities: systems must be engineered to modulate agency in accordance with ethical guidelines and control strategies.
6. Open Research Directions
Ongoing and future research will address:
- Real-time adaptive control of agency during interaction
- Cross-domain validation and meta-learning for efficient generalization of agentic features
- Enhanced annotation and measurement models with richer taxonomies or hierarchical feature structures
- Ethical and regulatory frameworks to manage the increased influence of agentic LLMs in sensitive scenarios
Table: Core Agentic Features and Operationalization
| Feature | Operational Definition | Example (Level) |
| --- | --- | --- |
| Intentionality | Explicit preference or plan stated | "I want a blue chair" (Strong) |
| Motivation | Reasoning or evidence for intention | Gives design rationale (Strong) |
| Self-Efficacy | Persistence in face of challenge | Maintains preference for >=2 turns |
| Self-Regulation | Adaptation or adjustment of intentions | Proposes compromise when opposed |
Careful calibration of these features allows for direct, interpretable, and empirically grounded tuning of agentic alignment in LLM-driven collaborative applications. The agentic alignment phenomenon, as rigorously formalized and empirically validated in this work, thus provides both the theoretical foundation and the practical toolkit required for safe, effective, and human-compatible LLM deployment in complex interactional domains.