
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Published 1 Nov 2025 in cs.LG, cs.AI, cs.CL, and stat.ML | (2511.00617v1)

Abstract: LLMs can be controlled at inference time through prompts (in-context learning) and internal activations (activation steering). Different accounts have been proposed to explain these methods, yet their common goal of controlling model behavior raises the question of whether these seemingly disparate methodologies can be seen as specific instances of a broader framework. Motivated by this, we develop a unifying, predictive account of LLM control from a Bayesian perspective. Specifically, we posit that both context- and activation-based interventions impact model behavior by altering its belief in latent concepts: steering operates by changing concept priors, while in-context learning leads to an accumulation of evidence. This results in a closed-form Bayesian model that is highly predictive of LLM behavior across context- and activation-based interventions in a set of domains inspired by prior work on many-shot in-context learning. This model helps us explain prior empirical phenomena - e.g., sigmoidal learning curves as in-context evidence accumulates - while predicting novel ones - e.g., additivity of both interventions in log-belief space, which results in distinct phases such that sudden and dramatic behavioral shifts can be induced by slightly changing intervention controls. Taken together, this work offers a unified account of prompt-based and activation-based control of LLM behavior, and a methodology for empirically predicting the effects of these interventions.

Summary

  • The paper presents a unified Bayesian framework linking in-context learning and activation steering to predict and control large language model behavior.
  • It models in-context learning as Bayesian inference, revealing a sigmoidal pattern in belief updates with increasing context.
  • Activation steering is shown to shift internal activations along concept directions, changing concept priors and enabling predictable behavior changes.


Introduction to Belief Dynamics in LLM Control

LLMs have become central to modern AI systems, and controlling them at inference time is critical for reliability and safety, particularly for avoiding undesirable behaviors. This paper offers a unified Bayesian perspective that links in-context learning (ICL) and activation steering under a common framework of belief dynamics, reflecting their shared goal of controlling LLM behavior. By interpreting both kinds of intervention as modifications to the model's belief in latent concepts, the framework yields a predictive model of LLM behavior across a range of settings (Figure 1).

Figure 1: Overview of our unified Bayesian theory of in-context learning and activation steering.

In-Context Learning: Bayesian Inference and Behavioral Dynamics

Concept of In-Context Learning

In-context learning (ICL) enables models to adapt to input prompts provided at inference time, using context to modulate hypotheses learned during pretraining. This learning mode follows a sigmoidal pattern as evidence accumulates with increasing context length, which helps explain the sudden behavioral shifts observed empirically.

Bayesian Modeling of ICL

The paper models ICL as Bayesian inference in which the belief in latent concepts is updated from observed data: the posterior belief p(c|x) combines the likelihood of the context under each concept with the concept priors. The likelihood-based evidence accumulates sub-linearly with the number of in-context examples, producing the sharp transitions typical of many-shot learning dynamics (Figure 2); a minimal numerical sketch follows the figure.

Figure 2: Belief updating with concept vectors.
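A minimal numerical sketch of this belief-update view, with illustrative (not fitted) parameters and the assumption that accumulated evidence grows as a sub-linear power of the number of examples: the posterior over a binary concept is a logistic function of the accumulated log-odds, which yields the sigmoidal learning curve.

```python
import numpy as np

def posterior_concept(n, log_prior_odds=-4.0, evidence_scale=1.5, gamma=0.7):
    """Belief in a binary latent concept after n in-context examples.

    Illustrative only: evidence accumulates as n**gamma (gamma < 1 gives the
    sub-linear scaling described above); all parameter values are made up.
    """
    log_odds = log_prior_odds + evidence_scale * n**gamma
    return 1.0 / (1.0 + np.exp(-log_odds))  # logistic of the log-odds

for n in [0, 1, 2, 4, 8, 16, 32, 64]:
    print(f"{n:>2} shots -> belief {posterior_concept(n):.3f}")
```

With these toy values the belief stays near zero for few shots, rises steeply around the point where evidence overtakes the prior, and then saturates.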

Activation Steering: Steering Vectors and Belief Updating

Mechanism of Activation Steering

Activation steering directs model outputs by intervening on the model's internal activations, adjusting the belief in certain concepts through steering vectors. These vectors are computed from contrastive datasets reflecting different personas or behavior categories, and adding them to the activations shifts model behavior predictably.
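A minimal sketch of one common recipe consistent with this description (an assumption on our part, not necessarily the paper's exact procedure): take the difference of mean activations at a fixed layer between the contrastive prompt sets, then add the resulting vector, scaled by a coefficient, to the hidden states at inference time.

```python
import torch

def build_steering_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means steering vector from contrastive activations.

    pos_acts / neg_acts: (num_prompts, hidden_dim) activations collected at one
    layer for prompts that do / do not exhibit the target behavior.
    """
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def steer(hidden: torch.Tensor, vector: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add the scaled steering vector to every token position's hidden state."""
    return hidden + alpha * vector

# Toy illustration with random stand-in activations (hidden_dim = 16).
torch.manual_seed(0)
pos = torch.randn(32, 16) + 1.0   # prompts exhibiting the target persona
neg = torch.randn(32, 16)         # contrastive prompts
v = build_steering_vector(pos, neg)
h = torch.randn(5, 16)            # hidden states for a 5-token sequence
h_steered = steer(h, v, alpha=2.0)
```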

Linear Representation Hypothesis

The linear representation hypothesis posits that concepts are encoded as linear directions in LLM activation space, so they can be manipulated through simple vector arithmetic. This hypothesis is pivotal for understanding how steering vectors impact model behavior and enable predictable alterations to concept beliefs (Figure 3); a small sketch follows the figure.

Figure 3: Change in behavior as a function of steering vector magnitude.
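Under this hypothesis, the strength of a concept in an activation can be read off (approximately) as its projection onto the concept direction, so adding a scaled steering vector moves that read-out linearly with the scaling coefficient. A hedged sketch with random stand-in vectors:

```python
import torch

def concept_score(hidden: torch.Tensor, concept_dir: torch.Tensor) -> float:
    """Read out a concept as the projection of an activation onto its unit direction."""
    d = concept_dir / concept_dir.norm()
    return float(hidden @ d)

torch.manual_seed(0)
hidden = torch.randn(16)       # stand-in residual-stream activation
concept_dir = torch.randn(16)  # stand-in concept direction

for alpha in [0.0, 1.0, 2.0, 4.0]:
    steered = hidden + alpha * concept_dir
    # The score increases by alpha * ||concept_dir|| each step: linear in alpha.
    print(f"alpha = {alpha:.1f} -> concept score {concept_score(steered, concept_dir):.2f}")
```

As the paper's Figure 5 illustrates, this linear picture breaks down at large enough steering magnitudes.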

Experiments: Many-Shot ICL and Activation Steering

Experimental Design

The experiments employ harmful and non-harmful persona datasets to evaluate the impact of ICL and steering on model behavior. These datasets make it possible to observe sharp learning trends and to predict behavior change across varying context lengths and steering magnitudes.

Observations and Predictions

The study identifies a sigmoidal response in LLM behavior under steering, modulated by both context length and steering magnitude. The interaction between ICL and steering reveals phase boundaries at which model behavior shifts abruptly with small changes to the intervention controls (Figure 4); the sketch after the figure illustrates this additivity.

Figure 4: In-context learning and activation steering jointly affect behavior.
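A minimal sketch of the predicted additivity in log-belief space, with invented parameters: if steering shifts the log-prior and in-context examples contribute log-evidence, the behavior probability is a logistic of their sum, so near the phase boundary a small change in either control flips behavior sharply.

```python
import numpy as np

def behavior_prob(n_shots, alpha, base_log_odds=-6.0,
                  evidence_per_shot=0.5, steer_gain=2.0):
    """Logistic of an additive log-belief: base prior + steering shift + ICL evidence.

    Parameter values are illustrative, not taken from the paper.
    """
    log_odds = base_log_odds + steer_gain * alpha + evidence_per_shot * n_shots
    return 1.0 / (1.0 + np.exp(-log_odds))

# Near the phase boundary, a small change in steering magnitude flips behavior.
print(behavior_prob(n_shots=8, alpha=0.5))  # ~0.27: mostly baseline behavior
print(behavior_prob(n_shots=8, alpha=1.5))  # ~0.73: behavior largely switched
```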

Conclusion

The paper integrates two prevailing methods of LLM control into a single Bayesian framework for predicting and understanding model behavior. This framework not only explains past empirical observations but also predicts novel phenomena, such as the joint effects of ICL and steering. Future work can leverage these insights to build better control mechanisms and support safer deployment of LLMs in sensitive applications.

Figure 5: With large enough magnitudes, the Linear Representation Hypothesis breaks down.

This work strengthens the theoretical foundation for controlling complex models and points toward intervention strategies that improve AI safety and performance.
