Value-Guided Construal Models
- Value-Guided Construal (VGC) models are frameworks that optimize internal representations by balancing expected utility against representational costs.
- They are applied across domains like LLM decoding, human planning, moral reasoning, and goal-conditioned world modeling using task-specific value functions.
- Algorithmic implementations such as MAVIS, IVO, and JEPA demonstrate efficiency gains and improved decision-making under resource constraints.
A Value-Guided Construal (VGC) model is a theoretical and algorithmic framework for adaptive representation, decision-making, or generation in which internal representations, policies, or outputs are constructed or selected by explicit optimization under task-specific value functions and resource constraints. VGC originates in resource-rational and bounded-rationality accounts of cognition. It formalizes how biological or artificial agents simplify complex environments, balance utility against representational cost, or dynamically steer large generative models, in each case by learning value functions or explicit construal policies that determine which information enters planning, inference, or output construction. A growing literature spanning computational cognitive science, model-based control, and LLM alignment has instantiated the approach in domains ranging from human-like mental simulation to multi-objective LLM decoding and goal-conditioned world-model planning.
1. Formal Foundations of Value-Guided Construal
VGC models share a canonical structure: they define a trade-off between the utility of a representation, policy, or output (usually quantified by a value or reward function) and its cost, complexity, or resource consumption. The general objective is

$$c^* = \arg\max_{c \in \mathcal{C}} \; U(c) - \lambda\, C(c),$$

where $c$ is a representation (or construal) selected from a space $\mathcal{C}$, $U(c)$ is the expected task utility under $c$, $C(c)$ is a cost or complexity measure (e.g., KL divergence or coding length), and $\lambda$ is a trade-off parameter. This abstraction includes:
- Perceptual/Planning Representational VGC: $c$ is a subset of environment features, obstacles, or object encodings, as in maze or grid-world planning (Castanheira et al., 11 Jun 2025, Chen et al., 20 Jan 2026).
- Value-Weighted LLM Decoding: $c$ is a sequence prefix or candidate output, with utility defined via external or learned reward models, and cost as divergence from a pretrained model (Carleton et al., 19 Aug 2025, Liu et al., 4 Mar 2025).
- Goal-Conditioned World Modeling: $c$ is a state embedding used for planning, with value defined by negative cost-to-go or embedding distance, and cost as representational mismatch (Destrade et al., 28 Dec 2025).
Notably, VGC is not tied to any particular format of value function or resource cost, allowing both soft-inclusion (e.g., smoothed by attentional kernels (Castanheira et al., 11 Jun 2025)) and discrete selection.
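The utility-versus-cost trade-off described in this section can be made concrete with a small exhaustive search over feature subsets. The following is a minimal sketch, not the procedure of any cited paper: the feature names, the additive `toy_utility`, and the use of subset size as the cost are all assumptions chosen for illustration.

```python
from itertools import combinations


def best_construal(features, utility, cost, lam):
    """Exhaustively pick the subset c maximizing utility(c) - lam * cost(c)."""
    best = frozenset()
    best_score = utility(best) - lam * cost(best)
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            c = frozenset(subset)
            score = utility(c) - lam * cost(c)
            if score > best_score:
                best, best_score = c, score
    return best, best_score


# Hypothetical toy task: each represented obstacle adds a fixed utility gain.
GAIN = {"wall_a": 3.0, "wall_b": 1.5, "wall_c": 0.2}


def toy_utility(c):
    return sum(GAIN[f] for f in c)


def toy_cost(c):
    # Assumption: complexity is simply the number of represented features.
    return len(c)


cheap, _ = best_construal(list(GAIN), toy_utility, toy_cost, lam=0.1)
costly, _ = best_construal(list(GAIN), toy_utility, toy_cost, lam=2.0)
print(sorted(cheap), sorted(costly))
```

With a small trade-off parameter every obstacle is worth representing, while a large one keeps only the most useful obstacle. Exhaustive search is exponential in the number of features, which is why practical variants rely on greedy, soft, or incremental selection.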
2. Value-Guided Decoding and Inference in LLMs
Recent applications of VGC in LLMs optimize over output sequences using learned value functions to steer generation towards user-specified objectives without full retraining.
Multi-Objective Alignment (MAVIS)
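MAVIS (Carleton et al., 19 Aug 2025) trains a set of per-objective value models $V_1, \dots, V_m$, each a lightweight LM with a regression head, to estimate KL-regularized expected returns. At inference, user weights $w = (w_1, \dots, w_m)$ induce a tilting function:

$$g_w(s, a) = \sum_{i=1}^{m} w_i\, V_i(s, a)$$

Token-level policies are adjusted as:

$$\pi_w(a \mid s) \propto \pi_{\mathrm{ref}}(a \mid s)\, \exp\!\big(g_w(s, a) / \beta\big)$$

Here $s$ is the current prefix, $a$ a candidate next token, $\pi_{\mathrm{ref}}$ the reference LM, and $\beta > 0$ the KL-regularization strength. Each $V_i$ is trained by KL-regularized policy iteration, where empirical returns penalized by log-probability ratios are regressed onto value heads. This enables post hoc adjustment of output trade-offs among multiple goals, strict monotonic policy improvement, and expansion of the achievable Pareto frontier relative to baseline mixtures.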
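To illustrate how user weights can reshape a next-token distribution by exponential tilting, here is a minimal sketch. It is not the MAVIS implementation: the function name `tilt_policy`, the three-token vocabulary, the two objectives, and all numeric values are hypothetical, and the per-objective values are supplied by hand rather than by trained value models.

```python
import math


def tilt_policy(ref_probs, values, weights, beta=1.0):
    """Reweight reference token probabilities by exp(weighted value / beta).

    ref_probs: dict token -> reference probability (must be > 0)
    values:    dict token -> list of per-objective value estimates
    weights:   user weights, one per objective
    """
    scores = {}
    for tok, p in ref_probs.items():
        g = sum(w * v for w, v in zip(weights, values[tok]))
        scores[tok] = math.log(p) + g / beta
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}


# Hypothetical reference distribution and value estimates for two objectives.
ref = {"yes": 0.5, "no": 0.3, "maybe": 0.2}
vals = {"yes": [1.0, 0.0], "no": [0.0, 1.0], "maybe": [0.5, 0.5]}

first_only = tilt_policy(ref, vals, weights=(1.0, 0.0))
second_only = tilt_policy(ref, vals, weights=(0.0, 1.0))
print(first_only, second_only)
```

Changing the weights moves probability mass toward tokens favored by the corresponding objective without touching the reference model, and setting all weights to zero recovers the reference distribution.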
Iterative Value Function Optimization (IVO)
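IVO (Liu et al., 4 Mar 2025) introduces a critic $V_\phi$ trained with Monte Carlo rollouts and regression, and iteratively improves the policy via:

$$\pi_{k+1}(a \mid s) \propto \pi_{\mathrm{ref}}(a \mid s)\, \exp\!\big(V_{\phi_k}(s, a) / \beta\big)$$

where $\pi_{\mathrm{ref}}$ is the frozen base LM, $V_{\phi_k}$ is the critic fitted on rollouts from the current guided policy $\pi_k$, and $\beta > 0$ sets the strength of the guidance.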
This approach steers decoding toward higher reward without updating the LLM backbone weights, substantially reducing computational cost relative to RLHF. IVO reports gains on summarization, dialogue, and instruction-following tasks, outperforms prior value-guided sampling methods (FUDGE, ARGS, VAS), and attains favorable GPT-4 win rates.
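The alternation between Monte Carlo critic fitting and value-guided sampling can be sketched on a toy problem. This is an illustrative assumption-laden example, not the IVO algorithm itself: the two-token vocabulary, the reward (count of the token "a"), the tabular critic (a per-prefix mean in place of a learned regression head), and the value of `beta` are all hypothetical.

```python
import math
import random
from collections import defaultdict

VOCAB = ["a", "b"]
LENGTH = 3


def reward(seq):
    # Hypothetical reward: number of "a" tokens in the finished sequence.
    return seq.count("a")


def base_policy(prefix):
    # Frozen base model: uniform over the vocabulary.
    return {t: 1.0 / len(VOCAB) for t in VOCAB}


def rollout(policy, rng):
    seq = []
    while len(seq) < LENGTH:
        probs = policy(tuple(seq))
        seq.append(rng.choices(VOCAB, weights=[probs[t] for t in VOCAB])[0])
    return tuple(seq)


def fit_critic(policy, rng, n_rollouts=200):
    """Tabular regression: V(prefix) = mean return of rollouts through prefix."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_rollouts):
        seq = rollout(policy, rng)
        r = reward(seq)
        for t in range(1, LENGTH + 1):
            totals[seq[:t]] += r
            counts[seq[:t]] += 1
    return {p: totals[p] / counts[p] for p in totals}


def guided(critic, beta):
    """Base policy reweighted by exp(V / beta); unseen prefixes get value 0."""
    def policy(prefix):
        ref = base_policy(prefix)
        w = {t: ref[t] * math.exp(critic.get(prefix + (t,), 0.0) / beta)
             for t in VOCAB}
        z = sum(w.values())
        return {t: w[t] / z for t in VOCAB}
    return policy


rng = random.Random(0)
policy = base_policy
for _ in range(3):  # a few rounds of on-policy critic refitting
    critic = fit_critic(policy, rng)
    policy = guided(critic, beta=0.5)
print(policy(()))
```

The base policy is never modified; only the critic is refit on samples from the current guided policy, which is the property that keeps this family of methods cheaper than fine-tuning the backbone.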
3. Value-Guided Construal in Human Planning and Mental Simulation
VGC has been applied to models of human planning, exemplifying the resource-rational principle that agents filter and encode only task-relevant features.
Just-in-Time (JIT) World Modeling
JIT planning (Chen et al., 20 Jan 2026) implements VGC not via explicit search over representations, but through an interleaved simulate–lookahead–encode process. The agent maintains a working memory (construal) $c_t$ containing only a small subset $c_t \subseteq \mathcal{O}$ of the set $\mathcal{O}$ of all possible objects or obstacles.
- Simulation steps trigger a lookahead that identifies unencoded but soon-to-be-relevant objects.
- Objects flagged are dynamically encoded; unused items decay probabilistically according to power-law forgetting.
- The process supports efficient prediction and planning, and its predictions correlate highly with human behavioral probes. It reduces the average number of objects represented while matching or exceeding classical VGC models across task variants.
Efficiency is obtained by estimating need probabilities for each object via Monte Carlo over sampled trajectories, updating construals "just in time" as demanded by the evolving simulation state.
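The need-probability estimate and the encode-or-forget update can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from the cited paper: the one-dimensional corridor, the step sampler, the encoding `threshold`, and the retention rule `(1 + age) ** (-decay)` are hypothetical choices standing in for the paper's simulator and forgetting function.

```python
import random


def sample_trajectory(state, rng, horizon=4):
    # Hypothetical lookahead: move right by 1 or 2 cells per step.
    x, path = state, []
    for _ in range(horizon):
        x += rng.choice([1, 2])
        path.append(x)
    return path


def need_probabilities(state, objects, n_samples, rng):
    """Fraction of sampled lookahead trajectories that reach each object's cell."""
    hits = {name: 0 for name in objects}
    for _ in range(n_samples):
        visited = set(sample_trajectory(state, rng))
        for name, cell in objects.items():
            if cell in visited:
                hits[name] += 1
    return {name: h / n_samples for name, h in hits.items()}


def update_construal(construal, ages, need, threshold, decay, rng):
    """Encode objects with high need; keep others with power-law retention."""
    new_construal, new_ages = set(), {}
    for name, p in need.items():
        if p >= threshold:
            new_construal.add(name)
            new_ages[name] = 0
        elif name in construal:
            age = ages.get(name, 0) + 1
            # Assumed retention probability: (1 + age) ** (-decay).
            if rng.random() < (1 + age) ** (-decay):
                new_construal.add(name)
                new_ages[name] = age
    return new_construal, new_ages


rng = random.Random(0)
objects = {"near": 3, "far": 40}  # object name -> corridor cell
need = need_probabilities(0, objects, n_samples=200, rng=rng)
construal, ages = update_construal({"far"}, {"far": 0}, need,
                                   threshold=0.3, decay=1.0, rng=rng)
print(need, construal)
```

Only objects that sampled futures are likely to touch get encoded, so the construal tracks the simulation rather than the full scene.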
Attentional and Perceptual Modulation
Extensions incorporating visuospatial attention ("spotlight-VGC") (Castanheira et al., 11 Jun 2025) introduce soft gating over which features enter the task representation. The gate is parameterized by spatial kernels or lateralization and fitted with a participant-specific attention radius. The agent's attention function $\alpha$ modulates which environmental features are included in the simplified model through smoothed inclusion probabilities, which accounts for human-like crowding and lateralization effects in virtual maze navigation.
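A soft spatial gate of this kind can be sketched with a simple kernel. The example below is an assumption for illustration only: the Gaussian form, the function names, the feature coordinates, and the base inclusion values are hypothetical and are not taken from the cited model.

```python
import math


def attention_gate(feature_xy, focus_xy, radius):
    """Gaussian spatial kernel: 1 at the focus, decaying with distance."""
    dx = feature_xy[0] - focus_xy[0]
    dy = feature_xy[1] - focus_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * radius ** 2))


def soft_construal(features, base_inclusion, focus_xy, radius):
    """Scale each feature's base inclusion probability by the attention gate."""
    return {
        name: base_inclusion[name] * attention_gate(xy, focus_xy, radius)
        for name, xy in features.items()
    }


# Hypothetical maze obstacles (name -> grid position) and base probabilities.
features = {"near_wall": (1.0, 0.0), "far_wall": (6.0, 0.0)}
base = {"near_wall": 0.9, "far_wall": 0.9}

narrow = soft_construal(features, base, focus_xy=(0.0, 0.0), radius=1.0)
wide = soft_construal(features, base, focus_xy=(0.0, 0.0), radius=5.0)
print(narrow, wide)
```

A narrow radius effectively drops distant obstacles from the construal even when they are task-relevant, while a wider radius lets them back in, which is the kind of participant-level variation a fitted attention radius is meant to capture.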
4. VGC in Moral Reasoning with LLMs
VGC is also employed for moral and value-sensitive LLMs (Chakraborty et al., 17 Jun 2025). Here, the construal is instantiated as a combination of structured prompts reflecting value systems and ethical theories, eliciting chain-of-thought-style justifications and decisions.
- A taxonomy of prompts combines psychological value frameworks (e.g., Schwartz, Moral Foundations) and explicit ethical theories (e.g., Care Ethics) to scaffold model reasoning.
- A distillation pipeline transfers competence from large teacher models, minimizing a hybrid loss over token-by-token imitation and semantic consistency, yielding scalable, interpretable, and value-grounded reasoning in small models.
- Structured prompting and distillation yield consistent improvements in moral decision accuracy and justification coherence over label-only baselines.
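One way to picture a hybrid distillation loss that combines token-level imitation with semantic consistency is sketched below. The specific form is an assumption, not the loss from the cited paper: token imitation is taken to be the mean negative log-likelihood of the teacher's tokens, semantic consistency is taken to be one minus cosine similarity between sentence embeddings, and `alpha` is a hypothetical mixing weight.

```python
import math


def token_imitation_loss(student_dists, teacher_tokens):
    """Mean negative log-likelihood of the teacher's tokens under the student."""
    nll = [-math.log(dist[tok]) for dist, tok in zip(student_dists, teacher_tokens)]
    return sum(nll) / len(nll)


def semantic_consistency_loss(student_emb, teacher_emb):
    """One minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(student_emb, teacher_emb))
    norm_s = math.sqrt(sum(a * a for a in student_emb))
    norm_t = math.sqrt(sum(b * b for b in teacher_emb))
    return 1.0 - dot / (norm_s * norm_t)


def hybrid_loss(student_dists, teacher_tokens, student_emb, teacher_emb, alpha=0.5):
    # Assumed combination: imitation term plus alpha times the semantic term.
    return (token_imitation_loss(student_dists, teacher_tokens)
            + alpha * semantic_consistency_loss(student_emb, teacher_emb))


# Hypothetical two-token justification and 3-d sentence embeddings.
teacher_tokens = ["harm", "avoid"]
student_dists = [{"harm": 0.7, "avoid": 0.3}, {"harm": 0.2, "avoid": 0.8}]
loss = hybrid_loss(student_dists, teacher_tokens,
                   student_emb=[1.0, 0.0, 1.0], teacher_emb=[1.0, 0.5, 1.0])
print(loss)
```

The imitation term rewards reproducing the teacher's wording, while the semantic term still gives credit when the student paraphrases a justification with the same meaning.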
5. VGC in Goal-Conditioned World Models and Control
In model-based control, VGC formalizes how value structure shapes representation and action planning.
JEPA World Models
Destrade et al. (28 Dec 2025) introduce VGC within a Joint-Embedding Predictive Architecture (JEPA) for goal-reaching tasks.
- The value function $V(s, g)$, the negative cost-to-go for reaching goal $g$ from state $s$, is approximated by $-d(z_s, z_g)$, where $z_s$ and $z_g$ are the embeddings of the state and the goal and $d$ is a Euclidean or quasi-metric distance in embedding space.
- Training alternates between, or jointly optimizes, a JEPA prediction loss and an Implicit Q-Learning (IQL) value loss, with expectile regression (expectile parameter $\tau$) shaping the embedding geometry for effective planning.
- Model Predictive Path Integral control (MPPI) uses these distances at test time for high-accuracy action planning, outperforming contrastive and standard regression approaches, but displays limitations in long-range calibration and stochastic settings.
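The three ingredients in the list above (distance-based value, expectile loss, and sampling-based planning) can be sketched together. This is a simplified illustration under stated assumptions, not the cited system: the 2-d latent space, the additive `toy_predict` dynamics, the Gaussian action sampling, and the hyperparameters are hypothetical, and the planner is a one-shot MPPI-style weighted average rather than a full iterative MPPI controller.

```python
import math
import random


def value(z_state, z_goal):
    """V(s, g) approximated by negative Euclidean distance between embeddings."""
    return -math.dist(z_state, z_goal)


def expectile_loss(diff, tau):
    """Asymmetric squared loss: weight tau for positive diff, 1 - tau otherwise."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def toy_predict(z, action):
    # Hypothetical latent dynamics standing in for a learned JEPA predictor.
    return (z[0] + 0.5 * action[0], z[1] + 0.5 * action[1])


def mppi_first_action(z0, z_goal, horizon, n_samples, temperature, rng):
    """Sample action sequences, weight them by exp(-cost / temperature),
    and return the weighted mean of their first actions."""
    costs, first_actions = [], []
    for _ in range(n_samples):
        actions = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(horizon)]
        z = z0
        for a in actions:
            z = toy_predict(z, a)
        costs.append(-value(z, z_goal))  # cost = distance to goal embedding
        first_actions.append(actions[0])
    c_min = min(costs)  # shift costs for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    return tuple(
        sum(w * a[i] for w, a in zip(weights, first_actions)) / total
        for i in range(2)
    )


rng = random.Random(0)
action = mppi_first_action((0.0, 0.0), (3.0, 0.0), horizon=3,
                           n_samples=1000, temperature=1.0, rng=rng)
print(action)
```

Because the planner only ever queries distances in embedding space, the quality of the plan depends directly on how well those distances track the true cost-to-go, which is what the value loss is meant to enforce.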
6. Algorithmic Summaries and Theoretical Guarantees
Across VGC instantiations, several recurrent themes and guarantees emerge:
| Domain | Value Function | Inference/Planning Mechanism | Guarantee or Reported Result |
|---|---|---|---|
| LLM Decoding | KL-regularized, per-objective | Exponential tilting (MAVIS), IVO top-k/beam | Monotonic improvement, Pareto optimality (Carleton et al., 19 Aug 2025, Liu et al., 4 Mar 2025) |
| Human Planning | Utility vs. complexity | Greedy/spotlight representational search, JIT | Efficiency, tight human fits (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025) |
| World Modeling | Negative embedding distance | MPPI w/ JEPA, value-influenced control | Improved planning accuracy (Destrade et al., 28 Dec 2025) |
| Moral Reasoning | Prompt-structured value tradeoff | Structured prompting, distillation | Accuracy, coherence improvements (Chakraborty et al., 17 Jun 2025) |
KL-regularized policy iteration in LLM contexts enjoys strict monotonic policy improvement, provable convergence to optimal token-level policies under certain bandit settings, and empirical Pareto-front dominance in multi-objective evaluation (Carleton et al., 19 Aug 2025). In perceptual VGC, resource-bounded optimization yields fit measures (e.g., human–model correlation, RMSE, log-likelihood) that closely match human data (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025).
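The exponentially tilted form of value-guided policies follows from a standard result on KL-regularized objectives. As a brief worked derivation (the symbols $Q$, $\pi_{\mathrm{ref}}$, $\beta$, and $Z$ are notation chosen here for illustration and are not taken from the cited papers), fix a state $s$ and consider:

```latex
\begin{aligned}
J(\pi) &= \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big]
          - \beta\, \mathrm{KL}\big(\pi(\cdot \mid s) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid s)\big) \\
\pi^*(a \mid s) &= \frac{\pi_{\mathrm{ref}}(a \mid s)\, \exp\!\big(Q(s,a)/\beta\big)}{Z(s)},
\qquad Z(s) = \sum_{a'} \pi_{\mathrm{ref}}(a' \mid s)\, \exp\!\big(Q(s,a')/\beta\big) \\
J(\pi) &= \beta \log Z(s) - \beta\, \mathrm{KL}\big(\pi(\cdot \mid s) \,\|\, \pi^*(\cdot \mid s)\big)
\end{aligned}
```

The last line shows that $J$ is maximized exactly at $\pi = \pi^*$, with optimal value $\beta \log Z(s)$. Large $\beta$ keeps the policy close to the reference model, and small $\beta$ lets the value estimates dominate.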
7. Limitations, Efficiency Gains, and Open Challenges
VGC models report substantial computational savings relative to traditional RLHF or brute-force search:
- MAVIS and IVO require only small value heads and limited rollout sampling, yielding speedups on the order of $100\times$ for LLM alignment (Carleton et al., 19 Aug 2025, Liu et al., 4 Mar 2025).
- JIT and attentional VGC in perceptual domains encode significantly fewer features for similar predictive power, trading occasional planning suboptimality for memory savings (Chen et al., 20 Jan 2026).
- In JEPA world models, value-guided construals improve planning accuracy but struggle with rare state–goal pairs and with calibration far from the goal; improvements require either hierarchical latents or more strategically curated datasets (Destrade et al., 28 Dec 2025).
Broader open issues include:
- Extending VGC to domains with high non-stationarity or combinatorial construal spaces.
- Scalability of representational search in environments with ambiguous or weakly-structured value signals.
- Formal generalization bounds under resource constraints and finite-sample regimes.
References:
- MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search (Carleton et al., 19 Aug 2025)
- Iterative Value Function Optimization for Guided Decoding (Liu et al., 4 Mar 2025)
- "Just in Time" World Modeling Supports Human Planning and Reasoning (Chen et al., 20 Jan 2026)
- How attention simplifies mental representations for planning (Castanheira et al., 11 Jun 2025)
- Structured Moral Reasoning in LLMs (Chakraborty et al., 17 Jun 2025)
- Value-guided action planning with JEPA world models (Destrade et al., 28 Dec 2025)