Value-Guided Construal (VGC) Models provide a unifying computational framework for how cognitive agents, artificial systems, or resource-constrained planners construct internal representations or policies by optimizing a trade-off between expected utility and representation or action complexity, often guided by explicit or implicit value functions. The VGC paradigm appears across domains—including LLM alignment, planning under resource constraints, simulation-based world modeling, and moral decision-making—wherein the value of possible futures, representations, or construals determines what gets attended to, encoded, or chosen.
1. Formal Foundations of Value-Guided Construal
The core of VGC is the explicit optimization of a utility-complexity trade-off governed by a learned value function. Given a configuration space (e.g., partial representations, proto-policies, or token prefixes), each construal or sequence $c$ is evaluated by

$$V(c) = U(c) - C(c),$$

where $U(c)$ is the expected utility (e.g., reward, planning efficacy, representational fidelity) and $C(c)$ the complexity cost (e.g., number of encoded features/objects, policy entropy, or a coding penalty). This objective is typically solved by direct enumeration (tractable in low-dimensional cases), by iterative optimization, or by learning a parameterized value function (e.g., $V_\theta$).
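The direct-enumeration route can be illustrated with a minimal sketch. It assumes the trade-off is a utility minus a per-feature cost; the `cost_weight` parameter, the feature names, and the additive toy utility are all hypothetical choices made for illustration, not taken from the cited work.

```python
from itertools import combinations

def best_construal(features, utility, cost_weight=1.0):
    """Enumerate every subset of `features` and return the one that
    maximizes utility(c) - cost_weight * len(c), with its score."""
    best, best_score = frozenset(), utility(frozenset())
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            c = frozenset(subset)
            score = utility(c) - cost_weight * len(c)
            if score > best_score:
                best, best_score = c, score
    return best, best_score

# Hypothetical planning benefit gained by representing each obstacle.
BENEFIT = {"wall_A": 3.0, "wall_B": 0.5, "wall_C": 2.0}

def toy_utility(c):
    # Assumed additive utility; real utilities generally interact across features.
    return sum(BENEFIT[f] for f in c)

construal, score = best_construal(list(BENEFIT), toy_utility, cost_weight=1.0)
print(sorted(construal), score)
```

Enumeration visits all 2^n subsets, which is why it is only practical for small feature sets; the iterative and learned-value alternatives named above avoid that blow-up.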
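In multi-objective alignment for LLMs, the framework extends to

$$\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\Big[\sum_{i} w_i\, R_i(x, y)\Big] - \beta\, \mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big),$$

where $\pi_{\mathrm{ref}}$ is a reference distribution, $\beta > 0$ sets the strength of the KL regularization, and the $R_i$ are distinct scalar objectives, each with a user-specified weight $w_i \ge 0$ (Carleton et al., 19 Aug 2025). VGC then shapes the output by tilting probabilities toward high-value completions, either at the sequence or token level.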
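Sequence-level tilting can be sketched as follows, assuming the exponential form that is standard for KL-regularized objectives: each completion's reference probability is multiplied by the exponential of its weighted value and the result is renormalized. The function name, the temperature `beta`, and the toy numbers are illustrative assumptions.

```python
import math

def tilt_distribution(ref_probs, objective_values, weights, beta=1.0):
    """Return p(y) proportional to ref(y) * exp(sum_i w_i * v_i(y) / beta)."""
    logits = {}
    for y, p in ref_probs.items():
        combined = sum(w * v for w, v in zip(weights, objective_values[y]))
        logits[y] = math.log(p) + combined / beta
    m = max(logits.values())  # subtract the max for numerical stability
    unnorm = {y: math.exp(l - m) for y, l in logits.items()}
    z = sum(unnorm.values())
    return {y: u / z for y, u in unnorm.items()}

# Hypothetical reference distribution over three completions, with
# made-up per-objective values (objective 0, objective 1).
ref = {"a": 0.5, "b": 0.3, "c": 0.2}
values = {"a": (0.1, 0.9), "b": (0.8, 0.2), "c": (0.5, 0.5)}

favor_first = tilt_distribution(ref, values, weights=(4.0, 0.0))
favor_second = tilt_distribution(ref, values, weights=(0.0, 4.0))
```

Changing `weights` at call time shifts mass between completions without retraining anything, which is the reweighting behavior the text describes. With all weights at zero the reference distribution is returned unchanged.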
2. Model Classes: Value Models and Policy Construction
VGC instantiates a value model (e.g., a state-value function $V(s)$ or an action-value function $Q(s, a)$) to guide decision-making at critical junctures:
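- LLMs and Multi-Objective LLM Alignment: Separate “value models” $V_i$ are trained per-objective and combined, given objective weights $w_i$, a temperature $\beta$, and a reference policy $\pi_{\mathrm{ref}}$, using KL-regularized policy improvement:
$$\pi_{w}(a \mid s) \;\propto\; \pi_{\mathrm{ref}}(a \mid s)\, \exp\!\Big(\tfrac{1}{\beta} \sum_{i} w_i\, V_i(s, a)\Big)$$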
This results in a sequence- or token-level tilting, executed via a softmax over the weighted sum of objective-scaled value heads (Carleton et al., 19 Aug 2025). At inference, user-defined weights allow on-the-fly reweighting.
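- Iterative Value Function Optimization (IVO): In guided decoding, a critic network $V_\phi$ is refined by regression to Monte Carlo rollout returns and used to steer beam search or top-$k$ decoding, with policy improvement over a base policy $\pi_{\mathrm{base}}$ at temperature $\beta$ via
$$\pi(y_t \mid x, y_{<t}) \;\propto\; \pi_{\mathrm{base}}(y_t \mid x, y_{<t})\, \exp\!\big(V_\phi(x, y_{\le t}) / \beta\big)$$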
(Liu et al., 4 Mar 2025). An iterative loop alternates rollout, critic regression, and improved policy construction, yielding strict monotonic improvement in KL-regularized value.
- World Modeling and Resource-Bounded Planning: JIT world modeling and simplified-planning VGC treat the current construal $c$ (a subset of objects/features) as the state, updating it online via simulation-driven need probabilities, lookahead, and dynamic encoding/forgetting of features (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025).
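The alternation of rollout, critic regression, and policy improvement described for IVO can be illustrated on a toy problem. This sketch is not the authors' implementation. It assumes a two-token vocabulary, a uniform base policy, hypothetical terminal rewards, and a tabular critic whose "regression" is the mean return of the rollouts passing through each prefix.

```python
import math
import random
from itertools import product

VOCAB = ("a", "b")
LENGTH = 2
# Hypothetical terminal rewards for each complete sequence.
REWARD = {"aa": 1.0, "ab": 0.0, "ba": 0.2, "bb": 0.1}

def guided_probs(prefix, critic, beta):
    """Tilt the base policy by exp(V(prefix + token) / beta).
    The base policy is uniform here, so its factor cancels on normalization."""
    weights = [math.exp(critic.get(prefix + t, 0.0) / beta) for t in VOCAB]
    z = sum(weights)
    return [w / z for w in weights]

def rollout(critic, beta, rng):
    """Sample one complete sequence from the guided policy; return it with its reward."""
    prefix = ""
    for _ in range(LENGTH):
        prefix += rng.choices(VOCAB, weights=guided_probs(prefix, critic, beta))[0]
    return prefix, REWARD[prefix]

def fit_critic(critic, samples):
    """Tabular regression: V(prefix) becomes the mean return of rollouts through it.
    Prefixes not visited in this batch keep their previous estimate."""
    totals, counts = {}, {}
    for seq, ret in samples:
        for i in range(1, len(seq) + 1):
            totals[seq[:i]] = totals.get(seq[:i], 0.0) + ret
            counts[seq[:i]] = counts.get(seq[:i], 0) + 1
    updated = dict(critic)
    updated.update({p: totals[p] / counts[p] for p in totals})
    return updated

def expected_reward(critic, beta):
    """Exact expected reward of the guided policy, by enumerating all sequences."""
    total = 0.0
    for seq in map("".join, product(VOCAB, repeat=LENGTH)):
        prob = 1.0
        for i, tok in enumerate(seq):
            prob *= guided_probs(seq[:i], critic, beta)[VOCAB.index(tok)]
        total += prob * REWARD[seq]
    return total

def run_ivo(iterations=3, n_rollouts=200, beta=0.5, seed=0):
    rng = random.Random(seed)
    critic = {}  # an empty critic makes the guided policy equal to the base policy
    for _ in range(iterations):
        samples = [rollout(critic, beta, rng) for _ in range(n_rollouts)]
        critic = fit_critic(critic, samples)
    return critic

critic = run_ivo()
print(expected_reward({}, 0.5), expected_reward(critic, 0.5))
```

Each iteration samples from the policy guided by the current critic, so later value estimates are fitted on trajectories the guided policy actually produces rather than on base-policy samples alone.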
3. Algorithmic Structures and Optimization Regimes
The algorithmic core of VGC models comprises:
- KL-Regularized Policy Improvement (MAVIS): Alternates Monte Carlo rollouts, per-token value regression, and soft policy iteration for each objective, converging to the optimal constrained policy. Guarantees monotonic improvement in the regularized value, exact multi-objective optimality in bandit regimes, and Pareto frontier expansion over ensembling baselines (Carleton et al., 19 Aug 2025).
- Iterative On-Policy Optimization (IVO): Explicitly alternates between Monte Carlo batch estimation of value, on-policy critic fitting, and policy update (reweighting tokens or beams), capitalizing on variance reduction via diverse trajectory sampling (Liu et al., 4 Mar 2025). Empirically, convergence is rapid in both the number of iterations and the number of sampled trajectories per prompt.
- Just-in-Time Construal (JIT): Eschews global combinatorial search over construals, instead using trajectory-wise incremental encoding as simulation encounters new objects. A power-law forgetting rule maintains a bounded memory buffer (Chen et al., 20 Jan 2026).
- Spotlight Attention Modulation: Augments construal probabilities with spatially local smoothing kernels (spotlight functions), accounting for visuospatial attention effects on which features enter the representation, explaining individual and crowding effects (Castanheira et al., 11 Jun 2025).
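The just-in-time idea of encoding objects only as simulation encounters them, then letting them fade, can be sketched as below. The specific forgetting form is an assumption for illustration: the strength of an object last encountered `age` steps ago is taken to be `(1 + age) ** -decay`, and objects below `threshold` are dropped. The cited work may parameterize this differently.

```python
def jit_construal(trajectory, objects_at, decay=1.0, threshold=0.3):
    """Incrementally encode objects met along a simulated trajectory.

    Returns the set of represented objects after each step. Memory strength
    follows an assumed power law in the time since the last encounter.
    """
    last_seen = {}
    history = []
    for t, position in enumerate(trajectory):
        for obj in objects_at.get(position, ()):
            last_seen[obj] = t  # encode, or refresh, on encounter
        # Forget objects whose power-law strength fell below the threshold.
        last_seen = {o: s for o, s in last_seen.items()
                     if (1 + t - s) ** -decay >= threshold}
        history.append(set(last_seen))
    return history

# Hypothetical scene: the simulation passes a ramp, then a ball, then the ramp again.
trajectory = [0, 1, 2, 3, 4, 5]
objects_at = {0: ["ramp"], 2: ["ball"], 5: ["ramp"]}
history = jit_construal(trajectory, objects_at)
```

No search over subsets of objects takes place: the representation at each step depends only on what the trajectory has touched recently, which keeps memory bounded.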
4. Domains of Application
Multi-Objective LLM Alignment
MAVIS realizes VGC for LLMs by learning per-objective value heads and tilting base distributions via exponential scaling at inference. Users can dynamically set the trade-off vector, enabling per-deployment customization without further fine-tuning. Empirically, MAVIS matches or surpasses RLHF- and ensembling-based baselines, dominates the Pareto frontier, and achieves monotonic, KL-bounded improvement without modifying underlying model weights (Carleton et al., 19 Aug 2025).
Guided Decoding in LLMs
IVO applies VGC at decoding time using an accurate, iteratively refined value function, offering a lightweight alternative to full RLHF. Blockwise and top-$k$ steered searches yield state-of-the-art alignment and reward metrics while reducing computational cost relative to PPO-style RLHF (Liu et al., 4 Mar 2025).
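A blockwise steered search can be sketched as a greedy loop that scores whole candidate blocks with the value function instead of scoring every token. This is a simplified illustration with beam width 1. The `propose_blocks` and `toy_value` functions are hypothetical stand-ins for a base model's sampler and a learned critic.

```python
def blockwise_guided_decode(propose_blocks, value_fn, n_blocks, prefix=""):
    """Extend `prefix` block by block, keeping the highest-value candidate each time.
    Returns the decoded text and the number of value-function calls made."""
    evaluations = 0
    for _ in range(n_blocks):
        candidates = propose_blocks(prefix)
        scored = [(value_fn(prefix + block), block) for block in candidates]
        evaluations += len(scored)
        prefix += max(scored)[1]  # ties are broken by block string order
    return prefix, evaluations

def propose_blocks(prefix):
    # Stand-in for sampling candidate two-token blocks from a base model.
    return ["aa", "ab", "ba", "bb"]

def toy_value(text):
    # Stand-in critic: rewards occurrences of the pattern "ab".
    return text.count("ab")

text, n_calls = blockwise_guided_decode(propose_blocks, toy_value, n_blocks=2)
print(text, n_calls)
```

The critic is called once per candidate block, so for a fixed number of candidates the call count scales with the number of blocks rather than the number of tokens.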
Planning and Reasoning under Resource Constraints
VGC-based models for grid-world and physical simulation domains formalize the selection of features or objects for encoding via resource–utility trade-offs, either globally (construal optimization) or incrementally (JIT). Empirical studies confirm that such models predict both planning and memory patterns in human participants—with “spotlight” attention extensions matching observed spatial and crowding effects (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025).
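The spotlight extension can be illustrated by smoothing per-object construal probabilities with a spatially local kernel, so that objects near a highly relevant object become more likely to be represented. The Gaussian kernel, the normalization, and the `width` parameter are assumptions chosen for illustration, not the form used in the cited work.

```python
import math

def spotlight_smooth(probs, positions, width=1.0):
    """Replace each object's probability with a kernel-weighted average of
    the probabilities of all objects, weighted by spatial proximity."""
    smoothed = {}
    for i, (xi, yi) in positions.items():
        num = den = 0.0
        for j, (xj, yj) in positions.items():
            sq_dist = (xi - xj) ** 2 + (yi - yj) ** 2
            k = math.exp(-sq_dist / (2 * width ** 2))  # assumed Gaussian spotlight
            num += k * probs[j]
            den += k
        smoothed[i] = num / den
    return smoothed

# Hypothetical layout: B sits next to the highly relevant A, while C is far away.
probs = {"A": 0.9, "B": 0.1, "C": 0.1}
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0)}
smoothed = spotlight_smooth(probs, positions)
```

In this toy layout B inherits part of A's probability while C is essentially unchanged, which is the kind of proximity effect the spotlight account is meant to capture.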
Moral and Value-Grounded Reasoning
Structured prompting of LLMs for moral decision-making leverages VGC by grounding prompts in explicit value systems and ethical theories, or in first-principles cognitive strategies. Distillation transfers such value-aligned construals to smaller models. Inductive results show improvements in accuracy and coherence on multiple benchmarks (Chakraborty et al., 17 Jun 2025).
5. Theoretical Guarantees and Empirical Results
VGC-based models benefit from:
- Monotonic Improvement: Both policy iteration in MAVIS and IVO guarantee monotonic ascent in KL-regularized policy value, converging toward the optimum (Carleton et al., 19 Aug 2025, Liu et al., 4 Mar 2025).
- Exact/Approximate Recovery: MAVIS exactly recovers the multi-objective optimal solution in the bandit regime and approximates it at the token level for general decoding (Carleton et al., 19 Aug 2025).
- Pareto Frontier Expansion: Token-level VGC resolves cross-objective conflicts more finely than ensembling or mixture-based baselines (MOD, RSoup), demonstrating expanded Pareto frontiers in empirical alignment tasks (Carleton et al., 19 Aug 2025).
- Empirical Superiority: Across summarization, dialogue, and instruction-following benchmarks, VGC models—especially those leveraging IVO or MAVIS—outperform or match RLHF, FUDGE, ARGS, and DPO on reward and win-rate metrics with reduced resource requirements (Liu et al., 4 Mar 2025).
- Resource-Rationality in Planning: JIT and spotlight-VGC models not only fit human awareness and recall data but also exhibit greater efficiency (representing fewer objects) than classic VGC, with stochastic lookahead and spatial proximity constraints accounting for observed behavioral patterns (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025).
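Claims about Pareto frontier expansion are usually checked by extracting the non-dominated points from a set of per-objective scores. A minimal sketch of that check follows. The score pairs are made-up numbers, and higher is assumed to be better on every objective.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.
    q dominates p if q is at least as good on every objective and strictly
    better on at least one (higher is better)."""
    def dominates(q, p):
        return (all(a >= b for a, b in zip(q, p))
                and any(a > b for a, b in zip(q, p)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective 1, objective 2) scores for five weight settings.
scores = [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1), (0.9, 0.05)]
front = pareto_front(scores)
print(front)
```

One method expands the frontier relative to another when its front contains points that no point on the other method's front dominates.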
Representative Empirical Results Table
| Domain | VGC Method | Empirical Result |
|---|---|---|
| LLM Multi-Objective Alignment | MAVIS | Pareto expansion; outperforms RLHF baselines (Carleton et al., 19 Aug 2025) |
| Guided Decoding (summarization/dialogue) | IVO | +0.4–0.6 reward vs. FUDGE/VAS; 77.5% win-rate (Liu et al., 4 Mar 2025) |
| Planning/Physical Simulation | JIT World Model | Matches human recall more closely than classic VGC (Chen et al., 20 Jan 2026) |
| Moral Reasoning in LLMs | Prompted + Distill | Improved accuracy with Schwartz+Care and First Principles prompting (Chakraborty et al., 17 Jun 2025) |
6. Extensions, Computational Considerations, and Limitations
- Blockwise Decoding: VGC-guided blockwise beam search reduces computational overhead relative to per-token evaluation (Liu et al., 4 Mar 2025).
- Attentional and Representational Constraints: Spotlight-VGC and JIT architectures implement perceptually and biologically plausible constraints, enabling scaling to real-world, cluttered domains (Castanheira et al., 11 Jun 2025, Chen et al., 20 Jan 2026).
- Action Planning in Learned World Models: VGC constraints on JEPA state embeddings (alignment of negative value to distance) yield significant gains in planning performance, especially under “separate” (non-joint) optimization for goal-conditioned values (Destrade et al., 28 Dec 2025).
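The idea of aligning negative value with distance in embedding space can be illustrated with a greedy planner that always takes the action whose predicted next embedding has the highest value. Everything concrete here is hypothetical: the 2-D "embeddings", the additive `predict` function standing in for a learned world model, and the Euclidean metric.

```python
import math

def value(state_emb, goal_emb):
    """Goal-conditioned value, taken as negative Euclidean distance to the goal."""
    return -math.dist(state_emb, goal_emb)

def greedy_plan(start, goal, actions, predict, horizon):
    """Repeatedly pick the action whose predicted next state has the highest value."""
    state, plan = start, []
    for _ in range(horizon):
        if state == goal:
            break
        best = max(actions, key=lambda a: value(predict(state, a), goal))
        state = predict(state, best)
        plan.append(best)
    return plan, state

# Stand-in world model: each action shifts the embedding by a fixed offset.
DELTAS = {"right": (1, 0), "up": (0, 1), "left": (-1, 0)}

def predict(state, action):
    dx, dy = DELTAS[action]
    return (state[0] + dx, state[1] + dy)

plan, final_state = greedy_plan((0, 0), (2, 1), list(DELTAS), predict, horizon=5)
print(plan, final_state)
```

Greedy ascent works here only because the toy value is an exact distance. With a learned value that is accurate only locally, as the limitations below discuss, remote goals can be mis-valued and the same planner can be misled.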
Limitations
- Coverage of State/Goal Space: VGC with value learning is locally most accurate; rare or remote state-goal pairs may be mis-valued without targeted data (Destrade et al., 28 Dec 2025).
- Stochastic Dynamics: Bias in value estimates (e.g., from IQL) can impair performance in highly stochastic environments (Destrade et al., 28 Dec 2025).
- Human Planning: JIT and spotlight models provide better match to human data than vanilla VGC, indicating that pure global optimization is psychologically unrealistic; online, need-probability-based encoding is both tractable and predictive (Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025).
7. Significance and Outlook
Value-Guided Construal models offer a principled, unified framework for adaptive representation, inference, and decision-making across high-dimensional domains. By operationalizing the trade-off between task performance and resource complexity via learned or constructed value functions, VGC methods enable flexible, preference-sensitive, and tractable behavior in settings ranging from LLM alignment and decoding to world modeling and human planning (Carleton et al., 19 Aug 2025, Liu et al., 4 Mar 2025, Chen et al., 20 Jan 2026, Castanheira et al., 11 Jun 2025, Destrade et al., 28 Dec 2025, Chakraborty et al., 17 Jun 2025). The paradigm continues to expand into new domains, leveraging theoretical guarantees of monotonicity, empirical superiority, and cognitive plausibility.