Sequential Decision Frameworks
- Sequential Decision Frameworks are formal models that structure a series of interdependent decisions over time using axiomatic and probabilistic methods.
- They integrate techniques from reinforcement learning, automated planning, robust control, and causal inference to address uncertainty and constraints.
- Recent advances include unification with human feedback, causal extensions, and language-based agents for enhanced scalability and adaptability.
Sequential Decision Frameworks formalize the process of making a series of interdependent decisions over time under stochastic dynamics, partial information, and often explicit constraints. These frameworks unify diverse methodologies, ranging from reinforcement learning, automated planning, causal inference, and robust control to modern LLM-driven agents, by abstracting the agent–environment interaction as a search for policies that optimize cumulative utility under uncertainty and structural axioms.
1. Foundations: Axiomatic and Probabilistic Structures
Classical utility theory, as extended to the sequential setting, provides the behavioral foundation for most sequential decision frameworks. The canonical result—extending von Neumann-Morgenstern (VNM) rationality—shows that preferences over trajectory lotteries satisfying completeness, transitivity, continuity, and independence admit a linear utility functional on trajectories. Adding a Markovian "memorylessness" axiom, one obtains a utility recursion of the form
$U\big((s, a, s')\,\tau\big) = r(s, a, s') + \gamma(s, a, s')\, U(\tau)$

where $(s, a, s')$ is a transition, $r(s, a, s')$ is the immediate reward, and $\gamma(s, a, s')$ is a per-transition continuation factor. This characterizes the Affine-Reward Markov Decision Process (AR-MDP), encompassing both standard additive-reward MDPs and more general, state-dependent discounting (Shakerinava et al., 2022).
Under further axiomatic strengthening (additivity and path-obliviousness), these frameworks yield respectively: (i) the sum-of-rewards form familiar from RL, and (ii) potential-based utilities, where only the initial and final states matter ($U(\tau) = \Phi(s_{\text{final}}) - \Phi(s_{\text{initial}})$).
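As a concrete illustration of the affine recursion, trajectory utility can be evaluated by folding rewards and continuation factors backwards from the end of the trajectory. This is a minimal sketch with toy transition tuples, not the formal construction of Shakerinava et al.; `trajectory_utility` and its inputs are illustrative names:

```python
def trajectory_utility(transitions):
    """Evaluate U(tau) = r_1 + gamma_1 * (r_2 + gamma_2 * (...)) by
    folding the affine recursion from the end of the trajectory.
    transitions: list of (reward, continuation_factor) pairs."""
    u = 0.0
    for r, gamma in reversed(transitions):
        u = r + gamma * u
    return u

# With a constant continuation factor the recursion reduces to the
# familiar discounted sum of rewards: 1 + 0.9*2 + 0.81*3 = 5.23.
traj = [(1.0, 0.9), (2.0, 0.9), (3.0, 0.9)]
assert abs(trajectory_utility(traj) - 5.23) < 1e-9
```

State-dependent discounting falls out for free: each transition simply carries its own `gamma`, so no change to the evaluation code is needed.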
2. Canonical Modeling Formulations
Sequential decision problems are most commonly modeled as Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), and their generalizations:
| Framework | State Representation | Utility Structure | Uncertainty |
|---|---|---|---|
| Standard MDP | $s \in S$ | Additive discounted rewards $\sum_t \gamma^t r_t$ | Stochastic transitions $P(s' \mid s, a)$ |
| Affine-Reward MDP (AR-MDP) | $s \in S$ | Affine recursion as above | State–action–dependent discounting $\gamma(s, a, s')$ |
| Constrained/Robust MDP (RMDP) | $s \in S$, cost constraints | Additive rewards under cost budgets | Transition kernel $P$ in ambiguity set $\mathcal{P}$ |
| Sequential Language-based | Outcomes grounded via selection function | Expected utility over plans; action as plan | Interpretive, via selection function |
The algebraic Plausibility-Feasibility-Utility (PFU) framework extends the above to a unified factor-graph model, combining uncertainty, constraints, and utilities via algebraic semirings, and supporting generic variable elimination algorithms (Pralet et al., 2011).
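The semiring abstraction can be made concrete in a few lines: one variable-elimination routine computes a sum-product marginal or a max-product optimum depending only on which (⊕, ⊗) pair is supplied. A minimal sketch over toy factors on binary variables; all names are illustrative, not the PFU framework's API:

```python
from itertools import product

def eliminate(factors, domains, elim_order, plus, times, plus_id, times_id):
    """Generic variable elimination over a commutative semiring (plus, times).
    factors: list of (scope_tuple, {assignment_tuple: value}).
    elim_order must cover every variable appearing in the factors."""
    factors = list(factors)
    for var in elim_order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
        table = {}
        for assign in product(*(domains[v] for v in scope)):
            env = dict(zip(scope, assign))
            acc = plus_id
            for x in domains[var]:
                env[var] = x
                val = times_id
                for s, tab in touching:  # combine all factors touching var
                    val = times(val, tab[tuple(env[v] for v in s)])
                acc = plus(acc, val)     # marginalize var out via semiring plus
            table[assign] = acc
        factors = rest + [(scope, table)]
    result = times_id
    for _, tab in factors:               # remaining factors are constants
        result = times(result, tab[()])
    return result

domains = {"x": [0, 1], "y": [0, 1]}
f1 = (("x",), {(0,): 1.0, (1,): 2.0})
f2 = (("x", "y"), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.0})
# Sum-product: total plausibility mass. Max-product: best joint value.
Z = eliminate([f1, f2], domains, ["x", "y"],
              lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
best = eliminate([f1, f2], domains, ["x", "y"],
                 max, lambda a, b: a * b, float("-inf"), 1.0)
```

Swapping the semiring changes the query, not the algorithm, which is the point of the PFU-style unification of uncertainties, feasibilities, and utilities.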
3. General Solution Algorithms and Structural Properties
Sequential decision frameworks are solved by dynamic programming principles. For MDPs and generalizations, the standard Bellman recursion is modified to reflect the utility structure and the properties of uncertainty. In the robust setting, the Bellman operator becomes:
$(\mathcal{T}V)(s) = \max_{a \in A} \min_{P \in \mathcal{P}(s,a)} \Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big]$

where $\mathcal{P}(s,a)$ encodes ambiguity about transitions. Key tractability results hinge on various "rectangularity" assumptions about $\mathcal{P}$: $(s,a)$-rectangular sets yield tractable robust value iteration, while general sets render planning NP-hard (Ou et al., 2024).
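Under $(s,a)$-rectangularity the adversary's inner minimization decomposes independently per state–action pair, so robust value iteration stays tractable. A minimal sketch, assuming each ambiguity set is given as a finite list of candidate transition rows (the function name and toy instance are illustrative):

```python
import numpy as np

def robust_value_iteration(r, P_sets, gamma=0.9, iters=500):
    """Robust VI for an (s,a)-rectangular ambiguity set.
    r: rewards of shape (S, A); P_sets[s][a]: list of length-S probability rows."""
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # adversary picks the worst-case row independently per (s, a)
                worst = min(row @ V for row in P_sets[s][a])
                Q[s, a] = r[s, a] + gamma * worst
        V = Q.max(axis=1)  # agent best-responds
    return V

# Singleton ambiguity sets reduce to standard value iteration: one state,
# self-loop, reward 1, gamma 0.9 gives V = 1 / (1 - 0.9) = 10.
r = np.array([[1.0]])
P_sets = [[[np.array([1.0])]]]
V = robust_value_iteration(r, P_sets)
```

With more than one candidate row per pair, the same loop computes the max-min fixed point; without rectangularity the inner problem would couple across state–action pairs and this decomposition would no longer be valid.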
Lyapunov-based control formulations introduce parameter adaptation and guarantee convergence of policies in problems with time-varying environment parameters (Srivastava et al., 2022).
4. Recent Advances: Unification and Human-in-the-Loop Extensions
Recent frameworks have systematically unified (deep) reinforcement learning with automated planning within a Bayesian paradigm. Sequential Decision Making (SDM) tasks are framed as inference over distributions on policy spaces, with algorithms maintaining and updating beliefs over solution policies via observed performance and structural similarity kernels (Núñez-Molina et al., 2023).
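The Bayesian framing can be illustrated with a finite candidate policy set: beliefs form a prior over policies, and observed task performance enters as a likelihood. This toy sketch uses Bernoulli success counts as the performance signal; it illustrates the inference pattern only, not the algorithms of Núñez-Molina et al.:

```python
import math

def posterior_over_policies(prior, rates, successes, trials):
    """Bayes update of beliefs over a finite policy set from observed
    performance: p(policy | data) ∝ prior * Binomial likelihood.
    prior[i]: prior mass on policy i; rates[i]: its assumed success rate."""
    likes = [math.comb(trials, successes) * p**successes * (1 - p)**(trials - successes)
             for p in rates]
    unnorm = [pr * lk for pr, lk in zip(prior, likes)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two candidate policies with assumed success rates 0.2 and 0.8; after
# observing 8 successes in 10 trials, belief concentrates on the second.
post = posterior_over_policies([0.5, 0.5], [0.2, 0.8], successes=8, trials=10)
```

Structural similarity kernels in the cited framework play the role of the prior here: policies similar to previously successful ones start with more mass.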
Human guidance has been integrated through multi-faceted feedback—evaluative signals, preferences, demonstrations (including partial or high-level), and attention data. These frameworks incorporate human input as direct reward surrogates, as inverse reinforcement learning objectives, or as hierarchical segmentation cues (Zhang et al., 2021). Dynamic teaching models formalize a teacher as an agent who dynamically selects demonstration policies to minimize learner sample complexity given state–action–environmental constraints, using teaching dimension frameworks extended to sequential, noisy, and constraint-coupled setups (Walsh et al., 2012).
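One common way to turn pairwise human preferences into a direct reward surrogate is a Bradley–Terry model: the probability that trajectory a is preferred to b is a logistic function of their reward difference. A minimal sketch over hand-crafted trajectory features; the data, names, and learning-rate choices are illustrative:

```python
import math

def fit_preference_reward(pairs, dim, lr=0.1, epochs=200):
    """Bradley-Terry reward surrogate from pairwise preferences.
    pairs: list of (phi_preferred, phi_other) feature vectors; learns w
    so that sigmoid(w . (phi_a - phi_b)) models P(a preferred over b)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for phi_a, phi_b in pairs:
            diff = [a - b for a, b in zip(phi_a, phi_b)]
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))
            # gradient ascent on the log-likelihood of the observed preference
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# Toy data: only the first feature distinguishes preferred trajectories,
# so the learned weight on it should be positive.
pairs = [([1.0, 0.0], [0.0, 0.0]), ([2.0, 1.0], [0.5, 1.0])]
w = fit_preference_reward(pairs, dim=2)
```

The learned `w` then scores arbitrary trajectories, giving the RL agent a dense reward signal distilled from sparse human comparisons.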
5. Extensions: Causality, Language, and Deep Sequential Agents
Causal inference researchers have adapted marked point process models to encode the random, sequential occurrence of decisions and outcomes, extending the potential outcomes framework to assess causal effects where the number and timing of actions are themselves stochastic and potentially confounded (Gomez et al., 2023).
Language-based sequential decision frameworks introduce actions as syntactic plans in a propositional logic, requiring the specification of a selection function to ground under-specified outcomes and yielding new representation theorems for expected utility over plans (Bjorndahl et al., 2023).
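The role of the selection function can be shown with a toy weather example: a plan describes its outcome only partially (a propositional condition), and the selection function resolves which satisfying outcome actually obtains before utilities are computed. This is an illustration of the idea, not the formal semantics of Bjorndahl et al.:

```python
def ground_and_evaluate(plan, states, utility, select):
    """Evaluate an under-specified plan: `plan` is a predicate over outcome
    states (a propositional description), and `select` is the selection
    function grounding it to one concrete satisfying outcome."""
    satisfying = [s for s in states if plan(s)]
    outcome = select(satisfying)
    return utility[outcome]

# Outcome states are (weather, gear) pairs; the plan "take the umbrella"
# fixes the gear but leaves the weather unspecified.
states = [("rain", "umbrella"), ("rain", "none"),
          ("dry", "umbrella"), ("dry", "none")]
utility = {("rain", "umbrella"): 1, ("rain", "none"): -5,
           ("dry", "umbrella"): 0, ("dry", "none"): 2}
take_umbrella = lambda s: s[1] == "umbrella"

# A pessimistic selection function grounds the plan to its worst outcome.
pessimistic = lambda outcomes: min(outcomes, key=lambda o: utility[o])
value = ground_and_evaluate(take_umbrella, states, utility, pessimistic)
```

Different selection functions yield different plan values from the same syntactic plan, which is exactly why the representation theorems must be stated relative to a selection function.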
LLM-based approaches to sequential decision making, framed as text-mediated stochastic games (TSMG) or via transformer-based masked modeling (UniMASK), enable agents to solve general SDM tasks by translating masking or parsing schemes into queries about trajectories. Reinforcement algorithms such as Multi-Step Group-Relative Policy Optimization (MS-GRPO) then provide efficient multi-step credit assignment in language-mediated agent–environment loops (Carroll et al., 2022; Dilkes et al., 14 Aug 2025).
6. Robust, Constrained, and Stochastic Multi-Stage Optimization
Many frameworks tackle uncertainty or risk aversion beyond classical MDPs. Robust MDPs systematically consider worst-case transitions over ambiguity sets, leveraging mathematical programming (LP, SOCP, semi-infinite duals) to encode parametric, moment-based, or discrepancy-based uncertainty (Ou et al., 2024). Sequential stochastic programming frameworks, critical in applications like multi-stage energy markets, harness two-stage or multi-phase models in which scenario-based recourse is continually updated as new information is observed. Multi-stage frameworks yield significant benefits in constrained, real-world domains (Al-Lawati et al., 2020, Rosemberg et al., 2024).
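The two-stage structure can be sketched in a few lines: a here-and-now decision is fixed before uncertainty resolves, and each scenario contributes the cost of its best recourse. This toy newsvendor-style instance (all numbers illustrative) mirrors the scenario-based recourse idea, not the cited papers' market models:

```python
def two_stage_expected_cost(x, scenarios, buy_cost=1.0, recourse_cost=3.0):
    """Expected cost of first-stage purchase x: here-and-now cost plus the
    probability-weighted cost of covering any shortfall after demand is seen.
    scenarios: list of (probability, demand) pairs."""
    cost = buy_cost * x
    for prob, demand in scenarios:
        cost += prob * recourse_cost * max(0.0, demand - x)
    return cost

# Two demand scenarios; recourse is three times the first-stage price,
# so some hedging against the high-demand scenario pays off.
scenarios = [(0.75, 4.0), (0.25, 10.0)]
best_x = min(range(0, 11), key=lambda x: two_stage_expected_cost(x, scenarios))
```

The optimizer stops adding first-stage capacity at x = 4: beyond that point the expected recourse saving per unit (0.25 × 3.0 = 0.75) falls below the first-stage price (1.0). Multi-stage frameworks repeat this trade-off at every stage as new scenario information arrives.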
7. Broader Implications and Ongoing Challenges
Sequential decision frameworks have exposed and formalized the behavioral axioms—such as additivity, path-independence, and preference rationality—that underlie both RL and planning. These theoretical guarantees inform the design of learning algorithms (e.g., for AR-MDPs or privacy-preserving RL), clarify when classical models suffice, and when richer axiomatic or inference-based models are required (Shakerinava et al., 2022, Fan et al., 15 Apr 2025). They also illuminate key practical trade-offs in privacy, interpretability, computational scalability, and model expressiveness.
Open problems include: specifying functionally correct objectives in dynamic user-facing systems, efficiently learning from complex human feedback, integrating causal and robust formulations in practical algorithms, and constructing universal sequential agents that generalize across tasks, languages, and input modalities.
References:
- Utility Theory for Sequential Decision Making (Shakerinava et al., 2022)
- Sequential Decision-Making for Inline Text Autocomplete (Chitnis et al., 2024)
- Time-Varying Parameters in Sequential Decision Making Problems (Srivastava et al., 2022)
- Position Paper: Rethinking Privacy in RL for Sequential Decision-making in the Age of LLMs (Fan et al., 15 Apr 2025)
- Informational entropy thresholds as a physical mechanism to explain power-law time distributions in sequential decision-making (Cristín et al., 2021)
- Scalable Generalized Bayesian Online Neural Network Training for Sequential Decision Making (Duran-Martin et al., 13 Jun 2025)
- Dynamic Teaching in Sequential Decision Making Environments (Walsh et al., 2012)
- Unveiling Bias in Sequential Decision Making: A Causal Inference Approach for Stochastic Service Systems (Gomez et al., 2023)
- An Algebraic Graphical Model for Decision with Uncertainties, Feasibilities, and Utilities (Pralet et al., 2011)
- UniMASK: Unified Inference in Sequential Decision Problems (Carroll et al., 2022)
- Decision making with dynamic probabilistic forecasts (Tankov et al., 2021)
- Towards a Unified Framework for Sequential Decision Making (Núñez-Molina et al., 2023)
- Two-Stage Stochastic Optimization Frameworks to Aid in Decision-Making Under Uncertainty for Variable Resource Generators Participating in a Sequential Energy Market (Al-Lawati et al., 2020)
- Sequential Language-based Decisions (Bjorndahl et al., 2023)
- Recent Advances in Leveraging Human Guidance for Sequential Decision-Making Tasks (Zhang et al., 2021)
- Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality (Rosemberg et al., 2024)
- Sequential Decision-Making under Uncertainty: A Robust MDPs review (Ou et al., 2024)
- Reinforced LLMs for Sequential Decision Making (Dilkes et al., 14 Aug 2025)