
Sequential Decision Frameworks

Updated 25 February 2026
  • Sequential Decision Frameworks are formal models that structure a series of interdependent decisions over time using axiomatic and probabilistic methods.
  • They integrate techniques from reinforcement learning, automated planning, robust control, and causal inference to address uncertainty and constraints.
  • Recent advances include unification with human feedback, causal extensions, and language-based agents for enhanced scalability and adaptability.

Sequential Decision Frameworks formalize the process of making a series of interdependent decisions over time, operating under stochastic dynamics, partial information, and often under constraints. These frameworks unify diverse methodologies—ranging from reinforcement learning, automated planning, causal inference, and robust control to modern LLM-driven agents—by abstracting the agent–environment interaction as the search for policies that optimize cumulative utilities under uncertainty and structural axioms.

1. Foundations: Axiomatic and Probabilistic Structures

Classical utility theory, as extended to the sequential setting, provides the behavioral foundation for most sequential decision frameworks. The canonical result—extending von Neumann-Morgenstern (VNM) rationality—shows that preferences over trajectory lotteries satisfying completeness, transitivity, continuity, and independence admit a linear utility functional on trajectories. Adding a Markovian "memorylessness" axiom, one obtains a utility recursion of the form

$$u(t \cdot \tau) = r(t) + m(t)\, u(\tau),$$

where $t$ is a transition, $r$ is the immediate reward, and $m$ is a per-transition continuation factor. This characterizes the Affine-Reward Markov Decision Process (AR-MDP), encompassing both standard additive-reward MDPs and more general, state-dependent discounting (Shakerinava et al., 2022).

Under further axiomatic strengthening (additivity and path-obliviousness), these frameworks yield, respectively: (i) the sum-of-rewards form familiar from RL, and (ii) potential-based utilities, where only the initial and final states matter ($u(\tau) = \Phi(s_{\rm terminal}) - \Phi(s_0)$).
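As a concrete illustration, the AR-MDP recursion can be evaluated backwards over a finite trajectory. The sketch below uses illustrative reward and continuation values (not taken from the cited work); a constant continuation factor $m(t) = \gamma$ recovers the familiar discounted return.

```python
# Sketch of the AR-MDP utility recursion u(t·τ) = r(t) + m(t)·u(τ),
# evaluated backwards over a finite trajectory. Rewards and continuation
# factors below are illustrative assumptions.

def trajectory_utility(transitions):
    """Each transition is a (reward, continuation_factor) pair.

    A constant continuation factor m(t) = gamma yields the familiar
    discounted sum of rewards; state-dependent m(t) gives the general
    AR-MDP utility.
    """
    u = 0.0
    for r, m in reversed(transitions):
        u = r + m * u
    return u

# With m(t) = 0.9 everywhere this equals the discounted return
# 1.0 + 0.9 * (2.0 + 0.9 * 4.0):
traj = [(1.0, 0.9), (2.0, 0.9), (4.0, 0.9)]
print(trajectory_utility(traj))
```

State-dependent discounting then amounts to nothing more than letting the second element of each pair vary per transition.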

2. Canonical Modeling Formulations

Sequential decision problems are most commonly modeled as Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), and their generalizations:

| Framework | State Representation | Utility Structure | Uncertainty |
| --- | --- | --- | --- |
| Standard MDP | $s \in S$ | $\sum_t r(s_t, a_t)$ | $P(s_{t+1} \mid s_t, a_t)$ |
| Affine-Reward MDP (AR-MDP) | $s \in S$ | $u(\tau)$ as above | State–action–dependent |
| Constrained/Robust MDP (RMDP) | $s \in S$ | $r(s,a)$ with cost constraints $c_i$ | $P$ in ambiguity set $U$ |
| Sequential language-based | $s$ via $\Omega$ | $U(\alpha)$, action as plan | Interpretive via $\sigma$ |

The algebraic Plausibility-Feasibility-Utility (PFU) framework extends the above to a unified factor-graph model, combining uncertainty, constraints, and utilities via algebraic semirings, and supporting generic variable elimination algorithms (Pralet et al., 2011).
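The semiring abstraction at the heart of PFU can be made concrete with a small sketch: one generic routine, parameterised by a combination operator and an elimination operator, computes marginal probabilities under (+, ×) and optimal values under (max, +). The toy two-variable factor graph and brute-force enumeration here are illustrative simplifications, not the paper's elimination algorithm.

```python
# Semiring-generic inference in the spirit of the PFU framework.
# Brute-force enumeration stands in for proper variable elimination;
# the factors and domains are illustrative assumptions.
import itertools

def eliminate(factors, domains, semiring):
    """Combine all factors and eliminate every variable.

    factors:  list of (variables_tuple, function) pairs
    semiring: (oplus, otimes, zero) — elimination op, combination op,
              and identity element for oplus
    """
    oplus, otimes, zero = semiring
    result = zero
    variables = sorted(domains)
    for assignment in itertools.product(*(domains[v] for v in variables)):
        env = dict(zip(variables, assignment))
        combined = None
        for vars_, f in factors:
            val = f(*(env[v] for v in vars_))
            combined = val if combined is None else otimes(combined, val)
        result = oplus(result, combined)
    return result

domains = {"x": [0, 1], "y": [0, 1]}
prob = [(("x",), lambda x: [0.4, 0.6][x]),
        (("y",), lambda y: [0.7, 0.3][y])]
# Sum-product semiring: total mass of a normalised distribution is 1.
total = eliminate(prob, domains, (lambda a, b: a + b, lambda a, b: a * b, 0.0))

util = [(("x", "y"), lambda x, y: x + 2 * y)]
# Max-sum semiring: best achievable utility over all assignments.
best = eliminate(util, domains, (max, lambda a, b: a + b, float("-inf")))
```

Swapping the semiring is the only change needed to move between probabilistic marginalisation and utility optimisation, which is exactly the genericity the PFU factor-graph view exploits.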

3. General Solution Algorithms and Structural Properties

Sequential decision frameworks are solved by dynamic programming principles. For MDPs and generalizations, the standard Bellman recursion is modified to reflect the utility structure and the properties of uncertainty. In the robust setting, the Bellman operator becomes:

$$V^*(s) = \max_{a} \min_{P(\cdot \mid s,a) \in U(s,a)} \left[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \right],$$

where $U(s,a)$ encodes ambiguity about transitions. Key tractability results hinge on various "rectangularity" assumptions about $U$: $(s,a)$-rectangular sets yield tractable robust value iteration, while general sets render planning NP-hard (Ou et al., 2024).
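The robust Bellman backup under an $(s,a)$-rectangular set can be sketched directly: the sketch below represents each ambiguity set as a finite list of candidate transition distributions (an illustrative simplification of the parametric and moment-based sets discussed above) and applies the max–min backup until convergence.

```python
# Hedged sketch of robust value iteration under an (s,a)-rectangular
# ambiguity set, represented here as a finite list of candidate
# transition distributions per state-action pair. The toy MDP below
# is an illustrative assumption.
import numpy as np

def robust_value_iteration(r, ambiguity, gamma=0.9, iters=500):
    """r[s][a]: reward; ambiguity[s][a]: list of candidate P(.|s,a) arrays."""
    n_states = len(r)
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for s in range(n_states):
            # Max over actions of the worst-case one-step backup.
            V_new[s] = max(
                min(r[s][a] + gamma * P @ V for P in ambiguity[s][a])
                for a in range(len(r[s]))
            )
        V = V_new
    return V

# Toy example: in state 0, action 0 pays 1.0 but has ambiguous dynamics;
# action 1 pays 0.5 and reliably stays in state 0. State 1 is absorbing
# with zero reward, so the adversary makes action 0 unattractive.
r = [[1.0, 0.5], [0.0]]
ambiguity = [
    [[np.array([0.0, 1.0]), np.array([0.5, 0.5])],  # action 0: two candidates
     [np.array([1.0, 0.0])]],                       # action 1: stay in s0
    [[np.array([0.0, 1.0])]],                       # s1 absorbing
]
V = robust_value_iteration(r, ambiguity)
```

Under rectangularity the inner minimisation decouples across state–action pairs, which is precisely why this per-$(s,a)$ `min` suffices and the overall operator remains a contraction.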

Lyapunov-based control formulations introduce parameter adaptation and guarantee convergence of policies in problems with time-varying environment parameters (Srivastava et al., 2022).

4. Recent Advances: Unification and Human-in-the-Loop Extensions

Recent frameworks have systematically unified (deep) reinforcement learning with automated planning within a Bayesian paradigm. Sequential Decision Making (SDM) tasks are framed as inference over distributions on policy spaces, with algorithms maintaining and updating beliefs over solution policies via observed performance and structural similarity kernels (Núñez-Molina et al., 2023).

Human guidance has been integrated through multi-faceted feedback—evaluative signals, preferences, demonstrations (including partial or high-level), and attention data. These frameworks incorporate human input as direct reward surrogates, as inverse reinforcement learning objectives, or as hierarchical segmentation cues (Zhang et al., 2021). Dynamic teaching models formalize a teacher as an agent who dynamically selects demonstration policies to minimize learner sample complexity given state–action–environmental constraints, using teaching dimension frameworks extended to sequential, noisy, and constraint-coupled setups (Walsh et al., 2012).

5. Extensions: Causality, Language, and Deep Sequential Agents

Causal inference researchers have adapted marked point process models to encode the random, sequential occurrence of decisions and outcomes, extending the potential outcomes framework to assess causal effects where the number and timing of actions are themselves stochastic and potentially confounded (Gomez et al., 2023).

Language-based sequential decision frameworks introduce actions as syntactic plans in a propositional logic, requiring the specification of a selection function to ground under-specified outcomes and yielding new representation theorems for expected utility over plans (Bjorndahl et al., 2023).

LLMs for sequential decision making—framed as text-mediated stochastic games (TSMG) or via transformer-based masked modeling (UniMASK)—enable agents to solve general SDM tasks by translating masking or parsing schemes into queries about trajectories, with reinforcement algorithms such as Multi-Step Group-Relative Policy Optimization (MS-GRPO) used for efficient multi-step credit assignment in language-mediated agent-environment loops (Carroll et al., 2022, Dilkes et al., 14 Aug 2025).

6. Robust, Constrained, and Stochastic Multi-Stage Optimization

Many frameworks tackle uncertainty or risk aversion beyond classical MDPs. Robust MDPs systematically consider worst-case transitions over ambiguity sets, leveraging mathematical programming (LP, SOCP, semi-infinite duals) to encode parametric, moment-based, or discrepancy-based uncertainty (Ou et al., 2024). Sequential stochastic programming frameworks, critical in applications like multi-stage energy markets, harness two-stage or multi-phase models in which scenario-based recourse is continually updated as new information is observed. Multi-stage frameworks yield significant benefits in constrained, real-world domains (Al-Lawati et al., 2020, Rosemberg et al., 2024).
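The two-stage scenario-based structure mentioned above can be sketched with a toy recourse problem: commit to a first-stage quantity, then pay scenario-dependent recourse costs once the uncertainty resolves. Prices, scenarios, and the grid search are illustrative assumptions, a stand-in for the energy-market models cited above.

```python
# Minimal sketch of a two-stage stochastic program solved by scenario
# enumeration. All numbers below are illustrative assumptions.

def expected_cost(x, scenarios, buy=1.0, recourse=3.0):
    """x: first-stage purchase; scenarios: (probability, demand) pairs.
    Shortfalls are covered after demand is revealed, at the higher
    recourse price."""
    return buy * x + sum(p * recourse * max(d - x, 0.0) for p, d in scenarios)

def solve(scenarios, candidates):
    """Pick the first-stage decision minimising expected total cost."""
    return min(candidates, key=lambda x: expected_cost(x, scenarios))

# Two equally likely demand scenarios; because recourse is expensive,
# the optimal first-stage purchase hedges toward the high scenario.
scenarios = [(0.5, 10.0), (0.5, 20.0)]
best = solve(scenarios, candidates=[float(x) for x in range(0, 31)])
```

Multi-stage frameworks iterate this pattern: each stage's decision becomes the "first stage" for the remaining scenario tree, with recourse re-optimised as new information arrives.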

7. Broader Implications and Ongoing Challenges

Sequential decision frameworks have exposed and formalized the behavioral axioms—such as additivity, path-independence, and preference rationality—that underlie both RL and planning. These theoretical guarantees inform the design of learning algorithms (e.g., for AR-MDPs or privacy-preserving RL), clarify when classical models suffice, and when richer axiomatic or inference-based models are required (Shakerinava et al., 2022, Fan et al., 15 Apr 2025). They also illuminate key practical trade-offs in privacy, interpretability, computational scalability, and model expressiveness.

Open problems include: specifying functionally correct objectives in dynamic user-facing systems, efficiently learning from complex human feedback, integrating causal and robust formulations in practical algorithms, and constructing universal sequential agents that generalize across tasks, languages, and input modalities.

