Finite Horizon Planning
- Finite horizon planning is a decision-making approach that optimizes actions within a fixed time window with explicit terminal costs or constraints.
- It employs techniques like MPC and finite-horizon MDPs to address challenges in control, trajectory planning, and resource allocation.
- The methodology balances optimality and computational efficiency, adapting non-stationary policies to evolving system dynamics and constraints.
Finite horizon planning refers to the family of decision-making methodologies in which actions are optimized over a window of time that is explicitly limited to a fixed duration or number of steps. In contrast to infinite-horizon or steady-state formulations, finite-horizon models are temporally localized, focusing on optimality, safety, or resource allocation within a prescribed time interval. This temporal truncation arises in diverse domains, including optimal control, model predictive control (MPC), reinforcement learning (finite-horizon MDPs), decentralized multi-agent planning, economic policy design, and large-scale discrete or continuous systems with receding or rolling horizons.
1. Core Principles and Mathematical Formulation
Finite horizon planning is characterized by its explicit time-bounded structure. The canonical formulation is an optimal control or Markov decision process (MDP) with a finite time interval (continuous time) or finite horizon (discrete time). For continuous dynamics, this takes the form

$$\min_{u(\cdot)} \; \int_0^T \ell(x(t), u(t))\, dt + \Phi(x(T)) \quad \text{subject to} \quad \dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0.$$
For discrete-time MDPs, the analogous formulation is

$$\max_{\pi_0, \dots, \pi_{H-1}} \; \mathbb{E}\!\left[\sum_{t=0}^{H-1} r_t(s_t, a_t) + r_H(s_H)\right],$$

where policies may be non-stationary and time-varying. Terminal conditions or costs, $\Phi(x(T))$ or $r_H(s_H)$, are explicit and essential for capturing post-horizon objectives or constraints.
The finite horizon ensures policies and value functions are inherently non-stationary: optimal decisions at each stage explicitly depend on the elapsed and remaining time. This non-stationarity leads to distinct algorithmic and theoretical properties compared to infinite-horizon models (Guin et al., 2022, Li et al., 2021).
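Concretely, the non-stationarity arises from backward induction: the terminal value $V_H$ is fixed, and each $V_h$ is computed from $V_{h+1}$, so the greedy action depends on the time-to-go. Below is a minimal, runnable sketch of this recursion on a tabular MDP; the transition tensor, rewards, and horizon are illustrative placeholders rather than examples from the cited works.

```python
import numpy as np

def backward_induction(P, r, H):
    """Finite-horizon dynamic programming on a tabular MDP.

    P : (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a).
    r : (S, A) array of per-stage rewards.
    H : number of decision stages.
    Returns stage-indexed values V (H+1, S) and greedy policies pi (H, S).
    """
    A, S, _ = P.shape
    V = np.zeros((H + 1, S))          # V[H] is the terminal value (zero here)
    pi = np.zeros((H, S), dtype=int)  # one policy per stage: non-stationary
    for h in range(H - 1, -1, -1):    # backward sweep from stage H-1 to 0
        Q = r + (P @ V[h + 1]).T      # Q[s, a] = r(s, a) + E[V_{h+1}(s')]
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi

# Two states: s0 is productive, s1 is absorbing with zero reward. Action 0
# ("stay") earns 0.8 and remains in s0; action 1 ("cash out") earns 1.0 but
# moves to s1. Cashing out is optimal only at the final stage, so the greedy
# action genuinely depends on the stage index h.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0 transitions
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1 transitions
r = np.array([[0.8, 1.0], [0.0, 0.0]])
V, pi = backward_induction(P, r, H=5)
print(pi[:, 0])  # [0 0 0 0 1]: "stay" early, "cash out" at the last stage
```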
2. Receding Horizon Model Predictive Control
A central methodology exploiting finite horizon planning is Model Predictive Control (MPC), also termed Receding Horizon Control (RHC) (Krishna et al., 2021, Papaioannou et al., 2023, Krener, 2019, Wang et al., 2023). In MPC/RHC, at each time step:
- The system state is measured or estimated.
- A finite-horizon trajectory optimization is solved over the next $N$ steps (resp. over a continuous-time window of length $T$).
- Only the first action is executed; the remainder are discarded.
- The process repeats at the next time step with the updated state and a shifted horizon.
The optimization at each step accounts for current dynamics, constraints, and state costs. The closed-loop behavior emerges from this sequential rolling process. Terminal costs or constraints, as well as the length of the horizon, critically affect stability and performance.
A prototypical discrete formulation is

$$\min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-1} \ell(x_k, u_k) + V_f(x_N) \quad \text{subject to} \quad x_{k+1} = f(x_k, u_k),$$

with $V_f$ a terminal value/penalty, and constraints on the states $x_k$ and inputs $u_k$. Selection of the horizon length $N$ balances performance and computational tractability: short horizons admit real-time solution but risk myopia, whereas long horizons improve optimality but increase computational burden (Papaioannou et al., 2023, Krishna et al., 2021). Adaptive schemes (AHMPC) dynamically adjust $N$ to minimize online computational load while ensuring stability (Krener, 2019).
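As one concrete, unconstrained instance of this rolling process, the sketch below solves the $N$-stage Riccati recursion of finite-horizon LQR at every step and applies only the first feedback action; the dynamics $A, B$ and weights $Q, R, Q_f$ are illustrative assumptions, and a practical MPC would additionally enforce the state and input constraints above, typically via a QP solver.

```python
import numpy as np

def finite_horizon_lqr_gain(A, B, Q, R, Qf, N):
    """Backward Riccati recursion over an N-step horizon; returns the
    stage-0 feedback gain, the only one a receding-horizon loop applies."""
    P = Qf                                    # terminal penalty V_f(x) = x'Qf x
    for _ in range(N):                        # sweep backward through N stages
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K                                  # gain for the first stage

# Receding-horizon loop: re-solve the N-step problem at every step, apply the
# first action, discard the rest. (For fixed A, B the re-solve is redundant,
# but with constraints, disturbances, or time-varying models it is essential.)
A = np.array([[1.0, 0.1], [0.0, 1.0]])        # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.eye(1), 10 * np.eye(2)
x = np.array([[1.0], [0.0]])
for _ in range(50):
    K = finite_horizon_lqr_gain(A, B, Q, R, Qf, N=10)
    u = -K @ x                                # execute only the first action
    x = A @ x + B @ u                         # plant evolves; horizon rolls on
print(np.linalg.norm(x))                      # state driven near the origin
```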
3. Finite Horizon Planning in Reinforcement Learning
In reinforcement learning, finite-horizon planning underpins both exploration and exploitation, and introduces significant differences from discounted infinite-horizon MDPs. For a finite-horizon $H$-step MDP, the value function and optimal policy must be non-stationary (i.e., dependent on the current stage $h$) (Li et al., 2021, Guin et al., 2022, Xu et al., 31 Jan 2026).
Sample complexity in finite-horizon RL can in principle be made independent of the horizon length $H$: carefully designed episodic algorithms achieve PAC (probably approximately correct) guarantees with no explicit dependence on the horizon, under bounded-reward assumptions (Li et al., 2021). Key algorithmic strategies include quantile-based trajectory truncation, visitation counting over state-action pairs, and ensemble pessimism.
Moreover, finite-horizon planning is critical for thresholded $n$-step lookahead Q-learning algorithms, where online agents optimize expected returns over the next $n$ steps only, significantly accelerating convergence relative to classical RL at a modest suboptimality cost (Xu et al., 31 Jan 2026). Both policy-gradient and value-based approaches in finite-horizon constrained MDPs require explicit time-indexed parameterizations and the handling of non-stationary value functions (Guin et al., 2022).
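As a minimal sketch of such a time-indexed parameterization in the tabular case, the following Q-learning variant keeps one Q-table per stage $h$ and bootstraps from the next stage's table, so the learned policy is non-stationary by construction; the environment interface (env.reset, env.step) and the hyperparameters are generic assumptions, not the algorithms of the cited papers.

```python
import numpy as np

def finite_horizon_q_learning(env, S, A, H, episodes=5000,
                              alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning with one table per stage (non-stationary policy).

    Q has shape (H + 1, S, A); Q[H] stays zero and acts as the terminal value.
    env is any object with reset() -> s and step(a) -> (s2, r); both are
    assumed placeholder interfaces, not a specific library API.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((H + 1, S, A))
    for _ in range(episodes):
        s = env.reset()
        for h in range(H):                    # episodes last exactly H steps
            a = (int(rng.integers(A)) if rng.random() < eps
                 else int(Q[h, s].argmax()))  # eps-greedy at stage h
            s2, r = env.step(a)
            # Bootstrap from the *next stage's* table: the target depends on
            # the time-to-go, which makes the greedy policy non-stationary.
            target = r + Q[h + 1, s2].max()
            Q[h, s, a] += alpha * (target - Q[h, s, a])
            s = s2
    return Q  # greedy policy: pi_h(s) = argmax_a Q[h, s, a]
```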
4. Structural and Computational Implications
Finite-horizon planning problems are distinguished by:
- Non-stationarity: Optimal controls/policies $\pi_h$ depend on the time-to-go; value functions $V_h$ are indexed by the remaining horizon (Guin et al., 2022, Mifrani et al., 19 Feb 2025).
- Terminal effects: Explicit terminal costs, constraints, and terminal feedbacks are pivotal—correctly chosen, they improve performance and ensure stability (Krener, 2019, Krishna et al., 2021).
- Feedback law construction: For high-dimensional continuous systems, learning-based parametrizations (e.g., polynomial ansatz) allow approximate yet efficient feedback synthesis, bypassing the curse of dimensionality inherent to direct discretization of the finite-horizon Hamilton–Jacobi–Bellman PDE (Kunisch et al., 2023); a minimal fitting sketch follows this list.
- Multi-objective settings: In vector-valued MDPs, finite-horizon policies achieving Pareto-optimality correspond to efficient vertices of a vector LP; enumeration and theoretical characterization are efficient relative to exhaustive policy search (Mifrani et al., 19 Feb 2025).
- Mixed-integer formulations: Exact finite-horizon planning in decentralized POMDPs (Dec-POMDPs) can be expressed as sequence-form MILPs, drastically reducing representational complexity compared to naive policy-tree enumeration (0707.2506).
- Combinatorial and adversarial settings: In adversarially-terminated graph planning, stationary memoryless plans become optimal only under expected finite-horizon constraints, admitting polynomial-time algorithms where fixed-horizon planning is NP-complete (Chatterjee et al., 2018).
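To illustrate the polynomial-ansatz idea from the feedback-law item above (a generic sketch, not the method of Kunisch et al., 2023): fit monomial features to sampled (state, control) pairs by least squares. For demonstration the "optimal" controls come from a placeholder linear gain; in practice they would be generated by solving finite-horizon problems from sampled initial states.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=3):
    """Monomial features of X (n, d) up to the given total degree."""
    cols = [np.ones(len(X))]                       # constant term
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))              # sampled states
K = np.array([[1.0, 1.7]])                         # placeholder "optimal" gain
U = X @ (-K.T)                                     # controls at the samples
coef, *_ = np.linalg.lstsq(poly_features(X), U, rcond=None)  # fit the ansatz
u_hat = poly_features(X[:5]) @ coef                # evaluate learned feedback
print(np.abs(u_hat - U[:5]).max())                 # small residual on samples
```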
5. Applications and Domain-Specific Methodologies
Finite horizon planning underpins a broad spectrum of practical applications:
- Trajectory Planning in Unsteady Flows: Energy-optimal deployment of mobile sensors exploits short finite-horizon MPC, with optimal paths "surfing" Lagrangian coherent structures quantified via finite-time Lyapunov exponents (FTLE). Even short predictive windows (relative to flow time-scales) suffice for near-optimality (Krishna et al., 2021).
- UAV Inspection and Formation: Receding horizon MIQP-based algorithms allow UAVs to plan smooth, collision-free 3D inspection and formation trajectories, incorporating both kinematic and visibility constraints (Papaioannou et al., 2023, Jond et al., 29 Jun 2025).
- **Multi-Contact Rob