Oversubscription Planning: Formulation & Methods

Updated 26 November 2025

Oversubscription planning is a decision-making framework that integrates hard and soft goals by maximizing utility within budget constraints while allowing controlled constraint violations.
It utilizes various methods such as symbolic search, reinforcement learning, and imitation learning to address its inherent PSPACE-completeness and scalability challenges.
The formulation underpins advanced cloud resource management and adaptive scheduling through probabilistic, temporal, and multidimensional extensions.

The oversubscription planning formulation describes a class of sequential decision problems in which the requested goals or resource allocations exceed what can be satisfied under strict feasibility constraints, so the model works with explicit soft goals, probabilistic constraints, or resource risk budgets instead. It generalizes classical planning and traditional resource allocation by allowing partial satisfaction of goals, controlled violation of capacity or safety limits, or stochastic guarantees on constraint satisfaction. Oversubscription planning and its formal models have been applied in AI planning, cloud resource management, and adaptive scheduling under uncertainty.

1. Formal Definitions and Key Model Components

The canonical oversubscription planning (OSP) formalism, as found in symbolic planning research, extends classical planning via hard and soft goals with explicit cost and utility accounting (Speck, 2022). An OSP task is the tuple:

$\Pi_o = \langle F, A, s_0, G_h, G_s, U, c, B \rangle$

where:

$F$ : set of fluents (propositional atoms/state variables)
$A$ : set of actions, each with precondition $\mathrm{pre}(a)$, effect $\mathrm{eff}(a)$, and (possibly state-dependent) cost $c_a$
$s_0$ : initial state
$G_h \subseteq F$ : hard goals (must be achieved)
$G_s \subseteq F$ : soft goals
$U: G_s \to \mathbb{N}_0$ : soft goal utilities
$c$ : action cost function
$B$ : total cost budget

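A plan $\pi = \langle a_0, \ldots, a_{n-1} \rangle$ is a sequence of applicable actions that leads from $s_0$ through states $s_1, \ldots, s_n$. It is valid if it stays within the budget and achieves all hard goals:

$\mathrm{cost}(\pi) = \sum_{i=0}^{n-1} c_{a_i}(s_i) \leq B, \qquad G_h \subseteq s_n$

Its utility is the total utility of the soft goals that hold in the final state, $\mathrm{utility}(\pi) = \sum_{g \in G_s \cap s_n} U(g)$. An optimal plan maximizes soft-goal utility subject to the budget and the hard goals: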

$\pi^* = \arg\max_{\pi:\ \mathrm{cost}(\pi)\leq B,\,s_n \supseteq G_h} \mathrm{utility}(\pi)$
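The definitions above can be made concrete with a small plan evaluator. The following is a minimal sketch, not taken from the cited work: it assumes constant (not state-dependent) action costs, represents states as sets of fluents, and uses hypothetical names (`Action`, `OSPTask`, `evaluate`) and a hypothetical two-action example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # fluents that must hold before the action
    add: frozenset     # fluents made true by the action
    delete: frozenset  # fluents made false by the action
    cost: int          # constant cost (assumption: no state-dependent costs)

@dataclass
class OSPTask:
    init: frozenset        # s_0
    hard_goals: frozenset  # G_h
    utilities: dict        # soft goal in G_s -> utility U(g)
    budget: int            # B

def evaluate(task, plan):
    """Return (cost, utility) of a valid plan, or None if the plan is invalid."""
    state, cost = task.init, 0
    for action in plan:
        if not action.pre <= state:      # action not applicable
            return None
        state = (state - action.delete) | action.add
        cost += action.cost
    if cost > task.budget or not task.hard_goals <= state:
        return None
    utility = sum(u for goal, u in task.utilities.items() if goal in state)
    return cost, utility

# Hypothetical example: moving is required (hard goal), the photo is optional.
move = Action("move", frozenset({"at_a"}), frozenset({"at_b"}), frozenset({"at_a"}), 2)
photo = Action("photo", frozenset({"at_b"}), frozenset({"have_photo"}), frozenset(), 3)
task = OSPTask(frozenset({"at_a"}), frozenset({"at_b"}), {"have_photo": 5, "at_a": 1}, 5)

full = evaluate(task, [move, photo])   # uses the whole budget
short = evaluate(task, [move])         # cheaper, but earns no soft-goal utility
```

Both plans are valid here; the longer one is preferred only because it achieves a soft goal, which is the trade-off the $\arg\max$ expresses.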

The problem is PSPACE-complete; compiling OSP into classical STRIPS with polynomial-size reduction is impossible in general, as soft-goal utilities and budget constraints introduce additional expressivity not captured by classical plan path-length objectives (Speck, 2022).

2. Chance-Constrained and Stochastic Variants

In modern resource management (notably cloud oversubscription), non-determinism and risk are accommodated via chance constraints:

Safety constraints are relaxed to require high-probability satisfaction, e.g.,

$\Pr \left[ \frac{1}{T} \sum_{t=0}^T 1\Big\{ \sum_{n=1}^N U_{n,k}^t \geq \beta B \Big\} < \delta \right] \geq \alpha$

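for all machines $k=1,\ldots,K$, with $U_{n,k}^t$ the stochastic usage at time $t$, $\beta$ a utilization threshold applied to the capacity $B$ (not the cost budget of Section 1), $\delta$ a hot-machine budget, and $\alpha$ the required confidence level (Sheng et al., 2022).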
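One way to read the per-machine constraint is as a statement that can be checked empirically by sampling usage trajectories. The sketch below is illustrative only and rests on assumptions: the trajectory sampler is a hypothetical stand-in for real telemetry or a simulator, the hot fraction is averaged over the number of sampled steps, and the outer probability is estimated by simple Monte Carlo.

```python
import random

def hot_fraction(usage, capacity, beta):
    """Fraction of time steps at which total usage reaches beta * capacity.

    usage[t][n] is the usage of workload n on this machine at step t.
    """
    hot_steps = sum(1 for step in usage if sum(step) >= beta * capacity)
    return hot_steps / len(usage)

def chance_constraint_holds(sample_trajectory, capacity, beta, delta, alpha,
                            runs=1000, seed=0):
    """Monte Carlo estimate of Pr[hot fraction < delta] >= alpha."""
    rng = random.Random(seed)
    ok = sum(hot_fraction(sample_trajectory(rng), capacity, beta) < delta
             for _ in range(runs))
    return ok / runs >= alpha

def sample_trajectory(rng, steps=50, workloads=4):
    """Hypothetical usage model: independent uniform usage per workload."""
    return [[rng.uniform(0.0, 3.0) for _ in range(workloads)]
            for _ in range(steps)]

holds = chance_constraint_holds(sample_trajectory, capacity=10.0, beta=0.8,
                                delta=0.2, alpha=0.95)
```

A check of this kind only validates a fixed allocation after the fact; the methods discussed in this article instead build the constraint into the optimization.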

In chance-constrained OSP, objectives and constraints explicitly reference probabilities or statistical guarantees, e.g., requiring that the bound on cumulative congestion cost holds with probability at least $1-\delta$ (Wang et al., 13 Jan 2024). Under a Gaussian model of the constrained quantity, such chance constraints admit deterministic reformulations, e.g.,

$\mu_t + m(\delta_t) \leq g_t,\qquad m(\delta_t) = -\Phi^{-1}(\delta_t)\sigma_t$

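where $\mu_t$ is the predicted mean usage, $\sigma_t$ is its estimated standard deviation, $g_t$ is the limit being enforced, and $\Phi$ is the standard normal CDF (Wang et al., 13 Jan 2024). For $\delta_t < 0.5$, $\Phi^{-1}(\delta_t)$ is negative, so the margin $m(\delta_t)$ is positive and grows as the tolerated violation probability shrinks.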
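The Gaussian reformulation reduces to a few lines of arithmetic. The following sketch uses Python's standard `statistics.NormalDist` for $\Phi^{-1}$; the function names and the numeric values are illustrative assumptions, and `limit` stands in for $g_t$.

```python
from statistics import NormalDist

def safety_margin(delta, sigma):
    """m(delta) = -Phi^{-1}(delta) * sigma for a standard normal Phi."""
    return -NormalDist().inv_cdf(delta) * sigma

def constraint_satisfied(mu, sigma, delta, limit):
    """Deterministic surrogate of the chance constraint: mu + m(delta) <= limit."""
    return mu + safety_margin(delta, sigma) <= limit

# Illustrative numbers: predicted mean 70, standard deviation 10, 5% tolerance.
margin = safety_margin(0.05, 10.0)                    # about 1.645 standard deviations
ok_loose = constraint_satisfied(70.0, 10.0, 0.05, 90.0)
ok_tight = constraint_satisfied(70.0, 10.0, 0.05, 85.0)
```

With these numbers the mean alone is below both limits, but only the looser limit leaves room for the margin, which is how the surrogate encodes the tolerated violation probability.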

3. Solution Methods: Symbolic, RL-based, and IL-based Planning

Symbolic search uses decision diagrams (BDDs/ADDs) to represent the set of states reachable at each cost layer compactly, so the search can explore large state spaces without enumerating states one by one and without admissible heuristics (Speck, 2022). This "blind search" is effective when heuristic evaluation for the composite cost-utility objective is computationally infeasible. For each cost $g=0,\ldots,B$, the algorithm explores all reachable states, tracks achieved utilities, and reconstructs plans with maximal soft-goal satisfaction within budget.
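The cost-layered exploration can be illustrated with explicit Python sets in place of decision diagrams. This is a simplified sketch under stated assumptions, not the symbolic algorithm itself: states are enumerated explicitly, action costs are constant integers of at least 1, plan reconstruction is omitted, and the function name and example task are hypothetical.

```python
def layered_osp_search(init, actions, hard_goals, utilities, budget):
    """Blind cost-layered search over explicit state sets.

    actions: list of (pre, add, delete, cost) with frozenset fluents and
    integer cost >= 1 (assumption). Returns the best soft-goal utility of any
    state that is reachable within budget and satisfies the hard goals, or
    None if no such state exists.
    """
    layers = {0: {init}}   # cost g -> states generated with cost exactly g
    closed = set()         # states already expanded at their cheapest cost
    best = None
    for g in range(budget + 1):
        for state in layers.pop(g, set()):
            if state in closed:
                continue
            closed.add(state)
            if hard_goals <= state:
                u = sum(v for f, v in utilities.items() if f in state)
                best = u if best is None else max(best, u)
            for pre, add, delete, cost in actions:
                if pre <= state and g + cost <= budget:
                    successor = (state - delete) | add
                    if successor not in closed:
                        layers.setdefault(g + cost, set()).add(successor)
    return best

fs = frozenset
actions = [
    (fs({"at_a"}), fs({"at_b"}), fs({"at_a"}), 2),   # move
    (fs({"at_b"}), fs({"have_photo"}), fs(), 3),     # take photo
    (fs({"at_b"}), fs({"have_sample"}), fs(), 4),    # collect sample
]
utilities = {"have_photo": 5, "have_sample": 6}
best_small = layered_osp_search(fs({"at_a"}), actions, fs({"at_b"}), utilities, 6)
best_large = layered_osp_search(fs({"at_a"}), actions, fs({"at_b"}), utilities, 9)
```

With the smaller budget only one soft goal is affordable, so the search settles for the more valuable one; the larger budget admits both. A symbolic implementation would replace each `layers[g]` set with a decision diagram and apply actions to whole sets at once.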

For stochastic, multi-agent settings such as cloud resource oversubscription:

The problem is formulated as a chance-constrained Decentralized Partially Observable Markov Decision Process (Dec-POMDP), with each agent (e.g., cloud subscriber) controlling allocation actions based on local observations (Sheng et al., 2022).
The multi-agent RL solution (C2MARL) employs Value-Decomposition Networks (VDN), primal-dual Lagrangian relaxation, and dynamic updates of penalty multipliers to optimize global utilization while maintaining probabilistic safety guarantees via cluster-level chance constraints, e.g.,

$\Pr_\pi\left[ \frac{1}{T} \sum_{t=0}^T \mathcal{C}_c(s_t) < \delta \right] \geq \alpha$

with $\mathcal{C}_c$ the cluster-hot indicator aggregated over all machines (Sheng et al., 2022).
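The primal-dual idea behind such methods can be sketched with a generic projected dual-ascent update on a single Lagrange multiplier. This is not the C2MARL update rule from the cited paper; the function names, the learning rate, and the toy response of the violation rate to the multiplier are all assumptions made for illustration.

```python
def dual_update(lmbda, violation_rate, delta, lr):
    """Raise the multiplier when the observed hot rate exceeds the budget delta,
    lower it otherwise, and project back onto lambda >= 0."""
    return max(0.0, lmbda + lr * (violation_rate - delta))

def penalized_reward(reward, hot, lmbda):
    """Reward seen by the learner: utilization reward minus the safety penalty."""
    return reward - lmbda * hot

def toy_violation_rate(lmbda):
    """Hypothetical stand-in for retraining the policy under a given penalty:
    a larger penalty makes the policy more conservative."""
    return 0.5 / (1.0 + lmbda)

lmbda, delta = 0.0, 0.1
for _ in range(500):
    lmbda = dual_update(lmbda, toy_violation_rate(lmbda), delta, lr=2.0)
```

In this toy setting the multiplier settles where the violation rate equals the budget $\delta$. In an actual RL loop the policy update (the primal step) and the multiplier update (the dual step) alternate, and the violation rate is estimated from rollouts.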

Imitation learning (COIN) minimizes policy loss against expert trajectories, enforcing stochastic safety via chance constraints on cumulative risk, using forward and backward value ensembles to robustly estimate aleatoric uncertainty (Wang et al., 13 Jan 2024). Deterministic surrogates for chance constraints are constructed by estimating uncertainty with ensemble value networks.

4. Temporal and Multidimensional Extensions in Cloud and Resource Scheduling

Practical oversubscription in cloud environments requires multidimensional (CPU, memory, network, storage) and temporal considerations. The Coach system formalizes placement as a vector bin-packing problem across discrete time windows. For each VM, both guaranteed (persistent allocation, PA) and variable (oversubscribed, VA) resource assignments are computed using percentile-based demand predictions (e.g., $P_{95}$), with constraints ensuring both burst- and window-based feasibility (Reidys et al., 19 Jan 2025):

$g_{i,r} = \max_{t \in TW} G_{i,r,t}$
$v_{i,r,t} = \max\{0, D_{i,r,t} - g_{i,r}\}$
Server constraints:

$\sum_{i\in I} x_{i,s}\, g_{i,r} \leq C_{s,r} \quad \forall s, r$
$\sum_{i\in I} x_{i,s}\, (g_{i,r} + v_{i,r,t}) \leq C_{s,r} \quad \forall s, r, t$

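where $D_{i,r,t}$ and $G_{i,r,t}$ are the predicted demand and guarantee of VM $i$ for resource $r$ in time window $t \in TW$, $x_{i,s}$ is the placement decision for VM $i$ on server $s$, and $C_{s,r}$ is the capacity of server $s$ for resource $r$.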
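The constraints above can be exercised with a small first-fit placement routine. The sketch below is a simplified illustration and not the Coach implementation: it handles a single resource dimension, assumes identical server capacities, and uses hypothetical function names and demand numbers.

```python
def split_demand(guarantee, demand):
    """Per-VM split for one resource: g = max_t G_t, v_t = max(0, D_t - g)."""
    g = max(guarantee)
    v = [max(0, d - g) for d in demand]
    return g, v

def fits(hosted, capacity):
    """Check both server constraints for a list of (g, v) pairs on one server."""
    total_g = sum(g for g, _ in hosted)
    if total_g > capacity:
        return False
    windows = len(hosted[0][1]) if hosted else 0
    return all(total_g + sum(v[t] for _, v in hosted) <= capacity
               for t in range(windows))

def first_fit(vms, capacity, num_servers):
    """Place each VM on the first server where both constraints still hold.
    Returns a list of server indices (None if a VM cannot be placed)."""
    servers = [[] for _ in range(num_servers)]
    placement = []
    for vm in vms:
        for index, hosted in enumerate(servers):
            if fits(hosted + [vm], capacity):
                hosted.append(vm)
                placement.append(index)
                break
        else:
            placement.append(None)
    return placement

# Two time windows; VM a peaks in the first window, VM b in the second.
vm_a = split_demand(guarantee=[2, 2], demand=[6, 2])
vm_b = split_demand(guarantee=[2, 2], demand=[2, 6])
vm_c = split_demand(guarantee=[3, 3], demand=[6, 6])
placement = first_fit([vm_a, vm_b, vm_c], capacity=8, num_servers=2)
```

Sizing each VM by its peak demand of 6 would not allow the first two VMs to share a server of capacity 8. Because their peaks fall in different windows, the per-window constraint lets them be co-located, which is the temporal complementarity the formulation is meant to capture.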

Memory oversubscription is handled by designating PA/VA pages, with hardware-enforced protection and periodic monitoring for page faults and pressure to trigger trimming, extension, or VM migration (Reidys et al., 19 Jan 2025). The system exploits temporal demand complementarity, using random-forest predictors and short-term models (EWMA, LSTM) to drive both placement and online mitigation.

5. Complexity, Expressivity, and Empirical Observations

OSP is PSPACE-complete, matching classical planning in worst-case complexity. Generic compilation to classical STRIPS models is nevertheless reported to be infeasible because encoding utilities and soft goals causes a combinatorial blowup (Speck, 2022). Symbolic search methods, particularly BDD/ADD-based approaches, natively represent the two-dimensional cost-utility objective and avoid the need for admissible heuristics that track both cost and utility.

Empirical studies on benchmark sets show that symbolic OSP search (uBDD variant) achieves better instance coverage and anytime performance than explicit uniform-cost search, translation-based net-benefit planners, and branch-and-bound search (Speck, 2022).

In stochastic settings, RL-based primal-dual algorithms with value decomposition scale to thousands of interacting agents and large state spaces. The reduction of $O(K)$ machine-level chance constraints to a single cluster constraint ($O(1)$) is essential for practical scalability and safety interpretability (Sheng et al., 2022). Imitation-learning-based approaches such as COIN provide policy guarantees under aleatoric uncertainty and efficiently exploit historical telemetry (Wang et al., 13 Jan 2024). Temporal and multidimensional extensions in production cloud systems achieve substantial increases (up to 26%) in VM placement density at negligible violation rates through fast, prediction-driven scheduling (Reidys et al., 19 Jan 2025).

6. Summary Table: Oversubscription Planning Formalisms

Model/Setting	Formalization Aspect	Resolution Technique
Symbolic OSP (Speck, 2022)	Cost-budget, hard+soft goals, utility	BDD/ADD symbolic uniform-cost search
Chance-constrained RL (Sheng et al., 2022)	Probabilistic safety, multi-agent, cluster constraints	C2MARL: Dec-POMDP, VDN, primal-dual RL
Imitation learning under uncertainty (Wang et al., 13 Jan 2024)	Chance constraint on cumulative risk, Gaussian surrogates, uncertainty ensembles	COIN: ensemble critics, behavioral cloning, analytic margin
Temporal, vector bin-pack (Reidys et al., 19 Jan 2025)	Time windows, per-resource PA/VA split, prediction-driven, physical server constraints	First-fit vector packing, RF/LSTM predictors, online mitigation

These frameworks collectively demonstrate that oversubscription planning bridges classical utility-maximizing goal satisfaction under budget, chance-constrained safety in stochastic environments, and large-scale, multidimensional, and temporal cloud resource orchestration. The field centers on principled mathematical formulations, tractable approximations or reductions, and scalable search and learning algorithms that support soft goals, partial satisfaction, and explicit risk controls.