Slapo: A Schedule Language

Updated 25 October 2025
  • The paper introduces Slapo as a formal scheduling language that decouples scheduling logic from system and model specifications through layered optimization and formal validation methods.
  • Slapo's design integrates dynamic and static graph primitives to enable incremental performance improvements, achieving up to 2.92× throughput gains on deep learning tasks.
  • It offers explainable scheduling via formal mappings to Petri nets and argumentation frameworks, facilitating transparent state-space exploration and SLO optimization.

Slapo is a schedule language developed for progressive optimization and formal modeling of scheduling tasks, with primary deployment in large-scale deep learning model training and broader potential for declarative scheduling scenario synthesis and explainable scheduling. It is distinguished by its capacity to decouple scheduling logic from system and model specification, its formal mapping to mathematical or logical frameworks such as Petri nets and argumentation, and its extensible set of high-level primitives. Slapo enables practitioners to incrementally layer performance optimizations, encode scheduling objectives (including Service Level Objectives, SLOs), and validate executable models for correctness, efficiency, and user constraints.

1. Conceptual Foundations and Formal Model

Slapo emerges at the intersection of domain-specific scheduling languages and deductive programming, allowing a separation of model specification and execution logic for scheduling problems. Inspired by declarative programming paradigms, Slapo models scheduling as a description of desired outcomes—rather than procedural steps—using high-level constructs such as objects, predicates, actions, initial and goal states. For scheduling problem domains, Slapo formalizes the model in a structure akin to the Planning Domain Definition Language (PDDL):

\mathcal{M}_{\text{pDDL}}(\text{pd}) = (\text{predicates},\, \text{actions},\, \text{objects},\, \text{initial\_state},\, \text{goal\_state})

Parallel to this, the scheduling logic may be interpreted as a Predicate Transition Petri Net (Pr/T Petri Net), supporting formal state-space analysis and solution synthesis:

\mathcal{PN} = (S,\, T,\, F,\, K,\, W,\, M_0)

where $S$ is the set of places, $T$ is the set of transitions (events/actions), $F$ is the set of arcs, $K$ is the capacity function, $W$ is the arc weight function, and $M_0$ is the initial marking.

The transformation from declarative specification to executable model is formalized with a commutative diagram:

\begin{CD} \mathcal{M}(\text{pd}) @>TR_j>> \mathcal{PDDL}_i @>TR_k>> \mathcal{PN}_p \end{CD}

This forms the basis for generating a reachability tree over the solution space and supports further abstraction to eFSMs or message sequence charts for protocol synthesis (Blaskovic et al., 2013).
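
A minimal Python sketch of this tuple and its firing semantics is given below. The arc set $F$ and weight function $W$ are folded into per-transition input/output weight maps and $K$ into a capacity map; the toy instance and all names are illustrative assumptions, modeling ordinary place/transition semantics rather than full Pr/T inscriptions.

```python
from dataclasses import dataclass

@dataclass
class PetriNet:
    """Minimal place/transition net (S, T, F, K, W, M0); F and W are folded
    into each transition's input/output weight maps, K into `capacity`."""
    places: set[str]                 # S
    transitions: dict[str, dict]     # T, with arcs F and weights W per transition
    capacity: dict[str, int]         # K
    marking: dict[str, int]          # current marking (initially M0)

    def enabled(self, t: str) -> bool:
        tr = self.transitions[t]
        return (all(self.marking[p] >= w for p, w in tr["in"].items()) and
                all(self.marking[p] + w <= self.capacity[p] for p, w in tr["out"].items()))

    def fire(self, t: str) -> dict[str, int]:
        """Return the successor marking reached by firing transition t."""
        assert self.enabled(t), f"transition {t} is not enabled"
        m = dict(self.marking)
        for p, w in self.transitions[t]["in"].items():
            m[p] -= w
        for p, w in self.transitions[t]["out"].items():
            m[p] += w
        return m

# Toy instance: one worker moving from "idle" to "busy".
net = PetriNet(
    places={"idle", "busy"},
    transitions={"start": {"in": {"idle": 1}, "out": {"busy": 1}}},
    capacity={"idle": 1, "busy": 1},
    marking={"idle": 1, "busy": 0},
)
print(net.enabled("start"), net.fire("start"))   # True {'idle': 0, 'busy': 1}
```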

2. Language Design: Primitives, Extensibility, and Abstraction

Slapo provides a suite of modular schedule primitives, each facilitating a different axis of optimization and model transformation. These primitives are classified by operational context, with an illustrative usage sketch following each group:

Dynamic Graph Primitives (PyTorch)

  • .replace(new_mod): Swap a model submodule for a high-performance or custom kernel.
  • .shard(param, axis): Partition model parameters for tensor parallelism.
  • .sync(type): Insert appropriate communication/synchronization operations (all-reduce, etc.).
  • .checkpoint(): Apply activation checkpointing at varying granularities.
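
The following is a hedged usage sketch of these dynamic-graph primitives on a toy PyTorch module, following the primitive names listed above (Chen et al., 2023). The submodule paths, keyword arguments, and the final build call are illustrative assumptions; the released library's exact signatures may differ.

```python
import torch.nn as nn
import slapo

# Structural toy standing in for one transformer block; names are illustrative.
class ToyBlock(nn.Module):
    def __init__(self, d=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=16)
        self.fc1 = nn.Linear(d, 4 * d)
        self.fc2 = nn.Linear(4 * d, d)

model = ToyBlock()
sch = slapo.create_schedule(model)      # wrap the model in a schedule object

# .replace: swap a submodule for a custom or high-performance implementation
# (here another nn.MultiheadAttention stands in for, e.g., a FlashAttention kernel).
sch["attn"].replace(nn.MultiheadAttention(1024, num_heads=16, batch_first=True))

# .shard: partition fc1's weight along the output dimension for tensor parallelism.
sch["fc1"].shard("weight", axis=0)

# .sync: insert the matching all-reduce so forward results stay correct.
sch["fc1"].sync("all_reduce")

# .checkpoint: trade recomputation for activation memory on a submodule.
sch["fc2"].checkpoint()

# A build/finalize step then materializes the optimized model (return values assumed).
opt_model, _ = slapo.build(sch)
```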

Static Graph Primitives

  • .trace(...): Convert a model subgraph to a static computational graph (torch.fx).
  • .fuse(subgraph, compiler): Operator fusion targeting improved memory and compute efficiency.
  • .pipeline_split(): Partition a model into pipeline stages for distributed training.
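
A corresponding sketch for the static-graph primitives follows. The toy module, the pattern-query helper (`sch.find`), the compiler string, and the call signatures are illustrative assumptions rather than the library's documented API.

```python
import torch.nn as nn
import slapo

# Toy MLP that gives the schedule something to trace; names are illustrative.
class ToyMLP(nn.Module):
    def __init__(self, d=1024):
        super().__init__()
        self.fc1, self.act, self.fc2 = nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

sch = slapo.create_schedule(ToyMLP())

# .trace: lower the module to a static torch.fx graph so that graph-level
# rewrites (fusion, pipeline cutting) become possible.
sch.trace()

# .fuse: fuse a matched subgraph (here fc1 -> act) with a backend compiler;
# the pattern-query helper and the compiler string are assumptions.
subgraph = sch.find("fc1")              # hypothetical pattern query
sch.fuse(subgraph, compiler="TorchScript")

# .pipeline_split: mark a cut point so the runtime can map stages onto devices.
sch.pipeline_split()
```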

Extensibility

A base primitive interface allows users to implement and register custom primitives, enabling arbitrary transformation logic, plug-in scheduling strategies, and integration of external optimization libraries.
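
The sketch below illustrates what such an extension point could look like. The base class `Primitive`, the `register_primitive` decorator, the `sch.mod` attribute, and `Int8Linear` are hypothetical names introduced only for illustration, not the library's documented interface.

```python
import torch.nn as nn
import slapo

Int8Linear = nn.Linear   # placeholder for a user-provided quantized linear module

# Hypothetical extension interface: `Primitive`, `register_primitive`, and the
# `sch.mod` attribute are assumed names used only for illustration.
@slapo.register_primitive()
class QuantizeLinears(slapo.Primitive):
    """Custom primitive: swap every nn.Linear under a schedule for an 8-bit stand-in."""
    name = "quantize_linears"

    @staticmethod
    def apply(sch):
        for child_name, child in sch.mod.named_children():
            if isinstance(child, nn.Linear):
                sch[child_name].replace(Int8Linear(child.in_features, child.out_features))

# Once registered, the primitive would be invoked like a built-in one:
#     sch.quantize_linears()
```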

3. Optimization Methodologies and Progressive Scheduling

Slapo excels in its progressive, "as-needed" optimization paradigm by incrementally stacking primitives on a vanilla model, maintaining both debuggability and programmatic transparency. Optimization techniques codified within the language include:

  • High-performance kernel substitution: e.g., replacing standard attention with FlashAttention via declarative .replace() calls.
  • 3D parallelism: Facilitates tensor (parameter sharding), data, and pipeline parallelism with appropriate orchestration and synchronization.
  • Activation checkpointing: Offers flexible trade-off between memory conservation and recomputation overhead for large model training.

The language supports stepwise optimization: a practitioner may begin with a plain PyTorch model, apply checkpointing, subsequently introduce parallelism by parameter sharding and synchronization primitives, and finally substitute high-performance kernels, all in a transparent and modular schedule specification (Chen et al., 2023).
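
This progression can be captured in a single schedule function whose stages are opt-in, as in the hedged sketch below; submodule paths and signatures are illustrative assumptions rather than the library's exact API.

```python
import slapo

def schedule_model(model, use_ckpt=False, shard_tp=False, fast_attn=None):
    """Progressively layer optimizations onto a plain PyTorch model.

    Every stage is opt-in, so the same function covers the whole path from
    an unmodified eager model to a fully scheduled one.
    """
    sch = slapo.create_schedule(model)

    # Stage 1: activation checkpointing only (cheapest change, easy to debug).
    if use_ckpt:
        sch["layer0"].checkpoint()

    # Stage 2: tensor parallelism; shard a weight, then insert the matching
    # synchronization so distributed numerics stay correct.
    if shard_tp:
        sch["layer0.fc1"].shard("weight", axis=0)
        sch["layer0.fc1"].sync("all_reduce")

    # Stage 3: kernel substitution, applied last once the simpler stages are
    # verified; `fast_attn` is any drop-in attention module supplied by the user.
    if fast_attn is not None:
        sch["layer0.attn"].replace(fast_attn)

    opt_model, _ = slapo.build(sch)   # finalize (return values assumed)
    return opt_model
```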

4. Explainable Scheduling and Argumentation

Building on frameworks for explainable scheduling, Slapo can, in principle, integrate formal argumentation layers. Three types of Abstract Argumentation Frameworks (AFs) are relevant:

  • Feasibility AFs: Model schedule assignments and conflicts; stable extensions correspond to feasible schedules.
  • Optimality AFs: Encode efficient/near-optimal schedules, utilizing properties like Single Exchange Property (SEP) and Pairwise Exchange Property (PEP):

\text{SEP}:~ C_i - C_{i'} \leq p_j,\qquad \text{PEP}:~ C_i + p_{j'} \leq C_{i'} + p_j

  • Fixed Decision AFs: Represent user-imposed constraints via attack graph modifications; allow tractable, certificate-like explanations.

Slapo could supplement its schedule descriptions with explanation-generating layers, providing natural language or formal justifications for schedule properties, and facilitate interactive “what-if” analysis with instantaneous feedback on constraint satisfaction (Čyras et al., 2018).
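
To make the feasibility-AF notion concrete, the self-contained sketch below enumerates stable extensions by brute force for a toy instance in which arguments are job-to-slot assignments and attacks encode slot and duplication conflicts. The encoding is an illustrative assumption, not taken from the cited framework.

```python
from itertools import combinations

def stable_extensions(arguments, attacks):
    """Enumerate stable extensions of an abstract argumentation framework.

    A set E is stable iff it is conflict-free and attacks every argument
    outside E. Brute force suffices for small illustrative AFs.
    """
    attacks = set(attacks)
    exts = []
    for r in range(len(arguments) + 1):
        for cand in combinations(sorted(arguments), r):
            E = set(cand)
            conflict_free = not any((a, b) in attacks for a in E for b in E)
            attacks_rest = all(any((a, b) in attacks for a in E)
                               for b in arguments - E)
            if conflict_free and attacks_rest:
                exts.append(E)
    return exts

# Toy feasibility AF: argument "j@m" means "job j runs in slot m"; assignments
# attack each other when they claim the same slot or schedule the same job twice.
args = {"j1@m1", "j1@m2", "j2@m1", "j2@m2"}
conflicts = [("j1@m1", "j2@m1"), ("j2@m1", "j1@m1"),   # slot m1 contention
             ("j1@m2", "j2@m2"), ("j2@m2", "j1@m2"),   # slot m2 contention
             ("j1@m1", "j1@m2"), ("j1@m2", "j1@m1"),   # j1 scheduled once
             ("j2@m1", "j2@m2"), ("j2@m2", "j2@m1")]   # j2 scheduled once

print(stable_extensions(args, conflicts))
# Each stable extension is a feasible schedule, e.g. {"j1@m1", "j2@m2"}.
```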

5. Multi-SLO Scheduling and Inference Optimization

Slapo can be extended to handle multi-SLO (Service Level Objective) scenarios as encountered in LLM inference serving (Huang et al., 2025). SLO-aware scheduling requires the following, illustrated in a sketch after the list:

  • Encoding per-request objectives, such as end-to-end latency, time-to-first-token (TTFT), and time-per-output-token (TPOT).
  • Computing predicted execution times via parameterized latency models:

t_p(b, l_i) = \alpha_p b l_i + \beta_p b + \gamma_p l_i + \delta_p

  • Utilizing scheduling algorithms such as simulated annealing for batch and priority assignment, maximizing the "goodput" metric:

G = \frac{n}{t}

where $n$ is the number of requests meeting SLOs and $t$ is total accumulated latency.
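
The sketch below ties these ingredients together: the parameterized latency model, the goodput metric, and a small simulated-annealing loop over request orderings. The coefficients, the SLO check, and the serial, fixed-batch execution model are illustrative assumptions, far simpler than a production serving engine.

```python
import math, random

# Illustrative latency-model coefficients (alpha_p, beta_p, gamma_p, delta_p).
ALPHA, BETA, GAMMA, DELTA = 1e-4, 2e-3, 5e-4, 3e-3

def t_p(batch_size: int, length: int) -> float:
    """Predicted step latency: alpha*b*l + beta*b + gamma*l + delta (seconds)."""
    return ALPHA * batch_size * length + BETA * batch_size + GAMMA * length + DELTA

def goodput(order, batch_size=4):
    """Toy serial executor: returns G = n / t over the given order, where n counts
    requests finishing within their SLO and t is total accumulated latency."""
    clock, met = 0.0, 0
    for req in order:
        clock += t_p(batch_size, req["length"])
        if clock <= req["slo"]:
            met += 1
    return met / clock if clock > 0 else 0.0

def anneal(requests, steps=2000, temp0=1.0):
    """Simulated annealing over request orderings, maximizing goodput."""
    order = list(requests)
    cur = best = goodput(order)
    best_order = list(order)
    for k in range(steps):
        temp = temp0 * (1.0 - k / steps) + 1e-9
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]        # propose a swap
        cand = goodput(order)
        if cand >= cur or random.random() < math.exp((cand - cur) / temp):
            cur = cand                                  # accept the move
            if cur > best:
                best, best_order = cur, list(order)
        else:
            order[i], order[j] = order[j], order[i]     # reject: undo the swap
    return best_order, best

random.seed(0)
requests = [{"length": random.randint(64, 2048), "slo": random.uniform(0.5, 4.0)}
            for _ in range(32)]
print("best goodput:", anneal(requests)[1])
```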

Slapo could expose schedule-level constructs for objective specification, latency modeling, and plug-in heuristic modules (e.g., simulated annealing) to optimize task ordering and batching; the cited multi-SLO approach reports up to a $5\times$ improvement in SLO attainment and a $31.6\%$ latency reduction compared to FCFS-based systems.

6. State Space Exploration and Solution Synthesis

For scheduling scenario synthesis, Slapo leverages state-space exploration techniques:

  • The reachability tree generated from the Petri net model enumerates all possible markings (system states) resulting from transition sequences.
  • Each branch is evaluated against goal constraints; valid solution paths are isolated when cumulative metrics (e.g., elapsed time $t_{\text{elapsed}} \le t_{\max}$) and logical objectives (e.g., CTL formulas) are satisfied, as sketched after this list.
  • For example, in the "4ws1tob-problem," only $16$ out of $824$ paths meet all constraints. These are visualized via eFSM or message sequence charts for further protocol or code artifact synthesis (Blaskovic et al., 2013).
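
The following self-contained sketch illustrates this style of exploration: breadth-first enumeration of markings from an initial state, pruning branches that exceed an elapsed-time budget and collecting those that satisfy a goal predicate. The tiny net, the per-transition durations, and the bound are illustrative assumptions.

```python
from collections import deque

# Toy net: each transition consumes/produces tokens and takes some time.
TRANSITIONS = {
    "load":    {"in": {"idle": 1},   "out": {"loaded": 1}, "dur": 2},
    "process": {"in": {"loaded": 1}, "out": {"done": 1},   "dur": 5},
    "unload":  {"in": {"done": 1},   "out": {"idle": 1},   "dur": 1},
}
M0 = {"idle": 1, "loaded": 0, "done": 0}
T_MAX = 10                                   # elapsed-time budget
GOAL = lambda m: m["done"] >= 1              # goal-state predicate

def enabled(m, tr):
    return all(m[p] >= w for p, w in tr["in"].items())

def fire(m, tr):
    m = dict(m)
    for p, w in tr["in"].items():  m[p] -= w
    for p, w in tr["out"].items(): m[p] += w
    return m

def reachability_paths():
    """Enumerate transition sequences whose markings satisfy GOAL within T_MAX."""
    solutions, frontier = [], deque([(M0, 0, [])])    # (marking, elapsed, path)
    while frontier:
        m, t, path = frontier.popleft()
        if GOAL(m):
            solutions.append((path, t))
            continue
        for name, tr in TRANSITIONS.items():
            if enabled(m, tr) and t + tr["dur"] <= T_MAX:
                frontier.append((fire(m, tr), t + tr["dur"], path + [name]))
    return solutions

print(reachability_paths())   # e.g. [(['load', 'process'], 7)]
```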

A plausible implication is that Slapo can be extended to generate comprehensive visualizations and skeletons for system or protocol design, grounded in formally validated solution sets.

7. Comparative Analysis, Results, and Future Directions

Slapo achieves notable throughput gains in large deep learning model training, reaching up to $2.92\times$ on 8 NVIDIA V100 GPUs and $1.41\times$ across 64 GPUs compared to DeepSpeed and Megatron-LM, as a direct result of systematic, layered scheduling optimizations (Chen et al., 2023).

Its separation of concerns preserves programmability and model introspection, overcoming limitations of monolithic or static-graph optimization frameworks. Limitations such as state-space explosion and integration challenges with low-level hardware systems remain active areas.

Future directions enumerated include:

  • Generalization of auto-scheduling methodologies akin to Ansor.
  • Expansion of primitive libraries and tighter distributed runtime integration.
  • Further abstraction for specifying complex multi-objective schedule constraints.
  • Exploration of SAT-based and other tractable formal verification methods for correctness and scalability.

This suggests that Slapo is positioned both as a practical schedule language for deep learning systems and as a foundation for formal, explainable scheduling in broader domains, with ongoing work to address scaling and usability in dynamic, heterogeneous environments.
