ReflecSched Architecture
- ReflecSched names two related scheduling designs: an LLM-powered framework for Dynamic Flexible Job-Shop Scheduling (DFJSP) and a reference architecture for datacenter scheduling, both built on hierarchical structure and clear modular decomposition.
- Its DFJSP design leverages LLM-guided, multi-horizon simulation to optimize job assignments and reduce token usage while mitigating myopic decision-making.
- The datacenter reference architecture formalizes scheduler stages and policies, ensuring robust, reproducible performance through explicit mechanism-policy separation.
ReflecSched is a dual-meaning term in scheduling research, denoting both a domain-specific LLM-powered framework for Dynamic Flexible Job-Shop Scheduling (DFJSP) (Cao et al., 3 Aug 2025) and a reference architecture for datacenter scheduling systems (Andreadis et al., 2018). The DFJSP instantiation leverages hierarchical multi-horizon simulation and LLM-guided reflection to address structural pitfalls in direct LLM scheduling, while the reference architecture formalizes the modular decomposition and workflow orchestration of complex datacenter schedulers. Both formulations prioritize explicit separation of mechanism and policy, hierarchical structure, and fine-grained workflow specification for robust, comparative, and reproducible scheduling.
1. Formal Specification and Compositional Structure
ReflecSched architectures are defined by detailed, explicit specification of pipeline components and their interactions. For DFJSP, the architecture partitions the pipeline into two sequential modules:
- Hierarchical Reflection Module: Activated on stochastic events (e.g., JobArrival, MachineBreakdown), this module ingests the current shop-floor state, performs multi-horizon simulations under diverse priority dispatching rules (PDRs), and synthesizes the resulting experience in natural language.
- Experience-Guided Decision-Making Module: Invoked at every decision point, it consumes both the current state and the distilled experience summary, prompting the LLM to select non-myopic actions.
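The control flow between the two modules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `ReflecSchedLoop`, the dictionary-based state and event records, and the `reflect`/`decide` callables (stand-ins for the LLM-backed modules) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Event types named in the text; the set itself is an assumption.
STOCHASTIC_EVENTS = {"JobArrival", "MachineBreakdown"}


@dataclass
class ReflecSchedLoop:
    """Hypothetical wiring of the reflection and decision modules."""
    reflect: Callable[[dict, dict], str]  # (state, event) -> experience text
    decide: Callable[[dict, str], str]    # (state, experience) -> action
    experience: str = ""                  # latest distilled experience

    def step(self, state: dict, event: Optional[dict] = None) -> str:
        # Reflection runs only when a stochastic event arrives.
        if event is not None and event.get("type") in STOCHASTIC_EVENTS:
            self.experience = self.reflect(state, event)
        # Decision-making runs at every decision point and reuses
        # the most recent experience summary.
        return self.decide(state, self.experience)
```

Keeping the experience summary as state on the loop object reflects the asymmetry described above: reflection is event-driven, while decisions are made at every step with whatever summary is currently cached.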
In the datacenter context, the ReflecSched reference architecture (RA) is formally characterized by four elements: Stages enumerates all workflow steps (Jobs J₁–J₇, Tasks T₁–T₁₂, Management M₁–M₆, Broker B, Resources R₁–R₇); a flow relation denotes workflow data and control flow between stages; Groups partitions stages into the four semantic components; and Policies denotes the set of selectable policy algorithms at key control points (Andreadis et al., 2018). Each stage is a function mapping inputs to outputs, plus possible side effects.
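One way to picture this four-part characterization is as a small data structure with a consistency check. The sketch below is illustrative only: the class name, the field name `flows`, the dictionary encodings, and the `validate` rules are assumptions, not definitions from Andreadis et al.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A stage maps an input record to an output record; side effects are allowed.
Stage = Callable[[dict], dict]


@dataclass
class ReferenceArchitecture:
    stages: Dict[str, Stage]
    flows: Set[Tuple[str, str]]               # (source, target); hypothetical name
    groups: Dict[str, List[str]]              # component -> member stages
    policies: Dict[str, Dict[str, Callable]]  # stage -> {policy name: algorithm}

    def validate(self) -> None:
        """Raise ValueError if the four parts are mutually inconsistent."""
        for src, dst in self.flows:
            if src not in self.stages or dst not in self.stages:
                raise ValueError(f"flow {src}->{dst} references an unknown stage")
        grouped = [s for members in self.groups.values() for s in members]
        if sorted(grouped) != sorted(self.stages):
            raise ValueError("groups must partition the stage set")
        unknown = set(self.policies) - set(self.stages)
        if unknown:
            raise ValueError(f"policies attached to unknown stages: {sorted(unknown)}")


# Two-stage toy instance; the identity stages and the policy are placeholders.
ra = ReferenceArchitecture(
    stages={"J1": lambda x: x, "R5": lambda x: x},
    flows={("J1", "R5")},
    groups={"Jobs": ["J1"], "Resources": ["R5"]},
    policies={"R5": {"FirstFit": min}},
)
ra.validate()
```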
2. Module Inputs, Outputs, and Stage Granularity
In the DFJSP-specific ReflecSched design, state and event flows are precisely structured:
- State: Split into immutable (job structures, machine specs, processing times) and mutable (available operations queue, machine statuses, current time) components.
- Event: JSON-like records triggering reflection, specifying the stochastic event type and relevant parameters.
- Simulation Traces: Sequences of state-action pairs and cost metrics over the planning horizons, retaining explicit PDR and Gantt timelines.
- Strategic Experience: A concise natural-language summary, distilled from simulation extremes (best/worst traced makespan trajectories).
- Final Action: Operation–machine assignment selected by the LLM, conditioned on the current state and the strategic experience.
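The immutable/mutable split and the JSON-like event record listed above could be modeled as below. This is a sketch under assumptions: every class, field, and function name is hypothetical, and the toy values are made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StaticState:
    """Immutable part: job structure and processing times.

    frozen=True blocks attribute reassignment; the dicts themselves
    are not deep-frozen in this sketch.
    """
    job_ops: Dict[str, Tuple[str, ...]]      # job -> ordered operations
    proc_time: Dict[Tuple[str, str], float]  # (operation, machine) -> time


@dataclass
class DynamicState:
    """Mutable part: clock, ready queue, machine statuses."""
    time: float = 0.0
    available_ops: List[str] = field(default_factory=list)
    machine_status: Dict[str, str] = field(default_factory=dict)


def make_event(event_type: str, time: float, **params) -> str:
    """Serialize a stochastic event as a JSON record."""
    return json.dumps({"type": event_type, "time": time, "params": params},
                      sort_keys=True)


static = StaticState(
    job_ops={"J1": ("O1", "O2")},
    proc_time={("O1", "M1"): 3.0, ("O1", "M2"): 5.0, ("O2", "M1"): 4.0},
)
dynamic = DynamicState(available_ops=["O1"],
                       machine_status={"M1": "idle", "M2": "idle"})
event = make_event("MachineBreakdown", 12.0, machine="M2")
dynamic.machine_status["M2"] = "down"  # only the mutable part changes
```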
For datacenter scheduling, the reference architecture requires every stage to define its input/output domains, policy dimensions, and side effects. Composite job and task flows are expressed as directed acyclic graphs over stages, e.g., incoming job admission (J₁–J₃), per-job/task iteration and setup (J₄–J₅, T₁–T₅), monitoring and cleanup (T₆–T₈, J₆–J₇), and resource management (R₁–R₇).
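To make the "directed acyclic graph over stages" idea concrete, the following sketch orders a handful of stages with Python's standard `graphlib` module. The specific edges are hypothetical and chosen only to mirror the admission, setup, monitoring, and cleanup phases mentioned above; the reference architecture defines the actual flows.

```python
from graphlib import CycleError, TopologicalSorter

# stage -> set of stages that must complete first (hypothetical edges)
DEPENDS_ON = {
    "J2": {"J1"},
    "J3": {"J2"},
    "J4": {"J3"},
    "T1": {"J4"},
    "T6": {"T1"},
    "J6": {"T6"},
}


def execution_order(depends_on):
    """Return one valid stage order; raises CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
```

A cyclic specification such as `{"A": {"B"}, "B": {"A"}}` makes `execution_order` raise `CycleError`, which is a cheap way to reject malformed flow definitions before running anything.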
3. Hierarchical and Reflective Mechanisms
The hierarchical reflection mechanism in DFJSP ReflecSched orchestrates multi-level simulation and experience synthesis:
- At each level (from coarse- to fine-grained), rollouts are generated using randomized base policies from the PDR pool.
- At the coarser levels, these are long-horizon rollouts; at the finest level, each candidate action is explored via short-horizon trajectory extensions.
- A cost is computed for each trace.
- The best and worst traces are selected for strategic experience distillation.
- Final decision prompts supply both the current context and the latest strategic experience to the LLM, ensuring global–local insight integration.
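The rollout-and-select steps above can be sketched on a toy problem. Everything here is an assumption made for illustration: the instance `OPS`, the two-rule PDR pool, the earliest-finish machine choice, and the use of (partial) makespan as the trace cost. Precedence constraints are omitted, and in the actual framework the selected traces would be handed to an LLM for summarization rather than returned as a dictionary.

```python
import random

# Toy instance (hypothetical): operation -> {machine: processing time}
OPS = {"O1": {"M1": 3, "M2": 5}, "O2": {"M1": 4, "M2": 2},
       "O3": {"M1": 6, "M2": 6}, "O4": {"M1": 1, "M2": 3}}


def spt(pending):  # shortest processing time first
    return min(pending, key=lambda o: (min(OPS[o].values()), o))


def lpt(pending):  # longest processing time first
    return max(pending, key=lambda o: (min(OPS[o].values()), o))


PDR_POOL = {"SPT": spt, "LPT": lpt}


def assign(free_at, op, machine):
    """Return new machine-release times after running op on machine."""
    new = dict(free_at)
    new[machine] += OPS[op][machine]
    return new


def rollout(free_at, pending, horizon, rng, first=None):
    """Simulate up to `horizon` PDR-driven assignments; return (cost, trace)."""
    free_at, pending, trace = dict(free_at), set(pending), []
    if first is not None:  # forced candidate action at the finest level
        op, m = first
        free_at = assign(free_at, op, m)
        pending.discard(op)
        trace.append(("given", op, m))
    for _ in range(horizon):
        if not pending:
            break
        name = rng.choice(sorted(PDR_POOL))  # randomized base policy
        op = PDR_POOL[name](pending)
        m = min(OPS[op], key=lambda k: (free_at[k] + OPS[op][k], k))
        free_at = assign(free_at, op, m)
        pending.discard(op)
        trace.append((name, op, m))
    return max(free_at.values()), trace  # cost = makespan reached so far


def reflect(free_at, pending, rng, n_long=4, short_h=2):
    """Collect best/worst traces at a coarse and a fine level."""
    long_traces = [rollout(free_at, pending, len(pending), rng)
                   for _ in range(n_long)]
    short_traces = [rollout(free_at, pending, short_h, rng, first=(op, m))
                    for op in sorted(pending) for m in sorted(OPS[op])]
    summary = {}
    for level, traces in (("long", long_traces), ("short", short_traces)):
        ranked = sorted(traces, key=lambda t: t[0])
        summary[level] = {"best": ranked[0], "worst": ranked[-1]}
    return summary


summary = reflect({"M1": 0, "M2": 0}, set(OPS), random.Random(0))
```

Seeding the generator keeps the sketch reproducible; the best/worst pair per level is the raw material that a reflection prompt would contrast.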
The datacenter RA implements hierarchy by defining global and local schedulers with identical stage-sets. Tasks are dispatched from global to local scopes (at M₂), with heavy per-task state and monitoring responsibilities off-loaded to local schedulers to achieve scalability, modularity, and fault isolation.
4. Policy/Mechanism Separation and Prompt Engineering
Both DFJSP ReflecSched and the datacenter RA enforce a strict separation of fixed mechanism and tunable policy at every decision locus:
- For example, resource selection stage R₅ is formalized as: first, a mechanism computes feasible allocations; second, a policy chooses among these (e.g., FirstFit, BestFit).
- In DFJSP, explicit prompt-engineering strategies are followed: zero-shot invocation, chain-of-thought cues, bullet-point summaries, and bounded context lengths (dynamic prompt length tokens versus baseline $3$k–$4$k tokens).
- Rollout, reflection, and decision stages employ different sampling temperatures (majority voting for reflection/BEST selection, deterministic inference for the final action).
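The mechanism/policy split at a resource-selection stage such as R₅ can be illustrated with a short sketch. The function names, the capacity model (a single integer per machine), and the tie-breaking behavior are assumptions for the example; only the FirstFit and BestFit policy names come from the text.

```python
from typing import Callable, Dict, List, Optional


def feasible_machines(free: Dict[str, int], demand: int) -> List[str]:
    """Mechanism: every machine with enough free capacity, in insertion order."""
    return [m for m, cap in free.items() if cap >= demand]


def first_fit(cands: List[str], free: Dict[str, int], demand: int) -> str:
    """Policy: take the first feasible machine."""
    return cands[0]


def best_fit(cands: List[str], free: Dict[str, int], demand: int) -> str:
    """Policy: take the feasible machine with the least leftover capacity."""
    return min(cands, key=lambda m: free[m] - demand)


Policy = Callable[[List[str], Dict[str, int], int], str]


def select_resource(free: Dict[str, int], demand: int,
                    policy: Policy) -> Optional[str]:
    """Fixed mechanism first, pluggable policy second."""
    cands = feasible_machines(free, demand)
    return policy(cands, free, demand) if cands else None


free = {"a": 8, "b": 4, "c": 16}
chosen_first = select_resource(free, 3, first_fit)  # "a"
chosen_best = select_resource(free, 3, best_fit)    # "b"
```

Swapping `first_fit` for `best_fit` changes the outcome without touching `feasible_machines`, which is the property the separation is meant to guarantee.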
A tabular summary of the inputs/outputs of key DFJSP ReflecSched modules is provided below.
| Module | Inputs | Outputs |
|---|---|---|
| Hierarchical Reflection | Current state, event | Strategic experience |
| Simulation (per level) | Current state, PDR policy pool, horizon | Rollout traces |
| Experience-Guided Decision | Current state, strategic experience | Next action |
5. Performance Metrics and Empirical Validation
The effectiveness of ReflecSched is evaluated through several key metrics:
- Relative Percentage Deviation (RPD): $\mathrm{RPD} = \frac{C_{\max} - C_{\max}^{\mathrm{best}}}{C_{\max}^{\mathrm{best}}} \times 100\%$, quantifying solution quality versus the best observed makespan.
- Win Rate (WR): compares instance-level success rates.
- Token Consumption: ReflecSched achieves 35% lower token usage per instance compared to LLM-direct baselines.
- Robustness: Performance is stable across model sizes (8B–32B) and problem scales (Small/Normal).
- Ablation Study: With and , multi-level reflection achieves RPD; single-level () plateaus at .
- Theoretical Guarantee: Under faithful reflection, ReflecSched’s decision policy is never worse than any constituent heuristic.
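The two quality metrics above can be computed as in the sketch below. It assumes the conventional definitions: RPD as the percentage gap to the best makespan observed on each instance, averaged over instances, and win rate as the fraction of instances on which a method matches that best makespan (ties count as wins). The source may define them differently, and the method names and makespans are made-up placeholders, not reported results.

```python
from typing import Dict, List


def best_per_instance(makespans: Dict[str, List[float]]) -> List[float]:
    """Best (lowest) makespan on each instance across all methods."""
    return [min(vals) for vals in zip(*makespans.values())]


def mean_rpd(makespans: Dict[str, List[float]]) -> Dict[str, float]:
    """Average percentage gap to the per-instance best, per method."""
    best = best_per_instance(makespans)
    return {
        method: sum(100.0 * (c - b) / b for c, b in zip(vals, best)) / len(best)
        for method, vals in makespans.items()
    }


def win_rate(makespans: Dict[str, List[float]]) -> Dict[str, float]:
    """Fraction of instances where the method attains the best makespan."""
    best = best_per_instance(makespans)
    return {
        method: sum(c == b for c, b in zip(vals, best)) / len(best)
        for method, vals in makespans.items()
    }


# Placeholder makespans for two hypothetical methods on two instances.
results = {"method_a": [100.0, 90.0], "method_b": [110.0, 90.0]}
rpd = mean_rpd(results)   # {"method_a": 0.0, "method_b": 5.0}
wins = win_rate(results)  # {"method_a": 1.0, "method_b": 0.5}
```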
For datacenter scheduling, simulation using trace replay and real-world “dry-run” experiments validate that underspecification of even a few policy stages yields up to variation in job makespan, highlighting the necessity of RA's fine-grained modularity.
6. Pitfall Mitigation and Architectural Advantages
DFJSP ReflecSched directly addresses three fundamental LLM scheduling pitfalls:
- Long-Context Paradox: Avoided by compressing static and simulated strategic data into the strategic experience, a concise text summary, rather than passing raw or full prompts.
- Under-utilization of Heuristics: Simulation trajectories are fully heuristic-driven, ensuring key PDR behavior is represented in distilled experience.
- Myopic Greed: Multi-horizon reflection exposes and encodes trade-offs, supporting global (non-myopic) decision-making.
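A decision prompt that carries a compressed experience summary instead of raw simulation output might be assembled as follows. This is a hypothetical sketch: the function name, the section labels, the whitespace word budget (a rough stand-in for a token limit), and the closing instruction are all assumptions rather than the prompts used in the paper.

```python
from typing import List


def build_decision_prompt(state_summary: str, experience: str,
                          candidates: List[str], max_words: int = 200) -> str:
    """Assemble a bounded prompt from state, experience, and candidate actions."""
    words = experience.split()
    if len(words) > max_words:
        # Truncate the experience so the prompt stays short.
        experience = " ".join(words[:max_words]) + " ..."
    lines = [
        "Current state:", state_summary, "",
        "Strategic experience:", experience, "",
        "Candidate actions:",
    ]
    lines += [f"- {c}" for c in candidates]
    lines += ["", "Think step by step, then answer with exactly one candidate action."]
    return "\n".join(lines)


prompt = build_decision_prompt(
    state_summary="t=12, M2 down, ready operations: O7, O9",
    experience="Prioritize operations that free up M2; defer processing on M3.",
    candidates=["O7->M5", "O9->M1"],
)
```

Because the experience text is capped independently of the simulation depth, adding more rollouts does not lengthen the final prompt.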
The datacenter RA's benefit is formally grounded in its minimal, complete decomposition of scheduler stages, strict responsibility separation, and ability to be mapped onto both classical and modern schedulers for reproducibility and comparability.
7. Illustrative Example and Significance
A representative DFJSP ReflecSched iteration proceeds as follows: a machine breakdown triggers hierarchical reflection; the current state is extracted and multi-level rollouts are executed; the LLM summarizes strategic differences (“prioritize operations that free up M2... defer processing on M3...”). The informed decision prompt leads the LLM to output an assignment (“Assign O7→M5 now”) that exhibits lookahead not present in direct policies (Cao et al., 3 Aug 2025).
The reference architecture is extensible, formally permitting both single-level and multi-level instantiations, and is agnostic to specific policy selections, making it suitable as a foundation for both theoretical analyses and practical system design/comparison across a range of cluster, cloud, and job-shop scheduling scenarios (Andreadis et al., 2018).