Ordering Requests Under Prompt Sharing in LLM Serving
Determine optimal request-ordering policies for LLM serving systems when prompt segments are shared across requests, weighing the batching gains from shared prefixes against prioritizing small standalone requests, with the goal of minimizing latency and resource underutilization.
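A minimal sketch of the trade-off, assuming a hypothetical Request record and a shortest-effective-work-first heuristic (all names and the cost model here are illustrative, not from the paper): grouping requests by shared prefix lets the prefix be prefilled once, amortizing its cost over the group, so a large enough group can undercut a small standalone request in per-request work.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    prefix_id: str | None  # shared-prompt group; None for a standalone request
    prefix_tokens: int     # length of the shared prefix (0 if standalone)
    unique_tokens: int     # tokens unique to this request


def order_requests(queue: list[Request]) -> list[Request]:
    """Serve batches in order of estimated per-request prefill work.

    Requests sharing a prefix form one batch, so the prefix is prefilled
    once and its cost is amortized over the batch; a standalone request
    pays its full prompt cost alone. Sorting batches by this amortized
    cost is an SJF-style rule that favors small standalone requests
    unless a shared prefix is spread over enough peers to undercut them.
    """
    batches: dict[object, list[Request]] = defaultdict(list)
    for r in queue:
        key = r.prefix_id if r.prefix_id is not None else ("solo", r.req_id)
        batches[key].append(r)

    def per_request_cost(batch: list[Request]) -> float:
        shared = batch[0].prefix_tokens          # identical within a group
        avg_unique = sum(r.unique_tokens for r in batch) / len(batch)
        return shared / len(batch) + avg_unique  # amortized prefix + own tokens

    return [r for batch in sorted(batches.values(), key=per_request_cost)
            for r in batch]
```

Under this rule, a 2,000-token prefix shared by ten requests with about 100 unique tokens each costs roughly 300 tokens of work per request, so the batch jumps ahead of a 500-token standalone request; with only two sharers the amortized cost is about 1,100, and the standalone request goes first. Whether any such static rule is near-optimal under arrivals and decode-phase contention is exactly the open question.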
References
Shared prompts can reduce the cost of the prefill phase when requests sharing the same context are batched; however, it remains unclear how best to order such requests.
— Queueing, Predictions, and LLMs: Challenges and Open Problems
(arXiv:2503.07545, Mitzenmacher et al., 10 Mar 2025), Section 4.2 (Adaptive Scheduling)