Learning-based Optimal Admission Control in a Single Server Queuing System (2212.11316v2)
Abstract: We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon each arrival whether to admit the arriving customer, based on the full history of observations of the queue-length of the system. Naor (1969, Econometrica) showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue-length is less than a predetermined threshold, and reject otherwise. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of Naor (1969). We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+\epsilon}(N))$ regret for any specified $\epsilon>0$ in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals.
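To make the full-information benchmark concrete, the following is a minimal sketch (not the paper's learning algorithm) that evaluates the long-run average profit of a static threshold policy and finds the best thresholds by brute-force search. It assumes known rates `lam` and `mu`, a reward `R` per service completion, and a holding cost `c` per unit time charged for every customer in the system, including the one in service; that cost convention, the function names, and the search cap `K_max` are assumptions made here for illustration.

```python
def average_profit(lam, mu, R, c, K):
    """Long-run average profit per unit time of the policy
    'admit iff the number in system is less than K' (hypothetical helper).

    Under this policy the number in system is a birth-death chain on
    {0, ..., K} with stationary probabilities proportional to (lam/mu)**i.
    """
    rho = lam / mu
    weights = [rho ** i for i in range(K + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    # Arrivals are admitted only when the system is not at the threshold.
    throughput = lam * (1.0 - pi[K])
    # Assumption: holding cost accrues for every customer in the system.
    mean_in_system = sum(i * p for i, p in enumerate(pi))
    return R * throughput - c * mean_in_system


def best_thresholds(lam, mu, R, c, K_max=200):
    """Brute-force search over K = 0, ..., K_max.

    Returns all maximizing thresholds (ties are possible) and the best profit.
    """
    profits = [average_profit(lam, mu, R, c, K) for K in range(K_max + 1)]
    best = max(profits)
    argmax = [K for K, p in enumerate(profits) if abs(p - best) <= 1e-12]
    return argmax, best


if __name__ == "__main__":
    # High holding cost: rejecting every arrival (K = 0) should be optimal.
    print(best_thresholds(lam=1.0, mu=2.0, R=1.0, c=5.0))
    # Large reward: the optimal thresholds should be non-zero.
    print(best_thresholds(lam=1.0, mu=2.0, R=10.0, c=1.0))
```

The two example calls correspond to the two regimes distinguished in the abstract: one where an optimal threshold is $0$, and one where all optimal thresholds are non-zero.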