Learning to Schedule Online Tasks with Bandit Feedback (2402.16463v1)

Published 26 Feb 2024 in cs.LG and cs.DC

Abstract: Online task scheduling plays an integral role in task-intensive applications such as cloud computing and crowdsourcing. Optimal scheduling can enhance system performance, typically measured by the reward-to-cost ratio, under some task arrival distribution. On one hand, both reward and cost depend on the task context (e.g., the evaluation metric) and remain black-box in practice, which makes them hard to model and thus unknown before decision making. On the other hand, task arrivals are sensitive to factors such as unpredictable system fluctuations, so a prior estimate or a conventional distributional assumption (e.g., Poisson arrivals) may fail. This raises another practical yet often neglected challenge: an uncertain task arrival distribution. Toward effective scheduling in a stationary environment with these uncertainties, we propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm. Specifically, DOL-RM integrates a learning module that forms optimistic estimates of the reward-to-cost ratio with a decision module that applies the Robbins-Monro method to implicitly learn the task arrival distribution while making scheduling decisions. Theoretically, DOL-RM achieves a convergence-gap guarantee and no-regret learning with sub-linear regret of $O(T^{3/4})$, the first such result for online task scheduling under an uncertain task arrival distribution with unknown reward and cost. Numerical results on a synthetic experiment and a real-world application demonstrate that DOL-RM achieves the best cumulative reward-to-cost ratio compared with state-of-the-art baselines.
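
The paper's full algorithm is not reproduced on this page, but the two ingredients the abstract names can be sketched. Below is a minimal, illustrative Python sketch, not the authors' implementation: an optimistic (UCB-style) learning module that upper-bounds reward and lower-bounds cost per action, combined with a Robbins-Monro stochastic-approximation iterate `theta` that tracks the reward-to-cost ratio without modeling the arrival distribution. The action set of size `K`, the `bonus` confidence radius, and the simulated feedback are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5         # hypothetical number of candidate scheduling actions
T = 10_000    # decision rounds

counts = np.zeros(K)          # pulls per action
mean_reward = np.zeros(K)     # empirical mean reward per action
mean_cost = np.ones(K)        # empirical mean cost; costs assumed in (0, 1]

theta = 0.0   # Robbins-Monro iterate tracking the reward-to-cost ratio

def bonus(t, n):
    # UCB-style confidence radius; the constant 2 is illustrative.
    return np.sqrt(2.0 * np.log(t + 2) / np.maximum(n, 1.0))

for t in range(T):
    # Double-optimistic estimates: upper-bound reward, lower-bound cost.
    ucb_r = mean_reward + bonus(t, counts)
    lcb_c = np.maximum(mean_cost - bonus(t, counts), 1e-3)

    # Dinkelbach-style surrogate: pick the action maximizing r - theta * c
    # under the optimistic estimates.
    a = int(np.argmax(ucb_r - theta * lcb_c))

    # Black-box feedback (simulated here; unknown to the learner a priori).
    r = rng.random() * (a + 1) / K
    c = 0.5 + 0.5 * rng.random()

    # Incremental updates of the empirical means.
    counts[a] += 1
    mean_reward[a] += (r - mean_reward[a]) / counts[a]
    mean_cost[a] += (c - mean_cost[a]) / counts[a]

    # Robbins-Monro step: drives E[r - theta * c] toward zero, so theta
    # tracks the attainable ratio without an explicit arrival model.
    eta = 1.0 / np.sqrt(t + 1)
    theta += eta * (r - theta * c)

print(f"estimated reward-to-cost ratio: {theta:.3f}")
```

The surrogate `ucb_r - theta * lcb_c` is one standard way to turn ratio maximization into a per-round linear objective; the paper's actual decision rule, confidence bounds, and step sizes may differ.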
