PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models (2307.03034v2)

Published 6 Jul 2023 in stat.ML and cs.LG

Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The player must operate through a feedback mechanism that is error-prone due to resource constraints and environmental or intrinsic noise. By establishing a general probabilistic model for the feedback/observation dynamics, we formulate the problem as a restless bandit with a countable belief state space, starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation laws (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
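
The countable-belief formulation lends itself to a small illustration. The sketch below is not the paper's algorithm: it assumes, purely as an example, a two-state Gilbert–Elliott arm whose one-bit feedback is flipped with probability eps, and ranks arms by a myopic (highest-belief) priority as a crude stand-in for the PCL-based Whittle index. All names (belief_update, myopic_priority, p01, p11, eps) are hypothetical.

    import numpy as np

    def belief_update(b, p01, p11, played, observed=None, eps=0.0):
        """One-step belief update for a two-state (bad/good) Markov arm.

        b        : current belief P(state = good)
        p01, p11 : transition probabilities P(good | bad), P(good | good)
        played   : whether the arm was activated this slot
        observed : noisy one-bit feedback (1 = 'good') when played
        eps      : probability the feedback bit is flipped (error-prone channel)
        """
        if played:
            # Bayes correction under the assumed symmetric observation-error model.
            like_good = (1.0 - eps) if observed == 1 else eps
            like_bad = eps if observed == 1 else (1.0 - eps)
            b = like_good * b / (like_good * b + like_bad * (1.0 - b))
        # Markov prediction step (passive arms evolve without feedback).
        return b * p11 + (1.0 - b) * p01

    def myopic_priority(beliefs, k):
        """Activate the k arms with the highest belief: a myopic baseline,
        not the paper's PCL-derived index."""
        return np.argsort(beliefs)[-k:]

    # Example: with eps > 0 an observed arm's belief no longer resets to 0 or 1,
    # so repeated plays generate a countable set of reachable beliefs.
    b = 0.5
    for obs in (1, 1, 0):
        b = belief_update(b, p01=0.2, p11=0.8, played=True, observed=obs, eps=0.1)

With eps = 0 the belief of an activated arm resets to 0 or 1 and the model reduces to the perfectly observed dynamic multichannel-access setting of reference 21; the error-prone case is what produces the countable belief space that the paper's approximation process truncates before applying the AG algorithm.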

References (36)
  1. Optimality of myopic sensing in multichannel opportunistic access. IEEE Transactions on Information Theory, 55(9):4040–4050.
  2. Linear programming in infinite-dimensional spaces: theory and applications. John Wiley & Sons.
  3. Bandit problems: sequential allocation of experiments. Monographs on Statistics and Applied Probability. London: Chapman and Hall.
  4. Bertsimas, D. (1995). The achievable region method in the optimal control of queueing systems; formulations, bounds and policies. Queueing Systems, 21:337–389.
  5. Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Mathematics of Operations Research, 21(2):257–306.
  6. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Operations Research, 48(1):80–90.
  7. A characterization of waiting time performance realizable by single-server queues. Operations Research, 28(3-part-ii):810–821.
  8. The region of achievable performance in a model of Klimov.
  9. The irrevocable multiarmed bandit problem. Operations Research, 59(2):383–399.
  10. Characterization and optimization of achievable performance in general queueing systems. Operations Research, 36(5):733–741.
  11. M/G/c queueing systems with multiple customer classes: Characterization and control of achievable performance under nonpreemptive priority rules. Management Science, 34(9):1121–1138.
  12. Four proofs of Gittins' multiarmed bandit theorem. Applied Probability Trust, 70:427.
  13. Analysis and synthesis of computer systems, volume 4. World Scientific.
  14. Multi-armed bandit allocation indices. John Wiley & Sons.
  15. Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B (Methodological), 41(2):148–164.
  16. Index policies for a class of discounted restless bandits. Advances in Applied Probability, 34(4):754–774.
  17. Portfolio allocation for bayesian optimization. In UAI, pages 327–336.
  18. Klimov, G. P. (1975). Time-sharing service systems. I. Theory of Probability & Its Applications, 19(3):532–551.
  19. Liu, K. (2021). Index policy for a class of partially observable Markov decision processes. arXiv preprint arXiv:2107.11939.
  20. Indexability and Whittle index for restless bandit problems involving reset processes. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 7690–7696. IEEE.
  21. Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Transactions on Information Theory, 56(11):5547–5567.
  22. Dynamic multichannel access with imperfect channel state detection. IEEE Transactions on Signal Processing, 58(5):2795–2808.
  23. Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Advances in Applied Probability, 33(1):76–98.
  24. Niño-Mora, J. (2002). Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93(3):361–413.
  25. Niño-Mora, J. (2006). Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M/G/1 queues. Mathematics of Operations Research, 31(1):50–84.
  26. Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP, 15(2):161–198.
  27. The complexity of optimal queueing network control. In Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory, pages 318–322. IEEE.
  28. Press, W. H. (2009). Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proceedings of the National Academy of Sciences, 106(52):22387–22392.
  29. Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527–535.
  30. Rudin, W. et al. (1976). Principles of mathematical analysis, volume 3. McGraw-Hill, New York.
  31. Portfolio choices with orthogonal bandit learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
  32. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071–1088.
  33. Sondik, E. J. (1978). The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2):282–304.
  34. On an index policy for restless bandits. Journal of Applied Probability, 27(3):637–648.
  35. Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A):287–298.
  36. A survey of dynamic spectrum access. IEEE Signal Processing Magazine, 24(3):79–89.
