PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models (2307.03034v2)
Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The player's actions must rely on a feedback mechanism that is error-prone due to resource constraints or to environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space, starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation laws (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
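The belief-state formulation in the abstract can be illustrated with a minimal sketch. This is not the paper's general observation model: it assumes a two-state Gilbert-Elliott arm with transition probabilities `p01`, `p11` and a symmetric observation-error probability `eps`, all hypothetical parameters chosen only to show how an error-prone observation drives a Bayesian belief update.

```python
def belief_update(b, played, obs=None, p01=0.2, p11=0.8, eps=0.1):
    """Return the next belief (probability that the arm is in state 1).

    b      : current belief that the underlying state is 1
    played : whether the arm was activated this slot
    obs    : noisy observation in {0, 1} if played, else None
    Illustrative assumptions (not from the paper): symmetric error
    probability eps, two-state Markov chain with P(0->1)=p01, P(1->1)=p11.
    """
    if played:
        # Bayes update: likelihood of the observation under each state.
        like1 = (1 - eps) if obs == 1 else eps        # P(obs | state 1)
        like0 = eps if obs == 1 else (1 - eps)        # P(obs | state 0)
        post = b * like1 / (b * like1 + (1 - b) * like0)
    else:
        post = b  # no feedback when the arm is passive
    # One-step Markov transition of the underlying state.
    return post * p11 + (1 - post) * p01
```

Because the posterior after each noisy observation takes one of countably many values, iterating this update generates the countable belief state space on which the restless bandit is defined.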
- Optimality of myopic sensing in multichannel opportunistic access. IEEE Transactions on Information Theory, 55(9):4040–4050.
- Linear programming in infinite-dimensional spaces: theory and applications. John Wiley & Sons.
- Bandit problems: sequential allocation of experiments (Monographs on Statistics and Applied Probability). London: Chapman and Hall.
- Bertsimas, D. (1995). The achievable region method in the optimal control of queueing systems; formulations, bounds and policies. Queueing Systems, 21:337–389.
- Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Mathematics of Operations Research, 21(2):257–306.
- Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Operations Research, 48(1):80–90.
- A characterization of waiting time performance realizable by single-server queues. Operations Research, 28(3-part-ii):810–821.
- The region of achievable performance in a model of Klimov.
- The irrevocable multiarmed bandit problem. Operations Research, 59(2):383–399.
- Characterization and optimization of achievable performance in general queueing systems. Operations Research, 36(5):733–741.
- M/G/c queueing systems with multiple customer classes: Characterization and control of achievable performance under nonpreemptive priority rules. Management Science, 34(9):1121–1138.
- Four proofs of Gittins’ multiarmed bandit theorem. Applied Probability Trust, 70:427.
- Analysis and synthesis of computer systems, volume 4. World Scientific.
- Multi-armed bandit allocation indices. John Wiley & Sons.
- Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B (Methodological), 41(2):148–164.
- Index policies for a class of discounted restless bandits. Advances in Applied Probability, 34(4):754–774.
- Portfolio allocation for Bayesian optimization. In UAI, pages 327–336.
- Klimov, G. P. (1975). Time-sharing service systems. I. Theory of Probability & Its Applications, 19(3):532–551.
- Liu, K. (2021). Index policy for a class of partially observable Markov decision processes. arXiv preprint arXiv:2107.11939.
- Indexability and Whittle index for restless bandit problems involving reset processes. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 7690–7696. IEEE.
- Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Transactions on Information Theory, 56(11):5547–5567.
- Dynamic multichannel access with imperfect channel state detection. IEEE Transactions on Signal Processing, 58(5):2795–2808.
- Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Advances in Applied Probability, 33(1):76–98.
- Niño-Mora, J. (2002). Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93(3):361–413.
- Niño-Mora, J. (2006). Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M/G/1 queues. Mathematics of Operations Research, 31(1):50–84.
- Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. Top, 15(2):161–198.
- The complexity of optimal queueing network control. In Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory, pages 318–322. IEEE.
- Press, W. H. (2009). Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proceedings of the National Academy of Sciences, 106(52):22387–22392.
- Robbins, H. (1952). Some aspects of the sequential design of experiments.
- Rudin, W. (1976). Principles of mathematical analysis, volume 3. McGraw-Hill, New York.
- Portfolio choices with orthogonal bandit learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071–1088.
- Sondik, E. J. (1978). The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2):282–304.
- On an index policy for restless bandits. Journal of Applied Probability, 27(3):637–648.
- Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A):287–298.
- A survey of dynamic spectrum access. IEEE Signal Processing Magazine, 24(3):79–89.