Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks (2403.01636v2)

Published 3 Mar 2024 in stat.ML and cs.LG

Abstract: Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on improved statistical efficiency by assuming a shared structure across tasks, exploration, a crucial aspect of RL, has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design such as $\epsilon$-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of myopic exploration in its wide practical applications. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample efficiency.
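
The policy-sharing scheme with myopic $\epsilon$-greedy exploration described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's algorithm: it assumes a tabular setting, a gym-style reset()/step() interface for each task, and a single Q-table shared across all tasks, with tasks drawn uniformly from a (presumed diverse) task set; all function names and hyperparameters are illustrative.

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def shared_policy_q_learning(tasks, n_states, n_actions, episodes=500,
                             horizon=50, epsilon=0.1, alpha=0.5, gamma=0.99,
                             seed=0):
    """Epsilon-greedy Q-learning with one Q-table shared across all tasks.

    `tasks` is assumed to be a list of environments exposing reset() -> state
    and step(action) -> (next_state, reward, done); this interface is an
    assumption for illustration, not part of the paper.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))  # one table shared across the task set

    for _ in range(episodes):
        # Sample a task uniformly; an automatic curriculum could bias this choice.
        env = tasks[rng.integers(len(tasks))]
        state = env.reset()
        for _ in range(horizon):
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = env.step(action)
            # Standard one-step Q-learning update on the shared table.
            target = reward + (0.0 if done else gamma * np.max(q[next_state]))
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
            if done:
                break
    return q
```

Under these assumptions, the only exploration mechanism is the $\epsilon$-greedy randomization; the diversity of the sampled tasks, rather than any explicit bonus or optimism, is what the paper argues makes such myopic exploration sample-efficient.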
