Data-Driven Knowledge Transfer in Batch $Q^*$ Learning (2404.15209v2)
Abstract: In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a Transferred Fitted $Q$-Iteration framework with general function approximation, enabling direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on how source and target sample sizes and task discrepancy affect the effectiveness of knowledge transfer. We show, both theoretically and empirically, that the final learning error of the $Q^*$ function improves significantly on the single-task rate.