A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning (2402.16899v3)
Abstract: Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, cannot directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method applicable to all such problems whose transition functions satisfy semi-group and Lipschitz properties. Under this method, we can directly analyze the \emph{a priori} generalization error of the Bellman optimal loss. The core of the method lies in two transformations of the loss function; to complete these transformations, we propose a decomposition method for the maximum operator. Moreover, the analysis requires no boundedness assumption. Finally, we obtain an \emph{a priori} generalization error bound free of the curse of dimensionality.
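To make the abstract's assumptions concrete, here is a minimal sketch in standard continuous-time RL notation. The symbols $\phi_t$, $Q_\theta$, $\gamma$, and $L$, as well as the closing remark on the maximum operator, are our own illustrative choices; they need not match the paper's exact definitions or its particular decomposition of the maximum operator.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Semi-group property: evolving the state for time t and then for
% time s is the same as evolving it once for time t + s.
\[ \phi_{t+s}(x) = \phi_{s}\bigl(\phi_{t}(x)\bigr), \qquad \phi_{0}(x) = x. \]
% Lipschitz property: the transition does not expand distances
% between states by more than a constant factor L.
\[ \|\phi_{t}(x) - \phi_{t}(y)\| \le L\,\|x - y\|. \]
% Bellman optimal loss for a Q-function approximator Q_\theta,
% with reward r, discount factor \gamma, and next state s':
\[ \mathcal{L}(\theta) = \mathbb{E}\Bigl[\bigl(Q_\theta(s,a) - r(s,a)
   - \gamma \max_{a'} Q_\theta(s',a')\bigr)^{2}\Bigr]. \]
% An elementary fact often used when controlling the max term:
% the pointwise maximum is 1-Lipschitz, i.e.
% |max_a f(a) - max_a g(a)| <= max_a |f(a) - g(a)|.
\end{document}
```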