
Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control (2211.15930v3)

Published 29 Nov 2022 in math.OC and cs.AI

Abstract: This work is concerned with efficiently solving for neural network-based feedback controllers in optimal control problems. We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization. Although the training part of the supervised learning approach is relatively easy, the success of the method heavily depends on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem directly into an optimization problem without any pre-computation, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results underscore the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenge of each approach, dataset generation for supervised learning and optimization difficulty for direct policy optimization, we combine the two and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves performance and robustness significantly. Our code is accessible at https://github.com/yzhao98/DeepOptimalControl.
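The following is a minimal sketch of the Pre-train and Fine-tune paradigm described in the abstract, not the authors' implementation. The policy network, the placeholder dynamics `f`, the running cost, the Euler discretization, and all hyperparameters are illustrative assumptions; in the paper, the pre-training dataset would come from open-loop optimal control solvers and the fine-tuning objective from the actual problem dynamics.

```python
import torch
import torch.nn as nn

# Hypothetical feedback policy: maps state x to control u.
class PolicyNet(nn.Module):
    def __init__(self, state_dim, control_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, control_dim),
        )

    def forward(self, x):
        return self.net(x)

# Placeholder dynamics and running cost (problem-specific in practice):
# dx/dt = f(x, u), cost = integral of l(x, u) dt.
def f(x, u):
    return -x + u  # illustrative linear dynamics

def running_cost(x, u):
    return (x ** 2).sum(dim=-1) + (u ** 2).sum(dim=-1)

policy = PolicyNet(state_dim=2, control_dim=2)

# Stage 1: pre-train (offline supervised learning).
# states, controls: optimal state-control pairs pre-computed by an
# open-loop optimal control solver (the "dataset" the abstract refers to).
def pretrain(states, controls, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(states), controls)
        loss.backward()
        opt.step()

# Stage 2: fine-tune (online direct policy optimization).
# Roll the closed-loop system forward with an Euler discretization and
# minimize the accumulated cost by backpropagating through the rollout.
def finetune(x0_batch, T=1.0, steps=50, epochs=100, lr=1e-4):
    dt = T / steps
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        x = x0_batch
        total_cost = torch.zeros(x.shape[0])
        for _ in range(steps):
            u = policy(x)
            total_cost = total_cost + running_cost(x, u) * dt
            x = x + f(x, u) * dt
        loss = total_cost.mean()
        loss.backward()
        opt.step()
```

In this sketch, pre-training gives the policy a good initialization from solver-generated data, and fine-tuning then optimizes the dynamics-related objective directly, which reflects the complementary roles the abstract assigns to the two approaches.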
