Multi-Fidelity Reinforcement Learning for Time-Optimal Quadrotor Re-planning (2403.08152v1)
Abstract: High-speed online trajectory planning for UAVs poses a significant challenge due to the need for precise modeling of complex dynamics while also being constrained by computational limitations. This paper presents a multi-fidelity reinforcement learning method (MFRL) that aims to effectively create a realistic dynamics model and simultaneously train a planning policy that can be readily deployed in real-time applications. The proposed method involves the co-training of a planning policy and a reward estimator; the latter predicts the performance of the policy's output and is trained efficiently through multi-fidelity Bayesian optimization. This optimization approach models the correlation between different fidelity levels, thereby constructing a high-fidelity model based on a low-fidelity foundation, which enables the accurate development of the reward model with limited high-fidelity experiments. The framework is further extended to include real-world flight experiments in reinforcement learning training, allowing the reward model to precisely reflect real-world constraints and broadening the policy's applicability to real-world scenarios. We present rigorous evaluations by training and testing the planning policy in both simulated and real-world environments. The resulting trained policy not only generates faster and more reliable trajectories compared to the baseline snap minimization method, but it also achieves trajectory updates in 2 ms on average, while the baseline method takes several minutes.
Authors: Gilhyun Ryou, Geoffrey Wang, Sertac Karaman