Multi-Fidelity Reinforcement Learning for Time-Optimal Quadrotor Re-planning (2403.08152v1)

Published 13 Mar 2024 in cs.RO

Abstract: High-speed online trajectory planning for UAVs poses a significant challenge due to the need for precise modeling of complex dynamics while also being constrained by computational limitations. This paper presents a multi-fidelity reinforcement learning method (MFRL) that aims to effectively create a realistic dynamics model and simultaneously train a planning policy that can be readily deployed in real-time applications. The proposed method involves the co-training of a planning policy and a reward estimator; the latter predicts the performance of the policy's output and is trained efficiently through multi-fidelity Bayesian optimization. This optimization approach models the correlation between different fidelity levels, thereby constructing a high-fidelity model based on a low-fidelity foundation, which enables the accurate development of the reward model with limited high-fidelity experiments. The framework is further extended to include real-world flight experiments in reinforcement learning training, allowing the reward model to precisely reflect real-world constraints and broadening the policy's applicability to real-world scenarios. We present rigorous evaluations by training and testing the planning policy in both simulated and real-world environments. The resulting trained policy not only generates faster and more reliable trajectories compared to the baseline snap minimization method, but it also achieves trajectory updates in 2 ms on average, while the baseline method takes several minutes.
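
The central multi-fidelity idea in the abstract, building an accurate high-fidelity reward model on top of a cheap low-fidelity one using only a handful of expensive experiments, can be illustrated with a small two-fidelity Gaussian-process sketch. This is not the authors' implementation: the toy reward functions, the sample sizes, the scikit-learn GPs, and the simple residual-correction model are all illustrative assumptions, and the policy co-training and fidelity-selection (acquisition) steps described in the paper are omitted.

```python
# Minimal two-fidelity reward-model sketch (illustrative, not the paper's code).
# A GP fit to plenty of cheap low-fidelity data is corrected by a second GP
# fit to the discrepancy observed at a few expensive high-fidelity points.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def reward_low(x):   # stand-in for a cheap simulator (many queries allowed)
    return np.sin(3 * x) * x

def reward_high(x):  # stand-in for an expensive experiment (few queries allowed)
    return reward_low(x) + 0.3 * np.cos(5 * x)

# Abundant low-fidelity data, only a handful of high-fidelity evaluations.
X_lo = rng.uniform(0, 2, size=(40, 1)); y_lo = reward_low(X_lo).ravel()
X_hi = rng.uniform(0, 2, size=(5, 1));  y_hi = reward_high(X_hi).ravel()

# Low-fidelity GP trained on the cheap data.
gp_lo = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_lo, y_lo)

# Model only the residual between fidelities at the expensive points.
resid = y_hi - gp_lo.predict(X_hi)
gp_delta = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_hi, resid)

def reward_estimate(x):
    """High-fidelity prediction = low-fidelity GP + GP-learned correction."""
    x = np.atleast_2d(x)
    mu_lo = gp_lo.predict(x)
    mu_d, sd_d = gp_delta.predict(x, return_std=True)
    return mu_lo + mu_d, sd_d  # mean estimate and correction uncertainty

X_test = np.linspace(0, 2, 5).reshape(-1, 1)
mu, sd = reward_estimate(X_test)
print(np.c_[X_test, mu, reward_high(X_test)])  # estimate vs. true high fidelity
```

In the paper's setting the "reward" corresponds to the quality (e.g., feasibility and flight time) of a trajectory proposed by the planning policy, and the fidelity levels range from simplified simulation to real-world flight; the uncertainty returned by the correction model is what a multi-fidelity Bayesian optimization loop would use to decide which fidelity to query next.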

Authors (3)
  1. Gilhyun Ryou (6 papers)
  2. Geoffrey Wang (1 paper)
  3. Sertac Karaman (77 papers)
Citations (1)
