Space Processor Computation Time Analysis for Reinforcement Learning and Run Time Assurance Control Policies (2405.06771v1)
Abstract: As the number of spacecraft on orbit continues to grow, it is becoming infeasible for human operators to constantly monitor and plan for every mission. Autonomous control methods such as reinforcement learning (RL) can solve complex tasks while reducing the need for constant operator intervention, and combining RL solutions with run time assurance (RTA) allows the safety of these systems to be assured in real time. However, to be used on board a spacecraft, these algorithms must run in real time on space-grade processors, which are typically outdated and far less capable than state-of-the-art equipment. In this paper, multiple RL-trained neural network controllers (NNCs) and RTA algorithms were tested on commercial-off-the-shelf (COTS) and radiation-tolerant processors. The results show that all NNCs and most RTA algorithms compute optimal, safe actions in well under 1 second, leaving room for further optimization before real-world deployment.
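To make the measurement concrete, the sketch below shows one way to benchmark the per-step computation time of a trained policy against a real-time budget, assuming the network has been exported to ONNX and is executed with ONNX Runtime on the target processor's CPU. The model filename, observation shape, and sample count are illustrative placeholders, not the paper's actual artifacts.

```python
import statistics
import time

import numpy as np
import onnxruntime as ort

# Load the exported policy network. "policy.onnx" and the 4-element
# observation below are hypothetical placeholders for this sketch.
session = ort.InferenceSession("policy.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
obs = np.zeros((1, 4), dtype=np.float32)  # dummy observation vector

# Warm up once so one-time graph initialization cost is excluded.
session.run(None, {input_name: obs})

samples = []
for _ in range(1000):
    start = time.perf_counter()
    session.run(None, {input_name: obs})  # one controller step
    samples.append(time.perf_counter() - start)

print(f"mean:  {statistics.mean(samples) * 1e3:.3f} ms")
print(f"worst: {max(samples) * 1e3:.3f} ms")  # compare against the real-time budget
```

For a safety-critical deployment, the worst-case sample matters more than the mean, since a single late RTA intervention can violate a constraint; reporting both, as above, is one reasonable convention.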