Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Published 4 May 2024 in cs.LG, cs.AI, cs.NE, cs.SY, and eess.SY (arXiv:2405.03711v1)

Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals; hence, guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL, while the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, subject to the constraint imposed by the given evasion distance. This formulation yields an irregular, extremely large-scale dynamic max-min problem, in which the time instant at which the optimal solution is attained is uncertain and the optimum depends on all the intermediate guidance commands generated beforehand. To solve this problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we invoke an evolution strategy (ES) based algorithm, which uses the PPO result as its initial value, to further improve the solution quality by searching in the local space. Simulation results demonstrate that the proposed PPO-based guidance design method achieves a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7%, achieving a residual velocity of 69.04 m/s.
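To make the two-step strategy concrete, the sketch below illustrates the second step: a simple (1+λ) evolution strategy that perturbs a policy parameter vector with Gaussian noise and keeps the best offspring, where the initial vector plays the role of the parameters produced by PPO in the first step. This is a minimal sketch under stated assumptions, not the authors' implementation: the `EvasionEnv` simulator, the penalty-based handling of the evasion-distance constraint, and all hyperparameters are hypothetical stand-ins.

```python
# Minimal sketch of the paper's second step: refining a PPO-trained policy
# with a (1+lambda) evolution strategy. EvasionEnv, the constraint penalty,
# and all hyperparameters are illustrative assumptions, not the paper's code.

import numpy as np


class EvasionEnv:
    """Hypothetical stand-in for the EFV-vs-PFV engagement simulator.

    rollout(theta) flies one engagement with policy parameters `theta` and
    returns (residual_velocity, evasion_distance)."""

    def rollout(self, theta: np.ndarray) -> tuple[float, float]:
        # Placeholder dynamics: a real implementation would integrate the
        # EFV/PFV equations of motion, with the PFV steered by proportional
        # navigation and the EFV steered by the parameterized policy.
        rng = np.random.default_rng(abs(hash(theta.tobytes())) % 2**32)
        return 60.0 + 10.0 * float(np.tanh(theta.sum())), 30.0 + rng.uniform(0, 5)


def fitness(env: EvasionEnv, theta: np.ndarray, d_min: float = 30.0) -> float:
    """Residual velocity, with a large penalty when the evasion-distance
    constraint is violated (one common way to fold in the constraint)."""
    v_res, d_evasion = env.rollout(theta)
    return v_res if d_evasion >= d_min else v_res - 1e3 * (d_min - d_evasion)


def es_refine(env, theta0, sigma=0.05, lam=16, iters=200, seed=0):
    """(1+lambda) ES: sample `lam` Gaussian perturbations of the incumbent
    and keep the best offspring whenever it improves the fitness."""
    rng = np.random.default_rng(seed)
    best, best_f = theta0.copy(), fitness(env, theta0)
    for _ in range(iters):
        offspring = best + sigma * rng.standard_normal((lam, best.size))
        scores = np.array([fitness(env, x) for x in offspring])
        if scores.max() > best_f:
            best, best_f = offspring[scores.argmax()].copy(), scores.max()
    return best, best_f


if __name__ == "__main__":
    env = EvasionEnv()
    theta_ppo = np.zeros(8)  # stand-in for the PPO-trained parameters
    theta_star, v = es_refine(env, theta_ppo)
    print(f"refined residual velocity (toy model): {v:.2f} m/s")
```

In the paper's actual pipeline, the first step would train the policy network with PPO (clipped-surrogate updates), after which a local search of this kind would polish the resulting parameters in the neighborhood of the PPO solution.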
