Multi-Agent Probabilistic Ensembles with Trajectory Sampling for Connected Autonomous Vehicles (2312.13910v3)

Published 21 Dec 2023 in cs.RO, cs.LG, and cs.MA

Abstract: Autonomous Vehicles (AVs) have attracted significant attention in recent years, and Reinforcement Learning (RL) has shown remarkable performance in improving vehicle autonomy. In that regard, the widely adopted Model-Free RL (MFRL) promises to solve decision-making tasks in connected AVs (CAVs), contingent on the availability of a significant amount of data samples for training. Nevertheless, this requirement might be infeasible in practice and may lead to learning instability. In contrast, Model-Based RL (MBRL) offers sample-efficient learning, but its asymptotic performance might lag behind state-of-the-art MFRL algorithms. Furthermore, most studies for CAVs are limited to the decision-making of a single AV only, thus undermining performance due to the absence of communications. In this study, we address the decision-making problem of multiple CAVs with limited communications and propose a decentralized Multi-Agent Probabilistic Ensembles with Trajectory Sampling algorithm, MA-PETS. In particular, to better capture the uncertainty of the unknown environment, MA-PETS leverages Probabilistic Ensemble (PE) neural networks to learn from samples communicated among neighboring CAVs. Afterwards, MA-PETS develops Trajectory Sampling (TS)-based model-predictive control for decision-making. On this basis, we derive the multi-agent group regret bound affected by the number of agents within the communication range and mathematically validate that incorporating effective information exchange among agents into the multi-agent learning scheme contributes to reducing the group regret bound in the worst case. Finally, we empirically demonstrate the superiority of MA-PETS in terms of sample efficiency comparable to MFRL.
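
The pipeline the abstract describes, fitting a probabilistic ensemble of dynamics models and planning with Trajectory Sampling (TS)-based model-predictive control, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the linear-Gaussian ensemble members, the random-shooting planner (PETS-style MPC is typically implemented with the cross-entropy method), the toy reward, and all hyperparameters below are assumptions chosen only to keep the example short and runnable.

```python
"""Illustrative sketch only (not the authors' code): a PETS-style planner that
propagates particles through a probabilistic ensemble via Trajectory Sampling
(TS) and scores random action sequences (MPC by random shooting)."""
import numpy as np

rng = np.random.default_rng(0)

class ProbabilisticLinearModel:
    """Stand-in for one ensemble member: samples the next state from a Gaussian.
    The paper uses probabilistic ensemble (PE) neural networks; a linear-Gaussian
    model keeps this sketch short and dependency-free."""
    def __init__(self, state_dim, act_dim, noise=0.05):
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + act_dim))
        self.noise = noise

    def sample_next(self, s, a):
        mean = s + self.W @ np.concatenate([s, a])  # predict a state delta
        return mean + rng.normal(scale=self.noise, size=s.shape)

def trajectory_sampling_return(ensemble, s0, actions, reward_fn, n_particles=20):
    """TS: each particle re-draws an ensemble member at every step and is rolled
    forward; the return averaged over particles scores the action sequence."""
    returns = np.zeros(n_particles)
    for p in range(n_particles):
        s = s0.copy()
        for a in actions:
            model = ensemble[rng.integers(len(ensemble))]
            s = model.sample_next(s, a)
            returns[p] += reward_fn(s, a)
    return returns.mean()

def mpc_plan(ensemble, s0, reward_fn, horizon=10, n_candidates=200, act_dim=2):
    """Random-shooting MPC: sample candidate action sequences, score them with
    TS rollouts, and execute only the first action of the best sequence."""
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, act_dim))
        score = trajectory_sampling_return(ensemble, s0, actions, reward_fn)
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

if __name__ == "__main__":
    state_dim, act_dim = 4, 2
    # One ensemble per CAV; in MA-PETS each agent would also fit its ensemble on
    # transitions communicated by neighbors within range (not modeled here).
    ensemble = [ProbabilisticLinearModel(state_dim, act_dim) for _ in range(5)]
    reward_fn = lambda s, a: -np.linalg.norm(s) - 0.1 * np.linalg.norm(a)  # toy reward
    action = mpc_plan(ensemble, np.ones(state_dim), reward_fn)
    print("first planned action:", action)
```

In MA-PETS proper, each CAV would additionally merge transitions received from neighboring vehicles within its communication range into its own buffer before refitting the ensemble; that information exchange is the mechanism behind the group-regret reduction argued in the abstract.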

