
A Safe Preference Learning Approach for Personalization with Applications to Autonomous Vehicles (2311.02099v4)

Published 30 Oct 2023 in cs.AI, cs.SY, and eess.SY

Abstract: This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.
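The core learning problem in the abstract — finding a weight valuation under which every preferred signal scores a higher weighted quantitative satisfaction than its non-preferred counterpart — can be illustrated with a small sketch. This is not the paper's algorithm; it assumes a simple weighted-conjunction semantics (weighted robustness = the minimum of weight-scaled subformula robustness values) and replaces the paper's PWSTL feasibility formulation with a naive grid search. The robustness vectors and grid values below are made-up illustrative numbers.

```python
import itertools

# Hypothetical weighted robustness of a conjunction of STL subformulas:
# rho_w(signal) = min_i w_i * rho_i(signal), with weights w_i > 0.
# The rho_i are plain STL robustness values, assumed precomputed per signal.
def weighted_robustness(weights, rhos):
    return min(w * r for w, r in zip(weights, rhos))

# Pairwise comparisons from the (hypothetical) human-subject data: each tuple
# holds the subformula robustness vectors (preferred, non-preferred).
comparisons = [
    ([0.8, 0.2], [0.5, 0.6]),
    ([0.9, 0.3], [0.3, 0.8]),
]

def satisfies_all(weights):
    # A valuation is feasible when every preferred signal strictly outscores
    # its non-preferred counterpart under the weighted semantics.
    return all(
        weighted_robustness(weights, pref) > weighted_robustness(weights, non)
        for pref, non in comparisons
    )

# Naive feasibility search over a small weight grid; the paper instead solves
# an optimization problem over PWSTL weight valuations.
def find_feasible_weights(grid=(0.25, 0.5, 1.0, 2.0)):
    for w in itertools.product(grid, repeat=2):
        if satisfies_all(w):
            return w
    return None
```

With these toy comparisons, uniform weights fail (the first pair ranks the wrong way), but valuations that upweight the second subformula are feasible; the returned weights could then parameterize a weighted STL formula for controller synthesis, as the abstract describes.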
