
Guarantees on Robot System Performance Using Stochastic Simulation Rollouts (2309.10874v2)

Published 19 Sep 2023 in cs.RO, cs.SY, and eess.SY

Abstract: We provide finite-sample performance guarantees for control policies executed on stochastic robotic systems. Given an open- or closed-loop policy and a finite set of trajectory rollouts under the policy, we bound the expected value, value-at-risk, and conditional-value-at-risk of the trajectory cost, and the probability of failure in a sparse cost setting. The bounds hold, with user-specified probability, for any policy synthesis technique and can be seen as a post-design safety certification. Generating the bounds only requires sampling simulation rollouts, without assumptions on the distribution or complexity of the underlying stochastic system. We adapt these bounds to also give a constraint satisfaction test to verify safety of the robot system. We provide a thorough analysis of the bound sensitivity to sim-to-real distribution shifts and provide results for constructing robust bounds that can tolerate some specified amount of distribution shift. Furthermore, we extend our method to apply when selecting the best policy from a set of candidates, requiring a multi-hypothesis correction. We show the statistical validity of our bounds in the Ant, Half-cheetah, and Swimmer MuJoCo environments and demonstrate our constraint satisfaction test with the Ant. Finally, using the 20 degree-of-freedom MuJoCo Shadow Hand, we show the necessity of the multi-hypothesis correction.
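To make the flavor of such finite-sample guarantees concrete, here is a minimal sketch of two standard distribution-free bounds computed from i.i.d. rollout costs. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the value-at-risk bound uses an order statistic chosen via the binomial distribution, and the failure-probability bound is a one-sided Clopper-Pearson bound found by bisection. The function names and the parameters `tau` (quantile level) and `delta` (allowed miscoverage, so the bound holds with probability at least 1 - delta) are hypothetical names chosen for this example.

```python
import math


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k) by direct summation (adequate for modest n)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def var_upper_bound(costs, tau, delta):
    """Upper bound on the tau-quantile (VaR) of the cost distribution.

    Assumes the rollout costs are i.i.d. Returns the smallest order
    statistic X_(k) with P(X_(k) < VaR_tau) <= P(Bin(n, tau) >= k) <= delta,
    or infinity when n is too small for any order statistic to qualify.
    """
    n = len(costs)
    ordered = sorted(costs)
    for k in range(1, n + 1):
        if 1.0 - binom_cdf(k - 1, n, tau) <= delta:
            return ordered[k - 1]
    return math.inf


def failure_prob_upper_bound(num_failures, n, delta):
    """One-sided Clopper-Pearson upper bound on the failure probability.

    Finds, by bisection, the p at which P(Bin(n, p) <= num_failures) = delta.
    """
    if num_failures >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(num_failures, n, mid) > delta:
            lo = mid  # mid is still plausible given the data; bound lies above
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical data: 100 rollouts with zero observed failures.
    print(failure_prob_upper_bound(0, 100, 0.05))
    # Hypothetical costs 1..10; bound the median cost with 95% confidence.
    print(var_upper_bound(list(range(1, 11)), 0.5, 0.05))
```

Both functions make no assumption about the shape of the cost distribution, only that the rollouts are independent draws from the same system. When too few rollouts are available for the requested `tau` and `delta`, `var_upper_bound` returns infinity rather than an invalid finite value.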

Authors (3)
  1. Joseph A. Vincent
  2. Aaron O. Feldman
  3. Mac Schwager

