Conformal Off-Policy Prediction for Multi-Agent Systems (2403.16871v2)
Abstract: Off-Policy Prediction (OPP), i.e., predicting the outcomes of a target policy using only data collected under a nominal (behavioural) policy, is a problem of paramount importance in the data-driven analysis of safety-critical systems, where deploying a new policy may be unsafe. To obtain dependable off-policy predictions, recent work on Conformal Off-Policy Prediction (COPP) leverages the conformal prediction framework to derive prediction regions with probabilistic guarantees under the target process. Existing COPP methods can account for the distribution shifts induced by policy switching, but they are limited to single-agent systems and scalar outcomes (e.g., rewards). In this work, we introduce MA-COPP, the first conformal prediction method to solve OPP problems involving multi-agent systems, deriving joint prediction regions for all agents' trajectories when one or more ego agents change their policies. Unlike the single-agent scenario, this setting is more complex: the distribution shifts affect predictions for all agents, not just the ego agents, and the prediction task involves full multi-dimensional trajectories, not just reward values. A key contribution of MA-COPP is that it avoids enumerating or exhaustively searching the output space of agent trajectories, which existing COPP methods require in order to construct the prediction region. We achieve this by showing that an over-approximation of the true joint prediction region (JPR) can be constructed, without enumeration, from the maximum density ratio of the JPR trajectories. We evaluate the effectiveness of MA-COPP on multi-agent systems from the PettingZoo library and the F1TENTH autonomous racing environment, achieving nominal coverage in higher dimensions and under various shift settings.
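Why a maximum density ratio can replace enumeration can be illustrated with weighted split conformal prediction in the covariate-shift style of Tibshirani et al. (2019). The sketch below is a simplified scalar illustration under stated assumptions, not the MA-COPP algorithm: it assumes each calibration sample carries a weight equal to its target-to-behavioural density ratio, and that the ratio at the test point depends on the unknown output, so it is only known to lie in a set of candidate values. The helper names `weighted_quantile` and `overapprox_threshold`, and the toy data, are hypothetical. The weighted threshold is non-decreasing in the test-point weight, because a larger test weight moves more mass to $+\infty$. Plugging in the largest candidate ratio therefore yields a threshold, and hence a region $\{y : s(y) \le q\}$, that contains every region obtainable with a smaller ratio, with no need to enumerate the outputs.

```python
import math
from typing import Sequence


def weighted_quantile(scores: Sequence[float], weights: Sequence[float],
                      test_weight: float, alpha: float) -> float:
    """Weighted split-conformal threshold in the covariate-shift style.

    Returns the (1 - alpha) quantile of the distribution that places mass
    w_i / (sum(w) + test_weight) on each calibration score s_i and the
    remaining mass test_weight / (sum(w) + test_weight) on +infinity.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    total = sum(weights) + test_weight
    # Compare unnormalised cumulative mass against (1 - alpha) * total.
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    # Calibration mass never reaches the target: the region is unbounded.
    return math.inf


def overapprox_threshold(scores: Sequence[float], weights: Sequence[float],
                         candidate_ratios: Sequence[float], alpha: float) -> float:
    """Hypothetical helper: use the largest candidate test density ratio.

    The threshold is non-decreasing in test_weight, so this value upper-bounds
    the threshold obtained with any single ratio in candidate_ratios.
    """
    return weighted_quantile(scores, weights, max(candidate_ratios), alpha)


# Toy calibration data (made up for illustration): nonconformity scores and
# target-to-behavioural density ratios of the calibration samples.
scores = [0.5, 1.2, 0.3, 2.0, 0.9, 1.5, 0.7, 1.1, 0.4, 1.8]
weights = [1.0, 0.5, 2.0, 1.0, 1.5, 0.8, 1.2, 1.0, 0.9, 0.6]
candidate_ratios = [0.2, 1.0, 2.0]  # ratios the test output might induce
alpha = 0.2

per_candidate = [weighted_quantile(scores, weights, r, alpha)
                 for r in candidate_ratios]
conservative = overapprox_threshold(scores, weights, candidate_ratios, alpha)
print(per_candidate, conservative)  # [1.5, 1.8, 2.0] 2.0
```

The price of the over-approximation is conservativeness: if the maximum ratio is large relative to the total calibration weight (for example, a test ratio of 3.0 with this toy data), the threshold becomes $+\infty$ and the region is unbounded. How the paper bounds the ratio over multi-agent trajectories is not reproduced here.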