System Safety Monitoring of Learned Components Using Temporal Metric Forecasting

Published 21 May 2024 in cs.LG, cs.AI, cs.RO, and cs.SE | (2405.13254v3)

Abstract: In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure that their outputs do not lead to system safety violations, given the operational context of the system. However, developing a safety monitor for practical deployment in real-world applications is challenging because access to the internal workings and training data of the learned component is limited. Furthermore, safety monitors should predict safety violations with low latency while consuming a reasonable amount of computation. To address these challenges, we propose a safety monitoring method based on probabilistic time series forecasting. Given the learned component outputs and an operational context, we empirically investigate different Deep Learning (DL)-based probabilistic forecasting models for predicting the safety metric, an objective measure that captures the satisfaction or violation of a safety requirement. We empirically evaluate the safety metric and violation prediction accuracy, as well as the inference latency and resource usage, of four state-of-the-art models with varying horizons, using autonomous aviation and autonomous driving case studies. Our results suggest that probabilistic forecasting of safety metrics, given learned component outputs and scenarios, is effective for safety monitoring. Furthermore, for both case studies, the Temporal Fusion Transformer (TFT) was the most accurate model for predicting imminent safety violations, with acceptable latency and resource consumption.
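
To make the idea of predicting violations from a probabilistic forecast concrete, the following is a minimal sketch, not the paper's implementation. It assumes a forecaster has already produced quantile forecasts of the safety metric over the prediction horizon, that lower metric values are less safe, and that a violation means the metric falls below a threshold. The function name `first_predicted_violation`, the quantile level, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def first_predicted_violation(
    quantile_forecasts: Dict[float, List[float]],
    threshold: float,
    level: float = 0.1,
) -> Optional[int]:
    """Return the first horizon step at which a violation is predicted.

    quantile_forecasts maps a quantile level (e.g. 0.1) to the forecast
    safety metric values for each future step in the horizon.
    Assumption: a violation occurs when the metric drops below `threshold`.
    Using a low quantile makes the monitor conservative, because it flags
    a violation when even the pessimistic forecast crosses the threshold.
    Returns None if no step in the horizon is predicted to violate.
    """
    pessimistic = quantile_forecasts[level]
    for step, value in enumerate(pessimistic):
        if value < threshold:
            return step
    return None


# Hypothetical forecast over a three-step horizon.
forecast = {
    0.1: [2.0, 1.2, 0.4],
    0.5: [2.5, 2.0, 1.5],
    0.9: [3.0, 2.8, 2.6],
}
step_conservative = first_predicted_violation(forecast, threshold=0.5, level=0.1)
step_median = first_predicted_violation(forecast, threshold=0.5, level=0.5)
```

Choosing a lower quantile trades more false alarms for fewer missed violations. The sketch only illustrates that trade-off and does not say which decision rule the paper uses.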
