System Safety Monitoring of Learned Components Using Temporal Metric Forecasting
Abstract: In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure that their outputs do not lead to system safety violations, given the operational context of the system. However, developing a safety monitor for practical deployment in real-world applications is challenging, because access to the internal workings and training data of the learned component is often limited. Furthermore, a safety monitor should predict safety violations with low latency while consuming a reasonable amount of computational resources. To address these challenges, we propose a safety monitoring method based on probabilistic time series forecasting. Given the learned component's outputs and an operational context, we empirically investigate different Deep Learning (DL)-based probabilistic forecasting models for predicting the safety metric, that is, the objective measure capturing the satisfaction or violation of a safety requirement. We evaluate the accuracy of safety metric and violation prediction, as well as the inference latency and resource usage, of four state-of-the-art models over varying forecast horizons, using autonomous aviation and autonomous driving case studies. Our results suggest that probabilistic forecasting of safety metrics, given learned component outputs and scenarios, is effective for safety monitoring. Furthermore, in both case studies, the Temporal Fusion Transformer (TFT) was the most accurate model for predicting imminent safety violations, with acceptable latency and resource consumption.
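The abstract does not say how a probabilistic forecast of the safety metric is turned into a violation prediction. The following is a minimal sketch of one plausible decision rule, not the authors' method. It assumes the forecaster returns sampled future trajectories of the safety metric, that lower values are less safe, and that a violation is flagged when a pessimistic (lower) quantile falls below a threshold. The function name `predict_violation` and the parameters `threshold` and `quantile` are hypothetical.

```python
import numpy as np


def predict_violation(sample_paths, threshold, quantile=0.1):
    """Flag a predicted safety violation from forecast samples (illustrative only).

    sample_paths: array-like of shape (num_samples, horizon) holding sampled
        future values of the safety metric (assumed forecaster output).
    threshold: value below which the safety requirement counts as violated
        (assumption: lower metric values are less safe).
    quantile: lower quantile used as the pessimistic bound (hypothetical default).

    Returns (violation_predicted, first_violating_step_or_None).
    """
    paths = np.asarray(sample_paths, dtype=float)
    # Empirical lower quantile of the forecast at each future time step.
    lower_bound = np.quantile(paths, quantile, axis=0)
    # Steps at which the pessimistic bound crosses the safety threshold.
    violating_steps = np.flatnonzero(lower_bound < threshold)
    if violating_steps.size == 0:
        return False, None
    return True, int(violating_steps[0])


if __name__ == "__main__":
    # Synthetic forecast: the metric drifts from 2.0 down to 0.0 over 5 steps.
    rng = np.random.default_rng(0)
    horizon = 5
    mean_path = np.linspace(2.0, 0.0, horizon)
    samples = mean_path + 0.1 * rng.standard_normal((200, horizon))
    print(predict_violation(samples, threshold=0.5))
```

Using a lower quantile rather than the median makes the monitor more conservative. The cost is more false alarms, so the quantile level and threshold would need tuning for each safety requirement.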