- The paper introduces a monitor construction algorithm whose resulting runtime monitors are provably robust against bounded perturbations of deep neural network inputs.
- It combines formal symbolic reasoning with rigorous bound estimation, reducing the runtime-monitoring false positive rate from 0.62% to 0.125% in preliminary experiments.
- Two monitor types, robust min-max monitors and robust (Boolean/multi-bit) interval activation monitors, are developed to detect out-of-distribution inputs in safety-critical applications such as autonomous driving.
This paper, "Provably-Robust Runtime Monitoring of Neuron Activation Patterns" (2011.11959), addresses the critical need for robust monitoring of deep neural networks (DNNs) in safety-critical applications, particularly autonomous driving. The core problem is that while DNNs are effective, their performance can degrade unexpectedly when encountering inputs that are distant from the training data (i.e., outside the operational design domain or ODD). Runtime monitors that detect such inputs are crucial for enabling safe fallback behaviors.
Existing activation-based monitors work by building an abstraction (a compact representation) of the neuron activation patterns observed during training on the ODD data. During operation, if an input produces an activation pattern not included in this abstraction, a warning is raised. However, a major challenge in real-world deployment is that these monitors can suffer from a high false positive rate. Small, unmodeled perturbations in real-world inputs (e.g., minor lighting changes) can cause slight variations in neuron activations, triggering false alarms even when the input is conceptually within the ODD.
To tackle this, the paper proposes a novel monitor construction algorithm that integrates formal symbolic reasoning. The key idea is to make the monitor provably robust against bounded input perturbations. Instead of recording the neuron values or activation patterns produced by specific training data points, the new method learns the range of neuron values (or activation patterns) that could occur if the training input were subjected to a perturbation bounded by $\Delta$ applied at a specific layer $k_p$.
The construction process involves iterating over the training data set. For each training input, a "perturbation estimate" is computed. This estimate determines rigorous lower and upper bounds $[l_j, u_j]$ on the value of each monitored neuron $j$, considering all possible perturbations bounded by $\Delta$ applied at layer $k_p$ or earlier. Techniques from DNN verification, such as Interval Bound Propagation (IBP), zonotopes, or star sets, can be used for this bound estimation.
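To make the bound-estimation step concrete, here is a minimal sketch of a perturbation estimate computed with interval bound propagation. It assumes the perturbation is an $\ell_\infty$ ball of radius $\Delta$ around the layer-$k_p$ feature vector and that the layers after $k_p$ are fully connected with ReLU activations; the function and argument names are illustrative rather than taken from the paper, and zonotope or star-set propagation would give tighter bounds at higher cost.

```python
import numpy as np

def perturbation_estimate(weights, biases, x_kp, delta):
    """Propagate an l-infinity box of radius `delta` around the layer-k_p
    feature vector `x_kp` through the remaining fully connected layers and
    return per-neuron bounds [l_j, u_j] at the monitored layer.

    `weights` and `biases` hold the parameters of the layers after k_p.
    (Illustrative sketch; the paper also allows zonotopes or star sets.)
    """
    lower, upper = x_kp - delta, x_kp + delta   # initial box around the features
    for i, (W, b) in enumerate(zip(weights, biases)):
        # Affine image of a box: split W into positive and negative parts so
        # each output bound combines the correct input bounds.
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo = W_pos @ lower + W_neg @ upper + b
        up = W_pos @ upper + W_neg @ lower + b
        # ReLU is monotone, so clamping both bounds at zero stays sound.
        # The final layer is left pre-activation here, assuming that is what
        # the monitor observes.
        if i < len(weights) - 1:
            lo, up = np.maximum(lo, 0.0), np.maximum(up, 0.0)
        lower, upper = lo, up
    return lower, upper   # arrays of l_j and u_j, one entry per monitored neuron
```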
The monitor abstraction is then built over these computed bounds, rather than the single values produced by the original training inputs. The paper details how this applies to two types of monitors:
- Robust Min-Max Monitors: The monitor tracks the minimum lower bound $L_j$ and the maximum upper bound $U_j$ observed for each neuron $j$ across all training data points and their corresponding perturbation ranges. An operational input triggers a warning if its neuron value $G^{k}_j(v_{op})$ falls outside $[L_j, U_j]$.
- Robust Boolean/Interval Activation Monitors: For each neuron, the state is determined by which interval its value falls into (defined by thresholds). For a training input under perturbation, the estimated range $[l_j, u_j]$ may overlap several intervals. The monitor therefore stores the set of all activation patterns (combinations of interval states across the monitored neurons) that can arise from perturbed training inputs. This set of patterns is represented efficiently using Binary Decision Diagrams (BDDs). An operational input triggers a warning if its activation pattern is not present in the stored set. The paper generalizes this to multi-bit interval activation, using multiple thresholds per neuron for finer-grained monitoring. A construction sketch covering both monitor types follows this list.
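The sketch below shows how both monitor types could be assembled from the per-input bounds. It is a simplified illustration under stated assumptions: the paper represents the set of reachable activation patterns symbolically as a BDD, whereas here a plain Python set of tuples stands in (explicit enumeration, which does not scale the way a BDD does), a single shared threshold vector is used for all neurons, and all names are illustrative.

```python
import numpy as np
from itertools import product

def build_robust_minmax(bounds_per_input):
    """Robust min-max monitor: per-neuron bounds L_j / U_j taken over the
    perturbation estimates of all training inputs."""
    L = np.min([lo for lo, _ in bounds_per_input], axis=0)
    U = np.max([up for _, up in bounds_per_input], axis=0)
    return L, U

def minmax_warns(L, U, v_op):
    """Warn if any monitored neuron value of the operational input leaves [L_j, U_j]."""
    return bool(np.any((v_op < L) | (v_op > U)))

def reachable_patterns(lower, upper, thresholds):
    """Robust interval-activation monitor, one training input: every pattern
    the perturbed input can exhibit. Interval i is [edges[i], edges[i+1]);
    a neuron can be in state i whenever its bound range [l_j, u_j] overlaps it."""
    edges = [-np.inf, *thresholds, np.inf]
    per_neuron = [
        {i for i in range(len(edges) - 1) if up >= edges[i] and lo < edges[i + 1]}
        for lo, up in zip(lower, upper)
    ]
    # The paper stores this set compactly as a BDD; a set of tuples stands in here.
    return set(product(*per_neuron))

def pattern_of(v_op, thresholds):
    """Interval-activation pattern of an operational input's neuron values."""
    edges = [-np.inf, *thresholds, np.inf]
    return tuple(int(np.searchsorted(edges, v, side="right")) - 1 for v in v_op)

def interval_monitor_warns(stored_patterns, v_op, thresholds):
    """Warn if the operational pattern is not reachable from any perturbed training input."""
    return pattern_of(v_op, thresholds) not in stored_patterns
```

During construction, `reachable_patterns` would be computed for every training input and the resulting sets unioned; with a single threshold of 0 per neuron this reduces to the Boolean (on/off) activation monitor, and additional thresholds give the multi-bit variant.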
The formal guarantee provided by the robust monitor is that if it raises a warning for an operational input $v_{op}$, then there is no training data input $v_{tr}$ whose activation pattern at layer $k_p$ is within a $\Delta$-bounded distance of $v_{op}$'s activation pattern at layer $k_p$. This means warnings are only raised for inputs that are genuinely distant from the perturbed versions of training data, significantly reducing false positives caused by expected minor variations.
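Written out, and assuming an $\ell_\infty$ distance on the layer-$k_p$ features to match the box perturbation sketched above (this notation is a paraphrase, not the paper's exact statement), the guarantee reads:

$$
\mathrm{warn}(v_{op}) \;\Longrightarrow\; \forall\, v_{tr} \in \mathcal{D}_{tr}:\ \big\| G^{k_p}(v_{tr}) - G^{k_p}(v_{op}) \big\|_{\infty} > \Delta,
$$

where $G^{k_p}(v)$ denotes the vector of layer-$k_p$ activations for input $v$ and $\mathcal{D}_{tr}$ the training set.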
Preliminary experiments were conducted in a lab setting using a DNN that predicts visual waypoints. The robust monitor reduced the false positive rate from 0.62% to 0.125% (an 80% reduction) compared to a standard monitor, while maintaining a similar detection rate for inputs explicitly created to lie outside the training distribution (e.g., dark conditions, ice).
The paper concludes by highlighting future work, including studying how to train DNNs that are inherently more "monitorable" and evaluating the technique in more complex, real-world autonomous driving scenarios, such as those involving 3D perception. The authors also plan to integrate these provable robustness claims into formal safety argumentation frameworks.