Runtime Monitoring Neuron Activation Patterns (1809.06573v2)
Abstract: For using neural networks in safety critical domains, it is important to know if a decision made by a neural network is supported by prior similarities in training. We propose runtime neuron activation pattern monitoring - after the standard training process, one creates a monitor by feeding the training data to the network again in order to store the neuron activation patterns in abstract form. In operation, a classification decision over an input is further supplemented by examining if a pattern similar (measured by Hamming distance) to the generated pattern is contained in the monitor. If the monitor does not contain any pattern similar to the generated pattern, it raises a warning that the decision is not based on the training data. Our experiments show that, by adjusting the similarity-threshold for activation patterns, the monitors can report a significant portion of misclassifications to be not supported by training with a small false-positive rate, when evaluated on a test set.
Summary
- The paper introduces a runtime monitoring technique that records neuron activation patterns during training to assess if new inputs deviate from learned behavior.
- It employs Binary Decision Diagrams and Hamming distance expansions to efficiently represent and query activation patterns for robust anomaly detection.
- The approach enhances safety by flagging out-of-pattern inputs in applications like autonomous driving and sensor fusion, signaling potential misclassifications.
This paper (1809.06573) introduces a runtime monitoring technique for neural networks to determine if a decision is supported by patterns seen during training. This is particularly relevant for safety-critical applications like autonomous driving, where understanding when a network operates "outside its comfort zone" is crucial.
The core idea is to build a monitor by recording neuron activation patterns from a trained network on its training data. During operation, the monitor checks if the activation pattern of a new input is sufficiently "similar" to any pattern seen during training. If not, it raises a warning that the decision might be unreliable because the input's pattern is not represented in the training data's "comfort zone."
Key Concepts and Implementation:
- Neuron Activation Patterns: The paper focuses on networks using ReLU or similar activation functions. For a given input, the activation pattern of a specific layer (preferably a close-to-output layer representing high-level features) is captured as a binary vector. Each element in the vector corresponds to a neuron in that layer, being '1' if the neuron's output is positive (activated) and '0' if it's zero or negative (suppressed).
The definition of the pattern function $p_{\text{relu}}(x)$ is:
$$p_{\text{relu}}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
For a layer with $d_l$ neurons, the pattern is $(p_{\text{relu}}(v_1), \ldots, p_{\text{relu}}(v_{d_l}))$, where $(v_1, \ldots, v_{d_l})$ are the outputs of the layer before the ReLU activation.
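As an illustration of this pattern extraction step, here is a minimal PyTorch sketch; PyTorch, the toy architecture, and the forward-hook mechanics are assumptions for illustration only, since the paper is framework-agnostic.

```python
# Minimal sketch: capturing the binary activation pattern of a monitored layer.
# PyTorch, the toy architecture, and the layer index are assumptions for
# illustration; the paper itself is framework-agnostic.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 40), nn.ReLU(),   # 40-neuron close-to-output layer
    nn.Linear(40, 10),                   # classification layer
)

captured = {}

def capture_pre_relu(module, inputs, output):
    # Store the monitored layer's outputs before the ReLU is applied.
    captured["pre_relu"] = output.detach()

# model[1] is the Linear layer feeding the final classifier.
model[1].register_forward_hook(capture_pre_relu)

x = torch.randn(1, 1, 28, 28)            # dummy stand-in for an MNIST image
logits = model(x)
predicted_class = logits.argmax(dim=1).item()

# p_relu applied componentwise: 1 if the neuron fires (value > 0), else 0.
pattern = (captured["pre_relu"] > 0).int().squeeze(0).tolist()
print(predicted_class, pattern)          # e.g. 3, [1, 0, 1, ..., 0] (40 bits)
```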
- Comfort Zone ($Z_c^\gamma$): For each class $c$, the comfort zone $Z_c^0$ is initially defined as the set of activation patterns observed for all training examples of class $c$ that were correctly classified by the network. To account for minor variations and allow some generalization, the comfort zone is expanded to include patterns within a Hamming distance of $\gamma$ from the patterns in $Z_c^0$. The Hamming distance $H(p, p')$ between two binary patterns $p$ and $p'$ is the number of positions at which the corresponding symbols differ. The $\gamma$-comfort zone $Z_c^\gamma$ is defined recursively:
$$Z_c^0 = \{\, \text{pat}(f^{(l)}(\text{in})) \mid \text{in} \in \mathcal{T}_c \wedge \text{dec}_{f^{(L)}}(\text{in}) = c \,\}$$
$$Z_c^\gamma = Z_c^{\gamma-1} \cup \{\, p \mid p \in \{0,1\}^{d_l} \wedge \exists p' \in Z_c^{\gamma-1} : H(p, p') = 1 \,\}, \quad \text{for } \gamma > 0.$$
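To make the recursion concrete, the following is a small set-based sketch of the γ-expansion using explicit pattern sets; the paper's implementation uses BDDs instead, so this is purely illustrative.

```python
# Illustrative sketch of the gamma-expansion over explicit pattern sets (tuples
# of 0/1). The paper stores and expands these sets with BDDs; plain Python sets
# are used here only to make the recursion concrete.

def hamming_1_neighbours(pattern):
    """All patterns differing from `pattern` in exactly one position."""
    for j in range(len(pattern)):
        yield pattern[:j] + (1 - pattern[j],) + pattern[j + 1:]

def expand_comfort_zone(zone_0, gamma):
    """Compute Z^gamma from Z^0 by gamma rounds of Hamming-distance-1 expansion."""
    zone = set(zone_0)
    for _ in range(gamma):
        zone |= {q for p in zone for q in hamming_1_neighbours(p)}
    return zone

# Two observed patterns over four monitored neurons, expanded with gamma = 1.
zone_0 = {(1, 0, 1, 1), (0, 0, 1, 0)}
zone_1 = expand_comfort_zone(zone_0, gamma=1)
print((1, 1, 1, 1) in zone_1)   # True: Hamming distance 1 from (1, 0, 1, 1)
print((1, 1, 0, 0) in zone_1)   # False: distance >= 2 from every stored pattern
```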
- Monitor Construction: The monitor is a collection of comfort zones, one for each class: $\langle Z_1^\gamma, \ldots, Z_C^\gamma \rangle$. The paper proposes using Binary Decision Diagrams (BDDs) to represent these sets of binary patterns efficiently. BDDs are a symbolic data structure for representing Boolean functions, and they can represent sets of binary vectors. Standard BDD libraries provide operations like union (set union) and existential quantification, which are crucial for building and querying the comfort zones.
The algorithm for building the monitor involves:
- Initialize empty BDDs for $Z_c^0$ for all classes $c$.
- Iterate through the training data. For each correctly classified training input belonging to class $c$, compute its activation pattern at the chosen layer and add it to the BDD for $Z_c^0$ using a BDD encoding function and the BDD union operation.
- For $\gamma > 0$, iteratively build $Z_c^i$ from $Z_c^{i-1}$. This expansion (adding all patterns at Hamming distance 1) can be done efficiently with BDD existential quantification: for a set $S$ represented by a BDD, existentially quantifying the $j$-th variable yields every pattern that agrees with some pattern in $S$ on all positions except possibly the $j$-th. Applying this for each variable $j$ and taking the union of the results expands the set to include all patterns differing from $S$ in at most one bit.
```python
# Pseudocode based on Algorithm 1
import bdd  # Assuming a BDD library like 'dd'

def build_monitor(network, layer_to_monitor, training_set, gamma):
    num_classes = ...  # Get from network
    layer_dim = ...    # Get dimension of layer_to_monitor

    # Initialize BDDs for Z_0 for each class
    Z_gamma_bdd = [bdd.emptySet() for _ in range(num_classes)]

    # Build Z_0
    for input, true_label in training_set:
        # Get network output
        output_layers = network.forward_until_layer(input, layer_to_monitor)
        final_output = output_layers[-1]
        predicted_class = argmax(final_output)

        # If correctly classified
        if predicted_class == true_label:
            monitored_layer_output = output_layers[index_of_monitored_layer]
            # Compute activation pattern (binary vector)
            pattern = [1 if x > 0 else 0 for x in monitored_layer_output]
            # Encode pattern as BDD
            pattern_bdd = bdd.encode(pattern)
            # Add to Z_0 for the correct class
            Z_gamma_bdd[true_label] = bdd.or_op(Z_gamma_bdd[true_label], pattern_bdd)

    # Iteratively build Z_gamma from Z_0 using existential quantification
    for i in range(1, gamma + 1):
        Z_prev_bdd = Z_gamma_bdd  # Z^(i-1)
        Z_curr_bdd = [bdd.emptySet() for _ in range(num_classes)]  # Z^i
        for c in range(num_classes):
            for j in range(layer_dim):
                # Existential quantification over each variable j
                expanded_bdd = bdd.exists(j, Z_prev_bdd[c])
                Z_curr_bdd[c] = bdd.or_op(Z_curr_bdd[c], expanded_bdd)
        Z_gamma_bdd = Z_curr_bdd

    return Z_gamma_bdd  # List of BDDs, one for each class


def monitor_runtime(network, layer_to_monitor, monitor_bdd, input):
    # Get network decision
    output_layers = network.forward_until_layer(input, layer_to_monitor)
    final_output = output_layers[-1]
    predicted_class = argmax(final_output)

    # Compute activation pattern for input
    monitored_layer_output = output_layers[index_of_monitored_layer]
    pattern = [1 if x > 0 else 0 for x in monitored_layer_output]
    pattern_bdd = bdd.encode(pattern)

    # Check if pattern is in the comfort zone of the predicted class
    is_in_comfort_zone = bdd.is_element(pattern_bdd, monitor_bdd[predicted_class])

    if not is_in_comfort_zone:
        return "Warning: Out-of-pattern for predicted class", predicted_class
    else:
        return "Pattern is within comfort zone", predicted_class


# Example usage:
# monitor = build_monitor(my_network, monitored_layer, training_data, gamma=2)
# runtime_status, prediction = monitor_runtime(my_network, monitored_layer, monitor, new_input)
# print(f"Prediction: {prediction}, Monitor Status: {runtime_status}")
```
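The `bdd.*` calls in the pseudocode above stand for a generic BDD API. As a rough illustration of how the Hamming-distance-1 expansion and the membership query map onto a real library, here is a sketch using the `dd` package's pure-Python `autoref` interface; the library choice and the helper functions are assumptions, not the paper's implementation.

```python
# Rough sketch of the same operations with the 'dd' package (pure-Python
# autoref interface). The library choice and helper functions are assumptions,
# not the paper's implementation; API details may differ across dd versions.
from dd.autoref import BDD

def encode_pattern(bdd, variables, pattern):
    """Encode a binary activation pattern as a BDD cube (conjunction of literals)."""
    return bdd.cube({v: bool(b) for v, b in zip(variables, pattern)})

def expand_hamming_1(bdd, variables, zone):
    """Add every pattern at Hamming distance 1 from a pattern already in `zone`."""
    expanded = zone
    for v in variables:
        # Existentially quantifying one variable frees that bit, i.e. it admits
        # all patterns matching a stored pattern everywhere except position v.
        expanded |= bdd.exist([v], zone)
    return expanded

bdd = BDD()
variables = ["n0", "n1", "n2", "n3"]     # one BDD variable per monitored neuron
bdd.declare(*variables)

zone = bdd.false                          # empty comfort zone Z^0
zone |= encode_pattern(bdd, variables, [1, 0, 1, 1])
zone |= encode_pattern(bdd, variables, [0, 0, 1, 0])

zone_1 = expand_hamming_1(bdd, variables, zone)        # gamma = 1 comfort zone

# Runtime membership query: is this pattern inside the gamma-comfort zone?
query = encode_pattern(bdd, variables, [1, 1, 1, 1])   # one bit away from 1011
print((query & zone_1) == query)                       # True: within distance 1
```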
- Handling Large Layers: BDDs have practical limits on the number of variables they can handle (typically hundreds). For layers with more neurons, the paper suggests monitoring only a subset of "important" neurons. Importance can be determined using gradient-based sensitivity analysis, similar to saliency maps. Neurons with a large absolute gradient of the output class score with respect to the neuron's output are considered more important.
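One possible realization of the gradient-based selection is sketched below in PyTorch; the framework, the use of the predicted class's score, and the averaging over a batch are assumptions about details the paper leaves open.

```python
# Sketch of one way to rank neurons by gradient-based importance in PyTorch.
# The framework, the use of the predicted class's score, and averaging over a
# batch are assumptions about details the paper leaves open.
import torch

def rank_neurons(model, monitored_layer, inputs, top_k):
    """Return the indices of the top_k most gradient-sensitive neurons."""
    captured = {}

    def keep_grad(module, ins, output):
        output.retain_grad()              # keep gradients on this non-leaf tensor
        captured["out"] = output

    handle = monitored_layer.register_forward_hook(keep_grad)
    logits = model(inputs)
    # Sum of the predicted-class scores over the batch.
    score = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    score.backward()
    handle.remove()

    # Average absolute gradient per neuron across the batch.
    importance = captured["out"].grad.abs().mean(dim=0)
    return torch.topk(importance, top_k).indices.tolist()
```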
- Controlling Abstraction (γ and Neuron Selection): The choice of γ and the number of monitored neurons controls the coarseness of the abstraction. If γ is too low, the monitor may flag too many inputs as "unseen," even when they differ only slightly from the training data (like α1 in Figure 2); if γ is too high, almost any pattern counts as "seen," making the monitor useless (like α3). The paper proposes using a validation set to tune γ (and the neuron selection parameters). The goal is to find parameters where the monitor flags a reasonable percentage of validation images as "out-of-pattern" and, among those flagged images, a significant percentage are actual misclassifications. This trade-off is illustrated in Table II for the MNIST and GTSRB experiments, which show that increasing γ decreases the overall rate of flagged images but increases the likelihood that a flagged image is a misclassification.
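A sketch of that validation sweep follows, reusing the `build_monitor` and `monitor_runtime` pseudocode above; the concrete selection criterion (flag rate versus the fraction of flagged images that are misclassified) is one reasonable reading of the procedure, not a prescription from the paper.

```python
# Sketch of the validation sweep for gamma, reusing build_monitor and
# monitor_runtime from the pseudocode above. The selection criterion (flag rate
# vs. fraction of flagged inputs that are misclassified) is one reasonable
# reading of the procedure, not a prescription from the paper.

def evaluate_gammas(network, layer, training_set, validation_set, gammas):
    results = {}
    for gamma in gammas:
        monitor = build_monitor(network, layer, training_set, gamma)
        flagged = flagged_and_wrong = 0
        for x, true_label in validation_set:
            status, predicted = monitor_runtime(network, layer, monitor, x)
            if status.startswith("Warning"):
                flagged += 1
                if predicted != true_label:
                    flagged_and_wrong += 1
        flag_rate = flagged / len(validation_set)
        precision = flagged_and_wrong / flagged if flagged else 0.0
        results[gamma] = (flag_rate, precision)
    return results

# Choose the smallest gamma whose flag rate is acceptable for the application
# while the fraction of flagged images that are real misclassifications stays high.
```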
Practical Applications and Implications:
- Safety-Critical Systems: The primary application is in domains like autonomous driving, where understanding the reliability of a neural network's decision is paramount. A monitor flagging an input as "out-of-pattern" can trigger fallback mechanisms or alert a human operator.
- Distributional Shift Detection: A high rate of out-of-pattern warnings during deployment can indicate that the operational data distribution has shifted significantly from the training data distribution, suggesting a need for retraining or model updates (a small rate-tracking sketch follows this list).
- Sensor Fusion Assistance: As envisioned by the authors, the monitor's output (whether a pattern is in the comfort zone or not) can be used as an input into a sensor fusion system or higher-level decision-making logic, indicating the confidence level in the neural network's classification for a specific input.
- Alternative to Formal Verification: While formal verification methods like Reluplex (1707.01635) offer strong guarantees, they are often limited to small networks. This runtime monitoring approach is more scalable to larger networks (especially with neuron selection) and provides a practical, if not absolute, measure of "in-distribution" behavior.
- Distinguishing from Adversarial Detection: Unlike ML-based adversarial detection methods (1702.06280, 1704.01155) which are statistical and can have false negatives, the BDD-based monitor, for the specified γ and monitored neurons, provides a sound over-approximation. If it flags an input as "out-of-pattern," it is genuinely outside the defined comfort zone, offering a higher level of certainty for that specific claim.
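As a small illustration of the rate-based use mentioned under "Distributional Shift Detection" above, a sliding-window tracker of the warning rate might look as follows; the window size and alarm threshold are application-specific assumptions.

```python
# Illustrative sliding-window tracker for the out-of-pattern warning rate.
# Window size and alarm threshold are application-specific assumptions.
from collections import deque

class ShiftDetector:
    def __init__(self, window_size=1000, alarm_rate=0.2):
        self.window = deque(maxlen=window_size)
        self.alarm_rate = alarm_rate

    def update(self, out_of_pattern: bool) -> bool:
        """Record one monitor verdict; return True once the warning rate looks suspicious."""
        self.window.append(out_of_pattern)
        warning_rate = sum(self.window) / len(self.window)
        return len(self.window) == self.window.maxlen and warning_rate > self.alarm_rate
```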
Implementation Considerations:
- BDD Library: A robust BDD library (e.g., `dd` in Python) is necessary. Managing BDD complexity and variable ordering can be important for performance and memory usage, although the paper suggests it scales well for hundreds of variables.
- Neuron Selection: Implementing the gradient-based neuron selection requires accessing gradients within the network framework (e.g., PyTorch, TensorFlow). The choice of which class gradients to use for selection (e.g., the predicted class, all classes, or specific safety-relevant classes) is an implementation detail.
- Layer Choice: The choice of which layer(s) to monitor is crucial. Close-to-output layers are preferred as they capture high-level features, but early layers might also reveal low-level distributional shifts.
- Tuning γ: The validation set tuning process requires careful consideration of the desired balance between the false positive rate (flagging inputs within the operational distribution) and the true positive rate (flagging misclassified inputs or those truly outside the distribution).
- Performance Overhead: Runtime monitoring adds computational overhead. The cost of forwarding the input through the network to the monitored layer, computing the pattern, encoding it as a BDD, and querying the comfort zone BDD must be acceptable for the application's latency requirements. BDD queries are generally efficient (linear in the number of variables/monitored neurons).
The paper demonstrates the approach's feasibility on standard datasets (MNIST, GTSRB) and mentions a case study on a front-car detection system, highlighting its potential for practical deployment in safety-critical vision systems. Future work includes extending the technique to object detection networks like YOLO and exploring richer abstract domains beyond simple binary activation patterns.
Related Papers
- Outside the Box: Abstraction-Based Monitoring of Neural Networks (2019)
- Out-Of-Distribution Detection Is Not All You Need (2022)
- Provably-Robust Runtime Monitoring of Neuron Activation Patterns (2020)
- Increasing Trustworthiness of Deep Neural Networks via Accuracy Monitoring (2020)
- Under the Hood of Neural Networks: Characterizing Learned Representations by Functional Neuron Populations and Network Ablations (2020)