
Neural Trojans (1710.00942v1)

Published 3 Oct 2017 in cs.CR

Abstract: While neural networks demonstrate stronger capabilities in pattern recognition nowadays, they are also becoming larger and deeper. As a result, the effort needed to train a network also increases dramatically. In many cases, it is more practical to use a neural network intellectual property (IP) that an IP vendor has already trained. As we do not know about the training process, there can be security threats in the neural IP: the IP vendor (attacker) may embed hidden malicious functionality, i.e. neural Trojans, into the neural IP. We show that this is an effective attack and provide three mitigation techniques: input anomaly detection, re-training, and input preprocessing. All the techniques are proven effective. The input anomaly detection approach is able to detect 99.8% of Trojan triggers although with 12.2% false positive. The re-training approach is able to prevent 94.1% of Trojan triggers from triggering the Trojan although it requires that the neural IP be reconfigurable. In the input preprocessing approach, 90.2% of Trojan triggers are rendered ineffective and no assumption about the neural IP is needed.

Citations (327)

Summary

  • The paper identifies neural Trojans as covert backdoors embedded in pre-trained models, highlighting supply-chain vulnerabilities.
  • It introduces robust defense mechanisms including anomaly detection with SVMs and decision trees, re-training with legitimate data, and autoencoder input preprocessing.
  • Experimental validations on the MNIST dataset demonstrate a Trojan-trigger detection rate of up to 99.8% and significant reductions in Trojan activation rates.

Analysis of "Neural Trojans"

In the presented paper, the authors introduce a security challenge for artificial neural networks, termed "neural Trojans". This threat arises when neural network models are obtained from potentially untrusted third-party vendors: a malicious vendor can embed hidden functionality, or a backdoor, within the network that is triggered by specific inputs and produces unauthorized behavior. The authors explore several mitigation methodologies, including anomaly detection, network re-training, and input preprocessing.
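To ground the threat model, here is a minimal sketch of one way a Trojan could be planted through training-data poisoning. The trigger patch, its location, the poisoning rate, and the target label are all hypothetical illustrations; the paper itself characterizes Trojans more generally as hidden functionality trained into the neural IP on illegitimate input patterns.

```python
import numpy as np

def poison_dataset(images, labels, trigger_value=1.0, target_label=7, rate=0.05):
    """Illustrative Trojan injection: stamp a small trigger patch onto a
    fraction of training images and relabel them with the attacker's chosen
    target class. All parameters here are hypothetical, not the paper's
    exact construction."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -4:, -4:] = trigger_value   # 4x4 patch in the bottom-right corner
    labels[idx] = target_label              # attacker-chosen output class
    return images, labels

# Usage with MNIST-shaped data (28x28 grayscale, placeholder values):
x = np.random.rand(1000, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned = poison_dataset(x, y)
```

A network trained on the poisoned set behaves normally on clean inputs but maps any input carrying the trigger patch to the attacker's target class, which is the covert behavior the paper's defenses aim to detect or neutralize.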

Core Contributions

The paper makes the following key contributions:

  1. Identification of Neural Trojans: The authors provide a detailed exposition of neural Trojans as intentional, concealed functionalities embedded within neural IPs by potentially malicious vendors. This undermines the typical assumption of integrity and trustworthiness in pre-trained models, especially those sourced externally.
  2. Proposed Defense Mechanisms:
    • Input Anomaly Detection: Utilizes support vector machines (SVMs) and decision trees (DTs) to flag inputs that deviate from the legitimate data distribution. DTs are highlighted for detecting 99.8% of illegitimate inputs, albeit with a 12.2% false-positive rate (a minimal sketch of this detector follows the list).
    • Re-training: Continues training the neural IP on legitimate data only, aiming to overwrite the Trojan behavior. A substantial reduction in the Trojan activation rate, to as low as 6%, is reported after re-training, though this defense requires that the neural IP be reconfigurable.
    • Input Preprocessing with Autoencoders: By inserting an autoencoder between the input and the neural IP, this method reconstructs inputs so that off-distribution trigger patterns are suppressed. The approach rendered 90.2% of Trojan triggers ineffective, with minimal impact on classification accuracy for legitimate data (see the sketch after the experimental validation section).
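As a concrete illustration of the anomaly-detection defense, the sketch below trains a decision tree to separate legitimate inputs from illegitimate ones. The flattened 28x28 feature vectors and the synthetic negatives are assumptions for the sake of a self-contained example; the paper's actual detector features and training data may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins: legit_train would be real task inputs (e.g. MNIST
# digits), anomalous_train synthetic off-distribution samples used as the
# illegitimate class when training the detector.
rng = np.random.default_rng(0)
legit_train = rng.random((500, 784))
anomalous_train = (rng.random((500, 784)) > 0.5).astype(float)

X = np.vstack([legit_train, anomalous_train])
y = np.array([0] * 500 + [1] * 500)   # 0 = legitimate, 1 = anomalous

detector = DecisionTreeClassifier(max_depth=10, random_state=0)
detector.fit(X, y)

def is_anomalous(x_flat):
    """Reject inputs flagged as anomalous before they reach the neural IP."""
    return detector.predict(x_flat.reshape(1, -1))[0] == 1
```

The detector sits in front of the untrusted neural IP and rejects flagged inputs outright, which is why its false-positive rate translates directly into lost legitimate queries.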

Experimental Validation

A significant portion of the paper is devoted to experiments validating the proposed defenses. The authors use the MNIST dataset to embed neural Trojans and evaluate the effectiveness of each defensive strategy. The results corroborate the premise that neural Trojans can be triggered reliably while leaving accuracy on legitimate inputs largely intact, underscoring the need for robust detection and mitigation techniques.
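To make the preprocessing defense concrete in this MNIST setting, below is a minimal sketch of an autoencoder front end. The layer sizes, latent dimension, and training setup are illustrative assumptions rather than the paper's exact configuration; the idea is that an autoencoder trained only on legitimate data reconstructs off-distribution trigger patterns poorly, disrupting the Trojan.

```python
import torch
import torch.nn as nn

class PreprocessAE(nn.Module):
    """Minimal dense autoencoder for 28x28 inputs; trained on legitimate
    data only, so off-distribution trigger patterns reconstruct poorly."""
    def __init__(self, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                                 nn.Linear(256, latent), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, 784), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x)).view(-1, 1, 28, 28)

ae = PreprocessAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training loop over legitimate data only (legit_loader is a placeholder):
# for x, _ in legit_loader:
#     opt.zero_grad()
#     loss = loss_fn(ae(x), x)
#     loss.backward()
#     opt.step()

# At inference, the untrusted neural IP sees only the reconstruction:
# y = neural_ip(ae(x))
```

Because the defense only wraps the input, it needs no access to or assumptions about the internals of the untrusted neural IP, which matches the paper's claim for this approach.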

Implications and Future Directions

The research contributes a novel perspective on neural network security, arguing that pre-trained models carry an inherent risk of tampering. It extends conventional threat models to include a supplier's possible malicious intent, broadening the neural network security agenda to cover supply-chain vulnerabilities.

Practically, the outlined defense mechanisms provide a blueprint for securing neural IPs against malicious backdoors. Challenges persist, however, in balancing defensive rigor against functional accuracy, as seen in the anomaly detector's false positives and the classification-accuracy degradation that re-training can introduce.

Theoretically, the concept of neural Trojans aligns with advances in adversarial machine learning but emphasizes the underestimated risk of supply-chain attacks. This establishes a foundation for future work on resilient AI systems that are robust to both overt and concealed adversarial threats.

In conclusion, while the research offers an initial framework for tackling neural Trojans, ongoing work should refine these defenses, potentially leveraging more advanced anomaly detection and adversarial training techniques to minimize overhead while maximizing security. As AI becomes more deeply integrated into critical applications, ensuring the security and integrity of neural models will be paramount.