TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems (1908.01763v2)

Published 2 Aug 2019 in cs.CR and cs.AI

Abstract: A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It is activated, forcing the infected model to behave abnormally, only when an input sample containing a particular trigger is fed to the model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect the model and determine whether a trojan backdoor exists. Recently, researchers have designed and developed several pioneering solutions to address this acute problem, and have demonstrated that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming availability of the contaminated training database). On the other hand, the proposed techniques can neither accurately detect the existence of trojan backdoors nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Unlike the existing technique that also models trojan detection as an optimization problem, TABOR designs a new objective function, guided by explainable AI techniques as well as heuristics, that can steer the optimization toward identifying a trojan backdoor more effectively. In addition, TABOR defines a new metric to measure the quality of an identified trojan backdoor. Using an anomaly detection method, we show that the new metric better enables TABOR to identify intentionally injected triggers in an infected model and filter out false alarms......

Authors (5)

Wenbo Guo (40 papers)
Lun Wang (33 papers)
Xinyu Xing (34 papers)
Min Du (46 papers)
Dawn Song (229 papers)

Citations (216)

Summary

Overview of "TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems"

The paper "TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems" addresses Trojan backdoors in deep neural networks (DNNs). A backdoor is a hidden pattern that stays dormant until an input carrying a specific trigger activates it, at which point the DNN misbehaves. Detecting and mitigating such backdoors is challenging, especially when one cannot assume access to the contaminated training data. The paper critiques existing solutions for their unrealistic assumptions and for their limited efficacy when triggers vary in size, shape, and position.

The authors propose TABOR, which formalizes Trojan detection as a non-convex optimization problem. Its new objective function is designed with guidance from explainable AI techniques and heuristics, and a new metric measures the quality of each identified backdoor. The paper argues that, together, these let TABOR detect backdoors more accurately than existing techniques such as Neural Cleanse and restore higher-fidelity trigger images, even when triggers vary in size, shape, and position.
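To make the optimization framing concrete, here is a minimal sketch of the generic formulation that optimization-based detectors such as Neural Cleanse start from: blend a candidate trigger (a mask plus a pattern) into clean inputs, then score the candidate by how strongly it pushes predictions toward a target label, with a penalty on mask size. This is not TABOR's objective, which by the paper's description adds further terms not reproduced here. The names `apply_trigger`, `toy_model`, `trigger_objective`, the weight `lam`, and the single-layer stand-in classifier are all hypothetical and chosen only for illustration.

```python
import math


def apply_trigger(image, mask, pattern):
    """Blend a candidate trigger into a flattened image: x' = (1 - m) * x + m * p."""
    return [(1.0 - m) * x + m * p for x, m, p in zip(image, mask, pattern)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(image, weights):
    """Hypothetical stand-in classifier: one linear layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
    return softmax(logits)


def trigger_objective(images, mask, pattern, weights, target, lam=0.01):
    """Mean cross-entropy toward `target` plus an L1 penalty on the mask."""
    cross_entropy = 0.0
    for image in images:
        probs = toy_model(apply_trigger(image, mask, pattern), weights)
        cross_entropy += -math.log(probs[target])
    cross_entropy /= len(images)
    return cross_entropy + lam * sum(abs(m) for m in mask)


# Toy setup (assumed): class 1 responds strongly to the last pixel.
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 4.0]]
images = [[1.0, 0.5, 0.2, 0.0], [0.8, 0.1, 0.9, 0.0]]
pattern = [0.0, 0.0, 0.0, 1.0]
empty_mask = [0.0, 0.0, 0.0, 0.0]
corner_mask = [0.0, 0.0, 0.0, 1.0]

print(trigger_objective(images, empty_mask, pattern, weights, target=1))
print(trigger_objective(images, corner_mask, pattern, weights, target=1))
```

In this toy setup the one-pixel mask yields a lower objective value than the empty mask, which is the signal an optimizer would follow. A real detector would minimize the objective over the mask and pattern with gradient descent against the actual network rather than compare two hand-picked candidates.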

Key Contributions

Critique of Existing Techniques: The authors demonstrate that previous approaches, notably Neural Cleanse, are inadequate when the Trojan trigger varies substantially in size, shape, or position. They highlight the unrealistic assumption that contaminated training data is available and challenge previous metrics for defining trigger quality.
Novel Objective Function and Metric: TABOR introduces a new objective function for the Trojan detection optimization problem, drawing on insights from explainable AI. It also defines a new metric that better separates true Trojan backdoors from false positives, which in turn improves the fidelity of restored triggers.
Empirical Evaluation: In evaluations across multiple datasets and scenarios, TABOR is shown to outperform the state of the art in detection accuracy and robustness. It handles variation in trigger size, shape, and location, which previously confounded other methods.
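The abstract says an anomaly detection method is applied to the new metric to separate injected triggers from false alarms, but this summary does not say which method. As an illustration only, the sketch below uses a median absolute deviation (MAD) test, a common choice for this kind of per-label outlier check. The function name `mad_outliers`, the threshold of 2.0, the example scores, and the convention that an unusually low score is suspicious are all assumptions, not details confirmed by the source.

```python
from statistics import median


def mad_outliers(scores, threshold=2.0):
    """Return indices of scores that are anomalously low under a MAD test."""
    center = median(scores)
    mad = median(abs(s - center) for s in scores)
    if mad == 0:
        # All deviations collapse to zero, so no score can be ranked as an outlier.
        return []
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [i for i, s in enumerate(scores) if (center - s) / scale > threshold]


# Hypothetical per-label quality scores; label 3 stands out as unusually low.
label_scores = [10.2, 9.8, 10.5, 2.1, 9.9, 10.1]
print(mad_outliers(label_scores))
```

A median-based test is used here rather than a mean and standard deviation because a single extreme score, which is exactly what an infected label would produce, barely moves the median.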

Theoretical and Practical Implications

From a theoretical standpoint, the approach suggests a new direction for handling adversarial attacks: using explainable AI to shape the objective function that drives detection. This has potential implications for adversarial defense strategies beyond Trojan backdoors. Practically, TABOR gives security analysts a mechanism for inspecting AI systems for backdoors, which can increase the reliability and trustworthiness of models deployed in sensitive environments.

Future Directions

TABOR's introduction opens several avenues for future research. One potential direction is exploring how TABOR’s methodology can be adapted for other types of adversarial attacks. Another avenue could involve enhancing automated patching techniques that utilize high-fidelity restored triggers, thus minimizing human intervention. Furthermore, the integration of such techniques in real-world applications where dynamic and adaptive triggers might be employed by adversaries will be crucial in moving these methods from theoretical models to practical tools.

In conclusion, the paper delivers an insightful and rigorous exploration of Trojan backdoors in AI systems, providing a significant step forward with TABOR in both understanding and mitigating the risks associated with such vulnerabilities.
