Backdoor Attacks to Graph Neural Networks (2006.11165v4)

Published 19 Jun 2020 in cs.CR and cs.LG

Abstract: In this work, we propose the first backdoor attack to graph neural networks (GNN). Specifically, we propose a subgraph based backdoor attack to GNN for graph classification. In our backdoor attack, a GNN classifier predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected to the testing graph. Our empirical results on three real-world graph datasets show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs. Moreover, we generalize a randomized smoothing based certified defense to defend against our backdoor attacks. Our empirical results show that the defense is effective in some cases but ineffective in other cases, highlighting the needs of new defenses for our backdoor attacks.

Citations (187)

Summary

  • The paper introduces a backdoor attack for GNNs, employing fixed subgraph triggers to consistently mislead classification outcomes.
  • It details an Erdos-Renyi-based trigger design that achieves up to a 90% attack success rate with minimal impact on clean accuracy.
  • The study evaluates a certified defense using randomized subsampling, revealing limitations when countering larger triggers.

Backdoor Attacks to Graph Neural Networks: A Critical Examination

Graph Neural Networks (GNNs) have become indispensable for processing graph-structured data, owing to their strong performance on graph classification tasks. However, their deployment in security-sensitive domains, such as fraud detection and malware identification, makes them attractive targets for adversarial attacks. This paper introduces a new class of threat to GNNs, the backdoor attack, expanding the relatively unexplored terrain of graph-based adversarial manipulation.

Core Contributions and Methodology

The paper breaks new ground by defining and implementing backdoor attacks tailored specifically to GNNs, whereas prior research predominantly focused on adversarial examples for node classification. In contrast to such attacks, which require input-specific perturbations, the backdoor attack uses a single fixed trigger to subvert classification outcomes across arbitrarily many test instances.

Subgraph-Based Attack Mechanism

The attack injects a subgraph, termed the "trigger," into graph inputs so that a GNN trained on the poisoned data predicts an attacker-chosen target label for any test graph containing the trigger. The attack is particularly insidious because it can be executed without precise knowledge of the target GNN's architecture. The trigger is characterized by its size, density, synthesis method, and the poisoning intensity (the fraction of training graphs that are poisoned). Notably, the Erdos-Renyi (ER) model is the primary generator used to produce random subgraph triggers that remain distinctive within the graph data.
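
To make the mechanism concrete, the sketch below illustrates ER trigger generation, trigger injection, and training-set poisoning under the assumptions just described. It uses networkx; the function names (`make_er_trigger`, `inject_trigger`, `poison_training_set`) and the rewiring strategy are illustrative choices, not the authors' reference implementation.

```python
# Hypothetical sketch of a subgraph-trigger backdoor, not the paper's official code.
import random
import networkx as nx

def make_er_trigger(trigger_size: int, density: float, seed: int = 0) -> nx.Graph:
    """Generate a random Erdos-Renyi subgraph to serve as the backdoor trigger."""
    return nx.erdos_renyi_graph(trigger_size, density, seed=seed)

def inject_trigger(graph: nx.Graph, trigger: nx.Graph, rng: random.Random) -> nx.Graph:
    """Embed the trigger by rewiring a random set of nodes to match its edge pattern."""
    g = graph.copy()
    chosen = rng.sample(list(g.nodes()), k=trigger.number_of_nodes())
    # Remove existing edges among the chosen nodes...
    g.remove_edges_from([(u, v) for u in chosen for v in chosen if g.has_edge(u, v)])
    # ...then connect them according to the trigger's edges.
    for i, j in trigger.edges():
        g.add_edge(chosen[i], chosen[j])
    return g

def poison_training_set(graphs, labels, trigger, target_label, poison_frac=0.1, seed=0):
    """Attach the trigger to a fraction of training graphs and relabel them with the target class."""
    rng = random.Random(seed)
    poisoned_idx = rng.sample(range(len(graphs)), k=int(poison_frac * len(graphs)))
    graphs, labels = list(graphs), list(labels)
    for i in poisoned_idx:
        graphs[i] = inject_trigger(graphs[i], trigger, rng)
        labels[i] = target_label
    return graphs, labels
```

A GNN trained on the poisoned set behaves normally on clean graphs but predicts the target label whenever the trigger pattern is present, which is what makes the fixed-trigger design reusable across test inputs.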

The empirical evaluation spans three real-world datasets: Bitcoin transaction graphs, Twitter social networks, and scientific collaboration networks (COLLAB). Across these datasets the backdoor attacks achieve high success rates with minimal degradation of accuracy on clean data; on Twitter, for example, a well-executed attack reduces clean accuracy by only 0.03 while reaching roughly a 90% attack success rate.
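
For clarity on how those two numbers are typically computed, a hypothetical evaluation helper might look like the following. Here `model` stands in for the backdoored GNN, `clean_model` for a GNN trained on unpoisoned data, and `inject_trigger` refers to the injection sketch above; none of these names come from the paper.

```python
import random

def evaluate_backdoor(model, clean_model, test_graphs, test_labels,
                      trigger, target_label, seed=0):
    """Report clean-accuracy drop and attack success rate (illustrative helper)."""
    rng = random.Random(seed)
    n = len(test_graphs)
    # Accuracy of the backdoored model vs. a clean reference model on clean graphs.
    backdoored_acc = sum(model(g) == y for g, y in zip(test_graphs, test_labels)) / n
    clean_acc = sum(clean_model(g) == y for g, y in zip(test_graphs, test_labels)) / n
    # Attack success rate: fraction of non-target-class graphs that flip to the
    # attacker-chosen label once the trigger is injected.
    non_target = [(g, y) for g, y in zip(test_graphs, test_labels) if y != target_label]
    successes = sum(model(inject_trigger(g, trigger, rng)) == target_label
                    for g, _ in non_target)
    return {
        "clean_accuracy_drop": clean_acc - backdoored_acc,
        "attack_success_rate": successes / len(non_target),
    }
```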

Analysis and Defense Strategies

The research further explores defenses, concentrating on a certified defense based on randomized subsampling, a variant of randomized smoothing adapted to binary graph data. The technique builds a smoothed classifier whose prediction can be certified not to change under perturbations smaller than a computed threshold, so that a sufficiently small injected trigger provably cannot alter the prediction.
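
A minimal sketch of the smoothing recipe is shown below, assuming any trained GNN wrapped as `base_classifier`. The paper defines subsampling over the binary graph structure; this sketch approximates that by keeping each edge independently with a fixed probability, and it omits the certification computation that turns the vote margin into a provable bound.

```python
# Illustrative randomized-subsampling smoothing sketch, not the paper's exact procedure.
from collections import Counter
import random
import networkx as nx

def subsample_graph(graph: nx.Graph, keep_prob: float, rng: random.Random) -> nx.Graph:
    """Keep each edge independently with probability `keep_prob` (all nodes are kept)."""
    g = nx.Graph()
    g.add_nodes_from(graph.nodes(data=True))
    g.add_edges_from(e for e in graph.edges() if rng.random() < keep_prob)
    return g

def smoothed_predict(base_classifier, graph, keep_prob=0.5, n_samples=1000, seed=0):
    """Majority vote of the base classifier over randomly subsampled copies of `graph`."""
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(subsample_graph(graph, keep_prob, rng))
        for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    # The vote margin between the top label and the runner-up is what the certified
    # analysis converts into a bound on how many edge modifications the smoothed
    # prediction can provably tolerate.
    return label, count / n_samples
```

Intuitively, a small trigger is unlikely to survive the subsampling intact, so the majority vote tends to stay with the clean prediction; larger triggers survive more often, which foreshadows the limitation discussed next.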

Despite the promise of randomized subsampling, the empirical evaluations reveal scenarios of limited efficacy, particularly as the trigger size grows. On datasets such as Twitter, the certified defense could not significantly lower the attack success rate when larger trigger subgraphs were used, underscoring the need for new and more capable defenses against graph backdoor attacks.

Implications and Future Research Directions

The implications of this research affect both practical deployments and theoretical formulations. GNNs, despite their analytical power, are susceptible to structured adversarial modifications, exposing critical vulnerabilities in existing security setups. This underscores the importance of developing robust and diverse defenses that go beyond adversarial strategies designed for image data.

Future research could broaden the exploration in several directions: devising more effective detection algorithms that leverage graph-specific features, enhancing certified defenses with greater resilience to trigger variability, and extending the framework to different types of deep learning models processing graph data.

This paper serves as a call to action for the research community, prompting a comprehensive examination of GNN architectures under adversarial settings and advocating for a secure evolution of machine learning paradigms in graph-related applications. The balance between enhancing model capabilities and fortifying them against threats continues to be a pivotal axis in the advancement of artificial intelligence technologies.