Analyzing Vulnerabilities in Self-Supervised Learning: Backdoor Attacks
The paper "Backdoor Attacks on Self-Supervised Learning" by Saha et al. addresses a critical security challenge in the domain of self-supervised learning (SSL), namely, the susceptibility of popular SSL methods to backdoor attacks. As SSL gains traction as a prominent approach to learning effective visual representations from vast amounts of unlabeled data, understanding the potential security flaws becomes imperative for the deployment of these models in sensitive real-world applications.
Backdoor Attacks in SSL Context
Backdoor attacks are a form of data poisoning in which the adversary embeds a hidden trigger in the training dataset. The trigger is typically a small image patch; a model trained on the tampered data behaves normally on clean inputs but can be steered toward attacker-chosen predictions whenever the trigger is present. This behavior poses a significant risk for applications where SSL models might be integrated into safety-critical systems, such as autonomous driving.
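To make the poisoning step concrete, here is a minimal sketch of patch-based poisoning in Python. It assumes a square trigger patch pasted at a random location; the file names and patch size are illustrative placeholders rather than the authors' exact setup.

```python
# Minimal sketch of patch-based data poisoning (illustrative, not the authors' code).
# A small trigger patch is pasted into a clean image at a random location; a model
# trained on such images may associate the patch with the poisoned category.
import random
from PIL import Image

def poison_image(image: Image.Image, trigger: Image.Image) -> Image.Image:
    """Paste `trigger` at a random location inside a copy of `image`."""
    poisoned = image.copy()
    x = random.randint(0, poisoned.width - trigger.width)
    y = random.randint(0, poisoned.height - trigger.height)
    poisoned.paste(trigger, (x, y))
    return poisoned

# Example usage (file names are placeholders).
clean = Image.open("clean_example.jpg").convert("RGB")
trigger = Image.open("trigger_patch.png").convert("RGB").resize((50, 50))
poison_image(clean, trigger).save("poisoned_example.jpg")
```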
The paper systematically analyzes this vulnerability, contrasting SSL with supervised learning, where backdoor attacks are already a well-documented issue. The authors emphasize that, unlike in supervised learning, the sheer scale of unlabeled data in SSL makes detecting and purging poisoned inputs a daunting task, posing an even greater risk.
Experimental Methodology and Insights
The researchers conducted extensive experiments with popular SSL methods such as MoCo v2, BYOL, and MSF, alongside older methods like Jigsaw and RotNet. They injected triggers into a small subset (0.5%) of the ImageNet-100 training data and examined how each method was affected. The results indicate that exemplar-based SSL methods are highly vulnerable to backdoor attacks, as evidenced by the disproportionately high number of false positives on patched test data. Interestingly, Jigsaw and RotNet showed far less susceptibility to these backdoors.
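One way to quantify this vulnerability is to train a linear classifier on the frozen SSL features and count how often patched validation images are misclassified as the attacker's target class. The sketch below is a hypothetical evaluation helper, not the authors' code; `encoder`, `linear_head`, and `patched_loader` are assumed to be supplied by the caller.

```python
# Hypothetical metric: false positives for the target class on trigger-patched images.
import torch

@torch.no_grad()
def target_false_positives(encoder, linear_head, patched_loader, target_class):
    """patched_loader yields (patched_image_batch, true_label_batch)."""
    encoder.eval()
    linear_head.eval()
    fp = 0
    for images, labels in patched_loader:
        features = encoder(images)                 # frozen SSL backbone
        preds = linear_head(features).argmax(dim=1)
        # False positive: a non-target image predicted as the target class.
        fp += ((preds == target_class) & (labels != target_class)).sum().item()
    return fp
```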
For a more holistic assessment, the authors extended their experiments to different datasets and explored both targeted and untargeted attack models. They analyzed the impact of various poisoning rates and drew insights into how the inductive biases of SSL algorithms can be exploited to mount successful backdoor attacks without full knowledge of the model architecture or training parameters.
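In the targeted setting, only images of a single attacker-chosen category receive the trigger, at a small overall poisoning rate. The following sketch illustrates one way to select which images to patch; the class count, fraction, and stand-in labels are placeholders, not the paper's exact configuration.

```python
# Illustrative targeted-poisoning setup: choose which target-class images to patch.
import random

def select_poison_indices(labels, target_class, poison_fraction):
    """Pick indices of target-class images that will receive the trigger."""
    target_idx = [i for i, y in enumerate(labels) if y == target_class]
    k = int(poison_fraction * len(target_idx))
    return set(random.sample(target_idx, k))

# Example: patch half of the target-class images, which corresponds to roughly
# 0.5% of a balanced 100-class dataset (stand-in labels for illustration).
labels = [random.randrange(100) for _ in range(130_000)]
poisoned_idx = select_poison_indices(labels, target_class=3, poison_fraction=0.5)
```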
Proposed Defense Mechanism
The authors propose a defense based on knowledge distillation: a student model is trained to mimic a potentially backdoored teacher using a clean dataset, which largely neutralizes the backdoor while preserving useful features. The efficacy of this defense, however, depends on access to a sufficient amount of clean data, which could be a limitation in certain scenarios.
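A simplified sketch of this idea is shown below. It assumes a feature-matching (MSE) objective on normalized embeddings over clean, unlabeled data, rather than the paper's exact distillation recipe; since the clean data contains no trigger, the student tends not to inherit the backdoor behavior.

```python
# Simplified distillation-based defense: the student mimics the (possibly
# backdoored) teacher's features on clean, unlabeled images.
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, clean_loader, optimizer, device="cpu"):
    teacher.eval()
    student.train()
    for images, _ in clean_loader:              # labels are ignored (unlabeled data)
        images = images.to(device)
        with torch.no_grad():
            t_feat = F.normalize(teacher(images), dim=1)
        s_feat = F.normalize(student(images), dim=1)
        loss = F.mse_loss(s_feat, t_feat)       # match normalized feature vectors
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```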
Broader Implications and Future Directions
As SSL approaches the representation quality of supervised methods, its reliance on large, often publicly sourced datasets introduces security vulnerabilities that need to be addressed. This paper raises awareness of such threats and highlights the need for improved model robustness against adversarial manipulation.
The findings underscore the importance of ongoing research into more robust SSL architectures that can inherently withstand such attacks without compromising dataset scale or representation quality. Future work could explore better data inspection procedures or new algorithmic designs to mitigate these risks further, so that SSL's promise can be realized in production applications without substantial compromises on security.
In conclusion, this paper sheds light on an underexplored but crucial threat vector in SSL, prompting the community to direct research efforts toward safeguarding these models from adversarial influence.