Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks
This paper addresses the pressing issue of making machine learning classifiers resilient to data poisoning and backdoor attacks. Data poisoning attacks manipulate a classifier by altering the training dataset (e.g., inserting, deleting, or modifying examples) so that it produces incorrect outputs. Backdoor attacks likewise tamper with the training data, but aim for a classifier that behaves correctly on clean inputs while misbehaving on inputs stamped with an attacker-chosen trigger.
The authors explore classical learning algorithms, specifically k-nearest neighbors (kNN) and radius nearest neighbors (rNN), which classify a testing example by a majority vote among its neighboring training examples. They argue that this built-in voting mechanism gives kNN and rNN intrinsic certified robustness, without the additional mechanisms required by state-of-the-art certified defenses.
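To make this voting mechanism concrete, the following minimal sketch (in NumPy, with illustrative function names, Euclidean distance, and a simple fallback choice that are assumptions rather than the authors' implementation) shows how kNN and rNN predict a label by majority vote over neighboring training examples.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Predict the label of x as the majority vote among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distances from x to every training example
    neighbor_idx = np.argsort(dists)[:k]          # indices of the k closest training examples
    votes = np.bincount(y_train[neighbor_idx])    # vote count per label
    return int(np.argmax(votes))                  # label with the most votes

def rnn_predict(X_train, y_train, x, r, default_label=0):
    """Predict the label of x as the majority vote among training examples within radius r."""
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbor_idx = np.where(dists <= r)[0]        # all training examples inside the radius
    if neighbor_idx.size == 0:                    # no voters: fall back to a fixed default label
        return default_label
    votes = np.bincount(y_train[neighbor_idx])
    return int(np.argmax(votes))

# Toy usage
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.05, 0.05]), k=3))  # -> 0
print(rnn_predict(X_train, y_train, np.array([1.0, 1.0]), r=0.5))  # -> 1
```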
Their primary contribution is the theoretical establishment and empirical validation that both kNN and rNN, through their natural majority voting, provide certified robustness guarantees against both attack types. This claim is substantiated by an evaluation on the MNIST and CIFAR10 datasets, where the certified robustness of kNN and rNN surpasses that of contemporary certified defenses. For instance, when 1,000 training examples are poisoned on MNIST, rNN with an appropriate radius achieves 22.9% to 40.8% higher certified accuracy than existing methods.
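For reference, certified accuracy at a poisoning size e is typically the fraction of testing examples whose prediction is both correct and certified to remain unchanged under any attack that poisons at most e training examples. The short sketch below (illustrative, not the authors' evaluation code) computes such a curve from per-example certified sizes.

```python
import numpy as np

def certified_accuracy_curve(certified_sizes, correct, max_poison):
    """
    certified_sizes[i]: certified poisoning size for test example i (prediction provably
                        unchanged if at most that many training examples are poisoned).
    correct[i]:         whether the unattacked prediction for test example i is correct.
    Returns certified accuracy for every poisoning size e = 0, ..., max_poison.
    """
    certified_sizes = np.asarray(certified_sizes)
    correct = np.asarray(correct, dtype=bool)
    return np.array([
        np.mean(correct & (certified_sizes >= e)) for e in range(max_poison + 1)
    ])

# Toy usage: three test examples with certified sizes 5, 0, and 12
print(certified_accuracy_curve([5, 0, 12], [True, True, False], max_poison=6))
```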
Several theoretical advancements are presented in the paper:
- Derivation of intrinsic certified robustness guarantees for kNN and rNN against both data poisoning and backdoor attacks.
- Introduction of joint certification for rNN, which certifies the predicted labels of a group of testing examples simultaneously rather than certifying each example in isolation.
- Empirical evaluation of these guarantees on the MNIST and CIFAR10 datasets, showing consistent improvements in certified accuracy over prior certified defenses.
An interesting facet of the analysis lies in how it contrasts with other certifiably robust algorithms, in which a single poisoned training example can compromise multiple voters. In kNN and rNN, each manipulated training example corresponds to at most one compromised voter, so for the same voting gap between the two leading labels these models can tolerate a larger number of poisoned examples. Furthermore, rNN admits joint certification, an advantage not present in the other algorithms.
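A simplified sketch of this "one compromised voter per poisoned example" argument follows. Assuming, purely for illustration and ignoring the paper's exact theorems and tie-breaking rules, that each poisoned training example can at worst take one vote from the predicted label and give one to the runner-up, the prediction survives as long as the number of poisoned examples stays below roughly half the vote gap.

```python
import numpy as np

def illustrative_certified_size(votes):
    """
    votes: vote counts per label among a test example's neighbors
           (its k nearest neighbors for kNN, or the examples within radius r for rNN).
    Illustrative bound only: if each poisoned training example can at worst remove one
    vote from the predicted label and add one to the runner-up, the prediction is
    unchanged while the gap stays strictly positive.
    """
    votes = np.sort(np.asarray(votes))[::-1]              # vote counts in descending order
    gap = votes[0] - (votes[1] if votes.size > 1 else 0)  # gap between top two labels
    return max((gap - 1) // 2, 0)

# Toy usage: predicted label has 40 votes, runner-up 10 -> gap 30 -> tolerates 14 poisons
print(illustrative_certified_size([40, 10, 5]))  # -> 14
```

Under a defense where one poisoned example could instead corrupt several voters, the same gap would tolerate proportionally fewer poisoned examples, which is the comparison the authors draw.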
The practical implications are notable: this research is a significant step toward inherently more secure machine learning techniques, limiting the impact of training-time attacks in sensitive areas such as cybersecurity, healthcare, and autonomous systems. The theoretical insights also open pathways for analyzing the robustness of other non-parametric methods, encouraging a broader conversation on combining classical learning algorithms with modern robustness frameworks.
For future work, more refined distance metrics could be explored to further bolster the certified accuracy of kNN and rNN. Application to more complex domains, or integration with self-supervised representation learning, also offers a promising direction for maintaining certified robustness without sacrificing clean accuracy. The findings prompt a reconsideration of where kNN and rNN can be effectively deployed, with robust defenses built directly into simple yet powerful algorithms.