- The paper introduces a new bandit framework that robustly blends stochastic inputs with adversarial corruptions, achieving regret that scales linearly with corruption.
- It employs a multi-layer active arm elimination algorithm that dynamically adjusts performance without prior knowledge of the corruption level.
- Its ability to identify the optimal arm despite data manipulation offers practical insights for online systems facing adversarial interference.
Overview of "Stochastic Bandits Robust to Adversarial Corruptions"
The paper by Lykouris, Mirrokni, and Paes Leme presents a novel approach to the multi-armed bandit problem, specifically addressing challenges posed by adversarially corrupted inputs. While traditional bandit models deal with either stochastic or adversarial inputs, this work introduces a blended model where data is primarily stochastic, but some fraction can be adversarially manipulated. This scenario mirrors real-world issues such as click fraud in online advertising or manipulation in recommendation systems, where most inputs follow a pattern, but a small set is altered with malicious intent.
Core Contributions
The key contribution of this paper is the introduction of a new bandit framework that is robust to adversarial corruptions. The authors propose an algorithm, named "Multi-layer Active Arm Elimination Race," designed to handle mixed inputs and provide performance that degrades linearly with the amount of corruption C, while retaining optimal stochastic performance up to a logarithmic factor.
Main Results:
- Algorithm Performance: The presented algorithm achieves a regret bound that scales with the level of corruption C, but remains close to optimal in the purely stochastic setting.
- Robustness and Agnosticism: The algorithm is agnostic to the corruption level and adapts retrospectively to the amount of corruption actually introduced, maintaining robust performance without prior knowledge of C.
- Lower Bound on Regret: The paper also includes a theoretical lower bound showing that linear dependence on the corruption level C is unavoidable for any algorithm that retains near-optimal performance in the stochastic setting.
Technical Approach
The authors utilize a multi-layer structure, essentially running multiple bandit algorithms concurrently. Each layer operates at a different speed and robustness level:
- Fast Layers: Quickly identify suboptimal arms under purely stochastic conditions but are vulnerable to corruption.
- Slow Layers: More resilient to corruption, allowing for the identification of the best arm even when some rewards are manipulated.
This setup ensures that if corruption is present, slower layers can eventually correct misjudgments made by faster layers.
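The layered race described above can be sketched roughly as follows. This is an illustrative simplification, not the paper's exact pseudocode: the geometric layer-sampling weights, the confidence radius, and the elimination-propagation rule are assumptions chosen here to convey the mechanism.

```python
import math
import random

def multilayer_race(arms, horizon, delta=0.05):
    """Sketch of a multi-layer active arm elimination race. Layer l is
    sampled with probability proportional to 2**-l, so total corruption C
    contaminates layer l by only about C * 2**-l in expectation: layers
    with 2**l on the order of C stay essentially uncorrupted. Eliminations
    made by a slower layer are propagated to every faster layer."""
    n_layers = max(1, int(math.log2(horizon)))
    stats = [{a: [0, 0.0] for a in arms} for _ in range(n_layers)]  # [pulls, mean]
    active = [set(arms) for _ in range(n_layers)]

    def radius(layer, a):
        n = stats[layer][a][0]
        if n == 0:
            return float("inf")  # unpulled arms are never eliminated
        return math.sqrt(math.log(2 * len(arms) * horizon / delta) / (2 * n))

    weights = [2.0 ** -l for l in range(n_layers)]
    for _ in range(horizon):
        layer = random.choices(range(n_layers), weights=weights)[0]
        # round-robin within the layer: pull the least-sampled active arm
        a = min(active[layer], key=lambda x: stats[layer][x][0])
        reward = a()  # each arm is a callable returning a reward in [0, 1]
        n, m = stats[layer][a]
        stats[layer][a] = [n + 1, m + (reward - m) / (n + 1)]
        # eliminate arms whose UCB falls below this layer's best LCB
        best_lcb = max(stats[layer][x][1] - radius(layer, x) for x in active[layer])
        eliminated = {x for x in active[layer]
                      if stats[layer][x][1] + radius(layer, x) < best_lcb}
        active[layer] -= eliminated
        # propagate: a slower layer's verdict overrules all faster layers
        for faster in range(layer):
            active[faster] -= eliminated
    return active[0]
```

The geometric sampling weights are the key design choice: slower layers cost only a logarithmic factor of extra exploration overall, yet the subsampling dilutes any fixed corruption budget enough that some layer always remains trustworthy.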
Implications and Future Work
This research advances the theoretical understanding of bandit problems in environments where data is not purely stochastic due to adversarial interference. Practically, this model is significant for applications in digital advertising, recommendation systems, and online decision-making platforms that need resiliency against manipulated data.
Future work could explore actively adapting to corruption levels using real-time feedback, enabling more dynamic responses when corruption fluctuates. Additionally, examining extensions to other online learning paradigms where adversarial actions are possible could further enhance the applicability of these models.
In summary, the paper successfully addresses a critical gap in existing bandit frameworks by marrying stochastic and adversarial approaches, offering robust solutions adaptable to real-world data environments with latent adversarial elements.