- The paper establishes nearly tight mistake bounds for bandit feedback, demonstrating that randomized learners can outperform deterministic ones in adversarial online multiclass settings.
- The study shows that adaptive adversaries can inflate mistake bounds by a factor of Ω(k) relative to oblivious ones, markedly raising the difficulty of online learning.
- By extending the analysis to agnostic settings without requiring prior knowledge of the inconsistency budget, the work broadens the practical applicability of online multiclass learners.
Bandit Feedback in Online Multiclass Classification: Analyzing Variants and Trade-offs
Randomized vs. Deterministic Learners and Adaptive vs. Oblivious Adversaries
In online multiclass classification under adversarial sequences, key questions concern the performance gaps between randomized and deterministic learners, and between adaptive and oblivious adversaries, when only bandit feedback is available: after each prediction, the learner observes only whether its guess was correct, never the true label. This paper explores these dimensions and establishes nearly tight bounds that quantify the trade-offs at play.
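To fix ideas, here is a minimal Python sketch of the interaction protocol. The learner interface (predict, update) and the fixed list of examples, which models an oblivious adversary, are illustrative assumptions rather than the paper's formalism.

```python
def run_protocol(learner, examples):
    """Run one adversarial sequence and count the learner's mistakes.

    examples: (x, y) pairs fixed in advance, modeling an oblivious
    adversary; an adaptive adversary would instead choose each pair
    after observing the learner's past (randomized) behavior."""
    mistakes = 0
    for x, y in examples:
        y_hat = learner.predict(x)          # possibly randomized
        correct = (y_hat == y)
        learner.update(x, y_hat, correct)   # bandit feedback: one bit,
        mistakes += not correct             # never the true label y
    return mistakes
```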
Key Findings
- Bandit vs. Full-Information Feedback: We show that the optimal mistake bound under bandit feedback is at most O(k) times its full-information counterpart, matching previously conjectured bounds. This settles an open question and underscores the resilience of randomized strategies to adversarial inputs in the bandit feedback model.
- Adaptive vs. Oblivious Adversaries: The comparison in the bandit feedback setting is stark: oblivious adversaries have a limited effect, whereas adaptive adversaries can substantially inflate the learner's mistake bound. Specifically, we establish that an adaptive adversary can force an increase by a factor of Ω(k), underscoring the added difficulty adaptive adversaries pose.
- The Role of Randomness: Comparing randomized and deterministic learners reveals a nuanced dependence on randomness: for certain classes, randomized learners achieve significantly tighter bounds than any deterministic learner, especially against an adaptive adversary. This gap marks randomness as a key tool for blunting adversarial influence; a hedged sketch of one randomized strategy appears after this list.
- Implications in Agnostic and Realizable Settings: Extending the analysis to the agnostic setting, where a perfect hypothesis may not exist, we show that our results hold under the more general r-realizable assumption, in which the best hypothesis is inconsistent with at most r of the adversary's labels. Moreover, we introduce a method that removes the need for prior knowledge of the inconsistency budget r, further broadening the applicability of our findings; a generic doubling-style reduction in this spirit is also sketched below.
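To make the role of randomness concrete, here is a minimal Python sketch of a randomized version-space learner for a finite hypothesis class in the realizable case, compatible with the protocol sketch above. The class name, the representation of hypotheses as callables, and the elimination rule are illustrative assumptions, not the paper's construction.

```python
import random

class RandomizedVersionSpaceLearner:
    """Illustrative randomized learner under bandit feedback, assuming a
    finite hypothesis class and realizability (some hypothesis in the
    class is always correct). A sketch, not the paper's algorithm."""

    def __init__(self, hypotheses):
        # hypotheses: callables mapping an instance x to a label in {0..k-1}
        self.version_space = list(hypotheses)

    def predict(self, x):
        # Randomize uniformly over the surviving hypotheses' votes.
        self._votes = [h(x) for h in self.version_space]
        return random.choice(self._votes)

    def update(self, x, y_hat, correct):
        paired = list(zip(self.version_space, self._votes))
        if correct:
            # A correct guess reveals the true label, so keep exactly the
            # hypotheses that agree with it.
            self.version_space = [h for h, v in paired if v == y_hat]
        else:
            # A wrong guess rules out only the guessed label, so only the
            # hypotheses that voted for y_hat can be discarded.
            self.version_space = [h for h, v in paired if v != y_hat]
```

Because a wrong guess eliminates only the hypotheses that voted for that one label, roughly k - 1 mistakes may be spent where full information would spend one, which gives some intuition for the factor-of-k price of bandit feedback noted above.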
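The paper's own technique for handling an unknown r is not reproduced here; the following sketch shows the standard doubling-style guessing scheme on which such reductions are commonly built. The make_learner and bound interfaces are hypothetical.

```python
def run_with_unknown_budget(make_learner, bound, examples):
    """Generic doubling-trick sketch for an unknown inconsistency budget r.

    make_learner(r_guess) builds a learner tuned for budget r_guess, and
    bound(r_guess) is that learner's guaranteed mistake bound when the
    true budget is at most r_guess. A standard reduction shown only to
    illustrate the idea; not the paper's specific method."""
    r_guess = 1
    learner = make_learner(r_guess)
    total_mistakes = phase_mistakes = 0
    for x, y in examples:
        y_hat = learner.predict(x)
        correct = (y_hat == y)
        learner.update(x, y_hat, correct)
        total_mistakes += not correct
        phase_mistakes += not correct
        if phase_mistakes > bound(r_guess):
            # The guarantee was violated, so the true budget must exceed
            # r_guess: double the guess and restart with a fresh learner.
            r_guess *= 2
            learner = make_learner(r_guess)
            phase_mistakes = 0
    return total_mistakes
```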
Future Directions
This work opens several avenues for future investigation. One natural direction is extending the analysis to the multilabel setting, where richer dynamics may emerge when multiple correct labels are permissible. Exploring alternative feedback models, such as comparison feedback, could also shed further light on online learning under partial information.
In summary, our paper provides a clearer understanding of the trade-offs between different learners and adversaries in online multiclass classification with bandit feedback. The nearly tight bounds we present not only resolve long-standing open questions but also pave the way for more nuanced explorations of adversarial online learning.