- The paper introduces an algorithm for two-point feedback in bandit convex optimization that achieves optimal regret bounds for convex Lipschitz functions.
- It employs a novel gradient estimator that reduces variance without complex smoothing and improves performance for both smooth and non-smooth functions.
- The analysis extends to both Euclidean and non-Euclidean settings, paving the way for more efficient practical applications and further theoretical advancements.
An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
The paper by Ohad Shamir presents a noteworthy advancement in the domain of convex optimization, specifically targeting the problems of bandit convex optimization and zero-order stochastic convex optimization, where feedback is limited to two points per iteration. The central contribution is an optimal algorithm for convex Lipschitz functions, improving upon prior work that only covered smooth functions and doing so with a simpler algorithm and analysis.
Problem Framework
Bandit convex optimization is defined as a sequential game against an adversary, in which the learner can only access the losses at two chosen points per round, without knowledge of the adversary's function beforehand. The learner's objective is to minimize the average regret compared to the best fixed decision in hindsight. This setting parallels zero-order stochastic convex optimization, where direct gradient information is inaccessible, and function evaluations are restricted to two points in each round. The paper focuses on determining regret bounds for convex Lipschitz functions, addressing both Euclidean and non-Euclidean domains.
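As a minimal illustration of the performance measure described above (with illustrative names, and ignoring the bookkeeping for the two query points per round), average regret compares the learner's sequence of plays against the best fixed decision in hindsight:

```python
import numpy as np

def average_regret(loss_fns, plays, comparator):
    """Average regret of the played points against a fixed comparator:
    (1/T) * sum_t [f_t(x_t) - f_t(x*)]."""
    learner = np.mean([f(x) for f, x in zip(loss_fns, plays)])
    best = np.mean([f(comparator) for f in loss_fns])
    return learner - best
```

In the two-point setting the learner observes only the loss values at its chosen query points, never the loss functions themselves; the functions appear here only to evaluate the benchmark after the fact.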
Algorithmic Innovation
The primary innovation is a simple, effective modification of the gradient estimator that maintains optimal regret bounds even for non-smooth functions. This differs from previous approaches, such as that of \cite{dujww13}, which required additional smoothing techniques and a more involved analysis. The proposed estimator queries the function at two symmetric perturbations $x \pm \delta u$ of the current point, where $u$ is a random direction on the unit sphere, and rescales their difference; this keeps the estimator's variance controlled without complicating the theoretical analysis.
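A sketch of a two-point gradient estimator of this standard symmetric-difference form (a simplified illustration, not the paper's verbatim pseudocode):

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Samples a direction u uniformly from the unit sphere and returns
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    an unbiased estimate of the gradient of a smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
```

Because the two queries are symmetric around $x$, the difference $f(x + \delta u) - f(x - \delta u)$ is at most $2\delta L$ for an $L$-Lipschitz function, so the estimator's norm stays bounded even when $f$ is non-smooth.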
Theoretical Results
Key theoretical results are encapsulated in the performance guarantees of the proposed algorithm. For the Euclidean setting, the algorithm achieves optimality up to constant factors for both smooth and non-smooth functions, bridging the performance gap identified in previous literature. Importantly, the analysis extends to non-Euclidean settings, offering results optimal up to a logarithmic factor in certain cases, such as the $1$-norm domain.
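To make the Euclidean case concrete, the following is a hedged sketch (function and parameter names are illustrative) of projected online gradient descent driven by two-point gradient estimates; the paper's actual step-size and perturbation choices scale with the horizon $T$:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

def two_point_bandit_gd(f, d, T, eta, delta, rng):
    """Projected gradient descent using two-point gradient estimates.

    Each round queries f at x + delta*u and x - delta*u for a random
    unit direction u, forms the rescaled difference estimator, and takes
    a projected gradient step. eta and delta are tuning parameters
    (illustrative fixed values; the paper tunes them as functions of T).
    """
    x = np.zeros(d)
    iterates = []
    for _ in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g = (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
        x = project_ball(x - eta * g)
        iterates.append(x)
    return np.mean(iterates, axis=0)  # average iterate, as in the stochastic setting
```

In the non-Euclidean extension discussed in the paper, the projected gradient step would be replaced by a mirror-descent update matched to the domain's geometry (e.g., an entropic regularizer for the $1$-norm setting).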
Implications and Future Work
The conclusions drawn have significant implications for practical applications of convex optimization where gradient information is unavailable or computationally expensive. By streamlining the gradient estimation process and extending optimal guarantees to non-smooth functions, this work enables more efficient optimization in real-world systems where such constraints are prevalent.
Theoretically, the paper opens avenues for extending the framework to more complex settings, including those with multiple observations per round or larger classes of convex functions, such as strongly-convex functions. Future work might also explore high-probability bounds, further refining the robustness of the proposed approach.
In summary, Shamir's work alleviates several previous barriers in two-point feedback optimization, providing a tool that is both applicable in more general settings and beneficial for deeper theoretical exploration within the field of convex optimization.