Identify the correct logarithmic factor in the expected cumulative regret for 1D stochastic convex bandits using bisection

Determine the precise logarithmic dependence of the expected cumulative regret in the one-dimensional stochastic convex bandit setting when using the bisection-based algorithm: does the optimal expected regret scale with log log(n), log(n), or some other logarithmic factor?
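To give a feel for how far apart the candidate factors are, the short sketch below tabulates log(n) and log log(n) for a few horizons. The function name `log_factors` and the chosen values of n are illustrative assumptions, not taken from the source, and natural logarithms are assumed.

```python
import math


def log_factors(n: int) -> dict:
    """Return the two candidate logarithmic factors for a horizon n.

    Hypothetical helper for illustration; requires n > 1 so that
    log(log(n)) is defined.
    """
    log_n = math.log(n)
    return {"log": log_n, "loglog": math.log(log_n)}


# Illustrative horizons (assumed, not from the source).
for n in (10**3, 10**6, 10**9):
    f = log_factors(n)
    print(f"n={n:>10}  log(n)={f['log']:.2f}  log log(n)={f['loglog']:.2f}")
```

Even at large horizons log log(n) stays a small constant while log(n) keeps growing, so the choice between them is a real difference in the bound rather than a cosmetic one.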

Background

The chapter presents a bisection-based algorithm for one-dimensional stochastic convex bandits and compares cumulative and simple regret. Prior work (Cheshire et al.) established that the optimal simple regret scales as sqrt((log log n)/n), while the simple regret bound derived here from the cumulative regret is O(log(n)/sqrt(n)).
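The link between the two notions of regret can be made explicit with the standard averaging argument. The following is a sketch under assumptions not spelled out in this summary: f is the convex loss, x_1, ..., x_n are the points played, x_\star minimises f, \bar{x}_n is the average of the played points, and the cumulative bound is taken to be of order sqrt(n) log(n), which is the rate that corresponds to the O(log(n)/sqrt(n)) simple regret quoted here.

```latex
\mathrm{Reg}_n = \sum_{t=1}^{n} \bigl( f(x_t) - f(x_\star) \bigr),
\qquad
\bar{x}_n = \frac{1}{n} \sum_{t=1}^{n} x_t .

% Convexity of f (Jensen's inequality) gives
f(\bar{x}_n) - f(x_\star)
  \le \frac{1}{n} \sum_{t=1}^{n} \bigl( f(x_t) - f(x_\star) \bigr)
  = \frac{\mathrm{Reg}_n}{n} .

% Hence, assuming E[Reg_n] = O(\sqrt{n}\,\log n),
\mathbb{E}\bigl[ f(\bar{x}_n) - f(x_\star) \bigr]
  = O\!\left( \frac{\log n}{\sqrt{n}} \right).
```

Compared with the optimal simple regret rate, this conversion is looser by a factor of log(n)/sqrt(log log n), so it does not by itself indicate which logarithmic factor is correct for the cumulative regret.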

The author notes that, unlike simple regret where the optimal logarithmic term is known, the exact logarithmic dependence for expected cumulative regret has not been pinned down.

References

Exactly what the logarithmic dependence should be for the expected regret (cumulative rather than simple) seems to be unknown.

— Bandit Convex Optimisation (2402.06535 - Lattimore, 9 Feb 2024) in Chapter "Bisection in one dimension", Notes, item 4
