Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems (2404.06720v1)

Published 10 Apr 2024 in math.OC, cs.CC, cs.DS, cs.LG, and stat.ML

Abstract: In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta} \epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln 1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.
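
To make the upper-bound side of this tradeoff concrete, below is a minimal Python sketch of a gradient-descent-style method for the feasibility problem: query the separation oracle, step against the returned separating direction, and project back onto the unit ball. It stores only the current iterate ($\mathcal O(d)$ numbers) and, by the standard analysis, needs $\mathcal O(1/\epsilon^2)$ queries. This is an illustration under stated assumptions, not the paper's construction: the feasible set is simulated as a hidden $\epsilon$-ball, and the names `separation_oracle` and `feasibility_gd` are hypothetical.

```python
# Minimal sketch (not the paper's lower-bound construction) of the
# gradient-descent upper bound for the feasibility problem.
# Assumption for the demo: the feasible set is a hidden eps-ball inside
# the unit ball, and the oracle returns a unit normal of a halfspace
# containing the feasible set.
import numpy as np

d, eps = 50, 0.05
rng = np.random.default_rng(0)
center = rng.normal(size=d)
center *= (1 - eps) / np.linalg.norm(center)   # keep the eps-ball inside the unit ball

def separation_oracle(x):
    """Return None if x is feasible, else a unit vector g with <g, y - x> <= 0 for all feasible y."""
    if np.linalg.norm(x - center) <= eps:
        return None                              # x lies in the hidden eps-ball: feasible
    return (x - center) / np.linalg.norm(x - center)

def feasibility_gd(oracle, d, eps, max_queries=None):
    """Projected subgradient descent: O(d) memory, O(1/eps^2) oracle queries."""
    if max_queries is None:
        max_queries = int(4 / eps**2) + 1        # safe cap; ~1/eps^2 steps suffice
    x = np.zeros(d)                              # only the current iterate is stored
    for t in range(max_queries):
        g = oracle(x)
        if g is None:
            return x, t + 1                      # feasible point found after t+1 queries
        x = x - eps * g                          # step of size eps against the separating direction
        nrm = np.linalg.norm(x)
        if nrm > 1.0:
            x = x / nrm                          # project back onto the unit ball
    return None, max_queries

x_star, n_queries = feasibility_gd(separation_oracle, d, eps)
print("feasible point found:", x_star is not None, "after", n_queries, "oracle queries")
```

For contrast, a cutting-plane method maintains a much richer state (e.g., an ellipsoid or a collection of cuts, on the order of $d^2$ numbers), which is precisely the quadratic-memory regime in which the abstract's $\mathcal O(d\ln 1/\epsilon)$ query bound applies.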
