Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 94 tok/s

Gemini 2.5 Pro 54 tok/s Pro

GPT-5 Medium 31 tok/s Pro

GPT-5 High 38 tok/s Pro

GPT-4o 102 tok/s Pro

Kimi K2 207 tok/s Pro

GPT OSS 120B 467 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling (2411.17567v1)

Published 26 Nov 2024 in math.ST, cs.LG, cs.NE, and stat.TH

Abstract: Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative of gradient descent as it can be computed without backward pass. Considering the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, by a factor $d$ slower than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$ and thus the suboptimality of the rate disappears if $\ell \gtrsim d.$ We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.