Structured Sparsity Inducing Adaptive Optimizers for Deep Learning (2102.03869v2)

Published 7 Feb 2021 in cs.LG and math.OC

Abstract: The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance. To prune out unimportant groups of parameters, we can add a non-differentiable penalty to the objective function and minimize it using proximal gradient methods. In this paper, we derive the weighted proximal operators, a necessary component of these proximal methods, of two structured sparsity-inducing penalties. These operators can be approximated efficiently with a numerical solver, and we prove that, despite this approximation, existing convergence guarantees are preserved when the operators are integrated into a generic adaptive proximal method. Finally, we show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structured sparsity patterns on representative examples from computer vision and natural language processing.
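
For intuition about how a proximal step induces structured sparsity, consider the simpler unweighted group lasso, whose proximal operator has a well-known closed form (block soft-thresholding): each parameter group is either zeroed out entirely or shrunk toward the origin. The sketch below is illustrative only and covers this simple unweighted case with a fixed step size; the paper's weighted operators, which account for the diagonal metric of an adaptive method, generally require the numerical solver mentioned in the abstract. The names group_soft_threshold and proximal_gradient_step are hypothetical, not from the paper.

```python
import numpy as np

def group_soft_threshold(x, lam, step):
    """Closed-form prox of the unweighted group-lasso penalty lam * ||x||_2
    for a single parameter group: if the group's norm falls below
    step * lam, the whole group is zeroed out; otherwise it is shrunk."""
    norm = np.linalg.norm(x)
    if norm <= step * lam:
        return np.zeros_like(x)
    return (1.0 - step * lam / norm) * x

def proximal_gradient_step(groups, grads, lam, step):
    """One proximal gradient step: a gradient step on the smooth loss,
    followed by the group-lasso prox applied group by group."""
    return [group_soft_threshold(w - step * g, lam, step)
            for w, g in zip(groups, grads)]

# Toy usage: two groups, one small and one large.
w = [np.array([0.3, -0.2]), np.array([2.0, 1.0])]
g = [np.array([0.1, 0.0]), np.array([-0.5, 0.2])]
w_new = proximal_gradient_step(w, g, lam=4.0, step=0.1)
# The small first group is pruned to exactly zero; the larger
# second group is merely shrunk toward the origin.
print(w_new)
```

Pruning a whole group to exactly zero, rather than individual weights, is what makes the resulting sparsity structured, e.g., removing an entire neuron or convolutional filter at once.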

Citations (18)
