Online Hyperparameter Meta-Learning with Hypergradient Distillation (2110.02508v2)

Published 6 Oct 2021 in cs.LG

Abstract: Many gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization, which can be considered hyperparameters. Although such hyperparameters can be optimized using existing gradient-based hyperparameter optimization (HO) methods, these methods suffer from the following issues: unrolled differentiation does not scale well to high-dimensional hyperparameters or long horizons, Implicit Function Theorem (IFT) based methods are restrictive for online optimization, and short-horizon approximations suffer from short-horizon bias. In this work, we propose a novel HO method that overcomes these limitations by approximating the second-order term with knowledge distillation. Specifically, we parameterize a single Jacobian-vector product (JVP) for each HO step and minimize its distance from the true second-order term. Our method allows online optimization and is also scalable to the hyperparameter dimension and the horizon length. We demonstrate the effectiveness of our method with two different meta-learning methods on three benchmark datasets.
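
To make the recipe described in the abstract concrete, here is a minimal, self-contained sketch in JAX on a toy ridge-regression problem. It illustrates only the general structure: the hypergradient of the validation loss with respect to a hyperparameter contains a second-order term (a Jacobian-vector product through the inner update), a small parametric surrogate is regressed onto that term (the distillation step), and the surrogate is then used for the online hyperparameter update. Every name here (`surrogate`, `distill_loss`, the linear form of the student, the toy losses) is an illustrative assumption rather than the authors' implementation, and in this toy the exact second-order term is still computed at every step purely to serve as the distillation target.

```python
# Hedged sketch of "hypergradient distillation"-style online HO on a toy problem.
# Not the paper's implementation; all names and the toy setup are assumptions.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
X = jax.random.normal(k1, (64, 5))
y = X @ jnp.ones(5) + 0.1 * jax.random.normal(k2, (64,))
X_tr, y_tr = X[:32], y[:32]      # inner-loop (training) split
X_val, y_val = X[32:], y[32:]    # outer-loop (validation) split

alpha = 0.01  # inner-loop SGD step size

def train_loss(theta, lam):
    # lam is a log regularization strength, i.e. the hyperparameter (toy choice)
    resid = X_tr @ theta - y_tr
    return jnp.mean(resid ** 2) + jnp.exp(lam) * jnp.sum(theta ** 2)

def val_loss(theta):
    resid = X_val @ theta - y_val
    return jnp.mean(resid ** 2)

def inner_step(theta, lam):
    # one step of the inner optimization
    return theta - alpha * jax.grad(train_loss)(theta, lam)

def second_order_term(theta, lam):
    # "Teacher": exact hypergradient through one unrolled inner step,
    # d/d lam of val_loss(inner_step(theta, lam)). Since val_loss has no direct
    # dependence on lam, this equals the second-order (JVP) part here.
    return jax.grad(lambda l: val_loss(inner_step(theta, l)))(lam)

def surrogate(phi, theta, lam):
    # "Student": a tiny parametric map standing in for the second-order term
    w, b = phi
    return jnp.dot(w, theta) + b * lam

def distill_loss(phi, theta, lam, target):
    # squared distance between the surrogate and the true second-order term
    return (surrogate(phi, theta, lam) - target) ** 2

phi = (jnp.zeros(5), jnp.array(0.0))
theta, lam = jnp.zeros(5), jnp.array(0.0)
eta_phi, eta_lam = 0.1, 0.05

for _ in range(200):
    target = second_order_term(theta, lam)                   # teacher signal
    g_phi = jax.grad(distill_loss)(phi, theta, lam, target)  # distillation step
    phi = jax.tree_util.tree_map(lambda p, g: p - eta_phi * g, phi, g_phi)
    lam = lam - eta_lam * surrogate(phi, theta, lam)          # online HO update
    theta = inner_step(theta, lam)                            # online inner update

print("learned log-lambda:", float(lam), " val loss:", float(val_loss(theta)))
```

The point of the sketch is only to show where a distilled JVP surrogate plugs into an online HO loop; the benefit claimed in the abstract, scaling with the hyperparameter dimension and the horizon length, is not exercised by a one-dimensional toy like this.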

Authors (6)
  1. Hae Beom Lee (19 papers)
  2. Hayeon Lee (14 papers)
  3. Eunho Yang (89 papers)
  4. Timothy Hospedales (101 papers)
  5. Sung Ju Hwang (178 papers)
  6. JaeWoong Shin (6 papers)
Citations (1)
