Online Hyperparameter Meta-Learning with Hypergradient Distillation (2110.02508v2)

Published 6 Oct 2021 in cs.LG

Abstract: Many gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization, which can be considered hyperparameters. Although such hyperparameters can be optimized using existing gradient-based hyperparameter optimization (HO) methods, these methods suffer from the following issues: unrolled differentiation does not scale well to high-dimensional hyperparameters or long horizons, Implicit Function Theorem (IFT) based methods are restrictive for online optimization, and short-horizon approximations suffer from short-horizon bias. In this work, we propose a novel HO method that overcomes these limitations by approximating the second-order term with knowledge distillation. Specifically, we parameterize a single Jacobian-vector product (JVP) for each HO step and minimize its distance from the true second-order term. Our method allows online optimization and is also scalable to the hyperparameter dimension and the horizon length. We demonstrate the effectiveness of our method with two different meta-learning methods on three benchmark datasets.
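
To make the recipe described in the abstract concrete, here is a minimal, self-contained sketch in JAX on a toy ridge-regression problem. It illustrates only the general structure: the hypergradient of the validation loss with respect to a hyperparameter contains a second-order term (a Jacobian-vector product through the inner update), a small parametric surrogate is regressed onto that term (the distillation step), and the surrogate is then used for the online hyperparameter update. Every name here (`surrogate`, `distill_loss`, the linear form of the student, the toy losses) is an illustrative assumption rather than the authors' implementation, and in this toy the exact second-order term is still computed at every step purely to serve as the distillation target.

```python
# Hedged sketch of "hypergradient distillation"-style online HO on a toy problem.
# Not the paper's implementation; all names and the toy setup are assumptions.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
X = jax.random.normal(k1, (64, 5))
y = X @ jnp.ones(5) + 0.1 * jax.random.normal(k2, (64,))
X_tr, y_tr = X[:32], y[:32]      # inner-loop (training) split
X_val, y_val = X[32:], y[32:]    # outer-loop (validation) split

alpha = 0.01  # inner-loop SGD step size

def train_loss(theta, lam):
    # lam is a log regularization strength, i.e. the hyperparameter (toy choice)
    resid = X_tr @ theta - y_tr
    return jnp.mean(resid ** 2) + jnp.exp(lam) * jnp.sum(theta ** 2)

def val_loss(theta):
    resid = X_val @ theta - y_val
    return jnp.mean(resid ** 2)

def inner_step(theta, lam):
    # one step of the inner optimization
    return theta - alpha * jax.grad(train_loss)(theta, lam)

def second_order_term(theta, lam):
    # "Teacher": exact hypergradient through one unrolled inner step,
    # d/d lam of val_loss(inner_step(theta, lam)). Since val_loss has no direct
    # dependence on lam, this equals the second-order (JVP) part here.
    return jax.grad(lambda l: val_loss(inner_step(theta, l)))(lam)

def surrogate(phi, theta, lam):
    # "Student": a tiny parametric map standing in for the second-order term
    w, b = phi
    return jnp.dot(w, theta) + b * lam

def distill_loss(phi, theta, lam, target):
    # squared distance between the surrogate and the true second-order term
    return (surrogate(phi, theta, lam) - target) ** 2

phi = (jnp.zeros(5), jnp.array(0.0))
theta, lam = jnp.zeros(5), jnp.array(0.0)
eta_phi, eta_lam = 0.1, 0.05

for _ in range(200):
    target = second_order_term(theta, lam)                   # teacher signal
    g_phi = jax.grad(distill_loss)(phi, theta, lam, target)  # distillation step
    phi = jax.tree_util.tree_map(lambda p, g: p - eta_phi * g, phi, g_phi)
    lam = lam - eta_lam * surrogate(phi, theta, lam)          # online HO update
    theta = inner_step(theta, lam)                            # online inner update

print("learned log-lambda:", float(lam), " val loss:", float(val_loss(theta)))
```

The point of the sketch is only to show where a distilled JVP surrogate plugs into an online HO loop; the benefit claimed in the abstract, scaling with the hyperparameter dimension and the horizon length, is not exercised by a one-dimensional toy like this.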

Authors (6)
  1. Hae Beom Lee (19 papers)
  2. Hayeon Lee (14 papers)
  3. Eunho Yang (89 papers)
  4. Timothy Hospedales (101 papers)
  5. Sung Ju Hwang (178 papers)
  6. JaeWoong Shin (6 papers)
Citations (1)
