
Hydra: Preserving Ensemble Diversity for Model Distillation (2001.04694v2)

Published 14 Jan 2020 in cs.LG and stat.ML

Abstract: Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of each ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance on classification and regression settings while capturing the uncertainty behavior of the original ensemble over both in-domain and out-of-distribution tasks.
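The abstract describes the architecture only at a high level. The sketch below is a minimal, illustrative rendering of that idea in PyTorch; the layer sizes, the class and function names, and the KL-based matching loss are assumptions made here for clarity, not the authors' actual implementation.

```python
# Minimal sketch (not the paper's code) of the multi-headed distillation idea:
# a shared body learns a joint feature representation, and one lightweight head
# per ensemble member is trained to match that member's predictive distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Hydra(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int, num_heads: int):
        super().__init__()
        # Shared body: the joint feature representation used by all heads.
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One small head per ensemble member; the modest parameter increase
        # over a single-headed distillation model comes from these heads.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.body(x)
        # Per-head logits, shape (num_heads, batch, num_classes).
        return torch.stack([head(z) for head in self.heads], dim=0)


def distillation_loss(student_logits: torch.Tensor,
                      teacher_probs: torch.Tensor) -> torch.Tensor:
    # teacher_probs holds each ensemble member's predictive distribution,
    # shape (num_heads, batch, num_classes), matched head-to-member.
    # KL(teacher || student), averaged over heads and examples.
    log_q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_q.flatten(0, 1), teacher_probs.flatten(0, 1),
                    reduction="batchmean")
```

At test time, averaging the per-head predictive distributions gives an ensemble-like prediction, while disagreement across heads serves as the distilled analogue of the original ensemble's uncertainty.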

Authors (10)
  1. Linh Tran (30 papers)
  2. Bastiaan S. Veeling (15 papers)
  3. Kevin Roth (12 papers)
  4. Joshua V. Dillon (23 papers)
  5. Jasper Snoek (42 papers)
  6. Stephan Mandt (100 papers)
  7. Tim Salimans (46 papers)
  8. Sebastian Nowozin (45 papers)
  9. Rodolphe Jenatton (41 papers)
  10. Jakub Swiatkowski (4 papers)
Citations (54)