Co-advise: Cross Inductive Bias Distillation (2106.12378v1)

Published 23 Jun 2021 in cs.CV and cs.LG

Abstract: Transformers have recently been adapted from the natural language processing community as a promising substitute for convolution-based neural networks in visual learning tasks. However, their supremacy degenerates when the amount of training data is insufficient (e.g., ImageNet). To make them practical, we propose a novel distillation-based method to train vision transformers. Unlike previous works, where only heavy convolution-based teachers are provided, we introduce lightweight teachers with different architectural inductive biases (e.g., convolution and involution) to co-advise the student transformer. The key insight is that teachers with different inductive biases attain different knowledge despite being trained on the same dataset, and such different knowledge compounds and boosts the student's performance during distillation. Equipped with this cross inductive bias distillation method, our vision transformers (termed CivT) outperform all previous transformers of the same architecture on ImageNet.
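To make the "co-advising" idea concrete, the sketch below shows one plausible way such a loss could be assembled in PyTorch: the student is supervised on the class labels and, in parallel, softly distilled toward two frozen lightweight teachers with different inductive biases. This is not the authors' released implementation; all function and argument names (e.g., `co_advise_loss`, the per-teacher distillation logits) are hypothetical placeholders.

```python
# Hedged sketch of a cross-inductive-bias distillation loss, assuming the
# student exposes separate logits for its class token and for one
# distillation token per teacher (a convolutional and an involutional one).
import torch
import torch.nn.functional as F


def co_advise_loss(student_cls_logits,         # logits from the student's class token
                   student_conv_token_logits,  # logits distilled toward the conv teacher
                   student_inv_token_logits,   # logits distilled toward the involution teacher
                   conv_teacher_logits,
                   inv_teacher_logits,
                   labels,
                   tau=3.0,
                   alpha=0.5):
    """Combine supervised cross-entropy with soft distillation from two teachers."""
    # Standard supervised loss on the class token.
    ce = F.cross_entropy(student_cls_logits, labels)

    # Temperature-scaled KL divergence between student and teacher distributions.
    def soft_kl(student_logits, teacher_logits):
        return F.kl_div(
            F.log_softmax(student_logits / tau, dim=-1),
            F.softmax(teacher_logits / tau, dim=-1),
            reduction="batchmean",
        ) * (tau * tau)

    kd_conv = soft_kl(student_conv_token_logits, conv_teacher_logits.detach())
    kd_inv = soft_kl(student_inv_token_logits, inv_teacher_logits.detach())

    # Blend the supervised term with an even split of the two teachers' advice;
    # the 0.5 weighting is an assumption, not a value taken from the paper.
    return (1 - alpha) * ce + alpha * 0.5 * (kd_conv + kd_inv)
```

In this sketch, giving each teacher its own distillation token lets the student absorb the two inductive biases separately rather than averaging them into a single target.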

Authors (7)
  1. Sucheng Ren (33 papers)
  2. Zhengqi Gao (21 papers)
  3. Tianyu Hua (9 papers)
  4. Zihui Xue (23 papers)
  5. Yonglong Tian (32 papers)
  6. Shengfeng He (72 papers)
  7. Hang Zhao (156 papers)
Citations (46)
