ViR: the Vision Reservoir (2112.13545v2)

Published 27 Dec 2021 in cs.CV and cs.LG

Abstract: The most recent year has witnessed the success of applying the Vision Transformer (ViT) to image classification. However, there is still evidence indicating that ViT often suffers from two aspects: i) the high computation and memory burden of applying multiple Transformer layers for pre-training on a large-scale dataset, and ii) over-fitting when training on small datasets from scratch. To address these problems, a novel method, namely Vision Reservoir computing (ViR), is proposed here for image classification, as a parallel to ViT. By splitting each image into a sequence of tokens of fixed length, the ViR constructs a pure reservoir with a nearly fully connected topology to replace the Transformer module in ViT. Two kinds of deep ViR models are subsequently proposed to enhance the network performance. Comparative experiments between the ViR and the ViT are carried out on several image classification benchmarks. Without any pre-training process, the ViR outperforms the ViT in terms of both model and computational complexity. Specifically, the number of parameters of the ViR is about 15%, or even as low as 5%, of that of the ViT, and the memory footprint is about 20% to 40% of the ViT's. The superiority of the ViR's performance is explained through Small-World characteristics, Lyapunov exponents, and memory capacity.
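
To make the idea concrete, the sketch below illustrates the general pattern the abstract describes: an image is split into a sequence of patch tokens, the sequence drives a fixed, nearly fully connected recurrent reservoir (echo-state style), and only a linear readout is trained. This is a minimal illustration under assumed hyperparameters (patch size, reservoir dimension, spectral radius, readout regularization), not the authors' implementation or their exact topology.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_to_tokens(img, patch=8):
    """Split an HxW grayscale image into a sequence of flattened patch tokens."""
    h, w = img.shape
    tokens = [
        img[i:i + patch, j:j + patch].ravel()
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]
    return np.stack(tokens)  # shape: (num_tokens, patch * patch)

class Reservoir:
    """Fixed, nearly fully connected recurrent reservoir (echo-state style).

    The recurrent weights are random and never trained; only their spectral
    radius is rescaled so the dynamics stay stable.
    """
    def __init__(self, in_dim, res_dim=512, spectral_radius=0.9, density=0.95):
        self.W_in = rng.uniform(-1, 1, (res_dim, in_dim))
        W = rng.uniform(-1, 1, (res_dim, res_dim))
        W *= rng.random((res_dim, res_dim)) < density          # nearly fully connected
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # rescale for stability
        self.W = W
        self.res_dim = res_dim

    def run(self, tokens):
        """Drive the reservoir with the token sequence; return the final state."""
        x = np.zeros(self.res_dim)
        for u in tokens:
            x = np.tanh(self.W_in @ u + self.W @ x)
        return x

def train_readout(states, labels, n_classes, reg=1e-3):
    """Ridge-regression readout: the only trained component in this sketch."""
    Y = np.eye(n_classes)[labels]           # one-hot targets
    X = np.asarray(states)                  # one reservoir state per image
    W_out = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)
    return W_out                            # predict with: states @ W_out -> class scores
```

Because the reservoir weights are fixed, the trainable parameters reduce to the readout matrix, which is consistent with the abstract's claim of a much smaller parameter count and memory footprint than a Transformer encoder; the specific numbers reported (15%/5% parameters, 20% to 40% memory) come from the paper's own models, not from this sketch.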

Authors (10)
  1. Xian Wei (48 papers)
  2. Bin Wang (750 papers)
  3. Mingsong Chen (53 papers)
  4. Ji Yuan (1 paper)
  5. Hai Lan (11 papers)
  6. Jiehuang Shi (1 paper)
  7. Xuan Tang (25 papers)
  8. Bo Jin (57 papers)
  9. Guozhang Chen (8 papers)
  10. Dongping Yang (2 papers)
Citations (2)