Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks (2312.12385v1)

Published 22 Nov 2023 in cs.CV and cs.LG

Abstract: Transformers have rapidly increased in popularity in recent years, achieving state-of-the-art performance in processing text, images, audio, and video. However, Transformers present large computational requirements for both training and inference, and are prone to overfitting during training. To address these challenges, we present Input Compression with Positional Consistency (ICPC), a new data augmentation method that, unlike prior augmentation techniques, simultaneously improves both generalization and training efficiency. ICPC applies varying levels of compression to each training sample in each epoch. This leads to smaller input sequences being processed by the Transformer, and hence faster training, while also alleviating overfitting by presenting each input with different compression levels. We introduce a consistency-aware position selection method in ICPC that enables accurate processing of compressed inputs without any changes to the underlying Transformer architecture. We detail compression-based augmentation methods for four different modalities: insignificant word pruning for text, resolution modulation for images, spatio-temporal resolution modulation for videos, and spectrogram size modulation for audio. ICPC also enables efficient variable-effort inference, where samples are first inferred at high compression levels, and progressively re-evaluated with lower compression for more challenging inputs. On 9 diverse tasks spanning 4 different modalities, ICPC improves accuracy by up to 1%, while also accelerating training and inference by up to 2.9X and 2.6X, respectively. Code is available at https://github.com/amrnag/ICPC.
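The two mechanisms the abstract describes, compression-as-augmentation during training and variable-effort inference, are concrete enough to illustrate with a short sketch. The following is a minimal, hypothetical PyTorch version for the image modality, not the authors' implementation: it assumes a classifier that can consume variable input resolutions (which ICPC's consistency-aware position selection is designed to enable), and the compression scales, confidence threshold, and helper names (`compress`, `icpc_augment`, `variable_effort_infer`) are illustrative assumptions rather than names taken from the paper's code.

```python
import random

import torch
import torch.nn.functional as F

# Illustrative assumptions: fractions of the original resolution to try,
# and a confidence cutoff for accepting a prediction early.
COMPRESSION_SCALES = [0.25, 0.5, 1.0]
CONF_THRESHOLD = 0.9


def compress(images: torch.Tensor, scale: float) -> torch.Tensor:
    """Resolution modulation for images: downsample a NCHW batch by `scale`."""
    if scale == 1.0:
        return images
    return F.interpolate(images, scale_factor=scale, mode="bilinear",
                         align_corners=False)


def icpc_augment(images: torch.Tensor, rng: random.Random) -> torch.Tensor:
    """Training-time augmentation: draw a fresh compression level for the
    batch each epoch, so the model sees inputs at varying resolutions."""
    return compress(images, rng.choice(COMPRESSION_SCALES))


@torch.no_grad()
def variable_effort_infer(model, images: torch.Tensor) -> torch.Tensor:
    """Infer at high compression first; re-evaluate only low-confidence
    samples at progressively lower compression (higher resolution)."""
    n = images.size(0)
    preds = torch.full((n,), -1, dtype=torch.long)
    pending = torch.arange(n)  # indices still awaiting a confident prediction
    for scale in COMPRESSION_SCALES:
        if pending.numel() == 0:
            break
        logits = model(compress(images[pending], scale))
        conf, labels = F.softmax(logits, dim=-1).max(dim=-1)
        done = conf >= CONF_THRESHOLD
        if scale == COMPRESSION_SCALES[-1]:
            done.fill_(True)  # accept everything at full resolution
        preds[pending[done]] = labels[done]
        pending = pending[~done]
    return preds
```

The savings come from the loop structure: easy samples exit at 0.25x resolution and never pay the full-resolution cost, which is consistent with the up-to-2.6X inference speedup the abstract reports, though the exact scales and exit criterion used in the paper may differ.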

