SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling (2311.15134v1)

Published 25 Nov 2023 in cs.LG and cs.AI

Abstract: In this paper, we present SwiftLearn, a data-efficient approach to accelerating the training of deep learning models using a subset of data samples selected during the warm-up stages of training. This subset is selected based on an importance criterion measured over the entire dataset during the warm-up stages, aiming to preserve model performance with fewer examples during the rest of training. The proposed importance measure can be updated periodically during training, so that any data sample has a chance to return to the training loop if it shows higher importance. The model architecture is unchanged, but since the number of data samples controls the number of forward and backward passes during training, training time can be reduced by reducing the number of samples used in each epoch. Experimental results on a variety of CV and NLP models, during both pretraining and finetuning, show that model performance can be preserved while achieving a significant training speed-up. In particular, finetuning BERT on the GLUE benchmark shows that almost 90% of the data can be dropped, achieving an end-to-end average speedup of 3.36x while keeping the average accuracy drop below 0.92%.
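
The abstract does not spell out the importance criterion or the refresh schedule, so the following is only a minimal PyTorch-style sketch of the general idea it describes: warm up on the full dataset, score every sample, keep the highest-scoring fraction, and periodically re-score so that dropped samples can re-enter training. Using the per-sample loss as the importance score, and the helper names (`score_samples`, `train_with_importance_subset`), are assumptions made for illustration, not the paper's actual method.

```python
import torch
from torch.utils.data import DataLoader, Subset


def score_samples(model, dataset, loss_fn, device="cpu", batch_size=256):
    """Compute a per-sample importance score over the entire dataset.

    Assumption: the per-sample loss serves as the importance score;
    loss_fn must be constructed with reduction="none".
    """
    model.eval()
    scores = []
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            scores.append(loss_fn(model(x), y).cpu())
    return torch.cat(scores)


def train_with_importance_subset(model, dataset, loss_fn, optimizer,
                                 warmup_epochs=2, total_epochs=10,
                                 keep_fraction=0.1, refresh_every=3,
                                 device="cpu", batch_size=64):
    """Warm up on all data, then train on the most important subset,
    periodically re-scoring the full dataset so dropped samples can return."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    for epoch in range(total_epochs):
        # After warm-up, and every `refresh_every` epochs thereafter,
        # re-score the whole dataset and keep the top fraction.
        if epoch >= warmup_epochs and (epoch - warmup_epochs) % refresh_every == 0:
            scores = score_samples(model, dataset, loss_fn, device)
            k = max(1, int(keep_fraction * len(dataset)))
            keep_idx = torch.topk(scores, k).indices.tolist()
            loader = DataLoader(Subset(dataset, keep_idx),
                                batch_size=batch_size, shuffle=True)

        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y).mean()  # mean over the per-sample losses
            loss.backward()
            optimizer.step()
```

Re-scoring the full dataset costs extra forward passes, but, as the abstract notes, it is what allows previously dropped samples to rejoin training when their importance rises.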
