HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models (2405.10299v3)

Published 16 May 2024 in cs.LG and cs.AI

Abstract: The increasing size of LLMs necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that utilizes surrogate predictions to approximate various hardware metrics across 13 devices for architectures in the GPT-2 family, with architectures containing up to 1.55B parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.
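The abstract describes simulating multi-objective optimization trajectories against surrogate predictors instead of training and measuring every architecture on real hardware. The sketch below illustrates that idea in a minimal, hypothetical form: the `Surrogate` class, the `SEARCH_SPACE` dictionary, and the placeholder scaling rules are illustrative assumptions, not the benchmark's actual API or search space.

```python
"""Minimal sketch of simulating a multi-objective architecture search
against a surrogate predictor. All names and formulas here are
hypothetical stand-ins for a benchmark like HW-GPT-Bench."""
import random

# Hypothetical GPT-2-style architecture choices (not the benchmark's real space).
SEARCH_SPACE = {
    "n_layers": [6, 12, 24, 36],
    "n_heads": [8, 12, 16],
    "embed_dim": [512, 768, 1024, 1280],
}


def sample_architecture(rng: random.Random) -> dict:
    """Draw one random configuration from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


class Surrogate:
    """Hypothetical surrogate: maps a config to (perplexity, latency_ms, energy_j)."""

    def predict(self, config: dict, device: str) -> tuple[float, float, float]:
        # Placeholder scaling rule; a real surrogate would be a trained predictor
        # with calibrated, device-specific uncertainty estimates.
        size = config["n_layers"] * config["embed_dim"]
        perplexity = 40.0 / (1.0 + size / 10_000)
        latency_ms = 0.002 * size
        energy_j = 0.0005 * size
        return perplexity, latency_ms, energy_j


def pareto_front(points: list[tuple]) -> list[tuple]:
    """Keep points not dominated in (perplexity, latency); lower is better on both."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    rng = random.Random(0)
    surrogate = Surrogate()
    evaluated = []
    for _ in range(200):  # random search; surrogate calls make this near-instant
        cfg = sample_architecture(rng)
        ppl, lat, _ = surrogate.predict(cfg, device="a100")
        evaluated.append((ppl, lat, tuple(sorted(cfg.items()))))
    print(f"Pareto-optimal configs: {len(pareto_front(evaluated))} of {len(evaluated)}")
```

In the actual benchmark, the surrogate would be a trained predictor with calibrated uncertainty for each device, and the random-search loop could be swapped for NSGA-II, Bayesian optimization, or any other multi-objective algorithm; the overall structure of the simulated trajectory stays the same.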

Authors (7)
  1. Rhea Sanjay Sukthanker (8 papers)
  2. Arber Zela (22 papers)
  3. Benedikt Staffler (7 papers)
  4. Frank Hutter (177 papers)
  5. Aaron Klein (24 papers)
  6. Lennart Purucker (15 papers)
  7. Joerg K. H. Franke (1 paper)
Citations (3)

