The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Abstract: It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require significantly more time and compute during inference. This raises the question: what happens when both models operate under the same budget (e.g., compute or run-time)? To address this question, we analyze code-generation LLMs of various sizes and make comparisons such as running a 70B model once vs. generating five outputs from a 13B model. We consider a standard unit-test setup, in which unit tests are used to select a correct output from among the smaller model's candidates. Our findings reveal that the repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. Conversely, in scenarios where unit tests are unavailable, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from the larger one. Our results highlight both the potential of using smaller models instead of larger ones and the importance of studying approaches for ranking LLM outputs.