EffiBench: Benchmarking the Efficiency of Automatically Generated Code (2402.02037v5)
Abstract: Code generation models have become increasingly integral to software development. Although current research has thoroughly examined the correctness of the code these models produce, a vital aspect, the efficiency of the generated code, plays a pivotal role in green computing and sustainability efforts yet has often been neglected. This paper presents EffiBench, a benchmark of 1,000 efficiency-critical coding problems for assessing the efficiency of code generated by code generation models. EffiBench contains a diverse set of LeetCode coding problems, each paired with an executable human-written canonical solution that achieves state-of-the-art (SOTA) efficiency on the LeetCode solution leaderboard. With EffiBench, we empirically examine the ability of 42 LLMs (35 open-source and 7 closed-source) to generate efficient code. Our evaluation results demonstrate that code generated by LLMs is generally less efficient than the human-written canonical solutions. For example, GPT-4-generated code requires on average **3.12** times the execution time of the human-written canonical solutions. In the most extreme cases, the execution time and total memory usage of GPT-4-generated code are **13.89** and **43.92** times those of the canonical solutions. The source code of EffiBench is released at https://github.com/huangd1999/EffiBench. We also provide a leaderboard at https://huggingface.co/spaces/EffiBench/effibench-leaderboard.
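The core comparison the benchmark performs, profiling a generated solution and its canonical counterpart on identical inputs and reporting time and memory ratios, can be illustrated with a small Python sketch. This is a minimal, hypothetical stand-in rather than EffiBench's actual harness: the names `profile_solution` and `efficiency_ratio` are illustrative, and it times code with `time.perf_counter` and tracks peak memory with `tracemalloc`.

```python
import time
import tracemalloc

def profile_solution(solution_fn, test_inputs):
    """Return (total execution time in seconds, peak memory in bytes)
    for running `solution_fn` over every input in `test_inputs`.
    Hypothetical sketch, not EffiBench's actual API."""
    tracemalloc.start()
    start = time.perf_counter()
    for args in test_inputs:
        solution_fn(*args)
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

def efficiency_ratio(generated_fn, canonical_fn, test_inputs):
    """Profile both solutions on the same inputs and return
    (time ratio, memory ratio); a value above 1 means the
    generated code is less efficient than the canonical one."""
    gen_time, gen_mem = profile_solution(generated_fn, test_inputs)
    can_time, can_mem = profile_solution(canonical_fn, test_inputs)
    return gen_time / can_time, gen_mem / can_mem

# Example: two implementations of LeetCode's "Two Sum".
def two_sum_naive(nums, target):   # O(n^2): what a model might emit
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

def two_sum_hash(nums, target):    # O(n): the efficient canonical style
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

inputs = [(list(range(5_000)), 9_997)]
print(efficiency_ratio(two_sum_naive, two_sum_hash, inputs))
```

A production harness would additionally run each solution in an isolated subprocess with timeouts and repeated trials to reduce measurement noise; this sketch keeps everything in-process for brevity.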
- LeetCode. https://leetcode.com/. Accessed: January 31, 2024.
- Unified pre-training for program understanding and generation. ArXiv, abs/2103.06333, 2021. URL https://api.semanticscholar.org/CorpusID:232185260.
- Program synthesis with large language models. ArXiv, abs/2108.07732, 2021. URL https://api.semanticscholar.org/CorpusID:237142385.
- Hiring is broken: What do developers say about technical interviews? In 2019 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 1–9. IEEE, 2019.
- Bell, B. A. Understanding the Preparation Phase of Technical Interviews. PhD thesis, Virginia Tech, 2023.
- MultiPL-E: A scalable and extensible approach to benchmarking neural code generation. 2022. URL https://api.semanticscholar.org/CorpusID:254854172.
- Evaluating large language models trained on code. ArXiv, abs/2107.03374, 2021. URL https://api.semanticscholar.org/CorpusID:235755472.
- PaLM: Scaling language modeling with pathways. ArXiv, abs/2204.02311, 2022. URL https://api.semanticscholar.org/CorpusID:247951931.
- PanGu-Coder: Program synthesis with function-level language modeling. ArXiv, abs/2207.11280, 2022. URL https://api.semanticscholar.org/CorpusID:251040785.
- CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1536–1547, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.139. URL https://aclanthology.org/2020.findings-emnlp.139.
- InCoder: A generative model for code infilling and synthesis. ArXiv, abs/2204.05999, 2022. URL https://api.semanticscholar.org/CorpusID:248157108.
- Gemini Team, Google. Gemini: A family of highly capable multimodal models. ArXiv, abs/2312.11805, 2023. URL https://api.semanticscholar.org/CorpusID:266361876.
- GraphCodeBERT: Pre-training code representations with data flow. ArXiv, abs/2009.08366, 2020.
- AixBench: A code generation benchmark dataset. ArXiv, abs/2206.13179, 2022. URL https://api.semanticscholar.org/CorpusID:250072468.
- Harper, J. Interview insight: How to get the job. In A Software Engineer’s Guide to Seniority: A Guide to Technical Leadership, pp. 19–28. Springer, 2022.
- Measuring coding challenge competence with APPS. NeurIPS, 2021.
- Mistral 7B. ArXiv, abs/2310.06825, 2023. URL https://api.semanticscholar.org/CorpusID:263830494.
- DS-1000: A natural and reliable benchmark for data science code generation. ArXiv, abs/2211.11501, 2022. URL https://api.semanticscholar.org/CorpusID:253734939.
- StarCoder: May the source be with you! ArXiv, abs/2305.06161, 2023a. URL https://api.semanticscholar.org/CorpusID:258588247.
- TACO: Topics in algorithmic code generation dataset. arXiv preprint arXiv:2312.14852, 2023b.
- Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=1qvx610Cu7.
- WizardCoder: Empowering code large language models with Evol-Instruct. ArXiv, abs/2306.08568, 2023. URL https://api.semanticscholar.org/CorpusID:259164815.
- CodeGen: An open large language model for code with multi-turn program synthesis. ICLR, 2023.
- OpenAI. GPT-4 technical report. ArXiv, abs/2303.08774, 2023. URL https://api.semanticscholar.org/CorpusID:257532815.
- Code Llama: Open foundation models for code. ArXiv, abs/2308.12950, 2023. URL https://api.semanticscholar.org/CorpusID:261100919.
- Shyamasundar, R. K. Introduction to algorithms. Resonance, 1:14–24, 1996. URL https://api.semanticscholar.org/CorpusID:123556377.
- ReCode: Robustness evaluation of code generation models. ArXiv, abs/2212.10264, 2022. URL https://api.semanticscholar.org/CorpusID:254877229.
- CodeT5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922, 2023.
- Magicoder: Source code is all you need. ArXiv, abs/2312.02120, 2023. URL https://api.semanticscholar.org/CorpusID:265609970.
- CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. ArXiv, abs/2302.00288, 2023. URL https://api.semanticscholar.org/CorpusID:256459413.
- CERT: Continual pre-training on sketches for library-oriented code generation. In International Joint Conference on Artificial Intelligence (IJCAI), 2022.
- CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. ArXiv, abs/2303.17568, 2023. URL https://api.semanticscholar.org/CorpusID:257834177.
Authors: Dong Huang, Jie M. Zhang, Yuhao Qing, Heming Cui, Weiyi Shang