
Escalating LLM-based Code Translation Benchmarking into the Class-level Era (2411.06145v2)

Published 9 Nov 2024 in cs.SE

Abstract: In recent years, LLMs have significantly improved automated code translation, often achieving over 80% accuracy on existing benchmarks. However, most of these benchmarks consist of short, standalone, algorithmic samples that do not reflect practical coding tasks. To address this gap, we introduce ClassEval-T, a class-level code translation benchmark designed to assess LLM performance on real-world coding scenarios. Built upon ClassEval, a class-level Python code generation benchmark covering topics such as database operations and game design, ClassEval-T extends into Java and C++ with complete code samples and test suites, requiring 360 person-hours for manual migration. We propose three translation strategies (holistic, min-dependency, and standalone) and evaluate six recent LLMs across various families and sizes on ClassEval-T. Results reveal a significant performance drop compared to method-level benchmarks, highlighting discrepancies among LLMs and demonstrating ClassEval-T's effectiveness. We further analyze LLMs' dependency awareness when translating class samples and categorize 1,397 failure cases made by the best-performing LLM, yielding practical insights for future improvement.
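The abstract names three translation strategies (holistic, min-dependency, and standalone) without spelling out how they differ in practice. The sketch below is one plausible reading, not the authors' implementation: the data layout (`ClassSample`), the helper `build_prompts`, and the prompt wording are all assumed for illustration.

```python
# Hypothetical sketch of the three translation strategies described in the
# abstract. All names and prompt text here are assumptions for illustration,
# not the ClassEval-T authors' actual code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassSample:
    name: str                     # name of the class under translation
    source: str                   # full Python source of the class
    dependencies: Dict[str, str]  # other project classes: class name -> source
    methods: List[str]            # individual method bodies, for standalone mode


def build_prompts(sample: ClassSample, strategy: str, target_lang: str = "Java") -> List[str]:
    """Build translation prompt(s) for one class sample under a given strategy."""
    if strategy == "holistic":
        # Whole class plus every project dependency in a single prompt.
        context = "\n\n".join(list(sample.dependencies.values()) + [sample.source])
        return [f"Translate the following Python code to {target_lang}:\n\n{context}"]
    if strategy == "min-dependency":
        # Keep only the dependencies the class actually references by name.
        used = [src for name, src in sample.dependencies.items() if name in sample.source]
        context = "\n\n".join(used + [sample.source])
        return [f"Translate the following Python code to {target_lang}:\n\n{context}"]
    if strategy == "standalone":
        # One prompt per method, with no surrounding class context.
        return [f"Translate this Python method to {target_lang}:\n\n{m}" for m in sample.methods]
    raise ValueError(f"unknown strategy: {strategy!r}")
```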

Authors (11)
  1. Pengyu Xue (4 papers)
  2. Linhao Wu (4 papers)
  3. Chengyi Wang (32 papers)
  4. Xiang Li (1002 papers)
  5. Zhen Yang (160 papers)
  6. Ruikai Jin (1 paper)
  7. Yuxiang Zhang (104 papers)
  8. Jia Li (380 papers)
  9. Yifei Pei (1 paper)
  10. Zhaoyan Shen (3 papers)
  11. Xiran Lyu (1 paper)