
Escalating LLM-based Code Translation Benchmarking into the Class-level Era (2411.06145v2)

Published 9 Nov 2024 in cs.SE

Abstract: In recent years, LLMs have significantly improved automated code translation, often achieving over 80% accuracy on existing benchmarks. However, most of these benchmarks consist of short, standalone, algorithmic samples that do not reflect practical coding tasks. To address this gap, we introduce ClassEval-T, a class-level code translation benchmark designed to assess LLM performance on real-world coding scenarios. Built upon ClassEval, a class-level Python code generation benchmark covering topics such as database operations and game design, ClassEval-T extends into Java and C++ with complete code samples and test suites, requiring 360 person-hours for manual migration. We propose three translation strategies (holistic, min-dependency, and standalone) and evaluate six recent LLMs across various families and sizes on ClassEval-T. Results reveal a significant performance drop compared to method-level benchmarks, highlighting discrepancies among LLMs and demonstrating ClassEval-T's effectiveness. We further analyze LLMs' dependency awareness when translating class samples and categorize 1,397 failure cases made by the best-performing LLM, yielding practical insights for future improvement.
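The abstract names three translation strategies (holistic, min-dependency, and standalone) without spelling out how they differ in practice. The sketch below is one plausible reading, not the authors' implementation: the data layout (`ClassSample`), the helper `build_prompts`, and the prompt wording are all assumed for illustration.

```python
# Hypothetical sketch of the three translation strategies described in the
# abstract. All names and prompt text here are assumptions for illustration,
# not the ClassEval-T authors' actual code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassSample:
    name: str                     # name of the class under translation
    source: str                   # full Python source of the class
    dependencies: Dict[str, str]  # other project classes: class name -> source
    methods: List[str]            # individual method bodies, for standalone mode


def build_prompts(sample: ClassSample, strategy: str, target_lang: str = "Java") -> List[str]:
    """Build translation prompt(s) for one class sample under a given strategy."""
    if strategy == "holistic":
        # Whole class plus every project dependency in a single prompt.
        context = "\n\n".join(list(sample.dependencies.values()) + [sample.source])
        return [f"Translate the following Python code to {target_lang}:\n\n{context}"]
    if strategy == "min-dependency":
        # Keep only the dependencies the class actually references by name.
        used = [src for name, src in sample.dependencies.items() if name in sample.source]
        context = "\n\n".join(used + [sample.source])
        return [f"Translate the following Python code to {target_lang}:\n\n{context}"]
    if strategy == "standalone":
        # One prompt per method, with no surrounding class context.
        return [f"Translate this Python method to {target_lang}:\n\n{m}" for m in sample.methods]
    raise ValueError(f"unknown strategy: {strategy!r}")
```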

Authors (11)
  1. Pengyu Xue (4 papers)
  2. Linhao Wu (4 papers)
  3. Chengyi Wang (32 papers)
  4. Xiang Li (1002 papers)
  5. Zhen Yang (160 papers)
  6. Ruikai Jin (1 paper)
  7. Yuxiang Zhang (104 papers)
  8. Jia Li (380 papers)
  9. Yifei Pei (1 paper)
  10. Zhaoyan Shen (3 papers)
  11. Xiran Lyu (1 paper)