UniCoder: Scaling Code Large Language Model via Universal Code (2406.16441v1)

Published 24 Jun 2024 in cs.CL

Abstract: Intermediate reasoning or acting steps have successfully improved LLMs on various downstream NLP tasks. When applying LLMs to code generation, recent work mainly focuses on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code conditioned on the natural language or other structured intermediate steps. However, such output is not well suited to code translation or generation tasks, since standard CoT has different logical structures and forms of expression from code. In this work, we introduce universal code (UniCode) as the intermediate representation: a description of algorithm steps using a mix of programming-language conventions, such as assignment operators, conditional operators, and loops. We then collect an instruction dataset, UniCoder-Instruct, to train our model UniCoder with multi-task learning objectives. UniCoder-Instruct comprises natural-language questions, code solutions, and the corresponding universal code. The alignment between the intermediate universal-code representation and the final code solution significantly improves the quality of the generated code. Experimental results demonstrate that UniCoder with universal code outperforms previous prompting methods by a large margin, showcasing the effectiveness of the structural clues in pseudo-code.
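The abstract describes universal code only informally, so the sketch below is a hedged illustration rather than the paper's actual notation. It pairs a hypothetical UniCode-style plan, mixing the assignment, conditional, and loop conventions the abstract mentions, with the Python solution it would be aligned to; the two_sum task and the arrow-style assignment syntax are assumptions introduced purely for illustration.

```python
# Illustrative sketch only: the paper does not publish UniCode's exact grammar here,
# so the pseudo-code syntax below is an assumption, not the authors' formal notation.

QUESTION = "Return the indices of the two numbers in nums that add up to target."

# Step 1: the model first emits a universal-code plan that mixes
# programming-language conventions (assignment, conditional, loop).
UNICODE_PLAN = """
seen <- empty map
for i, x in enumerate(nums):
    need <- target - x
    if need in seen:
        return [seen[need], i]
    seen[x] <- i
"""

# Step 2: the plan is rendered into the target language almost line by line.
# This near one-to-one structural alignment between plan and code is what the
# abstract credits for the improved generation quality.
def two_sum(nums: list[int], target: int) -> list[int]:
    seen: dict[int, int] = {}
    for i, x in enumerate(nums):
        need = target - x
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return []


if __name__ == "__main__":
    print(two_sum([2, 7, 11, 15], 9))  # -> [0, 1]
```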

Authors (9)
  1. Tao Sun (143 papers)
  2. Linzheng Chai (16 papers)
  3. Jian Yang (505 papers)
  4. Yuwei Yin (21 papers)
  5. Hongcheng Guo (39 papers)
  6. Jiaheng Liu (100 papers)
  7. Bing Wang (246 papers)
  8. Liqun Yang (18 papers)
  9. Zhoujun Li (122 papers)
Citations (13)
