WizardCoder: Empowering Code Large Language Models with Evol-Instruct (2306.08568v1)

Published 14 Jun 2023 in cs.CL and cs.AI

Abstract: Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM

Introduction

The landscape of Code Large Language Models (Code LLMs) has evolved rapidly with the introduction of pre-trained models that demonstrate proficiency in coding tasks. Open-source options like StarCoder have received significant acclaim. Yet most of these models have been trained on raw code data alone, without the benefits of instruction fine-tuning. Building on recent developments in general-domain instruction tuning and the Evol-Instruct method introduced by WizardLM, this paper presents WizardCoder, an enhancement of StarCoder that integrates complex instruction fine-tuning specific to coding tasks.

Related Work

In contextualizing WizardCoder, this research builds on two primary foundations: open-source Code LLMs pre-trained on extensive code datasets, and the methodology of instruction fine-tuning explored largely in general NLP tasks. Earlier work, such as OpenAI's InstructGPT, demonstrated the value of instructions provided by human annotators. Recent contributions like Alpaca and Vicuna further explored the potential of instruction fine-tuning, albeit in the general domain. WizardLM's Evol-Instruct method distinguished itself by evolving existing instruction data, signaling its potential for application in the code domain and leading to the inception of WizardCoder.

Approach

WizardCoder employs an adapted Evol-Instruct method designed to evolve code instructions from the Code Alpaca dataset, enabling fine-tuning of StarCoder on an evolved set of code instruction-following training data. The researchers introduce evolution operations unique to the programming domain, such as requiring code debugging and imposing time-space complexity constraints, so that each evolutionary prompt increases the difficulty of the programming task. The empirical success of WizardCoder across benchmarks is attributed to this code-specific approach to instruction fine-tuning; a sketch of the evolution loop appears below.
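
To make the data-evolution idea concrete, the following Python sketch shows how a seed instruction could be rewritten into harder variants over several rounds. It is an illustration of the procedure described above, not the paper's released code: the heuristic wordings are paraphrased from the operations the paper mentions (added constraints, debugging of erroneous code, complexity requirements), and generate stands in for whatever LLM completion call performs the rewrite.

    import random

    # Illustrative heuristics paraphrasing the code-specific evolution
    # operations described in the paper; the exact prompt wording is assumed.
    CODE_EVOLUTION_HEURISTICS = [
        "Add new constraints or requirements to the original problem.",
        "Require an explicit time or space complexity for the solution.",
        "Provide a piece of erroneous code as a reference and ask for debugging.",
        "Replace a common requirement with a less common, more specific one.",
    ]

    def evolve_instruction(instruction, generate):
        """Rewrite one code instruction into a harder variant using a random heuristic."""
        heuristic = random.choice(CODE_EVOLUTION_HEURISTICS)
        prompt = (
            "Please increase the difficulty of the given programming question "
            f"using the following method: {heuristic}\n\n"
            f"Original question:\n{instruction}\n\nRewritten question:"
        )
        return generate(prompt)  # `generate` is any LLM completion function (assumed)

    def evolve_dataset(seed_instructions, generate, rounds=3):
        """Run several evolution rounds over the seed set (Code Alpaca in the paper),
        pooling each round's outputs with the originals for fine-tuning."""
        pool = list(seed_instructions)
        current = list(seed_instructions)
        for _ in range(rounds):
            current = [evolve_instruction(x, generate) for x in current]
            pool.extend(current)
        return pool

In the paper, the evolved instruction data is then used to fine-tune StarCoder; the number of evolution rounds is the quantity examined in the ablation discussed below.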

Experimentation and Results

Experiments were conducted on four code generation benchmarks: HumanEval, HumanEval+, MBPP, and DS-1000. WizardCoder outperforms all other open-source Code LLMs on these benchmarks, including its base model, StarCoder. Notably, on HumanEval and HumanEval+ it surpasses even the largest closed-source LLMs, Anthropic's Claude and Google's Bard, a remarkable result for an open-source model of its size. The paper provides a detailed comparative analysis placing WizardCoder in the upper echelons of Code LLM performance, and an ablation study on the number of data evolution rounds offers further insight into the fine-tuning methodology. Scores on these benchmarks are reported as pass@1; a sketch of the standard estimator appears below.
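
For reference, HumanEval-style benchmarks are typically scored with the unbiased pass@k estimator introduced alongside HumanEval (pass@1 in this paper). A minimal sketch, assuming n samples are generated per problem and c of them pass the unit tests:

    from math import comb

    def pass_at_k(n, c, k):
        """Unbiased estimate of the probability that at least one of k samples,
        drawn from n generations of which c are correct, passes the unit tests."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # The benchmark score is the mean of pass_at_k over all problems; pass@1
    # with a single greedy sample reduces to the fraction of problems solved.
    print(pass_at_k(n=20, c=3, k=1))  # ~0.15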

Conclusion and Implications

The paper concludes with WizardCoder positioned as a state-of-the-art open-source model that advances code generation through instruction fine-tuning, successfully applying the Evol-Instruct method, previously proven in the general domain, to the specific challenges of coding tasks. Looking ahead, the researchers point to potential enhancements to WizardCoder and the need for continued improvement to meet and exceed the bar set by models like GPT-4. Reflecting on broader impact, the authors acknowledge ethical considerations paralleling those of other LLMs and emphasize the need for research toward responsible use and deployment.

References (39)
  1. Language models are few-shot learners. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  2. OpenAI. GPT-4 technical report. CoRR, abs/2303.08774, 2023.
  3. Palm: Scaling language modeling with pathways. CoRR, abs/2204.02311, 2022.
  4. Palm 2 technical report. CoRR, abs/2305.10403, 2023.
  5. Training compute-optimal large language models. CoRR, abs/2203.15556, 2022.
  6. Scaling language models: Methods, analysis & insights from training gopher. CoRR, abs/2112.11446, 2021.
  7. GLM-130B: an open bilingual pre-trained model. CoRR, abs/2210.02414, 2022.
  8. Llama: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023.
  9. OPT: open pre-trained transformer language models. CoRR, abs/2205.01068, 2022.
  10. Training language models to follow instructions with human feedback. In NeurIPS, 2022.
  11. Starcoder: may the source be with you! arXiv preprint arXiv:2305.06161, 2023.
  12. Competition-level code generation with alphacode. CoRR, abs/2203.07814, 2022.
  13. Codegen: An open large language model for code with multi-turn program synthesis. In The Eleventh International Conference on Learning Representations, 2023.
  14. Codegeex: A pre-trained model for code generation with multilingual evaluations on humaneval-x. CoRR, abs/2303.17568, 2023.
  15. Incoder: A generative model for code infilling and synthesis. CoRR, abs/2204.05999, 2022.
  16. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021.
  17. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 8696–8708. Association for Computational Linguistics, 2021.
  18. Codet5+: Open code large language models for code understanding and generation. CoRR, abs/2305.07922, 2023.
  19. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21:140:1–140:67, 2020.
  20. Finetuned language models are zero-shot learners. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  21. Scaling instruction-finetuned language models. CoRR, abs/2210.11416, 2022.
  22. Ext5: Towards extreme multi-task scaling for transfer learning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  23. Multitask prompted training enables zero-shot task generalization. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  24. Zeroprompt: Scaling prompt-based pretraining to 1, 000 tasks improves zero-shot generalization. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 4235–4252. Association for Computational Linguistics, 2022.
  25. Unifiedqa: Crossing format boundaries with a single QA system. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pages 1896–1907. Association for Computational Linguistics, 2020.
  26. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
  27. Self-instruct: Aligning language model with self generated instructions. arXiv preprint arXiv:2212.10560, 2022.
  28. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023.
  29. Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244, 2023.
  30. Sahil Chaudhary. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca, 2023.
  31. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. CoRR, abs/2305.01210, 2023.
  32. Program synthesis with large language models. CoRR, abs/2108.07732, 2021.
  33. DS-1000: A natural and reliable benchmark for data science code generation. CoRR, abs/2211.11501, 2022.
  34. Gpt-neox-20b: An open-source autoregressive language model. CoRR, abs/2204.06745, 2022.
  35. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021.
  36. Unifying language learning paradigms. CoRR, abs/2205.05131, 2022.
  37. Microsoft. Azure openai service models. https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models, 2023.
  38. Llm humaneval benchmarks. https://github.com/my-other-github-account/llm-humaneval-benchmarks, 2023.
  39. Lamda: Language models for dialog applications. CoRR, abs/2201.08239, 2022.
Authors (10)
  1. Ziyang Luo (35 papers)
  2. Can Xu (98 papers)
  3. Pu Zhao (82 papers)
  4. Qingfeng Sun (40 papers)
  5. Xiubo Geng (36 papers)
  6. Wenxiang Hu (10 papers)
  7. Chongyang Tao (61 papers)
  8. Jing Ma (136 papers)
  9. Qingwei Lin (81 papers)
  10. Daxin Jiang (138 papers)
Citations (497)