
Stable Code Technical Report

(arXiv:2404.01226)
Published Apr 1, 2024 in cs.CL

Abstract

We introduce Stable Code, the first in our new generation of code language models, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering tasks. Additionally, we introduce an instruction-tuned variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for question-answering and instruction-following tasks. In this technical report, we detail the data and training procedure leading to both models. Their weights are available via Hugging Face for anyone to download and use at https://huggingface.co/stabilityai/stable-code-3b and https://huggingface.co/stabilityai/stable-code-instruct-3b. This report contains thorough evaluations of the models, including multilingual programming benchmarks and MT-Bench, which focuses on multi-turn dialogues. At the time of its release, Stable Code is the state-of-the-art open model under 3B parameters and even performs comparably to larger models of 7 billion and 15 billion parameters on the popular Multi-PL benchmark. Stable Code Instruct also exhibits state-of-the-art performance on MT-Bench coding tasks and on Multi-PL completion compared to other instruction-tuned models. Given the models' appealingly small size, we also provide throughput measurements on a number of edge devices. In addition, we open-source several quantized checkpoints and provide their performance metrics compared to the original model.

Figure: Approach for training Stable Code 3B and Stable Code Instruct 3B in stages.

Overview

  • Stable Code sets a new state of the art for open code language models under 3B parameters, with advancements in code completion, reasoning, and software engineering tasks.

  • The report details the models' architecture, training on diverse datasets, and evaluations showcasing superior performance in multilingual programming.

  • Innovations in training methodology include a multi-stage approach and Fill in the Middle (FIM) objectives, enhancing model comprehension and code prediction capabilities.

  • Stable Code and Stable Code Instruct deliver strong results on both code and conversational tasks, pointing to more capable tooling for software development.

Exploring Stable Code: A New Benchmark in Code Language Modeling

Introduction to Stable Code

Stable Code is a compelling advancement in code language models (LMs), aimed at enhancing code completion, reasoning, mathematical problem-solving, and broader software engineering tasks. Alongside it, the report introduces Stable Code Instruct, designed for natural-language interfacing, enabling question answering and instruction following. The technical report outlines the models' training regime, datasets, and evaluations, and makes both models available to the research community through Hugging Face. Stable Code distinguishes itself by setting a new state of the art for open models under 3B parameters on multilingual programming tasks, even paralleling the performance of much larger 7B and 15B models.

Training Data and Architecture

The report details a comprehensive data sourcing and preparation strategy: a blend of code repositories, technical documents, mathematical texts, and web data, selected to build a broad understanding of software development. This mix not only sharpens the model's code comprehension but also gives it a versatile conversational ability, widening its applicability across a range of software engineering queries.

The model's architecture builds on the Stable LM 3B framework, with adjustments such as Rotary Position Embeddings (RoPE), modified LayerNorm, and refined bias configurations. These choices emphasize efficiency and inference performance, reflecting current practice in the LLM landscape.
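
As a rough illustration of the rotary-embedding component, the sketch below applies RoPE to attention inputs. It is a minimal reconstruction from the RoFormer formulation, not the Stable LM 3B implementation; details such as partial rotary application and cos/sin caching are omitted.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape
    (batch, seq_len, num_heads, head_dim). Illustrative sketch only."""
    _, seq_len, _, head_dim = x.shape
    # One inverse frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]   # interleaved channel pairs
    # Rotate each (x1, x2) pair by a position-dependent angle.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Example: rotate query vectors before computing attention scores.
q = torch.randn(1, 16, 8, 64)  # (batch, seq, heads, head_dim)
assert apply_rope(q).shape == q.shape
```

In practice, implementations often rotate only a fraction of each head's dimensions and cache the cos/sin tables across decoding steps.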

Training Methodology and Model Initialization

The report outlines a multi-stage training approach enriched with Fill in the Middle (FIM) objectives. FIM addresses a limitation of standard causal language modeling, which only ever conditions on left context, by rearranging training documents so the model learns to predict a missing middle span from both its prefix and suffix. This broadens the structural patterns the model sees and improves its code-infilling ability.
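
The following sketch shows one common way such FIM training examples are constructed. The sentinel token names and the 50% FIM rate are illustrative assumptions borrowed from StarCoder-style conventions, not the report's exact recipe.

```python
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_example(code: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rearrange a document into
    prefix/suffix/middle (PSM) order so the model learns to infill."""
    if len(code) < 2 or random.random() > fim_rate:
        return code  # keep as an ordinary left-to-right example
    i, j = sorted(random.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # The model sees prefix and suffix, then is trained to emit the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

At inference time, the same sentinels let the model complete a cursor position inside a file: the editor supplies the surrounding prefix and suffix, and the model generates the middle.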

The training section also compares models trained from scratch against models initialized from pre-trained LMs. The findings favor pre-trained initialization, pointing to useful transfer between natural language understanding and code comprehension.

Fine-Tuning and Alignment

After base-model training, Stable Code Instruct undergoes a fine-tuning regimen on a curated blend of datasets aimed at conversational interactivity and response quality. The fine-tuning follows established practice: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO).
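
For orientation, the DPO objective fits in a few lines. The sketch below is the standard DPO loss from Rafailov et al., not a reproduction of the report's training code; the beta value is a commonly used default, assumed here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: push the policy to prefer 'chosen' over
    'rejected' responses relative to a frozen reference model.
    Each *_logps tensor holds summed token log-probs of full responses."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

Because the reference model is frozen, DPO needs only the preference pairs and two forward passes per example, avoiding the reward model and RL loop of classic RLHF.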

Performance Evaluations

The benchmark results back up the models' capabilities. On code completion, Stable Code matches much larger models across a range of programming languages. On specialized tasks such as Fill in the Middle and SQL generation, the models perform strongly, showing a nuanced grasp of code context and database queries.

On instruction-based tasks, Stable Code Instruct likewise performs strongly, confirming that fine-tuning successfully adds conversational ability. Together, these evaluations position the models as competitive, and in several settings superior, alternatives among code LMs.

Throughput and Quantization Considerations

The report also covers throughput measurements and quantization, underscoring the models' practicality in real-world settings, especially on edge devices. Lowering precision yields substantial throughput gains, an important consideration for developers deploying these models across varied computing environments.
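
A simple way to run such a measurement locally is sketched below using the Hugging Face transformers API and the released stable-code-3b weights. The prompt, token budget, and half-precision setting are illustrative choices; actual numbers depend heavily on hardware, batch size, and quantization format, and older transformers versions may require trust_remote_code=True.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("def fibonacci(n: int) -> int:", return_tensors="pt").to(model.device)
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens when reporting throughput.
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```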

Conclusions and Implications

The Stable Code series marks a pivotal advancement in the code LM domain, primarily by marrying the robustness of LLMs with the specificity of software engineering tasks. The detailed account of data sourcing, training methodologies, and fine-tuning strategies underlines a comprehensive effort to develop models that are not just cutting-edge in technology but also versatile in application. The performance metrics reinforce the models' competitiveness, making them valuable assets for researchers and practitioners alike.

Looking forward, the implications of Stable Code and Stable Code Instruct extend beyond mere code completion. They promise advancements in the way we interact with and conceptualize the development of software, paving the way for models that are increasingly in tune with the multifaceted needs of developers. As the field progresses, one can anticipate further refinements and applications stemming from this groundbreaking work.
