
Stable Code Technical Report (2404.01226v1)

Published 1 Apr 2024 in cs.CL

Abstract: We introduce Stable Code, the first in our new generation of code LLMs, which serves as a general-purpose base code LLM targeting code completion, reasoning, math, and other software engineering tasks. Additionally, we introduce an instruction-tuned variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for question-answering and instruction-based tasks. In this technical report, we detail the data and training procedure leading to both models. Their weights are available via Hugging Face for anyone to download and use at https://huggingface.co/stabilityai/stable-code-3b and https://huggingface.co/stabilityai/stable-code-instruct-3b. This report contains thorough evaluations of the models, including multilingual programming benchmarks and the MT benchmark focusing on multi-turn dialogues. At the time of its release, Stable Code is the state-of-the-art open model under 3B parameters and even performs comparably to larger models of 7 billion and 15 billion parameters on the popular Multi-PL benchmark. Stable Code Instruct also exhibits state-of-the-art performance on the MT-Bench coding tasks and on Multi-PL completion compared to other instruction-tuned models. Given its appealingly small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model.


Summary

  • The paper introduces Stable Code, a new language model variant optimized for code tasks with a novel multi-stage training approach.
  • It employs a decoder-only transformer enhanced by techniques like Rotary Position Embeddings and the FIM objective to improve code completion and fill-in-the-middle tasks.
  • The model’s lightweight design enables efficient use on consumer-grade hardware, reducing latency and supporting real-time code interactions.

Stable Code Technical Report

Introduction

The "Stable Code Technical Report" introduces Stable Code and its variant Stable Code Instruct, which are part of a new generation of LLMs optimized for code-related tasks. These models aim to enhance capabilities in code completion, reasoning, and related software engineering tasks. A crucial aspect of this work is its focus on creating lightweight models capable of running efficiently on edge devices.

Model Architecture and Training

Stable Code is built on Stable LM 3B, which employs a decoder-only transformer architecture. The model incorporates architectural choices such as Rotary Position Embeddings and LayerNorm to improve training stability and performance on code-related tasks. A significant innovation in the training process is its multi-stage approach, which draws on a pre-training dataset spanning a diverse array of programming languages and technical documents (Figure 1).

Figure 1: Staged approach to training Stable Code 3B and Stable Code Instruct 3B.
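To make the positional-encoding choice concrete, here is a minimal, illustrative sketch of Rotary Position Embeddings in pure Python. It is not the model's implementation (which operates on query/key tensors inside attention); it only demonstrates the core idea: rotating consecutive feature pairs by position-dependent angles so that inner products between rotated queries and keys depend only on relative position.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary Position Embedding sketch: rotate each consecutive pair of
    features by an angle that grows with the token position. Dot products
    between vectors rotated this way depend only on relative position."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower-frequency rotation for later pairs
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

# Position 0 applies no rotation, so the vector is unchanged.
print(rope([1.0, 0.0, 1.0, 0.0], pos=0))  # → [1.0, 0.0, 1.0, 0.0]
```

The key property, which the attention mechanism exploits, is that `dot(rope(q, m), rope(k, n))` equals `dot(rope(q, m - n), k)`: shifting both positions by the same amount leaves attention scores unchanged.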

Training uses the AdamW optimizer together with a Fill-in-the-Middle (FIM) objective. FIM teaches the model to predict a missing span of code from both its preceding and following context, reflecting the non-linear way code is typically written and edited. The staged pre-training is complemented by supervised fine-tuning (SFT) and direct preference optimization (DPO) stages that further refine the model's instruction-based interactions.
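The FIM data transformation can be sketched in a few lines. This is an illustrative version: a document is split into prefix, middle, and suffix, then re-ordered as prefix-suffix-middle so the model learns to generate the middle given both sides. The sentinel strings below are placeholders; the actual special tokens are model-specific.

```python
import random

# Placeholder sentinel tokens; the real tokens depend on the tokenizer.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim(doc, rng):
    """Rewrite a document into FIM (prefix-suffix-middle) form: the model
    is trained to produce `middle` after seeing both prefix and suffix."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

sample = to_fim("def add(a, b):\n    return a + b\n", random.Random(0))
print(sample)
```

Concatenating the recovered prefix, middle, and suffix always reconstructs the original document, which is what makes the transform a free source of infilling supervision.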

Performance Evaluation

Stable Code 3B and its instruct variant have been extensively evaluated across various benchmarks:

  1. Code Completion: Stable Code demonstrates competitive performance on the Multi-PL benchmark, rivaling larger models such as Code Llama and StarCoder 15B, despite its smaller size.
  2. Fill in the Middle (FIM) Task: The models excel at FIM tasks, showing improved prediction on benchmarks such as StarCoder-FIM and demonstrating stronger understanding and completion of non-linear code contexts (Figure 2).

    Figure 2: Stable Code 3B Loss and Learning Rate Curves.

  3. Instruction Tuning: Stable Code Instruct outperforms models of similar scale on instruction-based tasks, most notably in complex multi-turn interactions such as those in MT-Bench (Figure 3).

    Figure 3: Code Performance Comparison of Stable Code 3B Scratch and Stable LM 3B Initializations.
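The DPO stage used to produce the instruct variant optimizes a simple logistic loss over preference pairs, without a separate reward model. Below is a minimal per-example sketch with hypothetical sequence log-probabilities; the real training operates on batched token-level log-probabilities from the policy and a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    Inputs are sequence log-probabilities under the policy (pi_*) and the
    frozen reference model (ref_*). The loss falls as the policy favors
    the chosen response more strongly than the reference does."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

The `beta` parameter controls how far the policy is allowed to drift from the reference while chasing the preference signal.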

Practical Implications

Stable Code's development underscores the potential for efficient, high-performance models in software engineering applications. Because it runs effectively on consumer-grade hardware, it reduces the latency and infrastructure dependencies associated with cloud-based solutions. This efficiency is critical for applications requiring real-time code completion and interaction.

Conclusion

The launch of Stable Code and its instruct version marks a strategic advancement in AI-driven code modeling. Their robust performance, especially in handling multilingual and multi-turn code tasks, presents promising avenues for extending LLM applications in software development environments. The open-source release signifies an opportunity for the broader research community to innovate upon and tailor these models for diverse and specialized code solutions. The exploration of quantized weights represents a step forward in optimizing inference speed and resource utilization, making these models highly adaptable for scalable deployment scenarios.
