SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation (2310.15539v2)

Published 24 Oct 2023 in cs.CL and cs.AI

Abstract: With the recent focus on LLMs, both StarCoder (Li et al., 2023) and Code Llama (Rozière et al., 2023) have demonstrated remarkable performance in code generation. However, code translation functionality still needs improvement, along with efficient training techniques. In response, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming-language-to-Python code translation. In particular, SteloCoder translates C++, C#, JavaScript, Java, or PHP code to Python without requiring the input programming language to be specified. We modified the StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Each expert is obtained by fine-tuning StarCoder. Specifically, we use the Low-Rank Adaptation (LoRA) technique, limiting each expert's size to only 0.06% of StarCoder's parameter count. To enhance training-time efficiency, we adopt a curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on a single 80 GB A100 HBM. In experiments on the XLCoST dataset, SteloCoder achieves an average CodeBLEU score of 73.76 on multi-programming-language-to-Python translation, surpassing the top performance on the leaderboard by at least 3.5 points. This accomplishment requires only 45M extra parameters on top of the StarCoder backbone and 32 hours of valid training on one 80 GB A100 HBM. The source code is released at: https://github.com/sade-adrien/SteloCoder.
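The core recipe described in the abstract (a frozen decoder-only backbone whose projections are augmented with five LoRA experts combined by a gating network) can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the class names (LoRAExpert, MoELoRALinear), the rank, the soft per-token gating, and the expert placement are illustrative choices, and the paper's actual routing scheme may differ.

```python
# Minimal sketch of a gated mixture of LoRA experts over a frozen base layer.
# Names and hyperparameters here are illustrative assumptions, not taken from
# the SteloCoder repository.
import torch
import torch.nn as nn


class LoRAExpert(nn.Module):
    """One LoRA adapter: computes the low-rank update (alpha / r) * (x A^T) B^T."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no update at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_features) -> (..., out_features)
        return (x @ self.lora_A.T) @ self.lora_B.T * self.scaling


class MoELoRALinear(nn.Module):
    """A frozen base linear layer plus a gated mixture of LoRA experts."""

    def __init__(self, base_linear: nn.Linear, num_experts: int = 5, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():  # the backbone stays frozen; only adapters and gate train
            p.requires_grad = False
        self.experts = nn.ModuleList(
            LoRAExpert(base_linear.in_features, base_linear.out_features, rank)
            for _ in range(num_experts)
        )
        # Gating network: one weight per expert, computed from the hidden states.
        self.gate = nn.Linear(base_linear.in_features, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_weights = torch.softmax(self.gate(x), dim=-1)              # (..., num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., out, num_experts)
        mixed = (expert_out * gate_weights.unsqueeze(-2)).sum(dim=-1)   # weighted sum of experts
        return self.base(x) + mixed


if __name__ == "__main__":
    # Five experts, mirroring the five source languages in the abstract.
    layer = MoELoRALinear(nn.Linear(512, 512), num_experts=5, rank=8)
    hidden = torch.randn(2, 16, 512)   # (batch, seq_len, hidden)
    print(layer(hidden).shape)         # torch.Size([2, 16, 512])
```

Because the base weights are frozen and each expert only adds two small low-rank matrices, the trainable parameter count stays a tiny fraction of the backbone, which is the property the abstract quantifies (about 0.06% per expert, 45M extra parameters in total).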

References (33)
  1. Unified pre-training for program understanding and generation, 2021.
  2. SantaCoder: don't reach for the stars!, 2023.
  3. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp. 41–48, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605585161. doi: 10.1145/1553374.1553380. URL https://doi.org/10.1145/1553374.1553380.
  4. Language models are few-shot learners, 2020.
  5. On the properties of neural machine translation: Encoder-decoder approaches, 2014.
  6. PaLM: Scaling language modeling with pathways, 2022.
  7. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
  8. CodeTrans: Towards cracking the language of silicon's code through self-supervised deep learning and high performance computing, 2021.
  9. Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity, 2022.
  10. CodeBERT: A pre-trained model for programming and natural languages, 2020.
  11. Parameter-efficient mixture-of-experts architecture for pre-trained language models, 2022.
  12. DEMix layers: Disentangling domains for modular language modeling, 2021.
  13. Towards a unified view of parameter-efficient transfer learning, 2022.
  14. LoRA: Low-rank adaptation of large language models, 2021.
  15. Curriculum learning and minibatch bucketing in neural machine translation. In RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning. Incoma Ltd. Shoumen, Bulgaria, Nov 2017. doi: 10.26615/978-954-452-049-6_050. URL https://doi.org/10.26615/978-954-452-049-6_050.
  16. Cobol2Vec: Learning representations of COBOL code, 2022.
  17. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, 2019.
  18. StarCoder: may the source be with you!, 2023.
  19. Prefix-Tuning: Optimizing continuous prompts for generation, 2021.
  20. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.
  21. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073135. URL https://aclanthology.org/P02-1040.
  22. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning, pp. 28492–28518. PMLR, 2023.
  23. Exploring the limits of transfer learning with a unified text-to-text transformer, 2023.
  24. CodeBLEU: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297, 2020.
  25. Code Llama: Open foundation models for code, 2023.
  26. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer, 2017.
  27. LLaMA: Open and efficient foundation language models, 2023.
  28. Attention is all you need, 2023.
  29. Self-Instruct: Aligning language models with self-generated instructions, 2023.
  30. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation, 2021.
  31. Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond, 2023.
  32. Reinforced curriculum learning on pre-trained neural machine translation models, 2020.
  33. XLCoST: A benchmark dataset for cross-lingual code intelligence. arXiv preprint arXiv:2206.08474, 2022.