StarCoder: may the source be with you! (2305.06161v2)

Published 9 May 2023 in cs.CL, cs.AI, cs.PL, and cs.SE

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of LLMs for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Introduction

The BigCode community has released StarCoder and StarCoderBase, large language models (LLMs) trained on source code. Each has 15.5B parameters and an 8K-token context length, supports infilling, and uses multi-query attention for fast large-batch inference. StarCoderBase was trained on 1 trillion tokens from The Stack, a large collection of permissively licensed GitHub repositories. StarCoder is StarCoderBase fine-tuned on 35B Python tokens. In the authors' evaluation, StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms OpenAI's code-cushman-001 model. StarCoder outperforms every model fine-tuned on Python while retaining its performance on other programming languages.
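
To make the link between multi-query attention and large-batch inference concrete, here is a rough back-of-the-envelope sketch of key/value cache size during decoding. In multi-head attention every head stores its own keys and values; in multi-query attention all query heads share a single key/value head, so the cache shrinks by a factor equal to the head count. The layer count, head count, head dimension, and batch size below are illustrative assumptions, not figures taken from this summary; only the 8K context length comes from the text.

```python
def kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and position.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_value=2 assumes 16-bit floats (an assumption).
    """
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
shape = dict(batch=32, seq_len=8192, n_layers=40, head_dim=128)
n_heads = 48

multi_head = kv_cache_bytes(n_kv_heads=n_heads, **shape)  # one K/V head per query head
multi_query = kv_cache_bytes(n_kv_heads=1, **shape)       # one K/V head shared by all

print(f"multi-head cache:  {multi_head / 2**30:.1f} GiB")
print(f"multi-query cache: {multi_query / 2**30:.1f} GiB")
print(f"reduction factor:  {multi_head // multi_query}x")
```

Under these assumed numbers the shared key/value head is what lets a large batch of long sequences fit in accelerator memory; the exact savings depend on the real model configuration.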

Model Development

The StarCoder models were developed with attention to copyright, privacy, and community involvement. To support legal compliance, the team improved its PII redaction pipeline and built an attribution tool that traces generated code back to the training data. Open access is central to the BigCode project's community-driven approach. The Stack serves as a transparent pre-training dataset: it comes with governance tools that let developers verify whether their code is included, and an opt-out process for those who want it excluded. This openness enables external audits and outside contributions to model improvements, and offers a model for open scientific collaboration.
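
This summary does not describe how the PII redaction pipeline works internally. As a deliberately simplified stand-in, and not the authors' method, the sketch below shows the general idea of redaction with two regular expressions that replace email addresses and IPv4 addresses with placeholders. The patterns, the placeholder strings, and the `redact` function name are all hypothetical.

```python
import re

# Simplified patterns for illustration only; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def redact(source: str) -> str:
    """Replace email and IPv4 addresses in source text with placeholders."""
    source = EMAIL_RE.sub("<EMAIL>", source)      # hypothetical placeholder
    return IPV4_RE.sub("<IP_ADDRESS>", source)    # hypothetical placeholder


snippet = '# contact: jane.doe@example.com\nHOST = "192.168.0.1"'
print(redact(snippet))
```

A pattern-based pass like this misses names, keys, and passwords, which is one reason a production pipeline would need a more capable detector.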

Empirical Analysis

Benchmark evaluation is the core of Code LLM assessment. The evaluation of StarCoder combines a diverse set of benchmarks covering language understanding, reasoning, and toxicity. On GSM8K, a benchmark of math word problems, StarCoderBase shows stronger reasoning than Code LLMs of similar parameter count. MMLU and CoQA results indicate its natural language understanding ability. RealToxicityPrompts helps detect potential bias and toxicity in generated text, an essential safety consideration. Taken together, the results across these benchmarks support the standing of StarCoder and StarCoderBase among current Code LLMs.
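
The abstract reports 40% pass@1 on HumanEval. As a hedged illustration of that metric, the sketch below implements the unbiased pass@k estimator commonly used with HumanEval: generate n samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples passes. The function name and the sample counts in the example are assumptions; this summary does not state how many samples the authors drew.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples generated, c: samples that passed, k: draw size.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results as (n, c) pairs.
results = [(20, 8), (20, 0), (20, 20)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"pass@1 over {len(results)} problems: {score:.3f}")
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, and the benchmark score is the mean of that fraction over all problems.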

Tools for Safe Deployment

The StarCoder models are released under an OpenRAIL-M license, which stipulates use restrictions intended to prevent misuse in critical scenarios. The license aims to improve transparency and encourage ethical use. In support of responsible deployment, the team also published tools for membership checking and a BM25 index search, which help users link model output back to the training set. These tools are early steps toward responsible AI deployment: they help curb misuse and improve accountability for model-generated code.
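
To show what a BM25 search over training data involves, here is a minimal, self-contained sketch of Okapi BM25 ranking over a toy corpus of code snippets. It is not the BigCode tool's implementation: the `BM25` class, the tokenizer, the parameter values `k1` and `b`, and the corpus are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into identifier-like tokens."""
    return re.findall(r"[a-z_]\w*", text.lower())


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / len(docs)
        doc_freq = Counter()
        for tf in self.term_freqs:
            doc_freq.update(tf.keys())
        n = len(docs)
        # Smoothed inverse document frequency, always positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5))
                    for t, f in doc_freq.items()}

    def score(self, query, doc_id):
        tf, length = self.term_freqs[doc_id], self.doc_lens[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        ranked = sorted(range(len(self.term_freqs)),
                        key=lambda i: self.score(query, i), reverse=True)
        return [(i, self.score(query, i)) for i in ranked[:top_k]]


corpus = [
    "def add(a, b): return a + b",
    "def quicksort(items): return sorted(items)",
    "class Stack: def push(self, item): self.items.append(item)",
]
index = BM25(corpus)
hits = index.search("quicksort items")
print(hits)
```

A user would feed a model generation in as the query and inspect the top-ranked training documents for close matches; BM25 ranks by lexical overlap, so it surfaces candidates rather than proving that text was copied.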

In conclusion, the BigCode community's StarCoder and StarCoderBase are a significant step toward the effective and safe application of Code LLMs. They combine open access, thorough evaluation, and tools for responsible use, and they invite continued community engagement and collaboration.

Authors (67)
  1. Raymond Li
  2. Loubna Ben Allal
  3. Yangtian Zi
  4. Niklas Muennighoff
  5. Denis Kocetkov
  6. Chenghao Mou
  7. Marc Marone
  8. Christopher Akiki
  9. Jia Li
  10. Jenny Chim
  11. Qian Liu
  12. Evgenii Zheltonozhskii
  13. Terry Yue Zhuo
  14. Thomas Wang
  15. Olivier Dehaene
  16. Mishig Davaadorj
  17. Joel Lamy-Poirier
  18. Oleh Shliazhko
  19. Nicolas Gontier
  20. Nicholas Meade