
Scaling Granite Code Models to 128K Context (2407.13739v1)

Published 18 Jul 2024 in cs.AI, cs.CL, and cs.SE

Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our approach to scaling the context length of the Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
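
To illustrate why raising the RoPE base frequency helps with longer contexts, here is a minimal sketch using the standard RoPE frequency formula. It is not the authors' training code. The head dimension and the list of base values are hypothetical choices for illustration; the abstract does not state the values actually used.

```python
import math


def rope_inv_freq(head_dim, base):
    """Standard RoPE rotation frequencies, one per pair of dimensions:
    base ** (-2i / head_dim) for i in 0 .. head_dim/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, base):
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, base)[-1]


if __name__ == "__main__":
    head_dim = 128  # hypothetical head dimension, not taken from the paper
    # Hypothetical schedule of increasing base values, for illustration only.
    for base in (10_000, 100_000, 1_000_000, 10_000_000):
        tokens = longest_wavelength(head_dim, base)
        print(f"base={base:>10,}  longest wavelength ~ {tokens:,.0f} tokens")
```

A larger base slows the rotation of the low-frequency dimensions, so their wavelengths cover more token positions before repeating. Increasing the base gradually, as the abstract describes, would let the model adapt to each longer range in steps during continual pretraining.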

Authors (22)
  1. Matt Stallone (4 papers)
  2. Vaibhav Saxena (11 papers)
  3. Leonid Karlinsky (79 papers)
  4. Bridget McGinn (1 paper)
  5. Tim Bula (1 paper)
  6. Mayank Mishra (38 papers)
  7. Adriana Meza Soria (5 papers)
  8. Gaoyuan Zhang (18 papers)
  9. Aditya Prasad (10 papers)
  10. Yikang Shen (62 papers)
  11. Saptha Surendran (3 papers)
  12. Shanmukha Guttula (3 papers)
  13. Hima Patel (18 papers)
  14. Parameswaran Selvam (2 papers)
  15. Xuan-Hong Dang (11 papers)
  16. Yan Koyfman (4 papers)
  17. Atin Sood (4 papers)
  18. Rogerio Feris (105 papers)
  19. Nirmit Desai (11 papers)
  20. David D. Cox (12 papers)
Citations (3)
