Scaling Granite Code Models to 128K Context (2407.13739v1)
Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of the Granite 3B/8B code models from 2K/4K to 128K tokens consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
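To make the RoPE base-frequency scaling concrete, below is a minimal sketch of how increasing the rotary base (theta) stretches the rotation periods so that positions far apart in a 128K-token window remain distinguishable. The specific base values and the 128-dim head size here are illustrative assumptions, not the schedule used in the paper.

```python
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    """Inverse rotary frequencies for RoPE with the given base (theta)."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)

def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angles of shape (seq_len, head_dim // 2) used to build cos/sin caches."""
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, rope_inv_freq(head_dim, base))

# Hypothetical bases for illustration: the longest rotation period
# (2*pi / smallest inverse frequency) grows with the base, which is why
# raising the base helps attention cover a 128K-token context.
for base in (10_000.0, 500_000.0, 10_000_000.0):
    angles = rope_angles(seq_len=131_072, head_dim=128, base=base)
    longest_period = float(2 * torch.pi / rope_inv_freq(128, base)[-1])
    print(f"base={base:>12,.0f}  longest rotation period ~ {longest_period:,.0f} tokens")
```

In continual pretraining, such a base increase is typically applied in stages while training on repository-level packed, length-upsampled data, so the model adapts to the new positional geometry rather than seeing it cold at inference time.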
- Matt Stallone
- Vaibhav Saxena
- Leonid Karlinsky
- Bridget McGinn
- Tim Bula
- Mayank Mishra
- Adriana Meza Soria
- Gaoyuan Zhang
- Aditya Prasad
- Yikang Shen
- Saptha Surendran
- Shanmukha Guttula
- Hima Patel
- Parameswaran Selvam
- Xuan-Hong Dang
- Yan Koyfman
- Atin Sood
- Rogerio Feris
- Nirmit Desai
- David D. Cox