HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position (2403.19115v2)

Published 28 Mar 2024 in cs.SE

Abstract: Addressing the limitation of context length in LLMs for code-related tasks is the primary focus of this paper. Existing LLMs are constrained by their pre-trained context lengths, leading to performance issues when handling long, complex code sequences. Inspired by how human programmers navigate code, we introduce Hierarchical Rotary Position Embedding (HiRoPE), a novel approach that extends the traditional rotary position embedding into a hierarchical format based on the hierarchical structure of source code. HiRoPE integrates easily into existing LLMs without extra training cost. Our method is extensively evaluated with various LLMs, demonstrating stable performance on tasks such as language modeling and long code completion. We also introduce a new long code understanding task built from real-world code projects, in the hope of promoting further development in this code-related field. Theoretically and experimentally, we find that HiRoPE also addresses the out-of-distribution issue in position encoding. HiRoPE significantly expands the context length capabilities of LLMs, enabling inference at lengths exponentially greater than the training length.
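The core idea lends itself to a compact illustration. The following is a minimal sketch, not the authors' reference implementation: it assumes a 3-level hierarchy (function index, statement index, token index) and an even split of the rotary dimensions across levels. The function names (`rope_angles`, `hierarchical_rope_angles`, `apply_rotary`), the dimension split, and the toy positions are illustrative assumptions; the paper derives the hierarchy from the parsed source structure.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE angles for integer positions: theta_i = base^(-2i/dim).
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    return positions.float()[:, None] * inv_freq[None, :]          # (seq, dim/2)

def hierarchical_rope_angles(hier_positions, dims, base=10000.0):
    # Hypothetical HiRoPE-style sketch: give each hierarchy level its own
    # slice of rotary dimensions and encode that level's position with it.
    chunks = [rope_angles(hier_positions[:, level], dim, base)
              for level, dim in enumerate(dims)]
    return torch.cat(chunks, dim=-1)                                # (seq, sum(dims)/2)

def apply_rotary(x, angles):
    # Rotate query/key features pairwise by the given angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage: (function, statement, token) positions for a 5-token span.
hier_pos = torch.tensor([[0, 0, 0],
                         [0, 0, 1],
                         [0, 1, 0],
                         [1, 0, 0],
                         [1, 0, 1]])
q = torch.randn(5, 64)                                              # per-head query features
angles = hierarchical_rope_angles(hier_pos, dims=[16, 16, 32])
q_rot = apply_rotary(q, angles)
print(q_rot.shape)                                                  # torch.Size([5, 64])
```

Because the highest level only advances once per function rather than once per token, positions far beyond the pre-trained token range stay within the value range seen during training, which is the intuition behind the extrapolation claim.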

Authors (4)
  1. Kechi Zhang (22 papers)
  2. Ge Li (213 papers)
  3. Huangzhao Zhang (9 papers)
  4. Zhi Jin (160 papers)
Citations (1)

