
InternLM2 Technical Report (2403.17297v1)

Published 26 Mar 2024 in cs.CL and cs.AI

Abstract: The evolution of LLMs like ChatGPT and GPT-4 has sparked discussions on the advent of AGI. However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

InternLM2: A Comprehensive Overview of Pre-Training and Alignment Strategies

Pre-training Process and Data Preparation

InternLM2, an open-source LLM, is pre-trained on a diverse mixture of text, code, and long-context data. The pre-training corpus amasses trillions of tokens from sources including web pages, academic papers, and other publicly available text. Particular attention is paid to data quality, ensuring the corpus is clean, relevant, and covers a broad knowledge base.

Text Data

Text data is collected from multiple sources and processed through standardization, deduplication, and safety filtering, so that the pre-training corpus is not only diverse but also safe and of high quality.
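The report describes this pipeline at a high level rather than as code. The following is a minimal sketch under simplifying assumptions: an exact-hash deduplication step and a keyword-based safety filter stand in for the fuzzy deduplication and classifier-based filters a production pipeline would use, and all names here (standardize, is_safe, clean_corpus, UNSAFE_KEYWORDS) are illustrative, not from the paper.

```python
import hashlib
import unicodedata

# Hypothetical blocklist; a real pipeline would rely on trained safety
# classifiers and curated rules rather than keyword matching.
UNSAFE_KEYWORDS = {"example-blocked-term"}

def standardize(text: str) -> str:
    """Normalize unicode and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())

def is_safe(text: str) -> bool:
    """Toy safety filter: reject documents containing blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in UNSAFE_KEYWORDS)

def clean_corpus(docs):
    """Standardize, exact-deduplicate, and safety-filter a document stream."""
    seen_hashes = set()
    for doc in docs:
        doc = standardize(doc)
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # exact dedup; fuzzy (e.g. MinHash) dedup omitted
            continue
        seen_hashes.add(digest)
        if is_safe(doc):
            yield doc

if __name__ == "__main__":
    raw = ["Hello  world", "Hello world", "contains example-blocked-term here"]
    print(list(clean_corpus(raw)))  # -> ['Hello world']
```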

Code Data

Given the growing importance of coding ability in LLMs, InternLM2's pre-training data includes a substantial amount of code. This corpus is carefully curated to cover a wide range of programming languages and domains, strengthening the model's coding capabilities.

Long Context Data

InternLM2 stands out for its effective incorporation of long-context data during pre-training, which enables the model to handle long-context scenarios efficiently and considerably widens its range of applications. Preparing this data involves additional filtering and quality checks to ensure its relevance and utility for training.
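As a rough illustration of what length-based selection with an extra quality gate might look like (the paper's actual filters are more involved and are not reproduced here), consider the sketch below; approx_token_count and looks_coherent are placeholder heuristics, not functions from the paper.

```python
def approx_token_count(text: str) -> int:
    """Crude whitespace-based token estimate; a real pipeline would use the
    model's own tokenizer."""
    return len(text.split())

def looks_coherent(text: str) -> bool:
    """Placeholder structural check: require a reasonable fraction of
    alphabetic characters. The paper describes more involved quality checks."""
    if not text:
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / len(text) > 0.6

def select_long_context_docs(docs, min_tokens=32_000):
    """Yield documents long enough to contribute to 32k-context training."""
    for doc in docs:
        if approx_token_count(doc) >= min_tokens and looks_coherent(doc):
            yield doc
```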

Innovative Pre-training and Optimization Techniques

InternLM2 pre-training proceeds in three distinct phases, with the context length extended from 4k to 32k tokens so that the model captures long-range dependencies efficiently. The architecture adopts Grouped-Query Attention (GQA), which shares key/value heads across groups of query heads to reduce memory requirements during inference, making long-sequence processing more feasible.
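To make the memory saving concrete, here is a minimal NumPy sketch of grouped-query attention (not the authors' implementation): several query heads share each key/value head, so the KV cache shrinks by the grouping factor.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal causal grouped-query attention.

    q: (n_q_heads, seq, d)      query heads
    k, v: (n_kv_heads, seq, d)  shared key/value heads, n_kv_heads < n_q_heads
    Each group of n_q_heads // n_kv_heads query heads attends to the same
    key/value head, so the KV cache is n_q_heads / n_kv_heads times smaller.
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]
    # Share each KV head across its group of query heads.
    k_rep = np.repeat(k, group, axis=0)                  # (n_q_heads, seq, d)
    v_rep = np.repeat(v, group, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)   # (n_q_heads, seq, seq)
    # Causal mask: each position attends only to itself and earlier tokens.
    causal = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(causal, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_rep                               # (n_q_heads, seq, d)

# Example: 8 query heads sharing 2 KV heads -> 4x smaller KV cache.
q = np.random.randn(8, 16, 64)
k = np.random.randn(2, 16, 64)
v = np.random.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```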

Conditional Online RLHF and Alignment Strategies

InternLM2's alignment phase employs a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy. A conditional reward model reconciles conflicting human preferences (for example, helpfulness versus harmlessness) within a single model, while multiple rounds of online RLHF curb emergent reward hacking behaviors. The reward model adjusts its scoring criteria according to the condition it is given, maintaining consistent performance across varied tasks.
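The conditioning idea can be illustrated with a small sketch: the same scorer is steered by a condition-specific instruction prepended to its input, so different preference criteria coexist in one model. The scorer, condition texts, and helper below are hypothetical stand-ins, not the paper's reward model.

```python
# Condition prompts are illustrative; the paper trains a reward model that
# reads such conditions, rather than wrapping an arbitrary scorer.
CONDITIONS = {
    "helpful": "Judge the reply for how helpful and complete it is.",
    "harmless": "Judge the reply for safety and refusal of harmful requests.",
}

def conditional_reward(score_fn, condition: str, prompt: str, response: str) -> float:
    """Prepend a condition-specific instruction before scoring.

    score_fn is assumed to be any text -> float scorer (in practice a trained
    reward model); the condition steers which preference it expresses.
    """
    system = CONDITIONS[condition]
    return score_fn(f"{system}\n\nUser: {prompt}\nAssistant: {response}")

# Toy usage with a dummy scorer standing in for a real reward model.
dummy_scorer = lambda text: float(len(text) % 7)
r_helpful = conditional_reward(dummy_scorer, "helpful", "Explain GQA.", "GQA shares KV heads.")
r_harmless = conditional_reward(dummy_scorer, "harmless", "Explain GQA.", "GQA shares KV heads.")
print(r_helpful, r_harmless)
```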

Comprehensive Evaluation and Analysis

InternLM2 is evaluated across benchmarks covering a wide array of tasks and capabilities, including comprehensive examinations, knowledge tasks, coding problems, reasoning, mathematics, and long-context modeling. It posts strong numerical results and improves markedly after alignment training, demonstrating its effectiveness in aligning with human preferences and extending its utility to real-world applications.

InternLM2's performance in coding tasks, specifically in Python and multiple programming languages, showcases its robust coding capabilities. Similarly, in long-context modeling tasks, InternLM2 demonstrates exceptional performance, marking it as a versatile model capable of handling intricate tasks requiring extensive contextual understanding.

Implications and Future Developments

InternLM2's comprehensive development strategy, focusing on diverse pre-training data, innovative optimization techniques, and strategic alignment training, outlines a promising approach to advancing LLM capabilities. The release of pre-training checkpoints offers the community valuable insights into the evolution of LLMs. Looking ahead, the continual refinement of alignment strategies and expansion of pre-training data can further enhance LLMs' effectiveness, broadening their applicability across numerous domains.

Authors (100)
  1. Zheng Cai (157 papers)
  2. Maosong Cao (9 papers)
  3. Haojiong Chen (1 paper)
  4. Kai Chen (512 papers)
  5. Keyu Chen (76 papers)
  6. Xin Chen (456 papers)
  7. Xun Chen (166 papers)
  8. Zehui Chen (41 papers)
  9. Zhi Chen (235 papers)
  10. Pei Chu (8 papers)
  11. Xiaoyi Dong (73 papers)
  12. Haodong Duan (55 papers)
  13. Qi Fan (30 papers)
  14. Zhaoye Fei (15 papers)
  15. Yang Gao (761 papers)
  16. Jiaye Ge (4 papers)
  17. Chenya Gu (3 papers)
  18. Yuzhe Gu (10 papers)
  19. Tao Gui (127 papers)
  20. Aijia Guo (1 paper)
Citations (121)

HackerNews

  1. InternLM2 (135 points, 23 comments)

Reddit

  1. InternLM2 (0 points, 1 comment)