Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent (2411.02265v3)
Abstract: In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms Llama 3.1-70B and performs comparably to the significantly larger Llama 3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedule of mixture-of-experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large
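To make the "mixed expert routing" idea behind the 389B-total / 52B-activated parameter split concrete, here is a minimal PyTorch sketch of an MoE layer that combines an always-active shared expert with top-k routed specialized experts. The expert count, hidden sizes, and top-k value below are illustrative assumptions for exposition, not Hunyuan-Large's actual configuration or implementation.

```python
# Minimal sketch of a "mixed" MoE routing layer: every token passes through a
# shared expert, while a router additionally dispatches it to its top-k
# specialized experts. All sizes (d_model, n_experts, top_k) are illustrative
# assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class MixedMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.shared_expert = FeedForward(d_model, d_ff)            # always active
        self.experts = nn.ModuleList(
            FeedForward(d_model, d_ff) for _ in range(n_experts)   # routed experts
        )
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to individual tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        out = self.shared_expert(tokens)

        # The router scores each token; only the top-k specialized experts are
        # evaluated per token, which is what keeps the activated parameter
        # count far below the total parameter count.
        probs = F.softmax(self.router(tokens), dim=-1)
        weights, indices = probs.topk(self.top_k, dim=-1)

        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MixedMoELayer()
    y = layer(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```

In this sketch, only the shared expert plus top_k of n_experts feed-forward blocks run for any given token, so compute scales with the activated subset rather than the full expert pool; production MoE systems replace the per-expert loop with batched dispatch and add load-balancing terms, which are omitted here for clarity.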