- The paper introduces Localized Filtering-based Attention to leverage local token dependencies for improved language understanding.
- It employs a novel non-uniform pipeline parallelism, combined with data and optimizer parallelism, to reduce training bandwidth requirements and cost.
- The model demonstrates strong performance in code generation, math problem solving, and conversational tasks, while its open-source release fosters community innovation.
The development of LLMs has advanced significantly, with models like GPT-3 and ChatGPT gaining widespread attention for their remarkable language-generation capabilities. Yuan 2.0 continues this trend, providing a series of LLMs with parameters ranging from 2.1 billion to 102.6 billion. One of its key innovations is Localized Filtering-based Attention (LFA), which builds prior knowledge of the local dependencies found in natural language into the attention mechanism. In essence, LFA helps the model capture the nuances of language by giving more weight to the connections between neighboring words or tokens, which mirrors how humans process language more closely.
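To make the idea concrete, here is a minimal PyTorch sketch that treats LFA as a pair of small causal 1D convolutions applied over neighboring tokens before a standard multi-head attention step. The kernel size, the depth of the filtering stack, and the placement of normalization are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalizedFilteringAttention(nn.Module):
    """Sketch of an LFA-style block: local filtering of neighboring tokens
    followed by ordinary multi-head self-attention.

    Kernel size, number of filtering layers, and normalization placement
    are assumptions for illustration, not the Yuan 2.0 reference design.
    """

    def __init__(self, d_model: int, n_heads: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Depth-wise convolutions mix each token with a short window of
        # preceding tokens, encoding the local-dependency prior.
        self.conv1 = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.conv2 = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def _causal_conv(self, conv: nn.Conv1d, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Pad on the left so a position never
        # sees tokens to its right (keeps the model autoregressive).
        x = x.transpose(1, 2)                       # (batch, d_model, seq)
        x = F.pad(x, (self.kernel_size - 1, 0))
        return conv(x).transpose(1, 2)              # (batch, seq, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Strengthen neighboring-token dependencies, then attend globally.
        local = self._causal_conv(self.conv2, self._causal_conv(self.conv1, x))
        local = self.norm(local)
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1
        )
        out, _ = self.attn(local, local, local, attn_mask=causal_mask)
        return out
```

The filtering step gives each position a summary of its immediate left context before attention distributes information globally, which is the intuition behind weighting neighboring tokens more heavily.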
Yuan 2.0 also emphasizes the importance of data quality, aiming to improve model performance not only through scale but through well-curated datasets. Another focus of its development was optimizing the training process. The model uses a novel combination of non-uniform pipeline, data, and optimizer parallelism to improve distributed training efficiency. This method is significant because it reduces the inter-chip communication bandwidth demands that are usually high for large-scale LLMs, enabling faster and more cost-effective training without the limitations of the conventional "3D" parallel paradigm of tensor, pipeline, and data parallelism.
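As a rough illustration of what a non-uniform split can look like, the sketch below divides transformer layers unevenly across pipeline stages so that estimated peak memory (weights plus in-flight activations) is balanced; under a 1F1B-style schedule, earlier stages hold activations for more micro-batches and therefore receive fewer layers. The cost model and schedule assumption are illustrative, not the paper's exact formulation.

```python
def split_layers_nonuniform(num_layers: int, num_stages: int,
                            num_microbatches: int,
                            act_per_layer: float = 1.0,
                            weight_per_layer: float = 1.0) -> list[int]:
    """Assign layers to pipeline stages so estimated peak memory is balanced.

    Assumes a 1F1B-style schedule in which stage i keeps activations for
    roughly (num_stages - i) in-flight micro-batches; the per-layer cost
    model is an illustrative assumption.
    """
    in_flight = [min(num_microbatches, num_stages - i) for i in range(num_stages)]
    # Per-layer memory on each stage: weights plus activations held until
    # the matching backward pass.
    cost = [weight_per_layer + act_per_layer * f for f in in_flight]
    # Give each stage a layer count inversely proportional to its per-layer
    # cost, then distribute the rounding remainder by largest fraction.
    inv = [1.0 / c for c in cost]
    ideal = [num_layers * v / sum(inv) for v in inv]
    counts = [int(share) for share in ideal]
    leftover = num_layers - sum(counts)
    by_fraction = sorted(range(num_stages),
                         key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Earlier stages get fewer layers because they hold more in-flight activations.
print(split_layers_nonuniform(num_layers=48, num_stages=4, num_microbatches=8))
# -> [8, 9, 12, 19]
```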
In terms of practical applications, Yuan 2.0 demonstrates abilities that extend across code generation, mathematical problem-solving, and conversational tasks, with performance metrics suggesting advantages over several existing models. In code generation, for instance, Yuan 2.0 has been evaluated on benchmarks like HumanEval and shown to produce reliable results. Its performance is further underscored by its ability to generate helpful unit tests to validate code functionality, illustrating the model's grasp of not just static information but interactive and procedural tasks as well.
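For reference, HumanEval-style results are usually reported as pass@k, estimated with the standard unbiased formula from the benchmark's original paper; the sketch below shows that estimator (the exact sampling setup Yuan 2.0 uses is not restated here).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator commonly used with HumanEval.

    n: samples generated per problem
    c: samples that pass all unit tests
    k: attempts the metric allows
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 20 samples for one problem, 3 of them pass the tests.
print(round(pass_at_k(n=20, c=3, k=1), 3))  # 0.15
```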
Furthermore, Yuan 2.0 scored well on math problem-solving benchmarks like GSM8K and AGIEval's Gaokao-Math tasks, where it demonstrated an ability to handle multi-step reasoning. Yuan 2.0 is also designed to address the nuances of language and the intricacies of human communication, as illustrated by its solid results on TruthfulQA, a benchmark that measures whether a model gives truthful answers rather than repeating common misconceptions across a range of subjects.
The release of Yuan 2.0 includes not only the model weights but also the source code, emphasizing openness and collaboration with both the research community and commercial entities. This move aligns with a trend of transparency and resource sharing within the AI community, inviting others to improve upon and utilize the Yuan 2.0 model to its fullest potential.
In summary, Yuan 2.0 presents an advanced LLM that incorporates novel attention mechanisms and optimized distributed training methods, ensuring effective and efficient scalability while demonstrating strong performance across diverse language processing tasks. The commitment of its creators to open-source principles further contributes to the model's potential to catalyze innovation and development within the field of AI.