
Nyonic Technical Report

Published 24 Apr 2024 in cs.CL (arXiv:2404.15702v1)

Abstract: This report details the development and key achievements of our latest LLM designed for custom LLMs. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public


Summary

  • The paper introduces the Wonton 7B model with a novel Online Data Scheduler that dynamically adjusts training data for efficient curriculum learning.
  • It employs advanced tokenization with a tailored multilingual vocabulary and transformer enhancements like Rotary Positional Embeddings and QK-LayerNorm to improve model stability.
  • Robust experimental results highlight its competitive performance on multilingual benchmarks and scalable deployment using cutting-edge infrastructure.

Development and Evaluation of a Novel LLM Architecture: The Wonton 7B

Introduction

In this report, the development of the Wonton 7B model is detailed, highlighting significant improvements in the areas of data scheduling, tokenization, model architecture, and deployment strategies. This model integrates advanced components like an Online Data Scheduler and utilizes cutting-edge techniques such as Rotary Positional Embeddings and QK-LayerNorm. The model's performance is benchmarked on a variety of tasks, demonstrating its efficacy in multilingual and English contexts.

Data Scheduling Innovations

The Wonton 7B leverages a novel Online Data Scheduler to dynamically adjust training data, supporting an efficient and flexible training process. Key benefits of this scheduler include:

  • Real-time adjustments to training data ratios based on immediate model feedback.
  • Curriculum learning capabilities that focus training efforts on more challenging or relevant data, optimizing computational resources.
  • Efficient data loading and processing achieved through an integrated multiplexer and content stuffing approach, allowing seamless mixtures of data from various sources.
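The scheduler described above can be sketched as a small multiplexer that samples from several data sources according to adjustable mixture weights. This is a minimal illustration, not the paper's implementation: the class name, seeding, and the toy single-example sources are assumptions for demonstration; the key idea shown is that mixture weights can be updated mid-training without restarting the data pipeline.

```python
import random
from itertools import cycle

class OnlineDataScheduler:
    """Minimal sketch of a data multiplexer with adjustable mixture weights.

    Each source is treated as an endless iterator of training examples;
    `weights` gives the sampling probability of each source and can be
    updated mid-training to implement curriculum learning.
    """

    def __init__(self, sources, weights, seed=0):
        assert len(sources) == len(weights)
        self.sources = [cycle(s) for s in sources]
        self.indices = list(range(len(sources)))
        self.set_weights(weights)
        self.rng = random.Random(seed)

    def set_weights(self, weights):
        total = sum(weights)
        self.weights = [w / total for w in weights]  # normalize to a distribution

    def __iter__(self):
        return self

    def __next__(self):
        # Sample a source according to the current mixture, then draw one example.
        idx = self.rng.choices(self.indices, weights=self.weights, k=1)[0]
        return next(self.sources[idx])

# Usage: start web-heavy, then shift the mixture toward code mid-run.
sched = OnlineDataScheduler(
    sources=[["web_doc"], ["code_doc"]],   # toy single-example sources
    weights=[0.9, 0.1],
)
batch1 = [next(sched) for _ in range(8)]
sched.set_weights([0.2, 0.8])              # curriculum adjustment, no restart needed
batch2 = [next(sched) for _ in range(8)]
```

A production scheduler would additionally pack ("stuff") variable-length documents into fixed-length training sequences, which this sketch omits.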

Advanced Tokenization Techniques

The model employs a multilingual tokenizer using byte-pair encoding (BPE) with a tailored vocabulary of 139,000 tokens. This tokenizer:

  • Efficiently handles diverse data sources including code and multilingual text.
  • Optimizes model performance through a well-tuned vocabulary size, balancing the computational cost of larger embedding and output layers against tokenization efficiency.

Model Architecture and Training

Wonton 7B builds upon the transformer architecture, benefiting from:

  • Rotary Positional Embeddings (RoPE) which offer a nuanced approach to incorporating sequence position information.
  • QK-LayerNorm applied to queries and keys before the attention dot product, enhancing training stability by keeping attention logits well-scaled.
  • Max-z Loss supplementation to maintain controlled logit values during training, promoting robust and stable learning outcomes.
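The three components above can be sketched compactly. This is an illustrative implementation under simplifying assumptions, not the paper's code: the RoPE base of 10000, the omission of learned LayerNorm scale/shift parameters, and the max-z coefficient of 2e-4 are conventional choices borrowed from the literature and may differ from the actual configuration.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by a position-dependent angle, so dot
    products between rotated vectors depend only on relative position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def qk_layernorm(x, eps=1e-5):
    """LayerNorm (learned scale/shift omitted) applied to queries and keys
    before the attention dot product, bounding attention logit magnitudes."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def max_z_loss(logits, coeff=2e-4):
    """Auxiliary penalty on the largest logit per position, discouraging
    logit drift during training (coefficient is an assumed value)."""
    z = np.max(logits, axis=-1)
    return coeff * np.mean(z ** 2)

q = np.random.default_rng(0).normal(size=(4, 8))
q_rot = rope(qk_layernorm(q))  # normalize queries, then rotate by position
```

Note that RoPE, being a pure rotation of channel pairs, leaves vector norms unchanged, which is what keeps the normalization applied by QK-LayerNorm intact.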

The model's training utilized the AdamW optimizer with specific attention to learning rate adjustments and weight decay settings conducive to optimal performance.
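The distinguishing feature of AdamW, decoupled weight decay, can be shown in a single scalar update step. The hyperparameter values below (learning rate, betas, decay) are illustrative assumptions, not the paper's reported settings.

```python
def adamw_step(param, grad, state, lr=3e-4, betas=(0.9, 0.95),
               eps=1e-8, weight_decay=0.1):
    """One AdamW update on a scalar parameter.

    Unlike L2 regularization folded into the gradient, the decay term is
    applied directly to the weight, independent of the adaptive moments."""
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)       # adaptive step
    param = param - lr * weight_decay * param               # decoupled decay
    return param, state

state = {"t": 0, "m": 0.0, "v": 0.0}
p = 1.0
p, state = adamw_step(p, grad=0.5, state=state)
```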

Infrastructure and Deployment

Utilizing the combined strengths of PyTorch, DeepSpeed, and NVIDIA's technologies like FlashAttention and TensorRT, the Wonton 7B model achieves robust training throughput and inference efficiency. Deployment on Alibaba's Aliyun EAS ensures scalable and secure model serving capabilities.

Experimental Validation

The model was rigorously tested against benchmarks like Lambada, WinoGrande, and various multilingual tasks from XNLI and Belebele. Wonton 7B displayed competitive performance across these, suggesting effective learning strategies and architecture choices, though still trailing behind more extensively trained models like Mistral 7B in some complex reasoning tasks.

Conclusion and Future Work

The Wonton 7B represents a thoughtful integration of novel AI techniques and infrastructure decisions, yielding a high-performance model with strong multilingual capabilities. Future work will focus on closing the performance gap identified in specialized tasks and extending the model’s application range to additional languages and domains. The current findings and model assets are made available for community use and further development, promising a continued enhancement of LLM capabilities.
