Fully Open Source Moxin-7B Technical Report (2412.06845v2)

Published 8 Dec 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Recently, LLMs have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA and Mistral, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, and some use restrictive licenses whilst claiming to be "open-source," which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the Model Openness Framework (MOF), a ranked classification system that evaluates AI models based on model completeness and openness, adhering to principles of open science, open source, open data, and open access. Our model achieves the highest MOF classification level of "open science" through the comprehensive release of pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our model achieves superior performance in zero-shot evaluation compared with popular 7B models and performs competitively in few-shot evaluation.

Analyzing the Comprehensive Openness of Moxin-LLM: Technical Innovations and Implications

The technical report on Moxin-LLM presents the development and implications of Moxin 7B, a fully open-source LLM designed to adhere to the Model Openness Framework (MOF). The MOF promotes transparency, reproducibility, and complete access to model components such as training datasets and code, which many nominally open-source models withhold. The paper's central claim is that a commitment to openness need not compromise model performance, as evidenced by Moxin 7B's robust results.

Overview of Moxin 7B Development

Moxin 7B extends the Mistral architecture to 36 decoder blocks and retains its Grouped-Query Attention (GQA) and Sliding Window Attention (SWA) mechanisms, which reduce memory use and speed up inference while handling long sequences efficiently.
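
As a rough sketch of how these architectural choices map onto a standard configuration, the snippet below instantiates a Mistral-style decoder with 36 blocks, GQA, and a sliding attention window via the Hugging Face `transformers` MistralConfig. Only the 36-block depth comes from the report; every other hyperparameter is a Mistral-7B default used here purely for illustration.

```python
# Sketch: a Mistral-style decoder extended to 36 blocks with GQA and SWA.
# Only the 36-layer depth comes from the report; the remaining values are
# Mistral-7B defaults, used here purely for illustration.
from transformers import MistralConfig, MistralForCausalLM

config = MistralConfig(
    hidden_size=4096,          # model width (illustrative)
    intermediate_size=14336,   # feed-forward width (illustrative)
    num_hidden_layers=36,      # extended depth reported for Moxin 7B
    num_attention_heads=32,    # query heads
    num_key_value_heads=8,     # fewer KV heads than query heads -> GQA
    sliding_window=4096,       # Sliding Window Attention span (illustrative)
)

# Randomly initialized; useful only for inspecting shapes and parameter count.
model = MistralForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```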

Data preparation for Moxin 7B curates a mix of open-source corpora such as SlimPajama and DCLM-BASELINE, applying quality filtering and removing near-duplicates with MinHashLSH. This curation supplies high-quality training data and underpins strong performance across a broad range of language processing tasks.
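
A minimal sketch of MinHash-based near-duplicate filtering is shown below, using the `datasketch` library. The similarity threshold, permutation count, and word-level shingling are illustrative choices, not the settings of the report's actual data pipeline.

```python
# Sketch: near-duplicate detection with MinHash + LSH (datasketch library).
# Threshold, num_perm, and word-level shingling are illustrative, not the
# settings used in the Moxin data pipeline.
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in set(text.lower().split()):   # word-level shingles
        m.update(token.encode("utf-8"))
    return m

docs = {
    "doc1": "open source language models enable reproducible research",
    "doc2": "open source language models enable reproducible research .",
    "doc3": "sliding window attention handles long sequences efficiently",
}

lsh = MinHashLSH(threshold=0.8, num_perm=128)  # Jaccard threshold for "duplicate"
kept = []
for doc_id, text in docs.items():
    m = minhash(text)
    if lsh.query(m):          # a previously kept document is near-identical
        continue              # drop this one as a near-duplicate
    lsh.insert(doc_id, m)
    kept.append(doc_id)

print(kept)  # e.g. ['doc1', 'doc3']
```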

Performance Evaluation

Moxin 7B was assessed against existing models such as Mistral-7B, LLaMA 2-7B, and Gemma-7B on benchmarks including the AI2 Reasoning Challenge (ARC), HellaSwag, and MMLU, in both zero-shot and few-shot settings.

  • Zero-Shot Evaluation: Moxin-7B-finetuned led on complex reasoning tasks, notably PIQA, where accuracy rose from 78.07% for the base model to 82.24%, surpassing other 7B models.
  • Few-Shot Evaluation: The model posted competitive scores, outperforming several state-of-the-art 7B models on these benchmarks and indicating that the fine-tuning process also benefits its few-shot learning capabilities.

Moxin-7B-chat, aligned via supervised fine-tuning, also performed commendably on MT-Bench, scoring competitively against other models and reinforcing its utility as an interactive AI assistant.
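
For readers who want to reproduce this style of comparison, the sketch below runs a zero-shot evaluation with EleutherAI's lm-evaluation-harness. The Hugging Face model identifier, task list, and harness settings are illustrative placeholders; the report's exact evaluation configuration may differ.

```python
# Sketch: zero-shot benchmark evaluation with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The model id below is a placeholder, and the task list
# is illustrative rather than the report's exact benchmark suite.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moxin-org/moxin-llm-7b,dtype=bfloat16",  # placeholder repo id
    tasks=["arc_challenge", "hellaswag", "piqa", "mmlu"],
    num_fewshot=0,      # set to 5 for the few-shot comparison
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```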

Implications and Future Directions

The transparent development of Moxin 7B underscores a potential paradigm shift within the open-source LLM community. By fully disclosing training methodologies, datasets, and model configurations, Moxin 7B sets a precedent for enhancing collaborative research and innovation. This comprehensive openness fosters an inclusive and sustainable AI research environment, facilitating reproducibility and allowing researchers worldwide to build on robust model baselines.

The use of MOF as a guideline appears critical in combating so-called "openwashing" practices, ensuring that models labeled as open-source truly adhere to open science principles. The alignment of Moxin 7B with these standards propels further discourse on encouraging more AI entities to embrace this ethos.

Looking ahead, the report points to future work on further improving training data quality and evaluating model alignment across diverse linguistic and application-specific contexts. Extending these advances could broaden the practical utility of open-source LLMs across sectors, from industrial applications to academic research.

In conclusion, the Moxin-LLM technical report illustrates the value of full transparency in AI model development, backed by solid performance across language processing benchmarks. It points toward a future for AI research characterized by cooperative development, accessibility, and openly shared innovation.

Authors (16)
  1. Pu Zhao (82 papers)
  2. Xuan Shen (29 papers)
  3. Zhenglun Kong (33 papers)
  4. Yixin Shen (11 papers)
  5. Sung-En Chang (10 papers)
  6. Timothy Rupprecht (2 papers)
  7. Lei Lu (55 papers)
  8. Enfu Nan (3 papers)
  9. Changdi Yang (10 papers)
  10. Yumei He (3 papers)
  11. Xingchen Xu (13 papers)
  12. Yu Huang (176 papers)
  13. Wei Wang (1793 papers)
  14. Yue Chen (236 papers)
  15. Yong He (77 papers)
  16. Yanzhi Wang (197 papers)