
LLMBox: A Comprehensive Library for Large Language Models (2407.05563v1)

Published 8 Jul 2024 in cs.CL

Abstract: To facilitate the research on LLMs, this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical consideration, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments in a diverse coverage of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.

LLMBox: A Comprehensive Library for LLMs

The paper "LLMBox: A Comprehensive Library for LLMs" by Tianyi Tang et al. presents a unifying framework aimed at facilitating the development, utilization, and evaluation of LLMs. Given the rapidly evolving landscape of LLM-based research, this paper addresses critical gaps in reproducibility, user-friendliness, and efficiency in existing LLM frameworks.

Key Features and Functionalities

The LLMBox library is designed with three primary features:

  1. Unified Data Interface: This interface handles various data formats, such as plain text and instruction data, and supports multiple training strategies. Key methodologies like dynamic mixture proportion and combined training with pre-training and instruction data are smoothly integrated. The interface also supports parameter-efficient tuning methods such as LoRA, as well as alignment tuning methods such as PPO.
  2. Comprehensive Evaluation: LLMBox facilitates holistic evaluation by encompassing 18 downstream tasks and 56 datasets, thus covering a wide array of LLM functionalities. This includes assessment metrics for advanced capabilities like human alignment, hallucination detection, and instruction following. The framework integrates various public LLMs and commercial APIs, providing a robust platform for comparison.
  3. User-Friendly and Efficient Design: LLMBox emphasizes easy-to-use pipelines that require minimal commands to initiate. The GPU calculator feature helps users determine necessary resources, enabling efficient training and inference mechanisms. Remarkably, the library demonstrates the ability to run inference on the entire MMLU benchmark within six minutes on a single A800 GPU and completes instruction tuning in ten minutes using eight A800 GPUs.
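The dynamic mixture-proportion idea from the unified data interface can be illustrated with a minimal pure-Python sketch. The function names and the linear annealing schedule below are illustrative assumptions, not LLMBox's actual API:

```python
import random

def sample_batch(sources, weights, batch_size, rng=random.Random(0)):
    """Draw a batch by sampling source pools according to mixture weights."""
    names = list(sources)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [rng.choice(sources[name]) for name in picks]

def anneal_weights(step, total_steps):
    """Shift probability mass from pre-training text toward instruction data
    as training progresses (one possible dynamic-mixture schedule)."""
    frac = step / total_steps
    return {"pretrain": 1.0 - 0.5 * frac, "instruction": 0.5 * frac + 1e-9}

sources = {
    "pretrain": ["raw text 1", "raw text 2"],
    "instruction": ["Q: ... A: ...", "Q2: ... A2: ..."],
}
early = sample_batch(sources, anneal_weights(0, 100), 8)    # mostly pre-training text
late = sample_batch(sources, anneal_weights(100, 100), 8)   # an even mixture
```

The same sampler also covers combined training: pre-training and instruction pools simply appear as two entries in `sources` with nonzero weights.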

Training Framework

LLMBox integrates several key training methods:

  • Pre-Training: It supports both initial pre-training and continual pre-training on domain-specific corpora, with capabilities for vocabulary expansion.
  • Instruction Tuning: LLMBox provides ten integrated datasets for instruction tuning and employs techniques such as Self-Instruct, Evol-Instruct, and topic diversification, streamlining pre-processing into a unified format for flexible data handling.
  • Human Alignment: LLMBox incorporates RLHF methods such as PPO and non-RL methods such as DPO, along with several preference datasets, to align LLMs with human values.
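To make the non-RL alignment option concrete, here is a minimal sketch of the standard DPO objective in plain Python (scalar log-probabilities instead of tensors; this illustrates the published loss, not LLMBox's implementation):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: push the policy to prefer the chosen response by a larger
    margin than the frozen reference model does."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log sigmoid(logits)

# When the policy already favors the chosen response, the loss is small;
# when it favors the rejected one, the loss grows.
low = dpo_loss(-5.0, -9.0, ref_chosen_logp=-6.0, ref_rejected_logp=-6.0)
high = dpo_loss(-9.0, -5.0, ref_chosen_logp=-6.0, ref_rejected_logp=-6.0)
assert low < high
```

Unlike PPO, this needs no reward model or sampling loop, which is why preference datasets plug into it directly.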

To ensure efficiency, LLMBox includes strategies like LoRA, QLoRA, DeepSpeed optimizations, and packing. These strategies enable efficient utilization of limited computational resources. The detailed GPU calculator aids in estimating memory consumption, ensuring that users can select appropriate resources for their training scenarios.
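The kind of estimate such a calculator produces can be sketched with a back-of-the-envelope heuristic (the constants below are common rules of thumb, not LLMBox's actual formulas): fp16 inference needs about 2 bytes per parameter, while full fine-tuning with Adam under mixed precision needs roughly 16 bytes per parameter (fp16 weights and gradients, fp32 master weights, and two optimizer moments).

```python
def estimate_memory_gb(n_params_billion, mode="inference",
                       bytes_per_param=2, overhead=1.2):
    """Rough GPU memory estimate (a heuristic, not LLMBox's calculator).

    inference: weights only at `bytes_per_param`;
    training: ~16 bytes/param for weights, grads, fp32 master copy,
    and Adam moments under mixed precision. `overhead` pads for
    activations, buffers, and fragmentation."""
    n = n_params_billion * 1e9
    if mode == "inference":
        total = n * bytes_per_param
    else:  # full fine-tuning with Adam, mixed precision
        total = n * 16
    return total * overhead / 1e9

# A 7B model fits on one 24 GB GPU for fp16 inference,
# but full-parameter Adam fine-tuning needs a multi-GPU setup.
print(round(estimate_memory_gb(7), 1))                     # ~16.8 GB
print(round(estimate_memory_gb(7, mode="training"), 1))    # ~134.4 GB
```

This gap between the two modes is exactly what LoRA, QLoRA, and DeepSpeed optimizations are meant to close.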

Utilization and Evaluation

For utilization, LLMBox supports several methods:

  • Quantization: The library incorporates bitsandbytes and GPTQ for quantization, enhancing memory efficiency.
  • In-Context Learning and Chain-of-Thought Prompting: Advanced ICL strategies and various CoT methods are embedded, supporting both prompt design and example selection for diverse tasks.
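The memory savings from quantization come from the idea sketched below: store weights as small integers plus a per-tensor scale. This toy absmax int8 scheme is a simplified stand-in for what bitsandbytes and GPTQ do (those use block-wise scales and, for GPTQ, error-compensating rounding):

```python
def quantize_int8(values):
    """Absmax quantization: scale floats into the int8 range [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight is recovered to within half a quantization step,
# while storage drops from 4 (or 2) bytes per weight to 1.
assert all(abs(a - b) <= scale / 2 for a, b in zip(weights, restored))
```

The same principle extends to 4-bit formats, trading a little more rounding error for another 2x reduction in memory.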

Evaluation methods include free-form generation, completion perplexity, and option probability. The prefix caching strategy in LLMBox significantly enhances inference efficiency by caching hidden states of common prefix texts.
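The payoff of prefix caching is easy to see in miniature: when scoring the options of a multiple-choice question, every option shares the same few-shot prefix, so the expensive forward pass over that prefix can be computed once and reused. The sketch below uses memoization as a stand-in for caching the prefix's hidden states (the function names are illustrative, not LLMBox's API):

```python
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=None)
def encode_prefix(prefix):
    """Stand-in for running the model over the shared few-shot prefix."""
    CALLS["count"] += 1
    return hash(prefix)  # pretend this is the prefix's hidden-state / KV cache

def score_option(prefix, option):
    state = encode_prefix(prefix)  # cache hit after the first option
    return (state, option)

prefix = "Question: ...\nChoices:\n"
for option in ["A", "B", "C", "D"]:
    score_option(prefix, option)
assert CALLS["count"] == 1  # the prefix was encoded only once for four options
```

For a four-option benchmark this removes roughly three quarters of the prefix computation, which is where the reported inference-time reductions come from.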

Supported Models and Tasks

LLMBox is compatible with a vast array of models and tasks, leveraging the Transformers library. Supported models span general, multilingual, chat-oriented, code-specific, and mathematical LLMs. The extensive task support includes benchmarks like MMLU and GSM8K, multilingual benchmarks, machine translation, code synthesis, human alignment, hallucination detection, and more.

Performance

The comprehensive experiments validate LLMBox's effectiveness and efficiency. Training results on LLaMA-2 demonstrate consistent reproducibility of performance metrics. Extensive utilization testing on various LLMs (including GPT-NeoX, OPT, BLOOM) against benchmarks such as MMLU and HellaSwag proves LLMBox's compatibility and effectiveness. Efficiency evaluations indicate significant reductions in inference time through the prefix caching strategy.

Implications and Future Directions

LLMBox represents a significant step towards standardizing LLM research practices. Its unified interface and comprehensive evaluation framework streamline the reproduction and comparison of different LLM methodologies, enabling more robust comparative studies. Practically, LLMBox lowers the technical barrier to entry, allowing researchers to focus on innovation rather than implementation details.

Future developments could include expanding the library's support for emerging LLM architectures and incorporating additional advanced metrics for evaluation. Integrating real-time collaboration tools and further optimizations for resource-constrained environments could enhance its utility for a broader research community.

In conclusion, LLMBox offers a versatile and efficient solution for LLM research, bridging significant gaps in the current ecosystem and paving the way for more systematic advancements in the field.

Authors (25)
  1. Tianyi Tang
  2. Yiwen Hu
  3. Bingqian Li
  4. Wenyang Luo
  5. Zijing Qin
  6. Haoxiang Sun
  7. Jiapeng Wang
  8. Shiyi Xu
  9. Xiaoxue Cheng
  10. Geyang Guo
  11. Han Peng
  12. Bowen Zheng
  13. Yiru Tang
  14. Yingqian Min
  15. Yushuo Chen
  16. Jie Chen
  17. Yuanqian Zhao
  18. Luran Ding
  19. Yuhao Wang
  20. Zican Dong