An Examination of FLM-101B: Training a 101B-Parameter LLM with Budget Constraints
The paper "FLM-101B: An Open LLM and How to Train It with a \$100K Budget" investigates the development and training of a LLM with over 100 billion parameters under a constrained budget. This paper addresses two pivotal challenges in LLM development: reducing high training costs and establishing fair evaluations that extend beyond mere memorization.</p> <h3 class='paper-heading'>Training Cost Reduction Through a Growth Strategy</h3> <p>A significant contribution of the paper is the introduction of a growth strategy to minimize training cost. Traditionally, <a href="#" x-data="{ link: 'https://www.emergentmind.com/topics/large-language-models-llms' }" @click.prevent="window.location.href = link" @auxclick.prevent="window.open(link, '_blank')" data-href="https://www.emergentmind.com/topics/large-language-models-llms" class="assistant-link pseudo">LLMs</a> like GPT-3 and the LLAMA series have high computational demands. The researchers present a methodology to train a 101B-parameter LLM, termed FLM-101B, using only \$100,000. The key innovation is the "growth" strategy where model parameters are not fixed and grow throughout training. This approach theoretically reduces the number of floating-point operations required, as FLOPs generally scale with the number of model parameters. Through this method, the training cost benefits directly by maximizing computational savings across different growth strategies evaluated in the text.
Performance Evaluation and IQ-Based Assessment
FLM-101B is evaluated not only through conventional knowledge-based assessments but also through intelligence quotient (IQ)-like tests. The paper identifies limitations in standard benchmarks, which may not fully reflect a model's true reasoning and problem-solving abilities. The IQ-inspired evaluations therefore focus on symbolic mapping, rule understanding, pattern mining, and anti-interference, offering a diversified approach to evaluating LLM capabilities beyond simple knowledge recall. Notably, FLM-101B performs comparably to existing models such as GPT-3 and GLM-130B in these evaluations.
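As a concrete illustration of the symbolic-mapping idea, the sketch below shows one way such an evaluation item could be constructed: the natural-language labels of a classification task are replaced with arbitrary symbols, so the model must infer the label-to-symbol mapping from in-context examples rather than recall memorized label words. The task, symbols, and helper function are hypothetical examples, not items from the paper's benchmark.

```python
# Illustrative construction of a symbolic-mapping evaluation item.
# Meaningful labels ("positive"/"negative") are replaced with arbitrary
# symbols, so the model must infer the mapping from the few-shot examples
# instead of relying on memorized label names. All content here is hypothetical.

LABEL_TO_SYMBOL = {"positive": "<&^>", "negative": "<#@!>"}

few_shot_examples = [
    ("The film was a delight from start to finish.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
query_text, query_label = ("An uneven script, but the performances carry it.", "positive")

def build_prompt(examples, query):
    """Assemble a few-shot prompt that uses symbols in place of label words."""
    lines = ["Classify each review using the symbols shown in the examples."]
    for text, label in examples:
        lines.append(f"Review: {text}\nLabel: {LABEL_TO_SYMBOL[label]}")
    lines.append(f"Review: {query}\nLabel:")
    return "\n\n".join(lines)

prompt = build_prompt(few_shot_examples, query_text)
expected = LABEL_TO_SYMBOL[query_label]
print(prompt)
print("\nExpected completion:", expected)
```

Because the symbols carry no pretraining-time meaning, a correct completion indicates in-context rule inference rather than recall, which is the property this evaluation axis is meant to isolate.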
Contributions and Experimental Results
The paper claims that FLM-101B, aside from being a cost-efficient model, offers competitive results on several evaluation tasks while using significantly fewer computational resources. The model undergoes extensive evaluations across a wide range of tasks, demonstrating skills in both knowledge-oriented benchmarks and less conventional IQ tests. The paper asserts that the model's effective use of the growth strategy provides a promising direction for future research in reducing the computational demands of training expansive LLMs.
Implications and Future Directions
The paper’s implications are twofold. Practically, it emphasizes cost-effective methodologies for scaling LLMs. Theoretically, it provokes a rethinking of evaluation methodologies in AI, highlighting the potential of IQ-inspired assessments. Future work could explore further optimizations of the growth strategy, potentially applying these principles to even larger models beyond the trillion-parameter mark. Additionally, with the release of the FLM-101B model checkpoints, this research supports burgeoning efforts in bilingual LLM development and experimentation.
In conclusion, the paper effectively illustrates strategies to address the dual challenges of cost and evaluation in LLM training, advocating for both innovative technical approaches and reconsidered evaluative frameworks. The combination of explicit cost constraints with a broader evaluation capacity aligns with emerging needs in the AI community for scalable, cost-efficient, and robust LLMs.