OpenELM: An Efficient Language Model Family with Open Training and Inference Framework (2404.14619v2)

Published 22 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The reproducibility and transparency of LLMs are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open LLM. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2x fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the LLM on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at: https://huggingface.co/apple/OpenELM.

OpenELM: A Comprehensive Open Source LLM Framework

Introduction

OpenELM is a family of transformer-based LLMs that allocates parameters across layers using a layer-wise scaling strategy. This approach improves accuracy at a fixed parameter budget while reducing the amount of pre-training data required relative to comparable open models.

Model Architecture and Training Approach

OpenELM's architecture diverges from the traditional uniform parameter allocation across transformer layers. Instead, it employs a layer-wise scaling strategy that varies each layer's capacity (see the sketch after this list):

  • Each layer can have a different number of attention heads and a different feed-forward width, allowing more flexible and efficient use of the parameter budget.
  • Layers near the input are allotted fewer parameters, and layers closer to the output are allotted more, concentrating capacity where it contributes most to the model's representational power.
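A minimal sketch of how such a schedule might be computed, assuming linear interpolation between minimum and maximum scaling factors. The parameter names, default values, and rounding below are illustrative assumptions, not the exact CoreNet configuration:

```python
# Illustrative layer-wise scaling: per-layer attention-head counts and
# feed-forward widths are interpolated between a minimum and maximum scaler.
# All names and defaults here are assumptions for the sketch.

def layer_wise_scaling(num_layers, d_model, head_dim,
                       alpha_min=0.5, alpha_max=1.0,
                       beta_min=0.5, beta_max=4.0):
    """Return (num_heads, ffn_dim) for each transformer layer."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)          # 0 at the first layer, 1 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t   # scales attention capacity
        beta = beta_min + (beta_max - beta_min) * t       # scales feed-forward width
        num_heads = max(1, int(alpha * d_model / head_dim))
        ffn_dim = int(beta * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

# Example: a small 4-layer model with d_model=512 and 64-dimensional heads.
for layer, (heads, ffn) in enumerate(layer_wise_scaling(4, 512, 64)):
    print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

Under this kind of schedule, early layers stay narrow while later layers widen, so the total parameter count can match a uniformly-sized baseline while the distribution across depth differs.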

OpenELM was pre-trained on approximately 1.5 trillion tokens drawn from publicly available datasets, roughly half the pre-training token budget reported for comparable models such as OLMo.

Key Results and Benchmarks

OpenELM achieves notable efficiency and performance improvements compared to existing models like OLMo and MobiLlama:

  • With only 1.1 billion parameters, OpenELM demonstrates a 2.36% average accuracy improvement over OLMo, while requiring half the tokens for pre-training.
  • Benchmarked on common NLP tasks, OpenELM consistently outperforms other models pre-trained on public datasets, with substantial leads in both zero-shot and few-shot settings.

OpenELM’s comprehensive release, including training logs and model weights, facilitates greater transparency and reproducibility in AI research, which is often hampered by proprietary practices.
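As one illustration of how the released weights can be used, the sketch below loads an OpenELM checkpoint from HuggingFace with the transformers library. The specific model id, the trust_remote_code requirement, and the Llama-compatible tokenizer are assumptions based on the public model cards and should be verified against the repository README:

```python
# Hedged sketch: loading a released OpenELM checkpoint via HuggingFace
# `transformers`. The model id and tokenizer choice are assumptions taken
# from the public model cards, not something specified in this summary.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-1_1B"  # one of the released sizes (assumed id)
# The model cards reuse a Llama-compatible tokenizer (gated; access required).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```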

Implications and Future Directions

The introduction of OpenELM signals a significant shift towards more accessible and efficient AI research tools. Given its open-source release and strong performance at small parameter budgets, OpenELM is poised to become a valuable resource for researchers developing effective and efficient NLP models. Future research could explore further optimizations in parameter scaling and allocation, and the community could contribute faster RMSNorm implementations tailored to the computational demands of models like OpenELM.

Performance Analysis

Despite its improved accuracy, OpenELM's inference throughput lags behind models that rely on optimized LayerNorm implementations, with much of the gap attributed to its naive RMSNorm implementation. This observation highlights normalization as a concrete target for optimization: future updates could adopt fused or otherwise optimized RMSNorm kernels to close the performance gap with LayerNorm-based models.
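For reference, the sketch below contrasts the two normalization schemes in plain NumPy. It is a minimal illustration of the math, not the kernels actually benchmarked in the paper:

```python
# Minimal NumPy sketch contrasting RMSNorm (used in OpenELM) with LayerNorm.
# RMSNorm skips mean subtraction and the bias term, so it does less arithmetic
# per token, but a naive implementation can still be slower than a heavily
# optimized (fused) LayerNorm kernel.
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale by the reciprocal root-mean-square of the features.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def layer_norm(x, weight, bias, eps=1e-6):
    # Center and scale by per-token mean and variance, then apply the affine.
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return ((x - mean) / np.sqrt(var + eps)) * weight + bias

x = np.random.randn(2, 8).astype(np.float32)   # (batch, hidden)
w = np.ones(8, dtype=np.float32)
b = np.zeros(8, dtype=np.float32)
print(rms_norm(x, w).shape, layer_norm(x, w, b).shape)
```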

Conclusion

OpenELM represents a significant advancement in the design and deployment of LLMs. By improving the efficiency of parameter usage and reducing the pre-training data requirements, OpenELM not only achieves state-of-the-art performance but also furthers the democratization of AI research, empowering a broader community to contribute to and expand upon this foundational work.

Authors (11)
  1. Sachin Mehta
  2. Mohammad Hossein Sekhavat
  3. Qingqing Cao
  4. Maxwell Horton
  5. Yanzi Jin
  6. Chenfan Sun
  7. Iman Mirzadeh
  8. Mahyar Najibi
  9. Dmitry Belenko
  10. Peter Zatloukal
  11. Mohammad Rastegari