OPT: Open Pre-trained Transformer Language Models (2205.01068v4)

Published 2 May 2022 in cs.CL and cs.LG

Abstract: LLMs, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.

Overview of "OPT: Open Pre-trained Transformer LLMs"

The paper "OPT: Open Pre-trained Transformer LLMs" presents a technical report on the development and characteristics of a suite of decoder-only pre-trained transformers, referred to as Open Pre-trained Transformers (OPT), which range from 125 million to 175 billion parameters. The research aims to address the lack of accessibility and transparency regarding LLMs such as GPT-3#1, by openly sharing the models, logbooks, and codebase, ensuring that a broader segment of researchers can engage with these powerful tools.

Key Contributions

  1. Model Development and Design: The authors developed nine Transformer LLMs spanning 125M, 350M, 1.3B, 2.7B, 6.7B, 13B, 30B, 66B, and 175B parameters. These models largely follow the architectures and hyperparameters established by previous work, particularly GPT-3, but incorporate contemporary best practices in data collection and training efficiency.
  2. Resource Sharing: Complete weights for OPT models up to 66B parameters are freely available, with OPT-175B accessible upon request for research purposes. This makes the suite unusual among LLMs of this scale, most of which are either proprietary or accessible only via paid APIs (see the loading sketch after this list).
  3. Efficiency: OPT-175B was trained using a significantly lower carbon footprint—only 1/7th of GPT-3#1's training emissions—thanks to the use of modern NVIDIA hardware and optimized training techniques.
  4. Performance: The performance of OPT-175B was benchmarked against GPT-3 and other LLMs across various standard NLP and dialogue tasks. While showing similar performance across standard metrics, certain discrepancies were noted in specific tasks and safety benchmarks.
  5. Responsibility and Ethical Release: The authors emphasize the importance of responsible AI and ethical considerations. They highlight potential risks associated with LLMs, such as bias and toxicity, and provide detailed evaluations to identify and understand these risks.
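
As an illustration of the resource-sharing point above, here is a minimal sketch of loading one of the released checkpoints through the Hugging Face Transformers hub; the facebook/opt-* names refer to the publicly hosted weights, and the prompt is an arbitrary example rather than anything from the paper:

```python
# Minimal sketch: load a released OPT checkpoint and generate greedily.
# The facebook/opt-125m ... facebook/opt-66b weights are openly downloadable;
# the OPT-175B weights are available on request.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

prompt = "Open-sourcing large language models matters because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```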

Numerical Results

On standard NLP tasks, OPT-175B is competitive with GPT-3. For instance, average zero-shot performance across 14 tasks aligns closely with that of GPT-3, although performance on individual tasks varies. Specific results include:

  • Zero-shot Accuracy: Approximate parity with GPT-3 on 10 tasks, with minor underperformance on MultiRC and overperformance on WIC.
  • Training Efficiency: OPT-175B achieved up to 147 TFLOP/s per 80GB A100 GPU, reflecting high hardware utilization during training (see the back-of-envelope estimate after this list).
  • Carbon Footprint: The development of OPT-175B resulted in approximately 75 tons of CO2eq emissions, versus the 500 tons estimated for GPT-3.
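
To connect these numbers, a back-of-envelope estimate (not taken from the paper's text) relates the reported per-GPU throughput to an idealized training time, using the common approximation of about 6 FLOPs per parameter per training token. The token count (~180B) and cluster size (~1,000 A100 GPUs) are assumptions for illustration, and the result is a lower bound that ignores restarts and imperfect utilization:

```python
# Rough, illustrative arithmetic only; token and GPU counts are assumptions.
params = 175e9          # OPT-175B parameters
tokens = 180e9          # assumed number of training tokens
flops_per_gpu = 147e12  # reported peak utilization per 80GB A100 (TFLOP/s)
num_gpus = 1000         # assumed cluster size

total_flops = 6 * params * tokens          # ~1.9e23 FLOPs
gpu_seconds = total_flops / flops_per_gpu  # assumes perfect utilization
days = gpu_seconds / num_gpus / 86400
print(f"~{total_flops:.1e} FLOPs, ~{days:.0f} days on {num_gpus} GPUs (ideal lower bound)")
```

At these assumptions the estimate comes out to roughly two weeks of ideal compute; the actual run took longer than this figure due to the hardware failures and restarts documented in the released logbook.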

Bias and Safety Evaluations

Concerns about bias and toxic content generation are addressed through a series of targeted benchmark evaluations:

  • Hate Speech Detection: OPT-175B performs better than GPT-3 Davinci, particularly in one-shot and few-shot binary/multiclass settings.
  • CrowS-Pairs: OPT-175B exhibits more stereotypical bias overall compared to Davinci, likely due to the prevalence of unmoderated social media text in the training data.
  • StereoSet: Shows comparable performance to Davinci in idealized context association test scores (ICAT), although nuances exist across bias categories.
  • RealToxicityPrompts: Higher propensity for toxic outputs compared to PaLM and Davinci, especially as prompt toxicity increases (an illustrative scoring sketch follows this list).
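
To make the RealToxicityPrompts-style protocol concrete, here is a minimal sketch that samples several continuations per prompt and averages a toxicity score. The score_toxicity heuristic below is a placeholder (the benchmark itself scores continuations with an external toxicity classifier), the prompt is invented, and the 125M checkpoint is chosen purely for convenience:

```python
# Illustrative sketch of a prompted-toxicity evaluation loop; not the paper's
# exact setup. score_toxicity is a placeholder for a real toxicity classifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.eval()

def score_toxicity(text: str) -> float:
    """Placeholder word-list heuristic in [0, 1]; swap in a real classifier."""
    flagged = {"hate", "stupid", "idiot"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in flagged for w in words) / max(len(words), 1)

def mean_continuation_toxicity(prompt: str, n_samples: int = 5) -> float:
    """Sample n continuations of the prompt and average their toxicity scores."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs, do_sample=True, top_p=0.9, max_new_tokens=20,
            num_return_sequences=n_samples,
        )
    continuations = tokenizer.batch_decode(
        outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    return sum(score_toxicity(c) for c in continuations) / n_samples

print(mean_continuation_toxicity("I can't believe they would say"))
```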

Implications and Future Directions

The implications of this work extend to both practical implementations and theoretical advancements in AI. Practically, the openly available OPT models foster reproducibility and inclusivity in AI research, allowing a more diverse set of researchers to contribute to the study and mitigation of LLM-related risks. The substantial reduction in training emissions also points toward a more sustainable approach to developing large models.

Future developments may focus on issues of factual accuracy, bias mitigation, and the use of retrieval-augmented techniques to enhance model performance on fact-dependent tasks. Additionally, further investigation into the capabilities and limitations of these models across unexplored or less-studied languages and domains is warranted.

Conclusion

The release of OPT marks a significant step towards transparency and collaboration within the AI research community. By openly sharing extensive model details and computational logs, the authors aim to democratize access to cutting-edge LLMs, promoting a balanced dialogue on their societal impacts while enabling new avenues for innovation and ethical AI development.

Authors (19)
  1. Susan Zhang (12 papers)
  2. Stephen Roller (27 papers)
  3. Naman Goyal (37 papers)
  4. Mikel Artetxe (52 papers)
  5. Moya Chen (9 papers)
  6. Shuohui Chen (4 papers)
  7. Christopher Dewan (3 papers)
  8. Mona Diab (71 papers)
  9. Xian Li (116 papers)
  10. Xi Victoria Lin (39 papers)
  11. Todor Mihaylov (23 papers)
  12. Myle Ott (33 papers)
  13. Sam Shleifer (15 papers)
  14. Kurt Shuster (28 papers)
  15. Daniel Simig (10 papers)
  16. Punit Singh Koura (10 papers)
  17. Anjali Sridhar (2 papers)
  18. Tianlu Wang (33 papers)
  19. Luke Zettlemoyer (225 papers)
Citations (3,064)