OPT: Open Pre-trained Transformer Language Models (2205.01068)
Published 2 May 2022 in cs.CL and cs.LG

Overview

  • The paper introduces Open Pre-trained Transformers (OPT), a series of open-access, decoder-only transformer models ranging from 125 million to 175 billion parameters, designed to increase accessibility and transparency in LLMs.

  • The authors freely release complete weights for models up to 66B parameters, with OPT-175B available upon request, emphasizing responsible AI by addressing bias and toxicity concerns through comprehensive evaluations.

  • OPT-175B offers competitive performance with GPT-3 while maintaining significantly lower training emissions, marking a sustainable approach to large model development.

Overview of "OPT: Open Pre-trained Transformer Language Models"

The paper titled "OPT: Open Pre-trained Transformer Language Models" presents a technical report on the development and characteristics of a suite of decoder-only pre-trained transformers, referred to as Open Pre-trained Transformers (OPT), ranging from 125 million to 175 billion parameters. The research aims to address the lack of accessibility and transparency surrounding LLMs such as GPT-3 by openly sharing the models, logbooks, and codebase, ensuring that a broader segment of researchers can engage with these powerful tools.

Key Contributions

  1. Model Development and Design: The authors developed nine Transformer language models with varying parameter sizes: 125M, 350M, 1.3B, 2.7B, 6.7B, 13B, 30B, 66B, and 175B. These models largely follow the architectures and hyperparameters established by previous works, particularly GPT-3, but incorporate contemporary best practices in data collection and training efficiency.

  2. Resource Sharing: Complete weights for OPT models up to 66B are freely available, with OPT-175B accessible upon request for research purposes. This makes them unique among LLMs of this scale, most of which are either proprietary or accessible only via paid APIs.

  3. Efficiency: OPT-175B was trained with a significantly lower carbon footprint—only 1/7th of GPT-3's training emissions—thanks to the use of modern NVIDIA hardware and optimized training techniques.

  4. Performance: The performance of OPT-175B was benchmarked against GPT-3 and other LLMs across various standard NLP and dialogue tasks. While showing similar performance across standard metrics, certain discrepancies were noted in specific tasks and safety benchmarks.

  5. Responsibility and Ethical Release: The authors emphasize the importance of responsible AI and ethical considerations. They highlight potential risks associated with LLMs, such as bias and toxicity, and provide detailed evaluations to identify and understand these risks.
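As a rough check on the model sizes listed above, the parameter count of a decoder-only Transformer can be estimated from its depth and width alone. The sketch below assumes the configs follow GPT-3's published hyperparameters (12 layers at width 768 for the 125M model; 96 layers at width 12288 for the 175B model), which the OPT suite largely mirrors:

```python
# Rough decoder-only Transformer parameter count: each layer has
# attention projections (~4*d^2) plus a 4x-wide MLP (~8*d^2), i.e.
# ~12*d^2 per layer, plus token and position embeddings. Layer norms
# and biases add a small correction that we ignore here.
def approx_params(n_layers, d_model, vocab_size=50272, max_pos=2048):
    per_layer = 12 * d_model ** 2
    embeddings = (vocab_size + max_pos) * d_model
    return n_layers * per_layer + embeddings

# Hyperparameters follow GPT-3's published configs (an assumption of
# this sketch); vocab_size matches the GPT-2 BPE vocabulary OPT uses.
print(f"OPT-125M ~ {approx_params(12, 768) / 1e6:.0f}M parameters")
print(f"OPT-175B ~ {approx_params(96, 12288) / 1e9:.0f}B parameters")
```

The estimate lands within a few percent of the nominal sizes, which is enough to sanity-check a config before committing compute.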

Numerical Results

On standard NLP tasks, OPT-175B demonstrates competitiveness with GPT-3. For instance, its average zero-shot performance across 14 tasks aligns closely with GPT-3's, although performance on individual tasks varied. Specific results include:

  • Zero-shot Accuracy: Approximate parity with GPT-3 on 10 tasks, with minor underperformance on MultiRC and overperformance on WIC.
  • Training Efficiency: OPT-175B achieved up to 147 TFLOP/s per 80GB A100 GPU, demonstrating significant computational efficiency.
  • Carbon Footprint: The development of OPT-175B resulted in approximately 75 tons of CO2eq emissions versus the 500 tons estimated for GPT-3.
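The efficiency figures above can be put in context with two small calculations: model FLOPs utilization (achieved throughput as a fraction of the GPU's peak) and the emissions ratio. The A100 peak of 312 TFLOP/s is NVIDIA's published spec for dense BF16 math, an assumption external to the paper summary itself:

```python
# Model FLOPs utilization (MFU): achieved per-GPU training throughput
# divided by the hardware peak. 147 TFLOP/s against the A100's
# 312 TFLOP/s dense BF16 peak works out to roughly 47% of peak.
achieved_tflops = 147.0    # per-GPU throughput reported for OPT-175B
a100_peak_tflops = 312.0   # 80GB A100, BF16, without sparsity
mfu = achieved_tflops / a100_peak_tflops

# Emissions comparison from the same section: ~75 tCO2eq for OPT-175B
# versus the ~500 tCO2eq estimated for GPT-3 (roughly a 1/7th footprint).
emissions_ratio = 500 / 75
print(f"MFU: {mfu:.1%}; GPT-3's estimated emissions ~{emissions_ratio:.1f}x higher")
```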

Bias and Safety Evaluations

Legitimate concerns regarding biases and toxic content generation are addressed through comprehensive evaluations:

  • Hate Speech Detection: OPT-175B performs better than GPT-3 Davinci, particularly in one-shot and few-shot binary/multiclass settings.
  • CrowS-Pairs: OPT-175B exhibits more stereotypical bias overall compared to Davinci, likely due to the prevalence of unmoderated social media text in the training data.
  • StereoSet: Shows comparable performance to Davinci in idealized context association test scores (ICAT), although nuances exist across bias categories.
  • RealToxicityPrompts: Higher propensity for toxic outputs compared to PaLM and Davinci, especially as prompt toxicity increases.
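The RealToxicityPrompts finding above—toxicity of continuations rising with prompt toxicity—comes from a bucketed evaluation. A minimal sketch of that protocol is below; `generate` and `score_toxicity` are hypothetical stand-ins for the model and a toxicity classifier (such as the Perspective API), not functions from the paper's codebase:

```python
# Bucket prompts by their own toxicity score, then report the mean
# toxicity of the model's continuation within each bucket. A rising
# trend across buckets indicates the model amplifies prompt toxicity.
def mean_continuation_toxicity(prompts, generate, score_toxicity, n_buckets=4):
    buckets = [[] for _ in range(n_buckets)]
    for prompt in prompts:
        # Clamp scores of exactly 1.0 into the top bucket.
        b = min(int(score_toxicity(prompt) * n_buckets), n_buckets - 1)
        buckets[b].append(score_toxicity(generate(prompt)))
    return [sum(b) / len(b) if b else None for b in buckets]
```

Empty buckets return `None` so a sparse prompt set does not silently skew the per-bucket means.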

Implications and Future Directions

The implications of this work extend to both practical implementations and theoretical advancements in AI. Practically, the openly available OPT models foster reproducibility and inclusivity in AI research, enabling a broader range of researchers to study and mitigate LLM-related risks. The significant reduction in training emissions also demonstrates a more sustainable approach to developing large models.

Future developments may focus on issues of factual accuracy, bias mitigation, and the use of retrieval-augmented techniques to enhance model performance on fact-dependent tasks. Additionally, further investigation into the capabilities and limitations of these models across unexplored or less-studied languages and domains is warranted.

Conclusion

The release of OPT marks a significant step towards transparency and collaboration within the AI research community. By openly sharing extensive model details and computational logs, the authors aim to democratize access to cutting-edge language models, promoting a balanced dialogue on their societal impacts while enabling new avenues for innovation and ethical AI development.

Authors (19)
  1. Susan Zhang (9 papers)
  2. Stephen Roller (27 papers)
  3. Naman Goyal (36 papers)
  4. Mikel Artetxe (48 papers)
  5. Moya Chen (9 papers)
  6. Shuohui Chen (4 papers)
  7. Christopher Dewan (3 papers)
  8. Mona Diab (65 papers)
  9. Xian Li (84 papers)
  10. Xi Victoria Lin (33 papers)
  11. Todor Mihaylov (21 papers)
  12. Myle Ott (33 papers)
  13. Sam Shleifer (15 papers)
  14. Kurt Shuster (28 papers)
  15. Daniel Simig (10 papers)
  16. Punit Singh Koura (7 papers)
  17. Anjali Sridhar (2 papers)
  18. Tianlu Wang (29 papers)
  19. Luke Zettlemoyer (200 papers)