
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? (2311.16989v4)

Published 28 Nov 2023 in cs.CL

Abstract: Since its release in late 2022, ChatGPT has brought a seismic shift to the entire landscape of AI, both in research and commerce. By instruction-tuning an LLM with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions across a broad panel of tasks. Following this success, interest in LLMs has intensified, with new LLMs flourishing at frequent intervals across academia and industry, including many start-ups focused on LLMs. While closed-source LLMs (e.g., OpenAI's GPT, Anthropic's Claude) generally outperform their open-source counterparts, progress on the latter has been rapid, with claims of parity or even superiority on certain tasks. This has crucial implications not only for research but also for business. In this work, on the first anniversary of ChatGPT, we provide an exhaustive overview of this success, surveying all tasks where an open-source LLM has been claimed to be on par with or better than ChatGPT.
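
The instruction-tuning recipe the abstract refers to has two stages: supervised fine-tuning (SFT) on instruction-response pairs, followed by reinforcement learning from human feedback. As a minimal sketch of the SFT stage only (the model checkpoint, training pair, and hyperparameters below are illustrative assumptions, not the paper's setup):

```python
# Minimal sketch of the supervised fine-tuning (SFT) stage of instruction
# tuning. Model, data, and hyperparameters are illustrative, not the
# configuration of any model discussed in the survey.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A single (instruction, response) pair; real SFT uses large curated sets.
prompt = "Instruction: Summarize the paper in one sentence.\nResponse:"
target = " Open-source LLMs are rapidly closing the gap with ChatGPT."

batch = tokenizer(prompt + target, return_tensors="pt")
labels = batch["input_ids"].clone()
# Mask the prompt tokens so the loss is computed only on the response.
prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100

loss = model(**batch, labels=labels).loss  # next-token cross-entropy
loss.backward()
optimizer.step()
```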

On the first anniversary of ChatGPT's release, the authors present a comprehensive analysis comparing the effectiveness of open-source LLMs against the closed-source ChatGPT. The paper evaluates a range of tasks where open-source models have been claimed to perform on par with, or even exceed, ChatGPT, which, as a closed-source model, does not provide full access to its internal workings.

ChatGPT has had a substantial impact on both research and commercial AI applications, as illustrated by its rapid user growth and the business investment it has attracted. However, its closed nature limits understanding of the associated societal risks, makes research hard to reproduce, and creates reliance on a single company's infrastructure and policies, raising concerns about access, data privacy, and cost.

The paper observes that although open-source models like Llama-2 and Falcon initially lagged behind their closed-source counterparts, they are rapidly closing the performance gap across an array of tasks. Areas where open-source LLMs excel include multi-turn conversation, agent capabilities, reasoning-heavy tasks such as mathematics and coding, and domain-specific applications such as medical analysis.

The research consolidates a variety of evaluation methods, from human feedback to automatic model-based judging, and emphasizes the importance of data quality and training strategies in LLM development. This diversity of evaluations showcases the complexity of accurately assessing LLM capabilities and the difficulty of establishing standardized benchmarks.
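
One evaluation pattern that recurs across the surveyed claims is pairwise comparison aggregated into a win rate. A minimal sketch, using hypothetical judgments (in practice these come from human raters or a judge model):

```python
# Hedged sketch of pairwise-comparison evaluation with win-rate
# aggregation. The judgments below are hypothetical placeholders.
from collections import Counter

# Each record: which system's answer a judge preferred on one prompt.
judgments = ["open", "chatgpt", "open", "tie", "open", "chatgpt"]

counts = Counter(judgments)
n = len(judgments)
# Ties are commonly split evenly between the two systems.
win_rate_open = (counts["open"] + 0.5 * counts["tie"]) / n
print(f"open-source win rate vs. ChatGPT: {win_rate_open:.1%}")  # 58.3%
```

The win rate is simple to aggregate, but as the paper notes, results vary with who (or what) judges and on which prompts, which is part of why standardized benchmarks remain hard to establish.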

Ultimately, the paper aims to serve as a critical resource for both the research community and the business sector. It highlights the recent strides made by open-source LLMs, the evolving strategies for improving these models, and the potential issues encountered in open-source LLM development. This overview allows stakeholders to make informed decisions about the development and adoption of open-source LLMs.

Authors (8)
  1. Hailin Chen
  2. Fangkai Jiao
  3. Xingxuan Li
  4. Chengwei Qin
  5. Mathieu Ravaut
  6. Ruochen Zhao
  7. Caiming Xiong
  8. Shafiq Joty
Citations (24)