
Large Language Models: A Survey (2402.06196v3)

Published 9 Feb 2024 in cs.CL and cs.AI

Abstract: LLMs have drawn a lot of attention due to their strong performance on a wide range of natural language tasks since the release of ChatGPT in November 2022. LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted by scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022). The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions, and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.

Citations (210)


Summary

  • The paper provides a comprehensive survey of state-of-the-art LLMs, detailing advances in architecture and training methods.
  • It reviews the evolution from statistical models to neural and transformer-based models, emphasizing benchmark data and emergent capabilities.
  • It discusses open challenges such as model efficiency, bias mitigation, and multi-modal integration to guide future research.

Overview of LLMs

LLMs have become central to advancements in natural language processing due to their strong capabilities in language understanding and generation. This paper provides an in-depth survey of prominent LLMs such as GPT, LLaMA, and PaLM, examining their attributes, contributions, and limitations. In addition, the paper addresses methodologies for constructing and augmenting these models, highlighting datasets and benchmarks used for training and evaluating LLM performance. The paper concludes by discussing the open challenges and possible future directions in LLM research.

Evolution of LLMs

The development of LLMs has progressed through several phases. Initially, statistical language models such as n-gram models estimated a sentence's probability as the product of each word's probability conditioned on the preceding words. The need to address data sparsity led to neural language models (NLMs), which map words to embeddings and predict the next word using neural networks. Transformative advances came with pre-trained language models (PLMs) such as BERT, which uses an encoder-only architecture, and GPT, which uses a decoder-only architecture for tasks such as text generation.
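
To make the n-gram formulation concrete, the sketch below builds a tiny bigram model in Python; the toy corpus and the <s>/</s> boundary markers are illustrative assumptions rather than anything specified in the survey.

```python
from collections import defaultdict

# Minimal bigram sketch of a statistical n-gram language model: a sentence's
# probability is the product of each word's probability given the previous word.
# The toy corpus and the <s>/</s> boundary markers are illustrative assumptions.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1

def bigram_prob(prev, word):
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

def sentence_prob(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("the cat sat on the rug"))  # 0.0625: every bigram was observed
print(sentence_prob("the cat sat on a rug"))    # 0.0: "on a" never occurs (data sparsity)
```

The zero probability assigned to the second sentence is exactly the data-sparsity problem that motivated smoothing techniques and, later, neural language models.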

LLMs represent the latest phase in this evolution. Leveraging the transformer architecture, LLMs incorporate billions of parameters trained on extensive datasets, enabling emergent abilities such as in-context learning, instruction following, and multi-step reasoning.
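
As a concrete illustration of in-context learning, the sketch below shows a few-shot prompt in which the task is conveyed entirely through examples in the model's input, with no parameter updates; the sentiment task and the commented-out helper call are hypothetical, introduced only for illustration.

```python
# Illustrative few-shot prompt for in-context learning: the task (sentiment
# classification) is specified only through examples placed in the context
# window, with no gradient updates. The commented-out call is a hypothetical
# placeholder for whichever API or local model is actually used.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The plot was gripping from start to finish."
Sentiment: Positive

Review: "I walked out halfway through."
Sentiment: Negative

Review: "A beautifully shot but ultimately hollow film."
Sentiment:"""

print(prompt)
# completion = call_llm(prompt)  # hypothetical helper; a capable LLM completes "Negative"
```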

Key Model Families

  • GPT Family: Starting with GPT-1, this family pioneered the generative pre-training approach, improving with subsequent iterations like GPT-2 and GPT-3. ChatGPT and GPT-4 have expanded to interactive applications, demonstrating human-like conversation capabilities.
  • LLaMA Family: Developed by Meta, LLaMA models are open-source and designed for efficient deployment. Fine-tuned iterations, such as LLaMA-2 Chat, are reported to surpass other open models in benchmarks.
  • PaLM Family: Google’s PaLM models are known for strong language generation skills, instruction tuning, and domain-specific LLMs like Med-PaLM, which targets healthcare applications.

Constructing and Utilizing LLMs

Building LLMs involves several steps:

  • Data Preparation: Filtering, deduplication, and tokenization ensure quality training data.
  • Training Methods: Models are pre-trained on massive datasets with autoregressive or masked language modeling objectives.
  • Fine-tuning and Alignment: Instruction tuning and reinforcement learning align LLM behavior with human intent.
  • Decoding and Deployment: Approaches like beam search and top-K sampling are used for text generation during deployment (see the sampling sketch below).

These methodologies are complemented by tools and frameworks for optimized model training and inference efficiency.
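
As an example of the decoding step referenced above, the following is a minimal top-K sampling sketch in NumPy. It assumes a 1-D array of logits over the vocabulary and is an illustrative sketch under those assumptions, not the decoding code of any particular LLM.

```python
import numpy as np

def top_k_sample(logits, k=50, temperature=1.0, rng=None):
    """One top-K sampling step: keep the k highest-scoring tokens, renormalize
    with a softmax, and sample a token id. `logits` is assumed to be a 1-D
    array over the vocabulary."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top_ids = np.argsort(logits)[-k:]              # indices of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax over kept tokens
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

# Toy example: a 10-token "vocabulary" with random scores.
logits = np.random.default_rng(0).normal(size=10)
print(top_k_sample(logits, k=3))
```

In practice, top-K sampling is often combined with temperature scaling or nucleus (top-p) sampling to trade diversity against coherence during deployment.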

Applications and Challenges

LLMs have transformative applications spanning content creation, personalized search, and interactive agents. Despite their efficacy, challenges remain, including model efficiency, architectural innovations beyond attention mechanisms, the integration of multi-modal information, and addressing biases and security concerns.

Conclusion

As the field of LLMs develops, researchers focus on refining model efficiency, exploring new architectural paradigms, embracing multi-modal data integration, and advancing alignment techniques. Addressing security and ethical considerations remains paramount in deploying LLMs across various domains. Continued innovation promises further expansion in the capabilities and applications of LLMs, setting the stage for the next era in artificial intelligence.


