LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models (2505.19240v2)

Published 25 May 2025 in cs.CL, cs.AI, and cs.LG

Abstract: LLM research has grown rapidly, along with increasing concern about their limitations such as failures in reasoning, hallucinations, and limited multilingual capability. While prior reviews have addressed these issues, they often focus on individual limitations or consider them within the broader context of evaluating overall model performance. This survey addresses the gap by presenting a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to 2025, using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we extract 14,648 relevant limitation papers using keyword filtering and LLM-based classification, validated against expert labels. Using topic clustering (via two approaches, HDBSCAN+BERTopic and LlooM), we identify between 7 and 15 prominent types of limitations discussed in recent LLM research across the ACL and arXiv datasets. We find that LLM-related research increases nearly sixfold in ACL and nearly fifteenfold in arXiv between 2022 and 2025, while LLLMs research grows even faster, by a factor of over 12 in ACL and nearly 28 in arXiv. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward safety and controllability (with topics like security risks, alignment, hallucinations, knowledge editing), and multimodality between 2022 and 2025. We offer a quantitative view of trends in LLM limitations research and release a dataset of annotated abstracts and a validated methodology, available at: https://github.com/a-kostikova/LLLMs-Survey.

Authors (6)
  1. Aida Kostikova (5 papers)
  2. Zhipin Wang (5 papers)
  3. Deidamea Bajri (1 paper)
  4. Ole Pütz (2 papers)
  5. Benjamin Paaßen (29 papers)
  6. Steffen Eger (90 papers)

Summary

A Data-Driven Survey of Evolving Research on Limitations of LLMs

The paper "LLLMs: A Data-Driven Survey of Evolving Research on Limitations of LLMs," authored by Aida Kostikova et al., offers a systematic analysis of the limitations inherent in LLMs, focusing on areas such as reasoning, hallucination, bias, and security. The authors leverage a data-driven approach to provide quantitative insights into the prevailing trends within the LLM research community, particularly concerning the limitations and constraints of these models.

The survey covers papers published between 2022 and early 2025, drawing on a corpus of 250,000 papers from the ACL Anthology and arXiv. From this corpus, the authors extracted 14,648 papers relevant to LLM limitations using keyword filtering followed by LLM-based classification, validated against expert labels; they then applied topic clustering with two approaches, HDBSCAN+BERTopic and LlooM, to identify the prominent limitation types. This substantial corpus enabled a comprehensive meta-analysis, showing that LLM research has surged in recent years, with limitations-focused research growing even faster and accounting for over 30% of all LLM studies by 2024.
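
To make the pipeline concrete, the sketch below shows the general shape of a keyword pre-filter followed by HDBSCAN+BERTopic clustering. This is a minimal illustration, not the authors' released code (see the repository linked in the abstract): the keyword list, the embedding model name, and the HDBSCAN parameters are all illustrative assumptions, and the intermediate LLM-based relevance classification step is omitted for brevity.

```python
# Minimal sketch of a keyword pre-filter + HDBSCAN+BERTopic topic pipeline.
# Keywords, model names, and clustering parameters are illustrative
# assumptions, not the authors' released configuration.
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer

# Hypothetical limitation-related keywords for the coarse first-stage filter.
KEYWORDS = ("hallucination", "limitation", "bias", "adversarial", "jailbreak")

def keyword_filter(abstracts: list[str]) -> list[str]:
    """Keep abstracts that mention at least one limitation keyword."""
    return [a for a in abstracts if any(k in a.lower() for k in KEYWORDS)]

def cluster_limitations(abstracts: list[str]) -> BERTopic:
    """Embed abstracts and cluster them into limitation topics."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
    clusterer = HDBSCAN(
        min_cluster_size=50,               # assumed parameter choices
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )
    topic_model = BERTopic(embedding_model=embedder, hdbscan_model=clusterer)
    topic_model.fit_transform(abstracts)   # assigns each abstract a topic
    return topic_model

# Usage (abstracts would come from the ACL Anthology / arXiv dumps):
#   candidates = keyword_filter(abstracts)
#   topic_model = cluster_limitations(candidates)
#   print(topic_model.get_topic_info())   # inspect the most prominent topics
```

Compared with assigning papers to predefined categories, a density-based clusterer like HDBSCAN lets the limitation topics emerge bottom-up and leaves genuinely off-topic abstracts unassigned, which matches the survey's bottom-up framing.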

Several key findings emerged from this analysis:

  1. Reasoning Challenges: Reasoning difficulties remain the most studied limitation across both ACL and arXiv datasets. This suggests a persistent struggle in enabling LLMs to perform complex cognitive tasks effectively without relying on superficial heuristics.
  2. Limitations in Generalization and Hallucination: Generalization and hallucination are significant areas of concern, with studies highlighting challenges related to model consistency and factual accuracy. This underscores ongoing issues with LLM reliability, particularly in diverse contexts and multimodal outputs.
  3. Multilinguality and Bias: While multilingual capabilities have improved, the paper identifies continued limitations in language diversity and representation, exacerbated by underlying biases and fairness challenges. These biases have implications in various societal domains, emphasizing the need for more equitable NLP solutions.
  4. Security Risks: As LLM deployment becomes more widespread, security issues like adversarial attacks and prompt injection have become increasingly relevant. This reflects broader concerns about model robustness and integrity in sensitive applications.

The research points to a dynamic landscape where LLM limitations are being rapidly addressed, albeit with new challenges emerging as models evolve. The shift in research focus towards safety and controllability on platforms like arXiv parallels advancements in LLM deployments in critical domains such as healthcare and finance.

Implications and Future Directions:

Practically, this research informs developers and policymakers about critical areas in LLM design and implementation that require ongoing attention. Theoretically, it lays the groundwork for further studies that treat these limitations as foundational rather than peripheral concerns.

Future research could benefit from deeper exploration of specific subcategories within the identified limitations, such as nuanced forms of reasoning failure or distinct types of hallucination. Longitudinal studies tracing how these concerns evolve could also shed light on the efficacy of the mitigation strategies being applied.

Overall, while LLM research continues to expand, this paper underscores the importance of scrutinizing limitations to ensure responsible and effective deployment across diverse applications. This strategic focus could drive the development of more robust, fair, and secure LLMs in the future.
