Analysis of "Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice"
This position paper by Alexandra Sasha Luccioni and Anna Rogers critically analyzes the discourse surrounding LLMs in NLP. The authors address fundamental issues in the field, such as the lack of a clear definition for LLMs, the evidential basis behind prevalent assumptions about their functionalities, and the impact of these assumptions on the future trajectory of NLP research.
Definition and Criteria for LLMs
The paper begins by proposing criteria that precisely define what constitutes an LLM. These criteria are:
- LLMs are tasked with modeling and generating text based on contextual inputs.
- They undergo large-scale pretraining, with the paper adopting a rough threshold of at least 1 billion tokens of training data.
- They enable transfer learning, demonstrating adaptability across a wide range of tasks.
By these criteria, models like BERT and the GPT series qualify as LLMs, while models such as word2vec do not, since they assign each word a single static vector regardless of its context. This attempt at a rigorous definition resolves some of the ambiguity in the ongoing discourse.
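To make the contextual-modeling criterion concrete, here is a minimal sketch of my own (not from the paper). It assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, and contrasts the contextual vectors a BERT-style model produces for the word "bank" in two sentences with the single static vector a word2vec-style lookup table would return:

```python
# Minimal sketch, assuming the Hugging Face `transformers` library and
# the `bert-base-uncased` checkpoint; illustrative only, not from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the hidden-state vector BERT assigns to `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[position]

v_river = contextual_vector("She sat on the bank of the river.", "bank")
v_money = contextual_vector("She deposited cash at the bank.", "bank")

# A contextual model yields different vectors for the two occurrences of "bank";
# a static word2vec lookup would return the identical vector in both sentences.
similarity = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.3f}")
```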
Evaluating Assumptions about LLM Functionality
The paper scrutinizes four prevalent claims regarding LLMs: robustness, state-of-the-art (SOTA) status, performance attributed to scale, and emergent properties.
- Robustness: While LLMs mitigate some of the brittleness typical of early symbolic AI systems, the paper cites existing research demonstrating that they remain susceptible to shortcut learning and prompt sensitivity (a minimal prompt-sensitivity sketch follows this list).
- SOTA Performance: LLMs are frequently positioned as superior across NLP benchmarks, but the authors nuance this claim by distinguishing between the fine-tuning and few-shot paradigms, arguing that, contrary to popular belief, LLMs do not unequivocally surpass non-LLM approaches. They also caution that data contamination is likely skewing benchmark results (a toy contamination check is sketched after this list).
- Scaling and Performance: The hypothesis that scaling inherently improves model performance is interrogated. Although larger models achieve impressive results, the respective contributions of model size and data size remain uncertain, and recent models reach comparable results with far fewer parameters, suggesting that size alone does not determine performance.
- Emergent Properties: The authors challenge claims of emergent properties, i.e., abilities that supposedly arise without being traceable to the training data. They stress the need for empirical evidence linking model behaviors to the pre-training data, arguing that these 'emergent' abilities are better explained by data exposure than by inherent model faculties.
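To make the prompt-sensitivity point concrete, here is a minimal sketch of my own (not an experiment from the paper): two semantically equivalent prompts are sent to the same model under greedy decoding, so any divergence in the answers comes from wording alone. It assumes the Hugging Face `transformers` library; `gpt2` stands in for a far larger LLM purely to keep the example runnable.

```python
# Illustrative sketch of prompt sensitivity; not the authors' experiment.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Two paraphrases of the same question; a fully robust model would answer both identically.
prompts = [
    "Question: What is the capital of France? Answer:",
    "Q: What is the capital of France?\nA:",
]

for prompt in prompts:
    # Greedy decoding (do_sample=False) removes sampling randomness,
    # isolating the effect of the prompt wording.
    completion = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
    print(repr(completion[len(prompt):]))
```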
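On the contamination point, the sketch below is a toy version of one widely used heuristic, not the authors' own method: flag a benchmark example if any sufficiently long word n-gram from it also appears verbatim in the pretraining corpus. The 13-gram window and the corpus interface are illustrative assumptions.

```python
# Toy contamination heuristic (illustrative assumption, not the paper's method).
from typing import Iterable

def word_ngrams(text: str, n: int = 13) -> set:
    """Lowercased word-level n-grams of `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(test_example: str, corpus_docs: Iterable[str], n: int = 13) -> bool:
    """True if any n-gram of the benchmark example occurs verbatim in a corpus document."""
    test_grams = word_ngrams(test_example, n)
    if not test_grams:  # example shorter than n words: heuristic does not apply
        return False
    return any(test_grams & word_ngrams(doc, n) for doc in corpus_docs)

# Hypothetical usage: `benchmark` and `corpus_sample` are stand-in lists of strings.
# flagged = [ex for ex in benchmark if is_contaminated(ex, corpus_sample)]
```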
Implications for NLP Research and Practice
The authors identify several patterns emerging from the widespread adoption of LLMs:
- Homogenization: Increasing reliance on LLMs is threatening the diversity of research methodologies.
- Industry Influence: Industry-driven priorities are shaping research directions, potentially sidelining theoretical explorations.
- De-democratization: The resource-intensive nature of training LLMs is pushing research away from academic and independent settings and toward a small number of well-resourced industry labs.
- Reproducibility Challenges: The scale of LLMs, combined with closed or frequently updated models, makes results difficult to replicate across projects and over time.
Recommendations for the Future
To navigate these implications, the authors put forward recommendations to preserve and advance NLP research, including:
- Encouraging methodological diversity.
- Clarifying terminology and ensuring precision in defining LLM-related concepts.
- Refraining from using proprietary models as benchmarks to maintain transparency and reproducibility.
- Promoting rigorous studies on LLM functionality and refining evaluation methodologies.
The paper serves as a cautionary reminder of the assumptions the burgeoning field tends to overlook and highlights the need for scholarly rigor. As the field advances, sustaining a vibrant, inclusive research ecosystem is crucial for continued progress in understanding LLM capabilities and their rightful place in NLP applications.