Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size? (2509.02225v1)

Published 2 Sep 2025 in cs.CL

Abstract: Large language models (LLMs) offer impressive language capabilities but suffer from well-known limitations, including hallucinations, biases, privacy concerns, and high computational costs. These issues are largely driven by the combination of linguistic competence and factual memorization within a single monolithic model. This paper introduces and empirically supports the Fundamental Language Model (FLM) paradigm, which advocates for smaller, linguistically competent models that offload factual retrieval to external tools. We evaluate models ranging from 135M to 32B parameters across three dimensions: linguistic competence, external factual knowledge, and internal factual knowledge. Our findings reveal that while both linguistic competence and factual knowledge improve with scale, internal factual knowledge grows significantly faster, suggesting that model size is more closely tied to memorization than to core language ability. These results support a modular approach to language modeling, where compact, linguistically proficient models serve as the foundation for tool-augmented systems. The FLM paradigm offers a path toward more efficient, interpretable, and sustainable NLP solutions.

Summary

The paper finds that linguistic competence improves only modestly with size while internal factual knowledge scales much more sharply, suggesting the two can be decoupled.
The study benchmarks models from 135M to 32B parameters on lexical (WiC), grammatical (BLiMP), and semantic (RTE, MNLI, QQP) competence, alongside tests of external and internal factual knowledge.
Results support a modular approach in which smaller FLMs, paired with external fact retrieval, maintain robust language processing.


Introduction

The paper "Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size?" explores whether linguistic competence can be separated from factual knowledge in LLMs. It frames this separation as the Fundamental Language Model (FLM) paradigm, which advocates smaller, linguistically competent models that rely on external tools for factual retrieval. The paper evaluates models ranging from 135 million to 32 billion parameters to determine whether linguistic competence keeps improving with model size or levels off, an outcome that would support a modular approach to language modeling.

Theoretical Perspective

The FLM paradigm is proposed as a remedy for the limitations of monolithic LLMs, such as hallucinations, biases, and high computational costs. Conventional LLMs store both factual and linguistic information in their parameters, which makes them inefficient when external retrieval could supply the facts instead. The FLM approach posits that linguistic competence, meaning the model's ability to understand and generate language structure (e.g., grammar and semantics), can be retained in smaller models. If so, scaling models up should mainly enhance factual memorization rather than core linguistic proficiency.
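A minimal sketch of the division of labour the FLM paradigm describes, assuming a hypothetical dictionary-backed `FactStore` in place of a real retrieval tool and leaving the language model itself out: the external component supplies the facts, and the model would only need to read the resulting prompt and phrase an answer. `FactStore`, `lookup`, and `build_prompt` are illustrative names, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    """Hypothetical stand-in for an external tool (search engine, database, knowledge base)."""

    facts: dict[str, str] = field(default_factory=dict)

    def lookup(self, query: str) -> list[str]:
        # Naive keyword retrieval: return every fact whose key occurs in the query.
        q = query.lower()
        return [fact for key, fact in self.facts.items() if key.lower() in q]


def build_prompt(question: str, store: FactStore) -> str:
    """Put retrieved facts into the context so the model only has to read and phrase
    them, instead of recalling them from its own parameters."""
    retrieved = store.lookup(question)
    context = "\n".join(f"- {fact}" for fact in retrieved) or "- (no facts found)"
    return f"Facts:\n{context}\n\nQuestion: {question}\nAnswer using only the facts above."


if __name__ == "__main__":
    store = FactStore({"BLiMP": "BLiMP is a benchmark of minimal sentence pairs for English grammar."})
    # The resulting prompt would be passed to a small, linguistically competent model.
    print(build_prompt("What does BLiMP measure?", store))
```

Keyword matching keeps the sketch dependency-free; a practical system would swap in a search API or a dense retriever behind the same `lookup` interface.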

Methodology

The paper systematically evaluates models along three dimensions: linguistic competence, external factual knowledge, and internal factual knowledge. Using benchmarks from the LM Evaluation Harness, such as WiC for lexical competence, BLiMP for grammatical competence, and RTE, MNLI, and QQP for semantic competence, the authors examine whether model size affects core language abilities. Models from several families, including SmolLM2, Qwen2.5, and Llama-3, are assessed at sizes ranging from 135M to 32B parameters.
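In the LM Evaluation Harness, tasks like these are scored by comparing the log-likelihoods a model assigns to fixed candidates rather than by free generation: BLiMP checks whether the grammatical member of a minimal pair is more probable, and classification tasks such as RTE, MNLI, or QQP pick the highest-scoring label. A minimal sketch of that scoring logic, assuming the log-probabilities have already been computed by some model; the function names are hypothetical and the numbers are made up, not results from the paper.

```python
def minimal_pair_accuracy(pairs: list[tuple[float, float]]) -> float:
    """BLiMP-style accuracy over (log p(grammatical), log p(ungrammatical)) pairs.

    A pair counts as correct only when the grammatical sentence gets the strictly
    higher log-probability; ties count as incorrect here.
    """
    correct = sum(1 for good_lp, bad_lp in pairs if good_lp > bad_lp)
    return correct / len(pairs)


def predict_choice(choice_logliks: list[float]) -> int:
    """Index of the answer option (e.g. one of the labels of RTE or QQP) with the
    highest log-likelihood; on ties the first such option wins."""
    return max(range(len(choice_logliks)), key=lambda i: choice_logliks[i])


def multiple_choice_accuracy(examples: list[tuple[list[float], int]]) -> float:
    """Fraction of (option log-likelihoods, gold index) examples predicted correctly."""
    hits = sum(1 for logliks, gold in examples if predict_choice(logliks) == gold)
    return hits / len(examples)


if __name__ == "__main__":
    # Invented log-probabilities, standing in for what a model would assign.
    pairs = [(-12.3, -15.1), (-20.0, -19.5), (-8.7, -9.9)]
    examples = [([-1.2, -2.5], 0), ([-3.1, -0.4], 0), ([-0.9, -1.0, -4.0], 0)]
    print(minimal_pair_accuracy(pairs), multiple_choice_accuracy(examples))
```

Because only relative likelihoods matter, this style of evaluation probes what a model prefers linguistically without depending on how well it formats generated answers.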

Results and Analysis

The results reveal a nuanced perspective on scaling LLMs:

Linguistic Competence: Smaller models were found to retain significant linguistic competence, with Qwen2.5 models demonstrating superior performance across lexical, grammatical, and semantic benchmarks even at moderate parameter sizes.
External Factual Knowledge: Performance on external factual knowledge tasks does not scale uniformly with size, suggesting that the ability to reason over externally supplied facts is achievable without extensive model scaling.
Internal Factual Knowledge: Larger models predictably excelled at recalling facts stored in their parameters, reinforcing that model size directly enhances memorization capabilities.
Figure 1: Scores achieved against model size (millions of parameters).

Regression Analysis and Statistical Tests

The analysis fit linear regressions of each score on $\log(\text{Size})$, which proved a more accurate predictor of internal factual knowledge than of linguistic competence. The regression slopes show that factual knowledge scales more sharply with model size than linguistic competence does. Mann-Whitney U tests further indicate that significant performance differences appear mainly when comparing models at the extremes of the size range (small vs. large), particularly on factual knowledge tasks.
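The Mann-Whitney U test compares two groups of scores (here, small versus large models) without assuming normality. A minimal pure-Python sketch, not the authors' code, assuming a normal approximation to the null distribution of U with no tie or continuity correction; `mann_whitney_u` is a hypothetical helper and the scores are invented.

```python
import math


def mann_whitney_u(a: list[float], b: list[float]) -> tuple[float, float]:
    """Mann-Whitney U statistic for group `a` versus group `b`, with a two-sided
    p-value from the normal approximation (no tie or continuity correction).

    U counts, over all cross-group pairs, how often a value from `a` exceeds one
    from `b`, with ties contributing 0.5.
    """
    n1, n2 = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


if __name__ == "__main__":
    # Invented benchmark scores for a "small" and a "large" group of models.
    small = [0.52, 0.55, 0.50, 0.58]
    large = [0.71, 0.74, 0.69, 0.77]
    print(mann_whitney_u(large, small))
```

With only a handful of models per group the normal approximation is rough; `scipy.stats.mannwhitneyu` provides a tested implementation that can also compute exact p-values.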

Figure 2: Linear regressions for each competence against $\log(\text{Size})$, with size in millions of parameters.
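A minimal sketch of the kind of regression summarized in Figure 2, assuming ordinary least squares on the natural log of size in millions of parameters and using $R^2$ as one way to quantify how well $\log(\text{Size})$ predicts a score. The helper `fit_log_size` is hypothetical and the scores are invented; they only show how a steeper slope expresses faster scaling.

```python
import math


def fit_log_size(sizes_m: list[float], scores: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of score = intercept + slope * ln(size).

    `sizes_m` are model sizes in millions of parameters. Returns
    (slope, intercept, r_squared); the slope is the score change per e-fold
    increase in size (a different log base only rescales it).
    """
    xs = [math.log(s) for s in sizes_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, scores))
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    return slope, intercept, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented scores, chosen only to illustrate a steep versus a shallow slope.
    sizes = [135, 360, 1700, 7000, 32000]
    linguistic = [0.60, 0.62, 0.65, 0.66, 0.68]
    internal_facts = [0.10, 0.18, 0.35, 0.50, 0.70]
    ling_slope, _, ling_r2 = fit_log_size(sizes, linguistic)
    fact_slope, _, fact_r2 = fit_log_size(sizes, internal_facts)
    print(f"linguistic slope={ling_slope:.3f} (R^2={ling_r2:.2f})")
    print(f"internal facts slope={fact_slope:.3f} (R^2={fact_r2:.2f})")
```

Regressing on $\log(\text{Size})$ rather than raw size treats each multiplicative step in scale (e.g. 135M to 1.35B) as one unit, which matches how model families are typically released.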

Discussion

The findings support the FLM approach, indicating that linguistic competence can be decoupled from factual knowledge and informing the design of more efficient and interpretable models. When factual retrieval is delegated to specialized systems, smaller models can retain robust linguistic capabilities. This modular design aligns with cognitive models and offers a promising path toward sustainable AI applications.

Conclusion

This paper provides evidence supporting the development of Fundamental Language Models, which separate linguistic competence from factual knowledge. The results favor smaller, specialized models that handle language processing efficiently while relying on external systems for factual information, potentially transforming NLP infrastructure.

This modular approach could not only improve model efficiency but also make AI systems easier to adapt and scale, with implications for both AI research and its applications.