LLM Influence on Research and Society

Updated 25 April 2026

Large Language Models (LLMs) are transformer-based neural architectures pre-trained on extensive text corpora that drive measurable shifts in scholarly writing and coding practices.
Methodological frameworks like Token Distribution Dynamics enable precise attribution of prompt influence and improve output steering for safety and accuracy.
LLM adoption has reshaped research across disciplines by altering citation patterns, inducing linguistic homogenization, and raising ethical and sociocognitive challenges.

LLMs are advanced neural architectures, most commonly based on the Transformer paradigm, pre-trained on vast text corpora and subsequently adapted for various downstream tasks. Their influence now pervades academic writing, spoken scientific discourse, cross-disciplinary research, code development, collective cognition, and even social and political decision-making. This article systematically surveys the empirically evidenced influence of LLMs, integrating stylistic, methodological, sociolinguistic, and ethical dimensions.

1. Linguistic and Stylistic Influence on Academic Writing

The adoption of LLMs has yielded measurable shifts in the lexicon and style of scholarly communication. A temporally resolved analysis of ~2.9 million arXiv submissions demonstrates both the statistical magnitude and heterogeneity of these changes (Geng et al., 26 Mar 2026). LLMs have driven a marked increase in the frequency of certain “model-favored” terms in paper titles (e.g., “beyond”: predicted 0.8% → observed 1.2% in 2025, effect +0.4 percentage points [95% CI: 0.30, 0.50]; “via”: 1.5% → 2.1%, effect +0.6 pp [0.45, 0.75]). In abstracts, traditionally common function words (“the,” “of”) show sharp drops (e.g., “the”: observed 5.4% in early 2026 vs. predicted 8.1%, effect −2.7 pp [−3.2, −2.2]; “of”: 3.9% vs. 5.4%, −1.5 pp [−1.8, −1.2]).
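The effect sizes above are differences, in percentage points, between an observed term frequency and a counterfactual prediction. The sketch below shows that arithmetic. It assumes a frequency is the share of all word tokens that equal the term, which the source does not specify; the function names `term_share` and `effect_pp` are hypothetical.

```python
import re
from collections import Counter


def term_share(texts, term):
    """Percentage of all word tokens in `texts` that equal `term`.

    Assumption: frequency is a token share; the source does not say
    whether it counts tokens or documents containing the term.
    """
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    return 100.0 * Counter(tokens)[term] / len(tokens)


def effect_pp(observed_pct, predicted_pct):
    """Effect in percentage points: observed minus predicted frequency."""
    return observed_pct - predicted_pct


# Figures quoted in the text for "beyond" (titles) and "the" (abstracts).
beyond_effect = round(effect_pp(1.2, 0.8), 1)   # +0.4 pp
the_effect = round(effect_pp(5.4, 8.1), 1)      # -2.7 pp
```

The predicted frequency itself would come from a trend model fitted on pre-LLM data, which is outside the scope of this sketch.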

Classifier-based authorship attribution experiments, targeting the identification of specific LLM-generated texts among 2,000 abstracts rewritten by nine different models, reveal the increasing homogenization of LLM output. Binary classifier accuracy remains high (80–95%), but realistic multi-class model identification tasks see accuracy at or below 65%. The convergence of LLM-generated and human-written text is further evidenced by near-identical ROUGE and BERTScore metrics and overlapping word-usage fingerprints.

Stylometric contagion renders manual or binary detection techniques insufficient; previously distinctive “AI-style” markers are now diffused throughout the human authoring ecosystem, necessitating time-resolved and lexicon-drift–aware surveillance. Cross-model calibration is critical, as different LLMs and prompt templates impart distinct, dynamically evolving linguistic fingerprints.

2. Prompt Influence and Causal Input Saliency

LLM output is sensitive to the structure and content of input prompts, with recent research formalizing fine-grained frameworks to attribute causal influence to individual prompt tokens (Feng et al., 2024). The Token Distribution Dynamics (TDD) approach projects each token’s hidden state into vocabulary space, quantifying at each step the contrastive shift in model output distribution. TDD variants (forward, backward, bidirectional) afford interpretable, vocabulary-level saliency assignments that outperform prior gradient- or attention-based methods in faithfulness (up to +5 p.p. AOPC and −7 p.p. Sufficiency on BLiMP, with LLaMA2-13B and OPT-30B among the evaluated models).

These fine-grained input attributions enable principled, targeted prompt manipulations for zero-shot toxicity suppression (e.g., a 59% reduction in toxicity for GPT2 on the RealToxicityPrompts corpus) and sentiment steering (e.g., negative-sentiment proportion: 48%→87%; positive: 52%→78%). The TDD framework thereby moves prompt engineering from heuristic trial and error toward systematic, causally justified design.
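To make the projection idea concrete, here is a toy sketch of a forward-style attribution. It is a simplified reading of the description above, not the authors' implementation. Small hand-written matrices stand in for a real model's hidden states and output head, and the names `project` and `tdd_forward_saliency` are hypothetical. Each position's hidden state is projected to a vocabulary distribution, the contrast between a target token and an alternative token is computed, and a token's saliency is taken to be the change in that contrast when the token is added.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(hidden, unembed):
    """Project one hidden state (d_model floats) into vocabulary space.

    `unembed` has d_model rows and vocab columns (a stand-in for the
    model's output head).
    """
    vocab = len(unembed[0])
    logits = [sum(h * row[v] for h, row in zip(hidden, unembed)) for v in range(vocab)]
    return softmax(logits)


def tdd_forward_saliency(hidden_states, unembed, target_id, alt_id):
    """Per-token saliency as the step-to-step change in the contrast
    p(target) - p(alternative) of the projected distributions."""
    saliency = []
    previous = 0.0  # contrast before any token has been read (assumed 0)
    for hidden in hidden_states:
        probs = project(hidden, unembed)
        contrast = probs[target_id] - probs[alt_id]
        saliency.append(contrast - previous)
        previous = contrast
    return saliency


# Toy example: 3 prompt positions, d_model = 2, vocabulary of 3 tokens.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unembed = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
scores = tdd_forward_saliency(hidden_states, unembed, target_id=0, alt_id=1)
```

By construction the saliencies sum to the contrast at the final position, so each score can be read as that token's share of the final preference for the target over the alternative.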

3. Cross-Disciplinary Proliferation and Applications

LLMs have catalyzed a profound broadening of NLP influence across the full spectrum of academia, as quantified by large-scale citation and content analyses (Pramanick et al., 2024). Among ~148,000 non-CS research papers citing a manually curated set of 106 LLMs, Linguistics (23.2%) and Engineering (22.3%) dominate, together accounting for ~45% of non-CS LLM citations. Adoption is accelerating across Medicine (17.6%), Environmental Science (6.9%), and Mathematics (5.1%), and remains low (<1%) in fields like History and Geography.

Field-normalized adoption rates illuminate the depth of LLM integration (Linguistics: 5–6% of all papers; Law: ~4.5%; Mathematics: ~4.0%; Medicine: ~2.0%). Statistical indices confirm increased disciplinary diffusion (2018–2023 Gini index: 0.70→0.56), surpassing earlier paradigms such as RNNs and HMMs in breadth of uptake. The dominant usage mode is task-agnostic, zero- or few-shot (inference-only) deployment, with fine-tuning relatively rare outside CS, Law, and Medicine. Domain-specific opportunities (e.g., ancient text restoration, geospatial data analysis, clinical workflows) are juxtaposed against unresolved deficits in specialized model development and ethical self-reflection; only ~2% of domain papers that cite LLMs explicitly discuss issues such as bias or reproducibility.
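The Gini index used above summarizes how unevenly citations are spread across fields: 0 means every field has an equal share, and values near 1 mean a few fields hold almost everything. A minimal sketch of the standard sorted-values formula follows. The input is a hypothetical list of per-field citation counts, not data from the cited study.

```python
def gini(values):
    """Gini index of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted ascending and i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-field citation counts, for illustration only.
even_spread = gini([25, 25, 25, 25])     # perfectly even
concentrated = gini([0, 0, 0, 100])      # all citations in one field
```

A drop such as 0.70 to 0.56 therefore indicates that citations became more evenly distributed across disciplines.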

4. Sociocognitive Effects and Susceptibility Patterns

LLMs display social cognitive vulnerabilities paralleling those in human agents (Bian et al., 2023; Griffin et al., 2023). Controlled experiments demonstrate that LLM memories and beliefs are systematically modifiable by external statements, especially when delivered via high-authority sources (accuracy with counterfactual “research paper” context: 32.9%, vs. 81.1% baseline; Spearman’s ρ=–1.0, p<0.001 for credibility bias). LLM opinions show context-sensitive shifts across domains and styles, modulated by stimulus attributes such as authority, in-group status, and positive social roles. For subjective influence, LLMs exhibit a Pollyanna bias (a preference for positive sentiment), high susceptibility to emotional contagion, and strong in-group bias in opinion updating.

Direct comparisons between human and LLM susceptibility to influence, using paradigms such as the Illusory Truth Effect (ITE) and populist framing of news, reveal qualitative alignment in the direction and linearity of effects, although LLMs tend to exhibit higher variance and do not fully capture the moderation by latent individual-difference variables observed in humans.

5. Standardization and Homogenization of Language, Reasoning, and Expression

Cumulative evidence from linguistics, cognition, and computer science demonstrates a pronounced LLM-driven convergence of linguistic, perspectival, and reasoning diversity (Sourati et al., 2 Aug 2025). Direct measurements on Reddit posts and scientific abstracts (2015–2024) reveal lexical entropy reduction (12%: D_lex ≈ 3.2 bits→2.8 bits), Simpson’s index decline (0.72→0.63, p<.001), and reduced BERTScore variance (–18%, p<.01). Sociolinguistic markers, such as region- or culture-specific lexical items, are suppressed by LLM exposure (e.g., –28%, t=4.3, p<.001).
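The two diversity measures above can be computed from token counts. The sketch below assumes that D_lex is Shannon entropy in bits and that Simpson's index is the Gini-Simpson form 1 − Σ p_i², in which lower values mean less diversity. Both are assumptions, since the source does not give the exact definitions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(tokens):
    """Shannon entropy (base 2) of the empirical token distribution."""
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gini_simpson(tokens):
    """Gini-Simpson diversity: probability that two random draws differ."""
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


diverse = ["alpha", "beta", "gamma", "delta"]
uniform = ["alpha", "alpha", "alpha", "alpha"]

# Relative entropy reduction for the figures quoted in the text.
relative_drop = (3.2 - 2.8) / 3.2   # about 0.125
```

Four equally frequent tokens give 2 bits of entropy and a Gini-Simpson value of 0.75, while a sample made of one repeated token gives 0 on both measures.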

The design and deployment loop—pretraining on socially skewed corpora, alignment via RLHF, next-token prediction bias toward consensus, and real-world re-injection—amplifies dominant patterns at the expense of minority forms. Empirical studies document a concurrent loss of memory retention and neural diversity (e.g., –22% fMRI network coherence, –18% memory at test, p<.05) and a rise in cross-cohort similarity of expression.

Widely proposed remedies include distribution-matching fine-tuning (KL(q || p_LLM)), explicit diversity rewards during RLHF, and pluralistic alignment via multi-persona or culturally tuned agents.
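The distribution-matching objective KL(q || p_LLM) measures how far the model's output distribution p_LLM is from a target distribution q, for example the empirical distribution of human usage. A minimal sketch for discrete distributions follows. The function name and the `eps` floor are illustrative choices, not part of any cited method.

```python
import math


def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) in nats for two discrete distributions of equal length.

    Terms with q_i == 0 contribute nothing; p_i is floored at `eps`
    (an illustrative safeguard) to avoid division by zero.
    """
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    return sum(qi * math.log(qi / max(pi, eps)) for qi, pi in zip(q, p) if qi > 0)


# Hypothetical two-variant example: humans use both forms equally,
# while the model strongly prefers the first.
human_usage = [0.5, 0.5]
model_usage = [0.9, 0.1]
divergence = kl_divergence(human_usage, model_usage)
```

Minimizing this quantity during fine-tuning would push the model to restore the probability mass it has taken away from the less common variant.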

6. Influence on Code Generation, Style, and Software Practice

LLMs have transformed code synthesis, debugging, documentation, and review. Analysis of nearly 20,000 GitHub repositories linked to arXiv papers shows a temporal drift aligned with LLM-generated code style. In Python, the proportion of snake_case variable names increased from 47% (Q1 2023) to 51% (Q1 2025); function names showed a similar shift (44%→49%) (Xu et al., 13 Jun 2025). LLM-produced code exhibits longer, more descriptive identifiers and greater consistency in stylistic conventions, as well as high similarity (cosine/Jaccard >0.7) to reference code when guided.
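A naming-style share of this kind can be measured with Python's standard `ast` module. The sketch below is one possible operationalization, not the cited study's pipeline. It assumes that a snake_case name is lowercase and contains at least one underscore, so single-word names such as `x` are not counted. The helper names are hypothetical.

```python
import ast
import re

# Assumption: snake_case means lowercase words joined by underscores,
# with at least one underscore.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def collect_names(source):
    """Return (variable_names, function_names) defined in Python source."""
    variables, functions = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            variables.add(node.id)  # names that are assigned to
    return variables, functions


def snake_case_share(names):
    """Fraction of names matching the snake_case pattern."""
    if not names:
        return 0.0
    return sum(1 for name in names if SNAKE_CASE.match(name)) / len(names)


sample = (
    "total_count = 0\n"
    "x = 1\n"
    "myValue = 2\n"
    "def load_data():\n"
    "    pass\n"
    "def run():\n"
    "    pass\n"
)
variable_names, function_names = collect_names(sample)
variable_share = snake_case_share(variable_names)
function_share = snake_case_share(function_names)
```

Running this per repository snapshot and averaging by quarter would give a time series comparable in spirit to the percentages quoted above.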

Despite these stylistic shifts, no systematic trends are observed in deeper structural complexity or maintainability metrics on large-scale codebases. LLMs' natural-language reasoning chains for algorithmic tasks are marked by frequent articulation of both correct and incorrect solution strategies; error rates generally exceed match rates, and the frequency of incorrect approach suggestions rises with problem difficulty.

At the industry level, LLM adoption drives increases in productivity (e.g., code-generation–driven time reductions of up to 30%), correctness (passing unit tests: human-only 75% → LLM-assisted 90%), and perceived quality, but raises acute challenges in data privacy, user mistrust, hallucination management, and prompt engineering complexity (Jalil, 2023). Twelve open problems in LLM-assisted software engineering remain, ranging from non-determinism and best-practice embedding to wide-scale bias mitigation and human-centric trust calibration.

7. Persuasion, Ideological Transfer, and Social Norms

LLMs not only shape linguistic and procedural norms but also exert measurable persuasion and ideological transfer. Controlled human–LLM interaction studies show that users exposed to LLM chatbot advice are approximately 5 percentage points more likely to endorse the same policy preferences as the LLM (τ̂=0.050, SE=0.014), an effect size on par with paid political advertising (AlDahoul et al., 7 May 2025). These effects are robust across levels of user interest, news engagement, and prior familiarity.
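The reported estimate and standard error imply an approximate confidence interval. The sketch below derives it under a normal approximation, which is an assumption here: the source reports only the point estimate and the standard error. The function name `wald_interval` is illustrative.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


tau_hat, se = 0.050, 0.014
lower, upper = wald_interval(tau_hat, se)   # roughly 0.023 to 0.077
z_statistic = tau_hat / se                  # roughly 3.57
```

Under this approximation the interval excludes zero, which is consistent with the text's description of the effect as measurable.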

While aggregate “partisanship” indicators may suggest LLM moderation, careful decomposition shows that LLMs embody offsetting issue-specific extremity and inconsistent ideological bundles—akin to moderate, but inconsistent, human voters. The persuasive effect is not domain-agnostic; users shift along specific axes aligned with the LLM’s own responses.

From a normative standpoint, benchmarks such as DeliberationBench operationalize “beneficial” influence as directional and magnitude alignment between LLM-induced attitude change and the empirically measured shifts among real-world deliberative-poll participants (Hewitt et al., 22 Feb 2026). Large-scale experiments confirm that LLM-induced opinion shifts are substantial and directionally aligned with deliberative change (r ≈ 0.45; p<0.02), but can exacerbate variance or polarization under certain conversational regimes.
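Directional and magnitude alignment can be illustrated with two simple statistics over paired attitude shifts, one pair per survey item. The sketch assumes that the reported r is a Pearson correlation, which the source does not state. The function names and the sample shifts are hypothetical, not benchmark data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def direction_agreement(xs, ys):
    """Share of items on which both shifts have the same sign."""
    return sum(1 for x, y in zip(xs, ys) if x * y > 0) / len(xs)


# Hypothetical per-item attitude shifts (illustration only).
llm_shift = [0.2, -0.1, 0.3]
deliberative_shift = [0.1, 0.2, 0.4]
r = pearson_r(llm_shift, deliberative_shift)
agreement = direction_agreement(llm_shift, deliberative_shift)
```

The correlation captures whether larger deliberative shifts go with larger LLM-induced shifts, while the agreement share captures direction only.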

The social-cognitive and ethical stakes are further heightened in cooperative, reputational, and group-dynamic contexts. LLMs, when used to assess or moderate social interactions, can either sustain or destabilize cooperation depending on their implicit “social norm” extraction—operationalized via a four-parameter judgment tensor (d_{G,C}, d_{G,D}, d_{B,C}, d_{B,D}) and evolutionary game-theory modeling (Pires et al., 30 Jun 2025). Prompting interventions (“motivation,” “signalling”) can successfully steer these norms to ones that restore robustness and prosocial outcomes, especially in settings of dispersed (“private”) reputation.
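The four-parameter judgment tensor can be sketched as a lookup from (recipient reputation, donor action) to the probability that the donor is judged good. That reading of d_{X,Y} follows the usual convention in indirect-reciprocity models and is an assumption here. The named norms are the standard textbook ones, and `estimate_norm` is a hypothetical helper for recovering a tensor from repeated LLM verdicts.

```python
from collections import defaultdict

GOOD, BAD = "G", "B"
COOPERATE, DEFECT = "C", "D"


def make_norm(d_gc, d_gd, d_bc, d_bd):
    """Build a norm from (d_{G,C}, d_{G,D}, d_{B,C}, d_{B,D})."""
    return {
        (GOOD, COOPERATE): d_gc,
        (GOOD, DEFECT): d_gd,
        (BAD, COOPERATE): d_bc,
        (BAD, DEFECT): d_bd,
    }


# Two classic norms from the indirect-reciprocity literature.
STERN_JUDGING = make_norm(1.0, 0.0, 0.0, 1.0)
IMAGE_SCORING = make_norm(1.0, 0.0, 1.0, 0.0)


def estimate_norm(verdicts):
    """Estimate the tensor from (recipient_rep, action, judged_good) triples,
    e.g. parsed from repeated LLM judgments of the same scenario."""
    good = defaultdict(int)
    seen = defaultdict(int)
    for recipient_rep, action, judged_good in verdicts:
        seen[(recipient_rep, action)] += 1
        good[(recipient_rep, action)] += int(judged_good)
    return {cell: good[cell] / seen[cell] for cell in seen}


# Hypothetical verdicts, for illustration only.
verdicts = [
    (GOOD, COOPERATE, True), (GOOD, COOPERATE, True),
    (GOOD, DEFECT, False),
    (BAD, COOPERATE, True), (BAD, COOPERATE, False),
    (BAD, DEFECT, True),
]
estimated = estimate_norm(verdicts)
```

An estimated tensor can then be compared with named norms such as Stern Judging, or fed into an evolutionary game-theory model to check whether cooperation remains stable under it.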

References