LLMs as Enhancers in ML Pipelines

Updated 11 September 2025

LLMs-as-Enhancers is a paradigm in which large language models augment traditional pipelines by encoding rich, context-aware representations through feature- and text-level enhancement.
They integrate with downstream architectures such as GNNs and multimodal systems, supporting tasks such as node classification and link prediction and improving resistance to adversarial attacks.
Empirical gains include improved clustering metrics, reduced adversarial vulnerability, and higher data efficiency, at the price of added computational cost and integration challenges.

LLMs as enhancers—often labeled “LLMs-as-Enhancers”—comprise a paradigm in which LLMs are employed not as direct predictors or generators but as mechanisms to augment, refine, or enrich representations, features, or knowledge within downstream machine learning, graph, vision, and information systems. Rather than replacing established models, LLMs are interposed before (or in conjunction with) classical systems such as GNNs, standard ML estimators, or multimodal reasoning pipelines, supplying richer, more semantically nuanced, or generatively expanded features. This approach leverages the extensive world knowledge and contextual understanding of LLMs to transcend the limitations of shallow embeddings, brittle input features, and static model architectures in a variety of domains.

1. Principal Enhancement Strategies

Two principal methodologies define LLMs-as-Enhancers:

Feature-Level Enhancement: LLMs (or embedding-exposing PLMs) encode raw, naturally occurring data (typically node-level text attributes, sentences, or domain-specific tokens) into dense, context-rich vector representations. These embeddings, denoted $h_i = f(s_i)$, where $f(\cdot)$ is an LLM or embedding model, become the basis for further processing in downstream networks such as GNNs. Feature-level enhancements can be realized via:
- Cascading (sequential) structures, where LLMs encode each node separately and feed embeddings into a message-passing framework: $h_i^{(l)} = \mathrm{UPD}^{(l)}\left(h_i^{(l-1)}, \mathrm{AGG}_{j \in \mathcal{N}(i)} \mathrm{MSG}^{(l)}\left(h_i^{(l-1)}, h_j^{(l-1)}\right)\right)$ (Chen et al., 2023).
- Iterative (co-training) structures, where LLMs and GNNs generate pseudo-labels for each other (e.g., GLEM-LM, GLEM-GNN); while more powerful, these have increased computational overhead, especially in low-label settings.
Text-Level Enhancement: For embedding-invisible or instruction-tuned LLMs, enhancement utilizes natural-language prompt engineering. LLMs generate semantically augmented text (explanations, entity lists, or paraphrased descriptions) that is then re-encoded by sentence embedding models or PLMs to create richer features. Techniques include:
- TAPE: Generating explanatory texts clarifying the node content–label relationship.
- KEA: Prompting LLMs for related knowledge entities or technical descriptions, which are concatenated or ensembled with original attributes.
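
The cascading structure described under feature-level enhancement can be illustrated with a minimal sketch. It is not the implementation from Chen et al.; `embed` is a hypothetical stand-in for $f(s_i)$ (a real pipeline would call an LLM or sentence-embedding model), and the choices of MSG (identity on the neighbor), AGG (mean), and UPD (average of self and aggregate) are assumptions made only to keep the example small.

```python
from typing import Dict, List

Vector = List[float]


def embed(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for f(s_i); a real pipeline would call an
    LLM or sentence-embedding model here."""
    vec = [0.0] * dim
    for position, char in enumerate(text.lower()):
        vec[position % dim] += ord(char) / 1000.0
    return vec


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (the AGG step)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_passing_layer(h: Dict[int, Vector],
                          neighbors: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One round of h_i <- UPD(h_i, AGG_j MSG(h_i, h_j)).

    Assumptions: MSG passes the neighbor embedding through unchanged,
    AGG is the mean, and UPD averages the node's own state with the
    aggregated message. Isolated nodes keep their embedding.
    """
    updated = {}
    for node, h_i in h.items():
        messages = [h[j] for j in neighbors.get(node, [])]
        if not messages:
            updated[node] = list(h_i)
            continue
        aggregated = mean(messages)
        updated[node] = [(a + b) / 2.0 for a, b in zip(h_i, aggregated)]
    return updated


# Toy text-attributed graph: node id -> text, node id -> neighbor ids.
texts = {0: "graph neural networks", 1: "language models", 2: "protein folding"}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

h0 = {node: embed(text) for node, text in texts.items()}  # h_i = f(s_i)
h1 = message_passing_layer(h0, neighbors)                 # one GNN-style layer
```

Because the encoder runs once per node before any message passing, the embedding step and the graph step stay decoupled, which is what distinguishes the cascading structure from the iterative one.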

In both regimes, ensembling (averaging or concatenating) original features with LLM-augmented ones further improves downstream performance, especially for classification and node differentiation tasks (Chen et al., 2023).
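
As a small illustration of the two ensembling options named above, the following sketch combines an original feature vector with an LLM-augmented one. The vectors and helper names are hypothetical; in practice both would come from an embedding model.

```python
from typing import List

Vector = List[float]


def concat_features(original: Vector, augmented: Vector) -> Vector:
    """Concatenation keeps both views; the two dimensions may differ."""
    return list(original) + list(augmented)


def average_features(original: Vector, augmented: Vector) -> Vector:
    """Averaging requires both views to share the same dimension."""
    if len(original) != len(augmented):
        raise ValueError("averaging needs equal-length feature vectors")
    return [(a + b) / 2.0 for a, b in zip(original, augmented)]


original = [0.2, 0.4, 0.6]    # illustrative embedding of the raw node text
augmented = [0.4, 0.0, 1.0]   # illustrative embedding of LLM-generated text

concatenated = concat_features(original, augmented)
averaged = average_features(original, augmented)
```

Concatenation doubles the input width of the downstream model, while averaging keeps it fixed but assumes both embeddings live in a comparable space.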

2. Architectural Integration and Downstream Workflows

LLMs-as-Enhancers can be systematically integrated into diverse downstream architectures:

Graph Machine Learning: In text-attributed graphs (TAGs), LLM-derived node representations are utilized as input to GNNs, which aggregate local neighbor information. Topological enhancements are also possible: LLMs assess semantic similarity (via prompt-based evaluation), guiding edge deletion/addition to denoise graph structure or infer new links. Additionally, LLM-generated pseudo-labels support label propagation regularization, refining the learning of edge weights (Sun et al., 2023).
Multimodal Systems: Vision-enhanced LLMs employ dedicated modules such as Modular Visual Memory (MVM) for storing image-derived knowledge, and soft Mixtures-of-Multimodal Experts (MoMEs) to coordinate the impact of visual and textual experts at the token level (Li et al., 2023). In audio-visual speech recognition, sparse mixtures-of-projectors (SMoP) use modality-specific routers to efficiently scale model capacity by selectively activating expert projectors without increasing inference cost (Cappellazzo et al., 20 May 2025).
CLF/ML Estimators: LLMs serve as auxiliary predictors whose outputs are linearly (or adaptively) combined with classical ML scores (e.g., from logistic regression), or as calibration sources to satisfy a multi-accuracy condition, i.e., $E[Y - f(X) \mid Z] = 0$, where $Z$ is the LLM output. LLM-labeled data can also augment the training set in transfer learning scenarios (Wu et al., 8 May 2024).
Bioinformatics and Specialized Domains: Adapted LLMs (through domain-specific retraining and specialized tokenization) can produce valid protein sequences, with outputs validated by metrics such as pLDDT, RMSD, and TM-score, demonstrating parity or superiority with established protein sequence models on limited-data regimes (Zeinalipour et al., 12 Aug 2024). In geospatial modeling, text-derived geocoordinate representations formed by LLMs (“LLMGeovec”) are directly concatenated to other spatio-temporal features, improving generalization and performance in tasks ranging from climate prediction to traffic forecasting (He et al., 22 Aug 2024).
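
To make the multi-accuracy condition from the estimator item above concrete, here is a minimal sketch, not the procedure of Wu et al. It assumes a fixed blending weight, a discretized LLM output $Z$ with two groups, and an in-sample additive correction; all data and function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def combine(ml_score: float, llm_score: float, weight: float = 0.5) -> float:
    """Linear blend of a classical score and an LLM score.
    The weight of 0.5 is an assumption, not a recommended value."""
    return (1.0 - weight) * ml_score + weight * llm_score


def residual_by_group(y: List[float], f: List[float],
                      z: List[str]) -> Dict[str, float]:
    """Empirical estimate of E[Y - f(X) | Z = z] for each discrete z."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y_i, f_i, z_i in zip(y, f, z):
        sums[z_i] += y_i - f_i
        counts[z_i] += 1
    return {group: sums[group] / counts[group] for group in sums}


def calibrate(f: List[float], z: List[str],
              residuals: Dict[str, float]) -> List[float]:
    """Shift each prediction by the mean residual of its Z-group."""
    return [f_i + residuals[z_i] for f_i, z_i in zip(f, z)]


y = [1, 0, 1, 1, 0, 0]
ml_scores = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
llm_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
z = ["pos" if s >= 0.5 else "neg" for s in llm_scores]  # discretized LLM output

f = [combine(m, l) for m, l in zip(ml_scores, llm_scores)]
before = residual_by_group(y, f, z)   # generally non-zero per group
f_cal = calibrate(f, z, before)
after = residual_by_group(y, f_cal, z)  # zero per group on this sample
```

The correction is fitted and checked on the same sample, so it only shows the mechanics; a real evaluation would estimate the group residuals on held-out data, and the shifted scores may need clipping if they must remain probabilities.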

3. Empirical Gains and Robustness

Across application domains, LLMs-as-Enhancers have yielded substantial empirical gains:

Representation Quality: Deep sentence embeddings produced by language models (Sentence-BERT, e5-large, DeBERTa-base, etc.) in sequential cascades consistently outperform shallow features (BoW, TF-IDF) on standard graph datasets (Cora, Pubmed, Ogbn-arxiv, etc.) (Chen et al., 2023, Guo et al., 16 Jul 2024). These representations are highly clusterable and robust to perturbations, as shown by lower Davies-Bouldin Index (DBI) values and clearer class separation in t-SNE visualizations.
Robustness to Adversarial Attacks: LLM-enhanced node features resist both structural (edge) and textual (attribute) adversarial attacks. Performance drop (GAP) from edge perturbation is considerably lower for LLM features (e.g., 3.9% for LLaMA) compared to TF-IDF (13.9%) in strong-attack regimes. Attack success rates (ASR) under textual manipulations are also significantly reduced when LLM features are used, especially after GNN aggregation (Guo et al., 16 Jul 2024).
Generalization and Data Efficiency: In reinforcement learning, using LLMs to abstract state features and provide reward labels enables highly data-efficient learning of language-conditioned policies that generalize to previously unseen tasks. Fine-tuned LLMs in the TEDUO pipeline (e.g., Llama-3-8B) achieve 65% and 55% success in training and novel settings, respectively, vastly outperforming standard RL and naive imitation learning baselines (Pouplin et al., 9 Dec 2024).
Domain-Transcending Enhancement: LLMGeovec, using only OpenStreetMap text and no expensive imagery/mobility data, achieves $R^2 > 0.90$ on several global indicators and can outperform or match supervised state-of-the-art geospatial models (He et al., 22 Aug 2024). In protein design, LLMs retrained on just 42K human proteins achieve pLDDT, RMSD, and TM-score at parity with models trained on millions of sequences (Zeinalipour et al., 12 Aug 2024).
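
The Davies-Bouldin Index mentioned under representation quality can be computed directly from embeddings and cluster labels. The sketch below uses the standard definition (mean over clusters of the worst-case ratio of within-cluster scatter to centroid distance, lower is better) on toy 2-D points; the data are illustrative and not taken from the cited studies.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Element-wise mean of the points in one cluster."""
    return [sum(column) / len(points) for column in zip(*points)]


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Davies-Bouldin Index with Euclidean distance.

    Assumes at least two clusters with distinct centroids."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    centers = {label: centroid(members) for label, members in clusters.items()}
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        label: sum(math.dist(p, centers[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i
        )
    return total / len(clusters)


labels = [0, 0, 1, 1]
tight = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
loose = [(0.0, 0.0), (0.0, 4.0), (5.0, 0.0), (5.0, 4.0)]

tight_dbi = davies_bouldin(tight, labels)  # compact, well-separated clusters
loose_dbi = davies_bouldin(loose, labels)  # spread-out, closer clusters
```

On these toy points the compact, well-separated configuration yields the lower index, which is the direction the text reports for LLM-derived features relative to shallow ones.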

4. Efficiency, Scalability, and Practical Considerations

LLMs-as-Enhancers expose several trade-offs:

Training and Inference Cost: Feature-level enhancement with pretrained embedding models is computationally efficient, as embeddings are static and used directly in downstream models. Iterative co-training or prompt-generation (e.g., running ChatGPT over large datasets in TAPE/KEA) introduces notable scaling costs. For iterative GLEM-style frameworks, the expense is acute in low-labeling scenarios (Chen et al., 2023).
Model Scaling: Simply enlarging an LLM (more parameters) does not guarantee improvements in embedding quality; careful selection of pretraining objectives and domain adaptation strategies is critical. Parameter-efficient enhancement techniques such as LLMBRACES, which modulates FFN sub-updates in transformer layers with a lightweight relevance module, outperform LoRA with up to 75% fewer parameters and further permit conditional output control (e.g., for sentiment) (Shen et al., 20 Mar 2025).
Plug-and-Play Integration: Enhancers can be architected as plug-and-play modules (e.g., the Attention-based Transmission (AT) module), allowing dynamic optimization of which LLM-derived features are transmitted to the GNN, yielding consistent accuracy improvements of ~1–3% across diverse datasets and backbones (Gao et al., 13 May 2025).
Deployment Constraints: LLM-enhanced models may require greater hardware resources (e.g., in biomedical NER, LLMs achieved 2–8% higher F1, but inference times were 1–2 orders of magnitude slower than encoder models). For real-time or resource-constrained systems, encoder-based models may still hold practical advantages (Obeidat et al., 1 Apr 2025).
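
The cost advantage of static embeddings noted under training and inference cost comes from computing each embedding once and reusing it across epochs. A minimal sketch of that idea follows; `EmbeddingCache` and `toy_embed` are hypothetical names, and `toy_embed` merely stands in for an expensive model call.

```python
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoizes an embedding function so each distinct text is encoded once."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}
        self.model_calls = 0  # counts how often the underlying model ran

    def get(self, text: str) -> Vector:
        if text not in self._store:
            self._store[text] = self._embed_fn(text)
            self.model_calls += 1
        return self._store[text]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an expensive embedding-model call."""
    return [float(len(text)), float(text.count(" ") + 1)]


cache = EmbeddingCache(toy_embed)
corpus = ["graph learning", "protein design", "graph learning"]

# Three "epochs" over the corpus reuse the cached vectors.
for epoch in range(3):
    features = [cache.get(text) for text in corpus]
```

Prompt-generation approaches cannot be cached this cheaply at first pass, since every node requires a generation call before any embedding is available, which is where the scaling costs described above arise.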

5. Domain-Specific and Cognitive Extensions

LLMs-as-Enhancers have been adapted to specialized and cognitive modeling domains:

Topology Refinement in Graphs: Beyond feature enhancement, LLMs can be used to optimize the graph structure itself, removing unreliable edges and adding semantically relevant ones using prompt-based similarity evaluation, and guiding topology learning via pseudo-label propagation (Sun et al., 2023).
Multimodal Knowledge Storage: In vision-enhanced LLMs, dedicated memory modules (e.g., Modular Visual Memory) and dynamic expert mixing facilitate the internalization and adaptive utilization of visual knowledge in reasoning tasks, boosting performance on VQA and commonsense benchmarks by up to 16% (Li et al., 2023).
Social Simulation and Behavioral Modeling: Mechanisms such as Social Information Processing-based Chain-of-Thought (SIP-CoT), enhanced with emotion-guided memory, enable LLM agents to better simulate human social dynamics, reducing bias/divergence against human stance and emotion distributions ($\Delta_{\text{bias}}$ reduced from 0.108 to 0.0521) and improving alignment F1 scores to 0.745–0.888 (Zhang et al., 8 Jul 2025).
Analytical Reasoning: LLMs augmented with structured memory modules (Dynamic Evidence Trees) and evidence condensation pipelines enable more effective organizing and narrative generation over multi-document intelligence analysis tasks, though current models remain less imaginative than expert analysts (Yousuf et al., 25 Nov 2024).
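
The topology-refinement idea listed at the start of this section (dropping unreliable edges and adding semantically relevant ones) can be sketched as follows. This is an illustration only: the source describes prompt-based similarity evaluation by an LLM, whereas the sketch substitutes cosine similarity over embeddings as a stand-in score, and both thresholds are arbitrary assumptions.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def refine_edges(embeddings: Dict[int, List[float]],
                 edges: Iterable[Edge],
                 drop_below: float = 0.2,
                 add_above: float = 0.9) -> Set[Edge]:
    """Return an undirected edge set after similarity-based refinement.

    Existing edges scoring below `drop_below` are removed; non-edges
    scoring at least `add_above` are added. Thresholds are assumptions.
    """
    existing = {tuple(sorted(edge)) for edge in edges}
    refined: Set[Edge] = set()
    for u, v in combinations(sorted(embeddings), 2):
        similarity = cosine(embeddings[u], embeddings[v])
        if (u, v) in existing:
            if similarity >= drop_below:
                refined.add((u, v))   # keep a plausible existing edge
        elif similarity >= add_above:
            refined.add((u, v))       # add a strongly supported new edge
    return refined


embeddings = {
    0: [1.0, 0.0],
    1: [0.9, 0.1],
    2: [0.0, 1.0],
    3: [0.1, 0.9],
}
edges = [(0, 2), (2, 3)]

refined = refine_edges(embeddings, edges)
```

Scoring every node pair is quadratic in the number of nodes, so a practical variant would restrict candidate additions, for example to nodes within a few hops of each other.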

6. Theoretical and Mechanistic Insights

Recent analysis of context-enhanced learning, a gradient-based equivalent of in-context learning, established that providing extra, privileged information (e.g., curriculum text) in the training context, without computing loss on those tokens, improves the accuracy of gradient signals. This leads to an exponential reduction in the sample complexity required for learning tasks such as multi-step translation ($\text{SQ-dim} \geq n^{\Omega(d)}$ in the standard setting vs. $O(\mathrm{poly}(n)\, d \log d)$ with context enhancement) (Zhu et al., 3 Mar 2025). Furthermore, experiments show that such privileged context is not verifiably “memorized” or leaked, which has implications for data security and copyright.

7. Future Research Directions and Open Challenges

The LLMs-as-Enhancers paradigm motivates several directions:

Hybrid and Adaptive Pipeline Selection: Exploring hybrid workflows that dynamically choose between cascading and iterative enhancement schemes based on resource and labeling regimes (Chen et al., 2023).
Optimization of Feature Transmission: Investigation of more sophisticated, adaptive, or attention-based transmission modules to maximize the causal alignment and informativeness of LLM-derived features, with measurable impact on accuracy and interpretability (Gao et al., 13 May 2025).
Extension to New Modalities and Tasks: Scaling enhancement strategies beyond node classification to link prediction, graph clustering, recommendation, protein property inference, AVSR, and multimodal reasoning applications.
Prompt and Memory Engineering: Further sophistication in prompt design as well as architectural memory augmentation (external memory modules, emotion-guided memory, etc.) for greater reasoning depth, control, and interpretability.
Empirical–Theoretical Unification: Continued interplay of empirical ablation, theoretical mechanistic modeling (interchange interventions, context dropout), and causal reasoning will be essential for uncovering the underlying principles governing enhancement effectiveness.

In sum, LLMs-as-Enhancers represent a structurally flexible, empirically validated, and theoretically sound strategy that lets LLMs infuse existing pipelines across a range of machine learning disciplines with richer representations, robust generalization, and improved performance. The paradigm also opens new avenues in the design and understanding of composite, interpretable, and highly adaptive intelligent systems.