
LLM-Driven NAS Pipeline

Updated 14 December 2025
  • LLM-driven NAS pipelines are automated systems that use LLM agents and retrieval augmentation to explore large architecture spaces and optimize neural network design.
  • They integrate multi-agent orchestration, prompt engineering, and closed-loop performance feedback to meet complex design constraints.
  • Empirical results show significant accuracy improvements and resource efficiency compared to traditional neural architecture search methods.

LLM-driven Neural Architecture Search (NAS) pipelines are modern frameworks that leverage LLMs as core agents for end-to-end automation in neural architecture design. These pipelines integrate generative, retrieval-augmented, and agentic components to efficiently traverse large architectural search spaces, process complex design constraints, and provide semantically guided model discovery and optimization under multi-objective resource and application requirements (Hu et al., 25 Nov 2025).

1. Key Principles and Framework Structure

LLM-driven NAS replaces traditional search and optimization techniques with agentic LLM orchestration and retrieval-augmented reasoning. The canonical workflow, exemplified by LLM-NND for transient stability assessment (TSA), consists of:

  • Natural language scenario input transformed to simulator API calls via prompt engineering modules, leveraging role-based, chain-of-thought (CoT), syntax-optimization, and factuality prompts.
  • Retrieval-Augmented Generation (RAG): Domain-specific documentation (manuals, API references, code examples) is dynamically fetched from a vector database to steer LLM response grounding.
  • Multi-agent NAS loop: Three specialized LLM agents—Navigator ("strategy"), Generator ("candidate synthesis"), Operator ("evaluation and feedback")—coordinate iterative model proposal, legality checking, training, performance assessment, and feedback compilation.
  • Closed-loop optimization: Iterative feedback is provided via natural language encoding of quantitative metrics (accuracy, parameter count, inference latency), enabling focused exploration and refinement.
  • Generalized objective:

A^* = \arg\max_{A \in \mathcal{A}} \left\{ P(A) - \lambda_C\, C(A) - \lambda_L\, L(A) \right\}

where $P(A)$, $C(A)$, and $L(A)$ denote accuracy, parameter count, and latency, respectively, subject to domain-specific weightings $\lambda_C, \lambda_L$ (Hu et al., 25 Nov 2025).
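
A minimal sketch of this closed loop, assuming hypothetical navigator, generator, and operator callables that stand in for the paper's three LLM agents (the lambda values are illustrative, not taken from the paper):

from dataclasses import dataclass

@dataclass
class Metrics:
    accuracy: float     # P(A): validation accuracy
    params_m: float     # C(A): parameter count, millions
    latency_ms: float   # L(A): inference latency per sample

def objective(m: Metrics, lam_c: float = 0.01, lam_l: float = 0.1) -> float:
    # Composite objective P(A) - lambda_C * C(A) - lambda_L * L(A)
    return m.accuracy - lam_c * m.params_m - lam_l * m.latency_ms

def nas_loop(navigator, generator, operator, budget: int = 20, target: float = 0.92):
    # Propose -> train/evaluate -> feed back, until the accuracy target
    # is met or the iteration budget is exhausted.
    best_arch, best_score, feedback = None, float("-inf"), ""
    for _ in range(budget):
        strategy = navigator(feedback)      # natural-language search strategy
        arch = generator(strategy)          # candidate architecture (e.g., JSON spec)
        metrics, feedback = operator(arch)  # train, measure, summarize in prose
        score = objective(metrics)
        if score > best_score:
            best_arch, best_score = arch, score
        if metrics.accuracy >= target:
            break
    return best_arch, best_score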

2. Search Space Parameterization and Prompt Engineering

LLM-driven NAS frameworks specify discrete modular search spaces encompassing multiple layer and operation types. Exemplary parameterization includes:

Element        | Options / Ranges
Layer types    | Dense, 1D Conv, LSTM/GRU, Transformer, Graph Conv
Operations     | BatchNorm, LayerNorm, Dropout, Attention, Pooling
Activations    | ReLU, LeakyReLU, GELU, Tanh, Sigmoid
Kernel sizes   | {3, 5, 7}
Channel widths | [16, ..., 512]
Depths         | [2, ..., 10]
Output heads   | Multi-class Softmax, regression

Prompt construction encodes the target metric constraints, domain role, and search strategy. For example:

{
  "model": {
    "type": "Conv1D",
    "layers": [
      {"filters": 64, "kernel": 5, "activation": "ReLU"},
      ...
    ],
    "head": {"fc": 128, "dropout": 0.3}
  },
  "training": {
    "loss": "CrossEntropy",
    "optimizer": "Adam", "lr": 1e-3,
    "scheduler": "Cosine",
    "epochs": 50,
    "batch_size": 64
  }
}
Performance targets and resource constraints (e.g., "Max latency <10 ms", "Params <10M") are incorporated into prompt templates to steer strategy-driven LLM responses (Hu et al., 25 Nov 2025).
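
A hedged sketch of such a prompt template in Python; the wording, role text, and the build_search_prompt helper are illustrative stand-ins, not the paper's exact prompts:

def build_search_prompt(role, constraints, strategy, retrieved_docs):
    # Assemble a role-based, constraint-aware prompt for the Generator LLM,
    # grounding it with RAG snippets fetched from the vector database.
    constraint_lines = "\n".join(f"- {k}: {v}" for k, v in constraints.items())
    context = "\n---\n".join(retrieved_docs)
    return (
        f"You are {role}.\n"
        f"Design constraints:\n{constraint_lines}\n"
        f"Search strategy: {strategy}\n"
        f"Reference material:\n{context}\n"
        "Think step by step, then output the architecture as JSON "
        "matching the schema above."
    )

prompt = build_search_prompt(
    role="an expert in neural architecture design for time-series classification",
    constraints={"Max latency": "<10 ms", "Params": "<10M", "Target ValAcc": ">=92%"},
    strategy="favor shallow Conv1D stacks with attention pooling",
    retrieved_docs=["<simulator API excerpt>", "<prior case study>"],
)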

3. Performance-Guided Feedback, Evaluation, and Closed-Loop Optimization

Critical to LLM-driven NAS is the synthesis and encoding of performance metrics within feedback prompts. For each candidate architecture $A_t$, the Operator LLM calculates:

  • Accuracy $P_t$ on a hold-out validation set.
  • Parameter count $C_t$.
  • Latency $L_t$ (e.g., ms/sample).

Feedback is crafted as:

"ValAcc=91.2% (< target 92%), params=5.6M, latency=1.1 ms. Observed slight overfitting (val_loss↑ after epoch 40). Recommend: add Dropout(0.3) after conv layers, switch to Focal Loss."

These natural language recommendations are ingested by the Navigator LLM to generate subsequent strategies targeting areas such as model regularization or architectural simplification. Convergence is achieved when the objective function surpasses the target threshold or the iteration budget $T$ is exhausted (Hu et al., 25 Nov 2025).
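
As an illustration, the metric-to-prose encoding can be as simple as a templated formatter; the encode_feedback helper below is a hypothetical sketch that mirrors the example message above:

def encode_feedback(val_acc, target, params_m, latency_ms, notes, advice):
    # Render quantitative metrics as the natural-language feedback string
    # the Navigator LLM consumes in the next iteration.
    status = "meets target" if val_acc >= target else f"< target {target:.0%}"
    return (f"ValAcc={val_acc:.1%} ({status}), params={params_m}M, "
            f"latency={latency_ms} ms. {notes} Recommend: {advice}")

msg = encode_feedback(0.912, 0.92, 5.6, 1.1,
                      "Observed slight overfitting (val_loss up after epoch 40).",
                      "add Dropout(0.3) after conv layers, switch to Focal Loss.")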

4. Empirical Validation, Ablations, and Comparative Analysis

LLM-driven NAS delivers substantial improvements in accuracy-resource trade-offs. In the TSA domain:

Model    | Parameters | Accuracy | Inference time (ms/sample)
LLM-NND  | 4.78M      | 93.71%   | 0.95
DenseNet | 25.9M      | 80.05%   | 3.2

Ablation studies demonstrate that removing domain-grounded retrieval reduces accuracy by 5.6 pp, removing CoT reasoning by 3.5 pp, and removing the feedback loop by 4.0 pp, underscoring the necessity of integrated retrieval, reasoning, and iterative feedback: each component contributes roughly 3.5–5.6 pp of net accuracy improvement (Hu et al., 25 Nov 2025).

5. Generalization, Domain Extension, and Pipeline Adaptivity

LLM-NAS workflows generalize to diverse domains by:

  • Building vector databases from domain manuals and case studies for retrieval augmentation (see the retrieval sketch below).
  • Modularizing LLM agents for strategy, candidate generation, and evaluation to enhance stability and coverage.
  • Systematic prompt taxonomy leveraging chain-of-thought, factuality, and syntax optimization.
  • Extending the workflow to tasks such as optimal power flow, fault location, market operations, and—by simulator substitution—other fields (e.g., chemistry, robotics, multi-modal RL).

This suggests the LLM-NAS pipeline is highly extensible across structured simulation–to–model workflows where domain corpora and simulation APIs are available (Hu et al., 25 Nov 2025).
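
A minimal retrieval-augmentation sketch using FAISS and sentence-transformers; the encoder model and the two-snippet corpus are illustrative stand-ins for the paper's domain manuals and case studies:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

corpus = [
    "Simulator API: run_simulation(case, fault_bus, clearing_time) -> trajectories",
    "Case study: shallow Conv1D stacks outperform GRUs on short transient windows",
]
embeddings = encoder.encode(corpus, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product on unit vectors = cosine
index.add(np.asarray(embeddings, dtype="float32"))

def retrieve(query, k=2):
    # Fetch the k most relevant domain snippets to ground the LLM prompt.
    q = encoder.encode([query], normalize_embeddings=True)
    _, idx = index.search(np.asarray(q, dtype="float32"), k)
    return [corpus[i] for i in idx[0]]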

6. Implementation Protocols and Scalability

The LLM-NND pipeline is demonstrated with 9,139 labeled samples, four-class output, and time-series input features derived from power system simulations. Models are trained via Adam optimizer, cosine annealing, CrossEntropy/FocalLoss, early stopping on validation loss, and batch size of 64. The approach maintains real-time latency (<0.95 ms/sample) on an NVIDIA T4 (Hu et al., 25 Nov 2025).
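
A hedged PyTorch sketch of this training recipe (Adam, cosine annealing, cross-entropy, early stopping, batch size 64); the model and data loaders are assumed to exist, and the patience value is an assumption:

import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=50, patience=5, device="cuda"):
    # Adam + cosine annealing with CrossEntropy loss and early stopping on
    # validation loss, matching the reported protocol.
    model.to(device)
    criterion = nn.CrossEntropyLoss()  # swap in a focal loss where the feedback advises it
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    best_val, stale = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / len(val_loader)
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping on validation loss
    return model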

Scalability is achieved via:

  • Performance-driven, multi-agent closed loops.
  • Smart sampling (k candidates per iteration).
  • Modular feedback and memory to coordinate improvement.
  • Retrieval augmentation for semantic coverage.

7. Outlook and Limitations

LLM-driven NAS pipelines automate neural architecture search with strong gains in accuracy and resource efficiency over traditional NAS baselines. However, their efficacy depends on the availability of rich domain corpora, well-designed prompt engineering, and the internal proficiency of the deployed LLM agents. Extensions to other domains require careful adaptation of the retrieval corpus, scenario language, and performance criteria. The method is especially suited to scientific computing, cyber-physical systems, and resource-constrained edge deployment (Hu et al., 25 Nov 2025).


References:

  • "LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design" (Hu et al., 25 Nov 2025)