This paper introduces a novel approach called LLM-driven Particle Swarm Optimization (PSO) to accelerate the hyperparameter tuning process for deep learning (DL) models (Hameed et al., 19 Apr 2025). The core problem addressed is the computationally expensive and often manual nature of finding optimal DL architectures, specifically parameters such as the number of layers and neurons/filters. Traditional methods like grid search are exhaustive, while standard metaheuristics like PSO can converge slowly or become stuck in local optima.
The proposed solution integrates LLMs, specifically ChatGPT-3.5 and Llama3, into the standard PSO algorithm. The key idea is to leverage the pattern recognition and generation capabilities of LLMs to guide the PSO search more effectively. Instead of relying solely on the PSO update rules (Equations 1-4), the LLM-driven PSO periodically queries an LLM with the current state of the particle swarm (positions, velocities, and associated costs). The LLM then suggests potentially better particle positions and velocities.
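The paper's Equations 1-4 are not reproduced here, but the standard PSO update they build on has a well-known velocity/position form. The sketch below uses illustrative coefficient values (`w`, `c1`, `c2`), not the values from the paper's Table I:

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One standard PSO update step.

    pos, vel, pbest: arrays of shape (n_particles, n_dims)
    gbest: array of shape (n_dims,)
    w, c1, c2 are illustrative defaults, not the paper's settings.
    """
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(pos.shape)  # stochastic pull toward personal best
    r2 = rng.random(pos.shape)  # stochastic pull toward global best
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    return pos, vel
```

In the LLM-driven variant, this update is periodically supplemented (not replaced) by LLM-suggested positions and velocities.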
Methodology: LLM-Driven PSO
The methodology combines two phases with a replacement-and-iteration loop:
- Standard PSO Phase: Initially, a standard PSO algorithm (Algorithm 1) runs for a small number of iterations to explore the search space. This phase establishes initial personal best (pbest) and global best (gbest) positions.
- LLM-Enhanced Phase: After the initial PSO iterations, the system queries the LLM (Algorithm 2). The prompt (shown in the paper's text box labeled "Format of our LLM Prompt") provides the LLM with the current particle information (neurons/layers, velocities, cost). The LLM is asked to generate a new set of particle positions (neurons/layers) and velocities intended to reduce the cost function further.
- Particle Replacement: The LLM's suggestions are evaluated. The worst-performing particles from the current PSO swarm are replaced with the best suggestions provided by the LLM.
- Iteration: The PSO process continues, potentially making further calls to the LLM if the global best does not improve significantly or until a maximum iteration count is reached.
This process aims to reduce the number of computationally expensive DL model evaluations (each particle's cost/fitness in standard PSO requires a full model training run) by substituting cheaper LLM calls, potentially guiding the search toward the global optimum faster.
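The steps above can be sketched as a single loop. This is a minimal illustration, not the paper's code: `query_llm` is a hypothetical stand-in for the prompt/parse round trip described below, and the warmup length, replacement count, and PSO coefficients are assumptions:

```python
import numpy as np

def llm_driven_pso(cost_fn, query_llm, pos, vel, n_iter=10, warmup=2, n_replace=2):
    """Two-phase loop: plain PSO for `warmup` iterations, then PSO with
    periodic LLM suggestions replacing the worst particles."""
    pos, vel = pos.copy(), vel.copy()
    costs = np.array([cost_fn(p) for p in pos])
    pbest, pbest_cost = pos.copy(), costs.copy()
    gbest = pos[np.argmin(costs)].copy()
    for it in range(n_iter):
        if it >= warmup:  # LLM-enhanced phase
            llm_pos, llm_vel = query_llm(pos, vel, costs)
            llm_costs = np.array([cost_fn(p) for p in llm_pos])
            # replace the worst swarm particles with the best LLM suggestions
            worst = np.argsort(costs)[-n_replace:]
            best_llm = np.argsort(llm_costs)[:n_replace]
            pos[worst] = llm_pos[best_llm]
            vel[worst] = llm_vel[best_llm]
            costs[worst] = llm_costs[best_llm]
        # standard PSO velocity/position update (illustrative coefficients)
        r1 = np.random.rand(*pos.shape)
        r2 = np.random.rand(*pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        costs = np.array([cost_fn(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, float(pbest_cost.min())
```

In the actual hyperparameter-tuning setting, `cost_fn` is a full DL training run, which is why cutting iterations translates directly into fewer model calls.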
Implementation Details
- PSO Parameters: The standard PSO parameters such as population size, inertia weight (w), and acceleration coefficients (c1, c2) are used (Table I). The particle dimensions represent the hyperparameters being tuned (e.g., number of layers, number of neurons/filters).
- LLM Interaction: A specific prompt format was designed to ensure the LLM returns suggestions in a parsable format. The prompt includes the current best particle configurations and their costs, asking the LLM for new configurations likely to yield lower costs.
```python
# Simplified Prompt Structure Example
my_prompt = f"""
Below is the string showing the best number of neurons as the first entry
and best number of layers as the second entry of the DL model for {Npop}
particles with their corresponding cost as the fifth entry...
The first entry (Neurons) ranges from 2 to 200, while the second entry
(Layers) ranges from 2 to 5.
{particle_prompt_string}
Give me exactly {Npop} more number of neurons and layers for the same model
in order to reduce the cost further. Your response must be exactly in the
same format as input and must contain only values. Your response must not
contain the cost values.
"""
```
- Hyperparameter Ranges: Specific ranges were defined for the hyperparameters being optimized (e.g., layers: [2, 5], neurons: [2, 200]).
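Since the prompt instructs the LLM to reply "only values" in the input format, the reply must be parsed back into particles and kept within the defined ranges. The paper specifies only the ranges; the parsing details below are assumptions, and `parse_llm_response` is a hypothetical helper:

```python
def parse_llm_response(text, n_particles, bounds=((2, 200), (2, 5))):
    """Parse 'neurons layers' pairs from an LLM reply and clamp them to the
    paper's ranges (neurons: [2, 200], layers: [2, 5]).

    The exact reply layout (one particle per line, whitespace- or
    comma-separated) is an assumption about the LLM's output format.
    """
    particles = []
    for line in text.strip().splitlines()[:n_particles]:
        values = [int(float(tok)) for tok in line.replace(",", " ").split()[:2]]
        # clamp each value into its valid hyperparameter range
        clamped = [max(lo, min(hi, v)) for v, (lo, hi) in zip(values, bounds)]
        particles.append(clamped)
    return particles
```

Clamping matters because LLMs occasionally suggest out-of-range values despite the prompt constraints.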
Experimental Evaluation
The LLM-driven PSO was evaluated in three scenarios:
- Rastrigin Function Optimization: A standard mathematical benchmark function (Equation 5) used to test optimization algorithms. LLM-driven PSO showed modest improvements in convergence speed (fewer iterations) compared to standard PSO, especially with Llama3 (Table II vs. Table IV, Figure 5 vs. Figure 7). Llama3 achieved reductions of 2.94% to 8.50% in iterations, while ChatGPT-3.5 showed improvements mainly for larger particle sizes (up to 4.25%).
- LSTM Hyperparameter Tuning for Regression: Optimizing the number of layers and neurons for an LSTM model predicting Air Quality Index (AQI) based on sensor data (Figure 4). The goal was to minimize Root Mean Squared Error (RMSE). Standard PSO (5 particles, 10 iterations) required 50 model calls.
- LLM-driven PSO with ChatGPT-3.5 achieved comparable RMSE with only 20 model calls (4 PSO iterations total), a 60% reduction (Table VI, Figure 8).
- LLM-driven PSO with Llama3 required 30-40 model calls (6-8 PSO iterations total), a 20%-40% reduction (Table VI, Figure 8).
- The final RMSE values were statistically similar across all methods (Figure 9).
- CNN Hyperparameter Tuning for Classification: Optimizing the number of layers and filters for a CNN classifying images as recyclable or organic materials (Figure 3). The goal was to maximize classification accuracy. Standard PSO (5 particles, 10 iterations) required 50 model calls.
- LLM-driven PSO with both ChatGPT-3.5 and Llama3 achieved comparable accuracy with only 20 model calls (4 PSO iterations total), a 60% reduction (Table VII, Figure 10).
- Final accuracy was statistically similar across methods (Figure 11).
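The Rastrigin benchmark from the first scenario has a standard closed form (assumed here to match the paper's Equation 5), which makes it a cheap test harness for any PSO variant before spending real model calls:

```python
import numpy as np

def rastrigin(x):
    """Standard Rastrigin function:
    f(x) = 10*n + sum(x_i^2 - 10*cos(2*pi*x_i)),
    with global minimum f(0) = 0 and many local optima that trap
    gradient-free search.
    """
    x = np.asarray(x, dtype=float)
    return 10 * x.size + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))
```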
Key Findings and Practical Implications
- Reduced Computational Cost: The primary benefit is a significant reduction (20%-60%) in the number of expensive DL model training runs (model calls) needed to find good hyperparameters, while maintaining comparable model performance (RMSE/accuracy).
- Faster Convergence: LLMs can effectively guide the PSO search, replacing poorly performing particles and accelerating convergence towards optimal hyperparameter configurations.
- Efficiency: Using a small number of particles (e.g., 5) combined with LLM guidance proved effective for DL hyperparameter tuning, minimizing the overhead of both PSO and DL model evaluations.
- LLM Choice: ChatGPT-3.5 generally required fewer iterations/calls than Llama3 in the DL tasks, although Llama3 showed slightly better performance on the Rastrigin function.
- Prompt Engineering: The effectiveness relies on well-structured prompts that elicit useful and correctly formatted suggestions from the LLM.
- Applicability: The method offers a practical way to speed up hyperparameter optimization, particularly valuable in resource-constrained environments or when model training is time-consuming. It can potentially be adapted for other metaheuristic algorithms like Genetic Algorithms.
The paper demonstrates a practical and efficient method for leveraging LLMs to enhance a well-established optimization technique (PSO) specifically for the common challenge of DL hyperparameter tuning. The reduction in model evaluations translates directly to savings in computation time and resources.