
Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization (1905.04100v1)

Published 19 Feb 2019 in cs.NE and cs.RO

Abstract: Reinforcement learning (RL) enables agents to take decision based on a reward function. However, in the process of learning, the choice of values for learning algorithm parameters can significantly impact the overall learning process. In this paper, we use a genetic algorithm (GA) to find the values of parameters used in Deep Deterministic Policy Gradient (DDPG) combined with Hindsight Experience Replay (HER), to help speed up the learning agent. We used this method on fetch-reach, slide, push, pick and place, and door opening in robotic manipulation tasks. Our experimental evaluation shows that our method leads to better performance, faster than the original algorithm.

Citations (87)

Summary

  • The paper demonstrates how applying a GA to tune six key hyperparameters for the DDPG+HER agent significantly reduces the epochs needed to reach a 0.85 success rate.
  • It employs uniform crossover and flip mutation techniques to effectively explore complex, non-linear hyperparameter interactions across various robotic manipulation tasks.
  • The approach automates hyperparameter tuning, offering a scalable method that accelerates convergence and can be adapted for other deep reinforcement learning algorithms.

This paper presents a method for accelerating the training of Deep Reinforcement Learning (DRL) agents, specifically those using the Deep Deterministic Policy Gradient (DDPG) algorithm combined with Hindsight Experience Replay (HER), by optimizing key hyperparameters using a Genetic Algorithm (GA). The core idea is that the default hyperparameters used in DRL algorithms like DDPG+HER are often not optimal for specific tasks, and tuning them can significantly improve learning speed and final performance. However, the relationship between these hyperparameters and performance is complex and non-linear, making simple optimization techniques like grid search or hill climbing ineffective or computationally expensive.

Problem Addressed

Training DRL agents, especially for complex robotic manipulation tasks with sparse rewards (where HER is beneficial), can be very time-consuming. The choice of hyperparameters significantly impacts convergence speed and the final success rate. Finding optimal hyperparameters manually or through exhaustive search is impractical.

Proposed Solution: GA for Hyperparameter Optimization

The authors propose using a Genetic Algorithm (GA) to automatically search for better hyperparameter values for the DDPG+HER algorithm. The GA explores the hyperparameter space to find combinations that minimize the number of training epochs required to reach a target success rate.

Hyperparameters Optimized:

The GA targets six key hyperparameters (a search-space sketch in code follows the list):

  1. Discount Factor (γ): Controls the importance of future rewards. Range [0, 1].
  2. Polyak-averaging coefficient (τ): Controls the update rate of the target networks. Range [0, 1].
  3. Critic Learning Rate (α_critic): Learning rate for the critic network optimizer. Range [0, 1].
  4. Actor Learning Rate (α_actor): Learning rate for the actor network optimizer. Range [0, 1].
  5. Random Action Probability (ε): Probability of taking a random action during exploration. Range [0, 1].
  6. Noise Standard Deviation (η): Standard deviation of the Gaussian noise added to actions for exploration (as a percentage of the maximum action value). Range [0, 1].
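
For concreteness, these six parameters and their ranges can be collected into a single search-space definition. The sketch below is illustrative only: the dictionary keys are placeholder names rather than identifiers from the authors' code, and each range is the [0, 1] interval stated above.

```python
# Illustrative search space for the six hyperparameters tuned by the GA.
# Key names are placeholders; each range is the [0, 1] interval from the paper.
SEARCH_SPACE = {
    "gamma":      (0.0, 1.0),  # discount factor
    "tau":        (0.0, 1.0),  # Polyak-averaging coefficient for target networks
    "lr_critic":  (0.0, 1.0),  # critic learning rate
    "lr_actor":   (0.0, 1.0),  # actor learning rate
    "epsilon":    (0.0, 1.0),  # random-action probability
    "noise_eta":  (0.0, 1.0),  # Gaussian action-noise std (fraction of max action)
}
```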

GA Implementation:

  • Encoding: Each hyperparameter is encoded as an 11-bit binary string (allowing for values up to three decimal places). The six binary strings are concatenated to form a 66-bit chromosome representing a full set of hyperparameters (see the encoding sketch after this list).
  • Population: A population of 30 chromosomes is used.
  • Fitness Function: The fitness of a chromosome (a set of hyperparameters) is defined as the inverse of the number of epochs required for the DDPG+HER agent, using those hyperparameters, to first achieve a success rate of 0.85 or higher on the given task. The inverse is used because the GA maximizes fitness, so maximizing fitness minimizes the number of epochs:
    fitness = 1 / E, where E is the number of epochs to first reach a success rate ≥ 0.85
  • Selection: Ranking selection is used, where chromosomes are selected probabilistically based on their fitness rank.
  • Crossover: Uniform crossover is applied to generate offspring (new hyperparameter sets).
  • Mutation: Flip mutation is applied with a probability of 0.1.
  • Generations: The GA runs for 30 generations.
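
To make the encoding and the genetic operators concrete, here is a minimal Python sketch. It assumes a per-bit interpretation of the 0.1 mutation probability and a simple integer-scaling decode to [0, 1]; the paper only specifies the 11-bit, three-decimal encoding, so these details are assumptions rather than the authors' exact implementation.

```python
import random

BITS_PER_PARAM = 11                        # from the paper: 11 bits per hyperparameter
NUM_PARAMS = 6                             # gamma, tau, lr_critic, lr_actor, epsilon, noise std
CHROMO_LEN = BITS_PER_PARAM * NUM_PARAMS   # 66-bit chromosome

def random_chromosome():
    """A chromosome is a flat list of 66 bits (6 params x 11 bits each)."""
    return [random.randint(0, 1) for _ in range(CHROMO_LEN)]

def decode(chromosome):
    """Map each 11-bit slice to a float in [0, 1].

    One plausible decoding (assumption): integer value / (2**11 - 1), rounded
    to three decimals; the paper only states the three-decimal precision.
    """
    values = []
    for i in range(NUM_PARAMS):
        bits = chromosome[i * BITS_PER_PARAM:(i + 1) * BITS_PER_PARAM]
        integer = int("".join(map(str, bits)), 2)
        values.append(round(integer / (2 ** BITS_PER_PARAM - 1), 3))
    return values

def uniform_crossover(parent_a, parent_b):
    """Uniform crossover: each bit is inherited from either parent with probability 0.5."""
    return [a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def flip_mutation(chromosome, p_mut=0.1):
    """Flip mutation, applied here per bit with the paper's 0.1 probability."""
    return [bit ^ 1 if random.random() < p_mut else bit for bit in chromosome]
```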

Algorithm Flow (Algorithm 1):

  1. Initialize a population of n chromosomes (hyperparameter sets).
  2. For each chromosome in the population:
    • Configure a DDPG+HER agent with the hyperparameters defined by the chromosome.
    • Train the agent on the target robotic task.
    • Record the number of epochs (E) it takes to first reach a success rate ≥ 0.85.
    • Calculate the fitness as 1/E.
  3. Select parent chromosomes based on fitness (ranking selection).
  4. Generate new chromosomes (offspring) using uniform crossover.
  5. Apply flip mutation to the offspring.
  6. Replace the old population with the new one.
  7. Repeat steps 2-6 for the specified number of generations (30).
  8. The best chromosome found across all generations represents the optimized set of hyperparameters (a minimal sketch of this loop follows).
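
The loop below is a minimal sketch of this flow, reusing the chromosome helpers from the encoding sketch above. `train_agent` is a hypothetical stand-in for a full DDPG+HER training run that returns the epoch count at which the agent first reaches a 0.85 success rate, and the rank-weighting scheme shown is one plausible form of ranking selection, not necessarily the authors' exact choice.

```python
import random

# Reuses random_chromosome, decode, uniform_crossover, and flip_mutation
# from the encoding sketch above.

def rank_select(scored):
    """Ranking selection: pick two parents with probability proportional to rank.
    `scored` is a list of (chromosome, fitness) pairs sorted best-first."""
    weights = list(range(len(scored), 0, -1))   # best individual gets the largest weight
    return random.choices([c for c, _ in scored], weights=weights, k=2)

def run_ga(train_agent, pop_size=30, generations=30):
    """Sketch of Algorithm 1: evolve DDPG+HER hyperparameters.
    `train_agent(params)` must return the number of epochs needed to first
    reach a success rate >= 0.85 (a hypothetical callback)."""
    population = [random_chromosome() for _ in range(pop_size)]
    best_chromosome, best_fitness = None, float("-inf")
    for _ in range(generations):
        # Fitness = 1 / epochs-to-target, so fewer epochs means higher fitness.
        scored = [(c, 1.0 / train_agent(decode(c))) for c in population]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        if scored[0][1] > best_fitness:
            best_chromosome, best_fitness = scored[0]
        # Breed the next generation: uniform crossover, then flip mutation.
        population = [flip_mutation(uniform_crossover(*rank_select(scored)))
                      for _ in range(pop_size)]
    return decode(best_chromosome)
```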

Experiments and Results

  • Environments: The method was tested on five simulated robotic manipulation tasks from OpenAI Gym: FetchReach-v1, FetchPush-v1, FetchSlide-v1, FetchPickAndPlace-v1, and DoorOpening.
  • Comparison: The performance (success rate vs. epochs) of DDPG+HER using GA-optimized hyperparameters was compared against the performance using the original, default hyperparameters provided in the OpenAI Baselines implementation.
  • Findings:
    • The non-linear impact of hyperparameters was demonstrated (Figure 1 shows varying performance for different τ values).
    • The GA consistently found hyperparameter sets that resulted in significantly faster learning compared to the baseline parameters across all tested environments (Figures 2, 3, and 5). The agent reached the target success rate (0.85) in fewer training epochs.
    • The final performance achieved with the optimized parameters was often comparable to, or slightly better than, the baseline, but was reached much more quickly.
    • Table 1 shows an example set of original vs. GA-optimized parameters for one task, highlighting significant differences found by the GA (e.g., τ changed from 0.95 to 0.184, ε from 0.3 to 0.055). Notably, the learning rates remained unchanged in this specific run.
  • Code: The authors provide a link to their implementation: https://github.com/aralab-unr/ReinforcementLearningWithGA

Practical Implications and Considerations

  • Automation: This approach automates the tedious and often intuition-driven process of hyperparameter tuning for DRL agents.
  • Efficiency: By finding parameters that lead to faster convergence, the GA optimization reduces the overall time and computational resources needed to train effective DRL agents.
  • Applicability: While demonstrated on DDPG+HER, the GA-based optimization approach is general and could potentially be applied to tune hyperparameters of other DRL algorithms.
  • Computational Cost: The primary drawback is the computational cost of the GA optimization itself. Each fitness evaluation requires training a DRL agent until it reaches the target success rate, which can take hours depending on the task and hardware. Running a GA with a population of 30 for 30 generations means potentially hundreds of DRL training runs. However, this cost might be acceptable if it significantly speeds up subsequent training or leads to better final policies. The evaluations within a generation can be parallelized (see the sketch after this list).
  • Task Specificity: The optimal hyperparameters found by the GA are likely specific to the task/environment they were optimized for. Optimization may need to be rerun for significantly different tasks.
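
Because each fitness evaluation is an independent training run, scoring one GA generation parallelizes cleanly. A minimal sketch, assuming the `decode` and `train_agent` helpers from the earlier sketches and a picklable top-level training function:

```python
from multiprocessing import Pool

def evaluate_generation(population, train_agent, workers=8):
    """Score one GA generation in parallel; each worker runs an independent
    DDPG+HER training until the 0.85 target success rate is first reached.
    `decode` and `train_agent` follow the earlier sketches (assumed names)."""
    params = [decode(c) for c in population]
    with Pool(processes=workers) as pool:
        epochs = pool.map(train_agent, params)   # epochs to reach the target
    return [1.0 / e for e in epochs]             # fitness = 1 / epochs
```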

In summary, the paper demonstrates that using a Genetic Algorithm is a viable and effective strategy for optimizing hyperparameters in DDPG+HER, leading to faster training convergence for robotic manipulation tasks compared to using default parameter values.
