
Discovering Quality-Diversity Algorithms via Meta-Black-Box Optimization (2502.02190v1)

Published 4 Feb 2025 in cs.NE and cs.LG

Abstract: Quality-Diversity has emerged as a powerful family of evolutionary algorithms that generate diverse populations of high-performing solutions by implementing local competition principles inspired by biological evolution. While these algorithms successfully foster diversity and innovation, their specific mechanisms rely on heuristics, such as grid-based competition in MAP-Elites or nearest-neighbor competition in unstructured archives. In this work, we propose a fundamentally different approach: using meta-learning to automatically discover novel Quality-Diversity algorithms. By parameterizing the competition rules using attention-based neural architectures, we evolve new algorithms that capture complex relationships between individuals in the descriptor space. Our discovered algorithms demonstrate competitive or superior performance compared to established Quality-Diversity baselines while exhibiting strong generalization to higher dimensions, larger populations, and out-of-distribution domains like robot control. Notably, even when optimized solely for fitness, these algorithms naturally maintain diverse populations, suggesting meta-learning rediscovers that diversity is fundamental to effective optimization.

Summary

  • The paper introduces Learned Quality-Diversity (LQD), a meta-learned framework for discovering novel and effective Quality-Diversity algorithms by learning competition rules.
  • LQD learns these competition rules from data using meta-black-box optimization on diverse tasks, parameterized by a transformer neural network.
  • Learned LQD algorithms generalize well, match or exceed baseline performance on BBOB and robot tasks, and inherently maintain diversity even when optimized for fitness.

The paper introduces Learned Quality-Diversity (LQD), a meta-learned approach to discovering Quality-Diversity (QD) algorithms. Instead of relying on hand-designed heuristics like grid-based competition in MAP-Elites (ME) or nearest-neighbor competition in Dominated Novelty Search (DNS), the authors propose to learn the competition rules directly from data using meta-black-box optimization. The core idea is to parameterize the competition function of a genetic algorithm using an attention-based neural architecture and then evolve the parameters of this architecture to discover novel and effective QD algorithms.

The authors frame QD algorithms as genetic algorithms where global competition is replaced by local competition. They define a flexible framework where the competition function, which determines how individuals compete for survival, is parameterized by a transformer neural network with parameters θ. This learned competition function takes as input the fitness values and descriptor vectors of the population and outputs a competition fitness value for each individual. The transformer architecture is chosen for its permutation equivariance properties, ensuring that the competition rules are invariant to the ordering of individuals in the population.
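The shape of such a learned competition function can be sketched as a single attention layer over the population. This is a minimal illustrative stand-in, not the paper's architecture: the projection matrices `W_q`, `W_k`, `W_v` are hypothetical placeholders for the learned parameters θ, and the readout is a simple sum. The point is the permutation-equivariance property: because individuals interact only through attention, permuting the population permutes the outputs identically.

```python
import numpy as np

def competition_fitness(fitness, descriptors, W_q, W_k, W_v):
    """Minimal sketch of an attention-based competition function.

    fitness:       (n,)    raw fitness values
    descriptors:   (n, k)  behaviour descriptors
    W_q, W_k, W_v: (k+1, h) hypothetical learned projections (stand-ins
                   for the transformer parameters theta)

    Returns an (n,) vector of competition fitness values, one per
    individual, computed from pairwise attention over the population.
    """
    # Each individual's token is its fitness concatenated with its descriptor.
    x = np.concatenate([fitness[:, None], descriptors], axis=1)  # (n, k+1)
    q, k_, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k_.T / np.sqrt(q.shape[1])
    # Row-wise softmax over the population.
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    h = attn @ v                  # (n, h) per-individual summary
    return h.sum(axis=1)          # (n,) competition fitness
```

Reordering the population reorders the outputs in lockstep, which is exactly the invariance the paper cites as the reason for choosing attention.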

To discover effective LQD algorithms, the authors employ a meta-learning procedure. This involves training the LQD parameters on a diverse set of 22 Black-Box Optimization Benchmarking (BBOB) functions, with varying dimensionality, noise models, and random rotations of the search space. Each BBOB task is associated with a descriptor space through random projection. The LQD parameters are optimized using Sep-CMA-ES for 16,384 meta-generations.
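The outer loop has the familiar evolution-strategy shape: sample candidate parameter vectors θ around a mean, score each by running the inner genetic algorithm with that competition function, and move the mean toward the better samples. The sketch below uses a simplified rank-weighted separable ES as a stand-in for Sep-CMA-ES (no covariance adaptation), and `meta_objective` is a hypothetical callback standing in for a full inner-loop rollout over BBOB tasks.

```python
import numpy as np

def meta_evolve(meta_objective, dim, pop=8, gens=50, sigma=0.1, seed=0):
    """Simplified separable-ES stand-in for the Sep-CMA-ES outer loop.

    meta_objective(theta) -> score of the LQD parameterised by theta,
    e.g. final fitness / novelty / QD score after running the inner
    genetic algorithm on a batch of BBOB tasks.  The paper runs
    Sep-CMA-ES for 16,384 meta-generations; only the
    sample -> evaluate -> update structure is illustrated here.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    for _ in range(gens):
        eps = rng.normal(size=(pop, dim))       # perturbation directions
        thetas = mean + sigma * eps             # candidate parameter vectors
        scores = np.array([meta_objective(t) for t in thetas])
        # Rank-based weighted recombination toward the better samples.
        order = np.argsort(-scores)
        weights = np.log(pop + 0.5) - np.log(np.arange(1, pop + 1))
        weights /= weights.sum()
        mean = mean + sigma * weights @ eps[order]
    return mean
```

In the actual method the per-dimension step sizes are also adapted (that is the "Sep" in Sep-CMA-ES); here sigma is fixed for brevity.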

The authors explore three distinct meta-objectives, leading to specialized LQD variants:

  • LQD (F), optimized for fitness
  • LQD (N), optimized for novelty
  • LQD (F+N), optimized for a QD score that balances fitness and diversity

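The three meta-objectives above can be illustrated with a small sketch. The exact definitions (normalization, archive handling, how novelty is aggregated) follow the paper; this only shows the three quantities being traded off, with novelty taken as the mean distance to the k nearest neighbors in descriptor space and the combined score shown as an illustrative sum.

```python
import numpy as np

def meta_objectives(fitness, descriptors, k=3):
    """Illustrative versions of the three LQD meta-objectives.

    fitness:     (n,)   final population fitness
    descriptors: (n, d) final population descriptors
    Returns (F, N, F+N): best fitness, mean k-NN novelty, and an
    illustrative combined QD-style score.
    """
    # F: best fitness in the final population.
    f_score = fitness.max()
    # N: mean distance to the k nearest neighbours in descriptor space.
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)        # exclude self-distance
    knn = np.sort(dist, axis=1)[:, :k]
    n_score = knn.mean()
    # F+N: a QD-style score combining both (illustrative sum).
    return f_score, n_score, f_score + n_score
```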
Experiments are conducted to evaluate the performance of the discovered LQD algorithms against established baselines, including ME, DNS, Genetic Algorithm (GA), and Novelty Search (NS). The results demonstrate that LQD can match or exceed the performance of these baselines on both meta-training tasks and out-of-distribution BBOB functions.

One key finding is that LQD variants trained purely for fitness optimization naturally maintain significant population diversity. Even though LQD (F) is trained to maximize fitness, it achieves substantially higher novelty scores than a standard GA, suggesting that meta-optimization has rediscovered the principle that maintaining a diverse population creates stepping stones for discovering high-performing solutions.

The authors visualize the competition fitness values assigned by each LQD variant across the descriptor space to understand the learned local competition strategies. LQD (N) develops a distance-based competition mechanism that rewards solutions for being far from existing ones. LQD (F) learns a more nuanced strategy that creates fitness-sensitive reward patterns around promising solutions. LQD (F+N) combines these approaches, balancing novelty seeking with fitness sensitivity.

To assess the importance of descriptors, the authors conduct an ablation study comparing LQD with task-specific descriptors, LQD with random descriptors, and a standard GA without descriptors. The results show that LQD with random descriptors performs similarly to GA, while LQD with task-specific descriptors significantly outperforms both alternatives. This indicates that LQD effectively leverages the structure encoded in the descriptor space to identify promising solutions.
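For the BBOB meta-training tasks, descriptors come from a random projection of the search space (as noted earlier); the ablation swaps this for task-specific descriptors or drops descriptors entirely (plain GA). A minimal sketch of the random-projection descriptor, assuming a fixed Gaussian projection matrix:

```python
import numpy as np

def random_projection_descriptors(population, k=2, seed=0):
    """Sketch of a random-projection descriptor for a BBOB task.

    population: (n, d) candidate solutions in the search space
    Returns an (n, k) descriptor obtained by projecting each solution
    through a fixed random matrix (fixed per task via the seed).
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(population.shape[1], k)) / np.sqrt(k)
    return population @ R
```

Because the projection is fixed per task, nearby solutions get nearby descriptors, giving the competition function genuine structure to exploit; a fresh random descriptor per individual would carry no such structure, which is consistent with random descriptors collapsing to GA-level performance.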

The generalization capabilities of LQD are demonstrated on a suite of challenging robot control tasks, including Hopper, Walker2d, Half Cheetah, and Ant. LQD was never exposed to robotic control problems, high-dimensional search spaces, or domain-specific descriptors during training. Despite this, LQD matches DNS's leading performance on the Hopper task, outperforms all baselines on Walker2d, Half Cheetah, and Ant with feet contact descriptors, and equals GA's top performance on Ant with velocity-based descriptors.

The authors conclude that meta-learning can discover sophisticated QD algorithms that outperform traditional hand-designed approaches. The LQD framework, which parameterizes competition rules using attention-based architectures, captures complex relationships between solutions and maintains the benefits of local competition. The discovered algorithms exhibit superior performance, robust generalization, and emergent diversity, suggesting that meta-learning rediscovers diversity as an instrumental goal for achieving peak performance.