Evolution Transformer: In-Context Evolutionary Optimization (2403.02985v1)

Published 5 Mar 2024 in cs.AI and cs.NE

Abstract: Evolutionary optimization algorithms are often derived from loose biological analogies and struggle to leverage information obtained during the sequential course of optimization. An alternative promising approach is to leverage data and directly discover powerful optimization principles via meta-optimization. In this work, we follow such a paradigm and introduce Evolution Transformer, a causal Transformer architecture, which can flexibly characterize a family of Evolution Strategies. Given a trajectory of evaluations and search distribution statistics, Evolution Transformer outputs a performance-improving update to the search distribution. The architecture imposes a set of suitable inductive biases, i.e. the invariance of the distribution update to the order of population members within a generation and equivariance to the order of the search dimensions. We train the model weights using Evolutionary Algorithm Distillation, a technique for supervised optimization of sequence models using teacher algorithm trajectories. The resulting model exhibits strong in-context optimization performance and shows strong generalization capabilities to otherwise challenging neuroevolution tasks. We analyze the resulting properties of the Evolution Transformer and propose a technique to fully self-referentially train the Evolution Transformer, starting from a random initialization and bootstrapping its own learning progress. We provide an open source implementation under https://github.com/RobertTLange/evosax.

Authors (3)
  1. Robert Tjarko Lange (21 papers)
  2. Yingtao Tian (32 papers)
  3. Yujin Tang (31 papers)
Citations (7)

Summary

Evolution Transformer: In-Context Evolutionary Optimization

The paper "Evolution Transformer: In-Context Evolutionary Optimization" presents a novel approach to evolutionary optimization using a Transformer-based architecture called the Evolution Transformer. The authors propose leveraging the Transformer model to represent Evolution Strategies (ES) more flexibly and effectively. Traditional evolutionary optimization methods often draw loose analogies to biological evolution, which makes them potentially suboptimal for leveraging sequential information. This work shifts the paradigm by employing meta-optimization using the Evolution Transformer, which applies causal attention mechanisms to improve in-context learning and optimization capabilities.

Key Contributions

  1. Architecture Design: The Evolution Transformer is a causal Transformer framework designed specifically for Evolution Strategies. It builds in suitable inductive biases, namely population-order invariance and dimension-order equivariance: the model's update to the search distribution is invariant to the order of solutions within a generation and equivariant with respect to the order of the search dimensions (a toy sketch of these symmetries follows this list).
  2. Evolutionary Algorithm Distillation (EAD): The model is trained by distilling the behavior of existing evolutionary algorithms: optimization trajectories collected from teacher algorithms supervise the Transformer to predict performance-improving updates to the search distribution (a minimal training-loop sketch also follows this list). The resulting model generalizes well to optimization tasks not seen during training.
  3. In-Context Optimization: Once trained, the Evolution Transformer performs in-context evolutionary optimization, adapting to new optimization tasks without any further updates to its weights. This enhances the model's practical applicability across different domains where direct optimization is necessary.
  4. Self-Referential Evolutionary Algorithm Distillation (SR-EAD): The paper also outlines a method for self-training the Evolution Transformer without relying on pre-existing teacher algorithms. This self-referential procedure generates perturbed versions of the model, evaluates the trajectories they produce, and uses these observations to iteratively bootstrap the model's own learning progress.
  5. Empirical Evaluation: The authors provided an open-source implementation and conducted an array of experiments, comparing the Evolution Transformer against established evolutionary optimization techniques. The results indicate that the Evolution Transformer can encapsulate desirable ES properties, including translation invariance, unbiasedness, and scale self-adaptation.
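
To make the two inductive biases in contribution 1 concrete, the following toy NumPy sketch (an illustration for this summary, not the paper's architecture) shows one way to realize them: attention over the population axis without positional encodings makes the read-out invariant to the order of population members, while sharing the same weights across independently processed search dimensions makes it equivariant to the order of dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_distribution_update(solutions, fitness, Wq, Wk, Wv):
    """Illustrative attention-based read-out over a population.

    solutions: (popsize, num_dims) sampled candidates
    fitness:   (popsize,) normalized fitness scores

    Attention over the population axis uses no positional encoding, so
    permuting population members leaves the output unchanged (invariance).
    The same weights are shared across search dimensions and each dimension
    is processed independently, so permuting the dimensions simply permutes
    the output (equivariance).
    """
    popsize, num_dims = solutions.shape
    # Per-dimension tokens: each member contributes its coordinate and fitness.
    tokens = np.stack([solutions, np.tile(fitness[:, None], (1, num_dims))], axis=-1)  # (pop, dim, 2)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv                                    # (pop, dim, h)
    scores = np.einsum('pdh,qdh->dpq', q, k) / np.sqrt(q.shape[-1])                    # (dim, pop, pop)
    attn = softmax(scores, axis=-1)
    pooled = np.einsum('dpq,qdh->dh', attn, v) / popsize                               # mean over members
    # Read out a mean shift and a log-std change for every search dimension.
    return pooled[:, 0], pooled[:, 1]

# Check the population-order symmetry on random inputs.
rng = np.random.default_rng(0)
Wq, Wk, Wv = [rng.normal(size=(2, 4)) for _ in range(3)]
X, f = rng.normal(size=(8, 3)), rng.normal(size=8)
dm, ds = toy_distribution_update(X, f, Wq, Wk, Wv)
perm = rng.permutation(8)
dm_p, ds_p = toy_distribution_update(X[perm], f[perm], Wq, Wk, Wv)
assert np.allclose(dm, dm_p) and np.allclose(ds, ds_p)  # population-order invariance
```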
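
Likewise, the sketch below gives the general shape of Evolutionary Algorithm Distillation (contribution 2), assuming a toy diagonal-Gaussian teacher ES and a mean-squared-error regression target; the function names and hyperparameters are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_es_step(mean, sigma, objective, popsize=8, lr=0.5):
    """Toy teacher: one step of a simple diagonal-Gaussian evolution strategy."""
    eps = rng.normal(size=(popsize, mean.shape[0]))
    pop = mean + sigma * eps
    fit = np.array([objective(x) for x in pop])
    weights = (fit - fit.mean()) / (fit.std() + 1e-8)
    new_mean = mean + lr * sigma * (weights @ eps) / popsize
    return pop, fit, new_mean

def collect_teacher_trajectory(objective, num_dims=3, steps=20):
    """Roll out the teacher and record (context -> target update) pairs."""
    mean, sigma = np.zeros(num_dims), 1.0
    traj = []
    for _ in range(steps):
        pop, fit, new_mean = teacher_es_step(mean, sigma, objective)
        traj.append((pop, fit, mean.copy(), new_mean - mean))
        mean = new_mean
    return traj

def ead_loss(student_predict, trajectory):
    """Supervised distillation loss: squared error between the student's
    predicted distribution update and the teacher's actual update."""
    errs = [np.mean((student_predict(pop, fit, mean) - target) ** 2)
            for pop, fit, mean, target in trajectory]
    return float(np.mean(errs))

# Toy usage with a trivial 'student' that ignores its context.
traj = collect_teacher_trajectory(lambda x: -np.sum(x ** 2))
print(ead_loss(lambda pop, fit, mean: np.zeros_like(mean), traj))
```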

Implications and Future Directions

The proposed Evolution Transformer architecture paves the way for a new class of data-driven evolutionary optimizers that generalize across problems. The implications are significant for fields like meta-learning and neural architecture search, where efficient black-box optimization of many parameters can dramatically enhance performance.

The concept of self-referential learning through SR-EAD shows promise for discovering novel optimization strategies without hand-designed teacher algorithms, potentially fostering advancements in automated machine learning and autonomous agent design. However, the stability and robustness of this approach warrant further exploration, particularly regarding scaling to more complex tasks and generalization beyond the training distribution.
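
As a rough, schematic illustration of the self-referential loop behind SR-EAD (perturb the current model, roll out the perturbed copies, keep the best, and distill its behavior back into the model), the sketch below uses stand-in functions; the names, the scoring, and the distillation step are simplifying assumptions rather than the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_score(model_params, task):
    """Hypothetical stand-in: run the model as an optimizer on `task` and
    return the performance it reaches (here, a toy quadratic)."""
    x = model_params[: task.shape[0]]  # toy: interpret params as a solution
    return -np.sum((x - task) ** 2)

def distill(model_params, teacher_params, lr=0.1):
    """Hypothetical stand-in for supervised distillation: move the student's
    parameters toward the behavior of the selected teacher trajectory."""
    return model_params + lr * (teacher_params - model_params)

def sr_ead_step(model_params, tasks, num_perturbations=8, sigma=0.1):
    # 1. Generate perturbed copies of the current model (self-generated teachers).
    candidates = [model_params + sigma * rng.normal(size=model_params.shape)
                  for _ in range(num_perturbations)]
    # 2. Evaluate each candidate's rollouts and keep the best-performing one.
    scores = [np.mean([rollout_score(c, t) for t in tasks]) for c in candidates]
    best = candidates[int(np.argmax(scores))]
    # 3. Distill the best candidate's behavior back into the model.
    return distill(model_params, best)

# Toy usage: bootstrap from a random initialization on random target tasks.
params = rng.normal(size=4)
tasks = [rng.normal(size=4) for _ in range(3)]
for _ in range(50):
    params = sr_ead_step(params, tasks)
```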

Future research could replace or augment the attention backbone with sequence models such as structured state-space models to handle longer contexts, relaxing the current context-length limitations. Moreover, training on broader and more diverse task distributions could yield better insight into optimizing dynamic and complex systems, advancing both the theoretical and practical facets of evolutionary computation.

In summary, the Evolution Transformer marks a significant step in modernizing evolutionary optimization by infusing it with the learnability and flexibility of deep sequence models. The contribution lies not only in demonstrating that Transformers can parameterize Evolution Strategies, but also in showing how such optimizers can be improved through learning and adaptation, whether distilled from teacher algorithms or bootstrapped self-referentially.