- The paper introduces ASAL, which leverages vision-language foundation models to automate the search for interesting artificial life simulations.
- It outlines three methods—supervised target, open-endedness, and illumination—utilizing optimization techniques like evolutionary algorithms and gradient descent.
- Experimental results demonstrate that FM embeddings outperform pixel-based metrics, enabling quantitative analysis and mapping of diverse simulation behaviors.
Artificial Life (ALife) research often involves searching vast combinatorial spaces of simulation parameters to discover emergent behaviors. Traditionally, this search relies heavily on manual design, intuition, and trial-and-error due to the difficulty of predicting complex outcomes from simple rules and the lack of robust quantitative metrics for phenomena like "interestingness" or "open-endedness."
This paper presents Automated Search for Artificial Life (ASAL), a novel paradigm that leverages vision-language foundation models (FMs) to automate and accelerate the discovery of interesting ALife simulations. The core idea is to use the high-level, human-aligned representations learned by FMs to evaluate and guide the search through simulation parameter spaces. ASAL is designed to be agnostic to both the specific FM used (as long as it can process visual inputs) and the ALife substrate, provided the simulation state can be rendered into an image.
ASAL proposes three distinct search methods, each formulated as an optimization problem in the simulation parameter space $\theta$, evaluated using FM embeddings of the simulation's rendered output $R(S_T(\theta))$, where $S_T(\theta)$ denotes the simulation state at time $T$:
- Supervised Target Search: Finds simulations that produce visual outputs matching a specified natural language text prompt or sequence of prompts. The objective is to maximize the similarity between the FM embedding of the simulation's rendered state at time T and the FM embedding of the target prompt at that time.
$$\theta^* = \arg\max_{\theta} \, \mathbb{E}_T\!\left[\left\langle \mathrm{VLM}_{\mathrm{img}}\big(R(S_T(\theta))\big),\ \mathrm{VLM}_{\mathrm{txt}}(\mathrm{prompt}_T)\right\rangle\right]$$
This method can be implemented with standard optimization techniques. For single targets, the authors used evolutionary algorithms such as Sep-CMA-ES; for temporal (sequence-of-prompts) targets, they applied backpropagation through time with gradient descent (Adam optimizer), which is particularly effective for differentiable substrates like Neural Cellular Automata (NCA). This makes it possible to search for simulations that follow a specified evolutionary trajectory, such as a single cell dividing into two, effectively discovering self-replication rules.
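A minimal sketch of this search, assuming hypothetical stand-ins `simulate_and_render` (run the substrate for $T$ steps and render the final state) and `clip_embed_image` / `clip_embed_text` (unit-norm CLIP embeddings); a simple (mu, lambda) evolution strategy is substituted here for the Sep-CMA-ES used in the paper:

```python
import numpy as np

def target_score(theta, prompt_embedding):
    """Inner product between the CLIP embedding of the final rendered
    state and the CLIP embedding of the target prompt (higher is better)."""
    image = simulate_and_render(theta)            # hypothetical substrate call
    return float(clip_embed_image(image) @ prompt_embedding)

def evolve(prompt, dim, pop_size=32, elites=8, sigma=0.1, generations=100):
    """Maximize target_score with a simple (mu, lambda) evolution strategy."""
    prompt_embedding = clip_embed_text(prompt)    # hypothetical CLIP call
    mean = np.zeros(dim)
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        pop = mean + sigma * np.random.randn(pop_size, dim)
        scores = np.array([target_score(t, prompt_embedding) for t in pop])
        # Recombine the top-scoring candidates into the new mean.
        mean = pop[np.argsort(-scores)[:elites]].mean(axis=0)
    return mean
```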
- Open-Endedness Search: Identifies simulations that generate temporally open-ended novelty. This is quantified by minimizing the maximum similarity between the FM embedding of the current state $R(S_T(\theta))$ and those of all previous states $R(S_{T'}(\theta))$ for $T' < T$ in the simulation trajectory. Minimizing this "historical nearest neighbor" similarity encourages the simulation to continuously produce visually novel states in the FM's representation space.

$$\theta^* = \arg\min_{\theta} \, \mathbb{E}_T\!\left[\max_{T' < T} \left\langle \mathrm{VLM}_{\mathrm{img}}\big(R(S_T(\theta))\big),\ \mathrm{VLM}_{\mathrm{img}}\big(R(S_{T'}(\theta))\big)\right\rangle\right]$$
This method was applied to the Life-Like Cellular Automata (CA) substrate, whose discrete search space of $2^{18}$ (262,144) possible rules is small enough to evaluate by brute force. The results showed that FMs can effectively identify rules exhibiting persistent, non-convergent dynamics, placing rules like Conway's Game of Life among the top open-ended candidates.
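The open-endedness score itself reduces to a historical-nearest-neighbor computation over the trajectory's embeddings. A minimal sketch, assuming `embeddings` is a (T, d) array of unit-norm FM embeddings of the rendered states in time order:

```python
import numpy as np

def open_endedness_loss(embeddings):
    """Average, over timesteps, of the similarity to the most similar
    *earlier* state; lower values indicate more sustained novelty."""
    sims = embeddings @ embeddings.T              # pairwise cosine similarities
    losses = []
    for t in range(1, len(embeddings)):
        # Historical nearest neighbor: max similarity to any previous state.
        losses.append(sims[t, :t].max())
    return float(np.mean(losses))
```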
- Illumination Search: Discovers a diverse set of simulations that illuminate the variety of phenomena possible within a substrate. The objective is to find a set of simulations $\{\theta_0,\ldots,\theta_n\}$ that is maximally diverse, measured by minimizing each simulation's maximum similarity (in FM embedding space) to its nearest neighbor within the discovered set.

$$\{\theta_0^*,\ldots,\theta_n^*\} = \arg\min_{\theta_0,\ldots,\theta_n} \, \mathbb{E}_{\theta,T}\!\left[\max_{\theta' \neq \theta} \left\langle \mathrm{VLM}_{\mathrm{img}}\big(R(S_T(\theta))\big),\ \mathrm{VLM}_{\mathrm{img}}\big(R(S_T(\theta'))\big)\right\rangle\right]$$
This search requires maintaining a population of diverse solutions. A custom genetic algorithm was used, which iteratively mutates solutions and prunes the least novel ones based on nearest neighbor distance in the FM embedding space. Applied to Boids and Lenia, this method successfully generated diverse sets of simulations exhibiting a wide range of behaviors (e.g., different flocking patterns, various cell-like forms), visualized as "simulation atlases" organized by visual similarity in the FM space.
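A minimal sketch of such a genetic algorithm, assuming a hypothetical `embed_final_state(theta)` helper that runs the simulation from parameters theta and returns the unit-norm FM embedding of its final rendered frame; each iteration mutates random members and prunes the solutions closest to their nearest neighbor in embedding space:

```python
import numpy as np

def novelty(embeddings):
    """Per-solution novelty: one minus similarity to the nearest neighbor."""
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -np.inf)               # ignore self-similarity
    return 1.0 - sims.max(axis=1)

def illuminate(init_pop, iterations=1000, children=16, sigma=0.05):
    """Evolve a fixed-size set of parameter vectors toward mutual diversity."""
    pop = list(init_pop)                          # list of np.ndarray params
    for _ in range(iterations):
        # Mutate random parents to propose new candidate simulations.
        parents = [pop[i] for i in np.random.randint(len(pop), size=children)]
        pop += [p + sigma * np.random.randn(*p.shape) for p in parents]
        # Embed every candidate and keep only the most novel solutions.
        emb = np.stack([embed_final_state(theta) for theta in pop])
        keep = np.argsort(-novelty(emb))[: len(init_pop)]
        pop = [pop[i] for i in keep]
    return pop
```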
The implementation of ASAL relies on defining the ALife substrate $S$ by parameterizing its initial state distribution ($\mathrm{Init}_\theta$), forward dynamics ($\mathrm{Step}_\theta$), and, crucially, a rendering function ($\mathrm{Render}_\theta$) that converts the simulation state into an image. The FM then provides functions $\mathrm{VLM}_{\mathrm{img}}(\cdot)$ and $\mathrm{VLM}_{\mathrm{txt}}(\cdot)$ to embed images and text into a common latent space, where similarity is measured with an inner product.
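To make this interface concrete, here is a minimal sketch; the names and signatures are illustrative assumptions, not the paper's actual API. Any simulation exposing the three theta-parameterized functions, together with an FM exposing the two embedding functions, can plug into the search methods above:

```python
from typing import Protocol
import numpy as np

class Substrate(Protocol):
    def init(self, theta: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Sample an initial simulation state (Init_theta)."""
        ...
    def step(self, theta: np.ndarray, state: np.ndarray) -> np.ndarray:
        """Advance the simulation by one timestep (Step_theta)."""
        ...
    def render(self, theta: np.ndarray, state: np.ndarray) -> np.ndarray:
        """Convert a state into an RGB image (Render_theta)."""
        ...

class FoundationModel(Protocol):
    def embed_image(self, image: np.ndarray) -> np.ndarray:
        """Unit-norm image embedding (VLM_img)."""
        ...
    def embed_text(self, text: str) -> np.ndarray:
        """Unit-norm text embedding (VLM_txt), sharing the image space."""
        ...
```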
Experiments were conducted across a diverse range of substrates: Boids (parameterized by neural network weights), Particle Life (interaction matrices), Life-like CA (rule tables), Lenia (dynamics and initial state parameters), and Neural Cellular Automata (NCA, neural network weights for transition rules). The FMs used were CLIP (for all tasks) and DINOv2 (for tasks not requiring text prompts, like Open-Endedness and Illumination). Comparisons showed that FM representations significantly outperform pixel-based metrics for capturing human-perceived diversity.
Beyond discovery, ASAL enables novel quantitative analyses of ALife phenomena. For instance, by interpolating parameters between two discovered simulations and measuring the CLIP similarity of the intermediate states, the non-linear and potentially chaotic nature of the parameter space can be visualized (Figure 4a). FMs can also quantify the emergence of specific target phenomena as simulation parameters (like particle count) are varied (Figure 4b), assess the sensitivity of simulation outcomes to individual parameters (Figure 4c), or provide a human-aligned metric for simulation convergence by tracking the rate of change in the FM embedding over time (Figure 4d).
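As an illustration of the interpolation analysis, a minimal sketch reusing the hypothetical `embed_final_state(theta)` helper from the earlier sketches; plotting each intermediate simulation's similarity to both endpoints reveals how abruptly behavior changes across the parameter space:

```python
import numpy as np

def interpolation_curve(theta_a, theta_b, steps=32):
    """FM similarity of interpolated simulations to the two endpoints."""
    emb_a = embed_final_state(theta_a)
    emb_b = embed_final_state(theta_b)
    sims = []
    for alpha in np.linspace(0.0, 1.0, steps):
        # Linearly interpolate parameters, run the simulation, and measure
        # the FM similarity of the result to each endpoint simulation.
        emb = embed_final_state((1 - alpha) * theta_a + alpha * theta_b)
        sims.append((float(emb @ emb_a), float(emb @ emb_b)))
    return sims
```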
The practical implications are significant. ASAL provides researchers with automated tools to:
- Discover specific phenomena: Find simulation rules that produce desired visual outcomes or sequences.
- Search for complex dynamics: Identify simulations that exhibit persistent novelty and open-endedness.
- Explore the possibility space: Generate comprehensive maps of diverse behaviors possible within a substrate.
- Quantify emergent properties: Replace subjective or low-level metrics with FM-based measures aligned with human perception.
Future work includes integrating video-language or 3D FMs, leveraging image-to-text models to use powerful LLMs for analysis, and applying similar methods to search for rules in other complex systems like physics simulators. The code for ASAL is publicly available, facilitating its adoption and extension by the ALife and AI research communities.