Meta-Learned Optimizers for Plasticity

Updated 11 August 2025
  • The paper introduces a meta-learning framework that evolves synaptic plasticity rules using a two-loop architecture with gradient-free optimization.
  • It leverages an ANN-based parameterization to compute adaptive updates, enabling neuromorphic agents to perform efficient reinforcement learning.
  • Empirical results show these optimizers achieve near-optimal performance and enhanced transfer learning across various task distributions.

Meta-learned optimizers for plasticity are algorithmic frameworks in which the rules that govern weight changes—synaptic plasticity—are themselves learned and optimized, rather than hand-designed. This paradigm draws inspiration from biological nervous systems, where synaptic learning rules and their hyperparameters are hypothesized to have evolved to support highly effective adaptation, transfer, and rapid learning across tasks. Meta-learned plasticity introduces an outer optimization loop that searches for learning rules or hyperparameter settings tuned for a given distribution of tasks, often relying on gradient-free optimization strategies suited to hardware constraints and the non-differentiable nature of neuromorphic or analog implementations. The core objective is to produce adaptive agents whose synaptic update mechanisms are not fixed but rather are shaped by a meta-learning process—ultimately yielding superior reward-based learning and transfer.

1. Learning-to-Learn and Meta-Plasticity Frameworks

Meta-learned optimizers for plasticity are structured around a two-loop "learning-to-learn" (L2L) architecture. The inner loop comprises a neuromorphic spiking neural network (SNN) tasked with solving reinforcement learning (RL) problems using a local synaptic plasticity rule. The outer loop performs meta-optimization over the space of synaptic update rules or hyperparameters using gradient-free methods—including cross-entropy methods, evolutionary strategies, and simulated annealing—rather than traditional genetic algorithms or backpropagation. A characteristic innovation is the parameterization of the plasticity rule itself: the synaptic update at each timestep is computed by a compact artificial neural network (ANN, referred to as $f_{\text{ANN}}$), whose weights are tuned in the outer loop.

Formally, for a two-armed bandit task, the update takes the form

$$w_{(i,1)}(t+1) = w_{(i,1)}(t) + f_{\text{ANN}}\big(\text{inputs}_{(i,1)}(t);\ \theta\big)$$

where the inputs include the current timestep $t$, an indicator of responsibility (was this synapse's arm chosen?), the reward $r(t)$, and the relevant synaptic weights. The function $f_{\text{ANN}}$ is meta-learned: its parameters $\theta$ are adjusted in the outer loop to maximize cumulative reward over a task distribution.
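
As a concrete illustration of this parameterization, the sketch below implements $f_{\text{ANN}}$ as a small two-layer MLP and runs one inner-loop rollout on a two-armed bandit. The input features follow the list above, but the dimensions, the noisy-greedy action rule, and the Bernoulli reward model are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def f_ann(inputs, theta):
    """Compact MLP mapping local signals to a synaptic weight update.

    inputs: (t, chosen_indicator, reward, w_self, w_other)
    theta:  dict of MLP parameters, meta-learned in the outer loop.
    """
    x = np.asarray(inputs, dtype=float)
    h = np.tanh(theta["W1"] @ x + theta["b1"])   # hidden layer
    return float(theta["W2"] @ h + theta["b2"])  # scalar weight change

def inner_loop_episode(theta, arm_probs=(0.3, 0.7), T=100, rng=None):
    """One inner-loop rollout: a two-armed bandit solved with the learned rule."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.zeros(2)                              # one synaptic weight per arm
    total_reward = 0.0
    for t in range(T):
        a = int(np.argmax(w + 1e-3 * rng.standard_normal(2)))  # noisy-greedy action
        r = float(rng.random() < arm_probs[a])   # Bernoulli reward
        total_reward += r
        for i in range(2):                       # local, per-synapse update
            chosen = 1.0 if i == a else 0.0
            w[i] += f_ann((t / T, chosen, r, w[i], w[1 - i]), theta)
    return total_reward
```

In the full framework, the outer loop adjusts the parameters in theta so that the average return of such inner-loop rollouts is maximized over a distribution of bandit instances.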

This setup enables searching both for optimal plasticity rules and for suitable hyperparameter regions, circumventing the need for hand-tuning or brittle genetic-operator engineering. The approach explicitly addresses the limitations of fixed or manually-derived update rules by leveraging the expressive power of neural parameterizations and robust, gradient-free optimization landscapes.

2. Meta-Plasticity Rule Design and Mathematical Formulation

The meta-plasticity rule is designed to capture rich dependencies beyond classical RL updates. Traditional temporal-difference (TD) learning rules update weights as

$$w_{(i,j)}(t+1) = w_{(i,j)}(t) + \alpha \left[\, r(t) + \gamma \max_k w_{(k,j)}(t) - w_{(i,j)}(t) \,\right]$$

where $\alpha$ is a fixed learning rate and $\gamma$ is the discount factor. In contrast, meta-plasticity with an ANN parameterization generalizes this to

$$w_{(i,1)}(t+1) = w_{(i,1)}(t) + f_{\text{ANN}}\big(\, t,\ \mathbb{I}_{a(t)=a_i},\ r(t),\ w_{(i,1)}(t),\ w_{(3-i,1)}(t);\ \theta \,\big)$$

where $f_{\text{ANN}}$ can implement arbitrary nonlinear dependencies on time, action, reward, and the current synaptic states. The outer meta-optimization seeks values of $\theta$ that maximize abstract learning objectives, such as cumulative reward or task transfer speed.
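
To make the generalization concrete, the fixed TD-style rule above corresponds to one particular, hand-chosen update function, while the meta-learned rule routes the same local signals through a trainable mapping. The snippet below (with hypothetical fixed values of alpha and gamma, and reusing the f_ann sketch from Section 1) is only meant to make that contrast explicit.

```python
def td_update(w_i, w_all, r, alpha=0.1, gamma=0.9):
    """Hand-designed TD-style rule: fixed learning rate and discount."""
    return w_i + alpha * (r + gamma * max(w_all) - w_i)

def meta_update(w_i, w_other, r, t, chosen, theta):
    """Meta-learned rule: the same local signals, but the mapping to the
    weight change is an arbitrary nonlinear function with parameters theta."""
    return w_i + f_ann((t, chosen, r, w_i, w_other), theta)
```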

This design facilitates the discovery of learning rules that are sensitive to task-specific features and hardware constraints. Numerical results demonstrate that the meta-learned plasticity rule achieves performance near the optimal Gittins index policy and exhibits improved transfer across families of related tasks, in contrast to rules optimized for fixed, narrow scenarios—a critical aspect of functional plasticity.

3. Outer-Loop Optimization and Biological Relevance

The outer-loop meta-optimization is executed with robust, gradient-free algorithms. This choice is motivated by the practicalities of neuromorphic hardware: the reward landscape is highly stochastic (owing to RL randomness and analog circuit variation) and potentially non-differentiable. The optimization methods include:

  • Cross-Entropy Methods: These fit a parameterized distribution over the plasticity rule parameters, repeatedly sampling and updating based on elite performers.
  • Evolutionary Strategies: Perturbations are applied to candidate solutions, which are then ranked and propagated.
  • Simulated Annealing: Updates are probabilistically accepted based on a temperature schedule, encouraging exploration at early stages.

Relative to standard genetic algorithms (which require hand-designed recombination and mutation operators), these approaches provide more robust and generalizable exploratory power.
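
One way such an outer loop can be organized is sketched below with a minimal cross-entropy method over a flattened parameter vector. The population size, elite fraction, and diagonal Gaussian search distribution are illustrative defaults, not settings taken from the paper.

```python
import numpy as np

def cem_outer_loop(evaluate, dim, iterations=50, pop_size=64,
                   elite_frac=0.125, seed=0):
    """Cross-entropy method: fit a diagonal Gaussian over rule parameters,
    sample candidates, score each with inner-loop rollouts, refit to the elites."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(iterations):
        thetas = mu + sigma * rng.standard_normal((pop_size, dim))  # sample candidates
        fitness = np.array([evaluate(th) for th in thetas])         # e.g. mean reward over tasks
        elites = thetas[np.argsort(fitness)[-n_elite:]]             # keep the best performers
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3  # refit search distribution
    return mu
```

Here, evaluate would unpack the flat vector into the parameters of the plasticity rule (such as the MLP in the earlier sketch) and average cumulative reward over several sampled tasks, so that the discovered rule is tuned to the task distribution rather than to a single instance.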

The accelerated execution of neuromorphic circuits—operating at up to 1000× biological speeds—enables the thousands of network rollouts needed for effective meta-optimization, directly connecting biological timescale constraints with artificial system design.

4. Implications for Adaptive Behavior and Transfer

By encoding the learning rule in a neural network and optimizing it over a suite of tasks, the resulting meta-learned optimizer is intrinsically shaped to the statistics and demands of those tasks. When the meta-plasticity rule or related hyperparameters are tuned for a family of structured problems (e.g., variants of bandit scenarios), the neuromorphic agent rapidly exploits underlying regularities in subsequent, related tasks. This mechanism aligns closely with the notion of transfer learning observed in biological organisms, where prior experience accelerates new learning by leveraging abstracted knowledge about environmental structure.

Furthermore, the plasticity rule discovered via meta-learning supports better generalization under task shift, outperforming fixed learning-rate or hand-tuned rule approaches. This suggests a direct link between meta-learned synaptic rules and the robustness, speed, and plasticity of learning observed in biological intelligence.

5. Performance Characteristics and Implementation Trade-offs

Empirical studies indicate that meta-learned plasticity rules provide high-performing, sample-efficient agents within the domain of RL on neuromorphic hardware. The numerical performance approaches that of theoretically optimal policies while maintaining ease of transfer and adaptability.

Deploying these methods in practice requires attention to:

  • Computation: Sufficient hardware acceleration is essential to permit the vast number of meta-level iterations; the accelerated time constants of neuromorphic hardware make this feasible where emulated or purely digital implementations may struggle.
  • Robustness to Noise: The parameterization of the learning rule as a function approximator (e.g., a small ANN) must be regularized or constrained to prevent overfitting to stochasticities or hardware-specific quirks, balancing expressivity with generalizability.
  • Deployment: The ANN encoding the plasticity rule must be co-located or efficiently accessible within neuromorphic systems to achieve real-time synaptic weight updates. Low-memory, low-power implementations are feasible due to the compactness of the rule representation.

6. Theoretical and Practical Significance

The demonstrated framework establishes that meta-learning, when applied to the domain of synaptic plasticity, produces optimizers that surpass hand-designed algorithms for RL on neuromorphic systems—especially in reward-driven, non-stationary, or transfer tasks. The connection between meta-learned update rules and task-aligned plasticity provides a plausible computational analog for evolutionary tuning of synaptic learning in the brain. In practical engineering terms, the approach points toward adaptable, scalable agents whose learning mechanisms can be customized for device physics, task distributions, or environmental regimes with minimal manual tuning.

The approach further highlights the complementary roles of biologically inspired learning principles and advanced optimization methodologies: it unifies fast, local plasticity with slow, evolutionary learning through meta-optimization, setting a precedent for future artificial learning systems that seek both efficiency and flexibility.