GateMod: Neural Framework for Policy Gating
- GateMod is a computational and neural framework for policy gating that integrates free energy minimization with interpretable neural implementations.
- It consists of three core components—GateFrame, GateFlow, and GateNet—that together bridge task objectives, convergent dynamics, and circuit-level realizability.
- Empirical evaluations show robust policy composition in multi-agent coordination and human decision-making, outperforming traditional models in adaptive settings.
GateMod is a computational and neural framework for policy gating driven by free energy minimization, establishing a unifying and interpretable account for the emergence and implementation of policy-gating mechanisms in biological and artificial agents. Developed to formalize how decision-making tasks shape policy composition and how these computations can be realized in neural circuits, GateMod comprises three principal elements: a normative variational objective (GateFrame), a provably convergent continuous-time dynamics (GateFlow), and a soft-competitive recurrent neural architecture (GateNet). These components collectively bridge task structure, dynamical implementation, and circuit-level realizability, yielding interpretable, robust, and theoretically grounded accounts of gating phenomena across disparate domains (Rossi et al., 4 Dec 2025).
1. GateFrame: Free-Energy-Based Normative Gating
GateFrame defines policy composition for an agent operating in a stochastic environment over discrete time steps. At each step, the agent observes the current state and samples an action from a composite policy, which is a convex combination of pre-learned primitive policies. The mixture weights are constrained to the probability simplex: each weight is nonnegative and the weights sum to one.
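The convex combination of primitives can be illustrated with a minimal sketch. It assumes a small discrete action set and hand-picked primitive distributions; the function names `compose_policy` and `sample_action` and all numbers are hypothetical and do not come from the paper.

```python
import random

def compose_policy(weights, primitives):
    """Return the convex combination of primitive action distributions.

    weights:    nonnegative mixture weights summing to one (a point on the simplex)
    primitives: one action distribution per primitive, each a list of
                probabilities over the same discrete action set
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    n_actions = len(primitives[0])
    return [
        sum(w * p[a] for w, p in zip(weights, primitives))
        for a in range(n_actions)
    ]

def sample_action(weights, primitives, rng=random):
    """Sample from the composite policy: pick a primitive, then an action from it."""
    i = rng.choices(range(len(primitives)), weights=weights)[0]
    return rng.choices(range(len(primitives[i])), weights=primitives[i])[0]

# Hypothetical example: three primitives over two actions.
primitives = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
weights = [0.5, 0.3, 0.2]
composite = compose_policy(weights, primitives)
action = sample_action(weights, primitives)
```

Because the weights lie on the simplex, the composite is itself a valid probability distribution, and sampling a primitive first and then an action from it is equivalent to sampling from the mixture directly.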
The core objective is a regularized free-energy minimization that balances task cost against policy complexity. It combines three ingredients: the distribution induced by the composed policy, a generative model that incorporates the task cost, and the entropy of the weight vector. The mixture weights thus mediate a tradeoff between task optimality and policy diversity, enabling graded, task-dependent policy composition. A derivation using Lagrange multipliers yields a softmax weighting rule as the unique optimum, with the softmax arguments given by the extended free-energy functional (Rossi et al., 4 Dec 2025).
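The way an entropy term turns a cost minimization over the simplex into a softmax can be checked numerically. The sketch below uses a simplified stand-in objective, expected cost minus `eps` times the Shannon entropy of the weights, whose minimizer is known to be proportional to `exp(-cost / eps)`. The per-primitive costs and the value of `eps` are assumptions for illustration; the paper's extended free-energy functional is not reproduced here.

```python
import math
import random

def softmax_weights(costs, eps):
    """Minimizer of sum_i w_i*c_i - eps*H(w) over the simplex: w_i proportional to exp(-c_i/eps)."""
    m = min(costs)  # shift by the smallest cost for numerical stability
    unnorm = [math.exp(-(c - m) / eps) for c in costs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def objective(w, costs, eps):
    """Expected cost minus eps times the Shannon entropy of w."""
    expected_cost = sum(wi * ci for wi, ci in zip(w, costs))
    entropy = -sum(wi * math.log(wi) for wi in w if wi > 0)
    return expected_cost - eps * entropy

costs = [1.0, 2.0, 4.0]   # hypothetical per-primitive costs
eps = 0.5                 # hypothetical regularization strength
w_star = softmax_weights(costs, eps)

# Sanity check: no randomly drawn point on the simplex achieves a lower objective.
rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() for _ in costs]
    w = [r / sum(raw) for r in raw]
    assert objective(w, costs, eps) >= objective(w_star, costs, eps) - 1e-12
```

Larger values of `eps` flatten the weights toward uniform (more policy diversity), while smaller values concentrate them on the lowest-cost primitive.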
2. GateFlow: Contractive Continuous-Time Dynamics
GateFlow provides an energy-based, continuous-time dynamical system that converges globally and exponentially to the GateFrame optimum. An ODE with a timescale parameter governs the evolution of the mixture weights, and the associated GateFlow energy function decreases monotonically along its trajectories. The system is contractive on the simplex: the symmetric part of its Jacobian satisfies a uniform negative-definiteness bound, which yields a unique equilibrium and global, exponentially fast convergence to it (Rossi et al., 4 Dec 2025).
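To give a feel for contractive convergence to a softmax equilibrium, the sketch below integrates a deliberately simple stand-in flow, `tau * dw/dt = -w + softmax(-costs / eps)`, with forward Euler. This is an assumption chosen for illustration, not the GateFlow ODE from the paper: with fixed costs its Jacobian is `-I / tau`, so it is trivially contractive with rate `1 / tau`. The costs, `eps`, `tau`, and step size are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # shift by the largest score for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [x / z for x in e]

def simulate_flow(w0, costs, eps, tau, dt, steps):
    """Forward-Euler integration of tau * dw/dt = -w + softmax(-costs/eps)."""
    target = softmax([-c / eps for c in costs])
    w = list(w0)
    distances = []  # Euclidean distance to the equilibrium before each step
    for _ in range(steps):
        distances.append(math.dist(w, target))
        w = [wi + (dt / tau) * (ti - wi) for wi, ti in zip(w, target)]
    return w, target, distances

w_final, w_star, dist = simulate_flow(
    w0=[1.0, 0.0, 0.0], costs=[1.0, 2.0, 4.0], eps=0.5,
    tau=1.0, dt=0.01, steps=2000,
)
```

Each Euler step is a convex combination of the current weights and the softmax target whenever `dt <= tau`, so the trajectory stays on the simplex, and the distance to the equilibrium shrinks by the factor `1 - dt / tau` per step.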
3. GateNet: Biologically Plausible Neural Circuit Implementation
GateNet realizes GateFlow as a two-layer recurrent neural network employing soft-competitive dynamics and local computation motifs analogous to dendritic processing. The circuit splits into a “fast” module, which computes the softmax exponent via contextual Sigma–Pi processing, and a “slow” normalization module, which implements the softmax transformation. All circuit variables remain nonnegative, supporting a firing-rate interpretation. Contextual information enters through the dependence of the primitives on the current state, enabling dendritic-like, locally contextual computation.
A schematic of the main variable transformations is as follows:
| Subsystem | Computation | Motif |
|---|---|---|
| Fast (Sigma–Pi) | Softmax exponent from contextual input | Dendritic context, log-exp |
| Slow (softmax) | Softmax transformation (normalization) | Recurrent, normalized exponent |
GateNet thereby implements the contractive energy flow and guarantees robust policy gating with biologically plausible mechanisms (Rossi et al., 4 Dec 2025).
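The fast/slow split can be illustrated with a loose two-timescale rate model. Everything in this sketch is an assumption made for illustration rather than the GateNet equations: the fast units relax toward an exponentiated Sigma-Pi style drive (a sum of products of context signals and gains), the slow units relax toward a divisively normalized copy of the fast rates, and the names `gatenet_step`, `run_gatenet`, `context`, and `gains` are hypothetical. The fast units carry the exponentiated drive rather than the raw exponent so that every rate stays nonnegative.

```python
import math

def gatenet_step(r, w, drive, tau_fast, tau_slow, dt):
    """One Euler step of an illustrative two-timescale, nonnegative gating circuit.

    r: fast-unit rates, relaxing toward exp(drive_i)
    w: slow-unit rates, relaxing toward r_i / sum_j r_j (divisive normalization)
    """
    r_new = [ri + (dt / tau_fast) * (math.exp(d) - ri) for ri, d in zip(r, drive)]
    total = sum(r)
    w_new = [wi + (dt / tau_slow) * (ri / total - wi) for wi, ri in zip(w, r)]
    return r_new, w_new

def run_gatenet(context, gains, eps=0.5, tau_fast=0.05, tau_slow=1.0,
                dt=0.01, steps=3000):
    # Sigma-Pi style drive: a sum of products of context signals and gains,
    # scaled by 1/eps and negated so that costly primitives receive low drive.
    drive = [-sum(x * g for x, g in zip(context, row)) / eps for row in gains]
    n = len(gains)
    r = [1.0] * n        # positive initial rates keep the normalizer well defined
    w = [1.0 / n] * n    # start from uniform gating weights
    for _ in range(steps):
        r, w = gatenet_step(r, w, drive, tau_fast, tau_slow, dt)
    return r, w, drive

# Hypothetical context vector and per-primitive gains.
r, w, drive = run_gatenet(
    context=[1.0, 0.5],
    gains=[[1.0, 0.0], [0.5, 2.0], [2.0, 2.0]],
)
```

With `dt` no larger than either time constant, both updates are convex combinations of nonnegative quantities, so the rates stay nonnegative and the slow weights keep summing to one while they settle on the softmax of the drive.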
4. Empirical Evaluation: Multi-Agent and Human Decision-Making
GateMod has been empirically evaluated in both collective multi-agent settings and human behavioral experiments.
- Flocking with social-force primitives: GateMod composes separation, alignment, and cohesion force primitives for agents (“boids”) using Gaussian policy seeds. Without external goals, the flock achieves emergent alignment, indicating successful behavioral coordination. Introducing goals to a subset of agents leads to adaptive leader–follower plasticity: weights fluctuate in the goal-informed agents, and the flock then steers collectively without losing cohesion. The convergence and distribution of the weights reflect an interpretable allocation among primitives.
- Multi-armed bandits and human decision-making: GateMod is benchmarked on published human datasets from two-armed bandit tasks. Primitives capture distinct behavioral strategies: exploitation, uncertainty-seeking, and risk aversion. GateMod's trial-by-trial weight trajectories reveal interpretable shifts between these strategies, and the model outperforms hybrid UCB+Thompson models and other baselines in protected exceedance probability across two experiments (GateMod reaches $0.81$ in the second, while the hybrid scores $0.23$ and $0.17$). Exploitation dominates in single safe-arm contexts; in stochastic contexts, subjects alternate periodically between exploration and exploitation, matching the trial-level weight switches inferred by GateMod (Rossi et al., 4 Dec 2025).
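A toy version of the bandit setting shows how shifting weight between strategy primitives changes arm preferences. The three primitives below are hypothetical constructions from Gaussian-style beliefs (a mean and a standard deviation per arm); the paper's actual primitive definitions, beliefs, and fitted weights are not reproduced, and all numbers are made up for illustration.

```python
import math

def softmax(scores):
    m = max(scores)  # shift by the largest score for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [x / z for x in e]

def bandit_primitives(means, stds):
    """Three hypothetical primitive policies over the arms of a bandit."""
    return {
        "exploitation": softmax(means),                # prefer high estimated reward
        "uncertainty_seeking": softmax(stds),          # prefer the less-known arm
        "risk_aversion": softmax([-s for s in stds]),  # prefer the better-known arm
    }

def compose(weights, primitives):
    """Convex combination of the primitives' arm probabilities."""
    n_arms = len(next(iter(primitives.values())))
    return [
        sum(weights[name] * primitives[name][a] for name in primitives)
        for a in range(n_arms)
    ]

# Hypothetical beliefs: arm 0 looks better, arm 1 is more uncertain.
prims = bandit_primitives(means=[1.0, 0.5], stds=[0.2, 1.0])
mostly_exploit = compose(
    {"exploitation": 0.8, "uncertainty_seeking": 0.1, "risk_aversion": 0.1}, prims)
mostly_explore = compose(
    {"exploitation": 0.1, "uncertainty_seeking": 0.8, "risk_aversion": 0.1}, prims)
```

With most weight on exploitation the composite favors arm 0, and moving that weight to uncertainty-seeking tips the preference toward arm 1, which is the kind of trial-level switch the weight trajectories are meant to expose.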
5. Interpretability and Theoretical Integration
GateMod integrates and extends conceptual connections between active inference, maximum entropy reinforcement learning, and mixture-of-experts models. Softmax gating emerges explicitly from entropy-regularized free-energy minimization rather than being imposed as an ad hoc choice, which gives context-sensitive gating a principled basis. The contractivity of GateFlow secures robust convergence to a unique equilibrium, and the structure of GateNet translates optimization and task-level principles into mechanistic, synaptic, and dendritic operations consistent with known biological motifs. GateMod thus establishes an explicit link between three levels:
- Task objective: free-energy functional,
- Dynamical computation: contractive ODE on the simplex,
- Circuit implementation: contextual, soft-competitive recurrent networks with localized processing.
These correspondences yield interpretable, mechanistic accounts of gating in both natural and artificial systems (Rossi et al., 4 Dec 2025).
6. Broader Implications and Future Directions
GateMod’s general framework for policy gating by free-energy minimization enables systematic, interpretable, and principled composition of behavioral primitives. It offers a theoretical and practical basis for understanding policy gating in neural circuits and for engineering artificial agents capable of adaptive policy composition. The realization of contractive, soft-competitive circuits—leveraging local, contextual, and nonnegative processing—aligns with biologically plausible architectures, thus informing circuit-level hypotheses in neuroscience and design strategies in neuromorphic and machine learning systems.
A plausible implication is the extension of the GateMod architecture to more complex continuous control domains, richer libraries of behavioral primitives, and broader regimes of uncertainty. General principles—such as the equivalence of entropic barriers to gating, and the correspondence of proximal-gradient flows to robust policy mixing—suggest further theoretical and experimental exploration, bridging computational neuroscience and AI policy composition (Rossi et al., 4 Dec 2025).