GateMod: Neural Framework for Policy Gating

Updated 5 December 2025
  • GateMod is a computational and neural framework for policy gating that integrates free energy minimization with interpretable neural implementations.
  • It consists of three core components—GateFrame, GateFlow, and GateNet—that together bridge task objectives, convergent dynamics, and circuit-level realizability.
  • Empirical evaluations show robust policy composition in multi-agent coordination and human decision-making, outperforming traditional models in adaptive settings.

GateMod is a computational and neural framework for policy gating driven by free energy minimization, establishing a unifying and interpretable account for the emergence and implementation of policy-gating mechanisms in biological and artificial agents. Developed to formalize how decision-making tasks shape policy composition and how these computations can be realized in neural circuits, GateMod comprises three principal elements: a normative variational objective (GateFrame), a provably convergent continuous-time dynamics (GateFlow), and a soft-competitive recurrent neural architecture (GateNet). These components collectively bridge task structure, dynamical implementation, and circuit-level realizability, yielding interpretable, robust, and theoretically grounded accounts of gating phenomena across disparate domains (Rossi et al., 4 Dec 2025).

1. GateFrame: Free-Energy-Based Normative Gating

GateFrame defines policy composition for an agent operating in a stochastic environment with a discrete time index $k$. At each step, the agent observes the state $x_{k-1}$ and samples an action $u_k$ from a composite policy $u_{k|k-1}(\cdot \mid x_{k-1})$, which is a convex combination of $n$ pre-learned primitive policies:

$$u_{k|k-1} = \sum_{i=1}^n w_k^i \, \pi^i_{k|k-1}, \qquad w_k \in \Delta^n$$

where $\Delta^n$ is the simplex $\{ w \in \mathbb{R}^n_{\ge 0} : \sum_i w_i = 1 \}$.
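As a minimal sketch of what sampling from such a convex combination involves, the snippet below assumes one-dimensional Gaussian primitives whose means and standard deviations have already been evaluated at the current state $x_{k-1}$. The function names and the Gaussian form are illustrative assumptions, not part of GateMod itself.

```python
import math
import random


def gaussian_pdf(u, mean, std):
    """Density of a 1-D Gaussian primitive policy at action u."""
    return math.exp(-0.5 * ((u - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def check_simplex(weights, tol=1e-9):
    """Raise if the weights are not nonnegative or do not sum to one."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must lie on the probability simplex")


def composite_density(u, weights, means, stds):
    """Evaluate the mixture density sum_i w^i * pi^i(u)."""
    check_simplex(weights)
    return sum(w * gaussian_pdf(u, m, s) for w, m, s in zip(weights, means, stds))


def sample_composite_action(weights, means, stds, rng):
    """Pick primitive i with probability w^i, then sample an action from it."""
    check_simplex(weights)
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[i], stds[i]), i


# Hypothetical example: three primitives with made-up parameters.
rng = random.Random(0)
action, chosen = sample_composite_action(
    [0.2, 0.5, 0.3], [-1.0, 0.0, 1.0], [0.1, 0.1, 0.1], rng
)
```

Sampling the index first and then the action is equivalent to drawing directly from the mixture density, which is why the weights must stay on the simplex.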

The core objective is formulated as a regularized free-energy minimization, balancing task cost and policy complexity:

$$\min_{w \in \Delta^n} D_{\mathrm{KL}}(p \parallel q) - \varepsilon H(w)$$

where $p$ is the distribution induced by the composed policy, $q$ is a generative model incorporating the task cost via $c(x_k,u_k)$, and $H(w)$ is the entropy of the weight vector. The mixture weights $w$ thus mediate a tradeoff between task-optimality and policy diversity, enabling graded, task-dependent policy composition. A derivation using Lagrange multipliers yields the softmax weighting rule as the unique optimum:

$$w^* = \mathrm{softmax}\big( -\tfrac{1}{\varepsilon} \nabla F(w^*) \big)$$

where $F(w)$ is the extended free-energy functional (Rossi et al., 4 Dec 2025).
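The Lagrange-multiplier step can be sketched as follows. This is a reconstruction under two assumptions that the text does not state explicitly: that $F(w)$ stands for the task term of the objective viewed as a differentiable function of $w$, and that $H(w)$ is the Shannon entropy. The multiplier $\lambda$ is introduced here only for illustration.

```latex
% Lagrangian for the equality constraint \sum_i w_i = 1
\mathcal{L}(w,\lambda) = F(w) - \varepsilon H(w) + \lambda\Big(\sum_{i=1}^n w_i - 1\Big),
\qquad H(w) = -\sum_{i=1}^n w_i \log w_i .

% Stationarity in each coordinate w_i
\frac{\partial F}{\partial w_i}(w^*) + \varepsilon\big(\log w_i^* + 1\big) + \lambda = 0
\;\;\Longrightarrow\;\;
w_i^* = \exp\!\Big(-\tfrac{1}{\varepsilon}\tfrac{\partial F}{\partial w_i}(w^*) - 1 - \tfrac{\lambda}{\varepsilon}\Big).

% Eliminating \lambda through the normalization constraint
w_i^* = \frac{\exp\!\big(-\tfrac{1}{\varepsilon}\,\partial_i F(w^*)\big)}
             {\sum_{j=1}^n \exp\!\big(-\tfrac{1}{\varepsilon}\,\partial_j F(w^*)\big)}
      = \mathrm{softmax}\big(-\tfrac{1}{\varepsilon}\nabla F(w^*)\big)_i .
```

Because the exponential is strictly positive, the nonnegativity constraint holds automatically. Since $\nabla F$ is evaluated at $w^*$, the rule is in general an implicit fixed-point condition rather than a closed-form solution.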

2. GateFlow: Contractive Continuous-Time Dynamics

GateFlow provides an energy-based, continuous-time dynamical system that globally and exponentially converges to the GateFrame optimum. The ODE governing the evolution of the mixture weights $w(t)$ is:

$$\tau \, \dot{w} = -w + \mathrm{softmax}\big( -\tfrac{1}{\varepsilon} \nabla F(w) \big)$$

where $\tau$ is a timescale parameter. The GateFlow energy function

$$E(w) = F(w) - \varepsilon H(w)$$

monotonically decreases along trajectories. The system is contractive on the simplex $\Delta^n$, with the symmetric part of the Jacobian satisfying

$$\frac{Df(w)+Df(w)^{\top}}{2} \preceq -\tfrac{1}{\tau} I$$

yielding unique, global, and exponentially fast convergence to the equilibrium (Rossi et al., 4 Dec 2025).
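To make the dynamics concrete, the following sketch integrates the GateFlow ODE with a forward-Euler step. It assumes a toy linear functional $F(w) = \sum_i c_i w_i$ with made-up costs $c_i$, so that $\nabla F$ is constant and the equilibrium is known in closed form as $\mathrm{softmax}(-c/\varepsilon)$. The function names, step size, and costs are illustrative assumptions, not values from the paper.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def gateflow_euler(grad_F, w0, eps=0.5, tau=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of tau * dw/dt = -w + softmax(-grad_F(w) / eps)."""
    w = list(w0)
    for _ in range(steps):
        target = softmax([-g / eps for g in grad_F(w)])
        # With dt / tau <= 1 each step is a convex combination of w and target,
        # so the iterate stays on the simplex.
        w = [wi + (dt / tau) * (ti - wi) for wi, ti in zip(w, target)]
    return w


# Toy assumption: linear F with hypothetical per-primitive costs.
costs = [1.0, 0.2, 0.5]
w_final = gateflow_euler(lambda w: costs, [1.0 / 3.0] * 3, eps=0.5)
w_star = softmax([-c / 0.5 for c in costs])
```

In this toy case the distance to `w_star` shrinks by a factor of `1 - dt / tau` at every step, a discrete counterpart of the exponential convergence described above. A state-dependent $\nabla F$ would need the contraction condition to hold for the same guarantee.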

3. GateNet: Biologically Plausible Neural Circuit Implementation

GateNet realizes GateFlow as a two-layer recurrent neural network employing soft-competitive dynamics and local computation motifs analogous to dendritic processing. The circuit splits into a “fast” module, which computes the softmax exponent $-\frac{1}{\varepsilon}\nabla F(w)$ via contextual Sigma–Pi processing, and a “slow” normalization module, which implements the softmax transformation. All circuit variables remain nonnegative, supporting a firing-rate interpretation. Contextual information is incorporated via dependency of the primitives on state $x$, enabling dendritic-like, locally contextual computation.

A schematic of the main variable transformations is as follows:

| Subsystem | Computation | Motif |
| --- | --- | --- |
| Fast (Sigma–Pi) | $a \rightarrow b \rightarrow y$ | Dendritic context, log-exp |
| Slow (softmax) | $m, r \rightarrow w$ | Recurrent, normalized exponent |

GateNet thereby implements the contractive energy flow and guarantees robust policy gating with biologically plausible mechanisms (Rossi et al., 4 Dec 2025).

4. Empirical Evaluation: Multi-Agent and Human Decision-Making

GateMod has been empirically evaluated in both collective multi-agent settings and human behavioral experiments.

  1. Flocking with social-force primitives: GateMod composes separation, alignment, and cohesion force primitives for agents (“boids”) using Gaussian policy seeds. Without external goals, the flock achieves emergent alignment ($\Phi(t) \rightarrow 1$), indicating successful behavioral coordination. Introduction of goals to a subset of agents leads to adaptive, leader–follower plasticity: weight fluctuations in goal-informed agents, followed by collective steering without cohesion loss. Weight convergence and distribution reflect interpretable allocation among primitives.
  2. Multi-armed bandits and human decision-making: GateMod is benchmarked on published human datasets from two-armed bandit tasks. Primitives capture distinct behavioral strategies: exploitation, uncertainty-seeking, and risk aversion. GateMod's trial-by-trial weight trajectories reveal interpretable shifts between these strategies, outperforming hybrid UCB+Thompson models and other baselines in protected exceedance probability metrics ($\mathrm{PXP} \approx 0.76$ and $0.81$ vs. $0.23$ and $0.17$ for the hybrid in two experiments). Exploitation dominates in single safe-arm contexts; in stochastic contexts, subjects show periodic alternation between exploration and exploitation, matching trial-level weight switches inferred by GateMod (Rossi et al., 4 Dec 2025).
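The alignment measure $\Phi(t)$ is not defined in the text above. A common choice in the flocking literature is the polarization order parameter, the norm of the average unit heading, and the sketch below assumes that definition for 2-D velocities. Both the definition and the function name are assumptions for illustration, not confirmed details of the GateMod experiments.

```python
import math


def polarization(velocities):
    """Norm of the mean unit heading over a list of 2-D velocity vectors.

    Returns 1.0 when all agents move in the same direction and values
    near 0.0 when headings cancel out.
    """
    sum_x = 0.0
    sum_y = 0.0
    moving = 0
    for vx, vy in velocities:
        speed = math.hypot(vx, vy)
        if speed == 0.0:
            continue  # a stationary agent has no heading, so it is skipped
        sum_x += vx / speed
        sum_y += vy / speed
        moving += 1
    if moving == 0:
        return 0.0
    return math.hypot(sum_x, sum_y) / moving


aligned = polarization([(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)])
opposed = polarization([(1.0, 0.0), (-1.0, 0.0)])
```

Under this assumed definition, speeds do not matter, only headings, so a flock can reach a value of 1 even when agents move at different speeds.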

5. Interpretability and Theoretical Integration

GateMod integrates and extends conceptual connections between active inference, maximum entropy reinforcement learning, and mixture-of-experts models. The explicit emergence of softmax gating from entropy-regularized free-energy minimization, rather than as an ad-hoc choice, constitutes a principled basis for context-sensitive gating behavior. The dynamical contractivity of GateFlow secures robust, unique convergence, and the structure of GateNet translates optimization and task-level principles into mechanistic, synaptic, and dendritic operations consistent with known biological motifs. GateMod thus establishes an explicit link between three levels:

  • Task objective: free-energy functional,
  • Dynamical computation: contractive ODE on the simplex,
  • Circuit implementation: contextual, soft-competitive recurrent networks with localized processing.

These correspondences yield interpretable, mechanistic accounts of gating in both natural and artificial systems (Rossi et al., 4 Dec 2025).

6. Broader Implications and Future Directions

GateMod’s general framework for policy gating by free-energy minimization enables systematic, interpretable, and principled composition of behavioral primitives. It offers a theoretical and practical basis for understanding policy gating in neural circuits and for engineering artificial agents capable of adaptive policy composition. The realization of contractive, soft-competitive circuits—leveraging local, contextual, and nonnegative processing—aligns with biologically plausible architectures, thus informing circuit-level hypotheses in neuroscience and design strategies in neuromorphic and machine learning systems.

A plausible next step is to extend the GateMod architecture to more complex continuous control domains, richer libraries of behavioral primitives, and broader regimes of uncertainty. General principles, such as the equivalence of entropic barriers to gating and the correspondence of proximal-gradient flows to robust policy mixing, suggest further theoretical and experimental exploration that bridges computational neuroscience and AI policy composition (Rossi et al., 4 Dec 2025).

References (1)
