
GateFrame: Normative Policy Gating

Updated 5 December 2025
  • GateFrame is a normative framework that formalizes policy gating via entropy-regularized free energy minimization, integrating decision theory, neuroscience, and machine learning.
  • It employs a closed-form softmax solution to optimize mixing weights over a library of primitive policies, ensuring strong convexity and unique optimality.
  • GateFrame underpins the GateMod suite by linking mathematical free-energy decomposition with algorithmic (GateFlow) and biologically-plausible (GateNet) implementations for adaptive control.

GateFrame is a normative framework for policy gating based on minimizing free energy, providing a unifying principle to describe and analyze the selection and composition of policies in decision-making, neuroscience, and machine learning. Central to the GateMod suite, GateFrame formalizes gating as the entropy-regularized minimization of Kullback–Leibler divergence between induced and desired agent-environment dynamics, parameterized by mixing weights over a library of primitive policies. Its convex structure, closed-form softmax gating rule, and principled decomposition extend across diverse domains, from cognitive models to engineering control (Rossi et al., 4 Dec 2025).

1. Mathematical Definition and Optimization Problem

GateFrame targets the policy mixture

$$\pi^\star_k(u \mid x_{k-1}) = \sum_{\alpha=1}^n w_k^\alpha \, \pi_\alpha(u \mid x_{k-1}),$$

where each $\pi_\alpha$ is a primitive policy from a finite library, and $w = (w^1, \ldots, w^n)$ are nonnegative mixing weights constrained to the simplex $\Delta^n = \{w \in \mathbb{R}_{\ge 0}^n : \sum_{\alpha=1}^n w^\alpha = 1\}$. The optimization seeks $w^\star_k$ minimizing the entropy-regularized KL divergence

$$w_k^\star = \arg\min_{w \in \Delta^n} D_{\mathrm{KL}}\!\left[p(x_k, u_k \mid x_{k-1}) \,\|\, q(x_k, u_k \mid x_{k-1})\right] - \varepsilon H(w),$$

where $p$ is the distribution induced by the mixture, $q$ is a generative model encoding desired dynamics or task constraints, $H(w)$ is the Shannon entropy, and $\varepsilon > 0$ is the temperature. Because the mapping $w \mapsto D_{\mathrm{KL}}(p \| q)$ is convex and $-\varepsilon H(w)$ is strictly convex, GateFrame is a strongly convex program with a guaranteed unique solution.
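As a concrete illustration, the objective can be instantiated on a small discrete problem. The sketch below uses toy numbers not taken from the paper: a library of primitive action distributions, the entropy-regularized KL objective over the mixing weights, and a numerical midpoint-convexity check.

```python
# Toy instantiation of the GateFrame objective on a discrete action space.
# Symbols follow the text: primitives pi_alpha, simplex weights w,
# generative target q, temperature eps. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 5                              # n primitives, m discrete actions
pi = rng.dirichlet(np.ones(m), size=n)   # library of primitive policies
q = rng.dirichlet(np.ones(m))            # desired action distribution
eps = 0.1                                # entropy temperature

def objective(w):
    """Entropy-regularized KL: D_KL(pi_w || q) - eps * H(w)."""
    p = w @ pi                           # induced mixture over actions
    kl = np.sum(p * np.log(p / q))
    ent = -np.sum(w * np.log(w))
    return kl - eps * ent

w1 = np.array([0.7, 0.2, 0.1])
w2 = np.array([0.1, 0.3, 0.6])
mid = 0.5 * (w1 + w2)
# Strong convexity implies a strict midpoint inequality.
assert objective(mid) < 0.5 * (objective(w1) + objective(w2))
```

The strict inequality at the midpoint is the numerical fingerprint of the strong convexity that guarantees a unique optimum.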

2. Free-Energy Decomposition and Objective Structure

A standard KL decomposition of GateFrame's objective, given an exponentially tilted $q$ with respect to a cost $c(x_k, u_k)$, i.e.

$$q(x_k, u_k \mid x_{k-1}) = q(x_k \mid x_{k-1}) \, \frac{1}{Z(x_{k-1})} \, e^{-c(x_k, u_k)},$$

yields

$$D_{\mathrm{KL}}[p \,\|\, q] = D_{\mathrm{KL}}\!\left[\pi_k(\cdot \mid x_{k-1}) \,\Big\|\, \tfrac{1}{Z}\exp(-c)\right] + \mathbb{E}_p\left[c(X_k, U_k)\right] + \ln Z,$$

where $\ln Z$ is constant with respect to $w$. The optimization thus reduces to minimizing

$$F(w) - \varepsilon H(w),$$

with

$$F(w) = D_{\mathrm{KL}}\!\left[\pi_k(\cdot \mid x_{k-1}) \,\Big\|\, \tfrac{1}{Z}\exp(-c)\right] + \mathbb{E}_p\left[c(X_k, U_k)\right].$$

GateFrame thus realizes an entropy-regularized free energy minimization that simultaneously penalizes expected cost and divergence from a generative prior, connecting frameworks such as active inference, maximum-entropy RL, and KL-control under a common principle.

3. Closed-Form Softmax Gating Solution

GateFrame admits a closed-form solution for the optimal gating, derived via Lagrangian methods on the simplex. Setting the stationarity conditions for $F(w) - \varepsilon H(w)$ and solving leads to

$$w^{\star\,\alpha} = \frac{\exp\left[-\frac{1}{\varepsilon}\nabla_\alpha F(w^\star)\right]}{\sum_{\beta=1}^n \exp\left[-\frac{1}{\varepsilon}\nabla_\beta F(w^\star)\right]}$$

for each primitive $\alpha$. The logit scores $-\frac{1}{\varepsilon}\nabla_\alpha F(w^\star)$ encode, for each primitive, the marginal gain in free energy from adjusting its weight, rendering the solution interpretable as softmax policy arbitration. As $\varepsilon \to 0$, the solution reduces to hard argmax selection; as $\varepsilon \to \infty$, the weights approach uniform mixing.
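Because $w^\star$ appears on both sides of the softmax rule, it is a self-consistency condition rather than an explicit formula, and it can be solved by fixed-point iteration. The sketch below uses an illustrative discrete problem; the smoothing of the primitives, the damping factor, and the moderate temperature are choices made here to keep the iteration well behaved, not prescriptions from the paper.

```python
# Damped fixed-point iteration for w = softmax(-grad F(w) / eps),
# with F(w) = D_KL(pi_w || q) on a toy discrete action space.
# grad_alpha F = sum_u pi_alpha(u) * ln(p(u)/q(u)) up to an additive
# constant, which cancels inside the softmax. Illustrative numbers.
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
# Smooth primitives and target toward uniform to keep logs well scaled.
pi = 0.5 * rng.dirichlet(np.ones(m), size=n) + 0.5 / m
q = 0.5 * rng.dirichlet(np.ones(m)) + 0.5 / m
eps = 3.0                                 # moderate temperature

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_F(w):
    return pi @ np.log((w @ pi) / q)      # shape (n,), one score per primitive

w = np.full(n, 1.0 / n)
for _ in range(3000):
    w = 0.7 * w + 0.3 * softmax(-grad_F(w) / eps)   # damped update

# The solution lies in the simplex and satisfies the softmax condition.
assert np.all(w > 0) and abs(w.sum() - 1.0) < 1e-9
assert np.allclose(w, softmax(-grad_F(w) / eps), atol=1e-6)
```

The final assertion verifies the stationarity condition from the closed-form rule: the converged weights reproduce themselves under the softmax map.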

4. Influence of Task Structure on Gating

The form of $F(w)$ embeds all task-specific structure, determined by:

  • the generative model $q(x_k, u_k \mid x_{k-1})$,
  • the cost function $c(x_k, u_k)$ or any learned dynamics biases,
  • the environment transition kernel $q(x_k \mid x_{k-1})$.

Practically, the gradient $\nabla_\alpha F(w)$ involves computing expected log-likelihoods and costs under each primitive:

$$\nabla_\alpha F(w) = \mathbb{E}_{\pi_\alpha}\left[\ln \pi_k + c\right] + \text{(divergence terms)}.$$

These context-dependent terms ensure that the gating weights adapt online to both environmental conditions and the demands of the task at hand, producing flexible, interpretable policy selection according to moment-to-moment utility and generative fit.
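This context dependence can be made concrete in a toy computation. The sketch below (hypothetical primitives and targets, not from the paper) re-solves the gating fixed point under two different generative targets $q$ and shows the dominant weight shifting to the primitive that best matches each target.

```python
# Context dependence of gating: solving w = softmax(-grad F(w)/eps)
# under two different generative targets q shifts weight toward the
# primitive best matched to each target. Toy setup; numbers illustrative.
import numpy as np

# Three primitives, each concentrating on a different action.
pi = np.array([
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
])
eps = 1.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def solve(q, iters=2000, damp=0.3):
    """Damped fixed-point iteration for the gating weights under target q."""
    w = np.full(len(pi), 1.0 / len(pi))
    for _ in range(iters):
        g = pi @ np.log((w @ pi) / q)   # grad of KL(pi_w || q), up to a constant
        w = (1 - damp) * w + damp * softmax(-g / eps)
    return w

q_a = np.array([0.7, 0.1, 0.1, 0.1])    # task favors action 0
q_b = np.array([0.1, 0.1, 0.7, 0.1])    # task favors action 2

assert np.argmax(solve(q_a)) == 0       # primitive 0 dominates under q_a
assert np.argmax(solve(q_b)) == 2       # primitive 2 dominates under q_b
```

Nothing in the gating machinery changes between the two calls; only the task structure carried by $q$ does, which is exactly the adaptivity described above.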

5. Normative Properties and Theoretical Guarantees

GateFrame exhibits several notable normative features:

  • Strong convexity and uniqueness: The entropy term ensures a unique optimum $w^\star$ on $\Delta^n$.
  • Principled optimality: Solutions minimize a well-motivated free-energy functional unifying multiple frameworks in decision theory.
  • Interpretability: Gating logits measure each primitive’s mismatch to the task-driven generative model or expected costs.
  • Continuity: Varying $\varepsilon$ interpolates between hard selection and equivocal softmax arbitration.
  • Framework generality: Any policy set $\{\pi_\alpha\}$, cost structure, or environment model can be accommodated, making GateFrame broadly applicable across neuroscience, cognition, and engineered controllers.
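The continuity property is easiest to see in the special case where $\nabla F$ is a constant vector $f$ of per-primitive scores, so the softmax rule becomes an explicit formula. A minimal sketch with illustrative scores:

```python
# Temperature interpolation in the linear special case F(w) = <w, f>,
# where the softmax gating rule is exact. Scores f are illustrative.
import numpy as np

f = np.array([2.0, 0.5, 1.0])   # per-primitive free-energy scores

def gate(eps):
    z = -f / eps
    e = np.exp(z - z.max())     # shift for numerical stability
    return e / e.sum()

w_cold = gate(1e-3)             # eps -> 0: hard argmax selection
w_hot = gate(1e3)               # eps -> inf: near-uniform mixing

assert np.argmax(w_cold) == np.argmin(f) and w_cold.max() > 0.999
assert np.allclose(w_hot, 1.0 / 3.0, atol=1e-3)
```

At low temperature essentially all mass lands on the lowest-score primitive; at high temperature the weights flatten toward the uniform distribution, matching the $\varepsilon \to 0$ and $\varepsilon \to \infty$ limits stated above.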

6. Connections to GateFlow and Neural Realization via GateNet

GateFrame provides the normative foundation for two subsequent realizations in GateMod:

  • GateFlow is a continuous-time proximal-gradient ODE,

$$\tau \, \dot w = -w + \mathrm{softmax}\!\left(-\frac{1}{\varepsilon}\nabla F(w)\right),$$

whose unique, globally exponentially stable equilibrium is the GateFrame solution $w^\star$. Its vector field keeps trajectories within $\Delta^n$ and ensures strict, monotonic decrease of the cost functional at rate $1/\tau$.

  • GateNet implements GateFlow as a biologically plausible recurrent circuit. The network comprises two modules: a fast stage computing $\nabla F(w)$ using local, contextual (Sigma-Pi) computations with log/linear activations, and a slow stage performing softmax normalization via exponentiation followed by normalization. All neurons obey nonnegativity constraints (interpretable as firing rates) and exchange only local information.
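The GateFlow dynamics can be simulated directly. The sketch below integrates the ODE with forward Euler on an illustrative discrete problem (the step size, temperature, and smoothing of the distributions are choices made here, not values from the paper) and confirms that the trajectory stays on the simplex and settles at the softmax fixed point.

```python
# Forward-Euler simulation of the GateFlow ODE
#   tau * dw/dt = -w + softmax(-grad F(w) / eps),
# with F(w) = D_KL(pi_w || q) on a toy discrete action space.
# Each Euler step is a convex combination of simplex points, so the
# trajectory remains on the simplex exactly. Illustrative numbers.
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5
pi = 0.5 * rng.dirichlet(np.ones(m), size=n) + 0.5 / m   # smoothed primitives
q = 0.5 * rng.dirichlet(np.ones(m)) + 0.5 / m            # smoothed target
eps, tau, dt = 3.0, 1.0, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_F(w):
    return pi @ np.log((w @ pi) / q)    # grad F up to a constant shift

w = np.full(n, 1.0 / n)
for _ in range(3000):
    w = w + (dt / tau) * (-w + softmax(-grad_F(w) / eps))

# Trajectory stayed on the simplex and reached the GateFrame equilibrium.
assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-9
assert np.allclose(w, softmax(-grad_F(w) / eps), atol=1e-6)
```

The equilibrium check at the end is the ODE's stationarity condition, which coincides with the closed-form softmax gating rule of Section 3.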

This succession—GateFrame (normative), GateFlow (algorithmic), and GateNet (mechanistic)—establishes a rigorous pipeline from free-energy-based gating objectives down to dynamical and neural implementations, supporting both interpretability and cross-domain applicability (Rossi et al., 4 Dec 2025).
