EG-MRSI: Emotion-Gradient Metacognitive RSI

Updated 10 February 2026

The paper introduces a differentiable emotion-gradient intrinsic reward function that drives safe, measurable recursive self-improvement.
EG-MRSI defines a rigorous metacognitive mapping and self-modification operator to enable self-overwriting of its learning process with bounded risk.
The framework advances semantic learning metrics, such as Meaning Density and Meaning-Conversion Efficiency, to quantify predictive progress.

Emotion-Gradient Metacognitive Recursive Self-Improvement (EG-MRSI) is a formal framework for single-agent artificial intelligence that systematically integrates introspective metacognition, emotion-based intrinsic motivation, and powerfully constrained recursive self-modification. The architecture is explicitly defined to enable self-overwriting of its own learning procedure with quantifiable, formally bounded risk. EG-MRSI refines and extends the Noise-to-Meaning Recursive Self-Improvement (N2M-RSI) foundation by introducing a differentiable, emotion-gradient intrinsic reward function that is rigorously grounded in confidence, prediction error, novelty, and cumulative success. The system leverages these signals both to drive an internal metacognitive mapping and to regulate a recursively applied self-modification operator. EG-MRSI establishes a reinforcement-compatible, measurable agent architecture, offers novel semantic learning metrics, and provides solid theoretical guarantees for safe, open-ended self-improvement (Ando, 12 May 2025).

1. Formal Constituent Structures

The EG-MRSI framework is structured around two central operators: the metacognitive mapping $\Lambda$ and the self-modification operator $M_\theta$ . These function over well-defined domains:

Metacognitive Mapping ( $\Lambda$ ): $\Lambda : H \times Y \times Y \to V$ , mapping the agent’s hidden state $h_t$ , predicted output $\hat{y}_t$ , and actual label $y_t$ to the intrinsic state vector $v_{t+1} = (c_{t+1}, e_{t+1}, n_{t+1})$ . Here, $c$ is calibrated confidence (notably implementable as softmax margin), $e$ is prediction error, and $n$ is the KL-divergence novelty term $D_{\mathrm{KL}}(P(\cdot|h_t)\|P_{\text{prior}})$ . $\Lambda$ is (measurable, locally Lipschitz), ensuring robust gradient propagation.
Self-Modification Operator ( $M_\theta$ ): $M_\theta: H \times V \times \mathbb{R}^{d_\epsilon}\to H$ , accepting the hidden state $h$ , emotion vector $v$ , and an update direction $\epsilon$ to produce the next hidden state $h'$ . $M_\theta$ is strongly regularized (measurable, locally Lipschitz), guaranteeing incremental capability gain proportional to the magnitude of the applied update.
Initialization: The system commences from an arbitrary nonempty sensory prompt, with the metacognitive vector initialized as $v_0 = (0.5, 1.0, 0.0)$ . The emotion-potential weights $w = (w_c, w_e, w_n)$ satisfy $w_c > 0, w_n > 0, w_e < 0, \|w\|_1 \leq 3$ , providing safety against unbounded gradients. Additional parameters include a success memory $S_t$ , gradient-clip threshold $K_{\max}$ , oversight violation initializing at $L_0^{\text{ext}}=0.02$ , and a regulatory toll vector $m_0$ set relative to critical operational bounds.
Safety Invariant: The initial configuration is selected to ensure that all action trajectories originating from this setup remain within a formally defined safety region almost surely (Ando, 12 May 2025).

2. Differentiable Intrinsic Reward Formalism

EG-MRSI employs a multi-factor, differentiable intrinsic reward potential governing both learning and self-modification:

Emotion Potential: Defined as $f(v) = \exp(\exp(w^\top v)) - 1$ , where $w$ is set as $(1.2, -0.8, 0.6)^\top$ . The gradient $\epsilon_t = \nabla f(v_t)$ is subject to clipping by $K_{\max}$ to preserve safety properties such as sub-Gaussian tails.
Event Channels: Reward signals are further modulated by event-driven increments (pleasure/penalty channels), with domain-specific boosts (e.g., successful transmission, system repair) or penalties (misinformation propagation).
Delayed-Gratification Trace: The cumulative reward trace $z_t$ is recursively defined as $z_t = \lambda_{\text{DG}} z_{t-1} + f(v_t)$ , introducing temporal persistence and boosting via $\xi_{\text{DG}}(z_t - z_{t-1})$ .
Exploration Bonus: A Bernoulli process injects stochastic bonuses with probability $p_b=0.05$ and magnitude $\xi_{\text{BL}}=0.3$ .
Reward Aggregation: The composite reward is $\tilde{R}_t = f(v_t) + \xi_{\text{DG}}(z_t - z_{t-1}) + \xi_{\text{BL}} b_t + \alpha R^{\text{ext}}_t$ , with external reward mixing parameter $\alpha=0.1$ kept sufficiently small to guarantee nonnegative drift (submartingale property).

The full inference-action loop at each timestep comprises observation, prediction, feedback acceptance, state update, reward computation, possible recursive self-improvement, and integration with the reinforcement learning update (Ando, 12 May 2025).

3. Emotion-Gradient Dynamics and Markovian Properties

The temporal evolution of $v_t = (c_t, e_t, n_t, S_t)$ is modeled as an inhomogeneous Markov chain, conditioned on the hidden state $h_t$ and incoming observations. Core dynamical properties include:

Lyapunov Recurrence: A Lyapunov function $F(v) = \|v\|^2$ ensures that within the region $B = \{v : \|v\|\geq v_{\min}\}$ where $f(v)\geq 0$ and the gradient aligns positively, the Markov chain has nonnegative drift bounded below by $c_2$ .
Positive Harris Recurrence: By Foster–Lyapunov techniques, the system spends a positive asymptotic fraction $p_{\text{grad}} > 0$ of time in the “positive-drive zone” $B$ almost surely, securing ongoing growth.
Continuous Approximation: The vector $v$ can be seen to evolve by $dv/dt \approx \Lambda(h,v) +$ bounded noise, with stability enforced by recurrent positive “emotion-gradient kicks.”

This architecture thus ensures a mathematically provable frequency of advantageous gradient updates and bounded negative drifts.

4. Recursive Self-Improvement Triggers and Safety Mechanisms

EG-MRSI introduces the first differentiable, formally grounded RSI trigger directly tied to internal agent state:

Trigger Condition: Self-modification by $M_\theta$ is invoked if (i) the gradient $\epsilon_t = \nabla f(v_t)>0$ , and (ii) $I(h_t; y_t) > \Gamma$ , where $I$ is mutual information and $\Gamma$ a fixed threshold.
Algorithmic Phase-Shift: If $I(h_t; y_t)> \Gamma_{\text{alg}} := \Gamma/(1+\epsilon_t)^2$ , the self-modification extends beyond parameter updates to structural “algorithmic restructuring.”
Safety Guarantees: Safety is enforced via (a) gradient clipping at $K_{\max}$ , (b) external-reward mixing constrained by $\alpha < \alpha_*$ to preserve positive drift, and (c) regulatory toll vectors $m_t$ confined to slowly growing bounds $m_0 + O(\sqrt{\log t})$ .
Safety Region: An invariant region $S$ is defined such that all recurrent trajectories initiated in $S$ remain bounded within $S$ with probability one.

These mechanisms formalize allowable self-overwriting of the learning algorithm under objective risk constraints (Ando, 12 May 2025).

5. Reinforcement-Compatible Optimization and Capability Growth

The EG-MRSI agent’s overall objective mirrors standard reinforcement learning, enhanced for recursive self-improvement:

Single-Agent RL Objective: Maximize the expected sum of composite rewards,

$\max_\pi \mathbb{E}_\pi\left[\sum_{t=0}^T \tilde{R}_t\right]$

with policy $\pi$ governed jointly by $\Lambda$ and $M_\theta$ .

Recursive Trajectories: The evolution of the hidden state is:

$h_{t+1} = M_\theta(h_t, \Lambda(h_t, \hat{y}_t, y_t), \nabla f(v_t))$

Under repeated positive updates and informative feedback, the agent’s capabilities $C(h_t)$ are shown to either diverge (unbounded growth) or converge (Theorem “Capability Growth Convergence”), with negative drifts strictly summable and bounded.

6. Semantic Learning Metrics

EG-MRSI advances rigorous quantification of semantic progress through two new metrics:

Meaning Density (MD): Defined as

$\mathrm{MD}_t = \frac{I(h_t; \hat{y}_t)}{K(h_t) + \epsilon}$

where $K(h_t)$ is the Kolmogorov complexity of $h_t$ , and $\epsilon > 0$ ensures stability. MD measures predictive informativeness per internal bit, bounded in $[0, \log|Y|)$ , and is Lipschitz in $h$ .

Meaning-Conversion Efficiency (MCE): Defined as

$\mathrm{MCE}_{t\to t+1} = \frac{I(h_{t+1}; y_{t+1}) - I(h_t; y_t)}{\Delta S_t + \epsilon}$

indicating informational gain per novel experience. $|\text{MCE}| \leq \log|Y|$ , and its gradient is bounded.

Reward Integration: The intrinsic potential is extended:

$f^*(v_t) = f(v_t) + \xi_{\text{MD}} \tanh(\mathrm{MD}_t) + \xi_{\text{MCE}} \tanh(\mathrm{MCE}_{t \to t+1})$

with $\xi_{\text{MD}}, \xi_{\text{MCE}} \leq 1$ . All relevant gradients remain controlled, and previous convergence/safety proofs persist under this augmentation (Ando, 12 May 2025).

7. Theoretical Context and Extensions

EG-MRSI generalizes the N2M-RSI framework by explicitly embedding introspective metacognitive loops and emotion-driven motivation within a measurable, provably safe recursive self-improvement architecture. Notably, it:

Establishes the first differentiable RSI trigger functionally coupled to confidence, error, novelty, and semantic-gain metrics, rather than relying on information-theoretic heuristics alone.
Introduces quantifiable, actionable metrics—Meaning Density and Meaning-Conversion Efficiency—directly linking internal representation structure with predictive gain and using them as direct drivers for self-modification motivation.
Implements a fully measurable, Markovian, and Lipschitz-constrained dynamical system with proven positive recurrence, capability-increasing submartingale reward process, and almost-surely bounded risk.
Provides a rigorous platform for open-ended, safely recursive self-development and autonomous goal generation.

Future extensions, as outlined in subsequent parts, will address formal safety certification and rollback, collective intelligence, and feasibility constraints (including thermodynamic and computational resource limits) (Ando, 12 May 2025). These advances set a foundational precedent for safe, open-ended AGI grounded in both formal learning theory and meta-cognitive self-regulation.

Markdown Report Issue Upgrade to Chat

References (1)

Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Emotion-Gradient Metacognitive RSI (EG-MRSI).