Discrete Active Predictive Coding (ActPC)

Updated 1 March 2026
  • Discrete Active Predictive Coding is a computational paradigm that fuses prediction error minimization with discrete decision-making, reinforcement learning, and symbolic reasoning.
  • It employs hierarchical, multiscale processing by integrating continuous sensory inputs with discrete symbolic embeddings to achieve real-time perception and planning.
  • The framework leverages intrinsic and extrinsic rewards along with advanced metrics like the Wasserstein distance to ensure robust, interpretable learning.

Discrete Active Predictive Coding (ActPC) is a computational paradigm that generalizes traditional predictive coding by integrating discrete decision-making, reinforcement learning, hierarchical generative modeling, and symbolic reasoning. In ActPC, both models and data may be represented as discrete structures such as rewrite rules, parsing options, or symbolic embeddings, and learning is driven by the minimization of prediction error, modulated by intrinsic and extrinsic rewards. This hybrid neural-symbolic framework supports the construction of interpretable, compositional, and cognitively plausible agents capable of real-time perception, planning, and reasoning in complex, hierarchical environments (Gklezakos et al., 2022, Goertzel, 2024, Rao et al., 2022, Goertzel, 8 Jan 2025, Ororbia et al., 2022).

1. Mathematical Foundations and Model Formulation

At its core, ActPC extends the predictive coding principle—where the system seeks to minimize discrepancy between actual inputs and predictions—by operationalizing prediction error both for continuous neural activations and for discrete structures.

Generative and Predictive Coding Model

Let $I$ denote a global observation (e.g., an image), with a sequence of local "glimpses" $g_{t,\tau} = G(I^{(1)}_t, l_{t,\tau}, m)$, where $I^{(1)}_t$ is a sub-region defined by a top-level macro-decision at time $t$, and $l_{t,\tau}$ are coordinates (often discretized) within intrinsic reference frames. Latent variables $z_t \in \{1,\dots,K\}$ select which part-parse or program is executed at each macro-step, with each $z_t$ corresponding to a discrete node in a parse tree or program hierarchy.

The generative (predictive-coding) model takes the form $p(g_{t,\tau} \mid r_{t,\tau}, L_t, l_{t,\tau}) = \mathcal{N}(\hat g_{t,\tau}, \sigma^2 I)$, where $\hat g_{t,\tau} = D(r_{t,\tau}, L_t, l_{t,\tau})$ is the decoder's prediction (Gklezakos et al., 2022, Rao et al., 2022).

The predictive loss, over macro-steps $t$ and micro-steps $\tau$, is $L_\mathrm{pred} = \sum_{t=1}^{T_2}\sum_{\tau=1}^{T_1} \lVert g_{t,\tau} - \hat g_{t,\tau} \rVert_2^2$.
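
To make the double summation concrete, here is a minimal numpy sketch of how $L_\mathrm{pred}$ accumulates over macro- and micro-steps; the decoder $D$, states, frames, and glimpses are toy placeholders, not the trained networks of the cited papers.

```python
# Minimal numpy sketch of the glimpse-level predictive loss L_pred.
# The decoder D, states, frames, and glimpses are hypothetical toy
# placeholders, not the trained networks of the cited papers.
import numpy as np

rng = np.random.default_rng(0)
T2, T1, glimpse_dim, state_dim = 3, 4, 16, 8
W_dec = rng.normal(size=(state_dim, glimpse_dim)) * 0.1

def D(r, L, l):
    """Toy decoder: predict a glimpse from state r, frame L, location l."""
    return np.tanh(r @ W_dec + L + l)

L_pred = 0.0
for t in range(T2):                                # macro-steps: part/frame choices
    L_t = rng.normal(size=glimpse_dim) * 0.1       # reference-frame embedding L_t
    for tau in range(T1):                          # micro-steps: glimpses in frame
        r = rng.normal(size=state_dim)             # lower-level state r_{t,tau}
        l = rng.normal(size=glimpse_dim) * 0.1     # location embedding l_{t,tau}
        g = rng.normal(size=glimpse_dim)           # observed glimpse g_{t,tau}
        g_hat = D(r, L_t, l)                       # predicted glimpse
        L_pred += np.sum((g - g_hat) ** 2)         # squared prediction error

print(f"L_pred = {L_pred:.3f}")
```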

For discrete symbolic ActPC (as in ActPC-Chem), the basic computational element is a set of rewrite rules $R = \{r_i\}$, each $r_i$ described as a subgraph pattern transformation $r_i: P_i^{(\mathrm{in})} \to P_i^{(\mathrm{out})}$. Prediction error is defined on outcome distributions over metagraph patterns: $e_t = D_{\mathrm{KL}}(q_t \,\|\, p_t) = \sum_m q_t(m)\left[\ln q_t(m) - \ln p_t(m)\right]$, where $q_t$ is the empirical distribution derived from the observed output graph, and $p_t$ the distribution implied by the generative rules (Goertzel, 2024).
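
For illustration, this discrete prediction error can be computed directly from pattern counts; the pattern names, counts, and rule probabilities below are invented for the example.

```python
# Toy computation of the discrete prediction error e_t = KL(q_t || p_t)
# over metagraph patterns; pattern names and counts are invented.
import numpy as np

patterns = ["A->B", "B->C", "A->C"]
q_counts = np.array([12.0, 5.0, 3.0])        # observed pattern frequencies
q = q_counts / q_counts.sum()                # empirical distribution q_t
p = np.array([0.5, 0.3, 0.2])                # distribution implied by the rules

e_t = np.sum(q * (np.log(q) - np.log(p)))    # KL divergence in nats
print(f"e_t = {e_t:.4f}")
```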

2. Hierarchical Structure, Discrete Actions, and Policy Mechanisms

ActPC employs a hierarchical arrangement of latent variables and policy networks, enabling multilevel parsing, planning, and control. Macro-steps $t$ correspond to high-level decisions (e.g., selection of object part, reference frame, or rewrite program), while micro-steps $\tau$ execute fine-grained actions within the chosen scope.

At each macro-step:

  • The agent samples a discrete option $z_t$ and a reference-frame location $L_t$.
  • Hypernetworks generate weights for lower-level recurrent networks parameterized by top-level embeddings, enabling dynamic instantiation of context-sensitive sub-programs (Gklezakos et al., 2022, Rao et al., 2022).

At the micro-step level, discrete or categorical actions $a_{t,\tau}$ are sampled according to policies: $\pi_\theta(a \mid s) = \pi_{\theta_a}(z_t \mid A_t)\,\pi_{\theta_L}(L_t \mid A_t)\,\prod_{\tau=1}^{T_1} \pi_{\theta_\ell}(l_{t,\tau} \mid a_{t,\tau})$

Parameters governing discrete decisions are updated via REINFORCE or (for Gumbel-Softmax-relaxed variants) by straight-through gradient estimators (Gklezakos et al., 2022, Rao et al., 2022).
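
A hedged PyTorch sketch of both update routes follows: (a) a REINFORCE score-function gradient for a sampled categorical option, and (b) a straight-through Gumbel-Softmax relaxation. The logits, reward, and downstream objective are toy placeholders.

```python
# Hedged PyTorch sketch of the two discrete-update routes.
# Logits, reward, and the downstream objective are toy placeholders.
import torch
import torch.nn.functional as F

logits = torch.zeros(5, requires_grad=True)        # scores over K=5 options

# (a) REINFORCE: score-function gradient on a sampled categorical option.
dist = torch.distributions.Categorical(logits=logits)
z = dist.sample()
reward = 1.0                                       # placeholder return
(-reward * dist.log_prob(z)).backward()
print("REINFORCE grad:", logits.grad)

# (b) Straight-through Gumbel-Softmax: hard one-hot forward pass,
# soft (relaxed) gradient on the backward pass.
logits.grad = None
z_onehot = F.gumbel_softmax(logits, tau=1.0, hard=True)
downstream = (z_onehot * torch.arange(5.0)).sum()  # toy differentiable objective
downstream.backward()
print("Gumbel-ST grad:", logits.grad)
```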

3. Reinforcement Learning and Intrinsic/Extrinsic Motivation

ActPC systems integrate predictive coding objectives with RL-style reward maximization, where rewards may include:

  • Epistemic (intrinsic) rewards: Proportional to normalized prediction error, encouraging exploration of unfamiliar or surprising states. The typical formalization is:

$r_t^{\text{ep}} = \sum_{\ell=0}^{L-1}\lVert e^\ell \rVert^2$

  • Instrumental (extrinsic) rewards: Penalties or bonuses for task-relevant outcomes, often defined as negative prediction error or environmental feedback.
  • Combined reward: $r_t = \alpha_{\text{ep}} r_t^{\text{ep}} + \alpha_{\text{in}} r_t^{\text{in}} + r_t^{\text{env}}$

The total learning objective combines prediction error minimization and accumulated rewards: $L_{\text{total}} = L_\mathrm{state} + L_\mathrm{action}$, where $L_\mathrm{action}$ absorbs REINFORCE gradients and the (possibly task-specific) per-step rewards (Gklezakos et al., 2022, Ororbia et al., 2022, Rao et al., 2022).
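
A toy computation of the combined reward $r_t$, with placeholder layer errors, mixing weights, and environment reward:

```python
# Toy computation of the combined reward r_t; layer errors, weights,
# and the environment reward are placeholders.
import numpy as np

layer_errors = [np.array([0.2, -0.1]), np.array([0.05, 0.3])]  # e^l per layer
r_ep = sum(float(np.sum(e ** 2)) for e in layer_errors)        # epistemic term
r_in = -0.4          # instrumental term, e.g. negative goal prediction error
r_env = 1.0          # sparse environment reward

alpha_ep, alpha_in = 0.5, 1.0
r_t = alpha_ep * r_ep + alpha_in * r_in + r_env
print(f"r_ep = {r_ep:.4f}, r_t = {r_t:.4f}")
```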

Eligibility traces and credit assignment are handled as in RL, with eligibility-trace-style updates in the discrete rule case (Goertzel, 2024).
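
A minimal sketch of an eligibility-trace-style update for rule weights, assuming a conventional TD($\lambda$)-like decay; the exact scheme used in ActPC-Chem may differ:

```python
# Hypothetical eligibility-trace update for discrete rule weights:
# decay the trace, mark rules that fired, move weights along the trace.
import numpy as np

rng = np.random.default_rng(3)
n_rules = 4
w = np.zeros(n_rules)              # rule utility weights
trace = np.zeros(n_rules)          # eligibility trace per rule
gamma, lam, alpha = 0.95, 0.9, 0.1

for step in range(10):
    fired = rng.random(n_rules) < 0.3      # which rules fired this step (toy)
    trace = gamma * lam * trace + fired    # decay, then mark active rules
    delta = rng.standard_normal()          # prediction-error signal (toy)
    w += alpha * delta * trace             # credit flows along the trace

print("rule weights:", np.round(w, 3))
```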

4. Symbolic Embedding, Information Geometry, and Neural–Symbolic Integration

Advances in ActPC such as ActPC-Geom replace the KL divergence in predictive coding with the Wasserstein metric for model robustness and geometry-awareness. For a predictive distribution $\mu$ and target $q$, the predictive coding loss is $\mathcal{L}_{\mathrm{PC}}(\mu) = W_2(q, \mu)$.
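
For intuition, $W_2$ between two one-dimensional empirical distributions with equal-size, uniformly weighted samples reduces to matching sorted samples, as in this toy sketch; the measure-space geometry used in ActPC-Geom is far more general.

```python
# Toy W2 between two 1-D empirical distributions: with equal-size,
# uniformly weighted samples, W2 reduces to matching sorted samples.
import numpy as np

rng = np.random.default_rng(1)
q_samples = rng.normal(loc=0.0, scale=1.0, size=1000)    # target q
mu_samples = rng.normal(loc=0.5, scale=1.2, size=1000)   # prediction mu

w2_sq = np.mean((np.sort(q_samples) - np.sort(mu_samples)) ** 2)
print(f"W2(q, mu) ~ {np.sqrt(w2_sq):.4f}")
```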

A measure-dependent graph Laplacian $L(\mu)$ and its pseudoinverse are used to construct natural-gradient-type updates: $\theta_{t+1} = \theta_t - \eta\, G(\theta_t)^{-1} \nabla_\theta \mathcal{L}_{\mathrm{PC}}$, where $G(\theta)$ is the natural metric tensor in parameter space (Goertzel, 8 Jan 2025).
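
A schematic natural-gradient step is shown below; the metric tensor here is a toy positive-definite stand-in, not the Laplacian-derived $G(\theta)$ of ActPC-Geom.

```python
# Schematic natural-gradient step theta <- theta - eta * G^{-1} grad.
# G is a toy positive-definite stand-in; the Laplacian-derived metric
# of ActPC-Geom is not reproduced here.
import numpy as np

theta = np.array([1.0, -0.5, 0.3])
grad = np.array([0.4, 0.1, -0.2])     # gradient of the PC loss (toy)
G = np.diag([2.0, 1.0, 0.5])          # stand-in metric tensor

eta = 0.1
theta = theta - eta * np.linalg.pinv(G) @ grad   # pinv tolerates rank deficiency
print("theta:", theta)
```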

ActPC-Geom bridges continuous and discrete domains using low-rank kernel-PCA embeddings and high-dimensional hypervector algebra. Hypervector embeddings support compositional reasoning, allowing binding, bundling, and permutation to encode and operate on symbolic and subsymbolic representations (Goertzel, 8 Jan 2025, Goertzel, 2024). Concept hierarchies are distilled via fuzzy FCA lattices, facilitating ontology construction and concept-level aggregation.
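
As a toy rendering of these operations over random bipolar hypervectors (the papers' exact encodings may differ):

```python
# Toy bipolar hypervector algebra: binding (elementwise product),
# bundling (majority sum), permutation (cyclic shift).
import numpy as np

rng = np.random.default_rng(2)
D = 10_000
hv = lambda: rng.choice([-1, 1], size=D)   # random bipolar hypervector
cos = lambda a, b: a @ b / D               # cosine similarity for +/-1 vectors

role, filler, other = hv(), hv(), hv()
bound = role * filler                      # binding: dissimilar to both inputs
bundle = np.sign(bound + other + hv())     # bundling: similar to each part
shifted = np.roll(role, 1)                 # permutation: encodes position/order

print("filler recovered:", np.array_equal(bound * role, filler))  # exact unbind
print(f"sim(bundle, other)     = {cos(bundle, other):.3f}")   # high: in bundle
print(f"sim(bundle, role)      = {cos(bundle, role):.3f}")    # ~0: not bundled
print(f"sim(role, roll(role))  = {cos(role, shifted):.3f}")   # ~0: shifted away
```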

Neural–symbolic integration is realized by:

  • Mapping both symbolic rule expansions and continuous state updates into a common embedding (hypervector or Wasserstein) space.
  • Aggregating and pruning candidate actions via shared ground-metric Wasserstein distances.
  • Using discrete modules (e.g., ActPC-Chem, AIRIS, PLN) to generate chain-of-thought steps, which are then validated or composed by continuous PC layers (Goertzel, 8 Jan 2025, Goertzel, 2024).

5. Learning Algorithms and Training Procedures

Training in ActPC proceeds by optimizing a joint loss that incorporates self-supervised predictive coding and reinforcement learning signals, both in continuous and discrete spaces. Key elements include:

  • Backpropagation through time for all differentiable state networks and hypernetworks.
  • Policy gradient estimators (e.g., REINFORCE, Gumbel-Softmax) for discrete policies governing part selection, location choice, and rule application.
  • Hebbian-like updates in some ActPC variants (e.g., NGC/ActPC in (Ororbia et al., 2022)) where learning is entirely local and backpropagation-free.
  • Discrete natural gradient updates leveraging Wasserstein Riemannian geometry for accelerated learning in rule-based/symbolic modules (Goertzel, 8 Jan 2025, Goertzel, 2024).

A canonical training loop consists of: episodic data collection driven by current policies, accumulation of prediction errors and rewards, local (neural or symbolic) parameter updates, and interleaved replay or self-imitation for credit assignment.
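
Schematically, in Python, with every component (environment, policy, model, update rules) stubbed out to show control flow only:

```python
# Control-flow-only stub of the canonical loop; environment, policy,
# model, and update rules are all hypothetical placeholders.
import random

def run_episode(policy):
    """Collect (observation, action, prediction_error, reward) tuples."""
    return [(None, policy(None), random.random(), random.random())
            for _ in range(10)]

def local_update(model, transitions):
    """Local (neural or symbolic) parameter update from errors and rewards."""
    pass  # e.g. a Hebbian-like or natural-gradient step per module

replay_buffer, model = [], object()
policy = lambda obs: 0                         # placeholder discrete policy

for episode in range(100):
    transitions = run_episode(policy)          # 1. policy-driven data collection
    replay_buffer.extend(transitions)          # 2. accumulate errors and rewards
    local_update(model, transitions)           # 3. local parameter updates
    if episode % 10 == 0:                      # 4. interleaved replay/self-imitation
        batch = random.sample(replay_buffer, min(10, len(replay_buffer)))
        local_update(model, batch)
```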

6. Applications, Empirical Results, and System Properties

Hierarchical Perception and Parse-Tree Induction

Two-level ActPC models ("APCN-2") demonstrate the ability to dynamically parse visual scenes into part-whole hierarchies, learn compositional representations, and transfer to unseen object classes (Gklezakos et al., 2022, Rao et al., 2022). Class-specific parsing strategies are automatically discovered.

Experimental results on MNIST, Fashion-MNIST, and Omniglot datasets show superior reconstruction MSE for APCN-2 relative to single-level models and random baselines. For instance, APCN-2 achieves MSEs of 0.0085 (MNIST), 0.0138 (Fashion-MNIST), and 0.0227 (Omniglot-Test) (Gklezakos et al., 2022).

Robotic Reinforcement Learning and Control

ActPC achieves high performance in robotic manipulation tasks under sparse reward regimes, e.g., success rates of 96.5% (block lifting) and 94.0% (can pick-and-place) in the SURREAL Robosuite, outperforming backpropagation-based DDPG baselines (Ororbia et al., 2022).

Goal-Guided Discrete Reasoning and Algorithmic Chemistry

ActPC-Chem extends ActPC into a discrete "algorithmic chemistry" over metagraph rewrite rules. Agents self-organize predictive rule sets through discrete local search, eligibility traces, and natural-gradient updates. The "robot bug" task illustrates how prediction-error minimization transforms generic initial rules into specialized, context-sensitive policies, even under delayed and context-dependent rewards (Goertzel, 2024).

Neural–Symbolic Deliberation and Hybrid Sequence Models

Stacked, rule-driven ActPC modules substitute for transformer layers, enabling next-token prediction and structured reasoning without backpropagation through layers. Discrete rewrite-rule modules govern attention and logical consistency (via AIRIS/PLN), while prediction error locally adjusts rule probabilities (Goertzel, 2024, Goertzel, 8 Jan 2025).

7. Theoretical Significance and Prospects

Discrete ActPC synthesizes predictive coding, hierarchical reinforcement learning, combinatorial program induction, and symbolic AI. By uniting predictive error-driven learning with neural and symbolic modules, it provides a principled architecture for explainable, compositional, and scalable intelligence, well-suited to both perception (vision, sensorimotor control) and high-level reasoning (algorithmic chemistry, language modeling).

Research directions include: accelerating online learning via information geometry (Wasserstein metrics), further integration of neural–symbolic pipelines (transformers with Hopfield-memory or discrete rule modules), and leveraging fuzzy concept lattices for abstraction and generalization (Goertzel, 8 Jan 2025, Goertzel, 2024). The embedding of discrete and continuous predictive coding in a shared metric space enables parallel evaluation and pruning of candidate expansions across subsymbolic and symbolic domains, supporting real-time, deliberative reasoning (Goertzel, 8 Jan 2025).

A plausible implication is that Discrete ActPC architectures could serve as cognitive kernels for advanced AGI frameworks (e.g., OpenCog Hyperon, PRIMUS), given their ability to recruit, compose, and adapt symbolic world models, while maintaining continuous sensorimotor grounding and flexible credit assignment.
