Discrete Active Predictive Coding (ActPC)
- Discrete Active Predictive Coding is a computational paradigm that fuses prediction error minimization with discrete decision-making, reinforcement learning, and symbolic reasoning.
- It employs hierarchical, multiscale processing by integrating continuous sensory inputs with discrete symbolic embeddings to achieve real-time perception and planning.
- The framework leverages intrinsic and extrinsic rewards along with advanced metrics like the Wasserstein distance to ensure robust, interpretable learning.
Discrete Active Predictive Coding (ActPC) is a computational paradigm that generalizes traditional predictive coding by integrating discrete decision-making, reinforcement learning, hierarchical generative modeling, and symbolic reasoning. In ActPC, both models and data may be represented as discrete structures such as rewrite rules, parsing options, or symbolic embeddings, and learning is driven by the minimization of prediction error, modulated by intrinsic and extrinsic rewards. This hybrid neural-symbolic framework supports the construction of interpretable, compositional, and cognitively plausible agents capable of real-time perception, planning, and reasoning in complex, hierarchical environments (Gklezakos et al., 2022, Goertzel, 2024, Rao et al., 2022, Goertzel, 8 Jan 2025, Ororbia et al., 2022).
1. Mathematical Foundations and Model Formulation
At its core, ActPC extends the predictive coding principle—where the system seeks to minimize discrepancy between actual inputs and predictions—by operationalizing prediction error both for continuous neural activations and for discrete structures.
Generative and Predictive Coding Model
Let $x$ denote a global observation (e.g., an image), with a sequence of local "glimpses" $x_1, \dots, x_T$, where each $x_t$ is a sub-region defined by a top-level macro-decision at time $t$, and $l_t$ are coordinates (often discretized) within intrinsic reference frames. Latent variables $z_t$ select which part-parse or program is executed at each macro-step, with each $z_t$ corresponding to a discrete node in a parse tree or program hierarchy.
The generative (predictive-coding) model takes the form $\hat{x}_{t,\tau} = d_\theta(s_{t,\tau})$, where $\hat{x}_{t,\tau}$ is the decoder's prediction of the glimpse content at macro-step $t$ and micro-step $\tau$, generated from the internal state $s_{t,\tau}$ (Gklezakos et al., 2022, Rao et al., 2022).
The predictive loss, over macro-steps $t = 1, \dots, T$ and micro-steps $\tau = 1, \dots, T'$, is
$$\mathcal{L}_{\text{pred}} = \sum_{t=1}^{T} \sum_{\tau=1}^{T'} \left\| x_{t,\tau} - \hat{x}_{t,\tau} \right\|^2 .$$
For discrete symbolic ActPC (as in ActPC-Chem), the basic computational element is a set of rewrite rules $R = \{r_1, \dots, r_N\}$, each described as a subgraph pattern transformation $r_i : P_i \rightarrow Q_i$. Prediction error is defined on outcome distributions over metagraph patterns:
$$\varepsilon = D\!\left(p_{\text{emp}} \,\|\, p_{\text{gen}}\right),$$
where $p_{\text{emp}}$ is the empirical distribution derived from the observed output graph, $p_{\text{gen}}$ the distribution predicted by the generative rules, and $D$ a divergence between them (Goertzel, 2024).
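This discrete prediction error can be sketched concretely. A minimal Python illustration, using total-variation distance as a stand-in for the divergence over outcome distributions (the pattern names and function signatures are assumptions of this sketch, not from the cited papers):

```python
from collections import Counter

def empirical_dist(patterns):
    """Empirical distribution over observed metagraph patterns."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def prediction_error(predicted, observed_patterns):
    """Total-variation distance between the rule set's predicted outcome
    distribution and the empirical one derived from the output graph."""
    emp = empirical_dist(observed_patterns)
    support = set(predicted) | set(emp)
    return 0.5 * sum(abs(predicted.get(p, 0.0) - emp.get(p, 0.0))
                     for p in support)

# Hypothetical rule outcomes: the rules predict "A->B" with probability 0.7,
# while observation gives an empirical frequency of 0.75.
predicted = {"A->B": 0.7, "A->C": 0.3}
observed = ["A->B", "A->B", "A->C", "A->B"]
err = prediction_error(predicted, observed)
```

Minimizing this quantity over the rule set drives the discrete learning dynamics described below.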
2. Hierarchical Structure, Discrete Actions, and Policy Mechanisms
ActPC employs a hierarchical arrangement of latent variables and policy networks, enabling multilevel parsing, planning, and control. Macro-steps correspond to high-level decisions (e.g., selection of object part, reference frame, or rewrite program), while micro-steps execute fine-grained actions within the chosen scope.
At each macro-step:
- The agent samples a discrete option $z_t$ and a reference-frame location $l_t$.
- Hypernetworks generate weights for lower-level recurrent networks parameterized by top-level embeddings, enabling dynamic instantiation of context-sensitive sub-programs (Gklezakos et al., 2022, Rao et al., 2022).
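A minimal linear hypernetwork sketch of this mechanism (the dimensions, the tanh recurrence, and all names are illustrative assumptions, not the architecture of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(z_embed, W_hyper, b_hyper, out_shape):
    """Map a top-level option embedding to the weight matrix of a
    lower-level recurrent cell (a minimal linear hypernetwork)."""
    flat = W_hyper @ z_embed + b_hyper
    return flat.reshape(out_shape)

# Hypothetical sizes: 8-dim option embedding, 16-dim lower-level state.
d_embed, d_state = 8, 16
W_hyper = rng.normal(scale=0.1, size=(d_state * d_state, d_embed))
b_hyper = np.zeros(d_state * d_state)

z = rng.normal(size=d_embed)  # embedding of the sampled discrete option z_t
W_low = hypernetwork(z, W_hyper, b_hyper, (d_state, d_state))

def micro_step(s, x, W_low):
    """One lower-level recurrent update using the generated weights."""
    return np.tanh(W_low @ s + x)

s = micro_step(np.zeros(d_state), rng.normal(size=d_state), W_low)
```

The key point is that the lower-level dynamics are re-parameterized on every macro-step, so each discrete option instantiates its own context-sensitive sub-program.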
At the micro-step level, discrete or categorical actions $a_{t,\tau}$ are sampled according to learned policies, $a_{t,\tau} \sim \pi_\phi(a \mid s_{t,\tau})$.
Parameters governing discrete decisions are updated via REINFORCE or (for Gumbel-Softmax-relaxed variants) by straight-through gradient estimators (Gklezakos et al., 2022, Rao et al., 2022).
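Both estimators can be sketched in a few lines of NumPy. The straight-through trick below only illustrates the forward discretization; actual gradient flow through the soft sample requires an autodiff framework (all names and the temperature value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax_st(logits, tau=1.0):
    """Gumbel-Softmax sample with straight-through discretization:
    the forward value is one-hot; gradients (in an autodiff framework)
    would flow through the soft relaxation y_soft."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y_soft = np.exp((logits + g) / tau)
    y_soft /= y_soft.sum()
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft

def reinforce_grad(logits, action, reward):
    """Score-function (REINFORCE) gradient of -log pi(action) * reward
    for a categorical policy parameterized by raw logits."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    onehot = np.eye(len(logits))[action]
    return -(onehot - probs) * reward

logits = np.array([2.0, 0.5, -1.0])
y_hard, y_soft = gumbel_softmax_st(logits, tau=0.5)
```

Gradient descent on `reinforce_grad` raises the logit of a rewarded action; the Gumbel-Softmax path instead gives a (biased, low-variance) reparameterized alternative.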
3. Reinforcement Learning and Intrinsic/Extrinsic Motivation
ActPC systems integrate predictive coding objectives with RL-style reward maximization, where rewards may include:
- Epistemic (intrinsic) rewards: proportional to normalized prediction error, encouraging exploration of unfamiliar or surprising states. The typical formalization is $r^{\text{epi}}_t \propto \varepsilon_t$, with $\varepsilon_t$ the (normalized) prediction error at step $t$.
- Instrumental (extrinsic) rewards: Penalties or bonuses for task-relevant outcomes, often defined as negative prediction error or environmental feedback.
- Combined reward: $r_t = r^{\text{epi}}_t + \beta\, r^{\text{ins}}_t$, with $\beta$ weighting instrumental against epistemic drives.
The total learning objective combines prediction error minimization and accumulated rewards,
$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \mathcal{L}_{\text{RL}},$$
where $\mathcal{L}_{\text{RL}}$ absorbs REINFORCE gradients and the (possibly task-specific) per-step rewards $r_t$ (Gklezakos et al., 2022, Ororbia et al., 2022, Rao et al., 2022).
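A minimal sketch of the per-step reward combination (the running-maximum normalization and the weight `beta` are assumptions of this sketch, chosen for illustration):

```python
def epistemic_reward(pred_error, err_max):
    """Intrinsic reward: prediction error normalized by a running maximum,
    so that highly surprising states are rewarded most."""
    err_max = max(err_max, pred_error)
    return pred_error / err_max, err_max

def combined_reward(r_epi, r_ins, beta=1.0):
    """Combined per-step reward r_t = r_epi + beta * r_ins."""
    return r_epi + beta * r_ins

# One step: error 0.5 against a running maximum of 2.0, plus task reward 1.0.
r_epi, err_max = epistemic_reward(0.5, err_max=2.0)
r_t = combined_reward(r_epi, r_ins=1.0, beta=0.5)
```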
Eligibility traces and credit assignment are handled as in RL, with eligibility-trace-style updates in the discrete rule case (Goertzel, 2024).
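A minimal sketch of eligibility-trace-style credit assignment over a discrete rule set (the decay constants, the utility table, and all function names are illustrative assumptions):

```python
def update_traces(traces, fired_rules, gamma=0.99, lam=0.9):
    """Decay all eligibility traces, then bump the rules that just fired."""
    traces = {r: gamma * lam * e for r, e in traces.items()}
    for r in fired_rules:
        traces[r] = traces.get(r, 0.0) + 1.0
    return traces

def credit_rules(utilities, traces, delta, lr=0.1):
    """Distribute a reward-modulated prediction-error signal delta across
    rules in proportion to their current eligibility."""
    return {r: utilities.get(r, 0.0) + lr * delta * e
            for r, e in traces.items()}

traces = update_traces({}, ["r1"])           # rule r1 fires first
traces = update_traces(traces, ["r2"])       # r1 decays, r2 becomes eligible
utilities = credit_rules({}, traces, delta=0.5)
```

Recently fired rules thus receive most of the credit for a delayed error reduction, with earlier rules credited at a geometrically decaying rate.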
4. Symbolic Embedding, Information Geometry, and Neural–Symbolic Integration
Advances in ActPC such as ActPC-Geom replace the KL divergence in predictive coding with the Wasserstein metric for model robustness and geometry-awareness. For a predictive distribution $q_\theta$ and target $p$, the predictive coding loss is
$$\mathcal{L}_{\text{PC}} = \mathcal{W}(q_\theta, p),$$
the Wasserstein distance under a chosen ground metric.
A measure-dependent graph Laplacian $L(\mu)$ and its pseudoinverse $L(\mu)^{\dagger}$ are used to construct natural-gradient-type updates
$$\theta \leftarrow \theta - \eta\, G(\theta)^{\dagger}\, \nabla_\theta \mathcal{L},$$
where $G(\theta)$ is the natural metric tensor in parameter space (Goertzel, 8 Jan 2025).
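A toy sketch of such an update, using an ordinary graph Laplacian pseudoinverse in place of the full measure-dependent construction (the 3-state graph, step size, and gradient are illustrative assumptions):

```python
import numpy as np

def graph_laplacian(W):
    """Laplacian L = D - W of a weighted graph over discrete states;
    in ActPC-Geom the edge weights would depend on the current measure mu."""
    return np.diag(W.sum(axis=1)) - W

def natural_gradient_step(theta, grad, L, lr=0.1):
    """Precondition the Euclidean gradient with the Laplacian pseudoinverse,
    standing in for the inverse Wasserstein metric tensor."""
    return theta - lr * (np.linalg.pinv(L) @ grad)

# Toy 3-state chain graph and one preconditioned step.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = graph_laplacian(W)
theta = natural_gradient_step(np.zeros(3), np.array([1.0, 0.0, -1.0]), L)
```

The pseudoinverse is needed because the Laplacian is singular (its nullspace is the constant vector); the preconditioning rescales updates according to the graph geometry of the discrete state space.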
ActPC-Geom bridges continuous and discrete domains using low-rank kernel-PCA embeddings and high-dimensional hypervector algebra. Hypervector embeddings support compositional reasoning, allowing binding, bundling, and permutation to encode and operate on symbolic and subsymbolic representations (Goertzel, 8 Jan 2025, Goertzel, 2024). Concept hierarchies are distilled via fuzzy FCA lattices, facilitating ontology construction and concept-level aggregation.
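The basic hypervector operations just mentioned can be sketched with bipolar vectors (the dimensionality and encoding choices below are illustrative, not those of the cited works):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10_000  # hypervector dimensionality

def hv():            return rng.choice([-1, 1], size=D)   # random bipolar hypervector
def bind(a, b):      return a * b                          # elementwise product (XOR-like)
def bundle(*vs):     return np.sign(np.sum(vs, axis=0))    # majority superposition
def permute(a, k=1): return np.roll(a, k)                  # sequence/role encoding
def sim(a, b):       return float(a @ b) / D               # cosine-like similarity

role, filler = hv(), hv()
pair = bind(role, filler)              # role-filler binding
recovered = bind(pair, role)           # binding is its own inverse for bipolar vectors
composite = bundle(pair, hv(), hv())   # superpose the pair with two distractors
```

Because random hypervectors are nearly orthogonal in high dimension, `bind(composite, role)` remains measurably similar to `filler` even after bundling, which is what makes compositional encode/decode over symbolic structures possible.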
Neural–symbolic integration is realized by:
- Mapping both symbolic rule expansions and continuous state updates into a common embedding (hypervector or Wasserstein) space.
- Aggregating and pruning candidate actions via shared ground-metric Wasserstein distances.
- Using discrete modules (e.g., ActPC-Chem, AIRIS, PLN) to generate chain-of-thought steps, which are then validated or composed by continuous PC layers (Goertzel, 8 Jan 2025, Goertzel, 2024).
5. Learning Algorithms and Training Procedures
Training in ActPC proceeds by optimizing a joint loss that incorporates self-supervised predictive coding and reinforcement learning signals, both in continuous and discrete spaces. Key elements include:
- Backpropagation through time for all differentiable state networks and hypernetworks.
- Policy gradient estimators (e.g., REINFORCE, Gumbel-Softmax) for discrete policies governing part selection, location choice, and rule application.
- Hebbian-like updates in some ActPC variants (e.g., NGC/ActPC in (Ororbia et al., 2022)) where learning is entirely local and backpropagation-free.
- Discrete natural gradient updates leveraging Wasserstein Riemannian geometry for accelerated learning in rule-based/symbolic modules (Goertzel, 8 Jan 2025, Goertzel, 2024).
A canonical training loop consists of: episodic data collection driven by current policies, accumulation of prediction errors and rewards, local (neural or symbolic) parameter updates, and interleaved replay or self-imitation for credit assignment.
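This canonical loop can be sketched schematically; the `ToyEnv`/`ToyAgent` interfaces and every method name below are hypothetical stand-ins, not APIs from the cited works:

```python
class ToyEnv:
    """Minimal stand-in environment: scalar observations, 3-step episodes."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3   # obs, extrinsic reward, done

class ToyAgent:
    """Hypothetical ActPC agent interface; a real agent would hold
    predictive-coding networks and discrete policy modules."""
    micro_steps, beta = 2, 0.5
    def macro_policy(self, obs):            return 0, 0   # option, location
    def micro_policy(self, obs, option):    return 0
    def predict(self, obs, option, action): return obs    # trivial predictor
    def prediction_error(self, pred, obs):  return abs(pred - obs)
    def local_update(self, trajectory):     self.last = trajectory
    def replay_update(self, trajectory):    pass

def train_actpc(env, agent, episodes=1):
    """Schematic ActPC loop: episodic collection under the current policies,
    accumulation of prediction errors and rewards, then local updates."""
    for _ in range(episodes):
        obs, done, trajectory = env.reset(), False, []
        while not done:
            option, location = agent.macro_policy(obs)    # macro-decision
            for _ in range(agent.micro_steps):            # micro-steps
                action = agent.micro_policy(obs, option)
                pred = agent.predict(obs, option, action)
                obs, r_ins, done = env.step(action)
                err = agent.prediction_error(pred, obs)
                reward = err + agent.beta * r_ins         # epistemic + instrumental
                trajectory.append((option, action, err, reward))
                if done:
                    break
        agent.local_update(trajectory)    # neural or symbolic parameter updates
        agent.replay_update(trajectory)   # interleaved replay / self-imitation

agent = ToyAgent()
train_actpc(ToyEnv(), agent)
```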
6. Applications, Empirical Results, and System Properties
Hierarchical Perception and Parse-Tree Induction
Two-level ActPC models ("APCN-2") demonstrate the ability to dynamically parse visual scenes into part-whole hierarchies, learn compositional representations, and transfer to unseen object classes (Gklezakos et al., 2022, Rao et al., 2022). Class-specific parsing strategies are automatically discovered.
Experimental results on MNIST, Fashion-MNIST, and Omniglot datasets show superior reconstruction MSE for APCN-2 relative to single-level models and random baselines. For instance, APCN-2 achieves MSEs of 0.0085 (MNIST), 0.0138 (Fashion-MNIST), and 0.0227 (Omniglot-Test) (Gklezakos et al., 2022).
Robotic Reinforcement Learning and Control
ActPC achieves high performance in robotic manipulation tasks under sparse reward regimes, e.g., success rates of 96.5% (block lifting) and 94.0% (can pick-and-place) in the SURREAL Robosuite, outperforming backpropagation-based DDPG baselines (Ororbia et al., 2022).
Goal-Guided Discrete Reasoning and Algorithmic Chemistry
ActPC-Chem extends ActPC into a discrete "algorithmic chemistry" over metagraph rewrite rules. Agents self-organize predictive rule sets through discrete local search, eligibility traces, and natural-gradient updates. The "robot bug" task illustrates how prediction-error minimization transforms generic initial rules into specialized, context-sensitive policies, even under delayed and context-dependent rewards (Goertzel, 2024).
Neural–Symbolic Deliberation and Hybrid Sequence Models
Stacked, rule-driven ActPC modules substitute for transformer layers, enabling next-token prediction and structured reasoning without backpropagation through layers. Discrete rewrite-rule modules govern attention and logical consistency (via AIRIS/PLN), while prediction error locally adjusts rule probabilities (Goertzel, 2024, Goertzel, 8 Jan 2025).
7. Theoretical Significance and Prospects
Discrete ActPC synthesizes predictive coding, hierarchical reinforcement learning, combinatorial program induction, and symbolic AI. By uniting predictive error-driven learning with neural and symbolic modules, it provides a principled architecture for explainable, compositional, and scalable intelligence, well-suited to both perception (vision, sensorimotor control) and high-level reasoning (algorithmic chemistry, language modeling).
Research directions include: accelerating online learning via information geometry (Wasserstein metrics), further integration of neural–symbolic pipelines (transformers with Hopfield-memory or discrete rule modules), and leveraging fuzzy concept lattices for abstraction and generalization (Goertzel, 8 Jan 2025, Goertzel, 2024). The embedding of discrete and continuous predictive coding in a shared metric space enables parallel evaluation and pruning of candidate expansions across subsymbolic and symbolic domains, supporting real-time, deliberative reasoning (Goertzel, 8 Jan 2025).
A plausible implication is that Discrete ActPC architectures could serve as cognitive kernels for advanced AGI frameworks (e.g., OpenCog Hyperon, PRIMUS), given their ability to recruit, compose, and adapt symbolic world models, while maintaining continuous sensorimotor grounding and flexible credit assignment.