AGINAO: Self-Programming Cognitive Engine

Updated 2 July 2026
  • AGINAO is an AGI architecture combining self-programming, intrinsic motivation, and on-line RL to autonomously discover and compose sensorimotor-based cognitive modules.
  • The system employs a dual-layer design with a hand-crafted core for stability and a self-programmed cognitive layer running on a custom virtual machine inspired by a Universal Turing Machine.
  • Heuristic program generation, hash pooling, and runtime validation effectively prune and integrate codelets into a dynamic, hierarchical concept graph.

AGINAO is an architecture for human-level artificial general intelligence (AGI), realized as a self-programming, open-ended cognitive engine embedded in the Aldebaran NAO humanoid robot. Its approach combines stochastic code generation, information-theoretic intrinsic motivation, hierarchical program composition, and on-line reinforcement learning operating over a virtual machine (VM) simulating a Universal Turing Machine (UTM). The system is designed to autonomously discover, evaluate, and compose computational modules (“concepts”) interacting with real-world sensorimotor experience (Skaba, 2018, Skaba, 2018).

1. System Architecture and Virtual Machine Model

AGINAO bifurcates its software into two distinct layers:

  • Core Layer: Fully hand-crafted, responsible for stability, real-time scheduling, exception handling, and atomic I/O services (e.g., sensor polling, actuator command invocation). This layer operates natively on the robot’s host CPU.
  • Cognitive Layer: Entirely self-programmed, runs as a multi-threaded, embedded control program atop a custom 16-bit, multi-tape TM-inspired virtual machine. The VM features accumulator and index registers, integer word memory, control flags, and exposes an instruction set adequate for general computation.

All higher-level cognitive structures, or "concepts", are represented as compact sequences of VM instructions (“codelets”). These codelets are instantiated as nodes in a dynamic, hierarchical graph, with directed links indicating data flow and execution dependency among concepts.
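
To make the codelet idea concrete, the following is a minimal sketch of an accumulator machine that runs a short instruction sequence and ends in RET (a positive outcome) or EXIT (a negative outcome). The opcodes LOAD, SUB, and JZ, the tuple encoding, and the step budget are illustrative assumptions and not AGINAO's actual instruction set.

```python
from typing import List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); hypothetical encoding


def run_codelet(code: List[Instr], inputs: List[int], max_steps: int = 64) -> str:
    """Interpret a codelet on a toy accumulator machine.

    Returns "RET" (positive), "EXIT" (negative), or "ERROR" for faults such as
    out-of-bounds reads, unknown opcodes, or exceeding the step budget.
    """
    acc = 0  # accumulator register
    pc = 0   # program counter
    for _ in range(max_steps):
        if not 0 <= pc < len(code):
            return "ERROR"            # ran past the end of the program
        op, arg = code[pc]
        if op == "LOAD":              # acc <- inputs[arg]
            if not 0 <= arg < len(inputs):
                return "ERROR"
            acc = inputs[arg]
        elif op == "SUB":             # acc <- acc - arg, wrapped to 16 bits
            acc = (acc - arg) & 0xFFFF
        elif op == "JZ":              # jump to arg when acc == 0
            if acc == 0:
                pc = arg
                continue
        elif op == "RET":
            return "RET"
        elif op == "EXIT":
            return "EXIT"
        else:
            return "ERROR"            # illegal opcode
        pc += 1
    return "ERROR"                    # step budget exhausted (possible infinite loop)


# A 5-instruction codelet that matches when inputs[0] == 7.
EQUALS_SEVEN = [("LOAD", 0), ("SUB", 7), ("JZ", 4), ("EXIT", 0), ("RET", 0)]
```

Calling `run_codelet(EQUALS_SEVEN, [7])` follows the jump to the RET instruction, while any other input falls through to EXIT. The step budget stands in for the kind of runtime check that catches non-terminating codelets.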

2. Self-Programming and Concept Discovery

AGINAO continuously expands its concept graph via a staged pipeline:

  1. Heuristic Program Generation: The Program Generator assembles candidate codelets of 4–8 instructions. More than 30 heuristics immediately eliminate code fragments with illegal operations (e.g., use-before-init, unconditional jumps into invalid locations, unreachable RET), reducing the candidate pool for 7-instruction codelets from approximately $10^{20}$ to $10^{8}$.
  2. Hash Pooling: Redundant code generation is curtailed by hash-based frequency control. A codelet $p$ is rejected if

$$\text{count}[h] > 2 \cdot \frac{T}{N}, \quad h = \mathrm{hash}(p) \bmod N$$

where $T$ is the total number of releases and $N$ is the hash pool size ($N \approx 2^{14}$).

  3. Sanity Checks: Instantiated codelets undergo early-stage runtime validation that checks for out-of-bounds access, illegal opcodes, infinite loops, and persistent errors. Repeatedly failing codelets and their incident links are pruned.
  4. Concept Hierarchy Integration: Validated codelets become nodes in the hierarchy, with formalized I/O signatures, static memory, maximum output sizing, and links to both upstream and downstream concepts (including potential actuator templates).
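
The hash-pooling test in the pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `HashPool`, the use of CRC32 as the hash function, and the choice to count only accepted codelets toward the total number of releases are not taken from the source.

```python
import zlib
from collections import Counter


class HashPool:
    """Frequency control for generated codelets (illustrative sketch)."""

    def __init__(self, pool_size: int = 2 ** 14) -> None:
        self.pool_size = pool_size  # N in the rejection rule
        self.counts = Counter()     # count[h] for each hash bucket
        self.total = 0              # T, codelets released so far (assumed: accepted ones)

    def try_release(self, codelet: bytes) -> bool:
        """Record and accept the codelet unless its bucket is over-used."""
        h = zlib.crc32(codelet) % self.pool_size
        # Reject when the bucket already holds more than twice the mean load T/N.
        if self.counts[h] > 2 * self.total / self.pool_size:
            return False
        self.counts[h] += 1
        self.total += 1
        return True
```

With the default pool size, releasing the same byte string twice in a row accepts the first copy and rejects the second, because one entry in a bucket already exceeds twice the mean load of a nearly empty pool.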

The complexity-based curriculum search strategy biases concept discovery towards shorter and faster codelets, reflecting the prevalence of low Kolmogorov-complexity patterns in real-world sensorimotor data.

3. Intrinsic Reward, Reinforcement Learning, and Thread Management

AGINAO implements fully online, multi-threaded reinforcement learning, guided primarily by an intrinsic, information-theoretic reward model:

  • Thread Dynamics: Each execution of a concept is instantiated as a lightweight thread, characterized by a priority (proportional to the action value $Q$), a resource (CPU time) budget, and an expiration timestamp. Threads failing to terminate productively are discarded.
  • Binary Space Partitioning: Each codelet partitions its input vector space into "positive" (RET) and "negative" (EXIT) regions, registering the empirical match probability $p$ as

$$p = \frac{N_{\rm pos}}{N_{\rm pos} + N_{\rm neg}}$$

where $N_{\rm pos}$ and $N_{\rm neg}$ are cumulative positive and negative event counters.

  • Intrinsic Reward Metric: The expected instantaneous reward per execution is

$$R(p) = -p \log p$$

which is maximized at $p = 1/e$. The corresponding information-theoretic quantity per positive detection is $-\log p$.

  • Exploration/Exploitation: For a concept with several outgoing actions and their associated action values $Q$, exploration occurs with probability

[equation not recoverable from the source]

and exploitation follows

[equation not recoverable from the source]

  • Temporal-Difference Learning: When an action terminates, its value is updated as

[equation not recoverable from the source]

where one term of the update captures the expected downstream value.
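
The match probability and the information-theoretic reward described above can be sketched numerically. The sketch assumes the expected reward takes the form $-p \ln p$, that is, the self-information of a positive detection weighted by how often it occurs; the function names and the use of natural logarithms are assumptions.

```python
import math


def match_probability(n_pos: int, n_neg: int) -> float:
    """Empirical probability that a codelet ends in RET rather than EXIT."""
    total = n_pos + n_neg
    return n_pos / total if total else 0.0


def information_per_match(p: float) -> float:
    """Self-information -ln(p) of a positive detection, in nats."""
    return -math.log(p) if p > 0.0 else math.inf


def expected_reward(p: float) -> float:
    """Expected reward per execution, assumed here to be -p * ln(p)."""
    return -p * math.log(p) if 0.0 < p < 1.0 else 0.0
```

Setting the derivative $-\ln p - 1$ to zero gives $p = 1/e \approx 0.37$, so under this assumption a codelet that always matches or never matches earns nothing, while one that matches about a third of the time earns the most.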

Positive thread terminations deliver bonuses, enabling deeper branching and greater concept proliferation.
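
As a rough illustration of how action values might steer branching, the sketch below uses textbook epsilon-greedy selection and a one-step temporal-difference update as stand-ins. These are generic reinforcement-learning forms, not AGINAO's exact rules, and the parameter names `epsilon`, `alpha`, and `gamma` are assumptions.

```python
import random
from typing import Dict, Optional


def select_action(q_values: Dict[str, float], epsilon: float,
                  rng: Optional[random.Random] = None) -> str:
    """Explore a uniformly random action with probability epsilon, else exploit."""
    rng = rng or random.Random()
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.__getitem__)


def td_update(q: float, reward: float, downstream_value: float,
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Move q toward reward plus the discounted value of what follows."""
    return q + alpha * (reward + gamma * downstream_value - q)
```

A positive termination corresponds to a positive `reward` here, which raises the action value and therefore the priority of threads that follow the same link.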

4. Actuator Evaluation and Global Reward Model

Actuator concepts (leaf nodes representing effectors) do not compute intrinsic information gain, requiring indirect evaluation. Their value is determined by their influence on the global average reward, measured over real time:

  • Actuator Value Update:

[equations not recoverable from the source]

The update combines a credit term for each actuator, the count of overlapping activations, the actuation cost for the given input, and a normalization factor.

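  • Global Reward Computation: The global average reward is maintained as a running estimate governed by a decay parameter.

[equation not recoverable from the source]

Poorly credited actuator concepts are pruned from the hierarchy.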

An empirical measurement on a deployed NAO robot established a characteristic sensorimotor feedback delay of approximately 300 ms following a visual stimulus, informing the dynamic assignment of credit in actuator evaluation (Skaba, 2018).
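
One way to picture this indirect evaluation is an exponentially decayed running average of reward, sampled before an actuation and again after the feedback delay. The sketch below is an assumption-laden illustration: the exponential form, the `decay` default, and the credit rule (change in average reward minus actuation cost) are not taken from the source.

```python
class AverageReward:
    """Exponentially decayed estimate of the global average reward (assumed form)."""

    def __init__(self, decay: float = 0.05, initial: float = 0.0) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.value = initial

    def update(self, reward: float) -> float:
        """Blend a new reward sample into the running average and return it."""
        self.value += self.decay * (reward - self.value)
        return self.value


def actuator_credit(avg_before: float, avg_after: float, cost: float = 0.0) -> float:
    """Hypothetical credit: change in average reward across the delay, minus cost."""
    return avg_after - avg_before - cost
```

In such a scheme, `avg_after` would be read roughly 300 ms after the actuation, matching the measured feedback delay, so that the effect of the movement has had time to reach the sensors.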

5. Hierarchical and Temporal Organization

The AGINAO hierarchy is explicitly structured along both spatial and temporal axes:

  • Pattern vs Concept: Patterns are any regularities in space/time; concepts instantiate codelets that operationalize these patterns as classifiers or effectors.
  • Hierarchy Levels: Atomic sensory concepts (level 0) encapsulate raw inputs (pixels, joint sensors). Higher-level, self-generated concepts aggregate outputs of lower-level nodes; actuator concepts form terminating leaf nodes.
  • Runtime Concurrency: Multiple (multi-threaded) instances of the same concept may process distinct data concurrently, propagating outputs to successor concepts, or spawning children upon termination.
  • Temporal Integration: The execution model accommodates temporal structures via explicit WAIT instructions, allowing threads to suspend execution and detect time-gapped, sequential patterns within the real world.
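
The level structure described above can be sketched as a small graph type. The class name `Concept`, the `kind` labels, and the rule that a node sits one level above its highest input are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A node in the concept hierarchy (illustrative sketch)."""

    name: str
    kind: str = "derived"  # "sensor", "derived", or "actuator"
    inputs: List["Concept"] = field(default_factory=list)

    @property
    def level(self) -> int:
        """Level 0 for atomic concepts, otherwise one above the highest input."""
        if not self.inputs:
            return 0
        return 1 + max(c.level for c in self.inputs)


# A sensor feeds a derived concept, which in turn feeds an actuator leaf.
pixel = Concept("pixel", kind="sensor")
edge = Concept("edge", inputs=[pixel])
head_turn = Concept("head_turn", kind="actuator", inputs=[edge])
```

Here `pixel.level` is 0, `edge.level` is 1, and `head_turn.level` is 2, which mirrors the flow from raw input through a self-generated concept to a terminating effector.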

6. Empirical Performance, Scalability, and Limiting Factors

  • On physical NAO hardware (Intel Atom 1.6 GHz, Wi-Fi-connected), the AGINAO system generated hundreds of thousands of concepts during hours of unsupervised operation in an environment approximating a “preschool-like” setting. Heuristic search and filtering accelerate codelet discovery by many orders of magnitude relative to brute-force enumeration, in line with the reduction of the 7-instruction candidate pool from roughly $10^{20}$ to $10^{8}$.
  • Real-time constraints are maintained by internal economic mechanisms: per-thread budgets, prioritization, and expiration.
  • Scalability of the VM is inherently bounded by memory and search overhead, but dynamic pruning via RL and multi-core parallelization (under investigation) can mitigate these effects. Interpreting codelets on the VM adds roughly a constant-factor slow-down over native execution, and because the VM is Turing-complete, no computable function is excluded in principle.
  • Noted limitations include the risk of combinatorial explosion, incomplete instantiation of global fitness functions, the necessity of a hand-crafted core (the universality of which remains an open question), the complexity of numerous hyperparameters, and the challenge of real-world noise and partial observability on binary partitioning accuracy.
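
The economic mechanisms listed above (per-thread budgets, prioritization, and expiration) can be sketched with a priority queue. The names `ConceptThread` and `Scheduler`, and the choice to order threads by descending action value, are assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConceptThread:
    name: str
    q_value: float     # priority is proportional to the action value Q
    budget: int        # remaining VM steps this thread may consume
    expires_at: float  # time after which the thread is discarded


class Scheduler:
    """Pick the highest-priority thread that is still alive (illustrative sketch)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, ConceptThread]] = []
        self._seq = itertools.count()  # tie-breaker so threads are never compared

    def spawn(self, thread: ConceptThread) -> None:
        # heapq is a min-heap, so negate the value to pop the largest Q first.
        heapq.heappush(self._heap, (-thread.q_value, next(self._seq), thread))

    def next_runnable(self, now: float) -> Optional[ConceptThread]:
        """Pop threads until one has budget left and has not expired."""
        while self._heap:
            _, _, thread = heapq.heappop(self._heap)
            if thread.expires_at > now and thread.budget > 0:
                return thread
        return None
```

Expired or exhausted threads are dropped when they reach the front of the queue, so stale work never blocks fresher, higher-valued threads.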

A plausible implication is that AGINAO provides a unique demonstration of a fully self-programming cognitive engine coupling intrinsic information-theoretic reward, open-ended symbolic composition, and reinforcement learning in real-time robotic platforms. While promising, full realization of human-level AGI will require advances in scalability, more sophisticated global objective models, and improved integration of perceptual-motor contingencies (Skaba, 2018, Skaba, 2018).

7. Key Algorithms and Formal Summary

The principal mathematical mechanisms central to AGINAO operation are summarized below:

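| Algorithm | Expression | Context |
| --- | --- | --- |
| Exploration Probability | [not recoverable from the source] | Exploration vs. exploitation |
| Action Selection | [not recoverable from the source] | Action selection among successors |
| TD-Learning Update | [not recoverable from the source] | Reinforcement learning |
| Concept Intrinsic Reward | $p = \frac{N_{\rm pos}}{N_{\rm pos} + N_{\rm neg}}$, $R(p) = -p \log p$, $-\log p$ | Information-theoretic evaluation |
| Average Reward Update | [not recoverable from the source] | Global reward estimation |
| Actuator Credit Update | [not recoverable from the source] | Actuator concept value assignment |
| Hash-Pooling Test | reject if $\text{count}[h] > 2 \cdot \frac{T}{N}$, $h = \mathrm{hash}(p) \bmod N$ | Redundant codelet pruning |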

This formalization enables the system to autonomously explore, select, and organize cognitive modules, directly coupling physical sensorimotor interaction with continual self-modification and open-ended learning (Skaba, 2018, Skaba, 2018).
