AGINAO: Self-Programming Cognitive Engine

Updated 2 July 2026

AGINAO is an AGI architecture that combines self-programming, intrinsic motivation, and online reinforcement learning (RL) to autonomously discover and compose sensorimotor-based cognitive modules.
The system uses a dual-layer design: a hand-crafted core provides stability, while a self-programmed cognitive layer runs on a custom virtual machine inspired by a Universal Turing Machine.
Heuristic program generation, hash pooling, and runtime validation prune candidate codelets, and the survivors are integrated into a dynamic, hierarchical concept graph.

AGINAO is an architecture for human-level artificial general intelligence (AGI), realized as a self-programming, open-ended cognitive engine embedded in the Aldebaran NAO humanoid robot. Its approach combines stochastic code generation, information-theoretic intrinsic motivation, hierarchical program composition, and on-line reinforcement learning operating over a virtual machine (VM) simulating a Universal Turing Machine (UTM). The system is designed to autonomously discover, evaluate, and compose computational modules (“concepts”) interacting with real-world sensorimotor experience (Skaba, 2018, Skaba, 2018).

1. System Architecture and Virtual Machine Model

AGINAO bifurcates its software into two distinct layers:

Core Layer: Fully hand-crafted, responsible for stability, real-time scheduling, exception handling, and atomic I/O services (e.g., sensor polling, actuator command invocation). This layer operates natively on the robot’s host CPU.
Cognitive Layer: Entirely self-programmed, runs as a multi-threaded, embedded control program atop a custom 16-bit, multi-tape TM-inspired virtual machine. The VM features accumulator and index registers, integer word memory, control flags, and exposes an instruction set adequate for general computation.

All higher-level cognitive structures, or "concepts", are represented as compact sequences of VM instructions (“codelets”). These codelets are instantiated as nodes in a dynamic, hierarchical graph, with directed links indicating data flow and execution dependency among concepts.
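To make the notion of a codelet concrete, the Python sketch below interprets a tiny codelet on a toy 16-bit accumulator machine and wraps it in a concept-graph node. The opcodes (`LOAD`, `ADDI`, `CMPGT`, `JNF`, `RET`, `EXIT`), the `Concept` class, and `run_codelet` are hypothetical illustrations, not AGINAO's actual instruction set or API; only the 16-bit word size and the positive (RET) / negative (EXIT) outcomes are taken from this article.

```python
from dataclasses import dataclass, field

MASK16 = 0xFFFF  # 16-bit word size, as described for the AGINAO VM


@dataclass
class Concept:
    """Hypothetical concept-graph node: a codelet plus links to successor concepts."""
    name: str
    code: list                      # (opcode, operand) pairs of a toy instruction set
    successors: list = field(default_factory=list)


def run_codelet(code, inputs, max_steps=64):
    """Interpret a toy codelet; return "RET" (positive), "EXIT" (negative) or "ERROR"."""
    acc, flag, pc, steps = 0, False, 0, 0
    while 0 <= pc < len(code):
        steps += 1
        if steps > max_steps:
            return "ERROR"          # step budget exhausted: treat as a runaway loop
        op, arg = code[pc]
        pc += 1
        if op == "LOAD":            # acc <- inputs[arg]
            if not 0 <= arg < len(inputs):
                return "ERROR"      # out-of-bounds input access
            acc = inputs[arg] & MASK16
        elif op == "ADDI":          # acc <- acc + arg, wrapping at 16 bits
            acc = (acc + arg) & MASK16
        elif op == "CMPGT":         # flag <- acc > arg
            flag = acc > arg
        elif op == "JNF":           # jump to arg if the flag is not set
            if not flag:
                pc = arg
        elif op == "RET":
            return "RET"
        elif op == "EXIT":
            return "EXIT"
        else:
            return "ERROR"          # illegal opcode
    return "ERROR"                  # fell off the end or jumped out of range
```

For example, the five-instruction codelet `[("LOAD", 0), ("CMPGT", 128), ("JNF", 4), ("RET", 0), ("EXIT", 0)]` behaves as a brightness detector: it returns `"RET"` for input `[200]` and `"EXIT"` for input `[50]`.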

2. Self-Programming and Concept Discovery

AGINAO continuously expands its concept graph via a staged pipeline:

Heuristic Program Generation: The Program Generator assembles candidate codelets of length 4–8 instructions. Over 30 heuristics immediately eliminate code fragments with illegal operations (e.g., use-before-init, unconditional jumps into invalid locations, unreachable RET), reducing the candidate pool from approximately $10^{20}$ to $10^8$ for 7-instruction codelets.
Hash Pooling: Redundant code generation is curtailed by hash-based frequency control. A codelet is rejected if

$\text{count}[h] > 2 \cdot \frac{T}{N}\,,\quad h = \mathrm{hash}(p) \bmod N$

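where $\text{count}[h]$ is the number of codelets already released into bucket $h$, $p$ is the candidate codelet, $T$ is the total number of releases, and $N$ is the hash pool size ($N \approx 2^{14}$). Because $T/N$ is the average number of releases per bucket, a codelet is rejected when its bucket already holds more than twice that average.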

Sanity Checks: Instantiated codelets undergo early-stage runtime validation that checks for out-of-bounds memory access, illegal opcodes, infinite loops, and persistent errors. Codelets that fail repeatedly are pruned together with their incident links.
Concept Hierarchy Integration: Validated codelets become nodes in the hierarchy, with formalized I/O signatures, static memory, maximum output sizing, and links to both upstream and downstream concepts (including potential actuator templates).
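The hash-pooling test from this pipeline can be sketched in Python as follows. The bucket function (SHA-256 over the codelet's `repr`) and the `HashPool` class are assumptions made for illustration; the source does not say which hash AGINAO uses.

```python
import hashlib


class HashPool:
    """Sketch of hash-based frequency control for generated codelets.

    A candidate is rejected when its bucket already holds more than
    twice the average number of releases per bucket, i.e. count[h] > 2*T/N.
    """

    def __init__(self, n_buckets=2**14):
        self.n = n_buckets             # N: hash pool size
        self.counts = [0] * n_buckets  # count[h]: releases per bucket
        self.total = 0                 # T: total number of releases

    def bucket(self, code):
        # Assumed hash: SHA-256 of the codelet's repr (deterministic across runs).
        digest = hashlib.sha256(repr(code).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def try_release(self, code):
        """Return True and record the release, or False if the bucket is over-represented."""
        h = self.bucket(code)
        if self.counts[h] > 2 * self.total / self.n:
            return False
        self.counts[h] += 1
        self.total += 1
        return True
```

One consequence of the rule is visible here: a bucket holding one codelet rejects further candidates while $T < N/2$, so early in a run even a second identical codelet is refused, and the threshold only loosens as releases accumulate.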

The complexity-based curriculum search strategy biases concept discovery towards shorter and faster codelets, reflecting the prevalence of low Kolmogorov-complexity patterns in real-world sensorimotor data.
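One simple way to realize a bias toward shorter codelets is to draw the length of each new candidate from a distribution that decays geometrically with length, as in this Python sketch. The base-2 decay (each extra instruction halves the weight) is an assumption in the spirit of a Levin-style simplicity prior, not a documented AGINAO setting; only the 4–8 instruction range comes from the text.

```python
import random


def length_distribution(min_len=4, max_len=8, base=2.0):
    """P(length = n) proportional to base**(-n): each extra instruction divides the weight by `base`."""
    weights = {n: base ** -n for n in range(min_len, max_len + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def sample_length(rng, dist):
    """Draw one codelet length from the distribution."""
    lengths = list(dist)
    return rng.choices(lengths, weights=[dist[n] for n in lengths], k=1)[0]
```

With the defaults, length 4 receives $16/31 \approx 0.52$ of the probability mass and length 8 only $1/31$, so short programs are tried far more often without excluding longer ones.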

3. Intrinsic Reward, Reinforcement Learning, and Thread Management

AGINAO implements fully online, multi-threaded reinforcement learning, guided primarily by an intrinsic, information-theoretic reward model:

Thread Dynamics: Each execution of a concept is instantiated as a lightweight thread, characterized by a priority (proportional to action-value $Q$ ), resource (CPU time) budget, and expiration timestamp. Threads failing to terminate productively are discarded.
Binary Space Partitioning: Each codelet partitions its input vector space into "positive" (RET) and "negative" (EXIT) regions, registering empirical match probability $p$ as

$p = \frac{N_{\rm pos}}{N_{\rm pos} + N_{\rm neg}}$

where $N_{\rm pos}$ and $N_{\rm neg}$ are cumulative positive and negative event counters.

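Intrinsic Reward Metric: The expected instantaneous reward per execution is

$r = -p \log_2 p$

which is maximized at $p = 1/e \approx 0.37$. The corresponding information-theoretic quantity per positive detection is $-\log_2 p$ bits; the expected reward is this quantity weighted by the probability $p$ that a detection occurs.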

Exploration/Exploitation: For a concept $c$ with outgoing actions $a_1, \dots, a_n$ and values $Q(c, a_i)$, exploration occurs with a probability that depends on these values; otherwise the engine exploits, selecting among the successors according to $Q(c, a_i)$.

Temporal-Difference Learning: When action $a$ of concept $c$ terminates, its value $Q(c, a)$ is updated by a temporal-difference rule toward the reward received plus a term that captures expected downstream value.
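The per-concept bookkeeping behind binary space partitioning and the intrinsic reward can be sketched as follows. The sketch assumes the reward form $-p\log_2 p$, i.e. a payoff of $-\log_2 p$ bits delivered with probability $p$; the class and function names are hypothetical.

```python
import math


class ConceptStats:
    """Cumulative RET/EXIT counters for one codelet (binary space partitioning)."""

    def __init__(self):
        self.n_pos = 0  # N_pos: executions ending in RET
        self.n_neg = 0  # N_neg: executions ending in EXIT

    def record(self, outcome):
        if outcome == "RET":
            self.n_pos += 1
        elif outcome == "EXIT":
            self.n_neg += 1

    def match_probability(self):
        """Empirical p = N_pos / (N_pos + N_neg); 0.0 before any observation."""
        total = self.n_pos + self.n_neg
        return self.n_pos / total if total else 0.0


def bits_per_detection(p):
    """Information carried by one positive detection: -log2 p (requires p > 0)."""
    return -math.log2(p)


def expected_reward(p):
    """Assumed expected reward per execution: -p log2 p, taken as 0 at p = 0."""
    return 0.0 if p <= 0.0 else -p * math.log2(p)
```

For a codelet that has returned RET once and EXIT three times, $p = 0.25$, each positive detection is worth 2 bits, and the expected reward per execution is 0.5 bits. Detectors that fire almost always ($p \to 1$) or almost never ($p \to 0$) earn little, which is why the expected reward peaks in between, at $p = 1/e$.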

Positive thread terminations deliver bonuses, enabling deeper branching and greater concept proliferation.
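Thread management can be pictured as a priority queue keyed on $Q$, in which expired threads are discarded instead of run. The Python sketch below uses the standard `heapq` module; the tuple layout, the tie-breaking counter, and the `Scheduler` class are illustrative assumptions. The returned CPU budget is left to the caller to enforce, and the termination bonus is not modeled.

```python
import heapq
import itertools


class Scheduler:
    """Toy thread queue: highest Q first; threads past their expiration time are dropped."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps equal-Q threads in FIFO order

    def spawn(self, concept, q_value, budget, expires_at):
        # heapq is a min-heap, so store -Q to pop the largest Q first.
        heapq.heappush(self._heap, (-q_value, next(self._order), concept, budget, expires_at))

    def next_thread(self, now):
        """Return (concept, budget) of the best live thread, or None if none remain."""
        while self._heap:
            _, _, concept, budget, expires_at = heapq.heappop(self._heap)
            if now <= expires_at:
                return concept, budget
            # expired: discarded without running
        return None
```

A thread with the highest $Q$ is still skipped if its expiration time has passed, so stale work cannot crowd out fresh, lower-priority threads.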

4. Actuator Evaluation and Global Reward Model

Actuator concepts (leaf nodes representing effectors) do not compute intrinsic information gain, requiring indirect evaluation. Their value is determined by their influence on the global average reward, measured over real time:

Actuator Value Update: Each actuator's credit is updated from the change in global average reward that follows its activation. The update is shared among overlapping activations, offset by the actuation cost for the given input, and scaled by a normalization factor.

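Global Reward Computation: The global average reward is maintained as a running estimate over the stream of intrinsic rewards, with a decay parameter controlling how quickly older rewards are discounted. Poorly credited actuator concepts are pruned from the hierarchy.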

An empirical measurement on a deployed NAO robot established a characteristic sensorimotor feedback delay of approximately 300 ms following a visual stimulus, informing the dynamic assignment of credit in actuator evaluation (Skaba, 2018).
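The Python sketch below shows one hypothetical way to combine these ingredients: an exponentially decaying running average of reward, a fixed feedback delay (defaulting to the roughly 300 ms reported above), and credit that is split among overlapping activations and reduced by actuation cost. The update rule, the `ActuatorCredit` class, and the parameters `decay` and `norm` are assumptions, not the formulas from the AGINAO papers.

```python
from collections import deque


class ActuatorCredit:
    """Hypothetical delayed credit assignment for actuator concepts."""

    def __init__(self, decay=0.01, delay_ms=300, norm=1.0):
        self.decay = decay          # assumed decay parameter of the running average
        self.delay_ms = delay_ms    # ~300 ms sensorimotor delay reported for NAO
        self.norm = norm            # assumed normalization factor
        self.avg_reward = 0.0
        self.pending = deque()      # (fire_time_ms, actuator, cost, baseline_avg)
        self.credit = {}

    def observe_reward(self, r):
        # Exponential moving average of the intrinsic reward stream.
        self.avg_reward += self.decay * (r - self.avg_reward)

    def fire(self, actuator, now_ms, cost):
        # Remember the average at firing time as the baseline for later comparison.
        self.pending.append((now_ms, actuator, cost, self.avg_reward))

    def settle(self, now_ms):
        """Credit every activation whose feedback delay has elapsed."""
        due = []
        while self.pending and now_ms - self.pending[0][0] >= self.delay_ms:
            due.append(self.pending.popleft())
        for _, actuator, cost, baseline in due:
            # Activations settled together count as overlapping and share the gain.
            gain = (self.avg_reward - baseline) / len(due)
            self.credit[actuator] = self.credit.get(actuator, 0.0) + (gain - cost) / self.norm
```

Comparing the average at settle time against the baseline stored at firing time is what lets an actuator receive credit for reward that only materializes after the sensorimotor delay.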

5. Hierarchical and Temporal Organization

The AGINAO hierarchy is explicitly structured along both spatial and temporal axes:

Pattern vs Concept: Patterns are any regularities in space/time; concepts instantiate codelets that operationalize these patterns as classifiers or effectors.
Hierarchy Levels: Atomic sensory concepts (level 0) encapsulate raw inputs (pixels, joint sensors). Higher-level, self-generated concepts aggregate outputs of lower-level nodes; actuator concepts form terminating leaf nodes.
Runtime Concurrency: Multiple (multi-threaded) instances of the same concept may process distinct data concurrently, propagating outputs to successor concepts, or spawning children upon termination.
Temporal Integration: The execution model accommodates temporal structures via explicit WAIT instructions, allowing threads to suspend execution and detect time-gapped, sequential patterns within the real world.
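The effect of a WAIT instruction can be imitated with a Python generator: each `yield` suspends the thread until the next sensory tick, so the thread can recognize one event followed by another within a bounded gap. The generator-based model, the tick granularity, and the names `sequence_detector` and `drive` are hypothetical; AGINAO threads run inside its VM, not as Python coroutines.

```python
def sequence_detector(first, second, max_gap):
    """Thread that matches `first`, then WAITs up to `max_gap` ticks for `second`."""
    event = yield                      # suspend until the first tick arrives
    while event != first:
        event = yield
    for _ in range(max_gap):
        event = yield                  # WAIT: suspend until the next tick
        if event == second:
            return "RET"               # time-gapped sequence detected
    return "EXIT"                      # gap exceeded


def drive(thread, events):
    """Feed one event per tick; return the thread's outcome, or None if still waiting."""
    next(thread)                       # run up to the first suspension point
    for event in events:
        try:
            thread.send(event)
        except StopIteration as done:
            return done.value
    return None
```

For instance, `drive(sequence_detector("A", "B", 3), ["x", "A", "y", "B"])` yields `"RET"`, whereas the same detector fed `["A", "y", "y", "y", "B"]` gives up with `"EXIT"` because `"B"` arrives after the three-tick gap.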

6. Empirical Performance, Scalability, and Limiting Factors

On physical NAO hardware (Intel Atom 1.6 GHz, Wi-Fi-connected), the AGINAO system generated hundreds of thousands of concepts during hours of unsupervised operation in an environment approximating a “preschool-like” setting. Heuristic search and filtering accelerate codelet discovery by roughly $10^{12}$-fold relative to brute-force enumeration, consistent with the reduction of the 7-instruction candidate pool from about $10^{20}$ to $10^8$.
Real-time constraints are maintained by internal economic mechanisms: per-thread budgets, prioritization, and expiration.
Scalability of the VM is inherently bounded by memory and search overhead, but dynamic pruning via RL plus multi-core parallelization (under investigation) can mitigate these effects. Interpreting codelets on the VM rather than running them natively costs only a constant-factor slowdown, while the VM's general-purpose instruction set preserves the cognitive layer's computational universality in principle.
Noted limitations include the risk of combinatorial explosion, incomplete instantiation of global fitness functions, the necessity of a hand-crafted core (the universality of which remains an open question), the complexity of numerous hyperparameters, and the challenge of real-world noise and partial observability on binary partitioning accuracy.

A plausible implication is that AGINAO provides a unique demonstration of a fully self-programming cognitive engine coupling intrinsic information-theoretic reward, open-ended symbolic composition, and reinforcement learning in real-time robotic platforms. While promising, full realization of human-level AGI will require advances in scalability, more sophisticated global objective models, and improved integration of perceptual-motor contingencies (Skaba, 2018, Skaba, 2018).

7. Key Algorithms and Formal Summary

The principal mathematical mechanisms central to AGINAO operation are summarized below:

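Algorithm	Expression	Context
Exploration Probability	probability depending on the successor values $Q(c, a_i)$	Exploration vs. exploitation
Action Selection	choice among successors $a_i$ according to $Q(c, a_i)$	Action selection among successors
TD-Learning Update	$Q(c, a)$ moved toward reward plus expected downstream value	Reinforcement learning
Concept Intrinsic Reward	$p = \frac{N_{\rm pos}}{N_{\rm pos} + N_{\rm neg}}$, $r = -p \log_2 p$, $-\log_2 p$ bits per positive detection	Information-theoretic evaluation
Average Reward Update	running average with decay parameter	Global reward estimation
Actuator Credit Update	change in average reward per overlapping activation, net of actuation cost, normalized	Actuator concept value assignment
Hash-Pooling Test	reject if $\text{count}[h] > 2 \cdot \frac{T}{N}$, $h = \mathrm{hash}(p) \bmod N$	Redundant codelet pruning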

This formalization enables the system to autonomously explore, select, and organize cognitive modules, directly coupling physical sensorimotor interaction with continual self-modification and open-ended learning (Skaba, 2018, Skaba, 2018).


References

Skaba (2018). The AGINAO Self-Programming Engine.

Skaba (2018). Evaluating Actuators in a Purely Information-Theory Based Reward Model.
