AGINAO: Self-Programming Cognitive Engine
- AGINAO is an AGI architecture combining self-programming, intrinsic motivation, and on-line RL to autonomously discover and compose sensorimotor-based cognitive modules.
- The system employs a dual-layer design with a hand-crafted core for stability and a self-programmed cognitive layer running on a custom virtual machine inspired by a Universal Turing Machine.
- Heuristic program generation, hash pooling, and runtime validation effectively prune and integrate codelets into a dynamic, hierarchical concept graph.
AGINAO is an architecture for human-level artificial general intelligence (AGI), realized as a self-programming, open-ended cognitive engine embedded in the Aldebaran NAO humanoid robot. Its approach combines stochastic code generation, information-theoretic intrinsic motivation, hierarchical program composition, and on-line reinforcement learning operating over a virtual machine (VM) simulating a Universal Turing Machine (UTM). The system is designed to autonomously discover, evaluate, and compose computational modules (“concepts”) interacting with real-world sensorimotor experience (Skaba, 2018, Skaba, 2018).
1. System Architecture and Virtual Machine Model
AGINAO bifurcates its software into two distinct layers:
- Core Layer: Fully hand-crafted, responsible for stability, real-time scheduling, exception handling, and atomic I/O services (e.g., sensor polling, actuator command invocation). This layer operates natively on the robot’s host CPU.
- Cognitive Layer: Entirely self-programmed, runs as a multi-threaded, embedded control program atop a custom 16-bit, multi-tape TM-inspired virtual machine. The VM features accumulator and index registers, integer word memory, control flags, and exposes an instruction set adequate for general computation.
All higher-level cognitive structures, or "concepts", are represented as compact sequences of VM instructions (“codelets”). These codelets are instantiated as nodes in a dynamic, hierarchical graph, with directed links indicating data flow and execution dependency among concepts.
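To make the codelet idea concrete, the following is a minimal sketch of an accumulator-style interpreter. The opcode names (`LOAD`, `ADD`, `SUB`, `JZ`), the `ToyVM` class, and the step budget are hypothetical choices for illustration; only the RET/EXIT terminations, the accumulator, the integer word memory, and the 16-bit word size are taken from the description above, and the real AGINAO instruction set is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical opcodes; the actual AGINAO instruction set is not shown here.
LOAD, ADD, SUB, JZ, RET, EXIT = range(6)


@dataclass
class ToyVM:
    memory: list          # integer word memory visible to the codelet
    acc: int = 0          # accumulator register
    pc: int = 0           # program counter
    max_steps: int = 64   # step budget guarding against infinite loops

    def run(self, codelet):
        """Execute a list of (opcode, argument) pairs.

        Returns "RET" (positive), "EXIT" (negative), or "FAULT".
        """
        for _ in range(self.max_steps):
            if not 0 <= self.pc < len(codelet):
                return "FAULT"                 # ran off the program
            op, arg = codelet[self.pc]
            self.pc += 1
            if op in (LOAD, ADD, SUB):
                if not 0 <= arg < len(self.memory):
                    return "FAULT"             # out-of-bounds memory access
                word = self.memory[arg]
                if op == LOAD:
                    self.acc = word
                elif op == ADD:
                    self.acc = (self.acc + word) & 0xFFFF   # 16-bit wraparound
                else:
                    self.acc = (self.acc - word) & 0xFFFF
            elif op == JZ:
                if self.acc == 0:
                    self.pc = arg              # jump when accumulator is zero
            elif op == RET:
                return "RET"
            elif op == EXIT:
                return "EXIT"
            else:
                return "FAULT"                 # illegal opcode
        return "FAULT"                         # step budget exhausted


# A five-instruction codelet that returns RET when memory[0] == memory[1].
EQUAL = [(LOAD, 0), (SUB, 1), (JZ, 4), (EXIT, 0), (RET, 0)]
```

Running `ToyVM(memory=[7, 7]).run(EQUAL)` ends in `"RET"`, while unequal inputs end in `"EXIT"`. A codelet in this sense is a tiny classifier over its input words.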
2. Self-Programming and Concept Discovery
AGINAO continuously expands its concept graph via a staged pipeline:
- Heuristic Program Generation: The Program Generator assembles candidate codelets of 4–8 instructions. More than 30 heuristics immediately eliminate code fragments with illegal operations (e.g., use-before-init, unconditional jumps into invalid locations, unreachable RET), reducing the size of the candidate pool for 7-instruction codelets.
- Hash Pooling: Redundant code generation is curtailed by hash-based frequency control. A candidate codelet is rejected when it fails a frequency test that depends on the total number of releases and the size of the hash pool.
- Sanity Checks: Instantiated codelets undergo early-stage runtime validation, verifying avoidance of out-of-bounds access, illegal opcodes, infinite loops, or persistent errors. Repeatedly failing codelets and their incident links are pruned.
- Concept Hierarchy Integration: Validated codelets become nodes in the hierarchy, with formalized I/O signatures, static memory, maximum output sizing, and links to both upstream and downstream concepts (including potential actuator templates).
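The first two pipeline stages can be sketched as follows. This is an illustration under stated assumptions, not the system's actual filter: the opcode names, the two static checks shown (out of more than 30 heuristics), and in particular the rejection rule in `HashPool.try_release` (reject when a bucket holds more than `tolerance` times the mean bucket load) are hypothetical stand-ins, since the exact rejection condition is not given here.

```python
import hashlib

# Hypothetical opcode names used only for this sketch.
RET, EXIT, JMP, OTHER = "RET", "EXIT", "JMP", "OTHER"


def passes_static_checks(codelet):
    """Reject codelets of the wrong length, with invalid jump targets,
    or with no reachable RET instruction."""
    n = len(codelet)
    if not 4 <= n <= 8:
        return False
    if any(op == JMP and not 0 <= arg < n for op, arg in codelet):
        return False
    # Walk the control flow from instruction 0 to find reachable instructions.
    seen, stack = set(), [0]
    while stack:
        pc = stack.pop()
        if pc in seen or pc >= n:
            continue
        seen.add(pc)
        op, arg = codelet[pc]
        if op == JMP:
            stack.append(arg)          # unconditional jump
        elif op not in (RET, EXIT):
            stack.append(pc + 1)       # fall through to the next instruction
    return any(codelet[pc][0] == RET for pc in seen)


class HashPool:
    """Frequency control over codelet hashes (assumed rejection rule)."""

    def __init__(self, size=1024, tolerance=4.0):
        self.size = size               # number of hash buckets
        self.tolerance = tolerance     # allowed multiple of the mean load
        self.counts = [0] * size
        self.releases = 0              # total accepted codelets

    def _bucket(self, codelet):
        digest = hashlib.sha256(repr(codelet).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def try_release(self, codelet):
        b = self._bucket(codelet)
        mean_load = self.releases / self.size
        if self.counts[b] > self.tolerance * mean_load:
            return False               # this hash is over-represented
        self.counts[b] += 1
        self.releases += 1
        return True
```

Under this assumed rule, an immediate duplicate is rejected, but the same codelet can be released again once enough other codelets have raised the mean bucket load.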
The complexity-based curriculum search strategy biases concept discovery towards shorter and faster codelets, reflecting the prevalence of low Kolmogorov-complexity patterns in real-world sensorimotor data.
3. Intrinsic Reward, Reinforcement Learning, and Thread Management
AGINAO implements fully online, multi-threaded reinforcement learning, guided primarily by an intrinsic, information-theoretic reward model:
- Thread Dynamics: Each execution of a concept is instantiated as a lightweight thread, characterized by a priority (proportional to its action value), a resource (CPU time) budget, and an expiration timestamp. Threads failing to terminate productively are discarded.
- Binary Space Partitioning: Each codelet partitions its input vector space into "positive" (RET) and "negative" (EXIT) regions, registering the empirical match probability as
$$p = \frac{n_{+}}{n_{+} + n_{-}}$$
where $n_{+}$ and $n_{-}$ are cumulative positive and negative event counters.
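- Intrinsic Reward Metric: With $p$ denoting the empirical match probability of a codelet, the expected instantaneous reward per execution is
$$R(p) = -p \log_2 p$$
which is maximal at $p = 1/e \approx 0.37$. The corresponding information-theoretic quantity per positive detection is $-\log_2 p$.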
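- Exploration/Exploitation: For a concept with several outgoing actions, each carrying an action value, the system either explores, with a probability set by the exploration rule, or exploits by selecting among the outgoing actions according to their values.
- Temporal-Difference Learning: When an action terminates, its value is updated by a temporal-difference rule whose target includes the expected downstream value.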
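The binary-partitioning statistics can be illustrated with a short sketch. It assumes that the match probability is the fraction of positive terminations and that the expected reward takes the entropy-like form −p·log₂(p). Both forms are assumptions consistent with the description above rather than a confirmed transcription, and the function names are hypothetical.

```python
import math


def match_probability(n_pos, n_neg):
    """Empirical probability of a positive (RET) termination."""
    total = n_pos + n_neg
    return n_pos / total if total else 0.0


def information_per_detection(p):
    """Self-information, in bits, of one positive detection (assumed form)."""
    return -math.log2(p)


def expected_reward(p):
    """Expected reward per execution: p times the information per detection."""
    return -p * math.log2(p) if 0.0 < p < 1.0 else 0.0


# Coarse grid search for the match probability that maximizes expected reward.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=expected_reward)
```

Under these assumptions the maximum falls near 1/e: codelets that fire on roughly a third of their inputs earn the most, while codelets that almost always or almost never match earn little.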
Positive thread terminations deliver bonuses, enabling deeper branching and greater concept proliferation.
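As a rough illustration of the learning loop, the sketch below uses a constant exploration probability `epsilon` with greedy exploitation, and a standard one-step temporal-difference update. These are textbook stand-ins, not AGINAO's actual rules, and `epsilon`, `alpha`, and `gamma` are hypothetical parameters.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an outgoing action from a dict mapping action -> value.

    Assumed rule: explore uniformly with probability epsilon,
    otherwise exploit the highest-valued action.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)


def td_update(q_values, action, reward, downstream_value,
              alpha=0.1, gamma=0.9):
    """One-step TD update (assumed form):
    Q <- Q + alpha * (reward + gamma * downstream_value - Q)."""
    target = reward + gamma * downstream_value
    q_values[action] += alpha * (target - q_values[action])
    return q_values[action]
```

For example, with `q = {"a": 0.0}`, calling `td_update(q, "a", reward=1.0, downstream_value=0.0, alpha=0.5)` moves the value halfway toward the target.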
4. Actuator Evaluation and Global Reward Model
Actuator concepts (leaf nodes representing effectors) do not compute intrinsic information gain, requiring indirect evaluation. Their value is determined by their influence on the global average reward, measured over real time:
- Actuator Value Update: Each actuator's credit is updated from the observed global average reward, taking into account the count of overlapping activations, the actuation cost for the given input, and a normalization factor.
- Global Reward Computation: The global average reward is maintained as a running estimate controlled by a decay parameter. Poorly credited actuator concepts are pruned from the hierarchy.
An empirical measurement on a deployed NAO robot established a characteristic sensorimotor feedback delay of approximately 300 ms following a visual stimulus, informing the dynamic assignment of credit in actuator evaluation (Skaba, 2018).
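One way such delayed, indirect credit assignment could look is sketched below. The exponential moving average, the credit formula, and all names (`GlobalReward`, `update_actuator_credit`, `feedback_due`, `rate`) are assumptions made for illustration; only the roughly 300 ms feedback delay, the overlap count, the actuation cost, and the existence of a decay parameter come from the text.

```python
FEEDBACK_DELAY_S = 0.3   # approximate sensorimotor feedback delay reported above


class GlobalReward:
    """Running global reward estimate (assumed exponential moving average)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = 0.0

    def update(self, reward):
        self.value = self.decay * self.value + (1.0 - self.decay) * reward
        return self.value


def feedback_due(activation_time, now):
    """True once enough time has passed for the actuation's effect to be seen."""
    return now - activation_time >= FEEDBACK_DELAY_S


def update_actuator_credit(credit, reward_before, reward_after,
                           overlap, cost, rate=0.1):
    """Move an actuator's credit toward its share of the reward change.

    The change in global reward is split evenly among overlapping
    activations and reduced by the actuation cost (assumed formula).
    """
    share = (reward_after - reward_before) / max(overlap, 1) - cost
    return credit + rate * (share - credit)
```

In this sketch an actuator that fires alone and is followed by a rise in global reward gains credit, while one whose cost exceeds its share loses credit and becomes a pruning candidate.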
5. Hierarchical and Temporal Organization
The AGINAO hierarchy is explicitly structured along both spatial and temporal axes:
- Pattern vs Concept: Patterns are any regularities in space/time; concepts instantiate codelets that operationalize these patterns as classifiers or effectors.
- Hierarchy Levels: Atomic sensory concepts (level 0) encapsulate raw inputs (pixels, joint sensors). Higher-level, self-generated concepts aggregate outputs of lower-level nodes; actuator concepts form terminating leaf nodes.
- Runtime Concurrency: Multiple (multi-threaded) instances of the same concept may process distinct data concurrently, propagating outputs to successor concepts, or spawning children upon termination.
- Temporal Integration: The execution model accommodates temporal structures via explicit WAIT instructions, allowing threads to suspend execution and detect time-gapped, sequential patterns within the real world.
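The thread economy that underlies this concurrency (priority, budget, expiration) can be sketched with a priority queue. The `ConceptThread` and `Scheduler` names and the field layout are hypothetical, and the real scheduler lives in the hand-crafted core and is not described in detail here.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ConceptThread:
    sort_key: float                          # negated action value (min-heap)
    seq: int                                 # tie-breaker: spawn order
    concept: str = field(compare=False)
    budget: int = field(compare=False)       # remaining execution steps
    expires_at: float = field(compare=False) # expiration timestamp


class Scheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def spawn(self, concept, action_value, budget, expires_at):
        """Queue a new thread; higher action value means higher priority."""
        thread = ConceptThread(-action_value, next(self._seq),
                               concept, budget, expires_at)
        heapq.heappush(self._heap, thread)

    def next_thread(self, now):
        """Return the highest-priority live thread, discarding expired
        or budget-exhausted ones; None when nothing is runnable."""
        while self._heap:
            thread = heapq.heappop(self._heap)
            if thread.expires_at > now and thread.budget > 0:
                return thread
        return None
```

Several threads of the same concept can be queued at once, each with its own budget and deadline, which mirrors the concurrent instances described above.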
6. Empirical Performance, Scalability, and Limiting Factors
- On physical NAO hardware (Intel Atom 1.6 GHz, Wi-Fi-connected), the AGINAO system generated hundreds of thousands of concepts during hours of unsupervised operation in a setting approximating a "preschool-like" environment. Heuristic search and filtering accelerate codelet discovery approximately 3-fold relative to brute-force methods.
- Real-time constraints are maintained by internal economic mechanisms: per-thread budgets, prioritization, and expiration.
- Scalability of the VM is inherently bounded by memory and search overhead, but dynamic pruning via RL and multi-core parallelization (under investigation) can mitigate these effects. Interpreting codelets on the VM introduces only a constant-factor slow-down relative to native execution, and, per the Church–Turing thesis, the VM preserves theoretical completeness.
- Noted limitations include the risk of combinatorial explosion, incomplete instantiation of global fitness functions, the necessity of a hand-crafted core (the universality of which remains an open question), the complexity of numerous hyperparameters, and the challenge of real-world noise and partial observability on binary partitioning accuracy.
A plausible implication is that AGINAO provides a unique demonstration of a fully self-programming cognitive engine coupling intrinsic information-theoretic reward, open-ended symbolic composition, and reinforcement learning in real-time robotic platforms. While promising, full realization of human-level AGI will require advances in scalability, more sophisticated global objective models, and improved integration of perceptual-motor contingencies (Skaba, 2018, Skaba, 2018).
7. Key Algorithms and Formal Summary
The principal mathematical mechanisms central to AGINAO operation are summarized below:
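| Algorithm | Quantity defined | Context |
|---|---|---|
| Exploration Probability | Probability that a concept explores rather than exploits | Exploration vs. exploitation |
| Action Selection | Choice among outgoing actions according to their values | Action selection among successors |
| TD-Learning Update | Action-value update on termination, including expected downstream value | Reinforcement learning |
| Concept Intrinsic Reward | Match probability, expected reward per execution, information per positive detection | Information-theoretic evaluation |
| Average Reward Update | Running global reward estimate with a decay parameter | Global reward estimation |
| Actuator Credit Update | Credit from overlap count, actuation cost, and normalization factor | Actuator concept value assignment |
| Hash-Pooling Test | Rejection condition on total releases and hash pool size | Redundant codelet pruning |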
This formalization enables the system to autonomously explore, select, and organize cognitive modules, directly coupling physical sensorimotor interaction with continual self-modification and open-ended learning (Skaba, 2018, Skaba, 2018).