
Jurassic-X: Modular Neuro-Symbolic AI

Updated 16 December 2025
  • Jurassic-X is a production-grade system implementing the MRKL architecture that combines a neural LLM with expert symbolic modules.
  • It employs a directed acyclic graph with softmax-gated routing to delegate queries, enabling multi-hop compositional reasoning.
  • The system integrates dynamic external knowledge retrieval with precise symbolic reasoning, achieving high performance on diverse tasks.

Jurassic-X is AI21 Labs’ production-grade instantiation of the Modular Reasoning, Knowledge and Language (MRKL) system architecture, designed to overcome intrinsic limitations of LLMs in reasoning, factuality, and up-to-date knowledge access. It achieves this by orchestrating a dynamic interaction between a base neural LLM (Jurassic-1), a suite of external “expert” modules (both neural and symbolic), and a learned router that adjudicates the delegation of user queries to appropriate modules. Jurassic-X delivers a scalable, extensible, and interpretable architecture for complex question answering and reasoning tasks, combining the advantages of neural and symbolic paradigms (Karpas et al., 2022).

1. System Architecture and Data Flow

At the core of Jurassic-X is a directed acyclic graph G = (V, E), where the vertices comprise the query router R, the base LLM, a set of expert modules {M_1, ..., M_k}, the user, and the output node. Edges define permissible data flows and multi-hop interaction protocols:

  • User → Router
  • Router → Module M_i
  • Module M_i → Router (for iterative/multi-hop logic)
  • M_i → Output or LLM → Output

The router maps a user query u ∈ U to a selection of expert modules and their arguments via f_router: U → {(i, args_i)}, where i ∈ {1, ..., k, LLM}. Each module M_i implements a function f_i(args_i) → y_i.

Data Flow Sequence:

  1. The user query u is encoded to a hidden representation h = Encoder(u) by the LLM’s encoder.
  2. Module selection is computed by softmax gating: p = softmax(W_r h + b_r), with i* = argmax_i p_i.
  3. For neural experts, u or a templated prompt is forwarded; for symbolic modules, the router uses a dedicated argument-extraction head to yield args*, then invokes f_{i*}(args*).
  4. The expert’s response y* is either output directly or recursively re-routed for further composition.
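
The data-flow sequence above can be sketched as follows. This is a minimal illustration, not AI21's implementation: the encoder stand-in, expert names, and router weights are all hypothetical placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(query):
    # Stand-in for the LLM encoder h = Encoder(u); a real system would
    # return the model's pooled hidden state, not a hash-seeded vector.
    rng = np.random.default_rng(abs(hash(query)) % 2**32)
    return rng.standard_normal(8)

EXPERTS = ["calculator", "retriever", "llm"]   # {M_1, ..., M_k, LLM}
W_r = np.zeros((3, 8))                         # toy router weights
b_r = np.array([0.0, 0.0, 1.0])                # toy bias (favors the LLM)

def route(query):
    h = encode(query)                          # step 1: h = Encoder(u)
    p = softmax(W_r @ h + b_r)                 # step 2: softmax gating
    return EXPERTS[int(np.argmax(p))], p       # step 3: select expert i*

expert, probs = route("What is 2845 * 1792?")
```

With these toy weights every query routes to the LLM; a trained W_r would separate arithmetic, retrieval, and open-ended queries.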

The architecture is designed for extensibility (adding new modules with minimal retraining) and for interpretability (by enforcing explicit delegation and argument extraction pathways).

2. Knowledge Retrieval and External Information Sources

Jurassic-X incorporates dynamic external knowledge through a dedicated retrieval subsystem. The design abstracts document collections, database tuples, and APIs as text chunks d_j, each mapped to an embedding vector k_j ∈ R^d via a shared mapping function E_map (typically reusing the LLM’s pooling layer).

Given a query q, the router (or a specialized retriever) computes the query embedding E_map(q). The top-K relevant documents are identified by cosine similarity:

sim(q, k_j) = (q · k_j) / (‖q‖ ‖k_j‖)

Approximate nearest neighbor (ANN) search frameworks, such as FAISS, index and retrieve candidates efficiently.

An optional reranking network can further score candidates:

s(q, d_j) = σ(w_s [q; k_j] + b_s)

Returned text snippets or structured outputs are provided to downstream LLM or symbolic modules for answer composition.
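
The retrieval step can be sketched with exact cosine-similarity search; the toy 2-D embeddings stand in for E_map outputs, and a production system would use an ANN index such as FAISS rather than this brute-force scan.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Exact top-K retrieval by cosine similarity sim(q, k_j)."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = D @ q                        # cosine similarity to every document
    return np.argsort(-sims)[:k], sims

# Toy document embeddings standing in for E_map(d_j).
docs = np.array([[1.0, 0.0],
                 [0.9, 0.1],
                 [0.0, 1.0]])
idx, sims = top_k(np.array([1.0, 0.05]), docs, k=2)
```

An optional reranker would then rescore only the K returned candidates, keeping the expensive scoring network off the full corpus.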

3. Symbolic Reasoning Modules and Neural-to-Symbolic Interface

Jurassic-X exposes explicit symbolic reasoners for tasks that have proven brittle for LLMs:

  • Arithmetic calculator (supporting addition, subtraction, multiplication, division)
  • Currency and unit conversion APIs
  • External database lookup and logical query modules

To invoke symbolic modules, the router must extract formal argument tuples from free-form language queries. This is achieved by training a prompt-tunable argument-extraction head over synthetic data, with parameterization:

  • A learned prompt matrix P ∈ R^{10×d} prepended to each input
  • Jurassic-1 parameters θ_L kept frozen
  • Output is a linearized argument string (e.g., “mul 2845 1792”)
  • Training loss: cross-entropy, L_CE(P) = −Σ_{(u_i, s_i) ∈ D} log p(s_i | [P; Encode(u_i)]; θ_L)

The symbolic module then applies a deterministic computation, e.g., result = CALC(op, [x_1, ..., x_n]).
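
Such a deterministic calculator can be sketched as below, consuming the linearized argument-string format from the example above (“mul 2845 1792”); the function name and operator table are illustrative, not the system's actual interface.

```python
import operator

# Deterministic symbolic calculator: no neural components, so results
# are exact regardless of operand length.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}

def calc(arg_string):
    """Evaluate a linearized argument string such as 'mul 2845 1792'."""
    op, *operands = arg_string.split()
    xs = [float(x) for x in operands]
    result = xs[0]
    for x in xs[1:]:
        result = OPS[op](result, x)    # left-fold the operation
    return result
```

Because the computation is symbolic, the result for nine-digit operands is just as exact as for one-digit operands, which is precisely the regime where purely neural arithmetic degrades.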

4. Routing, Integration, and Compositional Multi-hop Reasoning

Routing decisions among the k+1 experts (including the base LLM) are made via softmax probabilities p_i(u). A threshold τ is set such that if max_i p_i < τ, the query falls back to the base LLM; otherwise, the highest-confidence expert i* is selected.

Jurassic-X supports multi-hop compositional reasoning: The router may iteratively dispatch intermediate results—e.g., first querying an external date API, then feeding the structured result to the arithmetic module as a new query—invoking itself recursively.
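
The threshold rule and recursive re-routing can be sketched as follows. The router heuristic, expert set, and the date-then-arithmetic example are hypothetical stand-ins for the learned components described above.

```python
TAU = 0.6  # confidence threshold τ for delegating to an expert

def dispatch(query, router, experts, llm, max_hops=3):
    """Route to the highest-confidence expert, falling back to the LLM."""
    probs = router(query)                      # p_i(u) over the experts
    best = max(probs, key=probs.get)
    if probs[best] < TAU or max_hops == 0:
        return llm(query)                      # fallback: base LLM answers
    answer, follow_up = experts[best](query)   # expert may emit a new query
    if follow_up is not None:                  # multi-hop: re-route result
        return dispatch(follow_up, router, experts, llm, max_hops - 1)
    return answer

# Toy two-hop pipeline: a "date API" hop feeding the arithmetic module.
def toy_router(q):
    return {"date": 0.9 if "year" in q else 0.1,
            "calc": 0.9 if q.startswith("sub") else 0.1}

def toy_calc(q):
    op, a, b = q.split()
    return (int(a) - int(b) if op == "sub" else None), None

experts = {
    "date": lambda q: (None, "sub 2025 1948"),  # hop 1: structured result
    "calc": toy_calc,                            # hop 2: deterministic math
}
answer = dispatch("How many years since 1948?", toy_router, experts,
                  llm=lambda q: "LLM fallback")
```

The max_hops bound guards the recursion; a query no expert claims with confidence above τ simply falls through to the LLM.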

The decoupled architecture mitigates the “module explosion vs. model explosion” problem: introducing new experts requires only lightweight router/prompt retraining, not full LLM fine-tuning, enabling scalable expansion of domain coverage.

5. Training Procedures and Performance

Training of the symbolic argument extractors is prompt-based: the router and symbolic module heads are tuned on a synthetic dataset spanning diverse operand types (1–9 digits, written as digits or as words), question formats, and operations. Hyperparameters include a learning rate of 0.3 with linear decay, batch size 32, prompt length 10 tokens, and weight decay λ = 10⁻³ to mitigate overfitting.

End-to-end fine-tuning involves joint optimization of router, argument extractor, and LLM on a mixture of annotated user logs and synthetic examples:

L_total = L_CE^routing + α · L_CE^arg_extract + β · L_LM(u, y_ref)

where L_LM denotes the cross-entropy over the LLM’s outputs.
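
A sketch of how the joint objective combines the three terms; the logits, targets, and the mixing weights α and β below are placeholders, since the text does not specify their values.

```python
import numpy as np

def cross_entropy(logits, target):
    """-log p(target) under a softmax over the logits."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def total_loss(l_routing, l_arg_extract, l_lm, alpha=1.0, beta=1.0):
    # L_total = L_CE^routing + α·L_CE^arg_extract + β·L_LM
    return l_routing + alpha * l_arg_extract + beta * l_lm

l_route = cross_entropy(np.array([2.0, 0.0, 0.0]), target=0)
loss = total_loss(l_route, l_arg_extract=0.3, l_lm=0.5)
```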

Performance on benchmark tasks demonstrates near-perfect accuracy in arithmetic argument extraction and robust generalization:

  • Digit-length generalization (1 to 9 digits): 100% accuracy on addition/multiplication, whereas GPT-3 baselines fall below 30% at 4 digits.
  • Robustness to question format and operation compositionality: >99% for four of five held-out phrasing types; >90% for 22 of 29 two-operation combinations.
  • Latency: the additional router-plus-symbolic-module round trip incurs <50 ms overhead, enabling sub-second response times.

6. Implementation Challenges and Solutions

Key challenges included:

  • Neuro-symbolic chasm: Extraction of discrete arguments from natural language with high lexical and phrasal variability. Addressed via large-scale synthetic data generation, prompt-tuning, and regularization.
  • Expert extensibility: Avoiding LLM retraining when adding new modules. Solved by architectural decoupling and minimal retraining confined to router and prompts.
  • Interpretability and debugging: Diagnosing failures in module selection and argument extraction. Each symbolic invocation records an explicit rationale (e.g., operation and arguments), facilitating error tracing.

The modular structure also enables systematic auditing and maintenance of individual module performance, supporting improved reliability and explainability over monolithic LLM deployments.

7. Significance and Implications

Jurassic-X demonstrates the practical viability of the MRKL neuro-symbolic paradigm in AI question answering and reasoning (Karpas et al., 2022). By fusing neural language understanding with symbolic precision and externalized knowledge, it achieves substantially higher accuracy and generalization—for instance, in systematic arithmetic extraction—than LLMs alone. The low-intrusion approach to extensibility, rapid compositional reasoning, and interpretable logic are indicative of architectural trends poised to address the factuality, controllability, and transparency deficits of foundation models. The Jurassic-X system represents a reference implementation for scalable, production-grade, modular neuro-symbolic AI.
