Query-Conditioned Deterministic Inference Networks
- QDINs are modular architectures that deterministically map structured queries and observed states to specialized answer spaces via dedicated state and query encoders and attention-based fusion.
- They integrate state and query encoders with specialized inference heads (reachability, path, comparison, policy) to efficiently extract actionable, interpretable outputs.
- Empirical results show that mixed-objective training balances precise inference and robust control, achieving high performance in both structured query answering and reinforcement learning tasks.
Query-Conditioned Deterministic Inference Networks (QDIN) are a class of architectures designed to answer families of structured queries about probabilistic or dynamical systems in a deterministic, efficient, and interpretable manner. Originating from the intersection of probabilistic graphical models, operator-theoretic learning, and reinforcement learning, QDINs generalize traditional inference networks by parameterizing the entire mapping from a query (specifying conditioning, question type, and parameters) and an observed state to the corresponding answer. In deterministic reinforcement learning, QDINs reimagine agents as modular, queryable knowledge bases: rather than only selecting actions, the agent can efficiently answer questions such as reachability, path extraction, state comparisons, or policy queries "on demand."
1. Formal Definition and Architectural Framework
Let 𝒮 denote the (possibly high-dimensional) state space and 𝒬 the set of structured queries. For each query q ∈ 𝒬, there is an associated answer space 𝒜_q. A QDIN is a parameterized mapping f_θ that takes a state–query pair (s, q) ∈ 𝒮 × 𝒬 to an answer f_θ(s, q) ∈ 𝒜_q, implemented by composing a state encoder, a query encoder, and a fusion mechanism (often attention-based) feeding into specialized, query-specific inference heads.
a. State and Query Encoding:
- Multi-scale convolutional layers extract hierarchical features from the input state (e.g., a stack of progressively coarser feature maps for an agent in a spatial grid).
- The query is mapped to an embedding: the discrete query type via a learned lookup table, the continuous parameters through an MLP; the concatenated embedding is layer-normalized.
b. Query–State Fusion:
- A single-head cross-attention module (as in Vaswani et al. 2017) allows the query embedding to attend over spatial state features, producing a fused, query-conditioned representation.
c. Specialized Heads:
Given the query type, the fused representation is routed to the corresponding inference head:
- Reachability: Produces a spatial mask denoting the states reachable within the queried number of steps.
- Path: LSTM-pointer network extracts explicit waypoints and estimated distances.
- Comparison: MLP-based absolute difference classifier for relative queries among goals.
- Policy: Standard action distribution as softmax for direct control queries.
This modular structure ensures that each query type is answered by a dedicated head operating in its own answer space.
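A minimal PyTorch sketch of this pipeline is given below, assuming a grid-world state tensor and showing only two of the four heads; all layer sizes, module names, and the exact fusion wiring are illustrative assumptions rather than the published architecture:

```python
import torch
import torch.nn as nn

class QDINSketch(nn.Module):
    """Minimal query-conditioned inference network (illustrative sizes only)."""

    def __init__(self, in_channels=3, num_query_types=4, param_dim=8, d=64, num_actions=4):
        super().__init__()
        # State encoder: stacked convolutions producing spatial feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, d, 3, padding=1), nn.ReLU(),
            nn.Conv2d(d, d, 3, padding=1), nn.ReLU(),
            nn.Conv2d(d, d, 3, padding=1), nn.ReLU(),
        )
        # Query encoder: type lookup table + MLP over continuous parameters.
        self.type_emb = nn.Embedding(num_query_types, d)
        self.param_mlp = nn.Sequential(nn.Linear(param_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.query_norm = nn.LayerNorm(d)
        # Single-head cross-attention: the query attends over spatial state tokens.
        self.fusion = nn.MultiheadAttention(embed_dim=d, num_heads=1, batch_first=True)
        # Two of the four heads, heavily simplified (the full heads are in Section 3).
        self.reach_head = nn.Linear(d, 1)            # per-cell reachability logit
        self.policy_head = nn.Linear(d, num_actions)

    def forward(self, state, query_type, query_params):
        # state: (B, C, H, W); query_type: (B,) long; query_params: (B, param_dim)
        feats = self.encoder(state)                               # (B, d, H, W)
        B, d, H, W = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)                 # (B, H*W, d) spatial tokens
        q = self.query_norm(self.type_emb(query_type) + self.param_mlp(query_params))
        fused, _ = self.fusion(q.unsqueeze(1), tokens, tokens)    # (B, 1, d)
        fused = fused.squeeze(1)
        # Query-conditioned spatial features for the reachability mask.
        spatial = tokens + fused.unsqueeze(1)
        reach_mask = torch.sigmoid(self.reach_head(spatial)).view(B, H, W)
        policy = torch.softmax(self.policy_head(fused), dim=-1)
        # In a full QDIN the query type would select which head's output to return.
        return {"reachability": reach_mask, "policy": policy}
```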
2. Training Objectives and Loss Structure
QDINs are trained with a multi-objective loss that jointly optimizes for control and various query-answering capacities:
- Control loss: a temporal-difference loss for value or policy learning.
- Query losses: losses specialized to each query type (per-pixel binary cross-entropy for reachability masks, a composite cross-entropy plus MAE term for paths, etc.).
- Consistency loss: enforces logical relations between heads (e.g., consistency between reachability and path length).
- Each per-query loss is normalized by an exponential moving average, mitigating head dominance during early training phases.
This structure explicitly promotes learning distinct, query-specific representations alongside traditional control, enabling high accuracy for diverse inference patterns.
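A hedged sketch of how such a mixed objective might be assembled, with the EMA normalization applied per head; the loss names, weights, and decay factor are assumptions for illustration, not values from the original work:

```python
import torch

class MixedObjectiveLoss:
    """Combine control, per-query, and consistency losses; each query loss is
    divided by an exponential moving average of its own magnitude so that no
    single head dominates early training. Weights and decay are illustrative."""

    def __init__(self, query_names=("reachability", "path", "comparison", "policy"),
                 decay=0.99, w_control=1.0, w_query=1.0, w_consistency=0.1):
        self.ema = {name: 1.0 for name in query_names}
        self.decay = decay
        self.w_control, self.w_query, self.w_consistency = w_control, w_query, w_consistency

    def __call__(self, control_loss, query_losses, consistency_loss):
        # query_losses: dict mapping query name -> scalar loss tensor
        total_query = 0.0
        for name, loss in query_losses.items():
            # Track the running magnitude of this head's loss.
            self.ema[name] = self.decay * self.ema[name] + (1 - self.decay) * float(loss.detach())
            total_query = total_query + loss / (self.ema[name] + 1e-8)
        return (self.w_control * control_loss
                + self.w_query * total_query
                + self.w_consistency * consistency_loss)
```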
3. Specialized Architectures in Practice
The QDIN architecture is strongly modular:
| Module | Inputs / Features | Output / Functionality |
|---|---|---|
| State Encoder | Input state through 3 Conv layers, with a skip connection from an early layer | Multi-scale spatial feature maps |
| Query Encoder | Query type (embedding), query params (MLP), layer norm | Query embedding |
| Fusion | Cross-attention of the query embedding over spatial features | Fused, query-conditioned representation |
| Heads | Fused representation, additional query params | Query-dependent output |
- Reachability head: two ConvTranspose blocks with a skip connection, producing a spatial mask via sigmoid activation.
- Path head: LSTM with pointer attention across the spatial grid, predicting a waypoint sequence plus a distance estimate.
- Comparison head: a small MLP classifier.
- Policy head: a linear layer mapping fused features to a softmax distribution over actions.
Each module is optimized for its query-specific inference pattern; ablating a module leads to significant degradation in the corresponding metrics (e.g., removing the specialized heads decreases reachability IoU by 0.18 and increases path MAE by 5.1).
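For concreteness, a hedged sketch of what the reachability decoder described above might look like; the channel counts, upsampling factors, and skip-connection wiring are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ReachabilityHead(nn.Module):
    """Upsample fused, query-conditioned features back to grid resolution and
    emit a per-cell reachability probability (illustrative channel sizes)."""

    def __init__(self, d=64):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(d, d // 2, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(d // 2, d // 4, kernel_size=2, stride=2)
        self.skip_proj = nn.Conv2d(d, d // 4, kernel_size=1)  # project encoder skip features
        self.out = nn.Conv2d(d // 4, 1, kernel_size=1)

    def forward(self, fused_map, skip_features):
        # fused_map: (B, d, H/4, W/4); skip_features: (B, d, H, W) from an early encoder layer
        h = torch.relu(self.up1(fused_map))
        h = torch.relu(self.up2(h))
        h = h + self.skip_proj(skip_features)                  # skip connection
        return torch.sigmoid(self.out(h))                      # (B, 1, H, W) reachability mask
```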
4. Inference–Control Representation Decoupling
A central empirical observation of QDINs in deterministic (e.g., grid-world) environments is the pronounced separation between accurate world inference and optimal policy learning:
- Training solely on inference (no control loss) yields near-perfect reachability masks (IoU ≈ 0.99) but poor navigation return (≈ 0.31).
- Control-only training achieves high navigation return (≈ 0.89) but markedly lower reachability IoU (≈ 0.72).
- Mixed-objective QDINs nearly recover both (IoU ≈ 0.97, return ≈ 0.82), as summarized in the table below.
| Training Mode | Reach IoU | Path MAE | Comp Acc | Policy Acc | Return |
|---|---|---|---|---|---|
| Control-Only | 0.72±0.03 | 8.4±0.5 | 0.74±0.02 | 0.81±0.02 | 0.89±0.03 |
| Query-Only | 0.99±0.01 | 1.2±0.1 | 0.92±0.01 | 0.43±0.04 | 0.31±0.05 |
| Mixed (Ours) | 0.97±0.01 | 2.1±0.2 | 0.88±0.02 | 0.76±0.02 | 0.82±0.03 |
These results (Zakershahrak, 11 Nov 2025, Table 1) indicate that the representations sufficient for exact world-structure inference are not necessarily those that support high reward. This decoupling implies that QDINs can serve as accurate knowledge bases irrespective of their control proficiency.
5. Empirical Performance and Comparative Evaluation
QDINs, evaluated across ablations, compositional generalization, inference efficiency, and calibration, show superior performance versus unified or post-hoc extraction baselines:
- Ablation: Removing specialized heads or cross-attention causes significant drops in reachability IoU, path accuracy, and return.
- Generalization: QDINs achieve 73% zero-shot accuracy on composite queries, compared to 41% for monolithic architectures.
- Efficiency: QDINs answer reachability in 5–10 ms at 0.97 IoU; A* search requires 100–200 ms for exact solutions.
- Calibration: Temperature scaling yields low expected calibration error (ECE); selective answering allows 95% accuracy at 80% coverage.
This suggests specialized, query-conditioned designs are more robust and efficient than sharing the same representation for all query types or extracting answers from monolithic control policies.
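A minimal sketch of the temperature-scaling and selective-answering procedure referenced in the calibration bullet above, assuming held-out validation logits and labels are available; the temperature grid and confidence threshold are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the softmax temperature that minimizes NLL on held-out data."""
    grid = grid if grid is not None else torch.linspace(0.5, 5.0, 46)
    nll = torch.stack([F.cross_entropy(val_logits / t, val_labels) for t in grid])
    return grid[int(nll.argmin())]

def selective_answer(logits, temperature, threshold=0.8):
    """Answer only when calibrated confidence clears the threshold; abstain
    (return -1) otherwise, trading coverage for accuracy."""
    probs = torch.softmax(logits / temperature, dim=-1)
    conf, pred = probs.max(dim=-1)
    return torch.where(conf >= threshold, pred, torch.full_like(pred, -1))
```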
6. Interpretability, Verification, and Modularity
QDINs provide directly interpretable outputs for each query type:
- Reachability masks offer explicit spatial visualization of possible agent locations within a horizon.
- Path outputs enumerate step-wise plans as human-understandable trajectories.
- Comparisons yield meaningful probabilistic preferences between alternative goals.
- Standard policies retain RL compatibility.
The modularity of QDINs supports formal safety verification (e.g., checking that unsafe states do not appear in reachable sets) and interactive human–AI teaming—for example, a user querying whether an agent can reach a location within a specified budget and instantly observing either path or coverage mask. Extensions to temporal logic or probabilistic query types are enabled by simply adding new neural modules, without model reengineering.
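As an illustration of the kind of safety check this enables, a small sketch that tests a predicted reachability mask against a map of unsafe cells (the mask format and threshold are assumptions):

```python
import torch

def reachable_set_is_safe(reach_mask, unsafe_cells, threshold=0.5):
    """Return True if no cell the model predicts as reachable (prob > threshold)
    is marked unsafe. reach_mask: (H, W) probabilities from the reachability
    head; unsafe_cells: (H, W) boolean map of forbidden states."""
    reachable = reach_mask > threshold
    return not bool((reachable & unsafe_cells).any())
```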
7. Connections to Related Architectures and Theoretical Underpinnings
QDINs generalize and unify previous approaches to deterministic and probabilistic inference:
- The structure mirrors the Query DAG (Q-DAG) approach for Bayesian networks, which compiles possible inferences into a single optimized data structure, enabling fast evaluation and update (Darwiche et al., 2013).
- In undirected graphical models, unrolled inference networks (e.g., QT-NN) answer arbitrary subsetting queries over observed variables, sidestepping intractable partition functions and amortizing inference via deterministic mappings (Lazaro-Gredilla et al., 2019).
- From an operator-theoretic perspective, QDINs embody the paradigm of learning a conditional expectation operator in a finite basis: encoding arbitrary queries as coefficients and producing answers by a single forward pass and inner product, as characterized in Neural Conditional Probability (Kostic et al., 1 Jul 2024).
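In schematic form (generic notation, not the exact construction of the cited works), the operator view amounts to learning state features u(x), a matrix W, and a basis {v_j} of output functions, so that any query expressible as g(y) ≈ Σ_j c_j v_j(y) is answered by one forward pass and an inner product:

$$\mathbb{E}\,[\,g(Y)\mid X=x\,] \;\approx\; \sum_{j=1}^{d} c_j\,\bigl(W\,u(x)\bigr)_j \;=\; \bigl\langle\, c,\; W\,u(x)\,\bigr\rangle .$$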
This modularity, combined with direct end-to-end training for diverse queries, positions QDINs as a general substrate for compositional, fast, and interpretable conditional inference, with applications ranging from explainable reinforcement learning to probabilistic reasoning and uncertainty quantification.