Interaction-Centric Knowledge Infusion
- Interaction-centric knowledge infusion is a paradigm that explicitly integrates structured external knowledge into neural architectures using relational context and dynamic interaction patterns.
- It employs modular infusion strategies across multiple domains, including language, vision, RL, and recommendation systems, enhancing alignment, data efficiency, and robustness.
- Empirical studies report significant improvements in accuracy, convergence speed, and stability, along with reductions in hallucination and bias, attributed to explicit interaction-aware mechanisms.
Interaction-Centric Knowledge Infusion is an architectural and methodological paradigm for integrating structured external knowledge into neural models in a manner that emphasizes the modeling, transfer, and utilization of explicit interactions—whether between tokens, objects, entities, or modality pairs—across deep learning systems. Distinguished from general knowledge infusion by the central focus on relational context, bidirectional information flow, and dynamic interaction patterns, this approach has become foundational in state-of-the-art language understanding, visual reasoning, reinforcement learning, and open-world recommendation systems.
1. Conceptual Foundations and Scope
Interaction-centric knowledge infusion targets the explicit encoding and propagation of interaction motifs—such as token-token, object-object, entity-relation, or user-item pairs—within neural architectures. The key tenet is that high-level reasoning and generalization require not merely the assimilation of discrete facts or attributes, but explicit grounding of model components in structured, relational, and interaction-aware priors. This paradigm is realized across a spectrum of domains:
- LLMs: Injecting knowledge graphs or interaction constraints into transformers to guide self-attention, mitigate hallucinations, and improve alignment (Roy et al., 2023, Faldu et al., 2021).
- Vision-language and vision-only models: Supervising transformers to preserve cross-modal interaction patterns, often via auxiliary query streams or interaction-based distillation (Gao et al., 23 Sep 2025, Li et al., 8 Nov 2025).
- Reinforcement learning: Structuring meta-policies and interaction primitives over knowledge-induced type spaces to enable policy transfer and rapid generalization (Kumar et al., 8 Feb 2024).
- Recommender systems: Decomposing knowledge into interaction factors and fusing them as features at the user-item or cluster level (Xi et al., 20 Aug 2024); a small sketch follows this list.
- Event and causality extraction: Initializing model weights or filters directly with centroids from frequent n-gram interactions, yielding fast convergence and robust relational patterning (Wang et al., 2021).
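As a concrete illustration of the recommender-system case above, the sketch below fuses knowledge-derived interaction factors with base user-item features before a scoring head. The function, feature dimensions, and linear head are hypothetical, not taken from Xi et al. (20 Aug 2024).

```python
import numpy as np

def fuse_interaction_factors(user_vec, item_vec, kg_factors, W_score):
    """Concatenate base user-item features with knowledge-derived
    interaction factors (e.g., cluster-level relation scores) and score."""
    x = np.concatenate([user_vec, item_vec, user_vec * item_vec, kg_factors])
    return float(W_score @ x)   # linear scoring head, for illustration only

# hypothetical shapes: 32-d user/item embeddings, 8 knowledge-derived factors
rng = np.random.default_rng(0)
user_vec, item_vec = rng.normal(size=32), rng.normal(size=32)
kg_factors = rng.normal(size=8)                 # e.g., cluster-level interaction features
W_score = rng.normal(size=32 + 32 + 32 + 8)
print(fuse_interaction_factors(user_vec, item_vec, kg_factors, W_score))
```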
2. Architectural Strategies for Infusion
A hallmark of interaction-centric approaches is the modularization of knowledge-insertion points, with systematic strategies for integrating knowledge at multiple levels:
| Model Type | Infusion Point(s) | Infusion Mechanism |
|---|---|---|
| Transformer NLP | Embedding, Self-Attention, FFN | Additive vectors/matrices |
| Vision Transformer | Interaction Query modules, Attention maps | Parallel queries, gated fusion |
| Detectors/Scene Graph | Prompt engineering, Cross-attention layers | Bidirectional relation prompts |
| Sequential/Conv | Convolutional filter initialization | Centroid seeding |
| RL/Meta-policy | Type graph embeddings, Meta-action spaces | Graph embedding, reward shaping |
In transformers (Roy et al., 2023), knowledge can be infused as per-token KG-derived vectors at the input, as additive bias matrices to attention scores (encoding pairwise concept relations), or post-attention as hidden-state augmentations. Scene-graph models (Li et al., 8 Nov 2025) integrate interaction knowledge at the prompt interface, using bidirectional prompts to constrain and inform cross-attention between subject-predicate-object triplets. In convolutional and sequential architectures, domain-interaction patterns serve as initialization prototypes for filters or submodules (Wang et al., 2021).
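To make the centroid-seeding idea concrete, the following sketch (a simplified illustration under assumed shapes, not the exact procedure of Wang et al., 2021) clusters embeddings of frequent n-grams and reshapes the centroids into a 1-D convolutional filter bank.

```python
import numpy as np
from sklearn.cluster import KMeans

def centroid_seeded_filters(ngram_embeddings, num_filters, ngram_len, emb_dim):
    """Cluster frequent n-gram embeddings and use the centroids to initialize
    a bank of 1-D convolutional filters of shape (num_filters, ngram_len, emb_dim)."""
    flat = ngram_embeddings.reshape(len(ngram_embeddings), -1)       # (N, ngram_len * emb_dim)
    km = KMeans(n_clusters=num_filters, n_init=10, random_state=0).fit(flat)
    return km.cluster_centers_.reshape(num_filters, ngram_len, emb_dim)

# hypothetical setup: 500 frequent trigrams with 50-d token embeddings
rng = np.random.default_rng(0)
ngram_embeddings = rng.normal(size=(500, 3, 50))
filters = centroid_seeded_filters(ngram_embeddings, num_filters=64, ngram_len=3, emb_dim=50)
print(filters.shape)   # (64, 3, 50) -- ready to copy into a Conv1d weight tensor
```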
3. Mathematical Formulation and Pseudocode Patterns
Interaction-centric knowledge infusion relies on explicit mathematical constructs to regulate how extrinsic knowledge modulates model representations and inductive biases:
- In multi-head self-attention, a pairwise graph-affinity matrix $C$ (derived from KG node embeddings) is added to the raw attention score matrix before the softmax, so that $\mathrm{Attn}(Q,K,V) = \mathrm{softmax}\big(QK^{\top}/\sqrt{d_k} + \lambda_{\ell}\, C\big)\,V$, with a per-layer gate $\lambda_{\ell}$ controlling whether the bias is applied at layer $\ell$ (Roy et al., 2023).
- In ViT-based vision systems, an auxiliary stream of interaction queries captures cross-modal interaction priors; the resulting interaction-strength maps are fused with the backbone attention maps through learned gates, e.g. $M = g \odot M_{\mathrm{int}} + (1-g) \odot M_{\mathrm{attn}}$, and the interaction stream is aligned to VLM-derived ground truth via an alignment loss such as $\mathcal{L}_{\mathrm{align}} = \lVert M_{\mathrm{int}} - M_{\mathrm{VLM}} \rVert_2^2$ (Gao et al., 23 Sep 2025); a gated-fusion sketch appears after the pseudocode below.
- In graph-based RL meta-policies, entity and relation types are embedded via GNNs; interaction primitives are mapped into this type-embedding space, and reward shaping encourages the activation of novel types and actions (Kumar et al., 8 Feb 2024); a reward-shaping sketch likewise follows the pseudocode below.
Representative pseudocode for transformer-based infusion (cf. Roy et al., 2023):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

E_KG = load_KG_embeddings()          # pretrained KG node embeddings
node_vecs = E_KG[x]                  # KG vectors for the input tokens x
corr_mat = node_vecs @ node_vecs.T   # pairwise KG affinity matrix

def transformer_block(h, layer_idx):
    heads = []
    for i in range(num_heads):
        Q, K, V = h @ W_q[i], h @ W_k[i], h @ W_v[i]
        scores = (Q @ K.T) / np.sqrt(d_k)
        if inject_matrix[layer_idx]:          # relational (matrix) infusion into attention
            scores = scores + corr_mat
        heads.append(softmax(scores) @ V)
    h_mid = FFN(np.concatenate(heads, axis=-1))
    if inject_vector[layer_idx]:              # latent (vector) infusion after attention
        h_mid = h_mid + node_vecs
    return LayerNorm(h_mid + h)
```
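The `inject_matrix` and `inject_vector` flags make the shallow-versus-deep distinction explicit: enabling them at a single layer yields shallow infusion, while enabling them across all layers corresponds to the deep, multi-layer strategy whose advantages are reported below.

For the gated ViT-style fusion above, the following sketch (the map shapes, sigmoid gate, and L2 alignment loss are illustrative assumptions, not the exact formulation of Gao et al., 23 Sep 2025) fuses an interaction-strength map with a backbone attention map and aligns the interaction stream to a VLM-derived target:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_interaction_fusion(attn_map, interaction_map, gate_logits):
    """Element-wise convex fusion of backbone attention and interaction-prior maps."""
    g = sigmoid(gate_logits)                          # learned gate in (0, 1)
    return g * interaction_map + (1.0 - g) * attn_map

def alignment_loss(interaction_map, vlm_target):
    """L2 alignment of the interaction stream to a VLM-derived ground-truth map."""
    return float(np.mean((interaction_map - vlm_target) ** 2))

# hypothetical 14x14 token-grid maps
rng = np.random.default_rng(0)
attn_map, interaction_map, vlm_target = rng.random((3, 14, 14))
fused = gated_interaction_fusion(attn_map, interaction_map, gate_logits=rng.normal(size=(14, 14)))
print(fused.shape, alignment_loss(interaction_map, vlm_target))
```

For the RL reward-shaping term, a minimal count-based sketch (the bonus form and coefficient are assumptions, not the exact scheme of Kumar et al., 8 Feb 2024) adds a decaying bonus whenever a knowledge-induced type-action pair is activated:

```python
import math

def shaped_reward(env_reward, activated_pairs, seen_counts, beta=0.1):
    """Augment the environment reward with a novelty bonus over
    knowledge-induced type / interaction-primitive activations."""
    bonus = 0.0
    for pair in activated_pairs:                     # e.g., ("door", "open")
        seen_counts[pair] = seen_counts.get(pair, 0) + 1
        bonus += 1.0 / math.sqrt(seen_counts[pair])  # decays as the pair becomes familiar
    return env_reward + beta * bonus

seen = {}
print(shaped_reward(1.0, [("key", "pickup"), ("door", "open")], seen))  # full novelty bonus
print(shaped_reward(1.0, [("door", "open")], seen))                     # reduced on repeat
```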
4. Empirical Effects and Evaluation
Interaction-centric infusion has demonstrated superior empirical performance, especially on benchmarks where relational reasoning or generalized transfer is critical. Notable results include:
- On language benchmarks (GLUE), deep knowledge infusion into XLNet raises MNLI accuracy from 72.3% (baseline) to 88.5%; on RTE, 83.6% → 90.4%, with 5–10 point gains in data efficiency and knowledge uptake metrics under deep infusion (Roy et al., 2023).
- In scene graph generation, ACC's interaction-centric infusion lifts zero-shot, open-vocabulary relation recall by 20–25% relative, and direct ablation shows that removing bidirectional interaction prompts reduces recall by 1.4–1.7% absolute (Li et al., 8 Nov 2025).
- In visual classification and cross-modal transfer, LFI supplies 1.6–3.3 point gains on TinyImageNet, 1.6–2.4 on COCO tasks, and outperforms non-interaction methods by 2.7× in human-aligned semantic consistency (Gao et al., 23 Sep 2025).
- In RL, KIX's interaction-centric framework enables immediate transfer to novel compositional tasks without retraining, with provable transfer bounds in type-graph space (Kumar et al., 8 Feb 2024).
- In event causality, convolutional semantic infusion yields F1 gains of 0.9–2.0 points and reaches peak accuracy in less than a third of the training epochs needed by baselines (Wang et al., 2021).
5. Impact on Hallucination, Alignment, and Generalization
A key effect of interaction-centric infusion is the systematic reduction of linguistic hallucinations, spurious associations, and misaligned outputs in LMs and VLMs. The explicit grounding of neural attention in verifiable entity or relational context:
- Reduces “confident invention” of unsupported facts by biasing attention to factual concept relations (Roy et al., 2023).
- Improves alignment with user intent through persistent exposure to meaningful interactions, both in inductive biases (attention) and latent representations (hidden states).
- Drives empirical advances in stability and robustness, as deep (multi-layer) infusion outperforms shallow/isolated strategies.
- In visual models, interaction-centric distillation ensures transfer of dynamic relational priors rather than static output vectors, leading to improved cross-domain generalization (Gao et al., 23 Sep 2025, Li et al., 8 Nov 2025).
6. Design Guidelines and Best Practices
Guidelines for implementing interaction-centric knowledge infusion include:
- Decompose architectures to clearly separate latent (vector) and relational (matrix) infusion sites (Roy et al., 2023, Wang et al., 2021).
- Precompute and compress external knowledge into forms that match model internals—e.g., node embeddings for per-token fusion, affinity/correlation matrices for attention layers.
- Use explicit control flags per layer or module to facilitate flexible ablation and rapid strategy iteration (see the configuration sketch after this list).
- Validate using both standard task metrics and interaction-sensitive evaluations (e.g., human semantic similarity, graph-informed accuracy, performance under data scarcity).
- For best effect, consider deep or hybrid strategies—repeatedly reinforcing knowledge at both inductive and latent loci—taking care to balance model capacity and noise robustness.
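As a minimal illustration of the per-layer control-flag guideline, a configuration object can make infusion sites explicit and easy to ablate; the names and structure below are illustrative, not taken from any cited paper.

```python
from dataclasses import dataclass, field

@dataclass
class InfusionConfig:
    """Per-layer switches for relational (attention-matrix) and latent
    (hidden-state vector) knowledge infusion; illustrative names."""
    num_layers: int = 12
    inject_matrix: list = field(default_factory=list)   # layers receiving the KG affinity bias
    inject_vector: list = field(default_factory=list)   # layers receiving KG node vectors

    def matrix_on(self, layer_idx: int) -> bool:
        return layer_idx in self.inject_matrix

    def vector_on(self, layer_idx: int) -> bool:
        return layer_idx in self.inject_vector

# deep infusion: reinforce knowledge at every layer
deep = InfusionConfig(inject_matrix=list(range(12)), inject_vector=list(range(12)))
# shallow ablation: only the first layer receives the affinity bias
shallow = InfusionConfig(inject_matrix=[0], inject_vector=[])
```

Sweeping such configurations directly supports the ablations and deep-versus-shallow comparisons discussed above.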
A plausible implication of these principles is that future systems benefiting from relational knowledge will increasingly adopt architectures designed around explicit, persistent streams or modules for interaction-aware reasoning, rather than post-hoc or ad-hoc fusion.
7. Limitations, Open Challenges, and Future Directions
While interaction-centric knowledge infusion systematically improves generalization, data efficiency, and fidelity to factual context, several challenges persist:
- Dependence on the coverage and quality of external knowledge resources, such as KGs or large VLMs. Domain adaptation remains sensitive to knowledge shift and incomplete interaction patterns.
- Computational cost and complexity when scaling to massively multi-domain or streaming settings, particularly in online recommender scenarios; cluster-based or collective knowledge extraction addresses but does not fully resolve these issues (Xi et al., 20 Aug 2024).
- The design of interaction templates, query prompts, and clusterings critically affects pseudo-label quality in weakly supervised or open-vocabulary settings (Li et al., 8 Nov 2025).
- In multi-modal and RL domains, ensuring stability and preventing interference among interaction modules, and providing scalable, task-agnostic meta-policies, remain active areas of research (Kumar et al., 8 Feb 2024).
The ongoing convergence of deep relational reasoning, explicit interaction modeling, and meta-cognitive architectures signals that future progress in knowledge infusion will rely increasingly on interaction-centric design as the substrate for general intelligence and robust transfer in learned systems.