Tucker Decomposed Query-Key Retrieval
- TDQKR is a retrieval mechanism that leverages Tucker decomposition to model multi-relational and temporal interactions in knowledge graphs, extending conventional dot-product scoring.
- It employs a shared core tensor for high-order parameterization, ensuring full expressiveness and efficient parameter sharing across static and temporal inference tasks.
- TDQKR demonstrates improved retrieval performance and scalability, outperforming models like RESCAL and bilinear schemes on benchmarks such as ICEWS2014.
Tucker Decomposed Query-Key Retrieval (TDQKR) refers to a family of retrieval and inference mechanisms wherein query-key compatibility is computed via a shared, learnable core tensor originating from Tucker decomposition. This approach generalizes classical dot-product attention or bilinear scoring schemes by allowing flexible, high-order parameterization of the query-key interactions. The foundation rests on tensor factorization models, notably TuckER for static knowledge graphs and its temporal extensions, in which a multilinear mapping via a core tensor enables expressive and efficient associative retrieval for link prediction and knowledge graph completion.
1. Tucker Decomposition in Query-Key Models
Tucker decomposition generalizes singular value decomposition (SVD) to higher-order tensors. In the context of knowledge graphs, facts are represented as triples (subject, relation, object) or quadruples (for temporal graphs) and embedded as a binary tensor $\mathcal{X}$:
- Static KG: $\mathcal{X} \in \{0,1\}^{n_e \times n_r \times n_e}$, where $n_e$ and $n_r$ denote the numbers of entities and relations.
- Temporal KG: $\mathcal{X} \in \{0,1\}^{n_e \times n_r \times n_e \times n_t}$, with $n_t$ the number of timestamps.
Tucker decomposition factorizes $\mathcal{X}$ into lower-dimensional factor matrices and a shared core tensor:
- For triples: $\mathcal{X} \approx \mathcal{W} \times_1 E \times_2 R \times_3 E$, with entity factor matrix $E \in \mathbb{R}^{n_e \times d_e}$, relation factor matrix $R \in \mathbb{R}^{n_r \times d_r}$, and core tensor $\mathcal{W} \in \mathbb{R}^{d_e \times d_r \times d_e}$.
- For quadruples: $\mathcal{X} \approx \mathcal{W} \times_1 E \times_2 R \times_3 E \times_4 T$, adding a timestamp factor matrix $T \in \mathbb{R}^{n_t \times d_t}$ and an order-4 core $\mathcal{W} \in \mathbb{R}^{d_e \times d_r \times d_e \times d_t}$.
The core tensor (editor's term: "retrieval operator") mediates all interactions, enabling rich multi-relational and temporal expressivity.
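As a concrete illustration, the factorized reconstruction above can be sketched in NumPy. All sizes and variable names here are illustrative, not taken from any particular implementation:

```python
import numpy as np

# Illustrative sizes: n_e entities, n_r relations, embedding dims d_e, d_r.
n_e, n_r, d_e, d_r = 50, 10, 8, 4

rng = np.random.default_rng(0)
E = rng.standard_normal((n_e, d_e))        # entity factor matrix
R = rng.standard_normal((n_r, d_r))        # relation factor matrix
W = rng.standard_normal((d_e, d_r, d_e))   # shared core tensor

# Tucker reconstruction of the (n_e, n_r, n_e) score tensor:
# X_hat[s, r, o] = sum_{i,j,k} W[i, j, k] * E[s, i] * R[r, j] * E[o, k]
X_hat = np.einsum('ijk,si,rj,ok->sro', W, E, R, E)
assert X_hat.shape == (n_e, n_r, n_e)
```

Every fact score is mediated by the single shared core `W`; only the factor matrices grow with the number of entities and relations.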
2. Mathematical Formulation and Scoring Functions
In TDQKR, given query and key embeddings, the retrieval score is computed via multilinear contraction with the core tensor:
- Static triplet score: $\phi(s, r, o) = \mathcal{W} \times_1 \mathbf{e}_s \times_2 \mathbf{w}_r \times_3 \mathbf{e}_o$
- Temporal quadruple score: $\phi(s, r, o, t) = \mathcal{W} \times_1 \mathbf{e}_s \times_2 \mathbf{w}_r \times_3 \mathbf{e}_o \times_4 \mathbf{e}_t$
where $\times_n$ denotes the tensor product along mode $n$, contracting over all tensor modes.
These scoring functions allow efficient batch computation ("1-N scoring"), with the query formed by the subject/relation pair (and possibly time), and keys as candidate object (and timestamp) embeddings. The logistic sigmoid is typically applied to the score to yield a probability estimate.
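A minimal NumPy sketch of this 1-N scoring scheme follows; the function name and dimensions are this sketch's assumptions:

```python
import numpy as np

def tucker_score(W, e_s, w_r, E_all):
    """1-N TuckER-style scoring: contract the core W (d_e, d_r, d_e) with a
    subject embedding e_s and relation embedding w_r to form the query,
    then score every candidate object in E_all (n_e, d_e) at once."""
    query = np.einsum('ijk,i,j->k', W, e_s, w_r)   # W x1 e_s x2 w_r
    logits = E_all @ query                         # inner product with all keys
    return 1.0 / (1.0 + np.exp(-logits))           # logistic sigmoid

rng = np.random.default_rng(1)
d_e, d_r, n_e = 8, 4, 100
W = rng.standard_normal((d_e, d_r, d_e))
E = rng.standard_normal((n_e, d_e))
w_r = rng.standard_normal(d_r)

probs = tucker_score(W, E[0], w_r, E)   # probabilities over all n_e candidates
assert probs.shape == (n_e,)
```

Forming the query once and scoring all keys with a single matrix-vector product is what makes the 1-N scheme efficient in batch settings.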
3. Expressiveness, Embedding Dimensionality, and Parameter Efficiency
The TDQKR paradigm as instantiated in TuckER and its temporal variant supports:
- Full expressiveness: TuckER is mathematically guaranteed to model any ground truth assignment over the KG triple tensor, provided sufficient embedding size ($d_e = n_e$, $d_r = n_r$). This is achieved by one-hot encoding and direct indexing into the core tensor.
- Parameter sharing: The core tensor enables efficient parameterization, with linear growth in the number of entities and relations, unlike quadratic scaling in models like RESCAL.
- Practical embedding bounds: Empirically, much lower embedding dimensions suffice due to knowledge graph structure, with strong generalization observed even for highly compact models.
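The scaling contrast with RESCAL can be made concrete with a parameter count. The formulas below follow the standard definitions of the two models; the specific sizes are illustrative only:

```python
def rescal_params(n_e, n_r, d):
    # RESCAL: entity vectors plus one full d x d matrix per relation,
    # so the relation term grows quadratically in the embedding dimension.
    return n_e * d + n_r * d * d

def tucker_params(n_e, n_r, d_e, d_r):
    # TuckER: entity/relation vectors plus a single shared core tensor;
    # growth in n_e and n_r is linear, the core cost is paid once.
    return n_e * d_e + n_r * d_r + d_e * d_e * d_r

# Illustrative graph: 40k entities, 230 relations, d = 200 (d_r = 30 for TuckER).
print(rescal_params(40_000, 230, 200))
print(tucker_params(40_000, 230, 200, 30))
```

At these sizes the per-relation matrices dominate RESCAL's budget, while TuckER's shared core keeps the relation-dependent cost to a small vector per relation.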
4. Retrieval and Inference Mechanisms
TDQKR architectures leverage the learned core tensor for high-throughput retrieval:
- For a given query $(\mathbf{e}_s, \mathbf{w}_r)$, the retrieval operator constructs a transformation that ranks all candidate entities via inner product.
- In the temporal setting $(\mathbf{e}_s, \mathbf{w}_r, \mathbf{e}_t)$, the order-4 core infers object candidates conditioned on both relation and timestamp.
- The core tensor can be interpreted as a set of basis transformations, mixed via relation (and timestamp) embeddings, allowing tailored relation-specific retrieval.
This mechanism generalizes dot-product attention and query-key retrieval to non-linear, multiway interactions.
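The basis-mixing view in the last bullet can be sketched as follows: contracting the core with a relation embedding yields a relation-specific matrix, which is exactly a weighted mixture of the core's mode-2 slices (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d_e, d_r, n_e = 8, 4, 50
W = rng.standard_normal((d_e, d_r, d_e))   # core: d_r basis matrices W[:, j, :]
E = rng.standard_normal((n_e, d_e))        # entity embeddings
w_r = rng.standard_normal(d_r)             # relation embedding

# Relation-specific transformation: mix the basis matrices by w_r.
M_r = np.einsum('ijk,j->ik', W, w_r)

# Retrieval: transform the subject query, then rank all candidates
# by inner product (descending).
query = E[0] @ M_r
ranking = np.argsort(-(E @ query))
```

The same core thus supports every relation; only the mixing weights `w_r` change per query.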
5. Temporal Extension: Tucker Decomposition-Based Retrieval
Temporal KGs are modeled via order-4 Tucker decomposition (Shao et al., 2020), providing:
- Temporal embeddings: Fact quadruples are scored via tensor contraction over subject, relation, object, and timestamp.
- Handling non-temporal facts: The TuckERTNT variant augments the temporal score with a time-independent component, supporting facts valid across all timestamps.
- Regularization: Temporal smoothness schemes encourage similarity among adjacent time embeddings, and Frobenius-type norm constraints mitigate overfitting.
The scoring function in temporal TDQKR is:

$\phi(s, r, o, t) = \mathcal{W} \times_1 \mathbf{e}_s \times_2 \mathbf{w}_r \times_3 \mathbf{e}_o \times_4 \mathbf{e}_t$
Performance significantly exceeds prior temporal KG models across ICEWS2014, ICEWS05-15, and GDELT datasets (e.g., MRR 0.604 on ICEWS2014 with TuckERTNT).
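A minimal sketch of the order-4 contraction and a simple adjacent-timestamp smoothness penalty follows. The exact regularizer varies by paper; the squared-difference form and all names below are this sketch's assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d_e, d_r, d_t = 8, 4, 6
W4 = rng.standard_normal((d_e, d_r, d_e, d_t))   # order-4 core tensor
e_s, w_r, e_o, e_t = (rng.standard_normal(d) for d in (d_e, d_r, d_e, d_t))

# Quadruple score: W x1 e_s x2 w_r x3 e_o x4 e_t (a scalar).
score = np.einsum('ijkl,i,j,k,l->', W4, e_s, w_r, e_o, e_t)

# Temporal smoothness: penalize differences between embeddings of
# adjacent timestamps so nearby times behave similarly.
T = rng.standard_normal((10, d_t))               # one embedding per timestamp
smoothness = np.sum((T[1:] - T[:-1]) ** 2)
```

The smoothness term is added to the training loss with a tunable weight; it is what links otherwise independent timestamp embeddings.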
6. Comparative Analysis and TDQKR's Role in Retrieval Architectures
TDQKR frameworks subsume prior linear and bilinear KG embedding models:
| Model | Scoring Function | Parameter Growth | Expressivity |
|---|---|---|---|
| DistMult | $\langle \mathbf{e}_s, \mathbf{w}_r, \mathbf{e}_o \rangle$ | linear | cannot model asymmetric relations |
| ComplEx | $\mathrm{Re}(\langle \mathbf{e}_s, \mathbf{w}_r, \bar{\mathbf{e}}_o \rangle)$ | linear | fully expressive (high dimensionality bound) |
| RESCAL | $\mathbf{e}_s^\top W_r \mathbf{e}_o$ | quadratic (in relations) | expressive but prone to overfitting |
| TuckER | $\mathcal{W} \times_1 \mathbf{e}_s \times_2 \mathbf{w}_r \times_3 \mathbf{e}_o$ | linear | fully expressive, efficient |
TuckER, and more generally TDQKR, is unique in combining full expressivity, efficient parameter sharing, and applicability to both static and temporal inference tasks. Various KG models (DistMult, ComplEx, SimplE) can be realized as special cases of TDQKR models by imposing structure (e.g., diagonal/banded core tensor).
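The special-case claim can be checked directly: with a superdiagonal core (ones on $W[i,i,i]$, zeros elsewhere) and $d_e = d_r$, the Tucker contraction collapses to DistMult's trilinear product. A small NumPy verification, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6

# Superdiagonal core: W[i, i, i] = 1, all other entries zero.
W = np.zeros((d, d, d))
W[np.arange(d), np.arange(d), np.arange(d)] = 1.0

e_s, w_r, e_o = rng.standard_normal((3, d))

# Full Tucker contraction vs. DistMult's trilinear product <e_s, w_r, e_o>.
tucker = np.einsum('ijk,i,j,k->', W, e_s, w_r, e_o)
distmult = np.sum(e_s * w_r * e_o)
assert np.isclose(tucker, distmult)
```

Analogous structural constraints on the core (block-diagonal, banded) recover ComplEx and SimplE, which is what makes TDQKR a unifying framework rather than one more scoring function.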
7. Significance, Limitations, and Application Domains
TDQKR architectures provide a theoretically grounded, modular, and scalable approach for knowledge graph completion, link prediction, and general query-key matching. The framework is adaptable for both static and temporal datasets, showing pronounced performance superiority on temporally dynamic KGs. Temporal smoothness regularization yields measurable MRR gains. A plausible implication is that higher-order TDQKR schemes could generalize further to multi-way knowledge fusion, conditional reasoning, or multi-hop inference.
While full expressivity requires embeddings with dimensionality equal to the cardinality of entities and relations, practical deployments achieve strong results with much lower dimensions, leveraging knowledge structure and shared parametrization in the core tensor. This suggests TDQKR is particularly well-suited for large, sparse, and multi-relational graphs or for retrieval in attention-based neural architectures.