TimeMKG: Causal Reasoning for Time Series

Updated 11 May 2026

TimeMKG is a multimodal framework that explicitly integrates a multivariate knowledge graph with numerical time series data to enable causal reasoning.
The architecture employs dual-branch encoding and cross-modality attention, combining semantic prompts with statistical patterns for robust forecasting and classification.
Quantitative evaluations demonstrate that TimeMKG achieves significant accuracy improvements while offering transparent, interpretable insights into causal relationships.

TimeMKG refers both narrowly to the "TimeMKG" multimodal causal reasoning framework for multivariate time series modeling and, more broadly, to the intersection of temporal knowledge graphs (TKGs), time series, and temporally structured reasoning mechanisms. TimeMKG, as formalized in "TimeMKG: Knowledge-Infused Causal Reasoning for Multivariate Time Series Modeling," advances time series analysis by explicitly representing domain knowledge as a multivariate knowledge graph (MKG) and fusing these semantic signals with the statistical patterns in numerical time series data. This approach extends foundational ideas in temporal KG completion and temporal KG-based question answering, such as those embodied in the Temporal Message Passing (TeMP) framework and Time-aware Multiway Adaptive Fusion networks, by establishing knowledge-informed inference pipelines for time series scenarios (Sun et al., 13 Aug 2025, Wu et al., 2020, Liu et al., 2023).

1. Foundations: Temporal Knowledge and Multivariate Representations

Traditional time series models treat each variable as an anonymous channel of numerical observations and ignore the semantic information attached to it: variable names, textual descriptions, and inter-variable domain relationships. TimeMKG reframes the problem by treating each multivariate time series $X_{1:T} \in \mathbb{R}^{T \times N}$ as a bimodal object:

Numerical modality: The matrix of sampled values.
Textual modality: Descriptions or headers $\mathcal{V} = \{v_1,\dots,v_N\}$.

By constructing an explicit MKG $\mathcal{G}_M$ whose nodes correspond to variables and whose edges encode directed, domain-informed relations (e.g., "causes," "influences"), TimeMKG injects external, interpretable causal structure directly into the modeling process. The graph is constructed using a retrieval-augmented LLM (LightRAG) that processes variable metadata and external textual knowledge into $(v_i, r, v_j)$ triplets, forming the explicit relational backbone for subsequent reasoning (Sun et al., 13 Aug 2025).
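The triplet representation can be illustrated with a minimal sketch. It assumes the extraction step has already produced $(v_i, r, v_j)$ triplets; the variable names, relations, and helper functions (`build_mkg`, `upstream`) are hypothetical and are not taken from the TimeMKG code or from LightRAG.

```python
from collections import defaultdict

# Hypothetical triplets of the form (head variable, relation, tail variable).
# In TimeMKG these would come from the LLM-based extraction step; the values
# here are invented for illustration only.
triplets = [
    ("ambient_temperature", "influences", "oil_temperature"),
    ("load", "causes", "oil_temperature"),
    ("ambient_temperature", "influences", "load"),
]

def build_mkg(triplets):
    """Index directed edges by tail node so upstream causes are easy to look up."""
    incoming = defaultdict(list)
    nodes = set()
    for head, relation, tail in triplets:
        nodes.update((head, tail))
        incoming[tail].append((head, relation))
    return nodes, dict(incoming)

def upstream(incoming, variable):
    """Return the (head, relation) pairs that point at `variable`."""
    return incoming.get(variable, [])

nodes, incoming = build_mkg(triplets)
parents = upstream(incoming, "oil_temperature")
```

Indexing edges by their tail node is one possible design choice; it makes the "which variables act on this one" query, the kind of lookup a per-variable causal prompt would need, a single dictionary access.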

2. TimeMKG Architecture: Dual-Modality Encoder and Cross-Modality Fusion

The TimeMKG architecture consists of four stages:

MKG Construction: Variable descriptions $\{\hat{\mathcal{S}}\}$, external knowledge $\mathcal{T}$, and the resulting textual input $\mathcal{I}$ are processed via LightRAG to generate the MKG structure $\mathcal{G}_M = (V, E, R)$.
Causal Prompt Formation: For every variable $v_k$, a query-based subgraph $\mathcal{G}_{M,v_k}$ is retrieved (capturing both global and local context). The result is a set of "causal prompts," one per variable: textual templates encoding the causal context for that variable.
Dual-Branch Encoding:
- Semantic branch: Each prompt is tokenized, embedded, projected, and processed by CPEncoder, a Pre-LN Transformer acting over the set of $N$ variable prompts, capturing static, domain-derived causal structure.
- Numerical branch: Historical data is transposed, embedded (one row per variable), and modeled by TSEncoder, a Transformer that encodes variable-level statistical patterns.
Cross-Modality Attention: The semantic embeddings and numerical embeddings are aligned via a standard cross-attention mechanism. For each variable, the resulting fused vector integrates both causal prior and observed statistical behavior at the variable level.
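The fusion stage can be sketched as single-head scaled dot-product cross-attention. This is an illustrative sketch under stated assumptions, not the paper's implementation: queries are taken from the numerical branch and keys/values from the semantic branch (consistent with the attention scores being read per numerical variable), a residual connection is added, the projection weights are random, and all names (`h_num`, `h_sem`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(h_num, h_sem, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    h_num: (N, d) numerical embeddings, used as queries (assumption).
    h_sem: (N, d) semantic prompt embeddings, used as keys and values (assumption).
    Returns the fused (N, d) representation and the (N, N) attention weights.
    """
    q, k, v = h_num @ w_q, h_sem @ w_k, h_sem @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    fused = h_num + weights @ v  # residual connection (assumption) keeps the numerical signal
    return fused, weights

rng = np.random.default_rng(0)
N, d = 4, 8  # 4 variables, embedding width 8 (illustrative sizes)
h_num = rng.normal(size=(N, d))
h_sem = rng.normal(size=(N, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = cross_attention(h_num, h_sem, w_q, w_k, w_v)
```

Each row of `weights` sums to one and indicates how strongly a given numerical variable attends to each variable's prompt.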

The cross-modality decoder (again using a Pre-LN Transformer) further models intra- and inter-variable dependencies before task-specific heads compute forecasts or classifications (Sun et al., 13 Aug 2025).

3. Downstream Inference and Learning Objectives

TimeMKG applies its fused representations to core time series tasks:

Forecasting: A linear projection head outputs the predicted future values; the loss is mean squared error (MSE).
Classification: A classification head outputs the class prediction and is trained with standard cross-entropy loss for multi-class prediction.

No explicit regularization is required beyond standard penalties, as the cross-modality attention "injects" domain priors implicitly. By design, this yields interpretable and robust predictions reflecting both data-driven correlations and structured semantic knowledge (Sun et al., 13 Aug 2025).
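For reference, the two objectives take their standard textbook forms. The notation below is introduced here for illustration and may differ from the paper's: $H$ is the forecast horizon, $N$ the number of variables, $C$ the number of classes, $\hat{x}$ and $x$ the predicted and observed values, $y_c$ the one-hot label, and $\hat{p}_c$ the predicted class probability.

$$
\mathcal{L}_{\text{MSE}} = \frac{1}{H N} \sum_{t=1}^{H} \sum_{n=1}^{N} \left( \hat{x}_{T+t,n} - x_{T+t,n} \right)^2,
\qquad
\mathcal{L}_{\text{CE}} = - \sum_{c=1}^{C} y_c \log \hat{p}_c
$$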

4. Interpretability and Causal Inspection

A central benefit of TimeMKG is direct interpretability:

The MKG $\mathcal{G}_M$ itself is human-auditable, storing all retrieved triplets and edges.
For any variable (forecast or classification), its embedding can be traced to the particular subgraph $\mathcal{G}_{M,v_k}$, exposing which upstream variables and relations are most influential.
Self-attention weights of CPEncoder correlate with static KG structure, while cross-attention scores reveal which semantic prompts (and thus which domain facts) are emphasized for each numerical variable during prediction.
In practice, the top-K attended prompts provide explanations for each prediction, facilitating root-cause analysis and model debugging (Sun et al., 13 Aug 2025).
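Reading off the top-K attended prompts can be sketched as follows. The sketch assumes access to an $N \times N$ cross-attention matrix whose rows are indexed by numerical variable and whose columns are indexed by prompt; the variable names, the weights, and the helper `top_k_prompts` are invented for illustration.

```python
import numpy as np

def top_k_prompts(weights, names, k=2):
    """For each variable (row), list the k prompts with the largest attention weight."""
    explanations = {}
    for i, name in enumerate(names):
        order = np.argsort(weights[i])[::-1][:k]  # indices sorted by descending weight
        explanations[name] = [(names[j], float(weights[i, j])) for j in order]
    return explanations

# Hypothetical variable names and a hand-written attention matrix (rows sum to 1).
names = ["load", "ambient_temperature", "oil_temperature"]
weights = np.array([
    [0.70, 0.20, 0.10],
    [0.15, 0.80, 0.05],
    [0.45, 0.35, 0.20],
])
explanations = top_k_prompts(weights, names, k=2)
```

With these made-up weights, the prediction for `oil_temperature` is attributed mainly to the `load` and `ambient_temperature` prompts, which is the kind of ranked list a practitioner could check against the stored triplets.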

5. Quantitative Evaluation and Ablations

TimeMKG demonstrates strong empirical performance across a suite of long/short-term forecasting (ETTh1/2, ETTm1/2, Weather, ILI, ICL, IoTFlow, Nasdaq, Internet, Battery) and multivariate classification tasks (SCP1/2, Ethanol, Heart, PEMS-SF):

Long-term forecasting: Achieves best results in 38/48 subtasks, with 7–15% MSE reduction versus the best non-knowledge baseline.
Short-term forecasting: Outperforms alternatives on all three reported metrics (SMAPE, MASE, OWA).
Classification: 71.0% average accuracy (vs. 69.7% for XGBoost, 68.2% for TimesNet, and 65.2% for PatchTST).
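For readers unfamiliar with the short-term metrics, the sketch below computes SMAPE and MASE using the M4-competition-style definitions. It is an assumption that the evaluation uses exactly these variants, and the data values are invented. OWA is omitted because it additionally requires scores from a reference naive forecaster.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (M4-style definition).

    Undefined when a true value and its prediction are both zero.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 200.0 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

def mase(y_true, y_pred, history, m=1):
    """Forecast MAE scaled by the in-sample MAE of a seasonal-naive forecast with period m."""
    y_true, y_pred, history = (np.asarray(a, float) for a in (y_true, y_pred, history))
    scale = np.mean(np.abs(history[m:] - history[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale

history = [10.0, 12.0, 11.0, 13.0]  # invented in-sample values
y_true = [14.0, 15.0]
y_pred = [13.0, 16.0]
s = smape(y_true, y_pred)
m_score = mase(y_true, y_pred, history, m=1)
```

A MASE below 1 means the forecast beats the naive in-sample baseline on average; in this toy case the forecast error is 1 against a naive error of 5/3.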

Ablation studies confirm:

Removing MKG construction (using only direct LLM prompts) sharply degrades performance.
Eliminating either the causal prompt encoder or the time-series encoder results in the largest accuracy drops.
Simpler cross-modality fusion (e.g., plain concatenation) or omitting the decoder module leads to a smaller but still significant performance loss.
Pre-stored prompts render TimeMKG more efficient than LLM-heavy or mixture-of-experts baselines (Sun et al., 13 Aug 2025).

6. Relation to Temporal Knowledge Graph QA and Completion

TimeMKG conceptually sits at the intersection of temporal KG-based reasoning and time series modeling.

Temporal KGQA (e.g., TMA): TMA extracts top-m SPO triples directly from a KG, uses multiway (concat/dot/minus) attention to mix question and KG representations, and leverages adaptive gating for interpretability. This approach yields large accuracy gains (e.g., +24 percentage points Hits@1 on CronQuestions complex queries), but focuses on natural language QA over TKGs (Liu et al., 2023).
Temporal KG Completion (e.g., TeMP): TeMP employs message passing over the temporal neighborhood of an entity, frequency-based gating, and time-encodings to learn dynamic, time-indexed entity representations. TeMP significantly improves entity prediction on benchmarks such as ICEWS14, ICEWS05-15, and GDELT (a 10.7% average relative improvement in Hits@10) (Wu et al., 2020).

TimeMKG extends these traditions by directly integrating knowledge graph structure with multivariate numerical signals in cross-modal settings, generalizing from completion/QA toward explicit, knowledge-grounded, causal time series analysis.

7. Limitations and Prospective Extensions

Known constraints of TimeMKG include:

Dependency on Knowledge Coverage: The causal reasoning quality is a function of MKG completeness and correctness. Insufficient or biased knowledge inputs limit the efficacy of injected priors.
Variable-Level Fusion: The entire fusion pipeline is variable-centric; scenarios with latent or evolving graph structure may require dynamic (re-)construction.
Generalization to Dynamic or Continuous Relations: The framework currently models static variable relationships; extension to dynamic temporal edges or time-evolving graph topologies is a plausible direction, as discussed in temporal message passing work (Wu et al., 2020).
Computational Efficiency: Although pre-stored prompts improve efficiency, initial knowledge graph construction (using LLMs and retrieval) may be computationally expensive for extremely high-dimensional datasets.

Natural directions for further development include learning dynamic causal graphs, integrating uncertainty, and scaling to streaming, continuously updating multivariate data. Attention-based neighbor selection, interval-labeled causal facts, and online adaptation of gating parameters also represent open research challenges (Sun et al., 13 Aug 2025, Wu et al., 2020).