Causal-Temporal Narrative (CTN)

Updated 10 April 2026

Causal-Temporal Narrative (CTN) is a formalism that combines explicit causal graphs with ordered timestamps to represent event sequences clearly and accurately.
It employs methodologies such as causal discovery, graph neural networks, and data-driven storytelling to extract and generate coherent narratives.
CTNs enhance applications in video captioning, story planning, and multimodal analysis by preserving both the 'when' and 'why' of complex events.

A Causal-Temporal Narrative (CTN) is a formalism for representing, extracting, or generating accounts of processes or stories in which event sequences are structured by both explicit causal relationships and temporal progression. CTNs integrate temporality and causality at the representational, algorithmic, and realization levels, operating across domains such as scientific explanation, story planning, video captioning, and multimodal document analysis. The essential goal is to enable robust reasoning or generation that preserves both “when” (chronological order) and “why” (causal mechanism) across complex event structures, yielding outputs interpretable by both machines and humans (Choudhry et al., 2020, Gong et al., 2023, Nadeem et al., 2024, Park et al., 2024, Zhang et al., 6 Jun 2025).

1. Formal Models and Structural Representations

The mathematical core of CTN is a combination of a directed acyclic causal graph with timestamped (or temporally ordered) nodes, often augmented with narrative or semantic segmentations (Choudhry et al., 2020, Gong et al., 2023).

Formally, a canonical CTN is expressed as:

$\text{CTN} = (V, E, T, W, \mathbb{N})$

$V = \{v_1, ..., v_n\}$ : event or state variable nodes
$E \subseteq V \times V$ : edges denoting directed (often weighted) causal links
$T: V \to \mathbb{R}^+$ : timestamp (or order) function
$W: E \to \mathbb{R}$ : edge weight function (strength or effect size)
$\mathbb{N}: V \to \text{Labels}$ : narrative or segment grouping

The temporal dimension may be explicit (timestamps) or based on narrative order. Causality is encoded as an irreflexive, asymmetric, and typically transitive relation. For narrative generation contexts, auxiliary constraints for character intentionality or plot coherence can be imposed (see IPOCL framework below) (Riedl et al., 2014).
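As a rough illustration of the tuple above, the following Python sketch stores $T$, $W$, and the narrative grouping as dictionaries and checks the structural constraints just described (known nodes, no self-loops, causes not later than effects, acyclicity). The class name `CTN`, its field names, and the `violations` method are hypothetical choices made for this example and do not come from any of the cited systems; treating equal timestamps as acceptable is likewise an assumption.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class CTN:
    """Illustrative container for (V, E, T, W, N); all names are assumptions."""
    timestamps: dict                               # T: node -> timestamp; keys double as V
    weights: dict                                  # W: (cause, effect) -> weight; keys double as E
    segments: dict = field(default_factory=dict)   # N: node -> narrative label

    def violations(self):
        """Return a list of problems; an empty list means the structure is consistent."""
        problems = []
        for cause, effect in self.weights:
            if cause not in self.timestamps or effect not in self.timestamps:
                problems.append(f"unknown node in edge {cause}->{effect}")
            elif cause == effect:
                problems.append(f"self-loop on {cause}")
            elif self.timestamps[cause] > self.timestamps[effect]:
                # Assumption: equal timestamps are tolerated, later causes are not.
                problems.append(f"{cause} occurs after its effect {effect}")
        # Acyclicity: TopologicalSorter expects a mapping node -> predecessors.
        preds = {node: set() for node in self.timestamps}
        for cause, effect in self.weights:
            if cause != effect:
                preds.setdefault(effect, set()).add(cause)
        try:
            tuple(TopologicalSorter(preds).static_order())
        except CycleError:
            problems.append("causal graph contains a cycle")
        return problems


flood = CTN(
    timestamps={"rain": 1.0, "flood": 2.0, "evacuation": 3.0},
    weights={("rain", "flood"): 0.8, ("flood", "evacuation"): 0.6},
    segments={"rain": "setup", "flood": "crisis", "evacuation": "response"},
)
reversed_pair = CTN(timestamps={"a": 2.0, "b": 1.0}, weights={("a", "b"): 1.0})
loop = CTN(timestamps={"a": 1.0, "b": 1.0}, weights={("a", "b"): 1.0, ("b", "a"): 1.0})
```

Here `flood.violations()` returns an empty list, while `reversed_pair` and `loop` each report one problem. Returning a list of problems instead of raising on the first one is a design choice that suits extraction pipelines, where noisy graphs are usually repaired rather than rejected.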

In video or multimodal CTN, nodes represent detected atomic events over sequential frames, with explicit cause–effect pairs mapped onto temporal intervals (Nadeem et al., 2024, Park et al., 2024).

2. Methodological Frameworks for CTN Extraction and Generation

The construction or extraction of CTNs combines causal discovery, temporal reasoning, entity/event modeling, and, in generative settings, natural language realization. Prevailing architectures and pipelines include:

Causal Discovery with Temporal Data

Multivariate Time Series (MTS): Causality computation via constraint-based (PC, FCI, PCMCI), score-based (NOTEARS, structural EM), or functional models (LiNGAM, ANM, TiMINo). Temporality is managed via lagged variables or sliding window graphs.
Event Sequences: Hawkes processes, graphical event models (GEMs), and neural point processes infer directed edges among irregularly spaced events, typically associating each edge with a lag or a kernel over the time difference between cause and effect (Gong et al., 2023).
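To make the idea of lagged variables concrete, the sketch below pairs each candidate cause at time t − lag with each candidate effect at time t and keeps the pairs whose correlation is strong. This is plain lagged-correlation screening, used here only as a stand-in: it is not PC, PCMCI, NOTEARS, or any other method named above, and it does not control for confounding. The function names, the threshold, and the synthetic data are assumptions made for the example.

```python
import random
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_candidates(series, max_lag, threshold):
    """series: dict name -> list of floats. Returns (cause, effect, lag) triples."""
    found = []
    for cause, xs in series.items():
        for effect, ys in series.items():
            if cause == effect:
                continue
            for lag in range(1, max_lag + 1):
                # Align the cause at time t - lag with the effect at time t.
                r = pearson(xs[:-lag], ys[lag:])
                if abs(r) >= threshold:
                    found.append((cause, effect, lag))
    return found


# Synthetic data (assumption): y follows x with a delay of one step.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.9 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 500)]
candidates = lagged_candidates({"x": x, "y": y}, max_lag=2, threshold=0.5)
```

On this synthetic series the only surviving candidate is the edge from `x` to `y` at lag 1. A real pipeline would replace the correlation test with conditional independence tests or a fitted model before treating any edge as causal.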

Integrated Extraction and Reasoning

Hybrid Pipelines: CATENA runs distinct but communicating classifiers for temporal (TLINK) and causal (CLINK) extraction, enforcing causality-precedence (causes must precede effects) during post-editing and closure operations (Mirza, 2016).
Graph Neural Methods: TC-GAT employs separate attention modules over temporal and causal relation graphs, with equilibrium gating for dynamic integration (Yuan et al., 2023).
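The causality-precedence rule can be sketched as a small post-editing step: whenever a causal link is asserted, the temporal label for that pair is set to BEFORE, and a transitive closure then propagates the ordering. This is one possible reading of the rule, not CATENA's actual code; the label strings, function names, and data layout are assumptions.

```python
def post_edit(tlinks, clinks):
    """Force BEFORE on every pair that carries a causal link.

    tlinks: dict (a, b) -> "BEFORE" or "AFTER"; clinks: set of (cause, effect).
    """
    edited = dict(tlinks)
    for cause, effect in clinks:
        edited.pop((effect, cause), None)   # drop any label stored in reverse order
        edited[(cause, effect)] = "BEFORE"  # causes must precede effects
    return edited


def before_closure(tlinks):
    """Return all (earlier, later) pairs implied by transitivity."""
    before = {pair for pair, rel in tlinks.items() if rel == "BEFORE"}
    before |= {(b, a) for (a, b), rel in tlinks.items() if rel == "AFTER"}
    changed = True
    while changed:
        changed = False
        for a, b in list(before):
            for c, d in list(before):
                if b == c and a != d and (a, d) not in before:
                    before.add((a, d))
                    changed = True
    return before


# Hypothetical classifier outputs: the temporal label contradicts the causal link.
tlinks = {("quake", "tsunami"): "AFTER", ("tsunami", "flood"): "BEFORE"}
clinks = {("quake", "tsunami")}
edited = post_edit(tlinks, clinks)
ordering = before_closure(edited)
```

After post-editing, `quake` is labelled BEFORE `tsunami`, and the closure adds the implied pair `("quake", "flood")`. Letting the causal link override the temporal label is a design assumption; a system could equally drop the causal link when the temporal classifier is more confident.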

Narrative Generation

Data-Driven Storytelling: Sentence templates are automatically filled based on aggregating and prioritizing causal paths, using degree-of-interest functions and aggregation of shared causal mediators (Choudhry et al., 2020).
Computational Plot Planning (IPOCL): Narrative planners search for partial-order plans where causal links, temporal constraints, and agent intentions (frames of commitment) are explicitly tracked and resolved (Riedl et al., 2014).
Commonsense Expansion: C2PO leverages probabilistic soft causal inference from pretrained ATOMIC/COMET to construct event–event graphs with plausible motivational/enabling edges, while maintaining the anchor event sequence for temporal coherence (Ammanabrolu et al., 2020).
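The prioritization of causal paths for storytelling can be illustrated with a toy ranking step. The sketch below enumerates simple paths between a source and an outcome and ranks them by the product of absolute edge weights. That scoring rule is an assumption standing in for a degree-of-interest function; it is not the function used by Choudhry et al., and the event names and weights are invented.

```python
def causal_paths(weights, source, target, prefix=()):
    """Yield every simple path from source to target as a tuple of nodes."""
    path = prefix + (source,)
    if source == target:
        yield path
        return
    for cause, effect in weights:
        if cause == source and effect not in path:
            yield from causal_paths(weights, effect, target, path)


def path_strength(weights, path):
    """Assumed interest score: product of absolute edge weights along the path."""
    strength = 1.0
    for cause, effect in zip(path, path[1:]):
        strength *= abs(weights[(cause, effect)])
    return strength


def top_paths(weights, source, target, k):
    """Return the k strongest paths, strongest first."""
    ranked = sorted(
        causal_paths(weights, source, target),
        key=lambda p: path_strength(weights, p),
        reverse=True,
    )
    return ranked[:k]


# Invented weights for illustration only.
weights = {
    ("rain", "flood"): 0.9,
    ("flood", "evacuation"): 0.8,
    ("rain", "evacuation"): 0.3,
    ("rain", "road closure"): 0.2,
    ("road closure", "evacuation"): 0.1,
}
best = top_paths(weights, "rain", "evacuation", k=2)
```

With these weights the mediated path through `flood` ranks first and the direct edge second, so a template-based generator would mention the mediator before the direct effect.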

Multimodal Approaches

Video Captioning: Cause-Effect Network (CEN) and Causal-Temporal Reasoning Module (CTRM) architectures use dedicated encoders for causality and temporality, learning soft causal attention weights and temporal consistency within sequence-to-sequence models (Nadeem et al., 2024, Park et al., 2024).
Entity-Event Dual-Graphs: E²RAG explicitly encodes both event sequences and evolving entity states with explicit bipartite mappings, supporting chronologically and causally consistent retrieval-augmented generation (Zhang et al., 6 Jun 2025).
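A dual entity-event structure of the kind described above can be approximated with an event list ordered by chunk index plus a mapping from entities to the events that mention them. The sketch below is only a minimal data-structure illustration under those assumptions. It is not the E²RAG implementation, and the class, method, and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    chunk: int        # index of the source chunk; stands in for narrative time
    text: str
    entities: tuple   # names of the entities involved in the event


class DualGraph:
    """Event list plus a bipartite entity -> event index mapping."""

    def __init__(self):
        self.events = []
        self.by_entity = defaultdict(list)

    def add(self, event):
        self.events.append(event)
        index = len(self.events) - 1
        for entity in event.entities:
            self.by_entity[entity].append(index)

    def history(self, entity, up_to_chunk):
        """Events involving the entity up to a chunk, in narrative order."""
        hits = [
            self.events[i]
            for i in self.by_entity.get(entity, [])
            if self.events[i].chunk <= up_to_chunk
        ]
        return sorted(hits, key=lambda event: event.chunk)


# Invented story fragments for illustration only.
graph = DualGraph()
graph.add(Event(3, "Ana returns home", ("Ana",)))
graph.add(Event(1, "Ana leaves the village", ("Ana",)))
graph.add(Event(2, "Ana meets Ben in the city", ("Ana", "Ben")))
ana_by_chunk_2 = [event.text for event in graph.history("Ana", up_to_chunk=2)]
```

Querying an entity's history up to a given chunk returns only the events a reader would know at that point, which is the property that keeps retrieved context chronologically consistent.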

3. Realization, Rendering, and Interaction Design

CTNs can be realized as textual narratives, structured queries, or hybrid graphics, contingent on application.

Textual Rendering: Predefined templates capture effect, major mediator, no-effect, and trend/spike events. Causal markers (“led to,” “triggered,” “as a result”), temporal connectors (“at T₁,” “subsequently”), aggregation, and cue phrases enable sentence fluency and document-level cohesion (Choudhry et al., 2020).
Document Structure: Linearization is typically performed by ordering events consistently with the causal graph, using timestamps and effect magnitudes to rank events that the graph leaves unordered, and by merging paths that share mediators or outcomes to reduce redundancy.
Interaction Blocks: Interactive visualizations allow users to brush, hyperlink, search, or roll-up/drill-down specific narrative components, tightly coupling text and graphical causal representations (Choudhry et al., 2020).
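The textual rendering steps above can be sketched as a small template filler: weak edges are dropped, causes that share an outcome are merged into one sentence, and sentences are ordered by the timestamp of the outcome. The templates, the `min_effect` cutoff, and the event names are assumptions for illustration and do not reproduce the templates of the cited system.

```python
from collections import defaultdict


def render(timestamps, weights, min_effect=0.1):
    """Turn weighted causal edges into a short, time-ordered narrative."""
    causes_of = defaultdict(list)
    for (cause, effect), weight in weights.items():
        if abs(weight) >= min_effect:       # skip negligible effects
            causes_of[effect].append(cause)

    sentences = []
    for effect in sorted(causes_of, key=lambda name: timestamps[name]):
        # Aggregate causes that share the same outcome into one subject.
        causes = sorted(causes_of[effect], key=lambda name: timestamps[name])
        subject = " and ".join(causes)
        connector = "At first" if not sentences else "Subsequently"
        sentences.append(f"{connector}, {subject} led to {effect}.")
    return " ".join(sentences)


# Invented events and weights for illustration only.
timestamps = {"rain": 1.0, "dam failure": 1.5, "flood": 2.0, "evacuation": 3.0}
weights = {
    ("rain", "flood"): 0.8,
    ("dam failure", "flood"): 0.5,
    ("flood", "evacuation"): 0.6,
    ("rain", "evacuation"): 0.05,
}
story = render(timestamps, weights)
print(story)
# At first, rain and dam failure led to flood. Subsequently, flood led to evacuation.
```

Merging `rain` and `dam failure` into a single sentence shows the aggregation step, and the temporal connectors carry the ordering that the timestamps impose.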

For video/multimodal CTN, realization includes paired “Cause: … Effect: …” labels, or autoregressively generated captions with temporally and causally-linked sub-event references (Nadeem et al., 2024, Park et al., 2024).

4. Evaluation Protocols and Empirical Insights

Evaluation of CTN systems spans task-specific metrics and human-centric protocols.

Domain	Quantitative Metrics	Qualitative/Subjective
Causal/Temporal Extraction	F₁, TPR, AUROC, SHD (Mirza, 2016)	Expert review, crowd task accuracy (Choudhry et al., 2020)
Causal-Temporal Video Captioning	CIDEr, BLEU-4, ROUGE-L (Nadeem et al., 2024, Park et al., 2024)	Human fluency, coherence, relevance (Likert)
Structured QA	LLM Likert ratings, passage support (Zhang et al., 6 Jun 2025)	Category-level (causal/temporal) gains
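For the graph-extraction row of the table, F₁ and structural Hamming distance (SHD) can both be computed directly from predicted and reference edge sets. The sketch below uses the common convention that a reversed edge counts as a single SHD error and assumes graphs without self-loops; the function names and toy graphs are illustrative.

```python
def edge_f1(predicted, gold):
    """F1 over directed edges, each given as a (cause, effect) tuple."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def shd(predicted, gold):
    """Structural Hamming distance: missing, extra, and reversed edges.

    Assumes no self-loops; a reversed edge counts as one error.
    """
    distance = 0
    for pair in {frozenset(edge) for edge in predicted | gold}:
        a, b = tuple(pair)
        predicted_state = ((a, b) in predicted, (b, a) in predicted)
        gold_state = ((a, b) in gold, (b, a) in gold)
        if predicted_state != gold_state:
            distance += 1
    return distance


# Toy graphs for illustration only.
gold = {("a", "b"), ("b", "c"), ("c", "d")}
predicted = {("a", "b"), ("c", "b"), ("a", "d")}
f1_score = edge_f1(predicted, gold)
distance = shd(predicted, gold)
```

In this toy case one of three predicted edges is correct, giving an F₁ of 1/3, and the SHD is 3: one reversed edge, one missing edge, and one extra edge.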

Notable findings include:

The addition of textual CTNs to visualizations or system outputs increases causal reasoning accuracy and subjective comprehension, at the cost of a minor reading overhead (Choudhry et al., 2020).
Video captioning models with explicit CTN modules yield substantial gains in CIDEr and human-rated causal/temporal consistency (Nadeem et al., 2024, Park et al., 2024).
Retrieval-augmented generation that preserves entity temporal progression and event causality improves answer consistency on narrative question benchmarks (Zhang et al., 6 Jun 2025).

5. Variants and Extensions Across Domains

The CTN paradigm has produced specialized formalisms in adjacent subfields:

Intentionality and Plot Planning: IPOCL plans merge causal links, temporal partial orders, and “frames of commitment” for character intentionality, supporting both plot coherence and character believability under a precise completeness criterion (Riedl et al., 2014).
Soft Causality and Commonsense Story Generation: Soft probabilities (“wants,” “needs”) derived via ATOMIC/COMET support the construction of story graphs where links encode both motivational and enablement causality without hard symbolic preconditions (Ammanabrolu et al., 2020).
Entity-Event Temporal QA: Explicit two-graph architectures map entity mentions to associated events indexed by chunk order, with bipartite mappings ensuring correct retrieval for queries about evolving character state, causal chains, and temporally evolving facts (Zhang et al., 6 Jun 2025).

6. Open Challenges and Future Directions

Key ongoing and prospective research challenges in CTN include:

Nonstationarity and Heterogeneity: Real-world processes shift regimes or distributions over time; candidate approaches include CD-NOD, state-space models, and group lasso or mixed-effects formulations (Gong et al., 2023).
Unobserved Confounders: Latent variable FCMs, FCI extensions, and causal representation learning seek to recover hidden structure for robust graph induction (Gong et al., 2023).
Long-Chain/Multicausal Narratives: CTN video captioning currently encodes mostly single-step cause–effect pairs; multi-step, multi-agent, and branching narrative chains remain open frontiers (Nadeem et al., 2024).
Causal-Temporal Consistency in Generation: Ensuring that outputs of retrieval or generative QA models preserve narrative order and causal structure throughout, especially for dynamic or evolving characters and long-form content (Zhang et al., 6 Jun 2025).
Enhanced Multimodal Fusion: Integrating text, vision, and sensor data to produce unified causal-temporal structures spanning domains (Gong et al., 2023).
Evaluation Benchmarks: Datasets such as ChronoQA for narrative QA, MSVD/MSR-VTT-CTN for video CTN, and human protocols targeting plausible order, plot coherence, and causal quality remain essential for field advancement (Zhang et al., 6 Jun 2025, Nadeem et al., 2024).

Emerging directions include amortized or meta-learned CTN extraction, integration with fine-tuned commonsense models, and fusion of event-level and entity-level knowledge graphs for comprehensive narrative reasoning and generation.