Chronos: Temporal-Aware Structured Retrieval

Updated 23 March 2026

The paper introduces a dual-calendar framework that extracts structured SVO events with explicit time intervals and indexes raw dialogue turns for precise temporal filtering.
It employs dynamic, query-specific prompting and iterative agentic retrieval to support multi-hop reasoning and memory organization over long conversational histories.
Reported results include 92.60% to 95.60% overall accuracy on LongMemEvalS, and the related TempRetriever fusion approach gains up to +9.6% top-1 accuracy on temporal QA.

Chronos is a temporal-aware structured retrieval framework designed to enable precise, multi-hop, and temporally grounded information access over long, evolving conversational or document histories. It introduces a dual-calendar architecture—one tracking canonicalized subject-verb-object (SVO) event tuples with explicit datetime intervals, the other preserving raw dialogue turns—coordinated through dynamic, query-specific prompting and iterative agentic retrieval. This design enables Chronos to support sophisticated temporal reasoning, memory organization, and multi-faceted query answering at scale, establishing state-of-the-art accuracy for long-term conversational agents and temporal question answering.

1. System Architecture and Data Structures

Chronos decomposes input streams—typically multi-turn dialogues or extensive document sequences—into two distinct memory stores:

Event Calendar

A dedicated LLM-driven event extractor operates on sliding windows of the conversation. Each extracted SVO tuple is normalized to (subject, verb, object), resolved to ISO-8601 start and end datetimes using forward- and backward-relative time inference, and associated with up to four paraphrased aliases so that differently worded queries still match. Each event is stored as a JSON record and indexed using a high-dimensional embedding model (e.g., text-embedding-3-large), with auxiliary fields maintained for efficient time-range filtering.
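
As a rough illustration of what one event-calendar record might look like, here is a minimal Python sketch. The class name, field names, and example values are assumptions made for illustration; the description above specifies only that a record holds an SVO tuple, ISO-8601 start and end datetimes, and up to four aliases, serialized as JSON.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime

MAX_ALIASES = 4  # the text says "up to four" paraphrased aliases


@dataclass
class EventRecord:
    """One event-calendar entry (class and field names are hypothetical)."""
    subject: str
    verb: str
    obj: str
    start: str  # ISO-8601 datetime string
    end: str    # ISO-8601 datetime string
    aliases: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Check that both bounds parse as ISO-8601 and are correctly ordered.
        start_dt = datetime.fromisoformat(self.start)
        end_dt = datetime.fromisoformat(self.end)
        if end_dt < start_dt:
            raise ValueError("event end precedes start")
        if len(self.aliases) > MAX_ALIASES:
            raise ValueError("too many aliases")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


event = EventRecord(
    subject="user",
    verb="ran",
    obj="5 km",
    start="2026-03-01T07:00:00",
    end="2026-03-01T07:30:00",
    aliases=["user went for a 5 km run", "user jogged five kilometres"],
)
record_json = event.to_json()
```

Keeping the interval bounds as separate fields, rather than burying them in the event text, is what allows a time filter to run without touching the embedding index.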

Turn Calendar

Independently, all raw turns (including user/assistant utterance, timestamp, session ID, and text) are embedded and stored, supporting both approximate-nearest-neighbor and grep-style searches.
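
The grep-style side of the turn calendar can be pictured with a small sketch. The `Turn` fields mirror the ones listed above, but their names, the function signature, and the sample data are assumptions; the actual implementation behind the system's grep tool is not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    """One raw dialogue turn (field names are hypothetical)."""
    role: str        # "user" or "assistant"
    timestamp: str   # ISO-8601
    session_id: str
    text: str


def grep_turns(turns, pattern, ignore_case=True):
    """Return the turns whose text matches a regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [t for t in turns if regex.search(t.text)]


turns = [
    Turn("user", "2026-03-01T07:45:00", "s1", "I ran 5 km this morning."),
    Turn("assistant", "2026-03-01T07:45:10", "s1", "Nice work on the run!"),
    Turn("user", "2026-03-02T19:00:00", "s2", "Booked flights to Lisbon."),
]
hits = grep_turns(turns, r"\bran\b")
```

Exact-match search of this kind complements embedding search when the query hinges on a literal token (a name, a number, a date string) that dense retrieval may blur.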

Indexing Implementation

Both calendars use an approximate nearest neighbors (ANN) index (e.g., FAISS/HNSW). The event index supports time-filtered queries in $O(\log N)$ time. Chronos does not construct global knowledge graphs, instead indexing only events with well-defined temporal semantics.
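
How the logarithmic time filter is realized is not stated. One common way to get that behaviour is a sorted array of start times searched with binary search, sketched below. The class and method names are hypothetical, and the sketch filters on start time only; a full interval-overlap filter would need an extra check on end times.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class StartTimeIndex:
    """Sorted index over event start times (a hypothetical helper)."""

    def __init__(self, events):
        # events: iterable of (event_id, start_iso) pairs
        pairs = sorted((datetime.fromisoformat(s), eid) for eid, s in events)
        self._starts = [p[0] for p in pairs]
        self._ids = [p[1] for p in pairs]

    def starting_between(self, lo_iso, hi_iso):
        """Ids of events whose start lies in [lo, hi], in time order."""
        # Each boundary is located by binary search, i.e. O(log N).
        lo = bisect_left(self._starts, datetime.fromisoformat(lo_iso))
        hi = bisect_right(self._starts, datetime.fromisoformat(hi_iso))
        return self._ids[lo:hi]


index = StartTimeIndex([
    ("e3", "2026-03-10T09:00:00"),
    ("e1", "2026-03-01T07:00:00"),
    ("e2", "2026-03-05T18:30:00"),
])
first_week = index.starting_between("2026-03-01T00:00:00", "2026-03-07T23:59:59")
```

Locating the two boundaries costs O(log N); returning the matching ids is additionally linear in the number of matches.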

Event Extraction Algorithm

For each dialogue window, the event extractor LLM outputs SVO tuples, corresponding datetime ranges (converted from relative expressions using conversation turn timestamps), and a set of aliases. Only explicit SVO events are extracted; turns lacking clear events are ignored.
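
To make the conversion from relative expressions to datetime ranges concrete, the sketch below resolves a handful of phrases against a turn timestamp. In Chronos this step is performed by the extractor LLM; the rule-based function here is only a stand-in, and its name, the supported phrases, and the day-level granularity are all assumptions.

```python
import re
from datetime import datetime, timedelta


def resolve_relative(expr, turn_timestamp):
    """Map a relative time expression to an ISO-8601 (start, end) day range.

    Only a few toy patterns are handled; anything else returns None.
    """
    anchor = datetime.fromisoformat(turn_timestamp)
    day = anchor.replace(hour=0, minute=0, second=0, microsecond=0)
    expr = expr.strip().lower()

    if expr == "today":
        offset = 0
    elif expr == "yesterday":
        offset = -1  # backward-relative
    elif expr == "tomorrow":
        offset = 1   # forward-relative
    else:
        ago = re.fullmatch(r"(\d+) days? ago", expr)
        ahead = re.fullmatch(r"in (\d+) days?", expr)
        if ago:
            offset = -int(ago.group(1))
        elif ahead:
            offset = int(ahead.group(1))
        else:
            return None

    start = day + timedelta(days=offset)
    end = start + timedelta(days=1) - timedelta(seconds=1)
    return start.isoformat(), end.isoformat()


resolved = resolve_relative("yesterday", "2026-03-23T10:15:00")
```

Returning `None` for unrecognized expressions mirrors the policy described above of extracting only events with clear temporal semantics.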

2. Temporal-Aware Retrieval and Agentic Reasoning

At query time, Chronos orchestrates a multi-phase retrieval and reasoning process:

Dynamic Prompting

Each question is passed through an LLM-based template generator that produces tailored retrieval guidance specifying (i) target entities and attributes, (ii) temporal constraints, and (iii) whether multi-hop aggregation is required. This guidance, which instructs the agent rather than rewriting the question, is merged with tool descriptions and chain-of-thought hints in the system prompt.
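
The merge step can be sketched as plain string assembly. Everything below is illustrative: the `RetrievalGuidance` structure, the prompt wording, and the tool description are assumptions, and in the real system the guidance fields would be produced by an LLM rather than filled in by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievalGuidance:
    """Query-specific guidance (hypothetical structure)."""
    target_entities: list[str]
    temporal_constraint: Optional[str]
    needs_multi_hop: bool


def build_system_prompt(guidance, tool_descriptions):
    """Merge tool descriptions and retrieval guidance into one prompt."""
    lines = ["You answer questions using the tools below.", "", "Tools:"]
    lines += [f"- {name}: {desc}" for name, desc in tool_descriptions.items()]
    lines += ["", "Retrieval guidance:"]
    lines.append("- Focus on: " + ", ".join(guidance.target_entities))
    if guidance.temporal_constraint:
        lines.append("- Time window: " + guidance.temporal_constraint)
    if guidance.needs_multi_hop:
        lines.append("- Combine evidence from several tool calls before answering.")
    return "\n".join(lines)


guidance = RetrievalGuidance(["running distance"], "2026-03-01 to 2026-03-07", True)
prompt = build_system_prompt(
    guidance,
    {"search_events": "semantic search over events with an optional date filter"},
)
```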

Initial Contextual Retrieval

A three-stage process is performed over the turn calendar:
1. Dense embedding search to select the top 100 turns by cosine similarity with the query.
2. Cross-encoder reranking (e.g., Cohere Rerank v3) to obtain the best 15 candidates.
3. Context expansion: for each candidate, adjacent turns in the same session are pulled in, forming ~45-turn blocks grouped by session and date.
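
The three stages above can be sketched in a few lines of dependency-free Python. This is a toy under stated assumptions: `rerank_score` is a stand-in callable for a cross-encoder, the one-turn `window` on each side is a guess (the window size is not given), and grouping is by session only rather than by session and date.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def initial_retrieval(query_vec, turns, rerank_score, top_k=100, keep=15, window=1):
    """turns: list of dicts with 'session', 'pos', 'vec', and 'text' keys."""
    # Stage 1: dense search by cosine similarity.
    dense = sorted(turns, key=lambda t: cosine(query_vec, t["vec"]), reverse=True)[:top_k]
    # Stage 2: rerank the dense candidates with the stand-in reranker.
    best = sorted(dense, key=rerank_score, reverse=True)[:keep]
    # Stage 3: pull in neighbouring turns from the same session.
    by_key = {(t["session"], t["pos"]): t for t in turns}
    selected = set()
    for t in best:
        for pos in range(t["pos"] - window, t["pos"] + window + 1):
            if (t["session"], pos) in by_key:
                selected.add((t["session"], pos))
    # Group the expanded turns by session, in conversation order.
    blocks = {}
    for session, pos in sorted(selected):
        blocks.setdefault(session, []).append(by_key[(session, pos)]["text"])
    return blocks


toy_turns = [
    {"session": "s1", "pos": 0, "vec": [1.0, 0.0], "text": "a"},
    {"session": "s1", "pos": 1, "vec": [0.9, 0.1], "text": "b"},
    {"session": "s1", "pos": 2, "vec": [0.0, 1.0], "text": "c"},
    {"session": "s2", "pos": 0, "vec": [0.0, 1.0], "text": "d"},
]
query = [1.0, 0.0]
blocks = initial_retrieval(
    query, toy_turns, rerank_score=lambda t: cosine(query, t["vec"]), top_k=2, keep=1
)
```

With the default `keep=15` and one neighbour on each side, the expansion yields at most 45 turns, which is consistent with the figure quoted above but does not confirm how the original system chooses its window.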

ReAct Agentic Loop

Chronos employs a ReAct-style agent prompting loop, where the LLM alternates between (a) reasoning steps and (b) tool calls. Tools available are search_turns, grep_turns, search_events, and grep_events. Event searches can apply arbitrary date filters; vector and grep retrievals can be interleaved. The loop continues until an explicit Answer() action is emitted.
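
The control flow of such a loop can be sketched as follows. The `policy` callable stands in for the LLM, the tool bodies return canned strings, and the `max_steps` budget is an added safety assumption; only the tool names and the idea of terminating on an explicit answer action come from the description above.

```python
def react_loop(policy, tools, question, max_steps=8):
    """Alternate reasoning and tool calls until the policy emits an answer.

    policy(question, history) stands in for the LLM and returns either
    ("call", tool_name, kwargs) or ("answer", text).
    """
    history = []
    for _ in range(max_steps):
        action = policy(question, history)
        if action[0] == "answer":
            return action[1], history
        _, tool_name, kwargs = action
        observation = tools[tool_name](**kwargs)
        history.append((tool_name, kwargs, observation))
    return None, history  # step budget exhausted without an answer


# Toy tools and a scripted policy, purely for demonstration.
def search_events(query, start=None, end=None):
    return ["user ran 5 km on 2026-03-01"]


def grep_turns(pattern):
    return ["I ran 5 km this morning."]


def scripted_policy(question, history):
    if not history:
        return ("call", "search_events", {"query": "run", "start": "2026-03-01"})
    if len(history) == 1:
        return ("call", "grep_turns", {"pattern": "ran"})
    return ("answer", "5 km")


answer, trace = react_loop(
    scripted_policy,
    {"search_events": search_events, "grep_turns": grep_turns},
    "How far did I run on 1 March?",
)
```

The scripted policy interleaves an event search with a grep over turns before answering, the kind of mixed retrieval the loop is meant to allow.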

Temporal Filtering and Aggregation

For temporal questions, the search_events tool enforces interval constraints on event datetime fields before the final embedding-similarity ranking, so only temporally aligned events are considered. Multi-hop reasoning is handled by agent-controlled iteration over tool calls, which supports proof-like aggregation chains (e.g., summing exercise events over time).
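
A minimal sketch of filter-then-rank followed by aggregation is shown below. The event dictionaries, the `km` field, and the keyword-based `keyword_score` (a stand-in for embedding similarity) are assumptions chosen to keep the example self-contained.

```python
from datetime import datetime


def overlaps(event, window_start, window_end):
    """True if the event interval intersects [window_start, window_end]."""
    return event["start"] <= window_end and event["end"] >= window_start


def search_events(events, score, window_start, window_end, top_k=5):
    """Filter by time first, then rank the survivors by similarity score."""
    in_window = [e for e in events if overlaps(e, window_start, window_end)]
    return sorted(in_window, key=score, reverse=True)[:top_k]


def dt(s):
    return datetime.fromisoformat(s)


def keyword_score(event):
    # Stand-in for embedding similarity to the query "running".
    return float("ran" in event["text"])


events = [
    {"text": "user ran 5 km", "km": 5.0,
     "start": dt("2026-03-01T07:00:00"), "end": dt("2026-03-01T07:30:00")},
    {"text": "user ran 8 km", "km": 8.0,
     "start": dt("2026-03-04T18:00:00"), "end": dt("2026-03-04T18:50:00")},
    {"text": "user ran 10 km", "km": 10.0,
     "start": dt("2026-02-20T08:00:00"), "end": dt("2026-02-20T09:00:00")},
]

hits = search_events(
    events, keyword_score, dt("2026-03-01T00:00:00"), dt("2026-03-07T23:59:59")
)
total_km = sum(e["km"] for e in hits)  # aggregation over the filtered events
```

Because the February run is removed before ranking, it cannot displace an in-window event no matter how well it matches the query semantically.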

3. Temporal Fusion: Model-Level Approaches

Chronos’s dual-calendar/indexing approach is complemented by model-level advancements from the TempRetriever/TempDPR paradigm (Abdallah et al., 28 Feb 2025). In this method, both text semantics and timestamp information are fused through:

Temporal Encoder: Projects timestamps (scalar or bucketized) into a learned embedding space, which is then fused with BERT-based semantic vectors.
Fusion Mechanisms: Four variants are explored: Vector Summation (VS), Relative Embedding (RE), Elementwise Interaction (EWI), and Feature Stacking (FS). Formally, $h_q = \mathrm{Fuse}(v_q, t_q)$ and $h_i = \mathrm{Fuse}(v_i, t_i)$, where $v$ denotes the semantic vector and $t$ the temporal embedding of the query $q$ or candidate $i$. Similarity is computed as the inner product of $h_q$ and $h_i$, with all fusion weights learned jointly.
Time-based Negative Sampling: Augments training with negatives matched or mismatched on year, enforcing fine-grained temporal discrimination.
Downstream Integration: Explicit temporal fusion in the retriever yields +6.6–9.6% top-1 accuracy gains on temporal QA, with feature stacking and interaction fusions giving the best performance (Abdallah et al., 28 Feb 2025).
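
The fusion variants can be illustrated with their simplest unparameterized forms. This is a sketch of what the names suggest, not the cited implementation: reading Elementwise Interaction as a product and Feature Stacking as concatenation is an assumption, the learned fusion weights are omitted, and Relative Embedding is left out because its definition is not given here.

```python
def vector_summation(v, t):
    """VS: add the temporal embedding to the semantic vector."""
    return [a + b for a, b in zip(v, t)]


def elementwise_interaction(v, t):
    """EWI: multiply the two vectors component by component (assumed form)."""
    return [a * b for a, b in zip(v, t)]


def feature_stacking(v, t):
    """FS: concatenate semantic and temporal vectors (assumed form)."""
    return list(v) + list(t)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy 2-d semantic vectors (v) and temporal embeddings (t).
v_q, t_q = [1.0, 2.0], [0.5, 0.5]
v_i, t_i = [2.0, 0.0], [0.5, 1.0]

scores = {
    name: inner_product(fuse(v_q, t_q), fuse(v_i, t_i))
    for name, fuse in [
        ("VS", vector_summation),
        ("EWI", elementwise_interaction),
        ("FS", feature_stacking),
    ]
}
```

Under stacking, the inner product decomposes into a semantic term plus a temporal term, which is one way to see why that variant keeps the two signals separable.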

4. Structured Temporal Retrieval: Advanced Pipelines

Relation-Aware Narrative Retrieval

ChronoRAG (Kim et al., 26 Aug 2025) extends temporal structuring by constructing dual-layer retrieval graphs. The first layer comprises high-precision, LLM-generated relation summaries of grouped text; the second retains original chunks. Neighborhood assembling ensures that retrieval clusters maintain local narrative/temporal continuity. Chronological indices enable temporal coherence scoring, and retrieval is optimized over cosine similarity and temporal ordering penalties.
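
One way to picture similarity combined with a temporal ordering penalty is a score that subtracts a distance term in chronological index space. The linear form, the `penalty` weight, and the notion of an `anchor_index` are all assumptions made for illustration; the exact objective used by ChronoRAG is not specified here.

```python
def rank_chunks(candidates, anchor_index, penalty=0.01):
    """Rank chunks by cosine similarity minus a chronological-distance penalty.

    candidates: list of (chunk_id, cosine_sim, chron_index) tuples.
    """
    def score(candidate):
        _, sim, idx = candidate
        return sim - penalty * abs(idx - anchor_index)

    return [c[0] for c in sorted(candidates, key=score, reverse=True)]


candidates = [
    ("c1", 0.80, 10),
    ("c2", 0.85, 30),  # most similar, but far from the anchor in the narrative
    ("c3", 0.78, 11),
]
ranking = rank_chunks(candidates, anchor_index=10)
```

Here the chunk with the highest raw similarity drops to last place because it sits twenty positions away from the anchor, illustrating how a penalty of this kind favours locally continuous context.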

Entity-Event Dual Graphs

E²RAG (Zhang et al., 6 Jun 2025) formalizes temporality through a bipartite graph between entity mentions (each tied to explicit chunk/time index) and event snippets. This dual-graph structuring inherently preserves evolving context and allows fine-grained, temporal-causal expansion and filtering during retrieval, outperforming single-graph RAG and standard knowledge graph approaches in narrative QA.
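
A bipartite entity-event structure of this kind could be held in two adjacency maps, as in the sketch below. The class, its method names, and the integer time indices are hypothetical and are not taken from the E²RAG implementation.

```python
from collections import defaultdict


class EntityEventGraph:
    """Bipartite links between entity mentions and events (hypothetical)."""

    def __init__(self):
        self.mentions = defaultdict(list)    # entity -> [(time_index, event_id)]
        self.entities_of = defaultdict(set)  # event_id -> set of entities

    def add_mention(self, entity, time_index, event_id):
        self.mentions[entity].append((time_index, event_id))
        self.entities_of[event_id].add(entity)

    def events_for(self, entity, up_to=None):
        """Events linked to an entity in time order, optionally cut off at up_to."""
        linked = sorted(self.mentions.get(entity, []))
        return [eid for t, eid in linked if up_to is None or t <= up_to]


graph = EntityEventGraph()
graph.add_mention("Alice", 3, "ev_move")
graph.add_mention("Alice", 1, "ev_hire")
graph.add_mention("Bob", 1, "ev_hire")

alice_all = graph.events_for("Alice")
alice_early = graph.events_for("Alice", up_to=2)
```

Tying each mention to a time index is what lets expansion from an entity be cut off at a given point in the narrative, so later events do not leak into answers about earlier states.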

Iterative Timeline Summarization via Self-Questioning

In timeline construction for news summarization, the CHRONOS framework (Wu et al., 1 Jan 2025) leverages iterative LLM-driven self-questioning, event-graph updating, and strict temporal coherence checks. Every round, generated questions drive temporally filtered acquisition of events, whose summaries are assembled into a coherent, chronologically ordered timeline, outperforming vanilla search and rewrite-based approaches.
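
The round-based structure can be sketched as a loop in which newly generated questions drive the next retrieval pass. The `ask` and `retrieve` callables stand in for the LLM question generator and the temporally filtered search; their signatures, the fixed round budget, and the keep-first-summary rule are assumptions.

```python
def build_timeline(ask, retrieve, seed_question, rounds=3):
    """Iteratively question, retrieve dated events, and assemble a timeline.

    ask(timeline) -> list of follow-up questions
    retrieve(question) -> list of (iso_date, summary) pairs
    """
    timeline = {}
    questions = [seed_question]
    for _ in range(rounds):
        for question in questions:
            for date, summary in retrieve(question):
                timeline.setdefault(date, summary)  # keep first summary per date
        questions = ask(timeline)
        if not questions:
            break
    return sorted(timeline.items())  # ISO dates sort chronologically


# Toy stand-ins for the retriever and the question generator.
corpus = {
    "What happened?": [("2024-05-02", "Storm makes landfall")],
    "What followed the storm?": [
        ("2024-05-03", "Power restored"),
        ("2024-05-02", "Storm hits coast"),
    ],
}


def toy_retrieve(question):
    return corpus.get(question, [])


def toy_ask(timeline):
    return ["What followed the storm?"] if len(timeline) == 1 else []


timeline = build_timeline(toy_ask, toy_retrieve, "What happened?")
```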

5. Empirical Results and Benchmarking

Chronos establishes a new state of the art on the LongMemEvalS benchmark (Sen et al., 17 Mar 2026), which stresses memory accuracy and temporal reasoning over extended multi-session conversational histories.

Performance Highlights

Chronos Low (GPT-4o backbone): 92.60% overall accuracy, +7.67% over the best prior system. Excels especially in Knowledge Update (96.15%), Multi-Session Aggregation (91.73%), and Temporal Reasoning (90.23%) categories.
Chronos High (Claude Opus 4.6 backbone): 95.60% accuracy, top scores in all facets including 100% in some single-session tasks.
Ablations: Event calendar contributes ~58.9% of overall gain; removal collapses performance by more than a third. Removal of initial retrieval or dynamic prompting each costs 15–22%. Neither vector nor grep-only retrieval suffices alone.
Other Domains: In open-domain news timeline summarization (TLS), the iterative self-questioning CHRONOS framework achieves Date-F₁ = 0.343 versus 0.272 for the rewrite-only baseline, a relative gain of about 26% (Wu et al., 1 Jan 2025).

Method	LongMemEvalS Overall	Temporal QA / ROUGE-L
Chronos Low	92.60%	–
Chronos High	95.60%	–
ChronoRAG	–	0.308 (full), 0.268 (time set)
TempRetriever	–	+6.63–9.56% top-1 over DPR

6. Design Insights, Limitations, and Future Directions

Core Insights

Dual-indexing separates event-level and raw text context, facilitating both precise time-based filtering and full linguistic traceability.
Dynamic, question-specific prompting delegates high-level retrieval planning to LLMs but forces explicit temporal constraints into the retrieval process, reducing reliance on the backbone LLM's implicit temporal reasoning.
Model-level fusion (as in TempRetriever/TempDPR) and agentic ReAct retrieval are complementary: the former hardwires temporal alignment in representations; the latter solves temporal reasoning by explicit tool-invocation over time-structured data.
Structured event extraction and indexing grant post hoc explainability and efficient timeline construction absent in monolithic vector stores.

Limitations

Chronos and analogous structures are bottlenecked by extraction errors (failures to surface relevant SVO/time events) and by incompleteness in event modeling (handling of implicit/ambiguous temporal expressions).
Current time encoding is coarse (often bucketized by year); finer granularity (month, day, relative range encoding) and learned distance-aware scoring are not yet robustly deployed.
System performance is sensitive to LLM variability and temporal coverage in initial extraction.

Future Extensions

Incorporation of gating or learned weight mechanisms to balance semantic and temporal dimensions in fusion.
Generalization to richer event schemas, multi-granular deadlines, and multi-hop/cross-topic temporal chains (e.g., historical document QA, legal evidence timelines, longitudinal clinical records).
Potential integration with model averaging approaches such as Time-Specifier Model Merging (Han et al., 9 Jul 2025), augmenting the dual-calendar framework with ensemble-based temporal specialization, while maintaining non-temporal retrieval fidelity.

7. Comparative Impact and Theoretical Significance

Chronos, through selective SVO+time structuring, dual-calendar indexing, agentic retrieval, and dynamic temporality-aware prompting, provides a scalable, explainable, and empirically validated solution for long-range, temporally grounded conversational and document retrieval. Its architecture consolidates and extends methodological advances in temporal retrieval fusion (Abdallah et al., 28 Feb 2025), structured multi-layer passage assembling (Kim et al., 26 Aug 2025), entity-event dual graph modeling (Zhang et al., 6 Jun 2025), and iterative timeline summarization (Wu et al., 1 Jan 2025), positioning it as a reference framework for temporal-aware information retrieval, with state-of-the-art results in both conversational memory and time-sensitive QA (Sen et al., 17 Mar 2026).