Chronos: Temporal-Aware Structured Retrieval
- The paper introduces a dual-calendar framework that extracts structured SVO events with explicit time intervals and indexes raw dialogue turns for precise temporal filtering.
- It employs dynamic, query-specific prompting and iterative agentic retrieval to support multi-hop reasoning and memory organization over long conversational histories.
- Reported results include 92.60% to 95.60% overall accuracy on LongMemEvalS and, for the related temporal-fusion retrievers, top-1 accuracy gains of up to +9.6% on temporal question answering.
Chronos is a temporal-aware structured retrieval framework designed to enable precise, multi-hop, and temporally grounded information access over long, evolving conversational or document histories. It introduces a dual-calendar architecture—one tracking canonicalized subject-verb-object (SVO) event tuples with explicit datetime intervals, the other preserving raw dialogue turns—coordinated through dynamic, query-specific prompting and iterative agentic retrieval. This design enables Chronos to support sophisticated temporal reasoning, memory organization, and multi-faceted query answering at scale, establishing state-of-the-art accuracy for long-term conversational agents and temporal question answering.
1. System Architecture and Data Structures
Chronos decomposes input streams—typically multi-turn dialogues or extensive document sequences—into two distinct memory stores:
Event Calendar
- A dedicated LLM-driven event extractor operates on sliding windows of the conversation. Each extracted event is normalized to a (subject, verb, object) tuple, resolved to ISO-8601 start and end datetimes using forward- and backward-relative time inference, and given up to four paraphrased aliases so that differently worded queries still match. Each event is stored as a JSON record and indexed with a high-dimensional embedding model (e.g., text-embedding-3-large), with auxiliary fields for efficient time-range filtering.
Turn Calendar
- Independently, all raw turns (including user/assistant utterance, timestamp, session ID, and text) are embedded and stored, supporting both approximate-nearest-neighbor and grep-style searches.
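The two record types described above can be sketched as plain data classes. This is a minimal illustration, not the system's actual schema: the class names, field names (`obj`, `aliases`, `session_id`), and validation rules are assumptions chosen to mirror the prose.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class EventRecord:
    """Event-calendar entry: an SVO tuple with an explicit time interval."""
    subject: str
    verb: str
    obj: str          # the "object" of the SVO tuple (hypothetical field name)
    start: str        # ISO-8601 datetime
    end: str          # ISO-8601 datetime
    aliases: list = field(default_factory=list)  # paraphrases, at most four

    def __post_init__(self):
        if len(self.aliases) > 4:
            raise ValueError("an event carries at most four aliases")
        # datetime.fromisoformat raises ValueError on malformed input.
        if datetime.fromisoformat(self.start) > datetime.fromisoformat(self.end):
            raise ValueError("start must not be later than end")

    def to_json(self) -> str:
        """Serialize the event as a JSON record."""
        return json.dumps(asdict(self), sort_keys=True)


@dataclass
class TurnRecord:
    """Turn-calendar entry: one raw utterance plus its metadata."""
    role: str         # "user" or "assistant"
    timestamp: str    # ISO-8601 datetime
    session_id: str
    text: str


event = EventRecord(
    subject="user", verb="ran", obj="5 miles",
    start="2024-05-03T00:00:00", end="2024-05-03T23:59:59",
    aliases=["user went for a 5-mile run", "user jogged five miles"],
)
turn = TurnRecord("user", "2024-05-03T18:20:00", "session-12", "I ran 5 miles today.")
print(event.to_json())
```

Keeping the interval as two ISO-8601 strings makes the record directly serializable while still allowing range comparisons after parsing.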
Indexing Implementation
- Both calendars use an approximate nearest neighbors (ANN) index (e.g., FAISS/HNSW). The event index additionally supports queries filtered by time range. Chronos does not construct a global knowledge graph; it indexes only events with well-defined temporal semantics.
Event Extraction Algorithm
- For each dialogue window, the event extractor LLM outputs SVO tuples, corresponding datetime ranges (converted from relative expressions using conversation turn timestamps), and a set of aliases. Only explicit SVO events are extracted; turns lacking clear events are ignored.
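The conversion of relative expressions into datetime ranges can be illustrated with a small rule-based resolver. In Chronos this step is performed by the extractor LLM; the function below is only a hand-written stand-in, and its name, the supported phrases, and the Monday-to-Sunday week convention are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Tuple


def resolve_relative(expr: str, turn_time: datetime) -> Tuple[str, str]:
    """Map a relative expression to an ISO-8601 (start, end) interval,
    anchored on the timestamp of the turn in which it was said."""
    day_start = turn_time.replace(hour=0, minute=0, second=0, microsecond=0)
    this_monday = day_start - timedelta(days=day_start.weekday())
    expr = expr.strip().lower()
    if expr == "today":
        start, days = day_start, 1
    elif expr == "yesterday":          # backward-relative
        start, days = day_start - timedelta(days=1), 1
    elif expr == "tomorrow":           # forward-relative
        start, days = day_start + timedelta(days=1), 1
    elif expr == "last week":          # previous Monday-to-Sunday week
        start, days = this_monday - timedelta(days=7), 7
    elif expr == "next week":
        start, days = this_monday + timedelta(days=7), 7
    else:
        raise ValueError(f"unsupported expression: {expr!r}")
    end = start + timedelta(days=days) - timedelta(seconds=1)
    return start.isoformat(), end.isoformat()


turn_time = datetime(2024, 5, 15, 14, 30)  # a Wednesday
print(resolve_relative("yesterday", turn_time))
print(resolve_relative("last week", turn_time))
```

Anchoring on the turn timestamp rather than on the time of extraction is what lets the same phrase resolve to different absolute intervals in different sessions.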
2. Temporal-Aware Retrieval and Agentic Reasoning
At query time, Chronos orchestrates a multi-phase retrieval and reasoning process:
Dynamic Prompting
- Each question is passed through a template generator (LLM) that produces tailored retrieval guidance, specifying (i) target entities/attributes, (ii) temporal constraints, and (iii) whether multi-hop aggregation is required. This prompt, which is instructive rather than a direct rewrite of the question, is merged with tool descriptions and chain-of-thought hints in the system prompt.
Initial Contextual Retrieval
- A three-stage process is performed over the turn calendar:
- Dense embedding search to select the top 100 turns by cosine similarity with the query.
- Cross-encoder reranking (e.g., Cohere Rerank v3) to obtain the best 15 candidates.
- Context expansion: for each, adjacent turns in the same session are pulled, forming ~45-turn blocks grouped by session and date.
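The three stages above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the system's implementation: turns are dictionaries with hypothetical keys (`session_id`, `pos`, `date`, `text`, `vec`), the cross-encoder is replaced by a caller-supplied `rerank_score` function, and expansion takes one neighbour on each side (which would give about 45 turns for 15 candidates).

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def initial_retrieval(query_vec, query_text, turns, rerank_score,
                      top_k=100, keep=15, window=1):
    # Stage 1: dense search, top_k turns by cosine similarity.
    dense = sorted(turns, key=lambda t: cosine(query_vec, t["vec"]),
                   reverse=True)[:top_k]
    # Stage 2: rerank the dense candidates and keep the best few.
    best = sorted(dense, key=lambda t: rerank_score(query_text, t["text"]),
                  reverse=True)[:keep]
    # Stage 3: pull adjacent turns from the same session.
    by_pos = {(t["session_id"], t["pos"]): t for t in turns}
    selected = set()
    for t in best:
        for offset in range(-window, window + 1):
            key = (t["session_id"], t["pos"] + offset)
            if key in by_pos:
                selected.add(key)
    # Group the expanded turns by (session, date), in conversation order.
    blocks = defaultdict(list)
    for key in sorted(selected):
        turn = by_pos[key]
        blocks[(turn["session_id"], turn["date"])].append(turn["text"])
    return dict(blocks)


def word_overlap(query, text):
    """Toy stand-in for a cross-encoder: count shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


turns = [
    {"session_id": "s1", "pos": 0, "date": "2024-05-01", "text": "I adopted a dog", "vec": [0.1, 0.9]},
    {"session_id": "s1", "pos": 1, "date": "2024-05-01", "text": "I went on a run of five miles", "vec": [1.0, 0.0]},
    {"session_id": "s1", "pos": 2, "date": "2024-05-01", "text": "Nice work on the run", "vec": [0.7, 0.3]},
    {"session_id": "s2", "pos": 0, "date": "2024-05-02", "text": "What is the weather", "vec": [0.0, 1.0]},
    {"session_id": "s2", "pos": 1, "date": "2024-05-02", "text": "It is sunny", "vec": [0.0, 1.0]},
]
blocks = initial_retrieval([1.0, 0.0], "how far did I run", turns,
                           word_overlap, top_k=2, keep=1, window=1)
print(blocks)
```

With the toy data, the single surviving candidate is the running turn, and expansion brings in its two neighbours from the same session.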
ReAct Agentic Loop
- Chronos employs a ReAct-style agent prompting loop, where the LLM alternates between (a) reasoning steps and (b) tool calls. The available tools are `search_turns`, `grep_turns`, `search_events`, and `grep_events`. Event searches can apply arbitrary date filters; vector and grep retrievals can be interleaved. The loop continues until an explicit `Answer()` action is emitted.
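A minimal sketch of such a loop follows. The `Thought:` / `Action:` / `Observation:` text format, the `react_loop` function, the step limit, and the scripted stand-in for the LLM are all assumptions made for illustration; only the tool names and the terminating `Answer()` action come from the description above.

```python
import re


def react_loop(question, llm, tools, max_steps=8):
    """Alternate LLM steps and tool calls until Answer(...) is emitted."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # reasoning plus one action
        transcript += step + "\n"
        match = re.search(r"Action:\s*(\w+)\((.*)\)\s*$", step, flags=re.DOTALL)
        if not match:
            continue                           # pure reasoning step
        name = match.group(1)
        arg = match.group(2).strip().strip('"')
        if name == "Answer":
            return arg                         # explicit stop action
        if name in tools:
            transcript += f"Observation: {tools[name](arg)}\n"
        else:
            transcript += f"Observation: unknown tool {name}\n"
    return None                                # step budget exhausted


# Scripted stand-in for the LLM so the sketch runs without a model.
script = [
    'Thought: I should look up running events.\nAction: search_events("run")',
    "Thought: The runs were 5 and 3 miles, so 8 in total.\nAction: Answer(8 miles)",
]


def scripted_llm(transcript):
    return script.pop(0)


calls = []
tools = {
    "search_events": lambda q: calls.append(q) or "ran 5 miles; ran 3 miles",
    "grep_events": lambda q: "",
    "search_turns": lambda q: "",
    "grep_turns": lambda q: "",
}
answer = react_loop("How far did I run in May?", scripted_llm, tools)
print(answer)
```

Because each observation is appended to the transcript, later steps can condition on earlier tool results, which is what allows vector and grep retrievals to be interleaved.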
Temporal Filtering and Aggregation
- For temporal questions, the `search_events` tool enforces interval constraints on event datetime fields before the final embedding similarity ranking, ensuring that only temporally aligned events are considered. Multi-hop reasoning is facilitated by agent-controlled iteration over tool calls, supporting proof-like aggregation chains (e.g., summing exercise events over time).
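The filter-then-rank order and a simple aggregation can be sketched as follows. The event dictionaries, the numeric `quantity` field, and the choice of interval overlap (rather than strict containment) as the filter condition are assumptions for illustration.

```python
import math
from datetime import datetime


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_events(events, query_vec, start, end, top_k=5):
    """Drop events outside [start, end] first, then rank the rest by similarity."""
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    in_range = [
        e for e in events
        if datetime.fromisoformat(e["start"]) <= hi
        and datetime.fromisoformat(e["end"]) >= lo      # intervals overlap
    ]
    return sorted(in_range, key=lambda e: cosine(query_vec, e["vec"]),
                  reverse=True)[:top_k]


events = [
    {"svo": ("user", "ran", "5 miles"), "quantity": 5.0, "vec": [1.0, 0.0],
     "start": "2024-05-03T00:00:00", "end": "2024-05-03T23:59:59"},
    {"svo": ("user", "bought", "shoes"), "quantity": 0.0, "vec": [0.0, 1.0],
     "start": "2024-05-05T00:00:00", "end": "2024-05-05T23:59:59"},
    {"svo": ("user", "ran", "3 miles"), "quantity": 3.0, "vec": [1.0, 0.0],
     "start": "2024-05-10T00:00:00", "end": "2024-05-10T23:59:59"},
    {"svo": ("user", "ran", "4 miles"), "quantity": 4.0, "vec": [1.0, 0.0],
     "start": "2024-06-02T00:00:00", "end": "2024-06-02T23:59:59"},
]

# "How many miles did I run in May?": filter to May, rank, then aggregate.
hits = search_events(events, [1.0, 0.0],
                     "2024-05-01T00:00:00", "2024-05-31T23:59:59", top_k=2)
total_miles = sum(e["quantity"] for e in hits)
print([e["svo"] for e in hits], total_miles)
```

Filtering before ranking means the June run can never displace a May event, however similar its embedding is to the query.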
3. Temporal Fusion: Model-Level Approaches
Chronos’s dual-calendar/indexing approach is complemented by model-level advancements from the TempRetriever/TempDPR paradigm (Abdallah et al., 28 Feb 2025). In this method, both text semantics and timestamp information are fused through:
- Temporal Encoder: Projects timestamps (scalar or bucketized) into a learned embedding space, which is then fused with BERT-based semantic vectors.
- Fusion Mechanisms: Four variants are explored: Vector Summation (VS), Relative Embedding (RE), Elementwise Interaction (EWI), and Feature Stacking (FS). Similarity is computed as an inner product, with all fusion weights learned jointly.
- Time-based Negative Sampling: Augments training with negatives matched or mismatched on year, enforcing fine-grained temporal discrimination.
- Downstream Integration: Explicit temporal fusion in the retriever yields +6.6–9.6% top-1 accuracy gains on temporal QA, with feature stacking and interaction fusions giving the best performance (Abdallah et al., 28 Feb 2025).
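To make the fusion variants concrete, the sketch below gives one generic reading of three of the names: summation as vector addition, interaction as an elementwise product, and stacking as concatenation. These readings are assumptions based on the names alone and are not taken from the cited paper; Relative Embedding is omitted because its definition cannot be inferred from the description. The vectors are fixed toy values, whereas the real embeddings and fusion weights are learned.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def vector_summation(sem, tmp):
    """VS (assumed reading): add the temporal vector to the semantic vector."""
    return [s + t for s, t in zip(sem, tmp)]


def elementwise_interaction(sem, tmp):
    """EWI (assumed reading): multiply the two vectors elementwise."""
    return [s * t for s, t in zip(sem, tmp)]


def feature_stacking(sem, tmp):
    """FS (assumed reading): concatenate semantic and temporal vectors."""
    return list(sem) + list(tmp)


# Toy semantic and temporal vectors for a query (q) and a document (d).
sem_q, tmp_q = [1.0, 2.0], [0.5, -1.0]
sem_d, tmp_d = [2.0, 0.0], [0.5, 1.0]

fusions = {
    "VS": vector_summation,
    "EWI": elementwise_interaction,
    "FS": feature_stacking,
}
scores = {
    name: inner_product(fuse(sem_q, tmp_q), fuse(sem_d, tmp_d))
    for name, fuse in fusions.items()
}
print(scores)
```

Under the concatenation reading, the inner product of stacked vectors is exactly the semantic inner product plus the temporal inner product, so the two signals contribute additively and cannot cancel each other inside a single dimension.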
4. Structured Temporal Retrieval: Advanced Pipelines
Relation-Aware Narrative Retrieval
- ChronoRAG (Kim et al., 26 Aug 2025) extends temporal structuring by constructing dual-layer retrieval graphs. The first layer comprises high-precision, LLM-generated relation summaries of grouped text; the second retains original chunks. Neighborhood assembling ensures that retrieval clusters maintain local narrative/temporal continuity. Chronological indices enable temporal coherence scoring, and retrieval is optimized over cosine similarity and temporal ordering penalties.
Entity-Event Dual Graphs
- E²RAG (Zhang et al., 6 Jun 2025) formalizes temporality through a bipartite graph between entity mentions (each tied to explicit chunk/time index) and event snippets. This dual-graph structuring inherently preserves evolving context and allows fine-grained, temporal-causal expansion and filtering during retrieval, outperforming single-graph RAG and standard knowledge graph approaches in narrative QA.
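A bipartite entity-event structure of this kind can be sketched with two adjacency maps and a time index per event. The class name, method names, and the integer time index are hypothetical; the sketch shows only the general idea of time-filtered expansion from an entity, not E²RAG's actual construction or retrieval procedure.

```python
from collections import defaultdict


class EntityEventGraph:
    """Bipartite graph: entities on one side, time-indexed events on the other."""

    def __init__(self):
        self.events_of = defaultdict(set)    # entity -> event ids
        self.entities_of = defaultdict(set)  # event id -> entities
        self.time_of = {}                    # event id -> chunk/time index
        self.text_of = {}                    # event id -> event snippet

    def add_event(self, event_id, text, time_index, entities):
        self.time_of[event_id] = time_index
        self.text_of[event_id] = text
        for entity in entities:
            self.events_of[entity].add(event_id)
            self.entities_of[event_id].add(entity)

    def timeline(self, entity, start=None, end=None):
        """Event snippets involving `entity`, in time order, optionally filtered."""
        ids = [
            e for e in self.events_of.get(entity, set())
            if (start is None or self.time_of[e] >= start)
            and (end is None or self.time_of[e] <= end)
        ]
        ids.sort(key=lambda e: (self.time_of[e], e))
        return [self.text_of[e] for e in ids]

    def co_entities(self, entity):
        """Entities that share at least one event with `entity`."""
        found = set()
        for e in self.events_of.get(entity, set()):
            found |= self.entities_of[e]
        found.discard(entity)
        return sorted(found)


graph = EntityEventGraph()
graph.add_event("e1", "Alice meets Bob", 1, ["Alice", "Bob"])
graph.add_event("e2", "Bob leaves town", 5, ["Bob"])
graph.add_event("e3", "Alice writes to Bob", 9, ["Alice", "Bob"])
graph.add_event("e4", "Carol visits Alice", 3, ["Carol", "Alice"])

print(graph.timeline("Alice"))
print(graph.timeline("Bob", start=4))
print(graph.co_entities("Alice"))
```

Because every event carries its own time index, expanding from an entity yields an ordered timeline directly, without a separate sorting pass over retrieved chunks.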
Iterative Timeline Summarization via Self-Questioning
- In timeline construction for news summarization, the CHRONOS framework (Wu et al., 1 Jan 2025) leverages iterative LLM-driven self-questioning, event-graph updating, and strict temporal coherence checks. Every round, generated questions drive temporally filtered acquisition of events, whose summaries are assembled into a coherent, chronologically ordered timeline, outperforming vanilla search and rewrite-based approaches.
5. Empirical Results and Benchmarking
Chronos establishes a new state of the art on the LongMemEvalS benchmark (Sen et al., 17 Mar 2026), which stresses memory accuracy and temporal reasoning over extended multi-session conversational histories.
Performance Highlights
- Chronos Low (GPT-4o backbone): 92.60% overall accuracy, +7.67% over the best prior system. Excels especially in Knowledge Update (96.15%), Multi-Session Aggregation (91.73%), and Temporal Reasoning (90.23%) categories.
- Chronos High (Claude Opus 4.6 backbone): 95.60% accuracy, top scores in all facets including 100% in some single-session tasks.
- Ablations: Event calendar contributes ~58.9% of overall gain; removal collapses performance by more than a third. Removal of initial retrieval or dynamic prompting each costs 15–22%. Neither vector nor grep-only retrieval suffices alone.
- Other Domains: In open-domain news timeline summarization (TLS), the iterative self-questioning CHRONOS framework achieves Date-F₁ = 0.343 versus 0.272 for the rewrite-only baseline, a relative gain of about 26% (Wu et al., 1 Jan 2025).
| Method | LongMemEvalS overall accuracy | Other reported metric |
|---|---|---|
| Chronos Low | 92.60% | – |
| Chronos High | 95.60% | – |
| ChronoRAG | – | ROUGE-L 0.308 (full), 0.268 (time set) |
| TempRetriever | – | +6.63–9.56% top-1 accuracy over DPR (temporal QA) |
6. Design Insights, Limitations, and Future Directions
Core Insights
- Dual-indexing separates event-level and raw text context, facilitating both precise time-based filtering and full linguistic traceability.
- Dynamic, question-specific prompting delegates high-level retrieval planning to LLMs but forces explicit temporal constraints into the retrieval process, reducing over-reliance on backbone LLM temporality.
- Model-level fusion (as in TempRetriever/TempDPR) and agentic ReAct retrieval are complementary: the former hardwires temporal alignment in representations; the latter solves temporal reasoning by explicit tool-invocation over time-structured data.
- Structured event extraction and indexing grant post hoc explainability and efficient timeline construction absent in monolithic vector stores.
Limitations
- Chronos and analogous structures are bottlenecked by extraction errors (failures to surface relevant SVO/time events) and by incompleteness in event modeling (handling of implicit/ambiguous temporal expressions).
- Current time encoding is coarse (often bucketized by year); finer granularity (month, day, relative range encoding) and learned distance-aware scoring are not yet robustly deployed.
- System performance is sensitive to LLM variability and temporal coverage in initial extraction.
Future Extensions
- Incorporation of gating or learned weight mechanisms to balance semantic and temporal dimensions in fusion.
- Generalization to richer event schemas, multi-granular deadlines, and multi-hop/cross-topic temporal chains (e.g., historical document QA, legal evidence timelines, longitudinal clinical records).
- Potential integration with model averaging approaches such as Time-Specifier Model Merging (Han et al., 9 Jul 2025), augmenting the dual-calendar framework with ensemble-based temporal specialization, while maintaining non-temporal retrieval fidelity.
7. Comparative Impact and Theoretical Significance
Chronos, through selective SVO+time structuring, dual-calendar indexing, agentic retrieval, and dynamic temporality-aware prompting, provides a scalable, explainable, and empirically validated solution for long-range, temporally grounded conversational and document retrieval. Its architecture consolidates and extends methodological advances in temporal retrieval fusion (Abdallah et al., 28 Feb 2025), structured multi-layer passage assembling (Kim et al., 26 Aug 2025), entity-event dual graph modeling (Zhang et al., 6 Jun 2025), and iterative timeline summarization (Wu et al., 1 Jan 2025), positioning it as a reference framework for temporal-aware information retrieval, with state-of-the-art results in both conversational memory and time-sensitive QA (Sen et al., 17 Mar 2026).