MemoriesDB: Temporal-Semantic Database
- MemoriesDB is a temporal–semantic–relational database that encodes experiences as quadruples, integrating microsecond timestamps, normalized embeddings, and JSON metadata.
- It unifies time-series analysis, high-dimensional semantics, and directed graph modeling to support hybrid queries with robust temporal and relational coherence.
- The architecture leverages PostgreSQL with the pgvector extension, enabling efficient retrieval and graph-based context expansion over millions of records, with a path to columnar or distributed backends.
MemoriesDB is a temporal–semantic–relational database architecture designed for long-term computational memory management and retrieval. Unifying time-series analysis, high-dimensional semantic embeddings, and directed graph modeling, MemoriesDB encodes each experience or memory as a structured entity with explicit temporal, semantic, and relational features. This integrated approach supports efficient hybrid queries, robust contextualization, and cross-temporal coherence. The initial implementation leverages PostgreSQL with the pgvector extension, allowing scalable operations over millions of records while supporting further extension to columnar or distributed backends.
1. Formal Structure and Core Definitions
A single memory in MemoriesDB is represented as a quadruple $m = (t, k, E, \mu)$, where:
- $t$: Microsecond-resolution timestamp, unique for each entry.
- $k$: Categorical kind (e.g., "message", "observation", "summary").
- $E$: Set of normalized embeddings (typically both low-dimensional, e.g., 128-d, and high-dimensional, e.g., 768-d).
- $\mu$: Arbitrary JSONB metadata (agent_id, topic tags, importance metrics, etc.).
A fused vector representation for retrieval is defined by a function $f: E \mapsto v_{\text{fused}}$, which may be a weighted linear combination or Reciprocal-Rank Fusion (RRF). All sub-embeddings are normalized to unit $\ell_2$-norm so that cosine similarity reduces to an inner product.
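A minimal sketch of one possible fusion function, assuming weighted concatenation of the two sub-embeddings followed by renormalization (the weights `w_low` and `w_high` are illustrative; RRF would be an alternative):

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v

def fuse(low, high, w_low=0.3, w_high=0.7):
    """Weighted concatenation of two unit-normalized sub-embeddings,
    renormalized so cosine similarity reduces to a dot product."""
    fused = ([w_low * x for x in normalize(low)]
             + [w_high * x for x in normalize(high)])
    return normalize(fused)

v = fuse([1.0, 0.0], [0.0, 1.0, 0.0])
print(round(sum(x * x for x in v), 6))  # fused vector has unit norm -> 1.0
```

The renormalization step matters: without it, the fused vector's norm would depend on the weights and cosine similarity would no longer equal the inner product.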
Directed edges $e = (m_{\text{src}}, m_{\text{dst}}, r, w, c, \mu_e)$ connect memories:
- $r$: Edge label (e.g., "reply", "summary-of", "related-to").
- $w$, $c$: Strength and confidence values, with $w, c \in [0, 1]$.
- $\mu_e$: Edge-level JSON metadata.
Each memory defines a local temporal–semantic "plane" parameterized by offset pairs $(\Delta t, \Delta s)$, with $\Delta t = t_j - t_i$ and semantic difference $\Delta s = 1 - \cos(v_i, v_j)$. The full database forms an ordered stack over such planes, and directed edges act as arrows in this two-dimensional similarity field.
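The offset pairs can be computed directly from timestamped vectors; taking the semantic difference as cosine distance is an assumption consistent with the coherence metrics below:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def plane_offsets(anchor, others):
    """Offset pairs (dt, ds) of each memory relative to an anchor memory.

    Each memory is a (timestamp_us, vector) tuple; ds is cosine distance.
    """
    t0, v0 = anchor
    return [(t - t0, 1.0 - cos_sim(v0, v)) for t, v in others]

offsets = plane_offsets((0, [1.0, 0.0]), [(5, [1.0, 0.0]), (9, [0.0, 1.0])])
print(offsets)  # [(5, 0.0), (9, 1.0)]
```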
Key coherence metrics:
- Pairwise: $C_{ij} = \cos(v_i, v_j)$, with $C_{ij} \in [-1, 1]$.
- Local: $\bar{C}_i = \frac{1}{|E_i|} \sum_{(i,j) \in E_i} C_{ij}$ for active edges $E_i$ in a temporal window.
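A sketch of both metrics, assuming the local metric averages pairwise cosine coherence over a memory's active neighbors in the window:

```python
import math

def cosine(a, b):
    """Pairwise coherence: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def local_coherence(anchor_vec, neighbor_vecs):
    """Mean pairwise coherence over a memory's active edges."""
    if not neighbor_vecs:
        return 0.0
    return sum(cosine(anchor_vec, v) for v in neighbor_vecs) / len(neighbor_vecs)

c = local_coherence([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(c)  # (1.0 + 0.0) / 2 = 0.5
```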
2. Data Schema, Storage, and Normalization
The append-only schema is instantiated in PostgreSQL as shown below; table and column names are representative. The pgvector extension enables efficient vector storage and computation.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
    id_time    BIGINT PRIMARY KEY,   -- microsecond timestamp, unique per entry
    kind       TEXT NOT NULL,        -- "message", "observation", "summary", ...
    emb_low    VECTOR(128),          -- low-dimensional embedding
    emb_high   VECTOR(768),          -- high-dimensional embedding
    emb_fused  VECTOR(896),          -- fused representation for retrieval
    meta       JSONB                 -- agent_id, topic tags, importance, ...
);

CREATE TABLE edges (
    src        BIGINT REFERENCES memories(id_time),
    dst        BIGINT REFERENCES memories(id_time),
    relation   TEXT NOT NULL,        -- "reply", "summary-of", "related-to", ...
    strength   REAL,
    confidence REAL,
    meta       JSONB                 -- edge-level metadata
);
```
Indexes:
- B-tree on (kind, id_time) for efficient range scans.
- IVFFLAT index on high-dimensional and fused vectors for approximate nearest neighbor (ANN) retrieval with cosine distance.
- Multigraph lookups and GIN indexing on JSON metadata in edges.
All vector insertions are normalized to unit $\ell_2$-norm before storage. Multiple edges between the same pair of vertices (with potentially different relations or strength/confidence values) are permitted.
3. Query Mechanisms and Algorithms
3.1 Time-Bounded and Hybrid Semantic Retrieval
Time-bounded retrieval uses the B-tree index over `(kind, id_time)`, e.g. `SELECT * FROM memories WHERE id_time BETWEEN t0 AND t1 ORDER BY id_time`, where the window $[t_0, t_1]$ is typically user-provided.
Hybrid, time-and-semantic similarity search employs the vector index, scoring candidates by $(1 - d_{\cos}(q, v)) \cdot e^{-\lambda \Delta t}$, where $d_{\cos}$ is cosine distance under pgvector (the `<=>` operator) and $\lambda$ controls the temporal decay factor.
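The hybrid scoring can be sketched as follows; the multiplicative combination and the default decay rate `lam` are illustrative assumptions, since the text specifies only that a parameter controls temporal decay:

```python
import math

def hybrid_score(cos_dist, age_seconds, lam=1e-6):
    """Combine semantic proximity (1 - cosine distance) with exponential
    temporal decay exp(-lam * age). Older memories score lower; lam tunes
    how quickly relevance fades."""
    return (1.0 - cos_dist) * math.exp(-lam * age_seconds)

print(hybrid_score(0.2, 0.0))  # no decay: 1 - 0.2 = 0.8
```

With `lam=1e-6` per second, a memory roughly a million seconds old (about 12 days) is discounted by a factor of $e^{-1} \approx 0.37$.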
3.2 Structural and Graph Expansion
After identifying the top-K nearest memories, edges are expanded via a join against the edges table (e.g. `JOIN edges e ON e.src = m.id_time`). This join supports rapid graph expansion for contextual or reasoning queries; the planner parallelizes index scans and vector computations.
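A sketch of the expansion step outside SQL, assuming the relevant edges have already been fetched into an adjacency map keyed by source id:

```python
def expand(top_k_ids, edges, max_edges=50):
    """One-hop graph expansion of a top-K candidate set.

    `edges` maps src id -> list of (dst id, relation) pairs, mirroring the
    join against the edges table; at most `max_edges` edges per source are
    followed (the M parameter in the benchmarks below).
    """
    context = []
    for mid in top_k_ids:
        for dst, rel in edges.get(mid, [])[:max_edges]:
            context.append((mid, dst, rel))
    return context

edges = {1: [(2, "reply"), (3, "related-to")], 2: [(4, "summary-of")]}
print(expand([1, 2], edges))
```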
3.3 Composite Scoring
The final retrieval score for each candidate combines semantic, temporal, and relational signals, e.g. $S(m) = \alpha \cos(q, v_m) + \beta e^{-\lambda \Delta t} + \gamma R(m)$, where $R(m)$ encodes edge density or other relation-specific criteria.
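A sketch of the composite score, assuming a weighted linear combination; the weights and decay rate are illustrative, not fixed by the architecture:

```python
import math

def composite_score(sem, age_s, rel, alpha=0.6, beta=0.3, gamma=0.1, lam=1e-6):
    """Weighted sum of semantic similarity, exponential temporal decay,
    and a relational signal (e.g. edge density), each in [0, 1]."""
    return alpha * sem + beta * math.exp(-lam * age_s) + gamma * rel

s = composite_score(sem=0.9, age_s=0.0, rel=0.5)
print(round(s, 3))  # 0.6*0.9 + 0.3*1.0 + 0.1*0.5 = 0.89
```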
4. Graph Construction, Mutation, and Consistency
All memory insertions and edge creations follow an append-only discipline. Each memory is written with a unique timestamp and associated normalized vectors. New edges between memories are created as needed:
```sql
-- Append-only edge creation; identifiers and values are illustrative.
INSERT INTO edges (src, dst, relation, strength, confidence, meta)
VALUES (1699900000000001, 1699900000000002, 'reply', 0.8, 0.9, '{}');
```
No deletions occur online; low-confidence or obsolete edges may be pruned offline. Foreign keys enforce referential integrity.
5. Performance Characteristics and Scalability
Empirical results on a large-scale prototype (PostgreSQL 16 + pgvector, 32-core host, 128 GB RAM, NVMe storage):
| Operation | Dataset | Latency (ms) | Throughput (rec/s) |
|---|---|---|---|
| Single insert | 1 M | 2.5 | — |
| Batch insert (100) | 1 M | — | 8,000 |
| Vector-only query (K=100) | 10 M | ~390 | — |
| Hybrid time+vector (K=100) | 10 M | ~410 | — |
| Hybrid+graph expand (K=50, M=50) | 10 M | ~620 | — |
Throughput scales linearly with CPU count until I/O becomes the limit. Within a query, IVFFlat vector search dominates the cost; adding time and graph filters increases latency only marginally (≤10%).
Local coherence remains stable over 100 million inserts, indicating predictable drift.
6. Extensions: Columnar, Distributed, and Topic Modeling
Columnar Backend. Memories can be partitioned by (agent_id, date) with Parquet or equivalent, and a sidecar HNSW index over high-dimensional embeddings. Graph edges may be stored in a companion key-value or NoSQL store. Spark or Polars jobs can apply vector search and edge expansion over partitions, supporting large-scale, distributed workloads.
Distributed Clustering and "Eureka" Edges. Periodically, k-means is applied over fused vectors in sliding windows to group temporally disjoint but semantically related memories. New edges with label="eureka" and confidence proportional to pairwise coherence are established, serving as reinforced links for future queries. Topic drift is flagged when the assigned cluster centroid is distant in fused-vector space.
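A dependency-free sketch of the clustering pass, with a minimal k-means over fused vectors and "eureka" edge creation between co-clustered memories (the cluster count, iteration budget, and Euclidean distance are assumptions; the prototype's actual parameters are not specified):

```python
import random

def assign(points, centroids):
    """Index of the nearest centroid for each point (squared Euclidean)."""
    return [
        min(range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        for p in points
    ]

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep old centroid if cluster emptied out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids)

def eureka_edges(ids, labels):
    """Pair memories that share a cluster, regardless of temporal distance."""
    return [(ids[i], ids[j], "eureka")
            for i in range(len(ids)) for j in range(i + 1, len(ids))
            if labels[i] == labels[j]]

# Two well-separated semantic clusters of fused vectors.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = kmeans(points, k=2)
print(eureka_edges([10, 20, 30, 40], labels))
```

In production the confidence on each eureka edge would be set proportional to the pairwise coherence of the linked memories, as described above.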
Emergent Topic Modeling. The spectral gap of the local graph Laplacian $L = D - A$, measured as the algebraic connectivity $\lambda_2(L)$, signals bifurcation and the emergence of new topics as it collapses toward zero. Super-vertices—clusters of linked memories—may then be formed and connected via specialized "topic-of" edges.
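A sketch of the bifurcation signal (using NumPy), assuming the spectral gap is the second-smallest eigenvalue of the unnormalized Laplacian $L = D - A$; a gap near zero indicates the local subgraph is close to splitting into components:

```python
import numpy as np

def spectral_gap(adj):
    """Algebraic connectivity lambda_2 of L = D - A for a symmetric
    adjacency matrix; near zero when the graph nearly disconnects."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eig = np.sort(np.linalg.eigvalsh(L))  # ascending; eig[0] ~ 0 always
    return eig[1]

# Two triangles of linked memories joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(round(spectral_gap(A), 3))  # small but positive: graph still connected
```

Removing the bridge edge drives the gap to zero, which is exactly the bifurcation event that would trigger super-vertex formation.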
MemoriesDB, as formalized and prototyped in this architecture, delivers a unified, mathematically principled, and operationally scalable substrate for temporally and semantically coherent long-term agent memory. The integration of vectorized semantics, explicit temporal ordering, and flexible relational modeling under a single append-only schema distinguishes MemoriesDB within both AI memory systems and general data infrastructure for cognitive agents (Ward, 9 Nov 2025).