MemoriesDB: Temporal-Semantic Database

Updated 16 November 2025
  • MemoriesDB is a temporal–semantic–relational database that encodes experiences as quadruples, integrating microsecond timestamps, normalized embeddings, and JSON metadata.
  • It unifies time-series analysis, high-dimensional semantics, and directed graph modeling to support hybrid queries with robust temporal and relational coherence.
  • The architecture leverages PostgreSQL with the pgvector extension and supports extension to scalable backends, enabling efficient retrieval and graph-based context expansion over millions of records.

MemoriesDB is a temporal–semantic–relational database architecture designed for long-term computational memory management and retrieval. By unifying time-series analysis, high-dimensional semantic embeddings, and directed graph modeling, MemoriesDB encodes each experience or memory as a structured entity with explicit temporal, semantic, and relational features. This integrated approach supports efficient hybrid queries, rich contextualization, and robust cross-temporal coherence. The initial implementation leverages PostgreSQL with the pgvector extension, allowing scalable operations over millions of records while supporting further extension to columnar or distributed backends.

1. Formal Structure and Core Definitions

A single memory in MemoriesDB is represented as a quadruple $M_i = (t_i, \kappa_i, V_i, m_i)$, where:

  • $t_i$: Microsecond-resolution timestamp, unique for each entry.
  • $\kappa_i$: Categorical kind (e.g., "message", "observation", "summary").
  • $V_i = \{ v_i^{(1)}, \ldots, v_i^{(k)} \}$: Set of normalized embeddings (typically both low-dimensional, e.g., 128-d, and high-dimensional, e.g., 768-d).
  • $m_i$: Arbitrary JSONB metadata (agent_id, topic tags, importance metrics, etc.).

A fused vector representation for retrieval is defined by a function $v_{\mathrm{fuse},i} = f_{\mathrm{fuse}}(V_i)$, which may be a weighted linear combination or Reciprocal-Rank Fusion (RRF). All sub-embeddings are normalized to unit $\ell_2$-norm so that the dot product satisfies $v \cdot q = \cos(v, q)$.
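
When RRF is selected, fusion can also be realized at query time over ranked candidate lists rather than as a stored vector. A hedged sketch against the schema introduced in Section 2 (the parameters :q_low and :q_high, the candidate depth of 200, and the conventional RRF constant 60 are illustrative choices):

WITH low AS (
  SELECT id_time, ROW_NUMBER() OVER (ORDER BY v_low <=> :q_low) AS r
  FROM memories
  ORDER BY v_low <=> :q_low
  LIMIT 200
),
high AS (
  SELECT id_time, ROW_NUMBER() OVER (ORDER BY v_high <=> :q_high) AS r
  FROM memories
  ORDER BY v_high <=> :q_high
  LIMIT 200
)
SELECT id_time,
       COALESCE(1.0 / (60 + low.r), 0) + COALESCE(1.0 / (60 + high.r), 0) AS rrf_score
FROM low
FULL OUTER JOIN high USING (id_time)
ORDER BY rrf_score DESC
LIMIT 50;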

Directed edges, $E_{ij} = (i \rightarrow j, \rho_{ij}, W_{ij}, m_{ij})$, connect memories:

  • $\rho_{ij}$: Edge label (e.g., "reply", "summary-of", "related-to").
  • $W_{ij} = (\text{strength}, \text{confidence})$ with $\text{strength} \in \mathbb{R}$ and $\text{confidence} \in [0,1]$.
  • $m_{ij}$: Edge-level JSON metadata.

Each memory defines a local temporal–semantic "plane" parameterized by offset pairs $(\Delta t_{ij}, s_{ij})$, with $\Delta t_{ij} = t_j - t_i$ and semantic difference $s_{ij} = 1 - \cos(v_i^{(H)}, v_j^{(H)})$. The full database forms an ordered stack over such planes, and directed edges act as arrows in this $1+1$-dimensional similarity field.

Key coherence metrics:

  • Pairwise: $C_{\mathrm{pair}}(M_i, M_j) = \exp(-d(M_i, M_j))$, with $d(M_i, M_j) = \| v_{\mathrm{fuse},i} - v_{\mathrm{fuse},j} \|_2$.
  • Local: $C_{\mathrm{local},t} = \frac{1}{|E_t|} \sum_{(i, j) \in E_t} \exp(-d(M_i, M_j))$ for active edges $E_t$ in a temporal window; both metrics are computed in the SQL sketch below.
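
Both metrics, along with the plane offsets $(\Delta t_{ij}, s_{ij})$, map directly onto the schema of Section 2. A hedged sketch (the window bounds :tmin and :tmax are illustrative parameters; <=> is pgvector's cosine distance and <-> its Euclidean distance):

SELECT e.source,
       e.destination,
       m2.id_time - m1.id_time              AS delta_t_us,      -- Δt in microseconds
       m1.v_high <=> m2.v_high              AS s_ij,            -- 1 - cos similarity on unit vectors
       exp(-(m1.v_fuse <-> m2.v_fuse))      AS coherence_pair,  -- C_pair
       avg(exp(-(m1.v_fuse <-> m2.v_fuse))) OVER () AS coherence_local  -- C_local over the window
FROM edges e
JOIN memories m1 ON m1.id_time = e.source
JOIN memories m2 ON m2.id_time = e.destination
WHERE m1.id_time BETWEEN :tmin AND :tmax;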

2. Data Schema, Storage, and Normalization

The append-only schema is instantiated in PostgreSQL as shown below. The use of the pgvector extension enables efficient vector storage and computation.

CREATE EXTENSION IF NOT EXISTS vector;  -- the pgvector extension is named "vector"

CREATE TABLE memories (
  id_time   BIGINT     PRIMARY KEY, -- microsecond timestamp
  kind      TEXT       NOT NULL,
  content   TEXT,
  v_low     VECTOR(128),
  v_high    VECTOR(768),
  v_fuse    VECTOR(768),
  meta      JSONB      DEFAULT '{}'
);

CREATE TABLE edges (
  edge_id      BIGSERIAL   PRIMARY KEY,
  source       BIGINT      NOT NULL REFERENCES memories(id_time),
  destination  BIGINT      NOT NULL REFERENCES memories(id_time),
  relationship TEXT        NOT NULL,
  strength     REAL        DEFAULT 1.0,
  confidence   REAL        DEFAULT 1.0,
  meta         JSONB       DEFAULT '{}'
);

Indexes:

  • B-tree on (kind, id_time) for efficient range scans.
  • IVFFLAT index on high-dimensional and fused vectors for approximate nearest neighbor (ANN) retrieval with cosine distance.
  • Endpoint indexes for multigraph lookups and GIN indexing on JSON metadata in edges (example DDL in the sketch below).
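
A hedged sketch of the corresponding DDL (the index names and the ivfflat lists parameter are illustrative choices):

CREATE INDEX idx_memories_kind_time ON memories (kind, id_time);

CREATE INDEX idx_memories_v_high ON memories
  USING ivfflat (v_high vector_cosine_ops) WITH (lists = 100);

CREATE INDEX idx_memories_v_fuse ON memories
  USING ivfflat (v_fuse vector_cosine_ops) WITH (lists = 100);

CREATE INDEX idx_edges_endpoints ON edges (source, destination);

CREATE INDEX idx_edges_meta ON edges USING gin (meta);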

All vector insertions are normalized to unit $\ell_2$-norm before storage. Multiple edges between the same pair of vertices (with potentially different relations or strength/confidence values) are permitted.
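
Normalization is typically applied in the application before writing. As a hedged alternative, assuming a pgvector release that ships the l2_normalize() function, it can be enforced in the database at insert time (all parameter names are illustrative):

INSERT INTO memories (id_time, kind, content, v_low, v_high, v_fuse, meta)
VALUES (:id_time, :kind, :content,
        l2_normalize(:v_low::vector(128)),   -- enforce unit norm so <=> equals 1 - cos(v, q)
        l2_normalize(:v_high::vector(768)),
        l2_normalize(:v_fuse::vector(768)),
        :meta);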

3. Query Mechanisms and Algorithms

3.1 Time-Bounded and Hybrid Semantic Retrieval

Time-bounded retrieval uses a B-tree index:

SELECT id_time, kind, content, v_fuse, meta
FROM memories
WHERE kind = $1 AND id_time BETWEEN $2 AND $3
ORDER BY id_time DESC
LIMIT K;
where $K$ is typically user-provided.

Hybrid, time-and-semantic similarity search employs the vector index:

SELECT m.*, (m.v_high <=> $1) AS sim,
       exp(-(EXTRACT(EPOCH FROM NOW())*1e6 - m.id_time)/tau) AS decay
FROM memories m
WHERE m.id_time BETWEEN $2 AND $3
ORDER BY m.v_high <=> $1
LIMIT $4;
where <=> is the cosine-distance operator provided by pgvector, and $\tau$ controls the temporal decay factor.

3.2 Structural and Graph Expansion

After identifying top-K nearest memories, edges are expanded via join:

WITH topk AS (
  SELECT id_time, v_fuse
  FROM memories
  WHERE id_time BETWEEN :tmin AND :tmax
  ORDER BY v_high <=> :q
  LIMIT K
)
SELECT e.source, e.destination, e.relationship,
       e.strength, e.confidence,
       m2.content AS dest_content,
       exp(-(t.v_fuse <-> m2.v_fuse)) AS coherence_pair
FROM edges e
JOIN topk t ON e.source = t.id_time
JOIN memories m2 ON e.destination = m2.id_time
WHERE exp(-(t.v_fuse <-> m2.v_fuse)) >= tau_coh
ORDER BY coherence_pair DESC
LIMIT M;
This join supports rapid graph expansion for contextual or reasoning queries, and the PostgreSQL planner can parallelize the index scans and vector computations involved.
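
Deeper context can be gathered by iterating the expansion. A hedged sketch of a bounded multi-hop traversal with a recursive CTE (the two-hop limit and the :seed and :min_conf parameters are illustrative):

WITH RECURSIVE walk AS (
  SELECT e.source, e.destination, e.relationship, e.confidence, 1 AS depth
  FROM edges e
  WHERE e.source = :seed AND e.confidence >= :min_conf
  UNION ALL
  SELECT e.source, e.destination, e.relationship, e.confidence, w.depth + 1
  FROM edges e
  JOIN walk w ON e.source = w.destination
  WHERE w.depth < 2 AND e.confidence >= :min_conf
)
SELECT * FROM walk;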

3.3 Composite Scoring

The final retrieval score for each candidate combines semantic, temporal, and relational signals: $S_i = \alpha \cdot \cos(v_{\mathrm{fuse},i}, q) + \beta \cdot \exp\!\left(-\frac{t_{\mathrm{now}} - t_i}{\tau_t}\right) + \gamma \cdot \Phi_i$, where $\Phi_i$ encodes edge density or other relation-specific criteria.
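
A hedged sketch of this score evaluated directly in SQL; the weights 0.6/0.3/0.1, the use of raw edge degree for $\Phi_i$ (in practice it would be normalized), and the parameter names are illustrative:

SELECT id_time, kind, content,
       0.6 * semantic + 0.3 * temporal + 0.1 * relational AS score
FROM (
  SELECT m.id_time, m.kind, m.content,
         1 - (m.v_fuse <=> :q)                                         AS semantic,   -- cosine similarity
         exp(-(EXTRACT(EPOCH FROM NOW()) * 1e6 - m.id_time) / :tau_t)  AS temporal,   -- recency decay
         (SELECT count(*) FROM edges e
          WHERE e.source = m.id_time OR e.destination = m.id_time)     AS relational  -- edge degree as Phi_i
  FROM memories m
  WHERE m.id_time BETWEEN :tmin AND :tmax
) scored
ORDER BY score DESC
LIMIT 20;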

4. Graph Construction, Mutation, and Consistency

All memory insertions and edge creations follow an append-only discipline. Each memory is written with a unique timestamp and associated normalized vectors. New edges between memories are created as needed:

INSERT INTO memories (id_time, kind, content, v_low, v_high, v_fuse, meta)
VALUES (1627670400000000, 'message', 'Hello world', vlow, vhigh, vfuse, '{"agent":"A"}');

INSERT INTO edges (source, destination, relationship, strength, confidence, meta)
VALUES (1627670400000000, 1627670410000000, 'reply', 0.9, 0.95, '{"topic":"greeting"}');

No deletions occur online; low-confidence or obsolete edges may be pruned offline. Foreign keys enforce referential integrity.
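
A minimal sketch of such an offline pruning pass (the 0.2 confidence threshold and the :cutoff_id_time age bound are illustrative):

-- Offline maintenance only: remove low-confidence edges older than the cutoff.
DELETE FROM edges
WHERE confidence < 0.2
  AND source < :cutoff_id_time;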

5. Performance Characteristics and Scalability

Empirical results on a large-scale prototype (PostgreSQL 16 + pgvector, 32-core CPU, 128 GB RAM, NVMe storage):

Operation                           Dataset   Latency (ms)   Throughput (rec/s)
Single insert                       1 M       2.5            —
Batch insert (100)                  1 M       —              8,000
Vector-only query (K=100)           10 M      ~390           —
Hybrid time+vector (K=100)          10 M      ~410           —
Hybrid+graph expand (K=50, M=50)    10 M      ~620           —

Throughput scales linearly with CPU until the I/O bottleneck is reached. ivfflat vector search is the computational bottleneck; adding time and graph filters increases cost only marginally (≤10%).

Local coherence remains stable over 100 million inserts, indicating predictable drift.

6. Extensions: Columnar, Distributed, and Topic Modeling

Columnar Backend. Memories can be partitioned by (agent_id, date) with Parquet or equivalent, and a sidecar HNSW index over high-dimensional embeddings. Graph edges may be stored in a companion key-value or NoSQL store. Spark or Polars jobs can apply vector search and edge expansion over partitions, supporting large-scale, distributed workloads.

Distributed Clustering and "Eureka" Edges. Periodically, k-means is applied over fused vectors in sliding windows to group temporally disjoint but semantically related memories. New edges with label="eureka" and confidence proportional to pairwise coherence are established, serving as reinforced links for future queries. Topic drift is flagged when the assigned cluster centroid is distant in fused-vector space.
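
A hedged sketch of the edge-materialization step, assuming the external k-means job has written its assignments to a table cluster_assignments(id_time, cluster_id); that table, the one-day separation threshold, and the strength value are illustrative:

INSERT INTO edges (source, destination, relationship, strength, confidence, meta)
SELECT a.id_time, b.id_time, 'eureka',
       1.0,
       exp(-(m1.v_fuse <-> m2.v_fuse)),        -- confidence proportional to pairwise coherence
       '{"origin":"kmeans"}'::jsonb
FROM cluster_assignments a
JOIN cluster_assignments b
  ON a.cluster_id = b.cluster_id AND a.id_time < b.id_time
JOIN memories m1 ON m1.id_time = a.id_time
JOIN memories m2 ON m2.id_time = b.id_time
WHERE b.id_time - a.id_time > 86400000000;     -- only temporally disjoint pairs (> 1 day in microseconds)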

Emergent Topic Modeling. The spectral gap of the local graph Laplacian,

$\Delta\lambda = \lambda_2(\mathcal{L}_{\mathrm{local}}) - \lambda_1(\mathcal{L}_{\mathrm{local}})$

signals bifurcation and the emergence of new topics. Super-vertices—clusters of linked memories—may then be formed and connected via specialized "topic-of" edges.
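
Within the existing schema, one hedged way to materialize a super-vertex is to store the topic itself as a memory of kind "topic" and attach members via "topic-of" edges; this kind convention and the parameters :topic_id_time, :topic_label, :centroid, and :member_ids below are illustrative, not part of the reference design:

-- Store the super-vertex as an ordinary memory of kind 'topic'.
INSERT INTO memories (id_time, kind, content, v_fuse, meta)
VALUES (:topic_id_time, 'topic', :topic_label, :centroid, '{"origin":"spectral"}');

-- Link member memories to the new super-vertex.
INSERT INTO edges (source, destination, relationship, strength, confidence)
SELECT :topic_id_time, member_id, 'topic-of', 1.0, 1.0
FROM unnest(:member_ids::bigint[]) AS member_id;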


MemoriesDB, as formalized and prototyped in this architecture, delivers a unified, mathematically principled, and operationally scalable substrate for temporally and semantically coherent long-term agent memory. The integration of vectorized semantics, explicit temporal ordering, and flexible relational modeling under a single append-only schema distinguishes MemoriesDB within both AI memory systems and general data infrastructure for cognitive agents (Ward, 9 Nov 2025).
