LiveDB: Real-Time Data Management

Updated 4 September 2025
  • LiveDB is a dynamic data management system that continuously streams updates through bidirectional schema evolution to support concurrent operations.
  • It incorporates live query processing with event-driven propagation, enabling real-time data updates and efficient change tracking.
  • In blockchain applications, LiveDB optimizes state management by splitting mutable state and archival data, reducing storage overhead and enhancing throughput.

LiveDB refers to a family of data management systems and core architectural mechanisms whose distinguishing feature is the continuous, real-time reflection and streaming of changes, either at the logical schema level or in the physical state itself, to support flexible, robust, and concurrently evolving data consumers. While the term "LiveDB" appears in several database contexts, the principal research line outlined here is drawn from three foundational papers: InVerDa and BiDEL for schema version co-existence (Herrmann et al., 2016), event-driven live queries with GraphQL (Silveria, 2020), and LiveDB as a forkless blockchain state store (Jordan et al., 28 Aug 2025). These projects target distinct data models but are unified by a commitment to eliminating downtime, reducing maintenance complexity, and propagating changes in a timely and cost-effective fashion across a heterogeneous and evolving application landscape.

1. Bidirectional Schema Evolution and Co-Existence

Supporting multiple, parallel schema versions in a live operational database has historically faced technical obstacles due to the prevalence of monodirectional evolution languages and brittle, handwritten "delta code." The InVerDa system with its BiDEL language (Herrmann et al., 2016) introduces end-to-end support for co-existing schema versions without the need for a disruptive, atomic migration.

InVerDa's architecture consists of:

  • A bidirectional evolution language (BiDEL) where schema modifications are specified as succinct, formally annotated scripts. These scripts compile into a set of Schema Modification Operations (SMOs), each defined by bidirectional Datalog mappings (src, tgt) that propagate reads and writes among all versions.
  • An evolution graph (a directed acyclic hypergraph) records schema genealogy at the level of table instances and transforms. SMO instances are edges; table versions are vertices.
  • Two major DBA-accessible operations:
    • Database Evolution Operation: Instantiates a new schema version and autocreates the delta code (views, triggers) needed for logical equivalence and change propagation.
    • Database Migration Operation: Atomically moves physical table materialization among versions to optimize for workload, without downtime.

This structure yields the property that each schema version behaves as an independent, logically complete database—empirically and formally verified through Datalog bidirectionality proofs—while all versions continue to operate directly atop a single physical database.
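The evolution graph described above can be pictured with a minimal Python sketch. It is not InVerDa's implementation: the `SMO` class, the `smo_distance` helper, the SMO kind labels, and the table names (`Task_v1`, `Todo_v2`, and so on) are all hypothetical, and the hypergraph is simplified so that reaching any one endpoint of an SMO counts as traversing it. The sketch only illustrates why a table version that is not physically materialized must be reached through a chain of SMO mappings.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SMO:
    """One schema modification operation: a hyperedge between table versions."""
    kind: str        # illustrative label, not exact BiDEL syntax
    sources: tuple   # table versions on the source side
    targets: tuple   # table versions on the target side


def smo_distance(smos, start, materialized):
    """Fewest SMO hops from `start` to any materialized table version.

    SMOs are bidirectional, so edges are followed in both directions.
    Returns None if no materialized table version is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        table, hops = queue.popleft()
        if table in materialized:
            return hops
        for smo in smos:
            if table in smo.sources:
                neighbours = smo.targets   # follow the forward mapping
            elif table in smo.targets:
                neighbours = smo.sources   # follow the backward mapping
            else:
                continue
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return None


# Hypothetical genealogy: one table split in version 2, a column renamed in version 3.
smos = [
    SMO("SPLIT", ("Task_v1",), ("Todo_v2", "Done_v2")),
    SMO("RENAME COLUMN", ("Todo_v2",), ("Todo_v3",)),
]
print(smo_distance(smos, "Todo_v3", {"Task_v1"}))  # 2
```

Under this simplified model, a migration operation amounts to changing the `materialized` set, which shortens the propagation path for some versions and lengthens it for others.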

2. Live Query Processing and Real-Time Propagation

A distinct but thematically aligned use of LiveDB is the integration of live queries at the database level, as evidenced by the GraphQL live query architecture over a DynamoDB model (Silveria, 2020). Here, the system achieves real-time data reflection and event-driven propagation by extending the basic get/put interfaces of a key-value store:

  • Clients initiate a stream(key, fields) operation that results in a persistent, server-maintained subscription to subsets of fields for a particular key.
  • On each subsequent put(key, object), the system efficiently computes the set of updated fields, either from a sparse object that carries only the changed fields or from a field-level diff, optionally using Merkle trees for sublinear diffing.
  • If the intersection of updated fields and the client’s subscription is non-empty, a new view is asynchronously pushed to the client using a messaging system such as SNS/SQS or NSQ.
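The three steps above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the architecture from the cited work: the `LiveStore` class and its method signatures are hypothetical, a plain callback stands in for the messaging system, delivery is synchronous rather than asynchronous, and the diff is a simple field-by-field comparison with no Merkle trees.

```python
_MISSING = object()  # sentinel so an absent field differs from a field set to None


class LiveStore:
    """Key-value store with field-level live subscriptions (illustrative only)."""

    def __init__(self):
        self._data = {}   # key -> current object (dict of fields)
        self._subs = {}   # key -> list of (subscribed fields, callback)

    def get(self, key):
        return dict(self._data.get(key, {}))

    def stream(self, key, fields, callback):
        """Register a persistent subscription to `fields` of `key`."""
        self._subs.setdefault(key, []).append((frozenset(fields), callback))

    def put(self, key, obj):
        """Store `obj` and push a view to subscribers whose fields changed."""
        old = self._data.get(key, {})
        changed = {
            f for f in old.keys() | obj.keys()
            if old.get(f, _MISSING) != obj.get(f, _MISSING)
        }
        self._data[key] = dict(obj)
        for fields, callback in self._subs.get(key, []):
            if fields & changed:  # non-empty intersection triggers a push
                callback({f: obj.get(f) for f in fields})


store = LiveStore()
received = []
store.stream("user:1", ["name"], received.append)

store.put("user:1", {"name": "Ada", "visits": 1})    # "name" changed: pushed
store.put("user:1", {"name": "Ada", "visits": 2})    # only "visits" changed: no push
store.put("user:1", {"name": "Grace", "visits": 2})  # "name" changed: pushed
print(received)  # [{'name': 'Ada'}, {'name': 'Grace'}]
```

The second `put` shows the bandwidth saving: the write succeeds, but no view is sent because the subscriber listens only to `name`.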

This design decouples database update logic from client polling, reduces bandwidth by only transferring relevant data, and enables declarative, real-time application architectures. The efficiency of this event-driven live query processing is maintained by asynchronous diff-computation and streaming, although potential concerns include write performance degradation, diff-computation overhead, and consistency windows in distributed, eventually consistent environments.

3. Forkless Blockchain State Management

A third instantiation of LiveDB principles is in blockchain systems, targeting the inefficiency of legacy storage (Merkle Patricia Tries) even after the adoption of forkless consensus protocols (Jordan et al., 28 Aug 2025). The LiveDB/ArchiveDB split addresses both storage bloat and throughput constraints:

  • LiveDB acts as a mutable, pruned, dense key–value state engine, holding only the current state (balances, nonces, contract storage) relevant for validator and observer nodes. Storage overhead is reduced by approximately 100×; throughput increases by 10× compared to baseline geth implementations.
  • ArchiveDB is an append-only log of state changes, retaining historical state in an efficient, serialized form, without incurring the overhead of full multi-versioned state or redundant cryptographic hashing.

This bifurcation separates "hot" operational state from long-term archival data, enabling both high-throughput execution and efficient, low-cost historical queries. When integrated with consensus and mempool-layer enhancements as in Narwhal and Tusk, such a storage system enables blockchain nodes to sustain higher transaction rates and handle block production surges without state transition bottlenecks.
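The hot/archival split can be illustrated with a minimal Python sketch. It assumes nothing about the actual storage format of the cited system: the `SplitStateStore` class, its method names, and the key naming are hypothetical, hashing and proofs are omitted entirely, and the historical lookup is a linear scan where a real archive would use an index.

```python
class SplitStateStore:
    """Current state in a mutable map, history in an append-only log (illustrative)."""

    def __init__(self):
        self.live = {}      # current state only, overwritten in place
        self.archive = []   # append-only log of (block, key, value)

    def apply_block(self, block, updates):
        """Apply one block's state changes; a value of None deletes the key.

        Blocks are assumed to be applied in increasing block order.
        """
        for key, value in updates.items():
            if value is None:
                self.live.pop(key, None)   # prune deleted entries from hot state
            else:
                self.live[key] = value
            self.archive.append((block, key, value))

    def get(self, key):
        """Fast path used during execution: current state only."""
        return self.live.get(key)

    def get_at(self, key, block):
        """Historical query: value of `key` as of the end of `block`."""
        value = None
        for b, k, v in self.archive:
            if b > block:
                break          # the log is ordered by block number
            if k == key:
                value = v
        return value


store = SplitStateStore()
store.apply_block(1, {"alice/balance": 100})
store.apply_block(2, {"alice/balance": 70, "bob/balance": 30})

print(store.get("alice/balance"))        # 70
print(store.get_at("alice/balance", 1))  # 100
```

Execution only touches `live`, whose size tracks the number of current entries, while `archive` grows with history and is consulted only for historical queries.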

4. Formal Guarantees and Propagation Semantics

The LiveDB paradigm in all these systems is anchored in rigorous formal semantics. In bidirectional schema evolution (Herrmann et al., 2016), correctness is established by two-sided mapping functions (forwards: tgt, backwards: src) for each schema mutation, specified in Datalog. The central invariants:

$$D_t = \operatorname{tgt}^{data}(\operatorname{src}(D_t))$$

$$D_s = \operatorname{src}^{data}(\operatorname{tgt}(D_s))$$

are proven for both reads and updates—including chains of SMOs—ensuring no loss or duplication of user-visible data. For live query models (Silveria, 2020), propagation logic is functionally deterministic: a client always receives a new view if and only if at least one listened field has changed between updates.
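To make the two round-trip invariants concrete, here is a small Python sketch for a drop-column-style change. It is an assumption-laden illustration rather than InVerDa's Datalog rules: the functions `tgt` and `src`, the `prio` column, the `DEFAULT_PRIO` value, and the dictionary-based auxiliary table are all hypothetical. The forward mapping removes a column but parks its values in an auxiliary table, so the backward mapping can restore them.

```python
DEFAULT_PRIO = 1  # hypothetical default for rows written through the target version


def tgt(source_rows):
    """Forward mapping: drop 'prio', preserving it in an auxiliary table."""
    data = {
        rid: {k: v for k, v in row.items() if k != "prio"}
        for rid, row in source_rows.items()
    }
    aux = {rid: row["prio"] for rid, row in source_rows.items()}
    return data, aux


def src(target_rows, aux=None):
    """Backward mapping: re-attach 'prio' from the auxiliary table or a default."""
    aux = aux or {}
    return {
        rid: {**row, "prio": aux.get(rid, DEFAULT_PRIO)}
        for rid, row in target_rows.items()
    }


# Data written in the source version survives a round trip through the target.
D_s = {1: {"task": "write", "prio": 3}, 2: {"task": "review", "prio": 1}}
data, aux = tgt(D_s)
assert src(data, aux) == D_s

# Data written in the target version survives a round trip through the source.
D_t = {1: {"task": "write"}}
assert tgt(src(D_t))[0] == D_t   # [0] keeps only the data tables
```

Without the auxiliary table, the first assertion would fail for any row whose `prio` differs from the default, which is why non-invertible transformations need extra stored information.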

This mathematically precise change propagation is central both to practical correctness (each version or subscriber receives a coherent and complete view) and to higher-level guarantees, such as logical independence of schema versions or cryptographically sound state transitions in blockchains.

5. Implementation Considerations, Performance, and Limitations

Across these systems, implementation is characterized by the automatic generation of delta code or update logic from high-level declarative specifications (BiDEL, live stream descriptors). This has several practical consequences:

  • Complexity reduction: BiDEL scripts can be up to 359 times shorter than equivalent handwritten SQL for schema evolution delta code. This substantially reduces maintenance effort and the risk of introducing errors.
  • Performance: InVerDa’s automatically generated code incurs negligible overhead (about 4% slower than direct implementations in typical use), with local queries on materialized schema versions achieving up to a 2x speedup compared to non-local queries.
  • Materialization flexibility: Rapid, transactionally safe database migration operations allow DBAs to optimize live system layout for workload, without interrupting co-existing clients.
  • Scalability: Empirical results (e.g., Wikimedia’s 170+ version evolution) show only linear scaling in overhead related to the number of SMOs.

Potential limitations and challenges include the need for auxiliary tables to preserve information in non-invertible transformations, the computational cost of live diffing (for live queries), and the management overhead in tracking extensive schema genealogies or large numbers of field subscriptions.

6. Practical Impact and Use Cases

LiveDB designs address concrete problems in agile, real-world system development:

  • Enterprise deployment: Applications or microservices evolving at different paces are able to share a common data infrastructure without synchronization freezes.
  • Mobile and analytics clients: Real-time propagation ensures both immediacy and minimal data transfer, critical in bandwidth-constrained or interactive applications.
  • Blockchain validation: Forkless LiveDB enables economically viable, performant node operation even as consensus and mempool layers scale beyond previous block rates.

By permitting real-time, version-robust operations, LiveDB systems foster architectural decoupling: application and business logic developers are insulated from schema evolution; DBAs retain full control over physical design and optimization. The mathematical formalism behind LiveDB ensures strong guarantees of consistency, recoverability, and correctness independent of underlying storage or concurrency mechanisms.