EduDB: A Pedagogical DBMS Prototype

Updated 3 February 2026
  • EduDB is a pedagogical database system featuring a concise codebase and modular architecture aimed at enhancing undergraduate DBMS understanding.
  • It integrates key components such as parsing, query execution, buffer management, and concurrency control, allowing students to implement and optimize essential modules.
  • Future enhancements include advanced join algorithms, indexing, and recovery mechanisms, making it a practical sandbox for experimental database development.

EduDB is a pedagogically focused database management system (DBMS) prototype designed explicitly for educational use in undergraduate curricula. Developed in response to pedagogical limitations in traditional DBMS coursework, specifically in the University of Wisconsin–Madison’s CS 564, it provides a minimal yet end-to-end DBMS skeleton. Its primary characteristics are a concise codebase (roughly 1,700 lines of C++), clear separation of modular components, and explicit support for project-based learning (PBL), which lets students both observe system fundamentals and implement, extend, or optimize key architectural modules (Lyu et al., 27 Jan 2026).

1. Motivations and Pedagogical Objectives

EduDB was created to bridge gaps between concepts and practical skills in undergraduate DBMS instruction. Traditional module-based assignments (e.g., B+-tree implementation in isolation) consume significant student effort on peripheral code and corner cases, often without advancing holistic comprehension of database internals. EduDB privileges an integrative, constructivist pedagogy by exposing the entire system in an accessible codebase, enabling students to observe and experiment with the control flow from client interface through to storage and concurrency management.

A central objective is to provide a clear, top-to-bottom view of the database system: client–server protocols, lexical and syntactic parsing, query execution logic, buffer and file management, and transaction/concurrency control. Students have explicit extension points for implementing or replacing modules such as hash joins, advanced buffer replacement schemes, fine-grained concurrency protocols, and recovery mechanisms. Benchmarking and automated grading—using a public leaderboard and performance tests relative to baseline algorithms—are intended to facilitate both technical and metacognitive learning outcomes (Lyu et al., 27 Jan 2026).

2. System Architecture and Component Overview

EduDB is architected as a single-server, multi-client system, with all interactions occurring over terminal-based, socket-driven client–server communications. Its high-level component graph is as follows:

| Module | Responsibilities | Extension Points |
| --- | --- | --- |
| Client–Server UI | Manages socket-based protocol, dispatches parsed requests | None |
| Parser | Lexer and grammar rules, produces AST/operation descriptor | Grammar rules, AST node types |
| Query Executor | Dispatches operations to handler functions | Join algorithms, handler registration |
| Buffer Manager | Fixed-size page cache, manages pin/unpin, replacement policy | Replacement/prefetch strategies |
| File Manager | Block I/O; fixed-length binary record storage | None; schema structure in-memory |
| Concurrency Manager | Coarse-grained S/X/Global lock management, 2PL protocol | Fine-grained/alternative locking |
| Transaction Manager | Coordinates commit/abort, enforces isolation | Recovery/atomicity mechanisms |

A query flows through the system in a fixed order: the client submits it, the parser turns it into an operation descriptor, and the query executor runs it, possibly invoking custom, student-supplied operators. Execution in turn drives buffer and file accesses under lock and transaction coordination, and the results are finally returned to the client.
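
As a rough illustration of the dispatch step, the sketch below shows one way an executor could map operation kinds to handler functions and let a later registration replace an earlier one. The names `Operation`, `QueryExecutor`, `registerHandler`, and `execute` are hypothetical and are not taken from the EduDB codebase.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical operation descriptor produced by the parser.
struct Operation {
    std::string kind;   // e.g. "SELECT", "INSERT"
    std::string table;  // target table name
};

// Hypothetical executor: maps an operation kind to a handler function.
class QueryExecutor {
public:
    using Handler = std::function<std::string(const Operation&)>;

    // Registering under an existing kind replaces the previous handler,
    // which is how a student operator could override a baseline one.
    void registerHandler(const std::string& kind, Handler h) {
        assert(h);  // an empty std::function cannot be called
        handlers_[kind] = std::move(h);
    }

    // Looks up the handler for op.kind and runs it.
    std::string execute(const Operation& op) const {
        auto it = handlers_.find(op.kind);
        if (it == handlers_.end())
            throw std::runtime_error("no handler for " + op.kind);
        return it->second(op);
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};
```

Keeping the registry keyed by a string makes replacement a one-line change at start-up, at the cost of a hash lookup per query; a real system might dispatch on an enum instead.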

3. Core Data Structures and Algorithms

EduDB exposes canonical internal abstractions that serve both as learning vehicles and extension stubs.

3.1 Parser and AST

The lexer classifies input into distinct token types (keywords, identifiers, constants, delimiters), and the parser applies one deterministic, linear grammar rule per operation (e.g., a CREATE TABLE pattern). Predicate expressions are parsed recursively into an AST whose inner nodes are conjunctions/disjunctions (AND/OR) and relational operations (=, <>, <, >) and whose leaves are columns or constants (Lyu et al., 27 Jan 2026).
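
A minimal sketch of such a predicate AST and its recursive evaluation is given here. It assumes integer-valued columns and a row modelled as a name-to-value map; `Expr`, `Row`, `col`, `lit`, `bin`, and `eval` are illustrative names, not EduDB's actual node types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical row representation: column name -> integer value.
using Row = std::map<std::string, int>;

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum class Kind { Column, Constant, Eq, Ne, Lt, Gt, And, Or };
    Kind kind = Kind::Constant;
    std::string column;  // used when kind == Column
    int value = 0;       // used when kind == Constant
    ExprPtr lhs, rhs;    // used by the binary node kinds
};

// Leaf constructors.
ExprPtr col(std::string name) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Column;
    e->column = std::move(name);
    return e;
}
ExprPtr lit(int v) {
    auto e = std::make_shared<Expr>();
    e->value = v;
    return e;
}
// Inner-node constructor for relational and logical operators.
ExprPtr bin(Expr::Kind k, ExprPtr l, ExprPtr r) {
    assert(l && r);
    auto e = std::make_shared<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Recursive evaluation: leaves yield integers, inner nodes yield 0 or 1.
int eval(const Expr& e, const Row& row) {
    switch (e.kind) {
        case Expr::Kind::Column:   return row.at(e.column);
        case Expr::Kind::Constant: return e.value;
        case Expr::Kind::Eq:  return eval(*e.lhs, row) == eval(*e.rhs, row);
        case Expr::Kind::Ne:  return eval(*e.lhs, row) != eval(*e.rhs, row);
        case Expr::Kind::Lt:  return eval(*e.lhs, row) <  eval(*e.rhs, row);
        case Expr::Kind::Gt:  return eval(*e.lhs, row) >  eval(*e.rhs, row);
        case Expr::Kind::And: return eval(*e.lhs, row) && eval(*e.rhs, row);
        case Expr::Kind::Or:  return eval(*e.lhs, row) || eval(*e.rhs, row);
    }
    return 0;  // not reached for valid node kinds
}
```

For example, `age > 18 AND (dept = 3 OR dept = 5)` becomes an `And` node whose children are a `Gt` node and an `Or` node, and `eval` walks that tree once per row.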

3.2 Query Executor and Join Algorithms

The baseline join operation is a nested loop join, expressed as:

function NestedLoopJoin(R, S, pred, proj):
  result ← ∅
  for each r in R:
    for each s in S:
      if pred(r, s):
        result.add(proj(r, s))
  return result
This runs in $O(|R| \cdot |S|)$ time and forms the baseline for extension assignments, in which students implement hash or sort-merge joins for improved efficiency.

3.3 Buffer Management

The buffer manager maintains a fixed pool of frames, each tagged with a block id, a dirty flag, and a pin count. A request to pin a block has one of three outcomes: if the block is already resident, its pin count is incremented; if an unpinned frame is available, that frame is evicted (flushing it first if dirty) and the block is loaded into it; otherwise the request waits, and the transaction aborts if no frame becomes free within 10 seconds. The default eviction policy is a naïve FIFO scheme; students can implement alternatives (e.g., LRU, CLOCK). Sample pseudocode (pin action):

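pin(block):
  acquire mutex                      # released on every return/abort path
  deadline ← now + T                 # T = 10 seconds by default
  loop:
    if block in pool:
      pool[block].pin += 1; return
    if an unpinned frame f exists:   # victim chosen by the replacement policy
      if f.dirty: flush(f)
      load block into f; f.pin ← 1; return
    wait on condition variable until deadline   # signalled by unpin
    if deadline passed and still no unpinned frame: abort transaction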
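
The pin logic can be sketched in C++ with a mutex and a condition variable. This is an illustrative model only: `Frame`, `BufferPool`, `pin`, `unpin`, and `flushes` are hypothetical names, disk I/O is replaced by a counter, the victim is simply the first unpinned frame, and a timeout is reported by returning -1 rather than by aborting a transaction.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical frame metadata; page contents and disk I/O are omitted.
struct Frame {
    int block = -1;     // -1 means the frame holds no block yet
    int pins = 0;
    bool dirty = false;
};

class BufferPool {
public:
    BufferPool(std::size_t frames, std::chrono::milliseconds timeout)
        : frames_(frames), timeout_(timeout) {}

    // Returns the frame index, or -1 if no frame became free before the
    // timeout (the caller would then abort the transaction).
    int pin(int block) {
        std::unique_lock<std::mutex> lk(mu_);
        const auto deadline = std::chrono::steady_clock::now() + timeout_;
        bool expired = false;
        for (;;) {
            int victim = -1;
            for (std::size_t i = 0; i < frames_.size(); ++i) {
                if (frames_[i].block == block) {       // already resident
                    ++frames_[i].pins;
                    return static_cast<int>(i);
                }
                if (frames_[i].pins == 0 && victim < 0)
                    victim = static_cast<int>(i);      // first unpinned frame
            }
            if (victim >= 0) {
                Frame& f = frames_[victim];
                if (f.dirty) { ++flushes_; f.dirty = false; }  // stand-in for a disk write
                f.block = block;
                f.pins = 1;
                return victim;
            }
            if (expired) return -1;
            expired = cv_.wait_until(lk, deadline) == std::cv_status::timeout;
        }
    }

    // Drops one pin; 'modified' marks the frame dirty.
    void unpin(int frame, bool modified) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            assert(frames_[frame].pins > 0);
            --frames_[frame].pins;
            frames_[frame].dirty = frames_[frame].dirty || modified;
        }
        cv_.notify_all();  // wake any pin() waiting for a free frame
    }

    int flushes() const { return flushes_; }

private:
    std::vector<Frame> frames_;
    std::chrono::milliseconds timeout_;
    std::mutex mu_;
    std::condition_variable cv_;
    int flushes_ = 0;
};
```

Re-scanning the pool after every wake-up handles spurious wake-ups and the case where another thread takes the freed frame first.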

3.4 Storage and Indexing

Each table is mapped to a single OS file; records are in fixed-length binary format and the schema resides in memory. By default, EduDB omits all index structures (B+-trees, hash indexes), but explicit interfaces allow students to add modular indexing components such as:

  • create_index(table, column)
  • index_lookup(value) → list of block_no + offset

B+-tree search cost is $O(\log_f N)$, where $f$ is the fan-out and $N$ is the number of leaf records (Lyu et al., 27 Jan 2026).
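
To show how such an index interface might look, the sketch below builds an in-memory index over one integer column and returns record locations as block number plus offset. It rests on several assumptions: records do not span blocks, `std::multimap` (a balanced binary tree, not a B+-tree) stands in for the index structure, and the signatures of `create_index`, `lookup`, and `locate` are simplified, hypothetical variants of the interface named above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical record id: block number plus byte offset within the block.
struct RecordId {
    int block_no;
    int offset;
};

// Hypothetical in-memory index on one integer column.
class ColumnIndex {
public:
    void insert(int key, RecordId rid) { entries_.emplace(key, rid); }

    // Returns the locations of all records whose column equals 'key'.
    std::vector<RecordId> lookup(int key) const {
        std::vector<RecordId> out;
        auto range = entries_.equal_range(key);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }

private:
    std::multimap<int, RecordId> entries_;
};

// Maps the n-th fixed-length record of a table file to its record id,
// assuming records never straddle a block boundary.
RecordId locate(int n, int record_size, int block_size) {
    assert(record_size > 0 && record_size <= block_size);
    int per_block = block_size / record_size;
    return { n / per_block, (n % per_block) * record_size };
}

// Builds an index from the column values listed in file order.
ColumnIndex create_index(const std::vector<int>& column_values,
                         int record_size, int block_size) {
    ColumnIndex index;
    for (std::size_t n = 0; n < column_values.size(); ++n)
        index.insert(column_values[n],
                     locate(static_cast<int>(n), record_size, block_size));
    return index;
}
```

As a worked instance of the cost formula, a B+-tree with fan-out $f = 100$ over $N = 10^6$ leaf records needs about $\log_{100} 10^6 = 3$ levels of node visits per lookup.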

4. Concurrency Control and Recovery Properties

4.1 Concurrency Management

EduDB implements coarse-grained Two-Phase Locking (2PL) with three lock modes: Global, Shared (S), and Exclusive (X). Compatibility is restricted: S vs S is compatible, but all other pairs are not. Lock acquisition is blocking; exceeding a wait threshold results in transaction abort. There is no deadlock prevention or detection; students may implement schemes such as wait-die and wound-wait as extensions.
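
The compatibility rule and the two timestamp-based schemes mentioned as extensions can be written down compactly. The sketch below is a hypothetical illustration, not EduDB code: `LockMode`, `Decision`, `compatible`, `waitDie`, and `woundWait` are assumed names, and timestamps are plain integers where a smaller value means an older transaction.

```cpp
#include <cassert>

enum class LockMode { Shared, Exclusive, Global };

// Per the rule above: only Shared/Shared can be held at the same time.
bool compatible(LockMode held, LockMode requested) {
    return held == LockMode::Shared && requested == LockMode::Shared;
}

// What to do when a request conflicts with a lock held by another transaction.
enum class Decision { Wait, AbortRequester, AbortHolder };

// Wait-die: an older requester waits; a younger requester aborts ("dies").
Decision waitDie(int requester_ts, int holder_ts) {
    assert(requester_ts != holder_ts);  // distinct transactions
    return requester_ts < holder_ts ? Decision::Wait
                                    : Decision::AbortRequester;
}

// Wound-wait: an older requester aborts ("wounds") the holder;
// a younger requester waits.
Decision woundWait(int requester_ts, int holder_ts) {
    assert(requester_ts != holder_ts);
    return requester_ts < holder_ts ? Decision::AbortHolder
                                    : Decision::Wait;
}
```

In both schemes waits only ever go in one direction of the age ordering, so no cycle of waiting transactions can form.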

4.2 Recovery and Atomicity

EduDB omits write-ahead logging (WAL) and implements neither undo nor redo recovery. On transaction commit, all dirty buffers are flushed and unpinned, and all locks are released. An abort simply unpins buffers without flushing or rolling back, so data can be lost. Students can add a recovery subsystem that implements WAL to provide atomicity and durability (Lyu et al., 27 Jan 2026).
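
One possible starting point for such an extension is an undo log that records each before-image ahead of the write it protects. The sketch below is an in-memory illustration of that idea only: it is not durable, it is not ARIES, and `Store`, `UndoLog`, `write`, `abort`, and `commit` are hypothetical names that do not come from EduDB.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical in-memory "database": record id -> value.
using Store = std::map<int, int>;

// Per-transaction undo log (hypothetical; kept in memory, so not durable).
class UndoLog {
public:
    // Log the before-image first, then apply the write.
    void write(Store& db, int key, int value) {
        auto it = db.find(key);
        bool existed = it != db.end();
        log_.push_back({key, existed, existed ? it->second : 0});
        db[key] = value;
    }

    // Abort: restore before-images in reverse order of the writes.
    void abort(Store& db) {
        for (auto e = log_.rbegin(); e != log_.rend(); ++e) {
            if (e->existed) db[e->key] = e->old_value;
            else db.erase(e->key);
        }
        log_.clear();
    }

    // Commit: the before-images are no longer needed.
    void commit() { log_.clear(); }

private:
    struct Entry {
        int key;
        bool existed;    // false if the write created the record
        int old_value;   // meaningful only when existed is true
    };
    std::vector<Entry> log_;
};
```

Undoing in reverse order matters when one transaction writes the same record more than once: the oldest before-image must be applied last.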

5. Extensibility and Project-Based Experimentation

EduDB is explicitly designed as a platform for student-driven modification and experimentation. Predefined projects include replacing the nested-loop join with hash join or sort-merge join, where performance is graded via automated test cases and public leaderboards comparing elapsed time to the baseline version.

Potential future projects include:

  • Implementation of CLOCK, LRU, and LFU buffer replacements
  • Development of fine-grained locking mechanisms (page/record-level, deadlock prevention)
  • Extension of indexing with B+-trees and integrated parser/query executor support
  • Rule-based or cost-based query optimization
  • Addition of recovery via WAL, e.g., ARIES-style protocols
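
As an example of the first project in this list, an LRU replacement policy can be kept separate from the buffer pool by tracking only which frames are currently evictable. The class below is a sketch under that assumption; `LruPolicy`, `markUnpinned`, `markPinned`, and `evict` are hypothetical names rather than EduDB's policy interface.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>

// Hypothetical LRU policy over frame indices; tracks unpinned frames only.
class LruPolicy {
public:
    // Call when a frame's pin count drops to zero: it becomes the most
    // recently used eviction candidate.
    void markUnpinned(int frame) {
        markPinned(frame);  // drop any stale position first
        order_.push_back(frame);
        pos_[frame] = std::prev(order_.end());
    }

    // Call when a frame is pinned: it is no longer evictable.
    void markPinned(int frame) {
        auto it = pos_.find(frame);
        if (it == pos_.end()) return;
        order_.erase(it->second);
        pos_.erase(it);
    }

    // Returns the least recently used evictable frame, or -1 if none.
    int evict() {
        assert(order_.size() == pos_.size());
        if (order_.empty()) return -1;
        int frame = order_.front();
        order_.pop_front();
        pos_.erase(frame);
        return frame;
    }

private:
    std::list<int> order_;  // front = least recently used
    std::unordered_map<int, std::list<int>::iterator> pos_;
};
```

The list plus hash map combination gives constant expected time for all three operations, since `std::list` iterators stay valid when other elements are erased.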

APIs follow an open class/interface organization permitting modular replacement of parser rules, executor handlers, buffer and eviction policies, and concurrency control logic. An example extension, Hash Join, is outlined as:

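class HashJoin : public JoinOperator {
public:
  // Build phase: hash every tuple of the left input on the join key.
  void open() { buildHashTable(leftInput); }
  // Probe phase: write the next match to out; returns false when exhausted.
  bool next(tuple &out) { return probeHashTable(rightInput, out); }
};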
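
To make the build and probe phases concrete, here is a self-contained sketch of an in-memory hash equi-join. It does not use the `JoinOperator` interface outlined above; `Tuple`, `hashJoin`, and the column-index parameters are assumptions made for illustration, and rows are modelled as vectors of integers.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row type: a tuple is a vector of integer fields.
using Tuple = std::vector<int>;

// Equi-join on R[r_col] == S[s_col]; each output row is r's fields
// followed by s's fields. Output order is unspecified.
std::vector<Tuple> hashJoin(const std::vector<Tuple>& R, std::size_t r_col,
                            const std::vector<Tuple>& S, std::size_t s_col) {
    // Build phase: one pass over R, keyed by the join column.
    std::unordered_multimap<int, const Tuple*> table;
    table.reserve(R.size());
    for (const Tuple& r : R) {
        assert(r_col < r.size());
        table.emplace(r[r_col], &r);
    }

    // Probe phase: one pass over S, looking up each join key.
    std::vector<Tuple> result;
    for (const Tuple& s : S) {
        assert(s_col < s.size());
        auto range = table.equal_range(s[s_col]);
        for (auto it = range.first; it != range.second; ++it) {
            Tuple joined = *it->second;
            joined.insert(joined.end(), s.begin(), s.end());
            result.push_back(std::move(joined));
        }
    }
    return result;
}
```

With a well-distributed hash function, the expected cost is proportional to $|R| + |S|$ plus the size of the output, compared with $|R| \cdot |S|$ predicate evaluations for the nested-loop baseline. The sketch assumes the build side fits in memory.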

6. Implementation Scope, Evaluation, and Limitations

EduDB’s initial implementation covers the bulk of canonical DBMS modules except for query optimization, fine-grained concurrency, recovery, and indexing. The system’s compact codebase and pedagogical annotations aim to maximize comprehensibility. Full-scale student trials and usage in CS 564 at UW–Madison are planned, with expectations—drawn from the PBL literature—of improved student engagement, holistic architectural understanding, and soft skills development (benchmarking, teamwork, technical communication).

Out of the box, EduDB lacks indexing, a query optimizer, and recovery; it offers only coarse-grained locking and naive buffer replacement; and its performance is limited on large-scale workloads. Documentation and testing infrastructure are described as works in progress. Planned enhancements include systematic testing, expanded API documentation, baseline cost-based optimization, ARIES-style recovery, built-in indexing, and finer concurrency granularity (Lyu et al., 27 Jan 2026).

7. Summary and Outlook

EduDB is a minimalist, didactically driven DBMS prototype that presents an unobstructed view of fundamental control flows (parsing, execution, buffering, concurrency, persistence) while providing clearly marked extension points for educational experimentation. Its design focuses on making the systems-level structure of a DBMS transparent and manipulable for students, forming a practical sandbox in which to construct database concepts from the bottom up (Lyu et al., 27 Jan 2026).

References (1)
