Unified Peripartum Relational Database

Updated 25 October 2025

The Unified Peripartum Relational Database is a comprehensive system that consolidates heterogeneous peripartum clinical data with strict schema design and advanced querying capabilities.
It employs rigorous entity–relationship modeling, logical schema transformation, and NL2SQL integration to ensure data consistency across diverse clinical records.
The database enhances clinical care and research reproducibility by enabling real-time access, standardized audits, and integration of fragmented data sources.

A Unified Peripartum Relational Database is a centralized, schema-driven system that consolidates and rigorously structures heterogeneous peripartum clinical data (patient, pregnancy, examination, delivery, neonatal, and continuous monitoring records) using enterprise-class relational techniques augmented with metadata management and a natural-language-to-SQL (NL2SQL) interface. Developed and prototyped at Udine University Hospital, it resolves the fragmentation typical of hospital information systems and creates a computable platform for both enhanced clinical care and reproducible research (Armenise et al., 18 Oct 2025). Methodologically, it combines entity–relationship modeling, logical schema transformation, strict referential and domain constraints, and querying interfaces inspired by recent work on integrating graph data models with relational database technology (Crowe et al., 2023).

1. Conceptual and Logical Schema Design

The schema design of the Unified Peripartum Relational Database is rooted in a meticulous entity–relationship (ER) analysis conducted in cooperation with clinicians. Core entities include Patient, Pregnancy, Condition, Examination, Test, Delivery, Induction, Newborn, Tracing, and Measurement. Relationships and attributes are defined to ensure referential integrity and compliance with real-world clinical semantics.

Patient: Uniquely identified (e.g., by Italian tax code), storing limited demographic data.
Pregnancy: Modeled as a weak entity (identified by patient plus first examination date or synthetic ID), linking multiple pregnancies to a single patient.
Condition: Supports many-to-many relations with Pregnancy, storing possible treatment details.
Examination: Union-type entity covering first trimester, second trimester, biometrical ultrasound, and other examination types, with contextual and type-specific attributes.
Test: Maintains results (numeric, string, enumerated), enforcing data type coherence via triggers.
Delivery: Models multiple delivery modes and specializations, with common attributes centralized.
Induction: Details labor induction protocols.
Newborn: Represents each delivered child, distinguishing multiples by birth time, with physiological measures (Apgar, pH, etc.).
Tracing and Measurement: Facilitates longitudinal CTG storage; Measurements are timestamped and attached to Newborn records.

Referential integrity is established by foreign keys (e.g., linking Delivery, Newborn, and Tracing) and reinforced by triggers that enforce more complex integrity constraints (such as the requirement that each pregnancy be linked to at least one Examination or Delivery).
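A minimal sketch of how a few of these entities might map to tables is given below. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL implementation, and every table name, column name, and identifier value is a hypothetical chosen for illustration, not taken from the published schema. Pregnancy is shown as a weak entity whose key includes its owning patient, and Newborn is keyed by delivery plus birth time.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE patient (
    tax_code   TEXT PRIMARY KEY,          -- e.g. an Italian tax code
    birth_year INTEGER
);
CREATE TABLE pregnancy (                  -- weak entity: key includes the owner
    patient_tax_code TEXT NOT NULL REFERENCES patient(tax_code),
    first_exam_date  TEXT NOT NULL,
    PRIMARY KEY (patient_tax_code, first_exam_date)
);
CREATE TABLE delivery (
    delivery_id      INTEGER PRIMARY KEY,
    patient_tax_code TEXT NOT NULL,
    first_exam_date  TEXT NOT NULL,
    FOREIGN KEY (patient_tax_code, first_exam_date)
        REFERENCES pregnancy(patient_tax_code, first_exam_date)
);
CREATE TABLE newborn (                    -- multiples distinguished by birth time
    delivery_id INTEGER NOT NULL REFERENCES delivery(delivery_id),
    birth_time  TEXT NOT NULL,
    apgar_5min  INTEGER CHECK (apgar_5min BETWEEN 0 AND 10),
    PRIMARY KEY (delivery_id, birth_time)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

With this layout, inserting a pregnancy for an unknown patient, or a delivery for a pregnancy that does not exist, fails at the database level rather than relying on application code.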

2. Integration of Heterogeneous Data Sources

The database is designed to unify previously fragmented peripartum datasets, resolving inconsistencies across legacy spreadsheets, EHR exports, device repositories, and laboratory systems.

Maternal longitudinal history: Imports pre-existing conditions and medication records originally held in non-standard formats.
Current pregnancy findings: Integrates laboratory and ultrasound data collected at multiple timepoints within a normalized schema.
Intrapartum course data: Ingests continuous CTG monitoring streams by encapsulating timestamped Measurements within Tracing instances.
Delivery and neonatal outcomes: Segregates these into dedicated, rigorously linked entities for auditability.

Discrete identifiers (e.g., synthetic pregnancy IDs) and timestamp harmonization mitigate inconsistencies, while additional triggers enforce schema-wide semantic rules. Together, these measures give a harmonized representation of sources that previously operated independently.
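Timestamp harmonization could be sketched as follows. The three input formats are assumptions made up for this example (an ISO-style EHR export, a day-first spreadsheet cell, and a compact device stamp); the source does not list the formats actually encountered, and the function name is likewise hypothetical.

```python
from datetime import datetime

# Assumed source formats, tried in order; extend as new sources are onboarded.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",  # ISO-style export
    "%d/%m/%Y %H:%M",     # day-first spreadsheet cell
    "%Y%m%d%H%M%S",       # compact device stamp
)


def harmonize_timestamp(raw: str) -> str:
    """Return the timestamp as an ISO 8601 string with second precision.

    Raises ValueError when no known format matches, so that unparseable
    values are surfaced instead of being silently stored.
    """
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat(timespec="seconds")
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {raw!r}")
```

Failing loudly on unknown formats is a deliberate choice in this sketch: a wrongly guessed day/month order would be harder to detect later than a rejected import row.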

3. SQL Implementation and Schema Constraints

Following standard ER-to-relational mapping protocols, including Crow’s Foot notation and the methods outlined by Atzeni et al., the conceptual schema is transformed into an operational PostgreSQL implementation.

DDL encoding: Every table, key, and relationship is rendered via SQL Data Definition Language, respecting foreign key constraints for referential soundness.
Domain constraints: Enforced using triggers; for example, verification that test results match declared types and uniqueness constraints on examinations per pregnancy.
Specialization: Handled via overlapping relations with shared attributes centralized, reducing complexity and redundancy.
Timestamps and calculations: Managed natively, e.g., for CTG Measurements or induction–delivery intervals.
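The trigger-based check that a test result matches its declared type might look like the sketch below. It is written against SQLite through Python's sqlite3 module so that it runs as-is; a PostgreSQL version would use a trigger function instead, and the table, column, and trigger names here are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE test_result (
    test_id      INTEGER PRIMARY KEY,
    result_type  TEXT NOT NULL
                 CHECK (result_type IN ('numeric', 'string', 'enumerated')),
    result_value BLOB NOT NULL   -- BLOB affinity: the value is stored as given
);
CREATE TRIGGER test_result_numeric_check
BEFORE INSERT ON test_result
FOR EACH ROW
WHEN NEW.result_type = 'numeric'
     AND typeof(NEW.result_value) NOT IN ('integer', 'real')
BEGIN
    SELECT RAISE(ABORT, 'numeric test result must be a number');
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def insert_result(conn: sqlite3.Connection, result_type: str, value) -> bool:
    """Try to store one result; return False if a constraint or trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO test_result (result_type, result_value) VALUES (?, ?)",
            (result_type, value),
        )
    except sqlite3.DatabaseError:
        return False
    return True
```

Only the numeric case is checked here; a fuller version would presumably also validate enumerated values against the allowed set for each test.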

A representative computed field is:

    AVG(EXTRACT(EPOCH FROM (d.expulsion_time - i.administration_time)))/3600 AS average_interval_hours

This expression computes the average time, in hours, between induction administration and expulsion across deliveries.
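As a worked example, the same calculation can be reproduced on two made-up deliveries. The sketch below uses SQLite, where `strftime('%s', ...)` plays the role of PostgreSQL's `EXTRACT(EPOCH FROM ...)`. The join on `delivery_id` and the sample timestamps are assumptions; only the `d` and `i` aliases and the two column names come from the expression above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE delivery  (delivery_id INTEGER PRIMARY KEY,
                        expulsion_time TEXT NOT NULL);
CREATE TABLE induction (delivery_id INTEGER NOT NULL REFERENCES delivery(delivery_id),
                        administration_time TEXT NOT NULL);
-- Two illustrative deliveries: 12 h and 8 h from induction to expulsion.
INSERT INTO delivery  VALUES (1, '2025-01-01 20:00:00'), (2, '2025-01-02 14:00:00');
INSERT INTO induction VALUES (1, '2025-01-01 08:00:00'), (2, '2025-01-02 06:00:00');
""")

# Seconds since the epoch are subtracted, converted to hours, then averaged.
QUERY = """
SELECT AVG((CAST(strftime('%s', d.expulsion_time) AS INTEGER)
          - CAST(strftime('%s', i.administration_time) AS INTEGER)) / 3600.0)
       AS average_interval_hours
FROM delivery AS d
JOIN induction AS i ON i.delivery_id = d.delivery_id
"""

average_interval_hours = conn.execute(QUERY).fetchone()[0]
print(average_interval_hours)  # (12 + 8) / 2 = 10.0
```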

4. Natural-Language-to-SQL (NL2SQL) Query Capabilities

To maximize usability for clinicians, the system implements a natural-language-to-SQL layer powered by a large language model (XiYanSQL-QwenCoder-32B-2504) together with an interactive web front-end (SQLChat).

Front-end: SQLChat enables input of natural-language queries in a GUI that embeds full schema context.
Back-end: XiYanSQL-QwenCoder-32B-2504 translates queries with competitive accuracy (e.g., 67.14% on the BIRD Dev benchmark).
Privacy and extensibility: The model supports self-hosting and fine-tuning, critical for clinical environments.

The layer can generate complex relational queries, including JOINs and EXISTS operations; preliminary testing reports successful resolution of approximately 7 out of 8 moderately complex queries.
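To make the kind of query concrete, the sketch below pairs a clinician-style question with SQL that combines JOIN and EXISTS. The SQL is hand-written for illustration and is not actual output of XiYanSQL-QwenCoder-32B-2504; the toy schema and its names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient   (tax_code TEXT PRIMARY KEY);
CREATE TABLE pregnancy (pregnancy_id INTEGER PRIMARY KEY,
                        patient_tax_code TEXT NOT NULL REFERENCES patient(tax_code));
CREATE TABLE delivery  (delivery_id INTEGER PRIMARY KEY,
                        pregnancy_id INTEGER NOT NULL REFERENCES pregnancy(pregnancy_id));
CREATE TABLE induction (induction_id INTEGER PRIMARY KEY,
                        delivery_id INTEGER NOT NULL REFERENCES delivery(delivery_id));
INSERT INTO patient   VALUES ('PATIENT-001'), ('PATIENT-002');
INSERT INTO pregnancy VALUES (1, 'PATIENT-001'), (2, 'PATIENT-002');
INSERT INTO delivery  VALUES (10, 1), (20, 2);
INSERT INTO induction VALUES (100, 10);   -- only the first delivery was induced
""")

QUESTION = "Which patients have had at least one induced delivery?"

# A plausible translation: the EXISTS subquery joins pregnancy, delivery and
# induction, and is correlated with the outer patient row.
CANDIDATE_SQL = """
SELECT p.tax_code
FROM patient AS p
WHERE EXISTS (
    SELECT 1
    FROM pregnancy AS g
    JOIN delivery  AS d ON d.pregnancy_id = g.pregnancy_id
    JOIN induction AS i ON i.delivery_id  = d.delivery_id
    WHERE g.patient_tax_code = p.tax_code
)
ORDER BY p.tax_code
"""

rows = [row[0] for row in conn.execute(CANDIDATE_SQL)]
print(QUESTION, "->", rows)
```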

5. Challenges in Data Consolidation and Solutions

The project addresses several key challenges:

Fragmentation and heterogeneity: Overcome by replacing disparate formats with a uniform schema, using triggers for semantic consistency, and enforcing synthetic IDs.
Consistency and redundancy: Managed through foreign key constraints, cycle enforcement among core entities, and uniqueness restrictions to prevent duplication (e.g., examination per pregnancy type limits).
Clinical usability: Addressed by the NL2SQL interface, obviating the need for direct SQL command writing.

These solutions correct error-prone processes endemic to non-centralized and spreadsheet-based workflows in clinical settings.
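A per-type limit on examinations can be illustrated with a partial unique index, shown below in SQLite through Python. This is a swapped-in technique for the sake of a compact example: the source describes these uniqueness rules as trigger-enforced. Which examination types are limited to one per pregnancy is also an assumption, as are all names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE examination (
    exam_id      INTEGER PRIMARY KEY,
    pregnancy_id INTEGER NOT NULL,
    exam_type    TEXT NOT NULL,
    exam_date    TEXT NOT NULL
);
-- At most one examination of each limited type per pregnancy;
-- other examination types may repeat freely.
CREATE UNIQUE INDEX one_exam_per_limited_type
    ON examination (pregnancy_id, exam_type)
    WHERE exam_type IN ('first_trimester', 'second_trimester');
""")


def add_exam(pregnancy_id: int, exam_type: str, exam_date: str) -> bool:
    """Insert an examination; return False if it would duplicate a limited type."""
    try:
        conn.execute(
            "INSERT INTO examination (pregnancy_id, exam_type, exam_date) "
            "VALUES (?, ?, ?)",
            (pregnancy_id, exam_type, exam_date),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

A declarative index suffices only when the rule depends on the row itself; rules that span several tables would still need triggers.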

6. Impact on Clinical Practice and Reproducible Research

The unified peripartum relational database provides significant technical and operational improvements:

Clinical care: Enables real-time, comprehensive access to peripartum records, supporting informed decisions during critical intrapartum events and reducing clinician cognitive load through immediate data availability.
Research and quality improvement: Facilitates standardized audits, trend analyses, and predictive model development, underpinning reproducibility and data harmonization across research studies.
Technical modularity: Accommodates future evolution of schema with minimal re-engineering, allowing for straightforward introduction of new data types and clinical parameters.

Preliminary clinical feedback highlights streamlined retrieval and enhanced analytical flexibility via natural-language querying.

7. Theoretical Foundations: Graph–Relational Integration

Underlying the database architecture is a growing body of research on integrating graph data models into relational database environments, most notably the Typed Graph Model (TGM) framework and metadata-driven schema control (Crowe et al., 2023). TGM permits enterprise-grade semantic enforcement by storing node and edge types as base tables and by augmenting SQL syntax with graph-specific constructs (CREATE Node {...}, MATCH Node {...}). A Typed Graph Schema (TGS) encapsulates node sets (NS), edge sets (E), a type system (T), and constraints (C), with metadata acting as a bridge between conceptual and physical schema definitions.

This suggests the possibility of evolving peripartum databases toward hybrid implementations, in which relational and graph-based queries coexist, further enhancing capabilities for modeling complex relationships (e.g., care transitions, historical event chains) inherent to obstetric workflows.
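Even without graph extensions, a chain of events can be followed in plain SQL by storing edges in a table and walking them with a recursive common table expression. The sketch below shows that baseline only; it does not use TGM syntax, and the node table, edge table, and event labels are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE care_event (event_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE followed_by (
    from_event INTEGER NOT NULL REFERENCES care_event(event_id),
    to_event   INTEGER NOT NULL REFERENCES care_event(event_id),
    PRIMARY KEY (from_event, to_event)
);
INSERT INTO care_event VALUES
    (1, 'first examination'), (2, 'induction'),
    (3, 'delivery'), (4, 'newborn assessment');
INSERT INTO followed_by VALUES (1, 2), (2, 3), (3, 4);
""")

# Walk the edges outward from a starting event. The sample data is acyclic;
# a cyclic edge set would need an extra guard to terminate.
CHAIN_SQL = """
WITH RECURSIVE chain(event_id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT f.to_event, c.depth + 1
    FROM chain AS c
    JOIN followed_by AS f ON f.from_event = c.event_id
)
SELECT e.label
FROM chain AS c
JOIN care_event AS e ON e.event_id = c.event_id
ORDER BY c.depth
"""


def event_chain(start_event_id: int) -> list:
    """Return the labels of the start event and everything that follows it."""
    return [row[0] for row in conn.execute(CHAIN_SQL, (start_event_id,))]
```

A hybrid graph-relational engine would aim to express the same traversal more directly with pattern-matching constructs, while keeping the underlying tables and constraints.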

In summary, the Unified Peripartum Relational Database synthesizes contemporary relational engineering, graph-modeling theory, and natural-language-to-SQL intelligence to resolve longstanding fragmentation in obstetric information systems. Through rigorous schema and constraint design, advanced integration methods, and clinician-centric interfaces, it offers a robust platform for both high-quality intrapartum care and reproducible biomedical research (Armenise et al., 18 Oct 2025; Crowe et al., 2023).