Universal Data Format (.serva)
- Universal Data Format (.serva) is a self-describing, machine-readable container that standardizes metadata, units, and baselines across multiple disciplines.
- It unifies heterogeneous data storage and transmission with an HDF5/JSON structure that enables direct compute operations via embedded libraries and converters.
- Recent innovations such as lossless holographic compression and homomorphic compute integration significantly reduce compute and storage costs.
Universal Data Format (.serva) is a self-describing, machine-readable, and extensible container format engineered to unify the storage, transmission, and direct computation of scientific, engineering, and AI data across heterogeneous environments and disciplines. It is characterized by strict semantic conventions for metadata, hierarchically organized groups and datasets, standardized units and baselines, and—in its most recent incarnation—lossless, holographic compression enabling compute operations directly in compressed space. The format has been adopted in domains such as computational materials science (Ghiringhelli et al., 2016), wireless propagation measurement pooling (Shakya et al., 30 Sep 2025), and universal AI infrastructure (Clair et al., 14 Jan 2026), delivering interoperability, rapid data pooling, and dramatic reductions in compute and storage costs.
1. Fundamental Container Architecture and Metadata Hierarchy
.serva files employ an HDF5-based or JSON-based self-describing structure, mirroring the conceptual organization of metadata and results in computational sciences. In materials science, the hierarchy follows the NOMAD Meta Info ontology (Ghiringhelli et al., 2016):
- Top-level group: /section_run/
- Attributes: run_id, code_name, code_version, timestamp
- Nested groups: /section_method/, /section_system/, /section_single_configuration_calculation/
  - /section_method/: model parameters (e.g., xc_method, pseudopotential, convergence criteria)
  - /section_system/: atomic configuration (number_of_atoms, atom_species[i], lattice_vectors)
  - /section_single_configuration_calculation/: computed results (energy_total, forces[i], densities), with further nesting for iterative schemes (section_scf_iteration)
- Metadata is partitioned into two types:
- Section-type: logical blocks that nest/reference each other (e.g., system, method)
- Concrete-type: scalars, strings, arrays with units
Naming conventions enforce lowercase with underscores (e.g., energy_total, k_mesh). Each concrete metadata entry carries a unique name, a human-readable description, and a strict units attribute (always SI: energy in J, length in m, etc.).
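As an illustrative sketch (hypothetical helper names, not the NOMAD or .serva API), the JSON variant of this hierarchy and its lowercase-with-underscores naming rule can be expressed as:

```python
import json
import re

# Hypothetical helpers illustrating the conventions above; the NOMAD Meta Info
# schema remains the authoritative definition of the hierarchy.
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # lowercase-with-underscores rule

def concrete(value, units, description):
    """A concrete-type entry: value plus strict SI units and a description."""
    return {"value": value, "units": units, "description": description}

def check_names(section, path="/"):
    """Recursively enforce the naming convention on every section/field name."""
    for key, child in section.items():
        if NAME_RE.match(key) is None:
            raise ValueError(f"non-conforming metadata name at {path}{key}")
        if isinstance(child, dict) and "value" not in child:  # section-type
            check_names(child, f"{path}{key}/")

run = {
    "section_run": {
        "section_system": {
            "number_of_atoms": concrete(2, "dimensionless", "atom count"),
        },
        "section_single_configuration_calculation": {
            "energy_total": concrete(-1.278e-17, "J", "total energy in SI"),
        },
    }
}
check_names(run)  # passes: all names are lowercase_with_underscores
print(json.dumps(run, indent=2))
```

Section-type entries are plain nested dicts; concrete-type entries are leaves that always bundle a value with its units.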
2. Unit Standardization, Baseline Conventions, and Data Fusion
.serva mandates SI units for every concrete value and enforces code-independent conventions for reference points ("zero baselines"). Conversion from native code units occurs during data ingestion using code-specific mapping tables (Ghiringhelli et al., 2016). Energies are stored as relative values, i.e., as differences from a declared reference energy, E_stored = E_total − E_reference.
Reference schemes include:
- Free atoms (spin-unpolarized, non-relativistic) for each pseudopotential/xc_method combination
- Bulk crystals from canonical lists (e.g., Lejaeghere et al.)
Band-structure and density-of-states data set the Fermi level to zero by convention.
In wireless propagation pooling, standardized fields include geometric (distance_m), spectral (frequency_GHz), and statistical (path_loss_dB, RMS spreads) features. Campaigns are merged by reprojecting coordinates to a unified frame, harmonizing metadata, and validating key fields against 3D environment maps (Shakya et al., 30 Sep 2025).
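Ingestion-time unit conversion via a code-specific mapping table can be sketched as follows (the factors are CODATA conversion values; the names are illustrative, not the official converter API):

```python
# Illustrative ingestion-time conversion table: native code units -> strict SI.
TO_SI = {
    "eV":       ("J", 1.602176634e-19),
    "Ry":       ("J", 2.1798723611030e-18),
    "Hartree":  ("J", 4.3597447222071e-18),
    "angstrom": ("m", 1e-10),
    "bohr":     ("m", 5.29177210903e-11),
}

def to_si(value, native_unit):
    """Convert one native value to SI during data ingestion."""
    si_unit, factor = TO_SI[native_unit]
    return value * factor, si_unit

# e.g. a Quantum ESPRESSO energy in Rydberg becomes joules on write:
energy_j, unit = to_si(-93.45, "Ry")
```

Every code contributes its own table once; downstream consumers then never see native units.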
3. Interoperability Strategies: Converters and Embedded Libraries
.serva supports two complementary strategies for ecosystem integration (Ghiringhelli et al., 2016):
- Converter layer: External parsers read code-specific output/input and apply mapping functions (unit/sign/indexing translation), generating .serva/HDF5 or JSON output.
- Embedded library: APIs in C/C++, Fortran, or Python enable codes to natively emit compliant .serva/HDF5 groups. Example:
escdf_insert_scalar(group, "energy_total", value, "J").
Both produce identical hierarchical containers, validated against a shared NOMAD Meta Info schema. Embedded outputs and converted files interoperate with downstream tools, ensuring seamless searchability, comparability, and aggregation.
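A minimal converter-layer sketch, assuming a Quantum ESPRESSO-style output line (the parsing logic and function names are illustrative and far simpler than real NOMAD parsers):

```python
import json
import re

# Hypothetical one-field converter: read a code-specific output line,
# translate units, and emit a schema-shaped JSON group.
RY_TO_J = 2.1798723611030e-18  # CODATA Rydberg -> joule

def parse_total_energy(line):
    """Extract the total energy (in J) from a QE-style output line, or None."""
    m = re.search(r"total energy\s*=\s*(-?\d+\.\d+)\s*Ry", line)
    if m is None:
        return None
    return float(m.group(1)) * RY_TO_J  # unit translation at ingestion

def convert(line):
    """Wrap the parsed value in the .serva-style metadata hierarchy."""
    return {"section_run": {"section_single_configuration_calculation": {
        "energy_total": {"value": parse_total_energy(line), "units": "J"}}}}

print(json.dumps(convert("!    total energy              =     -93.45 Ry")))
```

An embedded-library call like the escdf_insert_scalar example above would produce the same group natively, without the parsing step.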
4. Compression, Holographic Encoding, and Direct Compute Integration
Recent deployments of .serva introduce an invertible bit-vector hologram representation, based on laser-holography principles and lossless quantization (Clair et al., 14 Jan 2026):
- Header: Magic bytes, version, encoder parameters (dimension D, quantization step Δ, bit-depth B, sampling rate Fₛ, pseudo-random seed S)
- Payload: N blocks of phase-encoded, quantized coefficients, permuted/encrypted with reversible schemes
- Footer: Optional checksum
Mathematical foundation:
- Shannon’s sampling-quantization: choosing the quantization step Δ no larger than half the minimum feature spacing, at sampling rate Fₛ, makes the discretization lossless, guaranteeing invertibility
- Holographic encoding: a reference wave (seeded by S) is interfered with the signal; the interference pattern is spectrally encoded with pseudo-random phase jitter, followed by quantization and reversible permutation/XOR.
Compression ratios empirically reach 4×–34× (e.g., Fashion-MNIST 54.88 MB → 1.59 MB; Canterbury Corpus 17.66 MiB → 4.24 MiB). Bitrates in 1.7–1.9 bpb range are competitive with traditional lossless formats.
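The invertibility condition on the quantization step can be checked with a small round-trip sketch (illustrative only, not the .serva codec):

```python
def quantize(xs, delta):
    """Map samples to integer code indices with step delta."""
    return [round(x / delta) for x in xs]

def dequantize(codes, delta):
    """Invert quantize up to an error below delta / 2."""
    return [c * delta for c in codes]

samples = [0.0, 0.25, 0.5, 1.0]   # minimum feature spacing: 0.25
delta = 0.125                     # at most half the minimum spacing
codes = quantize(samples, delta)
restored = dequantize(codes, delta)

# Distinct samples map to distinct codes, so the encoding is invertible:
assert len(set(codes)) == len(samples)
assert all(abs(a - b) < delta / 2 for a, b in zip(samples, restored))
```

With Δ above half the minimum spacing, distinct samples would collide onto one code and invertibility would be lost.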
The Chimera compute engine enables direct model execution on compressed .serva payloads via homomorphic transforms:
- Convolutional layers: convolutions become elementwise products in the holographic (spectral) domain, per the convolution theorem
- Fully-connected layers: weight matrices are mapped once into H-space and applied directly to the encoded activations
- RNN cells: linear steps represented in H-space, nonlinearities approximated by polynomial/lookup
No decompression is required. Original model checkpoints and weight tensors are mapped into H-space via exact, invertible transformations—the wrapper orchestrates the process. Performance on RNN, CNN, and MLP architectures maintains baseline accuracy (±0.2 pp deviation).
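The convolutional case rests on the convolution theorem: circular convolution in the signal domain equals elementwise multiplication in the spectral domain, so a transform-space engine can apply a filter without decoding the signal. A stdlib-only sketch of this identity (not the Chimera implementation):

```python
import cmath

def dft(xs):
    """Discrete Fourier transform (naive O(n^2) form, for illustration)."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(xs)) for k in range(n)]

def idft(Xs):
    """Inverse DFT, returning real parts of the reconstructed samples."""
    n = len(Xs)
    return [sum(X * cmath.exp(2j * cmath.pi * k * t / n)
                for k, X in enumerate(Xs)).real / n for t in range(n)]

def circ_conv(xs, hs):
    """Direct circular convolution in the signal domain."""
    n = len(xs)
    return [sum(xs[t] * hs[(i - t) % n] for t in range(n)) for i in range(n)]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]
direct = circ_conv(signal, kernel)
# "Compute in transform space": multiply spectra pointwise, then invert.
spectral = idft([a * b for a, b in zip(dft(signal), dft(kernel))])
assert all(abs(d - s) < 1e-9 for d, s in zip(direct, spectral))
```

The holographic scheme adds invertible quantization, permutation, and phase encoding on top of this principle, but the compute-without-decompression property comes from the same homomorphism.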
5. Domain-Specific Implementations and Example Datasets
Computational Materials Science (Ghiringhelli et al., 2016)
.serva enables agglomeration and direct comparison of electronic-structure calculations by standardizing metadata hierarchies, unit conventions, and baselines. HDF5/JSON schema enforce reproducibility. Example materials calculation:
```json
{
  "section_run": {
    "run_id": "...",
    "code_name": "Quantum ESPRESSO",
    "code_version": "6.1",
    "section_method": {...},
    "section_system": {...},
    "section_single_configuration_calculation": {
      "energy_total": {"value": -1.278e-17, "units": "J"},
      ...
    }
  }
}
```
Wireless Propagation Pooling (Shakya et al., 30 Sep 2025)
A .serva point-data file contains a header—with environmental map and metadata references—and an array of measurement records. Every record is keyed to campaign metadata, environmental descriptors, and standard feature fields.
| campaign_id | tx_id | rx_id | env_condition | frequency_GHz | distance_m | path_loss_dB | ds_mean_dir_ns | asa_lobe_deg |
|---|---|---|---|---|---|---|---|---|
| NYU_142UMi | TX1 | RX1 | LOS | 142.0 | 24.43 | 102.6 | 50.8 | 2.3 |
| USC_145UMi | TX1 | RX7 | NLOS | 145.5 | 83.0 | 130.0 | 117.6 | 121.1 |
Integration of environmental maps (GeoJSON, CAD) and campaign metadata enables pooling of disparate campaigns, harmonization by re-projection, and validation against obstruction tests.
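Pooling with explicit nulls for fields a campaign did not measure can be sketched as follows (field names follow the table above; the merge logic is illustrative, not the published pooling pipeline):

```python
# Harmonize heterogeneous campaign records onto one standard field set;
# unmeasured fields are explicitly marked null (None) rather than omitted.
STANDARD_FIELDS = ["campaign_id", "tx_id", "rx_id", "env_condition",
                   "frequency_GHz", "distance_m", "path_loss_dB",
                   "ds_mean_dir_ns", "asa_lobe_deg"]

def harmonize(record):
    """Project a raw campaign record onto the standard field set."""
    return {f: record.get(f) for f in STANDARD_FIELDS}

def pool(*campaigns):
    """Concatenate harmonized records from any number of campaigns."""
    return [harmonize(r) for recs in campaigns for r in recs]

nyu = [{"campaign_id": "NYU_142UMi", "tx_id": "TX1", "rx_id": "RX1",
        "env_condition": "LOS", "frequency_GHz": 142.0, "distance_m": 24.43,
        "path_loss_dB": 102.6}]
usc = [{"campaign_id": "USC_145UMi", "tx_id": "TX1", "rx_id": "RX7",
        "env_condition": "NLOS", "frequency_GHz": 145.5, "distance_m": 83.0,
        "path_loss_dB": 130.0, "ds_mean_dir_ns": 117.6}]
merged = pool(nyu, usc)
assert merged[0]["asa_lobe_deg"] is None  # unmeasured field -> explicit null
```

Coordinate re-projection and obstruction validation against the environmental maps would run before this merge step.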
6. Practical Impact and AI Infrastructure Significance
.serva removes barriers to data interoperability, preprocessing, and compute efficiency. In AI applications (Clair et al., 14 Jan 2026):
- Energy reduction: 96–99% less energy per training/inference run.
- Compute payload reduction: Up to 68× less data processed per iteration.
- Time/capex savings: Hyperscalers save $4.85M per petabyte per training cycle; training times fall by 35×–723× across architectures.
- Pipeline simplification: Six legacy steps (validation, conversion, cleaning, feature engineering, augmentation, loading) reduced to one encoding call.
.serva is format-agnostic, model-agnostic, and hardware-agnostic: any new model can consume legacy .serva datasets, and any hardware can execute Chimera-wrapped code directly. This enables reproducible, future-proof extensibility and transitions the bottleneck in AI development from infrastructure constraints to conceptual innovation.
7. Extensibility, Community Governance, and Future Prospects
.serva’s metadata schema is modular and user-expandable (e.g., via git-maintained NOMAD Meta Info (Ghiringhelli et al., 2016)). New scientific properties or application-specific features (from advanced MBPT data to custom wireless metrics) are slotted into the hierarchy without breaking compatibility. In wireless propagation, additional vendor/application fields are supported; unused fields are explicitly marked null.
Interoperability and extensibility are ensured by programmatic validation against published schemas, and domain-specific fields are harmonized at ingestion. Adoption across scientific computing, metrology, and AI is enabled by both community conversion tools and embedded library support.
A plausible implication is that, given continued adoption, .serva will serve as both a lingua franca for structured data across domains and as an efficient substrate for future compute paradigms, where both data and algorithms exist natively in compressive, homomorphic spaces.