Grammar of FAIR: Modular Digital Architecture
- Grammar of FAIR is a semantics-first framework that decomposes data into modular, independently identifiable semantic units with persistent identifiers and rich metadata.
- It leverages biological and linguistic analogies to illustrate compositionality, enabling logical independence, cross-ecosystem interoperability, and granular scholarly citation.
- The approach operationalizes semantic units as FAIR Digital Objects, fostering AI-ready infrastructures, scalable assessments, and format-agnostic data integration.
The "Grammar of FAIR" is a semantics-first architecture proposing that data and knowledge can be systematically decomposed into modular, independently identifiable, and machine-actionable semantic units. This granular approach, inspired by analogies from biological organization and linguistic grammar, frames FAIRness as a property not just of whole datasets or systems, but also of the atomic and composite semantic units that make up scholarly communication and digital infrastructures. The resulting framework provides a foundation for cross-ecosystem interoperability and fine-grained citation in AI-ready scientific workflows, ultimately enabling a modular Internet of FAIR Data and Services (IFDS).
1. Semantic Units: Atomic and Compound Building Blocks
The core concept in the Grammar of FAIR is the semantic unit. Semantic units are the building blocks produced by semantic modularisation; they accommodate logical, semantic, and technical heterogeneity within a coherent, machine-actionable framework:
- Atomic Statement Units: The smallest indivisible semantic propositions (e.g., “Parasite X has a mass of 24.76 grams”), each of which is assigned a globally unique and persistent identifier (GUPRI). Each unit carries its own metadata profile, comprising provenance, logical framework, and schema information, so that it can be addressed, cited, and acted on by machines independently of any other unit.
- Compound Units: Higher-level aggregations that group multiple atomic statement units into contextually coherent collections, such as all statements about a specific entity, or a bibliographic metadata group. Compound units carry no additional semantic content; they serve as structured pointers (via GUPRIs) to their constituent statements, much as phrases and sentences in a language are composed of words.
This modularisation ensures logical independence and semantic coherence for each unit, and it supports complex information systems by encapsulating contextually meaningful knowledge at varying levels of granularity. Each semantic unit is represented by a meta-graph (GUPRI + metadata) linked to its content graph (the actual assertion or collection).
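The relationship between statement units, compound units, and their meta-graphs can be sketched as a small data model. This is a minimal illustration, not the framework's reference implementation: the class names, the `mint_gupri` helper, the `urn:uuid:` identifiers, and the "has host" statement are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from uuid import uuid4


def mint_gupri() -> str:
    """Hypothetical stand-in: a real GUPRI would be issued by an identifier service."""
    return f"urn:uuid:{uuid4()}"


@dataclass(frozen=True)
class Metadata:
    """Meta-graph content: provenance, logical framework, and schema."""
    provenance: str
    logical_framework: str
    schema: str


@dataclass(frozen=True)
class StatementUnit:
    """Atomic unit: one proposition (content graph) plus its meta-graph."""
    subject: str
    predicate: str
    obj: str
    metadata: Metadata
    gupri: str = field(default_factory=mint_gupri)


@dataclass(frozen=True)
class CompoundUnit:
    """Compound unit: asserts nothing itself, only points to members by GUPRI."""
    label: str
    members: Tuple[str, ...]
    metadata: Metadata
    gupri: str = field(default_factory=mint_gupri)


def resolve(compound: CompoundUnit, registry: Dict[str, StatementUnit]) -> List[StatementUnit]:
    """Follow the compound unit's pointers to the statements they identify."""
    return [registry[gupri] for gupri in compound.members]


# Illustrative values only; the provenance and schema names are made up.
meta = Metadata(provenance="example lab", logical_framework="OWL", schema="example-schema-v1")
mass = StatementUnit("Parasite X", "has mass in grams", "24.76", meta)
host = StatementUnit("Parasite X", "has host", "Host Y", meta)
registry = {unit.gupri: unit for unit in (mass, host)}
about_x = CompoundUnit("Statements about Parasite X", (mass.gupri, host.gupri), meta)
statements = resolve(about_x, registry)
```

Because the compound unit stores only identifiers, each statement stays independently addressable and can belong to any number of compound units.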
2. Biological and Linguistic Foundations
The architecture draws on two complementary metaphors to justify and elucidate its design:
- Biological Analogy: Just as biological hierarchies progress from atoms to molecules, cells, tissues, organs, and organisms—each demarcated by natural (bona fide) boundaries such as membranes—the Grammar of FAIR posits that information systems should be organized into semantic units delimited by boundaries of coherence (semantic “membranes”). These boundaries are defined by metadata, logical frameworks, and schemas, mirroring the encapsulating layers found in biological systems. Granular decomposition in this sense provides semantic “cells” that can be recombined, specialized, or cited in larger informational “organisms.”
- Linguistic Analogy: Human languages achieve unbounded expressive power by combining a finite lexicon according to compositional grammatical rules (syntax and morphology). Similarly, the Grammar of FAIR treats semantic units (words/statements) as parts that can be recombined according to explicit linking rules, schema metamodels, and display patterns. The act of “coining” terms or reusing handled semantic entities allows cross-ecosystem linkage and persistent reference for both machines and humans, enabling robust concept formation, citation, and adaptation.
Both analogies emphasize modularity, compositionality, and the preservation of functional unity across scales, bridging human cognitive intuitions with formal, machine-actionable semantics.
3. Mapping to FAIR Digital Objects (FDOs)
The Grammar of FAIR operationalizes its semantic units through their instantiation as FAIR Digital Objects (FDOs):
- Statement FDOs: Each atomic statement unit is materialized as a Statement FDO, equipped with a GUPRI, full provenance, schema, and logic specification.
- Nested/Compound FDOs: Compound units are realized as nested FDOs that aggregate pointers (GUPRIs) to atomic units, maintaining the internal structure through standardized schemas.
- Serialization & Implementation: Different technical ecosystems serialize these FDOs differently:
  - In RDF/OWL, atomic statement FDOs correspond to Nanopublications.
  - In tabular/document-based systems, RO-Crates encapsulate a statement or compound unit with associated metadata and schema.
- Semantic Transitivity: The semantics of each FDO is anchored by explicit mapping to a shared natural language token model and/or a Rosetta Statement metamodel. This enables semantic content to be represented format-agnostically and ported across schemas (e.g., SHACL in RDF, SQL DDL), ensuring no loss of meaning or context—a property termed semantic transitivity.
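The idea of carrying one statement across ecosystems without losing content can be illustrated with a round trip. The sketch below is an assumption-laden toy, not the Rosetta Statement metamodel: the `Statement` tuple, the `example.org` IRIs, the `urn:example:stmt-1` identifier, and the table layout are hypothetical, and the N-Triples-style line handling ignores literal escaping and datatypes.

```python
import sqlite3
from typing import NamedTuple


class Statement(NamedTuple):
    """Format-neutral form of one statement (hypothetical shared model)."""
    gupri: str
    subject: str
    predicate: str
    value: str


def to_ntriple(s: Statement) -> str:
    """Serialize as one N-Triples-style line with a plain literal object."""
    return f'<{s.subject}> <{s.predicate}> "{s.value}" .'


def from_ntriple(gupri: str, line: str) -> Statement:
    """Parse the simple line produced by to_ntriple (no escaping support)."""
    subject, predicate, rest = line.split(" ", 2)
    value = rest.rsplit(" .", 1)[0].strip('"')
    return Statement(gupri, subject.strip("<>"), predicate.strip("<>"), value)


def to_sql(conn: sqlite3.Connection, s: Statement) -> None:
    """Store the statement as one row in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statement "
        "(gupri TEXT PRIMARY KEY, subject TEXT, predicate TEXT, value TEXT)"
    )
    conn.execute("INSERT INTO statement VALUES (?, ?, ?, ?)", tuple(s))


def from_sql(conn: sqlite3.Connection, gupri: str) -> Statement:
    """Read the row back into the shared form."""
    row = conn.execute(
        "SELECT gupri, subject, predicate, value FROM statement WHERE gupri = ?",
        (gupri,),
    ).fetchone()
    return Statement(*row)


original = Statement(
    "urn:example:stmt-1",
    "https://example.org/parasite-x",
    "https://example.org/has-mass-grams",
    "24.76",
)
conn = sqlite3.connect(":memory:")
to_sql(conn, original)
via_sql = from_sql(conn, original.gupri)
via_rdf = from_ntriple(original.gupri, to_ntriple(original))
```

If both round trips return a value equal to `original`, the two serializations carry the same content for this statement. A realistic check would also have to compare metadata, datatypes, and schema constraints.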
A central mathematical idea is that overall FAIRness is compositional: the FAIRness of a whole is a function of the FAIRness of its constituent semantic units and of a further term that captures the richness of its internal structure.
4. Granularity, Interoperability, and Modular Assessments
The use of semantic units and FDOs enables granular assessments and modular management of FAIRness:
- Multi-level Assessment: FAIRness can be scored at the level of atomic statement units, compound units, or whole datasets, enabling detailed diagnostics and targeted improvements.
- Cross-ecosystem Operability: Because semantic content is explicitly typed, schema-aligned, and identified by persistent metadata, information can be referenced, queried, and cited across heterogeneous infrastructures—spanning RDF graphs, SQL databases, and document repositories—without semantic ambiguity.
- Citation Granularity: Instead of referencing entire papers or datasets, scholarly communication can cite specific claims, facts, or hypotheses, each with its own FDO and GUPRI.
This modularization supports interoperable, reusable, and citation-rich digital scholarship and facilitates federated knowledge networks.
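Multi-level assessment can be sketched as a score computed per statement unit and then aggregated per compound unit. The checklist and the plain mean below are assumptions chosen for illustration; they are not the source's scoring method, and they ignore any contribution from a unit's internal structure.

```python
from typing import Dict, List

# Hypothetical checklist: fields a statement unit is expected to carry.
REQUIRED_FIELDS = ("gupri", "provenance", "schema", "logical_framework")


def score_statement(unit: Dict[str, str]) -> float:
    """Fraction of the required fields that are present and non-empty."""
    present = sum(1 for name in REQUIRED_FIELDS if unit.get(name))
    return present / len(REQUIRED_FIELDS)


def score_compound(units: List[Dict[str, str]]) -> float:
    """Unweighted mean of member scores; an empty compound scores 0.0."""
    if not units:
        return 0.0
    return sum(score_statement(u) for u in units) / len(units)


def weakest(units: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the lowest-scoring member, i.e. the first target for improvement."""
    return min(units, key=score_statement)


complete = {
    "gupri": "urn:example:stmt-1",
    "provenance": "example lab",
    "schema": "example-schema-v1",
    "logical_framework": "OWL",
}
partial = {"gupri": "urn:example:stmt-2", "schema": "example-schema-v1"}

unit_scores = [score_statement(complete), score_statement(partial)]
compound_score = score_compound([complete, partial])
to_fix = weakest([complete, partial])
```

Scoring at the unit level is what makes the diagnostics targeted: the compound score shows that something is missing, and `weakest` points to the specific statement that needs more metadata.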
5. Implications for AI-ready Infrastructure and IFDS
The Grammar of FAIR lays the groundwork for AI-ready research infrastructures and the Internet of FAIR Data and Services (IFDS):
- Service Ecosystem: The approach anticipates a co-evolution of terminological, schema, operations, and workflow services—each managing the lifecycle, transformation, validation, and enrichment of semantic units—enabling scalable, cross-domain data integration and discovery.
- Scholarly Communication: Semantic-level citation, modularity, and persistent identifiers refine the granularity of attribution and reuse. This supports trustworthy, transparent AI applications that rely on provenance-rich, machine-interpretable scholarly artifacts.
- Format Agnosticism and Evolution: By abstracting from specific technical formats and grounding semantics at the unit level, the Grammar of FAIR enables smooth technology transitions and crosswalks as ecosystems evolve.
- FAIRness as an Emergent Property: The framework recognizes that overall FAIRness of a system is an emergent function of the FAIRness, structure, and interoperability of its constituent semantic units.
6. Future Directions and Broader Impact
The Grammar of FAIR reframes the FAIR principles not as static, top-down requirements, but as properties that can be manifested, measured, and evolved at arbitrary levels of semantic granularity:
- Modular and Scalable Infrastructures: By championing semantic modularisation, the approach enables the construction of modular, cross-ecosystem infrastructures capable of supporting large-scale, evolving scientific collaborations.
- AI Alignment and Machine Actionability: The explicit, machine-actionable structures created by this grammar enhance the prospects for trustworthy, explainable AI and automated data integration.
- Extensibility: The blueprint supports the evolution of new technical frameworks, ontologies, and identifier schemes as the research landscape and AI technologies progress.
- Scholarly Citation and Attribution: The paradigm makes possible a future where granular scholarly contributions—specific assertions or findings—are directly attributable, recombinable, and citable, enabling more precise scholarly communication and credit allocation.
In sum, the Grammar of FAIR offers a semantics-first, granular, and modular architecture that underpins the next generation of interoperable, AI-ready, and citation-granular digital research infrastructures. By decomposing knowledge into independently manageable semantic units, anchoring them in biological and linguistic analogies, and operationalizing them through FAIR Digital Objects, this approach advances the theoretical and technical foundation for an Internet of FAIR Data and Services as required by the evolving demands of scientific discovery.