RSFM Database (RS-FMD) Overview
- RSFM Database (RS-FMD) is an acronym shared by three distinct, domain-specific resources for remote sensing, respiratory diagnostics, and fluid mechanics benchmarking.
- Depending on the instance, they use standardized schemas, data formats such as JSONL and HDF5, and retrieval tools that support automated model selection and data-driven evaluation.
- Together they support reproducible research in machine learning, biomedical diagnostics, and reduced-order fluid dynamics modeling.
The RSFM Database (RS-FMD) refers to three distinct, well-documented research databases: (1) the Remote Sensing Foundation Model Database for automated model selection in remote sensing; (2) the Respiratory Sound and Functional Measurements Database for multimodal respiratory diagnostics; and (3) the Reduced-Complexity Modeling of Fluid Flows Database for benchmarking data-driven and physics-based modeling in fluid mechanics. Each instance is domain-specific, offering rigorously structured resources for machine learning, applied physics, or biomedical research. The following exposition delineates the technical architecture, content, and research applications of these databases within their respective domains.
1. Remote Sensing Foundation Model Database (RS-FMD)
The Remote Sensing Foundation Model Database (RS-FMD) provides a schema-guided registry of over 150 public remote sensing foundation models (RSFMs), systematically capturing multi-modal, multi-resolution, and multi-task metadata to facilitate automated, reproducible model selection (Chen et al., 21 Nov 2025). The motivation arises from the heterogeneity and disparate documentation of recent RSFMs, which impedes comparative evaluation and operational deployment.
Supported Task Types:
RS-FMD spans a broad range of remote sensing tasks, including:
- Single-label, multi-label, and few-shot classification
- Semantic segmentation (e.g., land cover, surface water, cropland)
- Change detection (binary, semantic, multi-temporal)
- Object detection (buildings, vehicles)
- Image captioning and visual question answering (VQA)
- Multimodal vision–language tasks
Database Schema and Structure:
The RS-FMD schema is implemented as a collection of JSON records in which each foundation model entry comprises a main model table and two structured sub-tables ("PretrainingPhase" and "Benchmark"). The model table tracks identifiers (model_id, model_name, version), backbone architecture (e.g., ViT-Large), parameter count, domain knowledge embeddings, supported sensors (e.g., Sentinel-2, Landsat-8), modality integration, alignment (spectral/temporal), and resolution metadata. The nested "PretrainingPhase" and "Benchmark" tables enumerate datasets, temporal/geographic coverage, benchmarks, metrics (accuracy, mIoU, mAP, F1), and deployment constraints. One-to-many relations between a model and its pretraining and benchmarking phases, combined with controlled vocabularies and enumerations (e.g., for modality and alignment), enforce normalization and facilitate automated parsing.
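To make the one-to-many structure concrete, the following minimal sketch parses one JSONL line into a model record with nested pretraining and benchmark entries and rejects modality values outside a controlled vocabulary. It uses standard-library dataclasses in place of pydantic so that it runs without extra dependencies; the enumeration values and field names such as `geographic_coverage` and `score` are hypothetical, and the real RS-FMD schema may differ.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Modality(str, Enum):
    # Hypothetical controlled vocabulary for the modality field
    OPTICAL = "optical"
    MULTISPECTRAL = "multispectral"
    HYPERSPECTRAL = "hyperspectral"
    SAR = "sar"
    LIDAR = "lidar"
    IMAGE_TEXT = "image-text"


@dataclass
class PretrainingPhase:
    dataset: str
    geographic_coverage: str = "unknown"


@dataclass
class Benchmark:
    task: str
    dataset: str
    metric: str
    score: float


@dataclass
class ModelRecord:
    model_id: str
    model_name: str
    backbone: str
    modalities: List[Modality]
    sensors: List[str] = field(default_factory=list)
    # One-to-many relations: a model may have several phases and benchmarks
    pretraining: List[PretrainingPhase] = field(default_factory=list)
    benchmarks: List[Benchmark] = field(default_factory=list)

    @classmethod
    def from_json(cls, line: str) -> "ModelRecord":
        raw = json.loads(line)
        return cls(
            model_id=raw["model_id"],
            model_name=raw["model_name"],
            backbone=raw["backbone"],
            # Modality(...) raises ValueError for values outside the vocabulary
            modalities=[Modality(m) for m in raw["modalities"]],
            sensors=raw.get("sensors", []),
            pretraining=[PretrainingPhase(**p) for p in raw.get("PretrainingPhase", [])],
            benchmarks=[Benchmark(**b) for b in raw.get("Benchmark", [])],
        )


# One JSONL line, using the CROMA benchmark figure quoted in this article
line = json.dumps({
    "model_id": "CROMA", "model_name": "CROMA", "backbone": "ViT-Base",
    "modalities": ["sar", "multispectral"],
    "Benchmark": [{"task": "semantic segmentation", "dataset": "Sen1Floods11",
                   "metric": "mIoU", "score": 85.2}],
})
record = ModelRecord.from_json(line)
```

Keeping the sub-tables as lists of typed objects mirrors the one-to-many relations described above, so a model with several pretraining phases or benchmarks needs no schema change.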
Modality and Learning Paradigm Coverage:
RS-FMD catalogues models across sensor modalities: RGB/optical, multispectral, hyperspectral, SAR, LiDAR, and image–text. Models are annotated with spatial resolution (high: <5 m; medium: 5–30 m; low: ≥30 m), learning paradigm (supervised, self-supervised, multimodal pretraining, parameter-efficient fine-tuning (PEFT)), backbone details, and pretext task (e.g., masked autoencoding, contrastive learning). These annotations cover both homogeneous and heterogeneous modality integration.
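The resolution bins have half-open boundaries: a 5 m product is "medium" and a 30 m product is "low". The small helper below encodes that reading, assuming the binned quantity is the ground sampling distance per pixel in metres; the function name is hypothetical.

```python
def resolution_class(gsd_m: float) -> str:
    """Map ground sampling distance (metres) to the high/medium/low bins."""
    if gsd_m <= 0:
        raise ValueError("ground sampling distance must be positive")
    if gsd_m < 5:
        return "high"    # < 5 m
    if gsd_m < 30:
        return "medium"  # 5 m up to, but not including, 30 m
    return "low"         # >= 30 m
```

For example, Sentinel-2's 10 m bands fall in the medium bin, while Landsat-8's 30 m multispectral bands fall in the low bin.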
Query and Retrieval Interface:
Records are versioned as JSONL and validated with pydantic. Programmatic access is provided through a retrieval tool that embeds the query constraints and the foundation-model metadata with Sentence-BERT and indexes them with FAISS using cosine similarity. Candidates are then filtered by hard constraints (sensor, modality, min_performance) and reranked with LLM-based in-context learning. Example usage is shown below:
```python
from rsfmd import RSFMDClient

# Load the versioned JSONL registry
client = RSFMDClient(db_path="rsfmd.jsonl")

# Application requirements and hard constraints
query = {
    "application": "land cover classification",
    "modality": "multispectral",
    "sensor": ["Sentinel-2"],
    "min_performance": {"metric": ["accuracy"], "value": [85]},
}

cands = client.retrieve(query, top_k=50)    # dense retrieval (Sentence-BERT + FAISS)
filt = client.filter(cands, query)          # apply hard constraints
ranked = client.rank(filt, query, top_k=5)  # LLM-based reranking
```
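The following self-contained sketch illustrates the retrieve-then-filter logic without the `rsfmd` package, under simplifying assumptions: a unit-length bag-of-words vector stands in for Sentence-BERT embeddings, an exhaustive dot product over unit vectors stands in for a FAISS cosine-similarity index, the LLM reranking step is omitted, and all record fields and model names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Stand-in for Sentence-BERT: unit-length bag-of-words term frequencies."""
    counts = Counter(re.findall(r"[a-z0-9-]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors have unit length, so the dot product is the cosine similarity
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(models: List[dict], query: dict, top_k: int) -> List[dict]:
    """Dense-retrieval stand-in: score every record and keep the top_k."""
    q = embed(f"{query['application']} {query['modality']} {' '.join(query['sensor'])}")
    scored = [(cosine(q, embed(m["description"])), m) for m in models]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]


def passes_constraints(model: dict, query: dict) -> bool:
    """Hard constraints: modality, sensor overlap, and minimum benchmark scores."""
    if query["modality"] not in model["modalities"]:
        return False
    if not set(query["sensor"]) & set(model["sensors"]):
        return False
    perf = query["min_performance"]
    for metric, threshold in zip(perf["metric"], perf["value"]):
        scores = [b["score"] for b in model["benchmarks"] if b["metric"] == metric]
        if not scores or max(scores) < threshold:
            return False
    return True


# Hypothetical registry entries
models = [
    {"model_id": "model-a", "modalities": ["multispectral"], "sensors": ["Sentinel-2"],
     "description": "multispectral land cover classification Sentinel-2",
     "benchmarks": [{"metric": "accuracy", "score": 91.0}]},
    {"model_id": "model-b", "modalities": ["sar"], "sensors": ["Sentinel-1"],
     "description": "sar flood segmentation Sentinel-1",
     "benchmarks": [{"metric": "mIoU", "score": 80.0}]},
    {"model_id": "model-c", "modalities": ["multispectral"], "sensors": ["Sentinel-2"],
     "description": "multispectral land cover classification",
     "benchmarks": [{"metric": "accuracy", "score": 80.0}]},
]
query = {"application": "land cover classification", "modality": "multispectral",
         "sensor": ["Sentinel-2"],
         "min_performance": {"metric": ["accuracy"], "value": [85]}}

cands = retrieve(models, query, top_k=2)
filt = [m for m in cands if passes_constraints(m, query)]
```

Here `model-c` is semantically close to the query and survives retrieval, but the hard constraint on accuracy removes it. Separating a recall-oriented similarity search from exact constraint checks keeps soft relevance and non-negotiable requirements apart; a production index would also precompute the model embeddings instead of re-embedding them per query.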
Illustrative Model Records:
| model_id | backbone | modalities | benchmark (task/dataset/metric) |
|---|---|---|---|
| A2-MAE | ViT-Large | Multispectral, Multi-temporal | Land cover classification / EuroSAT / accuracy=99.09 |
| CROMA | ViT-Base | SAR, Multispectral | Semantic segmentation / Sen1Floods11 / mIoU=85.2 |
Model Selection and Extraction Confidence:
Each field extracted from free-text model documentation is assigned a confidence score, and a field is accepted into the database only if its score meets a fixed acceptance threshold.
Overall, RS-FMD enables interpretable, constraint-driven RSFM selection, supported by the REMSA agent for agentic interaction and expert-centered benchmarking (Chen et al., 21 Nov 2025).
2. Respiratory Sound and Functional Measurements Database (RS-FMD)
In biomedical engineering, RS-FMD denotes the Respiratory Sound and Functional Measurements Database, which extends the RespiratoryDatabase@TR to provide synchronized, multi-site auscultation, spirometry, and chest X-ray imaging for both healthy and pathological cohorts (Altan et al., 2021). This resource targets the development and validation of signal-processing, ML, and diagnostic tools for obstructive and restrictive lung diseases.
Dataset Composition:
- 75 adults (age 38–68; 13 female, 62 male): 30 healthy controls (normal PFTs, non-smokers) and 45 patients (asthma, chronic bronchitis, COPD; n≈15 per class).
- Clinical and demographic attributes tracked per subject.
Recording Modalities:
- 12-channel lung auscultation (anterior/posterior, bilateral, upper/mid/lower lobes)
- 4-channel heart auscultation (aortic, pulmonic, tricuspid, mitral)
- Chest X-ray (PA, lateral views)
- Pulmonary function tests (PFT): FEV₁, FVC, FEV₁/FVC ratio, and PEF, acquired according to ATS/ERS standards.
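To show how these spirometric metrics relate to the diagnostic categories normal, obstructive, restrictive, and mixed, the sketch below applies a common simplified screening rule: an FEV₁/FVC ratio below 0.70 suggests obstruction, and an FVC below 80% of predicted suggests a possible restrictive pattern. The function and both cutoffs are assumptions for illustration only. In RS-FMD the labels are assigned by pulmonologists, and a restrictive defect is normally confirmed with lung-volume measurements rather than spirometry alone.

```python
def spirometric_pattern(fev1_l: float, fvc_l: float, fvc_pct_pred: float,
                        ratio_cutoff: float = 0.70, fvc_cutoff: float = 80.0) -> str:
    """Simplified screening rule; NOT the dataset's labelling procedure."""
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    ratio = fev1_l / fvc_l                # FEV1/FVC ratio
    obstructive = ratio < ratio_cutoff    # fixed-ratio obstruction criterion
    low_fvc = fvc_pct_pred < fvc_cutoff   # reduced FVC (% of predicted)
    if obstructive and low_fvc:
        return "mixed"
    if obstructive:
        return "obstructive"
    if low_fvc:
        return "restrictive"
    return "normal"
```

For example, FEV₁ = 2.0 L with FVC = 4.0 L (ratio 0.50) and FVC at 95% of predicted is classified as obstructive.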
Hardware and Synchronization:
Recordings are made with two Littmann 3200 electronic stethoscopes offering Bell, Diaphragm, and Extended frequency-response modes (20–1000 Hz), stored as 16-bit PCM at 4 kHz with real-time Bluetooth transfer and audio storage. All auscultation traces are synchronized to a standardized cough event, permitting sub-millisecond alignment across channels.
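Because one sample at 4 kHz spans 1/4000 s = 0.25 ms, aligning channels to the nearest sample already gives sub-millisecond agreement. The sketch below shows one way such alignment could work. It assumes each channel is a list of PCM samples and that the cough onset can be located with a simple relative-amplitude threshold. The detector, its threshold, and all function names are hypothetical, not the procedure documented for the dataset.

```python
from typing import List, Sequence

FS_HZ = 4000  # sampling rate of the auscultation recordings


def onset_index(signal: Sequence[float], rel_threshold: float = 0.5) -> int:
    """Hypothetical cough-onset detector: first sample whose magnitude reaches
    rel_threshold times the channel's peak magnitude."""
    peak = max(abs(x) for x in signal)
    if peak == 0:
        raise ValueError("silent channel")
    level = rel_threshold * peak
    return next(i for i, x in enumerate(signal) if abs(x) >= level)


def align_to_cough(channels: List[Sequence[float]]) -> List[List[float]]:
    """Shift every channel so its detected onset lands on the earliest onset
    index, then crop all channels to a common length."""
    onsets = [onset_index(ch) for ch in channels]
    earliest = min(onsets)
    shifted = [list(ch[o - earliest:]) for ch, o in zip(channels, onsets)]
    length = min(len(ch) for ch in shifted)
    return [ch[:length] for ch in shifted]


def samples_to_ms(n: int, fs_hz: int = FS_HZ) -> float:
    """Convert a sample offset to milliseconds."""
    return 1000.0 * n / fs_hz


# Two synthetic channels whose cough transient arrives 2 samples apart
ch_a = [0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0]
ch_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0]
aligned = align_to_cough([ch_a, ch_b])
```

A cross-correlation estimate of the lag would be more robust to noise; the threshold detector is used here only to keep the example short.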
Chest X-ray Imaging:
Chest X-rays are acquired at 2000 × 2000 pixels (12-bit DICOM), with preprocessing and optional lung-field segmentation.
Annotation, Labeling, and Access:
Board-certified pulmonologists review all PFTs, images, and auscultation waveforms. Diagnostic labels include normal, obstructive, restrictive, and mixed. Data is stored hierarchically (per-subject folders), with demographics, PFT CSVs, DICOM images, channel-wise WAV files, and annotation files. Metadata conforms to a simplified HL7-FHIR Observation schema. The dataset is licensed under CC BY-NC-SA 4.0, with data-use agreements for privacy.
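As a hedged illustration of the simplified HL7-FHIR Observation schema, the function below builds one FHIR-style Observation for a single PFT measurement from core Observation elements (`status`, `code`, `subject`, `valueQuantity`) and serializes it to JSON. Which elements RS-FMD actually retains, the subject identifier format, and the example value are assumptions. The measure is given as free text rather than a terminology code.

```python
import json


def pft_observation(subject_id: str, measure: str, value: float, unit: str) -> dict:
    """Build a simplified FHIR-style Observation for one PFT value."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": measure},  # free text; no terminology coding in this sketch
        "subject": {"reference": f"Patient/{subject_id}"},
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": unit,
        },
    }


# Hypothetical subject identifier and measurement
obs = pft_observation("S01", "FEV1", 3.1, "L")
print(json.dumps(obs, indent=2))
```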
RS-FMD thus offers an integrated, high-fidelity platform for ML-based and algorithmic advancement in respiratory medicine (Altan et al., 2021).
3. Reduced-Complexity Modeling of Fluid Flows Database (RSFM)
In computational physics, the RSFM Database is a curated resource comprising six time-resolved fluid-mechanics datasets combining both canonical and applied flows for benchmarking reduced-order modeling (ROM) methods (Towne et al., 2022). The key objectives are to support both data-driven (e.g., POD, DMD, neural nets) and physics-based (resolvent, Galerkin) techniques via high-quality, diverse flow data accessible to the community.
Data Organization and Access:
- All data is stored as HDF5 files on the University of Michigan Deep Blue Data repository, organized into dataset-specific folders (jet/BLdns/BLexp/airfoilDNS/gustexp/airfoilLES).
- Download via web browser or high-throughput Globus transfer; all datasets total ∼8 TB.
- Each HDF5 file contains flow-field snapshots (e.g., velocity, pressure, vorticity; planar or volumetric depending on the dataset), precomputed modes, statistics, and time series (e.g., lift and drag).
Dataset Overview:
| Section (Towne et al., 2022) | Flow | Abbrev. | Method | Notes |
|---|---|---|---|---|
| 2.1 | Turbulent jet | jet | LES | 10,000 snapshots |
| 2.2 | Turbulent boundary layer (TBL) | BLdns | DNS | 5 TB; planar and volumetric data |
| 2.3 | Turbulent boundary layer (TBL) | BLexp | Experiment (PIV) | 6000 snapshots per Re |
| 2.4 | Pitching flat-plate airfoils | airfoilDNS | DNS | Laminar |
| 2.5 | Airfoil gust encounter | gustexp | Experiment | PIV, force balance |
| 2.6 | Separated airfoil wake | airfoilLES | LES | 16,000 snapshots |
Supported Modeling Workflows:
- Proper Orthogonal Decomposition (POD) and Spectral POD (SPOD): Extraction of energy-optimal modes and, for SPOD, modes resolved by frequency.
- Resolvent analysis: Linearization of the Navier–Stokes equations and singular value decomposition of the resulting resolvent operator to obtain forcing and response modes.
- Dynamic Mode Decomposition (DMD): Least-squares estimate of a linear operator that maps each snapshot to the next.
- Galerkin projection and state-space modeling: Projection of the dynamics onto a low-dimensional basis, with ODE/PDE closure models.
- Causality analysis (information theory): Estimation of information flux between scales or regions using conditional entropy.
- Conditional projection averaging: Phase-conditioned averaging based on modal projections.
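As a hedged illustration of two of these workflows, the sketch below computes POD modes from a thin SVD of mean-subtracted snapshots and performs exact DMD, that is, an eigendecomposition of the rank-r projection of the least-squares operator mapping each snapshot to the next. It assumes the snapshots have already been loaded (for example from one of the HDF5 files) into a NumPy array of shape (n_points, n_time). The function names are hypothetical, the demo uses synthetic data rather than any RSFM dataset, and the spatial quadrature weights that POD needs on a nonuniform grid are omitted.

```python
import numpy as np


def pod(snapshots: np.ndarray, r: int):
    """POD via thin SVD of the mean-subtracted snapshot matrix.

    snapshots: shape (n_points, n_time), one column per snapshot.
    Returns the r leading spatial modes, their energies, and the mean field.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    X = snapshots - mean
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    energies = s**2 / (X.shape[1] - 1)  # modal variance (energy) per mode
    return U[:, :r], energies[:r], mean


def exact_dmd(snapshots: np.ndarray, r: int):
    """Exact DMD: eigenvalues and modes of the rank-r least-squares operator A
    satisfying X2 ~= A @ X1, where X1 and X2 hold consecutive snapshots."""
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    A_tilde = (U.conj().T @ X2 @ V) / s  # U* A U, i.e. U* X2 V Sigma^-1
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((X2 @ V) / s) @ W           # X2 V Sigma^-1 W
    return eigvals, modes


# Demo on a synthetic linear system x_{k+1} = A x_k (hypothetical data)
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(20):
    states.append(A @ states[-1])
X = np.column_stack(states)  # shape (2, 21)
eigvals, dmd_modes = exact_dmd(X, r=2)
```

In the demo the recovered DMD eigenvalues match those of the generating operator, 0.9·e^{±0.3i}, since the synthetic data are exactly linear.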
Guidelines for Use:
Each dataset is accompanied by example scripts, metadata (variable and grid naming), and precomputed fields (means, modes, statistics). Recommended practice is to begin with the smaller datasets (e.g., the laminar pitching-airfoil DNS), progress through the dataset hierarchy toward more complex flows, and compare methods systematically across canonical and application datasets.
The RSFM Database is intended as a common testbed to facilitate systematic comparison and advancement of reduced-complexity models in fluid mechanics (Towne et al., 2022).
4. Comparative Table of RSFM/RS-FMD Databases
| Domain | Database Name | Primary Content | Format |
|---|---|---|---|
| Remote Sensing | RS-FMD | 150+ RS foundation models | JSONL, nested schema |
| Respiratory Medicine | RS-FMD | 75 subjects; 16-channel auscultation, PFT, chest X-ray | WAV, DICOM, CSV, JSON |
| Fluid Mechanics | RSFM Database | 6 time-resolved datasets | HDF5 (∼8 TB total) |
The acronym RSFM/RS-FMD is thus overloaded across domains; in each case it denotes a rigorously structured, openly accessible, and richly annotated resource for community-driven algorithmic and methodological progress.
5. Research Impact and Utilization
Each instance of the RSFM/RS-FMD database serves as shared infrastructure within its discipline. The remote sensing registry supports reproducible, interpretable model selection and benchmarking across a rapidly growing set of foundation models. The respiratory database provides synchronized multimodal recordings and expert-reviewed labels that serve as ground truth for ML-driven cardiopulmonary diagnostics. The fluid mechanics collection combines simulation and experimental data, enabling like-for-like assessment of reduced-order modeling strategies in both canonical and application-driven flows.
These resources support LLM-based agentic model selection (Chen et al., 21 Nov 2025), ML for biomedical diagnostics (Altan et al., 2021), and reproducible, comparative evaluation of ROM methods in fluid physics (Towne et al., 2022). Their design principles (schema enforcement, modular access, multi-modality, and expert labeling) are consistent with FAIR data practices within and beyond their respective research communities.