DeXposure Dataset Benchmarks
- The DeXposure name covers three unrelated benchmark datasets: one for millimeter-wave dosimetry, one for DeFi credit exposure, and one for compressive sensing video evaluation.
- The datasets support surrogate modeling and machine learning tasks under defined evaluation protocols, with Gaussian copula synthesis used to generate the dosimetry samples.
- The datasets come with complete code, standardized formats, and open licenses to promote reproducibility and future research expansion.
The DeXposure dataset refers to three distinct resources in contemporary computational research: (1) a benchmark dataset for millimeter-wave dosimetry and surrogate modeling (Kapetanovic et al., 2023); (2) a large-scale dataset for measuring inter-protocol credit exposure in decentralized finance (DeFi) networks (Wu et al., 27 Nov 2025); and (3) a compressive sensing video dataset featuring pixel-wise coded exposure for assessing compressive sensing (CS) algorithms (Narayanan et al., 2019). Although these datasets serve unrelated domains, each establishes a benchmark in its own area by providing curated data and an evaluation protocol.
1. Localized Millimeter-Wave Exposure: Dataset Structure and Purpose
The DeXposure Dataset for localized exposure at 10–90 GHz is an open-source, statistically modeled benchmark intended to standardize and accelerate research in electromagnetic exposure and skin dosimetry (Kapetanovic et al., 2023). It provides 10,000 physically plausible synthetic samples, each simulating the steady-state temperature rise on the skin from a half-wave dipole antenna placed at varying distances and frequencies.
Each sample comprises a 6-dimensional input vector:
- Antenna–skin separation distance (mm)
- Frequency (GHz)
- Four spatially averaged incident power density (IPD) metrics
Each sample also has one output variable:
- Maximum temperature rise (°C) on the skin surface
These are distributed as CSV files or NumPy arrays with seven columns, one per input and one for the output.
The synthetic generation employs a Gaussian copula fitted via maximum likelihood to 115 high-fidelity simulation cases, enforcing strict dosimetric constraints on the admissible ranges of separation distance (mm), frequency (GHz), and all IPD values. Distributional fidelity to the original samples is verified with Kolmogorov-Smirnov (KS) tests and Fisher z-tests.
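The copula-based synthesis described above can be illustrated with a minimal sketch. It is not the authors' code: the reference data are random stand-ins for the 115 simulation cases, the correlation matrix is estimated from normal scores of the ranks (a simpler estimator than the maximum-likelihood fit named in the text), and the function names are hypothetical. Inverting the empirical marginals keeps every synthetic value inside the observed range, which is one simple way to respect input bounds.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_ppf = np.vectorize(_STD_NORMAL.inv_cdf)  # standard normal quantile function
_cdf = np.vectorize(_STD_NORMAL.cdf)      # standard normal CDF


def fit_gaussian_copula(data):
    """Estimate the copula correlation matrix from normal scores of the ranks."""
    n = data.shape[0]
    # Rank each column (1..n) and map to (0, 1); dividing by n + 1 avoids 0 and 1.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    z = _ppf(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(data, corr, n_samples, rng):
    """Draw rows whose marginals follow the empirical quantiles of `data`."""
    d = data.shape[1]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = _cdf(z)
    # Invert each empirical marginal; values stay inside the observed range.
    cols = [np.quantile(data[:, j], u[:, j]) for j in range(d)]
    return np.column_stack(cols)


rng = np.random.default_rng(0)
# Hypothetical stand-in for the 115 simulation cases (6 inputs + 1 output).
reference = rng.gamma(shape=2.0, scale=1.0, size=(115, 7))
corr = fit_gaussian_copula(reference)
synthetic = sample_gaussian_copula(reference, corr, 10_000, rng)
```

With real data, `reference` would be replaced by the simulation table, and the KS and Fisher z-tests mentioned above would then compare `synthetic` against it column by column and pair by pair.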
2. Surrogate Modeling and Benchmarking
The dataset supports surrogate modeling across polynomial and spline-based function approximators, all trained on the synthetic distribution. Four surrogate families are constructed:
- Quadratic-polynomial regression
- Minimal-energy tensor-product B-splines and cubic Hermite splines, providing smooth, regularized mappings
- A Mixture-of-Experts (MoE) strategy that partitions the inputs with k-means clustering, trains a local expert (quadratic or spline) per cluster, and combines the expert predictions through soft gating
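The Mixture-of-Experts idea in the last item can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the number of clusters, the distance-based softmax gate and its `temperature`, the two-dimensional toy inputs (the dataset has six), and all function names are hypothetical choices, and only quadratic experts are shown.

```python
import numpy as np


def quad_features(X):
    """Design matrix with intercept, linear, and all degree-2 terms."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


def sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns centroids and hard labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return centers, sq_dists(X, centers).argmin(axis=1)


def fit_moe(X, y, k, rng):
    """Cluster the inputs, then fit one quadratic expert per cluster."""
    centers, labels = kmeans(X, k, rng)
    Phi = quad_features(X)
    weights = []
    for c in range(k):
        mask = labels == c
        if not mask.any():  # fall back to a global fit for an empty cluster
            mask = np.ones(len(X), dtype=bool)
        w, *_ = np.linalg.lstsq(Phi[mask], y[mask], rcond=None)
        weights.append(w)
    return centers, np.array(weights)


def predict_moe(X, centers, weights, temperature=1.0):
    """Blend expert predictions with a softmax gate over centroid distances."""
    logits = -sq_dists(X, centers) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    gate = np.exp(logits)
    gate /= gate.sum(axis=1, keepdims=True)      # each row sums to 1
    expert_preds = quad_features(X) @ weights.T  # shape (n_samples, k)
    return (gate * expert_preds).sum(axis=1)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))  # toy inputs, not dataset columns
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + X[:, 1] ** 2
centers, weights = fit_moe(X, y, k=3, rng=rng)
mae = float(np.mean(np.abs(predict_moe(X, centers, weights) - y)))
```

Because the toy target is itself quadratic, every local expert recovers it and the blended error is near zero; on real data the experts differ and the gate decides how much each one contributes near cluster boundaries.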
Benchmark results, validated on both an 80/20 synthetic split and the original high-fidelity simulations, report the following MAEs:
- MoE: 0.058°C (best)
- XGBoost: 0.063°C
- TabNet: 0.069°C
- MLP: 0.091°C
Training the MoE surrogates takes substantially less time than running exhaustive grid searches over tree-based models.
3. Inter-Protocol Credit Exposure in DeFi Networks: Dataset Formalization
The DeXposure Dataset for DeFi inter-protocol credit exposure (Wu et al., 27 Nov 2025) quantifies daily, token-resolved credit exposures between DeFi protocols from 2020 to 2025. It comprises 43.7 million entries spanning 4,300+ protocols, 602 blockchains, and over 24,300 tokens, with DefiLlama metadata as the primary data source.
Each daily JSON/CSV snapshot captures:
- Nodes: protocol ID, total value locked (TVL), token-level TVL breakdown
- Links: directed exposure from one protocol to another (TVL in USD per day), with linked token flows
Formally, the daily edge weight is defined for a protocol, a token, the token's issuing protocol (obtained from a cascaded mapping procedure), and a day, and it depends on the token-value change.
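To make the edge construction concrete, the sketch below aggregates token-level holdings into directed protocol-to-issuer exposures. It is a simplified illustration, not the dataset's exact formula: it sums USD values held rather than modeling the token-value change, it drops self-issued tokens (an assumption), and the record layout, protocol names, and `daily_edges` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical daily holdings: (day, holder protocol, token, USD value held).
holdings = [
    ("2024-01-01", "lender_a", "TKN_X", 120.0),
    ("2024-01-01", "lender_a", "TKN_Y", 30.0),
    ("2024-01-01", "dex_b", "TKN_X", 50.0),
    ("2024-01-01", "dex_b", "TKN_B", 10.0),
]
# Hypothetical token-to-issuer mapping (the output of the mapping step).
issuer_of = {"TKN_X": "issuer_x", "TKN_Y": "issuer_x", "TKN_B": "dex_b"}


def daily_edges(holdings, issuer_of):
    """Sum USD holdings into (day, holder, issuer) directed edge weights."""
    edges = defaultdict(float)
    for day, holder, token, usd in holdings:
        # Unmapped tokens fall back to the holder itself (self-mapping).
        issuer = issuer_of.get(token, holder)
        if issuer == holder:
            continue  # assumption: self-issued tokens add no inter-protocol edge
        edges[(day, holder, issuer)] += usd
    return dict(edges)


edges = daily_edges(holdings, issuer_of)
# edges[("2024-01-01", "lender_a", "issuer_x")] is 150.0
# edges[("2024-01-01", "dex_b", "issuer_x")] is 50.0
```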
Curation assigns each token to an issuing protocol through a hierarchical mapping: direct metadata first, then manual assignment, then TF-IDF/cosine similarity on metadata, and finally fallback self-mapping.
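A cascade of this kind might look like the following sketch. Everything here is an assumption made for illustration: the lookup tables, the description strings, the whitespace tokenizer, the smoothed IDF formula, the similarity `threshold`, and the `map_issuer` name do not come from the dataset's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency; one common variant among several.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_issuer(token, holder, direct, manual, token_desc, protocol_desc,
               threshold=0.2):
    """Try each mapping stage in order and report which stage succeeded."""
    if token in direct:
        return direct[token], "direct"
    if token in manual:
        return manual[token], "manual"
    if token in token_desc and protocol_desc:
        names = list(protocol_desc)
        vecs = tfidf_vectors([token_desc[token]] + [protocol_desc[p] for p in names])
        scores = [cosine(vecs[0], v) for v in vecs[1:]]
        best = max(range(len(names)), key=scores.__getitem__)
        if scores[best] >= threshold:
            return names[best], "similarity"
    return holder, "self"  # fallback self-mapping


# Hypothetical lookup tables and metadata descriptions.
direct = {"tok_direct": "proto_beta"}
manual = {"tok_manual": "proto_gamma"}
token_desc = {"tok_text": "alpha vault receipt token"}
protocol_desc = {
    "proto_alpha": "alpha vault yield protocol",
    "proto_beta": "beta lending market",
    "proto_gamma": "gamma liquid staking",
}
args = (direct, manual, token_desc, protocol_desc)
results = {t: map_issuer(t, "holder_z", *args)
           for t in ["tok_direct", "tok_manual", "tok_text", "tok_unknown"]}
```

Returning the stage label alongside the issuer makes it easy to audit how many tokens were resolved by each stage of the cascade.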
4. Machine Learning Benchmarks and Analytical Protocols
The DeXposure DeFi dataset is benchmarked for three principal ML tasks:
- Graph Clustering: Protocol features (degree, betweenness, PageRank, TVL, etc.) are embedded (t-SNE, perplexity=30), clustered (DBSCAN), and analyzed via weekly centralization, entropy, concentration, and assortativity metrics.
- Sector-level Vector Autoregression (VAR): Exposure-shift ratios are computed per sector, with VAR modeling for shock propagation (impulse response functions) after major events (e.g., Terra, FTX collapses).
- Temporal Graph Neural Networks (TGNN): Node- and graph-level time series (ROLAND-style TGNNs) predict dynamic edges; performance assessed by AUPRC per snapshot.
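For the link-prediction task in the last item, a per-snapshot AUPRC can be computed as average precision, that is, the mean of the precision values at the rank of each true edge. The sketch below is a plain NumPy version with made-up labels and scores; whether the benchmark uses exactly this estimator is an assumption, and tied scores receive no special treatment here.

```python
import numpy as np


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive (step-wise AUPRC)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    if hits.sum() == 0:
        return float("nan")  # undefined when a snapshot has no positive edge
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


# Hypothetical snapshots: (edge exists?, predicted score) per candidate edge.
snapshots = {
    "t1": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "t2": ([0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]),
}
auprc = {t: average_precision(y, s) for t, (y, s) in snapshots.items()}
# auprc["t1"] is (1/1 + 2/3) / 2 = 5/6; auprc["t2"] is 1.0
```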
Key findings: rapid network growth (nodes: 9→11,087; edges: 8→69,710), increasing protocol concentration, shrinking density, and sector-differentiated contagion after systemic shocks.
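The "shrinking density" finding follows directly from the node and edge counts quoted above. As a worked check, assuming density is measured as the share of possible directed edges without self-loops (the dataset's own definition may differ):

```python
def directed_density(n_nodes, n_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    return n_edges / (n_nodes * (n_nodes - 1))


early = directed_density(9, 8)            # 8 / 72, about 0.11
late = directed_density(11_087, 69_710)   # about 5.7e-4
mean_out_degree_early = 8 / 9             # about 0.89 edges per node
mean_out_degree_late = 69_710 / 11_087    # about 6.3 edges per node
```

Edges per node grow roughly sevenfold, yet the number of possible links grows quadratically with the node count, so density falls by more than two orders of magnitude.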
5. Compressive Sensing Video: Acquisition, Representation, and Annotation
The DeXposure compressive video dataset (Narayanan et al., 2019) provides the first public benchmark of videos captured and encoded via pixel-wise coded exposure (PCE), designed for CS algorithm and compressed-domain vision benchmarking.
The corpus contains 375 annotated video clips (≈90 min, 76,400 original frames, 5,873 CS frames) spanning indoor/outdoor, person and car scenes, and four canonical motion scenarios (static/moving backgrounds and objects, camera motion).
CS frames are formed by compressing each pixel with a random “bump” binary code, followed by per-pixel SNR normalization. All raw frames, masks, and annotations are provided. Annotation combines YOLOv3 pseudo-boxes, bounding-box aggregation across K frames, and manual correction (via VATIC).
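Pixel-wise coded exposure can be simulated in a few lines. The sketch below assumes a single contiguous "bump" per pixel with a random start time and treats division by the exposure length as a stand-in for the per-pixel normalization; the window of 13 frames is chosen only to match the roughly 13:1 ratio of original to CS frames in the corpus, and the bump length, array sizes, and function names are hypothetical.

```python
import numpy as np


def single_bump_codes(T, height, width, bump_len, rng):
    """Binary code of shape (T, H, W): each pixel is on for one contiguous window."""
    starts = rng.integers(0, T - bump_len + 1, size=(height, width))
    t = np.arange(T)[:, None, None]
    return ((t >= starts) & (t < starts + bump_len)).astype(np.float64)


def coded_exposure_frame(frames, codes, bump_len):
    """Sum the code-gated frames over time, then divide by the exposure length."""
    return (codes * frames).sum(axis=0) / bump_len


rng = np.random.default_rng(0)
T, H, W, bump_len = 13, 4, 5, 3          # assumed sizes for illustration
frames = rng.random((T, H, W))           # stand-in for T raw video frames
codes = single_bump_codes(T, H, W, bump_len, rng)
cs_frame = coded_exposure_frame(frames, codes, bump_len)
```

A reconstruction algorithm then has to recover the `T` original frames from `cs_frame` and the known `codes`, which is the inverse problem the benchmark targets.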
6. Evaluation Protocols and Use Cases
For the millimeter-wave and DeFi datasets, standardized evaluation metrics (MAE, AUPRC, Silhouette score, VAR-IRF uncertainty bands) are prescribed. For the CS video benchmark, typical CS algorithm baselines (Basis Pursuit, OMP, TV-min, learned dictionaries) are recommended, with evaluation via PSNR, SSIM, and reconstruction time.
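Two of the metrics named above have short closed forms, shown here as a minimal NumPy sketch. The `peak` argument assumes images scaled to [0, 1], and the toy arrays are illustrative only.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, e.g. in °C for the temperature-rise surrogates."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for a reconstructed frame."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)          # uniform error of 0.1
example_psnr = psnr(reference, estimate)  # MSE = 0.01, so 20 dB
example_mae = mae([0.50, 0.70], [0.45, 0.80])  # (0.05 + 0.10) / 2 = 0.075
```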
Suggested ML and application domains:
- Real-time dosimetric assessment and surrogate-based modeling (Kapetanovic et al., 2023)
- DeFi risk monitoring, contagion analysis, systemic risk modeling, and algorithmic trading backtesting (Wu et al., 27 Nov 2025)
- Compressive-sensing reconstruction, end-to-end recognition pipelines, coding pattern research, detection/tracking on compressed data (Narayanan et al., 2019)
7. Access, Distribution, and Prospects for Extension
All three datasets are released under open licenses with complete code and documentation:
- Millimeter-wave dosimetry and surrogates: https://github.com/akapet00/thermal-dosimetry-surrogate.git
- DeFi credit exposure graphs: https://github.com/dthinkr/DeXposure and live visualization at https://ccaf.io/defi/ecosystem-map/visualisation/graph
- Coded exposure video (CS): source packages include numpy/JSON-formatted data, loader scripts, and sample solvers
These resources provide shared community benchmarks and a basis for reproducible research in computational dosimetry, financial systemic risk, and compressive video processing. Each can be extended further, for example through greater scene diversity, higher compression rates, inclusion of new protocols, richer labels, and adaptation to new algorithmic paradigms (Kapetanovic et al., 2023, Wu et al., 27 Nov 2025, Narayanan et al., 2019).