
PMPBench: Model Checking & Medical Imaging

Updated 29 January 2026
  • PMPBench is a dual-domain benchmark suite offering standardized datasets for parametric timed model checking and paired multi-modal pan-cancer image synthesis.
  • In parametric model checking, the library comprises 34 benchmarks (80 models) and 122 queries spanning academic, industrial, and challenging cases, evaluated via metrics such as solving time and memory use.
  • For medical imaging, PMPBench provides paired CT and DCE-MR datasets with rigorous preprocessing and evaluation metrics, facilitating robust image translation studies.

PMPBench denotes established, publicly accessible benchmark suites and datasets in two distinct research domains: parametric timed model checking for real-time systems (Étienne, 2018) and paired multi-modal pan-cancer medical image synthesis (Chen et al., 22 Jan 2026). Each provides rigorously curated, task-driven benchmarks specifically aimed at advancing algorithmic methods and fostering reproducible evaluation in its field. This article surveys both instantiations, outlining definitions, design principles, dataset statistics, evaluation protocols, baseline models, and their broader impact.

1. Definitions and Scope

PMPBench originally referenced “Parametric Model Checking Benchmarks,” a library aggregating 34 benchmarks, 80 models, and 122 verification queries for parametric timed automata (PTA) (Étienne, 2018). These address symbolic timing analysis under uncertainty (deadlines, delays, periods), covering academic, industrial, and currently unsolvable cases. In a separate domain, PMPBench is established as the “Paired Multi-Modal Pan-Cancer Benchmark,” a large-scale dataset for medical image translation tasks, providing paired non-contrast and contrast-enhanced volumes across 11 organs, supporting both CT and multi-phase DCE-MR imaging (Chen et al., 22 Jan 2026).

Both usages share the principle of enabling fair comparison across state-of-the-art methods via standardized, meticulously labeled, and openly available resources.

2. Parametric Timed Model Checking Benchmark Suite

Formalism and Model Definition

A Parametric Timed Automaton is formally defined as a tuple $\mathcal{A} = (L, \ell_0, C, P, E, \mathrm{Inv})$, where $L$ is the set of locations, $\ell_0$ the initial location, $C$ the set of clocks, $P$ the set of parameters, $E$ the set of transitions, and $\mathrm{Inv}$ assigns an invariant to each location. Each edge $e$ comprises guards, actions, resets, and a destination location, with guards written as conjunctions of constraints of the form $x - y \le p$ or $x \le p$ for $x, y \in C$, $p \in P$.

Tool evaluation centers on synthesizing parameter constraints, analyzing reachability, safety, optimality, robustness, and pattern matching under symbolic timing uncertainty.
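To make the guard semantics above concrete, the following is a minimal sketch of evaluating a conjunction of PTA guard constraints under concrete clock and parameter valuations. The names (`Guard`, `satisfies`) are illustrative and do not come from any PTA tool's API; real tools reason symbolically over parameter valuations rather than checking one valuation at a time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Guard:
    """Atomic constraint of the form x - y <= p (diagonal) or x <= p,
    where x, y are clocks and p is a parameter."""
    clock: str
    minus_clock: Optional[str]  # None encodes x <= p; otherwise x - y <= p
    param: str

def satisfies(guards, clocks: dict, params: dict) -> bool:
    """Check a conjunction of guards under concrete valuations."""
    for g in guards:
        lhs = clocks[g.clock] - (clocks[g.minus_clock] if g.minus_clock else 0.0)
        if lhs > params[g.param]:
            return False
    return True

# Example: the guard  x <= p  AND  x - y <= q
guards = [Guard("x", None, "p"), Guard("x", "y", "q")]
print(satisfies(guards, {"x": 2.0, "y": 1.5}, {"p": 3.0, "q": 1.0}))  # True
```

Parameter synthesis then amounts to characterizing the set of `params` valuations for which a property holds for all reachable clock valuations, which is what makes the problem symbolically hard.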

Categories and Problems

Benchmarks fall into:

  • Academic: Hardware circuits (And–Or, Flip–flop), protocols (CSMA/CD, Fischer mutual exclusion), schedulability (Idle-time, Jobshop), safety under uncertain periods, and parameterized delays.
  • Industrial: Automotive pattern matching (PTPM), protocols (BRP, RCP), FMTV challenge, automation controllers (SIMOP), and synchronous circuits (SPSMALL).
  • Challenging: Toy families (e.g., $p = 1/n$ for increasing $n$), exhibiting reachability solution sets not presently solvable by tools.

Benchmark Descriptions

Representative instances include:

  • Fischer-AHV93 mutual exclusion: Scalable process count, safety constraint synthesis, solved up to 3–6 processes.
  • Jobshop scheduling: Parametric optimal makespan, up to 16 delay parameters.
  • PTPM (accel:10): Real-world automotive logs, pattern matching queries.
  • Toy (1/n) PTA: Non-convex, infinite solution sets that are analytically tractable but currently unsolved by automated tools.

3. Evaluation Protocols and Metrics

Tools are assessed using:

  • Solving time (wall-clock): Quantified in seconds per benchmark.
  • Memory consumption (RSS peak): Tracks runtime memory requirements.
  • Parameter-space coverage: Measures completeness of synthesized constraints over all parameter valuations.
  • Scalability: Quantified as function of clocks, parameters, and automaton replicas.
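The first two metrics above can be sketched as a small measurement harness. This is a hedged illustration: `tracemalloc` tracks Python-level allocations as a proxy for peak memory, not the OS-level peak RSS that benchmark reports typically quote (which would come from `resource.getrusage` or the OS), and `dummy_solver` is a stand-in for an actual model checker invocation.

```python
import time
import tracemalloc

def measure(solver, *args):
    """Run a solver call, returning (result, wall-clock seconds, peak bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = solver(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

def dummy_solver(n):
    # Stand-in workload: allocate a list and sum it.
    return sum(list(range(n)))

result, seconds, peak_bytes = measure(dummy_solver, 100_000)
print(f"result={result} time={seconds:.4f}s peak={peak_bytes / 1e6:.1f} MB")
```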

Table: Select PMPBench results (Étienne, 2018)

| Benchmark              | #Clocks | #Params | Property             | Time (s) |
|------------------------|---------|---------|----------------------|----------|
| Fischer-AHV93 (3 proc) | 2       | 4       | Safety synthesis     | 0.04     |
| Flip-flop:12           | 5       | 12      | Robustness           | 23.07    |
| Jobshop (3×4)          | 3       | 12      | Optimal reachability | 5.58     |
| accel:10 (PTPM)        | 2       | 3       | Pattern matching     | 12.67    |
| toy (1/n)              | 2       | 1       | Reachability         | unsolved |

4. Paired Multi-Modal Pan-Cancer Medical Image Synthesis Benchmark

Dataset Structure

  • 2,642 volumes: 1,526 CT (10 organs), 1,116 DCE-MR (breast). Organs span adrenal, ovary, uterus, breast, stomach, pancreas, liver, colon, bladder, kidney, lung.
  • Modalities: Non-contrast CT (CTnc), contrast-enhanced CT (CTctc), DCE-MRI phases (DCE1–3).
  • Labeling: Automated DICOM parsing, radiologist manual correction, anatomical correspondence validated post-alignment.
  • Preprocessing: Multi-resolution rigid+affine, deformable B-spline registration (Elastix), resampling to 1×1×1 mm, organ-specific intensity normalization.
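The organ-specific intensity normalization step above can be sketched as a standard Hounsfield-unit windowing followed by rescaling to [0, 1]. This is a minimal illustration in NumPy; the window bounds below are generic soft-tissue values chosen for the example, not the dataset's published per-organ settings, and the actual pipeline uses Elastix for the registration stages.

```python
import numpy as np

def normalize_ct(volume: np.ndarray,
                 hu_min: float = -150.0,
                 hu_max: float = 250.0) -> np.ndarray:
    """Clip CT intensities to a HU window and rescale to [0, 1]."""
    clipped = np.clip(volume, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

# Example voxels: air, water, soft tissue, metal artifact
vol = np.array([-1000.0, 0.0, 100.0, 3000.0])
print(normalize_ct(vol))  # [0.    0.375 0.625 1.   ]
```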

Translation Task Hierarchy

  • 1→1: Single-modality prediction (e.g., CTnc → CTctc).
  • N→1: Multi-modality/phase inputs for single output (e.g., {DCE1, DCE3} → DCE2).
  • 1→N: Single input to multi-phase outputs.
  • N→N: General many-to-many modality translation.

Data and Metrics

  • Stratified 70/10/20 splits by organ/cancer type; 5% “test-mini” subset for rapid iteration.
  • Quantitative evaluations: MSE, PSNR, SSIM, MAE, LPIPS, FID, KID. Metrics averaged over slices and volumes.
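Two of the intensity metrics listed above (MSE and PSNR) are simple enough to sketch directly; SSIM, LPIPS, FID, and KID require dedicated libraries and are omitted here. This is a minimal NumPy illustration, not the benchmark's evaluation code.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all voxels."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    return 10.0 * np.log10(data_range ** 2 / mse(pred, target))

target = np.zeros((8, 8))
print(psnr(target + 0.1, target))  # ≈ 20.0 dB (MSE = 0.01, data_range = 1.0)
```

In practice such metrics are computed per slice and per volume and then averaged, as the protocol above states.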

Table: PMPBench synthesis performance (test-mini) (Chen et al., 22 Jan 2026)

| Mechanism     | Method   | CT→CTC (PSNR±σ / SSIM %)  | DCE₁→DCE₂  | DCE₁→DCE₂,₃ | DCE₁,₃→DCE₂ |
|---------------|----------|----------------------------|------------|-------------|-------------|
| Direct        | HiNet    | 22.23±3.70 / 77.7±10.3     | 23.41±3.48 | 23.88±3.37  | 26.47±3.32  |
| GAN           | CycleGAN | 21.90±3.98 / 75.8±13.2     | 24.18±3.33 | 24.46±3.52  | 25.74±3.28  |
| Diffusion     | Palette  | 15.88±5.30 / 33.2±21.8     | 9.54±2.17  | 10.19±1.99  | 4.79±1.25   |
| Flow-matching | FlowMI   | 24.47±4.15 / 78.5±8.6      | 26.52±3.13 | 26.63±3.11  | 29.17±3.24  |

FlowMI achieves best overall fidelity, structural similarity, and perceptual consistency across tasks.

5. Baseline Models and Training Objectives (Medical Imaging)

  • Direct regression: UNet, ResViT, MambaIR, I2IMamba, Restore-RWKV (minimize $L_{rec} = \|f(X) - Y\|_2^2$)
  • GAN-based: Pix2Pix, CycleGAN (weighted L1L_1 and adversarial losses)
  • Diffusion-based: Palette, SelfRDB (denoising score objectives)
  • Flow-matching: ConcatFM, DirectFM, PMRF, FlowMI ($L_{FM}$ in latent space)

Implementation (FlowMI): 3D VAE UNet backbone, AdamW optimizer (lr = $10^{-4}$), 200 epochs, cosine annealing schedule, ODE-based latent imputation.
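The cosine annealing schedule quoted above can be sketched as follows, with the base learning rate $10^{-4}$ and 200 epochs taken from the reported configuration; the minimum rate `eta_min = 0` is an assumption, since the paper's exact value is not stated here.

```python
import math

def cosine_lr(epoch: int, total_epochs: int = 200,
              base_lr: float = 1e-4, eta_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at epoch 0, eta_min at the end."""
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))    # 1e-4
print(cosine_lr(100))  # ≈ 5e-5 (halfway through the schedule)
```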

6. Usage, Integration, and Impact

The parametric timed model checking PMPBench is released under GPL with IMITATOR input formats, supported by an online catalog and integration scripts for reproducible experiments. Medical imaging PMPBench is released with code and dataset under CC BY-NC-ND 4.0, supporting plug-in usage for PACS and downstream radiomics, segmentation, and planning workflows. Both benchmarks catalyze reproducible tool comparison, identification of algorithmic limitations, and development of more robust, broadly applicable verification and synthesis methods.

In parametric verification, PMPBench supports the emergence of scalable abstractions, improved heuristics, and rigorous coverage of industrial and academic systems (Étienne, 2018). In oncology imaging, PMPBench establishes the foundation for credible synthetic contrast generation, reducing clinical resource constraints and potential patient risks (Chen et al., 22 Jan 2026). Each resource is positioned as an inflection point for community-wide benchmarking standards.

7. Limitations and Future Directions

In parametric timed verification, certain benchmark instances remain unsolved, particularly those with non-convex or infinite parameter solution sets (e.g., the toy (1/n) PTA), highlighting fundamental computational limits of current tools. In medical imaging synthesis, 1→N and multi-modality tasks remain challenging for cross-phase consistency, and potential safety concerns warrant local clinical validation. Robustness against out-of-distribution anatomy and synthetic-artifact detection are ongoing areas of investigation.

Plausibly, both PMPBench versions will drive advances in domain-specific benchmarking, lead to new verification and generative models, and foster industrial adoption through detailed, reproducible evaluation standards.
