CDDMBench: FPGA DDR4 & Crop Disease Benchmark
- CDDMBench refers to two distinct benchmarking frameworks: one evaluates DDR4 memory performance on data-center FPGAs, the other supports high-accuracy crop disease diagnosis with multimodal models.
- The FPGA DDR4 benchmark employs a modular design with a Memory Interface, Traffic Generator, and Host Controller, delivering detailed metrics on throughput and latency.
- The crop disease benchmark features a large, multimodal dataset with over 137,000 annotated images and more than one million QA pairs to assess and enhance LVLM-based diagnostics.
CDDMBench refers to two distinct, domain-specific benchmarking frameworks, each serving as a foundational platform for rigorous, reproducible evaluation in its area: (1) a modular DDR4 benchmarking platform for data-center-class FPGAs, and (2) a large-scale, multimodal agricultural dataset and benchmark for crop disease diagnosis and management (Galimberti et al., 26 Jan 2025, Liu et al., 10 Mar 2025, Zhang et al., 31 Dec 2025). The following survey details both CDDMBench systems, their technical composition, evaluation methodologies, and research significance.
1. FPGA DDR4 Performance Benchmark: Design Principles and Architecture
The CDDMBench framework for memory benchmarking is engineered for data-center FPGA platforms, targeting reproducible measurement of DDR4 performance under configurable and realistic access patterns (Galimberti et al., 26 Jan 2025). Its modular design comprises three principal units for each memory channel—(1) Memory Interface, (2) Traffic Generator (TG), and (3) Host Controller—integrated via a parameterized fabric.
- Memory Interface: Implements a split architecture, with the PHY clocked at four times the AXI clock frequency (a 4:1 PHY-to-fabric clock ratio). The PHY handles JEDEC DDR4 command timing, while the controller reorders operations and buffers requests at the AXI level. DRAM timing parameters (activate/precharge timings, refresh period) are managed by dedicated state machines.
- Traffic Generator (TG): Drives concurrent AXI4 read and write traffic, exposes runtime-configurable registers for read/write ratio, addressing mode (sequential/random), burst type (fixed/incremental/wrapping), burst length (1–128), and signaling mode. Pseudorandom or user-defined data patterns are generated, with in-line verification.
- Host Controller: Receives high-level, UART-encoded control commands, sets up test batches, retrieves hardware counters, and computes throughput and utilization without requiring PCIe or OS drivers (a host-side control sketch follows this list).
- FPGA Build Parameterization: Number of channels $N_{ch}$, memory data rate $DR$, AXI clock frequency $f_{AXI}$, and per-channel instantiation are all configurable; the platform supports up to 3 independent DDR4 channels (on the Kintex UltraScale 115).
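The Host Controller's driverless UART operation lends itself to simple host-side scripting. The sketch below illustrates what such a control flow could look like in Python; the register map, command framing, port name, baud rate, and AXI clock value are hypothetical assumptions for illustration, since the paper specifies the capability (configure tests, read counters over UART) rather than this exact protocol.

```python
# Hypothetical host-side driver for the CDDMBench traffic generator (TG).
# Register addresses and the "W"/"R" command framing are illustrative
# assumptions, not the published UART protocol.
import struct

import serial  # pyserial

# Assumed register map for one TG channel.
REG_RW_RATIO  = 0x00  # read/write mix (percent reads)
REG_ADDR_MODE = 0x01  # 0 = sequential, 1 = random
REG_BURST_LEN = 0x02  # AXI4 burst length, 1-128 beats
REG_START     = 0x03  # write 1 to launch a test batch
CNT_BYTES     = 0x10  # bytes-transferred counter
CNT_CYCLES    = 0x11  # elapsed AXI-clock-cycle counter

def write_reg(uart: serial.Serial, addr: int, value: int) -> None:
    """Send an assumed 'W <addr> <value>' frame (little-endian)."""
    uart.write(struct.pack("<BBI", ord("W"), addr, value))

def read_reg(uart: serial.Serial, addr: int) -> int:
    """Send an assumed 'R <addr>' frame and parse a 32-bit reply."""
    uart.write(struct.pack("<BB", ord("R"), addr))
    return struct.unpack("<I", uart.read(4))[0]

if __name__ == "__main__":
    uart = serial.Serial("/dev/ttyUSB0", 115200, timeout=1.0)
    write_reg(uart, REG_RW_RATIO, 100)  # 100% reads
    write_reg(uart, REG_ADDR_MODE, 0)   # sequential addressing
    write_reg(uart, REG_BURST_LEN, 32)  # 32-beat bursts
    write_reg(uart, REG_START, 1)       # launch the batch
    # ... poll for completion, then read back the hardware counters ...
    nbytes, cycles = read_reg(uart, CNT_BYTES), read_reg(uart, CNT_CYCLES)
    f_axi_hz = 300e6                    # assumed AXI clock, for illustration
    print(f"throughput = {nbytes / (cycles / f_axi_hz) / 1e9:.2f} GB/s")
```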
2. DDR4 Benchmark: Analytical Performance Model and Experimental Results
Formalization of performance evaluation hinges on the following core metrics (Galimberti et al., 26 Jan 2025):
- Theoretical Peak Bandwidth per Channel: $BW_{peak} = \dfrac{DR \times W}{8}$, where $DR$ is the data rate in MT/s and $W$ is the bus width in bits, giving $BW_{peak}$ in MB/s.
- Aggregate Bandwidth: $BW_{agg} = \sum_{i=1}^{N_{ch}} BW_{peak,i}$ over the $N_{ch}$ channels.
- Measured Bandwidth: $BW_{meas} = \dfrac{B_{transferred}}{t_{elapsed}}$, computed from the in-fabric byte and cycle counters.
- Average latency per beat: $\bar{L} = \dfrac{t_{elapsed}}{N_{beats}}$
- Utilization: $U = \dfrac{BW_{meas}}{BW_{peak}}$
A worked numerical example using these definitions follows.
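As a quick sanity check of the definitions above, the sketch below computes per-channel peak bandwidth, aggregate bandwidth, and utilization; the 32-bit bus width and the measured throughput are illustrative assumptions (chosen to land in the range of the table below), not values asserted by the paper.

```python
# Worked example of the bandwidth metrics. The bus width and measured
# throughput below are illustrative assumptions.
def peak_bw_gbs(data_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: DR (MT/s) * W (bits) / 8."""
    return data_rate_mts * bus_width_bits / 8 / 1000.0  # MB/s -> GB/s

DR, W, N_CH = 1600, 32, 3         # DDR4-1600, assumed 32-bit bus, 3 channels

bw_peak = peak_bw_gbs(DR, W)      # per-channel theoretical peak
bw_agg = N_CH * bw_peak           # aggregate over identical channels
bw_meas = 6.29                    # e.g., sequential read, 128-beat bursts
utilization = bw_meas / bw_peak

print(f"peak/channel = {bw_peak:.2f} GB/s")  # 6.40 GB/s
print(f"aggregate    = {bw_agg:.2f} GB/s")   # 19.20 GB/s
print(f"utilization  = {utilization:.1%}")   # 98.3%
```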
Empirical tests on the Kintex UltraScale 115 (with JEDEC DDR4-1600 to DDR4-2400 daughterboards) reveal:
| Burst Length | Seq Read (GB/s) | Rand Read (GB/s) | Seq Write (GB/s) | Rand Write (GB/s) |
|---|---|---|---|---|
| 1 | 3.08 | 0.56 | 3.03 | 0.42 |
| 4 | 6.20 | 2.24 | 6.00 | 1.66 |
| 32 | 6.27 | 6.08 | 6.03 | 5.79 |
| 128 | 6.29 | 6.30 | 6.04 | 6.04 |
Key findings: random single-beat access is 5–7× slower than sequential access due to activate/precharge overhead (from the table, 3.08/0.56 ≈ 5.5× for reads and 3.03/0.42 ≈ 7.2× for writes); performance saturates at burst lengths of 32 and above, especially for sequential patterns. Throughput scales linearly with data rate for long sequential bursts, but improvements are muted (sub-linear) for small random accesses (Galimberti et al., 26 Jan 2025).
3. DDR4 Benchmark: Implementation in Data-Center FPGAs and Practical Considerations
For deployment on Kintex UltraScale 115:
- Channels and Data Rates: Up to 3 physically distinct memory channels at DDR4-1600/1866/2133/2400 MT/s.
- Logic and BRAM Utilization (Triple-channel build): 38,797 LUTs, 52,457 FFs, 76.5 BRAMs, 9 DSPs.
- Floorplanning: Each channel occupies a localized FPGA quadrant, minimizing I/O route length; host logic resides centrally.
- Operation: Standalone over UART, no host drivers, with hardware counters and error-checking active at all times.
Strengths include flexible runtime configuration (without re-synthesis), per-channel independence, and extensibility. The main limitation is that the design is tied to the Xilinx/AMD MIG PHY and does not natively support LPDDR4, DDR5, or HBM; re-targeting would require replacing the memory interface modules and carefully revising the timing logic (Galimberti et al., 26 Jan 2025).
4. Crop Disease Diagnosis and Management Benchmark: Dataset and Tasks
CDDMBench, as introduced by Liu et al., is a multimodal benchmark dataset for fine-grained crop disease diagnosis and management, tailored for the development and evaluation of large vision-language models (LVLMs) (Liu et al., 10 Mar 2025, Zhang et al., 31 Dec 2025).
- Corpus: 137,000 images (62,000 web-sourced, 75,000 from private field surveys), spanning 16 crop species and 60 disease classes. Each image is expert-annotated with crop/disease type and a free-text appearance description.
- QA Dataset: Over 1,000,000 question-answer pairs, comprising (A) crop-disease diagnosis QA (8 per image) and (B) domain knowledge QA. Coverage includes classification, symptomatology, severity, management, and etiology.
- Task Definition:
- Crop Disease Diagnosis: Classify both crop and disease per image.
- Knowledge QA: Generate expert-level management answers in response to image-plus-question prompts.
- Format and Variation: Images cover diverse geographical, cultivar, and seasonal domains, introducing natural domain shift. All labels and QA pairs are expert-generated and reviewed (a sketch of a plausible record layout follows this list).
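To make the dataset structure concrete, the sketch below shows one plausible representation of a diagnosis-QA record and a simple loader; the field names, JSON-lines layout, and file name are hypothetical assumptions, not the published distribution format.

```python
# Hypothetical record layout for one CDDMBench diagnosis-QA example.
# Field names and serialization are illustrative assumptions; the paper
# describes the annotation content, not this exact on-disk format.
import json
from dataclasses import dataclass

@dataclass
class CDDMRecord:
    image_path: str       # path to the field/web image
    crop: str             # one of 16 crop species
    disease: str          # one of 60 disease classes
    appearance: str       # expert free-text symptom description
    qa_pairs: list[dict]  # e.g., {"question": ..., "answer": ...}

def load_records(path: str) -> list[CDDMRecord]:
    """Load records from an assumed JSON-lines file, one record per line."""
    with open(path, encoding="utf-8") as f:
        return [CDDMRecord(**json.loads(line)) for line in f]

# Example: build a per-crop evaluation subset.
# tomato = [r for r in load_records("cddm_train.jsonl") if r.crop == "tomato"]
```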
5. Evaluation Protocols and Baselines in Crop Disease Benchmark
Standardized benchmark splits define a 3,000-image test set for classification and a knowledge-QA test set of image–question pairs spanning 10 disease types (20 QA instances in total) (Zhang et al., 31 Dec 2025).
- Diagnosis Metrics: Strict keyword-matched accuracy for both crop and disease; a prediction counts as correct only if the ground-truth keyword appears in the model output, giving $\text{Acc} = \dfrac{N_{correct}}{N_{total}} \times 100\%$.
- Knowledge QA: Each answer is GPT-4-judged over 5 dimensions (accuracy, completeness, specificity, practical relevance, scientific validity), with the per-dimension scores summed and re-normalized to a 0–100 scale: $S = \dfrac{100}{5\,s_{max}} \sum_{d=1}^{5} s_d$, where $s_d$ is the judge's score on dimension $d$ and $s_{max}$ the per-dimension maximum.
- Baseline Performance (evaluated without captions; table from (Zhang et al., 31 Dec 2025)):
| Model | Crop Acc (%) | Disease Acc (%) | QA Score |
|---|---|---|---|
| Qwen-VL-Chat | 28.55 | 5.80 | 41.5 |
| GPT-5-Nano | 47.00 | 11.00 | 65.0 |
Fine-tuned models (e.g., Qwen-VL-Chat-AG LoRA, LLaVA-AG LoRA) achieve up to 98.0% crop and 91.8% disease classification accuracy, and knowledge-QA scores of up to 98/100 (Liu et al., 10 Mar 2025). LoRA fine-tuning applied jointly to the vision encoder, adapter, and LLM is essential for maximizing accuracy. A minimal sketch of both metrics appears below.
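The sketch below illustrates the two evaluation metrics under stated assumptions: verbatim, case-insensitive keyword matching for diagnosis accuracy, and a sum-and-renormalize aggregation of judge scores for knowledge QA. The helper names and the 10-point per-dimension scale are hypothetical.

```python
# Minimal sketch of the two CDDMBench evaluation metrics. The matching
# rule and the 10-point per-dimension scale are assumptions for
# illustration; the benchmark defines the metrics, not this code.

def diagnosis_accuracy(predictions: list[str], keywords: list[str]) -> float:
    """Strict keyword-matched accuracy: a prediction is correct only if
    the ground-truth keyword appears verbatim (case-insensitive)."""
    correct = sum(
        kw.lower() in pred.lower()
        for pred, kw in zip(predictions, keywords)
    )
    return 100.0 * correct / len(keywords)

def qa_score(dim_scores: list[float], s_max: float = 10.0) -> float:
    """Sum the five judge-assigned dimension scores (accuracy,
    completeness, specificity, practical relevance, scientific
    validity) and renormalize to a 0-100 scale."""
    assert len(dim_scores) == 5
    return 100.0 * sum(dim_scores) / (5 * s_max)

# Toy usage:
preds = ["The leaf shows tomato late blight.", "Likely powdery mildew."]
truth = ["late blight", "rust"]
print(diagnosis_accuracy(preds, truth))  # 50.0
print(qa_score([9, 8, 7, 9, 8]))         # 82.0
```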
6. Research Applications, Limitations, and Future Directions
CDDMBench (FPGA DDR4) enables quantification of achieved DDR4 bandwidth against theoretical limits and exposes design tradeoffs (burst length, traffic mix, data rate, and random vs. sequential access) essential for data-center architects. The measurement equations and in-FPGA counters are directly portable to next-generation memory systems; extension to LPDDR4, DDR5, or HBM would require interface reimplementation and careful timing revalidation. The standalone UART interface is readily adapted for use with networked test harnesses in multi-tenant and cloud environments (Galimberti et al., 26 Jan 2025).
CDDMBench (Crop Disease) facilitates evaluation of LVLMs for domain-robust, high-accuracy agri-diagnostic systems. The benchmark's scale and multimodal QA task design permit the application and assessment of state-of-the-art models under various adaptation regimes, including LoRA. Identified limitations include persistent challenges with out-of-distribution (OOD) disease generalization, class imbalance, and the absence of multi- or hyperspectral data. Ongoing work addresses these through severity regression, integration of additional sensing modalities, and fine-grained dialogue expansion (Liu et al., 10 Mar 2025, Zhang et al., 31 Dec 2025).
7. Comparative Perspective and Significance
CDDMBench, whether in the FPGA memory or crop disease context, is exemplary for its synthesis of open, high-complexity benchmarking infrastructure and its role in closing the simulation-to-reality gap. Across both domains, CDDMBench systems provide robust methodologies for stress-testing current models and platforms, yielding actionable insights for both hardware and algorithmic improvement. These benchmarks, with their large-scale data, rigorous evaluation protocols, and extensibility, serve as blueprints within their respective fields for future benchmarking frameworks, and as standard platforms against which new models, architectures, or FPGA subsystems can be quantifiably compared (Galimberti et al., 26 Jan 2025, Liu et al., 10 Mar 2025, Zhang et al., 31 Dec 2025).