A Statically and Dynamically Scalable Soft GPGPU (2401.04261v1)
Abstract: Current soft processor architectures for FPGAs do not utilize the potential of the massive parallelism available. FPGAs now support many thousands of embedded floating point operators, and have similar computational densities to GPGPUs. Several soft GPGPU or SIMT processors have been published, but the reported large areas and modest Fmax make their widespread use unlikely for commercial designs. In this paper we take an alternative approach, building the soft GPU microarchitecture around the FPGA resource mix available. We demonstrate a statically scalable soft GPGPU processor (where both parameters and feature set can be determined at configuration time) that always closes timing at the peak speed of the slowest embedded component in the FPGA (DSP or hard memory), with a completely unconstrained compile into a current Intel Agilex FPGA. We also show dynamic scalability, where a subset of the thread space can be specified on an instruction-by-instruction basis. For one example core type, we show a logic range, depending on the configuration, of 4k to 10k ALMs, along with 24 to 32 DSP Blocks and 50 to 250 M20K memories. All of these instances close timing at 771 MHz, a performance level limited only by the DSP Blocks. We describe our methodology for reliably achieving this clock rate by matching the processor pipeline structure to the physical structure of the FPGA fabric. We also benchmark several algorithms across a range of data sizes, and compare to a commercial soft RISC processor.
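The distinction between static and dynamic scalability can be illustrated with a small behavioural model. This is a sketch only, not the paper's design: the names `CoreConfig`, `Instruction`, `active_threads`, and `execute` are hypothetical, and the assumption that the dynamic subset is a contiguous prefix of the thread space is ours, since the abstract does not say how the subset is encoded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CoreConfig:
    """Static parameters, fixed at configuration time (hypothetical names)."""
    num_threads: int = 16


@dataclass(frozen=True)
class Instruction:
    """One SIMT operation plus a per-instruction thread count (assumed encoding)."""
    op: Callable[[int], int]
    active_threads: int  # dynamic: may differ for every instruction


def execute(cfg: CoreConfig, regs: List[int], instr: Instruction) -> List[int]:
    """Apply instr.op to the first active_threads lanes; leave the rest unchanged."""
    if len(regs) != cfg.num_threads:
        raise ValueError("register file size must match the static thread count")
    if not 0 <= instr.active_threads <= cfg.num_threads:
        raise ValueError("active_threads exceeds the statically configured threads")
    return [
        instr.op(value) if tid < instr.active_threads else value
        for tid, value in enumerate(regs)
    ]


cfg = CoreConfig(num_threads=8)           # static choice
regs = list(range(8))                     # one register per thread
regs = execute(cfg, regs, Instruction(op=lambda x: x * 2, active_threads=8))
regs = execute(cfg, regs, Instruction(op=lambda x: x + 1, active_threads=4))
```

In this model the first instruction runs on all eight threads and the second on only the first four, so the thread count in use changes between instructions while the configured core stays the same.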
Authors: Martin Langhammer, George A. Constantinides