TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios (2405.04988v1)
Abstract: Radio Access Network (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Because they offer flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs. The staggering performance requirements, however, demand a high number of PEs coupled with extreme Power, Performance, and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of TeraPool-SDR, a cluster for Software-Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs that share a global view of a 4 MiB, 4096-banked L1 memory. We report several feasible configurations of TeraPool-SDR featuring an ultra-high-bandwidth PE-to-L1-memory interconnect, clocked at 730 MHz, 880 MHz, and 924 MHz (TT/0.80 V/25 °C) in 12 nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all key SDR kernels for 5G RANs: Fast Fourier Transform (93 GOPS/W), Matrix Multiplication (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W). For all kernels, it consumes less than 10 W, in compliance with industry standards.
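The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the constant `OPS_PER_PE_PER_CYCLE` is an assumption (one multiply-accumulate per PE per cycle, counted as two operations), and the helper names are hypothetical. Under that assumption, the fastest reported clock reproduces the 1.89 TOPS quoted in the title.

```python
# Figures quoted in the abstract.
NUM_PES = 1024
L1_BYTES = 4 * 1024 * 1024      # 4 MiB shared L1
NUM_BANKS = 4096
CLOCKS_MHZ = [730, 880, 924]    # reported configurations

# ASSUMPTION (not stated in the abstract): each PE completes one
# multiply-accumulate per cycle, counted as 2 operations.
OPS_PER_PE_PER_CYCLE = 2


def bank_size_bytes() -> int:
    """Capacity of a single L1 bank, in bytes."""
    return L1_BYTES // NUM_BANKS


def banks_per_pe() -> int:
    """Number of L1 banks per PE (the banking factor)."""
    return NUM_BANKS // NUM_PES


def peak_gops(clock_mhz: float) -> float:
    """Peak throughput in GOPS under the ops-per-cycle assumption above."""
    return NUM_PES * OPS_PER_PE_PER_CYCLE * clock_mhz / 1000.0


if __name__ == "__main__":
    print(f"bank size   : {bank_size_bytes()} B")
    print(f"banks per PE: {banks_per_pe()}")
    for mhz in CLOCKS_MHZ:
        print(f"{mhz} MHz -> {peak_gops(mhz) / 1000.0:.2f} TOPS peak")
```

The per-kernel GOPS/W figures are measured efficiencies rather than peak numbers, so this sketch does not attempt to derive them.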