TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios (2405.04988v1)

Published 8 May 2024 in cs.DC and cs.AR

Abstract: Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but staggering performance requirements demand a high number of PEs coupled with extreme Power, Performance and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of TeraPool-SDR, a cluster for Software Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs, sharing a global view of a 4MiB, 4096-banked, L1 memory. We report various feasible configurations of TeraPool-SDR featuring an ultra-high bandwidth PE-to-L1-memory interconnect, clocked at 730MHz, 880MHz, and 924MHz (TT/0.80 V/25 °C) in 12nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all SDR key kernels for 5G RANs: Fast Fourier Transform (93GOPS/W), Matrix-Multiplication (125GOPS/W), Channel Estimation (96GOPS/W), and Linear System Inversion (61GOPS/W). For all the kernels, it consumes less than 10W, in compliance with industry standards.


Summary

  • The paper introduces TeraPool-SDR, a novel architecture with 1024 RISC-V cores and shared 4MiB L1 memory that achieves high energy efficiency (e.g., 93 GOPS/W for FFT) in SDR workloads.
  • It employs a hierarchical design and robust interconnection networks to minimize latency and optimize data access across multiple computational tiles.
  • The scalable, open-source design of TeraPool-SDR paves the way for innovations in power-sensitive, next-generation software-defined radio applications.

Exploring TeraPool-SDR: Next-Gen Architecture for Software-Defined Radios

Introduction to TeraPool-SDR

As 5G and later standards grow in the number of antennas and sub-carriers, they place heavy demands on the processing power of software-defined radios (SDRs), driving the need for efficient, scalable, and flexible computing platforms. The TeraPool-SDR architecture, developed by researchers from ETH Zurich, is a cluster of 1024 latency-tolerant, compact RV32 RISC-V cores that share a global view of a 4 MiB, 4096-banked L1 memory. It targets the throughput required by next-generation software-defined radios while consuming less than 10 W on all evaluated kernels.

Architectural Innovations

TeraPool-SDR introduces several key architectural innovations designed to handle large-scale computations efficiently:

  • Shared-L1 Cluster: All cores share a single L1 memory, which reduces the inter-cluster data synchronization and workload distribution overheads that are common in many-core systems built from multiple smaller clusters.
  • Hierarchical Design: Multiple hierarchical levels (Tiles, SubGroups, and Groups) are used to manage the interconnectivity between cores and memory banks, enabling efficient data access and high-bandwidth communication within the cluster.
  • Robust Interconnection Network: Employing fully connected crossbar and logarithmic crossbar networks at various levels, the architecture ensures low-latency communication between the processing elements and the memory banks.
  • Advanced Physical Design: The cluster is fully implemented in 12nm FinFET technology, with feasible configurations clocked at 730 MHz, 880 MHz, and 924 MHz (TT/0.80 V/25 °C).
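
The memory figures above imply some simple arithmetic that is worth spelling out. The sketch below derives the per-bank capacity and the bank-to-core ratio from the numbers in the abstract (1024 cores, 4096 banks, 4 MiB), and shows one plausible way a byte address could map to a bank. The word-level interleaving, the 32-bit word size, and the function name `bank_of` are assumptions made for illustration; the text here does not specify the actual address mapping.

```python
# Figures taken from the abstract.
CORES = 1024
BANKS = 4096
L1_BYTES = 4 * 1024 * 1024  # 4 MiB

# Assumption: 32-bit words, matching the RV32 cores.
WORD_BYTES = 4

BANK_BYTES = L1_BYTES // BANKS           # capacity of one bank in bytes
BANKS_PER_CORE = BANKS // CORES          # banks available per core
ROWS_PER_BANK = BANK_BYTES // WORD_BYTES # words stored in one bank


def bank_of(addr: int) -> tuple[int, int]:
    """Map a byte address to (bank index, row within bank).

    Hypothetical mapping: consecutive words go to consecutive banks
    (word-level interleaving), wrapping around after the last bank.
    """
    if not 0 <= addr < L1_BYTES:
        raise ValueError("address outside the 4 MiB L1 region")
    word = addr // WORD_BYTES
    return word % BANKS, word // BANKS


if __name__ == "__main__":
    print(f"{BANK_BYTES} bytes per bank, {BANKS_PER_CORE} banks per core")
    print("address 0x10 ->", bank_of(0x10))
```

Under this assumed mapping, a core streaming through a contiguous array touches a different bank on every word, which is one common way to spread accesses across many banks and limit conflicts.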

Performance Metrics and Efficiency

TeraPool-SDR reports high energy efficiency, measured in giga-operations per second per watt (GOPS/W), on the key SDR kernels for 5G RANs:

  • FFT: 93 GOPS/W.
  • Matrix Multiplication: 125 GOPS/W.
  • Channel Estimation: 96 GOPS/W.
  • Linear System Inversion: 61 GOPS/W.

For all of these kernels the cluster consumes less than 10 W, which the authors describe as compliant with industry standards.
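
To put these figures in context, the following sketch does two back-of-the-envelope calculations. First, it multiplies each reported efficiency by the 10 W cap to get a loose upper bound on sustained throughput; this is only a ceiling, since the source says the kernels draw less than 10 W without giving exact values. Second, it estimates peak throughput from the core count and clock frequency, assuming each core retires 2 operations per cycle (for example one multiply-accumulate). That factor of 2 is an assumption, not something stated in the text, although the result is consistent with the 1.89 TOPS in the paper title. The function names are hypothetical.

```python
# Energy efficiency per kernel in GOPS/W, as reported in the abstract.
EFFICIENCY_GOPS_PER_W = {
    "FFT": 93,
    "Matrix Multiplication": 125,
    "Channel Estimation": 96,
    "Linear System Inversion": 61,
}

POWER_CAP_W = 10.0  # kernels are reported to consume less than this


def throughput_ceiling_gops(gops_per_w: float, power_w: float = POWER_CAP_W) -> float:
    """Upper bound on throughput if the kernel drew the full power cap."""
    return gops_per_w * power_w


def peak_tops(cores: int, freq_mhz: float, ops_per_cycle: int = 2) -> float:
    """Peak throughput in TOPS.

    Assumption: every core completes `ops_per_cycle` operations each
    cycle (2 models one multiply-accumulate per cycle).
    """
    return cores * freq_mhz * 1e6 * ops_per_cycle / 1e12


if __name__ == "__main__":
    for kernel, eff in EFFICIENCY_GOPS_PER_W.items():
        print(f"{kernel}: below {throughput_ceiling_gops(eff):.0f} GOPS")
    for mhz in (730, 880, 924):
        print(f"{mhz} MHz: about {peak_tops(1024, mhz):.2f} TOPS peak")
```

The gap between the per-kernel ceilings and the estimated peak is expected: real kernels spend cycles on memory accesses, synchronization, and control flow rather than on arithmetic alone.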

Implications and Future Directions

TeraPool-SDR advances the architectural design for complex SDR workloads and shows that a many-core cluster can combine scalability with energy efficiency:

  • Scalability: The capacity to scale up to 1024 processing elements without significant efficiency losses is critical as the complexity and size of data processing tasks continue to grow in the telecom sector.
  • Energy Efficiency: Maintaining high performance at minimal energy costs is crucial for the widespread deployment of SDRs in energy-sensitive environments.
  • Flexibility and Open-Source Accessibility: As an open-source platform, TeraPool-SDR gives the SDR and broader telecom community a base for further innovation and collaborative development.

Closing Thoughts

As software-defined radio technology advances, architectures like TeraPool-SDR represent an important step forward. They combine the performance, scalability, and energy efficiency needed to meet the growing demands of modern telecommunication infrastructure. With ongoing research and development, the capabilities of platforms like TeraPool-SDR can be expected to expand further, supporting new SDR technologies and applications.