RISC-V Soft-Cores: Architecture & Performance
- RISC-V based soft-cores are synthesizable processor designs implementing an open ISA on FPGAs, offering a range from minimalist pipelines to advanced superscalar architectures.
- They use techniques such as optimized branch prediction and dynamic micro-decoding to improve throughput, energy efficiency, and real-time performance.
- Advanced designs integrate custom ISA extensions and accelerators for specialized applications in IoT, automotive, AI, and space, while modular, parameterized construction aims to keep them portable and scalable.
RISC-V based soft-cores are synthesizable processor designs that implement the open RISC-V instruction set architecture (ISA) and are realized on reconfigurable logic, primarily FPGAs. They range from minimalist 2-stage pipelines for ultra-compact deployments to superscalar and heterogeneous designs that support hundreds of cores, accelerators, and advanced cache subsystems. Because the ISA is modular and open, it encourages rapid development and high flexibility. It also enables research into parallel processing, custom ISAs, safety features, and domain-specific hardware extensions. The resulting landscape spans innovations in architectural efficiency, real-time performance, design portability, heterogeneous computing, and hardware–software co-design frameworks.
1. Microarchitectural Diversity and Base Pipeline Structures
RISC-V soft-core implementations vary widely in pipeline depth, resource optimization, and feature set. Minimalist designs, such as the GRVI RV32I core in GRVI Phalanx, reduce pipeline stages and remove rarely used instructions to minimize LUT usage. This yields approximately 0.7 MIPS per LUT and supports up to 400 cores per FPGA at 100,000 MIPS aggregate throughput (Gray, 2016). Other efforts adopt a classic 5-stage pipeline, as in RVCoreP. Its optimizations include pipelined branch prediction, one-hot ALU encoding, and align/extend logic, which raise the maximum clock frequency by 10–30% and improve performance by up to 30% over contemporary open-source designs (Miyazaki et al., 2020).
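The GRVI Phalanx figures can be cross-checked with simple arithmetic. The sketch below only divides the numbers stated above. It assumes that the 0.7 MIPS/LUT figure applies per core, so the printed values are derived ratios rather than measurements.

```python
# Back-of-the-envelope check of the GRVI Phalanx figures quoted in the text.
# The inputs are the stated numbers. The outputs are derived ratios, not data from Gray (2016).

AGGREGATE_MIPS = 100_000   # stated aggregate throughput
CORES = 400                # stated cores per FPGA
MIPS_PER_LUT = 0.7         # stated efficiency (approximate)


def per_core_mips(aggregate_mips: float, cores: int) -> float:
    """Average throughput each core must sustain."""
    return aggregate_mips / cores


def implied_luts_per_core(core_mips: float, mips_per_lut: float) -> float:
    """LUT budget per core implied by the MIPS-per-LUT efficiency figure."""
    return core_mips / mips_per_lut


core_mips = per_core_mips(AGGREGATE_MIPS, CORES)
luts = implied_luts_per_core(core_mips, MIPS_PER_LUT)
print(f"{core_mips:.0f} MIPS per core, ~{luts:.0f} LUTs per core")
```

The implied budget of a few hundred LUTs per core is consistent with the minimalist design goal described above.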
Intermediate designs trade some instruction latency for area or frequency efficiency. NoX, a 4-stage pipeline core, merges the load/store and writeback stages and provides full data bypassing. It is designed for plug-and-play integration within MPSoCs, fits within a 27.0 kGE area budget, and attains 2.50 CoreMark/MHz (Silva et al., 25 Jun 2024). More advanced pipelines incorporate out-of-order execution and dual-issue superscalar designs, as in CVA6S+. This 64-bit core combines 2-issue fetch with aggressive register renaming and reaches a 43.5% throughput improvement at roughly 9% area overhead (Tedeschi et al., 20 Apr 2025).
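With full data bypassing, an instruction can consume a result before that result is written back. The following is a minimal behavioural sketch of operand selection in a generic bypass network. It is not NoX's RTL. The `InFlight` record and `read_operand` function are hypothetical names, and the sketch assumes that in-flight producers are listed youngest first.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class InFlight:
    """Result of an instruction still in the pipeline (hypothetical model)."""
    rd: int                # destination register index
    value: Optional[int]   # None if the result is not available yet


def read_operand(rs: int, regfile: Sequence[int],
                 in_flight: Sequence[InFlight]) -> Optional[int]:
    """Return the value for source register `rs`, bypassing the register file.

    `in_flight` is ordered youngest first (for example, the execute-stage result,
    then the merged load/store-writeback stage). The youngest producer of `rs`
    wins. A return value of None means that producer has no result yet and the
    consumer must stall.
    """
    if rs == 0:
        return 0  # x0 is hardwired to zero and never forwarded
    for producer in in_flight:
        if producer.rd == rs:
            return producer.value
    return regfile[rs]


regs = [0] * 32
regs[5] = 7
print(read_operand(5, regs, [InFlight(5, 42), InFlight(5, 9)]))  # youngest producer wins
```

Merging stages, as NoX does, shortens the list of producers that must be compared. This is one reason a shallower pipeline can keep the bypass logic small.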
2. Parallel and Heterogeneous Architectures
Soft-core fabric architectures have evolved to maximize both raw throughput and flexibility through cluster organization, parallel memory systems, and network-on-chip (NoC) interconnects. GRVI Phalanx groups 8 minimal RISC-V cores per cluster around shared, interleaved, banked memories and connects the clusters with the high-bandwidth Hoplite NoC. This organization sustains up to 600 GB/s of memory bandwidth and 700 Gbps of NoC bisection bandwidth and supports SPMD and MIMD workloads efficiently (Gray, 2016). MPSoC frameworks such as ANDROMEDA use parameterized Rocket cores connected tile by tile in grid-topology NoCs, with distributed BRAM-based memory, scalable cache hierarchies, and custom switching mechanisms. High-level system parameters control resource allocation, which permits rapid exploration of core counts, cache sizes, and NoC configurations to match bandwidth or processing requirements (Merchant et al., 2021).
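Hoplite is generally described as a deflection-routed, unidirectional 2D torus that uses dimension-ordered (X-then-Y) routing. The sketch below models only the contention-free path of a packet. It assumes that both rings advance in the +1 direction. A real router may deflect a packet that loses arbitration, which lengthens the path. The function name `hoplite_path` is hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hoplite_path(src: Coord, dst: Coord, width: int, height: int) -> List[Coord]:
    """Routers visited on a unidirectional 2D torus with X-then-Y routing.

    Models only the contention-free case. Deflections under contention are ignored.
    """
    for (x, y) in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError("coordinate outside the torus")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # travel the X ring (one direction only)
        x = (x + 1) % width
        path.append((x, y))
    while y != dst[1]:              # then turn onto the Y ring
        y = (y + 1) % height
        path.append((x, y))
    return path


route = hoplite_path((3, 0), (1, 2), width=4, height=4)
print(len(route) - 1, "hops:", route)
```

The links are unidirectional, so the hop count is ((dx - sx) mod W) + ((dy - sy) mod H). A destination one column "behind" the source therefore costs almost a full ring traversal. This is the price of the very small router this topology allows.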
JuxtaPiton is an example of heterogeneous-ISA and heterogeneous-microarchitecture integration. It places a PicoRV32 core (RV32I, multicycle) alongside the OpenSPARC T1 (SPARC v9) within the OpenPiton manycore environment. Transducers mediate memory transactions and resolve endianness differences, which enables research into shared-memory heterogeneous-ISA systems (Lim et al., 2018). In safety-critical, real-time domains, RISC-V cores are instead organized into triple-core lockstep (TCLS) architectures such as SentryCore. These combine lockstep voting, ECC-protected memory, and deterministic context switching in under 110 clock cycles to meet automotive or industrial reliability constraints (Rogenmoser et al., 16 May 2024).
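Triple-core lockstep depends on a voter that compares the outputs of the three cores. The following is a minimal sketch of a bitwise 2-of-3 majority voter that also identifies a single disagreeing core. SentryCore's actual voting granularity and recovery sequence may differ, and the function name `majority_vote` is hypothetical.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFF_FFFF


def majority_vote(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Bitwise 2-of-3 vote over three 32-bit core outputs.

    Returns (voted_value, faulty_core). faulty_core is the index of the single
    core whose output differs from the voted value, or None if all three agree.
    Raises ValueError when more than one core differs from the voted value,
    because no single faulty core can then be identified.
    """
    a, b, c = a & MASK32, b & MASK32, c & MASK32
    voted = (a & b) | (a & c) | (b & c)   # each bit takes the majority value
    mismatches = [i for i, v in enumerate((a, b, c)) if v != voted]
    if not mismatches:
        return voted, None
    if len(mismatches) == 1:
        return voted, mismatches[0]
    raise ValueError("multiple cores disagree; no single faulty core")


# Core 1 suffers a single-bit upset. The voter masks the upset and flags core 1.
print(majority_vote(0xDEADBEEF, 0xDEADBEEF ^ 0x10, 0xDEADBEEF))
```

In a lockstep design, the flagged core would then typically be resynchronized from the state of the other two cores.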
3. ISA Extensions, Overlays, and Accelerator Integration
The open nature of RISC-V underpins rapid innovation in custom extensions, overlays, and tightly coupled accelerators. The overlay described in (Ng et al., 2016) features a 4-stage pipelined RV32I core with direct accelerator coupling. It uses a shared DMEM and custom instructions (BAA/RPA) under the Multiple Runtime Architecture Computer (MURAC) model. This approach achieves near hardware-only performance with ~18% lower resource usage than traditional 5-stage designs and provides zero-overhead switching between processor and accelerator execution.
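The text does not specify how BAA and RPA are encoded. As an illustration only: the RISC-V base ISA reserves the custom-0 major opcode (0b0001011) for non-standard extensions, and the sketch below packs two hypothetical accelerator-switch instructions into standard R-type fields under that opcode. The field assignments (`HYP_BAA`, `HYP_RPA`, the choice of rs1 and funct3) are assumptions and are not the encoding used by Ng et al.

```python
CUSTOM_0 = 0b0001011  # RISC-V major opcode reserved for custom extensions


def encode_r_type(opcode: int, rd: int, funct3: int,
                  rs1: int, rs2: int, funct7: int) -> int:
    """Pack fields into a 32-bit RISC-V R-type instruction word."""
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# Hypothetical encodings: funct3 selects the direction of the switch, and rs1 (a0 = x10)
# could hold, for example, an accelerator entry point. These are illustrative only.
HYP_BAA = encode_r_type(CUSTOM_0, rd=0, funct3=0b000, rs1=10, rs2=0, funct7=0)
HYP_RPA = encode_r_type(CUSTOM_0, rd=0, funct3=0b001, rs1=0, rs2=0, funct7=0)
print(f"{HYP_BAA:#010x} {HYP_RPA:#010x}")
```

Placing such instructions in a custom opcode space keeps them from colliding with current or future standard extensions. Toolchains can then emit them through assembler directives without modifying the compiler.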
Dynamic micro-decoder units demonstrate advanced custom-instruction support. They insert an additional pipeline stage between decode and issue that translates macro-instructions into sequences of microinstructions at runtime. The unit consists of a microinstruction ROM, an FSM, and a synchronization FIFO. It enables dynamic patching, binary compression, and obfuscation at a modest 4% LUT/FF area cost (Pottier et al., 21 Jun 2024). Reconfigurable vector SIMD extensions and soft SIMD instructions are realized through custom instruction types (I′, S′) and flexible HDL templates. These reduce instruction count in data-parallel workloads and support high-throughput streaming with wide, multi-banked cache blocks (Papaphilippou et al., 2021).
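The macro-to-micro expansion can be pictured with a toy model. In the sketch below, a dictionary stands in for the microinstruction ROM, a loop stands in for the sequencing FSM, and a deque stands in for the synchronization FIFO that feeds issue. The class name `MicroDecoder` and the macro mnemonic `MACC` are hypothetical. The model shows only the behaviour, not the hardware of Pottier et al.

```python
from collections import deque
from typing import Deque, Dict, List


class MicroDecoder:
    """Toy model of a ROM-driven micro-decoder stage (hypothetical).

    Macro-instructions found in the ROM expand into their micro-op sequence.
    All other instructions pass through unchanged. The ROM is a mutable dict,
    so entries can be patched at runtime to mimic in-field updates.
    """

    def __init__(self, rom: Dict[str, List[str]]):
        self.rom = {k: list(v) for k, v in rom.items()}
        self.fifo: Deque[str] = deque()   # stands in for the FIFO feeding issue

    def patch(self, macro: str, micro_ops: List[str]) -> None:
        self.rom[macro] = list(micro_ops)

    def decode(self, instr: str) -> None:
        # The sequencing FSM is collapsed into a loop that enqueues one micro-op per step.
        for uop in self.rom.get(instr, [instr]):
            self.fifo.append(uop)

    def issue(self) -> List[str]:
        issued = list(self.fifo)
        self.fifo.clear()
        return issued


md = MicroDecoder({"MACC": ["mul t0, a0, a1", "add a2, a2, t0"]})
md.decode("MACC")
md.decode("addi a0, a0, 1")
print(md.issue())
```

Because the expansion is table-driven, replacing a ROM entry changes the behaviour of every later occurrence of the macro without touching the binary. This table-driven property is what makes patching and obfuscation possible.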
Custom arithmetic units are integrated either as core execution units or as co-processors. Examples include a full-featured Posit FPU (supporting dynamic es-mode) and mixed-precision MAC units for DNN acceleration. They deliver domain-specific gains in performance and energy efficiency. Mixed-precision inference on a RISC-V core achieves a 15× energy reduction with less than 1% DNN accuracy loss through multi-pumping and soft SIMD (Armeniakos et al., 19 Jul 2024). Tightly coupled Posit/IEEE-754 execution targets scientific and signal-processing workloads (Tiwari et al., 2019).
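Soft SIMD generally means packing several narrow operands into one wide datapath so that a single operation computes several results. The sketch below illustrates this with a generic unsigned operand-packing trick for multiplication. It is not necessarily the scheme of Armeniakos et al. The lane width `LANE_BITS` and the function name are assumptions, and signed operands would need correction terms that are omitted here.

```python
from typing import List

LANE_BITS = 16  # spacing between packed lanes (illustrative choice)


def packed_multiply(activation: int, weights: List[int]) -> List[int]:
    """Multiply one unsigned activation by several unsigned weights using a
    single wide multiplication (soft-SIMD-style operand packing)."""
    mask = (1 << LANE_BITS) - 1
    packed = 0
    for lane, w in enumerate(weights):
        # Each partial product must fit in its lane. Otherwise it would carry
        # into the neighbouring lane and corrupt that result.
        if activation < 0 or w < 0 or activation * w > mask:
            raise ValueError("operands must be unsigned and products must fit in a lane")
        packed |= w << (lane * LANE_BITS)
    product = activation * packed  # the one wide multiply
    return [(product >> (lane * LANE_BITS)) & mask for lane in range(len(weights))]


print(packed_multiply(200, [3, 15, 7]))
```

The trick works because multiplication distributes over the packed sum: a·(w0 + w1·2^16 + ...) = a·w0 + (a·w1)·2^16 + .... Low-precision weights therefore let one multiplier do the work of several.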
4. System-Level Integration, Design Portability, and MPSoC Ecosystems
Modularity and standardized interfaces are central to RISC-V soft-core portability and rapid system integration. Platforms such as BRISC-V support design-space exploration by composing parameterized RTL modules for cores, caches, memory, and NoCs. This allows combinatorial generation of pipeline depths, coherence protocols, replacement policies, and system topologies, with integrated tooling for RTL simulation and web-based hardware configuration (Bandara et al., 2019). Standard AXI4/AHB interfaces facilitate plug-and-play integration, as in NoX and ESP. In ESP, CVA6 multicore integration relies on AXI Coherency Extensions (ACE), L1 cache invalidation units, and L2-level atomic adapters to support Linux SMP, atomic operations, and accelerator-centric coherent execution (Zuckerman et al., 2022).
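Combinatorial generation of configurations amounts to taking the Cartesian product of the parameter choices. The following is a minimal sketch. The parameter names and values in `DESIGN_SPACE` are hypothetical and do not reflect BRISC-V's actual option set or configuration format.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical parameter space. The names and values are illustrative only.
DESIGN_SPACE: Dict[str, List[object]] = {
    "pipeline_stages": [1, 5, 7],
    "l1_replacement": ["lru", "random"],
    "coherence": ["msi", "none"],
    "topology": ["bus", "mesh"],
}


def enumerate_configs(space: Dict[str, List[object]]) -> Iterator[Dict[str, object]]:
    """Yield one dict per point in the Cartesian product of all parameter choices."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(enumerate_configs(DESIGN_SPACE))
print(len(configs), configs[0])
```

The size of the space is the product of the option counts (here 3 × 2 × 2 × 2). The space grows quickly, which is why integrated simulation and configuration tooling matter for pruning it.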
Safety- and qualification-oriented SoCs, such as the METASAT platform, couple multicore NOEL-V CPUs with domain-optimized GPUs and AI accelerators under partitioned, qualifiable software stacks (RTEMS, XtratuM) for institutional space missions. Hypervisor-managed resource sharing and an independent GPU manager provide deterministic access and fault containment, which are crucial for mixed-criticality systems (Bonet et al., 28 Feb 2025).
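Partitioning hypervisors such as XtratuM typically enforce temporal isolation through a static cyclic scheduling plan that repeats every major frame. The following is a minimal sketch of validating such a plan and looking up the active partition. The partition names, slot timings, and function names are hypothetical and do not describe METASAT's configuration.

```python
from typing import NamedTuple, Optional, Sequence


class Slot(NamedTuple):
    partition: str
    start_us: int     # offset from the start of the major frame
    duration_us: int


def validate_plan(slots: Sequence[Slot], major_frame_us: int) -> None:
    """Reject plans whose slots overlap, are empty, or spill past the major frame."""
    previous_end = 0
    for slot in sorted(slots, key=lambda s: s.start_us):
        end = slot.start_us + slot.duration_us
        if slot.start_us < previous_end or slot.duration_us <= 0 or end > major_frame_us:
            raise ValueError(f"invalid slot for partition {slot.partition!r}")
        previous_end = end


def active_partition(slots: Sequence[Slot], major_frame_us: int,
                     t_us: int) -> Optional[str]:
    """Partition that owns the CPU at absolute time t_us, or None during an idle gap."""
    offset = t_us % major_frame_us   # the plan repeats every major frame
    for slot in slots:
        if slot.start_us <= offset < slot.start_us + slot.duration_us:
            return slot.partition
    return None


MAJOR_FRAME_US = 10_000
PLAN = [Slot("control", 0, 4_000),
        Slot("gpu_manager", 4_000, 3_000),
        Slot("payload_ai", 7_000, 2_000)]
validate_plan(PLAN, MAJOR_FRAME_US)
print(active_partition(PLAN, MAJOR_FRAME_US, 25_500))
```

Because the plan is fixed offline, the slot that owns the CPU at any instant is a pure function of time. This property underpins the deterministic access that mixed-criticality certification requires.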
5. Benchmarking, Performance, and Energy Monitoring
Evaluations consistently use standardized benchmarks (CoreMark, Dhrystone reported as DMIPS, and Embench) to assess MIPS, IPC, area, frequency, and energy efficiency under representative workloads. For instance, RVCoreP-32IM demonstrates a 3.13× speed-up for core routines by integrating efficient DSP-based multipliers through a simple fork–join execution structure (Islam et al., 2020). The impact of instruction compression is shown through a fetch unit with dual program counters, which yields up to a 42.5% DMIPS gain over decompressor-based designs (Kanamori et al., 2020). CVA6S+ shows a 74.1% bandwidth improvement from cache-subsystem upgrades and up to a 43.5% throughput gain from dual issue and fine-grained forwarding, at only 9.3% area overhead (Tedeschi et al., 20 Apr 2025).
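Reported gains and area overheads can be combined into a rough performance-per-area figure. The sketch below does this for the CVA6S+ numbers quoted above. It assumes that the throughput gain and the area overhead are measured against the same baseline, and it uses the best-case ("up to") throughput figure. The result is therefore an upper-bound estimate rather than a reported metric.

```python
def relative_perf_per_area(perf_gain: float, area_overhead: float) -> float:
    """Ratio of (performance / area) after a change to the same ratio before it.

    Both arguments are fractional increases, e.g. 0.435 for +43.5 %.
    """
    return (1.0 + perf_gain) / (1.0 + area_overhead)


ratio = relative_perf_per_area(0.435, 0.093)
print(f"perf/area improves by {100 * (ratio - 1):.1f} %")
```

The two gains divide rather than subtract. A 43.5% speed-up at 9.3% extra area therefore corresponds to roughly 31% more performance per unit area, not 34%.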
Comprehensive energy measurement, crucial for design-space exploration, is achieved by hardware–software co-designed frameworks. These combine external measurement boards, shunt-resistor current sensing, and high-speed ADCs sampled by FPGA logic. Power and energy (E = V × I × Δt) are computed in real time and reported via MQTT telemetry, enabling per-core and per-node assessment in clustered deployments. This capability supports the investigation of shallow neural networks as energy-efficient surrogate models in aeronautical design scenarios (Scionti et al., 30 Sep 2025).
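For a single interval, E = V × I × Δt applies directly. With traces sampled at a fixed rate, energy becomes the sum of instantaneous power over all samples, E ≈ Σ V_k · I_k · Δt, where each current I_k follows from the shunt voltage by Ohm's law. The following is a minimal sketch. The sample rate, rail voltage, and shunt value are hypothetical, and the code is not the authors' measurement firmware.

```python
from typing import Sequence


def shunt_current(v_shunt: float, r_shunt_ohm: float) -> float:
    """Current through the shunt resistor from the voltage across it (Ohm's law)."""
    return v_shunt / r_shunt_ohm


def energy_joules(supply_v: Sequence[float], current_a: Sequence[float],
                  dt_s: float) -> float:
    """Discrete E = sum(V_k * I_k) * dt over uniformly spaced samples."""
    if len(supply_v) != len(current_a):
        raise ValueError("voltage and current traces must have the same length")
    return sum(v * i for v, i in zip(supply_v, current_a)) * dt_s


SAMPLE_RATE_HZ = 10_000                    # hypothetical ADC sample rate
N = SAMPLE_RATE_HZ                         # one second of samples
volts = [1.0] * N                          # hypothetical 1.0 V rail
amps = [shunt_current(0.05, 0.1)] * N      # 50 mV across a 0.1-ohm shunt, i.e. 0.5 A
energy = energy_joules(volts, amps, 1 / SAMPLE_RATE_HZ)
print(f"{energy:.3f} J")
```

Summing per-sample power, rather than multiplying average voltage by average current, stays accurate when current spikes coincide with supply droop.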
6. Application Domains and Future Perspectives
RISC-V soft-cores are deployed across a diverse spectrum:
- Datacenter accelerators built as high-throughput parallel fabrics (Gray, 2016).
- Low-power IoT controllers, such as Klessydra multithreaded cores in battery-operated PULPino platforms (Cheikh et al., 2017).
- Edge/IoT MPSoCs with resource-efficient plug-and-play cores such as NoX (Silva et al., 25 Jun 2024).
- Reconfigurable AI accelerators and vector engines using custom SIMD and mixed-precision MAC units (Papaphilippou et al., 2021; Armeniakos et al., 19 Jul 2024).
- Reliable automotive/avionics co-processors, such as the triple-lockstep SentryCore (Rogenmoser et al., 16 May 2024).
- Safety-qualified, partitioned SoCs for institutional space missions, such as METASAT (Bonet et al., 28 Feb 2025).
Key trends and future research directions include modular overlays supporting hybrid computation (co-integrated processors and accelerators), heterogeneous-ISA/heterogeneous-microarchitecture structures enabling workload migration and energy-aware scheduling, in-field microcode/table updates for security and binary adaptation, and the co-design of ISA, hardware blocks, and toolchains for domain-specific acceleration. The open RISC-V ISA continues to catalyze these innovations by enabling unrestricted architectural exploration, broadening both the research scope and real-world impact of FPGA-based soft-core processors.