Dual-Issue Execution of Mixed Integer and Floating-Point Workloads on Energy-Efficient In-Order RISC-V Cores

Published 26 Mar 2025 in cs.AR | (2503.20590v1)

Abstract: To meet the computational requirements of modern workloads under tight energy constraints, general-purpose accelerator architectures have to integrate an ever-increasing number of extremely area- and energy-efficient processing elements (PEs). In this context, single-issue in-order cores are commonplace, but lean dual-issue cores could boost PE IPC, especially for the common case of mixed integer and floating-point workloads. We develop the COPIFT methodology and RISC-V ISA extensions to enable low-cost and flexible dual-issue execution of mixed integer and floating-point instruction sequences. On such kernels, our methodology achieves speedups of 1.47x, reaching a peak 1.75 instructions per cycle, and 1.37x energy improvements on average, over optimized RV32G baselines.


Authors (2)

Summary

An Overview of Dual-Issue Execution of Mixed Integer and Floating-Point Workloads on Energy-Efficient In-Order RISC-V Cores

The paper investigates how dual-issue execution can raise the performance of energy-efficient in-order RISC-V cores on workloads that combine integer and floating-point operations. The work is motivated by architectures that must meet the rising computational demands of modern applications under stringent energy constraints.

Dual-issue cores are preferred over single-issue cores here because they can raise the instructions per cycle (IPC) of each processing element (PE), particularly on code that mixes integer and floating-point instructions. The proposed methodology, termed COPIFT, together with RISC-V ISA extensions, enables this dual-issue execution at low cost. On such kernels, it attains speedups of 1.47x and energy improvements of 1.37x on average over optimized RV32G baselines, with IPC peaking at 1.75.

Methodological Insights

The study introduces COPIFT (Co-Operative Parallel Integer and Floating-point Threads), a methodology for dual-issue execution on RISC-V cores. COPIFT analyzes the instruction mix of a kernel and partitions its Data Flow Graph (DFG) so as to minimize dependencies between integer and floating-point operations. Loop tiling and software pipelining then interleave the two kinds of work so that integer and floating-point instructions execute concurrently, which is the source of the performance gains.
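
The effect of tiling and software pipelining can be illustrated with a toy cycle model. The sketch below is not the paper's evaluation setup: the function names, the unit instruction latencies, the absence of synchronization overhead, and the example instruction counts are all assumptions chosen for illustration. It overlaps the integer phase of one tile with the floating-point phase of the previous tile and compares the cycle count against single-issue execution.

```python
# Toy cycle model (an illustrative assumption, not the paper's methodology).
# Each loop iteration is assumed to need `int_ops` integer instructions and
# `fp_ops` floating-point instructions, all with unit latency.

def single_issue_cycles(n_iters, int_ops, fp_ops):
    """One instruction issues per cycle, so cycles equal total instructions."""
    return n_iters * (int_ops + fp_ops)


def dual_issue_cycles(n_iters, int_ops, fp_ops, tile):
    """Software-pipelined schedule over tiles of `tile` iterations.

    The integer phase of tile t+1 runs concurrently with the FP phase of
    tile t; each pipeline step lasts as long as the slower of the two.
    """
    if n_iters == 0:
        return 0
    sizes = [min(tile, n_iters - start) for start in range(0, n_iters, tile)]
    # Prologue: the integer phase of the first tile has nothing to overlap with.
    cycles = sizes[0] * int_ops
    for t, size in enumerate(sizes):
        fp_phase = size * fp_ops
        next_int_phase = sizes[t + 1] * int_ops if t + 1 < len(sizes) else 0
        cycles += max(fp_phase, next_int_phase)
    return cycles


baseline = single_issue_cycles(1024, int_ops=3, fp_ops=4)
pipelined = dual_issue_cycles(1024, int_ops=3, fp_ops=4, tile=64)
print(baseline, pipelined, f"{baseline / pipelined:.2f}")  # 7168 4288 1.67
```

In this model the speedup is bounded by 2x and approaches that bound only when the integer and floating-point phases are balanced; the prologue and any imbalance between the phases account for the remaining gap.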

Moreover, Stream Semantic Registers (SSRs) and their extended form, Indirection SSRs (ISSRs), play a crucial role in relieving memory access bottlenecks: they stream data directly between memory and registers, which keeps compute utilization high. This removes the explicit load/store instruction overhead of traditional designs, so memory accesses no longer depend entirely on the core's instruction pipeline.
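
A small software model can make the idea concrete. The sketch below is a hypothetical illustration, not the SSR hardware interface: the `StreamReader` class, its parameters, and the per-iteration instruction counts are assumptions. Reading from a stream returns the next element of a preconfigured address pattern (affine, or indexed through an index array as an ISSR would allow), so the loop body needs no explicit load instructions.

```python
class StreamReader:
    """Hypothetical model of a read stream bound to a register.

    Each read returns the next element of an affine pattern
    (base + i * stride) or, when `indices` is given, of an indirect
    pattern (base + indices[i]).
    """

    def __init__(self, memory, length, base=0, stride=1, indices=None):
        self.memory = memory
        self.length = length
        self.base = base
        self.stride = stride
        self.indices = indices
        self.pos = 0

    def read(self):
        if self.pos >= self.length:
            raise IndexError("stream exhausted")
        if self.indices is None:
            addr = self.base + self.pos * self.stride
        else:
            addr = self.base + self.indices[self.pos]
        self.pos += 1
        return self.memory[addr]


def dot_baseline(a, b):
    """Dot product with explicit loads; returns (result, assumed instruction count)."""
    acc, instrs = 0.0, 0
    for i in range(len(a)):
        x = a[i]          # explicit load
        y = b[i]          # explicit load
        acc += x * y      # multiply-accumulate
        instrs += 3 + 2   # 2 loads + 1 MAC, plus assumed index update and branch
    return acc, instrs


def dot_streamed(a, b):
    """Dot product whose operands arrive through streams instead of loads."""
    sa = StreamReader(a, len(a))
    sb = StreamReader(b, len(b))
    acc, instrs = 0.0, 0
    for _ in range(len(a)):
        acc += sa.read() * sb.read()  # MAC reads both operands from streams
        instrs += 1 + 1               # 1 MAC, plus assumed loop bookkeeping
    return acc, instrs


print(dot_baseline([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # (32.0, 15)
print(dot_streamed([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # (32.0, 6)
```

The instruction counts only illustrate the direction of the effect; the actual savings depend on the kernel and on how loop overhead is handled in hardware.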

Experimental and Numerical Evaluation

The paper presents a comprehensive evaluation across various benchmark kernels, including Monte Carlo integration methods and transcendental function evaluations. The accelerated COPIFT implementations are benchmarked against highly optimized RV32G baselines and show significant improvements in IPC and overall execution speed.

For example, the expf kernel, which is instrumental in AI-related applications such as softmax operations, shows a speedup exceeding 2.05x over its baseline, which points to the methodology's potential impact on energy-efficient AI model deployments. The reported power increases stay below 17%, which is small relative to the IPC gains, so the trade-off between energy and performance is favorable.
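
The relation between speedup, energy improvement, and power can be checked with simple arithmetic. Since energy is power multiplied by time, the ratio of new to baseline power equals the speedup divided by the energy improvement. The sketch below applies this to the average figures quoted in the abstract (1.47x and 1.37x); the function name is hypothetical, and because a ratio of averages is not the average of per-kernel ratios, the result is only a rough estimate.

```python
def relative_power(speedup, energy_improvement):
    """Return P_new / P_base given T_base / T_new and E_base / E_new.

    From E = P * T:  P_new / P_base = (T_base / T_new) / (E_base / E_new).
    """
    if speedup <= 0 or energy_improvement <= 0:
        raise ValueError("ratios must be positive")
    return speedup / energy_improvement


ratio = relative_power(1.47, 1.37)
print(f"{(ratio - 1) * 100:.1f}% higher average power")  # 7.3% higher average power
```

An estimated average power increase of roughly 7% is consistent with the reported per-kernel increases staying below 17%.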

Implications and Future Directions

COPIFT's contributions are relevant to the design of next-generation processors. By offering a way to incrementally improve the IPC of in-order RISC-V cores with minimal area overhead, the work advances low-power compute architectures. The methodology is particularly relevant in resource-constrained environments where energy efficiency is paramount.

The paper prompts future exploration of how COPIFT scales to multithreaded scenarios and whether it can be adapted to architectures beyond RISC-V. Such work could further refine the dual-issue execution capabilities and extend them to more diverse workloads, including neural network inference and edge computing applications.

In essence, this work provides an empirical foundation for improving the efficiency of in-order processors through targeted use of dual-issue execution, improving both performance and energy efficiency on mixed integer and floating-point workloads.
