Analyzing the capabilities of HLS and RTL tools in the design of an FPGA Montgomery Multiplier (2509.08067v1)
Abstract: We present an analysis of several FPGA implementations of a Montgomery Modular Multiplier compatible with the BLS12-381 elliptic curve, using the Coarsely Integrated Operand Scanning (CIOS) approach of working with complete partial products on different digit sizes. The goal of the implemented designs is a high-frequency, high-throughput solution capable of computing millions of operations per second, which can provide a strong foundation for Elliptic Curve Cryptography operations such as point addition and point multiplication. One important constraint for our designs was to use only FPGA DSP primitives for the arithmetic operations between digits in the CIOS algorithm, since these primitives, when pipelined properly, can operate at a high frequency while also reducing the consumption of FPGA LUTs and FFs. The analysis aims to show how different design choices and tool configurations influence the frequency, latency and resource consumption when working with the latest AMD-Xilinx tools and Alveo FPGA boards in an RTL-HLS hybrid approach. We compare three categories of designs: a naive Verilog approach, where we rely on the Vivado synthesizer to choose automatically when and where to use DSPs; an optimized Verilog approach, where we instantiate the DSP primitives manually; and a complete High-Level Synthesis approach. We also compare the FPGA implementations with an optimized software implementation of the same Montgomery multiplier written in Rust.
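For readers unfamiliar with CIOS, the following is a minimal software sketch of a word-serial CIOS Montgomery multiplication with a configurable digit size. It is not the paper's RTL or HLS design and does not model the DSP pipeline. The function names, the default digit width, and the choice of the BLS12-381 base-field prime as modulus are assumptions made for illustration; the abstract does not state which field or digit sizes the designs use.

```python
# Illustrative CIOS Montgomery multiplication (software model only).
# Assumption: the modulus is the 381-bit BLS12-381 base-field prime.
BLS12_381_P = int(
    "1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf"
    "6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab",
    16,
)


def to_digits(x, w, s):
    """Split x into s little-endian digits of w bits each."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(s)]


def from_digits(digits, w):
    """Recombine little-endian w-bit digits into an integer."""
    return sum(d << (w * i) for i, d in enumerate(digits))


def cios_montmul(a, b, n, w=64):
    """Return a * b * R^-1 mod n, with R = 2^(w*s) and s digits of w bits.

    Requires n odd and 0 <= a, b < n. The digit width w is a free
    parameter here (hypothetical default of 64 bits).
    """
    s = -(-n.bit_length() // w)          # number of digits, ceil(bits / w)
    mask = (1 << w) - 1
    n0_inv = (-pow(n, -1, 1 << w)) & mask  # -n^-1 mod 2^w (Python 3.8+)
    A, B, N = to_digits(a, w, s), to_digits(b, w, s), to_digits(n, w, s)
    t = [0] * (s + 2)

    for i in range(s):
        # Multiplication step: t += A * B[i]
        carry = 0
        for j in range(s):
            acc = t[j] + A[j] * B[i] + carry
            t[j] = acc & mask
            carry = acc >> w
        acc = t[s] + carry
        t[s] = acc & mask
        t[s + 1] = acc >> w

        # Reduction step: add m * N so the low digit becomes zero,
        # then shift the accumulator right by one digit.
        m = (t[0] * n0_inv) & mask
        carry = (t[0] + m * N[0]) >> w
        for j in range(1, s):
            acc = t[j] + m * N[j] + carry
            t[j - 1] = acc & mask
            carry = acc >> w
        acc = t[s] + carry
        t[s - 1] = acc & mask
        t[s] = t[s + 1] + (acc >> w)

    # Final conditional subtraction brings the result into [0, n).
    r = from_digits(t[: s + 1], w)
    return r - n if r >= n else r
```

Each inner-loop step is a digit-by-digit multiply-accumulate, which is the kind of operation the abstract says the designs map onto DSP primitives. Changing `w` changes the digit count and therefore the number of such operations per multiplication.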