
DGEMM without FP64 Arithmetic -- using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme (2508.00441v1)

Published 1 Aug 2025 in cs.PF, cs.AR, and cs.MS

Abstract: Because AI workloads depend on low-precision matrix multiplication, processors offering enhanced performance for these operations are proliferating alongside the growing demand for AI computation. However, such operations are difficult to use directly for scientific computing. The Ozaki scheme, an accurate matrix multiplication method proposed by Ozaki et al. in 2012, enables FP64 matrix multiplication (DGEMM) using low-precision floating-point operations such as FP16. The method was subsequently extended to integer arithmetic, which reduces computational cost compared with the floating-point approach and has demonstrated higher performance than hardware FP64 on GPUs with fast INT8 Tensor Cores for AI workloads. Recent hardware, however, tends to enhance low-precision floating-point performance such as FP8 rather than INT8. This study revisits the use of low-precision floating-point operations in the Ozaki scheme on the latest AI hardware. Specifically, we consider FP6 and FP8 Tensor Cores. Moreover, for processors whose FP64 operations are very slow or absent, we consider FP64 emulation based on integer arithmetic. We also examine a new blocking strategy. We demonstrate the effectiveness of these methods by evaluating DGEMM performance using FP8 Tensor Cores and FP64 emulation on a Blackwell-architecture GPU.
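The core idea behind the Ozaki scheme is an error-free splitting: each FP64 matrix is decomposed into slices whose entries carry only a few significand bits, so that products of slices can be computed exactly by a low-precision unit and then summed. The following NumPy sketch illustrates the splitting trick; the slice count, bit width, and function names are illustrative choices (not from the paper), and the "low-precision" multiplies are simulated here in FP64 rather than dispatched to FP8 Tensor Cores.

```python
import numpy as np

def split_rows(M, bits, num_slices):
    """Row-wise error-free splitting (Ozaki-style sketch): each slice
    holds entries with only ~`bits` significand bits, so slice products
    are exact in a low-precision multiply-accumulate unit."""
    slices = []
    R = M.astype(np.float64).copy()
    for _ in range(num_slices):
        mu = np.max(np.abs(R), axis=1, keepdims=True)
        mu = np.where(mu == 0.0, 1.0, mu)          # avoid log2(0)
        e = np.ceil(np.log2(mu))
        sigma = 2.0 ** (e + 52 - bits)             # power-of-two splitter
        S = (R + sigma) - sigma                    # rounds R onto a coarse grid
        slices.append(S)
        R = R - S                                  # exact in FP64
    return slices, R                               # R = unsplit tail

rng = np.random.default_rng(0)
n = 64
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# 7-bit pieces are an illustrative width; a real FP8 (e.g. E4M3) variant
# would use fewer bits per slice and more slices.
As, tailA = split_rows(A, bits=7, num_slices=8)
Bs, tailB = split_rows(B.T, bits=7, num_slices=8)  # split B column-wise via B.T

# Each As[i] @ Bs[j].T involves ~8-bit operands, so every product and the
# length-64 inner-product sums fit exactly in a wide accumulator; here we
# simply accumulate the partial results in FP64.
C = np.zeros((n, n))
for Ai in As:
    for Bj in Bs:
        C += Ai @ Bj.T

print(np.max(np.abs(C - A @ B)))
```

With 8 slices of roughly 7 bits each, about 56 significand bits of every row are captured, so the accumulated result agrees with the FP64 product up to tiny tail and summation terms. The real scheme additionally orders the partial sums and bounds the slice widths so that the low-precision accumulations are provably exact.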


Authors (1)
