A Pilot Study on Tunable Precision Emulation via Automatic BLAS Offloading (2503.22875v2)

Published 28 Mar 2025 in cs.DC and cs.PF

Abstract: This study explores the use of automatic BLAS offloading and INT8-based emulation for accelerating traditional HPC workloads on modern GPU architectures. Through the use of low-bitwidth integer units and a cache-coherent Unified Memory Architecture, we emulate double-precision matrix multiplications in the MuST application without code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, which can be addressed through tunable precision emulation. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization. We showcase the potential to improve accuracy and performance at the same time. This work highlights the potential of AI-driven hardware to transform HPC, advocating for adaptive precision strategies in future scientific computing.
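
To make the emulation idea concrete, the sketch below shows one common way an FP64 GEMM can be approximated with products of low-bitwidth integer slices (an Ozaki-style splitting). This is an illustrative assumption, not necessarily the scheme used in the paper: the function names (`split_int8_slices`, `emulated_gemm`) and parameters (`num_slices`, `bits`) are hypothetical, and plain NumPy FP64 stands in for the exact integer matrix products that would run on INT8 tensor cores under a real BLAS offload.

```python
import numpy as np

def split_int8_slices(A, num_slices=7, bits=7):
    """Split an FP64 matrix into integer-valued slices so that A ~= sum_k S_k * 2**e_k.

    Each slice peels off roughly `bits` bits of mantissa per row; its entries have
    magnitude at most 2**bits, small enough for narrow integer units. This mirrors
    Ozaki-style splitting, assumed here for illustration only.
    """
    slices, exps = [], []
    R = np.array(A, dtype=np.float64, copy=True)
    for _ in range(num_slices):
        # Per-row exponent so each slice fits a small signed integer range.
        amax = np.max(np.abs(R), axis=1, keepdims=True)
        amax = np.where(amax == 0.0, 1.0, amax)        # avoid log2(0) on zero rows
        e = np.floor(np.log2(amax)) - (bits - 1)
        S = np.round(R * 2.0 ** -e)                    # integer-valued slice
        R = R - S * 2.0 ** e                           # residual for the next slice
        slices.append(S)
        exps.append(e)
    return slices, exps

def emulated_gemm(A, B, num_slices=7):
    """Approximate C = A @ B by accumulating scaled products of integer slices."""
    As, ea = split_int8_slices(A, num_slices)
    Bs, eb = split_int8_slices(B.T, num_slices)        # split B column-wise via B^T
    C = np.zeros((A.shape[0], B.shape[1]))
    for Sa, sa in zip(As, ea):
        for Sb, sb in zip(Bs, eb):
            # On GPU hardware this slice product would use INT8 units; here FP64
            # stands in for the exact integer GEMM. Scales are per row of A and
            # per column of B, so they factor out of the matrix product.
            C += (Sa @ Sb.T) * 2.0 ** sa * 2.0 ** sb.T
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 64))
    B = rng.standard_normal((64, 64))
    err = np.max(np.abs(emulated_gemm(A, B) - A @ B))
    print(f"max abs error vs. native FP64 GEMM: {err:.2e}")
```

Increasing `num_slices` retains more mantissa bits of the inputs and so tightens the approximation, which is the sense in which the precision of the emulation is tunable: accuracy can be traded against the number of integer slice products performed.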
