QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives (2505.06302v1)

Published 8 May 2025 in cs.LG and cs.AI

Abstract: Computation-intensive tensor operators account for over 90% of the computation in LLMs and deep neural networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures such as RISC-V, ARM, and GPUs, because a manually optimized implementation takes at least months to develop and lacks portability. LLMs excel at generating high-level language code, but they struggle to fully comprehend hardware characteristics and produce high-performance tensor operators. We introduce QiMeng-TensorOp, a tensor-operator auto-generation framework driven by a one-line user prompt, which enables LLMs to automatically exploit hardware characteristics to generate tensor operators with hardware primitives and to tune parameters for optimal performance across diverse hardware. Experimental results on various hardware platforms, SOTA LLMs, and typical tensor operators demonstrate that QiMeng-TensorOp effectively unleashes the computing capability of these platforms and automatically generates tensor operators of superior performance. Compared with vanilla LLMs, QiMeng-TensorOp achieves up to $1291\times$ performance improvement. Even compared with human experts, QiMeng-TensorOp can reach $251\%$ of OpenBLAS performance on RISC-V CPUs and $124\%$ of cuBLAS performance on NVIDIA GPUs. Additionally, QiMeng-TensorOp significantly reduces development costs, by $200\times$ compared with human experts.

Authors (15)
