Performance Portability Study of Linear Algebra Kernels in OpenCL (1409.0669v1)

Published 2 Sep 2014 in cs.MS, cs.DC, and cs.PF

Abstract: The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. As a consequence, it is demonstrated that the optimization of a single kernel is often sufficient to obtain good performance for a large class of more complicated operations.

Authors (6)
  1. Karl Rupp (7 papers)
  2. Philippe Tillet (3 papers)
  3. Florian Rudolf (1 paper)
  4. Josef Weinbub (7 papers)
  5. Tibor Grasser (13 papers)
  6. Ansgar Jüngel (113 papers)
Citations (7)
