
A Unified, Hardware-Fitted, Cross-GPU Performance Model (1604.04997v1)

Published 18 Apr 2016 in cs.PF and cs.DC

Abstract: We present a mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms ('kernels') expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run time. We use a series of 'performance-instructive' kernels to fit the parameters of a unified model to the performance characteristics of GPU hardware from multiple hardware generations and vendors. We evaluate the predictive power of the model on a broad array of computational kernels relevant to scientific computing. In terms of the geometric mean, our simple, vendor- and GPU-type-independent model achieves relative accuracy comparable to that of previously published work using hardware-specific models.
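
As a rough illustration of the idea in the abstract, a linear run-time model treats kernel time as a weighted sum of operation counts, with the weights fitted by least squares on a set of measurement kernels. The sketch below is not the authors' implementation and does not use Loopy: the feature set (floating-point operations, memory accesses, a constant launch term), the counts, and the per-operation costs are all made-up assumptions, and the "measured" times are synthesized from those assumed costs so that the fit can be checked.

```python
# Minimal sketch of fitting a linear kernel run-time model.
# All feature names, counts, and costs are hypothetical, not from the paper.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(counts, times):
    """Least-squares weights via the normal equations (A^T A) w = A^T t."""
    k = len(counts[0])
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    atb = [sum(row[i] * t for row, t in zip(counts, times)) for i in range(k)]
    return solve(ata, atb)

def predict(weights, row):
    """Predicted run time: weighted sum of operation counts."""
    return sum(w * c for w, c in zip(weights, row))

# Hypothetical measurement kernels. Columns: millions of floating-point
# operations, millions of memory accesses, constant 1 for launch overhead.
counts = [
    [1.0, 4.0, 1.0],
    [8.0, 2.0, 1.0],
    [3.0, 3.0, 1.0],
    [6.0, 9.0, 1.0],
    [2.0, 7.0, 1.0],
]

# Assumed per-feature costs, used only to synthesize noise-free "measurements".
assumed_costs = [0.5, 2.0, 10.0]
times = [predict(assumed_costs, row) for row in counts]

weights = fit(counts, times)
predicted_time = predict(weights, [4.0, 5.0, 1.0])  # a kernel not in the fit set
```

Because the synthetic times contain no noise, the fitted weights should match the assumed costs up to rounding. With real measurements the fit would only be approximate, and prediction quality would be judged by the relative error on kernels outside the fitting set.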

Authors (2)
  1. James Stevens
  2. Andreas Klöckner
