Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
169 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Automated Parallel Kernel Extraction from Dynamic Application Traces (2001.09995v1)

Published 27 Jan 2020 in cs.DC and cs.PL

Abstract: Modern program runtime is dominated by segments of repeating code called kernels. Kernels are accelerated by increasing memory locality, increasing data-parallelism, and exploiting producer-consumer parallelism among kernels - which requires hardware specialized for a particular class of kernels. Programming this hardware can be difficult, requiring that the kernels be identified and annotated in the code or translated to a domain-specific language. This paper describes a technique to automatically localize parallel kernels from a dynamic application trace, facilitating further code optimization. Dynamic trace collection is fast and compact. With optimization, it only incurs a time-dilation of a factor on nine and file-size of one megabyte per second, addressing a significant criticism of this approach. Kernel extraction is accurate and performed in linear time within logarithmic memory, detecting a wide range of kernels. This approach was validated across 16 libraries, comprised of 10,507 kernels instances. To validate the accuracy of our detected kernels, five test programs were written that spans traditional kernel definitions and were certified to contain all the kernels that were expected.

Citations (5)

Summary

We haven't generated a summary for this paper yet.