Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
144 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

TTC: A high-performance Compiler for Tensor Transpositions (1603.02297v1)

Published 7 Mar 2016 in cs.MS, cs.DC, and cs.PF

Abstract: We present TTC, an open-source parallel compiler for multidimensional tensor transpositions. In order to generate high-performance C++ code, TTC explores a number of optimizations, including software prefetching, blocking, loop-reordering, and explicit vectorization. To evaluate the performance of multidimensional transpositions across a range of possible use-cases, we also release a benchmark covering arbitrary transpositions of up to six dimensions. Performance results show that the routines generated by TTC achieve close to peak memory bandwidth on both the Intel Haswell and the AMD Steamroller architectures, and yield significant performance gains over modern compilers. By implementing a set of pruning heuristics, TTC allows users to limit the number of potential solutions; this option is especially useful when dealing with high-dimensional tensors, as the search space might become prohibitively large. Experiments indicate that when only 100 potential solutions are considered, the resulting performance is about 99% of that achieved with exhaustive search.

Citations (17)

Summary

We haven't generated a summary for this paper yet.