Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
162 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Model-Driven Automatic Tiling with Cache Associativity Lattices (1511.05585v1)

Published 17 Nov 2015 in cs.PF

Abstract: Traditional compiler optimization theory distinguishes three separate classes of cache miss -- Cold, Conflict and Capacity. Tiling for cache is typically guided by capacity miss counts. Models of cache function have not been effectively used to guide cache tiling optimizations due to model error and expense. Instead, heuristic or empirical approaches are used to select tilings. We argue that conflict misses, traditionally neglected or seen as a small constant effect, are the only fundamentally important cache miss category, that they form a solid basis by which caches can become modellable, and that models leaning on cache associatvity analysis can be used to generate cache performant tilings. We develop a mathematical framework that expresses potential and actual cache misses in associative caches using Associativity Lattices. We show these lattices to possess two theoretical advantages over rectangular tiles -- volume maximization and miss regularity. We also show that to generate such lattice tiles requires, unlike rectangular tiling, no explicit, expensive lattice point counting. We also describe an implementation of our lattice tiling approach, show that it can be used to give speedups of over 10x versus unoptimized code, and despite currently only tiling for one level of cache, can already be competitive with the aggressive compiler optimizations used in general purposes compares such as GCC and Intel's ICC. We also show that the tiling approach can lead to reasonable automatic parallelism when compared to existing auto-threading compilers.

Summary

We haven't generated a summary for this paper yet.