
Tight Bounds for Low Dimensional Star Stencils in the Parallel External Memory Model (1205.0606v3)

Published 3 May 2012 in cs.CC

Abstract: Stencil computations on low-dimensional grids are kernels of many scientific applications, including finite difference methods used to solve partial differential equations. On typical modern computer architectures, such stencil computations are limited by the performance of the memory subsystem, namely by the bandwidth between main memory and the cache. This work considers the computation of star stencils, like the 5-point and 7-point stencil, in the external memory model and parallel external memory model and analyses the constant of the leading term of the non-compulsory I/Os. While optimizing stencil computations is an active field of research, there has been a significant gap between the lower bounds and the performance of the algorithms so far. In two dimensions, this work provides matching constants for lower and upper bounds, closing a multiplicative gap of 4. In three dimensions, the bounds match up to a factor of $\sqrt{2}$, improving the known results by a factor of $2 \sqrt{3}\sqrt{B}$, where $B$ is the block (cache line) size of the external memory model. For dimensions $d\geq 4$, the lower bound is improved by a factor between $4$ and $6$. For arbitrary dimension $d$, the first analysis of the constant of the leading term of the non-compulsory I/Os is presented. For $d\geq 3$, the lower and upper bounds match up to a factor of $\sqrt[d-1]{d!}\approx \frac{d}{e}$.
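
For readers unfamiliar with the term, the following is a minimal sketch of what one sweep of a 5-point star stencil on a two-dimensional grid might look like. It illustrates only the access pattern (each cell reads itself and its four axis-aligned neighbours); it does not model blocks, caches, or the I/O counts the paper analyses. The function name `five_point_sweep` and the equal averaging weights are assumptions chosen for illustration and are not taken from the paper.

```python
def five_point_sweep(grid):
    """Apply one sweep of a 5-point star stencil to a 2D grid.

    Each interior cell is replaced by the average of itself and its four
    axis-aligned neighbours. Boundary cells are copied unchanged.
    The equal weights are an illustrative assumption.
    """
    rows, cols = len(grid), len(grid[0])
    # Write into a fresh copy so every update reads the old values.
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (
                grid[i][j]
                + grid[i - 1][j]  # north neighbour
                + grid[i + 1][j]  # south neighbour
                + grid[i][j - 1]  # west neighbour
                + grid[i][j + 1]  # east neighbour
            ) / 5.0
    return out


# Small usage example: a single nonzero interior cell on a 4x4 grid.
grid = [[0.0] * 4 for _ in range(4)]
grid[1][1] = 5.0
result = five_point_sweep(grid)
```

The 7-point stencil mentioned in the abstract is the three-dimensional analogue, adding the two neighbours along the third axis.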

Citations (2)
