2000 character limit reached
Enabling full-speed random access to the entire memory on the A100 GPU (2405.11425v1)
Published 19 May 2024 in cs.PF and cs.AR
Abstract: We describe some features of the A100 memory architecture. In particular, we give a technique to reverse-engineer some hardware layout information. Using this information, we show how to avoid TLB issues to obtain full-speed random HBM access to the entire memory, as long as we constrain any particular thread to a reduced access window of less than 64GB.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.