Fast TLB Simulation for RISC-V Systems

Published 16 May 2019 in cs.AR | (1905.06825v1)

Abstract: Address translation and protection play important roles in today's processors, supporting multiprocessing and enforcing security. Historically, the design of address translation mechanisms has been closely tied to the instruction set. In contrast, RISC-V defines its privileged specification in a way that permits a variety of designs. An important part of the design space is the organisation of Translation Lookaside Buffers (TLBs). This paper presents our recent work on simulating TLB behaviours in multi-core RISC-V systems. Our TLB simulation framework allows rapid and flexible prototyping of various hardware TLB design choices, and enables validation, profiling and benchmarking of software running on RISC-V systems. We show how this framework can be integrated with QEMU, an emulator based on dynamic binary translation, to perform online simulation. When simulating complicated multi-level shared TLB designs on an 8-core system, the framework runs at around 400 million instructions per second (MIPS). The performance overhead compared to unmodified QEMU is only 18% when the benchmark's L1 TLB miss rate is 1%. We also demonstrate how this tool can be used to explore the instruction-set level design space. We test a shared last-level TLB design that is not currently permitted by RISC-V's privileged specification. We then propose an extension to RISC-V's virtual memory system design based on these experimental results.