Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology (2403.04635v2)
Abstract: The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM across diverse applications and systems. To evaluate such designs, researchers rely on various simulation methodologies to model VM components. Unfortunately, current simulation tools either (i) lack the desired accuracy in modeling VM's software components or (ii) are too slow and complex to prototype and evaluate schemes that span the hardware/software boundary. We introduce Virtuoso, a new simulation framework that enables quick and accurate prototyping and evaluation of the software and hardware components of the VM subsystem. The key idea of Virtuoso is to employ a lightweight userspace OS kernel, called MimicOS, that (i) accelerates simulation time by imitating only the desired kernel functionalities, (ii) facilitates the development of new OS routines that imitate real ones, using an accessible high-level programming interface, and (iii) enables accurate and flexible evaluation of the application- and system-level implications of VM after integrating Virtuoso into a desired architectural simulator. We integrate Virtuoso into five diverse architectural simulators, each specializing in different aspects of system design, and heavily enrich it with multiple state-of-the-art VM schemes. Our validation shows that Virtuoso ported on top of Sniper, a state-of-the-art microarchitectural simulator, models the memory management unit of a real high-end server-grade CPU and the page fault latency of a real Linux kernel with high accuracy. Consequently, Virtuoso models the IPC performance of a real high-end server-grade CPU with 21% higher accuracy than the baseline version of Sniper. The source code of Virtuoso is freely available at https://github.com/CMU-SAFARI/Virtuoso.