
LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores (2402.07578v1)

Published 12 Feb 2024 in cs.DC and cs.AR

Abstract: Multicore processors constitute the main architecture choice for modern computing systems in different market segments. Despite their benefits, the contention that naturally arises when multiple applications compete for resources shared among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This may have a negative impact on key system aspects such as throughput and fairness. Assigning the various applications in the workload to separate LLC partitions, possibly of different sizes, has been proven effective at mitigating shared-resource contention effects. In this article we propose LFOC, a clustering-based cache partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages the Intel Cache Allocation Technology (CAT), which enables the system software to divide the LLC into different partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we approximate by means of a simulator in different scenarios. To this end, LFOC identifies streaming aggressor programs and cache-sensitive applications, which are then assigned to separate cache partitions. We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC achieves a greater reduction in unfairness while relying on a lightweight algorithm suitable for adoption in a real OS.
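The abstract does not spell out how unfairness is measured or how partitions are encoded, so the following Python sketch is only an illustration under stated assumptions. It assumes unfairness is the ratio between the largest and smallest per-application slowdown (a definition commonly used in the fairness literature, not confirmed by the text above), and it uses the fact that Intel CAT expresses each partition as a contiguous bitmask of cache ways. The function names, the example slowdowns, and the 11-way LLC are hypothetical.

```python
from typing import Dict, List


def unfairness(slowdowns: Dict[str, float]) -> float:
    """Ratio of the largest to the smallest slowdown (assumed metric).

    A slowdown is the time in the shared workload divided by the time
    when running alone; 1.0 means a perfectly fair workload.
    """
    if not slowdowns:
        raise ValueError("need at least one application")
    return max(slowdowns.values()) / min(slowdowns.values())


def contiguous_way_masks(way_counts: List[int], total_ways: int) -> List[int]:
    """Pack partitions into non-overlapping, contiguous way bitmasks.

    CAT requires the set bits of each capacity bitmask to be contiguous;
    this helper simply lays the partitions out side by side.
    """
    if any(ways < 1 for ways in way_counts):
        raise ValueError("every partition needs at least one way")
    if sum(way_counts) > total_ways:
        raise ValueError("partitions exceed the available ways")
    masks = []
    shift = 0
    for ways in way_counts:
        masks.append(((1 << ways) - 1) << shift)
        shift += ways
    return masks
```

For instance, confining a hypothetical streaming program to a single way and giving the remaining ten ways to cache-sensitive applications, `contiguous_way_masks([1, 10], 11)` yields the masks `0x1` and `0x7fe`, and `unfairness({"stream": 1.2, "sensitive": 1.8})` evaluates to about 1.5.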
