
QED: Scalable Verification of Hardware Memory Consistency (2404.03113v1)

Published 3 Apr 2024 in cs.AR

Abstract: Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue processors, to proving the correctness only of some test cases, or to bounded verification that does not scale in practice beyond 7 instructions across all threads. Because cache coherence (i.e., write serialization and atomicity) and pipeline front-end verification and testing are well-studied, we focus on the memory ordering in an out-of-order-issue processor's load-store queue and the coherence interface between the core and global coherence. We propose QED based on the key notion of observability that any hardware reordering matters only if a forbidden value is produced. We argue that one needs to consider (1) only directly-ordered instruction pairs -- transitively non-redundant pairs connected by an edge in the MCM-imposed partial order -- and not all in-flight instructions, and (2) only the ordering of external events from other cores (e.g., invalidations) but not the events' originating cores, achieving verification scalability in both the numbers of in-flight memory instructions and of cores. Exhaustively considering all pairs of instruction types and all types of external events intervening between each pair, QED attempts to restore any reordered instructions to an MCM-compliant order without changing the execution values, where failure indicates an MCM violation. Each instruction pair's exploration results in a decision tree of simple, narrowly-defined predicates to be evaluated against the RTL. In our experiments, we automatically generate the decision trees for SC, TSO, and RISC-V WMO, and illustrate automatable verification by evaluating a substantial predicate against the BOOMv3 implementation of RISC-V WMO, leaving full automation to future work.
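
The notion of directly-ordered pairs in the abstract corresponds to the transitive reduction of the MCM-imposed partial order. The following is a minimal Python sketch of that idea, not the paper's implementation. The function name `directly_ordered_pairs`, the instruction labels, and the simplified TSO-like ordering rules (different addresses, no fences) are all assumptions made for illustration.

```python
from itertools import product


def directly_ordered_pairs(nodes, order):
    """Return the transitive reduction of a strict partial order.

    `order` is a set of (earlier, later) pairs; the result keeps only
    pairs that are not implied by a chain through a third instruction.
    """
    # Transitively close the relation first (Floyd-Warshall style:
    # product() varies its first element slowest, so k is outermost).
    closure = set(order)
    for k, i, j in product(nodes, repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    # Keep (a, b) only if no intermediate c satisfies a < c < b.
    return {
        (a, b)
        for (a, b) in closure
        if not any((a, c) in closure and (c, b) in closure for c in nodes)
    }


# Hypothetical single-thread example under simplified TSO-like rules:
# store->load pairs are unordered; all other pairs keep program order.
program = ["St A", "Ld B", "Ld C", "St D"]
tso_order = {
    ("St A", "St D"),  # store -> store
    ("Ld B", "Ld C"),  # load -> load
    ("Ld B", "St D"),  # load -> store (redundant: implied via Ld C)
    ("Ld C", "St D"),  # load -> store
}
direct = directly_ordered_pairs(program, tso_order)
```

In this sketch the pair ("Ld B", "St D") is dropped because it follows from the other two load-ordering edges, which reflects why only a subset of in-flight instruction pairs would need to be examined.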

