QED: Scalable Verification of Hardware Memory Consistency (2404.03113v1)
Abstract: Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue processors, to proving the correctness only of some test cases, or to bounded verification that does not scale in practice beyond 7 instructions across all threads. Because cache coherence (i.e., write serialization and atomicity) and pipeline front-end verification and testing are well-studied, we focus on the memory ordering in an out-of-order-issue processor's load-store queue and the coherence interface between the core and global coherence. We propose QED based on the key notion of observability: any hardware reordering matters only if a forbidden value is produced. We argue that one needs to consider (1) only directly-ordered instruction pairs -- transitively non-redundant pairs connected by an edge in the MCM-imposed partial order -- and not all in-flight instructions, and (2) only the ordering of external events from other cores (e.g., invalidations) but not the events' originating cores, achieving verification scalability in both the number of in-flight memory instructions and the number of cores. Exhaustively considering all pairs of instruction types and all types of external events intervening between each pair, QED attempts to restore any reordered instructions to an MCM-compliant order without changing the execution values, where failure indicates an MCM violation. Each instruction pair's exploration results in a decision tree of simple, narrowly-defined predicates to be evaluated against the RTL. In our experiments, we automatically generate the decision trees for SC, TSO, and RISC-V WMO, and illustrate automatable verification by evaluating a substantial predicate against the BOOMv3 implementation of RISC-V WMO, leaving full automation to future work.
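To make the notion of directly-ordered pairs concrete, the following minimal Python sketch computes the transitive reduction of an MCM-imposed partial order over a short single-thread instruction sequence. It is an illustration only, not the QED tool: the function names (`ordered_sc`, `ordered_tso`, `directly_ordered_pairs`) are hypothetical, and the TSO rule is a deliberate simplification that ignores fences, atomics, and same-address dependences.

```python
from itertools import combinations

def ordered_sc(a, b):
    # Under SC, every pair of memory instructions keeps program order.
    return True

def ordered_tso(a, b):
    # Simplified TSO (assumption): only store -> later load may reorder.
    return not (a == "St" and b == "Ld")

def directly_ordered_pairs(program, ordered):
    """Return the transitively non-redundant ordered pairs (i, j), i < j."""
    n = len(program)
    # Edges of the MCM-imposed order between instruction positions.
    closure = {(i, j) for i, j in combinations(range(n), 2)
               if ordered(program[i], program[j])}
    # Transitive closure (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in closure and (k, j) in closure:
                    closure.add((i, j))
    # Keep an edge only if no intermediate instruction already implies it.
    return sorted(
        (i, j) for (i, j) in closure
        if not any((i, k) in closure and (k, j) in closure for k in range(n))
    )

program = ["St", "Ld", "St", "Ld"]
print(directly_ordered_pairs(program, ordered_sc))
print(directly_ordered_pairs(program, ordered_tso))
```

Under SC, the six ordered pairs in this four-instruction sequence reduce to the three adjacent pairs, which suggests why restricting attention to directly-ordered pairs keeps the number of cases small as the number of in-flight instructions grows.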