Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules (2109.10476v4)

Published 22 Sep 2021 in cs.LG and cs.PL

Abstract: We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (ASTs), where each rule in a given set of semantics-preserving rewrite rules can be applied to a specific AST pattern to generate a transformed, semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of applications of these rewrite rules that rewrites one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is checked simply by verifying that it can be applied. If the neural network produces no valid sequence, the system reports the programs as non-equivalent, ensuring by design that no programs can be incorrectly reported as equivalent. Our system is fully implemented for a single grammar that can represent straight-line programs with function calls and multiple types. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
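The proof-checking idea in the abstract can be illustrated with a minimal Python sketch. It is not the S4Eq implementation: the tuple-based AST, the two rules `Commute` and `Distribute`, the path encoding, and the function names `rewrite_at` and `check_proof` are all assumptions made for illustration. It shows only that a proposed sequence of rewrites is accepted exactly when every step applies and the final tree matches the target, so an incorrect proposal can never yield a false claim of equivalence.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, Union

# Hypothetical AST encoding: a variable name, or (operator, left, right).
Ast = Union[str, Tuple[str, "Ast", "Ast"]]
# Hypothetical node address: child indices from the root (1 = left, 2 = right).
Path = Tuple[int, ...]
Rule = Callable[[Ast], Optional[Ast]]


def commute(node: Ast) -> Optional[Ast]:
    """a + b -> b + a (same for *); None when the pattern does not match."""
    if isinstance(node, tuple) and node[0] in ("+", "*"):
        op, left, right = node
        return (op, right, left)
    return None


def distribute(node: Ast) -> Optional[Ast]:
    """a * (b + c) -> (a * b) + (a * c); None when the pattern does not match."""
    if isinstance(node, tuple) and node[0] == "*":
        _, a, rhs = node
        if isinstance(rhs, tuple) and rhs[0] == "+":
            _, b, c = rhs
            return ("+", ("*", a, b), ("*", a, c))
    return None


# Illustrative rule set; the paper's actual rules are not reproduced here.
RULES: Dict[str, Rule] = {"Commute": commute, "Distribute": distribute}


def rewrite_at(tree: Ast, path: Path, rule: Rule) -> Optional[Ast]:
    """Apply `rule` to the node addressed by `path`; None if it cannot apply."""
    if not path:
        return rule(tree)
    if not isinstance(tree, tuple) or path[0] not in (1, 2):
        return None
    new_child = rewrite_at(tree[path[0]], path[1:], rule)
    if new_child is None:
        return None
    children = list(tree)
    children[path[0]] = new_child
    return tuple(children)


def check_proof(source: Ast, target: Ast,
                proof: Sequence[Tuple[str, Path]]) -> bool:
    """Accept the proof only if every step applies and the result is `target`."""
    current: Optional[Ast] = source
    for rule_name, path in proof:
        rule = RULES.get(rule_name)
        if rule is None:
            return False
        current = rewrite_at(current, path, rule)
        if current is None:
            return False
    return current == target


source = ("*", "a", ("+", "b", "c"))
target = ("+", ("*", "a", "c"), ("*", "a", "b"))
valid_proof = [("Distribute", ()), ("Commute", ())]
invalid_proof = [("Distribute", (1,))]  # node "a" does not match the pattern

print(check_proof(source, target, valid_proof))
print(check_proof(source, target, invalid_proof))
```

In this sketch the checker never trusts the proposer: a rule that does not match its addressed node, an unknown rule name, or a final tree that differs from the target all lead to rejection, which mirrors the "non-equivalent unless proven" behaviour described above.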
