Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
169 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

PyTy: Repairing Static Type Errors in Python (2401.06619v1)

Published 12 Jan 2024 in cs.SE

Abstract: Gradual typing enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual typing in practice. This paper presents PyTy, an automated program repair approach targeted at statically detectable type errors in Python. The problem of repairing type errors deserves specific attention because it exposes particular repair patterns, offers a warning message with hints about where and how to apply a fix, and because gradual type checking serves as an automatic way to validate fixes. We addresses this problem through three contributions: (i) an empirical study that investigates how developers fix Python type errors, showing a diverse set of fixing strategies with some recurring patterns; (ii) an approach to automatically extract type error fixes, which enables us to create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects; (iii) the first learning-based repair technique for fixing type errors in Python. Motivated by the relative data scarcity of the problem, the neural model at the core of PyTy is trained via cross-lingual transfer learning. Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors. This effectiveness outperforms state-of-the-art LLMs asked to repair type errors (by 2.1x) and complements a previous technique aimed at type errors that manifest at runtime. Finally, 20 out of 30 pull requests with PyTy-suggested fixes have been merged by developers, showing the usefulness of PyTy in practice.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (56)
  1. Typilus: Neural Type Hints. In PLDI.
  2. Learning to Represent Programs with Graphs. In International Conference on Learning Representations (ICLR).
  3. Getafix: Learning to Fix Bugs Automatically. In OOPSLA. 159:1–159:27.
  4. Phoenix: Automated data-driven synthesis of repairs for static analysis violations. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 613–624.
  5. TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer. In ICML.
  6. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair. IEEE TSE (2019).
  7. Neural Transfer Learning for Repairing Security Vulnerabilities in C Code. IEEE Transactions on Software Engineering (2022), 1–1. https://doi.org/10.1109/TSE.2022.3147265
  8. Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement 20, 1 (1960), 37–46.
  9. Luca Di Grazia and Michael Pradel. 2022. The Evolution of Type Annotations in Python: An Empirical Study. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 209–220. https://doi.org/10.1145/3540250.3549114
  10. Hoppity: Learning graph transformations to detect and fix bugs in programs. In International Conference on Learning Representations (ICLR).
  11. CodeTrans: Towards Cracking the Language of Silicon’s Code Through Self-Supervised Deep Learning and High Performance Computing. arXiv preprint arXiv:2104.02443 (2021).
  12. Sorald: Automatic Patch Suggestions for SonarQube Static Analysis Violations. IEEE Transactions on Dependable and Secure Computing 20, 4 (2023), 2794–2810. https://doi.org/10.1109/TDSC.2022.3167316
  13. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
  14. The discovery of grounded theory; strategies for qualitative research. Nursing research 17, 4 (1968), 364.
  15. An empirical study of flaky tests in python. In 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE, 148–158.
  16. GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=jLoC4ez43PZ
  17. Deep learning type inference. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, Gary T. Leavens, Alessandro Garcia, and Corina S. Pasareanu (Eds.). ACM, 152–162. https://doi.org/10.1145/3236024.3236051
  18. Automatically Reducing Tree-Structured Test Inputs. In ASE.
  19. Impact of code language models on automated program repair. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Association for Computing Machinery.
  20. An Empirical Study of Type-Related Defects in Python Projects. IEEE Transactions on Software Engineering (2021).
  21. Automatic patch generation learned from human-written patches.. In International Conference on Software Engineering (ICSE). 802–811.
  22. Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. In Conference on Empirical Methods in Natural Language Processing.
  23. J. Richard Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174. http://www.jstor.org/stable/2529310
  24. GenProg: A Generic Method for Automatic Software Repair. IEEE Trans. Software Eng. 38, 1 (2012), 54–72.
  25. Automated program repair. Commun. ACM 62, 12 (2019), 56–65. https://doi.org/10.1145/3318162
  26. CoCoNuT: combining context-aware neural translation models using ensemble for program repair. In ISSTA ’20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020, Sarfraz Khurshid and Corina S. Pasareanu (Eds.). ACM, 101–114. https://doi.org/10.1145/3395363.3397369
  27. NL2Type: Inferring JavaScript function types from natural language information. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019. 304–315. https://doi.org/10.1109/ICSE.2019.00045
  28. SpongeBugs: Automatically generating fix suggestions in response to static code analysis warnings. J. Syst. Softw. 168 (2020), 110671. https://doi.org/10.1016/j.jss.2020.110671
  29. Angelix: Scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th international conference on software engineering. 691–701.
  30. Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 2241–2252. https://doi.org/10.1145/3510003.3510124
  31. SemFix: program repair via semantic analysis. In 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013. 772–781.
  32. Wonseok Oh and Hakjoo Oh. 2022. PyTER: Effective Program Repair for Python Type Errors. In ESEC/FSE.
  33. Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 2019–2030. https://doi.org/10.1145/3510003.3510038
  34. TypeWriter: Neural Type Prediction with Search-based Validation. In ESEC/FSE ’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. 209–220. https://doi.org/10.1145/3368089.3409715
  35. Michael Pradel and Koushik Sen. 2018. DeepBugs: A learning approach to name-based bug detection. PACMPL 2, OOPSLA (2018), 147:1–147:25. https://doi.org/10.1145/3276517
  36. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis. 24–36.
  37. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1–140:67. http://jmlr.org/papers/v21/20-074.html
  38. Python 3 Types in the Wild: A Tale of Two Type Systems. In DLS.
  39. Type error feedback via analytic program repair. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 16–30. https://doi.org/10.1145/3385412.3386005
  40. Jeremy G. Siek and Walid Taha. 2007. Gradual Typing for Objects. In ECOOP 2007 - Object-Oriented Programming, 21st European Conference, Berlin, Germany, July 30 - August 3, 2007, Proceedings (Lecture Notes in Computer Science, Vol. 4609), Erik Ernst (Ed.). Springer, 2–27. https://doi.org/10.1007/978-3-540-73589-2_2
  41. Perses: syntax-guided program reduction. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.). ACM, 361–371. https://doi.org/10.1145/3180155.3180236
  42. TreeGen: A tree-based transformer architecture for code generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8984–8991.
  43. On learning meaningful code changes via neural machine translation. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019. 25–36. https://dl.acm.org/citation.cfm?id=3339509
  44. Neural Program Repair by Jointly Learning to Localize and Repair. In ICLR.
  45. What Do They Capture? A Structural Analysis of Pre-Trained Language Models for Source Code. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 2377–2388. https://doi.org/10.1145/3510003.3510050
  46. Demystifying Code Summarization Models. CoRR (2021). https://arxiv.org/abs/2102.04625
  47. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740.
  48. Automated program repair in the era of large pre-trained language models. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Association for Computing Machinery.
  49. Chunqiu Steven Xia and Lingming Zhang. 2022. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, Abhik Roychoudhury, Cristian Cadar, and Miryung Kim (Eds.). ACM, 959–971. https://doi.org/10.1145/3540250.3549101
  50. Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. arXiv preprint arXiv:2304.00385 (2023).
  51. Python probabilistic type inference with natural language support. In Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13-18, 2016. 607–618. https://doi.org/10.1145/2950290.2950343
  52. Michihiro Yasunaga and Percy Liang. 2020. Graph-based, Self-Supervised Program Repair from Diagnostic Feedback. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 10799–10808. http://proceedings.mlr.press/v119/yasunaga20a.html
  53. Neural Program Repair with Execution-based Backpropagation. In ICSE.
  54. Andreas Zeller. 2002. Isolating cause-effect chains from computer programs. ACM SIGSOFT Software Engineering Notes 27, 6 (2002), 1–10.
  55. A Study of Bug Resolution Characteristics in Popular Programming Languages. IEEE Transactions on Software Engineering 47, 12 (2021), 2684–2697. https://doi.org/10.1109/TSE.2019.2961897
  56. A syntax-guided edit decoder for neural program repair. In ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021, Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta (Eds.). ACM, 341–353. https://doi.org/10.1145/3468264.3468544
Citations (10)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com