
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering (2404.01903v2)

Published 2 Apr 2024 in cs.CL, cs.LG, and cs.PL

Abstract: CodeLLMs are transforming software development as we know it. This is especially true for tasks where rule-based approaches fall short, like type prediction. The type prediction task consists in adding a new type annotation to a partially typed program, such that the resulting program is closer to being fully typed. The intractability of rule-based approaches and high cost of manual annotation make CodeLLMs an attractive solution to the problem. However, CodeLLMs are still far from being deployed on the large-scale due to doubts surrounding their reliability. To shed some light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type. We show that by applying semantics-preserving edits to code, CodeLLMs are eventually misled into mispredicting type annotations. However, by leveraging activation steering we are able to "steer" the model back to the correct prediction, making models more robust against semantically irrelevant prompt features. We show that steering achieves comparable performance to fine-tuning directly on the type prediction task. Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa. To our knowledge, this is the first evidence of its kind to suggest that CodeLLMs learn task representations that transfer across languages.


Summary

  • The paper introduces activation steering to correct mispredictions in CodeLLMs using minimal, semantics-preserving code edits.
  • The methodology employs steering vectors derived from model activations to mitigate the impact of syntactic noise in Python and TypeScript.
  • The technique achieved up to 90% correction in type mispredictions, indicating robust cross-language type representations.

Activation Steering for Robust Type Prediction in CodeLLMs

Introduction

For LLMs trained on code (CodeLLMs), accurately predicting types is a capability of central importance. While these models have demonstrated remarkable success across a spectrum of programming tasks, they remain vulnerable to syntactic variations, which can lead to inconsistent predictions and undermine their reliability, particularly for type prediction in gradually typed languages such as Python and TypeScript. The research conducted by Francesca Lucchetti and Arjun Guha introduces an inference-time technique, activation steering, that makes CodeLLMs more robust to such syntactic distractors.

Neural type prediction is an evolving field. Prior work has emphasized training specialized models for the task, but these generally fall short of contemporary CodeLLMs. Decoder-only CodeLLMs are trained on large multi-language corpora, often with objectives such as fill-in-the-middle (FIM) that let a model complete a missing span, such as a type annotation, conditioned on the code both before and after it. Against this backdrop, the present work positions activation steering as a method to correct model mispredictions by manipulating internal model activations, a concept underpinned by the interpretation of "task vectors."
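As a concrete illustration, a type-prediction query can be phrased as a FIM prompt: the annotation site becomes the "middle" the model must fill in. This is a hedged sketch, not the paper's harness; the sentinel strings below are placeholders, since each model (e.g., StarCoder) defines its own special tokens.

```python
# Hypothetical FIM sentinel strings; real models define their own special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def type_prediction_prompt(code: str, hole: str = "??") -> str:
    """Turn a program with a type hole into a fill-in-the-middle prompt.

    The model is asked to generate the annotation that replaces `hole`,
    conditioned on the code both before and after the annotation site.
    """
    prefix, _, suffix = code.partition(hole)
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = type_prediction_prompt("def add(x: ??, y: int) -> int:\n    return x + y")
```

A model trained with FIM would then be expected to emit the missing annotation (here, `int`) at the `<fim_middle>` position.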

Methodology

The core methodology revolves around steering vectors derived from semantics-preserving code edits, designed to counter the semantically irrelevant syntactic features that lead to mispredictions. Drawing on principles from mutation testing, the approach constructs minimal edits that do not alter a program's behavior yet can induce the model to mispredict a type. Each original program and its edited counterpart form a steering pair, and from a dataset of such pairs a steering vector is computed for each layer of the CodeLLM. Adding this vector to the model's activations at inference time aligns its predictions more closely with the correct output, essentially "steering" the model towards the desired behavior.
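The pipeline above can be sketched in miniature. This is an illustrative assumption-laden toy, not the authors' implementation: the edit shown is a naive identifier rename, the "activations" are hand-made vectors, and steering is shown as the common difference-of-means construction at a single layer, whereas the paper computes vectors per layer from real hidden states.

```python
def rename_identifier(code: str, old: str, new: str) -> str:
    """A minimal semantics-preserving edit: renaming a variable leaves the
    program's behavior (and its types) unchanged but perturbs the prompt."""
    return code.replace(old, new)  # toy version; a real edit respects scoping

def steering_vector(correct_acts, incorrect_acts):
    """Mean hidden state over prompts the model gets right, minus the mean
    over the corresponding edited prompts it gets wrong (one layer shown)."""
    mean = lambda acts: [sum(col) / len(acts) for col in zip(*acts)]
    pos, neg = mean(correct_acts), mean(incorrect_acts)
    return [p - n for p, n in zip(pos, neg)]

def steer(hidden_state, vec, alpha=1.0):
    """At inference time, add the (scaled) steering vector to a hidden state."""
    return [h + alpha * v for h, v in zip(hidden_state, vec)]

# One steering pair: the original program and its semantics-preserving edit.
edited = rename_identifier("def f(tmp): return tmp", "tmp", "x")

# Toy activations for two steering pairs (original vs. edited prompts):
vec = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 3.0]])
steered = steer([0.5, 0.5], vec)
```

The design choice worth noting is that the intervention is purely additive and applied at inference time: no weights are updated, which is why the paper can compare steering directly against fine-tuning.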

Evaluation

The evaluation covers a detailed analysis across model layers and a diverse set of semantics-preserving edits. The findings underscore the efficacy of activation steering for type prediction in Python and TypeScript: the technique corrects up to 90% of type mispredictions, highlighting its potential as a robust means of improving model reliability. Intriguingly, the research also shows that steering vectors computed from one programming language (e.g., Python) can correct type mispredictions in another (e.g., TypeScript), suggesting a shared representation of types across languages within CodeLLMs.

Implications and Future Directions

The implications of this research extend both theoretically and practically within the field of artificial intelligence and programming languages. Theoretically, it contributes to the ongoing discourse on model interpretability and the mechanisms underlying model predictions in the context of code. Practically, it offers a viable pathway towards the development of more reliable CodeLLMs, potentially transforming how these models are deployed in development environments and programming tools. Looking ahead, further exploration into the mechanisms of activation steering and its applicability across other types of programming tasks could pave the way for broader applications and a deeper understanding of LLMs in code prediction and generation tasks.

Conclusion

Activation steering presents a promising avenue for mitigating the robustness challenges that syntactic variations in code pose to CodeLLMs. By offering a way to nudge model predictions directly toward accuracy, this research both enhances the reliability of CodeLLMs and invites further investigation into the underlying representational and operational dynamics of these complex models.
