
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering (2404.01903v2)

Published 2 Apr 2024 in cs.CL, cs.LG, and cs.PL

Abstract: CodeLLMs are transforming software development as we know it. This is especially true for tasks where rule-based approaches fall short, like type prediction. The type prediction task consists in adding a new type annotation to a partially typed program, such that the resulting program is closer to being fully typed. The intractability of rule-based approaches and high cost of manual annotation make CodeLLMs an attractive solution to the problem. However, CodeLLMs are still far from being deployed on the large-scale due to doubts surrounding their reliability. To shed some light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type. We show that by applying semantics-preserving edits to code, CodeLLMs are eventually misled into mispredicting type annotations. However, by leveraging activation steering we are able to "steer" the model back to the correct prediction, making models more robust against semantically irrelevant prompt features. We show that steering achieves comparable performance to fine-tuning directly on the type prediction task. Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa. To our knowledge, this is the first evidence of its kind to suggest that CodeLLMs learn task representations that transfer across languages.


Summary

  • The paper introduces activation steering to correct mispredictions in CodeLLMs using minimal, semantics-preserving code edits.
  • The methodology employs steering vectors derived from model activations to mitigate the impact of syntactic noise in Python and TypeScript.
  • The technique achieved up to 90% correction in type mispredictions, indicating robust cross-language type representations.

Activation Steering for Robust Type Prediction in CodeLLMs

Introduction

For LLMs trained on code (CodeLLMs), accurately predicting types is a capability of central importance. While these models have demonstrated remarkable success across a spectrum of programming tasks, they remain vulnerable to syntactic variations, which can lead to inconsistent predictions and undermine their reliability, particularly for type prediction in gradually typed languages such as Python and TypeScript. The research conducted by Francesca Lucchetti and Arjun Guha introduces an inference-time technique, activation steering, that makes CodeLLMs more robust to such syntactic distractors.

Neural type prediction is an evolving field. Prior work has emphasized training specialized models for the task, but these generally fall short of contemporary CodeLLMs. Decoder-only CodeLLMs are trained on large multi-language corpora, often with objectives such as fill-in-the-middle (FIM) that let a model complete a missing span, such as a type annotation, conditioned on the code both before and after it. Against this backdrop, the present work positions activation steering as a method to correct model mispredictions by manipulating internal model activations, a concept underpinned by the interpretation of "task vectors."
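As a concrete illustration, a type-prediction query can be phrased as a FIM prompt: the annotation site becomes the "middle" the model must fill in. This is a hedged sketch, not the paper's harness; the sentinel strings below are placeholders, since each model (e.g., StarCoder) defines its own special tokens.

```python
# Hypothetical FIM sentinel strings; real models define their own special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def type_prediction_prompt(code: str, hole: str = "??") -> str:
    """Turn a program with a type hole into a fill-in-the-middle prompt.

    The model is asked to generate the annotation that replaces `hole`,
    conditioned on the code both before and after the annotation site.
    """
    prefix, _, suffix = code.partition(hole)
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = type_prediction_prompt("def add(x: ??, y: int) -> int:\n    return x + y")
```

A model trained with FIM would then be expected to emit the missing annotation (here, `int`) at the `<fim_middle>` position.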

Methodology

The core methodology revolves around steering vectors derived from semantics-preserving code edits, designed to counter the semantically irrelevant syntactic features that lead to mispredictions. Drawing on principles from mutation testing, the approach constructs minimal edits that do not alter a program's behavior yet can induce the model to mispredict a type. Each original program and its edited counterpart form a steering pair, and from a dataset of such pairs a steering vector is computed for each layer of the CodeLLM. Adding this vector to the model's activations at inference time aligns its predictions more closely with the correct output, essentially "steering" the model towards the desired behavior.
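The pipeline above can be sketched in miniature. This is an illustrative assumption-laden toy, not the authors' implementation: the edit shown is a naive identifier rename, the "activations" are hand-made vectors, and steering is shown as the common difference-of-means construction at a single layer, whereas the paper computes vectors per layer from real hidden states.

```python
def rename_identifier(code: str, old: str, new: str) -> str:
    """A minimal semantics-preserving edit: renaming a variable leaves the
    program's behavior (and its types) unchanged but perturbs the prompt."""
    return code.replace(old, new)  # toy version; a real edit respects scoping

def steering_vector(correct_acts, incorrect_acts):
    """Mean hidden state over prompts the model gets right, minus the mean
    over the corresponding edited prompts it gets wrong (one layer shown)."""
    mean = lambda acts: [sum(col) / len(acts) for col in zip(*acts)]
    pos, neg = mean(correct_acts), mean(incorrect_acts)
    return [p - n for p, n in zip(pos, neg)]

def steer(hidden_state, vec, alpha=1.0):
    """At inference time, add the (scaled) steering vector to a hidden state."""
    return [h + alpha * v for h, v in zip(hidden_state, vec)]

# One steering pair: the original program and its semantics-preserving edit.
edited = rename_identifier("def f(tmp): return tmp", "tmp", "x")

# Toy activations for two steering pairs (original vs. edited prompts):
vec = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 3.0]])
steered = steer([0.5, 0.5], vec)
```

The design choice worth noting is that the intervention is purely additive and applied at inference time: no weights are updated, which is why the paper can compare steering directly against fine-tuning.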

Evaluation

The evaluation covers a detailed analysis across model layers and a diverse set of semantics-preserving edits. The findings underscore the efficacy of activation steering for type prediction in Python and TypeScript: the technique corrects up to 90% of type mispredictions, highlighting its potential as a robust means of improving model reliability. Intriguingly, the research also shows that steering vectors computed from one programming language (e.g., Python) can correct type mispredictions in another (e.g., TypeScript), suggesting a shared representation of types across languages within CodeLLMs.

Implications and Future Directions

The implications of this research extend both theoretically and practically within the field of artificial intelligence and programming languages. Theoretically, it contributes to the ongoing discourse on model interpretability and the mechanisms underlying model predictions in the context of code. Practically, it offers a viable pathway towards the development of more reliable CodeLLMs, potentially transforming how these models are deployed in development environments and programming tools. Looking ahead, further exploration into the mechanisms of activation steering and its applicability across other types of programming tasks could pave the way for broader applications and a deeper understanding of LLMs in code prediction and generation tasks.

Conclusion

Activation steering presents a promising avenue for mitigating the robustness challenges that syntactic variations in code pose to CodeLLMs. By offering a way to nudge model predictions directly toward accuracy, this research both enhances the reliability of CodeLLMs and invites further investigation into the underlying representational and operational dynamics of these complex models.
