Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization (2402.14182v1)
Abstract: Recent LLMs have demonstrated proficiency in summarizing source code. However, as in many other domains of machine learning, LLMs of code lack sufficient explainability: informally, we lack a formulaic or intuitive understanding of what and how models learn from code. One partial form of explainability would be evidence that, as models learn to produce higher-quality code summaries, they also come to deem important the same code parts that human programmers do. In this paper, we report negative results from our investigation of the explainability of LLMs in code summarization through the lens of human comprehension. We measure human focus on code using eye-tracking metrics, such as fixation counts and duration, during code summarization tasks. To approximate LLM focus, we employ SHAP (SHapley Additive exPlanations), a state-of-the-art model-agnostic, black-box, perturbation-based approach, to identify which code tokens influence the generation of summaries. Under these settings, we find no statistically significant relationship between the LLMs' focus and human programmers' attention. Furthermore, alignment between model and human foci in this setting does not appear to dictate the quality of the LLM-generated summaries. Our study highlights an inability to align human focus with SHAP-based measures of model focus. This result calls for future investigation of several open questions for explainable LLMs in code summarization and software engineering tasks in general, including the training mechanisms of LLMs for code, whether there is an alignment between human and model attention on code, whether human attention can improve the development of LLMs, and what other measures of model focus are appropriate for improving explainability.
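The analysis the abstract describes has two measurable halves: a per-token importance score for the model (via SHAP, a perturbation-based Shapley-value method) and per-token fixation statistics for human programmers, compared with a rank correlation (the reference list includes Spearman's classic paper). As a rough illustration of that idea, not the paper's actual implementation, the sketch below approximates Shapley values by permutation sampling; `score_fn`, the `<mask>` placeholder token, and `n_samples` are hypothetical stand-ins for whatever summary-quality scorer and masking scheme one plugs in.

```python
import random

from scipy.stats import spearmanr


def shapley_token_attribution(tokens, score_fn, n_samples=200, mask="<mask>", seed=0):
    """Monte-Carlo (permutation-sampling) approximation of per-token Shapley values.

    tokens:   list of code tokens for one function.
    score_fn: maps a (partially masked) token list to a scalar, e.g. the
              quality of the summary the LLM generates for that input;
              this is an assumed caller-supplied hook, not a real API.
    """
    rng = random.Random(seed)
    n = len(tokens)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)            # random permutation of token positions
        current = [mask] * n          # start from fully masked code
        prev_score = score_fn(current)
        for i in order:
            current[i] = tokens[i]    # reveal token i
            cur_score = score_fn(current)
            phi[i] += cur_score - prev_score  # marginal contribution of token i
            prev_score = cur_score
    return [p / n_samples for p in phi]


def focus_alignment(model_focus, fixation_counts):
    """Rank-correlate model attributions with human fixation counts per token."""
    rho, p_value = spearmanr(model_focus, fixation_counts)
    return rho, p_value
```

Correlating the resulting attributions with a participant's fixation counts over the same tokens yields the kind of alignment statistic the study then tests for significance.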
- Tobii Pro Fusion user manual, Jun 2023.
- Using developer eye movements to externalize the mental model used in code summarization tasks. In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications (2019), pp. 1–9.
- A transformer-based approach for source code summarization. arXiv preprint arXiv:2005.00653 (2020).
- Modeling programmer attention as scanpath prediction. arXiv preprint arXiv:2308.13920 (2023).
- Sequence classification with human attention. In Proceedings of the 22nd Conference on Computational Natural Language Learning (2018), pp. 302–312.
- srcML: An infrastructure for the exploration, analysis, and manipulation of source code: A tool demonstration. In 2013 IEEE International Conference on Software Maintenance (2013), IEEE, pp. 516–519.
- Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systems 32, 10 (2020), 4291–4308.
- Evaluating feature importance estimates.
- Where to look when repairing code? Comparing the attention of neural models and developers. arXiv preprint arXiv:2305.07287 (2023).
- Improving sentence compression by learning to predict gaze. arXiv preprint arXiv:1604.03357 (2016).
- Is model attention aligned with human attention? An empirical study on large language models for code generation. arXiv preprint arXiv:2306.01220 (2023).
- Recommendations for datasets for source code summarization. arXiv preprint arXiv:1904.02660 (2019).
- StarCoder: May the source be with you! arXiv preprint arXiv:2305.06161 (2023).
- On the reliability and explainability of automated code generation approaches. arXiv preprint arXiv:2302.09587 (2023).
- A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017).
- Molnar, C. Interpretable machine learning. Lulu.com, 2020.
- CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474 (2022).
- OpenAI. Introducing ChatGPT. https://openai.com/blog/chatgpt/, 2023. Accessed: 11/20/2023.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- Extracting meaningful attention on source code: An empirical study of developer and neural model code exploration. arXiv preprint arXiv:2210.05506 (2022).
- Thinking like a developer? Comparing the attention of humans with neural models of code. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2021), IEEE, pp. 867–879.
- An empirical study on the patterns of eye movement during summarization tasks. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (2015), IEEE, pp. 1–10.
- Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950 (2023).
- Eye-tracking metrics in software engineering. In 2015 Asia-Pacific Software Engineering Conference (APSEC) (2015), IEEE, pp. 96–103.
- Learning important features through propagating activation differences. In International Conference on Machine Learning (2017), PMLR, pp. 3145–3153.
- Improving natural language processing tasks with human gaze-guided neural attention. Advances in Neural Information Processing Systems 33 (2020), 6327–6341.
- Spearman, C. The proof and measurement of association between two things. The American Journal of Psychology 15, 1 (1904), 72–101.
- Axiomatic attribution for deep networks. In International Conference on Machine Learning (2017), PMLR, pp. 3319–3328.
- Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- WheaCha: A method for explaining the predictions of code summarization models. arXiv preprint arXiv:2102.04625 (2021).
- A systematic evaluation of large language models of code. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming (2022), pp. 1–10.
- An extensive study on pre-trained models for program understanding and generation. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (2022), pp. 39–51.
- What does transformer learn about source code? arXiv preprint arXiv:2207.08466 (2022).
- Using human attention to extract keyphrase from microblog post. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019), pp. 5867–5872.
- Diet code is healthy: Simplifying programs for pre-trained models of code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2022), pp. 1073–1084.
Authors: Jiliang Li, Yifan Zhang, Zachary Karas, Collin McMillan, Kevin Leach, Yu Huang