Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks
The paper, "Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks" authored by Deborah Etsenake and Meiyappan Nagappan, provides an extensive synthesis of user studies evaluating the interaction between humans and LLMs in the context of programming activities. By drawing insights from diverse user studies, the paper aims to address four critical research questions related to human-LLM interaction patterns, human enhancement through LLM use, task performance improvements, and the specific interaction behaviors that influence these enhancements.
Research Questions and Methodology
To systematically address the research questions, the authors conducted a literature review of studies indexed in sources including the ACM and IEEE digital libraries, Scopus, and arXiv. They employed broad search terms to cover a variety of LLMs and programming contexts, eventually narrowing the results to a final set of 88 user studies. These studies spanned different fields of computing and involved diverse tasks, ranging from code generation to code analysis.
Interaction Observations (RQ1)
The paper identifies three main themes in how programmers interact with LLMs: the types of requests made, the prompting strategies employed, and the overall interaction behaviors observed. Users generally prompted LLMs for learning and exploration, solution-oriented tasks, and error correction. Prompting strategies ranged from single prompts to multi-prompt exchanges, with re-prompting behaviors such as re-asking a question, supplying additional context, or changing the task scope.
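To make these strategies concrete, the following is a minimal Python sketch contrasting a single-prompt request with a multi-prompt exchange that re-prompts by adding context and narrowing scope. The client library, model name, and prompts are illustrative assumptions, not details drawn from the surveyed studies.

```python
import openai  # assumes the OpenAI Python client; any chat-style LLM API would work similarly


def ask(messages):
    """Send the running conversation to the model and return its reply text."""
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content


# Single-prompt strategy: one self-contained request.
single = ask([{"role": "user",
               "content": "Write a Python function that parses ISO-8601 dates."}])

# Multi-prompt strategy: keep the conversation and refine the answer over several turns.
conversation = [{"role": "user",
                 "content": "Write a Python function that parses ISO-8601 dates."}]
conversation.append({"role": "assistant", "content": ask(conversation)})

# Re-prompting by supplying additional context.
conversation.append({"role": "user",
                     "content": "It must also handle timezone offsets like +05:30."})
conversation.append({"role": "assistant", "content": ask(conversation)})

# Re-prompting by narrowing the task scope.
conversation.append({"role": "user",
                     "content": "Only return the function body, with no explanation."})
final_answer = ask(conversation)
```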
The paper also notes that users often spent more time in the planning and understanding phases of interaction than in prompting the model or accepting its responses. Interestingly, experts were less likely than novices to accept LLM-generated code without modification, indicating that reliance on LLM outputs varies with user proficiency.
Human Enhancement Evaluation (RQ2)
The impact of LLMs on human performance was evaluated mainly through time-based productivity metrics and learning outcomes. The findings indicated significant gains in task completion rates and reductions in completion time, although complex tasks sometimes took longer because of the cognitive burden of understanding LLM outputs. Learning outcomes were positively influenced when assessed with pre-test and post-test designs, with studies showing that LLMs aided understanding of programming concepts.
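The survey does not prescribe a particular scoring method for pre-test/post-test designs; as one illustration, a normalized-gain calculation (Hake's gain) is a common way such designs quantify learning. The scores below are invented, purely to show the arithmetic.

```python
def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Hake's normalized gain: fraction of the possible improvement actually achieved."""
    if max_score == pre:          # participant already at ceiling; gain is undefined
        return float("nan")
    return (post - pre) / (max_score - pre)


# Illustrative (invented) pre-/post-test scores for a small treatment group.
participants = [(55, 80), (40, 70), (75, 85)]
gains = [normalized_gain(pre, post) for pre, post in participants]
print(f"mean normalized gain: {sum(gains) / len(gains):.2f}")  # ~0.49 for these scores
```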
Task Performance Evaluation (RQ3)
Task performance was assessed through the correctness, readability, and security of code produced with LLM assistance. The results varied considerably: LLM assistance often improved task-specific accuracy and, in some studies, security, yet other studies found that LLM-generated code exhibited security vulnerabilities and readability issues. The paper emphasizes the need for standardized evaluation metrics to assess the impact of LLMs on task performance comprehensively.
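As a point of reference, functional correctness is typically scored by running candidate code against test cases. The sketch below shows one minimal way to do this; the task, entry-point name, and tests are illustrative assumptions rather than a harness described by the survey.

```python
def run_tests(candidate_src: str, test_cases: list[tuple[tuple, object]]) -> float:
    """Execute LLM-generated source and report the fraction of test cases it passes."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        func = namespace["add"]          # assumed entry-point name for this example task
    except Exception:
        return 0.0                       # code that does not even load scores zero
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                         # runtime errors count as failures
    return passed / len(test_cases)


# Illustrative task: the LLM was asked to write `add(a, b)`.
generated = "def add(a, b):\n    return a + b\n"
tests = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]
print(f"correctness: {run_tests(generated, tests):.0%}")  # 100% for this candidate
```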
Interaction Effects on Human Enhancement and Task Performance (RQ4)
The paper reveals several insights into how interaction behaviors influence human performance and task outcomes. For instance, detailed prompts tend to yield better task performance, and experts using multi-prompt strategies generally achieve higher code correctness. Nonetheless, there is a clear need for further investigation into how specific interaction patterns quantitatively affect performance metrics.
Discussion and Future Work
The paper identifies the need for standardizing evaluation metrics for human-LLM interactions, emphasizing the non-deterministic nature of LLMs and the importance of effective prompting strategies. Future research should explore the impact of various interaction patterns on performance metrics and validate findings across different LLM models. The paper also underscores the educational potential of LLMs, suggesting that educators could incorporate in-class assessments alongside homework assignments involving LLM use to balance learning and academic integrity.
Conclusion
This comprehensive survey underscores the transformative potential of LLMs in programming tasks, while also identifying areas where users face challenges. By establishing standardized evaluation methods and refining interaction techniques, the paper provides a valuable roadmap for optimizing human-LLM collaboration. The insights gleaned from this paper can guide further research and development, ensuring that LLMs are effectively integrated into programming workflows to enhance both human capabilities and task performance.