Papers
Topics
Authors
Recent
2000 character limit reached

Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models (2509.25378v1)

Published 29 Sep 2025 in cs.SE

Abstract: Data science libraries, such as scikit-learn and pandas, specialize in processing and manipulating data. The data-centric nature of these libraries makes the detection of API misuse in them more challenging. This paper introduces DSCHECKER, an LLM-based approach designed for detecting and fixing API misuses of data science libraries. We identify two key pieces of information, API directives and data information, that may be beneficial for API misuse detection and fixing. Using three LLMs and misuses from five data science libraries, we experiment with various prompts. We find that incorporating API directives and data-specific details enhances Dschecker's ability to detect and fix API misuses, with the best-performing model achieving a detection F1-score of 61.18 percent and fixing 51.28 percent of the misuses. Building on these results, we implement Dschecker agent which includes an adaptive function calling mechanism to access information on demand, simulating a real-world setting where information about the misuse is unknown in advance. We find that Dschecker agent achieves 48.65 percent detection F1-score and fixes 39.47 percent of the misuses, demonstrating the promise of LLM-based API misuse detection and fixing in real-world scenarios.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.