Do agentic coding workflows mitigate known LLM library-related failures, or reproduce them at scale?
Determine whether agentic coding workflows built on large language models, such as those used by Claude Code, Cursor, Devin, Copilot, and OpenAI Codex when authoring pull requests, mitigate known LLM library-related failure modes (including hallucinated package names, deprecated API usage, and omission of version constraints) or whether these behaviors persist and are reproduced at scale.
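One of the failure modes named above, omitted version constraints, can be detected mechanically in agent-authored changes. The sketch below is a minimal, assumed heuristic (not the study's methodology): it scans a `requirements.txt`-style text for lines lacking any PEP 440 version specifier, which would also surface hallucinated package names for manual verification against the index.

```python
import re

# Illustrative heuristic: a requirement line without any version specifier
# is "unpinned". The regex covers the common PEP 440 comparison operators;
# this is a simplified sketch, not a full requirements-file parser.
SPECIFIER = re.compile(r"(===|==|~=|!=|<=|>=|<|>)")

def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that carry no version constraint."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank/comment-only lines
        if not SPECIFIER.search(line):
            flagged.append(line)
    return flagged

reqs = """\
requests>=2.31
numpy
left-pad  # a name worth checking against the package index
"""
print(unpinned_requirements(reqs))  # → ['numpy', 'left-pad']
```

A reviewer (human or automated) could run such a check over each agent-authored pull request to count how often version constraints are omitted.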
References
What remains unclear is whether agentic workflows will actually use external libraries—and if they do, whether this autonomy helps mitigate existing problems or simply reproduces them at scale.
— A Study of Library Usage in Agent-Authored Pull Requests
(arXiv:2512.11589, Twist, 12 Dec 2025), Section 1 (Introduction)