
Who is using AI to code? Global diffusion and impact of generative AI

Published 10 Jun 2025 in physics.soc-ph, cs.CY, and cs.SE | (2506.08945v1)

Abstract: Generative coding tools promise big productivity gains, but uneven uptake could widen skill and income gaps. We train a neural classifier to spot AI-generated Python functions in 80 million GitHub commits (2018-2024) by 200,000 developers and track how fast--and where--these tools take hold. By December 2024, AI wrote an estimated 30.1% of Python functions from U.S. contributors, versus 24.3% in Germany, 23.2% in France, 21.6% in India, 15.4% in Russia and 11.7% in China. Newer GitHub users use AI more than veterans, while male and female developers adopt at similar rates. Within-developer fixed-effects models show that moving to 30% AI use raises quarterly commits by 2.4%. Coupling this effect with occupational task and wage data puts the annual value of AI-assisted coding in the United States at $9.6-$14.4 billion, rising to $64-$96 billion if we assume higher estimates of productivity effects reported by randomized control trials. Moreover, generative AI prompts learning and innovation, leading to increases in the number of new libraries and library combinations that programmers use. In short, AI usage is already widespread but highly uneven, and the intensity of use, not only access, drives measurable gains in output and exploration.

Summary

  • The paper introduces a neural classifier built on GraphCodeBERT embeddings that distinguishes AI-generated from human-written Python functions, achieving an out-of-sample ROC AUC of 0.964.
  • It reveals large geographic disparities in AI adoption: by December 2024, AI wrote an estimated 30.1% of Python functions from U.S. contributors, versus 24.3% in Germany, 23.2% in France, 21.6% in India, 15.4% in Russia, and 11.7% in China.
  • Within-developer fixed-effects models show that moving to 30% AI use raises quarterly commits by 2.4%, which the authors translate into an annual value of $9.6-$14.4 billion for AI-assisted coding in the United States.

Global Diffusion and Impact of AI in Coding

Introduction

The paper investigates how widely generative AI tools are used for coding and how unevenly that use has spread. Using a large-scale empirical analysis of GitHub commits, it measures adoption across countries and developer demographics, quantifies the effect of AI use on productivity and innovation in software development, and estimates the economic value of those gains.

Generative AI Classifier Development

The core of the paper is a neural classifier that distinguishes AI-generated code from human-written code, applied to GitHub data covering 80 million commits (2018-2024) from 200,000 developers. The method combines GraphCodeBERT embeddings with a classifier head. The model is trained on Python functions from both human and AI-generated sources and reaches an out-of-sample ROC AUC of 0.964 and an average precision of 0.969.

Figure 1: Classifying code from functions written in the Python programming language as human or AI generated.
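The two reported metrics can be made concrete with a small sketch. The code below is not the paper's evaluation pipeline; it is a minimal pure-Python illustration of how ROC AUC and average precision are computed from classifier scores. The function names and the toy `labels` and `scores` are hypothetical, with label 1 assumed to mean "AI-generated".

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive example.

    Tied scores keep their input order; no special tie handling is done.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


# Toy data (hypothetical): 1 = AI-generated, 0 = human-written.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

ROC AUC summarizes how well the scores rank AI-generated functions above human-written ones regardless of threshold, while average precision weights the top of the ranking more heavily.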

Using the classifier, the paper tracks the share of AI-generated Python functions over time and across countries. It finds large disparities: by December 2024 the U.S. leads with an estimated 30.1% of functions written by AI, followed by Germany (24.3%), France (23.2%), and India (21.6%), while Russia (15.4%) and China (11.7%) lag.

Figure 2: Share of AI-generated Python functions over time, illustrating geographic differences in adoption.
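One way to turn per-function classifier outputs into adoption shares like these is to group functions by country and period and take the fraction flagged as AI-generated. The sketch below assumes a simple 0.5 probability threshold and a hypothetical record layout of `(country, quarter, probability)`; the paper's actual aggregation may differ.

```python
from collections import defaultdict


def ai_share_by_group(records, threshold=0.5):
    """Fraction of functions flagged as AI-generated per (country, quarter).

    `records` is an iterable of (country, quarter, ai_probability) tuples.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [flagged, total]
    for country, quarter, prob in records:
        bucket = counts[(country, quarter)]
        if prob >= threshold:
            bucket[0] += 1
        bucket[1] += 1
    return {key: flagged / total for key, (flagged, total) in counts.items()}


# Synthetic example records (not real data).
records = [
    ("US", "2024Q4", 0.91),
    ("US", "2024Q4", 0.12),
    ("US", "2024Q4", 0.77),
    ("US", "2024Q4", 0.05),
    ("DE", "2024Q4", 0.64),
    ("DE", "2024Q4", 0.20),
    ("DE", "2024Q4", 0.08),
    ("DE", "2024Q4", 0.33),
]

shares = ai_share_by_group(records)
```

Averaging the predicted probabilities instead of thresholding them is an equally plausible choice and is less sensitive to where the cutoff sits.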

Demographics and AI Use

Among US developers, newer GitHub users adopt AI tools at notably higher rates than veterans, while male and female developers adopt at similar rates. The absence of a gender gap runs counter to earlier studies that identified gender disparities in technology adoption, whereas user tenure is associated with substantial differences in adoption.

Figure 3: Intensity of AI use by gender and tenure among GitHub users in the US.

Impact on Productivity and Innovation

Generative AI use is associated with increased activity and a higher propensity for library exploration. Within-developer fixed-effects models indicate that moving to 30% AI use raises quarterly commits by 2.4%, alongside increases in the number of new libraries and library combinations that developers use. Combining this effect with occupational task and wage data, the authors put the annual value of AI-assisted coding in the United States at $9.6-$14.4 billion, rising to $64-$96 billion under the higher productivity effects reported by randomized control trials.

Figure 4: Effect of AI Share on library imports, commit activity, and AI adoption likelihood.
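The idea behind a within-developer fixed-effects model can be sketched in a few lines: subtract each developer's own averages from their observations, then regress the demeaned outcome on the demeaned AI share, so that stable differences between developers drop out. The code below is a minimal illustration on synthetic, noise-free data, assuming a single regressor and a log-linear outcome; the names `within_estimator`, `TRUE_BETA`, and the panel values are hypothetical, and the paper's actual specification likely includes further controls.

```python
import math
from collections import defaultdict


def within_estimator(panel):
    """Slope of y on x after removing each developer's mean of x and y.

    `panel` is an iterable of (developer_id, x, y) observations.
    """
    by_dev = defaultdict(list)
    for dev, x, y in panel:
        by_dev[dev].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in by_dev.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-developer variation in x")
    return sxy / sxx


# Synthetic panel: log(commits) = developer baseline + TRUE_BETA * ai_share.
TRUE_BETA = 0.1  # illustrative value only, not an estimate from the paper
baselines = {"dev_a": 2.0, "dev_b": 5.0}
ai_shares = {"dev_a": [0.0, 0.1, 0.3], "dev_b": [0.2, 0.4, 0.6]}
panel = [
    (dev, share, baselines[dev] + TRUE_BETA * share)
    for dev in baselines
    for share in ai_shares[dev]
]

beta_hat = within_estimator(panel)
# Implied proportional change in commits when AI share moves from 0 to 0.30.
effect_at_30 = math.exp(beta_hat * 0.30) - 1
```

Because the two synthetic developers have very different baselines, a pooled regression that ignores developer identity would confound baseline activity with AI share; demeaning within developer recovers the slope used to generate the data.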

Discussion

The paper concludes by discussing the broader implications of generative AI adoption in coding, suggesting potential barriers and avenues for further research. The findings provide pivotal insights that inform policymakers about the heterogeneous adoption of AI technologies and their tangible impact on economic productivity and innovation.

Conclusion

This extensive analysis of AI usage in coding unveils significant global disparities in adoption rates, driven by both regulatory and cultural influences. The study establishes the substantial productivity gains possible through AI integration while raising important considerations regarding equitable access and future technological adoption pathways. The insights provided form a foundational step toward understanding the nuanced implications of AI in coding, laying the groundwork for ongoing observation and policy adaptation.
