Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain (2309.08133v2)

Published 15 Sep 2023 in cs.CY

Abstract: "Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and systems that compose music and create videos. These systems behave differently and raise different legal issues. The second problem is that copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it: authorship, similarity, direct and indirect liability, fair use, and licensing, among much else. These issues cannot be analyzed in isolation, because there are connections everywhere. In this Article, we aim to bring order to the chaos. To do so, we introduce the generative-AI supply chain: an interconnected set of stages that transform training data (millions of pictures of cats) into generations (a new, potentially never-seen-before picture of a cat that has never existed). Breaking down generative AI into these constituent stages reveals all of the places at which companies and users make choices that have copyright consequences. It enables us to trace the effects of upstream technical designs on downstream uses, and to assess who in these complicated sociotechnical systems bears responsibility for infringement when it happens. Because we engage so closely with the technology of generative AI, we are able to shed more light on the copyright questions. We do not give definitive answers as to who should and should not be held liable. Instead, we identify the key decisions that courts will need to make as they grapple with these issues, and point out the consequences that would likely flow from different liability regimes.


Copyright Analysis of Generative AI Supply Chains

The forthcoming paper by Katherine Lee, A. Feder Cooper, and James Grimmelmann in the Journal of the Copyright Society of the U.S.A. offers a comprehensive exploration of the copyright implications of generative-AI supply chains. It examines how generative-AI systems, such as conversational chatbots and image and music generators, interact with U.S. copyright law. Rather than treating generative AI as a monolith, the authors present it as a multifaceted ecosystem and emphasize that its diverse technical architectures and processes raise distinct legal challenges. They address the key question, "Does generative AI infringe copyright?", not by giving definitive answers but by clarifying the complexities and identifying the critical legal decision points.

Generative AI Supply Chain

The authors propose a framework called the "generative-AI supply chain," which breaks generative AI into a sequence of stages: the creation of expressive works, data creation, dataset curation, model training, fine-tuning, system deployment, generation, and model alignment. This analytical lens highlights the decision points that carry legal significance and reveals where copyright infringement may occur. Each stage can involve copyrighted material, from the raw training data to the outputs users ultimately receive.

Copyright Law Intersection

The paper applies established U.S. copyright doctrines to each part of this supply chain. Key areas addressed include:

Authorship: Generative-AI outputs challenge conventional notions of authorship, which rest on the requirements of originality and fixation. Current legal frameworks do not recognize an AI system as an ‘author,’ which raises questions about who, if anyone, owns AI-generated works.
Exclusive Rights and Infringement: The authors examine how generative AI implicates the exclusive rights of reproduction, adaptation, distribution, performance, and display. Several stages could infringe these rights, particularly those involving training datasets and outputs that replicate copyrighted works.
Substantial Similarity and Copying: This section explores how substantial similarity between generative outputs and copyrighted works is evaluated, noting the challenges in quantifying influence and memorization in AI models.
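Indirect Liability: Participants in the supply chain, such as model trainers and system deployers, could face indirect (for example, contributory or vicarious) liability for infringement committed by others, depending on their involvement with and control over potentially infringing material.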

Legal Outcomes and Implications

The paper sketches possible outcomes for generative AI under copyright law, outlining regimes that range from complete liability for AI outputs to systems protected by robust fair-use defenses. This variability in legal treatment leaves practitioners and developers facing significant uncertainty. The authors also caution against oversimplified analogies and emphasize the need for detailed, case-by-case analysis given the complexity of AI technologies and their applications.

Practical and Theoretical Considerations

Theoretically, the analysis challenges foundational concepts of authorship, creativity, and the scope of exclusive rights. Practically, it suggests that courts will need a detailed understanding of how AI systems work and may need to revisit how fair use applies to transformative AI capabilities. The paper also suggests that the new forms of expression these systems enable might eventually call for tailored legal frameworks.

Future Directions

This paper lays the groundwork for future work on AI and copyright law. Its interdisciplinary approach points to further research on how generative-AI technologies can coexist with existing intellectual-property regimes, and the complexities it maps out offer a basis for dialogue among legal scholars, policymakers, and technology developers on building AI systems that comply with the law.