Legality of Training Large Language Models on Copyrighted Materials (US/EU)

Clarify whether the use of copyrighted materials to train large language models without explicit permission constitutes copyright infringement under European Union and United States law.

Background

The authors discuss the widespread practice of training LLMs on internet-scale corpora that include copyrighted works. They highlight uncertainty about whether such training is legal in the EU and the US, and contrast this with Japanese law, which explicitly includes provisions accommodating AI training.

This unresolved legal question has significant implications for both AI developers and the scientific community, because it determines whether scholarly literature and other copyrighted content may be used in training datasets.

References

Copyright laws in Europe and the USA are still unclear on whether this is an infringement, while in contrast, the Japanese law explicitly includes provisions accommodating AI training.

— What is the Role of Large Language Models in the Evolution of Astronomy Research? (2409.20252 - Fouesneau et al., 2024) in Section: Ethical and Legal Concerns — General Concerns
