Determine whether training LLMs on copyrighted material is fair use in the U.S.
Determine whether training large language models on copyrighted material constitutes fair use under U.S. copyright law. Resolving this would clarify the legal status of using copyrighted works during pretraining and establish the conditions under which such training is lawful or infringing.
References
In the U.S., whether training LLMs on copyrighted material is fair use remains uncertain and its legality will be determined by ongoing litigation.
— Hubble: a Model Suite to Advance the Study of LLM Memorization
(2510.19811 - Wei et al., 22 Oct 2025) in Section 2.1 (Copyright)