Transparency of AI Providers’ Data Retention and Training on Uploaded Legal Documents

Ascertain the extent to which major AI providers such as Google, Anthropic, and OpenAI actively store user-uploaded legal documents and use them for model training when their services are accessed through cloud computing and commercial APIs.

Background

Because lawyers have duties of confidentiality and many AI services rely on cloud infrastructure, the lack of transparency about providers’ data handling creates significant risks for legal practice.

The authors note both the business incentives for providers to use uploaded data and users' current inability to verify or control such practices, which motivates establishing the actual extent of data storage and training on user data.

References

While the extent to which companies like Google, Anthropic, or OpenAI are actively storing user data and training models on it is uncertain, there is no way to know or control what they are doing with uploaded documents.

Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification (2504.01349 - Koenecke et al., 2 Apr 2025) in Challenge 1: Data Curation, paragraph on confidentiality and cloud computing/APIs