Microsoft's AI research team accidentally exposed 38 terabytes of private data, including a disk backup of two employees' workstations and over 30,000 internal Microsoft Teams messages, while publishing open-source training data on GitHub. The exposure was caused by a misconfigured Shared Access Signature (SAS) token, an Azure Storage feature used to share data.
The case highlights the risks organizations face when handling large volumes of AI training data and underscores the need for additional security checks and safeguards. The exposed data included sensitive personal information, passwords to Microsoft services, and secret keys; the misconfiguration also created the potential for malicious code to be injected into AI models.
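One safeguard is to scope SAS tokens as narrowly as possible: limit them to a single blob, grant read-only permission, and set a short expiry, rather than issuing broad account-level tokens. Below is a minimal sketch of how such a token might be generated with the azure-storage-blob Python SDK; the account, container, and blob names are placeholders, not details from the incident.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholder values for illustration only.
ACCOUNT_NAME = "exampleresearchdata"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER_NAME = "public-models"
BLOB_NAME = "model-weights/checkpoint.bin"

# Scope the token to a single blob, read-only, expiring in one hour,
# instead of an account-level SAS with full permissions and a distant expiry.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER_NAME,
    blob_name=BLOB_NAME,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The shareable URL combines the blob endpoint with the SAS query string.
url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/"
    f"{CONTAINER_NAME}/{BLOB_NAME}?{sas_token}"
)
print(url)
```

A token like this grants access only to the named blob and expires quickly, so even if the URL leaks, the blast radius is far smaller than that of a broadly scoped, long-lived token.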