Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 92 TPS
Gemini 2.5 Pro 50 TPS Pro
GPT-5 Medium 32 TPS
GPT-5 High 30 TPS Pro
GPT-4o 67 TPS
GPT OSS 120B 452 TPS Pro
Kimi K2 190 TPS Pro
2000 character limit reached

On the Readiness of Scientific Data for a Fair and Transparent Use in Machine Learning (2401.10304v2)

Published 18 Jan 2024 in cs.LG, cs.AI, and cs.DL

Abstract: To ensure the fairness and trustworthiness of ML systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, academic institutions' adoption of these practices has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this broader scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their completeness, coverage of the requested dimensions, and trends in recent years. We focus on the most and least documented dimensions and compare the results with those of an ML-focused venue (NeurIPS D&B track) publishing papers describing datasets. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data's preparedness for its transparent and fairer use in ML technologies.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube