
An investigation of licensing of datasets for machine learning based on the GQM model (2303.13735v1)

Published 24 Mar 2023 in cs.SE, cs.CY, and cs.LG

Abstract: Dataset licensing is currently an issue in the development of machine learning systems. Publicly available datasets are the most widely used in developing these systems. However, since the images in publicly available datasets are mainly obtained from the Internet, some images are not available for commercial use. Furthermore, developers of machine learning systems often pay little attention to a dataset's license when training models with it. In summary, the licensing of datasets for machine learning systems is incomplete in every respect at this stage. Our investigation of two dataset collections revealed that most current datasets lack licenses, and that the lack of a license makes it impossible to determine whether a dataset can be used commercially. Therefore, we decided to take a more scientific and systematic approach to investigating the licensing of datasets and of the machine learning systems that use them, to make licensing easier and compliance more attainable for future developers of machine learning systems.

