Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era (2305.02555v2)
Abstract: With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is yes. Large AI tools, such as LLMs, continually require more and higher-quality data to improve, but current copyright laws limit their access to many types of data. Sharing revenue between AI tools and their data providers could transform the current hostile, zero-sum relationship between AI tools and the majority of copyrighted-data owners into a collaborative and mutually beneficial one. Such collaboration is necessary to create a virtuous cycle among AI tools, their users, and data providers that drives AI technology forward and builds a healthy AI ecosystem. However, existing revenue-sharing business models will not work for AI tools in the forthcoming AI era, because the metrics most widely used for website-based traffic and actions, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of the AI tools themselves and easy to explain to data providers, therefore needs to establish a prompt-based scoring system that measures the data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers of AI tools based on classification and content-similarity models, and outlines the requirements that AI tools or third parties must meet to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era in which all parties benefit.
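To make the idea of a prompt-based, content-similarity scoring system concrete, the following minimal sketch illustrates one possible realization. It is not the paper's prescribed formula: the choice of the sentence-transformers library, the "all-MiniLM-L6-v2" checkpoint, the similarity threshold, and the proportional revenue split are all illustrative assumptions; the provider names and documents are hypothetical.

```python
# Hypothetical sketch of per-prompt data-engagement scoring via content similarity.
# Assumes the `sentence-transformers` package; the scoring rule (thresholded cosine
# similarity, renormalized, then a proportional revenue split) is illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Example corpus: one representative text per (hypothetical) data provider.
provider_docs = {
    "provider_a": "Financial news articles covering equity markets.",
    "provider_b": "Peer-reviewed papers on protein folding.",
    "provider_c": "Community-written cooking recipes and tips.",
}

def engagement_scores(prompt: str, threshold: float = 0.2) -> dict[str, float]:
    """Score each provider's engagement for a single prompt.

    Cosine similarity between the prompt and each provider's content is
    reduced by `threshold`, clipped at zero, and renormalized to sum to 1.
    """
    names = list(provider_docs)
    doc_emb = model.encode(list(provider_docs.values()), convert_to_tensor=True)
    prompt_emb = model.encode(prompt, convert_to_tensor=True)
    sims = util.cos_sim(prompt_emb, doc_emb)[0]  # one similarity per provider
    raw = {n: max(float(s) - threshold, 0.0) for n, s in zip(names, sims)}
    total = sum(raw.values())
    return {n: (v / total if total > 0 else 0.0) for n, v in raw.items()}

def split_revenue(prompt: str, revenue_per_prompt: float) -> dict[str, float]:
    """Allocate the revenue attributed to one prompt across providers."""
    return {n: s * revenue_per_prompt
            for n, s in engagement_scores(prompt).items()}

if __name__ == "__main__":
    print(split_revenue("How do interest rate hikes affect bank stocks?", 0.01))
```

In a full system, the per-prompt scores would be aggregated over all prompts in a billing period and combined with classification models that route each prompt to relevant data domains, keeping the scoring pipeline largely independent of any particular AI tool, as the abstract argues.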