
Data-Centric Artificial Intelligence (2212.11854v4)

Published 22 Dec 2022 in cs.AI

Abstract: Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems. The objective of this article is to introduce practitioners and researchers from the field of Information Systems (IS) to data-centric AI. We define relevant terms, provide key characteristics to contrast the data-centric paradigm to the model-centric one, and introduce a framework for data-centric AI. We distinguish data-centric AI from related concepts and discuss its longer-term implications for the IS community.

