
Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation (2403.08257v1)

Published 13 Mar 2024 in cs.DB

Abstract: We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to combine their efforts to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program $P_{AF}$ whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example that introduces both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open-source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
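
To make the accept/reject/ambiguous outcome concrete, the following is a minimal sketch, not the authors' tool. It assumes conflicting updates are given as a plain attack relation and computes the grounded labeling of the AF, which corresponds to the well-founded reading mentioned in the abstract. The update names (`u1` to `u5`), the attack pairs, and the function name `grounded_labeling` are all hypothetical and chosen only for illustration.

```python
def grounded_labeling(arguments, attacks):
    """Label each argument 'accepted', 'rejected', or 'undecided'.

    arguments: iterable of argument names (here: hypothetical update ids).
    attacks:   iterable of (attacker, target) pairs.
    """
    arguments = list(arguments)
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    label = {}
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in label:
                continue
            # Accepted: every attacker is already rejected (or there is none).
            if all(label.get(b) == "rejected" for b in attackers[a]):
                label[a] = "accepted"
                changed = True
            # Rejected: at least one attacker is already accepted.
            elif any(label.get(b) == "accepted" for b in attackers[a]):
                label[a] = "rejected"
                changed = True
    # Whatever never gets a label stays ambiguous.
    return {a: label.get(a, "undecided") for a in arguments}


# Hypothetical example: u1 overrides u2, u2 overrides u3,
# and u4 and u5 contradict each other.
updates = ["u1", "u2", "u3", "u4", "u5"]
conflicts = [("u1", "u2"), ("u2", "u3"), ("u4", "u5"), ("u5", "u4")]
labels = grounded_labeling(updates, conflicts)
for name in updates:
    print(name, labels[name])
```

In this toy input the unattacked update is accepted, the update it attacks is rejected, and the mutually conflicting pair stays undecided, which is the kind of remaining ambiguity the abstract says is exposed to users. A stable-semantics analysis would instead enumerate the alternative ways of resolving such a pair; that step is not shown here.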
