Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Toward a view-based data cleaning architecture (1910.11040v1)

Published 24 Oct 2019 in cs.DB

Abstract: Big data analysis has become an active area of study with the growth of machine learning techniques. To properly analyze data, it is important to maintain high-quality data. Thus, research on data cleaning is also important. It is difficult to automatically detect and correct inconsistent values for data requiring expert knowledge or data created by many contributors, such as integrated data from heterogeneous data sources. An example of such data is metadata for scientific datasets, which should be confirmed by data managers while handling the data. To support the efficient cleaning of data by data managers, we propose a data cleaning architecture in which data managers interactively browse and correct portions of data through views. In this paper, we explain our view-based data cleaning architecture and discuss some remaining issues.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Toshiyuki Shimizu (4 papers)
  2. Hiroki Omori (1 paper)
  3. Masatoshi Yoshikawa (45 papers)