
Augmented Understanding and Automated Adaptation of Curation Rules (2007.08710v1)

Published 17 Jul 2020 in cs.IR

Abstract: Over the past years, there have been many efforts to curate raw data and increase its added value. Data curation has been defined as the activities and processes an analyst undertakes to transform raw data into contextualized data and knowledge. Data curation enables decision-makers and data analysts to extract value and derive insight from raw data. However, to curate raw data, an analyst needs to carry out various curation tasks, including extraction, linking, classification, and indexing, which are error-prone, tedious, and challenging. Besides, deriving insight requires analysts to spend a long time scanning and analyzing the curation environment. This problem is exacerbated when the curation environment is large and the analyst needs to curate a varied and comprehensive list of data. To address these challenges, in this dissertation we present techniques, algorithms, and systems for augmenting analysts in curation tasks. We (1) propose a feature-based, automated technique for curating raw data; (2) propose an autonomic approach for adapting data curation rules; (3) provide a solution to augment users in formulating their preferences while curating data in large-scale information spaces; and (4) implement a set of APIs for automating basic curation tasks, including named entity extraction, part-of-speech (POS) tagging, and classification.
