
CLX: Towards verifiable PBE data transformation

Published 2 Mar 2018 in cs.DB | (1803.00701v4)

Abstract: Effective data analytics on data collected from the real world usually begins with a notoriously expensive pre-processing step of data transformation and wrangling. Programming By Example (PBE) systems have been proposed to automatically infer transformations from simple examples that users provide as hints. However, an important usability issue - verification - limits the effective use of such PBE data transformation systems, since the verification process is often laborious and unreliable. We propose CLX (pronounced "clicks"), a data transformation paradigm designed to facilitate verification for end users in PBE-like data transformation. CLX performs pattern clustering on both input and output data, which allows the user to verify at the pattern level rather than the data instance level, without having to write any regular expressions, thereby significantly reducing user verification effort. CLX then automatically generates transformation programs as regular-expression replace operations that are easy for average users to verify. We experimentally compared the CLX prototype with both FlashFill, a state-of-the-art PBE data transformation tool, and Trifacta, an influential system supporting interactive data transformation. The results show improvements over the state-of-the-art tools in saving user verification effort, without loss of efficiency or expressive power. In a user effort study on data sets of various sizes, when the data size grew by a factor of 30, the user verification time required by the CLX prototype grew by 1.3x, whereas that required by FlashFill grew by 11.4x. In another test assessing users' understanding of the transformation logic - a key ingredient in effective verification - CLX users achieved a success rate about twice that of FlashFill users.
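
To make the two ideas in the abstract concrete, the following is a minimal Python sketch of pattern-level clustering and of a transformation expressed as a regular-expression replace. It is an illustration only, not the CLX implementation: the token notation (`<D>n`, `<L>n`), the function names `token_pattern`, `cluster_by_pattern`, and `to_target`, and the phone-number data are all assumptions introduced here.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: runs of ASCII digits, runs of ASCII letters,
# or any other single character (kept literally in the pattern).
TOKEN = re.compile(r"(?P<d>[0-9]+)|(?P<l>[A-Za-z]+)|(?P<o>.)", re.DOTALL)


def token_pattern(value: str) -> str:
    """Abstract a string into a coarse pattern such as '<D>3.<D>3.<D>4'."""
    parts = []
    for m in TOKEN.finditer(value):
        if m.lastgroup == "d":
            parts.append(f"<D>{len(m.group())}")
        elif m.lastgroup == "l":
            parts.append(f"<L>{len(m.group())}")
        else:
            parts.append(m.group())
    return "".join(parts)


def cluster_by_pattern(values):
    """Group values by pattern so a user can inspect patterns, not rows."""
    clusters = defaultdict(list)
    for v in values:
        clusters[token_pattern(v)].append(v)
    return dict(clusters)


def to_target(value: str) -> str:
    """One regex-replace rule (assumed for illustration): dotted phone
    numbers are rewritten into the '(xxx) xxx-xxxx' layout."""
    return re.sub(r"^(\d{3})\.(\d{3})\.(\d{4})$", r"(\1) \2-\3", value)


# Made-up sample data in two formats.
data = ["(734) 645-8397", "734.645.8397", "313.555.0100"]
clusters = cluster_by_pattern(data)
```

In this sketch, a user would review the two keys of `clusters` instead of every row, and the replace rule in `to_target` is short enough to read and check directly, which is the kind of verification the abstract argues for.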
