Reliable Querying of Very Large, Fast Moving and Noisy Predicted Interaction Data using Hierarchical Crowd Curation

Published 6 Jun 2016 in cs.DB | (1606.01957v1)

Abstract: The abundance of predicted and mined but uncertain biological data creates a great need for massive, efficient, and scalable curation efforts. The human expertise that any successful curation enterprise requires is often economically prohibitive, especially for speculative end-user queries that may not ultimately bear fruit. The challenge therefore remains to devise a low-cost engine capable of delivering fast but tentative annotation and curation of a set of data items, which experts can authoritatively validate later, requiring only a small investment. The aim is thus to make a large volume of predicted data available for use as early as possible, with an acceptable degree of confidence in its accuracy, while curation continues. In this paper, we present a novel approach to the annotation and curation of biological database contents using crowd computing. The technical contribution lies in the identification and management of the trust of mechanical turks and in support for ad hoc declarative queries, both of which are leveraged to support reliable analytics using noisy predicted interactions.