Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 56 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 25 tok/s Pro

GPT-5 High 25 tok/s Pro

GPT-4o 75 tok/s Pro

Kimi K2 183 tok/s Pro

GPT OSS 120B 434 tok/s Pro

Claude Sonnet 4.5 35 tok/s Pro

2000 character limit reached

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning (2011.04883v1)

Published 10 Nov 2020 in cs.CL and cs.AI

Abstract: Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine or user-generated question and a crowd-sourced response from a social media user, we determine if the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model which determines the plausibility of the question, followed by a multi-task model which evaluates the plausibility of the response as well as extracts answers (Question Plausibility AUROC=0.75, Response Plausibility AUROC=0.78, Answer Extraction F1=0.665).

Citations (5)

View on Semantic Scholar