
Crowdsourcing Multiple Choice Science Questions (1707.06209v1)

Published 19 Jul 2017 in cs.HC, cs.AI, cs.CL, and stat.ML

Abstract: We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (Dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data to existing questions, we observe accuracy improvements on real science exams.

Authors (3)
  1. Johannes Welbl (20 papers)
  2. Nelson F. Liu (19 papers)
  3. Matt Gardner (57 papers)
Citations (366)

Summary

  • The paper introduces a crowdsourcing approach for generating domain-specific science multiple-choice questions.
  • It employs a two-step method combining text filtering and trained distractor models to create plausible and diverse questions.
  • The resulting SciQ dataset and evaluation reveal both the potential and current limitations of NLP models in science exam applications.

Analysis of "Crowdsourcing Multiple Choice Science Questions"

The paper "Crowdsourcing Multiple Choice Science Questions" by Johannes Welbl, Nelson F. Liu, and Matt Gardner introduces a methodology for generating domain-specific, high-quality multiple-choice questions through crowdsourcing. This research targets the creation of science exam questions, a domain that presents unique challenges due to its reliance on both specialized knowledge and the integration of information extraction, reading comprehension, and reasoning capabilities.

Summary of Methodology

The authors present a two-step process for question generation. First, relevant in-domain text is selected from a corpus of science study texts. A document filter narrows this corpus to passages likely to yield meaningful questions, and crowd workers write questions against the selected passages. Grounding question writing in source text helps balance originality against relevance and diversity in the resulting questions.
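
The paper describes the document filter at a high level; as a rough illustration, the sketch below ranks candidate passages by TF-IDF similarity to a small seed set of existing exam questions. The function name, the max-similarity scoring rule, and the scikit-learn setup are assumptions made for this example, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_passages(passages, seed_questions, top_k=100):
    """Return the top_k passages most lexically similar to the seed questions."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on both collections so passages and questions share one vocabulary.
    matrix = vectorizer.fit_transform(passages + seed_questions)
    passage_vecs = matrix[: len(passages)]
    question_vecs = matrix[len(passages):]
    # Score each passage by its maximum similarity to any seed question.
    scores = cosine_similarity(passage_vecs, question_vecs).max(axis=1)
    ranked = sorted(zip(scores, passages), key=lambda x: x[0], reverse=True)
    return [p for _, p in ranked[:top_k]]

# Toy usage: the photosynthesis passage outranks the boilerplate one.
passages = ["Photosynthesis converts light energy into chemical energy.",
            "The publisher thanks the reviewers for their comments."]
seeds = ["What process do plants use to convert light into chemical energy?"]
print(rank_passages(passages, seeds, top_k=1))
```

In practice a filter of this kind would run over the full textbook corpus, and only the top-ranked passages would be shown to crowd workers.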

In the second step, crowd workers turn each question into a multiple-choice item by selecting plausible distractors. To guide this step, the authors introduce a distractor model trained on a set of real-world science questions. The model proposes candidate distractors based on linguistic and contextual features, helping workers produce answer options that are challenging yet coherent.
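
The paper's distractor model is trained on real science questions with a richer feature set; the toy sketch below only illustrates the general shape of a feature-based distractor ranker. The two features (string similarity to the correct answer and length difference), the tiny training set, and the logistic-regression classifier are simplifying assumptions, not the authors' model.

```python
from difflib import SequenceMatcher
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(candidate, correct_answer):
    """Two toy features: surface similarity and length difference."""
    sim = SequenceMatcher(None, candidate, correct_answer).ratio()
    length_diff = abs(len(candidate) - len(correct_answer))
    return [sim, length_diff]

# Toy training data: (candidate, correct_answer, label) triples, where
# label 1 marks a plausible distractor and 0 an implausible one.
train = [
    ("mitochondria", "chloroplast", 1),
    ("respiration", "photosynthesis", 1),
    ("banana", "chloroplast", 0),
    ("yesterday", "photosynthesis", 0),
]
X = np.array([features(c, a) for c, a, _ in train])
y = np.array([label for _, _, label in train])
model = LogisticRegression().fit(X, y)

# Rank new candidate distractors for a given correct answer.
candidates = ["nucleus", "Tuesday", "ribosome"]
scores = model.predict_proba(
    np.array([features(c, "mitochondrion") for c in candidates]))[:, 1]
for c, s in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{c}: {s:.2f}")
```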

Dataset and Evaluation

The process yields SciQ, a dataset of 13,679 crowdsourced science questions. Notably, the dataset is released in both a multiple-choice format and a direct-answer format; in the latter case, questions are accompanied by their supporting passages to enable answer retrieval.
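
For readers who want to work with the release, the following sketch loads one JSON split and reassembles a shuffled multiple-choice item. The field names (question, correct_answer, distractor1-3, support) follow the public JSON release but should be verified against your copy, and the file path is hypothetical.

```python
import json
import random

def load_sciq(path):
    """Load one split of the SciQ JSON release (e.g. train.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def to_multiple_choice(example, rng=random):
    """Shuffle the correct answer in among the three distractors."""
    options = [example["correct_answer"], example["distractor1"],
               example["distractor2"], example["distractor3"]]
    rng.shuffle(options)
    return {"question": example["question"],
            "options": options,
            "answer": example["correct_answer"],
            "support": example.get("support", "")}

# Example usage (hypothetical path to a downloaded copy):
# train = load_sciq("SciQ/train.json")
# print(to_multiple_choice(train[0]))
```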

The paper evaluates the dataset's quality by benchmarking existing models and comparing them against human performance. The results indicate that while current neural readers, such as the Attention Sum Reader (AS Reader) and the Gated-Attention Reader (GA Reader), perform reasonably, they do not surpass a traditional information retrieval baseline built on Lucene. This suggests that at this dataset size neural models hold no clear advantage over simpler retrieval methods, leaving room for better-tuned or augmented approaches.
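
To give a concrete sense of why retrieval is a strong baseline here, the sketch below scores each answer option by how well the question plus that option matches the supporting passage under TF-IDF similarity. This mirrors the spirit of an IR baseline but is not the paper's Lucene configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def ir_baseline(question, options, support):
    """Pick the option whose combination with the question best matches the passage."""
    queries = [f"{question} {opt}" for opt in options]
    vec = TfidfVectorizer(stop_words="english").fit(queries + [support])
    scores = cosine_similarity(vec.transform(queries),
                               vec.transform([support])).ravel()
    return options[int(scores.argmax())]

question = "What organelle carries out photosynthesis in plant cells?"
options = ["chloroplast", "mitochondrion", "nucleus", "ribosome"]
support = ("The chloroplast is the organelle in plant cells where "
           "photosynthesis takes place.")
print(ir_baseline(question, options, support))  # expected: chloroplast
```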

Implications and Future Research

The implications of this work are significant for training NLP models on domain-specific tasks. By adding SciQ to the training data of existing models, the authors report improved accuracy on real science exam questions, indicating that the crowdsourced questions transfer usefully to the target exam domain.

Future research could improve distractor prediction so that suggested distractors align more consistently with question semantics. Fully automatic question generation, for example via refined negative sampling strategies, could further broaden the applicability of the method.

The dataset also lends itself to multi-task learning setups that combine cross-domain question-answering datasets, offering a testbed for studying how models balance precision and generalization across application areas.

In conclusion, the paper contributes a carefully designed crowdsourcing methodology and a sizeable supporting dataset, both of which can advance NLP capabilities in scientific domains. With continued refinement of the filtering and distractor models, the approach could benefit both AI applications in education and automated question answering in specialized fields.