
Exploring Answer Information Methods for Question Generation with Transformers (2312.03483v1)

Published 6 Dec 2023 in cs.CL and cs.LG

Abstract: There has been a lot of work on question generation in which different methods of providing the target answer as input have been employed. This experimentation has mostly been carried out for RNN-based models. We use three different methods and their combinations for incorporating answer information and explore their effect on several automatic evaluation metrics. The methods used are answer prompting, a custom product method that combines answer embeddings with encoder outputs, selecting sentences from the input paragraph that contain answer-related information, and a separate cross-attention block in the decoder that attends to the answer. We observe that answer prompting without any additional methods obtains the best ROUGE and METEOR scores. Additionally, we use a custom metric to calculate how many of the generated questions have the same answer as the answer used to generate them.


Summary

  • The paper demonstrates that direct answer prompting outperforms other integration methods for question generation.
  • It evaluates transformer-based models, using BART on the SQuAD dataset, to compare answer prompting (AP), an answer-embedding product method (CP), and answer-aware attention (AA) techniques.
  • Findings indicate that the simpler answer prompting approach yields strong question quality, guiding future improvements in QG models.

Introduction

Question generation (QG) is a critical task in numerous domains, such as educational assessment, information retrieval, and conversational AI. It involves creating questions from various input types, including passages of text, images, or structured data. While recent work has focused on text-based QG, the central challenge remains understanding the input's context well enough to formulate coherent and relevant questions. Here, transformer-based models, particularly for generating questions from text, have shown promising results.

Background and Problem Definition

Previous works have primarily utilized recurrent neural network (RNN) models, including LSTM and GRU, as well as transformer models like BART and T5 for text-based QG. Innovations include leveraging linguistic features and external knowledge bases to refine the generated questions. However, the way answer information is integrated into the question generation process varies. The paper examines the impact of different answer integration methods on the quality of questions produced by a specific transformer model, BART. Furthermore, it explores question generation in two contexts: one that is based on provided answers (answer-aware) and one without explicit answer cues (answer-agnostic).

Methodology and Experimental Design

The paper builds on the SQuAD dataset, a collection of question-answer pairs used to train and benchmark QG models. The authors experiment with three main techniques to embed answer information into the question generation process:

  1. Answer Prompting (AP), where the answer is directly provided to the model as part of the input sequence (see the sketch after this list).
  2. Answer Embeddings and Encoder Output Products (CP), where an encoder's output is modulated by the answer through a product operation and subsequently used to inform the decoder.
  3. Answer-Aware Attention Mechanisms (AA), where a separate decoder attention block is dedicated to the answer embeddings.
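
For illustration, the snippet below sketches how answer prompting (item 1) could be set up with a pretrained BART model from the Hugging Face transformers library. The separator-based input format and the facebook/bart-base checkpoint are assumptions made for this sketch rather than the paper's exact configuration, and in practice a checkpoint fine-tuned for question generation would be needed to obtain sensible questions.

```python
from transformers import BartForConditionalGeneration, BartTokenizerFast

# Illustrative checkpoint; a QG fine-tuned BART would be used in practice.
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

context = (
    "The Amazon rainforest covers much of the Amazon basin of South America, "
    "spanning an area of 5,500,000 square kilometres."
)
answer = "5,500,000 square kilometres"

# Answer prompting (AP): prepend the target answer to the context, separated by
# the tokenizer's separator token, so the encoder sees the answer explicitly.
source = f"{answer} {tokenizer.sep_token} {context}"
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)

# Decode a question conditioned on the answer-prompted input.
output_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```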

Additionally, these strategies are tested in combination with one another and with a 'related sentences' approach (RS), in which only the sentences containing the answer are fed to the model. The performance of each approach is measured with automatic evaluation metrics, namely ROUGE-L and METEOR, complemented by an accuracy check in which a question answering model verifies whether the generated questions indeed lead back to the original answers; a sketch of such a check follows.
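
The answer-consistency check can be approximated as follows: an extractive QA model answers each generated question against its source paragraph, and the predicted span is compared with the answer that was used to generate the question. The QA checkpoint and the normalised exact-match rule below are illustrative assumptions; the specific QA model and matching rule used in the paper may differ.

```python
import re
import string

from transformers import pipeline

# Illustrative extractive QA model for checking generated questions.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def normalize(text: str) -> str:
    """SQuAD-style normalisation: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def answer_consistency(examples) -> float:
    """Fraction of generated questions whose QA-predicted answer matches the
    answer used to generate them. Each example is a dict with keys
    'context', 'question' (generated), and 'answer' (the target answer)."""
    hits = 0
    for ex in examples:
        pred = qa(question=ex["question"], context=ex["context"])["answer"]
        hits += int(normalize(pred) == normalize(ex["answer"]))
    return hits / len(examples)
```

A stricter variant could replace exact match with token-level F1 between the predicted and target answers.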

Results and Analysis

The findings reveal that the Answer Prompting method outperformed the other strategies on the chosen metrics. Combining methods yielded mixed results: in some cases a combination slightly improved over a single method, but notably the combination of AP and CP, with or without RS, showed a minor decrease in performance compared to AP alone. These results highlight the importance of answer representation and positioning in generating high-quality questions, especially in transformer-based architectures, which are sensitive to the input structure.

In conclusion, this paper offers valuable insights into the optimal utilization of answer information within question generation models. The indication that straightforward answer prompting provides the best results simplifies the process and allows future research to build upon a more refined baseline. Future work could extend these findings to diverse transformer models and investigate the relationship between model architecture and the efficacy of answer information techniques.
