TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (1911.04118v2)

Published 11 Nov 2019 in cs.CL

Abstract: We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%, obtained in very recent work. We empirically show that TANDA generates more stable and robust models reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TANDA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TANDA in an industrial setting, using domain specific datasets subject to different types of noise.

An Examination of TANDA: Transfer and Adaptation Methodology in Transformer Models for Answer Selection

The paper "TANDA: Transfer and Adapt Pre-Trained Transformers for Answer Selection" by Garg, Vu, and Moschitti presents a novel methodology for finetuning Transformer models in the context of the Answer Sentence Selection (AS2) task in Question Answering (QA). The proposed method, named TANDA, addresses key challenges in the fine-tuning process, specifically the stability of model performance and the adaptation to domain-specific data.

Key Components and Methodology

The primary contribution of the paper is the introduction of a dual-phase fine-tuning approach leveraging pre-trained Transformers like BERT and RoBERTa. The TANDA methodology incorporates:

  1. Transfer Phase: This involves the initial fine-tuning of a pre-trained Transformer on a large-scale, general-purpose AS2 dataset named ASNQ (Answer Sentence Natural Questions). This step aims to adapt the Transformer’s semantic capabilities specifically to the AS2 task.
  2. Adaptation Phase: Subsequent fine-tuning is performed on a domain-specific dataset to tailor the model to the types of questions and answers encountered in the target application, specializing it for more focused content. A minimal code sketch of the two-step recipe follows this list.
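
The sketch below illustrates the two-step recipe using the Hugging Face Transformers library. The CSV paths, column names, and hyper-parameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the TANDA two-step fine-tuning recipe (transfer, then adapt).
# File names ("asnq_pairs.csv", "domain_pairs.csv") and hyper-parameters are hypothetical;
# each CSV is assumed to have "question", "sentence", and "label" (0/1) columns.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "roberta-base"  # the paper uses BERT/RoBERTa encoders
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def encode(batch):
    # AS2 is cast as binary classification over (question, candidate sentence) pairs
    return tokenizer(batch["question"], batch["sentence"],
                     truncation=True, padding="max_length", max_length=128)

def finetune(model, csv_path, out_dir, lr):
    ds = load_dataset("csv", data_files=csv_path)["train"].map(encode, batched=True)
    args = TrainingArguments(output_dir=out_dir, learning_rate=lr,
                             per_device_train_batch_size=32, num_train_epochs=2)
    Trainer(model=model, args=args, train_dataset=ds).train()
    return model

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
# Step 1 (transfer): fine-tune on a large, general AS2 dataset (ASNQ in the paper)
model = finetune(model, "asnq_pairs.csv", "out/transfer", lr=2e-5)
# Step 2 (adapt): continue fine-tuning on the smaller target-domain dataset
model = finetune(model, "domain_pairs.csv", "out/adapt", lr=1e-5)
```

In the paper, the transfer step on ASNQ is performed once; the resulting checkpoint is then adapted separately to each target dataset.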

Experimental Setup and Results

The authors evaluate TANDA on the widely used WikiQA and TREC-QA benchmarks. The empirical results are significant, with the proposed methodology establishing new state-of-the-art performance. Specifically, TANDA achieves Mean Average Precision (MAP) scores of 92.0% on WikiQA and 94.3% on TREC-QA, well above the previous best results of 83.4% and 87.5%, respectively.
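
For reference, the following is a generic illustration (not the authors' evaluation code) of how MAP is computed in the AS2 setting: candidate sentences are ranked per question, and precision is averaged at the rank of each correct answer.

```python
# Mean Average Precision (MAP) for answer sentence selection.
def average_precision(scores, labels):
    # Rank candidates for one question by model score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:                     # a correct answer sentence
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(per_question):
    # per_question: list of (scores, labels) pairs, one entry per question
    return sum(average_precision(s, l) for s, l in per_question) / len(per_question)

# Example: one question with three candidates, the correct one ranked second -> AP = 0.5
print(average_precision([0.9, 0.8, 0.1], [0, 1, 0]))
```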

Further testing on real-world datasets derived from Alexa interactions illustrates TANDA's applicability in industrial settings. Despite the noise present in these practical datasets, TANDA maintains strong performance, underscoring its resilience to noisy training data.

Implications and Future Directions

TANDA provides substantial improvements in fine-tuning stability and efficiency. Decoupling the fine-tuning process into two distinct phases allows for greater flexibility and reduced computational cost, since the expensive transfer step can be performed once and reused. Because the transfer phase already aligns the model with the AS2 task, the adaptation phase is more resilient to hyper-parameter variation and noise, facilitating the practical deployment of Transformer models across diverse domains.

The introduction of ASNQ as a comprehensive, large-scale dataset for the AS2 task is another essential contribution, potentially serving as a benchmark for future research. It aids in addressing the scarcity of quality datasets that have traditionally limited advancements in QA tasks.
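
As a rough illustration of how such a resource can be derived from Natural Questions-style annotations, the sketch below labels each document sentence by whether it belongs to the annotated long answer and contains the short answer span. This is a simplified, hypothetical reconstruction with illustrative field names, not the authors' actual ASNQ pipeline.

```python
# Simplified sketch: building AS2 (question, sentence, label) pairs from an
# NQ-style example. Field names ("question", "long_answer", "short_answer",
# "document_sentences") are illustrative, not the real NQ/ASNQ schema.
def build_as2_pairs(example):
    pairs = []
    for sentence in example["document_sentences"]:
        # Positive: the sentence is part of the long answer and contains
        # the short answer span; all other sentences become negatives.
        positive = (sentence in example["long_answer"]
                    and example["short_answer"] in sentence)
        pairs.append((example["question"], sentence, int(positive)))
    return pairs
```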

Looking forward, the principles demonstrated by TANDA may extend to other NLP tasks, such as textual entailment or sentiment classification. The methodology sets a precedent for combining large-scale, task-specific pre-training with domain adaptation, which could further enhance the adaptability and robustness of Transformer models in new and complex NLP applications.

The paper presents substantial quantitative evidence supporting TANDA's efficacy, alongside insights into its robustness across varying domains, paving the way for future work on efficiently adapting large pre-trained language models to specialized NLP tasks.

Authors (3)
  1. Siddhant Garg (23 papers)
  2. Thuy Vu (13 papers)
  3. Alessandro Moschitti (48 papers)
Citations (209)