
ConvFiT: Conversational Fine-Tuning of Pretrained Language Models (2109.10126v1)

Published 21 Sep 2021 in cs.CL

Abstract: Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and task-specialised sentence encoder (after Stage 2). We demonstrate that 1) full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data; 2) pretrained LMs can be fine-tuned into task-specialised sentence encoders, optimised for the fine-grained semantics of a particular task. Consequently, such specialised sentence encoders allow for treating ID as a simple semantic similarity task based on interpretable nearest neighbours retrieval. We validate the robustness and versatility of the ConvFiT framework with such similarity-based inference on the standard ID evaluation sets: ConvFiT-ed LMs achieve state-of-the-art ID performance across the board, with particular gains in the most challenging, few-shot setups.
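To make the Stage 2 idea concrete, below is a minimal sketch of task-specialised contrastive fine-tuning, assuming a sentence-transformers-style setup. The base model name, the example utterances, and the use of an in-batch-negatives ranking loss are illustrative stand-ins; the paper's exact objective and data pipeline may differ.

```python
# Hedged sketch of Stage 2 "ConvFiT-ing": fine-tune a pretrained encoder so
# that utterances sharing an intent embed close together. Illustrative only.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from any pretrained LM; SentenceTransformer adds a pooling head.
model = SentenceTransformer("bert-base-uncased")

# Pairs of same-intent utterances serve as positives; other in-batch
# examples act as negatives under the ranking loss.
train_examples = [
    InputExample(texts=["book a table for two", "reserve a table tonight"]),
    InputExample(texts=["what's my balance", "how much money do I have"]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```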

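Once the encoder is task-specialised, the abstract's similarity-based inference reduces to nearest-neighbour retrieval over labelled examples. A minimal sketch follows, assuming a sentence-transformers-style encoder with normalised embeddings; the model name and example data are hypothetical.

```python
# Hedged sketch of nearest-neighbour intent detection: label a query with
# the intent of its most similar labelled utterance. Illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-uncased")  # stand-in for a ConvFiT-ed LM

support_texts = ["book a table for two", "what's my balance"]
support_labels = ["restaurant_booking", "check_balance"]

# With normalised embeddings, the dot product equals cosine similarity.
support = model.encode(support_texts, normalize_embeddings=True)
query = model.encode(["reserve a spot for dinner"], normalize_embeddings=True)

sims = query @ support.T                      # cosine similarities
pred = support_labels[int(np.argmax(sims))]   # label of nearest neighbour
print(pred)  # expected: "restaurant_booking"
```

Because the prediction is just the nearest labelled utterance, the decision is directly interpretable, which is the property the abstract highlights for few-shot ID.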
Authors (8)
  1. Ivan Vulić (130 papers)
  2. Pei-Hao Su (25 papers)
  3. Sam Coope (6 papers)
  4. Daniela Gerz (11 papers)
  5. Paweł Budzianowski (27 papers)
  6. Iñigo Casanueva (18 papers)
  7. Nikola Mrkšić (30 papers)
  8. Tsung-Hsien Wen (27 papers)
Citations (35)
