
A Semantic-based Method for Unsupervised Commonsense Question Answering (2105.14781v1)

Published 31 May 2021 in cs.CL

Abstract: Unsupervised commonsense question answering is appealing since it does not rely on any labeled task data. Among existing work, a popular solution is to use pre-trained language models to score candidate choices directly conditioned on the question or context. However, such scores from language models can be easily affected by irrelevant factors, such as word frequencies, sentence structures, etc. These distracting factors may not only mislead the model to choose a wrong answer but also make it oversensitive to lexical perturbations in candidate answers. In this paper, we present a novel SEmantic-based Question Answering method (SEQA) for unsupervised commonsense question answering. Instead of directly scoring each answer choice, our method first generates a set of plausible answers with generative models (e.g., GPT-2), and then uses these plausible answers to select the correct choice by considering the semantic similarity between each plausible answer and each choice. We devise a simple, yet sound formalism for this idea and verify its effectiveness and robustness with extensive experiments. We evaluate the proposed method on four benchmark datasets, and our method achieves the best results in unsupervised settings. Moreover, when attacked by TextFooler with synonym replacement, SEQA suffers much smaller performance drops than the baselines, indicating stronger robustness.
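The selection step described in the abstract can be sketched as follows. This is only an illustrative simplification, not the paper's implementation: the toy bag-of-words embedding and the summed-similarity voting rule stand in for the neural sentence embeddings and the formal scoring scheme SEQA actually uses, and the example question, plausible answers, and choices are invented.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding" (word-count vector). SEQA uses
    # neural sentence embeddings; this stand-in only illustrates
    # the idea of comparing answers in a semantic space.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def seqa_select(choices, plausible_answers):
    # Each generated plausible answer "votes" for every choice in
    # proportion to their semantic similarity; the choice with the
    # highest total support wins (a simplification of SEQA's scheme).
    scores = {
        c: sum(cosine(embed(c), embed(a)) for a in plausible_answers)
        for c in choices
    }
    return max(scores, key=scores.get)

# Hypothetical example: answers sampled from a generative model
plausible = [
    "he went to the store to buy milk",
    "he bought some milk at the shop",
]
choices = ["buy milk", "fly a kite"]
print(seqa_select(choices, plausible))  # -> buy milk
```

Because the decision aggregates over many generated answers rather than scoring each choice string directly, a synonym swap in one choice perturbs its similarity scores only slightly, which is the intuition behind the robustness result reported against TextFooler.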

Authors (6)
  1. Yilin Niu (10 papers)
  2. Fei Huang (409 papers)
  3. Jiaming Liang (35 papers)
  4. Wenkai Chen (12 papers)
  5. Xiaoyan Zhu (54 papers)
  6. Minlie Huang (226 papers)
Citations (13)
