Probabilistic Linguistic Knowledge and Token-level Text Augmentation (2306.16644v2)

Published 29 Jun 2023 in cs.CL

Abstract: This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically-motivated evaluation context. Two text augmentation programs, REDA and REDA$_{NG}$, were developed, both implementing five token-level text editing operations: Synonym Replacement (SR), Random Swap (RS), Random Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA$_{NG}$ leverages pretrained $n$-gram language models to select the most likely augmented texts from REDA's output. Comprehensive and fine-grained experiments were conducted on a binary question matching classification task in both Chinese and English. The results strongly refute the general effectiveness of the five token-level text augmentation techniques under investigation, whether applied together or separately, and irrespective of the various common classification model types used, including transformers. Furthermore, the role of probabilistic linguistic knowledge is found to be minimal.
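The abstract names the five operations but not their mechanics. The sketch below shows one common way such token-level edits are implemented (in the style of EDA-type augmentation), plus the REDA$_{NG}$-style selection step in which an $n$-gram language model ranks the candidates. It is an illustrative sketch, not the paper's actual code: the `lexicon` synonym mapping, the parameter names, and the `ngram_lm.score` interface are assumptions made for the example.

```python
import random

def synonym_replacement(tokens, lexicon, n=1):
    # SR: replace up to n tokens that have entries in the synonym lexicon.
    out = tokens[:]
    candidates = [i for i, t in enumerate(out) if lexicon.get(t)]
    for i in random.sample(candidates, min(n, len(candidates))):
        out[i] = random.choice(lexicon[out[i]])
    return out

def random_swap(tokens, n=1):
    # RS: swap two randomly chosen token positions, n times.
    out = tokens[:]
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_insertion(tokens, lexicon, n=1):
    # RI: insert a synonym of a random in-sentence token at a random position, n times.
    out = tokens[:]
    for _ in range(n):
        candidates = [t for t in out if lexicon.get(t)]
        if not candidates:
            break
        synonym = random.choice(lexicon[random.choice(candidates)])
        out.insert(random.randrange(len(out) + 1), synonym)
    return out

def random_deletion(tokens, p=0.1):
    # RD: drop each token independently with probability p; never return empty.
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

def random_mix(tokens, lexicon, n=1):
    # RM: apply a randomly chosen subset of the four operations, in random order.
    ops = [
        lambda ts: synonym_replacement(ts, lexicon, n),
        lambda ts: random_swap(ts, n),
        lambda ts: random_insertion(ts, lexicon, n),
        lambda ts: random_deletion(ts),
    ]
    out = tokens[:]
    for op in random.sample(ops, random.randint(1, len(ops))):
        out = op(out)
    return out

def reda_ng(tokens, lexicon, ngram_lm, num_candidates=10, k=1):
    # REDA_NG-style selection: generate candidates with the edits above, then
    # keep the k augmented texts the pretrained n-gram LM scores as most likely.
    # `ngram_lm.score` is a hypothetical interface returning a log-probability.
    candidates = [random_mix(tokens, lexicon) for _ in range(num_candidates)]
    candidates.sort(key=lambda ts: ngram_lm.score(" ".join(ts)), reverse=True)
    return candidates[:k]
```

Under these assumptions, calling `random_mix("how do i reset my password".split(), lexicon)` would yield one perturbed variant of the question, and `reda_ng` would filter such variants down to the most fluent ones by $n$-gram likelihood.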

Authors
  1. Zhengxiang Wang
