Back Translation Survey for Improving Text Augmentation (2102.09708v2)

Published 19 Feb 2021 in cs.CL

Abstract: NLP relies heavily on training data. As Transformers have grown larger, they have required massive amounts of training data. To meet this requirement, text augmentation can be used to expand an existing dataset and to help models generalize. One text augmentation technique we examine is translation augmentation: an English sentence is translated into another language and then back into English. In this paper, we study the effect of back translation through 108 different languages on various metrics and text embeddings.
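The back-translation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `translate` function here is a hypothetical stand-in (a toy phrase table) for whatever machine-translation system is actually used, and the tiny example sentences are invented for demonstration.

```python
def translate(text, src, tgt):
    """Hypothetical translation function. A real pipeline would call a
    machine-translation model or API here; this toy phrase table only
    exists to make the round-trip logic runnable."""
    toy_table = {
        ("en", "fr"): {"the cat sat on the mat": "le chat s'est assis sur le tapis"},
        # The return trip yields a slightly different surface form,
        # which is exactly what makes back translation useful for augmentation.
        ("fr", "en"): {"le chat s'est assis sur le tapis": "the cat sat down on the mat"},
    }
    return toy_table[(src, tgt)].get(text, text)

def back_translate(sentence, pivot_lang):
    """Translate an English sentence into a pivot language and back to English."""
    pivot = translate(sentence, "en", pivot_lang)
    return translate(pivot, pivot_lang, "en")

def augment(dataset, pivot_langs):
    """Expand a dataset with one back-translated paraphrase per pivot language,
    keeping only variants that differ from the original sentence."""
    augmented = list(dataset)
    for sentence in dataset:
        for lang in pivot_langs:
            paraphrase = back_translate(sentence, lang)
            if paraphrase != sentence:
                augmented.append(paraphrase)
    return augmented
```

In the paper's setting, the same round trip would be repeated with each of the 108 pivot languages, and the resulting paraphrases compared to the originals on the chosen metrics and embeddings.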

Authors (3)
  1. Matthew Ciolino (12 papers)
  2. David Noever (66 papers)
  3. Josh Kalin (12 papers)
