Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification (1703.02504v1)

Published 7 Mar 2017 in cs.CL, cs.IR, and cs.LG

Abstract: This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse - but still acceptable - performance when compared to the single-language model, while benefiting from better generalization properties across languages.
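The classifier described in the abstract is a multi-layer convolutional network over word embeddings. The paper's exact architecture and dimensions are not given here, so the following is only a minimal illustrative sketch in NumPy with random weights and hypothetical sizes: one 1-D convolution over word windows, max-over-time pooling, and a softmax output. In the paper, the embedding and convolutional layers would be pre-trained on large amounts of weakly labelled data before supervised fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper)
vocab_size, emb_dim = 1000, 16   # vocabulary and embedding size
filter_width, n_filters = 3, 8   # 1-D convolution over word windows
n_classes = 3                    # e.g. negative / neutral / positive

# Randomly initialised parameters; in the approach described above,
# E and W would be pre-trained on weakly supervised data.
E = rng.normal(0, 0.1, (vocab_size, emb_dim))            # embeddings
W = rng.normal(0, 0.1, (n_filters, filter_width, emb_dim))  # conv filters
b = np.zeros(n_filters)                                  # conv bias
V = rng.normal(0, 0.1, (n_filters, n_classes))           # output weights

def forward(token_ids):
    """One conv layer + max-over-time pooling + softmax."""
    x = E[token_ids]                           # (seq_len, emb_dim)
    seq_len = len(token_ids)
    # Slide the filters over word windows of width `filter_width`
    conv = np.stack([
        np.tanh(np.tensordot(x[i:i + filter_width], W,
                             axes=([0, 1], [1, 2])) + b)
        for i in range(seq_len - filter_width + 1)
    ])                                         # (positions, n_filters)
    pooled = conv.max(axis=0)                  # max-over-time pooling
    logits = pooled @ V
    p = np.exp(logits - logits.max())
    return p / p.sum()                         # class probabilities

# Toy input: a short "sentence" of five token ids
probs = forward(np.array([4, 27, 301, 12, 9]))
```

Training the same network jointly on all languages, as the paper's multi-language variant does, would simply mean feeding it token ids from a shared multi-lingual vocabulary.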

Authors (8)
  1. Jan Deriu (21 papers)
  2. Valeria De Luca (2 papers)
  3. Aliaksei Severyn (29 papers)
  4. Simon Müller (41 papers)
  5. Mark Cieliebak (20 papers)
  6. Thomas Hofmann (121 papers)
  7. Martin Jaggi (155 papers)
  8. Aurelien Lucchi (75 papers)
Citations (129)