
Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews (1512.08183v5)

Published 27 Dec 2015 in cs.CL

Abstract: Despite the loss of semantic information, bag-of-n-gram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embedding methods have been proposed to capture semantics, but they still cannot outperform bag-of-n-gram based methods on this task. In this paper, we modify the architecture of the recently proposed Paragraph Vector, allowing it to learn document vectors by predicting not only words but n-gram features as well. Our model is able to capture both semantics and word order in documents while keeping the expressive power of learned vectors. Experimental results on the IMDB movie review dataset show that our model outperforms previous deep learning models and bag-of-n-gram based models due to the above advantages. More robust results are also obtained when our model is combined with other models. The source code of our model will also be published together with this paper.
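The abstract describes extending Paragraph Vector so that a document vector is trained to predict n-gram features, not just single words. The following is a minimal illustrative sketch of that idea in the spirit of PV-DBOW, not the authors' actual implementation: all function names, the softmax objective, and the hyperparameters here are assumptions for demonstration.

```python
# Toy sketch: learn one vector per document by softmax-predicting the
# document's unigram AND bigram features, echoing the paper's idea of
# adding n-gram targets to Paragraph Vector. Assumed setup, not the
# authors' code (which they state is published with the paper).
import numpy as np

def ngram_features(tokens, n_max=2):
    """Collect all 1..n_max grams of a token list as tuples."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(tuple(tokens[i:i + n]))
    return feats

def train_doc_vectors(docs, dim=16, epochs=200, lr=0.1, seed=0):
    """Learn a vector per document via full-softmax prediction of its n-grams."""
    rng = np.random.default_rng(seed)
    vocab = sorted({f for d in docs for f in ngram_features(d)})
    idx = {f: i for i, f in enumerate(vocab)}
    D = rng.normal(0, 0.1, (len(docs), dim))   # document vectors
    W = rng.normal(0, 0.1, (len(vocab), dim))  # output embeddings per n-gram
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            targets = [idx[f] for f in ngram_features(doc)]
            scores = W @ D[d]
            p = np.exp(scores - scores.max())
            p /= p.sum()
            # gradient of mean cross-entropy over this doc's n-gram targets:
            # softmax probs minus the empirical target distribution
            grad_out = p * len(targets)
            for t in targets:
                grad_out[t] -= 1.0
            grad_out /= len(targets)
            D[d] -= lr * (W.T @ grad_out)
            W -= lr * np.outer(grad_out, D[d])
    return D, vocab

docs = [["good", "movie"], ["good", "movie"], ["bad", "movie"]]
D, vocab = train_doc_vectors(docs)
```

Because bigrams such as ("good", "movie") are prediction targets, the learned vectors encode local word order that plain bag-of-words Paragraph Vector discards; the paper's actual model uses this at the scale of full IMDB reviews, where a practical implementation would replace the full softmax with negative sampling or hierarchical softmax.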

Authors (5)
  1. Bofang Li (4 papers)
  2. Tao Liu (350 papers)
  3. Xiaoyong Du (40 papers)
  4. Deyuan Zhang (2 papers)
  5. Zhe Zhao (97 papers)
Citations (19)
