
Modelling Context with User Embeddings for Sarcasm Detection in Social Media (1607.00976v2)

Published 4 Jul 2016 in cs.CL and cs.AI

Abstract: We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scraping); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.

Citations (240)

Summary

  • The paper proposes a novel deep learning model using learned user embeddings to capture context and improve sarcasm detection in social media.
  • The proposed CUE-CNN model with user embeddings achieved over 2% higher accuracy than a baseline model on a large Twitter dataset.
  • The method simplifies sarcasm detection by reducing manual feature engineering, with applications in sentiment analysis and understanding social media.


This paper presents an approach to sarcasm detection on social media that integrates user-specific context into a convolutional neural network (CNN). It targets a limitation of earlier sarcasm detectors: incorporating contextual information, such as an individual's tendency toward sarcasm and the subjects they tend to be sarcastic about, required extensive feature engineering.

The central contribution is a deep learning model that automatically learns user embeddings from the text of each user's previous posts, without elaborate feature engineering. This matters because sarcasm is often context-dependent, and user embeddings let the model capture individual differences in how sarcasm is used. The proposed model, CUE-CNN (Content and User Embedding Convolutional Neural Network), combines lexical features extracted from the text by a convolutional layer with pre-learned user embeddings that encode user-specific context.
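As a rough illustration of how content and user signals could be combined, the following is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the filter heights, dimensions, random weights, the single hidden layer, and the output order `[not sarcastic, sarcastic]` are all assumptions chosen for the example, and every function name is hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conv_max_pool(words, filt, bias):
    """Slide one filter spanning len(filt) words over the tweet, apply ReLU,
    and keep the maximum activation (max-over-time pooling)."""
    h = len(filt)
    activations = []
    for start in range(len(words) - h + 1):
        total = bias + sum(dot(words[start + k], filt[k]) for k in range(h))
        activations.append(max(0.0, total))
    return max(activations)


def dense(rows, biases, x):
    """Affine layer: one output per weight row."""
    return [dot(row, x) + b for row, b in zip(rows, biases)]


def cue_cnn_forward(words, user_vec, filters, filt_biases, hidden, out):
    """Concatenate pooled content features with the user embedding,
    pass them through a hidden layer, and return softmax probabilities."""
    content = [conv_max_pool(words, f, b) for f, b in zip(filters, filt_biases)]
    joint = content + list(user_vec)  # content features followed by user features
    h = [max(0.0, z) for z in dense(hidden[0], hidden[1], joint)]
    logits = dense(out[0], out[1], h)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy setup with random, untrained weights (assumed sizes, illustration only).
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


word_dim, user_dim, n_hidden = 4, 3, 5
tweet = rand_matrix(6, word_dim)            # 6 words, each a 4-d embedding
user_vec = rand_matrix(1, user_dim)[0]      # stands in for a pre-learned user embedding
filters = [rand_matrix(h, word_dim) for h in (1, 2, 3)]  # one filter per height
filt_biases = [0.0, 0.0, 0.0]
hidden = (rand_matrix(n_hidden, len(filters) + user_dim), [0.0] * n_hidden)
out = (rand_matrix(2, n_hidden), [0.0, 0.0])

probs = cue_cnn_forward(tweet, user_vec, filters, filt_biases, hidden, out)
print(probs)  # assumed order: [P(not sarcastic), P(sarcastic)]
```

The design point the sketch tries to show is that the user embedding enters the network as extra input features next to the pooled convolutional features, so the same tweet text can be scored differently for different authors.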

The authors benchmark their model against a state-of-the-art sarcasm detection approach that relies on a rich set of hand-crafted features, including attributes of a tweet's author and audience. The proposed method outperforms this baseline by more than 2% in absolute accuracy, which supports the value of user embeddings for sarcasm detection.

User embeddings are obtained with a method similar to the Paragraph Vector model: each user's vector is fitted to the words that occur in that user's previous posts. These user embeddings are then combined with content embeddings computed from the tweet to estimate the probability that the tweet is sarcastic. Pre-training the user embeddings improved accuracy by 0.8%.
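To make the idea of fitting a user vector to the words in that user's posts concrete, here is a small sketch. It swaps in a simple margin-based ranking objective with sampled negative words, which is an assumption for illustration and may differ from the objective used in the paper. The word vectors are kept fixed, and the toy vocabulary, vectors, and function names are all hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_user_embedding(post_words, word_vecs, epochs=20, lr=0.05, seed=0):
    """Fit one user vector so that words the user wrote score higher than
    sampled words the user did not write (hinge loss with margin 1).
    Word vectors are treated as fixed; only the user vector is updated."""
    rng = random.Random(seed)
    dim = len(next(iter(word_vecs.values())))
    used = set(post_words)
    negatives = [w for w in word_vecs if w not in used]
    user = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    if not negatives:  # nothing to contrast against
        return user
    for _ in range(epochs):
        for word in post_words:
            neg = rng.choice(negatives)
            pos_vec, neg_vec = word_vecs[word], word_vecs[neg]
            # Update only while the margin is violated.
            if 1.0 - dot(pos_vec, user) + dot(neg_vec, user) > 0.0:
                user = [u + lr * (p - n) for u, p, n in zip(user, pos_vec, neg_vec)]
    return user


# Toy 2-d word vectors: first axis roughly "sports", second roughly "politics".
word_vecs = {
    "football": [1.0, 0.0],
    "goal": [0.9, 0.1],
    "senate": [0.0, 1.0],
    "vote": [0.1, 0.9],
}
sports_fan = fit_user_embedding(["football", "goal", "football"], word_vecs)
print(sports_fan)
```

In this toy setting the fitted vector ends up closer to the sports words than to the politics words, which mirrors the claim that only a user's past text is needed to obtain a representation of that user.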

The experiments use a large corpus of Twitter data, and results are reported using 10-fold cross-validation. The paper also examines whether the embeddings encode individual user attributes, such as political inclination or interest in sports, and finds that they capture meaningful differences between users. These findings suggest the embeddings capture a notion akin to homophily.
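For readers unfamiliar with the evaluation protocol, the sketch below shows one common way to produce 10-fold cross-validation splits. The shuffling, seed, and helper name are assumptions, and the paper's exact partitioning (for example, any held-out development portion) is not reproduced here.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

A model would be trained on each `train` list and scored on the matching `test` list, with the reported accuracy being the average over the ten folds.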

This approach has practical implications for sarcasm detection tools used in sentiment analysis, opinion mining, and the study of public discourse on social media. By reducing the need for manual feature crafting, it simplifies deploying sarcasm detection systems across platforms and domains.

Future work could model interactions between users more deeply, which may improve the model's handling of conversational context. Extending the approach to more languages and cultural settings could also improve its generalizability. Overall, this work advances the ability of NLP systems to interpret complex language use on social media.