KoCoSa: Korean Context-aware Sarcasm Detection Dataset

Published 22 Feb 2024 in cs.CL and cs.AI (arXiv:2402.14428v2)

Abstract: Sarcasm is a form of verbal irony in which someone says the opposite of what they mean, often to ridicule a person, situation, or idea. Detecting sarcasm in dialogue is often difficult because detection must take the context (i.e., the dialogue history) into account. In this paper, we introduce a new dataset for the Korean dialogue sarcasm detection task, KoCoSa (Korean Context-aware Sarcasm Detection Dataset), which consists of 12.8K daily Korean dialogues with sarcasm labels on the last response of each dialogue. To build the dataset, we propose an efficient sarcasm detection dataset generation pipeline: 1) generating new sarcastic dialogues from source dialogues with LLMs, 2) automatic and manual filtering of abnormal and toxic dialogues, and 3) human annotation for the sarcasm detection task. We also provide a simple but effective baseline for the Korean sarcasm detection task trained on our dataset. Experimental results on the dataset show that our baseline system outperforms strong LLM baselines, such as GPT-3.5, on the Korean sarcasm detection task. We also show that sarcasm detection relies heavily on the availability of sufficient context. We will release the dataset at https://github.com/Yu-billie/KoCoSa_sarcasm_detection.
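The three pipeline stages named in the abstract can be sketched roughly as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Dialogue` type, the function names, the `TOXIC_TERMS` blocklist, the `fake_llm` stand-in, and the majority-vote rule for combining annotator labels are all hypothetical, and the manual part of the filtering stage is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dialogue:
    context: List[str]            # dialogue history
    response: str                 # last response, the turn that receives the label
    label: Optional[str] = None   # filled in by the annotation stage


# Hypothetical blocklist; a real filter would be far more elaborate.
TOXIC_TERMS = {"idiot", "stupid"}


def generate(source: List[str], llm: Callable[[List[str]], str]) -> Dialogue:
    """Stage 1: ask an LLM (any callable here) for a sarcastic last response."""
    return Dialogue(context=list(source), response=llm(source))


def is_clean(dialogue: Dialogue) -> bool:
    """Stage 2 (automatic part only): reject empty or toxic dialogues."""
    text = " ".join(dialogue.context + [dialogue.response]).lower()
    has_response = bool(dialogue.response.strip())
    return has_response and not any(term in text for term in TOXIC_TERMS)


def annotate(dialogue: Dialogue, votes: List[str]) -> Dialogue:
    """Stage 3: combine human labels by majority vote (an assumed rule)."""
    dialogue.label = Counter(votes).most_common(1)[0][0]
    return dialogue


def fake_llm(context: List[str]) -> str:
    """Stand-in for a real LLM call, so the sketch runs offline."""
    return "Oh great, another Monday. Just what I needed."


if __name__ == "__main__":
    source = ["My alarm did not go off.", "So you are late again?"]
    candidate = generate(source, fake_llm)
    if is_clean(candidate):
        labeled = annotate(candidate, ["sarcasm", "sarcasm", "non_sarcasm"])
        print(labeled.label, "|", labeled.response)
```

Keeping each stage as a separate function mirrors the order in the abstract: generation produces candidates, filtering discards unusable ones, and only the surviving dialogues reach human annotators.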
