
IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions (2312.06053v1)

Published 11 Dec 2023 in cs.CL and cs.LG

Abstract: Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because the meanings of IEs are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020 graph, converting PTLMs into knowledge models (KMs) that encode and infer commonsense knowledge related to IE use. Experiments show that various PTLMs can be converted into KMs with IEKG. We verify the quality of IEKG and the ability of the trained KMs with automatic and human evaluation. Through applications in natural language understanding, we show that a PTLM injected with knowledge from IEKG exhibits improved IE comprehension ability and can generalize to IEs unseen during training.
