Evaluating Large Language Models' Ability Using a Psychiatric Screening Tool Based on Metaphor and Sarcasm Scenarios
Abstract: Metaphors and sarcasm are precious fruits of our highly evolved social communication skills. However, children with the condition then known as Asperger syndrome often have difficulty comprehending sarcasm, even if they possess adequate verbal IQs for understanding metaphors. Accordingly, researchers have employed a screening test that assesses metaphor and sarcasm comprehension to distinguish Asperger syndrome from other conditions with similar external behaviors (e.g., attention-deficit/hyperactivity disorder). This study uses this standardized test to evaluate how well recent large language models (LLMs) understand nuanced human communication. The results indicate that metaphor comprehension improves as the number of model parameters increases; however, no similar improvement was observed for sarcasm comprehension. Considering that a human's ability to grasp sarcasm has been associated with the amygdala, a pivotal cerebral region for emotional learning, a distinct training strategy may be needed to give LLMs this ability in a cognitively grounded manner.