
ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models (2306.04563v1)

Published 7 Jun 2023 in cs.AI, cs.CL, cs.HC, and cs.LG

Abstract: Humor is a central aspect of human communication that has not been solved for artificial agents so far. LLMs are increasingly able to capture implicit and contextual information. Especially, OpenAI's ChatGPT recently gained immense public attention. The GPT3-based model almost seems to communicate on a human level and can even tell jokes. Humor is an essential component of human communication. But is ChatGPT really funny? We put ChatGPT's sense of humor to the test. In a series of exploratory experiments around jokes, i.e., generation, explanation, and detection, we seek to understand ChatGPT's capability to grasp and reproduce human humor. Since the model itself is not accessible, we applied prompt-based experiments. Our empirical evidence indicates that jokes are not hard-coded but mostly also not newly generated by the model. Over 90% of 1008 generated jokes were the same 25 Jokes. The system accurately explains valid jokes but also comes up with fictional explanations for invalid jokes. Joke-typical characteristics can mislead ChatGPT in the classification of jokes. ChatGPT has not solved computational humor yet but it can be a big leap toward "funny" machines.

Citations (24)

Summary

  • The paper reveals that ChatGPT mainly regenerates a limited set of pre-existing jokes, with over 90% repetition observed in its output.
  • The paper demonstrates that ChatGPT provides confident yet sometimes convoluted joke explanations, indicating a superficial grasp of nuanced humor.
  • The paper indicates that ChatGPT relies on surface-level attributes for joke detection, highlighting limitations in its deeper semantic understanding of humor.

Analyzing ChatGPT's Humor Capabilities: A Structured Examination

The paper "ChatGPT is fun, but it is not funny!" by Sophie Jentzsch and Kristian Kersting examines the competency of ChatGPT, an LLM developed by OpenAI, in generating and understanding human humor. The investigation sits at the intersection of NLP advances and computational humor. Because the model itself is not accessible, the authors design prompt-based experiments around joke generation, explanation, and detection to empirically evaluate ChatGPT's humorous acumen.

Empirical Findings

1. Joke Generation:

One of the initial hypotheses posited by the authors was that ChatGPT reproduces jokes from a limited, pre-existing repertoire rather than generating them anew. This hypothesis was scrutinized by prompting the model for a joke 1,008 times. Remarkably, over 90% of the responses were one of the same 25 jokes. This observation challenges the assumption that ChatGPT generates jokes afresh; instead, it appears to replicate existing jokes, likely because they occur frequently in its training data or are memetic in the language the model was exposed to. Interestingly, although the model appears to lack original humor capabilities, it occasionally blends elements from different jokes, representing a limited form of creative functionality.
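The repetition analysis above can be sketched as a simple frequency count over repeated samples. This is not the authors' code; the `jokes` list is a hypothetical stand-in for the 1,008 collected model responses, and the paper's finding corresponds to `top_joke_share(jokes, k=25) > 0.9`.

```python
from collections import Counter

def top_joke_share(jokes, k=25):
    """Fraction of a joke sample covered by its k most frequent jokes.

    `jokes` is a list of generated joke strings; in the paper's setup
    these would come from 1,008 repeated joke-generation prompts.
    Light normalization merges trivially different duplicates.
    """
    counts = Counter(j.strip().lower() for j in jokes)
    top_k = counts.most_common(k)
    return sum(n for _, n in top_k) / len(jokes)

# Toy sample standing in for real model output: one joke dominates.
sample = ["Why did the chicken cross the road? To get to the other side."] * 9
sample += ["A brand-new joke nobody has heard."]
print(top_joke_share(sample, k=1))  # -> 0.9
```

With real model output, the same counter also surfaces the near-duplicates and blended variants the authors describe, since they cluster around the top entries.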

2. Joke Explanation:

When tasked with explaining jokes, ChatGPT competently elucidated wordplay and double meanings in valid jokes. However, when confronted with non-standard jokes or samples lacking inherent humor, the model produced convoluted explanations while maintaining a confident tone. This unveils another of ChatGPT's characteristics: its inclination to construct plausible but fictitious explanations when faced with ambiguous queries.

3. Joke Detection:

Jentzsch and Kersting investigated ChatGPT's ability to discern jokes from non-jokes by modifying existing jokes to systematically remove distinctive attributes such as wordplay, structure, or topic. They observed that when samples retained too few of these joke-like characteristics, the model often no longer categorized them as jokes. This suggests that ChatGPT relies on surface-level traits to detect humor rather than on a robust semantic understanding of it.
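The detection protocol can be illustrated with a small probe harness. The transformations and the classifier below are hypothetical stand-ins (the paper prompts ChatGPT itself; here a stub keys only on question-answer structure) to show how stripping one joke-typical trait at a time isolates which surface cues drive the classification.

```python
def make_variants(joke):
    """Build modified versions of a joke, each stripping one joke-typical
    trait. These transformations only sketch the paper's idea: remove the
    question-answer structure, or swap the punchline for a flat statement.
    """
    setup, _, punchline = joke.partition("?")
    return {
        "original": joke,
        "no_structure": f"{setup.strip()} and {punchline.strip().lower()}",
        "no_punchline": f"{setup.strip()}? It simply wanted to.",
    }

def looks_like_joke(text):
    """Stand-in classifier keying only on a surface trait (Q/A structure),
    mimicking the shallow cues the paper argues ChatGPT relies on."""
    return "?" in text

joke = ("Why did the scarecrow win an award? "
        "Because he was outstanding in his field.")
for name, variant in make_variants(joke).items():
    print(f"{name}: classified as joke = {looks_like_joke(variant)}")
```

Under this stub, the variant without Q/A structure is rejected while the unfunny variant that keeps the structure is still accepted, mirroring the paper's finding that joke-typical form can mislead the classification.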

Implications and Future Directions

The paper highlights the superficial understanding of humor that is currently embedded in ChatGPT, providing evidence that while it can replicate formatted content recognizable as humor, its ability to innovate or appreciate more abstract or sophisticated humor is still limited. The paper leaves open questions about the further development of LLMs with an enriched understanding of human-like humor, suggesting a need for models that can grasp nuanced humor over a wide cultural and contextual spectrum.

The authors propose that future research could examine newer iterations of LLMs such as GPT-4, or explore open-source alternatives like LLaMA or GPT-NeoX, to compare capabilities in humor appreciation and generation. Such endeavors would help realize more competent conversational agents that enrich human-computer interaction by moving beyond superficial mimicry toward deeper behavioral emulation.

Conclusion

In conclusion, while ChatGPT demonstrates remarkable text-generation capabilities, its proficiency in humor generation and interpretation remains underdeveloped. Jentzsch and Kersting rigorously document these limitations, offering insight into the challenges the NLP domain faces in computationally handling abstract human elements like humor. These findings reflect the current state of AI humor research and provide valuable groundwork for future work in this niche yet influential field.
