Small But Funny: A Feedback-Driven Approach to Humor Distillation (2402.18113v1)

Published 28 Feb 2024 in cs.CL and cs.AI

Abstract: The emergence of LLMs has brought to light promising language generation capabilities, particularly in performing tasks like complex reasoning and creative writing. Consequently, distillation through imitation of teacher responses has emerged as a popular technique to transfer knowledge from LLMs to more accessible Small Language Models (SLMs). While this works well for simpler tasks, there is a substantial performance gap on tasks requiring intricate language comprehension and creativity, such as humor generation. We hypothesize that this gap may stem from the fact that creative tasks might be hard to learn by imitation alone and explore whether an approach involving supplementary guidance from the teacher could yield higher performance. To address this, we study the effect of assigning a dual role to the LLM: as a "teacher" generating data and as a "critic" evaluating the student's performance. Our experiments on humor generation reveal that the incorporation of feedback significantly narrows the performance gap between SLMs and their larger counterparts compared to merely relying on imitation. As a result, our research highlights the potential of using feedback as an additional dimension to data when transferring complex language abilities via distillation.
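
The abstract describes the method only at a high level: the large LLM first acts as a teacher producing demonstrations for imitation, then doubles as a critic scoring the student's attempts, with that feedback folded into further training. The Python sketch below illustrates one plausible shape of such a loop under stated assumptions. It is not the authors' implementation; every function (teacher_generate, student_generate, critic_score, fine_tune), the scoring threshold, and the round structure are hypothetical, and the stubs return canned values so the control flow runs end to end.

```python
import random
from typing import Dict, List, Tuple


def teacher_generate(context: str) -> str:
    """Hypothetical stub: the large teacher LLM writes a humorous response."""
    return f"teacher-written joke for: {context}"


def student_generate(student: Dict, context: str) -> str:
    """Hypothetical stub: the small student model attempts the same task."""
    return f"student-written joke for: {context}"


def critic_score(context: str, response: str) -> float:
    """Hypothetical stub: the same LLM, now acting as critic, rates humor in [0, 1]."""
    return random.random()


def fine_tune(student: Dict, data: List[Tuple[str, str]]) -> Dict:
    """Hypothetical stand-in for supervised fine-tuning on (context, response) pairs."""
    student["updates"] = student.get("updates", 0) + len(data)
    return student


def distill_with_feedback(contexts: List[str], rounds: int = 2,
                          threshold: float = 0.5) -> Dict:
    student: Dict = {}

    # Stage 1: plain imitation -- train the student on teacher demonstrations.
    imitation_data = [(c, teacher_generate(c)) for c in contexts]
    student = fine_tune(student, imitation_data)

    # Stage 2: feedback -- the teacher LLM acts as a critic, scoring the
    # student's own generations; weak outputs are replaced with fresh
    # teacher demonstrations before the next round of fine-tuning.
    for _ in range(rounds):
        feedback_data = []
        for c in contexts:
            attempt = student_generate(student, c)
            if critic_score(c, attempt) < threshold:
                feedback_data.append((c, teacher_generate(c)))  # corrective signal
            else:
                feedback_data.append((c, attempt))  # reinforce a good attempt
        student = fine_tune(student, feedback_data)
    return student


if __name__ == "__main__":
    random.seed(0)
    print(distill_with_feedback(["a pun about cats", "a joke about meetings"]))
```

In a real pipeline the stubs would wrap actual model calls and a genuine fine-tuning step; the point of the sketch is only the two-stage structure the abstract implies, imitation first and critic-guided refinement after.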
