Learning to Diversify Neural Text Generation via Degenerative Model (2309.12619v1)

Published 22 Sep 2023 in cs.CL

Abstract: Neural language models often fail to generate diverse and informative texts, limiting their applicability in real-world problems. While previous approaches have proposed to address these issues by identifying and penalizing undesirable behaviors (e.g., repetition, overuse of frequent words) from language models, we propose an alternative approach based on an observation: models primarily learn attributes within examples that are likely to cause degeneration problems. Based on this observation, we propose a new approach to prevent degeneration problems by training two models. Specifically, we first train a model that is designed to amplify undesirable patterns. We then enhance the diversity of the second model by focusing on patterns that the first model fails to learn. Extensive experiments on two tasks, namely language modeling and dialogue generation, demonstrate the effectiveness of our approach.
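
The abstract describes a two-model recipe: first train a degenerative model that amplifies undesirable patterns (repetition, overuse of frequent words), then train the main model while emphasizing whatever the degenerative model fails to capture. The sketch below shows one minimal way such an idea could be realized as a re-weighted cross-entropy loss in PyTorch; the function name, the weighting exponent `gamma`, and the exact form of the weight are illustrative assumptions, not the authors' objective.

```python
# Hypothetical sketch: up-weight tokens that a frozen, pre-trained
# degenerative model predicts poorly, so the main model focuses on the
# patterns the degenerative model fails to learn. Not the paper's exact loss.
import torch
import torch.nn.functional as F

def diversified_lm_loss(main_logits, degen_logits, targets, gamma=2.0, pad_id=0):
    """main_logits / degen_logits: (batch, seq, vocab); targets: (batch, seq)."""
    vocab = main_logits.size(-1)
    # Per-token cross-entropy for the main model (no reduction yet).
    ce = F.cross_entropy(
        main_logits.reshape(-1, vocab),
        targets.reshape(-1),
        ignore_index=pad_id,
        reduction="none",
    )
    with torch.no_grad():
        # Probability the degenerative model assigns to each gold token.
        p_degen = (
            F.softmax(degen_logits, dim=-1)
            .gather(-1, targets.unsqueeze(-1))
            .squeeze(-1)
            .reshape(-1)
        )
        # Tokens the degenerative model already captures get small weights;
        # tokens it fails on (low p_degen) keep weights close to 1.
        weights = (1.0 - p_degen) ** gamma
    mask = (targets.reshape(-1) != pad_id).float()
    return (weights * ce * mask).sum() / mask.sum().clamp(min=1.0)
```

The `(1 - p_degen) ** gamma` factor mirrors focal-loss-style re-weighting: it suppresses learning on patterns the degenerative model reproduces easily and concentrates it on the remainder, which matches the intuition stated in the abstract.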

Authors (3)
  1. Jimin Hong (9 papers)
  2. ChaeHun Park (15 papers)
  3. Jaegul Choo (161 papers)
