Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation (2307.01542v1)

Published 4 Jul 2023 in cs.CL

Abstract: Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
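The abstract describes the training objective only at a high level. Below is a minimal PyTorch sketch of one plausible instantiation, not the paper's exact formulation: a standard MLE loss plus an unlikelihood-style penalty on context-repeating tokens that a frozen "premature" checkpoint of the same model ranks above the gold token. The function name `self_contrastive_loss`, the repetition-candidate definition (any token already seen in the prefix), and the penalty weight `alpha` are illustrative assumptions.

```python
# Hedged sketch of a self-contrastive objective (assumptions noted in comments):
# MLE plus an unlikelihood-style penalty on context-repeating tokens that a
# frozen "premature" checkpoint over-predicts relative to the gold token.
import torch
import torch.nn.functional as F


def self_contrastive_loss(logits, premature_logits, input_ids, targets, alpha=1.0):
    """
    logits:           (B, T, V) outputs of the current model (trainable)
    premature_logits: (B, T, V) outputs of a frozen early checkpoint (no grad)
    input_ids:        (B, T)    context tokens, used to find repetition candidates
    targets:          (B, T)    gold next tokens
    alpha:            penalty weight (assumed hyperparameter)
    """
    B, T, V = logits.size()
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard maximum-likelihood term.
    mle = F.nll_loss(log_probs.reshape(-1, V), targets.reshape(-1))

    with torch.no_grad():
        prem_probs = F.softmax(premature_logits, dim=-1)

    # Repetition candidates at step t: any token type already seen in the prefix.
    # (Assumption: token-level repetition of the preceding context.)
    seen = torch.zeros(B, T, V, dtype=torch.bool, device=logits.device)
    for t in range(1, T):
        seen[:, t] = seen[:, t - 1]
        seen[torch.arange(B, device=logits.device), t, input_ids[:, t - 1]] = True

    # "Incorrectly predicts repetition": the premature checkpoint puts more mass
    # on a repeated, non-gold token than on the gold token at that position.
    gold_prob = prem_probs.gather(-1, targets.unsqueeze(-1))     # (B, T, 1)
    gold_mask = F.one_hot(targets, V).bool()
    wrong_rep = seen & (prem_probs > gold_prob) & ~gold_mask     # (B, T, V)

    # Unlikelihood-style penalty: push down p(token) via -log(1 - p) on those tokens.
    probs = log_probs.exp()
    penalty = -(torch.log1p(-probs.clamp(max=1.0 - 1e-6)) * wrong_rep).sum()
    penalty = penalty / wrong_rep.sum().clamp(min=1)

    return mle + alpha * penalty
```

In practice, `premature_logits` would be produced by an earlier checkpoint of the same model run in eval mode under `torch.no_grad()`, so only the current model receives gradients from the penalty term.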

Authors (2)
  1. Jian Guan (65 papers)
  2. Minlie Huang (226 papers)
