ChatGPT as a commenter to the news: can LLMs generate human-like opinions? (2312.13961v1)
Abstract: ChatGPT, GPT-3.5, and other LLMs have drawn significant attention since their release, and the abilities of these models have been investigated for a wide variety of tasks. In this research we investigate to what extent GPT-3.5 can generate human-like comments on Dutch news articles. We define human likeness as 'not distinguishable from human comments', approximated by the difficulty of automatically classifying comments as human-written or GPT-generated. We analyze human likeness across multiple prompting techniques: zero-shot, few-shot, and context prompts, each for two generated personas. We find that our fine-tuned BERT models can easily distinguish human-written comments from GPT-3.5-generated comments, with none of the prompting methods performing noticeably better than the others. Further analysis shows that human comments consistently have higher lexical diversity than GPT-generated comments. This indicates that although generative LLMs can produce fluent text, their ability to create human-like opinionated comments is still limited.
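The human-likeness evaluation described in the abstract boils down to a binary text-classification task: a fine-tuned BERT model tries to tell human-written comments apart from GPT-generated ones. Below is a minimal sketch of what such a classifier setup could look like with the Hugging Face transformers library; the model checkpoint, toy data, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# pip install transformers torch accelerate
# Minimal sketch: fine-tune a Dutch BERT-style model to classify comments as
# human-written (label 0) or GPT-generated (label 1).
# Model name, data, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)


class CommentDataset(Dataset):
    """Wraps tokenized comments and their human/GPT labels for the Trainer."""

    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


# Toy stand-in data; the paper works with real Dutch news comments.
texts = [
    "Eindelijk een journalist die kritische vragen durft te stellen.",  # human-like
    "Dit is een interessant artikel over een belangrijk onderwerp.",    # GPT-like
]
labels = [0, 1]

model_name = "pdelobelle/robbert-v2-dutch-base"  # assumed Dutch model; the paper's checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

train_ds = CommentDataset(texts, labels, tokenizer)
args = TrainingArguments(output_dir="comment_clf", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```

The reported lexical-diversity finding could be probed in a similar spirit, for instance by comparing type-token ratios of the human and generated comment sets, though the abstract does not specify the exact diversity metric used.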
- Rayden Tseng
- Suzan Verberne
- Peter van der Putten