Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights (2403.03506v4)
Abstract: This study explores sentence-level detection of AI-generated text within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets, which typically contain hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Our study therefore uses the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text, where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight that (1) detecting AI-generated sentences in hybrid texts is a challenging task overall because (1.1) human writers select and even edit AI-generated sentences based on personal preferences, which makes the authorship of segments harder to identify; (1.2) authorship changes frequently between neighboring sentences within the hybrid text, which makes it difficult for segment detectors to identify authorship-consistent segments; and (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; and (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment helps decide whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
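The segment-length assessment described in finding (2) can be illustrated with a minimal sketch. It assumes per-sentence authorship labels are available for a sample of texts (for example, from an annotated development set), collapses them into authorship-consistent segments, and picks a strategy from the average segment length. The function names, the label strings, and the threshold of 3.0 sentences are hypothetical choices made for illustration; the abstract does not specify a cutoff.

```python
from itertools import groupby
from typing import List, Tuple


def segments_from_labels(labels: List[str]) -> List[Tuple[str, int]]:
    """Collapse per-sentence authorship labels into (author, length) runs."""
    return [(author, len(list(run))) for author, run in groupby(labels)]


def average_segment_length(labels: List[str]) -> float:
    """Mean number of sentences per authorship-consistent segment."""
    segments = segments_from_labels(labels)
    if not segments:
        return 0.0
    return len(labels) / len(segments)


def choose_strategy(avg_len: float, threshold: float = 3.0) -> str:
    """Pick a detection strategy from the average segment length.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return "segmentation" if avg_len >= threshold else "sentence-by-sentence"


# Hypothetical hybrid text: one label per sentence.
labels = ["human", "human", "ai", "human", "ai", "ai", "ai", "human"]

segments = segments_from_labels(labels)
boundaries = len(segments) - 1  # authorship changes between neighbors
avg_len = average_segment_length(labels)
strategy = choose_strategy(avg_len)

print(segments)
print(boundaries, avg_len, strategy)
```

In this toy example the authorship changes four times across eight sentences, so segments are short and the sketch falls back to sentence-by-sentence classification. A text with long single-author runs would instead be routed to the segmentation-based pipeline.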
Authors: Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen