Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection (2403.13335v1)

Published 20 Mar 2024 in cs.LG and cs.AI

Abstract: LLMs have reached human-like proficiency in generating diverse textual content, underscoring the need for effective detection of fake (LLM-generated) text to mitigate risks such as fake news on social media. Previous research has mostly tested single models on in-distribution datasets, limiting our understanding of how these models perform on different types of data in the LLM-generated text detection task. We investigated this by evaluating five specialized transformer-based models on both in-distribution and out-of-distribution datasets to better assess their performance and generalizability. Our results show that single transformer-based classifiers achieve decent performance on in-distribution data but generalize poorly to out-of-distribution data. To improve generalization, we combined the individual classifiers using adaptive ensemble algorithms, which raised average accuracy from 91.8% to 99.2% on an in-distribution test set and from 62.9% to 72.5% on an out-of-distribution test set. These results indicate the effectiveness, strong generalization ability, and great potential of adaptive ensemble algorithms for LLM-generated text detection.
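
The abstract does not spell out the ensemble mechanics, so the sketch below illustrates one common form of adaptive ensembling: soft voting over the base classifiers' class probabilities, with per-classifier weights fit on held-out validation data. This is a minimal illustration under that assumption, not the paper's exact algorithm; the `fit_weights` helper and the random stand-in probabilities are hypothetical.

```python
# Minimal sketch of an adaptive soft-voting ensemble over k binary text
# classifiers (0 = human-written, 1 = LLM-generated).
# Assumption: each base classifier exposes predict_proba-style outputs; the
# weight-fitting rule below (validation-accuracy weighting) is an illustrative
# choice, not necessarily the scheme used in the paper.
import numpy as np

def fit_weights(val_probs, val_labels):
    """Weight each classifier by its validation accuracy, normalized to sum to 1.

    val_probs: array of shape (k, n_val, 2), class probabilities per classifier.
    val_labels: array of shape (n_val,), gold labels.
    """
    accs = np.array([(p.argmax(axis=1) == val_labels).mean() for p in val_probs])
    return accs / accs.sum()

def ensemble_predict(test_probs, weights):
    """Weighted average of class probabilities across classifiers, then argmax."""
    avg = np.tensordot(weights, test_probs, axes=1)  # (n_test, 2)
    return avg.argmax(axis=1)

# Toy usage with random stand-in probabilities for k = 5 classifiers.
rng = np.random.default_rng(0)
k, n_val, n_test = 5, 100, 50
val_probs = rng.dirichlet([1.0, 1.0], size=(k, n_val))    # (k, n_val, 2)
val_labels = rng.integers(0, 2, size=n_val)
test_probs = rng.dirichlet([1.0, 1.0], size=(k, n_test))  # (k, n_test, 2)

weights = fit_weights(val_probs, val_labels)
preds = ensemble_predict(test_probs, weights)
```

Accuracy weighting lets stronger base classifiers dominate the vote; an equally common adaptive alternative is a stacking layer (e.g., logistic regression) trained on the concatenated probabilities.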
