Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection (2401.06752v1)
Abstract: In recent years, the increasing use of Artificial Intelligence-based text generation tools has posed new challenges for document provenance, authentication, and authorship detection. At the same time, advancements in stylometry have created opportunities for automatic authorship and author change detection in multi-authored documents using style analysis techniques. Style analysis can serve as a first step toward document provenance and authentication through authorship detection. This paper investigates three key tasks of style analysis: (i) classification of single- and multi-authored documents, (ii) single change detection, which involves identifying the point where the author switches, and (iii) detection of multiple author switches in multi-authored documents. We formulate all three tasks as classification problems and propose a merit-based fusion framework that integrates several state-of-the-art NLP algorithms and weight optimization techniques. We also explore the effect of special characters, which are typically removed during pre-processing in NLP applications, on the performance of the proposed methods for these tasks by conducting extensive experiments on both cleaned and raw datasets. Experimental results demonstrate significant improvements over existing solutions for all three tasks on a benchmark dataset.
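To make the idea of merit-based fusion more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each base model outputs a probability that an author change occurs at a given paragraph boundary, and that "merit" means a model's accuracy on a validation set. The function names (`merit_weights`, `refine_weights`, `fuse`) are hypothetical, and the random search over weights is a simple stand-in for whatever weight optimization technique the paper actually uses.

```python
import numpy as np

def merit_weights(val_probs, val_labels, threshold=0.5):
    """Weight each model by its validation accuracy ("merit"), normalized to sum to 1.

    val_probs: shape (n_models, n_samples), each model's P(change) per boundary.
    val_labels: shape (n_samples,), 1 = author change, 0 = same author.
    """
    val_probs = np.asarray(val_probs, dtype=float)
    preds = (val_probs >= threshold).astype(int)
    merit = (preds == np.asarray(val_labels)).mean(axis=1)  # one accuracy per model
    if merit.sum() == 0:
        return np.full(len(merit), 1.0 / len(merit))  # fall back to equal weights
    return merit / merit.sum()

def fuse(probs, weights, threshold=0.5):
    """Weighted average of per-model probabilities, then a hard 0/1 decision."""
    fused = np.asarray(weights, dtype=float) @ np.asarray(probs, dtype=float)
    return fused, (fused >= threshold).astype(int)

def refine_weights(val_probs, val_labels, init, n_trials=500, seed=0, threshold=0.5):
    """Hypothetical optimizer: random search over the weight simplex.

    Keeps a candidate only if it strictly improves fused validation accuracy.
    """
    val_probs = np.asarray(val_probs, dtype=float)
    val_labels = np.asarray(val_labels)
    rng = np.random.default_rng(seed)

    def accuracy(w):
        _, preds = fuse(val_probs, w, threshold)
        return (preds == val_labels).mean()

    best_w = np.asarray(init, dtype=float)
    best_acc = accuracy(best_w)
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(len(best_w)))  # random weights summing to 1
        acc = accuracy(w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

In use, `merit_weights` is computed once on held-out data, optionally refined with `refine_weights`, and the resulting weights are passed to `fuse` for test documents. Starting from merit-proportional weights means stronger models dominate the vote while weaker ones can still break ties.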