Detection of tortured phrases in scientific literature (2402.03370v1)

Published 2 Feb 2024 in cs.IR, cs.AI, cs.CL, and cs.DL

Abstract: This paper presents various automatic detection methods to extract so-called tortured phrases from scientific papers. These tortured phrases, e.g., "flag to clamor" instead of "signal to noise," are the results of paraphrasing tools used to escape plagiarism detection. We built a dataset and evaluated several strategies to flag previously undocumented tortured phrases. The proposed and tested methods are based on LLMs and rely either on embedding similarities or on masked-token predictions. We found that an approach that uses token prediction and propagates the scores to the chunk level gives the best results. With a recall value of .87 and a precision value of .61, it could retrieve new tortured phrases to be submitted to domain experts for validation.


Summary

  • The paper introduces a token prediction approach that achieves 87% recall and 61% precision in identifying tortured phrases.
  • The authors utilize advanced language models and minimal labeling strategies to build a robust dataset distinguishing tortured from expected phrases.
  • This work lays the groundwork for automated screening tools that uphold academic integrity by detecting manipulated academic content.

Detection of Tortured Phrases in Scientific Literature

The paper "Detection of tortured phrases in scientific literature" addresses the growing issue of the use of content rewriting tools, such as "spinners," in academic publishing. These tools are employed by some researchers under the pressure of "publish or perish" to circumvent plagiarism detection systems. As these tools alter text by replacing keywords with synonyms, they often produce nonsensical and irrelevant substitutions, especially in technical terms, resulting in what are identified as "tortured phrases." This paper profoundly explores automated systems to detect these artificial constructs in scientific literature, highlighting several approaches and their effectiveness.

The authors focus on a systematic method for identifying tortured phrases: expressions transformed from established scientific terms, such as "flag to clamor" derived from "signal to noise." They distinguish these from their correct equivalents, termed "expected phrases." After building a dataset incorporating such phrases, they evaluate various automatic detection methods, relying predominantly on LLMs.

Among the LLM-based methodologies examined, a token prediction approach is particularly noteworthy: each token of a phrase is masked in turn, and the model's probability and rank for the original token indicate how plausible that token is in context. This technique yields a recall of 87% and a precision of 61%, underscoring its capability to flag novel tortured phrases for subsequent validation by subject-matter experts.
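
A minimal sketch of this masked-token scoring idea is shown below, using the public SciBERT checkpoint through the Hugging Face transformers library; the example phrase, the rank-based score, and the scoring loop are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: mask each token of a phrase and measure how predictable the
# original token is under SciBERT. A low probability / high rank for the
# true token suggests the word is out of place, as in a tortured phrase.
# (Illustrative only; not the authors' exact scoring pipeline.)
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def token_scores(text: str):
    """Mask each token in turn; return (token, probability, rank) triples."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    results = []
    for i in range(1, len(ids) - 1):                   # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        probs = logits.softmax(dim=-1)
        rank = int((probs > probs[ids[i]]).sum()) + 1  # rank of the true token
        results.append((tokenizer.decode([int(ids[i])]),
                        float(probs[ids[i]]), rank))
    return results

# The tokens of a tortured span should be poorly predicted in context.
for tok, prob, rank in token_scores("the flag to clamor ratio of the sensor"):
    print(f"{tok:>10s}  p={prob:.4f}  rank={rank}")
```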

Uncovering these tortured phrases matters because they serve as indicators of potentially unreliable papers. The scale of the issue is significant: such phrases have already surfaced in over 12,000 papers, including papers published in 2023 and some scheduled for publication in 2024.

The paper also describes the construction of a robust dataset to support this endeavor. Rather than relying heavily on labeled data, the authors develop methods requiring minimal labeling, using word embeddings and similarity measures to uncover tortured phrases autonomously. They extend earlier methods that depended on cosine similarity and distance metrics with modern token-prediction techniques: masking methods applied with LLMs such as SciBERT assess how likely each word is to occur within a phrase, differentiating tortured from expected phrases.
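
To illustrate the embedding-similarity baseline, the sketch below averages static word vectors and compares phrases by cosine similarity; loading GloVe vectors through gensim's downloader is an assumption made for the example, not the authors' exact setup.

```python
# Sketch of an embedding-similarity check: word-level synonym swaps keep
# the averaged phrase vectors moderately close, so a candidate phrase can
# be matched back to a known expected phrase. (Illustrative assumption:
# GloVe vectors via gensim; the paper's setup may differ.)
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # downloads ~130 MB on first use

def phrase_vector(phrase: str) -> np.ndarray:
    """Average the vectors of the phrase's in-vocabulary words."""
    vecs = [glove[w] for w in phrase.lower().split() if w in glove]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

expected = phrase_vector("signal to noise")
tortured = phrase_vector("flag to clamor")
unrelated = phrase_vector("protein folding dynamics")
# The tortured candidate should score closer to the expected phrase
# than an unrelated phrase does.
print(cosine(expected, tortured), cosine(expected, unrelated))
```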

The results confirm that LLMs significantly enhance detection accuracy. Token prediction coupled with propagating token-level scores to the chunk level proved particularly effective, whereas earlier techniques based on cosine similarity or distance metrics alone showed clear limitations. Human involvement nonetheless remains necessary given the system's precision, reiterating the need for subsequent expert validation.
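
The chunk-level propagation step can be pictured with the short sketch below; the sliding-window mean of token ranks and the flagging threshold are assumptions made for illustration, as the paper's exact aggregation rule is not reproduced here.

```python
# Sketch: propagate token-level ranks (e.g., from masked-token prediction)
# to chunk level with a sliding window, and flag chunks whose tokens are
# collectively poorly predicted. (Window size, threshold, and the
# mean-rank rule are illustrative assumptions.)
from typing import List, Tuple

def flag_chunks(token_ranks: List[Tuple[str, int]],
                window: int = 4,
                rank_threshold: float = 500.0) -> List[Tuple[str, float]]:
    flagged = []
    for i in range(len(token_ranks) - window + 1):
        chunk = token_ranks[i:i + window]
        mean_rank = sum(rank for _, rank in chunk) / window
        if mean_rank > rank_threshold:
            flagged.append((" ".join(tok for tok, _ in chunk), mean_rank))
    return flagged

# Toy ranks: ordinary words are well predicted (low rank); the tortured
# span "flag to clamor" is not.
ranks = [("the", 2), ("flag", 4012), ("to", 1), ("clamor", 8731),
         ("ratio", 37), ("of", 1), ("the", 2), ("sensor", 60)]
print(flag_chunks(ranks))
```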

This work has several practical implications for academic integrity and journal editorial processes. Improving the detection of paraphrased content helps preserve the quality and reliability of scientific publications. Furthermore, these efforts form a foundation for comprehensive databases of tortured phrases, facilitating more automated screening solutions in the future.

In conclusion, the paper provides substantive backing for employing advanced LLMs in detecting improperly altered academic content. While the presented solutions mark a significant stride toward automated detection of academic misconduct, ongoing refinements and expert collaborations remain pivotal. Future endeavors may involve the application of more context-aware LLMs and the establishment of richer datasets encompassing diverse scientific fields. As the landscape of AI-assisted text generation and manipulation continues to evolve, so too must the systems developed to ensure the integrity of scientific communication.