
PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text? (2404.05483v1)

Published 8 Apr 2024 in cs.CL and cs.AI

Abstract: In this paper, we present our submission to SemEval-2024 Task 8, "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach combines embeddings from RoBERTa-base with diversity features and uses a resampled training set. We rank 12th out of 124 in Subtask A (monolingual track), and our results show that our approach generalizes to unseen models and domains, achieving an accuracy of 0.91.
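
To make the idea of combining embeddings with diversity features concrete, here is a minimal sketch of feature concatenation. It is not the authors' implementation: the type-token ratio is an assumed stand-in for the paper's diversity features, the helper names `type_token_ratio` and `combine_features` are hypothetical, and a zero vector of length 768 (the hidden size of RoBERTa-base) stands in for a real embedding.

```python
from typing import List


def type_token_ratio(text: str) -> float:
    """Share of distinct whitespace tokens; an assumed example of a diversity feature."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def combine_features(embedding: List[float], text: str) -> List[float]:
    """Append the diversity feature to the embedding to form one feature vector."""
    return list(embedding) + [type_token_ratio(text)]


# Placeholder for a RoBERTa-base sentence embedding (hidden size 768).
placeholder_embedding = [0.0] * 768
features = combine_features(placeholder_embedding, "the cat sat on the mat")
print(len(features))
```

In a real pipeline the placeholder would be replaced by an embedding produced by the encoder, and the combined vector would be passed to a classifier.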

Authors (3)
  1. Kseniia Petukhova
  2. Roman Kazakov
  3. Ekaterina Kochmar