From Genes to Tokens: a GWAS-inspired Approach for Interpretable Stylometric Analysis
Published 8 Jun 2026 in cs.CL | (2606.09543v1)
Abstract: This short paper introduces a stylometric interpretation method inspired by genome-wide association studies (GWAS). Each token is treated as a "gene" and authorship as the "phenotype": the association between every token and authorship is tested using logistic regression with multiple-comparison correction. Applied to English, German, and Russian corpora, the method detects statistically significant lexical markers distinctive of individual authors.
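The per-token testing idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the feature is the raw token count per document, the test is a likelihood-ratio test on a one-predictor logistic regression fitted by Newton's method, the correction is Bonferroni (the abstract does not name one), and the helper names `lr_test_pvalue` and `token_scan` and the toy corpus are hypothetical.

```python
import math

def _softplus(z):
    # log(1 + exp(z)) computed without overflow
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def _log_lik(b0, b1, x, y):
    # Bernoulli log-likelihood of y under p = sigmoid(b0 + b1 * x)
    return sum(-_softplus(-(b0 + b1 * xi)) if yi else -_softplus(b0 + b1 * xi)
               for xi, yi in zip(x, y))

def lr_test_pvalue(x, y, max_iter=50):
    """Likelihood-ratio p-value for the slope in y ~ sigmoid(b0 + b1 * x).

    Assumes y contains both classes (0 and 1).
    """
    p_bar = sum(y) / len(y)
    b0, b1 = math.log(p_bar / (1.0 - p_bar)), 0.0  # intercept-only (null) fit
    ll_null = ll = _log_lik(b0, b1, x, y)
    for _ in range(max_iter):
        p = [_sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # constant feature or (near-)separation: stop here
            break
        d0 = (h11 * g0 - h01 * g1) / det  # Newton direction (2x2 solve)
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while step > 1e-8 and _log_lik(b0 + step * d0, b1 + step * d1, x, y) < ll:
            step /= 2.0  # halve the step until the likelihood does not drop
        b0, b1 = b0 + step * d0, b1 + step * d1
        new_ll = _log_lik(b0, b1, x, y)
        converged = abs(new_ll - ll) < 1e-10
        ll = new_ll
        if converged:
            break
    stat = max(0.0, 2.0 * (ll - ll_null))
    # Survival function of chi-square with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))

def token_scan(docs, labels, alpha=0.05):
    """Test every token against authorship; Bonferroni-adjust the p-values."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for tok in vocab:
        counts = [doc.count(tok) for doc in docs]  # assumed feature: raw count
        p = lr_test_pvalue(counts, labels)
        p_adj = min(1.0, p * len(vocab))
        rows.append((tok, p, p_adj, p_adj < alpha))
    return sorted(rows, key=lambda r: r[1])

# Toy corpus (fabricated for illustration): label 1 = target author, 0 = others.
base = ["the"] * 4 + ["and"] * 4
docs, labels = [], []
for i in range(20):
    docs.append(base + ["however"] * (i % 2) + ["whilst"] * (2 if i < 16 else 0))
    labels.append(1)
for i in range(20):
    docs.append(base + ["however"] * (i % 2) + ["whilst"] * (1 if i < 3 else 0))
    labels.append(0)

results = token_scan(docs, labels)
for tok, p, p_adj, significant in results:
    print(tok, p, p_adj, significant)
```

In this toy corpus only "whilst" is distributed unevenly between the two groups, so it is the only token expected to survive the correction. A real analysis would likely use a statistics library and might prefer a less conservative correction such as a false-discovery-rate procedure; those choices are left open here.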