Authorship Attribution Using a Neural Network Language Model (1602.05292v1)

Published 17 Feb 2016 in cs.CL and cs.AI

Abstract: In practice, training language models for individual authors is often expensive because of limited data resources. In such cases, neural network language models (NNLMs) generally outperform traditional non-parametric N-gram models. Here we investigate the performance of a feed-forward NNLM on an authorship attribution problem with a moderate number of authors and relatively limited data. We also consider how text topics affect performance. Compared with a well-constructed N-gram baseline with Kneser-Ney smoothing, the proposed method achieves a nearly 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. The performance is very competitive with the state of the art in terms of both accuracy and the amount of test data required. The source code, preprocessed datasets, and a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution.
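The abstract pairs per-author language models, perplexity, and classification accuracy. A common way to connect them is to train one model per author and attribute a set of test sentences to the author whose model assigns the lowest perplexity. The sketch below assumes that decision rule; the abstract does not state it explicitly. It uses an interpolated Kneser-Ney bigram model as a simplified stand-in for the N-gram baseline, whose order and exact implementation are not given here. The names `KneserNeyBigram`, `build_vocab`, and `attribute`, the discount of 0.75, and the uniform base distribution are illustrative assumptions, not the authors' code. The paper's proposed method would replace this class with a per-author feed-forward NNLM.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def build_vocab(corpora):
    """Hypothetical helper: shared vocabulary over all authors' training sentences."""
    vocab = {EOS, UNK}
    for sentences in corpora.values():
        for s in sentences:
            vocab.update(s.split())
    return vocab


class KneserNeyBigram:
    """Interpolated Kneser-Ney bigram LM (simplified sketch, not the paper's baseline)."""

    def __init__(self, sentences, vocab, discount=0.75):
        self.d = discount
        self.vocab = vocab  # shared across authors so all models score the same events
        self.bigrams = Counter()
        for s in sentences:
            toks = self.tokens(s)
            self.bigrams.update(zip(toks, toks[1:]))
        self.hist_count = Counter()    # c(v): total count of bigrams starting with v
        self.followers = Counter()     # N1+(v .): distinct words seen after v
        self.continuation = Counter()  # N1+(. w): distinct words seen before w
        for (v, w), c in self.bigrams.items():
            self.hist_count[v] += c
            self.followers[v] += 1
            self.continuation[w] += 1
        self.total_types = len(self.bigrams)  # N1+(. .): distinct bigram types

    def tokens(self, sentence):
        # Map out-of-vocabulary words to <unk>, then add sentence boundaries.
        words = [w if w in self.vocab else UNK for w in sentence.split()]
        return [BOS] + words + [EOS]

    def p_cont(self, w):
        # Discounted continuation probability, interpolated with a uniform
        # distribution so every vocabulary word keeps nonzero probability.
        seen_types = len(self.continuation)
        discounted = max(self.continuation[w] - self.d, 0) / self.total_types
        backoff = self.d * seen_types / self.total_types
        return discounted + backoff / len(self.vocab)

    def prob(self, v, w):
        c_v = self.hist_count[v]
        if c_v == 0:  # history never seen: use the lower-order distribution only
            return self.p_cont(w)
        lam = self.d * self.followers[v] / c_v  # mass freed by discounting
        return max(self.bigrams[(v, w)] - self.d, 0) / c_v + lam * self.p_cont(w)

    def perplexity(self, sentences):
        # Perplexity over all predicted tokens in the given test sentences.
        log_prob, n = 0.0, 0
        for s in sentences:
            toks = self.tokens(s)
            for v, w in zip(toks, toks[1:]):
                log_prob += math.log(self.prob(v, w))
                n += 1
        return math.exp(-log_prob / n)


def attribute(test_sentences, models):
    """Assumed decision rule: the author whose model has the lowest perplexity wins."""
    return min(models, key=lambda author: models[author].perplexity(test_sentences))


if __name__ == "__main__":
    corpora = {
        "austen": ["it is a truth universally acknowledged", "it is a good match"],
        "melville": ["call me ishmael", "the whale swam in the sea"],
    }
    vocab = build_vocab(corpora)
    models = {a: KneserNeyBigram(s, vocab) for a, s in corpora.items()}
    print(attribute(["call me ishmael"], models))
```

Passing several test sentences to `attribute` pools their tokens into one perplexity score. This pooling is one plausible reading of attribution "given as few as 5 test sentences". A shared vocabulary keeps the per-author perplexities comparable, because every model then distributes probability over the same set of words.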
