Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence (2107.02173v3)

Published 5 Jul 2021 in cs.CL and cs.LG

Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
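
The automated coherence the abstract refers to is typically estimated from word co-occurrence statistics in a reference corpus, most commonly an NPMI-style score averaged over pairs of a topic's top words. The sketch below is a minimal, illustrative approximation of that idea using document-level co-occurrence counts; the function name, toy corpus, and parameter choices are assumptions for illustration, not the paper's exact evaluation setup.

```python
import math
from itertools import combinations

def npmi_coherence(topic_top_words, reference_docs, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words, estimated from
    document-level co-occurrence in a tokenized reference corpus.
    Illustrative approximation of C_NPMI, not the paper's exact metric."""
    n_docs = len(reference_docs)
    doc_sets = [set(doc) for doc in reference_docs]

    def p(words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(topic_top_words, 2):
        p_i, p_j, p_ij = p([wi]), p([wj]), p([wi, wj])
        if p_ij == 0:
            scores.append(-1.0)  # conventional minimum when words never co-occur
            continue
        pmi = math.log((p_ij + eps) / (p_i * p_j + eps))
        scores.append(pmi / (-math.log(p_ij + eps)))
    return sum(scores) / len(scores)

# Toy usage: score one topic's top words against a tiny tokenized corpus.
corpus = [
    ["stock", "market", "trading", "price"],
    ["market", "price", "shares", "stock"],
    ["game", "team", "score", "player"],
]
print(npmi_coherence(["stock", "market", "price"], corpus))
```

The paper's point is that scores like this, computed fully automatically, can rank models differently than the human judgment tasks (topic rating and word intrusion) they are meant to stand in for.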

Authors (6)
  1. Alexander Hoyle (13 papers)
  2. Pranav Goel (10 papers)
  3. Denis Peskov (2 papers)
  4. Andrew Hian-Cheong (1 paper)
  5. Jordan Boyd-Graber (68 papers)
  6. Philip Resnik (20 papers)
Citations (108)
