
Content Modeling Using Latent Permutations (1401.3488v1)

Published 15 Jan 2014 in cs.IR, cs.CL, and cs.LG

Abstract: We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.

Authors (4)
  1. Harr Chen
  2. S. R. K. Branavan
  3. Regina Barzilay
  4. David R. Karger
Citations (51)
