- The paper introduces Retell, which uses LLMs to convert detailed narrative text into high-level thematic summaries.
- It applies LDA to these abstractive retellings, yielding topics that align better with gold-standard literary themes.
- Retell is resource-efficient and effective at detecting culturally significant themes, such as racial identity, in literary works.
The paper "Tell, Don't Show: Leveraging LLMs' Abstractive Retellings to Model Literary Themes" (2505.23166) introduces Retell, a novel approach for topic modeling literary texts, which typically emphasize "showing" (sensory details, actions) rather than "telling" (abstract concepts). Standard topic models like Latent Dirichlet Allocation (LDA), which rely on lexical patterns, often struggle to capture the high-level themes implicit in such narrative language.
The core idea of Retell is to use resource-efficient LLMs to generate abstractive "retellings" (summaries or descriptions) of literary passages. These retellings effectively translate the low-level details ("showing") into higher-level concepts ("telling"), making the content more amenable to traditional topic modeling methods. The authors then apply LDA on these generated retellings instead of the original text. The approach is designed to be simple and accessible, particularly for researchers in the humanities who may have limited computational resources or face high API costs associated with larger LLMs.
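The two-stage pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `retell` function stands in for an LLM call (the paper uses resource-efficient models such as Llama 3.1 8B with prompt verbs like "summarize" or "describe"), and the canned retellings and passages are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

passages = [
    "She clutched the locket, staring at the faded photograph inside.",
    "He marched past the recruiting office without slowing his step.",
]

def retell(passage: str) -> str:
    """Placeholder for an LLM call, e.g. a prompt like
    'Summarize the following passage: {passage}'.
    Canned abstractive retellings are returned here for illustration."""
    canned = {
        passages[0]: "A woman reflects on grief, memory, and lost family.",
        passages[1]: "A man resists social pressure about war and duty.",
    }
    return canned[passage]

# Key idea: fit LDA on the retellings ("telling"), not the originals ("showing").
retellings = [retell(p) for p in passages]
X = CountVectorizer(stop_words="english").fit_transform(retellings)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)  # per-passage topic distributions
```

In a real run, the learned topics would be read off from `lda.components_` over the retelling vocabulary, which is why they skew conceptual (grief, duty) rather than lexical (locket, office).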
The paper evaluates Retell against two baselines:
- Default LDA: Applying LDA directly to the original literary passages.
- TopicGPT-lite: An adaptation of the TopicGPT framework (2311.01449) that directly prompts LLMs to generate topic labels for passages. The authors modified the original approach to make it feasible with smaller, resource-efficient LLMs (e.g., Llama 3.1 8B, Phi-3.5-mini, Gemma 2 2B, and GPT-4o mini), addressing issues like the generation of overly specific topics observed with weaker models.
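For contrast with Retell's retell-then-LDA pipeline, a TopicGPT-lite-style baseline prompts the LLM for a topic label directly. The prompt wording below is a hypothetical reconstruction, not the paper's actual prompt; note the instruction steering weaker models away from overly specific topics.

```python
def topic_label_prompt(passage: str, seed_topics: list[str]) -> str:
    """Build a direct topic-labeling prompt (illustrative wording only)."""
    topics = "\n".join(f"- {t}" for t in seed_topics)
    return (
        "Assign one high-level topic label to the passage below. "
        "Reuse an existing topic if it fits; otherwise propose a new, "
        "general (not overly specific) topic.\n\n"
        f"Existing topics:\n{topics}\n\n"
        f"Passage:\n{passage}\n\nTopic:"
    )

prompt = topic_label_prompt(
    "He marched past the recruiting office without slowing his step.",
    ["War", "Family"],
)
```

The LLM's completion is taken as the passage's topic, so there is no intermediate LDA step; this is what makes the baseline prone to either very common or overly specific labels.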
Evaluation is conducted using a dataset of literary passages paired with pre-existing theme labels scraped from online sources like Goodreads, SparkNotes, and LitCharts.
- Topic Relatedness: Crowdworkers rated how well the most prominent topics predicted by each method for a set of passages aligned with their gold labels. Retell (especially with 'summarize' or 'describe' verbs) consistently produced topics judged by humans as more related to the gold labels compared to default LDA and TopicGPT-lite. The topics derived from Retell's outputs tend to be more conceptual.
- Passage-level Relevance: In a human evaluation inspired by topic intrusion tests, in-house annotators rated the relevance of predicted topics to individual passages. Both Retell and TopicGPT-lite showed that their top predicted topics were rated as more relevant than intruder topics, but Retell generally produced topics that were more consistently rated as "Very Relevant".
- Automatic Metrics: Precision and recall metrics comparing passage pairs linked by predicted topics with those sharing gold labels showed that Retell often achieves higher precision than baselines, while TopicGPT-lite can have higher recall due to assigning many passages to common, sometimes less informative, labels.
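The pair-based metric can be made concrete with a small sketch. Here a pair of passages counts as "linked" if they share at least one predicted topic (or gold label); the paper's exact pairing and aggregation may differ, and the toy labels are invented.

```python
from itertools import combinations

def linked_pairs(labels_per_passage: list[set]) -> set:
    """Index pairs of passages that share at least one label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels_per_passage)), 2)
        if labels_per_passage[i] & labels_per_passage[j]
    }

def pair_precision_recall(pred: list[set], gold: list[set]) -> tuple[float, float]:
    """Precision/recall of predicted-topic links against gold-label links."""
    p, g = linked_pairs(pred), linked_pairs(gold)
    precision = len(p & g) / len(p) if p else 0.0
    recall = len(p & g) / len(g) if g else 0.0
    return precision, recall

# Toy data: four passages, one predicted topic and one gold label each.
pred = [{"war"}, {"war"}, {"family"}, {"family"}]
gold = [{"war"}, {"war"}, {"grief"}, {"family"}]
precision, recall = pair_precision_recall(pred, gold)  # 0.5, 1.0
```

This also shows why TopicGPT-lite can score high recall: assigning many passages one common label links many pairs, which captures most gold links at the cost of precision.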
A detailed case study applies Retell to a collection of books used in US high school English Language Arts (ELA) courses to identify passages discussing racial/cultural identity. By comparing topic modeling outputs to expert-guided human annotations, the study demonstrates that Retell's race-related topics align better with human judgments of passages that explicitly mention or deeply engage with racial/cultural identity than those of default LDA and TopicGPT-lite. This suggests Retell's potential for supporting content analysis tasks such as identifying passages of interest for close reading at scale. Error analysis in the case study highlights challenges like LLMs sometimes incorporating knowledge beyond the immediate passage (e.g., book-level context) into their retellings, and potential failures in selecting content relevant to specific downstream tasks.
The paper concludes that Retell offers a competitive and accessible alternative for topic modeling literature with resource-efficient LLMs, effectively bridging the gap between the "showing" of narrative text and the "telling" needed for robust topic discovery. While acknowledging limitations such as the subjective nature of literary interpretation and potential biases in LLM outputs and evaluation data, the authors emphasize that Retell provides a valuable tool for cultural analytics, particularly when combined with human expert insight, especially for sensitive topics like race.