Biterm Topic Model (BTM) Overview
- BTM is a probabilistic graphical model that extracts unordered word pairs from short texts to identify underlying topics, effectively addressing sparsity issues.
- It employs corpus-level biterm statistics and collapsed Gibbs sampling, with adaptive extensions like AOBTM and AOBST enhancing dynamic and sentiment-aware topic discovery.
- BTM is widely applied in short-text analytics, real-time monitoring, and review mining, though its lack of per-document topic mixtures limits fine-grained document profiling, particularly for longer texts.
The Biterm Topic Model (BTM) is a probabilistic graphical model tailored for topic discovery in collections of short texts such as tweets, app reviews, or news snippets. Unlike conventional document-centric models (e.g., Latent Dirichlet Allocation, LDA), which rely on modeling within-document word co-occurrences, BTM addresses the severe sparsity in short documents by pooling all unordered word-pairs ("biterms") at the corpus level and modeling global word co-occurrence patterns directly. Subsequent extensions, including adaptive online (AOBTM) and sentiment-aware (AOBST) variants, enable dynamic and version-sensitive topic tracing, crucial in real-time analytics of evolving short-text streams.
1. Generative Process and Formal Definition
The core innovation of BTM lies in its corpus-level biterm modeling. Given a corpus of short documents with vocabulary size $W$, all unordered word-pairs from each document are extracted to form a multiset $B = \{b_i\}_{i=1}^{|B|}$. Each biterm $b = (w_i, w_j)$ is posited to be generated from a single latent topic $z \in \{1, \dots, K\}$, where $K$ is the number of topics.
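The biterm extraction step described above can be sketched in a few lines. This is a minimal illustration (the function names `extract_biterms` and `corpus_biterms` are ours, not from any reference implementation); pairs are sorted so that unordered pairs compare equal:

```python
from itertools import combinations

def extract_biterms(doc_tokens):
    """Extract all unordered word pairs (biterms) from one short document."""
    # Sort each pair so (a, b) and (b, a) map to the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

def corpus_biterms(docs):
    """Pool biterms from every document into one corpus-level multiset."""
    biterms = []
    for tokens in docs:
        biterms.extend(extract_biterms(tokens))
    return biterms

docs = [["cheap", "flight", "deal"], ["flight", "delay"]]
# A 3-word document yields C(3,2) = 3 biterms; a 2-word document yields 1.
print(corpus_biterms(docs))
```

Pooling at the corpus level, rather than per document, is exactly what lets BTM sidestep the sparse per-document co-occurrence counts that hurt LDA on short texts.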
The generative process is as follows:
- Draw global topic proportions $\theta \sim \mathrm{Dirichlet}(\alpha)$.
- For each topic $k \in \{1, \dots, K\}$, draw topic-word distribution $\phi_k \sim \mathrm{Dirichlet}(\beta)$.
- For each biterm $b = (w_i, w_j) \in B$: a. Sample topic $z \sim \mathrm{Multinomial}(\theta)$, b. Generate the two words independently: $w_i, w_j \sim \mathrm{Multinomial}(\phi_z)$.
The joint probability for a biterm $b = (w_i, w_j)$ and its latent topic $z$ is
$$P(z, b) = \theta_z \, \phi_{z, w_i} \, \phi_{z, w_j}, \qquad P(b) = \sum_{z=1}^{K} \theta_z \, \phi_{z, w_i} \, \phi_{z, w_j}$$
(Jipeng et al., 2019, Hadi et al., 2020, Cui et al., 2017, Gao et al., 2020).
The global exchangeability assumption eliminates per-document parameters, leveraging cross-document biterm statistics and mitigating short-text sparsity.
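The generative process above can be simulated directly; the sketch below (with an illustrative function name and stdlib sampling, not a reference implementation) makes explicit that both words of a biterm share one topic draw:

```python
import random

def simulate_biterms(theta, phi, n_biterms, seed=0):
    """Simulate BTM's generative process: sample one topic per biterm,
    then draw both words independently from that topic's distribution."""
    rng = random.Random(seed)
    K = len(theta)
    biterms = []
    for _ in range(n_biterms):
        z = rng.choices(range(K), weights=theta)[0]              # topic for the biterm
        w1 = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # first word
        w2 = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # second word
        biterms.append((z, w1, w2))
    return biterms
```

Note there is no per-document step at all: the absence of document-level mixtures is what the exchangeability assumption buys.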
2. Posterior Inference Algorithms
Collapsed Gibbs sampling is the principal inference mechanism for BTM. Let $n_k$ denote the number of biterms assigned to topic $k$, $n_{w|k}$ the count of word $w$ under topic $k$ (aggregated over all biterms), and $W$ the vocabulary size.
For each biterm $b_i = (w_{i,1}, w_{i,2})$, the posterior topic assignment update is
$$P(z_i = k \mid \mathbf{z}_{-i}, B) \;\propto\; (n_{-i,k} + \alpha)\,\frac{(n_{-i,w_{i,1}|k} + \beta)\,(n_{-i,w_{i,2}|k} + \beta)}{(2 n_{-i,k} + W\beta)\,(2 n_{-i,k} + W\beta + 1)},$$
where $-i$ denotes counts excluding the current biterm (Jipeng et al., 2019, Hadi et al., 2020). After a sufficient number of iterations, parameters are estimated via
$$\theta_k = \frac{n_k + \alpha}{|B| + K\alpha}, \qquad \phi_{w|k} = \frac{n_{w|k} + \beta}{2 n_k + W\beta}.$$
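A compact collapsed Gibbs sampler for BTM can be written with nothing but counts. This is a didactic sketch (function name and defaults are ours; a production sampler would add burn-in and convergence checks), using the fact that each biterm contributes two word tokens to its topic, so the topic's total word count is $2 n_k$:

```python
import random

def btm_gibbs(biterms, K, W, alpha=1.0, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for BTM. biterms: list of (w1, w2) word-id pairs."""
    rng = random.Random(seed)
    n_k = [0] * K                        # biterms assigned to each topic
    n_wk = [[0] * W for _ in range(K)]   # word counts per topic
    z = []
    for (w1, w2) in biterms:             # random initialization
        k = rng.randrange(K)
        z.append(k); n_k[k] += 1; n_wk[k][w1] += 1; n_wk[k][w2] += 1
    for _ in range(iters):
        for i, (w1, w2) in enumerate(biterms):
            k = z[i]                     # remove current assignment from counts
            n_k[k] -= 1; n_wk[k][w1] -= 1; n_wk[k][w2] -= 1
            # Conditional: (n_k + a) * (n_{w1|k}+b)(n_{w2|k}+b) / normalizer
            weights = [(n_k[t] + alpha)
                       * (n_wk[t][w1] + beta) * (n_wk[t][w2] + beta)
                       / ((2 * n_k[t] + W * beta) * (2 * n_k[t] + W * beta + 1))
                       for t in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            z[i] = k; n_k[k] += 1; n_wk[k][w1] += 1; n_wk[k][w2] += 1
    B = len(biterms)                     # point estimates from final counts
    theta = [(n_k[k] + alpha) / (B + K * alpha) for k in range(K)]
    phi = [[(n_wk[k][w] + beta) / (2 * n_k[k] + W * beta) for w in range(W)]
           for k in range(K)]
    return theta, phi

theta, phi = btm_gibbs([(0, 1), (0, 1), (2, 3)], K=2, W=4)
```

By construction the estimates are proper distributions: $\theta$ sums to 1 over topics and each $\phi_k$ sums to 1 over the vocabulary.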
Alternative inference techniques include stochastic variational inference (SCVB0), incremental and online Gibbs samplers, and stochastic divergence minimization (SDM-BTM). SDM-BTM achieves deterministic one-pass convergence by iteratively updating running statistics through stochastic divergence minimization. Empirically, SDM yields superior predictive log-likelihood and faster convergence relative to CGS-based and SCVB0 approaches, as per experiments on millions of tweets (Cui et al., 2017).
| Inference Method | Per-biterm Update Cost | Memory Requirement |
|---|---|---|
| iBTM | $O(K)$ | $O(KW + \lvert B \rvert)$ (stores all biterm assignments) |
| oBTM | $O(K)$ | $O(KW + \lvert B_t \rvert)$ (current slice only) |
| SCVB0–BTM | $O(K)$ | $O(KW)$ |
| SDM–BTM | $O(K)$ | $O(KW)$ |
As shown, SDM and SCVB0 offer update cost linear in $K$ and memory that does not grow with the stream, favoring scalability for large vocabularies.
3. Adaptive and Online Extensions (AOBTM, AOBST)
Static BTM is ill-suited for streaming or versioned short texts, where topic distributions drift over time. Adaptive Online BTM (AOBTM) integrates prior statistics from a tunable window of past slices, using weighted aggregation of previous topic-word distributions for dynamic Dirichlet priors (Hadi et al., 2020).
At version $t$:
- Let $\phi_k^{(t-v)}$ be the topic-word distribution for topic $k$ in previous slice $t-v$, for $v = 1, \dots, V$.
- Compute adaptive weights $\mu_v$ via a softmax over the similarity of each previous slice's distribution to the current one.
- Set the Dirichlet prior for topic $k$ in the current slice:
$$\beta_k^{(t)} = \beta_0 + \sum_{v=1}^{V} \mu_v \, \phi_k^{(t-v)}.$$
Gibbs sampling proceeds as in static BTM but uses the adaptive prior in place of the fixed hyperparameter $\beta$. AOBTM supports automatic selection of the number of topics $K$ and the window size via parallel grid search maximizing topic coherence (PMI-score).
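The softmax-weighted aggregation of previous topic-word distributions into a prior can be sketched as follows. This is a minimal illustration of the idea, assuming cosine similarity between slices and a small symmetric base prior (the function name and both choices are ours, not taken from the AOBTM reference implementation):

```python
import math

def adaptive_prior(prev_phis, current_phi, base_beta=0.01):
    """Aggregate previous slices' topic-word rows into a Dirichlet prior
    for one topic, weighting each slice by softmax(similarity to current).
    prev_phis: per-slice topic-word rows for this topic (each sums to 1).
    current_phi: the row estimated so far for the current slice."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    sims = [cosine(current_phi, prev) for prev in prev_phis]
    exps = [math.exp(s) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over similarities
    W = len(current_phi)
    return [base_beta + sum(mu * prev[w] for mu, prev in zip(weights, prev_phis))
            for w in range(W)]
```

Slices that resemble the current topic distribution receive larger weight, so the prior smoothly carries forward only the relevant history.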
Adaptive Online Biterm Sentiment-Topic (AOBST) further augments BTM by modeling sentiment labels jointly with topic assignments, extending the generative process and inference accordingly (Gao et al., 2020). AOBST leverages labeled biterms and version-aware priors for tracing sentiment-topic evolution, facilitating the identification of negative issues post-app updates.
4. Empirical Performance, Metrics, and Applications
Benchmarking BTM and its adaptations across standard short-text datasets, app reviews, and tweets consistently demonstrates BTM’s superiority over document-centric models (LDA, PLSA) in sparsity-prone scenarios. STTM library experiments on six datasets report (Jipeng et al., 2019):
- Clustering Purity/NMI: BTM achieves Purity ≈ 0.849, NMI ≈ 0.875 (GoogleNews); it reliably surpasses LDA.
- Topic coherence: Top-10 word PMI scores indicate BTM approaches or matches DMM variants using embeddings.
- Classification: BTM outperforms LDA in accuracy-based tasks.
- Efficiency: Converges in ≈60 Gibbs iterations; per-iteration cost remains $O(K \lvert B \rvert)$.
In app issue detection across 164k reviews spanning 89 versions, AOBST attains F_hybrid ≈ 80.9%, a 22.3% improvement over IDEA (adaptive OLDA + labeling). Adding sentiment and embedding-based topic labeling yields additional performance gains (Gao et al., 2020).
Key metrics include:
- PMI-score (coherence via external corpora)
- Jensen–Shannon divergence (topic distinctiveness)
- Precision_E, Recall_L, F_hybrid (emerging topic detection vs. changelog ground truth)
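The PMI-score above can be computed from document co-occurrence statistics in an external reference corpus. A minimal sketch (our own function name; it averages pairwise PMI over a topic's top words, one common variant of the metric):

```python
import math
from itertools import combinations

def pmi_score(top_words, doc_sets, eps=1e-12):
    """Average pairwise PMI of a topic's top words.
    doc_sets: one set of words per reference document."""
    D = len(doc_sets)
    def p(words):
        # Fraction of reference documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / D
    scores = []
    for wi, wj in combinations(top_words, 2):
        joint = p([wi, wj])
        scores.append(math.log((joint + eps) / (p([wi]) * p([wj]) + eps)))
    return sum(scores) / len(scores)
```

Word pairs that co-occur more often than independence predicts push the score above zero; incoherent topics, whose top words rarely co-occur, score negative.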
BTM, AOBTM, and AOBST have wide applicability: automated triaging of user complaints, real-time theme detection on social platforms, and agile requirement mining for software projects (Hadi et al., 2020, Gao et al., 2020).
5. Strengths, Limitations, and Implementation Considerations
BTM's main strength is the alleviation of short-text sparsity through global biterm co-occurrence modeling, with a robust collapsed Gibbs sampling protocol and available open-source implementation (STTM) (Jipeng et al., 2019). Adaptive variants offer online, version-sensitive topic continuity, crucial for tracking longitudinal change.
Limitations include:
- Absence of per-document topic mixtures: only the corpus-level mixture $\theta$ is available, which may hinder fine-grained document profiling.
- It ignores higher-order or sequential context beyond word pairs, which may degrade performance when documents contain richer structure.
- Biterm extraction can be quadratic in document length, though mitigated by sliding window heuristics.
- For long documents, richer within-document models may be preferable.
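The sliding-window heuristic mentioned above bounds the quadratic biterm blow-up: instead of pairing every word with every other, each word is paired only with its near neighbors. A sketch (illustrative function name, window semantics assumed):

```python
def windowed_biterms(tokens, window=3):
    """Extract biterms only within a sliding window of the given size,
    yielding O(n * window) pairs instead of O(n^2) for an n-word document."""
    pairs = []
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1:i + window]:   # only the next (window - 1) words
            pairs.append(tuple(sorted((w1, w2))))
    return pairs
```

For genuinely short texts the window rarely matters (the whole document fits inside it); it becomes a useful cap only as documents grow.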
Empirical findings suggest BTM is most effective for corpora where texts are very short and word-pair statistics are sparse. Inference methods such as SDM-BTM further enhance scalability and convergence.
6. Contemporary Developments and Future Directions
The evolution of BTM includes variational and streaming inference algorithms, sentiment augmentation (AOBST), and adaptive prior frameworks (AOBTM), expanding its utility beyond static analysis towards dynamic data streams and multifaceted topic-sentiment modeling.
Current survey results indicate that DMM-based short-text models enhanced by embeddings marginally outperform BTM in certain clustering and coherence tasks, yet BTM remains a strong, interpretable baseline due to its conceptual simplicity and empirical reliability (Jipeng et al., 2019, Hadi et al., 2020).
A plausible implication is that further integration of contextual and semantic embedding information into biterm frameworks could yield incrementally better performance, especially under severe sparsity or nonstationarity. Additionally, automated hyperparameter selection and parallel search are streamlining deployment for real-world applications such as app monitoring and social media analytics.
Empirical observation, notably in app review mining and live event tracking, underscores BTM and its adaptive variants as foundational for the extraction of coherent, distinctive, and timely topics in continuously evolving short-text environments (Gao et al., 2020, Hadi et al., 2020).