
Spectral Clustering with Likelihood Refinement is Optimal for Latent Class Recovery (2506.07167v1)

Published 8 Jun 2025 in stat.ME

Abstract: Latent class models are widely used for identifying unobserved subgroups from multivariate categorical data in social sciences, with binary data as a particularly popular example. However, accurately recovering individual latent class memberships and determining the number of classes remain challenging, especially when handling large-scale datasets with many items. This paper proposes a novel two-stage algorithm for latent class models with high-dimensional binary responses. Our method first initializes latent class assignments by an easy-to-implement spectral clustering algorithm, and then refines these assignments with a one-step likelihood-based update. This approach combines the computational efficiency of spectral clustering with the improved statistical accuracy of likelihood-based estimation. We establish theoretical guarantees showing that this method leads to optimal latent class recovery and exact clustering of subjects under mild conditions. Additionally, we propose a simple consistent estimator for the number of latent classes. Extensive experiments on both simulated data and real data validate our theoretical results and demonstrate our method's superior performance over alternative methods.
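
The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: the spectral step is taken to be a truncated SVD followed by plain k-means, the refinement is taken to be a single reassignment of each subject to the class that maximizes a Bernoulli log-likelihood with plug-in item parameters, and the names `simulate_lcm`, `spectral_init`, and `likelihood_refine` are hypothetical. It should not be read as the authors' exact procedure.

```python
import numpy as np


def simulate_lcm(n, n_items, n_classes, seed=0):
    """Hypothetical helper: draw binary responses from a latent class model."""
    rng = np.random.default_rng(seed)
    # Item parameters: P(response = 1 | class), chosen here to be well separated.
    theta = rng.choice([0.2, 0.8], size=(n_classes, n_items))
    z = rng.integers(n_classes, size=n)  # true latent classes
    R = (rng.random((n, n_items)) < theta[z]).astype(float)
    return R, z


def spectral_init(R, n_classes, n_iter=50, seed=0):
    """Stage 1 (assumed form): top singular vectors, then Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    X = U[:, :n_classes] * s[:n_classes]  # low-dimensional row embedding
    # Farthest-point initialisation of the cluster centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_classes):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for k in range(n_classes):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def likelihood_refine(R, labels, n_classes, eps=1e-3):
    """Stage 2 (assumed form): one likelihood-based reassignment step."""
    n_items = R.shape[1]
    # Plug-in item parameters: class-wise response frequencies.
    theta = np.vstack([
        R[labels == k].mean(axis=0) if np.any(labels == k)
        else np.full(n_items, 0.5)
        for k in range(n_classes)
    ])
    theta = np.clip(theta, eps, 1 - eps)  # keep the logarithms finite
    # Bernoulli log-likelihood of each subject under each class.
    loglik = R @ np.log(theta).T + (1 - R) @ np.log(1 - theta).T
    return loglik.argmax(axis=1)


R, z_true = simulate_lcm(n=300, n_items=100, n_classes=3)
z_init = spectral_init(R, n_classes=3)
z_refined = likelihood_refine(R, z_init, n_classes=3)
```

Because class labels are only identified up to permutation, comparing `z_refined` with `z_true` requires matching labels first, for example by maximizing agreement over all permutations of the class indices.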

Authors (2)