Cause of unusually slow mixing on the teenage behavior dataset

Determine why the paper's Monte Carlo algorithms (collapsed Gibbs sampling with integrated likelihood and variable k) mix substantially more slowly, with a much longer integrated autocorrelation time, on the latent class analysis of the teenage problem-behavior survey dataset (6504 respondents, 6 binary variables) than on the other datasets studied. The goal is to identify the specific data characteristics or features of the posterior structure that produce the prolonged correlation times.

Background

In the performance evaluation of the proposed algorithms, most datasets exhibit integrated correlation times on the order of tens of sweeps, enabling efficient sampling and consistent inference. The teenage problem-behavior dataset stands out with markedly longer correlation times, roughly two orders of magnitude larger than those of the other datasets, so significantly longer runs are needed to obtain stable results.
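As a diagnostic starting point, the integrated correlation time of any scalar summary recorded once per sweep can be estimated directly from the chain. The sketch below is a minimal illustration, not the paper's procedure. It assumes the convention tau = 1 + 2 * sum of rho(t) and uses Sokal's self-consistent window to truncate the sum. The function name `integrated_autocorr_time`, the window constant `c`, and the synthetic AR(1) chain standing in for sampler output are all hypothetical choices made for this example.

```python
import numpy as np


def integrated_autocorr_time(x, c=5.0):
    """Estimate tau = 1 + 2 * sum_{t>=1} rho(t) for a scalar chain x.

    Assumption: the sum is truncated at the smallest lag M with
    M >= c * tau(M) (Sokal's self-consistent window). The paper may
    use a different estimator or convention.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    # Autocovariance via FFT; zero-padding to 2n avoids circular wrap-around.
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f), n=2 * n)[:n]
    rho = acov / acov[0]  # normalized autocorrelation, rho[0] == 1
    # tau[M] = 1 + 2 * sum_{t=1}^{M} rho[t]
    tau = 2.0 * np.cumsum(rho) - 1.0
    lags = np.arange(n)
    ok = lags >= c * tau
    m = int(np.argmax(ok)) if ok.any() else n - 1
    return float(tau[m])


# Hypothetical stand-in for sampler output: an AR(1) chain with
# coefficient phi, whose exact tau is (1 + phi) / (1 - phi) = 19 for phi = 0.9.
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.standard_normal(200_000)
chain = np.empty_like(noise)
chain[0] = noise[0]
for t in range(1, chain.size):
    chain[t] = phi * chain[t - 1] + noise[t]

print("estimated tau:", integrated_autocorr_time(chain))
```

Applying the same estimator to several summaries of one run (for example the sampled value of k and the log-probability of the state) would indicate whether the slow mixing is confined to particular directions of the posterior, which is one way to begin narrowing down the cause.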

The authors explicitly note that they do not understand why mixing is so slow for this dataset, even though it is analyzed with the same model class (latent class analysis), priors, and sampling procedures as the other datasets. Determining the cause would help diagnose when and why the algorithms face difficulty and could guide algorithmic or modeling refinements.

References

It is not clear what makes mixing so much slower for this data set, but the difference has practical repercussions---as reported in Section~\ref{sec:realdata}, we find it necessary to run for considerably longer to generate consistent results.

— Fast sampling and model selection for Bayesian mixture models (2501.07668 - Newman, 13 Jan 2025) in Section 4.2 (Performance measures)
