Sparse Bayesian Partially Identified Models for Sequence Count Data (2512.12040v1)

Published 12 Dec 2025 in stat.ME

Abstract: In genomics, differential abundance and expression analyses are complicated by the compositional nature of sequence count data, which reflect only relative (not absolute) abundances or expression levels. Many existing methods attempt to address this limitation through data normalizations, but we have shown that such approaches imply strong, often biologically implausible assumptions about total microbial load or total gene expression. Even modest violations of these assumptions can inflate Type I and Type II error rates to over 70%. Sparse estimators have been proposed as an alternative, leveraging the assumption that only a small subset of taxa (or genes) change between conditions. However, we show that current sparse methods suffer from similar pathologies because they treat sparsity assumptions as fixed and ignore the uncertainty inherent in these assumptions. We introduce a sparse Bayesian Partially Identified Model (PIM) that addresses this limitation by explicitly modeling uncertainty in sparsity assumptions. Our method extends the Scale-Reliant Inference (SRI) framework to the sparse setting, providing a principled approach to differential analysis under scale uncertainty. We establish theoretical consistency of the proposed estimator and, through extensive simulations and real data analyses, demonstrate substantial reductions in both Type I and Type II errors compared to existing methods.
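
The abstract's point about compositionality can be illustrated with a small sketch. It is not the paper's method or model. It is a toy calculation with made-up abundances (the `control` and `treated` values and all function names are assumptions for illustration), showing that log fold changes computed from proportions differ from the true log fold changes by a single unknown shift: the change in total abundance, which relative data alone cannot recover.

```python
import math

def proportions(abundances):
    """Convert absolute abundances to relative abundances (a composition)."""
    total = sum(abundances)
    return [a / total for a in abundances]

def log2_fold_changes(before, after):
    """Per-taxon log2 fold change between two conditions."""
    return [math.log2(b / a) for a, b in zip(before, after)]

# Hypothetical absolute abundances for four taxa (illustrative values only).
# Only the first taxon truly changes between conditions.
control = [100.0, 100.0, 100.0, 100.0]
treated = [400.0, 100.0, 100.0, 100.0]

# What we would see with access to absolute abundances.
true_lfc = log2_fold_changes(control, treated)

# What sequence count data can inform: fold changes in proportions.
relative_lfc = log2_fold_changes(proportions(control), proportions(treated))

# The missing piece: the log2 fold change in total abundance (the "scale").
scale_lfc = math.log2(sum(treated) / sum(control))

for taxon, (t, r) in enumerate(zip(true_lfc, relative_lfc), start=1):
    print(f"taxon {taxon}: true={t:+.3f}  relative={r:+.3f}  relative+scale={r + scale_lfc:+.3f}")
```

In this toy case the three unchanged taxa appear to decrease when only proportions are compared, and the changed taxon's increase is understated. Any assumed value for the scale term shifts every estimate by the same amount, which is why assumptions about total load or total expression matter so much for the conclusions.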
