Biases in differential expression analysis of RNA-seq data: A matter of replicate type (1508.03719v1)
Abstract: In differential expression (DE) analysis of RNA-seq count data, it is known that genes with larger read counts are more likely to be detected as differentially expressed. This bias has a profound effect on subsequent Gene Ontology (GO) analysis by perturbing the ranks of gene sets. Another known bias is that the commonly used parametric DE analysis methods (e.g., edgeR, DESeq and baySeq) tend to yield more DE genes as the sequencing depth increases. We nevertheless show that these biases are in fact confined to data of the technical replicate type. We also show that GO or gene-set enrichment analysis methods applied to technical replicate data result in a considerable number of false positives. In conclusion, the current DE and enrichment analysis methods can be confidently used for biological replicate count data, while caution should be exercised when analysing technical replicate data.
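The contrast between replicate types can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes negative binomial counts, treats technical replicates as having zero dispersion (pure Poisson noise) and biological replicates as having a dispersion of 0.1, and uses a delta-method approximation for the variance of a log fold change. The function name `approx_z`, the fold change of 1.2, and the three replicates per group are all hypothetical choices made for illustration.

```python
import math


def approx_z(mean_count, fold_change, dispersion, n_per_group):
    """Approximate z-score for a log fold change between two groups.

    Assumption (not from the paper): counts are negative binomial, so the
    variance of the log of a group mean is roughly (1/mu + dispersion) / n.
    A dispersion of 0 reduces this to the Poisson case.
    """
    mu_a = mean_count
    mu_b = mean_count * fold_change
    var = ((1.0 / mu_a + dispersion) + (1.0 / mu_b + dispersion)) / n_per_group
    return abs(math.log(fold_change)) / math.sqrt(var)


FOLD_CHANGE = 1.2   # hypothetical small expression difference
N_PER_GROUP = 3     # hypothetical number of replicates per condition

for mean_count in (10, 100, 1000, 10000):
    # Technical replicates: assumed Poisson, i.e. zero dispersion.
    z_tech = approx_z(mean_count, FOLD_CHANGE, 0.0, N_PER_GROUP)
    # Biological replicates: assumed dispersion of 0.1.
    z_bio = approx_z(mean_count, FOLD_CHANGE, 0.1, N_PER_GROUP)
    print(f"mean={mean_count:>6}  z_technical={z_tech:6.2f}  z_biological={z_bio:6.2f}")
```

Under these assumptions the zero-dispersion z-score keeps growing with the read count, so even a small fold change eventually looks significant, whereas the overdispersed z-score levels off because the dispersion term dominates the variance at high counts. This is only a back-of-the-envelope view of why a read count bias might be limited to technical replicate data; it does not reproduce the tests performed by edgeR, DESeq or baySeq.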