Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
95 tokens/sec
Gemini 2.5 Pro Premium
32 tokens/sec
GPT-5 Medium
18 tokens/sec
GPT-5 High Premium
20 tokens/sec
GPT-4o
97 tokens/sec
DeepSeek R1 via Azure Premium
87 tokens/sec
GPT OSS 120B via Groq Premium
468 tokens/sec
Kimi K2 via Groq Premium
202 tokens/sec
2000 character limit reached

Hierarchical clustering of DNA k-mer counts in RNA-seq fastq files reveals batch effects (1405.0114v6)

Published 1 May 2014 in q-bio.GN

Abstract: Batch effects, artificial sources of variation due to experimental design, are a widespread phenomenon in high throughput data. Therefore, mechanisms for detection of batch effects are needed requiring comparison of multiple samples. We apply hierarchical clustering (HC) on DNA k-mer counts of multiple RNA-seq derived Fastq files. Ideally, HC generated trees reflect experimental treatment groups and thus may indicate experimental effects, but clustering of preparation groups indicates the presence of batch effects. In order to provide a simple applicable tool we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. DNA k-mer counts were analysed on 61 Fastq files containing RNA-seq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced on 8 different Illumina Flowcells. Results: Pairwise comparison of all Flowcells with hierarchical clustering revealed strong Flowcell based tree separation in 6 (21 %) and detectable Flowcell based clustering in 17 (60.7 %) of 28 Flowcell comparisons. In our samples, batch effects were also present in reads mapped to the human genome. Filtering reads for high quality (Phred >30) did not remove the batch effects. Conclusions: Hierarchical clustering of DNA k-mer counts provides a quality criterion and an unspecific diagnostic tool for RNA-seq experiments.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.