Both standard bulk RNAseq methods and recent single-cell RNAseq methods use DNA barcodes to identify samples or cells, and the barcoded cDNAs are pooled into a single library before high-throughput sequencing. In single-cell and low-input RNAseq methods, the library is further amplified by PCR after pooling. Preparing hundreds or more samples for a large study often requires multiple library pools. However, expression profiles sometimes correlate poorly between libraries, and these batch effects make it difficult to integrate data across library pools.
Karolinska Institutet researchers investigated 166 technical replicates in 14 RNAseq libraries made using the STRT method. They found two kinds of library bias: the pattern of bias differed from gene to gene, and uneven library yields were associated with bias. The former was corrected using the NBGLM-LBC algorithm, which the researchers present in the current study. The latter could not be corrected directly, but could be resolved by omitting libraries with particularly low yields. A simulation experiment suggested that library bias correction with NBGLM-LBC requires a consistent sample layout. When NBGLM-LBC was applied to an expression profile from a cohort study of childhood acute respiratory illness, the library biases were resolved.
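The two remedies described above can be illustrated with a deliberately simplified sketch. It is not the NBGLM-LBC algorithm itself: a plain per-gene rescaling of library means stands in for the negative binomial model, and the function names, the `min_yield` threshold, and the toy numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def drop_low_yield(library_yield, min_yield):
    """Return the libraries whose yield is at least min_yield.

    library_yield: dict mapping library name -> yield (any consistent unit).
    min_yield: hypothetical cutoff; the source gives no specific value.
    """
    return {lib for lib, y in library_yield.items() if y >= min_yield}


def equalize_library_means(values, library_of):
    """Rescale one gene's normalized expression so all libraries share a mean.

    values: dict mapping sample -> normalized expression of a single gene.
    library_of: dict mapping sample -> library name.
    NOTE: simple stand-in for illustration, not the NBGLM-LBC model.
    """
    by_library = defaultdict(list)
    for sample, value in values.items():
        by_library[library_of[sample]].append(value)

    library_mean = {lib: mean(vs) for lib, vs in by_library.items()}
    target = mean(list(values.values()))  # grand mean across all samples

    corrected = {}
    for sample, value in values.items():
        m = library_mean[library_of[sample]]
        # Leave values untouched when the gene is undetected in a library.
        corrected[sample] = value * target / m if m > 0 else value
    return corrected


# Toy data: one gene measured in two libraries of two samples each.
library_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
gene_values = {"s1": 10, "s2": 20, "s3": 30, "s4": 60}

kept = drop_low_yield({"A": 12.0, "B": 9.5, "C": 0.4}, min_yield=1.0)
corrected = equalize_library_means(gene_values, library_of)
print(sorted(kept))
print(corrected)
```

Rescaling every library to the same mean is only reasonable when each library holds a comparable mix of sample types; otherwise real biological differences would be removed along with the bias. This is one way to read the simulation result that the correction requires a consistent sample layout.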
Library biases and their correction in the GEWAC study
(a) Hierarchical clustering of GEWAC subjects using the leukocyte expression profile without library bias correction. The upper bar below the tree represents the libraries, and the lower bar represents the sample types. (b) PCA of the expression profile without library bias correction. (c) Relationship of library redundancy (y-axis) to library quantity (x-axis, left panel) and to the proportion of reads mapped to protein-coding genes (x-axis, right panel). (d) Hierarchical clustering of GEWAC subjects using the leukocyte expression profile with library bias correction. The upper bar below the tree represents the libraries, and the lower bar represents the sample types.