The need to reduce the per-sample cost of RNA-seq profiling for scalable data generation has led to the emergence of highly multiplexed RNA-seq. These technologies barcode cDNA sequences so that multiple samples can be combined in a single sequencing lane and then separated during data processing. In this study, Boston University researchers report the performance of one such technique, sparse full length sequencing (SFL), a ribosomal RNA depletion-based RNA sequencing approach that allows for the simultaneous sequencing of 96 or more samples. They offer comparisons to well-established single-sample techniques, including full-coverage poly-A capture RNA-seq and microarrays, as well as to another low-cost, highly multiplexed technique known as 3′ digital gene expression (3′DGE). Data were generated for a set of exposure experiments on immortalized human lung epithelial (AALE) cells in a two-by-two study design, in which samples received genetic perturbations of known oncogenes/tumor suppressors and chemical perturbations with lung carcinogens. SFL demonstrated improved performance over 3′DGE in terms of coverage, power to detect differential gene expression, and biological recapitulation of patterns of differential gene expression from in vivo lung cancer mutation signatures.
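To make the barcoding idea concrete, the following is a minimal Python sketch of how pooled reads might be separated by sample during data processing. It is an illustration only, not the pipeline used in the study: the barcode sequences, sample names, reads, and the `demultiplex` function are all hypothetical, and it assumes a fixed-length barcode at the start of each read with exact matching. Actual protocols may place barcodes elsewhere and tolerate sequencing errors.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group pooled reads by sample using a fixed-length leading barcode.

    Assumption (not from the study): the barcode occupies the first
    `barcode_len` bases of each read and must match exactly.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not recognized are kept aside, not dropped.
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and pooled reads, for illustration only.
barcode_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled_reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTTTTGGG", "NNNNAAAAAA"]

demuxed = demultiplex(pooled_reads, barcode_to_sample, barcode_len=4)
for sample, inserts in sorted(demuxed.items()):
    print(sample, len(inserts))
```

Keeping unmatched reads in an "unassigned" bin rather than discarding them makes it easy to check how much of a lane could not be attributed to any sample.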
Design of cross-platform experiments and high-throughput data processing
Schematic of the number of samples for each pairing of genotypic and chemical perturbations, along with a summary of the preprocessing methods used to quantify gene-level expression for each platform. Note that “Unt.” is an abbreviation of “untreated,” denoting that the RNA-seq samples used in this experiment did not receive chemical perturbations. Numbers in each box represent biological replicates of each condition. The color scheme for each platform is consistent throughout this report.