RNA sequencing (RNA-Seq) provides valuable information for characterizing the molecular nature of cells, in particular for identifying differentially expressed transcripts on a genome-wide scale. Unfortunately, cost and limited specimen availability often lead to studies with small sample sizes, and hypothesis tests for differential expression between classes have limited statistical power when only a few samples are available. The problem is especially challenging when only one sample per class exists. In this case, only a few of the many methods that have been developed are applicable for identifying differentially expressed transcripts. Thus, the aim of this study was to develop a method that can accurately test differential expression with a limited number of samples, in particular with non-replicated samples.
Researchers from Seoul National University have developed a local-pooled-error method for RNA-Seq data (LPEseq) to account for non-replicated samples in the analysis of differential expression. LPEseq extends the existing LPE method, which was proposed for microarray data, to allow examination of non-replicated RNA-Seq experiments. The researchers demonstrated the validity of LPEseq using both real and simulated datasets. Comparing its results with those obtained from other methods, they found that LPEseq outperformed the others on non-replicated datasets and performed similarly on replicated samples. Regardless of the number of samples, LPEseq consistently showed a high true discovery rate without increasing the rate of false positives. The method can therefore be used effectively for differential expression analysis as a preliminary design step, or for investigating a rare specimen for which only a limited number of samples is available.
Schematic representation of the local-pooled-error method for RNA-Seq data (LPEseq)
(A) The flow chart of the proposed algorithm. The proposed method first determines intensity bins (percentiles by default) and then evaluates the LPE distribution differently depending on whether each class has replicates: LPE is estimated per class for replicated experiments and between classes for non-replicated experiments. For non-replicated cases, an additional step smooths the LPE distribution by removing outliers. Detailed examples are depicted for replicated (B) and non-replicated (C) experiments. Blue and green represent the different classes (i.e., X and Y). The red dotted line and the orange line represent the LPE curve with and without outliers, respectively. DE transcripts are colored in red.
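
The binning-and-pooling idea in the non-replicated case (panel C) can be illustrated with a minimal Python sketch. It is not the LPEseq implementation. The function names, the symmetric quantile trim used as a stand-in for the outlier-removal step, the z-like statistic, and all default parameters are assumptions made for illustration. The inputs are assumed to be log-transformed expression values for one sample per class.

```python
import numpy as np


def local_pooled_variance(x, y, n_bins=100, trim=0.05):
    """Pool the variance of between-class differences within intensity bins.

    Hypothetical helper: x and y are log-scale expression vectors for
    one sample of class X and one sample of class Y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = (x + y) / 2.0  # average intensity per transcript
    m = x - y          # between-class difference per transcript

    # Quantile edges of the average intensity define the bins
    # (n_bins=100 corresponds to percentile bins).
    edges = np.quantile(a, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.searchsorted(edges, a, side="right") - 1
    bin_idx = np.clip(bin_idx, 0, n_bins - 1)

    var = np.full(a.shape, np.nan)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.sum() < 2:
            continue  # too few transcripts to estimate a variance
        mb = m[in_bin]
        # Assumed outlier step: drop the most extreme differences so that
        # likely DE transcripts do not inflate the pooled error.
        lo, hi = np.quantile(mb, [trim, 1.0 - trim])
        kept = mb[(mb >= lo) & (mb <= hi)]
        if kept.size < 2:
            kept = mb  # fall back to the untrimmed bin
        var[in_bin] = kept.var(ddof=1)
    return var, m


def pooled_error_scores(x, y, n_bins=100, trim=0.05):
    """Scale each difference by the pooled error of its intensity bin."""
    var, m = local_pooled_variance(x, y, n_bins=n_bins, trim=trim)
    return m / np.sqrt(var)
```

Transcripts whose difference is large relative to the pooled error of their intensity bin receive large absolute scores. Trimming before estimating the variance mirrors the caption's point that the LPE curve is evaluated without outliers in the non-replicated case.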