Researchers from the Georgia Institute of Technology compare methods for filtering low-expression genes from RNA-seq data and investigate the effect of filtering on the detection of differentially expressed genes (DEGs). Although RNA-seq technology has improved the dynamic range of gene expression quantification, low-expression genes may be indistinguishable from sampling noise. The presence of these noisy, low-expression genes can decrease the sensitivity of DEG detection, so identifying and filtering them may improve it. Using the SEQC benchmark dataset, the researchers investigated the effect of different filtering methods on DEG detection sensitivity. They also investigated how the choice of RNA-seq pipeline affects the optimal filtering threshold.
Results indicate that the filtering threshold that maximizes the total number of DEGs closely corresponds to the threshold that maximizes DEG detection sensitivity. Transcriptome reference annotation, expression quantification method, and DEG detection method are statistically significant RNA-seq pipeline factors that affect the optimal filtering threshold.
(A) Total number of DEGs discovered using different pipelines. (B) Optimal filtering thresholds, expressed as a percentile of average count, for different pipelines, determined by the maximum total number of DEGs and the maximum true positive rate (TPR).
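To make the idea of a percentile-based filter concrete, here is a minimal sketch in Python. It is not the researchers' code. It assumes a genes-by-samples count matrix, drops genes whose average count falls below a chosen percentile, and then sweeps candidate percentiles to find the one that yields the most DEGs. The function names `keep_mask` and `pick_threshold` and the `count_degs` callable are hypothetical; `count_degs` stands in for whatever DEG detection method a given pipeline uses.

```python
import numpy as np


def keep_mask(counts, percentile):
    """Boolean mask of genes whose average count is at or above the percentile cutoff.

    counts: 2-D array, rows are genes, columns are samples (assumed layout).
    Genes tied with the cutoff are kept, so percentile=0 keeps every gene.
    """
    avg = counts.mean(axis=1)
    cutoff = np.percentile(avg, percentile)
    return avg >= cutoff


def pick_threshold(counts, percentiles, count_degs):
    """Sweep candidate percentiles and return the one giving the most DEGs.

    count_degs is a hypothetical callable: it takes the filtered count
    matrix and returns the number of DEGs reported by a DEG caller.
    Returns (best_percentile, list_of_totals). Ties go to the lowest index.
    """
    totals = [count_degs(counts[keep_mask(counts, p)]) for p in percentiles]
    best = int(np.argmax(totals))
    return percentiles[best], totals


# Toy matrix: 6 genes x 4 samples (made-up numbers, for illustration only).
toy_counts = np.array([
    [0, 1, 0, 1],
    [5, 6, 4, 5],
    [50, 60, 55, 45],
    [100, 120, 110, 90],
    [1, 0, 2, 1],
    [10, 12, 9, 11],
], dtype=float)

# Mask for a filter at the 50th percentile of average count.
mask_50 = keep_mask(toy_counts, 50)
```

In practice `count_degs` would wrap a full differential expression analysis, so each point in the sweep reruns that analysis on the filtered matrix. The sweep uses only the total number of DEGs, which, according to the results above, closely tracks the threshold that maximizes detection sensitivity.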