Deconvolution is a mathematical process that resolves an observed function into its constituent elements. In biomedical research, deconvolution analysis is used to obtain signatures specific to a single cell type or tissue from a mixed signal, and most deconvolution methods rely on a linearity assumption: the mixed signal is treated as a linear combination of its components. Recent advances in next-generation sequencing have made RNA-seq a fast and accurate way to obtain transcriptomic profiles. Even so, few studies have investigated which RNA-seq quantification methods yield the best linear space for deconvolution analysis.
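To make the linearity assumption concrete, here is a minimal Python sketch (assuming NumPy is available). A mixed profile is modelled as a signature matrix multiplied by a vector of component proportions, and the proportions are recovered by ordinary least squares. The function name `deconvolve` and the toy data are hypothetical. Practical deconvolution tools usually add non-negativity or sum-to-one constraints, which this sketch omits.

```python
import numpy as np


def deconvolve(signatures: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Estimate proportions w such that mixture ≈ signatures @ w.

    signatures: (n_genes, n_components) matrix of pure-component profiles.
    mixture:    (n_genes,) observed mixed profile.
    Unconstrained least squares (hypothetical helper, not a published method).
    """
    weights, *_ = np.linalg.lstsq(signatures, mixture, rcond=None)
    return weights


# Hypothetical toy data: 5 genes, 2 pure components.
rng = np.random.default_rng(0)
signatures = rng.uniform(1.0, 100.0, size=(5, 2))
true_weights = np.array([0.75, 0.25])
mixture = signatures @ true_weights  # a perfectly linear mixture

print(np.round(deconvolve(signatures, mixture), 3))  # → [0.75 0.25]
```

Because the toy mixture is exactly linear, the proportions are recovered. The further an abundance measure departs from linearity, the less reliable such estimates become.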
Using a benchmark RNA-seq dataset, researchers at Baylor College of Medicine investigated the linearity of abundances estimated by the seven most popular RNA-seq quantification methods, at both the gene and isoform levels. Linearity was evaluated through parameter estimation, concordance analysis and residual analysis based on a multiple linear regression model. The results show that count data give poor parameter estimates, large intercepts and high inter-sample variability, while TPM values from Kallisto and Salmon show high linearity in all analyses.
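The regression-based check can be sketched as follows. The sketch assumes that sample C is a 3:1 mixture of samples A and B, as the constructed value in the figure caption implies. Each feature's abundance in C is regressed on its abundances in A and B, with an intercept. Under perfect linearity, the coefficients should be close to 0.75 and 0.25, the intercept close to zero, and the residuals small. The function `fit_mixture_model` and the abundances are hypothetical, and this is not the authors' code.

```python
import numpy as np


def fit_mixture_model(a, b, c):
    """Fit c ≈ beta0 + beta_a * a + beta_b * b by ordinary least squares.

    a, b, c: per-feature abundances (e.g. TPM) for samples A, B and mixture C.
    Returns (coef, residuals), with coef ordered as [beta0, beta_a, beta_b].
    """
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    # Design matrix: intercept column, then the A and B abundances.
    design = np.column_stack([np.ones_like(a), a, b])
    coef, *_ = np.linalg.lstsq(design, c, rcond=None)
    residuals = c - design @ coef  # used for residual analysis
    return coef, residuals


# Hypothetical abundances for six features; C is an exact 3:1 mix of A and B.
a = np.array([10.0, 250.0, 3.0, 80.0, 0.5, 40.0])
b = np.array([5.0, 20.0, 90.0, 15.0, 60.0, 40.0])
c = 0.75 * a + 0.25 * b

coef, residuals = fit_mixture_model(a, b, c)
print(np.round(coef, 3))
```

With real quantifications, a large fitted intercept or coefficients far from 0.75 and 0.25 would indicate departures from linearity. These are the departures the study reports for count data.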
Concordance analysis between the ranks of quantifications of 0.75 × Ā + 0.25 × B̄ (constructed value) and C̄ (measured value) at the gene level (a) and isoform level (b). Ranks were normalized by the number of quantifications in each plot.
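The rank comparison in the figure can be sketched in plain Python. The constructed value is computed per feature from replicate means of A and B. Ranks are then divided by the number of quantifications, as described in the caption, and agreement is summarized with Spearman's rank correlation. Several details are assumptions for illustration: the replicate layout, the average-rank handling of ties, and the choice of Spearman's ρ as the summary statistic. The study's exact concordance measure is not stated here, and all function names are hypothetical.

```python
from statistics import mean


def normalized_ranks(values):
    """Rank values (1-based, average rank for ties) and divide by their count."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def rank_concordance(constructed, measured):
    """Pearson correlation of normalized ranks, i.e. Spearman's rho."""
    rx, ry = normalized_ranks(constructed), normalized_ranks(measured)
    mx, my = mean(rx), mean(ry)
    cov = sum((x - mx) * (y - my) for x, y in zip(rx, ry))
    var_x = sum((x - mx) ** 2 for x in rx)
    var_y = sum((y - my) ** 2 for y in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: each inner list is one replicate over four features.
a_reps = [[10, 200, 3, 50], [12, 180, 5, 55]]
b_reps = [[4, 20, 80, 30], [6, 22, 70, 34]]
c_reps = [[9, 150, 20, 48], [10, 145, 23, 46]]

# Constructed value 0.75 × mean(A) + 0.25 × mean(B); measured value mean(C).
constructed = [0.75 * mean(fa) + 0.25 * mean(fb)
               for fa, fb in zip(zip(*a_reps), zip(*b_reps))]
measured = [mean(fc) for fc in zip(*c_reps)]

print(round(rank_concordance(constructed, measured), 3))  # → 1.0
```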
TPM values from Salmon and Kallisto give the best fit to the linear model studied. This suggests that TPM values estimated by Salmon and Kallisto are the ideal RNA-seq measurements for deconvolution studies.
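For reference, TPM (transcripts per million) divides each feature's count by its length and rescales the resulting rates so that they sum to one million within a sample. A minimal plain-Python sketch follows. The function name `tpm` is hypothetical. Salmon and Kallisto compute TPM internally from their own estimated counts and effective lengths, so their details may differ.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million from (estimated) counts and effective lengths.

    TPM_i = (counts_i / length_i) / sum_j(counts_j / length_j) * 1e6
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    # Per-feature rate: reads per unit of (effective) length.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one feature must have a non-zero count")
    # Rescale so the rates sum to one million within the sample.
    return [r / total * 1e6 for r in rates]


print([round(x) for x in tpm([100, 200, 300], [1000, 2000, 1500])])
# → [250000, 250000, 500000]
```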