Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. University of Colorado researchers have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs.
Flow of Discordant R Package
Gray boxes are functions, blue boxes outputs and green boxes optional parameters. a Input argument for createVectors(). b Optional argument for createVectors(). c Output argument for createVectors(), input for discordantRun(). d Optional arguments for discordantRun(). e Outputs for discordantRun()
With simulations and breast cancer miRNA-Seq and RNA-Seq data, the researchers find that Spearman’s correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman’s correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. They also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance.
A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. Based on comparisons with different correlation metrics, this study suggests Spearman’s correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
Availability – http://bioconductor.org/packages/discordant