A Class-Information-Based Sparse Component Analysis Method to Identify Differentially Expressed Genes on RNA-Seq Data

With the development of deep sequencing technologies, large volumes of RNA-Seq data have been generated. Researchers have proposed many methods based on sparse theory to identify differentially expressed genes from these data.

To improve the performance of sparse principal component analysis, researchers from Qufu Normal University propose a novel class-information-based sparse component analysis (CISCA) method, which introduces class information via a total scatter matrix.

  • First, CISCA normalizes the RNA-Seq data by using a Poisson model to obtain their differential sections.
  • Second, the total scatter matrix is obtained by combining the between-class and within-class scatter matrices.
  • Third, CISCA decomposes the total scatter matrix by singular value decomposition and constructs a new data matrix from the singular values and left singular vectors.
  • Fourth, to obtain sparse components, CISCA decomposes the constructed data matrix by solving an optimization problem with sparsity constraints on the loading vectors.
  • Finally, the differentially expressed genes are identified by using the sparse loading vectors.
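
The scatter-matrix, SVD, and sparse-loading steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Poisson normalization step is skipped, the textbook definitions of the between-class and within-class scatter matrices are assumed, the `within_weight` parameter is hypothetical (with a weight of 1 the plain sum equals the ordinary centered scatter of the data, so the paper's combination may well differ), the data matrix is assumed to be built as F = U·sqrt(Σ), and the sparse step uses a generic soft-thresholded rank-one approximation rather than the solver from the paper. All function names and the toy data are invented for illustration.

```python
import numpy as np

def total_scatter(X, y, within_weight=1.0):
    """X: (n_samples, n_genes); y: class labels.
    Returns S_b + within_weight * S_w (textbook definitions, assumed)."""
    mu = X.mean(axis=0)
    n_genes = X.shape[1]
    S_b = np.zeros((n_genes, n_genes))
    S_w = np.zeros((n_genes, n_genes))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_b += Xc.shape[0] * (d @ d.T)   # between-class term
        R = Xc - mu_c
        S_w += R.T @ R                   # within-class term
    return S_b + within_weight * S_w

def build_data_matrix(S_t, tol=1e-10):
    """Assumed construction: F = U * sqrt(s), so that F @ F.T ~= S_t."""
    U, s, _ = np.linalg.svd(S_t)
    keep = s > tol * s[0]                # drop numerically zero directions
    return U[:, keep] * np.sqrt(s[keep])

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_loading(A, lam, n_iter=200):
    """Rank-one approximation A ~= u v^T with an L1 penalty on v,
    solved by alternating updates (a generic stand-in solver)."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    v = s[0] * Vt[0]                     # start from the dense solution
    for _ in range(n_iter):
        u = A @ v
        norm = np.linalg.norm(u)
        if norm == 0.0:                  # everything was thresholded away
            break
        u /= norm
        v = soft_threshold(A.T @ u, lam)
    return v

# Toy usage on synthetic counts (not real RNA-Seq data).
rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(6, 20)).astype(float)
y = np.array([0, 0, 0, 1, 1, 1])
X[y == 1, :3] += 20.0                    # shift three genes in class 1

S_t = total_scatter(X, y)
F = build_data_matrix(S_t)               # shape: (n_genes, rank)
v = sparse_loading(F.T, lam=5.0)         # one loading entry per gene
ranked_genes = np.argsort(-np.abs(v))    # largest |loading| first
```

Genes whose loading entries are non-zero after thresholding are the candidates for differential expression in this sketch; the penalty `lam` controls how many survive and would need tuning on real data.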


Figure: Graphical depiction of CISCA applied to the matrix F, with factor scores Q and PCs Z. Each row vector of Z transforms a data vector into factor scores; correspondingly, each column vector of Z transforms a data vector into factor scores.

Liu JX, Xu Y, Gao YL, Zheng CH, Wang D, Zhu Q. (2016) A Class-Information-Based Sparse Component Analysis Method to Identify Differentially Expressed Genes on RNA-Seq Data. IEEE/ACM Trans Comput Biol Bioinform 13(2):392-8.
