With the development of deep sequencing technologies, a large amount of RNA-Seq data has been generated. Researchers have proposed many methods based on sparse theory to identify differentially expressed genes from these data.
To improve the performance of sparse principal component analysis, researchers from Qufu Normal University propose in this paper a novel class-information-based sparse component analysis (CISCA) method, which introduces class information via a total scatter matrix.
- First, CISCA normalizes the RNA-Seq data by using a Poisson model to obtain their differential sections.
- Second, the total scatter matrix is obtained by combining the between-class and within-class scatter matrices.
- Third, CISCA decomposes the total scatter matrix by singular value decomposition and constructs a new data matrix from the singular values and left singular vectors.
- Then, aiming at obtaining sparse components, CISCA decomposes the constructed data matrix by solving an optimization problem with sparse constraints on loading vectors.
- Finally, the differentially expressed genes are identified by using the sparse loading vectors.
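The steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the Poisson normalization step is skipped (the input is treated as already normalized), the new data matrix is taken to be `U @ diag(sqrt(s))` so that its outer product reproduces the total scatter matrix, and the sparse decomposition is replaced by a simple soft-thresholded rank-one iteration with a relative threshold. The names `scatter_matrices`, `build_data_matrix`, `sparse_loading`, and `rel_lam` are hypothetical and chosen for illustration only.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class and within-class scatter for samples in the rows of X."""
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_b = np.zeros((p, p))
    S_w = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_b += Xc.shape[0] * (d @ d.T)   # class size times mean offset
        R = Xc - mu_c
        S_w += R.T @ R                   # spread around the class mean
    return S_b, S_w

def build_data_matrix(S_t):
    """Assumed construction: F = U diag(sqrt(s)), so that F @ F.T equals S_t."""
    U, s, _ = np.linalg.svd(S_t)
    return U * np.sqrt(s)                # broadcasting scales column j by sqrt(s[j])

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_loading(F, rel_lam=0.5, n_iter=50):
    """Rank-one sparse loading via soft-thresholded alternating updates.

    This is a stand-in for the optimization problem with sparse constraints;
    the threshold is rel_lam (between 0 and 1) times the largest absolute score.
    """
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    v = Vt[0]                            # start from the leading right singular vector
    for _ in range(n_iter):
        a = F @ v
        u = soft_threshold(a, rel_lam * np.abs(a).max())
        w = F.T @ u
        v = w / np.linalg.norm(w)
    return u

# Synthetic example: 20 samples, 30 genes, two classes of 10 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
y = np.repeat([0, 1], 10)
X[y == 1, :3] += 5.0                     # genes 0, 1, 2 differ between the classes

S_b, S_w = scatter_matrices(X, y)
S_t = S_b + S_w                          # total scatter matrix
F = build_data_matrix(S_t)
u = sparse_loading(F)
selected = np.flatnonzero(u)             # genes with non-zero loading
print(selected)
```

In this sketch, genes whose entries in the sparse loading vector are non-zero are reported as differentially expressed. The threshold `rel_lam` controls how many genes survive, and a real analysis would need to tune it and extract more than one component.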
Figure: graphical depiction of CISCA applied to the matrix F, with factor scores Q and principal components (PCs) Z. A row vector of the PCs Z transforms a data vector into factor scores; correspondingly, a column vector of the PCs Z transforms a data vector into factor scores.