Single-cell RNA sequencing is a powerful technology for discovering new cell types and studying biological processes in complex samples. A current challenge is to predict transcription factor (TF) regulation from single-cell RNA data.
Goethe University researchers propose a novel approach for predicting gene expression at the single-cell level using cis-regulatory motifs, as well as epigenetic features. The researchers designed a tree-guided multi-task learning framework that considers each cell as a task. Through this framework they were able to explain the single-cell gene expression values using either TF binding affinities or TF ChIP-seq data measured at specific genomic regions. TFs identified using these models could be validated by the literature.
Schematic illustration of the two learning set-ups: single-task (a) and multi-task (b) learning
Both learning schemes receive the same input files: TF data (static, dynamic, or ChIP-seq) and single-cell gene expression. Each row of the feature matrix, X, is a gene, described by one of these TF feature set-ups. The response matrix, Y, holds the gene expression values measured in single cells. Finally, the coefficient matrix, B, establishes a linear association between X and Y; its rows correspond to features and its columns to cells.
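To make the matrix shapes concrete, here is a minimal sketch of the linear model Y ≈ XB fitted jointly across cells. It is not the Triangulate implementation: the tree-guided penalty is replaced by a simpler row-wise group penalty (one group per feature, shared by all cells), solved with plain proximal gradient descent. The function name `fit_multitask`, the parameters `lam` and `n_iter`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_multitask(X, Y, lam=0.1, n_iter=500):
    """Illustrative multi-task fit (hypothetical helper, not the authors' code).

    Minimises  1/(2n) * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2
    so that each feature (row of B) is kept or dropped for all cells together.
    """
    n_genes, n_features = X.shape
    n_cells = Y.shape[1]
    B = np.zeros((n_features, n_cells))      # rows: features, columns: cells
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = n_genes / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n_genes   # gradient of the squared loss
        Z = B - step * grad
        # Proximal step: shrink every row of Z towards zero as one group.
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        B = shrink * Z
    return B

# Synthetic data (assumption): 200 genes, 10 TF features, 5 cells.
rng = np.random.default_rng(0)
n_genes, n_features, n_cells = 200, 10, 5
X = rng.normal(size=(n_genes, n_features))            # genes x features
B_true = np.zeros((n_features, n_cells))
B_true[:3] = rng.normal(loc=2.0, size=(3, n_cells))   # only 3 features matter
Y = X @ B_true + 0.1 * rng.normal(size=(n_genes, n_cells))  # genes x cells

B_hat = fit_multitask(X, Y, lam=0.2)
active = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
print("B shape (features x cells):", B_hat.shape)
print("features with non-zero coefficients:", active)
```

In the single-task set-up, each column of B would be fitted on its own from the matching column of Y. The shared penalty is what couples the cells here; a tree-guided penalty would instead group cells according to a hierarchy over them.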
The proposed method identifies distinct TFs that show cell type-specific regulation. The approach is not limited to TFs: any type of data that can help explain gene expression at the single-cell level can be used, for example to study factors that drive differentiation or show abnormal regulation in disease.
Availability – https://github.com/SchulzLab/Triangulate.