Wellcome Trust Sanger Institute scientists and their collaborators have developed a new analysis tool that was able to show, for the first time, which genes were expressed by individual cells in different genetic versions of a benign blood cancer.
Single-cell RNA sequencing can define cell types by revealing differences in the genes expressed by individual cells; however, analysing the data remains challenging. Reported in Nature Methods today, the new open source computer tool called Single Cell Consensus Clustering (SC3) was shown to be more accurate and robust than existing methods of analysing single-cell RNA sequence data, and is freely available for researchers to use.
Recent advances in single-cell genomics technology have made it possible to separate individual cells from different tissues and organs, and measure the sets of RNA messages – called the transcriptome – which help give each cell its own identity. These individual transcriptomes can be used to define cell types and to understand the functions of healthy and diseased cells in the human body. This technology has enormous potential for biological research.
In order to analyse the transcriptomic data, similar cells need to be grouped together. However, it is hard to know what criteria to use to group them, and the data is often very complex. The researchers developed the SC3 computer tool to overcome these problems and validated it using several publicly available gold standard datasets.
Dr Vladimir Yu Kiselev, first author from the Sanger Institute, said: “We created the new SC3 tool to analyse complex single-cell RNA-sequence data, and showed that it is more robust and accurate than existing methods at grouping cells. The SC3 tool contains added features that help interpret the biological function of the cells in each group, such as lists of marker genes for each group. We expect this will be used by many researchers around the world.”
The SC3 framework for consensus clustering of scRNA-seq data
(a) Overview of clustering with SC3. Results of the consensus step are shown for the Treutlein12 data. (b) Published datasets used to set SC3 parameters. N, number of cells; k, number of clusters originally identified by the authors; RPKM, reads per kilobase of transcript per million mapped reads; RPM, reads per million mapped reads; FPKM, fragments per kilobase of transcript per million mapped reads; TPM, transcripts per million mapped reads; UMI, unique molecular identifiers; CPM, counts per million mapped reads. (c) Eigenvector (d) values that achieve adjusted Rand index (ARI) > 0.95 on gold-standard datasets. Black vertical lines indicate the interval d = 4–7% of N, showing high accuracy in the classification. (d) 100 realizations of the SC3 clustering of the datasets in b. Dots represent individual clustering runs and bars represent the median. Red and gray correspond to clustering with and without consensus step, respectively. The solid black line corresponds to ARI = 0.8. The dashed black line separates gold- and silver-standard datasets.
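The two ideas the caption relies on, the adjusted Rand index (ARI) used to score a clustering against reference labels and the consensus step that combines many clustering runs, can be illustrated with a minimal sketch. This is not the SC3 implementation: the function names `adjusted_rand_index` and `consensus_matrix` are hypothetical, the labels are toy values rather than real data, and averaging pairwise co-clustering across runs is assumed here as one common way to build a consensus.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means the groupings are identical (label names do not matter);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")
    # Count pairs of cells placed together in both labelings, in a, and in b.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / comb(n, 2)  # agreement expected by chance
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (both - expected) / (max_index - expected)


def consensus_matrix(clusterings):
    """Fraction of runs in which each pair of cells lands in the same cluster."""
    n = len(clusterings[0])
    runs = len(clusterings)
    return [
        [sum(run[i] == run[j] for run in clusterings) / runs for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1, 1]        # toy "gold standard" labels
    candidate = [0, 0, 1, 1, 1, 1]        # toy clustering with one cell misplaced
    print(adjusted_rand_index(reference, candidate))

    toy_runs = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]  # three runs over three cells
    for row in consensus_matrix(toy_runs):
        print(row)
```

A final grouping would then be obtained by clustering the consensus matrix itself, so that pairs of cells grouped together in most runs end up in the same cluster.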
The SC3 tool was then used to analyse single-cell RNA-sequence data from two patients diagnosed with myeloproliferative neoplasm (MPN) blood cancers. Pre-malignant MPN occurs when the bone marrow makes too many blood cells, and in 10 per cent of patients can lead to overt leukaemia.
Patients often have multiple versions of the cancer, called subclones, which have different mutations, and the researchers wanted to find out whether the expression levels of RNA correlated with the different mutations. Previous attempts to analyse the RNA datasets with other methods had failed; however, SC3 was able to resolve the datasets and showed that each cancer-causing mutation led to a different pattern of gene expression.
Prof Tony Green, an author from the Wellcome Trust-MRC Stem Cell Institute and Cambridge University, said: “The SC3 tool was able to use patterns of gene expression to distinguish, within an individual cancer, subclones that carried different mutations. This approach will help us define the cellular heterogeneity within each cancer, an important step towards improving cancer treatment.”
Dr Martin Hemberg, lead author on the paper from the Wellcome Trust Sanger Institute, said: “It has been difficult to fully exploit single-cell RNA-sequence data due to the current lack of computational methods for analysing them. Our study shows that SC3 is an accurate and user-friendly tool, which can analyse complex datasets. We hope that this tool will help researchers gain new biological insights from transcriptome datasets in the future and provide information for diseases that affect specific cell types.”
Source – Eurekalert