BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles

Non-coding RNAs (ncRNAs) play a vital role in many cellular processes such as RNA splicing, translation and gene regulation. However, the vast majority of ncRNAs still have no functional annotation. One prominent approach to putative function assignment is to cluster transcripts by sequence and secondary structure. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true 3D conformation of the RNA polymer. A different type of information, which does not suffer from these issues and can be used to detect RNA classes, is the pattern of processing and the traces it leaves in small RNA-seq read data.

Here, researchers from the University of Freiburg introduce BlockClust, an efficient approach for detecting transcripts with similar processing patterns. They propose a novel way to encode expression profiles in compact discrete structures, which can then be processed using fast graph-kernel techniques. They both perform unsupervised clustering and develop family-specific discriminative models. Finally, they show that the approach is scalable, accurate and robust across different organisms, tissues and cell lines.
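To make the idea of comparing discretized expression profiles more concrete, here is a minimal toy sketch in Python. It is not BlockClust's actual encoding or kernel: the three-level discretization, the thresholds, the function names and the k-mer spectrum similarity (a simple stand-in for the graph kernel) are all assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def discretize(profile, bins=(0.33, 0.66)):
    """Map a read-coverage profile to a string of L/M/H symbols.

    Heights are taken relative to the profile's peak, so two profiles
    with the same shape but different expression levels encode alike.
    The thresholds in `bins` are arbitrary illustrative values.
    """
    peak = max(profile)
    if peak == 0:
        return "L" * len(profile)
    symbols = []
    for value in profile:
        rel = value / peak
        if rel < bins[0]:
            symbols.append("L")
        elif rel < bins[1]:
            symbols.append("M")
        else:
            symbols.append("H")
    return "".join(symbols)


def kmer_counts(encoded, k=2):
    """Count overlapping substrings of length k in the encoded profile."""
    return Counter(encoded[i:i + k] for i in range(len(encoded) - k + 1))


def similarity(profile_a, profile_b, k=2):
    """Cosine similarity between the k-mer count vectors of two profiles."""
    a = kmer_counts(discretize(profile_a), k)
    b = kmer_counts(discretize(profile_b), k)
    dot = sum(a[key] * b[key] for key in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

In this sketch, two profiles with the same shape but different read depth score as identical, while a flat profile and a peaked one share no substrings and score zero. A pairwise similarity matrix built this way could then be handed to any standard clustering method.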


Availability: The complete BlockClust Galaxy workflow, including all tool dependencies, is available at http://toolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow

Contact: [email protected]; [email protected]

Videm P, Rose D, Costa F, Backofen R. (2014) BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles. Bioinformatics 30(12), i274-i282.