Bioinformatics for Novel Long Intergenic Noncoding RNA (lincRNA) Identification

Long intergenic noncoding RNAs (lincRNAs), which are longer than 200 nucleotides and are transcribed from the intergenic regions between protein-coding genes, have been shown by accumulating evidence to be widely expressed and functional in many cellular processes. Because lincRNAs are often cell- or tissue-specific, however, novel lincRNAs need to be identified in each system of interest. To this end, researchers from the Chinese University of Hong Kong recently described a bioinformatics workflow for detecting novel lincRNAs from an RNA-seq dataset. As shown at right, the workflow has three main steps: aligning reads, reconstructing the transcriptome, and filtering.

After RNA is prepared and sequenced, the sequences of single-end or paired-end reads are obtained. The reads are then mapped (aligned) back to a reference genome to identify the locations they originate from. Several programs, such as TopHat, GSNAP, and STAR, have been designed to map RNA-seq reads across splice junctions. For transcriptome reconstruction, Cufflinks and Scripture are commonly used for ab initio transcript assembly from the aligned reads.

In the final step, several filters, such as transcript length, expression level, and coding potential, are applied to discriminate real lincRNA transcripts from assembly artifacts. For this purpose, an integrative bioinformatics pipeline, sebnif (self-estimation-based novel lincRNA filtering pipeline), is implemented to identify high-quality, bona fide novel lincRNAs. Furthermore, sebnif utilities enable the annotation of high-confidence lincRNAs using additional datasets, such as ChIP-seq of histone modifications and CAGE (cap analysis of gene expression) tags, when available.
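
The filtering step can be illustrated with a minimal Python sketch. It is not sebnif's actual method and does not reproduce its self-estimation-based approach; it simply applies fixed cutoffs to transcripts that have already been assembled. The `Transcript` and `Gene` records, the `coding_score` field (standing in for the output of some coding-potential tool), and the cutoffs `MIN_FPKM` and `MAX_CODING_SCORE` are hypothetical. Only the 200-nucleotide length cutoff comes from the lincRNA definition above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoffs for illustration; only MIN_LENGTH follows the lincRNA definition.
MIN_LENGTH = 200         # lincRNAs are longer than 200 nt
MIN_FPKM = 1.0           # assumed expression cutoff
MAX_CODING_SCORE = 0.5   # assumed coding-potential cutoff (higher score = more likely coding)


@dataclass
class Gene:
    """A known protein-coding gene locus (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int


@dataclass
class Transcript:
    """An assembled transcript summarized into a hypothetical record."""
    tid: str
    chrom: str
    start: int
    end: int
    length: int          # spliced transcript length in nucleotides
    fpkm: float          # expression level
    coding_score: float  # hypothetical score from a coding-potential tool


def overlaps(t: Transcript, g: Gene) -> bool:
    """True if the transcript's genomic span overlaps the gene on the same chromosome."""
    return t.chrom == g.chrom and t.start < g.end and g.start < t.end


def filter_candidates(transcripts: List[Transcript], coding_genes: List[Gene]) -> List[str]:
    """Return IDs of transcripts passing length, expression, coding-potential and intergenic filters."""
    kept = []
    for t in transcripts:
        if t.length <= MIN_LENGTH:
            continue  # too short to be a lincRNA
        if t.fpkm < MIN_FPKM:
            continue  # weakly expressed; possibly noise or an assembly artifact
        if t.coding_score > MAX_CODING_SCORE:
            continue  # likely protein-coding
        if any(overlaps(t, g) for g in coding_genes):
            continue  # overlaps a known gene, so not intergenic
        kept.append(t.tid)
    return kept


genes = [Gene("chr1", 1000, 5000)]
candidates = [
    Transcript("T1", "chr1", 6000, 9000, 1500, 3.0, 0.1),  # passes all filters
    Transcript("T2", "chr1", 4000, 7000, 1200, 4.0, 0.2),  # overlaps the gene
    Transcript("T3", "chr2", 100, 400, 150, 2.0, 0.1),     # not longer than 200 nt
    Transcript("T4", "chr2", 1000, 5000, 800, 0.2, 0.1),   # low expression
    Transcript("T5", "chr3", 0, 3000, 2000, 5.0, 0.9),     # high coding potential
]
print(filter_candidates(candidates, genes))  # ['T1']
```

The cheap per-transcript checks run before the overlap scan against known genes; with a genome-wide annotation, an interval index would replace the linear scan used here.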

Peng X, Sun K, Zhou J, Sun H, Wang H. (2017) Bioinformatics for Novel Long Intergenic Noncoding RNA (lincRNA) Identification in Skeletal Muscle Cells. Methods Mol Biol 1556:355-362.