RNA-Seq is an approach to transcriptome profiling that uses deep-sequencing technologies to detect and accurately quantify RNA molecules originating from a genome at a given moment in time. In recent years, RNA-Seq has enabled genome-wide expression profiling, including the identification of novel and rare transcripts such as noncoding RNAs and novel alternative splicing isoforms. Advances in transcriptome reconstruction methods have made it possible to identify and characterize thousands of novel long noncoding RNAs (lncRNAs) from short-read RNA-Seq data. The rapid increase in sequencing depth and read length has considerably improved the accuracy of transcript reconstruction and offers an unprecedented opportunity to characterize lncRNAs on a global scale.
LncRNAs are defined as transcripts longer than 200 nucleotides with low coding potential. The choice of this length threshold is somewhat arbitrary, but it is instrumental in separating lncRNAs from other noncoding RNA classes, such as microRNAs (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs.
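The two criteria in this definition (length and coding potential) translate directly into a filter over assembled transcripts. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each transcript is available as a spliced, stranded sequence, and it uses the length of the longest ATG-initiated open reading frame (ORF) as a crude stand-in for coding potential, whereas dedicated tools compute richer scores. The 100-codon ORF cutoff and the names `Transcript`, `longest_orf_codons` and `is_candidate_lncrna` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Transcript:
    transcript_id: str
    sequence: str  # spliced transcript sequence, 5'->3', DNA alphabet (assumed stranded)


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG...stop ORF in the three forward frames.

    Only ORFs closed by a stop codon are counted; the reverse strand is ignored
    because the transcript is assumed to come from a stranded assembly.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(tx: Transcript, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Keep transcripts longer than 200 nt whose longest ORF is below a hypothetical cutoff."""
    return len(tx.sequence) > min_length and longest_orf_codons(tx.sequence) < max_orf_codons


candidates = [
    Transcript("tx_short", "A" * 150),                                # too short
    Transcript("tx_noncoding", "A" * 250),                            # long, no ORF
    Transcript("tx_coding", "ATG" + "GCT" * 120 + "TAA" + "A" * 10),  # 121-codon ORF
]
print([tx.transcript_id for tx in candidates if is_candidate_lncrna(tx)])  # prints ['tx_noncoding']
```

Both thresholds are parameters, so a study can swap in a different length cutoff or replace the ORF heuristic with the score of a dedicated coding-potential tool.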
Based on their genomic location relative to protein-coding genes, lncRNAs are commonly grouped into the following classes:
- antisense lncRNAs span at least one exon of a nearby protein-coding gene and are transcribed in the opposite direction
- intronic lncRNAs originate from intronic regions and do not overlap any annotated exon
- bidirectional lncRNAs initiate divergently from the promoter of a protein-coding gene
- intergenic lncRNAs are transcribed from transcriptional units separate from those of protein-coding genes
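These four classes depend only on where a lncRNA sits relative to annotated protein-coding genes, so they can be assigned with simple interval logic. The sketch below is one possible encoding, assuming 1-based inclusive exon coordinates, checking the classes in a fixed order, and using a hypothetical 1 kb window between transcription start sites (TSSs) to call bidirectional transcripts. `Feature` and `classify_lncrna` are illustrative names; real annotation pipelines apply more detailed rules, and cases outside the four classes (such as same-strand exon overlap) are simply reported here as "unclassified".

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive


@dataclass
class Feature:
    name: str
    chrom: str
    strand: str            # "+" or "-"
    exons: List[Interval]

    @property
    def start(self) -> int:
        return min(s for s, _ in self.exons)

    @property
    def end(self) -> int:
        return max(e for _, e in self.exons)

    @property
    def tss(self) -> int:
        # The transcription start site is the 5' end, which depends on the strand.
        return self.start if self.strand == "+" else self.end


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def exon_overlap(x: Feature, y: Feature) -> bool:
    return x.chrom == y.chrom and any(overlaps(a, b) for a in x.exons for b in y.exons)


def classify_lncrna(lnc: Feature, genes: List[Feature], window: int = 1000) -> str:
    """Assign a lncRNA to a positional class; rules are checked in this order."""
    nearby = [g for g in genes if g.chrom == lnc.chrom]
    # Antisense: opposite strand and sharing sequence with at least one exon.
    if any(g.strand != lnc.strand and exon_overlap(lnc, g) for g in nearby):
        return "antisense"
    # Intronic: inside a gene body without touching any annotated exon.
    if not any(exon_overlap(lnc, g) for g in nearby) and any(
        g.start <= lnc.start and lnc.end <= g.end for g in nearby
    ):
        return "intronic"
    # Bidirectional: opposite strand, TSSs within `window`, transcribed away from the gene.
    for g in nearby:
        if g.strand == lnc.strand or abs(lnc.tss - g.tss) > window:
            continue
        if (g.strand == "+" and lnc.end < g.start) or (g.strand == "-" and lnc.start > g.end):
            return "bidirectional"
    # Intergenic: no overlap with any protein-coding gene body.
    if not any(overlaps((lnc.start, lnc.end), (g.start, g.end)) for g in nearby):
        return "intergenic"
    return "unclassified"  # e.g. same-strand exon overlap, not covered by the four classes


gene = Feature("PCG1", "chr1", "+", [(1000, 1200), (2000, 2200), (3000, 3500)])
for lnc in [
    Feature("lnc_a", "chr1", "-", [(1100, 1300)]),                # antisense
    Feature("lnc_b", "chr1", "+", [(1300, 1500), (1600, 1800)]),  # intronic
    Feature("lnc_c", "chr1", "-", [(200, 400), (600, 900)]),      # bidirectional
    Feature("lnc_d", "chr1", "+", [(10000, 10500)]),              # intergenic
]:
    print(lnc.name, classify_lncrna(lnc, [gene]))
```

The order of the checks matters: a divergent transcript that also overlaps the first exon of its partner gene is reported as antisense, which is one of several reasonable conventions.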
Here, researchers from the National Institute of Molecular Genetics, Italy, describe the analytical steps required to identify and characterize noncoding RNAs starting from raw RNA-Seq samples, with particular emphasis on lncRNAs.
Exploratory data analysis of raw RNA-seq samples performed using DESeq2
(a) Principal component analysis (PCA) performed on rlog-normalized expression data reveals a separation between samples belonging to different biological classes ("labels"). (b) Similar results are obtained by performing hierarchical clustering on rlog-normalized expression data.
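DESeq2 is an R package, so the panels above come from an R workflow. To show what the two exploratory steps compute, the Python sketch below makes two simplifications, stated here as assumptions: it applies DESeq2-style median-of-ratios size factors followed by log2(x + 1) in place of the rlog transform, and it uses average-linkage agglomerative clustering on Euclidean distances as the hierarchical method. The function names are hypothetical and the count matrix is toy data, not data from the study.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (the normalization DESeq2 uses by default).

    counts[i][j] is the raw read count of gene j in sample i. Genes with a zero
    count in any sample are skipped because their geometric mean is zero.
    """
    n_samples, n_genes = len(counts), len(counts[0])
    usable = [j for j in range(n_genes) if all(row[j] > 0 for row in counts)]
    log_geo_mean = {j: sum(math.log(row[j]) for row in counts) / n_samples for j in usable}
    return [math.exp(median([math.log(row[j]) - log_geo_mean[j] for j in usable]))
            for row in counts]


def log_normalize(counts):
    """log2(normalized count + 1): a simple stand-in for DESeq2's rlog transform."""
    factors = size_factors(counts)
    return [[math.log2(c / s + 1) for c in row] for row, s in zip(counts, factors)]


def pca_scores(x, n_components=2, iters=500):
    """Per-sample principal-component scores via power iteration on the Gram matrix.

    Genes are centred; the top eigenvectors v of G = Xc Xc^T give scores sqrt(eigenvalue) * v.
    """
    n, m = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(m)]
    xc = [[row[j] - means[j] for j in range(m)] for row in x]
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)] for i in range(n)]
    components = []
    for _ in range(n_components):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, deterministic start vector
        eigval = 0.0
        for _ in range(iters):
            w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v, eigval = [t / norm for t in w], norm
        components.append([math.sqrt(eigval) * t for t in v])
        # Deflate so the next round converges to the next-largest eigenvector.
        gram = [[gram[i][k] - eigval * v[i] * v[k] for k in range(n)] for i in range(n)]
    return [[comp[i] for comp in components] for i in range(n)]


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distances."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[p] for b in clusters[q]]
                d = sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters.pop(q))  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)


counts = [  # rows: samples (two per toy class), columns: genes
    [100, 200, 50, 10, 300],
    [110, 190, 55, 12, 310],
    [10, 20, 500, 100, 300],
    [12, 22, 480, 110, 290],
]
expr = log_normalize(counts)
print(pca_scores(expr))
print(average_linkage(expr, n_clusters=2))  # prints [[0, 1], [2, 3]]
```

With well-separated classes, samples from the same class share the sign of their PC1 score and fall into the same cluster, mirroring panels (a) and (b). The sign of each principal component is arbitrary, so only the relative placement of samples is meaningful. On real data, PCA is usually restricted to the most variable genes (DESeq2's `plotPCA` uses the top 500 by default).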