Long noncoding RNAs (lncRNAs) are emerging as an important class of regulators involved in various biological functions and disease processes. With the widespread application of next-generation sequencing technologies, large numbers of lncRNAs have been identified, yielding many lncRNA annotation resources in different contexts. At present, however, a comprehensive overview of these annotation resources is lacking.
Here, researchers from Harbin Medical University reviewed 24 currently available lncRNA annotation resources covering more than 205,000 lncRNAs in over 50 tissues and cell lines. They characterized these resources from several angles, including exon structure, expression, histone modification and function, and found that the resources differ in many properties. In particular, the resources showed diverse chromatin signatures, marked tissue and cell type dependence, and functional specificity. The results suggest that current lncRNA annotations are incomplete yet complementary, and that multiple resources need to be integrated to characterize lncRNAs comprehensively. Finally, the researchers developed ‘LNCat’, a user-friendly database that provides a genome browser of lncRNA structures, visualization of the different resources from multiple angles and download of different combinations of lncRNA annotations, and that supports rapid exploration, comparison and integration of lncRNA annotation resources. Overall, this study provides a comprehensive comparison of numerous lncRNA annotations and can facilitate the understanding of lncRNAs in human disease.
General information of lncRNAs among resources
Resources were organized hierarchically: lincRNA resources (red) and lncRNA resources (purple); ab initio (green) and de novo (blue) resources, according to assembly method; resources that use chromatin signatures (pink); and databases (brown). (A) Bar chart of the number of lncRNA genes in each annotation resource. (B) Box plots showing the length distribution of lncRNA genes in different resources. (C) Bar chart showing the average exon number of lncRNA genes among resources and of PCGs from the UCSC Gene track. Here, we focused on the 13 lncRNA annotation resources that provided exon structure information. (D) Fractions of lncRNAs in each classification across resources with available exon structure. LncRNAs were subclassified based on their intersection with PCGs. (E) Cumulative distribution of average conservation scores of lncRNA exons in different resources and of PCG exons. The lincRNA resources are shown as dashed lines. Conservation was evaluated by phastCons scores (see Materials and Methods). (F) Fraction of lncRNAs containing at least one repetitive element (orange) or no repetitive element (grey) for each resource. The repetitive element analysis was performed on the 19 resources with available strand information.
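To make panel (D) more concrete, the sketch below shows one way lncRNAs could be subclassified by their intersection with PCGs and then summarized as fractions per class. It is an illustration only, not the study's pipeline: the `Interval` type, the function names and the three class labels (`intergenic`, `sense_overlapping`, `antisense_overlapping`) are hypothetical, coordinates are assumed to be 0-based and half-open, and overlap is tested at the gene level rather than per exon.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interval:
    """A genomic interval (hypothetical helper type, 0-based half-open)."""
    chrom: str
    start: int   # inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(lnc: Interval, pcgs: List[Interval]) -> str:
    """Assign an assumed class label based on overlap with protein-coding genes."""
    hits = [g for g in pcgs if overlaps(lnc, g)]
    if not hits:
        return "intergenic"
    if any(g.strand == lnc.strand for g in hits):
        return "sense_overlapping"
    return "antisense_overlapping"


def class_fractions(lncs: Iterable[Interval], pcgs: Iterable[Interval]) -> Dict[str, float]:
    """Fraction of lncRNAs falling in each class for one annotation resource."""
    pcg_list = list(pcgs)
    counts = Counter(classify_lncrna(lnc, pcg_list) for lnc in lncs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Calling `class_fractions` once per annotation resource would give the kind of per-resource breakdown plotted in a stacked bar chart. The linear scan over PCGs keeps the sketch short; for genome-scale data an interval index or a dedicated tool would be a more practical choice.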
Availability – lncRNA atlas (LNCat) is freely available at: http://biocc.hrbmu.edu.cn/LNCat/