Over the past few years, rapid advances in deep-sequencing technology and computational prediction algorithms have led to the identification of a large number of long non-coding RNAs (lncRNAs) in various types of human cancer. It has therefore become critical to annotate the potential functions of lncRNAs from RNA-sequencing (RNA-seq) data properly, and to organize the resulting information and analyses into a system that biological and clinical researchers can readily access. Producing a collective interpretation of lncRNA functions requires integrating different types of data on the functional diversity and regulatory roles of these lncRNAs.
Researchers at the National Yang Ming Chiao Tung University used transcriptomic sequencing data to systematically identify lncRNAs and their potential functions in 5034 RNA-seq datasets from The Cancer Genome Atlas, covering 24 cancers. They then built the ‘lncExplore’ database, which integrates various types of genomic annotation data for collective interpretation.
The distinctive features of our lncExplore database include:
- novel lncRNAs verified by both coding potential and translation efficiency score
- pan-cancer analysis of significantly aberrant expression across 24 human cancers
- genomic annotation of lncRNAs, such as cis-regulatory information and gene ontology
- identification of regulatory roles as enhancer RNAs (eRNAs) and competing endogenous RNAs (ceRNAs)
- identification of potential lncRNA biomarkers for the cancers a user is interested in, by integrating clinical information and disease specificity scores.
Schematic representation of lncExplore database
Known lncRNA resources were collected from five public databases (Ensembl, Human body map, NONCODE, H-InvDB and lncRNAdb). We predicted novel transcripts from the RNA-seq datasets. To eliminate false-positive novel lncRNAs, we retained only transcripts with the following characteristics: sequences longer than 200 nucleotides, sequences with low coding potential probability, sequences that are not potential pseudogenes and sequences with a translation efficiency similar to that of known lncRNAs. This filtering step left more than 20 000 ‘unique lncRNA transcripts’ in our database. In lncExplore, the information on each lncRNA includes basic genomic information, gene expression profiles across cancers, predicted molecular annotations (gene ontology (GO), enhancer RNA (eRNA) and competing endogenous RNA (ceRNA)) and clinically relevant information (disease specificity score and survival curve).
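The four filtering criteria listed in this caption can be pictured as a simple sequence of checks on each candidate transcript. The sketch below is only an illustration of that idea, not the lncExplore pipeline: the `Transcript` fields, the function names, the coding-probability cutoff and the way "similar translation efficiency" is measured (falling within the range seen for known lncRNAs, plus a tolerance) are all assumptions made for the example. Only the 200-nucleotide length threshold comes from the text.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 200      # from the text: sequences longer than 200 nucleotides
MAX_CODING_PROB = 0.5    # ASSUMPTION: illustrative cutoff for "low coding potential"
TE_TOLERANCE = 0.2       # ASSUMPTION: slack around the known-lncRNA efficiency range


@dataclass
class Transcript:
    """Hypothetical record for one predicted transcript."""
    transcript_id: str
    length_nt: int
    coding_probability: float      # 0..1, from some coding-potential tool
    is_pseudogene_like: bool       # True if flagged as a potential pseudogene
    translation_efficiency: float  # same scale as the known-lncRNA values


def passes_filters(t, known_te_min, known_te_max):
    """Return True if the transcript satisfies all four criteria."""
    long_enough = t.length_nt > MIN_LENGTH_NT
    low_coding = t.coding_probability < MAX_CODING_PROB
    not_pseudogene = not t.is_pseudogene_like
    # ASSUMPTION: "similar" means inside the known-lncRNA range, with tolerance.
    similar_te = (
        known_te_min - TE_TOLERANCE
        <= t.translation_efficiency
        <= known_te_max + TE_TOLERANCE
    )
    return long_enough and low_coding and not_pseudogene and similar_te


def filter_candidates(candidates, known_te_values):
    """Keep only the candidates that look like genuine novel lncRNAs."""
    te_min, te_max = min(known_te_values), max(known_te_values)
    return [t for t in candidates if passes_filters(t, te_min, te_max)]
```

Calling `filter_candidates(candidates, known_te_values)` with a list of `Transcript` records and the translation efficiencies of known lncRNAs returns the retained subset. Requiring every criterion to hold at once is the conservative choice, in keeping with the stated aim of eliminating false positives.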
To our knowledge, lncExplore is the first public lncRNA annotation database to provide cancer-specific expression profiles for both known and novel lncRNAs, together with enhancer RNA annotation and clinical analysis based on pan-cancer analysis. lncExplore offers a more complete and efficient route for translating laboratory discoveries into the clinical context, and will assist in reinterpreting the biological regulatory functions of lncRNAs in cancer research.
Availability – Database URL: http://lncexplore.bmi.nycu.edu.tw