Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage and accurate, quantitative measurement of gene expression.
Researchers from the European Bioinformatics Institute applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splice junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. The researchers assemble more complete transcript isoforms than those in GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. They describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 neighboring GENCODE genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, they validate their observations using current FANTOM and Mouse ENCODE resources.
Filtering pipeline flowchart
The input is the comprehensive annotation returned by Cuffmerge. The series of 11 filters described in the text is then applied, and the output is the high-quality (HQ) set.
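The flow in the flowchart (an input annotation, a series of filters applied in order, and a surviving high-quality set) can be sketched as a simple filter chain. This is only an illustration of the general pattern, not the authors' pipeline: the `Transcript` fields, the three filters shown, and their thresholds are all hypothetical stand-ins, since the 11 actual filters are not listed here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal record for one assembled transcript."""
    transcript_id: str
    exon_count: int
    length: int       # spliced length in nucleotides
    coverage: float   # hypothetical read-support value


# A filter is any function that returns True when a transcript should be kept.
TranscriptFilter = Callable[[Transcript], bool]


def multi_exonic(t: Transcript) -> bool:
    """Illustrative filter: keep transcripts with at least two exons."""
    return t.exon_count >= 2


def min_length(threshold: int) -> TranscriptFilter:
    """Illustrative filter factory: keep transcripts at least `threshold` nt long."""
    return lambda t: t.length >= threshold


def min_coverage(threshold: float) -> TranscriptFilter:
    """Illustrative filter factory: keep transcripts with enough read support."""
    return lambda t: t.coverage >= threshold


def apply_filters(
    transcripts: Iterable[Transcript],
    filters: List[TranscriptFilter],
) -> List[Transcript]:
    """Apply each filter in order; only transcripts passing all of them remain."""
    kept = list(transcripts)
    for keep in filters:
        kept = [t for t in kept if keep(t)]
    return kept
```

In use, the merged annotation would be parsed into `Transcript` records and passed through the ordered filter list, for example `apply_filters(candidates, [multi_exonic, min_length(200), min_coverage(1.0)])`. Applying the filters one at a time rather than as a single combined test makes it easy to record how many transcripts each step removes.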