Quark – semi-reference-based compression of RNA-seq data

The past decade has seen an exponential increase in biological sequencing capacity, along with a simultaneous effort to organize and archive the vast quantities of sequencing data being generated. While these developments are tremendous from the perspective of maximizing the scientific utility of available data, they come with heavy costs: storing and transmitting such vast amounts of sequencing data is expensive.

Researchers from Stony Brook University present Quark, a semi-reference-based compression tool designed for RNA-seq data. Quark makes use of a reference sequence when encoding reads, but produces a representation that can be decoded independently, without the need for a reference. This allows Quark to achieve markedly better compression rates than existing reference-free schemes, while removing the requirement that the encoder and decoder share a specific reference sequence. The researchers demonstrate that Quark achieves state-of-the-art compression rates, and that, typically, only a small fraction of the reference sequence must be encoded along with the reads to allow reference-free decompression.
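The idea of shipping only a small fraction of the reference with the reads can be sketched as interval merging: collect the reference intervals that reads map to, merge the overlapping ones, and keep only those subsequences (the post refers to the stored regions as islands). The following is a minimal Python sketch under stated assumptions, not Quark's actual implementation. The function names merge_intervals and extract_islands, the half-open interval convention, and the toy reference are all hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def extract_islands(reference, intervals):
    """Return (start, subsequence) pairs for the merged covered regions.

    Only these pieces of the reference would need to be stored with the
    encoded reads; uncovered reference sequence is dropped (assumption
    made for illustration).
    """
    return [(s, reference[s:e]) for s, e in merge_intervals(intervals)]


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"            # hypothetical reference
    covered = [(0, 3), (2, 5), (7, 9)]      # hypothetical read footprints
    print(extract_islands(toy_reference, covered))
```

In this toy case the three read footprints collapse into two stored regions, so positions 5 and 6 and position 9 of the reference never need to be written out.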


Quark uses quasi-mapping, the core component of RapMap, to produce a set of tuples for each paired-end read r_i, where each tuple can be represented as (p_li; F_li; p_ri; F_ri). The table containing the tuples for each read can be summarized as a set of equivalence classes, as discussed in Section 3 of the paper. The encoding function Q is illustrated with two paired-end reads, r1 and r2. For the left end of r1, there are 12 matches followed by two unmatched characters (GC). For the right end of r1, the first 4 characters (ATTG) differ from the reference and are followed by 11 exact matches. The left and right ends together can therefore be encoded as Q(r1) = {12GC, ATTG11}. The relevant reference intervals are subsequently stored as islands.
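The Q(r1) = {12GC, ATTG11} example can be reproduced with a small run-length sketch: runs of bases that match the reference become a count, and mismatching bases are written literally. This is an illustrative simplification, not Quark's code. It assumes a position-by-position comparison with no indels, and the names encode_end and decode_end as well as the toy sequences are hypothetical.

```python
def encode_end(read: str, ref: str) -> str:
    """Encode one read end against an equal-length reference window.

    Matching runs are emitted as decimal counts, mismatching bases as
    literals. Assumes no insertions or deletions (a simplification).
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    out, run = [], 0
    for base, ref_base in zip(read, ref):
        if base == ref_base:
            run += 1
        else:
            if run:
                out.append(str(run))
                run = 0
            out.append(base)
    if run:
        out.append(str(run))
    return "".join(out)


def decode_end(code: str, ref: str) -> str:
    """Invert encode_end using the same reference window."""
    out, pos, i = [], 0, 0
    while i < len(code):
        if code[i].isdigit():
            j = i
            while j < len(code) and code[j].isdigit():
                j += 1
            n = int(code[i:j])
            out.append(ref[pos:pos + n])  # copy a matching run from the reference
            pos += n
            i = j
        else:
            out.append(code[i])           # literal mismatching base
            pos += 1
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical sequences chosen to mirror the caption's example.
    left = encode_end("ACGTACGTACGTGC", "ACGTACGTACGTAA")
    right = encode_end("ATTGACGTACGTACG", "CCCCACGTACGTACG")
    print(left, right)
```

Decoding needs only the reference window that the read maps to, which is why storing the covered intervals alongside the encoded reads is enough for decompression in this sketch.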

Availability: Quark is implemented in C++11 and is available under a GPLv3 license at https://github.com/COMBINE-lab/quark.

Sarkar H, Patro R. (2016) Quark enables semi-reference-based compression of RNA-seq data. bioRxiv [Epub ahead of print]. [abstract]

2 comments

  1. Hi, thanks for the post! The software URL is correct, but the embedded link is wrong. Please change it to https://github.com/COMBINE-lab/quark
