The major algorithms for quantifying transcriptomics data for differential gene expression analysis were designed for data from human or human-like genomes, specifically those with single-gene transcripts and distinct transcriptional boundaries that extend beyond the coding sequence (CDS), as identified through expressed sequence tags (ESTs) or EST-like sequence data. Some eukaryotic genomes and all, or nearly all, bacterial genomes require alternative methods of quantification because they lack annotation of transcriptional boundaries from EST or EST-like data, have overlapping transcriptional boundaries, and/or have polycistronic transcripts.
Researchers at the University of Maryland School of Medicine have developed and tested an algorithm, FADU, that better quantifies transcriptomics data for differential gene expression analysis in organisms with overlapping transcriptional units and polycistronic transcripts. Using data from standard libraries originating from Escherichia coli and Ehrlichia chaffeensis, and strand-specific libraries from the Wolbachia endosymbiont wBm, FADU can derive counts for genes that are missed by HTSeq and featureCounts. Using the default parameters with the E. coli data, FADU can detect transcription of 51 more genes than HTSeq in union mode and 21 more genes than featureCounts, with 42 and 18 of these features being ≤ 300 bp, respectively. Because FADU can derive counts for otherwise unrepresented genes without overstating their abundance, the developers believe it to be an improved tool for quantifying transcripts in prokaryotic systems for RNA-Seq analyses.
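To illustrate why overlapping genes can end up with no counts, the following toy sketch contrasts two simplified counting rules. It is not the code of FADU, HTSeq, or featureCounts. The gene coordinates, reads, and both function names are hypothetical, and the fractional rule is only one plausible way to credit reads that span overlapping features.

```python
# Toy illustration only; coordinates and reads are made up.
# Genes and reads are half-open (start, end) intervals on one strand.
genes = {"geneA": (0, 300), "geneB": (200, 1200)}  # geneA is short and overlaps geneB
reads = [(210, 290), (220, 300), (900, 1000)]


def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def count_unique_only(genes, reads):
    """Count a read only if it overlaps exactly one gene; ambiguous reads are skipped."""
    counts = {g: 0 for g in genes}
    for r in reads:
        hits = [g for g, iv in genes.items() if overlap(r, iv) > 0]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


def count_fractional(genes, reads):
    """Credit each gene with the fraction of the read's bases that overlap it."""
    counts = {g: 0.0 for g in genes}
    for r in reads:
        length = r[1] - r[0]
        for g, iv in genes.items():
            counts[g] += overlap(r, iv) / length
    return counts


unique_counts = count_unique_only(genes, reads)
fractional_counts = count_fractional(genes, reads)
```

In this example every read that touches the short gene also touches its neighbour, so the strict rule leaves the short gene with no count, while the fractional rule still assigns it a value.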
Clustering patterns of the different count values in wBm derived with HTSeq modes, featureCounts modes, and FADU
(A) An unrooted dendrogram with 1000 bootstraps was generated using the log2 count values from wBm calculated using HTSeq, featureCounts, and FADU. The dendrogram reveals three distinct clusters: (1) featureCounts default, HTSeq union, and HTSeq intersection-nonempty; (2) HTSeq intersection-strict; and (3) FADU, featureCounts overlap, and featureCounts fractional-overlap. (B) The log2 count values for all wBm genes with count values derived from at least one of the tools were used to generate a heatmap. The wBm genes are displayed on the horizontal axis, while each of the tools is displayed on the vertical axis. Grey cells indicate genes with no count value in the corresponding tool. Bootstrap values for both the unrooted and squared dendrograms are located next to their corresponding nodes. (C) A principal component analysis was performed on all wBm count values derived from each of the tools. Each color corresponds to FADU, HTSeq, or featureCounts, while each shape represents the specific mode of the tool used.
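The comparison in the figure rests on treating each tool's log2 count values as a profile and asking which profiles lie closest together. A minimal sketch of that first step, using only the Python standard library, might look like the following. The count values are invented for illustration, and the pseudocount and the Euclidean metric are assumptions, not the settings used to produce the figure.

```python
import math
from itertools import combinations

# Hypothetical counts for four genes per tool/mode (made-up numbers, not study data).
counts = {
    "HTSeq union": [120, 0, 35, 800],
    "featureCounts default": [118, 0, 33, 790],
    "FADU": [131.5, 12.25, 40.0, 815.0],
}


def log2_profile(values, pseudocount=1.0):
    """log2-transform counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(counts):
    """Distance between the log2 profiles of every pair of tools."""
    profiles = {name: log2_profile(v) for name, v in counts.items()}
    return {
        (a, b): euclidean(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


distances = pairwise_distances(counts)
closest_pair = min(distances, key=distances.get)
```

A distance table like this is the usual input to hierarchical clustering. Bootstrapping, tree drawing, and principal component analysis would be layered on top with a dedicated statistics package.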
Availability: FADU is available at https://github.com/adkinsrs/FADU. FADU was implemented using Python3 and requires the PySAM module (version 0.12.0.1 or later).