miRNAs are a class of non-coding RNAs mainly involved in the post-transcriptional control of gene expression, although they also have other, non-canonical functions. Furthermore, they represent an interesting class of biomarkers for a large variety of diseases. Small RNA sequencing is becoming the method of choice for the quantification of miRNAs. The steps needed to analyze small RNA sequencing data require dedicated hardware and a level of bioinformatics skill that is outside the standard experience of classical biologists. The recent advent of cloud computing has reduced the need for biologists to acquire dedicated hardware. Another step toward making bioinformatics easily accessible to biologists is the recent development of open-source and commercial workbenches. BaseSpace is a commercial workbench developed by Illumina that provides a user-friendly interface to bioinformatics tools. In the BaseSpace framework, complex bioinformatics pipelines are wrapped in graphical interfaces, called Apps, that are intuitive for biologists to use. The BaseSpace framework includes miRNAs analysis, a free App designed to simplify the identification of miRNA differential expression between two experimental conditions. The App performs the following steps:
- trims adapters using cutadapt,
- maps trimmed reads on miRNA precursors using the SHRiMP aligner,
- counts reads associated with mature miRNAs,
- detects differential expression between two experimental conditions with DESeq2,
- detects changes in the 3P/5P ratio of the same miRNA between two conditions using Rank Product statistics,
- saves the results in a folder, i.e. data, to be downloaded for further analysis. The content of the data folder is described in the README file within the data folder.
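The 3P/5P comparison listed above works on the ratio between the two arms of the same miRNA. The following Python sketch shows one way such a ratio could be computed for a single sample. It is an illustration only, not the App's code: the function name, the pseudocount, and the assumption that mature miRNA names end in "-3p" or "-5p" (as in miRBase) are ours, and the Rank Product test itself is not reproduced here.

```python
import math


def three_p_five_p_log_ratios(counts, pseudocount=1.0):
    """Return log2((3p + pseudocount) / (5p + pseudocount)) per miRNA.

    counts maps mature miRNA names (e.g. "hsa-miR-21-5p") to read counts
    for one sample. Only miRNAs with both arms present are reported.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    arms = {}
    for name, count in counts.items():
        base, sep, arm = name.rpartition("-")
        if sep and arm.lower() in ("3p", "5p"):
            arms.setdefault(base, {})[arm.lower()] = count
    return {
        base: math.log2((pair["3p"] + pseudocount) / (pair["5p"] + pseudocount))
        for base, pair in arms.items()
        if "3p" in pair and "5p" in pair
    }


example = {"hsa-miR-21-5p": 799, "hsa-miR-21-3p": 99, "hsa-miR-16-5p": 50}
print(three_p_five_p_log_ratios(example))  # {'hsa-miR-21': -3.0}
```

Computing this ratio for every sample gives one value per miRNA and sample, which is the kind of table a two-condition rank-based test could then compare.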
The input consists of fastq files generated with small RNA sequencing protocols. Each sample is provided as a single gzipped fastq file. The fastq.gz file name has the following structure: SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz, e.g. M518_s1_L001_R1_001.fastq.gz, S518_s0_L001_R1_001.fastq.gz.
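The naming scheme above can be checked before upload. The Python sketch below splits a file name into its five fields; the regular expression, the field names and the function name are assumptions derived from the two example names, not something defined by the App.

```python
import re
from typing import NamedTuple

# Pattern assumed from the structure described above:
# SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample_name>.+)_(?P<sample_number>[sS]\d+)_(?P<lane>L\d+)"
    r"_(?P<read>R\d)_(?P<flowcell_index>\d+)\.fastq\.gz$"
)


class FastqName(NamedTuple):
    sample_name: str
    sample_number: str
    lane: str
    read: str
    flowcell_index: str


def parse_fastq_name(filename: str) -> FastqName:
    """Split a fastq.gz file name into its fields or raise ValueError."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected fastq file name: {filename!r}")
    return FastqName(**match.groupdict())


print(parse_fastq_name("M518_s1_L001_R1_001.fastq.gz").sample_name)  # M518
```

A name that does not follow the scheme, for example one ending in .fastq rather than .fastq.gz, raises a ValueError in this sketch.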
The statistical analysis requires the presence of two experimental groups: condition 1 and condition 2.
Differential expression analysis uses condition 1 as the reference. Thus, differential expression is reported as log2(condition2/condition1).
The user needs to provide a list of fastq files that belong to experimental condition 1 and a list of fastq files that belong to experimental condition 2.
Ideally, the two conditions should contain a balanced set of samples, e.g. 4 samples in condition 1 and 4 samples in condition 2.
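To make the sign convention of log2(condition2/condition1) concrete, here is a deliberately naive Python sketch based on group means. It is not how DESeq2 works (DESeq2 estimates fold changes from a fitted model on normalized counts); the function name and the optional pseudocount are assumptions made for the illustration.

```python
import math
from statistics import mean


def log2_fold_change(condition1, condition2, pseudocount=0.0):
    """Naive log2(mean(condition2) / mean(condition1)).

    condition1 is the reference group, so a positive value means the
    miRNA is more abundant in condition 2. A pseudocount can be added
    to both means to avoid division by zero.
    """
    numerator = mean(condition2) + pseudocount
    denominator = mean(condition1) + pseudocount
    return math.log2(numerator / denominator)


print(log2_fold_change([100, 100], [400, 400]))  # 2.0
print(log2_fold_change([400, 400], [100, 100]))  # -2.0
```

Swapping the two lists flips the sign, which is why the assignment of samples to condition 1 and condition 2 matters when reading the results.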
The App packs the output files in the data folder, which contains the following items:
- README: A file describing the content of the data folder
- raw.counts.txt: unnormalized miRNA counts
- trimmimg.log: adapter trimming statistics
- length_distribution.pdf: Length distribution of the trimmed reads
- shrimp.log: mapping statistics
- rlognorm.counts.txt: log2-normalized miRNA counts for data visualization, e.g. clustering
- libnorm.counts.txt: library-normalized miRNA counts for data visualization, e.g. clustering
- dispersion.pdf: DESeq2 dispersion estimation plot
- differential_expression_plot.pdf: DESeq2 differential expression plot
- results.txt: DESeq2 differential expression results
- ratio_between_3p_5p.txt: RankProd differential expression results for the 3P/5P miRNA ratios
- miRNAs unnormalized counts as all.counts R object,
- miRNA differential expression by DESeq2 as res R object,
- miRNA log normalized counts as rld R object,
- miRNA Rank Product analysis of the ratio between 3P and 5P miRNAs as ratio.between.3p.5p R object
- analysis.log: log of the full analysis pipeline
- SampleName_trimmed.log: log files produced by cutadapt
- SampleName_trimmed.fastq.log: log files produced by SHRiMP
- SampleName_trimmed.fastqS.bam: sorted and indexed bam files
- SampleName_trimmed.fastqS.bam.bai: bam file indexes
- harpin.fa: miRNA precursor fasta file, from ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz
- mature.fa: mature miRNA fasta file, from ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
- mature_on_precursorS.bam: sorted bam containing the mapping position of mature miRNAs on their precursors
- mature_on_precursorS.bam.bai: index to the bam file containing the mapping position of mature miRNAs on their precursors
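After downloading the data folder, it can be useful to confirm that the main result files are present. The Python sketch below checks a subset of the fixed-name files from the list above; the function name and the choice of which files to check are ours, and per-sample files (SampleName_...) are not covered.

```python
from pathlib import Path

# File names copied from the list above; only a subset of the
# fixed-name items is checked in this sketch.
EXPECTED_FILES = (
    "README",
    "raw.counts.txt",
    "rlognorm.counts.txt",
    "libnorm.counts.txt",
    "results.txt",
    "ratio_between_3p_5p.txt",
    "analysis.log",
)


def missing_outputs(data_dir):
    """Return the expected file names that are absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling missing_outputs("data") returns an empty list when all the checked files exist, and otherwise the names that were not found, in the order given in EXPECTED_FILES.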
- Each input sample must be provided as a single fastq.gz file.
- The maximum number of samples that can be analyzed is 32.
- The App only supports two-group analysis.
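The limitations above can be checked on the two sample lists before launching the App. The following Python sketch is one possible pre-flight check; the function name is hypothetical, and we assume that the limit of 32 samples applies to both conditions combined.

```python
MAX_SAMPLES = 32  # limit stated above; assumed to cover both groups combined


def check_design(condition1, condition2):
    """Return a list of problems found in the two lists of fastq.gz names."""
    problems = []
    samples = list(condition1) + list(condition2)
    if not condition1 or not condition2:
        problems.append("both condition 1 and condition 2 need at least one sample")
    if len(samples) > MAX_SAMPLES:
        problems.append(f"{len(samples)} samples exceed the limit of {MAX_SAMPLES}")
    if len(set(samples)) != len(samples):
        problems.append("a fastq file is listed more than once")
    for name in samples:
        if not name.endswith(".fastq.gz"):
            problems.append(f"{name} is not a fastq.gz file")
    return problems


print(check_design(["M518_s1_L001_R1_001.fastq.gz"],
                   ["S518_s0_L001_R1_001.fastq.gz"]))  # []
```

An empty list means the design passes these checks; it does not say anything about whether the groups are balanced or large enough for the statistics.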
The App is derived from the workflow described by Cordero et al., Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis. PLoS One. 2012;7(2):e31630.
The developer is Raffaele Calogero, Associate Professor of Molecular Biology and Bioinformatics at the University of Turin. Users are kindly requested to send bug reports and feedback to raffaele[dot]calogero[at]unito[dot]it
A demo data set is also available. The demo consists of 5 samples: three samples of a control cell line and two samples of the same cell line in which a gene was knocked down. miRNA libraries were prepared using the NEB Small RNA library kit.