This is a recording of our Recipe Webinar, “Find differentially expressed genes in RNA-Seq data”. The video recording has been transcribed, and can be accessed by enabling subtitles/closed captioning on the video. We received several questions during our Question & Answer session. We’ve provided the transcribed questions and answers here for ease of use.
GENERAL GENOMESPACE QUESTIONS:
Q: How much space does a user get when they join GenePattern? Is it possible to obtain more space on GenePattern? Also, is it possible to FTP files larger than 2 GB into your user account?
A: GenomeSpace offers 30GB of space per user, for free. You can also connect your private storage (Dropbox, Google Drive or Amazon) to GenomeSpace. All of this space in your account is also available to GenePattern, when GenePattern is connected to GenomeSpace. Files larger than 2GB can be transferred as well, although the larger the file is, the longer the transfer takes.
Q: Is it possible to share data for collaborative projects?
A: Yes! GenomeSpace allows users to share data publicly with all users, or privately with a single user or group of users. To learn more about how to share data, please read the documentation here: http://genomespace.org/support/guides…
Q: Can we attach files from CyVerse to GenomeSpace?
A: You can upload files from CyVerse to your GenomeSpace account if they support unrestricted data fetching (i.e. the links do not require a login). You can also upload data using the URL uploader. However, CyVerse is not currently connected to GenomeSpace.
Q: Can you create a customized reference genome GTF file starting with FASTA or FASTQ file chromosome position?
A: It is not common to create a custom GTF file from a FASTA/FASTQ file. A GTF file contains annotation beyond the sequences themselves, so the data found in a FASTA/FASTQ file are not sufficient to build a complete custom GTF file. One thing to try is to check the UCSC Table Browser (or another service) to see if there is an existing GTF file that fits your needs.
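To illustrate why sequence alone is not enough, here is a minimal Python sketch that parses one GTF record into its nine tab-separated fields (sequence name, source, feature, start, end, score, strand, frame, attributes). The example line, gene identifiers, and the function name `parse_gtf_line` are made up for illustration and do not come from a real annotation file.

```python
def parse_gtf_line(line):
    """Split one GTF record into its nine standard fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record has exactly 9 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields

    # The ninth field holds 'key "value";' pairs such as gene_id and transcript_id.
    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attrs,
    }


# Hypothetical record, for illustration only.
example_line = (
    'chr1\texample_source\texon\t11869\t12227\t.\t+\t.\t'
    'gene_id "GENE1"; transcript_id "GENE1.1";'
)
record = parse_gtf_line(example_line)
print(record["feature"], record["start"], record["end"], record["attributes"])
```

None of the feature types, coordinates, strands, or gene/transcript identifiers in such a record can be read off a FASTA or FASTQ file, which holds only sequences (plus base qualities in FASTQ).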
Q: If you have biological replicates, can you just upload multiple files?
A: There is an option to “batch” GenePattern jobs together. This can speed up sending multiple replicates through the same aligner. More information about batch processing can be found here: http://www.broadinstitute.org/cancer/…
Q: How can you transform FPKM to fold change? If I am comparing two treatments (e.g. wild type and mutant), what can I do to compare the two phenotypes if one phenotype has a value but the other has no value (e.g. 0)?
A: If one of your phenotype classes has 0 expression for a gene, it is common practice to apply a threshold during preprocessing to remove lowly expressed genes. You can also add a very small value to all of the FPKM values across your dataset to avoid dividing by zero when calculating fold change. Cuffdiff will also provide log2 fold change values. Depending on which version of Cuffdiff/Tuxedo you are using, if one value is 0, the tool will report either an infinite value or the largest “double” computer value, approximately positive or negative 1.8e+308.
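The thresholding and small-value approach can be sketched in Python as follows. This is only an illustration: the threshold (`MIN_FPKM`), the added value (`PSEUDOCOUNT`), the function names, and the FPKM numbers are all assumptions chosen for the example, not values recommended in the webinar or used by Cuffdiff.

```python
import math

PSEUDOCOUNT = 1.0  # assumed value added to every FPKM to avoid dividing by zero
MIN_FPKM = 1.0     # assumed threshold for dropping lowly expressed genes


def log2_fold_change(control, treated, pseudocount=PSEUDOCOUNT):
    """Return log2((treated + pseudocount) / (control + pseudocount))."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def filter_and_compare(fpkm_by_gene, min_fpkm=MIN_FPKM, pseudocount=PSEUDOCOUNT):
    """Drop genes below the threshold in both classes, then compute log2 fold change."""
    results = {}
    for gene, (control, treated) in fpkm_by_gene.items():
        if max(control, treated) < min_fpkm:
            continue  # lowly expressed in both classes: filtered out
        results[gene] = log2_fold_change(control, treated, pseudocount)
    return results


# Hypothetical FPKM values as (wild type, mutant) pairs.
example = {
    "geneA": (0.0, 7.0),  # zero in wild type, expressed in mutant
    "geneB": (3.0, 3.0),  # unchanged
    "geneC": (0.0, 0.2),  # below the threshold in both classes
}
fold_changes = filter_and_compare(example)
print(fold_changes)
```

With these assumed settings, geneA gets a finite log2 fold change even though its wild type value is 0, and geneC is removed before any ratio is computed. The size of the added value affects the result for lowly expressed genes, so it is worth reporting alongside the fold changes.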
Q: Are alternative aligners (other than TopHat) available on GenomeSpace? Are aligners like STAR and GSNAP available on GenePattern?
A: Not yet, but the GenePattern team is looking into adding new aligners such as STAR, RSEM, GEM, etc. Galaxy also has the STAR aligner available in the Galaxy toolshed.
Q: Can you generate lists of differentially expressed genes? Can you generate p-values/FDR figures?
A: There are various other tools in GenePattern that can do these analyses, such as ComparativeMarkerSelection. There are also several recipes for gene expression analysis available on GenomeSpace.
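For readers unfamiliar with how FDR values relate to p-values, here is a generic Python sketch of the Benjamini-Hochberg adjustment, one standard way of computing FDR-adjusted p-values. It is not GenePattern's implementation, and the function name and input p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted_pvalues = benjamini_hochberg(example_pvalues)
print(adjusted_pvalues)
```

A list of differentially expressed genes is then typically obtained by keeping the genes whose adjusted value falls below a chosen cutoff.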
Q: Does this recipe apply to single-cell RNA-seq data or only bulk RNA-seq data?
A: We do not recommend applying this recipe to single-cell RNA-Seq data at the moment, since the current technologies are still in development. However, we are planning to create some single-cell analysis recipes in the near future.
Q: What about in the case of species which do not have a reference genome?
A: In this case there are two possibilities: (1) Use the genome of a reference species that is very similar to your species. This is a good way to determine how your species differs from other species that have already been extensively sequenced. (2) Perform a complete de novo genome or transcriptome assembly and construct a reference genome. A commonly used tool for this is the Trinity suite.