Advances in transcriptomic technologies make it possible to precisely quantify changes in the gene expression profiles of astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping and gene quantification steps, culminating in the detection of differentially expressed genes. The data analysis pipeline is publicly available through the GeneLab database.
“The GeneLab was set up to analyze large data sets of molecules that change in response to space flight, and continues to collect existing data from the past, as well as generates new data from ongoing flights,” Nathaniel Szewczyk, Ph.D. explained. “There is a push to look at whether we can use such data sets to better understand astronaut health and to see what lessons can be learned in similar experiments.”
“The GeneLab’s pipeline is a fantastic opportunity for scientists and anyone interested in the data to have easy access to the results and to use this information to compare experiments,” Sarah Wyatt, Ph.D. said. “When we looked at the data sets from previous experiments, we knew we needed to integrate and expand them so others could utilize this data.”
Source – Ohio University
The researchers have published the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility and reusability of pipeline data, to provide a template for data processing of future spaceflight-relevant datasets, and to encourage cross-analysis of data from other databases with the data available in GeneLab.
GeneLab RNA-seq Consensus Pipeline (RCP)
A: The three broad steps of the RCP. The RCP handles: 1) Data preprocessing to trim sequencing adapters and to provide quality control metrics; 2) Data processing to map reads to the reference genome and quantify the number of read counts per gene; and 3) Differential gene expression calculation, which will provide a list of differentially expressed genes that can be sorted by adjusted p-value and log fold-change. B: The full RCP annotated with tools, input files, and output files.
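To make the third step more concrete, here is a minimal Python sketch of how a table of differential expression results might be filtered and sorted by adjusted p-value and log fold-change. It is an illustration only, not the RCP's own code: the gene names and numbers are invented, the function names `benjamini_hochberg` and `rank_genes` are hypothetical, and the use of the Benjamini-Hochberg correction with a 0.05 cutoff is an assumption, since the text above does not say which multiple-testing method or threshold the pipeline applies.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(results, alpha=0.05):
    """Keep genes with adjusted p < alpha (assumed cutoff) and sort them by
    adjusted p-value, breaking ties by larger absolute log2 fold-change."""
    padj = benjamini_hochberg([r["pvalue"] for r in results])
    table = [dict(r, padj=q) for r, q in zip(results, padj)]
    significant = [r for r in table if r["padj"] < alpha]
    return sorted(significant, key=lambda r: (r["padj"], -abs(r["log2fc"])))


# Invented example values, for illustration only.
example = [
    {"gene": "geneA", "log2fc": 2.1, "pvalue": 0.001},
    {"gene": "geneB", "log2fc": -0.3, "pvalue": 0.40},
    {"gene": "geneC", "log2fc": -1.8, "pvalue": 0.001},
    {"gene": "geneD", "log2fc": 0.9, "pvalue": 0.03},
]

ranked = rank_genes(example)
for row in ranked:
    print(row["gene"], row["log2fc"], round(row["padj"], 4))
```

Sorting on the adjusted rather than the raw p-value matters because thousands of genes are tested at once. The log fold-change then indicates the direction and size of each change among the genes that pass the cutoff.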