Changes in the blood-based RNA transcriptome have the potential to inform biomarkers of Parkinson’s disease (PD) progression. Researchers from the Parkinson’s Progression Markers Initiative (PPMI) sequenced a discovery set of whole-blood RNA species in 4,871 longitudinally collected samples from 1,570 clinically phenotyped individuals in the PPMI cohort. Samples were sequenced to an average of 100 million read pairs to create a high-quality transcriptome. Participants with PD in the PPMI had significantly altered RNA expression (>2,000 differentially expressed genes), including an early and persistent increase in neutrophil gene expression, with a concomitant decrease in lymphocyte cell counts. This finding was validated in a cohort of 1,599 participants from the Parkinson’s Disease Biomarkers Program (PDBP) and was further supported by alterations in immune cell subtypes. This publicly available transcriptomic dataset, coupled with the available detailed clinical data, provides new insights into PD biological processes affecting whole blood and opens new paths for developing diagnostic and prognostic PD biomarkers.
Samples and Analysis Overview
a, Workflow of sample sequencing and initial data processing. Samples were sent to the biorepository at Indiana University (IU) for RNA isolation and then to the HudsonAlpha Institute for Biotechnology (HAIB) for library preparation and sequencing. Genome alignment and gene and transcript quantification were then performed (rectangles with end bars represent data analysis). Samples were assessed for outliers using a Rosner outlier test, a sex incompatibility check and a DNA–RNA incompatibility check. Pooled samples were used to measure intra-plate variability of gene and transcript expression. FASTQ, BAM, TPM and gene count files are all available for download from the LONI IDA (rhomboids represent the data type, and a gray background indicates data sourced from another portion of the PPMI study). A browser at https://ppmi-info.org allows genes to be queried and their relationship to clinical parameters to be explored (a cylinder represents an interactive data explorer). VCF, variant call format. b, Sankey representation of sequenced individuals; the number of baseline samples is indicated on the left, and horizontal bands show the number of samples connected across visits. c, Principal component analysis (PCA) plot of all distinct samples, including female (black), male (blue) and pooled technical replicates: healthy control (HC; red) and PD (yellow). Failed samples (gray) were removed from further analysis. d, Histogram of the number of sequencing reads used as input, with the mean shown as a vertical line. Many samples had well above the target of 100 million paired reads.
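For readers unfamiliar with the TPM (transcripts per million) files listed in panel a, the following is a minimal sketch of the standard TPM definition. It is not the study's quantification pipeline: the function name, gene names, counts and lengths are all hypothetical, and real quantifiers typically use effective rather than annotated transcript lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM using the standard definition.

    counts     -- dict mapping gene name to read count (hypothetical input)
    lengths_bp -- dict mapping gene name to transcript length in base pairs
    """
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths must cover the same genes")

    # Step 1: normalize each count by transcript length in kilobases.
    reads_per_kb = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}

    # Step 2: rescale so that the values in one sample sum to one million.
    total = sum(reads_per_kb.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: rpk / total * 1e6 for g, rpk in reads_per_kb.items()}


# Hypothetical toy sample with three genes (not data from the study).
example_counts = {"GENE_A": 500, "GENE_B": 1000, "GENE_C": 250}
example_lengths = {"GENE_A": 1000, "GENE_B": 4000, "GENE_C": 500}
tpm = counts_to_tpm(example_counts, example_lengths)
```

Because TPM values sum to one million within each sample, they describe relative abundance within a sample. Differential expression analyses generally start from the gene count files instead.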
Availability: Raw sequencing data (FASTQ files), alignment files (BAM files), TPM data and counts for each sample are available at the LONI IDA (registered at https://fairsharing.org/ as IDA; LONI IDA, https://doi.org/10.25504/FAIRsharing.r4ph5f). Data are also available through AMP PD (https://amp-pd.org/).