Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. This study identified parameters that determine which FFPE samples have the potential for successful RNA extraction, library preparation, and generation of usable RNA-seq data.
Mayo Clinic researchers optimized library preparation protocols designed for FFPE samples using seven FFPE and fresh frozen replicate pairs, then tested the optimized protocols on a study set of 130 FFPE biopsies from women with benign breast disease. Metrics from the RNA extraction and library preparation procedures were collected and compared with bioinformatics sequencing summary statistics. Finally, a decision tree model was built to learn the relationship between pre-sequencing lab metrics and QC pass/fail status as determined by bioinformatics metrics.
Samples that failed bioinformatics QC tended to have a low median sample-wise correlation within the cohort (Spearman correlation < 0.75), a low number of reads mapped to gene regions (< 25 million), or a low number of detectable genes (fewer than 11,400 genes detected with TPM > 4). The median RNA concentration and pre-capture library Qubit values for QC-failed samples were 18.9 ng/µL and 2.08 ng/µL, respectively, significantly lower than those of QC-passed samples (40.8 ng/µL and 5.82 ng/µL). The researchers built a decision tree model based on input RNA concentration and input library Qubit values, and achieved an F score of 0.848 in predicting the QC status (pass/fail) of FFPE samples.
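The three bioinformatics criteria above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the class, function, and flag names are hypothetical, and treating a sample as failed when any single criterion is met is an assumption, since the text only says that failed samples "tended to" show these properties. The gene-count cutoff is read here as "fewer than 11,400 detected genes".

```python
from dataclasses import dataclass

# Thresholds taken from the text; how they combine is an assumption.
MIN_MEDIAN_SPEARMAN = 0.75        # median sample-wise correlation within the cohort
MIN_GENE_MAPPED_READS = 25_000_000  # reads mapped to gene regions
MIN_DETECTED_GENES = 11_400       # genes with TPM > 4 (read as a lower bound)


@dataclass
class SampleMetrics:
    """Post-sequencing summary metrics for one sample (hypothetical container)."""
    median_spearman: float
    gene_mapped_reads: int
    detected_genes: int


def qc_flags(m: SampleMetrics) -> list[str]:
    """Return the names of every criterion the sample falls below."""
    flags = []
    if m.median_spearman < MIN_MEDIAN_SPEARMAN:
        flags.append("low_correlation")
    if m.gene_mapped_reads < MIN_GENE_MAPPED_READS:
        flags.append("low_gene_mapped_reads")
    if m.detected_genes < MIN_DETECTED_GENES:
        flags.append("low_detected_genes")
    return flags


def passes_bioinformatics_qc(m: SampleMetrics) -> bool:
    """Assumption: a sample passes only if no criterion is flagged."""
    return not qc_flags(m)
```

Returning the list of flags rather than a bare pass/fail makes it easier to see which metric drove a failure when reviewing a batch.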
Flow-chart of library optimization and bioinformatics evaluation
(a) A pilot study consisting of FFPE and fresh frozen pairs from 7 benign breast disease (BBD) patients was submitted for sequencing to evaluate two library preparation protocols for RNA-seq: ribo-depletion and RNA exome capture. Several bioinformatics metrics were evaluated for the two protocols, and whole exome sequencing (WES) data were used to estimate the SNP confirmation rate. RNA exome capture showed superior performance in all categories and was selected as the library preparation protocol for all samples. (b) 130 study samples (ER+, estrogen receptor positive; ER−, estrogen receptor negative; Cont, control) along with 17 technical replicates and 11 study replicates were submitted for library preparation using the RNA exome capture protocol. 40 samples failed the library preparation step due to insufficient RNA. All remaining samples were submitted for sequencing in 10 batches. Rigorous bioinformatics evaluation was performed to identify QC-failed samples based on defined bioinformatics metrics. The final dataset comprised 62 study samples.
The researchers provide a bioinformatics quality control recommendation for FFPE samples from breast tissue, based on an evaluation of bioinformatics and sample metrics. Their results suggest a minimum concentration of 25 ng/µL of FFPE-extracted RNA for library preparation and a minimum pre-capture library output of 1.7 ng/µL to achieve adequate RNA-seq data for downstream bioinformatics analysis.
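As a rough illustration, the two recommended minimums can be written as a simple pre-sequencing screen. This is a sketch under stated assumptions, not the fitted decision tree from the study: the function name is hypothetical, both thresholds are treated as inclusive, and a sample is required to meet both.

```python
# Recommended minimums from the text (ng/µL).
MIN_RNA_NG_PER_UL = 25.0      # FFPE-extracted RNA concentration
MIN_LIBRARY_NG_PER_UL = 1.7   # pre-capture library Qubit output


def likely_to_yield_usable_data(rna_ng_per_ul: float, library_ng_per_ul: float) -> bool:
    """Two-threshold screen; assumes both minimums must be met (inclusive)."""
    return (
        rna_ng_per_ul >= MIN_RNA_NG_PER_UL
        and library_ng_per_ul >= MIN_LIBRARY_NG_PER_UL
    )


# Median values reported in the text for QC-passed and QC-failed samples.
print(likely_to_yield_usable_data(40.8, 5.82))  # median QC-passed sample
print(likely_to_yield_usable_data(18.9, 2.08))  # median QC-failed sample
```

Applied to the reported medians, the QC-passed sample clears both minimums, while the QC-failed sample is screened out on RNA concentration alone, even though its library value of 2.08 ng/µL is above 1.7 ng/µL.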