RNA sequencing experiments generate large amounts of information about the expression levels of genes. Although they are mainly used for quantifying expression, they also contain other biologically important information, such as copy number variants (CNVs). Researchers from the University of Texas Health Science Center at Houston have developed CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events at multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER integrates multiscale smoothing of the expression signal and of allelic shift signals for CNV calling. The allelic shift signal measures loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER employs an efficient methodology for generating a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it to correct CNV calls. CaSpER increases the utility of RNA sequencing datasets and complements other tools for complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data.
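To make the allelic shift idea concrete, here is a minimal sketch of how a BAF shift could be computed at a single heterozygous site from read counts. The function name `baf_shift` and the definition of the shift as the absolute deviation from 0.5 are assumptions made for illustration; they are not taken from the CaSpER code base.

```python
def baf_shift(ref_reads: int, alt_reads: int) -> float:
    """Return |BAF - 0.5| for one site (hypothetical helper, not CaSpER's API).

    BAF is taken here as the fraction of reads supporting the B (alternate)
    allele. A balanced heterozygous site gives a shift near 0; a site that
    has lost one allele gives a shift near 0.5.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    baf = alt_reads / total
    return abs(baf - 0.5)


# Balanced site versus a site where only one allele is observed.
balanced = baf_shift(ref_reads=10, alt_reads=10)
lost_allele = baf_shift(ref_reads=0, alt_reads=20)
```

A run of sites with large shifts along a chromosome arm is the kind of pattern that suggests LOH in that region.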
The CaSpER algorithm uses expression values and B-allele frequencies (BAF) from RNA-seq reads to estimate CNV events.
A normalized gene expression matrix is generated (Step 1). The expression signal is smoothed by recursive iterative median filtering, and a three-scale resolution of the expression signal is computed (Step 2). For the smoothed signal at each scale, a hidden Markov model (HMM) assigns CNV states to regions and segments the signal into regions of similar copy number state (Step 3). The HMM uses five CNV states: 1, homozygous deletion; 2, heterozygous deletion; 3, neutral; 4, one-copy amplification; 5, multi-copy amplification. BAF information is then incorporated into the segmented CNV events. BAF information is extracted from mapped RNA-seq reads using an optimized BAF generation algorithm (Step 4). The BAF signal is smoothed by recursive iterative median filtering, and a three-scale resolution of the allele-based frequency signal is computed (Step 5). The BAF shift threshold is estimated using a Gaussian mixture model (Step 6). CNV events are corrected using BAF shifts, and the final CNV correction is applied to all CNV and BAF scale pair combinations (Step 7).
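The multiscale smoothing in Steps 2 and 5 can be sketched roughly as follows. This is an illustrative sketch only: the function names, the window sizes (3, 5, 7), and the edge handling are assumptions, and CaSpER's own implementation may differ on each of these points.

```python
from statistics import median


def median_filter(signal, window):
    """Sliding median with an odd window; the window is truncated at the edges."""
    half = window // 2
    n = len(signal)
    return [
        median(signal[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]


def multiscale_smooth(signal, n_scales=3):
    """Return one smoothed copy of the signal per scale.

    Each scale is obtained by filtering the previous scale's output again
    (the recursive part), with a wider window each time. The window sizes
    3, 5, 7 are an assumption chosen for illustration.
    """
    scales = []
    current = list(signal)
    for scale in range(1, n_scales + 1):
        current = median_filter(current, window=2 * scale + 1)
        scales.append(current)
    return scales


# Toy expression signal: an isolated spike followed by a sustained step up.
toy_signal = [0, 0, 5, 0, 0, 1, 1, 1, 1, 1]
smoothed = multiscale_smooth(toy_signal)
```

In this toy input the isolated spike is removed at the first scale while the sustained step is preserved, which is the property that makes median filtering a reasonable preprocessing step before segmenting a signal into copy number states.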
Availability – https://github.com/akdess/CaSpER