BALDR – a computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data

B cells play a critical role in the immune response by producing antibodies, which display remarkable diversity. Here researchers from the Yerkes National Primate Research Center describe a bioinformatic pipeline, BALDR (BCR Assignment of Lineage using De novo Reconstruction), that accurately reconstructs the paired heavy and light chain immunoglobulin gene sequences from Illumina single-cell RNA-seq data. BALDR was accurate for clonotype identification in human and rhesus macaque plasmablasts induced by influenza vaccine or simian immunodeficiency virus vaccine, and in naïve and antigen-specific memory B cells. BALDR enables matching of clonotype identity with single-cell transcriptional information in B cell lineages and will have broad application in the fields of vaccines, human immunodeficiency virus broadly neutralizing antibody development, and cancer.

Pipeline for immunoglobulin gene reconstruction in human samples


The pipeline reconstructs IgH and IgL genes from single-cell RNA-seq data using either all sequencing reads (Unfiltered) or bioinformatically filtered reads (IG_mapped, IG_mapped+Unmapped, Recombinome_mapped, and IMGT_mapped). Details for each filter are described in Methods and in the text. In the initial step, adapter sequences are trimmed from the fastq files using Trimmomatic. Reads are then filtered to enrich for those containing partial sequences from the IgH or IgL variable and constant regions, and to exclude reads mapping to conventional protein-coding genes. Filtered (or total) reads are then assembled using the Trinity algorithm without normalization. The assembled transcript models are annotated using IgBLAST. The reads used for assembly are mapped back to the assembled transcript models using bowtie2, and the models are ranked by the number of reads mapped. Transcript models that are not productive, or whose V(D)J and CDR nucleotide sequence is identical to that of a higher-ranked model, are filtered out. The top model from the remaining set is selected as the putative heavy or light chain.
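The final ranking and selection step described above can be sketched in a few lines of Python. This is only an illustration, not BALDR's actual code: the `TranscriptModel` class, its field names, and the `filter_models` and `select_top_model` functions are hypothetical, and the sketch assumes that read counts, productivity calls, and V(D)J/CDR sequences have already been extracted from the bowtie2 and IgBLAST outputs. It also assumes that duplicates are judged only against higher-ranked models that were themselves retained, which the description does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranscriptModel:
    """Hypothetical record for one assembled transcript model."""
    name: str
    mapped_reads: int   # reads mapped back to this model
    productive: bool    # productivity call from the annotation step
    vdj_cdr_seq: str    # V(D)J and CDR nucleotide sequence


def filter_models(models: List[TranscriptModel]) -> List[TranscriptModel]:
    """Rank models by mapped reads, then drop non-productive models and
    models whose V(D)J/CDR sequence matches a higher-ranked retained model."""
    ranked = sorted(models, key=lambda m: m.mapped_reads, reverse=True)
    seen_sequences = set()
    kept = []
    for model in ranked:
        if not model.productive:
            continue  # not productive: filtered out
        if model.vdj_cdr_seq in seen_sequences:
            continue  # same sequence as a higher-ranked model: filtered out
        seen_sequences.add(model.vdj_cdr_seq)
        kept.append(model)
    return kept


def select_top_model(models: List[TranscriptModel]) -> Optional[TranscriptModel]:
    """Return the top remaining model as the putative chain, or None."""
    kept = filter_models(models)
    return kept[0] if kept else None


# Made-up example values, for illustration only.
example_models = [
    TranscriptModel("model_a", 900, False, "AAA"),
    TranscriptModel("model_b", 500, True, "CCC"),
    TranscriptModel("model_c", 300, True, "CCC"),
    TranscriptModel("model_d", 100, True, "GGG"),
]
top = select_top_model(example_models)
```

In this made-up example, `model_a` has the most reads but is not productive, and `model_c` repeats the sequence of the higher-ranked `model_b`, so `model_b` is selected as the putative chain.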

Availability – BALDR is available at

Upadhyay AA, Kauffman RC, Wolabaugh AN, Cho A, Patel NB, Reiss SM, Havenar-Daughton C, Dawoud RA, Tharp GK, Sanz I, Pulendran B, Crotty S, Lee FE, Wrammert J, Bosinger SE. (2018) BALDR: a computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data. Genome Med 10(1):20. [article]
