Voom: variance modelling at the observation-level

In the past few years, RNA-seq has emerged as a revolutionary new technology for expression profiling. RNA-seq expression data consists of read counts, and many recent publications have argued therefore that RNA-seq data should be analysed by statistical methods designed specifically for counts. Yet all the statistical methods developed for RNA-seq counts rely on approximations of various kinds.

This article revisits the idea of applying normal-based, microarray-like statistical methods to RNA-seq read counts, on the premise that it is more important to model the mean-variance relationship correctly than to specify the exact probabilistic distribution of the counts. Log-counts per million are used as expression values. The voom method estimates the mean-variance relationship robustly and generates a precision weight for each individual normalized observation. The normalized log-counts per million and associated precision weights are then entered into the limma analysis pipeline, or indeed into any statistical pipeline for microarray data that is precision-weight aware. This gives RNA-seq analysts access to a large body of methodology developed for microarrays, allowing RNA-seq and microarray data to be analysed in closely comparable ways. The performance of voom and related limma-based pipelines is compared to that of edgeR, DESeq, baySeq, TSPM, PoissonSeq, and DSS. Simulation studies show that voom out-performs previous RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. This is especially true when the sequence depths vary between RNA samples. Several data sets are also analysed to demonstrate how voom can handle heterogeneous data and complex experiments, as well as facilitate pathway analysis and gene set testing methods.
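The idea of log-counts per million with per-observation precision weights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the limma implementation: it assumes a single experimental group (an intercept-only model), and it swaps the robust smoothed trend for an ordinary straight-line fit. The function names `log_cpm`, `fit_line` and `voom_like_weights` are hypothetical.

```python
import math
from statistics import mean, stdev

def log_cpm(counts, lib_sizes):
    """log2 counts per million, with small offsets to avoid log(0)."""
    return [[math.log2((c + 0.5) / (n + 1.0) * 1e6)
             for c, n in zip(row, lib_sizes)]
            for row in counts]

def fit_line(xs, ys):
    """Least-squares straight line; a simple stand-in for a smoothed trend."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def voom_like_weights(counts, lib_sizes):
    """Return (log-CPM matrix, precision weights), genes in rows."""
    y = log_cpm(counts, lib_sizes)
    # Offsets that convert a log-CPM value back to a log2 count per sample.
    offsets = [math.log2(n + 1.0) - math.log2(1e6) for n in lib_sizes]
    gene_means = [mean(row) for row in y]
    # Trend input: average log2 count against sqrt of the standard deviation.
    x = [m + mean(offsets) for m in gene_means]
    z = [math.sqrt(stdev(row)) for row in y]
    slope, intercept = fit_line(x, z)
    weights = []
    for m in gene_means:
        row = []
        for off in offsets:
            # Predicted sqrt-sd at this observation's fitted log2 count;
            # the floor keeps the straight-line stand-in positive.
            trend = max(slope * (m + off) + intercept, 1e-3)
            row.append(trend ** -4)  # weight = inverse predicted variance
        weights.append(row)
    return y, weights
```

Each weight is the inverse of the variance predicted for that observation, so a downstream weighted linear model trusts the noisier low-count observations less.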

With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate an optimal number of groups and mean expression amounts of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two cell lines of breast cancer that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines are identified. They performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering gene expression data by RNA-seq, facilitating understanding of gene functions and networks.
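To make the clustering idea concrete, here is a hedged sketch of a plain EM algorithm for a Poisson mixture in which each gene has one count per environment and each group has its own mean in each environment. It is a simplification, not the authors' two-stage hierarchical algorithm: the number of groups is fixed by the caller rather than estimated, and the function names and starting values are hypothetical.

```python
import math

def log_poisson(x, lam):
    """Log of the Poisson probability mass function at count x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def em_poisson_mixture(data, init_means, n_iter=50):
    """data: list of (count_env1, count_env2) per gene.
    init_means: starting (mean_env1, mean_env2) for each group."""
    means = [tuple(m) for m in init_means]
    k = len(means)
    props = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each gene.
        resp = []
        for x1, x2 in data:
            logs = [math.log(props[j])
                    + log_poisson(x1, means[j][0])
                    + log_poisson(x2, means[j][1])
                    for j in range(k)]
            top = max(logs)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: weighted proportions and weighted mean counts.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            props[j] = max(nj / len(data), 1e-12)
            m1 = sum(r[j] * x1 for r, (x1, _) in zip(resp, data)) / nj
            m2 = sum(r[j] * x2 for r, (_, x2) in zip(resp, data)) / nj
            means[j] = (max(m1, 1e-9), max(m2, 1e-9))
    return props, means, resp
```

A group whose two estimated means differ markedly is a candidate for a plastic response to the environment; the formal test described in the paper is not reproduced here.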

  • Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]

The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled, or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example upon the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.

Researchers at UC Berkeley have developed a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of their methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are constantly being revised.
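One way to picture the partitioning step is as a graph search: transcripts that share multi-mapping reads are linked, and only the transcripts connected to a changed isoform need their abundances re-estimated. The sketch below illustrates that idea only. It is an assumption about how such a partition could be computed, not ReXpress's actual algorithm, and `affected_transcripts` is a hypothetical name.

```python
from collections import defaultdict

def affected_transcripts(read_alignments, changed):
    """read_alignments: iterable of sets, one per read, holding the ids of
    the transcripts that read maps to.
    changed: ids of added or deleted transcripts.
    Returns every transcript linked to a changed one through shared reads."""
    neighbours = defaultdict(set)
    for targets in read_alignments:
        for t in targets:
            neighbours[t].update(targets)
    seen = set()
    stack = list(changed)
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        # Follow shared reads outward to transcripts not yet visited.
        stack.extend(neighbours[t] - seen)
    return seen
```

Transcripts outside the returned set share no reads, directly or indirectly, with the changed isoforms, so under this simplified view their existing estimates could be kept.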

ReXpress

Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/

Contact: lpachter@math.berkeley.edu

  • Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]

The power of deep sequencing technology to reliably detect single RNA reads leads to a paradoxical problem of high sensitivity. In hybridization- or PCR-based methods for RNA quantification, the concern is low sensitivity, i.e., the signal from truly expressed genes might not be distinguishable from noise. In contrast, the problem with RNA-seq is that it is not clear whether very low read counts come from lowly expressed genes or merely reflect transcriptional noise. The frequency distribution of read counts does not show a clear separation into two classes of genes, which makes the decision whether a gene should be considered expressed seemingly arbitrary.

Here, researchers from Yale University address this problem by proposing a statistical model that treats the number of transcripts detected in an RNA-Seq study as a mixture of two distributions: an exponential distribution for transcripts from inactive genes and a negative binomial distribution for actively transcribed genes. They apply this model to a number of RNA-Seq data sets and find that it fits the data very well. The calculated criterion for distinguishing between expressed and non-expressed genes is remarkably consistent among data sets, suggesting that genes with more than two transcripts per million transcripts (TPM) are highly likely to be actively transcribed. The regression model correctly identifies the class of genes that are not actively expressed and thus provides an operational criterion for classifying genes into expressed and non-expressed sets, facilitating the interpretation of RNA-Seq data.
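The resulting rule of thumb is easy to apply once counts are converted to TPM. The sketch below uses the standard TPM definition (length-normalized read rates rescaled to sum to one million) and the two-TPM cut-off reported above; it does not fit the exponential and negative binomial mixture itself. The function names are hypothetical.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from read counts and transcript lengths (kb)."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def call_expressed(tpm_values, threshold=2.0):
    """Flag genes whose TPM exceeds the threshold as expressed."""
    return [v > threshold for v in tpm_values]
```

For example, `call_expressed(tpm(counts, lengths_kb))` returns one boolean per gene; the threshold is exposed as a parameter because the cut-off is an empirical finding rather than a fixed constant.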

  •  Wagner GP, Kin K, Lynch VJ. (2013) A model based criterion for gene expression calls using RNA-seq data. Theory Biosci [Epub ahead of print]. [abstract]

Small RNA sequencing allows genome-wide discovery, categorization, and quantification of genes producing regulatory small RNAs. Many tools have been described for annotation and quantification of microRNA loci (MIRNAs) from small RNA-seq data. However, in many organisms and tissue types, MIRNA genes comprise only a small fraction of all small RNA-producing genes.

ShortStack is a stand-alone application that analyzes reference-aligned small RNA-seq data and performs comprehensive de novo annotation and quantification of the inferred small RNA genes. ShortStack’s output reports multiple parameters of direct relevance to small RNA gene annotation, including RNA size distributions, repetitiveness, strandedness, hairpin-association, MIRNA annotation, and phasing. In this study, ShortStack is demonstrated to perform accurate annotations and useful descriptions of diverse small RNA genes from four plants (Arabidopsis, tomato, rice, and maize) and three animals (Drosophila, mice, and humans). ShortStack efficiently processes very large small RNA-Seq data sets using modest computational resources, and its performance compares favorably to previously described tools. Annotation of MIRNA loci by ShortStack is highly specific in both plants and animals.

Availability: ShortStack is freely available under a GNU General Public License from the Axtell Lab at Penn State.

Axtell MJ. (2013) ShortStack: Comprehensive annotation and quantification of small RNA genes. RNA [Epub ahead of print]. [abstract]

RNA-Seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.

Researchers at the European Molecular Biology Laboratory, Germany, and Purdue University have developed a simple and effective approach for estimating the dispersions. First, they obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.
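The two steps can be sketched as follows. The method-of-moments estimate comes from the Negative Binomial variance, var = mean + dispersion × mean², solved for the dispersion. The shrinkage step here is a plain convex combination in which the target `xi` and the weight `delta` are hypothetical parameters supplied by the caller; how the published method chooses them is not reproduced, and library-size normalization is ignored.

```python
from statistics import mean, variance

def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene: (var - mean) / mean**2."""
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance, needs at least two replicates
    return max((v - m) / m ** 2, 0.0)  # truncate negative estimates at zero

def shrink(estimates, xi, delta):
    """Pull each per-gene estimate towards the common target xi."""
    return [(1.0 - delta) * e + delta * xi for e in estimates]
```

With `delta` near 0 the per-gene estimates are left almost untouched, and with `delta` near 1 every gene receives nearly the common value, which is the trade-off that matters most when there are few replicates.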

They evaluated the proposed approach, sSeq, using 10 simulated and experimental datasets and compared its performance with that of the currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets, sSeq performed favorably for experiments with small sample size in terms of sensitivity, specificity and computational time.

sSeq

Availability: http://www.stat.purdue.edu/~ovitek/Software.html and Bioconductor.

Contact: ovitek@purdue.edu

  • Yu D, Huber W, Vitek O. (2013) Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics [Epub ahead of print] [article]

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. These data come from many diverse measurement platforms, which makes integrating them difficult. In this paper, researchers from the University of Helsinki, Finland, and Stockholm University, Sweden, propose a new method for processing RNA-sequencing data that yields gene expression estimates much more similar to the corresponding estimates from microarray data, greatly improving cross-platform comparability. The method, called PREBS, is based on estimating expression only from microarray probe regions and processing these estimates with the microarray summarisation algorithm RMA. This allows new ways of using RNA-sequencing data, such as expression estimation for microarray probe sets. Gene signatures defined from PREBS expression measures of RNA-sequencing data are much more accurate for retrieving similar microarray samples from a database.
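The first step, restricting the read data to microarray probe regions, amounts to an interval-overlap count. The following sketch shows only that step, assuming reads and probes on a single chromosome given as half-open (start, end) intervals; the subsequent RMA summarisation is not shown, and `count_probe_overlaps` is a hypothetical name rather than part of the PREBS package.

```python
def count_probe_overlaps(reads, probes):
    """For each probe region, count the reads that overlap it.
    reads, probes: lists of half-open (start, end) intervals."""
    return [sum(1 for r_start, r_end in reads
                if r_start < p_end and p_start < r_end)
            for p_start, p_end in probes]
```

This brute-force version compares every read with every probe, which is fine for illustration; a real data set would call for sorted intervals or an interval index.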

PREBS

Availability: http://www.bioconductor.org/packages/2.12/bioc/html/prebs.html

Uziela K, Honkela A. (2013) Probe region expression estimation for RNA-seq data for improved microarray comparability. arXiv:1304.1698 [q-bio.GN]. [article]

The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes.

Here, researchers at Johns Hopkins University School of Medicine describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes.

Availability – The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.

  • Magoc T, Wood D, Salzberg SL. (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9, 127-36. [article]

RNA-Seq technology measures transcript abundance by generating sequence reads and counting their frequencies across different biological conditions. To identify differentially expressed genes between two conditions, it is important to consider the experimental design as well as the distributional properties of the data. In many RNA-Seq studies, the expression data are obtained as multiple pairs, e.g., pre- vs. post-treatment samples from the same individual. The aim is to incorporate this paired structure into the analysis.

Now, a team led by researchers at Yale University has developed a Bayesian hierarchical mixture model for RNA-Seq data that separately accounts for the variability within and between individuals in a paired data structure. The method assumes a Poisson distribution for the data, mixed with a gamma distribution to account for variability between pairs. The effect of differential expression is modeled by a two-component mixture model. The performance of this approach is examined using simulated and real data.

In this setting, the proposed model provides higher sensitivity than existing methods to detect differential expression. Application to real RNA-Seq data demonstrates the usefulness of this method for detecting expression alteration for genes with low average expression levels or shorter transcript length.

Availability: The method was implemented in R and is available at http://bioinformatics.med.yale.edu

  • Chung LM, Ferguson JP, Zheng W, Qian F, Bruno V, Montgomery RR, Zhao H. (2013) Differential expression analysis for paired RNA-seq data. BMC Bioinformatics 14, 110. [abstract]

Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-Seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-Seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-Seq data.

Scientists at the Swiss Institute of Bioinformatics have conducted an extensive comparison of eleven methods for differential expression analysis of RNA-Seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. They evaluated the methods based on both simulated data and real RNA-Seq data.

They found that very small sample sizes, which are still common in RNA-Seq experiments, pose problems for all evaluated methods, and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the ‘limma’ method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.

Soneson C, Delorenzi M. (2013) A comparison of methods for differential expression analysis of RNA-Seq data. BMC Bioinformatics 14(1), 91. [article]

The RegulatoryGenomics website posts and updates a comprehensive list of tools for RNA-Seq analysis.

This is their current version.

Spliced-mappers

Method      Reference               Web-site
TopHat      (Trapnell et al. 2009)  http://tophat.cbcb.umd.edu/
MapSplice   (Wang et al. 2010)      http://www.netlab.uky.edu/p/bioinfo/MapSplice
SpliceMap   (Au et al. 2010)        http://www.stanford.edu/group/wonglab/SpliceMap/
HMMSplicer  (Dimon et al. 2010)     http://derisilab.ucsf.edu/index.php?software=105
TrueSight   (Li et al. 2012b)       http://bioen-compbio.bioen.illinois.edu/TrueSight/
SOAPsplice  (Huang et al. 2011)     http://soap.genomics.org.cn/soapsplice.html
PASSion     (Zhang et al. 2012)     https://trac.nbic.nl/passion
PALMapper   (Jean et al. 2010)      http://galaxy.raetschlab.org/
SplitSeek   (Ameur et al. 2010)     http://solidsoftwaretools.com/gf/project/splitseek
Supersplat  (Bryant et al. 2010)    http://mocklerlab-tools.cgrb.oregonstate.edu/
SeqSaw      (Wang et al. 2011)      http://bioinfo.au.tsinghua.edu.cn/software/seqsaw
MapNext     (Bao et al. 2009)       http://evolution.sysu.edu.cn/english/software/mapnext.htm
STAR        (Dobin et al. 2012)     http://gingeraslab.cshl.edu/STAR/
GSNAP       (Wu et al. 2010)        http://research-pub.gene.com/gmap/
QPALMA      (De Bona et al. 2008)   http://www.raetschlab.org/suppl/qpalma
OSA         (Hu et al. 2012)        http://omicsoft.com/osa/

Digital transcriptome analysis by next-generation sequencing uncovers a substantial number of mRNA variants. Variation in gene expression underlies many biological processes and holds a key to unravelling the mechanisms of common diseases. However, the current methods for constructing co-expression networks from overall gene expression were originally designed for microarray expression data, and they overlook a large amount of variation in gene expression.

To use information on exon-level, genomic positional-level and allele-specific expression, researchers at Fudan University, China, have developed novel component-based methods, single and bivariate canonical correlation analysis (CCA), for constructing co-expression networks from RNA-Seq data. To evaluate the performance of their methods for co-expression network inference, they applied them to lung squamous cell cancer expression data from the TCGA database and to their own bipolar disorder and schizophrenia RNA-Seq study. The preliminary results demonstrate that co-expression networks constructed by canonical correlation analysis from RNA-Seq data provide rich genetic and molecular information for gaining insight into biological processes and disease mechanisms. These new methods substantially outperform the current statistical methods for co-expression network construction with microarray expression data or with RNA-Seq data based on overall gene expression levels.

Availability: A program implementing the developed CCA for co-expression network construction can be downloaded from Bioconductor (http://www.bioconductor.org/) and from http://www.sph.uth.tmc.edu/hgc/faculty/xiong/index.htm

  • Hong S, Chen X, Jin L, Xiong M. (2013) Canonical correlation analysis for RNA-seq co-expression networks. Nucleic Acids Res [Epub ahead of print]. [article]

 Messenger RNA expression is important in normal development and differentiation, as well as in manifestation of disease. RNA-Seq experiments allow for the identification of differentially expressed (DE) genes and their corresponding isoforms on a genome-wide scale. However, statistical methods are required to ensure that accurate identifications are made. A number of methods exist for identifying DE genes, but far fewer are available for identifying DE isoforms. When isoform DE is of interest, investigators often apply gene-level (count-based) methods directly to estimates of isoform counts. Doing so is not recommended. In short, estimating isoform expression is relatively straightforward for some groups of isoforms, but more challenging for others. This results in estimation uncertainty that varies across isoform groups. Count-based methods were not designed to accommodate this varying uncertainty and consequently application of them for isoform inference results in reduced power for some classes of isoforms and increased false discoveries for others.

EBSeq

Taking advantage of the merits of empirical Bayesian methods, researchers at the University of Wisconsin have developed EBSeq for identifying DE isoforms in an RNA-Seq experiment comparing two or more biological conditions. Results demonstrate substantially improved power and performance of EBSeq for identifying DE isoforms. EBSeq also proves to be a robust approach for identifying DE genes.

Availability: An R package containing examples and sample data sets is available at http://www.biostat.wisc.edu/~kendzior/EBSEQ/

  • Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BMG, Haag JD, Gould MN, Stewart RM, Kendziorski C. (2013) EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics [Epub ahead of print]. [abstract]
