High-throughput sequencing (HTS) has revolutionized the way in which epigenetic research is conducted. When HTS is coupled with fully sequenced genomes, millions of small RNA (sRNA) reads can be mapped to regions of interest and the results scrutinized for clues about epigenetic mechanisms. However, this approach requires careful consideration with regard to experimental design, especially when one investigates repetitive parts of genomes such as transposable elements (TEs), or when such genomes are large, as is often the case in plants.
Here, in an attempt to shed light on the complications of mapping sRNAs to TEs, a team led by researchers at the University of Sussex focused on the 2,300 Mb maize genome, 85% of which is derived from TEs, and scrutinized methodological strategies that are commonly employed in TE studies. These include the choice of reference dataset, the normalization of multiply mapping sRNAs, and the selection among sRNA metrics. The research team further examined how these choices influence the relationship between sRNAs and the critical feature of TE age, and contrasted their effect on low-copy genomic regions and other popular HTS data.
Based on their analyses, the researchers share a series of take-home messages that may help with the design, implementation, and interpretation of high-throughput TE epigenetic studies in particular, although these conclusions may also apply to any work that involves the analysis of HTS data.
A matrix of the terms, data and analyses used in this study
The coloured boxes contain information specific to the maize genome (blue) or the TE exemplar database (green). The numbers in brackets for the Copia families represent their complete full-length populations retrieved from MASiVEdb.