Using 98 public and internal small RNA high-throughput sequencing libraries, University of Cambridge researchers mapped small RNAs to the genome of the model organism Arabidopsis thaliana and defined loci from their expression using an empirical Bayesian approach. The resulting loci were then classified according to their genetic and epigenetic context and their expression properties. The classification broadly conforms to previously reported divisions between transcriptional and post-transcriptional gene silencing small RNAs, and to PolIV and PolV dependencies. However, the researchers also demonstrate further, functionally significant subdivisions within the small RNA population. They additionally present a framework for similar analyses of small RNA populations in all species.
Summary of work-flow
Schematic diagram of the work-flow used to identify and classify sRNA loci. The yellow box depicts raw data (individual sRNAs are shown as green and blue arrows) used as a basis for segmentation (red box). Annotation features are determined from the loci and by comparison with external data sets and used to classify the loci (orange box).
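To make the segmentation step in the diagram more concrete, the following is a minimal sketch of how aligned sRNA reads might be grouped into candidate loci. It is not the empirical Bayesian method the researchers used: it substitutes a simple gap-based merge, and the names `Read`, `Locus`, `merge_into_loci` and the `max_gap` threshold of 100 bases are all hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record types: an aligned sRNA read and a candidate locus.
Read = namedtuple("Read", ["chrom", "start", "end", "count"])
Locus = namedtuple("Locus", ["chrom", "start", "end", "count"])


def merge_into_loci(reads, max_gap=100):
    """Group reads into candidate loci with a simple gap heuristic.

    Reads on the same chromosome are merged into one locus when the gap
    between a read's start and the current locus end is at most max_gap.
    This is an illustrative stand-in, not an empirical Bayesian segmentation.
    """
    loci = []
    for read in sorted(reads, key=lambda r: (r.chrom, r.start, r.end)):
        last = loci[-1] if loci else None
        if last and last.chrom == read.chrom and read.start - last.end <= max_gap:
            # Extend the current locus and accumulate its read count.
            loci[-1] = Locus(
                last.chrom,
                last.start,
                max(last.end, read.end),
                last.count + read.count,
            )
        else:
            # Start a new candidate locus.
            loci.append(Locus(read.chrom, read.start, read.end, read.count))
    return loci


# Made-up example coordinates, not taken from the published data.
example_reads = [
    Read("Chr1", 100, 123, 5),
    Read("Chr1", 150, 173, 2),
    Read("Chr1", 1000, 1020, 7),
    Read("Chr2", 50, 73, 1),
]
example_loci = merge_into_loci(example_reads)
```

In a real analysis the candidate loci would then be annotated with genomic and epigenetic features before classification, as the orange box in the diagram indicates; a fixed gap threshold ignores replicate structure and expression differences between libraries, which is what a model-based approach is meant to capture.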
Availability – Sequencing data are available through the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) under accession number PRJEB18944.