Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments – for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length – analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows.
Researchers from the California Institute of Quantitative Biosciences have developed Plastid, a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid’s data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort.
Uses of Plastid in analysis workflows
Plastid (yellow box) contains tools for both exploratory data analysis (blue, center) and command-line scripts for specific tasks (green, right). Plastid standardizes representation of data across the variety of file formats used to represent genomics data (left). Quantitative data are represented as arrays of data over the genome. Read alignments may be transformed into arrays using a mapping function appropriate to a given assay. Transcripts are represented as chains of segments that automatically account for their discontinuities during analysis. Plastid integrates directly with the SciPy stack (blue, center). For exploratory analysis in other environments (blue, above) or further processing in external programs (right, green), Plastid imports and exports data in standardized formats
Availability – Plastid is released under the BSD 3-Clause license. Official releases are available in the Python Package Index at http://pypi.python.org/pypi/plastid. Development versions are available at the project’s home page, https://github.com/joshuagryphon/plastid. Examples, user documentation, and technical information are available at http://plastid.readthedocs.io.