The latest supercomputing technology is powering RNA-Seq data analysis pipelines at the Center for Genomic Regulation

from HPC Wire by John Russell

Univa’s recent launch of the Grid Engine Container Edition is the latest evidence of growing support for Docker containers in high-performance computing (HPC). Last month NERSC introduced Shifter, a tool for bringing Docker containers onto Edison, NERSC’s Cray supercomputer. Container technology is hardly new, but the ease of use of Docker’s implementation promises to substantially expand the use of container technology.

Virtualization in all its forms has significantly impacted computing, with virtual machines being the most prominent example. Container technology is the next iteration of that advancement, allowing for application-level virtualization in a compact, easy-to-configure, and relatively isolated portion of the host operating system. Rather than requiring a bulky system image with an entire operating system attached to it, containers interface directly with the host OS and contain only the differences from the host OS needed to run the virtual environment successfully.

In its announcement, Univa cited the Centre for Genomic Regulation (CRG), which deployed Docker containers with Univa Grid Engine for scientific data analysis on distributed clusters. Paolo Di Tommaso, CRG software engineer, said, “This solution allowed us to reduce configuration and deployment problems and rapidly create a self-contained run-time environment to accelerate development.” CRG, part of the National Center for Genomic Analysis (Barcelona), is one of Europe’s largest genomics research centers.

Ari Berman, Principal Investigator for BioTeam, a research computing consulting company, noted, “Containers can be viewed as portable execution environments that are as compact as you can make them. Also, since they use the host kernel to operate, there is no operational overhead to slow down computational activities, unlike virtual machines which often operate much slower than the host OS. There are many ways to use containers, including the framework that popularized their use, Docker.”

“From a scientific HPC viewpoint, the ability to schedule and run container-based compute resources is a pretty big deal, and one that we’ve been watching closely for the last year or so. There is a lot to be said for being able to define a contained compute environment that is different than the configuration of the host system, without a major reconfiguration of the HPC systems to be able to use it.”

Univa’s efforts to incorporate containers into the Grid Engine scheduler and dispatch system have significant implications for science as a service and collaborative computing, said Berman. “If a collaborator has a complex workflow that they want to run on data that exists in another place on another HPC resource, encoding the workflow into a container, then running it on the remote system would make it much simpler to operate in that environment, which is likely configured differently than the local resources.”

Using containers in this manner, noted Berman, would allow for more seamless interactions with public and private clouds, which makes the concept of hybrid computing environments much more realistic from an operational standpoint. “The use of containers in this way may also have implications for grid computing and for the way that methods listed in scientific publications are shared with the scientific community, meaning that the journals can provide a container to readers so that the analyses can be reproduced in the same environment and configuration that the published results were produced in,” said Berman.

Running RNA-Seq data analysis pipelines is a major portion of CRG’s work. Di Tommaso said Univa’s support for container technology has allowed CRG to significantly simplify the deployment of complex genomic pipelines in the Univa cluster. “Instead of [having to] install/configure/update dozens of different pieces of software that may be required by a data analysis application, we can simply deploy them using Docker containers that are downloaded on-demand in the cluster nodes. Above all this guarantees the reproducibility of the computational environment and the consistency of results of our analysis over time.”

Univa Grid Engine Container Edition greatly simplifies Docker use, said Gary Tyreman, CEO of Univa. “The problem we solve is orchestration, which granted means different things to different people. What we mean is having to figure out the best place to run a multi-tier app or a single instance app and drive efficiency up. Part of that is resource management, part is understanding where is the best place to run a particular container based on the needs around it or the fact that a container might already be downloaded and sitting on a particular node.”

“This is our first step, full implementation with Docker within a Grid Engine cluster. You’ll see us do a number of things in the fall as well as we look to broaden what we bring to market beyond Grid Engine,” he said.

Security concerns with Docker containers persist, Tyreman agrees, but they are steadily being addressed. In the enterprise space, at both the high end and the low end, he sees Docker uptake spreading beyond DevOps to production environments over time. Univa is a founding member of the Cloud Native Computing Foundation, along with companies such as Google, VMware, and Red Hat.

Source – HPC Wire
