A former UBC post-doctoral research fellow led an international research team in re-analyzing all public RNA sequencing data to uncover almost ten times more RNA viruses than were previously known, including several new species of coronaviruses in some unexpected places.
This planetary-scale database of RNA viruses can help pave the way to rapidly identify virus spillover into humans, as well as those viruses that affect livestock, crops, and endangered species.
Dr. Artem Babaian leads the Serratus Project collaboration, which published its stunning results in the prestigious scientific journal Nature this week.
Working with the Cloud Innovation Centre, a public/private collaboration between UBC and Amazon Web Services, the Serratus Project was able to build a “ridiculously powerful” supercomputer on AWS equivalent in power to 22,500 CPUs, said Babaian.
The supercomputer read through 20 million gigabytes of publicly available gene sequence data from 5.7 million biological samples around the world, searching for the gene encoding RNA-dependent RNA polymerase (RdRP), whose presence indicates an RNA virus. The samples were collected and freely shared within the world research community over 13 years and include everything from ice-core samples to animal dung.
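The article does not describe the search step in detail. As a toy illustration only, the sketch below screens short DNA reads for a marker protein by translating each read in all six reading frames and counting amino-acid k-mers shared with a reference fragment. The function names, the marker fragment, the k-mer length (`k=5`) and the threshold (`min_shared=3`) are hypothetical placeholders; this is not the Serratus pipeline, and a real search would use a dedicated translated-alignment tool rather than exact k-mer matching.

```python
# Toy six-frame marker screen (hypothetical; not the Serratus implementation).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(CODONS, _AMINO))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons with unknown bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(read: str):
    """Yield protein translations of the three forward and three reverse frames."""
    for strand in (read.upper(), reverse_complement(read)):
        for offset in range(3):
            yield translate(strand[offset:])


def kmer_set(protein: str, k: int) -> set[str]:
    """All length-k substrings of a protein sequence."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def shared_kmers(read: str, marker_kmers: set[str], k: int) -> int:
    """Best k-mer overlap between any reading frame and the marker."""
    return max(len(kmer_set(frame, k) & marker_kmers) for frame in six_frames(read))


def screen_reads(reads: dict[str, str], marker_protein: str,
                 k: int = 5, min_shared: int = 3) -> list[str]:
    """Return names of reads that share at least min_shared k-mers with the marker."""
    marker_kmers = kmer_set(marker_protein, k)
    return [name for name, seq in reads.items()
            if shared_kmers(seq, marker_kmers, k) >= min_shared]


if __name__ == "__main__":
    # Hypothetical read and marker fragment, for illustration only.
    read = "ATGGGTGATGATAAACTGTGGCGTTTTGCACCAGAA"
    marker = translate(read)
    reads = {"hit": read, "hit_rc": reverse_complement(read), "miss": "A" * 30}
    print(screen_reads(reads, marker))  # prints ['hit', 'hit_rc']
```

Translating in all six frames matters because a sequencing read can come from either DNA strand and start at any codon position; the reverse-complemented read is still detected.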
Map of World Sequencing Data. Credit: Serratus Project
Researchers with the Serratus Project found 132,000 RNA viruses (only about 15,000 were previously known) and nine new species of coronaviruses. Babaian estimates that, without the Cloud Innovation Centre and the AWS Cloud, a traditional supercomputer would need well over a year and hundreds of thousands of dollars to complete the 2,000 years of CPU time this analysis required. Serratus accomplished it in 11 days for $24,000.
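The headline figures lend themselves to a quick back-of-the-envelope check. The sketch below derives per-sample, per-gigabyte and per-day rates from the rounded numbers quoted in the article (20 million GB, 5.7 million samples, 11 days, $24,000). The names `RunFigures` and `derived_rates` are illustrative, and the outputs are simple ratios, not figures reported by the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunFigures:
    """Headline figures as quoted in the article (rounded)."""
    data_gb: float = 20_000_000   # "20 million gigabytes" of sequence data
    samples: int = 5_700_000      # "5.7 million biological samples"
    days: float = 11              # wall-clock duration of the run
    cost_usd: float = 24_000      # total reported cost


def derived_rates(run: RunFigures) -> dict[str, float]:
    """Simple ratios computed from the headline figures."""
    return {
        "cost_per_sample_usd": run.cost_usd / run.samples,
        "cost_per_gb_usd": run.cost_usd / run.data_gb,
        "gb_per_sample": run.data_gb / run.samples,
        "gb_per_day": run.data_gb / run.days,
        "samples_per_day": run.samples / run.days,
    }


if __name__ == "__main__":
    for name, value in derived_rates(RunFigures()).items():
        print(f"{name}: {value:,.4f}")
```

With these inputs the cost works out to well under one cent per sample (about $0.004) and about $0.0012 per gigabyte, in line with the low per-query cost described later in the article.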
“We’re entering a new era of understanding the genetic and spatial diversity of viruses in nature, and how a wide variety of animals interface with these viruses. The hope is we’re not caught off guard if something like SARS-CoV-2, the novel coronavirus that causes COVID-19, emerges again. These viruses can be recognized more easily and their natural reservoirs can be found faster. The real goal is these infections are recognized so early that they never become pandemics,” said Babaian, who holds a PhD in medical genetics from UBC and is now a Banting Fellow at the University of Cambridge.
“If a patient presents with a fever of unknown origin, once that blood is sequenced, you can now connect that unknown virus in the human to a way bigger database of existing viruses. If a patient, for example, presents with a viral infection of unknown origin in St. Louis, you can now search through the database in about two minutes, and connect that virus to, say, a camel in sub-Saharan Africa sampled in 2012.”
a, Phylogram for group-E sequences. Six viruses similar to PsNV were found in Ambystoma mexicanum (axolotl; AmexNV), Puntigrus tetrazona (tiger barb; PtetNV), Hippocampus kuda (seahorse; HkudNV), Syngnathus typhle (broad-nosed pipefish; StypNV), Takifugu pardalis (fugu fish; TparNV) and Acanthemblemaria sp. (blenny; AcaNV). More-distant members were identified in Hypomesus transpacificus (the endangered delta smelt; HtraNV), Silurus sp. (catfish; SilNV) and Monopterus albus (Asian swamp eel; MalbNV). b, Unrooted phylogram for Coronaviridae annotated with genera (Greek letters) and group-E CoV-like nidoviruses (see also Extended Data Fig. 4). The maximum likelihood tree was generated by clustering the RdRP amino acid sequences at 97% identity to show sub-species variability. c, Genome structure of AmexNV and the contigs recovered from group-E CoV-like viruses, annotated with HMM matches. AmexNV contigs contain an identical 129-nt trailing sequence (Tr). All the putatively segmented CoV-like viruses are monophyletic with PsNV. A gap in the PsNV reference sequence (ref. 24) is shown with circles, overlapping the common contig ends seen in these viruses.
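The panel b caption mentions clustering RdRP amino acid sequences at 97% identity before building the tree, but does not say how the clustering was done. As an assumption, the sketch below uses greedy centroid clustering (the general strategy behind tools such as CD-HIT) on sequences that are already aligned to a common length. The function names, the identity definition (identical residues over columns that are not gaps in both sequences) and the longest-first ordering are illustrative choices, not the paper's method.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns, skipping columns
    where both sequences have a gap. Assumes a and b are pre-aligned."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering: sequences with the most residues are
    considered first; each joins the first centroid it matches at or above
    the threshold, otherwise it becomes a new centroid."""
    order = sorted(seqs, key=lambda name: (-len(seqs[name].replace("-", "")), name))
    clusters: dict[str, list[str]] = {}
    for name in order:
        for centroid in clusters:
            if pairwise_identity(seqs[name], seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

Only the centroids (one representative per cluster) would then go into tree building, which keeps the tree readable while still hinting at variation within each cluster.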
Babaian, 32, had been conducting cancer genetics research at BC Cancer when the COVID-19 pandemic hit, and he switched gears.
The work, which the understated Babaian says started as a “fun side project,” began on March 3, 2020, when he and his friend and climbing partner, UBC engineering student Jeff Taylor, sketched out the idea “on the back of a napkin.”
“I should have kept that napkin,” he noted.
Babaian approached UBC’s Cloud Innovation Centre for help shortly after, and Serratus was born. The project is named after Serratus Mountain in the Tantalus Range of British Columbia, which he and Taylor viewed during a climb in 2020.
Babaian recalled he was sitting on his wife’s nursing chair when the first results started to flash up on his laptop, indicating that Serratus was not only working, but producing data almost incomprehensibly fast.
“It was probably the most exciting scientific period of my life,” he said. “There are two types of fun. Type 1 is smiling and fun. Type 2 is when you’re miserable while doing it but the memory shines, like rock climbing. In many ways Serratus is Type 2 fun. You just kind of have to believe it’s going to work out.”
Babaian said he would not have been able to do this work without the support of the UBC Cloud Innovation Centre.
“The Cloud Innovation Centre was really there unlocking the doors for us,” he said. “We had an idea and they brought in experts from their networks to make it come to life. Now the global community can benefit from all this previously untapped research.”
“Artem approached us with an innovative vision. The power of the Cloud Innovation Centre is that we pair our in-house innovation and technology teams from UBC with those from Amazon Web Services,” said Marianne Schroeder, director of the UBC Cloud Innovation Centre. “It was our great privilege to support the realization of this vision; helping to find a technology solution for complex problems is what we do.”
The Centre, which launched in January 2020, just before the pandemic, supports challenges that focus on community health and wellbeing. To date, the team has published more than 20 projects, including reference architectures and deployment guides, all available as open source.
“While the public cloud as we know it has been around for 15 years, the last few years of innovation at Amazon Web Services have really made genomics research possible in a new way,” said Coral Kennett, who heads the Centre on behalf of Amazon Web Services. “We were able to give Artem access to compute power for pennies a query. We highly encourage the research community to submit their projects and ideas to the Cloud Innovation Centre so that more innovation comes to light benefitting the community.”
Source – The University of British Columbia
Availability – Serratus (v0.3.0) is available at https://github.com/ababaian/serratus.