TileDB launches cross-language access to single-cell data

TileDB database first to implement an open data model and API specification, enabling rapid cloud-optimized analysis of large single-cell datasets

TileDB, the database for any complex data and compute, today announced the launch of TileDB-SOMA, the first collection of software libraries that implement the open-source SOMA API specification. SOMA and TileDB-SOMA are the result of a collaboration between the Chan Zuckerberg Initiative and TileDB to accelerate single-cell research by eliminating data silos and enabling large-scale computations that are otherwise too challenging to execute on commodity hardware.

New technologies and analysis tools have led to the exponential growth of single-cell RNA sequencing (scRNA-seq) data, requiring new solutions that can accommodate datasets at scale. Advancements in genomics technologies have also enabled researchers to combine multiple modalities of data collected from the same cell samples, increasing the complexity and impact of single-cell analysis.

“The unsaid assumption in single-cell research is that dataset size is bound by RAM, but instead of asking researchers to change their computational tools, we’re rethinking how the data model itself could do more heavy lifting for scientists,” said Stavros Papadopoulos, Founder & CEO, TileDB, Inc. “With TileDB-SOMA for R and Python, computational biologists can work across programming languages and combine data that was previously formatted specifically for Seurat, AnnData/Scanpy or Bioconductor. This breaks down data silos, and allows scientists to collaborate without the hassle of converting or duplicating data. Everyone can access the dataset, stored locally or in the cloud, at any scale.”

SOMA makes cloud-based, single-cell data readily available for analysis and rapid experimentation. SOMA is a flexible, open-source API spec designed to enable access to any dataset that can be modeled as groups of annotated sparse 2D matrices. Storage engines that implement the SOMA spec allow scientists to expand their research across a growing body of scRNA-seq data using existing computational tools. Initially developed to help researchers query large single-cell biology datasets directly in cloud storage without loading unneeded data into RAM, SOMA’s design requirements can be applied to a wide range of scientific data.
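To make the phrase "annotated sparse 2D matrices" concrete, here is a minimal sketch in plain Python. It is a toy illustration only and does not use or reproduce the SOMA API: the class name `AnnotatedSparseMatrix`, its `query` method, and the sample cells and genes are all hypothetical. It shows the general idea of selecting rows and columns by their annotations and returning only the matching nonzero values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class AnnotatedSparseMatrix:
    """Toy stand-in for one annotated sparse 2D matrix (NOT the SOMA API)."""

    obs: Dict[int, dict]  # row id -> annotations for that cell
    var: Dict[int, dict]  # column id -> annotations for that gene
    # Only nonzero entries are stored, keyed by (row id, column id).
    x: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def query(
        self,
        obs_filter: Callable[[dict], bool],
        var_filter: Callable[[dict], bool],
    ) -> Dict[Tuple[int, int], float]:
        """Return only the nonzero entries whose row and column annotations match."""
        rows = {i for i, ann in self.obs.items() if obs_filter(ann)}
        cols = {j for j, ann in self.var.items() if var_filter(ann)}
        return {
            (i, j): value
            for (i, j), value in self.x.items()
            if i in rows and j in cols
        }


# Hypothetical sample data: three cells, two genes, four nonzero counts.
matrix = AnnotatedSparseMatrix(
    obs={
        0: {"cell_type": "T cell"},
        1: {"cell_type": "B cell"},
        2: {"cell_type": "T cell"},
    },
    var={0: {"gene": "CD3E"}, 1: {"gene": "CD19"}},
    x={(0, 0): 5.0, (1, 1): 7.0, (2, 0): 3.0, (2, 1): 1.0},
)

# Select CD3E counts in T cells; everything else is left out of the result.
t_cell_cd3e = matrix.query(
    obs_filter=lambda ann: ann["cell_type"] == "T cell",
    var_filter=lambda ann: ann["gene"] == "CD3E",
)
print(t_cell_cd3e)
```

A real storage engine would push such filters down to cloud storage rather than scanning an in-memory dictionary; the sketch only illustrates the shape of the data model.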

The first two SOMA API implementations, TileDB-SOMA for Python (version 1.0) and TileDB-SOMA for R (pre-release), are based on the open-source, cloud-optimized TileDB storage engine. They allow single-cell researchers using different tools (AnnData/Scanpy and Seurat, with Bioconductor coming soon) to access large cloud-based datasets quickly and conveniently from different programming languages.

“By streamlining access to enormous datasets, powerful new tools like TileDB-SOMA will accelerate the research efforts of single-cell biologists,” said Ambrose Carr, a computational biologist and Director of Product Management for Single-Cell Biology at the Chan Zuckerberg Initiative. “Our engineering team collaborated closely with TileDB to build TileDB-SOMA as a readily accessible, cloud-based storage engine for single-cell datasets, so scientists have the ability to execute complex queries faster and more efficiently. Our team is excited for the launch of this new tool, which will solve some of the fundamental data accessibility challenges facing the single-cell community.”

