by Rafal Gumienny
The Python ecosystem for computational biology is growing steadily and slowly catching up with R. A common task for bioinformaticians is comparing their own results against publicly available data.
If one wants to use Gene Expression Omnibus (GEO) resources, in most cases this requires switching from Python to R.
Such a switch can be inconvenient. It requires not only some knowledge of a second programming language but also a deeper understanding of the data structures returned by the R library, which can be challenging for a beginner.
To ease this process, researchers at the Swiss Institute of Bioinformatics developed a small Python library called GEOparse, a rough equivalent of GEOquery in R. GEOparse downloads and loads SOFT files from the Gene Expression Omnibus database. The data is loaded into easily digestible structures composed of pandas DataFrames and dictionaries. GEOparse also makes it easy to manipulate the data and run basic calculations, including filtering and annotation.
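As a rough illustration of those data structures, here is a minimal sketch of loading a series with GEOparse. It assumes GEOparse is installed and that network access is available. The helper names `load_series` and `flatten_metadata` are hypothetical and exist only for this example, and `GSE1563` is just an arbitrary example accession.

```python
from typing import Dict, List


def flatten_metadata(metadata: Dict[str, List[str]]) -> Dict[str, str]:
    """Join GEOparse-style metadata into plain strings.

    GEOparse stores metadata as a dictionary whose values are lists of
    strings; this hypothetical helper joins each list for easier printing.
    """
    return {key: "; ".join(values) for key, values in metadata.items()}


def load_series(accession: str, destdir: str = "./"):
    """Download a GEO series (or reuse a copy in destdir) and return it."""
    # Imported lazily so flatten_metadata can be used without GEOparse.
    import GEOparse

    return GEOparse.get_GEO(geo=accession, destdir=destdir)


# Example usage (requires network access and `pip install GEOparse`):
#
# gse = load_series("GSE1563")
# print(flatten_metadata(gse.metadata).get("title"))
# for name, gsm in gse.gsms.items():
#     # gsm.metadata is a dictionary; gsm.table is a pandas DataFrame
#     print(name, gsm.table.shape)
```

The series object exposes its samples through the `gsms` dictionary and its platforms through `gpls`. Each sample's `table` is an ordinary DataFrame, so the usual pandas filtering and merging operations apply to it directly.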