---
title: 'BioPandas: Working with molecular structures in pandas DataFrames'
tags:
  - bioinformatics
  - computational biology
  - protein structure analysis
  - protein-ligand docking
  - virtual screening
authors:
  - name: Sebastian Raschka
    orcid: 0000-0001-6989-4493
    affiliation: 1
affiliations:
  - name: Michigan State University, East-Lansing, USA
    index: 1
date: 31 May 2017
bibliography: paper.bib
---

Summary

BioPandas is a Python library that reads molecular structures from 3D-coordinate files, such as PDB [@Berman2000] [@berman2003announcing] and MOL2, into pandas DataFrames [@mckinney2010data] for convenient data analysis and data mining.
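To make the idea of "structure file to table" concrete, the following is a minimal sketch of how fixed-column PDB `ATOM` records can be turned into one row per atom. It uses only the Python standard library and is not BioPandas's own implementation; the helper name `parse_atom_records` and the column names are assumptions chosen for illustration. A list of dicts like this could then be handed to `pandas.DataFrame(rows)`.

```python
# Minimal sketch: fixed-column parsing of PDB ATOM records into row dicts.
# Column positions follow the PDB format; the helper name and the
# dictionary keys are hypothetical, chosen for illustration only.

def parse_atom_records(pdb_text):
    """Return one dict per ATOM line, with typed values."""
    rows = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HEADER, HETATM, TER, and other record types
        rows.append({
            "atom_number": int(line[6:11]),
            "atom_name": line[12:16].strip(),
            "residue_name": line[17:20].strip(),
            "chain_id": line[21],
            "residue_number": int(line[22:26]),
            "x_coord": float(line[30:38]),
            "y_coord": float(line[38:46]),
            "z_coord": float(line[46:54]),
        })
    return rows


SAMPLE = """\
HEADER    HYDROLASE
ATOM      1  N   SER A   2       2.527  54.656  -1.667  1.00 52.73           N
ATOM      2  CA  SER A   2       3.259  54.783  -0.368  1.00 52.54           C
"""

rows = parse_atom_records(SAMPLE)

# Once atoms are rows, selecting a subset is an ordinary filter.
ca_atoms = [r for r in rows if r["atom_name"] == "CA"]
print(len(rows), ca_atoms[0]["x_coord"])
```

With the atoms in tabular form, selections such as "all C-alpha atoms of chain A" become plain row filters, which is the convenience the DataFrame representation is meant to offer.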

Beyond parsing protein and small-molecule data into a data frame format, BioPandas provides utility functions for structure analysis. These cover common computations such as the root-mean-square deviation (RMSD) between structures and the conversion of protein structures into primary amino acid sequence formats.
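The two computations named above can be sketched in a few lines. This is a standard-library illustration under stated assumptions, not the library's API: the function names `rmsd` and `to_sequence` are hypothetical, the RMSD assumes the atoms are already paired one-to-one and the structures already superimposed, and only the 20 standard amino acids are mapped.

```python
import math

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes atom i in coords_a corresponds to atom i in coords_b and
    that no further superposition is needed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


def to_sequence(residue_names, unknown="?"):
    """Join one-letter codes; non-standard residues map to `unknown`."""
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in residue_names)


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, model))
print(to_sequence(["SER", "ALA", "GLY"]))
```

In practice the residue names would come from one row per residue (for example, the C-alpha atoms), so that each residue contributes exactly one letter to the sequence.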

Furthermore, small-molecule functions are provided for quickly and efficiently reading and parsing millions of small-molecule structures (from multi-MOL2 files [@tripos2007tripos]) in virtual screening applications. Built-in functions for filtering molecules by the presence of functional groups and by the pairwise distances between those groups make BioPandas a particularly attractive utility library for virtual screening and protein-ligand docking applications.
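As a rough illustration of this screening workflow, the sketch below streams a multi-MOL2 file one molecule at a time and keeps only molecules in which two atom types lie within a distance cutoff. It is a standard-library sketch, not the BioPandas implementation: the names `iter_mol2_blocks`, `atom_coords_by_type`, and `has_pair_within` are hypothetical, SYBYL atom types stand in for functional groups, and the two types passed to the filter are assumed to differ. It requires Python 3.8 or later for `math.dist`.

```python
import io
import math


def iter_mol2_blocks(lines):
    """Yield (molecule_name, molecule_lines), holding one molecule in memory."""
    def emit(block):
        # The line after the @<TRIPOS>MOLECULE tag is the molecule name.
        name = block[1].strip() if len(block) > 1 else ""
        return name, block

    block = []
    for line in lines:
        if line.startswith("@<TRIPOS>MOLECULE"):
            if block:
                yield emit(block)
            block = [line]
        elif block:
            block.append(line)
    if block:
        yield emit(block)


def atom_coords_by_type(mol2_lines):
    """Map each atom type (e.g. 'O.2') to its list of (x, y, z) coordinates."""
    coords = {}
    in_atoms = False
    for line in mol2_lines:
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.startswith("@<TRIPOS>ATOM")
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6:
            xyz = tuple(float(v) for v in fields[2:5])
            coords.setdefault(fields[5], []).append(xyz)
    return coords


def has_pair_within(mol2_lines, type_a, type_b, max_dist):
    """True if any type_a atom lies within max_dist of any type_b atom."""
    coords = atom_coords_by_type(mol2_lines)
    return any(
        math.dist(p, q) <= max_dist
        for p in coords.get(type_a, [])
        for q in coords.get(type_b, [])
    )


SAMPLE = """\
@<TRIPOS>MOLECULE
mol_A
 2 1 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1         0.0000    0.0000    0.0000 O.2       1 LIG 0.0000
      2 N1         3.0000    0.0000    0.0000 N.3       1 LIG 0.0000
@<TRIPOS>BOND
     1     1     2    1
@<TRIPOS>MOLECULE
mol_B
 2 1 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1         0.0000    0.0000    0.0000 O.2       1 LIG 0.0000
      2 N1         9.0000    0.0000    0.0000 N.3       1 LIG 0.0000
@<TRIPOS>BOND
     1     1     2    1
"""

# An open file object can be passed in place of io.StringIO.
hits = [
    name
    for name, block in iter_mol2_blocks(io.StringIO(SAMPLE))
    if has_pair_within(block, "O.2", "N.3", 4.0)
]
print(hits)
```

Because the generator yields each molecule as soon as the next `@<TRIPOS>MOLECULE` tag is seen, memory use stays bounded by the largest single molecule, which matters when a file holds millions of structures.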

References