Mol2 files bonds #54

btyukodi · 2018-07-06T16:15:33Z

PandasMol2().read_mol2() reads and parses a mol2 file, but the resulting DataFrame contains only the @<TRIPOS>ATOM section. Is there any way to access the bonds?

Thanks
Botond

rasbt · 2018-07-06T19:25:56Z

Hi there,

in the current version of biopandas, the Mol2 BOND section is not parsed yet. You would need to parse it from the raw text, which is attached to a PandasMol2 object pmol as pmol.mol2_text.
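As a rough illustration of parsing the bonds from the raw text, here is a minimal sketch that uses only the standard library. The function name parse_bond_records is hypothetical and not part of biopandas. It assumes each bond line carries at least four whitespace-separated fields (bond id, two atom ids, bond type) and it reads only the first BOND section it finds.

```python
def parse_bond_records(mol2_text):
    """Return one dict per bond line in the first @<TRIPOS>BOND section.

    Hypothetical helper, not part of biopandas. Extra fields on a bond
    line (for example status bits) are ignored.
    """
    records = []
    in_bonds = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            if in_bonds:
                break  # the next section header ends the BOND section
            in_bonds = stripped == "@<TRIPOS>BOND"
            continue
        if not in_bonds or not stripped:
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue  # skip lines that do not look like a bond entry
        records.append({
            "bond_id": int(fields[0]),
            "atom1": int(fields[1]),
            "atom2": int(fields[2]),
            "bond_type": fields[3],  # kept as text, e.g. "1", "2", "ar", "am"
        })
    return records
```

For a file that has a BOND section, passing pmol.mol2_text to this function and wrapping the result in pd.DataFrame(...) should give a table that can be indexed by bond_id, similar in spirit to the atom DataFrame.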

I think adding some parsing functionality for the BOND section would be a good thing to do!

Since not every virtual screening analysis/application needs the bonds, I would make parsing the BOND section optional rather than the default, for computational efficiency reasons.

A general "design" question now is whether a BOND DataFrame should be stored/accessed via

a) pmol.df_bonds (where pmol.df is currently storing the atom section)

or

b) pmol.df['bonds'] (where pmol.df would then have to be renamed to pmol.df['atom'], which would make it more consistent with PandasPdb())

pmol.parse_bonds() # method to parse the bonds section
pmol.df_bonds # dataframe containing the bonds section
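To make the opt-in idea concrete, here is a small sketch of option (a) using a hypothetical stand-in class (Mol2Sketch is not biopandas code). Bonds stay unparsed until parse_bonds() is called. The sketch stores plain tuples in a bonds attribute; a real implementation would presumably store a DataFrame as pmol.df_bonds instead.

```python
class Mol2Sketch:
    """Hypothetical stand-in for a PandasMol2-like object (not biopandas code)."""

    def __init__(self, mol2_text):
        self.mol2_text = mol2_text
        self.bonds = None  # stays None until parse_bonds() is called

    def parse_bonds(self):
        """Parse the BOND section on request and return self for chaining."""
        rows = []
        in_bonds = False
        for line in self.mol2_text.splitlines():
            line = line.strip()
            if line.startswith("@<TRIPOS>"):
                # Any section header switches bond collection on or off.
                in_bonds = line == "@<TRIPOS>BOND"
                continue
            if in_bonds and line:
                # Keep bond id, the two atom ids, and the bond type as text.
                rows.append(tuple(line.split()[:4]))
        self.bonds = rows
        return self
```

With this layout, code that never touches bonds pays no parsing cost, and the atom data can remain where it is today. Option (b) would instead place both tables in one dict keyed by section name.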

Feedback and comments (and PRs!) are welcome!

btyukodi · 2018-07-06T21:06:01Z

Hi, thanks for your answer. I just wrote a quick, dirty, and probably non-optimal parser. It could serve as a temporary solution:

import pandas as pd

def bond_parser(filename):
    with open(filename, 'r') as f:
        f_text = f.read()
    # Keep only the text after the BOND record, up to the next '@' section (or end of file).
    bond_text = f_text.split('@<TRIPOS>BOND', 1)[1].split('@', 1)[0]
    # Use the first four fields of each non-empty line; any further fields are dropped.
    rows = [line.split()[:4] for line in bond_text.splitlines() if line.strip()]
    df_bonds = pd.DataFrame(rows, columns=['bond_id', 'atom1', 'atom2', 'bond_type'])
    df_bonds.set_index(['bond_id'], inplace=True)
    return df_bonds

wlgfour · 2020-09-10T18:56:21Z

Here is a shorter version of what @btyukodi wrote that uses regex:

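import re
import numpy as np
import pandas as pd

def bond_parser(filename):
    with open(filename, 'r') as f:
        f_text = f.read()
    # Capture the text between the BOND record and the next '@' section (or end of file).
    bond_text = re.search(r'@<TRIPOS>BOND([^@]*)', f_text).group(1)
    # Assumes exactly four fields per bond line (no extra trailing columns).
    bonds = np.array(bond_text.split()).reshape((-1, 4))
    df_bonds = pd.DataFrame(bonds, columns=['bond_id', 'atom1', 'atom2', 'bond_type'])
    df_bonds.set_index(['bond_id'], inplace=True)
    return df_bonds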

rasbt · 2020-09-10T22:49:41Z

Thanks a lot. I totally forgot about this / got too busy to work on it. Thanks for sharing though, I will hopefully get to it one day (or maybe someone else will :))

mercicle · 2020-12-03T00:39:10Z

Hey there all, was this ever implemented?

rasbt · 2020-12-03T01:10:27Z

No, sorry, never really found the time to do that :(

Kroppeb · 2022-05-14T15:42:31Z

Seems like this is still an open issue?

rasbt · 2022-05-14T15:51:10Z

yeah, this is still an open issue

rasbt added the enhancement label Jul 6, 2018
