Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour in SSSR routine - wrong sub rings #2641

Open
Mblakey opened this issue Nov 11, 2023 · 3 comments
Open

Unexpected behaviour in SSSR routine - wrong sub rings #2641

Mblakey opened this issue Nov 11, 2023 · 3 comments

Comments

@Mblakey
Copy link
Contributor

Mblakey commented Nov 11, 2023

Open Babel version: current
Operating system and version: linux or macos (tested both)

Looking at the following smiles: O1C23C4=C(C=C12)C1=COC(=C4OC3)C1

#1 (2)

I would expect to see a SSSR containing 5 subcycles, which is what is seen, however the subcycles generated for each OBRing* shows some weird behaviour.

running the following code on the Mol object :

unsigned int rc = 0;
FOR_RINGS_OF_MOL(r,mol){
  obring = &(*r);
  fprintf(stderr,"%d [ ",rc++);
  for(unsigned int i=0;i<obring->Size();i++){
    ratom = mol->GetAtom(obring->_path[i]);
    fprintf(stderr,"%d ",ratom->GetIdx());
  }
  fprintf(stderr,"]\n");
}

gives the following output:

0 [ 6 1 2 ]
1 [ 3 4 5 6 2 ]
2 [ 12 11 3 2 13 ]
3 [ 7 8 9 10 14 ]
4 [ 3 4 5 6 1 2 ]

Looking at ring (3), we see atom indexes that are not shared with any other subcycle in the list? which would be impossible if the subcycles are generated as i would expect? To confirm, the method in RDKit does not show this behaviour:

mol = rdkit.Chem.MolFromSmiles('O1C23C4=C(C=C12)C1=COC(=C4OC3)C1')
for sequence in rdmolops.GetSymmSSSR(mol):
    for i in sequence:
        print(f"{i} ", end='')
    print()

yields

[ 0 1 5 ]
[ 4 3 2 1 5 ]
[ 7 6 13 9 8 ]
[ 11 10 2 1 12 ]
[ 13 6 3 2 10 9 ]

Where overlap is seen for all the atoms across the ring set.

Happy to attempt a fix on this but some direction would be appreciated (if in fact this is a true bug, and not my misunderstanding). I believe its due to the 3 cycle, as for the majority of cases i see the correct overlaps.

@Mblakey Mblakey changed the title Unexpected behaviour in SSSR routine Unexpected behaviour in SSSR routine - wrong sub rings Nov 12, 2023
@fredrikw
Copy link
Contributor

I have been looking at this and I agree that it is a bug. The problem is that when pruning redundant rings (in

//remove larger rings that cover the same atoms as smaller rings
), overlaps are checked through atoms instead of bonds.
In this case, that results in the central ring (marked in red:
image
) getting pruned.
I could make a try to fix this, but I'm not too familiar with all the inner workings of the OBRing to be certain to fix it in the best way possible. I suppose the easiest way would be to add the attribute _bondset to the class, parallell to the _pathset and use that for pruning instead. Do you think there might be a performance issue here? I think maybe @timvdm was the last one to touch the code?
Another option would be to add code in OBRingSearch::RemoveRedundant such as in OBRing::visitRing:

    // Translate ring atom indexes to ring bond indexes.
    std::vector<unsigned int> bonds = atomRingToBondRing(mol, ring->_path);
    OBBitVec bondset;
    for (unsigned int i = 0; i < bonds.size(); ++i)
      bondset.SetBitOn(bonds[i]);

    //
    // Remove larger rings that cover the same bonds as smaller rings.
    //
    mask.Clear();
    for (unsigned int j = 0; j < rlist.size(); ++j) {
      std::vector<unsigned int> otherBonds = atomRingToBondRing(mol, rlist[j]->_path);
      OBBitVec bs;
      for (unsigned int i = 0; i < otherBonds.size(); ++i)
        bs.SetBitOn(otherBonds[i]);

      // Here we select only smaller rings.
      if (otherBonds.size() < bonds.size())
        mask |= bs;
    }

@Mblakey
Copy link
Contributor Author

Mblakey commented Nov 17, 2023

Hi @fredrikw ! Yes agreed definitely the wrong approach here, atom candidacy is not enough to decide which rings to be removed. I have seen the exclusive OR option for MCB on the bond set as you've suggested, not sure how fast this will be. but yes if this is the best way forward definitely agree a bond bit vector would be a clean solution.

I am currently looking at CDKs approach to attempt a fix on this, whats missing is the gaussian elimination step needed for MCB, will have a go this weekend and come back to this post.

Any help from @timvdm would be very much appreciated.

@fredrikw
Copy link
Contributor

I put together a quick fix for this in #2648 but there are a bunch of questions that needs answers before it is done...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants