Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added test case in CDKAtomTypeMatcherFilesTest that gives rise to an NPE #919

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

uli-f
Copy link
Member

@uli-f uli-f commented Oct 17, 2022

No description provided.

@uli-f uli-f mentioned this pull request Oct 17, 2022
@johnmay
Copy link
Member

johnmay commented Oct 17, 2022

Bond Type 8 is a query bond (any) and not for representing molecules. We could support this with the ChemAxon extensions but this is not a bug but rather out of scope for now. It's the same with aromatic bonds (4).

Note V3000 has bond type 9 and 10 for dative and hbonds and this is much more portable.

@uli-f
Copy link
Member Author

uli-f commented Oct 17, 2022

Thank you for the explanation.

The file was read from V3000 and written to V2000 by ChemAxon.

The only thing I was looking for here is a more helpful exception. The NPE doesn't tell me anything about what is going on.

Would you consider it feasible to throw an UnsupportedOperationException instead just to make it clear what the problem is?

@egonw
Copy link
Member

egonw commented Oct 17, 2022

The only thing I was looking for here is a more helpful exception. The NPE doesn't tell me anything about what is going on.

This is what the Strict reading mode is about. I guess it should throw a CDKException with a readable message.

I can contribute to this one.

@egonw egonw assigned egonw and uli-f Oct 17, 2022
@johnmay
Copy link
Member

johnmay commented Oct 17, 2022

The file was read from V3000 and written to V2000 by ChemAxon.

Why not read the V3000 directly?

@uli-f
Copy link
Member Author

uli-f commented Oct 17, 2022

Why not read the V3000 directly?

Totally valid question 😃
That is probably me being very conservative and still assuming that V3000 isn't as widespread as V2000.
Seems like a good op to re-consider 🧠

Irrespective of that throwing a more informative exception might be worthwhile doing.

@uli-f
Copy link
Member Author

uli-f commented Oct 17, 2022

This is what the Strict reading mode is about. I guess it should throw a CDKException with a readable message.

I can contribute to this one.

@egonw Sounds good to me. Do you want me to move the test to MDLV2000ReaderTest?

If the idea is to throw a CDKException in case a query bond is encountered in Mode.STRICT what is the expected outcome when reading a query bond in Mode.RELAXED? Just log an error and move on?

@johnmay
Copy link
Member

johnmay commented Oct 17, 2022

Note it does give you back a query atom container...

    public static void main(String[] args) throws CDKException {
        String ctfile = "\n" +
                "  Mrv2204 10172221282D          \n" +
                "\n" +
                "  2  1  0  0  0  0            999 V2000\n" +
                "   -7.0097    0.3928    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\n" +
                "   -6.1810    1.8211    0.0000 Ge  0  0  0  0  0  3  0  0  0  0  0  0\n" +
                "  1  2  8  0  0  0  0\n" +
                "M  STY  1   1 DAT\n" +
                "M  SAL   1  2   1   2\n" +
                "M  SDT   1 MRV_COORDINATE_BOND_TYPE                              \n" +
                "M  SDD   1     0.0000    0.0000    DR    ALL  0       0  \n" +
                "M  SED   1 1\n" +
                "M  END";
        MDLV2000Reader reader = new MDLV2000Reader(new StringReader(ctfile));
        IAtomContainer mol = reader.read(SilentChemObjectBuilder.getInstance().newAtomContainer());
        if (mol instanceof IQueryAtomContainer) {
            System.err.println("You have a query features in your molfile! I can't do much with it beyond using it as a query.");
        } else {
            // all ok!
        }
    }

@johnmay
Copy link
Member

johnmay commented Oct 17, 2022

It is sometimes possible to convert queries into molecules but we hesitate to do this automatically since it can introduce errors.

@uli-f
Copy link
Member Author

uli-f commented Oct 17, 2022

Note it does give you back a query atom container...

Uh, okay.
That doesn't make things easier, does it...?

Which brings me back to how this issue started: Does it make more sense to handle this in CDKAtomTypeMatcher? Maybe simply by ignoring atoms that are IQueryAtoms when atom types are perceived?

@johnmay
Copy link
Member

johnmay commented Oct 17, 2022

That doesn't make things easier, does it...?

If you try to treat a query as molecule you're going to have a bad time. It's simply the limit of what you can do in MOLfiles - short of customer ChemAxon extensions which are by no means defacto.

@uli-f
Copy link
Member Author

uli-f commented Oct 18, 2022

If you try to treat a query as molecule you're going to have a bad time. It's simply the limit of what you can do in MOLfiles - short of customer ChemAxon extensions which are by no means defacto.

Sorry for not making this clear: It is not my intention to treat query molecules as regular non-query molecules.

I definitely want to drop those query molecules within my processing pipeline. However, at the moment, they crash the flow of the pipeline by raising an exception and I was looking for a more specific (and thus also being a bit more informative) exception to catch instead of the NPE.

The obvious alternative seems to be that I more carefully validate what I am processing (i.e., by discarding any molecule that has query features). I am very happy to go down this route.

@johnmay
Copy link
Member

johnmay commented Oct 18, 2022

I definitely want to drop those query molecules within my processing pipeline. However, at the moment, they crash the flow of the pipeline by raising an exception and I was looking for a more specific (and thus also being a bit more informative) exception to catch instead of the NPE.

The obvious alternative seems to be that I more carefully validate what I am processing (i.e., by discarding any molecule that has query features). I am very happy to go down this route.

Yeah that's the intention, it gives you back a query molecule, if you have manual ways to clean that up then it may be possible to do something with it.

@johnmay
Copy link
Member

johnmay commented Oct 18, 2022

But yes running CdkAtomTypeMatcher on a query should throw a better error!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants