2D input to create molecular grid image #5917
Replies: 6 comments 8 replies
-
@bertiewooster I can definitely see the use of this and think it's a good idea. So the input would be something like [...]
As for the name... how about something like [...]
-
Some questions about tests for code as I prepare to write tests:
I get a similar error if I run [...]
I checked the HowToContribute wiki, including the presentation An RDKit-centric intro to git and github, and unfortunately they didn't address these questions. I'll add the answers to a Contribute to RDKit page so they're available in the future.
-
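Good news: I've largely completed the coding, have some tests in place with unittest, and am adding more tests as I learn how to parametrize unittest. Bad news: I've run into problems setting up a development environment.
Trouble setting up development environment
Is it possible to set up a development environment for the RDKit on a Mac, specifically with an M2 chip? I tried to follow the instructions How to build from source with Conda: macOS 10.12 (Sierra): Python 3 environment, but this sequence of problems occurred:
(rdkit-clean) ***@***.*** rdkit-clean % make
/Users/myUserName/miniconda3/envs/rdkit-clean/include/boost/python/detail/wrap_python.hpp:57:11: fatal error: 'pyconfig.h' file not found
I learned that meant that I needed to install apt, which I tried:
(rdkit-clean) ***@***.*** rdkit-clean % sudo apt install python3.10-dev
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java
I installed Java but then got this error:
(rdkit-clean) ***@***.*** rdkit-clean % sudo apt install python3.10-dev
The operation couldn’t be completed. Unable to locate a Java Runtime that supports apt.
Please visit http://www.java.com for information on installing Java.
I learned that "by default Mac does not support APT commands". Is there some solution?
Containerized development environment?
Alternatively, is it possible to lower the bar for code contributions by simplifying this process, for example by setting up a containerized development environment using e.g. Docker?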
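For anyone hitting the 'pyconfig.h' file not found error reported in this thread: in a conda environment the Python headers normally ship with the python package itself, so it may be worth checking where the active interpreter expects pyconfig.h before installing anything else. The snippet below is only a diagnostic sketch using the standard-library sysconfig module; the function name find_pyconfig is made up for illustration, and it does not fix the build by itself.

```python
import sysconfig
from pathlib import Path
from typing import Tuple


def find_pyconfig() -> Tuple[Path, bool]:
    """Return the path where this interpreter expects pyconfig.h, and whether it exists.

    find_pyconfig is a hypothetical helper name, not part of RDKit or its build.
    """
    header = Path(sysconfig.get_config_h_filename())
    return header, header.is_file()


if __name__ == "__main__":
    header, found = find_pyconfig()
    print(f"Expected header: {header}")
    print(f"Present on disk: {found}")
```

Run it with the interpreter from the environment used for the build (here, rdkit-clean). If the header is present, the likely remaining question is whether the build configuration is pointing at that same directory.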
-
Hi Jeremy,
Getting the build environment right is often the most challenging part of
getting started with RDKit development, in my experience. I attach the
YAML file for the conda env I use for building, and the associated cmake
command. I then use the following to build RDKit into the python
environment so defined:
export RDBASE=/Users/david/Projects/RDKit/Master
export PYTHONPATH=$RDBASE; export DYLD_LIBRARY_PATH=$RDBASE/lib; export DYLD_FALLBACK_LIBRARY_PATH="$RDBASE/lib:$PYROOT/lib"
*RDKit* (make -j 6 install ; find $RDBASE/rdkit -name \*.so -exec install_name_tool -add_rpath $RDBASE/lib {} \; -print )
I created the conda env with:
mamba create -n RDKit.Build
mamba activate RDKit.Build
mamba install -c anaconda numpy matplotlib
mamba install -c anaconda cmake cairo pillow eigen pkg-config
mamba install -c anaconda boost-cpp boost py-boost
I use mamba rather than conda, but if you're using conda you should just be
able to substitute 'conda' for 'mamba' in the above and remove the '-c
anaconda' bit. Greg told me this week that you can do it directly with
mamba, if you don't include the py-boost package, in which case you
wouldn't need the '-c anaconda' bit with the mamba command either. The
py-boost in mamba is included in the boost package, IIRC. I have not yet
got round to trying this.
I normally do my development on an M1 mac mini, but this set of commands
also works on a new M1 MacBook Pro and an old intel-based MBP on both
Monterey and Ventura. Your mileage may vary, obviously. People have
helpfully sent me cmake commands in the past which work for them that I
have struggled with.
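Since the setup above relies on PYTHONPATH pointing at $RDBASE, one quick sanity check after a build is to ask Python where it would load rdkit from and whether that location sits under RDBASE. This is only a sketch using the standard library; the helper names package_location and is_under are invented for illustration.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the file Python would load for `name`, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_under(path: Path, base: str) -> bool:
    """Return True if `path` lies inside the directory `base`."""
    try:
        path.relative_to(Path(base).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    location = package_location("rdkit")
    rdbase = os.environ.get("RDBASE")
    if location is None:
        print("rdkit is not importable from this interpreter")
    elif rdbase is None:
        print(f"rdkit found at {location}, but RDBASE is not set")
    else:
        print(f"rdkit found at {location}; under RDBASE: {is_under(location, rdbase)}")
```

If the reported location is a conda-installed copy rather than the source tree, the environment variables are probably not being picked up by the shell that launches Python.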
Hope this helps,
Dave
…On Wed, Jan 18, 2023 at 4:26 AM Jeremy Monat ***@***.***> wrote:
Good news: I've largely completed the coding, have some tests in place
with unittest, and am adding more tests as I learn how to parametrize
unittest. Bad news: I've run into problems setting up a development
environment--
Trouble setting up development environment
Is it possible to set up a development environment for the RDKit on a Mac,
specifically with an M2 chip? I tried to follow the instructions How to
build from source with Conda: macOS 10.12 (Sierra): Python 3 environment
<https://www.rdkit.org/docs/Install.html#macos-10-12-sierra-python-3-environment>
but this sequence of problems occurred:
(rdkit-clean) ***@***.*** rdkit-clean % make
/Users/myUserName/miniconda3/envs/rdkit-clean/include/boost/python/detail/wrap_python.hpp:57:11: fatal error: 'pyconfig.h' file not found
I learned that meant that I needed to install apt
<https://stackoverflow.com/questions/57244655/cannot-build-boost-python-library-fatal-error-pyconfig-h-no-such-file-or-dire#57244837>,
which I tried:
(rdkit-clean) ***@***.*** rdkit-clean % sudo apt install python3.10-dev
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java
I installed Java but then got this error:
(rdkit-clean) ***@***.*** rdkit-clean % sudo apt install python3.10-dev
The operation couldn’t be completed. Unable to locate a Java Runtime that supports apt.
Please visit http://www.java.com for information on installing Java.
I learned that "by default Mac does not support APT commands"
<https://stackoverflow.com/questions/58598795/why-cant-i-run-apt-install-on-mac>.
Is there some solution?
Containerized development environment?
Alternatively, is it possible to lower the bar for code contributions by
simplifying this process, for example by setting up a containerized
development environment using e.g. Docker?
--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
-
PS
I re-read my email twice before posting, and still failed to notice that I
included my prompt in one of the lines. It should say
(make -j 6 install ; find $RDBASE/rdkit -name \*.so -exec install_name_tool -add_rpath $RDBASE/lib {} \; -print )
on the 3rd line of the build commands. Sorry.
-
Noting current code flow as an aid during development:
graph TD;
A[MolsMatrixToGridImage]-->B[_MolsNestedToLinear];
B-->C[MolsToGridImage pre-existing];
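The flow in the diagram could look roughly like the sketch below. The two function names come from the diagram, but the signatures, the legendsMatrix and blankMol parameters, and the injected drawFn argument are assumptions made so the example runs without RDKit; in real use drawFn would be the pre-existing Draw.MolsToGridImage.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def _MolsNestedToLinear(
    molsMatrix: Sequence[Sequence[Any]],
    legendsMatrix: Optional[Sequence[Sequence[str]]] = None,
    blankMol: Any = None,
) -> Tuple[List[Any], int, Optional[List[str]]]:
    """Pad each row to the longest row's length, then flatten row by row.

    Assumes legendsMatrix, when given, has the same shape as molsMatrix.
    """
    molsPerRow = max((len(row) for row in molsMatrix), default=0)
    mols: List[Any] = []
    for row in molsMatrix:
        mols.extend(row)
        mols.extend([blankMol] * (molsPerRow - len(row)))
    legends: Optional[List[str]] = None
    if legendsMatrix is not None:
        legends = []
        for row in legendsMatrix:
            legends.extend(row)
            legends.extend([""] * (molsPerRow - len(row)))
    return mols, molsPerRow, legends


def MolsMatrixToGridImage(
    molsMatrix: Sequence[Sequence[Any]],
    drawFn: Callable[..., Any],
    legendsMatrix: Optional[Sequence[Sequence[str]]] = None,
    **kwargs: Any,
) -> Any:
    """Convert nested input to linear input, then delegate to the 1D drawing function."""
    mols, molsPerRow, legends = _MolsNestedToLinear(molsMatrix, legendsMatrix)
    # Any other keyword arguments are passed through untouched.
    return drawFn(mols, molsPerRow=molsPerRow, legends=legends, **kwargs)


def record_call(mols, molsPerRow, legends=None, **kwargs):
    """Stand-in for the drawing function: returns what it was called with."""
    return {"mols": list(mols), "molsPerRow": molsPerRow, "legends": legends, "extra": kwargs}


if __name__ == "__main__":
    # Strings stand in for molecule objects.
    print(MolsMatrixToGridImage([["A", "B"], ["C", "D", "E"]], record_call))
```

Keeping the nested-to-linear step in its own private function means it can be tested without drawing anything, which is the main reason for the split in this sketch.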
-
This post is to propose a two-dimensional (nested) list input option to create molecular grid images, and to solicit feedback on how to proceed. I'd be happy to tackle any coding that can be done in Python.
Motivation
Using Draw.MolsToGridImage, I have been creating functions where each row has some meaning, for example a molecule and its groups off a core, or level of fragmentation in mass spectrometry:
[Figure: Annotated to explain columns]
[Figure: Mass spectrometry fragmentation tree]
[Figure: Annotated to explain rows]
As a conceptual example, the two-dimensional list of molecules [[molA, molB], [molC, null_mol, molD]], where null_mol is a molecule with no atoms (its SMILES string is ''), produces a molecular grid image like:
[Figure]
High-level algorithm
The general procedure I have used is:
[...] Draw.MolsToGridImage on the one-dimensional lists, setting molsPerRow to the length of the longest sub-list.
I think it would be useful to incorporate into RDKit a way to handle steps 2-4 automatically. You would provide a two-dimensional list (a list of sub-lists), where each sub-list represents a row, and the code would handle creating the molecular grid image. That would make it easy for users to conceive and implement a two-dimensional molecular grid image, where the row might represent anything useful to their application.
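As a rough illustration of the 2D-to-1D transformation, here is a minimal sketch that assumes shorter rows are padded with a filler item up to the length of the longest row and the rows are then concatenated. The function name pad_and_flatten is hypothetical, and plain strings stand in for molecule objects so the example runs without RDKit.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pad_and_flatten(rows: Sequence[Sequence[T]], filler: T) -> Tuple[List[T], int]:
    """Pad every row to the longest row's length, then concatenate the rows.

    Returns the flat list and the row width (the value to use for molsPerRow).
    """
    width = max((len(row) for row in rows), default=0)
    flat: List[T] = []
    for row in rows:
        flat.extend(row)
        flat.extend([filler] * (width - len(row)))
    return flat, width


if __name__ == "__main__":
    # "" plays the role of null_mol in this stand-in example.
    rows = [["molA", "molB"], ["molC", "", "molD"]]
    flat, width = pad_and_flatten(rows, filler="")
    print(flat, width)
    # Item i of the flat list lands at grid position (row, column) = divmod(i, width).
    for i, item in enumerate(flat):
        print(divmod(i, width), repr(item))
```

With RDKit, the flat list would then go to Draw.MolsToGridImage with molsPerRow set to the returned width, and the filler would be a molecule with no atoms, as in the null_mol example.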
Request feedback on how to proceed
If folks agree, here are some ideas of how to proceed:
1. [...] Draw.Mols2dToGridImage, which requires a two-dimensional list and does not have a molsPerRow parameter, but otherwise has the same inputs and output as Draw.MolsToGridImage. The new function would perform steps 2-4 above for mols, legends, highlightAtomLists, and highlightBondLists, then call Draw.MolsToGridImage with molsPerRow equal to the length of the longest sub-list, and pass through any other arguments (subImgSize, useSVG, returnPNG, **kwargs).
   [...] molsPerRow parameter can be supplied. This new function only does one thing: transform 2D input into 1D input that Draw.MolsToGridImage can accept.
2. [...] Draw.MolsToGridImage so that it detects whether the input is a nested list. If so, it performs steps 2-4.
   [...] molsPerRow value. A user might accidentally pass in a nested list, then be confused when the RDKit formats their molecular grid image in an unexpected way. Complexity would be added to the code because there would be two types of inputs, 1D or 2D lists.
I believe option 1 is better.
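To make the concern about option 2 concrete, here is a minimal sketch of the kind of check a single function would need in order to tell nested input from flat input. The name looks_nested and the rule it applies (every element is a list or tuple) are assumptions for illustration only, not anything taken from RDKit.

```python
from typing import Any, Sequence


def looks_nested(mols: Sequence[Any]) -> bool:
    """Guess whether `mols` is a list of rows rather than a flat list of molecules."""
    return len(mols) > 0 and all(isinstance(item, (list, tuple)) for item in mols)


if __name__ == "__main__":
    # Strings stand in for molecule objects.
    print(looks_nested(["molA", "molB"]))            # flat input
    print(looks_nested([["molA"], ["molB", "molC"]]))  # nested input
    print(looks_nested([["molA"], "molB"]))          # mixed input: treated as flat here
    print(looks_nested([]))                          # empty input: treated as flat here
```

The mixed and empty cases show where a guess like this has to pick a side, which is part of the extra complexity that a separate function with an explicit 2D contract avoids.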