Add automatic data type inference to PandasTools.LoadSDF #7348
Hi there, import rdkit
Thanks. But it only converted the data types from object to string or category. Below are the test results:
Data types after converting:
In this case, I would prefer converting columns such as AMW and CLOGP, ..., to float, and LIPINSKI_VIOLATIONS and NUM_HACCEPTORS, ..., to integer.
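Until LoadSDF can infer types, the conversion described above has to be spelled out column by column. pandas' `convert_dtypes()` and `infer_objects()` do not parse numeric text, so string columns stay strings. The sketch below uses `DataFrame.astype` with a per-column mapping instead. It assumes a made-up DataFrame standing in for `PandasTools.LoadSDF` output, with every property stored as a string (object dtype); the values are illustrative, not real data.

```python
import pandas as pd

# Stand-in for e.g. PandasTools.LoadSDF("compounds.sdf") (hypothetical file):
# SD properties arrive as strings, so every column has dtype "object".
# The values below are made up for illustration.
df = pd.DataFrame({
    "ID": ["mol1", "mol2", "mol3"],
    "AMW": ["180.16", "151.16", "46.07"],
    "CLOGP": ["1.31", "0.46", "-0.03"],
    "LIPINSKI_VIOLATIONS": ["0", "0", "1"],
    "NUM_HACCEPTORS": ["4", "2", "1"],
}, dtype=object)

# Each target type has to be listed explicitly; astype() parses the strings.
df = df.astype({
    "AMW": float,
    "CLOGP": float,
    "LIPINSKI_VIOLATIONS": int,
    "NUM_HACCEPTORS": int,
})
print(df.dtypes)
```

This works, but it has to be repeated and kept in sync with the property names of every SD file, which is the manual effort the request aims to remove.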
The RDKit library's PandasTools.LoadSDF function currently cannot detect the data types of columns when loading data from an SDF file into a Pandas DataFrame: the SD properties are loaded as strings (object dtype). Users have to convert the data types manually, which can be time-consuming and error-prone.
I propose adding an optional dtype parameter to PandasTools.LoadSDF, similar to the one in pd.read_csv(). As with pd.read_csv(), column types would be inferred automatically by default (for example, numeric strings to float or integer), and dtype would let users override the inferred type for specific columns, reducing the manual effort required.
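The proposed behaviour can be approximated today as a post-processing step on the DataFrame that LoadSDF returns. The sketch below is hypothetical: `apply_sdf_dtypes` is not an RDKit function, and its `dtype` argument only mimics the column-to-type mapping accepted by `pd.read_csv()`. Columns named in `dtype` skip inference and are converted with `astype()`. Every other column is passed through `pd.to_numeric` and left unchanged if it does not parse. The input is a made-up stand-in for LoadSDF output, and `CORP_ID` is an invented column that shows why an override is useful.

```python
import pandas as pd


def apply_sdf_dtypes(frame, dtype=None):
    """Hypothetical sketch of type inference plus read_csv-style overrides.

    dtype: optional mapping of column name -> type. Listed columns are not
    inferred; they are converted with astype() instead.
    """
    dtype = dict(dtype or {})
    out = frame.copy()
    for col in out.columns:
        if col in dtype:
            continue  # explicit types take precedence over inference
        try:
            # int64 if every value is an integer literal, float64 otherwise
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # non-numeric column (e.g. ID or molecule objects): keep as-is
    return out.astype(dtype) if dtype else out


# Made-up stand-in for LoadSDF output (all properties are strings).
# CORP_ID is a hypothetical identifier whose leading zeros would be lost
# if it were inferred as an integer, so it is forced to str.
df = pd.DataFrame({
    "ID": ["mol1", "mol2"],
    "AMW": ["180.16", "151.16"],
    "NUM_HACCEPTORS": ["4", "2"],
    "CORP_ID": ["00123", "00456"],
}, dtype=object)

typed = apply_sdf_dtypes(df, dtype={"CORP_ID": str})
print(typed.dtypes)
```

The function uses try/except rather than `pd.to_numeric(..., errors="ignore")` because that option is deprecated in recent pandas releases. Integer properties that are missing for some molecules come back as float64, since NaN is a float. Calling `convert_dtypes()` on the result can turn such columns into nullable `Int64`.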
This would have several benefits:
Improved user experience and reduced errors
Increased efficiency when working with large or complex SDF files
Consistency with Pandas' pd.read_csv() function
This feature would be a valuable addition to the RDKit library, benefiting users who work with SDF data and Pandas DataFrames.
Thanks for your consideration.