parse_gctx takes too much memory when requesting specific columns and rows #63

Open
Dom303 opened this issue Jan 15, 2020 · 2 comments
Dom303 commented Jan 15, 2020

When loading a very large GCTX file (~24 GB) on my laptop with 16 GB of RAM using the function cmapPy.pandasGEXpress.parse.parse, I run out of memory with the following error:

Unable to allocate array with shape (473647,) and data type

If I use cidx to select a small number of columns, the error goes away.
However, when I request specific columns and specific rows using both cidx and ridx, the same allocation error occurs. This suggests that the row filtering is applied first, followed by the column filtering. That behaviour is a problem for very large CMap files, where both filters should be applied simultaneously to avoid running out of RAM.
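Until the two filters are combined, one workaround is to request only a small batch of columns at a time and filter rows immediately after each load, so peak memory stays near one batch rather than the whole file. A minimal numpy sketch of the idea (the toy array stands in for the GCTX file, and each column slice stands in for one `parse(..., cidx=...)` call; the indices and batch size are illustrative):

```python
import numpy as np

# Toy stand-in for the on-disk matrix (the real GCTX file is ~24 GB).
full = np.arange(100 * 50, dtype=float).reshape(100, 50)

ridx = [3, 7, 9]                 # rows we actually want
cidx = list(range(0, 50, 5))     # columns we actually want

# Load a few columns at a time, filter rows immediately, then
# concatenate the row-filtered batches. Peak memory is roughly one
# batch of full-height columns, not the whole matrix.
batch = 4
parts = []
for start in range(0, len(cidx), batch):
    cols = cidx[start:start + batch]
    # full[:, cols] stands in for parse("file.gctx", cidx=cols)
    parts.append(full[:, cols][ridx, :])

subset = np.concatenate(parts, axis=1)
assert subset.shape == (len(ridx), len(cidx))
```

Because batches are taken in cidx order and rows are filtered identically in each batch, the result matches a single combined row/column selection.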

The problem comes from pandasGEXpress.parse_metadata_df, at the line curr_dset.read_direct(temp_array).
As called there, read_direct reads the entire dataset into temp_array with no way to filter rows or columns first.
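For comparison, here is a small numpy illustration of the memory shape of the two strategies (a plain array stands in for the HDF5 dataset, and the sizes are toy values): copying everything and filtering afterwards allocates a buffer the size of the whole dataset, while indexing rows and columns together materializes only the requested block.

```python
import numpy as np

# Stand-in for the HDF5 dataset behind read_direct.
dset = np.arange(6 * 8, dtype=float).reshape(6, 8)

ridx = [1, 4]
cidx = [0, 2, 5]

# What the current code effectively does: allocate a buffer for the
# whole dataset, copy everything, then filter afterwards.
temp_array = np.empty(dset.shape, dtype=dset.dtype)
np.copyto(temp_array, dset)          # the full allocation happens here
filtered_late = temp_array[np.ix_(ridx, cidx)]

# What combined filtering could do: materialize only the block asked for.
filtered_early = dset[np.ix_(ridx, cidx)]

assert np.array_equal(filtered_late, filtered_early)
```

Both paths return the same values; only the size of the intermediate allocation differs, which is exactly what matters for a 24 GB file on a 16 GB machine.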

@dhamelse

I am also experiencing this issue.

@mark-liddell

This is also causing issues for me.
