Hi @bgalvao, very good question! Many of the checks in the package don't actually need to run on all samples, so the first thing they do is sample the pandas DataFrame contained in the Dataset. I expect that to work with Dask, but of course it's better to test it firsthand. Other checks (mainly those that test data integrity or compare datasets) use the whole dataset and can fail if it is too large. It may indeed be possible to seamlessly switch from pandas to Dask (or Modin, for the option to use Ray or Dask with the same API).
Please let us know if that's something you're willing to tackle. If so, we can open an issue from this discussion.
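Just to illustrate the sampling step I mentioned: here's a rough sketch in plain pandas of how a check might downsample before running. Note that `sample_for_check` and `null_fraction_check` are hypothetical helpers for illustration, not the package's actual API.

```python
import pandas as pd

def sample_for_check(df: pd.DataFrame, n_samples: int = 10_000,
                     random_state: int = 42) -> pd.DataFrame:
    """Downsample a DataFrame before running a check (hypothetical helper)."""
    if len(df) <= n_samples:
        return df
    return df.sample(n=n_samples, random_state=random_state)

def null_fraction_check(df: pd.DataFrame) -> float:
    # Toy "integrity" check: fraction of missing values across the whole frame.
    return float(df.isna().sum().sum()) / df.size

# A frame too large to check in full; column "b" is entirely missing.
big = pd.DataFrame({"a": range(100_000), "b": [None] * 100_000})
small = sample_for_check(big, n_samples=1_000)
print(len(small))                   # 1000
print(null_fraction_check(small))   # 0.5
```

With Dask the same pattern could apply by sampling on the Dask side (e.g. `ddf.sample(frac=...)`) and calling `.compute()` to hand a small pandas frame to the check, so only the whole-data checks would need real changes.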
Greetings! This project is amazing, and I'm already experimenting with writing tests for a CI/CD pipeline.
I'm starting to wonder: what happens if the dataset is too large for pandas? Would it be seamless to switch to Dask or Polars?
I would love to work on a pull request if it turns out to be necessary 😁