Dataset filtering support for Iceberg table #2029

Open
PaoloLeonard opened this issue Mar 6, 2024 · 2 comments

Comments

@PaoloLeonard
Contributor

Hello everyone,

The current implementation of dataset filtering in Soda Core relies heavily on the presence of timestamp columns to perform time-based data filtering. However, with the growing adoption of Iceberg tables, it's become evident that not all Iceberg tables utilise a timestamp column for partitioning or time-based queries.
This limitation poses challenges for people leveraging Iceberg tables in their data lakes, as they might encounter difficulties in using Soda Core's dataset filtering features effectively.

Would it be possible to add dataset filtering support for Iceberg tables to Soda Core?

Thanks!

@tools-soda

SAS-2997

@benjamin-pirotte

Hi Paolo, I am not familiar with Iceberg tables. Could you let me know why dataset filters would not work on such tables? What would the partitioning mechanism be when there is no timestamp column?

Note that you can use dataset filters on any column type; it doesn't have to be a timestamp. A dataset filter accepts a SQL WHERE clause, which makes it quite flexible.
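To illustrate the point about non-timestamp columns, here is a minimal sketch that renders a SodaCL dataset filter keyed on a string and an integer column. The helper `build_dataset_filter`, the dataset `events`, the filter name `eu_partition`, and the columns `region`, `batch_id`, and `user_id` are all hypothetical names chosen for the example; the generated `filter ... [name]:` / `checks for ... [name]:` layout follows the documented SodaCL dataset filter syntax as I understand it, so please verify it against your Soda Core version.

```python
def build_dataset_filter(dataset: str, filter_name: str, where: str, checks: list[str]) -> str:
    """Render a SodaCL snippet with a named dataset filter and checks that use it.

    Hypothetical helper for illustration only; it is not part of Soda Core.
    """
    lines = [
        f"filter {dataset} [{filter_name}]:",
        f"  where: {where}",  # any SQL WHERE expression, not only timestamps
        "",
        f"checks for {dataset} [{filter_name}]:",
    ]
    lines += [f"  - {check}" for check in checks]
    return "\n".join(lines) + "\n"


# Assumed example: filter on a string column and an integer column.
# ${region} and ${min_batch} are placeholders meant to be supplied as scan variables.
sodacl = build_dataset_filter(
    dataset="events",
    filter_name="eu_partition",
    where="region = '${region}' AND batch_id >= ${min_batch}",
    checks=["row_count > 0", "missing_count(user_id) = 0"],
)
print(sodacl)
```

The resulting text can be saved as a checks file and run like any other SodaCL file. For an Iceberg table, the WHERE expression would reference whichever columns the table is actually partitioned on.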

Thanks a lot!

3 participants