Idea: include a bloom filter in sparse array MBRs #2375

Open
gatesn opened this issue Jun 21, 2021 · 1 comment

Comments

@gatesn

gatesn commented Jun 21, 2021

My own motivation for this comes from modelling labelled dimensions with dictionary encoding, e.g. I have labels A: 0, B: 1, C: 2. When slicing an array for label B, any fragment/tile whose MBR covers labels A and C is considered relevant, even if it contains no cells for B, because the MBR only records the range [0, 2].
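To make the false-positive case concrete, here is a minimal sketch in Python. It is not TileDB code: `BloomFilter`, `mbr_overlaps`, the filter size and the hashing scheme are all assumptions chosen for illustration. It models a one-dimensional tile holding only labels A (0) and C (2) and compares a range check against a membership check for label B (1).

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored as a Python int

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def mbr_overlaps(mbr, value):
    """Range check as done against a 1-D MBR given as (low, high)."""
    low, high = mbr
    return low <= value <= high


# Dictionary-encoded labels: A -> 0, B -> 1, C -> 2.
tile_values = [0, 2]  # the tile holds cells for A and C only
mbr = (min(tile_values), max(tile_values))

bloom = BloomFilter()
for v in tile_values:
    bloom.add(v)

# The MBR cannot rule the tile out for B, because 1 lies inside [0, 2].
mbr_says_relevant = mbr_overlaps(mbr, 1)

# The bloom filter never reports a stored value as absent, and will
# usually (not always) report an unstored value such as 1 as absent.
bloom_says_relevant = bloom.might_contain(1)
```

A reader would only skip the tile when the MBR overlaps but the filter answers "definitely absent"; a "possibly present" answer still requires reading the tile, so the filter can only reduce work, never change results.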

I understand there may be discussions/thoughts on supporting labelled dimensions in a first-class way, so I'm not sure whether this idea is generally applicable beyond this use-case. I suspect it is, given that other formats support it, e.g. Parquet: https://github.com/apache/parquet-format/blob/master/BloomFilter.md. One other use-case that comes to mind is var-sized string/byte dimensions.

It might also make sense to first generalise the on-disk fragment metadata format to allow for arbitrary "metadata" extensions (bloom filters, value sets, other dimension statistics). This would make it easier to add further metadata in the future, and would enable forward-compatibility: old readers could still read files from newer writers by simply ignoring any metadata feature they don't support.
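One way such an extensible layout could work is a sequence of tag/length/payload records, where a reader skips any tag it does not recognise. The sketch below is only an assumption about a possible shape, not the actual TileDB fragment metadata format; the tag numbers, `write_extensions` and `read_extensions` are hypothetical.

```python
import struct

# Hypothetical record header: 2-byte tag, 4-byte payload length, little-endian.
HEADER = struct.Struct("<HI")

# Hypothetical tag assignments, for illustration only.
TAG_BLOOM_FILTER = 1
TAG_VALUE_SET = 2
TAG_FUTURE_FEATURE = 99


def write_extensions(extensions):
    """Serialise (tag, payload) pairs as consecutive tag/length/payload records."""
    out = bytearray()
    for tag, payload in extensions:
        out += HEADER.pack(tag, len(payload))
        out += payload
    return bytes(out)


def read_extensions(buf, supported_tags):
    """Return payloads for supported tags, skipping records with unknown tags."""
    found = {}
    offset = 0
    while offset < len(buf):
        tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if tag in supported_tags:
            found[tag] = buf[offset:offset + length]
        # The length prefix lets the reader step over payloads it cannot parse.
        offset += length
    return found


# A newer writer emits a record that an older reader does not know about.
blob = write_extensions([
    (TAG_BLOOM_FILTER, b"bloom-bytes"),
    (TAG_FUTURE_FEATURE, b"something new"),
    (TAG_VALUE_SET, b"A,C"),
])

# The older reader only supports two tags and ignores the rest.
old_reader_view = read_extensions(blob, {TAG_BLOOM_FILTER, TAG_VALUE_SET})
```

Because each record carries its own length, the older reader never has to understand the unknown payload in order to find the next record.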

@stavrospapadopoulos
Member

@gatesn this is already on our roadmap, along with dictionary compression, RLE compression for strings, and min/max values for attribute tiles. We hope to implement those soon.
