Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

Open
ptbannister opened this issue Jun 3, 2021 · 2 comments
Open

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

ptbannister opened this issue Jun 3, 2021 · 2 comments

Comments

@ptbannister
Copy link

TieredSFCIndexStrategy calculates query ranges that are always overinclusive on the edges. This behavior is good for some use cases, such as asking for data that intersects the queried data. But for other use cases, such as asking for data that contains the queried data, it searches an excessive amount of data.

For a good example of when we would want to parameterize whether we should be overinclusive on edges, consider the square between coordinates (0,0) and (1,1), with a TieredSFCIndexStrategy using the Hilbert curve. On the tier with two bits of precision, we would expect the query range from 0202 to 0202, but because the strategy is hardwired to be overinclusive on edges, we get the range from 0200 to 0203, searching the whole planet for data. At three bits of precision, we would expect the query range from 0308 to 0308, but instead we get a range that includes 0302, 0307, and 030d as well as the expected 0308.

There are at least two possible ways this could be improved. One would be to make the index strategy configurable, so its behavior is either always overinclusive, or always not overinclusive; this would involve adding a parameter to the constructor. Another would be to add overInclusiveOnEdges as a parameter of getQueryRanges, so it could be determined at query time what kind of search would be most efficient.

@rfecher
Copy link
Contributor

rfecher commented Jun 14, 2021

if I'm following correctly, the issue is the edge is always rounded and the nature of querying is that you don't a piori know whats stored (ie. you have to be fully inclusive of the query geometry). So an inserted point exactly at (0,0) would be 0202 and an inserted point exactly at (1,1) would be 0203 and if you search a bbox of 0,0 though 1,1 you would have to return both points so your ranges would have to include 0202 and 0203. We do a range decomposition to not strictly use a single range from min Hilbert Value to max Hilbert value but a more fine-grained set to represent the query geometry (ref: https://locationtech.github.io/geowave/latest/devguide.html#theorydecomposition).

Basically on ingest you always want to give exact hilbert values for points and not be over-inclusive for ranges (bboxes) but then on query you need to be over-inclusive on the edge to include points on each of the edges of a bbox or you can miss results (unless you're doing a contains relationship instead of an intersection, but most users expect an intersection relationship on bbox queries).

@ptbannister
Copy link
Author

Yes, exactly.

If the most simple approach is acceptable (making the index strategy configurable for overInclusiveOnEdges) then we already have a solution that we could contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants