Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using GeoWave for 3D Point Cloud Data #1799

Open
jay-sridharan opened this issue Mar 14, 2021 · 3 comments
Open

Using GeoWave for 3D Point Cloud Data #1799

jay-sridharan opened this issue Mar 14, 2021 · 3 comments

Comments

@jay-sridharan
Copy link

Hello!

I've read a bit through your documentation and looked at the source code a bit, and it seems as though you support 2D spatial data indexing, and (2D + time) spatiotemporal indexing out-of-box. Is there any support for encoding and decoding a 3D point cloud? And for specifying a custom reference-frame / bounding limits for the SFC?

In my use case, I'd like to store a 3D point cloud that I know will exist within a 600x600x600 cube, with the maximum precision possible.

Could you please give me the 2 minute explanation on what I would need to set up to achieve such a thing?

@rfecher
Copy link
Contributor

rfecher commented Mar 15, 2021

actually this should be very straightforward with GeoWave and in fact seems similar to something I had built an experiment for to measure performance impacts in "indexing" a dimension (using it within an SFC as part of the key) vs. simply encoding the dimension and filtering with a post-process. It obviously is a major performance benefit to use the multi-dimensional SFC to have effective range scans, but the goal there was to measure it in practice on HBase. The source code for that experiment is here although it is based on an older version of GeoWave, it should be fairly straightforward to update it to the latest GeoWave and the results are shown in Advances in Spatial and Temporal Databases 2017.

If you don't use time in your data, you can remove the "T" dimension from that example. You'll notice it is added to each index in the "Index Model" to be serialized despite it not being in the SFC portion of each index.

Also, each "dimension" just uses the default constructor for BasicDimensionDefinition which uses a min of 0 and a max of 1. Based on my understanding you'd want to use super(0,600); on for example this line so that dimension's range is (0,600).

@jay-sridharan
Copy link
Author

jay-sridharan commented Mar 15, 2021

Gotcha - thank you for the quick response! I think I'd want to experiment with the various SFCs you currently support. As you mention in the paper, the Hilbert curves offer better locality, but the query algorithm is more complex and may take more time. Do all of the curve implementations and their respective query algorithms support N dimensions?

Also, I was just taking a look at your Z-order curve implementation. Have you considered implementing a region splitting algorithm? That might help performance if you split the region along quadrant boundaries to prevent accessing too much data outside the bounding box. Or perhaps that's already rolled into the implementation by way of the Quad-tree? I'm not too familiar with the code-base, so I apologize if I'm misunderstanding how this works.

Can be split to:

Anyway, this is an awesome project, and quite involved! Would love to see if you know of anyone else using GeoWave for point cloud storage.

@rfecher
Copy link
Contributor

rfecher commented Mar 16, 2021

Our SFC algorithm is plug-and-play so you can swap in your own (and contributions that improve the project are welcome), however our Z-Order SFC implementation is not battle-tested as we always preference the compact Hilbert curve. The curve decomposition that occurs on query, while less intuitive than simple Z-Order is not a performance concern from my benchmarking. Particularly in a system expecting many results and each row needs to be deserialized, this deserialization costs dramatically more than SFC decomposition, but generally speaking decomposition using primitives is fast - the only case where it takes time is if the precision requires use of BigInteger and BigDecimal math (which by default we avoid).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants