GEOMESA-3259 FSDS - Add support for GeoParquet #3064

Draft: wants to merge 28 commits into base: main

Conversation

adeet1
Contributor

@adeet1 adeet1 commented Mar 20, 2024

  • Create a BoundsObserver trait, and tweak various classes and methods to use that trait
  • Add an observer to the SimpleFeatureParquetWriter and write records to it, in order to create a bounding box of all the geometries
    • Add this bounding box to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure)
  • Read and write all geometry attributes as binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
  • Add support for parsing WKB bytes in the Parquet geometry transformer functions
  • Use a spatial index instead of a GeoTools filter for bounding box queries
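The bounds-tracking idea in the list above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: `Bounds`, `BoundsObserverSketch`, `SimpleBoundsObserver`, `BoundsObserverDemo`, and the `bbox` metadata key are all hypothetical stand-ins, and features are reduced to plain x/y pairs.

```scala
// Hypothetical, simplified stand-ins; these are not the GeoMesa classes from this PR.
final case class Bounds(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  // Grow the box so that it also covers the point (x, y)
  def expand(x: Double, y: Double): Bounds =
    Bounds(math.min(xmin, x), math.min(ymin, y), math.max(xmax, x), math.max(ymax, y))
}

// Observer that sees every written coordinate and can report the overall bounds
trait BoundsObserverSketch {
  def write(x: Double, y: Double): Unit
  def getBoundingBox: Option[Bounds] // None until something has been written
}

final class SimpleBoundsObserver extends BoundsObserverSketch {
  private var bounds: Option[Bounds] = None

  override def write(x: Double, y: Double): Unit =
    bounds = Some(bounds.fold(Bounds(x, y, x, y))(_.expand(x, y)))

  override def getBoundingBox: Option[Bounds] = bounds
}

object BoundsObserverDemo {
  // The metadata map is mutable because the bbox is only known after the last write
  def writeAll(points: Seq[(Double, Double)]): scala.collection.mutable.Map[String, String] = {
    val metadata = scala.collection.mutable.Map("writer" -> "sketch")
    val observer = new SimpleBoundsObserver
    points.foreach { case (x, y) => observer.write(x, y) }
    // "bbox" is an illustrative key, not the actual GeoParquet metadata layout
    observer.getBoundingBox.foreach { b =>
      metadata.put("bbox", Seq(b.xmin, b.ymin, b.xmax, b.ymax).mkString("[", ", ", "]"))
    }
    metadata
  }
}
```

Because the box is only complete once the last record has been observed, the metadata has to be updated after writing, which is why an immutable map would not work here.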

* Create a BoundsObserver trait, and tweak various classes and methods to use that trait
* Add an observer to the SimpleFeatureParquetWriter and write records to it, in order to create a bounding box of all the geometries. Add this bounding box to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure).
* Read/write all geometry attributes in binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
* Add support for parsing WKB bytes in the Parquet geometry transformer functions
* Exclude bounding box from the GeoTools filter and use a spatial index instead
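
For context on storing a geometry as a single binary value rather than an x/y pair, here is a minimal sketch of the WKB layout for a 2D point, using only `java.nio`. `WkbPointSketch` is a hypothetical name; the PR itself may well rely on a geometry library for this, and other geometry types are not handled.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, for illustration only; handles 2D points and nothing else.
object WkbPointSketch {
  private val PointType = 1 // WKB geometry type code for a 2D Point

  // Encode as little-endian WKB: 1 byte order flag, 4 byte type, then x and y as doubles
  def encode(x: Double, y: Double): Array[Byte] = {
    val buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN)
    buf.put(1.toByte) // 1 = little endian (NDR), 0 = big endian (XDR)
    buf.putInt(PointType)
    buf.putDouble(x)
    buf.putDouble(y)
    buf.array()
  }

  // Decode either byte order, based on the leading flag
  def decode(bytes: Array[Byte]): (Double, Double) = {
    val buf = ByteBuffer.wrap(bytes)
    val order = if (buf.get() == 1) ByteOrder.LITTLE_ENDIAN else ByteOrder.BIG_ENDIAN
    buf.order(order)
    val geomType = buf.getInt()
    require(geomType == PointType, s"Only 2D points are handled in this sketch, got type $geomType")
    (buf.getDouble(), buf.getDouble())
  }
}
```

A point therefore fits in one 21-byte primitive binary value, and the same column type can carry lines and polygons, which is what lets a single converter and attribute writer cover all geometry types.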

Co-authored-by: Emilio Lahr-Vivaz <elahrvivaz@ccri.com>
Contributor Author

adeet1 commented Mar 20, 2024

To-do items:

  • Make FilterConverter.spatial backwards-compatible
  • Add support for 3D geometries and bounding boxes
  • Add a unit test assertion that verifies the file metadata validates against the GeoParquet metadata schema

pom.xml (outdated, resolved)
@adeet1 adeet1 requested a review from elahrvivaz March 22, 2024 01:09
pom.xml (outdated, resolved)
- val observer = if (observers.isEmpty) { updateObserver } else {
-   new CompositeObserver(observers.map(_.apply(path)).+:(updateObserver))
+ val observer = if (observers.isEmpty) { updateObserver.asInstanceOf[BoundsObserver] } else {
+   new CompositeObserver(observers.map(_.apply(path)).+:(updateObserver)).asInstanceOf[BoundsObserver]
Contributor

I think ideally we could use the bounds that the writer is already calculating, so that we don't have to calculate them twice.

Contributor Author

@adeet1 adeet1 left a comment

  • When we compact GeoParquet files in a filesystem partition, we need to ensure that the bounding boxes in the metadata of the files get merged correctly (i.e. assert that the union of bounding boxes of the files before compaction is equal to the union of bounding boxes of the newly compacted files).
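
The invariant described above could be checked along these lines. This is a rough sketch under stated assumptions: `BBox`, `CompactionBoundsSketch`, and its `compact` method are hypothetical, each file is represented only by its bounding box, and compaction is modelled as merging every `groupSize` files into one.

```scala
// Hypothetical model: each file is represented only by the bounding box in its metadata.
final case class BBox(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  // Smallest box that covers both this box and the other one
  def union(other: BBox): BBox =
    BBox(math.min(xmin, other.xmin), math.min(ymin, other.ymin),
      math.max(xmax, other.xmax), math.max(ymax, other.ymax))
}

object CompactionBoundsSketch {
  // Union of all boxes, or None when there are no files
  def unionAll(boxes: Seq[BBox]): Option[BBox] = boxes.reduceOption(_ union _)

  // Hypothetical compaction: merges every `groupSize` input files into one output file
  def compact(files: Seq[BBox], groupSize: Int): Seq[BBox] =
    files.grouped(groupSize).flatMap(group => unionAll(group).toList).toSeq
}
```

A test would then compare `unionAll` over the boxes read before compaction with `unionAll` over the boxes of the compacted files and expect them to be equal. Using features with distinct geometries and coordinates matters, since identical boxes would pass even if the merge were wrong.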

…ss files are correctly merged upon compaction

* Write features with different geometries and coordinates, so we can test the merging of unique bounding boxes.
* This fixes a failing unit test "suppress or allow empty output files" in ExportCommandTest.scala
2 participants