Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve performance for queries on sparse multipolygons #270

Open
tyrasd opened this issue Oct 26, 2020 · 3 comments
Open

improve performance for queries on sparse multipolygons #270

tyrasd opened this issue Oct 26, 2020 · 3 comments
Labels
performance Performance optimizations, bottlenecks of the current pipeline, etc. priority:low Should be quite a far way down on the agenda

Comments

@tyrasd
Copy link
Member

tyrasd commented Oct 26, 2020

Currently, when querying large, but very sparse multipolygons as areas-of-interest, it can happen that a lot of CPU time is spent inside the bbox <-> polygon testing, leading to poor performance:

image

In these situations, the current implementation of the fip package are not ideal (it assumes somewhat contiguous and not too unisotropic input polygons to work really well).

see also https://github.com/GIScience/oshdb/blob/0.5.9/oshdb-util/src/main/java/org/heigit/bigspatialdata/oshdb/util/geometry/fip/FastInPolygon.java#L72

//cc @SlowMo24

@tyrasd tyrasd added the performance Performance optimizations, bottlenecks of the current pipeline, etc. label Oct 26, 2020
@SlowMo24
Copy link
Contributor

SlowMo24 commented Oct 28, 2020

I don't know if this helps but when I had to check which polygon a geometry lies in and the polygons had 'bad bboxes' (e.g. france with DOM/TOM) created an STRTree of polygons, not Mulipolygons:

stub:

Geometry geometry = wkbReader.read(geomBytes);
for (int i = 0; i < geometry.getNumGeometries(); i++) {
    Geometry geometryN = geometry.getGeometryN(i);
    index.insert(geometryN.getEnvelopeInternal(), new Object[]{theID, geometryN});
}

depending on the cost of area or complexity calculation something like https://postgis.net/docs/ST_Subdivide.html (this surely exists in jts or geotools) on very large/complex/overlapping polygons could be used to increase performance of STRTree even further.
I then used that Index as follows within aggregateBy. The order of query might be difficult to determine:

  • search for bbox in polygon index or vice versa. There might also be a possibility to intersect two indices?

Attention, very ugly and possibly wrong code:

return index
    .query(centroid.getEnvelopeInternal())
    .parallelStream()
    .filter(entry -> ((Polygon) entry[1]).contains(geometry))
    .map(entry -> ((int) entry[0]))
    .collect(Collectors.toList());

//EDIT: correct code that was copy pasted out of context

@tyrasd tyrasd changed the title improve performance for queries sparse disjunct multipolygons improve performance for queries sparse multipolygons Oct 28, 2020
@tyrasd
Copy link
Member Author

tyrasd commented Oct 28, 2020

yeah, for such geometries an implementation like what you propose would make sense. JTS even has a built-in utility/mechanism (called PreparedGeometry) which basically does more or less exactly this. But in "normal" situations the current fip package does a better job than JTS:

image

The crucial question would be to how to decide which implementation to use in which situation? And/or if it would be not even more performant to come up with a solution which can combine both approaches for sparse multipolygons: for the example of France -> split "distinct" regions into R-tree superstructure, and use the current fip routines to perform exact geometry tests. 🤔

@SlowMo24
Copy link
Contributor

SlowMo24 commented Oct 29, 2020

great you did some benchmarking!!!
Disclaimer: I am not sure I fully understand the problem. Maybe you could add some more information to the issue-text.

Anyway I will share my thought at the risk of them being off-topic:
The problem we face is that sparse polygons in the current implementation force us to 'touch'/iterate a large number of GridOSHEntities because they lie within the bbox but not the Polygon. So even though the PreparedGeom is slower this would only be a one-time check and the actual computation would be faster as it would only 'touch' the necessary GridOSHEntities.
To my understanding the same then applies to OSHEntities within the GridOSHEntity where we can first filter based on index and bbox and then run the actual computation on objects that additionally satisfy fip .

@tyrasd tyrasd changed the title improve performance for queries sparse multipolygons improve performance for queries on sparse multipolygons Nov 9, 2020
@Hagellach37 Hagellach37 added the priority:low Should be quite a far way down on the agenda label Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance optimizations, bottlenecks of the current pipeline, etc. priority:low Should be quite a far way down on the agenda
Projects
None yet
Development

No branches or pull requests

3 participants