improve performance for queries on sparse multipolygons #270

tyrasd · 2020-10-26T13:12:13Z

Currently, when querying large, but very sparse multipolygons as areas-of-interest, it can happen that a lot of CPU time is spent inside the bbox <-> polygon testing, leading to poor performance:

In these situations, the current implementation of the fip package are not ideal (it assumes somewhat contiguous and not too unisotropic input polygons to work really well).

see also https://github.com/GIScience/oshdb/blob/0.5.9/oshdb-util/src/main/java/org/heigit/bigspatialdata/oshdb/util/geometry/fip/FastInPolygon.java#L72

//cc @SlowMo24

The text was updated successfully, but these errors were encountered:

SlowMo24 · 2020-10-28T09:18:02Z

I don't know if this helps but when I had to check which polygon a geometry lies in and the polygons had 'bad bboxes' (e.g. france with DOM/TOM) created an STRTree of polygons, not Mulipolygons:

stub:

Geometry geometry = wkbReader.read(geomBytes);
for (int i = 0; i < geometry.getNumGeometries(); i++) {
    Geometry geometryN = geometry.getGeometryN(i);
    index.insert(geometryN.getEnvelopeInternal(), new Object[]{theID, geometryN});
}

depending on the cost of area or complexity calculation something like https://postgis.net/docs/ST_Subdivide.html (this surely exists in jts or geotools) on very large/complex/overlapping polygons could be used to increase performance of STRTree even further.
I then used that Index as follows within aggregateBy. The order of query might be difficult to determine:

search for bbox in polygon index or vice versa. There might also be a possibility to intersect two indices?

Attention, very ugly and possibly wrong code:

return index
    .query(centroid.getEnvelopeInternal())
    .parallelStream()
    .filter(entry -> ((Polygon) entry[1]).contains(geometry))
    .map(entry -> ((int) entry[0]))
    .collect(Collectors.toList());

//EDIT: correct code that was copy pasted out of context

tyrasd · 2020-10-28T17:36:17Z

yeah, for such geometries an implementation like what you propose would make sense. JTS even has a built-in utility/mechanism (called PreparedGeometry) which basically does more or less exactly this. But in "normal" situations the current fip package does a better job than JTS:

The crucial question would be to how to decide which implementation to use in which situation? And/or if it would be not even more performant to come up with a solution which can combine both approaches for sparse multipolygons: for the example of France -> split "distinct" regions into R-tree superstructure, and use the current fip routines to perform exact geometry tests. 🤔

SlowMo24 · 2020-10-29T11:14:30Z

great you did some benchmarking!!!
Disclaimer: I am not sure I fully understand the problem. Maybe you could add some more information to the issue-text.

Anyway I will share my thought at the risk of them being off-topic:
The problem we face is that sparse polygons in the current implementation force us to 'touch'/iterate a large number of GridOSHEntities because they lie within the bbox but not the Polygon. So even though the PreparedGeom is slower this would only be a one-time check and the actual computation would be faster as it would only 'touch' the necessary GridOSHEntities.
To my understanding the same then applies to OSHEntities within the GridOSHEntity where we can first filter based on index and bbox and then run the actual computation on objects that additionally satisfy fip .

tyrasd added the performance Performance optimizations, bottlenecks of the current pipeline, etc. label Oct 26, 2020

tyrasd changed the title ~~improve performance for queries sparse disjunct multipolygons~~ improve performance for queries sparse multipolygons Oct 28, 2020

tyrasd changed the title ~~improve performance for queries sparse multipolygons~~ improve performance for queries on sparse multipolygons Nov 9, 2020

Hagellach37 added the priority:low Should be quite a far way down on the agenda label Oct 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve performance for queries on sparse multipolygons #270

improve performance for queries on sparse multipolygons #270

tyrasd commented Oct 26, 2020

SlowMo24 commented Oct 28, 2020 •

edited

tyrasd commented Oct 28, 2020

SlowMo24 commented Oct 29, 2020 •

edited

improve performance for queries on sparse multipolygons #270

improve performance for queries on sparse multipolygons #270

Comments

tyrasd commented Oct 26, 2020

SlowMo24 commented Oct 28, 2020 • edited

tyrasd commented Oct 28, 2020

SlowMo24 commented Oct 29, 2020 • edited

SlowMo24 commented Oct 28, 2020 •

edited

SlowMo24 commented Oct 29, 2020 •

edited