Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RGeo 3.0 Roadmap #247

Closed
10 of 12 tasks
BuonOmo opened this issue Feb 4, 2021 · 7 comments
Closed
10 of 12 tasks

RGeo 3.0 Roadmap #247

BuonOmo opened this issue Feb 4, 2021 · 7 comments

Comments

@BuonOmo
Copy link
Member

BuonOmo commented Feb 4, 2021

Summary

We want to simplify the overall interface, being more ruby-stylish (a.k.a simple). Refactor code to improve maintainability. Here are the main axis on which we want to focus for the next major:

  • Loose Invalidity handling for geometries
  • CAPI Refactoring
  • Better GeoJSON integration
  • Consistency and clarification of interface

Loose Invalidity handling for geometries

This is currently the most critical item. We want to always have loose invalidity handling and never raise on creation, which will allow us to get rid of uses_lenient_assertions. This will have the benefit of always raise when a method cannot be use while not making usage cumbersome. For instance, one may want to parse a polygon even if it is not simple, however, we should not give him a bad area computation because it was not verified.

Work on that topic has already been started in the https://github.com/rgeo/rgeo/tree/fix/correct-area-when-invalid-polygon branch. However, there are still performance issues that should be investigated.

CAPI Refactoring

Better GeoJSON integration

GeoJSON is one of the most used way to parse data into RGeo::Geometry object, yet it is cumbersome to use.

Consistency and clarification of interface

The last item here is to be discussed: we do not want to add something that may make the interface more complex. If you read until here, Please mark your interest for this feature. It may be the case that this is only useful for us maintainers when creating examples and tests.

Proj Defs

Modus operandi

The main direction may be discussed here. If one want to tackle on of the roadmap items, please open an issue first to indicate that you are in charge and to discuss further how you want to do it if need be.

We will also reference issues with breaking changes with the 3.0 label and merge pull requests in the https://github.com/rgeo/rgeo/tree/v3-dev branch.

@keithdoggett
Copy link
Member

Another thing we should work on is an update to the spherical_factory, specifically the SphericalMath module. We had previously discussed this but did not make a list of options/tasks.

The main issue is that small features may be improperly considered invalid (#212) and is due to floating-point issues. I think the first step for this would be to research some techniques or examples of this working well elsewhere.

@BuonOmo
Copy link
Member Author

BuonOmo commented Apr 19, 2021

Working on loose geometry validation. I’m really trying to figure out what path we should take. Either:

  • raise on invalid at geometry creation
  • raise on invalid at method usage
  • allow some methods to give unexpected results.

In rgeo actually, we have inconsistent behavior (either validation early, either nothing).

To explain further the differences, here’s a polygon i’ve been working on and testing with both turf (js) and shapely (python) my polygon area is 0.

In our case, it would be 0 as well if computed with those coordinates.

If we had only the case of polygons and areas, we could check if the polygon is simple on creation and store it in the data pointer. Whenever the area is to be computed we would give the correct area if polygon is simple or raise otherwise.

However, there are plenty of reasons a geometry can be invalid, and for each geometry plenty of methods that need different validity.

Hence we need a more canonical approach if we want to maintain correct performances (timewise and spacewise).

If we check for validity, every geos method should work. Therefore we would solely need to add a validity flag to the geometry (int size) and check once (either on method call or geometry loading). This way we keep perfs and make sure a user never gets either nil on creation or a bizarre result for a geometry computation.

The complete way, much more complex, would be to have flags for each different kinds of validity (there are 11 variations in geos). Then we could have a finer, per method check. This allows user to do much more (for instance, the bowtie polygon could give a correct area). But there are a few issues:

  • this is done under the hood, and a user may expect the 0 value for the area (since most known repositories do so)
  • maintainability may go wild. We’ll have a lot of combinaison, all to be maintained correctly. And this may change per geos version…
  • we will need to either recompute the valid polygon every time (timewise complexity would be huge) or store it in the data, which would double space taken by each geometry (pointer to geos is most of the space taken by the geometry)

Considering both options, I think I would go for the first one, with error only on method call. Allowing users to use the lenient assertion flag (maybe renamed to check_validity, or something more scary to make sure user knows what he is doing). And giving this API for users to check if they still want the result with the valid geometry:

polygon = RGeo::Geos.factory.polygon(...)
polygon.area #(would raise an error with `polygon.invalid_reason` as its message if geometry is not valid)
polygon.area! # does not check for validity, gives 0 for a bowtie
polygon.valid?
polygon.invalid_reason # as it is today, maybe separating the location from the message to allow user switching on the result.
polygon.make_valid! # change the current geometry to a valid one (and return it for chaining)
polygon.make_valid # `dup.make_valid!` to avoid mutation.

Most of this api could be done in ruby to keep our c extension simple. We would just need to make sure we do still have correct perfs.

EDIT: There are two points I've not covered with this API, (1.) methods calling other geometries and (2.) methods ending with a ?

  1. If we do not append a !, this checks ALL geometries, self and every others given in parameters
  2. contains?! is not possible in ruby, I'd say maybe unsafe_contains? ?

@keithdoggett
Copy link
Member

@BuonOmo thanks for the great write-up here.

I think the conclusion you came to about this is correct in that we should:

  • Add a validity flag to the geometry classes
  • Compute that validity flag once needed
  • Raise an error on methods that require validity if any geometry used is invalid
  • Add a set of unsafe methods that will not check for validity (indicated by ! or unsafe_).

I think we have 2 more things to consider:

  1. If we allow unsafe operations we need to make sure that errors are properly handled in GEOS. Per the example in Strange polygon buffer behavior on RGeo 2.1.1 #252 we see that attempting intersections with an invalid geometry does not work and GEOS reports a TopologyError. We need to make sure that this will still be properly handled and add docs making sure the users know that just because they use unsafe_ operations doesn't mean they'll always get an answer.

  2. I think an important part of this update will be having consistency across all factories. Is there a way that we could move a lot of this logic to the abstract factory/geometry implementations so that this is handled a little more seamlessly across all implementations?

@BuonOmo
Copy link
Member Author

BuonOmo commented May 1, 2021

@keithdoggett I've made an example of what it could look like:

class Feature
  OGC_METHODS = %w(contains? intersection)

  def check_validity!
    @invalid_reason ||= invalid_reason
    return unless @invalid_reason

    raise RGeo::GeometryError, @invalid_reason
  end

  OGC_METHODS.each do |method_sym|
    copy = alias_method "unsafe_#{method_sym}".to_sym, method_sym
    undef_method method_sym
    define_method(method_sym) do |*args|
      check_validity!
      copy.call(*args)
    end
  end
end

So this would be handled across all factories. There is still one pitfall with this flag, if the geometry changes, it needs to be reset for each method that updates a geometry. I think this is OK, but I'll need to have a look at the codebase to be sure. Any opinion on that ?

Also, next message I'll open a PR dedicated to the subject!

@keithdoggett
Copy link
Member

@BuonOmo this looks good. As far as I know, none of the methods mutate any geometries, but we should still give it a check. I don't think it makes sense to mutate geometries in general and the benefits of memoizing invalid_reason outweigh the risks of it being incorrect after doing something that wouldn't be considered "best practice".

@BuonOmo
Copy link
Member Author

BuonOmo commented Aug 30, 2021

@keithdoggett FYI, I've removed the help wanted tag: if one want to help, they can just search through the created issues!

@keithdoggett
Copy link
Member

Closing since V3 has been released

@keithdoggett keithdoggett unpinned this issue Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants