Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpolate to the union of the polygons from two dataframes #192

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sjsrey
Copy link
Member

@sjsrey sjsrey commented Nov 19, 2023

  • interpolate to the union of the polygons from two dataframes
  • classify spatial relations between target geometries and a buffer

@sjsrey sjsrey added the enhancement New feature or request label Nov 19, 2023
Copy link

codecov bot commented Nov 19, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (45b9673) 54.82% compared to head (5f15663) 55.83%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #192      +/-   ##
==========================================
+ Coverage   54.82%   55.83%   +1.01%     
==========================================
  Files          15       17       +2     
  Lines         839      865      +26     
==========================================
+ Hits          460      483      +23     
- Misses        379      382       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced by either of these changes (but surely won't block them if you want to go ahead).

area_buffer:

  • why it is called area buffer?
  • what you are trying to do there is a relate pattern. I think that this should live in geopandas and make use of sindex without a predicate for the first sift and then shapely.relate for the second, with a mapping of the resulting he DE-9IM intersection matrix to legible names.

area_faces:

  • if the whole code is a call to overlay and another to area_interpolate, is it something that is needed at all? What is the use case for it? Isn't it better to add an example notebook with that use case calling overlay and than area_interpolate?
  • if so, the name is not suggesting what it is doing at all :).

@jGaboardi
Copy link
Member

I agree with @martinfleis on all points above.

@sjsrey
Copy link
Member Author

sjsrey commented Nov 21, 2023

I am not convinced by either of these changes (but surely won't block them if you want to go ahead).

area_buffer:

  • why it is called area buffer?
  • what you are trying to do there is a relate pattern. I think that this should live in geopandas and make use of sindex without a predicate for the first sift and then shapely.relate for the second, with a mapping of the resulting he DE-9IM intersection matrix to legible names.

This is needed in my use case as I will be interpolating to those three different geometries.

area_faces:

  • if the whole code is a call to overlay and another to area_interpolate, is it something that is needed at all? What is the use case for it? Isn't it better to add an example notebook with that use case calling overlay and than area_interpolate?
  • if so, the name is not suggesting what it is doing at all :).

More of a helper function as I end up using this often. It comes up in tract harmonization research. Could be other use cases.

I'm all ears regarding better names.

@knaaptime
Copy link
Member

knaaptime commented Nov 27, 2023

how about area_buffer --> area_relate? I agree a lower level implementation in geopandas like @martinfleis suggests would probably be ideal, but until then it's quite useful (and fitting) for it to be available here.

on faces, there's probably a more descriptive name that's eluding me, but this ends up being a common pattern that we should document more thoroughly. We tend to think of areal interpolation problems as a complete change of support for the same study area. So our examples are tailored to things like moving from administrative zones to regular hexgrids, where the target is roughly exhaustive of the same study area (subject to some boundary issues, which is why we give allocate_total).

But another common case is when you have a second geometry that doesnt exhaust the study area and you want to understand the allocation of some person/thing/resource inside or outside the second geometry. So like, how much of the population with demographic characteristic X lives inside versus outside the target geometry {tax district, political district, proximity to waterfront, distance from superfund site, etc} where you know the target covers only a portion of the source. Depending on the relationship between their geoms, that can be a tougher to answer than 'select the centroids inside the buffer'.

If you haven't thought about it carefully, you might just interpolate source-->target and compare source vs target (forgetting about the double counting)--or, if you haven' thought of the overlay trick, it would be tempting to interpolate source-->target geom with allocate_total=False, then clip out the target from the source, and interpolate source-->clipped source (again remembering to allocate_total=False) and compare target vs clipped source (doing in 3 steps what you could've done in, like, 1)

serge's notebook covers this, just without quite the detailed context, so maybe we need another applied example, but i think the convenience function might be worth keeping around

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants