Formalisation of the Parquet spec of weights IO #699

Open
martinfleis opened this issue Apr 8, 2024 · 0 comments

While working on #698, I realised that there's probably room for a formal specification of the weights exchange format based on Parquet, as we have introduced in Graph. That way, other projects (spdep, pygeoda, rgeoda) can write their own IO, and we can avoid those horrendous space-separated text files with no formal specification.

At this moment, we expect:

  • exactly three columns: focal, neighbor, and weight, where weight shall be numeric
  • canonical sorting of the observations to ensure correct sparse roundtripping
  • custom metadata with the transformation and the libpysal version; those would probably change if we want to open this up to other projects
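
To make the three expectations concrete, here is a rough sketch of a validator written in plain Python, independent of any Parquet library. It is not the libpysal implementation. It assumes that "canonical" means rows ordered by the position of focal, then neighbor, in a known list of ids, and the metadata keys `transformation` and `version` are hypothetical names chosen for illustration, as are `canonical_sort` and `check_table`.

```python
from numbers import Real

REQUIRED_COLUMNS = ("focal", "neighbor", "weight")


def canonical_sort(rows, ids):
    """Sort (focal, neighbor, weight) rows by the position of focal,
    then neighbor, in ids (an assumed definition of canonical order)."""
    position = {label: i for i, label in enumerate(ids)}
    return sorted(rows, key=lambda r: (position[r[0]], position[r[1]]))


def check_table(columns, rows, ids, metadata):
    """Return a list of problems; an empty list means the table passes."""
    problems = []
    if tuple(columns) != REQUIRED_COLUMNS:
        problems.append("columns must be exactly focal, neighbor, weight")
    # bool is a subclass of int in Python, so reject it explicitly
    if any(isinstance(w, bool) or not isinstance(w, Real) for _, _, w in rows):
        problems.append("weight must be numeric")
    if list(rows) != canonical_sort(rows, ids):
        problems.append("rows are not in canonical order")
    # hypothetical metadata keys; a real spec would fix the names
    for key in ("transformation", "version"):
        if key not in metadata:
            problems.append(f"missing metadata key: {key}")
    return problems


# Made-up example data: three observations, row-standardised weights
ids = ["a", "b", "c"]
rows = [("a", "b", 0.5), ("a", "c", 0.5), ("b", "a", 1.0), ("c", "a", 1.0)]
metadata = {"transformation": "R", "version": "0.0.0"}

print(check_table(REQUIRED_COLUMNS, rows, ids, metadata))
```

A formal spec would let spdep, pygeoda, or rgeoda implement the same checks in their own language before writing or after reading a file.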

So I don't expect a very long document. I think the best place for this discussion would be the SDSL Discord and SDSL 2024, and I am happy to lead it. Before I go down that rabbit hole, does anyone have any ideas or objections we should take into account?
