Not quite geoparquet? #379

Open
cholmes opened this issue Oct 5, 2023 · 2 comments

Comments

@cholmes

cholmes commented Oct 5, 2023

Hey all, I love your support of GeoParquet (and the project in general), but looking at the files on https://open.quiltdata.com/b/spatial-ucr/tree/ they're not actually valid GeoParquet. This is clearly because you made them all before we had the spec established; the spec requires some metadata to communicate that a file is GeoParquet.

You can check this easily with the GPQ tool:

% gpq describe 01.parquet 
╭───────────────┬────────┬────────────┬────────────┬─────────────╮
│ COLUMN        │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION │
├───────────────┼────────┼────────────┼────────────┼─────────────┤
│ geoid         │ binary │ string     │ 0..1       │ snappy      │
│ population    │ int64  │            │ 0..1       │ snappy      │
│ housing_units │ int64  │            │ 0..1       │ snappy      │
│ geometry      │ binary │            │ 0..1       │ snappy      │
├───────────────┼────────┴────────────┴────────────┴─────────────┤
│ Rows          │ 252266                                         │
│ Row Groups    │ 1                                              │
╰───────────────┴────────────────────────────────────────────────╯
 ⚠️  Not a valid GeoParquet file (invalid "geo" metadata). Run describe with the --metadata-only flag to see the "geo" metadata value. Run validate for more detail on validation issues.
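
For anyone who wants to see roughly what that warning is about without installing GPQ, here is a minimal Python sketch. It only checks that the "geo" metadata value is a JSON object carrying the fields GeoParquet 1.0.0 marks as required (`version`, `primary_column`, `columns`, and per column `encoding` and `geometry_types`), so it is far less thorough than `gpq validate`. The function names are hypothetical, and `read_geo_metadata` assumes `pyarrow` is installed.

```python
import json

# Fields that GeoParquet 1.0.0 marks as required.
REQUIRED_FILE_KEYS = ("version", "primary_column", "columns")
REQUIRED_COLUMN_KEYS = ("encoding", "geometry_types")


def read_geo_metadata(path):
    """Return the raw "geo" value from a Parquet file, or None if absent."""
    # Imported lazily: pyarrow is an optional, assumed dependency.
    import pyarrow.parquet as pq

    metadata = pq.read_schema(path).metadata or {}
    return metadata.get(b"geo")


def geo_metadata_problems(raw):
    """List the problems found in a raw "geo" metadata value (hypothetical helper)."""
    if raw is None:
        return ['no "geo" key in the file metadata']
    try:
        meta = json.loads(raw)
    except (TypeError, ValueError):
        return ['"geo" value is not valid JSON']
    if not isinstance(meta, dict):
        return ['"geo" value is not a JSON object']

    problems = [f'missing "{key}"' for key in REQUIRED_FILE_KEYS if key not in meta]
    columns = meta.get("columns")
    if isinstance(columns, dict):
        for name, column in columns.items():
            for key in REQUIRED_COLUMN_KEYS:
                if not isinstance(column, dict) or key not in column:
                    problems.append(f'column "{name}" is missing "{key}"')
    return problems
```

Something like `geo_metadata_problems(read_geo_metadata("01.parquet"))` would then return an empty list for a file that passes these basic checks, and a list of messages otherwise.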

You can also use GPQ to convert the files into valid GeoParquet, since it looks like most of your data is Parquet Geospatial Compatible data:

% gpq convert 01.parquet 01-converted.parquet
% gpq describe 01-converted.parquet 
╭────────────────────┬────────┬────────────┬────────────┬─────────────┬──────────┬────────────────┬────────┬────────╮
│ COLUMN             │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION │ ENCODING │ GEOMETRY TYPES │ BOUNDS │ DETAIL │
├────────────────────┼────────┼────────────┼────────────┼─────────────┼──────────┼────────────────┼────────┼────────┤
│ geoid              │ binary │ string     │ 0..1       │ zstd        │          │                │        │        │
│ population         │ int64  │            │ 0..1       │ zstd        │          │                │        │        │
│ housing_units      │ int64  │            │ 0..1       │ zstd        │          │                │        │        │
│ geometry           │ binary │            │ 0..1       │ zstd        │ WKB      │                │        │        │
├────────────────────┼────────┴────────────┴────────────┴─────────────┴──────────┴────────────────┴────────┴────────┤
│ Rows               │ 252266                                                                                       │
│ Row Groups         │ 1                                                                                            │
│ GeoParquet Version │ 1.0.0                                                                                        │
╰────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────╯ 

If you use the latest geopandas to_parquet() method, it'll also write out valid GeoParquet. So if you still have the scripts you used to make the original files, you can probably just run them again. I'm not totally sure whether the latest GeoPandas release writes GeoParquet 1.0.0 yet, but it will at least write 1.0.0-beta.1, which is easily read (you could also wait for it to support 1.0.0). Having valid files will let users open them with GDAL/OGR or drag and drop them into QGIS.

I'd be happy to help convert at least some of the data into GeoParquet. I've been putting up lots of data on https://beta.source.coop/ and would be happy to help get you an account there, though quilt looks pretty cool too.

@knaaptime
Member

hey @cholmes, thanks for this!!

😆 yes, some of the files are not quite geoparquet because we were rolling our own before the format existed. Thanks for all your (collective) work finalizing the spec. The files in the spatial-ucr bucket are scattered, actually... I think most of them should be proper geoparquet, or at least some version of the spec. Most (i.e. everything in acs/ or blocks_2020/) were created using the to_parquet function, but the earlier tracts_ files were pure geom-->wkb, so they're missing all the metadata.
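
For the legacy geom-->wkb files, the conversion could be sketched in Python roughly as follows. This is an assumption-laden sketch, not the project's actual tooling: it assumes `pandas` and `geopandas` are installed, that the WKB column is called `geometry` (as in the `gpq describe` output above), and that the caller knows the CRS, which this thread does not state. Both function names are hypothetical.

```python
from pathlib import Path


def converted_path(src, suffix="-converted"):
    """Hypothetical naming helper: 01.parquet -> 01-converted.parquet."""
    p = Path(src)
    return p.with_name(f"{p.stem}{suffix}{p.suffix}")


def rewrite_as_geoparquet(src, geometry_column="geometry", crs=None):
    """Read a plain Parquet file with a WKB column and rewrite it via GeoPandas.

    `crs` is an assumption the caller must supply; nothing in the thread
    says which CRS the legacy files use.
    """
    # Imported lazily so the naming helper works without these installed.
    import geopandas as gpd
    import pandas as pd

    df = pd.read_parquet(src)
    # Decode the raw WKB bytes into geometries, keeping the original index.
    geometry = gpd.GeoSeries.from_wkb(df[geometry_column], crs=crs)
    gdf = gpd.GeoDataFrame(df.drop(columns=[geometry_column]), geometry=geometry)
    dst = converted_path(src)
    # GeoDataFrame.to_parquet writes the "geo" metadata alongside the data.
    gdf.to_parquet(dst)
    return dst
```

Which GeoParquet version ends up in the output depends on the installed GeoPandas release, so it is worth running `gpq describe` on the result afterwards.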

It's been on my todo list for a while to convert these, but since everything still works, it hasn't bubbled up the priority list. Obviously that's not ideal :P. The tools I'm using now (while ugly code) pull from the census ftp and convert to (proper) geoparquet, so moving forward everything should be ok; it's just a matter of grabbing those legacy files.

I didn't know anything about beta.source.coop, but it looks sweet. Would love some help converting and pushing up over there if you're interested. Quilt has been lovely, but I'm always happy to share open data on more platforms.

@cholmes
Author

cholmes commented Oct 5, 2023

Awesome! Yeah, I'd be more than happy to help convert and push up. And it would be great to get some of the data that you already put in geoparquet over there too - you have lots of great foundational layers there.

Shoot me an email to cholmes [at] 9eo.org and we can get you set up with an account on source.coop that we both should be able to push to.
