Add documentation/examples for new data loaders and help with use case #52

Open
djhoese opened this issue Feb 24, 2022 · 0 comments

djhoese commented Feb 24, 2022

@jhamman just presented some updates to xbatcher, including the new data loader interfaces from #25. I looked for a documented way of using them and couldn't find one. If some documentation or examples could be added, that would be great: I've been helping some people at my work use Satpy to prepare data for their machine learning projects, and I think the data loader could be a nice optimization. Their preparation work has always ended with saving to NetCDF or zarr. My understanding of these interfaces in xbatcher is that the save-to-disk step shouldn't be needed (except for future caching functionality). Is that correct?

The pseudo-code of the most recent project I helped with looks something like this:

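import satpy

dates_of_interest = [...]
geographic_bounds_of_interest = [...]
channels_of_interest = [...]

for dt in dates_of_interest:
    # get_goes16_abi_filenames is a project-specific helper (not part of Satpy)
    abi_filenames = get_goes16_abi_filenames(dt)
    scn = satpy.Scene(reader='abi_l1b', filenames=abi_filenames)
    scn.load(channels_of_interest)

    # Crop the full scene to each region and write one NetCDF file per region
    for bbox in geographic_bounds_of_interest:
        cropped_scn = scn.crop(xy_bbox=bbox)
        cropped_scn.save_datasets(filename="some_bbox_specific_file.nc")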

They then do their ML work based on those NetCDF files. Satpy is entirely xarray[dask]-based, and the actual code for the above makes heavy use of client.map (distributed's Client) to do the individual pieces. I can't speak for the researcher I'm helping, but if a data loader could hand these "patches" (that's what they call them) of data to pytorch/tensorflow without first saving them to NetCDF, that would be a really good example for a certain NASA project we're a part of.
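
To make the idea concrete, here is a minimal sketch of what I mean by skipping the save step: a generator that crops patches from an array already in memory and hands them straight to the consumer. This is not xbatcher's API and not the real project code. The name `iter_patches`, the pixel-index bounding boxes, and the small NumPy array standing in for a loaded channel are all assumptions made for illustration; I'd expect a real example to use xbatcher's generator and loaders on the xarray objects instead.

```python
import numpy as np


def iter_patches(image, bboxes):
    """Yield (bbox, patch) pairs cropped from a 2D array held in memory.

    Each bbox is (row_start, row_stop, col_start, col_stop) in pixel
    indices. This is a simplification: the pseudo-code above crops with
    projection-coordinate bounding boxes, not pixel indices.
    """
    for bbox in bboxes:
        r0, r1, c0, c1 = bbox
        # Basic slicing returns a view, so nothing is copied or written to disk
        yield bbox, image[r0:r1, c0:c1]


# Hypothetical stand-in for one loaded channel (6 rows x 8 columns)
image = np.arange(48, dtype=np.float32).reshape(6, 8)

# Hypothetical regions of interest, in pixel indices
bboxes = [(0, 2, 0, 4), (2, 6, 4, 8)]

# A training loop would consume the generator directly; here we just collect
patches = [patch for _, patch in iter_patches(image, bboxes)]
```

The point of the sketch is only that the patches are produced on demand and consumed in the same process, so the NetCDF files become optional (for example, as a cache) rather than a required intermediate step.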
