Using datatrees to represent datasets #809

Open
abkfenris opened this issue Jan 12, 2023 · 3 comments

@abkfenris
Member

Is your feature request related to a problem? Please describe.

Few forecasts are stored as init x lead, so either users have to do extra dataset wrangling or data providers are asked to store an additional copy of the data.

Describe the solution you'd like

Datatree is working to create a tree-like data structure for Xarray. Datatrees can correspond to NetCDF groups or other hierarchies of datasets.

One way datatrees can be used is to collect related but non-alignable datasets. I think this property could make datatree useful, since datasets can be stored within a tree just as they are structured on disk. A datatree accessor can then be used to aggregate and reshape the underlying datasets for access and analysis.

I've started exploring using datatrees for forecasts in xarray_fmrc. I've initially modeled it off of THREDDS forecast model run collections, but I think it could support other forecast presentations like climpred's init x lead dataset structure.
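As a rough illustration of the reshaping idea, here is a minimal sketch in plain Python. It does not use the datatree or xarray_fmrc APIs: the `runs` mapping, the `wave_height` variable, and the `to_init_lead` helper are hypothetical stand-ins. Each model run is stored as its own node keyed by forecast reference time, with its own valid-time axis, and a small function reshapes the runs into an init x lead lookup.

```python
from datetime import datetime, timedelta

# Hypothetical forecast archive: one node per model run, keyed by its
# forecast reference (init) time. Each run has its own valid-time axis,
# so the runs cannot be aligned on a single shared time coordinate.
runs = {
    datetime(2023, 1, 1, 0): {
        "valid_time": [
            datetime(2023, 1, 1, 0),
            datetime(2023, 1, 1, 6),
            datetime(2023, 1, 1, 12),
        ],
        "wave_height": [1.0, 1.2, 1.5],
    },
    datetime(2023, 1, 1, 6): {
        "valid_time": [
            datetime(2023, 1, 1, 6),
            datetime(2023, 1, 1, 12),
            datetime(2023, 1, 1, 18),
        ],
        "wave_height": [1.1, 1.4, 1.6],
    },
}


def to_init_lead(runs, variable):
    """Reshape per-run data into a {(init, lead): value} mapping.

    The lead is the offset between a value's valid time and the init
    time of the run that produced it.
    """
    table = {}
    for init, run in runs.items():
        for valid, value in zip(run["valid_time"], run[variable]):
            table[(init, valid - init)] = value
    return table


init_lead = to_init_lead(runs, "wave_height")
```

Both runs forecast 2023-01-01 12:00, but they land in different cells of the init x lead table: lead 12 h for the first run and lead 6 h for the second. The data stays stored per run, and the init x lead view is derived on demand.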

I'm mainly coming at this with my data provider hat on, so input from researchers would be really nice (most of my forecast users are fishermen, sailors, surfers, and other folks on the water and around the waterfront, not scientists). There is a discussion going on the Pangeo Discourse.

Describe alternatives you've considered

Lots of individual datasets in ERDDAP, making users assemble things themselves.

Additional context

Relevant datatree links:

@aaronspring
Collaborator

Thanks for your interest in climpred @abkfenris

We created climpred before xdatatree was introduced. It would have saved us a lot of time and code to use xdatatree instead of inventing the PredictionEnsemble. As both of the core maintainers are now out of academia, we will not do large refactorings. If you are interested, feel invited to adapt climpred to your needs, and we can give some guidance.

@abkfenris
Member Author

Hi @aaronspring, I definitely understand that you're not up to refactoring it if it's not something you're using day to day. I can't really justify taking on that big of a refactoring either, but I can try to make sure it's possible to easily kick out a climpred-compatible dataset from xarray_fmrc.

@aaronspring
Collaborator

to_climpred() sounds great.

Your datasets seem to have dimensions forecast_reference_time and actual valid_time, with forecast_period as a coordinate. One-dimensional dimensions can easily be swapped; for a similar case, see https://climpred.readthedocs.io/en/stable/api/climpred.utils.convert_init_lead_to_valid_time_lead.html#climpred.utils.convert_init_lead_to_valid_time_lead

climpred also works with these standard_names in the attributes and renames dimensions automatically. A lead given in units of ns is also converted by climpred internally.
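For reference, the relationship between the three time coordinates can be sketched in plain Python. This is only an illustration of the arithmetic, not climpred's implementation: the helper names and the `NS_PER_HOUR` constant are hypothetical.

```python
from datetime import datetime, timedelta

# Nanoseconds in one hour, for converting an integer ns lead to hours.
NS_PER_HOUR = 3_600 * 10**9


def valid_time_to_lead(forecast_reference_time, valid_time):
    """forecast_period (lead) = valid_time - forecast_reference_time."""
    return valid_time - forecast_reference_time


def lead_to_valid_time(forecast_reference_time, lead):
    """Inverse of valid_time_to_lead."""
    return forecast_reference_time + lead


def lead_ns_to_hours(lead_ns):
    """Convert a lead stored as integer nanoseconds to hours."""
    return lead_ns / NS_PER_HOUR


init = datetime(2023, 1, 12, 0)
valid = datetime(2023, 1, 12, 18)

lead = valid_time_to_lead(init, valid)
# Express the same lead as integer nanoseconds, then back to hours.
lead_ns = int(lead.total_seconds()) * 10**9
lead_hours = lead_ns_to_hours(lead_ns)
```

Because any two of init, lead, and valid time determine the third, a dataset indexed by forecast_reference_time and valid_time carries the same information as one indexed by init and lead.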
