-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAS-2067 - Migrate datatree io.py and common.py #9011
base: main
Are you sure you want to change the base?
Conversation
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]: | |||
return list(items) | |||
|
|||
|
|||
class TreeAttrAccessMixin(AttrAccessMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's the biggest question I have... is this okay for now? (Maybe this is directed at @TomNicholas)
The use of __slots__
is a little new to me (but makes sense from reading up on it). I think to adopt __slots__
and remove __dict__
I would need to slightly rework DataTree
, TreeNode
and NamedNode
. I wanted to check that the scope mainly seems to be self.children
and self.parent
attributes, and understand why those classes were implemented as they currently are before trying to alter things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to embarrass me, but why did you stop calling init_subclass on the Parent object? everything else with this merge seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I accidentally commented out that last line. Will fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, my last comment was true, but... uncommenting out that line means that the same method in the parent class (AttrAccessMixin
) gets called by super
in addition to this method, right? That would mean we end up calling the validation logic that this child method is trying to avoid for now (checking that __dict__
is not present on the class).
Now if only I'd said that the first time around 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just comments for now
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]: | |||
return list(items) | |||
|
|||
|
|||
class TreeAttrAccessMixin(AttrAccessMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to embarrass me, but why did you stop calling init_subclass on the Parent object? everything else with this merge seems reasonable to me.
if kwargs.get("format", None) not in [None, "NETCDF4"]: | ||
raise ValueError("to_netcdf only supports the NETCDF4 format") | ||
|
||
engine = kwargs.get("engine", None) | ||
if engine not in [None, "netcdf4", "h5netcdf"]: | ||
if engine not in [None, *get_args(T_DataTreeNetcdfEngine)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No stopper but I'm not sure I like the kwargs.get
-style here, engine
wont be typed and might hide typing issues downstream. Is engine
correctly narrowed from Any
after this if-case?
If we want good typing coverage it's better to prefer keyword-only arguments (with default values) rather than using kwargs.get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - yeah. I took a quick look at to_netcdf
in xarray/backends/api.py, and it looks like making these explicitly named with defaults would be consistent with that, too.
Looks like the potential list of arguments is: format
, engine
, group
and compute
(with group
and compute
for _datatree_to_zarr
). These changes would also presumably require similar updates to DataTree.to_netcdf and DataTree.to_zarr(https://github.com/pydata/xarray/blob/main/xarray/core/datatree.py#L1516)
.
Just to check @TomNicholas - am I missing a reason for keeping the kwargs.get
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see a valid reason not to replace them, but I'm not 100% familiar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of making them consistent with the other to_netcdf
functions in xarray/backends/api.py
. I'll make this change and push it up as I get chance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check @TomNicholas - am I missing a reason for keeping the
kwargs.get
implementation?
Not that I can think of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made these changes in this commit
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]: | |||
return list(items) | |||
|
|||
|
|||
class TreeAttrAccessMixin(AttrAccessMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this class necessary? Can't DataTree
inherit directly from AttrAccessMixin
? And override _attr_sources
and so on?
I think the reason this class existed separately was just to do with hacking around limitations of datatree being in a separate repository...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried to use AttrAccessMixin
. Just dropping it in as a replacement for TreeAttrAccessMixin
doesn't work because of the checking done inside the AttrAccessMixin.__init_subclass__
method. The check for if __dict__
exists fails. DataTree
, TreeNode
and NamedNode
do all define __slots__
, but I think there are attributes being defined in places like DataTree.__init__
, for example DataTree.parent
and DataTree.children
(versus the DataTree._parent
and DataTree._children
attributes defined in DataTree.__slots__
).
I think we talked about this a little bit in one of our calls - things like DataTree.parent
and DataTree.children
seemed like tricky things to change over.
if kwargs.get("format", None) not in [None, "NETCDF4"]: | ||
raise ValueError("to_netcdf only supports the NETCDF4 format") | ||
|
||
engine = kwargs.get("engine", None) | ||
if engine not in [None, "netcdf4", "h5netcdf"]: | ||
if engine not in [None, *get_args(T_DataTreeNetcdfEngine)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check @TomNicholas - am I missing a reason for keeping the
kwargs.get
implementation?
Not that I can think of.
This PR migrates what I believe is the last actual source code out of the
xarray/datatree_/datatree
directory intoxarray/core
. This is another piece of #8572.I made a couple of decisions that I'd love some feedback/extra eyes on:
xarray/datatree_/datatree/io.py
into their own module, rather thanxarray/backends/api.py
. It felt like these were a little separate, but I can definitely be convinced otherwise.TreeAttrAccessMixin
. I really just wanted to swap over to usingAttrAccessMixin
, but I ran into issues with__slots__
. Looking at it, I think I would need to work on theDataTree
,TreeNode
andNamedNode
classes. I can go down that path, but thought I'd put up this PR for discussion before beginning that effort.Tests addedwhats-new.rst
New functions/methods are listed inapi.rst