
fill_cdpna() causes crash in output #62

Closed
aadm opened this issue Sep 9, 2020 · 2 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

aadm (Contributor) commented Sep 9, 2020

With input data read from NetCDF via dask, applying fill_cdpna() causes an error when writing the file back to NetCDF.

One solution is to 'realize' the data and then write it to disk, e.g.:

cube.compute().seisio.to_netcdf('cube_crop.seisnc')

But this is not applicable when the data are very large: the process would crash simply because there is not enough RAM to hold the whole cube in memory.
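One way to make the eager workaround safer is to compare the dataset's uncompressed size (xarray Datasets expose this as .nbytes) against a RAM budget before calling .compute(). The sketch below is illustrative only: the name safe_write and the 8 GiB default are hypothetical, not segysak API.

```python
# Illustrative guard for the .compute() workaround: refuse to realise a
# dask-backed cube whose uncompressed size exceeds a RAM budget.
# `safe_write` and the 8 GiB default are hypothetical, not segysak API.

def safe_write(cube, path, max_bytes=8 * 2**30):
    """Realise `cube` and write it to disk, but only if it fits in RAM.

    `cube` is expected to behave like an xarray Dataset: it exposes its
    total uncompressed size via `.nbytes` and supports `.compute()`.
    """
    if cube.nbytes > max_bytes:
        raise MemoryError(
            f"dataset is {cube.nbytes / 2**30:.1f} GiB, "
            f"over the {max_bytes / 2**30:.0f} GiB budget"
        )
    # For a seisnc cube this would be: cube.compute().seisio.to_netcdf(path)
    cube.compute().to_netcdf(path)
```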

Example:

cube = open_seisnc(local_data+'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube.seis.fill_cdpna()
cube = cube.sel(twt=slice(3800,5500))
cube.seisio.to_netcdf('cube_crop.seisnc')

Resulting error log copied from jupyter console:

In [7]: cube.seisio.to_netcdf('cube_crop.seisnc')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-0f071262ca7b> in <module>
----> 1 near.seisio.to_netcdf(local_data+'md4_near_cond_crop.seisnc')
~/miniconda3/envs/wrk/lib/python3.7/site-packages/segysak/_accessor.py in to_netcdf(self, seisnc, **kwargs)
     46         kwargs["engine"] = "h5netcdf"
     47 
---> 48         self._obj.to_netcdf(seisnc, **kwargs)
     49 
     50 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1566             unlimited_dims=unlimited_dims,
   1567             compute=compute,
-> 1568             invalid_netcdf=invalid_netcdf,
   1569         )
   1570 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1080         # to be parallelized with dask
   1081         dump_to_store(
-> 1082             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1083         )
   1084         if autoclose:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1126         variables, attrs = encoder(variables, attrs)
   1127 
-> 1128     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1129 
   1130 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    296         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    297         self.set_variables(
--> 298             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    299         )
    300 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    334             check = vn in check_encoding_set
    335             target, source = self.prepare_variable(
--> 336                 name, v, check, unlimited_dims=unlimited_dims
    337             )
    338 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    287                 dimensions=variable.dims,
    288                 fillvalue=fillvalue,
--> 289                 **kwargs,
    290             )
    291         else:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in create_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    499             group = group._require_child_group(k)
    500         return group._create_child_variable(keys[-1], dimensions, dtype, data,
--> 501                                             fillvalue, **kwargs)
    502 
    503     def _get_child(self, key):
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    474             h5ds = self._h5group[name]
    475             if _netcdf_dimension_but_not_variable(h5ds):
--> 476                 self._detach_dim_scale(name)
    477                 del self._h5group[name]
    478 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _detach_dim_scale(self, name)
    572             for n, dim in enumerate(var.dimensions):
    573                 if dim == name:
--> 574                     var._h5ds.dims[n].detach_scale(self._all_h5groups[dim])
    575 
    576         for subgroup in self.groups.values():
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5py/_hl/dims.py in detach_scale(self, dset)
    101         """
    102         with phil:
--> 103             h5ds.detach_scale(self._id, dset.id, self._dimension)
    104 
    105     def items(self):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5ds.pyx in h5py.h5ds.detach_scale()
h5py/defs.pyx in h5py.defs.H5DSdetach_scale()
RuntimeError: Unspecified error in H5DSdetach_scale (return value <0)
@trhallam trhallam added bug Something isn't working help wanted Extra attention is needed labels Sep 9, 2020
trhallam (Owner) commented Sep 9, 2020

Thanks @aadm. I'm not sure if this can be fixed, but I'll have a look. An alternative option for us, which is probably simpler, is to allow fill_cdpna() to be run during the conversion process; it would then be baked into the NetCDF4 file. It is a little frustrating that it is not easy to modify parts of a NetCDF4 file on disk.

One option I might get you to try: open the file without using dask (this won't load it straight away), run fill_cdpna(), and then use the to_netcdf() method with mode='a' to append the modified variables. Then try your cropping operation using dask.

So your example would be:

cube = open_seisnc(local_data+'cube.seisnc')
cube.seis.fill_cdpna()
cube.seisio.to_netcdf(local_data+'cube.seisnc', mode='a') # Note: this modifies the file in place, so you might want to back it up first.

cube = open_seisnc(local_data+'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube = cube.sel(twt=slice(3800,5500))
cube.seisio.to_netcdf('cube_crop.seisnc')

trhallam (Owner) commented
This has been addressed in v0.5 with lazy loading support and a reversion to xarray-native NetCDF methods.

Note: please use the new accessor namespace method .segysak.fill_cdpna().

Note: the current release candidate has a bug, but the final release will fix a uint downcasting issue that isn't compatible with xr.to_netcdf.
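Assuming the incompatibility mentioned above is with unsigned-integer dtypes, one generic workaround until the release lands is to cast any uint variables to a signed dtype before writing. A minimal sketch operating on a plain name-to-array mapping (the helper name is hypothetical, not segysak API):

```python
import numpy as np

# Hypothetical workaround sketch for the uint issue mentioned above: cast any
# unsigned-integer arrays to a signed dtype before handing them to to_netcdf.
# Operates on a plain name -> array mapping (e.g. dict(cube.variables)).

def downcast_uints(variables):
    """Return a copy of `variables` with unsigned ints cast to signed int64."""
    return {
        name: arr.astype(np.int64) if arr.dtype.kind == "u" else arr
        for name, arr in variables.items()
    }
```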
