Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_cdf/TimeSeries crashes if CDF variable doesn't have a 'FILLVAL' attribute #7565

Open
jgieseler opened this issue Apr 9, 2024 · 1 comment
Labels
Affects Dev An issue/bug that affects master Affects Release An issue/bug that affects a released version (use a version label too) Bug Probably a bug Effort Low Requires a small time investment Good First Issue The best issues for new people to tackle! io/CDF Issue with CDF files io Affects the io submodule Package Novice Requires little knowledge of the internal structure of SunPy Priority High Rapid action required timeseries Affects the timeseries submodule

Comments

@jgieseler
Copy link
Member

Describe the bug

Similar to #5907, I stumbled upon CDF files that aren't fully complying to the CDF ISTP Metadata Guidelines by having a variable that doesn't provide a corresponding FILLVAL attribute. read_cdf() is expecting this attribute to exist, and otherwise crashes (and with it also TimeSeries):

sunpy/sunpy/io/_cdf.py

Lines 83 to 87 in 9f0a37d

# Set fillval values to NaN
# It would be nice to properley mask these values to work with
# non-floating point (ie. int) dtypes, but this is not possible with pandas
if np.issubdtype(data.dtype, np.floating):
data[data == attrs['FILLVAL']] = np.nan

But as has been mentioned in #5907 (comment), it's not uncommon that a CDF file is not fully following the ISTP standard (these are by the way recent PSP data files, so I'll anyhow contact the instrument team). If the FILLVAL information isn't provided by the instrument team, one could just skip the replacement call. So I propose to change if np.issubdtype(data.dtype, np.floating): to something like if np.issubdtype(data.dtype, np.floating) and 'FILLVAL' in attrs:

To Reproduce

Because the error message occurs in read_cdf when using TimeSeries, this is what I'm calling here directly:

import datetime as dt
from sunpy.io._cdf import read_cdf
from sunpy.net import Fido
from sunpy.net import attrs as a
from sunpy.timeseries import TimeSeries

dataset = 'PSP_SWP_SPI_SF00_L3_MOM'
trange = a.Time(dt.date(2023,3,14), dt.date(2023,3,15))
result = Fido.search(trange, a.cdaweb.Dataset(dataset))
downloaded_files = Fido.fetch(result)
# ts = TimeSeries(downloaded_files, concatenate=True)
data = read_cdf(downloaded_files.data[0])

Screenshots

No response

System Details

  • sunpy version: 5.1.1 and 5.1.2

Installation method

pip

@jgieseler
Copy link
Member Author

jgieseler commented Apr 9, 2024

I'm just noticing that the corresponding CDF variables have as the VAR_TYPE attribute 'ignore_data'. According to the ISTP Guidelines this is a placeholder, but I'm wondering if the instrument team actually had the intention that these variables should be skipped. 🤔

@nabobalis nabobalis added Bug Probably a bug Package Novice Requires little knowledge of the internal structure of SunPy Affects Dev An issue/bug that affects master Affects Release An issue/bug that affects a released version (use a version label too) Priority High Rapid action required Effort Low Requires a small time investment io Affects the io submodule timeseries Affects the timeseries submodule Good First Issue The best issues for new people to tackle! io/CDF Issue with CDF files labels Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Affects Dev An issue/bug that affects master Affects Release An issue/bug that affects a released version (use a version label too) Bug Probably a bug Effort Low Requires a small time investment Good First Issue The best issues for new people to tackle! io/CDF Issue with CDF files io Affects the io submodule Package Novice Requires little knowledge of the internal structure of SunPy Priority High Rapid action required timeseries Affects the timeseries submodule
Projects
None yet
Development

No branches or pull requests

2 participants