Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: iris pp treats standard calendar as a proleptic gregorian and ignores real proleptic gregorian calendars #3561

Open
cpelley opened this issue Nov 26, 2019 · 23 comments · May be fixed by #5138

Comments

@cpelley
Copy link

cpelley commented Nov 26, 2019

if time_coord is not None:
if time_coord.units.calendar == "360_day":
pp.lbtim.ic = 2
elif time_coord.units.calendar == "standard":
pp.lbtim.ic = 1
elif time_coord.units.calendar == "365_day":
pp.lbtim.ic = 4

Standard calendar cannot be treated as a Proleptic Gregorian for dates 1582-01-01
More context here: ancil/ticket #1159

UM F03 more recently clarified that the calendar it expects isn't Standard but a Proleptic Gregorian. Not only is iris pp save ignoring time coordinates with proleptic_gregorian calendars entirely, it also allows saving Standard dates prior to 1582-01-01 (as though they are proleptic gregorian dates when they are not).

@github-actions
Copy link
Contributor

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

@github-actions github-actions bot added the Stale A stale issue/pull-request label Aug 11, 2021
@nhsavage
Copy link
Contributor

this is still important but I don't have any time to fix it myself

@rcomer rcomer removed the Stale A stale issue/pull-request label Aug 12, 2021
@wjbenfold
Copy link
Contributor

wjbenfold commented Feb 3, 2022

If I understand this issue and the PP spec correctly, it isn't possible to save a PP file with a non-Proleptic Gregorian calendar. Possibly behaviours for Iris when asked to save a cube with a Standard calendar seems limited to

  • leave the calendar unconverted and don't warn (current behaviour)
  • leave the calendar unconverted and warn
  • throw error (bad, breaks valid existing code)
  • warn then convert to Proleptic Gregorian calendar then save (breaking change)
  • silently convert to Proleptic Gregorian calendar then save (breaking change)
  • two or more of the above behaviours, selected by a flag (which would need a default)

Are there other alternatives? Which of these would be preferred?

@nhsavage
Copy link
Contributor

nhsavage commented Feb 3, 2022

taking this back to basics. The problem is that the orginal UMDP F3 which was the basis of the pp load and save rules was somewhat lose in its teminology and referrred to the UM's calendar as a Standard calendar in one place and Proleptic Gregorian somewhere else. On closer inspection, I determined that it is in fact Proleptic Greogrian. However, the pp load and save rules are treating it as Standard

This means there is a symetrical error here:
https://github.com/SciTools/iris/blob/main/lib/iris/fileformats/pp.py#L1064

I thefore think there are actually two changes needed here which should cancel out in all round trip pp tests:

in pp.py change

calendar = cf_units.CALENDAR_STANDARD
to
calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

@nhsavage
Copy link
Contributor

nhsavage commented Feb 3, 2022

assuming a round trip with pp data is unchanged, then there are two other cases:

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results
  2. loading from other file formats which state that the data has Standard calendar - this is the hard one.
    a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars
    b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.

@nhsavage
Copy link
Contributor

nhsavage commented Feb 3, 2022

What I could probably do is make the above changes (except the conversion of standard calendar) and then see which, if any, of the tests break

@wjbenfold
Copy link
Contributor

SciTools/cf-units#203 currently needs parallel changes in Iris, which seem quite relevant here - notably that pp files don't handle cf's "standard" calendar

@wjbenfold
Copy link
Contributor

wjbenfold commented Feb 3, 2022

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

  1. loading from other file formats which state that the data is Standard calendar - this is the hard one.
    a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars
    b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.

I'd be happy converting to the right calendar and then doing the save without asking the user. I think this is breaking in the same way as the other effect - it invalidates workarounds / changes results that should be reproducible.

@rcomer
Copy link
Member

rcomer commented Feb 3, 2022

Sorry I’ve not read this thread in detail, but SciTools/cf-units#185 seems relevant, and I understand @larsbarring already has a branch for it.

@rcomer
Copy link
Member

rcomer commented Feb 17, 2022

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

I think this will break more than just saving to other formats. For example, if you plot a time series and your calendar is "standard", the values are converted to datetime.datetime and matplotlib handles them out of the box. For any other calendar, we rely on nc-time-axis, where features such as date-locators and -formatters are (probably?) not as mature. So we risk breaking a lot of plotting code.

iris/lib/iris/plot.py

Lines 590 to 608 in 75c7570

if coord.units.calendar == "standard":
r = [datetime.datetime(*date) for date in dates]
else:
try:
import nc_time_axis # noqa: F401
except ImportError:
msg = (
"Cannot plot against time in a non-standard "
'calendar, because "nc_time_axis" is not available : '
"Install the package from "
"https://github.com/SciTools/nc-time-axis to enable "
"this usage."
)
raise IrisError(msg)
r = [
cftime.datetime(*date, calendar=coord.units.calendar)
for date in dates
]

For many (most?) UM applications, the data won't go back far enough for the standard/proleptic_gregorian distinction to be meaningful, so I think it would be good to permanently retain the option of cubes loaded from PP having the "standard" calendar.

@nhsavage
Copy link
Contributor

it sounds like this is an issue we will have to live with. Trying to make this backward compatible seems very hard but there is no consensus to make it a breaking change, so we will have to live with it. I propose to close this issue as "won't be fixed"

@rcomer
Copy link
Member

rcomer commented Feb 17, 2022

As I understand it, we are only just now starting to have climate configurations with a proper calendar, whereas historically they were 360_day. So the issue could start to become a genuine problem where it wasn't before (though I'm not sure how far back even the climate experiments go).

@wjbenfold
Copy link
Contributor

We make breaking changes sometimes, and with good reasons. This will be a useful place to start if in future it becomes more important to match the actual meaning of pp dates.

In the meantime, would a warning be helpful, so that users this is affecting have a pointer towards where the bug is coming from in their code that assumes Iris does this correctly?

@nhsavage
Copy link
Contributor

No one ever looks at warnings.

@rcomer
Copy link
Member

rcomer commented Feb 25, 2022

I've just learned something...

from cf_units import Unit

u = Unit("days since 1970-01-01", calendar="standard")
gap_dates = range(-141430, -141420)
print(u.num2date(gap_dates))

shows Julian/Gregorian gap as expected

[cftime.DatetimeGregorian(1582, 10, 2, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 3, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 4, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 15, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 16, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 17, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 18, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 19, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 20, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 21, 0, 0, 0, 0, has_year_zero=False)]

but

print(u.num2pydate(gap_dates))

shows no gap, because the datetime library is actually Proleptic Gregorian.

datetime assumes the current Gregorian calendar extended in both directions

[real_datetime(1582, 10, 12, 0, 0) real_datetime(1582, 10, 13, 0, 0)
 real_datetime(1582, 10, 14, 0, 0) real_datetime(1582, 10, 15, 0, 0)
 real_datetime(1582, 10, 16, 0, 0) real_datetime(1582, 10, 17, 0, 0)
 real_datetime(1582, 10, 18, 0, 0) real_datetime(1582, 10, 19, 0, 0)
 real_datetime(1582, 10, 20, 0, 0) real_datetime(1582, 10, 21, 0, 0)]

which rather points to the plotting code being wrong. Possibly there are other implications but my brain hasn't got there yet...

@rcomer
Copy link
Member

rcomer commented Oct 25, 2022

I wonder if we could actually have our cake and eat it here. Noting that

  • 1st January 1970 is the same date in the Proleptic Gregorian and Standard calendars
  • Any time point expressed as hours since 1st January 1970 is also therefore the same in the Proleptic Gregorian and Standard calendars

So, once we have converted our pp time headers into a coordinate with unit "hours since 1970-01-01 00:00:00", it is somewhat arbitrary whether we label it "proleptic_gregorian" or "standard". Would it therefore be reasonable to use the Proleptic Gregorian calendar to do the conversion from pp-header to coordinate, but still label the coordinate with the "standard" calendar? It would be a bit of a fudge, but would mean any change in behaviour would be limited to dates before 15th October 1582, and therefore a bugfix.

Based on #3561 (comment), pp-saving could use num2pydate instead of num2date. Then the saved dates would be Proleptic Gregorian regardless of which real-world calendar was on the coordinate.

@nhsavage
Copy link
Contributor

I have to admit, I am now getting a bit lost with all of this. It came about because I needed to convert some netCDF data from the ACCESS model to ancils. That model stored the date with a time of "days since 1-1-1 0:00" or similar and when it was saved to ancil, the time was shifted. I worked around this by converting the time units to be post 1582. I am not sure if this fix would still be needed with your proposed cake fix?

@rcomer
Copy link
Member

rcomer commented Oct 27, 2022

Hmmm. It looks like num2pydate won't even work if you have "days since 1-1-1 0:00"

import cf_units
tunit = cf_units.Unit("days since 1-1-1 0:00")
tunit.num2pydate(42)
--> 573 dates = cftime.num2date(time_values, **num2date_kwargs)
    574 try:
    575     # We can assume all or none of the dates have a microsecond attribute
    576     microseconds = np.array([d.microsecond if d else 0 for d in dates])

File src/cftime/_cftime.pyx:504, in cftime._cftime.num2date()

ValueError: illegal calendar or reference date for python datetime

But since cf_units v3.1 we can do

new_unit = tunit.change_calendar("proleptic_gregorian")
print(new_unit.num2date(0))
0000-12-30 00:00:00

So maybe pp-saving should enforce a Proleptic Gregorian calendar before doing its conversions to headers.

I need more caffeine...

@rcomer rcomer changed the title BUG: iris pp treats gregorian as a proleptic gregorian and ignores real proleptic gregorian calendars BUG: iris pp treats standard calendar as a proleptic gregorian and ignores real proleptic gregorian calendars Nov 2, 2022
@rcomer
Copy link
Member

rcomer commented Nov 2, 2022

Noting that the calendar previously called "gregorian" in the cf-conventions is now called "standard", I've updated the comments through this issue to hopefully reduce confusion.

@trexfeathers
Copy link
Contributor

trexfeathers commented Jan 18, 2023

This no longer needs a team discussion of the principles, it needs a concrete proposal for people to discuss. So something to assign as a project for someone in future. If we can't find someone to assign, I will close this (unless it gets enough votes).

@nhsavage
Copy link
Contributor

nhsavage commented Jan 18, 2023

proposal

in pp.py change

calendar = cf_units.CALENDAR_STANDARD
to
calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

then we have to decide how to handle files in a standard a calendar

option 1: error
option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582

@nhsavage
Copy link
Contributor

option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582
I think this is the best solution here.

@rcomer rcomer linked a pull request Jan 18, 2023 that will close this issue
@rcomer
Copy link
Member

rcomer commented Jan 18, 2023

I have stuck up a PR based on my comments above at #5138

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

5 participants