Extend scope of `alignment="same_verifs"` #699

dougiesquire · 2021-12-03T02:21:52Z

The "same_verifs" alignment generates a list of times from verif that are present in forecast at any init but all leads. This list will always be empty when the init frequency is lower than the lead frequency. Is there scope to extend "same_verifs" to instead deal appropriately with such cases? I'll try to give a concrete example of what I mean below.

Consider the following hindcasts:

```python
import cftime
import climpred
import numpy as np
import xarray as xr

# Hindcasts initialised every year with monthly lead
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"
hind = climpred.utils.add_time_from_init_lead(hind)
```

I currently can't use "same_verifs" with this data because there are no verification times common to all leads.

But users may still want to align based on a common verification period. In this example, the `valid_time`s 2001-01-01 and 2002-01-01 are available at all the leads at which they can occur (leads 0 and 12 months). Similarly:

- 2001-02-01 and 2002-02-01 are available at leads 1 and 13 months,
- 2001-03-01 and 2002-03-01 are available at leads 2 and 14 months,
- ...
- 2001-12-01 and 2002-12-01 are available at leads 11 and 23 months.

That is, by performing verification over the period 2001-01-01 to 2002-12-01, one includes:

- the same dates at each lead where possible, given the init/lead frequencies
- the same number of samples at each lead

```python
period = [cftime.DatetimeGregorian(2001, 1, 1), cftime.DatetimeGregorian(2002, 12, 1)]

hind.where(
    np.logical_and(hind["valid_time"] >= period[0], hind["valid_time"] <= period[1])
).plot()
```
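As a quick sanity check of the two bullet points above, the sketch below counts the samples per lead inside 2001-01 to 2002-12 in plain Python rather than climpred. It assumes `valid_time = init + lead` with leads in months and encodes each month as a single integer; `month_index` and `samples_per_lead` are hypothetical helpers for illustration only.

```python
def month_index(year, month):
    """Encode a calendar month as one integer (months since year 0)."""
    return year * 12 + (month - 1)


def samples_per_lead(inits, leads, start, end):
    """For each lead, count the inits whose valid time lies in [start, end].

    `inits`, `start` and `end` are month indices; leads are in months and the
    valid time is assumed to be init + lead.
    """
    return {lead: sum(start <= init + lead <= end for init in inits) for lead in leads}


# Annual inits (2000, 2001, 2002) with monthly leads 0..23, as in the example.
inits = [month_index(year, 1) for year in (2000, 2001, 2002)]
counts = samples_per_lead(inits, range(24), month_index(2001, 1), month_index(2002, 12))
print(set(counts.values()))  # {2}: every lead has two samples
```

Every lead keeps exactly two of the three inits, so the sample count is balanced even though no single date is verified at all 24 leads.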

How do folks feel about restructuring `climpred.alignment._same_verifs_alignment()` to use the alignment dates from the example above? We would obviously do this such that the current behaviour is preserved for datasets that have common verification times across all leads.


aaronspring · 2021-12-04T16:33:34Z

#702 will help to visualize the discussion

aaronspring · 2021-12-04T23:28:56Z

Thank you for this extension proposal, @dougiesquire.

In #702, I played around with your use case, and indeed `same_verifs` doesn't work here:

```python
import climpred
import numpy as np
import xarray as xr

# Hindcasts initialised every year (2000-2002) with monthly leads 0-23
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"

# Monthly observations covering every valid_time of the hindcasts
time = xr.cftime_range(
    start="2000-01-01", periods=len(init) * 12 + len(lead), freq="MS"
)
data = np.random.random(len(time))
obs = xr.DataArray(data, coords=dict(time=time), dims="time", name="var")

h = climpred.HindcastEnsemble(hind).add_observations(obs)
h.coords["valid_time"]

h.plot()

h.plot_alignment()
```

Some comments:

> common verification period

> We would obviously do this such that the current behaviour is preserved for datasets that have common verification times across all leads.

What about a new alignment method, `same_period` or a better name? This way `same_verifs` can stay as is. As long as it is clearly documented and distinguishable, anything works for me.

> the same number of samples at each lead

That's what `same_inits` and `same_verifs` ensure and `maximize` ignores.

I still don't quite understand what this new alignment would look like. Would it essentially take 12 of the 24 lead months and slide from earlier leads at later inits to later leads at earlier inits? (Does the 12 depend on other specifics, or is it because there are 12 months in a year?)
In `plot_alignment`, the new approach would leave white space (= no verification dates) in the lower-left corner (early inits, short leads) and the upper-right corner (late inits, long leads).
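To make the "white space" picture concrete, here is a rough sketch that marks which (init, lead) cells of the opening example fall inside the 2001-01 to 2002-12 period (`#` kept, `.` dropped). It assumes `valid_time = init + lead` in months and uses a hypothetical integer month encoding; it does not call any climpred API.

```python
def month_index(year, month):
    """Encode a calendar month as one integer (months since year 0)."""
    return year * 12 + (month - 1)


inits = {year: month_index(year, 1) for year in (2000, 2001, 2002)}  # annual inits
leads = range(24)  # monthly leads 0..23
start, end = month_index(2001, 1), month_index(2002, 12)

# kept[year] lists the leads whose valid time (init + lead) lies in [start, end].
kept = {
    year: [lead for lead in leads if start <= init + lead <= end]
    for year, init in inits.items()
}

for year, lead_list in kept.items():
    row = "".join("#" if lead in lead_list else "." for lead in leads)
    print(year, row)
```

The earliest init loses its first 12 leads and the latest init loses its last 12, which are the two empty corners described above; the number of leads itself is unchanged.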

@bradyrx thoughts (on a new alignment)?

aaronspring · 2021-12-06T21:36:33Z

So this alignment would be the first where the number of leads gets reduced.
I am still unsure what this approach means for the interpretation of `lead` in the results.

dougiesquire · 2021-12-06T22:30:21Z

Sorry, I think my description is unclear, and I'm not sure I've fully thought through my suggestion. I don't mean to suggest that the number of leads should be reduced.

I'm proposing an alignment that finds the maximum period that:

- maintains equal numbers of samples at each lead
- includes the same verification dates at each lead where possible

All valid_times that fall within this period would then be used. For hindcasts that have the same verification dates at every lead (e.g. where the lead is annual) this would be equivalent to "same_verifs". However, in cases like the one above (where the init frequency is lower than the lead frequency) a different set of verification dates may be used at one lead relative to another lead.

Consider the following examples, with four hindcasts each (one row per init, latest init at the top):

init freq: 3 months, lead freq: 3 months:

| lead 0 | lead 1 | lead 2 |
|---|---|---|
| 2001-10 | 2002-01 | 2002-04 |
| 2001-07 | 2001-10 | 2002-01 |
| 2001-04 | 2001-07 | 2001-10 |
| 2001-01 | 2001-04 | 2001-07 |
Here the period that satisfies the above conditions is 2001-07 -> 2001-10. Keeping everything in this range is equivalent to what "same_verifs" currently does.
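A small sketch of this first example lists the verification dates kept at each lead within 2001-07 to 2001-10. It assumes `valid_time = init + 3 * lead` months (leads counted in 3-month steps) and uses hypothetical `month_index`/`label` helpers; every lead ends up with the same two dates, which is why this case reduces to "same_verifs".

```python
def month_index(year, month):
    """Encode a calendar month as one integer (months since year 0)."""
    return year * 12 + (month - 1)


def label(idx):
    """Turn a month index back into a 'YYYY-MM' string."""
    return f"{idx // 12}-{idx % 12 + 1:02d}"


inits = [month_index(2001, m) for m in (1, 4, 7, 10)]  # 3-monthly inits
leads = [0, 1, 2]  # lead steps of 3 months each
start, end = month_index(2001, 7), month_index(2001, 10)

# Verification dates inside [start, end] for each lead.
verifs = {
    lead: sorted(label(i + 3 * lead) for i in inits if start <= i + 3 * lead <= end)
    for lead in leads
}
print(verifs)
# {0: ['2001-07', '2001-10'], 1: ['2001-07', '2001-10'], 2: ['2001-07', '2001-10']}
```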
init freq: 3 months, lead freq: 1 month:

| lead 0 | lead 1 | lead 2 | lead 3 |
|---|---|---|---|
| 2001-10 | 2001-11 | 2001-12 | 2002-01 |
| 2001-07 | 2001-08 | 2001-09 | 2001-10 |
| 2001-04 | 2001-05 | 2001-06 | 2001-07 |
| 2001-01 | 2001-02 | 2001-03 | 2001-04 |

Here the period that satisfies the above conditions is 2001-04 -> 2001-10. This case would currently fail with "same_verifs" because the combination of init and lead frequencies means that we can never get the same verification dates at all leads.

Does this make sense?

aaronspring · 2021-12-06T23:12:12Z

Thanks @dougiesquire. Now I get your approach: `valid_time`s do not need to match across leads, but must lie between a lower and an upper bound.
It reminds me a bit of `sel(method='nearest')`, but with upper and lower bounds.

Note: For your example to work, you definitely need monthly observations.

So for your second example, the ~~struck-through~~ entries are not verified:

init freq: 3 months, lead freq: 1 month:

| lead 0 | lead 1 | lead 2 | lead 3 |
|---|---|---|---|
| 2001-10 | ~~2001-11~~ | ~~2001-12~~ | ~~2002-01~~ |
| 2001-07 | 2001-08 | 2001-09 | 2001-10 |
| 2001-04 | 2001-05 | 2001-06 | 2001-07 |
| ~~2001-01~~ | ~~2001-02~~ | ~~2001-03~~ | 2001-04 |

The number of samples isn't equal, but IMO it won't differ by more than ±1. Taking 2001-03 to 2001-11 gives three samples at each lead.
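These counts can be checked with a short plain-Python sketch of the second example (3-monthly inits, monthly leads 0 to 3). It assumes `valid_time = init + lead` months and uses a hypothetical `samples_per_lead` helper; it is not climpred code.

```python
def month_index(year, month):
    """Encode a calendar month as one integer (months since year 0)."""
    return year * 12 + (month - 1)


def samples_per_lead(inits, leads, start, end):
    """Number of inits per lead whose valid time (init + lead) is in [start, end]."""
    return [sum(start <= i + lead <= end for i in inits) for lead in leads]


inits = [month_index(2001, m) for m in (1, 4, 7, 10)]  # 3-monthly inits
leads = range(4)  # monthly leads 0..3

print(samples_per_lead(inits, leads, month_index(2001, 4), month_index(2001, 10)))  # [3, 2, 2, 3]
print(samples_per_lead(inits, leads, month_index(2001, 3), month_index(2001, 11)))  # [3, 3, 3, 3]
```

The 2001-04 to 2001-10 period leaves leads 1 and 2 one sample short, while widening it by one month on each side balances all four leads.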

I'd still prefer a new alignment keyword, maybe `same_verifs_nearest` or `same_verifs_fill`?

Would you lead a PR? The entry point is `climpred/climpred/alignment.py` (line 125 in f6e05d1):

```python
def _same_verifs_alignment(init_lead_matrix, valid_inits, all_verifs, leads, n, freq):
```

I am happy to give feedback and test.

dougiesquire · 2021-12-06T23:30:10Z

> Note: For your example to work, you definitely need monthly observations.

Yes, exactly. Sorry, I should have made that clearer.

> The number of samples isn't equal, but IMO it won't differ by more than ±1.

Good point. I messed that up, sorry. Now I realise there isn't a single solution to the constraints I've posed.

I think there'd be value in an alignment something like the one I'm suggesting, but I still need to work out the best approach for climpred in my head. In the past, for my own work, I've just specified a period over which to verify and kept all dates within that period, choosing the period judiciously to make sure there are equal numbers of samples at each lead.

Happy to open a PR where I can flesh this out a little better, but it might take me a little while to get to it, sorry.

dougiesquire added the feature request label Dec 3, 2021

aaronspring changed the title ~~Extend scope of "same_verifs" alignment~~ Extend scope of alignment="same_verifs" Aug 23, 2022
