Improvements and changes to ECDF plots #2309

Open
sethaxen opened this issue Feb 5, 2024 · 1 comment


sethaxen commented Feb 5, 2024

Tell us about it

Here are some proposed improvements to ECDF plots:

  • currently the code is quite PIT-focused. For example, the simulated confidence bands are implemented assuming the target distribution is (discrete) uniform, but this is not documented anywhere. We can generalize this method to support other distributions if the user provides both a cdf and an rvs function for the assumed distribution.
  • it is the only function that takes a keyword fpr. It would be more consistent with the rest of our API to take prob=1-fpr instead.
  • we should add the optimized confidence bands from https://doi.org/10.1007/s11222-022-10090-6, which are faster and more stable than the simulated ones.
  • we should consider removing values2. This is supposedly for ECDF comparison, but it's not as useful as the ECDF comparison plot from the paper, which we should consider making its own plot.
  • we should allow the user to specify the evaluation points. The theory behind the confidence bands assumes the evaluation points are independent of the sampled values, and the notebook linked below shows that setting the evaluation points based on the sampled values can make the bounds slightly too tight. Also, when plotting many ECDFs in the same plot, one would commonly want them to share evaluation points.
  • we should allow the user to provide a pre-computed confidence band. Alternatively, the confidence band could be its own plotting function. In cases like PIT and rank plots, where all subplots share the same comparison distribution and evaluation points, one would want to compute the confidence band once and use it for all subplots (even with optimization, computing the band is much more expensive than computing the ECDF).
  • When not plotting a band, we should default to using the sampled points as the evaluation points.
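To make a few of the points above concrete, here is a minimal, stdlib-only sketch of what a generalized interface might look like. It is not the current ArviZ implementation and not the optimized bands from the paper: it swaps in a simpler constant-width, Kolmogorov-Smirnov-style band calibrated by simulation. The names `ecdf_at` and `simulated_band`, their signatures, and the convention that `rvs` takes a random generator are all assumptions made for illustration. It shows user-supplied `cdf` and `rvs` functions, a `prob` keyword in place of `fpr`, evaluation points chosen independently of the sample, and a band that is computed once and reused.

```python
import bisect
import math
import random


def ecdf_at(sample, eval_points):
    """Evaluate the ECDF of `sample` at each of `eval_points`."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many sample values are <= x
    return [bisect.bisect_right(ordered, x) / n for x in eval_points]


def simulated_band(cdf, rvs, ndraws, eval_points, prob=0.95, nsim=1000, seed=0):
    """Constant-width simultaneous band around `cdf`, calibrated by simulation.

    Hypothetical helper: `cdf(x)` returns the target CDF at x and `rvs(rng)`
    returns one draw from the target distribution using `rng`.
    """
    rng = random.Random(seed)
    expected = [cdf(x) for x in eval_points]
    max_devs = []
    for _ in range(nsim):
        sim = [rvs(rng) for _ in range(ndraws)]
        sim_ecdf = ecdf_at(sim, eval_points)
        # largest deviation from the target CDF over the evaluation points
        max_devs.append(max(abs(e - c) for e, c in zip(sim_ecdf, expected)))
    max_devs.sort()
    # smallest half-width that contains at least a fraction `prob`
    # of the simulated ECDFs at every evaluation point
    idx = min(nsim - 1, max(0, math.ceil(prob * nsim) - 1))
    half_width = max_devs[idx]
    lower = [max(0.0, c - half_width) for c in expected]
    upper = [min(1.0, c + half_width) for c in expected]
    return lower, upper


# Usage: standard exponential target, with a fixed grid chosen
# independently of any sample.
grid = [0.25 * i for i in range(21)]  # 0.0, 0.25, ..., 5.0
lower, upper = simulated_band(
    cdf=lambda x: 1.0 - math.exp(-x),
    rvs=lambda rng: rng.expovariate(1.0),
    ndraws=50,
    eval_points=grid,
    prob=0.95,
)

# The band is computed once and reused for several samples of the same size.
sample_rng = random.Random(1)
for _ in range(3):
    sample = [sample_rng.expovariate(1.0) for _ in range(50)]
    values = ecdf_at(sample, grid)
    inside = all(lo <= v <= hi for lo, v, hi in zip(lower, values, upper))
    print(inside)
```

Because the band depends only on the target distribution, the number of draws, and the evaluation points, it can be shared across subplots, which is the case described for PIT and rank plots.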

Thoughts on implementation

This notebook implements the bands and tests them on a few distributions. It also compares different methods of selecting the evaluation points.


sethaxen commented Feb 6, 2024

Also, when the evaluation points are different from the sample points, it feels a little weird to use step plots. Step plots give the sense that we know what the function values are between the points, but unlike with the full ECDF, we don't. Line plots aren't much better, but we're more accustomed to lines in line plots not necessarily implying interpolation. I wonder if we should reserve step plots for the cases where eval points and sample points are the same. (Edit: or at least make stepping configurable.)
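One way the configurable stepping could look, as a rough sketch: the helper name `choose_drawstyle` and its `step` argument are hypothetical, not an existing ArviZ API. The returned strings are standard matplotlib `drawstyle` values.

```python
def choose_drawstyle(sample, eval_points, step=None):
    """Pick a matplotlib drawstyle name for an ECDF curve.

    Hypothetical helper. With step=None, step only when the evaluation
    points are exactly the sample points; step=True or step=False
    overrides that default.
    """
    if step is None:
        step = sorted(sample) == sorted(eval_points)
    return "steps-post" if step else "default"
```

The result could then be passed as the `drawstyle` keyword of matplotlib's `Axes.plot`.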
