Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT:update boost to fix skewnorm.ppf #20643

Merged
merged 5 commits into from May 6, 2024
Merged

Conversation

dschmitz89
Copy link
Contributor

@dschmitz89 dschmitz89 commented May 4, 2024

Reference issue

Supersedes #20125
Closes #20124

What does this implement/fix?

The skewnorm quantile calculation was fixed upstream, so this updates boost to the latest commit of the development branch.

Additional information

CC @maresb in case you have more stress tests for the skewnorm distribution.

I added the backport-candidate label as this is a bug fix but feel free to ignore @tylerjereddy as boost had many moving parts lately. Might be worth to run wheel tests as boost was prone to subtle bugs, will leave the decision to people more involved in that though.

@github-actions github-actions bot added scipy.stats scipy._lib maintenance Items related to regular maintenance tasks labels May 4, 2024
@dschmitz89 dschmitz89 added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 4, 2024
@tylerjereddy tylerjereddy added this to the 1.14.0 milestone May 4, 2024
@maresb
Copy link
Contributor

maresb commented May 5, 2024

Thanks a lot @dschmitz89 for following up on this! I commented in boostorg/math#1120 (comment), but I'm hopeful that your work here will be helpful for my current applications.

The way scipy patches Boost's skewnorm CDF here is really helpful. But unfortunately if we want to use Boost's PPF solver, that will use Boost's unstable CDF function, so it will be plagued by the upstream inaccuracies.

from scipy.stats import skewnorm
from scipy.stats._boost import _skewnorm_cdf
from scipy import __version__

__version__  # '1.10.0'
skewnorm.cdf(-3, 5)  # 4.136670779725204e-55
_skewnorm_cdf(-3, 0, 1, 5)  # 2.3852447794681098e-18

With Mathematica, N[CDF[SkewNormalDistribution[0, 1, 5], -3], 20] gives 4.1366707600635155404e-55, so we see that scipy's numerical integration is very close, while Boost is way off.

I think I know how to stably compute the CDF without numerical integration, but I'm not sure. I'd need to first implement it and test it thoroughly. I'd like to contribute it upstream to Boost and get rid of the numerical integration fallback here, but I'm not sure when I'll be able to get around to it.

@dschmitz89
Copy link
Contributor Author

Fixing boost's CDF would be very helpful, thanks for looking into that still. Do you agree that the update for the PPF here is an improvement already?

@maresb
Copy link
Contributor

maresb commented May 5, 2024

I don't have the time to investigate in-depth, but you seem to have fixed my original test case to fairly good precision. Barring any regressions, which hopefully the test suite mostly prevents, this appears to me like a substantial improvement! 🎉

Thanks again @dschmitz89!

@maresb
Copy link
Contributor

maresb commented May 5, 2024

For my current application I only need to determine the PPF of roughly the first percentile. Boost's CDF should be safe as long as the quantile can be expressed as the difference between two floats of order 1. I think that's what this achieves. It'd be really excellent if this works! 🤞

@dschmitz89
Copy link
Contributor Author

I went ahead and tested the wheels: all pass, one test failure is unrelated.

@mdhaber
Copy link
Contributor

mdhaber commented May 5, 2024

Is it the change to Boost that made the wheel builder tests happen? NM. I see the commment ; )

@mdhaber
Copy link
Contributor

mdhaber commented May 6, 2024

@dschmitz89 This update seems to fix gh-19348 and gh-18906 on my end, too (or they were already fixed). Could you check on your side, and if so add a test?

Update - seems like they were already fixed by something? I'm closing those issues. If you can confirm locally that they appear to be resolved by running the tests mentioned in my last posts (in each issue), I'd appreciate it!

Co-authored-by: Ben Mares <services-git-throwaway1@tensorial.com>
@dschmitz89
Copy link
Contributor Author

dschmitz89 commented May 6, 2024

@dschmitz89 This update seems to fix gh-19348 and gh-18906 on my end, too (or they were already fixed). Could you check on your side, and if so add a test?

Update - seems like they were already fixed by something? I'm closing those issues. If you can confirm locally that they appear to be resolved by running the tests mentioned in my last posts (in each issue), I'd appreciate it!

Good point, I had already forgotten about those. I can confirm that both issues are resolved in main, probably due to the Boost update in #20393 .

@mdhaber
Copy link
Contributor

mdhaber commented May 6, 2024

My only hesitation with merging here is the submodule update, since people tend to find it disruptive and it was already updated quite recently. Would you like this to merge now, or can it wait until release time (at which point we might update to latest Boost)?

@dschmitz89
Copy link
Contributor Author

I agree. For the sake of not annoying contributors shortly before a release, let's delay until after the 1.14 release.

@mdhaber
Copy link
Contributor

mdhaber commented May 6, 2024

Oh, actually, I still wanted this to make it into 1.14. Let's just ask others who might have an opinion:

@tylerjereddy @rgommers this update to the Boost submodule would close gh-20124. There was just an update to the Boost submodule in gh-20393, and I know we don't like submodule updates to be too frequent. When would be the best time to merge this - now, wait a bit but merge before 1.14 branches, or wait until after 1.14 branches?

@rgommers
Copy link
Member

rgommers commented May 6, 2024

"it fixes a bug" is a good enough reason to bump a submodule I'd say.

@mdhaber mdhaber merged commit ee8c655 into scipy:main May 6, 2024
@mdhaber
Copy link
Contributor

mdhaber commented May 6, 2024

This submodule bump fixes a bug, so merging.

@dschmitz89 dschmitz89 deleted the boost_update branch May 7, 2024 12:11
@tylerjereddy tylerjereddy modified the milestones: 1.14.0, 1.13.1 May 15, 2024
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 15, 2024
@tylerjereddy tylerjereddy modified the milestones: 1.13.1, 1.14.0 May 15, 2024
@tylerjereddy
Copy link
Contributor

The backport label has been removed because on MacOS ARM with clang-16.0.6 I saw additional test failures with this branch cherry-picked in to the release branch (not just for the new test added here).

tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request May 15, 2024
* MAINT: stats: update boost to fix improve `skewnorm.ppf`
tylerjereddy added a commit to tylerjereddy/scipy that referenced this pull request May 15, 2024
@maresb
Copy link
Contributor

maresb commented May 16, 2024

Hey, great news, I just wanted to mention that I've been using a dev version of scipy from right after this merge. Previously I experienced a failure rate of roughly one in a million from my dataset. Now, after billions of invocations, I haven't seen a single failure. This is awesome, thanks so much everyone!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy._lib scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: stats.skewnorm.ppf returns wrong values with moderately high skew parameters
5 participants