
DDtheta: numerical stability of small angular separation in float32 #296

Open
JonLoveday opened this issue Jun 22, 2023 · 14 comments
@JonLoveday

General information

  • Corrfunc version: 2.4.0
  • platform: MacOS
  • installation method (pip/source/other?): pip

Issue description

When I perform pair counts on a random distribution with float32 precision using Corrfunc.mocks.DDtheta_mocks with 20 log bins between 0.01 and 10 degrees, I always get zero pairs in the second bin. Using float64 alleviates this problem. I'm not sure if this is a bug as such, but users should be strongly advised to only use float64 coordinates.

Expected behavior

Would not expect significantly different results between float32 and float64 coords for angular separations > 0.01 deg.

Actual behavior

Zero pair counts in second bin using float32.

What have you tried so far?

Using float64 precision avoids this problem. In fact, pair counts in all bins change.

Minimal failing example

import Corrfunc
import numpy as np
from numpy.random import default_rng

rng = default_rng()

def corrfunc_test(nran=10000, tmin=0.01, tmax=10, nbins=20):
    """Random-random pair count test."""
    bins = np.logspace(np.log10(tmin), np.log10(tmax), nbins + 1)
    ra = 20*rng.random(nran)
    dec = 20*rng.random(nran)

    # float64 coordinates
    counts = Corrfunc.mocks.DDtheta_mocks(1, 1, bins, ra, dec)
    print(counts)

    # float32 coordinates: the second bin comes back with zero pairs
    counts = Corrfunc.mocks.DDtheta_mocks(1, 1, bins, ra.astype('float32'), dec.astype('float32'))
    print(counts)

corrfunc_test()

@lgarrison
Collaborator

At first I thought this might be due to acos precision, but now I don't think it is.

@manodeep I think the method we're using to compute delta-theta is not numerically stable. We're doing

DOUBLE costheta = (one - half*sqr_chord_sep);

where sqr_chord_sep will be approximately (0.02/180*np.pi)**2 = 1.2e-7 for an angular separation of 0.02 deg. The costheta subtraction then rounds to 1 at float32 precision (the float32 spacing just below 1 is 2^-24 ≈ 6e-8), unsurprisingly. It seems to me we should be able to do better than this.
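The rounding can be reproduced in a few lines of NumPy (a sketch of the float32 arithmetic only, not Corrfunc's actual kernel code; 0.01 deg is the lower edge of the failing bins in the report):

```python
import numpy as np

# Squared chord separation for theta = 0.01 deg; 4*sin^2(theta/2) ~ theta^2
# for small theta.
sqr_chord_sep = np.float32(np.radians(0.01) ** 2)        # ~3.0e-8

# The float32 spacing just below 1.0 is 2**-24 ~ 6.0e-8, so subtracting
# half of sqr_chord_sep rounds straight back to 1.0.
costheta = np.float32(1.0) - np.float32(0.5) * sqr_chord_sep
print(costheta == np.float32(1.0))       # True: the separation is gone
print(np.degrees(np.arccos(costheta)))   # 0.0 instead of ~0.01
```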

Perhaps we could switch to theta = 2*arcsin(C/2) at some small angle; the expression is exact, but slower because it involves both arcsin and sqrt.

Or perhaps we could switch to the small angle approximation theta = C. For 0.02 deg, the error is only 2e-12. The next term in the expansion is C^3/24, which is also cheap to compute once we have C. It should be possible to compute the angle for which the error due to 1-C^2 is greater than the error due to the small angle approximation, and use that as the switchover point.

Or perhaps there's another trig trick that avoids 1-C^2?
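For intuition, here is a quick float32 comparison of the three candidate expressions at 0.02 deg (illustrative NumPy only, not the kernel code):

```python
import numpy as np

theta = np.float32(np.radians(0.02))                     # true separation
C = np.float32(2.0) * np.sin(theta / np.float32(2.0))    # chord length in float32

via_arccos = np.arccos(np.float32(1.0) - np.float32(0.5) * C * C)  # current route
via_arcsin = np.float32(2.0) * np.arcsin(C / np.float32(2.0))      # exact route
via_series = C + C**3 / np.float32(24.0)                           # small-angle series

for name, est in [("arccos", via_arccos), ("arcsin", via_arcsin), ("series", via_series)]:
    print(f"{name}: relative error {abs(float(est) - float(theta)) / float(theta):.1e}")
```

The arccos route loses roughly two orders of magnitude more precision here than the other two, which both recover theta to float32 rounding error.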

@manodeep
Owner

Thanks @JonLoveday for reporting the issue! It's not entirely unexpected - computing with floating-point numbers can cause such seemingly odd behaviour. Corrfunc was not designed to be resilient against such numerical issues - for example, rpavg/weightavg values can differ between float and double precision.

What I would also be curious to know is if a brute-force python computation with float-precision produced results more consistent with double-precision.

@lgarrison Yeah $DD(\theta)$ is definitely the most susceptible to such floating-point issues. I have not really thought about or investigated any numerical tricks to minimise such "catastrophic" errors. I have a vague memory that $DD(\theta)$ can compute theta over the range [0, 180] - any numerical tricks we use should not reduce the max-theta range allowed. Maybe we could just use a python script to write out the absolute and relative errors for all "small" angles and then decide the switchover point.

@JonLoveday
Author

Thanks @lgarrison and @manodeep! In the past, I have always used theta = 2*arcsin(C/2). It just occurred to me that you can avoid both the arcsin and sqrt calculations by converting the bins from theta -> 4 sin^2(theta/2).
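A sketch of that bin transformation, using the edges from the reproducer (illustrative only; a real implementation would do this once, up front, before the pair loop):

```python
import numpy as np

# Transform angular bin edges (degrees) to squared-chord space,
# C^2 = 4*sin^2(theta/2), computed in float64 and then cast down.
bins_deg = np.logspace(np.log10(0.01), np.log10(10), 21)
bins_c2 = (4 * np.sin(np.radians(bins_deg) / 2) ** 2).astype(np.float32)

# Unlike the float32 cos(theta) edges, every squared-chord bin keeps a
# positive width, so no bin collapses.
print(np.all(np.diff(bins_c2) > 0))  # True
```

The kernel can then histogram the squared chord separation directly, with no arcsin or sqrt per pair.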

@manodeep
Owner

Currently Corrfunc can compute angular separations between [0, 180] - i.e., $-1.0 <= cos(\theta) <= 1.0$. If we take the squared route, then we will be limited to [0, 90] - which may not suit what others want. However, we would likely only use the squared approx. near zero separation anyway - so perhaps the entire angular range would not be affected.

@JonLoveday
Author

sin^2(theta/2) is a unique mapping of theta over [0, 180]
[Figure_1: sin^2(theta/2) plotted against theta over [0, 180] degrees]
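A quick numerical check of that claim (illustrative only):

```python
import numpy as np

# 4*sin^2(theta/2) is strictly increasing on [0, 180] degrees,
# so theta can always be recovered from the squared chord.
theta = np.linspace(0.0, 180.0, 100_001)
c2 = 4 * np.sin(np.radians(theta) / 2) ** 2
print(np.all(np.diff(c2) > 0))  # True
```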

@manodeep
Owner

Right! $\theta/2$ and not $\theta$ 🤦🏾

@lgarrison
Collaborator

lgarrison commented Jun 24, 2023

Thanks for the great suggestion, @JonLoveday! I made a proof-of-concept in #297. The biggest hurdle remaining is probably deciding how to compute theta for purposes of returning the average binned theta. We could bite the bullet and do 2*arcsin(sqrt(C^2)/2)... but for most angles arccos(1 - C^2/2) is probably accurate enough, and theta = C might suffice for small angles (maybe with an additional C^3/24 term).

On the other hand, whatever we choose has to be implemented in the SIMD kernels, which makes a theta computation with two branches less appealing.

I've updated the title to reflect the underlying issue.

@lgarrison lgarrison changed the title Using float32 coordinates can produce spurious results DDtheta: numerical stability of small angular separation in float32 Jun 24, 2023
@JonLoveday
Author

> Thanks for the great suggestion, @JonLoveday! I made a proof-of-concept in #297. The biggest hurdle remaining is probably deciding how to compute theta for purposes of returning the average binned theta. We could bite the bullet and do 2*arcsin(sqrt(C^2)/2)... but for most angles arccos(1 - C^2/2) is probably accurate enough, and theta = C might suffice for small angles (maybe with an additional C^3/24 term).

I think for most practical purposes, it would be good enough to calculate the average binned C^2 value, and then transform to an average theta.
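A sketch of that post-processing step (hypothetical helper names; the idea is that the kernel accumulates the per-bin C^2 sum alongside the counts):

```python
import numpy as np

def thetaavg_from_c2(c2_sum, npairs):
    """Mean bin separation (degrees) from the accumulated squared chord.

    One inverse trig call per *bin* instead of per *pair*. Note this is
    an rms-style estimate rather than the exact mean theta, which is the
    approximation being accepted here.
    """
    mean_c2 = c2_sum / npairs
    return np.degrees(2 * np.arcsin(np.sqrt(mean_c2) / 2))

# Example: 3 pairs all at exactly 0.5 deg recover 0.5 deg.
c2 = 4 * np.sin(np.radians(0.5) / 2) ** 2
print(thetaavg_from_c2(3 * c2, 3))  # ~0.5
```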

@manodeep
Owner

While looking at #301, I realised that we could easily recast the current calculation to be:
costheta := x1*x2 + y1*y2 + z1*z2, and remove the entire chord_sep calculation. That should be better behaved and much easier to implement within the existing framework.
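In Python terms, the recast calculation would look something like this (a sketch only; Corrfunc's kernels already carry the unit-sphere x, y, z components for each point):

```python
import numpy as np

def costheta_dot(ra1, dec1, ra2, dec2):
    """cos(theta) between two sky positions (degrees) via a dot product."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    # Unit vectors on the sphere for each position.
    v1 = np.array([np.cos(dec1) * np.cos(ra1), np.cos(dec1) * np.sin(ra1), np.sin(dec1)])
    v2 = np.array([np.cos(dec2) * np.cos(ra2), np.cos(dec2) * np.sin(ra2), np.sin(dec2)])
    return v1 @ v2

print(costheta_dot(0.0, 0.0, 90.0, 0.0))  # ~0 (points 90 deg apart)
```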

@lgarrison
Collaborator

I think this issue is less about avoiding the chord_sep calculation and more about choice of variables. If cos(theta) is the variable, no matter how you compute it, it's going to have very poor precision for small angles. A dot product wouldn't fix the original issue, for example—the cosine of an angle less than ~0.02 degrees is indistinguishable from 1 in float32, regardless of how you get there. So I don't think we'd see much gain from a dot-product approach, but I'd be happy to be proven wrong!
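That claim is easy to confirm (float32 NumPy sketch):

```python
import numpy as np

# cos(0.01 deg) = 1 - 1.5e-8, which is closer to 1.0 than to the next
# float32 value below it (1 - 6e-8), so it rounds to exactly 1.0.
costheta = np.cos(np.radians(np.float32(0.01)))
print(costheta == np.float32(1.0))  # True
```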

@manodeep
Owner

I thought that the issue was arising because 1 - 0.5*C^2 for small C was numerically unstable, and switching to the dot product would add terms that were all of similar magnitude, and hence better-behaved numerically.

@lgarrison I can confirm that your assertion is correct - even with the dot-product method (in #303), the second bin still contains 0 elements for a float32 calculation. Now I understand what is going on - the difference between the two bin-edges for the second bin is 3e-8, which is smaller than float eps (a few times 10^-7), so the bin cannot be populated in float32. Maybe that should be the criterion for switching to a different variable, or we could check that the condition is satisfied, or even "refuse" to compute in float32?

@manodeep
Owner

Modifying the reproducer slightly to check the bins:

import numpy as np

def corrfunc_test(nran=10000, tmin=0.01, tmax=10, nbins=20):
    """Check the float32/float64 widths of the cos(theta) bin edges."""
    bins = np.geomspace(tmin, tmax, nbins + 1)
    for typ in [np.float32, np.float64]:
        cosbins = np.cos(bins*np.pi/180., dtype=typ)
        diff = np.diff(cosbins)
        print("float type = {} max. diff = {}".format(typ, max(diff)))

corrfunc_test()

produces

float type = <class 'numpy.float32'> max. diff = 0.0
float type = <class 'numpy.float64'> max. diff = -1.5158711841323225e-08

@lgarrison
Collaborator

> Maybe that should be the criteria to switch to a different variable or even check that condition is satisfied or even "refuse" to compute in float32?

I think using sep^2 as the variable (i.e. #297) is generally a better method, since we don't care as much about precision at large angles. I don't even think we'd need to switch between variables. However, for computing thetaavg, different methods are probably better for different separations (#296 (comment)), which is annoying for SIMD kernels.

And the warnings in #299 are all tunable; I chose a 1% loss of precision in theta as the threshold, but this can be changed or promoted to an error as desired.
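For reference, one can scan for the angles where the float32 representation of cos(theta) alone already costs more than 1% of theta (an illustrative scan, not the actual check in #299):

```python
import numpy as np

# Round-trip theta -> float32 cos(theta) -> theta and measure the loss.
theta = np.radians(np.logspace(-3, 0, 400))                  # 0.001 to 1 deg
recovered = np.arccos(np.cos(theta).astype(np.float32).astype(np.float64))
rel_err = np.abs(recovered - theta) / theta

bad = np.degrees(theta[rel_err > 0.01])
print(f"angles losing >1% of theta: up to ~{bad.max():.3f} deg" if bad.size else "none")
```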

@manodeep
Owner

@lgarrison I don't quite follow why. The zero-counts issue that @JonLoveday has reported occurs because the second bin has zero width when costheta is computed with 32-bit floats, and obviously there cannot be any pairs in a bin whose width is zero. We should add a check within Corrfunc that errors out when a zero-width bin is passed (and the call from DDtheta_mocks would pass the costheta bins at the relevant precision).

Separately, there is the question of the best numerical method to use with 32-bit floats. To me, calculating costheta as a dot product seems a better candidate than the (1 - C^2/2) calculation, since it is possible that C << 1. As you can see in #303, I have completely removed the chord_sep calculation within the innermost loop and done the costheta calculation as a dot product. On my M2 laptop, the new DDtheta fallback kernels run faster than the current ones with chord_sep - i.e., we might be getting both faster and more precise with the dot product.

Regardless of the method, we will need an inverse trig call when computing theta-averages - so I am not sure whether the timings will differ too much between arccos() and arcsin(sqrt(...))
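A rough way to compare the two routes on a given machine (illustrative timing sketch; the two expressions are mathematically identical for C^2 in [0, 4], i.e. theta in [0, 180]):

```python
import numpy as np
import timeit

rng = np.random.default_rng(0)
c2 = rng.uniform(0.0, 4.0, 1_000_000).astype(np.float32)   # squared chord separations

theta_arccos = np.arccos(1 - 0.5 * c2)
theta_arcsin = 2 * np.arcsin(np.sqrt(c2) / 2)
# Same angles, up to float32 rounding noise.
assert np.allclose(theta_arccos, theta_arcsin, atol=2e-3)

t1 = timeit.timeit(lambda: np.arccos(1 - 0.5 * c2), number=10)
t2 = timeit.timeit(lambda: 2 * np.arcsin(np.sqrt(c2) / 2), number=10)
print(f"arccos: {t1:.3f} s, arcsin+sqrt: {t2:.3f} s")
```

Timings will vary by platform and libm, so numbers from this should be treated as indicative only.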
