Accuracy issue in SVD API "SGESDD " #45

srvasanth · 2020-12-07T09:24:52Z

Hi,
We are observing a few failures in one of our customer applications that uses libFLAME and BLIS for the SVD API “SGESDD”. The singular values S and the orthogonal matrix U differ from the expected output. The same tests pass when the API is backed by OpenBLAS or MKL.

Input Matrix A Size : 9 x 100
Input values: {1} -> All 1s
Parameters:
JobZ : ‘O’
M : 9
N : 100
LDA : 9
LDU : M
LDVT : 1

Outputs from libflame+BLIS

Singular values (S)
3.0000e+01
4.6447e-06
5.2638e-13
1.2358e-19
0.0000e+00
-0.0000e+00
-0.0000e+00
-0.0000e+00
-0.0000e+00

Orthogonal Matrix(U)
3.3333e-01 -9.4281e-01 -0.0000e+00 -0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 -9.3541e-01 -1.2102e-07 7.0755e-15 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 -9.2582e-01 1.3064e-08 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 9.1287e-01 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 4.4721e-01 4.4721e-01 4.4721e-01 4.4721e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 -8.6180e-01 1.3820e-01 1.3820e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 -8.6180e-01 1.3820e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 1.3820e-01 -8.6180e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 1.3820e-01 1.3820e-01 -8.6180e-01

Expected output:

Singular values(S)
3.0000e+01
4.4731e-06
2.8951e-12
3.2130e-18
2.3120e-24
1.2895e-30
1.3683e-36
1.6802e-42
2.9427e-44

Orthogonal Matrix(U)
-3.3333e-01 9.4281e-01 6.4572e-07 9.9341e-09 9.9341e-09 -1.9868e-08 -1.9868e-08 -1.9868e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 9.3541e-01 -1.0867e-06 -7.6012e-09 -1.5052e-08 1.5202e-08 5.1177e-09 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 9.2582e-01 7.9038e-07 -1.7868e-08 3.0130e-10 -2.3329e-09 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 9.1287e-01 -5.9798e-07 2.1286e-08 -1.2825e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 8.9443e-01 -1.0503e-06 -1.6157e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 8.6603e-01 1.3324e-06 -2.2352e-08
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8868e-01 8.1635e-01 -1.5403e-02
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8867e-01 -4.2152e-01 -6.9928e-01
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8867e-01 -3.9484e-01 7.1468e-01

Any analysis or help regarding this will be highly appreciated.
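
For anyone comparing backends on this input: the all-ones 9 x 100 matrix has rank 1, so only the leading singular value (sqrt(9 * 100) = 30) and the first column of U (up to sign) are uniquely determined. The trailing singular values are rounding noise around zero, and the remaining columns of U can be any orthonormal basis of the complement, so an element-wise comparison between libraries may be stricter than the mathematics requires. Below is a minimal sketch of a backend-independent check. It goes through `numpy.linalg.svd` in single precision, which as far as I know dispatches to the `gesdd` driver of whatever LAPACK NumPy was built against; it does not exercise the JobZ = 'O' path from the report, and the tolerances are my own assumptions rather than anything taken from libFLAME or LAPACK.

```python
import numpy as np

# Same input as in the report: 9 x 100, all ones, single precision.
m, n = 9, 100
a = np.ones((m, n), dtype=np.float32)

# Thin SVD: u is (9, 9), s is (9,), vt is (9, 100).
u, s, vt = np.linalg.svd(a, full_matrices=False)

# Assumed tolerance: max(m, n) * machine epsilon * largest singular value.
eps = np.finfo(np.float32).eps
tol = max(m, n) * eps * s[0]

# The only non-zero singular value of a rank-1 all-ones matrix is sqrt(m * n).
sigma_ok = bool(abs(s[0] - np.sqrt(m * n)) <= tol)

# Every other singular value should be zero up to rounding.
rest_ok = bool(np.all(np.abs(s[1:]) <= tol))

# The factors must reproduce A and U must have orthonormal columns.
recon_ok = bool(np.allclose((u * s) @ vt, a, atol=1e-4))
orth_ok = bool(np.allclose(u.T @ u, np.eye(m), atol=1e-4))

# The first column of U is +/- 1/3 in every entry; the sign is arbitrary.
lead_ok = bool(np.allclose(np.abs(u[:, 0]), 1.0 / 3.0, atol=1e-5))

print(sigma_ok, rest_ok, recon_ok, orth_ok, lead_ok)
```

If all five checks hold for a given backend, its result is a valid SVD of this matrix even when the individual entries of S and U do not match another library's output.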

boegel · 2021-03-12T10:09:35Z

We saw a couple of failing numpy tests across a variety of CPUs (Haswell, Skylake, Zen2) when building numpy 1.19.4 on top of the latest BLIS (0.8.0) and libFLAME (5.2.0) with GCC 10.2. An example is shown below.

We're not seeing those failing tests when using OpenBLAS (0.3.12) with the LAPACK it ships, or with Intel MKL (2020 update 4).

If we use BLIS 0.8.0 + reference LAPACK 3.9.0, then there are no failing tests, so the culprit must be libFLAME...

_________________________________________________________________________________________________________________ TestRandomDist.test_multivariate_normal[svd] _________________________________________________________________________________________________________________

self = <numpy.random.tests.test_generator_mt19937.TestRandomDist object at 0x14eac29cce50>, method = 'svd'

    @pytest.mark.parametrize("method", ["svd", "eigh", "cholesky"])
    def test_multivariate_normal(self, method):
        random = Generator(MT19937(self.seed))
        mean = (.123456789, 10)
        cov = [[1, 0], [0, 1]]
        size = (3, 2)
        actual = random.multivariate_normal(mean, cov, size, method=method)
        desired = np.array([[[-1.747478062846581,  11.25613495182354  ],
                             [-0.9967333370066214, 10.342002097029821 ]],
                            [[ 0.7850019631242964, 11.181113712443013 ],
                             [ 0.8901349653255224,  8.873825399642492 ]],
                            [[ 0.7130260107430003,  9.551628690083056 ],
                             [ 0.7127098726541128, 11.991709234143173 ]]])

>       assert_array_almost_equal(actual, desired, decimal=15)
E       AssertionError:
E       Arrays are not almost equal to 15 decimals
E
E       Mismatched elements: 12 / 12 (100%)
E       Max absolute difference: 3.98341847
E       Max relative difference: 2.2477228
E        x: array([[[ 1.994391640846581,  8.74386504817646 ],
E               [ 1.243646915006621,  9.657997902970179]],
E       ...
E        y: array([[[-1.747478062846581, 11.25613495182354 ],
E               [-0.996733337006621, 10.342002097029821]],
E       ...

actual     = array([[[ 1.99439164,  8.74386505],
        [ 1.24364692,  9.6579979 ]],

       [[-0.53808839,  8.81888629],
        [-0.64322139, 11.1261746 ]],

       [[-0.46611243, 10.44837131],
        [-0.46579629,  8.00829077]]])
cov        = [[1, 0], [0, 1]]
desired    = array([[[-1.74747806, 11.25613495],
        [-0.99673334, 10.3420021 ]],

       [[ 0.78500196, 11.18111371],
        [ 0.89013497,  8.8738254 ]],

       [[ 0.71302601,  9.55162869],
        [ 0.71270987, 11.99170923]]])
mean       = (0.123456789, 10)
method     = 'svd'
random     = Generator(MT19937) at 0x14EAC29CE040
self       = <numpy.random.tests.test_generator_mt19937.TestRandomDist object at 0x14eac29cce50>
size       = (3, 2)
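
A quick check on the numbers printed above: every element of `actual` equals `2 * mean - desired`, i.e. each sample is mirrored around the mean. That would be consistent with the SVD of `cov` returning singular vectors with the opposite sign, which is still a valid factorization. This is only an observation on the values in the traceback, not a confirmed diagnosis, and I have not traced it through NumPy's sampling code. The sketch below copies `desired` from the test and `actual` from the (8-digit) locals dump, so the comparison tolerance is an assumption chosen to match that rounding.

```python
import numpy as np

mean = np.array([0.123456789, 10.0])

# Reference values hard-coded in the numpy test.
desired = np.array([[[-1.747478062846581, 11.25613495182354],
                     [-0.9967333370066214, 10.342002097029821]],
                    [[0.7850019631242964, 11.181113712443013],
                     [0.8901349653255224, 8.873825399642492]],
                    [[0.7130260107430003, 9.551628690083056],
                     [0.7127098726541128, 11.991709234143173]]])

# Values printed for `actual` in the traceback (rounded to 8 digits there).
actual_printed = np.array([[[1.99439164, 8.74386505],
                            [1.24364692, 9.6579979]],
                           [[-0.53808839, 8.81888629],
                            [-0.64322139, 11.1261746]],
                           [[-0.46611243, 10.44837131],
                            [-0.46579629, 8.00829077]]])

# Mirror the reference samples around the mean and compare.
reflected = 2.0 * mean - desired
mirrored = bool(np.allclose(reflected, actual_printed, rtol=0.0, atol=1e-7))
print(mirrored)

# For cov = I, flipping the sign of both factors is still a valid SVD.
cov = np.eye(2)
u_alt, s_alt, vt_alt = -np.eye(2), np.ones(2), -np.eye(2)
alt_is_valid = bool(np.allclose((u_alt * s_alt) @ vt_alt, cov))
print(alt_is_valid)
```

If that reading is right, the test depends on a sign convention that the SVD itself does not fix, rather than on a loss of accuracy.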

fgvanzee · 2021-03-14T19:18:07Z

Unfortunately SVD in libflame is not easy to debug because it uses a completely different algorithm than the one in LAPACK. So for now, I'll encourage you both to use netlib LAPACK + BLIS as your workaround. Apologies for the inconvenience.

boegel · 2021-03-14T19:35:44Z

@fgvanzee That's indeed the alternative approach we're going forward with. We've also seen some other problems with libFLAME (like #46).

boegel · 2021-03-15T18:49:37Z

Perhaps related to this: AMD has just released a new version of their libFLAME fork, see https://github.com/amd/libflame/releases/tag/3.0, which mentions in the release notes "Several bug fixes including handling denormal numbers in SVD functions".

I'm not sure those fixes are related to this issue, but it seems like they could be...

I'll try and find time to take that new AMD-libFLAME version for a spin, and see if I'm still running into problems with the numpy test suite.
