Segfault in fsspmdm #805

semi-h · 2023-08-08T14:52:46Z

I observe that libxsmm_fsspmdm_create gives a segfault when ldb and ldc are large. The ldb/ldc value at which the segfault appears seems to vary somewhat with the size of the A matrix.

I managed to reproduce the issue with the pyfr samples in the libxsmm repository. Below are three examples of segfaults. In the first two cases the A matrices are roughly the same size but have different nnz (2520 vs. 750). Halving ldb/ldc in the first two cases results in a successful run; both fail at ldb=ldc=2,400,000 as shown. The last case has a larger A matrix (294 x 343) but roughly the same nnz as the first example (2058 vs. 2520), and it fails at ldb=ldc=1,200,000.

$ ./pyfr_driver_asp_reg mats/p5/pri/m0-sp.mtx 2400000 1
CSR matrix data structure we just read (mats/p5/pri/m0-sp.mtx):
rows: 150, columns: 126, elements: 2520

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      1      1      0      0 
Registry and code: 13 MB + 8 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p5/pri/m0-sp.mtx 2400000 1
Uptime: 6.499610 s
Segmentation fault (core dumped)

$ ./pyfr_driver_asp_reg mats/p4/hex/m0-sp.mtx 2400000 1
CSR matrix data structure we just read (mats/p4/hex/m0-sp.mtx):
rows: 150, columns: 125, elements: 750

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      1      1      0      0 
Registry and code: 13 MB + 8 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p4/hex/m0-sp.mtx 2400000 1
Uptime: 5.788157 s
Segmentation fault (core dumped)

$ ./pyfr_driver_asp_reg mats/p6/hex/m0-sp.mtx 1200000 1
CSR matrix data structure we just read (mats/p6/hex/m0-sp.mtx):
rows: 294, columns: 343, elements: 2058

LIBXSMM_VERSION: main_stable-1.17-3674 (25693786)
CLX/DP      TRY    JIT    STA    COL
   0..13      0      0      0      0 
  14..23      0      0      0      0 
  24..64      0      0      0      0 
    > 64      1      1      0      0 
Registry and code: 13 MB + 12 KB (gemm=1 spmdm=1)
Command: ./pyfr_driver_asp_reg mats/p6/hex/m0-sp.mtx 1200000 1
Uptime: 10.935839 s
Segmentation fault (core dumped)

I used the latest available version of libxsmm, but I first observed a segfault a few months ago when running PyFR on Intel Skylake and ARM (Graviton2/3); I just wasn't able to pinpoint where it originated until now. I believe the present issue was the root cause of those crashes, so this issue has likely existed for at least a few months.

I ran the above examples on an i7-1185G7 (Willow Cove). To build libxsmm, I ran 'make' in the top-level directory and then 'make' again in samples/pyfr.


FreddieWitherden · 2023-08-08T16:55:12Z

x86 displacements are limited to 32-bit signed integers, but log2(150*2400000*8) ~ 31.4. The matrix is large enough that a single instruction cannot reference the entire region from one base pointer. I can play some tricks to get us the full 32 bits by pre-displacing the base pointer, but the right solution is to avoid jumbo matrices.
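A quick check of this arithmetic in C, using the same estimate rows * ld * sizeof(double) for the region that one base register has to cover. The helper names are illustrative only; `fits_disp32_predisplaced` models the pre-displacing trick under the assumption that the base register is moved forward by 2^31 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the bytes a kernel must address in a dense operand with
 * `rows` rows and leading dimension `ld` (the estimate used above). */
static int64_t span_bytes(int64_t rows, int64_t ld, int64_t elem_size)
{
  assert(rows > 0 && ld > 0 && elem_size > 0);
  return rows * ld * elem_size;
}

/* Base register points at the start of the operand: the highest byte offset,
 * bytes - 1, must fit in a signed 32-bit displacement (max 2^31 - 1). */
static int fits_disp32(int64_t bytes)
{
  return bytes - 1 <= INT32_MAX;
}

/* Base register pre-displaced by 2^31 bytes: offsets 0 .. bytes-1 become
 * displacements -2^31 .. bytes-1-2^31, so up to 2^32 bytes (4 GiB) fit. */
static int fits_disp32_predisplaced(int64_t bytes)
{
  return bytes - 1 - ((int64_t)1 << 31) <= INT32_MAX;
}
```

For the first report, span_bytes(150, 2400000, 8) is 2,880,000,000 bytes: fits_disp32 fails but fits_disp32_predisplaced succeeds. Halving ld to 1,200,000 gives 1,440,000,000 bytes, which fits either way, matching the observed cutoff.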

FreddieWitherden · 2023-08-08T18:01:27Z

@hfp Are you okay with adding a check that limits the total sizes (k*ldb*sizeof(dtype) and m*ldc*sizeof(dtype)) to less than 2**31? Technically, it is only needed on x86, but I would apply it to ARM too for consistency. This way the user gets a warning rather than a segfault.
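A minimal sketch of such a range check, assuming B has k rows of ldb elements and C has m rows of ldc elements. `fsspmdm_sizes_ok` and `span_below_limit` are hypothetical helpers, not existing LIBXSMM functions; the comparison is done by division so that a huge ld cannot overflow the 64-bit product.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SPAN_LIMIT ((int64_t)1 << 31) /* 2**31 bytes, as proposed above */

/* 1 if rows * ld * elem_size < 2**31, i.e. ld <= (2**31 - 1) / (rows * elem_size). */
static int span_below_limit(int64_t rows, int64_t ld, int64_t elem_size)
{
  assert(rows > 0 && ld > 0 && elem_size > 0);
  return ld <= (SPAN_LIMIT - 1) / (rows * elem_size);
}

/* Hypothetical pre-check for C = A * B with A sparse (m x k): returns 1 if both
 * dense operands stay below the limit, otherwise warns and returns 0 so the
 * caller can refuse to JIT instead of emitting out-of-range displacements. */
static int fsspmdm_sizes_ok(int64_t m, int64_t k, int64_t ldb, int64_t ldc,
                            int64_t elem_size)
{
  if (span_below_limit(k, ldb, elem_size) && span_below_limit(m, ldc, elem_size)) {
    return 1;
  }
  fprintf(stderr, "fsspmdm: k*ldb or m*ldc exceeds 2^31 bytes\n");
  return 0;
}
```

With the sizes from the report, fsspmdm_sizes_ok(150, 126, 1200000, 1200000, 8) passes, while the ldb=ldc=2,400,000 case is rejected with a warning.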

alheinecke · 2023-08-08T21:45:43Z

In my eyes, this is a hotfix.

I would like to see where the bug is in the code generator; we can easily fix the large-displacement issue with the SIB addressing mode.
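To illustrate why SIB addressing sidesteps the limit, here is a small C model of the x86-64 address arithmetic. It is not LIBXSMM generator code: `split_row_col` and the choice of putting row * ld in the index register are hypothetical, and the actual fix may split the offset differently. The point is that the index register is a full 64-bit GPR, so the large part of the offset no longer has to fit in the 32-bit displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Models the effective address of an x86-64 SIB memory operand:
 *   base + index * scale + disp32,
 * where base and index are 64-bit GPRs, scale is 1, 2, 4 or 8, and the
 * displacement is a sign-extended 32-bit immediate. */
static uint64_t sib_effective_address(uint64_t base, uint64_t index,
                                      unsigned scale, int32_t disp)
{
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return base + index * scale + (uint64_t)(int64_t)disp;
}

/* Hypothetical split for element (row, col) of a row-major double operand:
 * the large row offset row * ld goes into the index register (scale 8), and
 * only the small in-row offset col * 8 is left for the displacement. */
static void split_row_col(int64_t row, int64_t col, int64_t ld,
                          uint64_t *index, int32_t *disp)
{
  assert(row >= 0 && col >= 0 && col * 8 <= INT32_MAX);
  *index = (uint64_t)(row * ld);
  *disp  = (int32_t)(col * 8);
}
```

For row 149, column 3 with ld = 2,400,000, the byte offset is 2,860,800,024 (too large for disp32), but the split leaves only 24 bytes in the displacement.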

FreddieWitherden · 2023-08-08T22:20:54Z

The main offender for ldb is https://github.com/libxsmm/libxsmm/blob/main_stable/src/generator_spgemm_csr_asparse_reg.c#L723 (the displacement itself is defined at https://github.com/libxsmm/libxsmm/blob/main_stable/src/generator_spgemm_csr_asparse_reg.c#L597), while for ldc it is https://github.com/libxsmm/libxsmm/blob/main_stable/src/generator_spgemm_csr_asparse_reg.c#L734

hfp · 2023-08-14T07:19:43Z

> @hfp Are you okay to add a check limiting the total size (k*ldb*sizeof(dtype) and m*ldc*sizeof(dtype)) to be less than 2**31? Technically, it is only needed on x86 but I would apply it to ARM too for consistency. This way the user gets a warning rather than a segfault.

I am OK with it (my first thought was also to check the input). However, with Alex's fix this is only necessary as a hotfix. If support for the full/anticipated range keeps slipping, we can still deploy a range check.

FreddieWitherden · 2023-08-14T10:17:57Z

So the most efficient way of supporting this is probably to use several registers to hold the B and C pointers, each displaced by 4 GiB. By burning 6 GPRs (3 for B, 3 for C), we can support 12 GiB of input and 12 GiB of output without any real additional cost. (I think we have the GPRs to spare.)
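The multi-register scheme can be sketched as follows, assuming each base register is pre-displaced by 2 GiB so that it covers a full 4 GiB window of signed 32-bit displacements. `NUM_BASES`, `base_register_value` and `select_base_and_disp` are hypothetical names; in a real generator this selection would happen at JIT time when emitting each load or store.

```c
#include <assert.h>
#include <stdint.h>

#define GIB       ((int64_t)1 << 30)
#define NUM_BASES 3 /* 3 GPRs per operand: 3 x 4 GiB = 12 GiB */

/* Register r is assumed to hold base + 2 GiB + r * 4 GiB, so its window of
 * disp32 values [-2 GiB, 2 GiB) covers bytes [r * 4 GiB, (r + 1) * 4 GiB). */
static int64_t base_register_value(int64_t base, int r)
{
  assert(r >= 0 && r < NUM_BASES);
  return base + 2 * GIB + (int64_t)r * 4 * GIB;
}

/* For a byte offset from the start of B or C, pick the register and the signed
 * 32-bit displacement such that base_register_value(base, *reg) + *disp equals
 * base + offset. Returns 0 if the offset is outside the supported 12 GiB. */
static int select_base_and_disp(int64_t offset, int *reg, int32_t *disp)
{
  if (offset < 0 || offset >= NUM_BASES * 4 * GIB) {
    return 0;
  }
  *reg  = (int)(offset / (4 * GIB));
  *disp = (int32_t)(offset - 2 * GIB - (int64_t)*reg * 4 * GIB);
  return 1;
}
```

For the 150 x 2,400,000 double case, the offset 2,880,000,000 maps to register 0 with displacement 732,516,352, well inside the disp32 range.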
