What is the best way to debug BLIS? #797

Open
dmikushin opened this issue Feb 16, 2024 · 2 comments

dmikushin commented Feb 16, 2024

Dear Developers!

I've been looking around in BLIS for a week and have come to the conclusion that the codebase is developed by brilliant people. I have no other explanation for the fact that the BLIS code is absolutely impossible to debug. The heavy use of function-like macros keeps gdb from stepping into almost any meaningful line of code, so I conclude that analyzing issues with a debugger is discouraged in BLIS. What alternative debugging methods would you recommend for regular engineers like me, who make many mistakes?
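
To make the stepping problem concrete, here is a minimal sketch of a type-generating macro in the same general style. The names `GEN_SCAL2`, `sscal2`, and `dscal2` are hypothetical and are not taken from BLIS. Every statement of each generated function is attributed to the single line where the macro is invoked, which is why gdb cannot step through the body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: none of these names come from BLIS.
   One macro body stamps out a scaling routine per element type.
   The debugger maps the whole generated function to the single
   source line where GEN_SCAL2 is invoked. */
#define GEN_SCAL2(ch, type)                                          \
    void ch##scal2(type alpha, const type *x, type *y, size_t n)     \
    {                                                                \
        assert(x != NULL && y != NULL);                              \
        for (size_t i = 0; i < n; ++i)                               \
            y[i] = alpha * x[i];                                     \
    }

GEN_SCAL2(s, float)   /* defines sscal2 */
GEN_SCAL2(d, double)  /* defines dscal2 */
```

Two generic workarounds exist for code like this, independent of BLIS. Running `gcc -E -P` on the file writes out the expanded source; the expansion still sits on one line, so it needs a pass through a formatter such as clang-format before it can be compiled and stepped through one statement at a time. Alternatively, building with `-g3` records macro definitions in the debug info, which lets gdb's `macro expand` command show what an invocation turns into.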

dmikushin (Author) commented

For example, I'm debugging BLIS 751d0a1 on Ubuntu 22.04:

Thread 1 "test_gemmd" received signal SIGBUS, Bus error.
0x00007ffff72b427c in bao_zpackm_cxk (conja=BLIS_NO_CONJUGATE, schema=BLIS_PACKED_COL_PANELS, panel_dim=4, panel_dim_max=4, panel_len=128, panel_len_max=128, kappa=0x7fffffffbb10, d=0x5555555904b0, incd=1, a=0x7ffff7f9a010, inca=1, lda=128, p=0x7fff7b5be000, ldp=4, cntx=0x555555590ce0) at .../blis/addon/gemmd/bao_packm_cxk.c:314
314							bli_zzzscal2s( *ali, *dl, *pli );
(gdb) disass
Dump of assembler code for function bao_zpackm_cxk:
...
   0x00007ffff72b426a <+1174>:	mov    0x40(%rbp),%rax
   0x00007ffff72b426e <+1178>:	add    %rdx,%rax
   0x00007ffff72b4271 <+1181>:	mov    %rax,-0x80(%rbp)
   0x00007ffff72b4275 <+1185>:	mov    -0x90(%rbp),%rax
=> 0x00007ffff72b427c <+1192>:	movsd  (%rax),%xmm1
   0x00007ffff72b4280 <+1196>:	mov    -0x88(%rbp),%rax
   0x00007ffff72b4287 <+1203>:	movsd  (%rax),%xmm0

This does not happen in Release mode, but valgrind reports an access "0 bytes after a block of size 131,072 alloc'd". Vectorized out-of-range reads are sometimes valid, though, so I don't know whether this is a real bug. As I said, without debugging I'm blind.

Am I running into something similar to #550?
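
For reference, "0 bytes after a block" means the accessed address is exactly one past the last byte of the allocation. Below is a minimal sketch of a debug-only bounds check that could be placed in front of a suspicious load to find which pointer runs off the end. The helpers `access_in_block` and `checked_load` are hypothetical and not part of BLIS, and the sketch makes no claim about which pointer is at fault in the trace above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of BLIS.
   Returns 1 if the access_size bytes starting at p lie entirely inside
   the block [base, base + block_size), and 0 otherwise. */
static int access_in_block(const void *base, size_t block_size,
                           const void *p, size_t access_size)
{
    uintptr_t lo = (uintptr_t)base;
    uintptr_t a  = (uintptr_t)p;

    if (a < lo)
        return 0;                      /* starts before the block */

    size_t offset = (size_t)(a - lo);
    if (offset > block_size)
        return 0;                      /* starts beyond the block */

    /* An address "0 bytes after" the block has offset == block_size,
       so any nonzero access there is rejected. */
    return access_size <= block_size - offset;
}

/* Debug-only load: aborts with the file and line of the failing check
   instead of reading past the end of the buffer. */
static double checked_load(const double *base, size_t n_elems,
                           const double *p)
{
    assert(access_in_block(base, n_elems * sizeof *base, p, sizeof *p));
    return *p;
}
```

A check of this kind fires at the first out-of-range access with a readable message, which can be easier to act on than a signal raised inside macro-expanded code.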

rvdg (Collaborator) commented Feb 16, 2024

I am not one of the developers, although I have worked with them for many years. They are brilliant, but there are techniques that can be shared.

May I suggest you join the BLIS Discord server (https://github.com/flame/blis/blob/master/docs/Discord.md) and pose the question there, perhaps on the "general" channel, since it could become a broader discussion?
