Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possibility to reduce pool-block size upon micro-kernel behavior #478

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hominhquan
Copy link
Contributor

  • Currently, MC (or NC) size of of pool-block is extended by max(MR, NR)
    to cover possible invalid prefetch (speculative) of micro-kernels, typically
    for their convenience in the last KC-iteration.

  • This commit adds a macro BLIS_UKERNELS_NO_SPECULATIVE_PREFETCH to leave a
    possibility to any architecture which, in exchange of carefull prefetching in
    their micro-kernels to avoid invalid loads, to save some memory space.

- Currently, MC (or NC) size of of pool-block is extended by max(MR, NR)
to cover possible invalid prefetch (speculative) of micro-kernels, typically
for their convenience in the last KC-iteration.

- This commit adds a macro BLIS_UKERNELS_NO_SPECULATIVE_PREFETCH to leave a
possibility to any architecture which, in exchange of carefull prefetching in
their micro-kernels to avoid invalid loads, to save some memory space.
@devinamatthews
Copy link
Member

@hominhquan I agree with the idea, but I think it might be simpler, clearer, and more flexible to instead create a macro that encodes how much to extend MC/NC when allocating, e.g. BLIS_GEMM_PANEL_PADDING or something. If not set, it could default to max(MR,NR). Then ukernels could set it to 0 or any other suitable value. @fgvanzee does this value already have a name internally?

Aside: "prefetch" isn't the right word because the prefetch is harmless, it's the speculative load that is the problem.

@hominhquan
Copy link
Contributor Author

@devinamatthews thanks for your comment, and okay to a yet-to-be-named BLIS_GEMM_PANEL_PADDING macro which is even more generic.

@fgvanzee
Copy link
Member

@hominhquan I agree with the idea, but I think it might be simpler, clearer, and more flexible to instead create a macro that encodes how much to extend MC/NC when allocating, e.g. BLIS_GEMM_PANEL_PADDING or something. If not set, it could default to max(MR,NR). Then ukernels could set it to 0 or any other suitable value. @fgvanzee does this value already have a name internally?

@devinamatthews Are you asking if I have coined terminology for this additional micropanel of space reserved in the packing blocks for A and B? Or are you asking if I've already parameterized this within the build system / cpp macros?

@devinamatthews
Copy link
Member

Either, but especially the latter.

@fgvanzee
Copy link
Member

Either, but especially the latter.

I guess I don't have a name for it yet. (I never expected there would be any need or desire to isolate it for disabling.)

but I think it might be simpler, clearer, and more flexible to instead create a macro that encodes how much to extend MC/NC when allocating, e.g. BLIS_GEMM_PANEL_PADDING or something. If not set, it could default to max(MR,NR).

In principle, I sympathize with wanting to make this a cpp macro parameter that can be set in a header file in the subconfiguration directory.

The problem I see with this is that the extra micropanel's size depends on the register blocksizes, which are set on a per-datatype basis. So you couldn't nicely capture the absolute size of padding for all datatypes with one number; you'd need four constants--one for each datatype.

One could imagine, however, a single constant--probably a double--that determines the size of padding relative to the size of one micropanel. We would have to establish clear rules how that micropanel size would be defined. For example, currently that micropanel is basically max( packmr, packnr ) * kc * dt in size. So maybe that would be our fundamental unit size.

This would allow @hominhquan to zero out the constant, but allow others to increase it. The only limitations are that (1) they'd have to express the constant in units of micropanels, and (2) the constant would not vary across datatypes (although the number of bytes would naturally vary given that the register blocksizes vary).

As an aside, I see some code that I could probably clean up and simplify. BLIS has not used right-side trsm macrokernels in a long time, and I don't think we will go back to that regime. If I can commit to that, then there is no need for computing the "scaling factors" that facilitate swapping of MR and NR.

@devinamatthews
Copy link
Member

4 macros seems reasonable as ukernels for different datatypes may have different behavior.

@fgvanzee
Copy link
Member

4 macros seems reasonable as ukernels for different datatypes may have different behavior.

I'm really trying to move BLIS away from situations where we have to explicitly handle different datatypes in the code because it adds yet another place that requires tending-to and maintenance when adding support for new datatypes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants