Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shave an instruction off len() #357

Open
overlookmotel opened this issue Jan 4, 2024 · 10 comments
Open

Shave an instruction off len() #357

overlookmotel opened this issue Jan 4, 2024 · 10 comments

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented Jan 4, 2024

I believe it's possible to shave an instruction off len() (and probably as_slice() too) if static string had same discriminant as heap.

Static string could instead be represented as a heap string with capacity 0, which I assume isn't otherwise a possible state. Then:

const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
  .wrapping_sub(LENGTH_MASK as usize);

pub fn len(&self) -> usize {
  let len_heap = ensure_read(self.1);
  let last_byte = self.last_byte() as usize;
  let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
  let is_heap = len == HEAP_MASK_AFTER_SUB;
  len = len.min(MAX_SIZE);
  if is_heap {
    len = len_heap;
  }
  len
}

On 64-bit machines, MAX_SIZE and HEAP_MASK_AFTER_SUB are the same (24). So this shaves off a cmp instruction as len == HEAP_MASK_AFTER_SUB and .min(MAX_SIZE) both work off the result of a single cmp, rather than requiring 2. And (if I'm reading the ASM right) it also uses one less register.

https://godbolt.org/z/e7Kc4PMTE

Quite possibly handling the special case of zero-capacity heap strings (which are actually static strings) would introduce costs elsewhere, but I'm not familiar enough with the code to say. Can anyone advise if that's likely to negate the gain?

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 4, 2024

It would probably affect as_str, as_bytes, reserve and as_mut_bytes, which is where the static str repr is special cased

@overlookmotel
Copy link
Contributor Author

Thanks for coming back so swiftly. Do you think this is worth further investigation or, does it sound like it's not going to be viable for that reason?

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 4, 2024

I think it worths giving it a shot

@overlookmotel
Copy link
Contributor Author

OK cool. I'll take a closer look when I get some time.

@Kijewski
Copy link
Contributor

Kijewski commented Jan 5, 2024

This is great! In your variant the ensure_read() optimization barrier can also be removed, probably because the compiler understands your intentions better than it does with the current code:

pub fn len_new_new(c: &C) -> usize {
    let last_byte = c.5 as usize;
    let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
    let is_heap = len == HEAP_MASK_AFTER_SUB;
    len = len.min(MAX_SIZE);
    if is_heap {
        len = c.1;
    }
    len
}
example::len_new:
        movq    8(%rdi), %rcx
        movzbl  23(%rdi), %edx
        addq    $-192, %rdx
        cmpq    $24, %rdx
        movl    $24, %eax
        cmovbq  %rdx, %rax
        cmoveq  %rcx, %rax
        retq

example::len_new_new:
        movzbl  23(%rdi), %ecx
        addq    $-192, %rcx
        cmpq    $24, %rcx
        movl    $24, %eax
        cmovbq  %rcx, %rax
        cmoveq  8(%rdi), %rax
        retq

Without the ensure_read() blackbox the compiler might optimize better down the line, e.g. omitting repeated calls to this method, because it can tell that the code is pure.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jan 6, 2024

Bingo! That's 2 less instructions, and 2 less registers.

I think I'd misunderstood the purpose of ensure_read(). I had thought the point of it was to kick off loading len_heap from memory as soon as possible, so that by the time you get to the cmoveq it may be ready in a register and so no stall while waiting for it to load.

Is that not the point of it at all?

And, if I may, another basic question: Aside from reducing instructions, is minimising the number of registers a function uses a useful thing to do?

(NB I'm new to all this reading the runes of ASM business, so really feeling my way here. Any pointers much appreciated.)

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 6, 2024

And, if I may, another basic question: Aside from reducing instructions, is minimising the number of registers a function uses a useful thing to do?

It depends though, if it uses callee-saved register then I think it could be useful.

Looking at example::len_new_new posted by @Kijewski it already looks quite optimal to me.

@mcronce
Copy link
Contributor

mcronce commented Apr 18, 2024

Also, if any call to the function is inlined, those saved registers can potentially have a positive impact on the caller

@ParkMyCar
Copy link
Owner

@overlookmotel this is great, thank you!

Static string could instead be represented as a heap string with capacity 0, which I assume isn't otherwise a possible state.

Yes! This is an idea I've had bouncing around for a bit, I think there are some optimizations we can make if we modify the discriminant a bit.

I'll whip up a PR this weekend, thanks again for the idea!

@overlookmotel
Copy link
Contributor Author

Thanks for taking it up. Glad you like it. I've learned a lot from the bit-fiddling tricks this library uses, so glad to contribute an idea (even if not the implementation!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants