Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: encode instr_info_t::code as uint #6211

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dvyukov
Copy link
Contributor

@dvyukov dvyukov commented Jul 17, 2023

RFC: encode instr_info_t::code as uint

Change instr_info_t::code from ptr_int_t to uint.
This strips ~128K from drdecode binary and ~50K
from runtime VM overhead:

    FILE SIZE        VM SIZE
 --------------  --------------
   +21% +5.60Ki   +22% +5.61Ki    .rodata
  +1.5%    +720  [ = ]       0    .symtab
  +1.8%    +431  [ = ]       0    .strtab
  +0.5%    +384  [ = ]       0    [ELF Section Headers]
  +0.5%    +264  [ = ]       0    .rela.text
  +0.1%    +117  +0.1%    +137    .text
   +31%     +17  [ = ]       0    .llvm_addrsig
  +0.2%     +15  [ = ]       0    [Unmapped]
  +0.0%      +4  [ = ]       0    .rodata.str1.1
 -19.3% -61.3Ki -19.3% -61.3Ki    .data.rel.ro
 -31.0% -73.9Ki  [ = ]       0    .rela.data.rel.ro
 -11.5%  -127Ki -10.6% -55.6Ki    TOTAL

The idea is to change the pointer to table number

If this change looks good in general,
I will work on productionizing it and update other arches.
Format of the uint encoding is discus-sable.
For example, if we collect all instr_info_t into a single
global array (and then make individual tables pointers into it),
then the encoding can be made simpler and faster,
but will require more significant changes.

In preparation for the next commit:
 - use the same macros for op_instr that are used
   to fill in instr_info_t::code in the tables
 - make these macros accept table indices
 - better encapsulate how instr_info_t::code is casted
   to instr_info_t pointer

No functional changes.

The bulk of the change was done with a bunch of
sed invocations along the lines of:

sed "s#&first_byte\[\(.*\?\)\]#tfb(\1)#g"
sed "s#&x64_extensions\[\(.*\?\)\]\[\(.*\?\)\]#t64e(\1, \2)#g"
sed "s#,\(\s*\)\([a-z][a-z0-9]\+\)\[\(.*\?\)\]}#,\1\2(\3)}#g"
sed "s#,\(\s*\)\([a-z][a-z0-9]\+\)\[\(.*\?\)\]\[\(.*\?\)\]}#,\1\2(\3, \4)}#g"
Change instr_info_t::code from ptr_int_t to uint.
This strips ~128K from drdecode binary and ~50K
from runtime VM overhead:

    FILE SIZE        VM SIZE
 --------------  --------------
   +21% +5.60Ki   +22% +5.61Ki    .rodata
  +1.5%    +720  [ = ]       0    .symtab
  +1.8%    +431  [ = ]       0    .strtab
  +0.5%    +384  [ = ]       0    [ELF Section Headers]
  +0.5%    +264  [ = ]       0    .rela.text
  +0.1%    +117  +0.1%    +137    .text
   +31%     +17  [ = ]       0    .llvm_addrsig
  +0.2%     +15  [ = ]       0    [Unmapped]
  +0.0%      +4  [ = ]       0    .rodata.str1.1
 -19.3% -61.3Ki -19.3% -61.3Ki    .data.rel.ro
 -31.0% -73.9Ki  [ = ]       0    .rela.data.rel.ro
 -11.5%  -127Ki -10.6% -55.6Ki    TOTAL

The idea is to change the pointer to table number
+ index in the table encoded as a single uint.
This removes the need in runtime relocations for pointers
(each taking 24 bytes in the binary).
Along with DynamoRIO#6206 this should also shrink sizeof(instr_info_t)
from 48 bytes to 32 bytes (not fully realized w/o DynamoRIO#6206 due
to alignment).

If this change looks good in general,
I will work on productionizing it and update other arches.
Format of the uint encoding is discussable.
For example, if we collect all instr_info_t into a single
global array (and then make individual tables pointers into it),
then the encoding can be made simpler and faster,
but will require more significant changes.
@dvyukov
Copy link
Contributor Author

dvyukov commented Jul 17, 2023

CC @melver

@dvyukov dvyukov marked this pull request as draft July 17, 2023 13:14
@derekbruening
Copy link
Contributor

I discussed this with @abhinav92003 and we agree the change looks reasonable: the code readability impact is acceptable. I would keep it as the two-level lookup rather than trying to make a single array.

@derekbruening
Copy link
Contributor

Maybe see if @khuey has any further input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants