RFC: encode instr_info_t::code as uint #6211

dvyukov · 2023-07-17T13:13:20Z

RFC: encode instr_info_t::code as uint

Change instr_info_t::code from ptr_int_t to uint.
This strips ~128K from drdecode binary and ~50K
from runtime VM overhead:

    FILE SIZE        VM SIZE
 --------------  --------------
   +21% +5.60Ki   +22% +5.61Ki    .rodata
  +1.5%    +720  [ = ]       0    .symtab
  +1.8%    +431  [ = ]       0    .strtab
  +0.5%    +384  [ = ]       0    [ELF Section Headers]
  +0.5%    +264  [ = ]       0    .rela.text
  +0.1%    +117  +0.1%    +137    .text
   +31%     +17  [ = ]       0    .llvm_addrsig
  +0.2%     +15  [ = ]       0    [Unmapped]
  +0.0%      +4  [ = ]       0    .rodata.str1.1
 -19.3% -61.3Ki -19.3% -61.3Ki    .data.rel.ro
 -31.0% -73.9Ki  [ = ]       0    .rela.data.rel.ro
 -11.5%  -127Ki -10.6% -55.6Ki    TOTAL

The idea is to change the pointer to table number

index in the table encoded as a single uint.
This removes the need in runtime relocations for pointers
(each taking 24 bytes in the binary).
Along with RFC: convert global opcode string pointers to offsets #6206 this should also shrink sizeof(instr_info_t)
from 48 bytes to 32 bytes (not fully realized w/o RFC: convert global opcode string pointers to offsets #6206 due
to alignment).

If this change looks good in general,
I will work on productionizing it and update other arches.
Format of the uint encoding is discus-sable.
For example, if we collect all instr_info_t into a single
global array (and then make individual tables pointers into it),
then the encoding can be made simpler and faster,
but will require more significant changes.

In preparation for the next commit: - use the same macros for op_instr that are used to fill in instr_info_t::code in the tables - make these macros accept table indices - better encapsulate how instr_info_t::code is casted to instr_info_t pointer No functional changes. The bulk of the change was done with a bunch of sed invocations along the lines of: sed "s#&first_byte\[\(.*\?\)\]#tfb(\1)#g" sed "s#&x64_extensions\[\(.*\?\)\]\[\(.*\?\)\]#t64e(\1, \2)#g" sed "s#,\(\s*\)\([a-z][a-z0-9]\+\)\[\(.*\?\)\]}#,\1\2(\3)}#g" sed "s#,\(\s*\)\([a-z][a-z0-9]\+\)\[\(.*\?\)\]\[\(.*\?\)\]}#,\1\2(\3, \4)}#g"

Change instr_info_t::code from ptr_int_t to uint. This strips ~128K from drdecode binary and ~50K from runtime VM overhead: FILE SIZE VM SIZE -------------- -------------- +21% +5.60Ki +22% +5.61Ki .rodata +1.5% +720 [ = ] 0 .symtab +1.8% +431 [ = ] 0 .strtab +0.5% +384 [ = ] 0 [ELF Section Headers] +0.5% +264 [ = ] 0 .rela.text +0.1% +117 +0.1% +137 .text +31% +17 [ = ] 0 .llvm_addrsig +0.2% +15 [ = ] 0 [Unmapped] +0.0% +4 [ = ] 0 .rodata.str1.1 -19.3% -61.3Ki -19.3% -61.3Ki .data.rel.ro -31.0% -73.9Ki [ = ] 0 .rela.data.rel.ro -11.5% -127Ki -10.6% -55.6Ki TOTAL The idea is to change the pointer to table number + index in the table encoded as a single uint. This removes the need in runtime relocations for pointers (each taking 24 bytes in the binary). Along with DynamoRIO#6206 this should also shrink sizeof(instr_info_t) from 48 bytes to 32 bytes (not fully realized w/o DynamoRIO#6206 due to alignment). If this change looks good in general, I will work on productionizing it and update other arches. Format of the uint encoding is discussable. For example, if we collect all instr_info_t into a single global array (and then make individual tables pointers into it), then the encoding can be made simpler and faster, but will require more significant changes.

dvyukov · 2023-07-17T13:14:14Z

CC @melver

derekbruening · 2023-07-17T19:38:31Z

I discussed this with @abhinav92003 and we agree the change looks reasonable: the code readability impact is acceptable. I would keep it as the two-level lookup rather than trying to make a single array.

derekbruening · 2023-07-17T19:55:54Z

Maybe see if @khuey has any further input

dvyukov added 2 commits July 17, 2023 14:59

dvyukov marked this pull request as draft July 17, 2023 13:14

dvyukov mentioned this pull request Jul 20, 2023

RFC: convert global opcode string pointers to offsets #6206

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: encode instr_info_t::code as uint #6211

RFC: encode instr_info_t::code as uint #6211

dvyukov commented Jul 17, 2023

dvyukov commented Jul 17, 2023

derekbruening commented Jul 17, 2023

derekbruening commented Jul 17, 2023

RFC: encode instr_info_t::code as uint #6211

Are you sure you want to change the base?

RFC: encode instr_info_t::code as uint #6211

Conversation

dvyukov commented Jul 17, 2023

dvyukov commented Jul 17, 2023

derekbruening commented Jul 17, 2023

derekbruening commented Jul 17, 2023