8332265: RISC-V: Materialize pointers faster by using a temp register #19246

robehn · 2024-05-15T09:34:11Z

Hi, please consider!

Materializing a 48-bit pointer, using an additional register, we can do with:
lui + lui + slli + add + addi
This 15% faster both on VF2 and in CPU models, compared to movptr().

As we often materialize during calls there is free registers.

I have choose just a few spot to use it, many more can use.
E.g. la() with tmp register can use li48 instead of movptr.

Running tests now (so far so good), as if I screwed up IC calls it should be seen fast.
And benchmarks when hardware is free.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8332265: RISC-V: Materialize pointers faster by using a temp register (Enhancement - P4)

Reviewers

Fei Yang (@RealFYang - Reviewer)
Ludovic Henry (@luhenry - Committer)
Hamlin Li (@Hamlin-Li - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246
$ git checkout pull/19246

Update a local copy of the PR:
$ git checkout pull/19246
$ git pull https://git.openjdk.org/jdk.git pull/19246/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19246

View PR using the GUI difftool:
$ git pr show -t 19246

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19246.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2024-05-15T09:34:50Z

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-05-15T09:34:56Z

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8332265: RISC-V: Materialize pointers faster by using a temp register

Reviewed-by: fyang, luhenry, mli

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-05-15T09:35:35Z

@robehn The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2024-05-15T09:41:11Z

Webrevs

RealFYang

Hi, This looks interesting. Could all the movptr callsites be changed? I am asking this as I am a bit worried about the complexity / reward ratio when we have both movptr and li48 which are the same in functionality.

robehn · 2024-05-16T07:52:40Z

Yes, but it's a long term job, as you need to free a register in many cases. (in non-call sites places)
All callsites should be easy to change as you have plenty of callee saved registers which are already saved when using movptr.

As li48 is faster than li when using more than 32-bits these cases should also use li48.
I.e. mv t0, addr
But mv is fishy partly because of RegisterOrConstant constructor, so we can't tell in mv if this was an address or not.
I have been looking into cleaning that up, so mv with literal and mv with address is two seperate cases.
To keep them apart would be to use e.g. "li reg, literal" and "li48 reg, temp_reg, address".

As there is much work, this PR is intended as the first step with the hardest peices implemented already, i.e. li48 is ready to go.

If we also fix mov_metadata la()->li48 we reduce static call stub size down from 12 to 10 instruction, which is significant.
That one is on my todo list.

RealFYang · 2024-05-20T02:01:54Z

Yes, but it's a long term job, as you need to free a register in many cases. (in non-call sites places) All callsites should be easy to change as you have plenty of callee saved registers which are already saved when using movptr.

OK, I guess this might be a good compromise. Inspired by PPC's Assembler::load_const, MacroAssembler::get_const and MacroAssembler::patch_const [1-3], I think we could have a similar design. Adding one extra tmp register param for movptr like void movptr(Register Rd, address addr, int32_t &offset, Register tmp=noreg);, we can factor out li48 then.
The only difference compared with PPC's solution is that that we will have different sizes depending on whether we could find a tmp register for movptr. But I guess that's not a big issue? We can add a reference param (say, size) to existing is_movptr_at/is_movptr, we get the correct size when checking the instruction sequence and return this in size.
I prefer this way as I don't think it's a good idea to have more li variants like li48 and it's really bad to have both movptr and li48 which duplicate the functionality. We should also remove the existing li64 which is not used anywhere. Could you please consider this?

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/assembler_ppc.cpp#L323
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L327
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L349

robehn · 2024-05-20T13:24:27Z

Hey, I did an update, not fully what you are saying.
We are missing a lot of 'abstraction' e.g. take a look at CodeInstaller::pd_next_offset in jvmciCodeInstaller.
I think this code should look like:

jint CodeInstaller::pd_next_offset(NativeInstruction* inst, jint pc_offset, JVMCI_TRAPS) {
  if(inst->is_call() || inst->is_jump() || inst->is_movptr()) {
    return pc_offset + inst->size();
  }
  JVMCI_ERROR_0("unsupported type of instruction for call site");
}

But I need to add a bunch of stuff to unrelated NativeInst, I think that is better suited in another PR.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

RealFYang

Thanks for the update. Taking a more closer look.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

RealFYang

Hi, Nice work! I only have several minor comments.

src/hotspot/cpu/riscv/nativeInst_riscv.hpp

src/hotspot/cpu/riscv/riscv.ad

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

src/hotspot/cpu/riscv/nativeInst_riscv.hpp

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

robehn · 2024-05-22T08:32:16Z

Thanks @luhenry ! Thanks for the second review pass @RealFYang !

RealFYang

Three more minor comments, looks good otherwise. Thanks.
BTW: You need to merge master and resolve conflicts :-)

src/hotspot/cpu/riscv/nativeInst_riscv.hpp

src/hotspot/cpu/riscv/riscv.ad

src/hotspot/cpu/riscv/nativeInst_riscv.hpp

robehn · 2024-05-22T18:38:01Z

Yes, thanks, done!

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

src/hotspot/cpu/riscv/nativeInst_riscv.hpp

robehn · 2024-05-23T11:01:16Z

Thanks again!

RealFYang

Updated change looks good. It would be nice to see how much this will benefit performance.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

Hamlin-Li · 2024-05-24T10:55:27Z

And a general question about nativeInst_riscv.cpp and macroAssembler_riscv.cpp. I saw the functions in these 2 files call each other, that make the code a bit mess to me. It's not an issue introduced in this pr.
I wonder if this could be refactored? If so, I can work on it. But just in case you have easy answer already, so I don't have to do further investigation.

robehn · 2024-05-24T11:27:07Z

Updated change looks good. It would be nice to see how much this will benefit performance.

I tried todo some benchmarks but it seems like the error of them are larger than the benefit.
I'll try todo some longer runs, and minimize the error.

robehn · 2024-05-24T12:39:32Z

And a general question about nativeInst_riscv.cpp and macroAssembler_riscv.cpp. I saw the functions in these 2 files call each other, that make the code a bit mess to me. It's not an issue introduced in this pr. I wonder if this could be refactored? If so, I can work on it. But just in case you have easy answer already, so I don't have to do further investigation.

I agree, I would prefer having classes for the instruction where all the instruction functionality would be.
As it's now the opcodes are reapted everywhere, instead it should just be in in-place, this class.
And then have classes for instruction sequence where we keep all functionality gathered.

Hamlin-Li · 2024-05-24T13:38:32Z

And a general question about nativeInst_riscv.cpp and macroAssembler_riscv.cpp. I saw the functions in these 2 files call each other, that make the code a bit mess to me. It's not an issue introduced in this pr. I wonder if this could be refactored? If so, I can work on it. But just in case you have easy answer already, so I don't have to do further investigation.

I agree, I would prefer having classes for the instruction where all the instruction functionality would be. As it's now the opcodes are reapted everywhere, instead it should just be in in-place, this class. And then have classes for instruction sequence where we keep all functionality gathered.

OK, let me do some further investigation to see if we can make it more readable and maintainable.

Hamlin-Li

OK, let's move forward. :)
I create some bugs to track the further work.
https://bugs.openjdk.org/browse/JDK-8332899
https://bugs.openjdk.org/browse/JDK-8332900

Feel free to take them if you're also interested in them.

robehn · 2024-05-24T13:47:20Z

OK, let's move forward. :) I create some bugs to track the further work. https://bugs.openjdk.org/browse/JDK-8332899 https://bugs.openjdk.org/browse/JDK-8332900

Feel free to take them if you're also interested in them.

Thank you!

robehn · 2024-05-28T06:53:43Z

Updated change looks good. It would be nice to see how much this will benefit performance.

Here are 'some' number, it still unclear if these actually are significant:

BASELINE                       | movptr2
fop                  3120 msec |  2811 msec =0.900962
h2                  19156 msec | 17600 msec =0.918772
jython              24060 msec | 23343 msec =0.9702
luindex              3222 msec |  3226 msec =1.00124
lusearch             4383 msec |  4380 msec =0.999316
lusearch-fix         4096 msec |  4359 msec =1.06421
pmd                  7417 msec |  7342 msec =0.989888
jython              24060 msec | 23343 msec =0.9702
fop(Xcomp)           3060 msec |  3058 msec =0.999346
h2(Xcomp)           38724 msec | 38717 msec =0.999819
jython(Xcomp)       29999 msec | 29694 msec =0.989833
luindex(Xcomp)       5259 msec |  5195 msec =0.98783
lusearch(Xcomp)      6364 msec |  6269 msec =0.985072
lusearch-fix(Xcomp)  6430 msec |  6534 msec =1.01617
pmd(Xcomp)           7360 msec |  6999 msec =0.950951
jython(Xcomp)       29999 msec | 29694 msec =0.989833
Avg:0.983353

The issue is that reaching steady state AKA varmup takes tremondiusly long time.
As it takes long time it causes compilations differences between runs and it's unclear if we actually have finished varmup. Hence I run with Xcomp to get a more stable result.

Integrating later today!

robehn · 2024-05-28T12:53:34Z

/integrate

openjdk · 2024-05-28T12:54:39Z

Going to push as commit 7b52d0a.
Since your change was applied there have been 23 commits pushed to the master branch:

aa4c83a: 8332505: JEP 457: ClassRemapper forgets to remap bootstrap method references
cabe337: 8331921: Hotspot assembler files should use common logic to setup exported functions
2edb6d9: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode
1850914: 8332864: Parallel: Merge ParMarkBitMapClosure into MoveAndUpdateClosure
2f2cf38: 8332883: Some simple cleanup in vectornode.cpp
b5e1615: 8292955: Collections.checkedMap Map.merge does not properly check key and value
86eb5d9: 8329958: Windows x86 build fails: downcallLinker.cpp(36) redefinition
be1d374: 8332825: ubsan: guardedMemory.cpp:35:11: runtime error: null pointer passed as argument 2, which is declared to never be null
ed81a47: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic
08face8: 8332890: Module imports don't work inside the same module
... and 13 more: https://git.openjdk.org/jdk/compare/16dba04e8dfa871f8056480a42a9baeb24a2fb24...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-05-28T12:54:44Z

@robehn Pushed as commit 7b52d0a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

li48

f5efaca

openjdk bot added the rfr Pull request is ready for review label May 15, 2024

openjdk bot added the hotspot hotspot-dev@openjdk.org label May 15, 2024

RealFYang reviewed May 16, 2024

View reviewed changes

robehn added 2 commits May 20, 2024 08:09

Merge branch 'master' into 8332265

f2c172c

li48 -> movptr

edfdda2

luhenry reviewed May 20, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

RealFYang reviewed May 21, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

luhenry reviewed May 21, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

robehn added 2 commits May 21, 2024 10:47

Small review update

09db287

Merge branch 'master' into 8332265

c406294

luhenry approved these changes May 21, 2024

View reviewed changes

RealFYang suggested changes May 22, 2024

View reviewed changes

robehn added 2 commits May 22, 2024 10:09

Merge branch 'master' into 8332265

a4e88a4

Review changes

e017302

RealFYang reviewed May 22, 2024

View reviewed changes

src/hotspot/cpu/riscv/nativeInst_riscv.hpp Show resolved Hide resolved

src/hotspot/cpu/riscv/riscv.ad Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/nativeInst_riscv.hpp Outdated Show resolved Hide resolved

robehn added 2 commits May 22, 2024 20:23

More review comments

0048ea1

Merge branch 'master' into 8332265

9134d4e

luhenry reviewed May 23, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

Fixed comments

26120eb

luhenry reviewed May 23, 2024

View reviewed changes

src/hotspot/cpu/riscv/nativeInst_riscv.hpp Outdated Show resolved Hide resolved

luhenry reviewed May 23, 2024

View reviewed changes

src/hotspot/cpu/riscv/nativeInst_riscv.hpp Show resolved Hide resolved

Fixed more comments

cf4c306

luhenry approved these changes May 23, 2024

View reviewed changes

RealFYang approved these changes May 23, 2024

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label May 23, 2024

Hamlin-Li reviewed May 24, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

Hamlin-Li approved these changes May 24, 2024

View reviewed changes

Merge branch 'master' into 8332265

8dff3f1

openjdk bot added the integrated Pull request has been integrated label May 28, 2024

openjdk bot closed this May 28, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 28, 2024

robehn deleted the 8332265 branch May 28, 2024 12:59

This was referenced May 28, 2024

8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr #19431

Closed

8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp #19459

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8332265: RISC-V: Materialize pointers faster by using a temp register #19246

8332265: RISC-V: Materialize pointers faster by using a temp register #19246

robehn commented May 15, 2024 •

edited by openjdk bot

bridgekeeper bot commented May 15, 2024

openjdk bot commented May 15, 2024 •

edited

openjdk bot commented May 15, 2024

mlbridge bot commented May 15, 2024 •

edited

RealFYang left a comment •

edited

robehn commented May 16, 2024 •

edited

RealFYang commented May 20, 2024 •

edited

robehn commented May 20, 2024

RealFYang left a comment

RealFYang left a comment •

edited

robehn commented May 22, 2024

RealFYang left a comment •

edited

robehn commented May 22, 2024

robehn commented May 23, 2024

RealFYang left a comment

Hamlin-Li commented May 24, 2024

robehn commented May 24, 2024

robehn commented May 24, 2024 •

edited

Hamlin-Li commented May 24, 2024

Hamlin-Li left a comment

robehn commented May 24, 2024

robehn commented May 28, 2024 •

edited

robehn commented May 28, 2024

openjdk bot commented May 28, 2024

openjdk bot commented May 28, 2024

8332265: RISC-V: Materialize pointers faster by using a temp register #19246

8332265: RISC-V: Materialize pointers faster by using a temp register #19246

Conversation

robehn commented May 15, 2024 • edited by openjdk bot

Progress

Issue

Reviewers

Reviewing

Webrev

bridgekeeper bot commented May 15, 2024

openjdk bot commented May 15, 2024 • edited

openjdk bot commented May 15, 2024

mlbridge bot commented May 15, 2024 • edited

Webrevs

RealFYang left a comment • edited

Choose a reason for hiding this comment

robehn commented May 16, 2024 • edited

RealFYang commented May 20, 2024 • edited

robehn commented May 20, 2024

RealFYang left a comment

Choose a reason for hiding this comment

RealFYang left a comment • edited

Choose a reason for hiding this comment

robehn commented May 22, 2024

RealFYang left a comment • edited

Choose a reason for hiding this comment

robehn commented May 22, 2024

robehn commented May 23, 2024

RealFYang left a comment

Choose a reason for hiding this comment

Hamlin-Li commented May 24, 2024

robehn commented May 24, 2024

robehn commented May 24, 2024 • edited

Hamlin-Li commented May 24, 2024

Hamlin-Li left a comment

Choose a reason for hiding this comment

robehn commented May 24, 2024

robehn commented May 28, 2024 • edited

robehn commented May 28, 2024

openjdk bot commented May 28, 2024

openjdk bot commented May 28, 2024

robehn commented May 15, 2024 •

edited by openjdk bot

openjdk bot commented May 15, 2024 •

edited

mlbridge bot commented May 15, 2024 •

edited

RealFYang left a comment •

edited

robehn commented May 16, 2024 •

edited

RealFYang commented May 20, 2024 •

edited

RealFYang left a comment •

edited

RealFYang left a comment •

edited

robehn commented May 24, 2024 •

edited

robehn commented May 28, 2024 •

edited