8332265: RISC-V: Materialize pointers faster by using a temp register #19246
@robehn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
Hi, this looks interesting. Could all the movptr call sites be changed? I am asking because I am a bit worried about the complexity/reward ratio when we have both movptr and li48, which are functionally the same.
Yes, but it's a long-term job, since in many cases (at non-call sites) you need to free up a register. As li48 is faster than li for values wider than 32 bits, those cases should also use li48. Because there is a lot of work, this PR is intended as the first step, with the hardest pieces already implemented, i.e. li48 is ready to go. If we also switch the la() in mov_metadata to li48, we reduce the static call stub size from 12 to 10 instructions, which is significant.
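Assuming every instruction in the stub uses the standard 4-byte encoding (an assumption; the comment only gives instruction counts), the saving per static call stub works out to:

```latex
\begin{aligned}
12 \times 4\,\text{B} &= 48\,\text{B} \\
10 \times 4\,\text{B} &= 40\,\text{B} \\
\frac{48 - 40}{48} &\approx 16.7\%
\end{aligned}
```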
OK, I guess this might be a good compromise. Inspired by PPC's [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/assembler_ppc.cpp#L323
Hey, I did an update, though not fully what you are suggesting.
Doing that would require adding a bunch of changes to unrelated NativeInst code, which I think is better suited to another PR.
Thanks for the update. Taking a closer look.
Hi, nice work! I only have a few minor comments.
Thanks @luhenry! Thanks for the second review pass, @RealFYang!
Three more minor comments, looks good otherwise. Thanks.
BTW: You need to merge master and resolve conflicts :-)
Yes, thanks, done!
Thanks again!
Updated change looks good. It would be nice to see how much this will benefit performance.
And a general question about
I tried to do some benchmarks, but it seems their error margin is larger than the benefit.
I agree; I would prefer having classes for the instructions that hold all of the instruction-specific functionality.
OK, let me do some further investigation to see if we can make it more readable and maintainable.
OK, let's move forward. :)
I created some bugs to track the further work.
https://bugs.openjdk.org/browse/JDK-8332899
https://bugs.openjdk.org/browse/JDK-8332900
Feel free to take them if you're also interested in them.
Thank you!
Here are 'some' numbers; it is still unclear whether they are actually significant:
The issue is that reaching steady state (a.k.a. warmup) takes a tremendously long time. Integrating later today!
/integrate
Going to push as commit 7b52d0a.
Your commit was automatically rebased without conflicts.
Hi, please consider!
Using an additional register, a 48-bit pointer can be materialized with:
lui + lui + slli + add + addi
This is 15% faster than movptr(), both on VF2 and in CPU models.
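A minimal sketch of the arithmetic behind the five-instruction sequence, assuming the address is split into an upper part (bits 47..30), a middle part, and a signed 12-bit low part. The names `Li48Parts`, `split48`, `rv_lui` and `materialize48` are hypothetical, and this split is not necessarily the exact one li48 uses in HotSpot. The code models the RV64 semantics of lui (which sign-extends from bit 31), slli, add and addi on 64-bit integers to check that the sequence rebuilds the address. Note that the two lui instructions do not depend on each other.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decomposition of a 48-bit address for the sequence
//   lui  tmp, upper18      # tmp = upper18 << 12
//   lui  rd,  mid18        # rd  = mid18 << 12 (independent of the first lui)
//   slli tmp, tmp, 18      # tmp = upper18 << 30
//   add  rd,  rd,  tmp
//   addi rd,  rd,  low12   # low12 is sign-extended by addi
struct Li48Parts {
  uint32_t upper18;  // bits 47..30 of the address
  uint32_t mid18;    // bits 29..12, rounded up by one when low12 is negative
  int32_t  low12;    // signed 12-bit immediate for addi
};

Li48Parts split48(uint64_t addr) {
  assert(addr < (UINT64_C(1) << 48) && "address must fit in 48 bits");
  Li48Parts p;
  p.upper18 = static_cast<uint32_t>(addr >> 30);
  const uint32_t lower30 = static_cast<uint32_t>(addr & 0x3fffffffu);
  // Sign-extend the low 12 bits, exactly as addi will interpret them.
  const int32_t low = static_cast<int32_t>(lower30 & 0xfffu);
  p.low12 = (low >= 0x800) ? low - 0x1000 : low;
  // Subtracting low12 leaves a multiple of 4096. The quotient is at most 2^18,
  // so mid18 << 12 stays below 2^31 and lui does not sign-extend it.
  p.mid18 = static_cast<uint32_t>((static_cast<int64_t>(lower30) - p.low12) >> 12);
  return p;
}

// RV64 lui: place the 20-bit immediate in bits 31..12, sign-extend from bit 31.
int64_t rv_lui(uint32_t imm20) {
  return static_cast<int64_t>(static_cast<int32_t>((imm20 & 0xfffffu) << 12));
}

// Replays the five instructions on 64-bit values and returns the final rd.
uint64_t materialize48(const Li48Parts& p) {
  int64_t tmp = rv_lui(p.upper18);                               // lui  tmp, upper18
  int64_t rd = rv_lui(p.mid18);                                  // lui  rd, mid18
  tmp = static_cast<int64_t>(static_cast<uint64_t>(tmp) << 18);  // slli tmp, tmp, 18
  rd = rd + tmp;                                                 // add  rd, rd, tmp
  rd = rd + p.low12;                                             // addi rd, rd, low12
  return static_cast<uint64_t>(rd);
}
```

For example, `split48(0xffffffffffffULL)` gives `upper18 = 0x3ffff`, `low12 = -1` and `mid18 = 0x40000`: because addi sign-extends the low bits to -1, the middle part is rounded up by one to compensate, and `materialize48` returns `0xffffffffffff` again.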
Since we often materialize pointers at call sites, free registers are usually available there.
I have chosen just a few spots to use it; many more could use it.
E.g. la() with a tmp register can use li48 instead of movptr.
Running tests now (so far so good); if I had broken IC calls, it would show up quickly.
Benchmarks will follow when hardware is free.
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246
$ git checkout pull/19246
Update a local copy of the PR:
$ git checkout pull/19246
$ git pull https://git.openjdk.org/jdk.git pull/19246/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19246
View PR using the GUI difftool:
$ git pr show -t 19246
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19246.diff