Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier #19278

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

kuaiwei
Copy link
Contributor

@kuaiwei kuaiwei commented May 17, 2024

he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
1 It show regression in some platform, like Apple silicon in mac os
2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld"

It can be fixed by:
1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2)
2 Check the special pattern and merge the subsequent dmb.

It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these.

This patch still has a unhandled case. Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. I think it may break some resumption and think it's not a common pattern.

In previous PR #18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278
$ git checkout pull/19278

Update a local copy of the PR:
$ git checkout pull/19278
$ git pull https://git.openjdk.org/jdk.git pull/19278/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19278

View PR using the GUI difftool:
$ git pr show -t 19278

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19278.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 17, 2024

👋 Welcome back kuaiwei! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 17, 2024

@kuaiwei This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier

Reviewed-by: shade, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 48 new commits pushed to the master branch:

  • 2a37764: 8333743: Change .jcheck/conf branches property to match valid branches
  • 75dc2f8: 8330182: Start of release updates for JDK 24
  • 054362a: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
  • 9b436d0: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
  • 487c477: 8333647: C2 SuperWord: some additional PopulateIndex tests
  • d02cb74: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
  • 02f2404: 8333560: -Xlint:restricted does not work with --release
  • 606df44: 8332670: C1 clone intrinsic needs memory barriers
  • 33fd6ae: 8333622: ubsan: relocInfo_x86.cpp:101:56: runtime error: pointer index expression with base (-1) overflowed
  • 8de5d20: 8332865: ubsan: os::attempt_reserve_memory_between reports overflow
  • ... and 38 more: https://git.openjdk.org/jdk/compare/d826127970bd2ae8bf4cacc3c55634dc5af307c4...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr Pull request is ready for review label May 17, 2024
@openjdk
Copy link

openjdk bot commented May 17, 2024

@kuaiwei The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 17, 2024
@mlbridge
Copy link

mlbridge bot commented May 17, 2024


ins_encode %{
__ block_comment("membar_release");
__ membar(Assembler::LoadStore|Assembler::StoreStore);
__ membar(Assembler::StoreStore);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to respect AlwaysMergeDMBhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, usually they can be merged in macroAssembler. but it can help to reduce the possibility of unmerged case. Thanks to point it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked code again. They will be merged if enable AlwaysMergeDMB. So we can skip the check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment:

// These will be merged if AlwaysMergeDMB is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursory review:

asm_check((const unsigned int *)code.insts()->start(), insns, sizeof insns / sizeof insns[0]);
}

TEST_VM(AssemblerAArch64, merge_ldst) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to be irrelevant for the issue at hand? Tests ld/st -> ldp/stp merging, not the barrier merges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this patch, I fixed an issue, dmb/st/ld may not merge if CodeBuffer is expanding, I added some unit tests to check it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. So the logic is that the same issue that affects dmb merging also affects ldp merging? All right, it's fine to leave it here then.

src/hotspot/cpu/aarch64/globals_aarch64.hpp Outdated Show resolved Hide resolved
printf("%s\n", ss.as_string());
}

TEST_VM(AssemblerAArch64, merge_dmb) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the previous experience with barrier merges that prompted the backout, I would prefer to have a more comprehensive test here, maybe an additional one. I am thinking something like the exhaustive combination of 4 back-to-back barriers of each of 5 types. This gives us 5^4 = 625 test cases, which I think is still manageable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is added as merge_dmb_all_kinds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I was implicitly thinking that we can do this without coding the explicit patterns into the test. As it stands now, it is hard to check that generated patterns are actually correct. Let me see if I can whip up a sample of what I had in mind.

Copy link
Member

@shipilev shipilev May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this: improve-tests.patch. Note how it uses the constants for better readability, and also runs the test in both AlwaysMergeDMB modes. You might want to adapt other tests to similar pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patch. I patched and add few sanity check.

@openjdk openjdk bot removed the rfr Pull request is ready for review label May 24, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label May 27, 2024
@theRealAph
Copy link
Contributor

This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree?

@shipilev
Copy link
Member

shipilev commented May 29, 2024

This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree?

Yes. Just run jcstress with defaults, maybe limiting the time budget to about 24 hours, and we are done. Default configuration would work through different combinations of C1/C2 compilations for all actors, which is what we want to check for this change: that we don't mess up the barrier emitting scheme in different compilers/interpreters.

@shipilev
Copy link
Member

Note that current jcstress run would likely fail due to JDK-8332670.

@kuaiwei
Copy link
Contributor Author

kuaiwei commented May 30, 2024

This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree?

Yes. Just run jcstress with defaults, maybe limiting the time budget to about 24 hours, and we are done. Default configuration would work through different combinations of C1/C2 compilations for all actors, which is what we want to check for this change: that we don't mess up the barrier emitting scheme in different compilers/interpreters.

I can run the jcstress test. I will run fastdebug build with
java -jar jcstress-latest.jar -tb 24h
Is it the correct command ?

@shipilev
Copy link
Member

I can run the jcstress test. I will run fastdebug build with java -jar jcstress-latest.jar -tb 24h Is it the correct command ?

Yes, I think so.

@shipilev
Copy link
Member

I can run the jcstress test. I will run fastdebug build with java -jar jcstress-latest.jar -tb 24h Is it the correct command >
Yes, I think so.

FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects -AlwaysMergeDMB there) for 12 hours. Apart from failures from JDK-8332670, I see no other trouble. Scheduled a quick run with +AlwaysMergeDMB as well.

@kuaiwei
Copy link
Contributor Author

kuaiwei commented May 31, 2024

I can run the jcstress test. I will run fastdebug build with java -jar jcstress-latest.jar -tb 24h Is it the correct command >
Yes, I think so.

FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects -AlwaysMergeDMB there) for 12 hours. Apart from failures from JDK-8332670, I see no other trouble. Scheduled a quick run with +AlwaysMergeDMB as well.

Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error?

Comment on lines 209 to 211
constexpr uint32_t test_encode_dmb_ld = 0xd50339bf;
constexpr uint32_t test_encode_dmb_st = 0xd5033abf;
constexpr uint32_t test_encode_dmb = 0xd5033bbf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe move these to the top, and use these constants across the test? You would not need the comments like 0xd5033abf, // dmb.ishst then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

asm_check((const unsigned int *)code.insts()->start(), insns, sizeof insns / sizeof insns[0]);
}

TEST_VM(AssemblerAArch64, merge_ldst) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. So the logic is that the same issue that affects dmb merging also affects ldp merging? All right, it's fine to leave it here then.

@@ -150,6 +150,7 @@ class MacroAssembler: public Assembler {
void bind(Label& L) {
Assembler::bind(L);
code()->clear_last_insn();
code()->set_last_label(pc());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we have added _last_label to shared code in codeBuffer, but only update it in aarch64. This would be surprising for other platforms. On the other hand, this is what we already do with _last_insn -- only implementing it for specific platforms. Probably fine, but it would be nice to strengthen this with asserts, maybe in separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reminds me it could be applied to riscv. It also need merge membar. I will move this part to a new PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need _last_label in this patch. I need it to check previous 2 instructions and they are not cross block boundary. I will create a new PR for RISCV only.

@shipilev
Copy link
Member

Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error?

Soft errors are API inconsistencies that jcstress harness handled. As long as you don't have new hard errors, you are fine.

@kuaiwei
Copy link
Contributor Author

kuaiwei commented Jun 3, 2024

I can run the jcstress test. I will run fastdebug build with java -jar jcstress-latest.jar -tb 24h Is it the correct command >
Yes, I think so.

FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects -AlwaysMergeDMB there) for 12 hours. Apart from failures from JDK-8332670, I see no other trouble. Scheduled a quick run with +AlwaysMergeDMB as well.

Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error?

My jcstress test is done and no error is reported.

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're done here. Thank you for your patience and hard work.
@shipilev, are you happy too?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 3, 2024
Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe we are done. Good job!

@kuaiwei
Copy link
Contributor Author

kuaiwei commented Jun 4, 2024

Thanks for your review.

@kuaiwei
Copy link
Contributor Author

kuaiwei commented Jun 4, 2024

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 4, 2024
@openjdk
Copy link

openjdk bot commented Jun 4, 2024

@kuaiwei
Your change (at version 68c3018) is now ready to be sponsored by a Committer.

@kuaiwei
Copy link
Contributor Author

kuaiwei commented Jun 4, 2024

@theRealAph @shipilev Could you help sponsor it ?
Thanks.

@shipilev
Copy link
Member

shipilev commented Jun 4, 2024

GH says the branch has conflicts that must be resolved. Merging from master is advisable anyway to avoid merge surprises. Let's do that before integrating.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Jun 5, 2024
@TobiHartmann
Copy link
Member

Given that the first version of the patch had some issues, I would recommend waiting for the fork tomorrow and only integrating this into JDK 24.

@TobiHartmann
Copy link
Member

I'll run this through our correctness and performance testing and report back once it passed.

@shipilev
Copy link
Member

shipilev commented Jun 7, 2024

I'll run this through our correctness and performance testing and report back once it passed.

How was it, Tobias? Any surprises?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review
4 participants