Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heuristics to minimize performance impact of -XX:+DebugOnRestore #19342

Merged
merged 6 commits into from
May 31, 2024

Conversation

dsouzai
Copy link
Contributor

@dsouzai dsouzai commented Apr 18, 2024

This PR includes several heuristics to minimize the performance impact of -XX:+DebugOnRestore:

  • Generate FSD compilations pre-checkpoint
  • Compile "important" methods pre-checkpoint
    • "important" methods are those that are in the SCC, that would have
      been loaded at scount=20
    • These "important" methods are compiled without FSD enabled
  • Compile failed compilations pre-checkpoint as opposed to post-restore
  • Reset environment after proactive compilation
  • Don't change JVM Phase pre-checkpoint
  • Enable IProfiling during startup; because SCC validation occurs post
    restore, there is no persistent IProfiling data to pull from
  • Turn off IProfiling on restore so that the normal heuristics can turn
    it back on post-startup
  • Trigger recomps pre-checkpoint

Parent issue #18866

@dsouzai dsouzai added comp:jit criu Used to track CRIU snapshot related work labels Apr 18, 2024
@dsouzai
Copy link
Contributor Author

dsouzai commented Apr 18, 2024

@mpirvu could you please review? This is the last big change on the compiler side in terms of getting a reasonable set of runtime characteristics under -XX:+DebugOnRestore; footprint is still impacted (though probably less so with your datacache disclaiming changes).

@mpirvu mpirvu self-assigned this Apr 18, 2024
@mpirvu
Copy link
Contributor

mpirvu commented Apr 22, 2024

If "important" methods are compiled without FSD enabled pre-checkpoint, what happens if the user actually wants to use FSD post restore?

@dsouzai
Copy link
Contributor Author

dsouzai commented May 20, 2024

Hm I thought I covered that scenario, but it looks like I only partially addressed it, namely that if FSD is specified post-restore then FSD remains enabled. But, there needs to be a way to invalidate the methods that were compiled proactively.

I guess originally the idea was to "commit" the startPCs of these proactive methods on restore, in which case if someone specified FSD post-restore then we could just skip the "commit". However, since "commit"-ing post-restore hurts footprint, I updated the extra field pre-checkpoint, but now those methods will not support FSD.

A relatively simple solution to this would be to just invalidate the entire code cache if someone specifies FSD post-restore. All the stacks at that point will be at a safe point and they will be running FSD code, so even though the extra field is updated with a non-FSD body startPC, it wouldn't matter from an OSR point of view.

Otherwise, I'll need another list to keep track of methods that were compiled proactively and revert the extra field of those methods if FSD is specified post-restore. Maybe that's something I can do in another PR since it's fairly non-trivial, and this PR is already reasonably complicated.

I'll make the simple change to this PR tomorrow.

@dsouzai
Copy link
Contributor Author

dsouzai commented May 21, 2024

@mpirvu good for review again.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If FSD is enabled post-restore, the entire code cache is invalidated. Then why bother to generate warm FSD bodies pre-checkpoint?

runtime/compiler/runtime/CRRuntime.cpp Show resolved Hide resolved
runtime/compiler/control/HookedByTheJit.cpp Show resolved Hide resolved
runtime/compiler/control/J9Options.hpp Outdated Show resolved Hide resolved
runtime/compiler/control/CompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/CRRuntime.cpp Outdated Show resolved Hide resolved
@dsouzai
Copy link
Contributor Author

dsouzai commented May 22, 2024

If FSD is enabled post-restore, the entire code cache is invalidated. Then why bother to generate warm FSD bodies pre-checkpoint?

I guess because originally we were going to do the recompilation post-restore, it was going to be easier to just continue using the FSD code generated pre-checkpoint. However, I guess at least in this PR, I'm just invalidating the code cache as a first simple solution; maybe if we realize that it's too hard to keep track of all the methods that need to be reset back to FSD, then there's no point having warm FSD (though there may be some performance impact due to a difference in iprofiling data and what not).

@dsouzai
Copy link
Contributor Author

dsouzai commented May 28, 2024

@mpirvu looks like at least on pingperf, not generating warm FSD didn't have any effect on startup, first response, or throughput; it did however, have a minor footprint improvement, so I just removed the code that generated warm FSD.

@dsouzai
Copy link
Contributor Author

dsouzai commented May 29, 2024

@mpirvu should be good for review now.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpirvu
Copy link
Contributor

mpirvu commented May 29, 2024

jenkins test sanity xlinux,zlinux jdk21

@mpirvu
Copy link
Contributor

mpirvu commented May 30, 2024

jenkins compile win jdk17

@mpirvu
Copy link
Contributor

mpirvu commented May 30, 2024

Compilation errors on Windows:

f:\users\jenkins\workspace\build_jdk17_x86-64_windows_personal\openj9\runtime\compiler\control\CompilationThread.cpp(4173): error C2039: 'getCRRuntime': is not a member of 'TR::CompilationInfo'
12:02:11  f:\users\jenkins\workspace\build_jdk17_x86-64_windows_personal\openj9\runtime\compiler\runtime/IProfiler.hpp(76): note: see declaration of 'TR::CompilationInfo'
12:02:11  [ 41%] Building C object runtime/jcl/CMakeFiles/jclse.dir/common/mgmtcompilation.c.obj
12:02:11  RegionValidator.cpp
12:02:11  make[6]: *** [runtime/compiler/CMakeFiles/j9jit.dir/build.make:371: runtime/compiler/CMakeFiles/j9jit.dir/control/CompilationThread.cpp.obj] Error 2

- Generate cold FSD compilations pre-checkpoint
- Compile "important" methods pre-checkpoint
  - "important" methods are those that are in the SCC, that would have
    been loaded at scount=20
  - These "important" methods are compiled without FSD enabled
- Compile failed compilations pre-checkpoint as opposed to post-restore

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
Proactive compilation is done with FSD disabled; therefore, reset the
environment back to FSD enabled to ensure consistency with the
post-restore Options processing.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
Don't allow the JVM Phase to change to NOT_STARTUP pre-checkpoint, as
anything pre-checkpoint should be considered part of the startup phase
of the JVM.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
- Enable IProfiling during startup; because SCC validation occurs post
  restore, there is no persistent IProfiling data to pull from
- Turn off IProfiling on restore so that the normal heuristics can turn
  it back on post-startup

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
Move recompilation of FSD bodies to occur pre-checkpoint in order to
prevent an increase in footprint post-restore.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
As a simple solution for dealing with a user wanting debug post-restore,
invalidate all compiled bodies to deal with the fact that proactive
compilation that occurred pre-checkpoint did not have FSD enabled.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
@dsouzai
Copy link
Contributor Author

dsouzai commented May 30, 2024

jenkins compile win jdk17

@mpirvu mpirvu merged commit 6aa11f7 into eclipse-openj9:master May 31, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jit criu Used to track CRIU snapshot related work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants