Improve object heap allocation for Compressed References JVM for zLinux #19490
On the code-gen side, we would see a performance impact going from a shift of 1 to higher shifts (not for a shift of 0), because we lose the ability to embed the shift in the load/store instruction itself, which forces us to generate an extra instruction for each load/store. Performance-wise, I am pasting the old numbers I collected comparing a shift of 1 vs 3. (I refreshed those numbers a few weeks ago, but they are on a machine that is offline now, so I will only be able to extract them later this week.) I am pasting the old numbers for now to keep the conversation going, and will update this comment with the latest results (though I think the performance delta was similar).
|
@dmitripivkine just so I am clear on the proposal: it is composed of two steps that differ from the current default scheme, both of which apply only to the case where the heap could have been allocated below 8GB following the "bottom-up" approach that is in effect by default on zLinux today.
Is this understanding correct? |
No, not exactly. Sorry, I was not clear. My base suggestion is to change the allocation direction from bottom-up to top-down for all zLinux cases except Concurrent Scavenger with HW support (which might be addressed later if needed; it just requires more work). Changing the allocation direction will reduce usage of memory below the 4G bar: it is going to be better, or the same in the worst case. Currently, with bottom-up, all free memory below the 4G bar is certain to be consumed; with the top-down approach it might be consumed only if there is not enough memory between the 4G bar and the maximum address supported by the selected minimum shift. Once this is in place, our allocation policy can optionally be improved further. We can reduce (or eliminate) memory usage below the 4G bar by adjusting parameters (at the price of sometimes going to a higher shift, of course). I tried to explain this with the example in case 2. I am open to ideas on how heap allocation can be improved (and we do have the tools to do it on zLinux). However, it would be good to have the allocation logic aligned with other platforms. |
Thanks @dmitripivkine. My preference would be to go with the "top-down" scheme described under case 1. But does this scheme not come with its own throughput risk? Specifically, in a case where the entire heap could have been contained in the lower 4GB, we may have been able to run without any shifting with the "bottom-up" approach, whereas with the "top-down" approach we may use shift=1 (not as bad as shift=3, but also not as good as no shift). If so, maybe we need to compare shift=1 vs no shift (i.e. not what Rahil had collected before). The optional enhancement described in case 2 can maybe come later, if we find that the "top-down" scheme did not help in enough of the cases that you are targeting with this proposal. Is it support cases that are driving this proposal and, if so, do you feel that the "top-down" change alone would be worth trying as an initial step to address what you are seeing in those support cases? In my opinion we can discuss going further later if needed (but I am happy to hear more reasons to reconsider that position). |
@vijaysun-omr No, there is no risk. There are details that have not been described; I have focused on the zLinux-specific ones:
So, steps 1, 4 and 5 exist today for all platforms except Z. I am suggesting applying them to zLinux too, with the addition of steps 2 and 3, which are specific to zLinux only. |
Thanks for those details. I am fine with the proposed "top-down" scheme since it carries no throughput risk. |
Implementation eclipse/omr#7344 |
Here are a few examples of heap location for the new implementation:
-- 512m, 0-shift
-- 3G, [1G,4G], 0-shift
-- 4G, [4G,8G], 1-shift
-- 5G, [3G,8G], 1-shift
-- 11G, [5G,16G], 2-shift
-- 23G, [9G,32G], 3-shift
-- 27G, [5G,32G], 3-shift
-- 29G, [35G,64G], 4-shift <--- this is the only difference from current behaviour, pushed to 4-shift
-- 35G, [29G,64G], 4-shift
-- 60G, [4G,64G], 4-shift
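The placements listed above can be reproduced with a small model. This is only a sketch inferred from the listed examples, not the code in eclipse/omr#7344. The function name `place_top_down` is hypothetical, and the two rules it applies are assumptions: the heap may not begin at address 0, and a 3-shift placement is rejected when it would start below the 4G bar. The list does not state a location for the 512m case, so only its shift is comparable.

```python
GB = 1 << 30
LOW_BAR = 4 * GB  # the "4G bar" discussed in this issue


def place_top_down(size, shifts=(0, 1, 2, 3, 4)):
    """Return (start, top, shift) for a heap of `size` bytes, or None.

    Hypothetical model: the heap is placed so that it ends at the highest
    address reachable with the smallest usable shift.
    """
    for shift in shifts:
        top = (4 * GB) << shift  # highest address reachable with this shift
        start = top - size
        if start <= 0:
            # Assumption: heap does not fit, or would begin at address 0.
            continue
        if shift == 3 and start < LOW_BAR:
            # Assumption: protect memory below the bar at the 3 -> 4 switch.
            continue
        return start, top, shift
    return None


for size_gb in (3, 4, 5, 11, 23, 27, 29, 35, 60):
    start, top, shift = place_top_down(size_gb * GB)
    print(f"{size_gb}G, [{start // GB}G,{top // GB}G], {shift}-shift")
```

Under these assumptions the 29G request is the only one that skips a shift it would otherwise fit in, which matches the note on that list entry.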
|
The current allocation direction for the object heap on zLinux is bottom-up. The bottom-up direction was selected to get the performance benefit of a smaller shift on the Z platform. For this reason, the Compressed Refs shift on Z is supported as (0,1,2,3,4), while on other platforms it is (0,3,4). However, it also means the object heap always consumes part of the memory below the 4GB bar, which prevents Suballocator expansion when it is needed (when the default 200MB or the customer-specified value is not large enough).
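As a rough illustration of why the shift bounds the top address of the heap, the sketch below assumes a 32-bit compressed reference that is decoded as `ref << shift` with no base offset. The function names are hypothetical and are not taken from OpenJ9 or OMR.

```python
GB = 1 << 30


def max_heap_top(shift):
    """Exclusive upper bound on addresses a 32-bit reference can reach."""
    return (1 << 32) << shift


def compress(address, shift):
    """Encode an address; it must be aligned to (1 << shift) bytes."""
    assert address % (1 << shift) == 0, "address not aligned for this shift"
    return address >> shift


def decompress(ref, shift):
    """Decode a compressed reference back to a full address."""
    return ref << shift


for shift in (0, 1, 2, 3, 4):
    print(f"shift {shift}: heap top must be at or below {max_heap_top(shift) // GB}G")
```

With shift 0 the whole heap has to sit below the 4GB bar, and each additional shift bit doubles the reachable range, up to 64GB at shift 4.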
My preferred solution is to change zLinux behaviour to match non-Z platforms. However, it is not clear how much performance regression we might have to deal with.
If we decide to continue supporting (0,1,2,3,4) shifts for zLinux, there is an alternative suggestion:
In the worst case (from the point of view of consuming memory below the 4GB bar) the allocation scenario is still the same. However, in the general case there is a chance that pressure on memory below the 4GB bar is reduced, while shift-wise nothing changes.
Please note that the allocation scheme for Concurrent Scavenger with HW Support (Guarded Storage) is still bottom-up, due to complications in the HW implementation support.
There is also a way to reduce this pressure even more (down to nothing), but at the price of a higher shift value.
There is an "estimated heap start address" variable. It can be set to 0 for the most conservative case. The minimum possible shift for the requested size is calculated from the top heap address as
estimated top heap address = estimated heap start address + requested size
If allocation is not possible (for instance, there is no realistic way to allocate a 7.5GB heap with 1-shift below 8GB), the allocation fails and a higher shift is attempted.
Here are examples for illustration:
In this case we still guarantee the minimum shift, but at the price of consuming 1GB below the 4GB bar
In this case we guarantee zero pressure on memory below the 4GB bar, but at the price of using a higher shift (2 instead of 1)
We currently use this logic on non-Z platforms to protect memory below the bar when switching from 3 to 4 shift.
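The shift selection described above can be sketched as follows. This is a minimal illustration of the stated formula, not the OMR implementation. The name `minimum_shift` and the 5GB request are hypothetical, chosen so that the two estimated start addresses show the same trade-off (shift 1 versus shift 2).

```python
GB = 1 << 30
SHIFTS_Z = (0, 1, 2, 3, 4)  # shifts supported on Z, per this issue
SHIFTS_NON_Z = (0, 3, 4)    # shifts supported on other platforms


def minimum_shift(estimated_start, requested_size, shifts=SHIFTS_Z):
    """Smallest supported shift whose range covers the estimated heap top.

    estimated top heap address = estimated heap start address + requested size
    Returns None if no supported shift can reach the estimated top.
    """
    estimated_top = estimated_start + requested_size
    for shift in shifts:
        if estimated_top <= (4 * GB) << shift:
            return shift
    return None


# Most conservative estimate (start = 0): the smallest shift is kept.
print(minimum_shift(0, 5 * GB))
# Estimated start at the 4GB bar: nothing is taken below the bar,
# but a higher shift is selected.
print(minimum_shift(4 * GB, 5 * GB))
```

The sketch only picks the first shift to try. The retry with a higher shift when the actual allocation fails, as described above, is outside its scope.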
The implementation of this enhancement is pretty simple; it is a low-risk change. I prefer to isolate this change to zLinux only, so there is no risk of breaking other platforms.
Just to summarize:
Case 1 is an enhancement of the current behaviour, but with only partial improvement.
Case 2 should resolve the issue of low memory below the 4GB bar, but it changes current behaviour, so there might be a negative performance impact related to a larger CR shift value.
@vijaysun-omr @joransiu @TobiAjila @r30shah @amicic What do you think?