@adriaarmejach and I have noticed that, in some vector instructions, the destination register is also set as an additional source register as a means of implementing the tail/mask undisturbed policy. gem5 assumes this policy by default, since the agnostic policy permits the same behavior. The extra source lets the instruction copy the old inactive (i.e. tail) or masked-off contents when renaming is in place.
The issue is that this can create spurious data dependencies that hurt performance. It is most visible with loads that write the whole register (but it also affects other instructions that overwrite the destination or do not care about its old contents), as they have to wait for any prior vector instruction with the same destination.
To solve this, we propose also implementing an agnostic policy option that writes 1s to the masked-off/inactive elements. This removes the need to use the destination register as a source (as there is nothing to copy) when both tail and mask are agnostic (i.e. `vtype.vta=1` and `vtype.vma=1`).
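As a rough illustration of the proposal, here is a minimal Python sketch of the intended semantics. It is not gem5 code: `VType`, `needs_old_dest`, and `write_result` are hypothetical names made up for this example, and the 8-bit element width is an assumption. It only shows that when both `vta` and `vma` are 1, the old destination contents are never read, so the destination would not need to be a source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VType:
    """Hypothetical stand-in for the two policy bits of vtype."""
    vta: int  # 1 = tail agnostic, 0 = tail undisturbed
    vma: int  # 1 = mask agnostic, 0 = mask undisturbed


def needs_old_dest(vtype: VType) -> bool:
    """True when the old destination must be read, i.e. added as a source."""
    return not (vtype.vta == 1 and vtype.vma == 1)


def write_result(old_dest: Optional[List[int]], result: List[int],
                 mask: List[bool], vl: int, vtype: VType,
                 elem_bits: int = 8) -> List[int]:
    """Build the new destination contents element by element.

    Agnostic elements are filled with all 1s; undisturbed elements are
    copied from old_dest, which is the only place old_dest is read.
    """
    all_ones = (1 << elem_bits) - 1
    out = []
    for i, value in enumerate(result):
        if i >= vl:
            # Tail element: past the active vector length.
            out.append(all_ones if vtype.vta else old_dest[i])
        elif not mask[i]:
            # Masked-off element inside the body.
            out.append(all_ones if vtype.vma else old_dest[i])
        else:
            # Active element: takes the computed value.
            out.append(value)
    return out
```

With both bits set, `write_result(None, ...)` works without any old value, which is the case where the extra source dependency could be dropped. If either bit is 0, the old contents are still needed for the corresponding elements.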
We will create a staging branch on May 20th for the upcoming release of v24.0. For this change to be included, the corresponding PR should be pushed no later than Tuesday, May 14th, to allow enough time for it to be reviewed. Thanks.