The multicore (and C) backends sometimes generate inefficient code in which arrays are traversed sequentially but with a large stride, so consecutive loop iterations access elements that are far apart in memory. This is obviously bad for cache locality. I think a good solution would be to do something similar to the "kernel babysitter" used by the GPU pipelines: analyse the traversal patterns and transpose the arrays in advance so that the eventual traversal is optimal. This is not as good as tiling, but it is very general.