Question about variable definitions in main_ocl.cpp #34
Roughly speaking, both express the amount of work assigned to each work-item. However, they differ in the ordering of the replicated operations and in the resulting memory access patterns, so you may want to experiment with changing both values.
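The sketch below is a minimal C++ illustration of how two such parameters can give each work-item the same amount of work while changing the order and layout of its memory accesses. It assumes a mixbench-style indexing scheme; the names `Geometry`, `visited_indices`, and `covers_buffer_exactly_once` are hypothetical, and the actual kernels built from main_ocl.cpp may index differently. Here, "elements per workitem" is modeled as the inner loop (elements one work-group apart) and "fusion degree" as an outer loop that repeats that pattern one whole-grid chunk further on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launch geometry; names are illustrative, not taken from main_ocl.cpp.
struct Geometry {
    std::size_t workgroup_size;     // work-items per work-group
    std::size_t num_workgroups;     // work-groups in the NDRange
    std::size_t elements_per_item;  // "elements per workitem" (assumed meaning)
    std::size_t fusion_degree;      // "workitem fusion degree" (assumed meaning)
};

// Buffer indices one work-item touches, in the order it touches them.
std::vector<std::size_t> visited_indices(const Geometry& g,
                                         std::size_t group, std::size_t local) {
    assert(group < g.num_workgroups && local < g.workgroup_size);
    // Elements within one pass are one work-group apart, so neighbouring
    // work-items access neighbouring addresses (coalesced access).
    const std::size_t stride = g.workgroup_size;
    const std::size_t base = group * g.workgroup_size * g.elements_per_item + local;
    // Each fused pass repeats the same pattern one whole-grid chunk further on.
    const std::size_t big_stride =
        g.num_workgroups * g.workgroup_size * g.elements_per_item;
    std::vector<std::size_t> out;
    for (std::size_t k = 0; k < g.fusion_degree; ++k)          // fused passes (outer)
        for (std::size_t j = 0; j < g.elements_per_item; ++j)  // elements per pass (inner)
            out.push_back(base + j * stride + k * big_stride);
    return out;
}

// True if every buffer element is touched by exactly one work-item.
bool covers_buffer_exactly_once(const Geometry& g) {
    const std::size_t total = g.num_workgroups * g.workgroup_size *
                              g.elements_per_item * g.fusion_degree;
    std::vector<int> hits(total, 0);
    for (std::size_t grp = 0; grp < g.num_workgroups; ++grp)
        for (std::size_t loc = 0; loc < g.workgroup_size; ++loc)
            for (std::size_t idx : visited_indices(g, grp, loc)) {
                if (idx >= total) return false;
                ++hits[idx];
            }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

With `Geometry{4, 2, 3, 2}`, work-item 0 of group 0 visits 0, 4, 8, 24, 28, 32, while with `Geometry{4, 2, 6, 1}` it visits 0, 4, 8, 12, 16, 20. Both give it six elements of work, but the access pattern differs, which is why the two values can perform differently on different hardware.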
What is the overall design idea? Will there be any differences between GPUs of different architectures?
Certainly, these parameters can have a different impact on different GPU architectures. In fact, they were originally introduced to address the differing optimization needs of NVIDIA and AMD GPUs. The compiler also plays a significant role, since it may generate different code patterns for each setting. So you can run your own experiments to tune these values, or keep the defaults if you don't want to target a specific architecture.
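One way to run such experiments is a simple exhaustive sweep over both parameters, keeping the best-performing pair. The C++ sketch below assumes a hypothetical `run_benchmark` callback that would launch the kernel with a given (elements per work-item, fusion degree) pair and return a measured throughput; `Config` and `sweep` are likewise illustrative names, not part of main_ocl.cpp.

```cpp
#include <functional>
#include <vector>

// Hypothetical result record for one tuning configuration.
struct Config {
    int elements_per_item;  // candidate "elements per workitem"
    int fusion_degree;      // candidate "workitem fusion degree"
    double throughput;      // higher is better, e.g. GFLOP/s or GB/s
};

// Exhaustively tries every pair and returns the fastest one.
// run_benchmark is a hypothetical stand-in for an actual kernel launch + timing.
// If either choice list is empty, the returned throughput stays at -1.0.
Config sweep(const std::vector<int>& elements_choices,
             const std::vector<int>& fusion_choices,
             const std::function<double(int, int)>& run_benchmark) {
    Config best{0, 0, -1.0};
    for (int e : elements_choices)
        for (int f : fusion_choices) {
            const double t = run_benchmark(e, f);
            if (t > best.throughput) best = {e, f, t};
        }
    return best;
}
```

Because the best pair is architecture- and compiler-dependent, a sweep like this would need to be repeated on each target GPU; on real hardware, each measurement should also be repeated a few times to reduce timing noise.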
Why do the results I measured not reach the theoretical peak, and what could be the reason for this?
Hi bro.
What do the variables "elements per workitem" and "workitem fusion degree", defined in this function, mean?