Variable definition problem within main_ocl.cpp function #34

Open
tomorrow2000 opened this issue Nov 11, 2022 · 4 comments

Comments

@tomorrow2000

Hi bro.
What do the variables "elements per workitem" and "workitem fusion degree" defined in the function mean?

@ekondis
Owner

ekondis commented Nov 26, 2022

More or less, they both express the amount of workload assigned to each workitem. However, they differ in the ordering of the replicated operations and in the memory access patterns applied. So, you might want to experiment by changing both values.
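
To make the distinction concrete, here is a minimal host-side sketch of one way two such parameters could shape the memory access pattern of a single workitem. It is not the project's actual kernel code: the function name `indices_for_workitem`, the index formula, and the choice of strides are all assumptions made for illustration. In this sketch, "elements per workitem" is the inner loop with a small stride (one workgroup width), while "fusion degree" is the outer loop that replicates the whole pattern at a large stride.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not taken from main_ocl.cpp):
// lists the buffer indices that one workitem would touch.
std::vector<std::size_t> indices_for_workitem(std::size_t group_id,
                                              std::size_t local_id,
                                              std::size_t local_size,
                                              std::size_t num_groups,
                                              std::size_t elements_per_workitem,
                                              std::size_t fusion_degree) {
    // Assumed small stride: neighbouring workitems access neighbouring
    // elements, and each workitem steps by one workgroup width.
    const std::size_t stride = local_size;
    // Assumed large stride: one full pass over everything that all
    // workgroups cover with the inner loop.
    const std::size_t big_stride = num_groups * local_size * elements_per_workitem;
    // First element owned by this workitem.
    const std::size_t base = group_id * local_size * elements_per_workitem + local_id;

    std::vector<std::size_t> indices;
    indices.reserve(elements_per_workitem * fusion_degree);
    for (std::size_t k = 0; k < fusion_degree; ++k) {              // outer replication
        for (std::size_t j = 0; j < elements_per_workitem; ++j) {  // inner replication
            indices.push_back(base + j * stride + k * big_stride);
        }
    }
    return indices;
}
```

Under these assumptions both parameters multiply the work done per workitem (each workitem handles `elements_per_workitem * fusion_degree` elements), but they change the order in which elements are visited and the distance between consecutive accesses, which is why tuning them independently can give different results.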

@tomorrow2000
Author

What is the overall design idea? Will there be any differences between GPUs of different architectures?

@ekondis
Owner

ekondis commented Nov 30, 2022

Certainly, these parameters can have a different impact on different GPU architectures. In fact, they were first introduced to accommodate the different optimizations needed for NVIDIA and AMD GPUs. The compiler also plays a significant role here, as it might lead to different patterns. So, you can run your own experiments to optimize these values, or keep the default ones if you don't want to focus on a specific architecture.

@tomorrow2000
Author

Why did the results I measured not reach the theoretical value? What could be the reason for this?
