Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better parallelization algorithm for gridlink #240

Open
lgarrison opened this issue Jan 13, 2021 · 2 comments
Open

Better parallelization algorithm for gridlink #240

lgarrison opened this issue Jan 13, 2021 · 2 comments

Comments

@lgarrison
Copy link
Collaborator

Gridlink currently uses a simple parallel scheme that does two passes over all particles: one to compute cell indices, and another to write the particles into cells. Each thread is actually still reading all particles in the second pass and is just treating those in its domain to avoid race conditions.

We can do better! With a fast, parallel sort, each thread will only have to read particles in its cell domain, and the writes will be contiguous, rather than semi-random, like they are now.

The current method is still pretty fast (see timings in #239), so the challenge is writing a fast enough sort. A parallel radix sort might be the trick. We likely want this to be an argsort, since the particles can be "heavy" (12 or 24 bytes) and the indices are relatively light (8 or maybe 4 bytes if we economize). Worse, the movement of the weights requires a loop over a number of weights only known at runtime, which could bog down any swaps.

We'll want to port any such algorithm to the mock grindlink(s); the current parallel scheme is only for the theory gridlink.

@lgarrison
Copy link
Collaborator Author

The in-place version will need to use swaps instead of argsort. Or we'll need to figure out how to do an efficient in-place permutation. Either way, it should be parallelized.

@manodeep
Copy link
Owner

Another thing to do would be to find the final sorted order (i.e., account for sort_on_z) while performing this initial partitioning of particles into cells.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants