Implement parallel processing for charged_particle_radiography.Tracker #1728

Open
pheuer opened this issue Sep 6, 2022 · 12 comments · May be fixed by #2226
Labels: effort: high · feature request · help wanted · optimization · Plasma Lv0 | Novice · plasmapy.diagnostics · Python Lv3 | Proficient

Comments

pheuer commented Sep 6, 2022

Particle tracing in the charged_particle_radiography.Tracker class is an embarrassingly parallel problem, but the calculations are currently performed serially. Execution time is dominated by the field interpolation steps. Implementing parallel processing with the multiprocessing library should give a ~3-4x speedup in tracing on laptops and much larger gains on computing resources with more cores. This would significantly improve the module and make it feasible to run large batches of simulations for things like training machine learning algorithms.

Only the push step of the algorithm needs to be parallelized. The code should probably split the particles into groups (one group per processing core) and then push the groups in parallel, stopping each process independently once its particles have left the grid(s), roughly as sketched below.
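
A minimal sketch of that dispatch using multiprocessing, with a hypothetical push_chunk function standing in for the existing push loop (this is not the actual Tracker API):

```python
# Sketch only: push_chunk is a hypothetical stand-in for the work currently
# done inside Tracker's push loop, not an existing PlasmaPy function.
import multiprocessing as mp

import numpy as np


def push_chunk(x_chunk, v_chunk):
    """Advance one group of particles until they have all left the grid."""
    # ... interpolate fields at x_chunk, apply the particle push, repeat ...
    return x_chunk, v_chunk


def push_parallel(x, v, nproc=None):
    """Split the particle arrays into one chunk per core and push in parallel."""
    nproc = nproc or mp.cpu_count()
    x_chunks = np.array_split(x, nproc)
    v_chunks = np.array_split(v, nproc)
    with mp.Pool(processes=nproc) as pool:
        results = pool.starmap(push_chunk, zip(x_chunks, v_chunks))
    # Recombine only at the end, i.e. in the detector plane.
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```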

There are a few challenges to doing this:

  • The grids (from which the fields for each particle are interpolated) are large (often >1 GB), so replicating them in memory for each core is undesirable, especially on smaller computers. They are read-only, so it should be possible to hold the grid in shared memory for use by all the processes; see the sketch after this list. I'm not sure how this would work in detail, or whether it would be worth making it optional (if the RAM is available, giving each process its own copy of the grid might allow faster interpolation?).
  • The structure of the code needs minor modifications to make it compatible with parallel processing. The _push and run methods will need to be modified.
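
For the first point, a rough sketch of holding the read-only grid arrays in shared memory with multiprocessing.shared_memory, so workers attach to one copy instead of pickling gigabytes per process (the names and shapes here are assumptions, not the actual Tracker internals):

```python
# Sketch only: the grid handling here is an assumption, not Tracker's layout.
from multiprocessing import shared_memory

import numpy as np


def share_grid(grid):
    """Parent process: copy a grid array into a named shared-memory block once."""
    shm = shared_memory.SharedMemory(create=True, size=grid.nbytes)
    view = np.ndarray(grid.shape, dtype=grid.dtype, buffer=shm.buf)
    view[...] = grid
    # Pass shm.name, grid.shape, and grid.dtype to the workers.
    # When all workers are done, call shm.close() and shm.unlink() here.
    return shm


def attach_grid(name, shape, dtype):
    """Worker process: attach to the existing block by name; no copy is made."""
    shm = shared_memory.SharedMemory(name=name)
    grid = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, grid  # keep shm referenced so the buffer stays alive
```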

Note that improvements made here could easily be incorporated into the more general ParticleTracker framework (see #936 and #1096).

namurphy commented Sep 6, 2022

Do you think dask would be a suitable tool for this?

namurphy commented Sep 6, 2022

@MKastek also mentioned concurrent.futures.
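
For comparison, the same chunked dispatch with concurrent.futures would look roughly like this (reusing the hypothetical push_chunk from the sketch above):

```python
# Sketch only: push_chunk is the hypothetical per-chunk push function above.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def push_parallel_futures(x_chunks, v_chunks):
    """Push each (x, v) chunk in a separate process and recombine the results."""
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(push_chunk, x_chunks, v_chunks))
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```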

pheuer commented Sep 6, 2022

@namurphy I have little experience with either, but this is a simple parallelization problem so I'm sure many libraries could work. I've used multiprocessing recently, so I know that could work.

JaydenR2305 commented Sep 11, 2022

I'll give this a try

I think dask would be the superior option since it allows both local and distributed computation with a high-level API.
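
For example, the same chunked dispatch expressed with dask.delayed can run on a local process pool or on a distributed cluster just by changing the scheduler (again reusing the hypothetical push_chunk from the first sketch):

```python
# Sketch only: push_chunk is the hypothetical per-chunk push function above.
import dask
import numpy as np


def push_parallel_dask(x_chunks, v_chunks):
    """Build one delayed push task per chunk and compute them in parallel."""
    tasks = [
        dask.delayed(push_chunk)(xc, vc) for xc, vc in zip(x_chunks, v_chunks)
    ]
    # scheduler="processes" runs on a local process pool; a dask.distributed
    # Client would send the same tasks to a cluster instead.
    results = dask.compute(*tasks, scheduler="processes")
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```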

pheuer commented Sep 11, 2022

@JaydenR2305 great, thanks for taking a look at it! As you do, also keep in mind issue #1725: whatever parallel solution we choose, we will eventually want to extend it to work with multiple grids. That's something to think about when choosing an implementation.

Sobeskes commented Dec 6, 2022

Hello, I've been looking into this issue, and I would like to give it a try as my first contribution.

namurphy commented Dec 6, 2022

@Sobeskes — awesome!

@JaydenR2305 — were you still planning to work on this?

Many thanks!

pheuer commented Dec 7, 2022

@Sobeskes awesome! Let me know if I can be of assistance.

I believe @JaydenR2305 is working on some other projects right now, so you're welcome to take a crack at this if you'd like.

Note that there are currently a few other PRs working on this code:

#1274 is almost complete and won't change the Tracker class, although we are moving some code around the package, so the location of that function will change.

#1799 builds on main and is almost complete.

#1802 builds on #1799 and will require more work plus a full review before it is done. This PR makes some improvements to the internal workings of Tracker, as well as adding new functionality.

I would recommend playing with the parallelization on a fork of #1802: that way you're building on the upgrades to Tracker that I've put in there.

Sobeskes commented Dec 8, 2022

Hm, this issue has been harder to tackle than I thought! I made a few implementations using different parallelization libraries (threading, concurrent.futures, etc.) and wasn't able to get anything that performed better than the serial implementation. The serial implementation ran the test suite in around 30 seconds, but the best-performing parallelized implementation I created with concurrent.futures took around 70 seconds. It might be that the overhead associated with creating the threads is too high, or that I'm not understanding the problem correctly. It seems like the most potential for parallelization is in the interpolation of Ex, Bx, etc.
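
One quick way to check whether pool startup and pickling overhead is what's dominating on the small test problems is a standalone timing harness along these lines (a sketch, not PlasmaPy code; the workload is just a dummy stand-in for an interpolation step):

```python
# Sketch only: fake_interpolation is a dummy workload, not a Tracker step.
import time
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def fake_interpolation(n):
    """Dummy stand-in for one field-interpolation step over n particles."""
    pts = np.random.random((n, 3))
    return np.sin(pts).sum()


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        t0 = time.perf_counter()
        for _ in range(8):
            fake_interpolation(n)
        serial = time.perf_counter() - t0

        t0 = time.perf_counter()
        with ProcessPoolExecutor(max_workers=4) as pool:
            list(pool.map(fake_interpolation, [n] * 8))
        parallel = time.perf_counter() - t0

        print(f"n = {n}: serial {serial:.2f} s, pool {parallel:.2f} s")
```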

namurphy commented Dec 8, 2022

@Sobeskes, thank you for looking into this! That's really helpful information. Did dask happen to be one of the packages that you tried?

pheuer commented Dec 8, 2022

@Sobeskes interesting - at what level are you parallelizing? I would suggest cutting the list of particles into sub-lists (one per core) at the very beginning and then running them entirely separately on different cores from then on, only recombining in the detector plane.
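
Concretely, each worker could own a loop like the following, with no communication between cores until the chunks are recombined in the detector plane (a sketch with hypothetical callables standing in for the Tracker's existing interpolation and push steps):

```python
# Sketch only: interpolate_fields, boris_push, and is_on_grid are hypothetical
# callables standing in for the Tracker's existing internals.
import numpy as np


def push_until_off_grid(x, v, dt, interpolate_fields, boris_push, is_on_grid):
    """Push one chunk of particles until every particle has left the grid."""
    on_grid = is_on_grid(x)
    while np.any(on_grid):
        E, B = interpolate_fields(x[on_grid])
        x[on_grid], v[on_grid] = boris_push(x[on_grid], v[on_grid], E, B, dt)
        on_grid = is_on_grid(x)
    return x, v
```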

I intentionally chose the test problems to be fast, so they may not accurately represent the gains from parallelization on a realistic run. It only takes 30 seconds because it's running several dozen of them: each individual run is probably close to a second. Try something with a 200x200x200 grid and 1e6 particles and see whether you see a difference then: that should take roughly 20-30 minutes in the serial version.

StanczakDominik commented

I would definitely recommend trying this with dask.array as well!

@JaydenR2305 JaydenR2305 linked a pull request Jun 27, 2023 that will close this issue