Implement parallel processing for charged_particle_radiography.Tracker #1728

Open
pheuer opened this issue Sep 6, 2022 · 12 comments · May be fixed by #2226
Labels: effort: high · feature request · help wanted · optimization · Plasma Lv0 | Novice · plasmapy.diagnostics · Python Lv3 | Proficient

Comments

pheuer commented Sep 6, 2022

Particle tracing in the charged_particle_radiography.Tracker class is an embarrassingly parallel problem, but the calculations are currently performed serially. Execution time is dominated by the field interpolation steps. Implementing parallel processing with the multiprocessing library should give a ~3-4x speedup in tracing on laptops and much larger gains on computing resources with more cores. This would significantly improve the module and make it feasible to run large batches of simulations for things like training machine learning algorithms.

Only the push step of the algorithm needs to be parallelized. The code should probably split the particles into groups (one group per processing core) and then push the groups in parallel, stopping each process independently once its particles have left the grid(s), roughly as sketched below.
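
A minimal sketch of that dispatch using multiprocessing, with a hypothetical push_chunk function standing in for the existing push loop (this is not the actual Tracker API):

```python
# Sketch only: push_chunk is a hypothetical stand-in for the work currently
# done inside Tracker's push loop, not an existing PlasmaPy function.
import multiprocessing as mp

import numpy as np


def push_chunk(x_chunk, v_chunk):
    """Advance one group of particles until they have all left the grid."""
    # ... interpolate fields at x_chunk, apply the particle push, repeat ...
    return x_chunk, v_chunk


def push_parallel(x, v, nproc=None):
    """Split the particle arrays into one chunk per core and push in parallel."""
    nproc = nproc or mp.cpu_count()
    x_chunks = np.array_split(x, nproc)
    v_chunks = np.array_split(v, nproc)
    with mp.Pool(processes=nproc) as pool:
        results = pool.starmap(push_chunk, zip(x_chunks, v_chunks))
    # Recombine only at the end, i.e. in the detector plane.
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```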

There are a few challenges to doing this:

  • The grids (from which the fields for each particle are interpolated) are large (often >1 GB), so replicating them in memory for each core is undesirable, especially on smaller computers. They are read-only, so it should be possible to hold the grid in shared memory for use by all the processes; see the sketch after this list. I'm not sure how this would work in detail, or whether it would be worth making it optional (if the RAM is available, giving each process its own copy of the grid might allow faster interpolation?).
  • The structure of the code needs minor modifications to make it compatible with parallel processing. The _push and run methods will need to be modified.
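
For the first point, a rough sketch of holding the read-only grid arrays in shared memory with multiprocessing.shared_memory, so workers attach to one copy instead of pickling gigabytes per process (the names and shapes here are assumptions, not the actual Tracker internals):

```python
# Sketch only: the grid handling here is an assumption, not Tracker's layout.
from multiprocessing import shared_memory

import numpy as np


def share_grid(grid):
    """Parent process: copy a grid array into a named shared-memory block once."""
    shm = shared_memory.SharedMemory(create=True, size=grid.nbytes)
    view = np.ndarray(grid.shape, dtype=grid.dtype, buffer=shm.buf)
    view[...] = grid
    # Pass shm.name, grid.shape, and grid.dtype to the workers.
    # When all workers are done, call shm.close() and shm.unlink() here.
    return shm


def attach_grid(name, shape, dtype):
    """Worker process: attach to the existing block by name; no copy is made."""
    shm = shared_memory.SharedMemory(name=name)
    grid = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, grid  # keep shm referenced so the buffer stays alive
```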

Note that improvements made here could easily be incorporated into the more general ParticleTracker framework (see #936 and #1096).

namurphy commented Sep 6, 2022

Do you think dask would be a suitable tool for this?

namurphy commented Sep 6, 2022

@MKastek also mentioned concurrent.futures.
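
For comparison, the same chunked dispatch with concurrent.futures would look roughly like this (reusing the hypothetical push_chunk from the sketch above):

```python
# Sketch only: push_chunk is the hypothetical per-chunk push function above.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def push_parallel_futures(x_chunks, v_chunks):
    """Push each (x, v) chunk in a separate process and recombine the results."""
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(push_chunk, x_chunks, v_chunks))
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```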

pheuer commented Sep 6, 2022

@namurphy I have little experience with either, but this is a simple parallelization problem so I'm sure many libraries could work. I've used multiprocessing recently, so I know that could work.

JaydenR2305 commented Sep 11, 2022

I'll give this a try

I think dask would be the superior option since it allows both local and distributed computation with a high-level API.
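
For example, the same chunked dispatch expressed with dask.delayed can run on a local process pool or on a distributed cluster just by changing the scheduler (again reusing the hypothetical push_chunk from the first sketch):

```python
# Sketch only: push_chunk is the hypothetical per-chunk push function above.
import dask
import numpy as np


def push_parallel_dask(x_chunks, v_chunks):
    """Build one delayed push task per chunk and compute them in parallel."""
    tasks = [
        dask.delayed(push_chunk)(xc, vc) for xc, vc in zip(x_chunks, v_chunks)
    ]
    # scheduler="processes" runs on a local process pool; a dask.distributed
    # Client would send the same tasks to a cluster instead.
    results = dask.compute(*tasks, scheduler="processes")
    x_out = np.concatenate([r[0] for r in results])
    v_out = np.concatenate([r[1] for r in results])
    return x_out, v_out
```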

pheuer commented Sep 11, 2022

@JaydenR2305 great, thanks for taking a look at it! As you do, also keep in mind issue #1725: whatever parallel solution we choose, we will eventually want to extend it to work with multiple grids. That's something to think about when choosing an implementation.

Sobeskes commented Dec 6, 2022

Hello, I've been looking into this issue, and I would like to give it a try as my first contribution.

namurphy commented Dec 6, 2022

@Sobeskes — awesome!

@JaydenR2305 — were you still planning to work on this?

Many thanks!

pheuer commented Dec 7, 2022

@Sobeskes awesome! Let me know if I can be of assistance.

I believe @JaydenR2305 is working on some other projects right now, so you're welcome to take a crack at this if you'd like.

Note that there are currently a few other PRs working on this code:

#1274 is almost complete and won't change the Tracker class, although we are moving some code around the package, so the location of that function will change.

#1799 builds on main and is almost complete.

#1802 builds on #1799 and will require more work plus a full review before it is done. This PR makes some improvements to the internal workings of Tracker, as well as adding new functionality.

I would recommend playing with the parallelization on a fork of #1802: that way you're building on the upgrades to Tracker that I've put in there.

Sobeskes commented Dec 8, 2022

Hm, this issue has been harder to tackle than I thought! I made a few implementations using different parallelization libraries (threading, concurrent.futures, etc.) and wasn't able to get anything that performed better than the serial implementation. The serial implementation ran the test suite in around 30 seconds, but the best-performing parallelized implementation I created with concurrent.futures took around 70 seconds. It might be that the overhead associated with creating the threads is too high, or that I'm not understanding the problem correctly. It seems like the most potential for parallelization is in the interpolation of Ex, Bx, etc.
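
One quick way to check whether pool startup and pickling overhead is what's dominating on the small test problems is a standalone timing harness along these lines (a sketch, not PlasmaPy code; the workload is just a dummy stand-in for an interpolation step):

```python
# Sketch only: fake_interpolation is a dummy workload, not a Tracker step.
import time
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def fake_interpolation(n):
    """Dummy stand-in for one field-interpolation step over n particles."""
    pts = np.random.random((n, 3))
    return np.sin(pts).sum()


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        t0 = time.perf_counter()
        for _ in range(8):
            fake_interpolation(n)
        serial = time.perf_counter() - t0

        t0 = time.perf_counter()
        with ProcessPoolExecutor(max_workers=4) as pool:
            list(pool.map(fake_interpolation, [n] * 8))
        parallel = time.perf_counter() - t0

        print(f"n = {n}: serial {serial:.2f} s, pool {parallel:.2f} s")
```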

namurphy commented Dec 8, 2022

@Sobeskes, thank you for looking into this! That's really helpful information. Did dask happen to be one of the packages that you tried?

pheuer commented Dec 8, 2022

@Sobeskes interesting - at what level are you parallelizing? I would suggest cutting the list of particles into sub-lists (one per core) at the very beginning and then running them entirely separately on different cores from then on, only recombining in the detector plane.
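
Concretely, each worker could own a loop like the following, with no communication between cores until the chunks are recombined in the detector plane (a sketch with hypothetical callables standing in for the Tracker's existing interpolation and push steps):

```python
# Sketch only: interpolate_fields, boris_push, and is_on_grid are hypothetical
# callables standing in for the Tracker's existing internals.
import numpy as np


def push_until_off_grid(x, v, dt, interpolate_fields, boris_push, is_on_grid):
    """Push one chunk of particles until every particle has left the grid."""
    on_grid = is_on_grid(x)
    while np.any(on_grid):
        E, B = interpolate_fields(x[on_grid])
        x[on_grid], v[on_grid] = boris_push(x[on_grid], v[on_grid], E, B, dt)
        on_grid = is_on_grid(x)
    return x, v
```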

I intentionally chose the test problems to be fast, so they may not accurately represent the gains from parallelization on a realistic run. It only takes 30 seconds because it's running several dozen of them: each individual run is probably close to a second. Try something with a 200x200x200 grid and 1e6 particles and see whether you see a difference then: that should take roughly 20-30 minutes in the serial version.

StanczakDominik commented

I would definitely recommend trying this with dask.array as well!

@JaydenR2305 JaydenR2305 linked a pull request Jun 27, 2023 that will close this issue