Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize Volume Interpolator #2226

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

JaydenR2305
Copy link
Member

@JaydenR2305 JaydenR2305 commented Jun 26, 2023

Resolves #1728

Use Dask to parallelize the volume_averaged_interpolator method defined in the CartesianGrid object

Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
@JaydenR2305 JaydenR2305 requested a review from a team as a code owner June 26, 2023 19:09
@JaydenR2305 JaydenR2305 requested review from pheuer and removed request for a team June 26, 2023 19:09
@github-actions github-actions bot added the plasmapy.plasma Related to the plasmapy.plasma subpackage label Jun 26, 2023
@github-actions
Copy link

Thank you for submitting a pull request (PR) to PlasmaPy! ✨ The future of the project depends on contributors like you, so we deeply appreciate it! 🌱

Our contributor guide has information on:

The bottom of this page shows several checks that are run for every PR. Don't worry if something broke! We break stuff all the time. 😺 Click on "Details" to learn why a check didn't pass. Please also feel free to ask for help. We do that all the time as well. 🌸 You can find us in our chat room or weekly community meeting & office hours. Here are some tips:

  • Try fixing CI / Python 3.11 test failures first.
  • Most pre-commit.ci - pr failures can be automagically fixed by commenting pre-commit.ci autofix below, followed by a git pull to bring the changes back to your computer.
  • If pre-commit.ci - pr says that a function is too long or complex, try breaking up that function into multiple short functions that each do one thing. See also these tips on writing clean scientific software.
  • If the CI / Documentation check ends with a cryptic error message, check out our documentation troubleshooting guide.
  • For a documentation preview, click on Details next to docs/readthedocs.org:plasmapy.

If this PR is marked as ready for review, someone should stop by to provide a code review and offer suggestions soon. ✅ If you don't get a review within a few days, please feel free to send us a reminder.

Please also use SI units within PlasmaPy, except when there is strong justification otherwise or in some examples.

We thank you once again!

@JaydenR2305 JaydenR2305 marked this pull request as draft June 26, 2023 19:12
Roberts, Jayden added 4 commits June 26, 2023 15:26
Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
Update requirements

Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
@github-actions github-actions bot added requirements Related to updating requirements packaging Related to packaging or distribution labels Jun 27, 2023
Roberts, Jayden added 2 commits June 27, 2023 08:13
Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
Signed-off-by: Roberts, Jayden <jaro@lle.rochester.edu>
@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c5bdbda) 98.21% compared to head (8668833) 98.21%.
Report is 47 commits behind head on main.

❗ Current head 8668833 differs from pull request most recent head f6ceba8. Consider uploading reports for the commit f6ceba8 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2226   +/-   ##
=======================================
  Coverage   98.21%   98.21%           
=======================================
  Files          98       98           
  Lines        8568     8584   +16     
=======================================
+ Hits         8415     8431   +16     
  Misses        153      153           
Files Changed Coverage Δ
plasmapy/plasma/grids.py 99.35% <100.00%> (+0.02%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JaydenR2305 JaydenR2305 marked this pull request as ready for review June 27, 2023 19:12
Copy link
Member

@pheuer pheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a pretty clean start! Some additional stuff I'd like to see

  1. Some tests that show that this is actually faster, and under what conditions. It may be we want to have running in parallel/dask be an option and raise a warning if people use it with very small numbers of particles or something? These don't need to be CI tests, just something you run and report the results in this PR thread so we have a record of it.

  2. Apply the same method to the nearest neighbor interpolator so we have it in both cases.

  3. I presume this is the case since all the tests seem to be passing, but just confirm that the interpolator is giving the same result as before. If you keep 'serial' as an option, you could write a simple test that ensures that they give the same answer.

changelog/2226.feature.rst Outdated Show resolved Hide resolved
@@ -1132,60 +1133,16 @@ def nearest_neighbor_interpolator(
]
return output[0] if len(output) == 1 else tuple(output)

def volume_averaged_interpolator(
self, pos: Union[np.ndarray, u.Quantity], *args, persistent=False
@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even the private static method should have a docstring that defines what each of these variables is

requirements.txt Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
Signed-off-by: JaydenR2305 <109694185+JaydenR2305@users.noreply.github.com>
Signed-off-by: JaydenR2305 <109694185+JaydenR2305@users.noreply.github.com>
Signed-off-by: JaydenR2305 <109694185+JaydenR2305@users.noreply.github.com>
Signed-off-by: JaydenR2305 <109694185+JaydenR2305@users.noreply.github.com>
Signed-off-by: JaydenR2305 <109694185+JaydenR2305@users.noreply.github.com>
@pheuer
Copy link
Member

pheuer commented Jul 5, 2023

@JaydenR2305 This SE page looks relevant: https://stackoverflow.com/questions/53351060/dask-shared-memory-in-parallel-model

One comment suggests the joblib package, which I've used before (works pretty well)

Edit: and this link within: https://stackoverflow.com/questions/10721915/shared-memory-objects-in-multiprocessing

@JaydenR2305
Copy link
Member Author

JaydenR2305 commented Aug 5, 2023

I have changed from our current implementation of the volume interpolation to SciPy's which has offered a roughly 50% speed improvement. I wonder if it would be beneficial to partition the grid using some sort of octree approach and then passing the respective particles along for parallel processing?

@pheuer
Copy link
Member

pheuer commented Aug 7, 2023

I have changed from our current implementation of the volume interpolation to SciPy's which has offered a roughly 50% speed improvement.

Hm, I'm surprised its fast but I guess the RegularGridInterpolator with the linear method should be basically the same as this volume averaging scheme, so that's probably good? In that case I wonder if we should eventually change the keyword from 'volume_averaged' to 'linear' in the run() method. If this PR gets merged with this change, I would open an issue for that keyword change.

I wonder if it would be beneficial to partition the grid using some sort of octree approach and then passing the respective particles along for parallel processing?

Yes I think that's a good way to do it: the challenge is still how to share the grid across branches? Making copies would be the easiest, but could be memory prohibitive for large grids? In that case the ideal number of cores would depend both on the ncores available and on the amount of RAM available.

@namurphy namurphy added this to the Future milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
packaging Related to packaging or distribution plasmapy.plasma Related to the plasmapy.plasma subpackage requirements Related to updating requirements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement parallel processing for charged_particle_radiography.Tracker
3 participants