
Too much memory usage for composite processing #2764

Open
akasom89 opened this issue Mar 20, 2024 · 2 comments

Comments

@akasom89

Describe the bug
When creating composite products (even ignoring atmospheric correction) from ABI imagery, peak memory usage exceeds 30 GB. I suspect something may be going wrong, as it also takes over 8 minutes. Are there any best practices for increasing the speed? For example, should we tweak parameters such as chunk size to find the optimum? Additionally, can we cache or pre-compute certain data (since ABI's field of view is fixed) to speed up subsequent runs?

To Reproduce

import matplotlib.pyplot as plt
from satpy import Scene
from satpy.writers import get_enhanced_image

# scn is a Scene built from ABI L1b files; dst_area_def is the target AreaDefinition
scn.load(scn.available_dataset_names())
scn_resmp = scn.resample(destination=dst_area_def, radius_of_influence=50000)
composite = 'true_color_raw'
scn_resmp.load([composite])
dataset = scn_resmp[composite]

plt.figure()
img = get_enhanced_image(dataset)
img_data = img.data
img_data.plot.imshow(vmin=0, vmax=1, rgb='bands')
img_data.plot.imshow(rgb='bands')

Expected behavior
Since the input files are much smaller than the memory used and the processing is Dask-based, I expected it to run much more smoothly (on a typical 8 or 16 GB RAM system) and faster (in under 2-3 minutes).

Actual results
During visualization, I encounter many of these warnings. I'm unsure how much they are related to the performance issue.
lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in cos return func(*(_execute_task(a, cache) for a in args))

Environment Info:

  • OS: Windows (also tested on a Linux instance)
  • Satpy Version: 0.43.0.post0
  • PyResample Version: 1.26.1
@pnuu
Member

pnuu commented Mar 20, 2024

First thing: do all the loading in the first Scene object, i.e. scn.load([composite]). Loading all the available datasets is unnecessary, and you actually end up resampling them all, too.

Things to try (a combined sketch follows the list):

  • scn_resmp = scn.resample(..., cache_dir=some_directory_path)
  • scn_resmp = scn.resample(dst_area, resampler="gradient_search")
    • uses an algorithm that relies on the data being contiguous, with the additional bonus of giving bilinear interpolation (can be forced to nearest if necessary)
  • set environment variables to control chunking, the number of Dask workers and OpenMP threads
    • DASK_ARRAY__CHUNK_SIZE - number of bytes per chunk. Try for example "32 MiB"
    • DASK_NUM_WORKERS - number of workers. Sometimes fewer is faster
    • OMP_NUM_THREADS - set to "1" and let Dask handle the parallelization
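
A minimal sketch combining these suggestions, assuming ABI L1b input (the reader name, the filenames list, dst_area_def, and some_directory_path are placeholders; the environment variables have to be set before Dask starts computing):

import os

# Chunking/threading knobs; set these before any Dask computation runs
os.environ["DASK_ARRAY__CHUNK_SIZE"] = "32 MiB"
os.environ["DASK_NUM_WORKERS"] = "4"  # example value; sometimes fewer is faster
os.environ["OMP_NUM_THREADS"] = "1"

from satpy import Scene

composite = "true_color_raw"

scn = Scene(reader="abi_l1b", filenames=filenames)
scn.load([composite])  # load only the composite, not all available datasets

# Either cache the nearest-neighbour resampling indices on disk for reuse...
scn_resmp = scn.resample(dst_area_def, cache_dir=some_directory_path)
# ...or use gradient search (bilinear by default, no cache needed):
# scn_resmp = scn.resample(dst_area_def, resampler="gradient_search")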

@djhoese
Member

djhoese commented Mar 20, 2024

More details on performance are in the frequently asked questions:

https://satpy.readthedocs.io/en/stable/faq.html

I agree with everything Panu said, but additionally want to point out that if your destination/target area definition for resampling is in the satellite's native projection, then there are other options besides nearest neighbor or gradient search resampling that would likely be faster; see the sketch below.
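
A hypothetical sketch of that case, assuming dst_area_def uses the ABI geostationary projection (the "native" resampler aggregates or replicates pixels by integer factors rather than searching for neighbours):

# Only valid when the target area shares the source data's projection
scn_resmp = scn.resample(dst_area_def, resampler="native")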

Otherwise, how does your example script compare with what you are actually doing? You have two imshow calls in your code if I'm seeing things correctly. Why is that? When do you notice the large memory usage? Is it a peak memory usage of 30GB, or is that the memory usage you see once the plot is displayed? My guess is that the majority of your memory usage comes from the plotting and not from Satpy directly. If you saved the data to disk with a dask-friendly writer like "geotiff" (see the example below), then my guess is your processing would be much faster and not take up nearly as much memory, especially after the chunk size and number of workers are tweaked.
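
A minimal example of saving instead of plotting, assuming the scn_resmp from above (the geotiff writer computes and writes chunk by chunk, which keeps peak memory bounded):

# Dask-friendly writer: computation streams chunk-by-chunk to disk
scn_resmp.save_datasets(writer="geotiff", datasets=[composite])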
