
Too much memory usage for composite processing #2764

Open
akasom89 opened this issue Mar 20, 2024 · 2 comments

Comments

@akasom89

Describe the bug
When creating composite products (even ignoring atmospheric correction) from ABI imagery, peak memory usage exceeds 30 GB. I suspect something may be going wrong, as it also takes over 8 minutes. Are there any best practices for increasing the speed? For example, should we tweak parameters such as chunk size to find the optimum? Additionally, can we cache or pre-compute certain data (since ABI's field of view is fixed) to speed up subsequent runs?

To Reproduce

import matplotlib.pyplot as plt
from satpy import Scene
from satpy.writers import get_enhanced_image

# scn is a Scene built from ABI L1b files; dst_area_def is the target AreaDefinition
scn.load(scn.available_dataset_names())
scn_resmp = scn.resample(destination=dst_area_def, radius_of_influence=50000)
composite = 'true_color_raw'
scn_resmp.load([composite])
dataset = scn_resmp[composite]

plt.figure()
img = get_enhanced_image(dataset)
img_data = img.data
img_data.plot.imshow(vmin=0, vmax=1, rgb='bands')
img_data.plot.imshow(rgb='bands')

Expected behavior
Since the input files are much smaller than the memory used and the processing is Dask-based, I expected it to run much more smoothly (on a typical 8 or 16 GB RAM system) and faster (in under 2-3 minutes).

Actual results
During visualization, I encounter many of these warnings. I'm unsure how much they are related to the performance issue.
lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in cos return func(*(_execute_task(a, cache) for a in args))

Environment Info:

  • OS: Windows (also tested on a Linux instance)
  • Satpy Version: 0.43.0.post0
  • PyResample Version: 1.26.1
@pnuu
Member

pnuu commented Mar 20, 2024

First thing: do all the loading in the first Scene object, i.e. scn.load([composite]). Loading all the available datasets is unnecessary, and you actually end up resampling them all, too.

Things to try (a combined sketch follows the list):

  • scn_resmp = scn.resample(..., cache_dir=some_directory_path)
  • scn_resmp = scn.resample(dst_area, resampler="gradient_search")
    • uses an algorithm that relies on the data being contiguous, with the additional bonus of giving bilinear interpolation (can be forced to nearest if necessary)
  • set environment variables to control chunking, the number of Dask workers and OpenMP threads
    • DASK_ARRAY__CHUNK_SIZE - number of bytes per chunk. Try for example "32 MiB"
    • DASK_NUM_WORKERS - number of workers. Sometimes fewer is faster
    • OMP_NUM_THREADS - set to "1" and let Dask handle the parallelization
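
A minimal sketch combining these suggestions, assuming ABI L1b input (the reader name, the filenames list, dst_area_def, and some_directory_path are placeholders; the environment variables have to be set before Dask starts computing):

import os

# Chunking/threading knobs; set these before any Dask computation runs
os.environ["DASK_ARRAY__CHUNK_SIZE"] = "32 MiB"
os.environ["DASK_NUM_WORKERS"] = "4"  # example value; sometimes fewer is faster
os.environ["OMP_NUM_THREADS"] = "1"

from satpy import Scene

composite = "true_color_raw"

scn = Scene(reader="abi_l1b", filenames=filenames)
scn.load([composite])  # load only the composite, not all available datasets

# Either cache the nearest-neighbour resampling indices on disk for reuse...
scn_resmp = scn.resample(dst_area_def, cache_dir=some_directory_path)
# ...or use gradient search (bilinear by default, no cache needed):
# scn_resmp = scn.resample(dst_area_def, resampler="gradient_search")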

@djhoese
Member

djhoese commented Mar 20, 2024

More details on performance are in the frequently asked questions:

https://satpy.readthedocs.io/en/stable/faq.html

I agree with everything Panu said, but additionally want to point out that if your destination/target area definition for resampling is in the satellite's native projection, then there are other options besides nearest neighbor or gradient search resampling that would likely be faster; see the sketch below.
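
A hypothetical sketch of that case, assuming dst_area_def uses the ABI geostationary projection (the "native" resampler aggregates or replicates pixels by integer factors rather than searching for neighbours):

# Only valid when the target area shares the source data's projection
scn_resmp = scn.resample(dst_area_def, resampler="native")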

Otherwise, how does your example script compare with what you are actually doing? You have two imshow calls in your code if I'm seeing things correctly. Why is that? When do you notice the large memory usage? Is it a peak memory usage of 30GB, or is that the memory usage you see once the plot is displayed? My guess is that the majority of your memory usage comes from the plotting and not from Satpy directly. If you saved the data to disk with a dask-friendly writer like "geotiff" (see the example below), then my guess is your processing would be much faster and not take up nearly as much memory, especially after the chunk size and number of workers are tweaked.
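
A minimal example of saving instead of plotting, assuming the scn_resmp from above (the geotiff writer computes and writes chunk by chunk, which keeps peak memory bounded):

# Dask-friendly writer: computation streams chunk-by-chunk to disk
scn_resmp.save_datasets(writer="geotiff", datasets=[composite])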
