Get rid of the shared memory arrays : add a colfile mmap option #265

Open
jonwright opened this issue Mar 19, 2024 · 2 comments

In sinograms/properties.py and sinograms/point_by_point.py the code works via shared memory arrays.
This does not scale beyond one node, and it fails on Python 2.7.

For global read-only memory we could use mmap with numpy on an uncompressed HDF5 file (https://gist.github.com/maartenbreddels/09e1da79577151e5f7fec660c209f06e):

import mmap
import numpy as np

# dset is an open h5py.Dataset; this only works if the data is contiguous,
# uncompressed and stored inside this file
assert dset.chunks is None and dset.compression is None
assert not dset.is_virtual and dset.external is None
file = open(path, "rb")
mapping = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
data = np.frombuffer(mapping, dtype=dset.dtype, count=dset.size, offset=dset.id.get_offset()).reshape(dset.shape)

This may be useful for reducing some out-of-memory problems.

Another upgrade path could be looking into dask.dataframe for distributed processing (rough sketch below).
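
A rough sketch of how that could look, assuming the colfile columns are stored as 1-D datasets in an HDF5 group (the file name "peaks.h5", the group name "peaks" and the column names here are made up):

import h5py
import dask.array as da
import dask.dataframe as dd

hf = h5py.File("peaks.h5", "r")
names = ["sc", "fc", "omega"]
# wrap each hdf5 dataset as a lazy dask array, then stack into a dataframe
cols = [da.from_array(hf["peaks"][n], chunks=2**20) for n in names]
ddf = dd.from_dask_array(da.stack(cols, axis=1), columns=names)
print(ddf.omega.mean().compute())  # computed chunk-by-chunk, out of core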

jonwright commented May 21, 2024

To make some progress, try breaking this up into smaller tasks:

  • Point by point code: write an HDF5 colfile. Each worker process reads it during pool init (see the sketch below).
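
A minimal sketch of that pattern, assuming the colfile is written to HDF5 and re-read in each worker via the Pool initializer (the file name, the ImageD11.columnfile.colfile_from_hdf reader and the per-peak task are stand-ins):

import multiprocessing

_colfile = None  # one read-only copy per worker process

def init_worker(colfile_path):
    # runs once in each worker when the pool starts up
    global _colfile
    from ImageD11 import columnfile
    _colfile = columnfile.colfile_from_hdf(colfile_path)

def work(i):
    return _colfile.omega[i]  # placeholder per-peak computation

if __name__ == "__main__":
    with multiprocessing.Pool(4, initializer=init_worker,
                              initargs=("peaks.h5",)) as pool:
        print(pool.map(work, range(10)))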

Properties.py is more challenging:

  • Make lima_segmenter record the pixel -> peak labeling for both the cp and lm schemes.
  • Refactor lima_segmenter to write fewer files (e.g. one per process?).
  • Labels will be saved with the pixels.
  • The peaks2d properties arrays (s1, sI, scI, srI, id) are available during segmenting, to be saved with the sparse pixels.
  • Check the I/O speed and file size with and without compression for saving the pixel peaks. Pick something.
  • Find and save the overlaps. This is one 'peaksearch' per overlap dimension, with output of the form (peak_i, peak_j, score).
  • Determine the peaks3d labels across omega or dty.
  • Determine the peaks4d labels across the sinogram (see the sketch after this list).
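
For the last two items, a minimal sketch of one way to turn the saved (peak_i, peak_j, score) pairs into labels, using connected components on the overlap graph (the function name and the toy data are illustrative):

import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

def labels_from_overlaps(npeaks, peak_i, peak_j):
    # sparse adjacency matrix of the overlap graph
    adj = sparse.coo_matrix((np.ones(len(peak_i)), (peak_i, peak_j)),
                            shape=(npeaks, npeaks))
    ncomponents, labels = connected_components(adj, directed=False)
    return labels

# peaks 0-1-2 overlap across omega and peaks 3-4 overlap: two 3d peaks
print(labels_from_overlaps(5, np.array([0, 1, 3]), np.array([1, 2, 4])))
# -> [0 0 0 1 1]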

jonwright commented:

Note: multiprocessing + shared memory seems to be buggy. The remove-from-tracker monkeypatch does not work. Abandon it.

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/cvmfs/hpc.esrf.fr/software/packages/linux/x86_64/jupyter-slurm/2023.10.7/envs/jupyter-slurm/lib/python3.11/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/hpc.esrf.fr/software/packages/linux/x86_64/jupyter-slurm/2023.10.7/envs/jupyter-slurm/lib/python3.11/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory
