
BUG: LinearNDInterpolator >5 times slower than equivalent MATLAB's scatteredInterpolant #19625

Open
Timothy-Anders0n opened this issue Dec 2, 2023 · 4 comments
Labels
query (A question or suggestion that requires further information), scipy.interpolate

Comments

@Timothy-Anders0n

Timothy-Anders0n commented Dec 2, 2023

Describe your issue.

Hi,

I have been porting an old MATLAB code base to Python, but I have run into some difficulty with interpolation.

We simulate a signal (ZZ) on a 2D mesh (XX, YY). We then create a linear interpolator from the scattered data to obtain the inverse map (x, z) -> y.
Roughly speaking, the ranges of X, Y, and Z are [10, 200], [100, 4095], and [-0.5, 0.5] respectively (this matters because of the rescaling).

I can reproduce the exact results from our MATLAB test suite by

  • setting SciPy's LinearNDInterpolator's rescale parameter to True (MATLAB rescales automatically),
  • setting MATLAB's scatteredInterpolant extrapolation method to 'none' (LinearNDInterpolator does not extrapolate).

Creating the interpolator takes about the same amount of time in both environments (< 20 seconds), but evaluating the LinearNDInterpolator object on my (somewhat large) input dataset takes about 5 times longer than evaluating the scatteredInterpolant object.

Disclaimer: I'm running MATLAB (R2023a Update 5) on Windows (31.7 GB RAM) and the same version on WSL2 (15.5 GB RAM), while the Python code runs on WSL2 (15.5 GB RAM). Using the reproduction scripts below on 1 CPU in WSL2, I don't see any significant additional RAM usage during LinearNDInterpolator's evaluation (< 0.01 GB). With scatteredInterpolant, about 0.20 GB of additional RAM is used during evaluation.

The input data, its data type, etc. are all the same, and I turned off the parallel pool in my MATLAB client to avoid interference.
Still, I get runtimes of under 2 minutes with MATLAB and close to 10 minutes with SciPy on a 2D input dataset of 40+ million points.

I wondered whether this was linked to a recent issue (#19547), but isn't Qhull only used when creating the Delaunay triangulation? Would it still have an impact when evaluating the interpolant once the interpolator has been created?

I don't really know what could make SciPy's evaluation take so much longer than MATLAB's when the rest of the code has become much faster with Python/NumPy/SciPy, so I thought it might be useful to report this.


In the meantime, I looked into parallelizing the evaluation of the interpolator with Python's multiprocessing starmap on 10 CPUs, taking #8856 (2018) into account and splitting the dataset into 100 batches. However, RAM usage kept increasing, and the evaluation of the 40+ million points had not finished after 26 minutes when it started hitting swap, so I terminated the script. I had checked that the script finishes on a very small input dataset, so I'm not sure what is causing the RAM to grow like that. Since it does not happen in the sequential version, I suspect it is more of a multiprocessing issue, e.g. not everything being freed when the results are transferred back to the main process.
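
For reference, the batched version was roughly the following (a simplified sketch; the batch counts and helper names are illustrative, and interpolator, new_X, new_Z are the objects from the reproduction script below):

## Sketch: batched evaluation with multiprocessing (illustrative names and values)
from multiprocessing import Pool

import numpy as np

def evaluate_batch(interpolator, xs, zs):
    # Each task evaluates the interpolator on its own slice of the query points
    return interpolator(xs, zs)

def parallel_evaluate(interpolator, new_X, new_Z, n_batches=100, n_procs=10):
    # Split the query points into batches; on platforms that spawn processes
    # this should be called from under an `if __name__ == "__main__":` guard
    x_batches = np.array_split(np.asarray(new_X), n_batches)
    z_batches = np.array_split(np.asarray(new_Z), n_batches)
    with Pool(n_procs) as pool:
        results = pool.starmap(
            evaluate_batch,
            [(interpolator, xs, zs) for xs, zs in zip(x_batches, z_batches)],
        )
    return np.concatenate(results)

One thing that stands out when writing it down: the interpolator (which holds the whole triangulation) gets pickled and sent to a worker once per batch, which might explain at least part of the growing RAM usage.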

I also looked into using Numba on the interpolator, but these functions aren't supported there yet. I'm considering the threading library next, so I'll update once that's done.
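
The thread-based version I have in mind is roughly this (a sketch; it will only help if the Cython evaluation code releases the GIL, which I haven't verified):

## Sketch: thread-based evaluation (only useful if the evaluation releases the GIL)
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def threaded_evaluate(interpolator, new_X, new_Z, n_threads=10):
    x_batches = np.array_split(np.asarray(new_X), n_threads)
    z_batches = np.array_split(np.asarray(new_Z), n_threads)
    # Threads share the interpolator in memory, so nothing gets pickled or copied
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        results = list(ex.map(interpolator, x_batches, z_batches))
    return np.concatenate(results)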


Note: in the code block below, I reduced the number of 2D points at which the interpolant is evaluated to 1 000 000. It took 54 seconds in my WSL2 Python environment, but only 4 seconds in my MATLAB environment (R2023a Update 5) on Windows and 3.5 seconds in WSL2. I would have expected the MATLAB speedup to be lower than 5x on smaller datasets, but this may be because these 1 million points are randomly generated rather than coming from our better-behaved 40+ million point physical dataset. I could check tomorrow how the speedup scales with the randomly generated dataset.

Reproducing Code Example

## Python
from time import time

from numpy import arange, meshgrid, exp
from numpy.random import randint, rand
from scipy.interpolate import LinearNDInterpolator

# Regular (X, Y) grid and the simulated signal ZZ on it
X = arange(1, 201, 1, dtype=int)
Y = arange(0, 4096, 1, dtype=int)
XX, YY = meshgrid(X, Y)
ZZ = 1 / (1 + exp(-(XX-100)/100)) - YY/4095 - 0.5

# Random query points for the inverse map (x, z) -> y
new_X = randint(1, 200, size=1000000)
new_Z = rand(1000000) - 0.5

# Build the interpolator on the scattered (x, z) sites, then time its evaluation
interpolator = LinearNDInterpolator(list(zip(XX.ravel(), ZZ.ravel())), YY.ravel(), rescale=True)
start = time()
new_Y = interpolator(new_X, new_Z)
print(f"Elapsed time: {time() - start:.2f}")
%% MATLAB
X  = 1:200;
Y  = 0:4095;
[XX, YY] = meshgrid(X, Y);
ZZ = 1 ./ (1 + exp(-(XX-100)/100)) - YY./4095 - 0.5;

new_X = randi([1 200], 1, 1000000);
new_Z = rand(1, 1000000) - 0.5;

% Linear interpolation, no extrapolation ('none'); rescaling is automatic in MATLAB
interpolator = scatteredInterpolant(XX(:), ZZ(:), YY(:), 'linear', 'none');
tic
new_Y = interpolator(new_X, new_Z);
toc

SciPy/NumPy/Python version and system information

1.11.4 1.26.2 sys.version_info(major=3, minor=11, micro=0, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK=0 NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP=0 PRESCOTT MAX_THREADS=128
    version: 0.3.21
  lapack:
    detection method: pkgconfig
    found: true
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK=0 NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP=0 PRESCOTT MAX_THREADS=128
    version: 0.3.21
  pybind11:
    detection method: pkgconfig
    name: pybind11
    version: 2.10.4
Compilers:
  c:
    commands: /croot/scipy_1701295040508/_build_env/bin/x86_64-conda-linux-gnu-cc
    linker: ld.bfd
    name: gcc
    version: 11.2.0
  c++:
    commands: /croot/scipy_1701295040508/_build_env/bin/x86_64-conda-linux-gnu-c++
    linker: ld.bfd
    name: gcc
    version: 11.2.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.36
  fortran:
    commands: /croot/scipy_1701295040508/_build_env/bin/x86_64-conda-linux-gnu-gfortran
    linker: ld.bfd
    name: gcc
    version: 11.2.0
  pythran:
    include directory: ../../_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/pythran
    version: 0.13.1
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  version: '3.11'
Timothy-Anders0n added the "defect" label (A clear bug or issue that prevents SciPy from being installed or used as expected) on Dec 2, 2023
Timothy-Anders0n changed the title from "BUG: LinearNDInterpolator 5 times slower than equivalent MATLAB's scatteredInterpolant" to "BUG: LinearNDInterpolator >5 times slower than equivalent MATLAB's scatteredInterpolant" on Dec 3, 2023
lucascolley added the "scipy.interpolate" and "query" labels on Dec 3, 2023
@j-bowhay
Member

j-bowhay commented Dec 3, 2023

If your data is on a meshgrid you should use scipy.interpolate.RegularGridInterpolator instead
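
For the forward map (X, Y) -> Z from the reproduction script, that would look roughly like this (a sketch; query_x and query_y are placeholder query coordinates, and note this covers the forward map only, not the inverted (x, z) -> y map discussed below):

## Sketch: forward interpolation of ZZ on the regular (X, Y) grid
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# X, Y, ZZ as in the reproduction script; ZZ has shape (len(Y), len(X))
rgi = RegularGridInterpolator((Y, X), ZZ, method="linear", bounds_error=False)

# Query points are passed as (y, x) pairs to match the grid axis order
new_vals = rgi(np.column_stack([query_y, query_x]))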

lucascolley removed the "defect" label on Dec 3, 2023
@Timothy-Anders0n
Author

If your data is on a meshgrid you should use scipy.interpolate.RegularGridInterpolator instead

I agree, but:

  • it does not explain why it is so much slower than the MATLAB equivalent, which could mean there is an issue in the implementation;
  • I use meshgrid to compute the signal ZZ, with ZZ somewhat randomly distributed in [-0.5, 0.5], but I then use the interpolant to invert the map (XX, YY) -> ZZ into (XX, ZZ) -> YY. RegularGridInterpolator requires the grid coordinates to be 1-D vectors (the vector x plus a single z vector of the same shape as y, i.e. the vectors used to build the XX, YY meshgrid), but ZZ is not the same vector in every column (the quick check after this comment makes that concrete). And that is before considering that the grid vectors must be monotonically increasing or decreasing; in any case, I haven't found a way to reduce ZZ to a vector of shape y without losing information.

So in another situation I would use RegularGridInterpolator (and have before), but here I cannot, and I am doing the same thing in MATLAB, only many times faster, which is why I'm opening this issue. (I'll also post on math stackoverflow later to ask for a more elegant way to invert the map, though that would introduce a breaking change relative to our code base.)
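
To make the "ZZ is not the same vector in every column" point concrete, a quick check on the arrays from the reproduction script:

## Quick check: the z coordinate is not a shared grid vector
import numpy as np

# Each column ZZ[:, i] (fixed x, varying y) is shifted by a different sigmoid(x_i),
# so the (x, z) sites are scattered rather than on a rectilinear grid
print(np.allclose(ZZ[:, [0]], ZZ))  # False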

@ev-br
Member

ev-br commented Dec 3, 2023

To add to what Jake said above:

Alternatively, if you can use GPUs, you may want to keep an eye on cupy/cupy#7985.

@Timothy-Anders0n
Author

Timothy-Anders0n commented Dec 3, 2023

To add to what Jake said above:

Alternatively, if you can use GPUs, you may want to keep an eye on cupy/cupy#7985.

Yeah that's the issue with closed source... It's also why we're moving our code base to Python.

Thanks for letting me know where to find the relevant SciPy sources. I'll try to find some time to look at them, though I'm sure I won't be able to contribute much, since you, @j-bowhay and the rest of the SciPy team have already optimized everything well beyond what an average user like me can improve.

For my short-term use, I'll also look into CuPy as you suggested. Thanks! I'll update this issue if I find anything.
