WIP: ENH: full set of cuSPARSE wrappers via CFFI with a limited set of higher-level interface routines #88

Open · wants to merge 15 commits into base: master

Conversation

@grlee77 (Contributor) commented Dec 8, 2014

Hi @lebedov,
I saw that you and @untom had started wrapping cusparse via CFFI, but it didn't look like that work had been updated recently. Last week, I needed some CSR routines and looked into CFFI for wrapping them. I figured that if I could autogenerate the wrappers from the header file, it would be nearly as easy to just wrap everything instead. I had never used CFFI before, but based on the example code in your cffi branch showing how to pass the complex data types, etc., I was able to get it working.

Thank you for your work on the scikits.cuda package. I hope you will find these wrappers useful and that, after some tweaking, they can eventually be merged into the package.

This pull request differs from what is in your cffi branch in several ways:
1.) I am using the API approach in CFFI, passing in text extracted from the cusparse header file rather than using the library ABI. By reading in the full header, all routines present get wrapped. The drawback is that compilation is required; this currently happens upon the first import of scikits.cuda.cusparse.
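Roughly, the API-level pattern looks like the following minimal sketch (the cdef here is drastically trimmed and the build options are assumptions; the real declarations are extracted from the full cusparse.h):

    from cffi import FFI

    ffi = FFI()

    # A couple of representative declarations from cusparse.h (heavily trimmed).
    ffi.cdef("""
        typedef enum { CUSPARSE_STATUS_SUCCESS = 0, ... } cusparseStatus_t;
        typedef struct cusparseContext *cusparseHandle_t;
        cusparseStatus_t cusparseCreate(cusparseHandle_t *handle);
        cusparseStatus_t cusparseDestroy(cusparseHandle_t handle);
    """)

    # API mode: CFFI compiles a small extension module against the real header
    # and library, which is why the first import triggers a compilation step.
    lib = ffi.verify('#include <cusparse_v2.h>',
                     libraries=['cusparse'],
                     include_dirs=['/usr/local/cuda/include'],
                     library_dirs=['/usr/local/cuda/lib64'])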

2.) I then wrote a crude parser/string-replacement routine that goes through each C function declaration and generates the Python wrapper (including a docstring) for each function in the cuSPARSE library; see _cusparse_cffi.py and the supporting routines in _cusparse_cffi_autogen.py for how this part is done. It works on my machine with CUDA v6.5 and 64-bit Linux, but will likely still need a bit of tweaking for other platforms and/or older CUDA versions. Anyone better at string parsing could probably simplify/improve this part considerably.
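To give a flavor of the string processing involved, here is a tiny, hypothetical sketch (the template, _ffi_lib, and cusparseCheckStatus names are illustrative, not the actual code in _cusparse_cffi_autogen.py):

    import re

    PROTO = re.compile(r'cusparseStatus_t\s+(?P<name>cusparse\w+)\s*\((?P<args>[^)]*)\)')

    TEMPLATE = (
        "def {name}({argnames}):\n"
        "    \"\"\"Autogenerated wrapper for {name}.\"\"\"\n"
        "    status = _ffi_lib.{name}({argnames})\n"
        "    cusparseCheckStatus(status)\n"
    )

    def make_wrapper(prototype):
        """Turn one C prototype into the source text of a thin Python wrapper."""
        m = PROTO.search(prototype)
        # keep only the argument names, dropping the C types and qualifiers
        argnames = ', '.join(a.split()[-1].lstrip('*')
                             for a in m.group('args').split(',') if a.strip())
        return TEMPLATE.format(name=m.group('name'), argnames=argnames)

    print(make_wrapper('cusparseStatus_t cusparseDestroy(cusparseHandle_t handle)'))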

3.) Once all the low-level functions had been wrapped, I wrapped some of the ones I intend to use in higher-level Python functions that call the appropriate routine for the given datatype, etc. These higher-level wrappers, as well as an example CSR class, are implemented in cusparse.py.

Specifically, there are simplified, data-type agnostic wrappers for the following routines:

Level 2 functions:
csrmv

Level 3 functions:
csrmm
csrmm2

Extra functions:
csrgeam
csrgemm

Conversions:
csc2coo
csc2dense
csr2coo
csr2csc
csr2dense
coo2csc
coo2csr
dense2csc
dense2csr
nnz

I found that it is very easy to get the arguments or data ordering wrong when using the low-level functions (e.g. cuSPARSE expects dense matrices to be in column-major order), so these wrappers and the checks they perform greatly simplify things.
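For example, a round trip through the conversion wrappers looks roughly like this minimal sketch (the exact signatures, handle handling, and the csr2dense argument order are illustrative and may differ from the code):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    from scikits.cuda import cusparse

    handle = cusparse.cusparseCreate()   # low-level wrapper (assumed to return the handle)

    # cuSPARSE expects dense matrices in column-major (Fortran) order
    A = np.asfortranarray(np.diag(np.arange(1.0, 5.0)))
    A_gpu = gpuarray.to_gpu(A)

    # dense -> CSR: the matrix descriptor plus the three CSR device arrays
    descr, csrVal, csrRowPtr, csrColInd = cusparse.dense2csr(A_gpu, handle=handle)

    # CSR -> dense round trip (assumed signature), compared against the original
    B_gpu = cusparse.csr2dense(A.shape[0], A.shape[1], descr, csrVal, csrRowPtr,
                               csrColInd, handle=handle)
    np.testing.assert_allclose(B_gpu.get(), A)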

4.) There are also tests for a subset of the routines (particularly the specific ones wrapped above) in tests/test_cusparse.py. These can be run via nosetests, but they are not currently implemented as small, individual unit tests like those in the other test_*.py files.

5.) See also the CSR object example I put in the comments on #44.

@grlee77 (Contributor, Author) commented Dec 22, 2014

Note: On Dec 22, I rebased off of the current master, cleaned up the commits into a small set of features, and then force-pushed.

A copy of the old, messy set of commits can still be seen at:
https://github.com/grlee77/scikits.cuda/commits/cusparse_cffi_backup

@untom (Contributor) commented Jan 12, 2015

Sorry for not having a look at this sooner. I personally think this looks nice! I like the idea of a CSR matrix class a lot, but I'd stick with some of the usual scipy and PyCUDA conventions:

  • merge mm and mm2 into a single dot function
  • instead of tocsr_scipy, I'd call it get (just like the PyCUDA array class provides for dense arrays)
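A rough sketch of what those two conventions could look like on the CSR class (the attribute names and the mv/mm helpers are made up for illustration):

    import scipy.sparse

    class CSR(object):
        # assume .Val, .RowPtr, .ColInd are pycuda gpuarrays holding the CSR data
        # and .shape is the (m, n) matrix shape (illustrative names only)

        def get(self):
            """Copy the matrix back to the host as a scipy.sparse.csr_matrix."""
            return scipy.sparse.csr_matrix(
                (self.Val.get(), self.ColInd.get(), self.RowPtr.get()),
                shape=self.shape)

        def dot(self, other):
            """Single entry point dispatching to csrmv or csrmm/csrmm2."""
            if other.ndim == 1:
                return self.mv(other)   # hypothetical csrmv-backed helper
            return self.mm(other)       # hypothetical csrmm/csrmm2-backed helper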

Review comment from a Contributor on the diff:

    return C

    if toolkit_version >= (5, 0, 0):

I think these checks could be better done via a decorator (saves the code from indents)
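A minimal sketch of such a decorator (names are illustrative; toolkit_version would normally be detected at import time, as in the surrounding module):

    import functools

    toolkit_version = (6, 5, 0)   # normally detected at import time; hard-coded for the sketch

    def requires_toolkit(*min_version):
        """Decorator: the wrapped function raises NotImplementedError if the
        CUDA toolkit is older than min_version."""
        def decorator(func):
            @functools.wraps(func)
            def wrapped(*args, **kwargs):
                if toolkit_version < min_version:
                    raise NotImplementedError(
                        '%s requires CUDA toolkit >= %s' % (func.__name__, min_version))
                return func(*args, **kwargs)
            return wrapped
        return decorator

    @requires_toolkit(5, 0)
    def example_gated_wrapper(*args, **kwargs):
        """Placeholder for one of the version-gated wrappers."""
        pass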

@untom (Contributor) commented Jan 12, 2015

Having looked through it in more detail, I think this is really nice work, and the approach could be re-used to wrap other things (CULA/Magma/cuDNN/...). But of course, it's up to @lebedov what to do with this :)

@grlee77 (Contributor, Author) commented Jan 14, 2015

I was interested in whether the code would work directly with the new CUDA 7.0 RC, so I tried it out yesterday. There are a lot of new functions in 7.0 related to sorting, but otherwise the existing functions changed only in the names of some of their arguments in the header. Due to all the name changes (e.g. csrValA to csrSortedValA), this would have been very painful without some sort of automated approach along these lines. I wasn't certain whether PyCUDA would still work properly with 7.0, but that did not seem to be an issue, and all tests in test_cusparse.py are passing for the 7.0 RC as well.

The only new issue I ran into was that the release candidate has two duplicate function definitions in the cusparse.h header that were causing CFFI to raise an error. Commenting out the duplicate definitions allowed the wrapping to proceed without issue. I filed a bug report with NVIDIA about this, so hopefully it will no longer be the case in the final 7.0 release.

I made a few minor tweaks for Python 3 compatibility and verified that all tests pass with Python 3.4.

Finally, a few of @untom's suggestions were implemented. I did not get to the one about the .dot method yet. Any feedback on the decorator implementation is appreciated. As it stands, all higher-level wrappers can be imported regardless of the underlying CUDA version, but any that are unsupported for that CUDA version will raise a NotImplementedError when called.

On a separate note: there is also a brand-new cuSOLVER library in the 7.0 release candidate that covers eigenvalues, factorizations, SVD, etc. In the future, it could be wrapped with a similar approach to replace or supplement much of the functionality from CULA, which no longer seems to be regularly updated.

@Tasignotas commented:
When a gpuarray or np.ndarray is passed to the CSR.to_CSR method, the following error occurs:

 B = CSR.to_CSR(B, handle=self.handle)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/scikits/cuda/cusparse.py", line 1073, in to_CSR
    (descr, csrVal, csrRowPtr, csrColInd) = dense2csr(A, handle=handle)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/scikits/cuda/cusparse.py", line 158, in dense2csr
    csrColIndA = gpuarray.zeros((nnz, ), dtype=np.int32, allocator=alloc)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/gpuarray.py", line 932, in zeros
    result.fill(zero)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/gpuarray.py", line 527, in fill
    value, self.gpudata, self.mem_size)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/driver.py", line 505, in function_prepared_async_call
    arg_buf = pack(func.arg_format, *args)
pycuda._pvt_struct.error: cannot convert argument to long

This happens because when the gpuarray containing csrColIndA is constructed, its size is set to 0 and its gpudata attribute is set to None.
Maybe the CSR.to_CSR method should check for this case and handle it?

@grlee77 (Contributor, Author) commented Jan 26, 2015

@Tasignotas
I have not seen this problem. Is your matrix size zero? If so, then nnz=0 and pycuda will raise the error you saw on the following call:

gpuarray.zeros((0,), dtype=np.float64)

If your matrix is not size zero, then this warrants further investigation. Please provide a specific example that gives the error you mention and I will look into it.

There are basic tests of CSR creation here that all pass for me:
https://github.com/grlee77/scikits.cuda/blob/2daeb8625ee3460638ab90ba1b48ea1992ea9a77/tests/test_cusparse.py#L707

Also, I don't know which revision you have checked out, but if you have the latest version you are no longer required to pass in a second handle argument during CSR creation. A default handle is now provided (but the test cases above haven't been updated to reflect that and still generate an explicit handle).

I would remind you that the cuSPARSE wrappers are still in an early development/alpha stage and have not been officially adopted as part of scikits.cuda. I appreciate you using them and providing feedback, but I just wanted to caution you that the CSR object is not finalized and still subject to change.

@grlee77 (Contributor, Author) commented Jan 26, 2015

Hello again @Tasignotas,

I think I know what the problem is now. With the new behavior of not requiring an explicit handle to be passed, you now need to either call cusparse.init() before using the CSR object or, alternatively, use the following import:

import scikits.cuda.autoinit

An error will occur if a numpy array that is composed entirely of zeros is passed in. That particular case should be fixed to behave the way scipy.sparse does.
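A rough sketch of the kind of guard that could handle the nnz == 0 case up front (names and return convention are illustrative; the real fix would live inside the dense2csr/to_CSR path):

    import numpy as np

    def dense2csr_with_guard(A):
        """Sketch: return empty CSR arrays for an all-zero numpy input."""
        A = np.asarray(A)
        m, n = A.shape
        if np.count_nonzero(A) == 0:
            # mirror scipy.sparse: an all-zero matrix is a valid, empty CSR matrix,
            # so never call gpuarray.zeros((0,), ...), which raises the pycuda
            # error shown in the traceback above
            csrVal = np.empty((0,), dtype=A.dtype)
            csrColInd = np.empty((0,), dtype=np.int32)
            csrRowPtr = np.zeros((m + 1,), dtype=np.int32)
            return csrVal, csrRowPtr, csrColInd
        ...  # otherwise fall through to the existing GPU conversion path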

@grlee77 (Contributor, Author) commented Jul 17, 2015

I have put a slightly updated standalone version of the CFFI wrappers for cuSPARSE (and now cuSOLVER, but no higher level wrappers there) in a repository at https://github.com/grlee77/python-cuda-cffi. I do not have time to do much more on them at the moment, but have put them online in case others find them useful. Pull requests are welcome.

@lebedov: If at some point you have more free time to work on it and want to merge these wrappers into scikit-cuda proper, I am happy to help update this pull request as needed.

@lebedov (Owner) commented Jul 22, 2015

@grlee77: Duly noted. Your wrapper autogeneration is increasingly compelling given the growing number of available GPU libraries. I'll keep you in the loop if/when I have time to revisit - thanks for your work on this line of approach.

@wonghang commented:
Hi, what is the current development plan for skcuda/cusparse.py? Will this PR be merged into master? Would you mind if I hand-wrote the wrapper for cuSPARSE?

@lebedov (Owner) commented Jul 23, 2019

@wonghang I don't wish to merge this into master at the present time given that it uses cffi and I don't currently have plans to convert the master branch from ctypes to cffi. If there are specific ctypes wrappers you would like to have added to cusparse.py, feel free to contribute them separately.
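For reference, such a ctypes wrapper looks roughly like the following sketch for cusparseCreate (the library name, error handling, and choice of function are illustrative and simplified):

    import ctypes

    _libcusparse = ctypes.cdll.LoadLibrary('libcusparse.so')

    _libcusparse.cusparseCreate.restype = ctypes.c_int
    _libcusparse.cusparseCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]

    def cusparseCreate():
        """Create and return a cuSPARSE context handle."""
        handle = ctypes.c_void_p()
        status = _libcusparse.cusparseCreate(ctypes.byref(handle))
        if status != 0:   # 0 == CUSPARSE_STATUS_SUCCESS
            raise RuntimeError('cusparseCreate failed (status %d)' % status)
        return handle.value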
