WIP: ENH: full set of cuSPARSE wrappers via CFFI with a limited set of higher-level interface routines #88

Open · wants to merge 15 commits into base: master

Conversation

@grlee77 (Contributor) commented Dec 8, 2014

Hi @lebedov,
I saw that you and @untom had started wrapping cusparse via CFFI, but it didn't look like that work had been updated recently. Last week, I needed some CSR routines and looked into CFFI for wrapping them. I figured that if I could autogenerate the wrappers from the header file, it would be nearly as easy to just wrap everything instead. I had never used CFFI before, but based on the example code in your cffi branch showing how to pass the complex data types, etc., I was able to get it working.

Thank you for your work on the scikits.cuda package. I hope you will find these wrappers useful and that, after some tweaking, they can eventually be merged into the package.

This pull request differs from what is in your cffi branch in several ways:
1.) I am using the API approach in CFFI, passing in text extracted from the cusparse header file rather than using the library ABI. By reading in the full header, all routines present get wrapped. The drawback is that compilation is required; this currently happens upon the first import of scikits.cuda.cusparse.
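Roughly, the API-level pattern looks like the following minimal sketch (the cdef here is drastically trimmed and the build options are assumptions; the real declarations are extracted from the full cusparse.h):

    from cffi import FFI

    ffi = FFI()

    # A couple of representative declarations from cusparse.h (heavily trimmed).
    ffi.cdef("""
        typedef enum { CUSPARSE_STATUS_SUCCESS = 0, ... } cusparseStatus_t;
        typedef struct cusparseContext *cusparseHandle_t;
        cusparseStatus_t cusparseCreate(cusparseHandle_t *handle);
        cusparseStatus_t cusparseDestroy(cusparseHandle_t handle);
    """)

    # API mode: CFFI compiles a small extension module against the real header
    # and library, which is why the first import triggers a compilation step.
    lib = ffi.verify('#include <cusparse_v2.h>',
                     libraries=['cusparse'],
                     include_dirs=['/usr/local/cuda/include'],
                     library_dirs=['/usr/local/cuda/lib64'])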

2.) I then wrote a crude parser/string-replacement routine that goes through each C function declaration and generates the Python wrapper (including a docstring) for each function in the cuSPARSE library; see _cusparse_cffi.py and the supporting routines in _cusparse_cffi_autogen.py for how this part is done. It works on my machine with CUDA v6.5 and 64-bit Linux, but will likely still need a bit of tweaking for other platforms and/or older CUDA versions. Anyone better at string parsing could probably simplify/improve this part considerably.
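To give a flavor of the string processing involved, here is a tiny, hypothetical sketch (the template, _ffi_lib, and cusparseCheckStatus names are illustrative, not the actual code in _cusparse_cffi_autogen.py):

    import re

    PROTO = re.compile(r'cusparseStatus_t\s+(?P<name>cusparse\w+)\s*\((?P<args>[^)]*)\)')

    TEMPLATE = (
        "def {name}({argnames}):\n"
        "    \"\"\"Autogenerated wrapper for {name}.\"\"\"\n"
        "    status = _ffi_lib.{name}({argnames})\n"
        "    cusparseCheckStatus(status)\n"
    )

    def make_wrapper(prototype):
        """Turn one C prototype into the source text of a thin Python wrapper."""
        m = PROTO.search(prototype)
        # keep only the argument names, dropping the C types and qualifiers
        argnames = ', '.join(a.split()[-1].lstrip('*')
                             for a in m.group('args').split(',') if a.strip())
        return TEMPLATE.format(name=m.group('name'), argnames=argnames)

    print(make_wrapper('cusparseStatus_t cusparseDestroy(cusparseHandle_t handle)'))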

3.) Once all the low-level functions had been wrapped, I wrapped some of the ones I intend to use in higher-level Python functions that call the appropriate routine for the given datatype, etc. These higher-level wrappers, as well as an example CSR class, are implemented in cusparse.py.

Specifically, there are simplified, data-type agnostic wrappers for the following routines:

Level 2 functions:
csrmv

Level 3 functions:
csrmm
csrmm2

Extra functions:
csrgeam
csrgemm

Conversions:
csc2coo
csc2dense
csr2coo
csr2csc
csr2dense
coo2csc
coo2csr
dense2csc
dense2csr
nnz

I found that it is very easy to get the arguments or data ordering wrong when using the low-level functions (e.g. cuSPARSE expects dense matrices to be in column-major order), so these wrappers and the checks they perform greatly simplify things.
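For example, a round trip through the conversion wrappers looks roughly like this minimal sketch (the exact signatures, handle handling, and the csr2dense argument order are illustrative and may differ from the code):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    from scikits.cuda import cusparse

    handle = cusparse.cusparseCreate()   # low-level wrapper (assumed to return the handle)

    # cuSPARSE expects dense matrices in column-major (Fortran) order
    A = np.asfortranarray(np.diag(np.arange(1.0, 5.0)))
    A_gpu = gpuarray.to_gpu(A)

    # dense -> CSR: the matrix descriptor plus the three CSR device arrays
    descr, csrVal, csrRowPtr, csrColInd = cusparse.dense2csr(A_gpu, handle=handle)

    # CSR -> dense round trip (assumed signature), compared against the original
    B_gpu = cusparse.csr2dense(A.shape[0], A.shape[1], descr, csrVal, csrRowPtr,
                               csrColInd, handle=handle)
    np.testing.assert_allclose(B_gpu.get(), A)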

4.) There are also tests for a subset of the routines (particularly the specific ones wrapped above) in tests/test_cusparse.py. These can be run via nosetests, but they are not currently implemented as small, individual unit tests like those in the other test_*.py files.

5.) See also the CSR object example I put in the comments on #44.

@grlee77 (Contributor, Author) commented Dec 22, 2014

Note: On Dec 22, I rebased off of the current master, cleaned up the commits into a small set of features, and then force-pushed.

A copy of the old, messy set of commits can still be seen at:
https://github.com/grlee77/scikits.cuda/commits/cusparse_cffi_backup

@untom (Contributor) commented Jan 12, 2015

Sorry for not having a look at this sooner. I personally think this looks nice! I like the idea of a CSR matrix class a lot, but I'd stick with some of the usual scipy and PyCUDA conventions:

  • merge mm and mm2 into a single dot function
  • instead of tocsr_scipy, I'd call it get (just like the PyCUDA array class provides for dense arrays)
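A rough sketch of what those two conventions could look like on the CSR class (the attribute names and the mv/mm helpers are made up for illustration):

    import scipy.sparse

    class CSR(object):
        # assume .Val, .RowPtr, .ColInd are pycuda gpuarrays holding the CSR data
        # and .shape is the (m, n) matrix shape (illustrative names only)

        def get(self):
            """Copy the matrix back to the host as a scipy.sparse.csr_matrix."""
            return scipy.sparse.csr_matrix(
                (self.Val.get(), self.ColInd.get(), self.RowPtr.get()),
                shape=self.shape)

        def dot(self, other):
            """Single entry point dispatching to csrmv or csrmm/csrmm2."""
            if other.ndim == 1:
                return self.mv(other)   # hypothetical csrmv-backed helper
            return self.mm(other)       # hypothetical csrmm/csrmm2-backed helper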

Review comment from a Contributor on the diff:

    return C

    if toolkit_version >= (5, 0, 0):

I think these checks could be better done via a decorator (saves the code from indents)
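A minimal sketch of such a decorator (names are illustrative; toolkit_version would normally be detected at import time, as in the surrounding module):

    import functools

    toolkit_version = (6, 5, 0)   # normally detected at import time; hard-coded for the sketch

    def requires_toolkit(*min_version):
        """Decorator: the wrapped function raises NotImplementedError if the
        CUDA toolkit is older than min_version."""
        def decorator(func):
            @functools.wraps(func)
            def wrapped(*args, **kwargs):
                if toolkit_version < min_version:
                    raise NotImplementedError(
                        '%s requires CUDA toolkit >= %s' % (func.__name__, min_version))
                return func(*args, **kwargs)
            return wrapped
        return decorator

    @requires_toolkit(5, 0)
    def example_gated_wrapper(*args, **kwargs):
        """Placeholder for one of the version-gated wrappers."""
        pass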

@untom (Contributor) commented Jan 12, 2015

Having looked through it in more detail, I think this is really nice work, and the approach could be re-used to wrap other things (CULA/Magma/cuDNN/...). But of course, it's up to @lebedov what to do with this :)

@grlee77 (Contributor, Author) commented Jan 14, 2015

I was interested in whether the code would work directly with the new CUDA 7.0 RC, so I tried it out yesterday. There are a lot of new functions in 7.0 related to sorting, but otherwise the existing functions changed only in the names of some of their arguments in the header. Due to all the name changes (e.g. csrValA to csrSortedValA), this would have been very painful without some sort of automated approach along these lines. I wasn't certain whether PyCUDA would still work properly with 7.0, but that did not seem to be an issue, and all tests in test_cusparse.py are passing for the 7.0 RC as well.

The only new issue I ran into was that the release candidate has two duplicate function definitions in the cusparse.h header that were causing CFFI to raise an error. Commenting out the duplicate definitions allowed the wrapping to proceed without issue. I filed a bug report with NVIDIA about this, so hopefully it will no longer be the case in the final 7.0 release.

I made a few minor tweaks for Python 3 compatibility and verified that all tests pass with Python 3.4.

Finally, a few of @untom's suggestions were implemented. I did not get to the one about the .dot method yet. Any feedback on the decorator implementation is appreciated. As it stands, all higher-level wrappers can be imported regardless of the underlying CUDA version, but any that are unsupported for that CUDA version will raise a NotImplementedError when called.

On a separate note: there is also a brand-new cuSOLVER library in the 7.0 release candidate that covers eigenvalues, factorizations, SVD, etc. In the future, it could be wrapped with a similar approach to replace or supplement much of the functionality from CULA, which no longer seems to be regularly updated.

@Tasignotas commented:
When a gpuarray or np.ndarray is passed to the CSR.to_CSR method, the following error occurs:

 B = CSR.to_CSR(B, handle=self.handle)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/scikits/cuda/cusparse.py", line 1073, in to_CSR
    (descr, csrVal, csrRowPtr, csrColInd) = dense2csr(A, handle=handle)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/scikits/cuda/cusparse.py", line 158, in dense2csr
    csrColIndA = gpuarray.zeros((nnz, ), dtype=np.int32, allocator=alloc)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/gpuarray.py", line 932, in zeros
    result.fill(zero)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/gpuarray.py", line 527, in fill
    value, self.gpudata, self.mem_size)
  File "/sxxxxx/honours/venv/lib/python2.7/site-packages/pycuda/driver.py", line 505, in function_prepared_async_call
    arg_buf = pack(func.arg_format, *args)
pycuda._pvt_struct.error: cannot convert argument to long

This happens because when the gpuarray containing csrColIndA is constructed, its size is set to 0 and its gpudata attribute is set to None.
Maybe the CSR.to_CSR method should check for this case and handle it?

@grlee77 (Contributor, Author) commented Jan 26, 2015

@Tasignotas
I have not seen this problem. Is your matrix size zero? If so, then nnz=0 and pycuda will raise the error you saw on the following call:

gpuarray.zeros((0,), dtype=np.float64)

If your matrix is not size zero, then this warrants further investigation. Please provide a specific example that gives the error you mention and I will look into it.

There are basic tests of CSR creation here that all pass for me:
https://github.com/grlee77/scikits.cuda/blob/2daeb8625ee3460638ab90ba1b48ea1992ea9a77/tests/test_cusparse.py#L707

Also, I don't know which revision you have checked out, but if you have the latest version you are no longer required to pass in a second handle argument during CSR creation. A default handle is now provided (but the test cases above haven't been updated to reflect that and still generate an explicit handle).

I would remind you that the cuSPARSE wrappers are still in an early development/alpha stage and have not been officially adopted as part of scikits.cuda. I appreciate you using them and providing feedback, but I just wanted to caution you that the CSR object is not finalized and still subject to change.

@grlee77 (Contributor, Author) commented Jan 26, 2015

Hello again @Tasignotas,

I think I know what the problem is now. With the new behavior of not requiring an explicit handle to be passed, you now need to either call cusparse.init() before using the CSR object or, alternatively, use the following import:

import scikits.cuda.autoinit

An error will occur if a numpy array that is composed entirely of zeros is passed in. That particular case should be fixed to behave the way scipy.sparse does.
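A rough sketch of the kind of guard that could handle the nnz == 0 case up front (names and return convention are illustrative; the real fix would live inside the dense2csr/to_CSR path):

    import numpy as np

    def dense2csr_with_guard(A):
        """Sketch: return empty CSR arrays for an all-zero numpy input."""
        A = np.asarray(A)
        m, n = A.shape
        if np.count_nonzero(A) == 0:
            # mirror scipy.sparse: an all-zero matrix is a valid, empty CSR matrix,
            # so never call gpuarray.zeros((0,), ...), which raises the pycuda
            # error shown in the traceback above
            csrVal = np.empty((0,), dtype=A.dtype)
            csrColInd = np.empty((0,), dtype=np.int32)
            csrRowPtr = np.zeros((m + 1,), dtype=np.int32)
            return csrVal, csrRowPtr, csrColInd
        ...  # otherwise fall through to the existing GPU conversion path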

@grlee77 (Contributor, Author) commented Jul 17, 2015

I have put a slightly updated standalone version of the CFFI wrappers for cuSPARSE (and now cuSOLVER, but no higher level wrappers there) in a repository at https://github.com/grlee77/python-cuda-cffi. I do not have time to do much more on them at the moment, but have put them online in case others find them useful. Pull requests are welcome.

@lebedov: If at some point you have more free time to work on it and want to merge these wrappers into scikit-cuda proper, I am happy to help update this pull request as needed.

@lebedov (Owner) commented Jul 22, 2015

@grlee77: Duly noted. Your wrapper autogeneration is increasingly compelling given the growing number of available GPU libraries. I'll keep you in the loop if/when I have time to revisit - thanks for your work on this line of approach.

@wonghang commented:
Hi, what is the current development plan for skcuda/cusparse.py? Will this PR be merged into master? Would you mind if I hand-wrote the wrapper for cuSPARSE?

@lebedov (Owner) commented Jul 23, 2019

@wonghang I don't wish to merge this into master at the present time given that it uses cffi and I don't currently have plans to convert the master branch from ctypes to cffi. If there are specific ctypes wrappers you would like to have added to cusparse.py, feel free to contribute them separately.
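For reference, such a ctypes wrapper looks roughly like the following sketch for cusparseCreate (the library name, error handling, and choice of function are illustrative and simplified):

    import ctypes

    _libcusparse = ctypes.cdll.LoadLibrary('libcusparse.so')

    _libcusparse.cusparseCreate.restype = ctypes.c_int
    _libcusparse.cusparseCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]

    def cusparseCreate():
        """Create and return a cuSPARSE context handle."""
        handle = ctypes.c_void_p()
        status = _libcusparse.cusparseCreate(ctypes.byref(handle))
        if status != 0:   # 0 == CUSPARSE_STATUS_SUCCESS
            raise RuntimeError('cusparseCreate failed (status %d)' % status)
        return handle.value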
