
Accelerating performance #31

Open
jmason42 opened this issue Jan 11, 2019 · 2 comments
Labels
enhancement (New feature or request) · question (Further information is requested)

Comments

@jmason42
Contributor

It seems like the most critical need right now is improved performance. I see three avenues:

  • More Python/NumPy-level performance improvements. Apart from some micro-optimizations, I don't know that there is much more to be gained here (and said micro-optimizations may make future development more cumbersome).
  • Assisted generation of compiled code via tools like Cython or Numba.
    • I've written these sorts of algorithms in Cython before; the code can be painfully opaque, but the speed-ups are considerable when done right. Unlike with Numba, NumPy's RandomState objects can be used fairly effectively from Cython by periodically regenerating a large array of random numbers.
    • Numba was promising. However, I'm increasingly unconvinced of its ability to accelerate more than a few isolated lines of code, and it doesn't play as nicely with NumPy's random number generation as it ought to.
    • There are other tools (other JIT compilers, high-level modeling languages used in machine learning) that might be useful, but in my experience these are often 1) too restrictive (can't mix with plain Python code), 2) not broad enough (can't handle loops, or useful things like lists), or 3) slower than pure NumPy solutions (my personal experience with Theano).
  • A pure C/C++ implementation, as proposed by @prismofeverything. I can't say much here.
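For concreteness, the random-number batching trick mentioned above can be sketched in plain NumPy. This is a minimal illustration (the class name and buffer size are made up for the example, not anything from the codebase): draws come from a large pre-generated buffer, which is refilled only when exhausted, amortizing the per-call overhead that matters inside tight Cython loops.

```python
import numpy as np

class RandomPool:
    """Serve uniform draws from a large pre-generated buffer,
    refilling only when exhausted. Pre-generating amortizes the
    per-call overhead of RandomState, which matters inside tight
    Cython (or plain Python) simulation loops."""

    def __init__(self, random_state, buffer_size=100000):
        self._random_state = random_state
        self._buffer_size = buffer_size
        self._refill()

    def _refill(self):
        # One vectorized call replaces buffer_size scalar calls.
        self._buffer = self._random_state.random_sample(self._buffer_size)
        self._index = 0

    def next(self):
        if self._index >= self._buffer_size:
            self._refill()
        value = self._buffer[self._index]
        self._index += 1
        return value
```

In Cython the same pattern works with a typed memoryview over the buffer, so the hot loop never touches Python-level RandomState methods.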

Without a run-off of sorts, I don't know which solution would be best. I lean towards whatever is less complicated and more maintainable. As a final reflection I'll note that parts of NumPy, like the random number generation module, are written using a combination of pure C and Cython.

@jmason42 added the enhancement (New feature or request) and question (Further information is requested) labels on Jan 11, 2019
@prismofeverything
Member

Yeah, I think this is the key. I did try the Cython approach and found some good speedups (see the cython branch). It was insufficient in that state, but that was before the last two non-Cython speed improvements (progressively caching calculations further out the stack), so it may be more viable now. Also, in the cython branch I focused mostly on choose and propensity as a kind of initial beachhead, so we could probably get more wins by extending it to the rest of the code.
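For readers not familiar with the choose/propensity split: in a direct-method SSA step, propensity computes the per-reaction firing rates and choose inverts the cumulative sum to pick which reaction fires. A minimal sketch, assuming mass-action kinetics — the function names echo the branch, but the array layouts and signatures here are illustrative, not the actual branch's code:

```python
import numpy as np

def propensities(state, rates, reactants):
    """Mass-action propensities: rate constant times the product
    of the counts of each consumed species (illustrative form).
    reactants is a (n_reactions, n_species) array of consumed
    stoichiometric coefficients."""
    return rates * np.prod(state ** reactants, axis=1)

def choose(props, u):
    """Pick a reaction index by inverting the cumulative propensity
    with a single uniform draw u in [0, 1)."""
    cumulative = np.cumsum(props)
    return int(np.searchsorted(cumulative, u * cumulative[-1]))
```

The cumulative sum and searchsorted are exactly the kind of per-step work that caching (and Cython) can cut down when only a few propensities change between events.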

As for the C implementation, I am passing values between Python and C, and I've also set up an interface into a pure C gillespie function (so we can separate the wrapper/communication code from the actual implementation of the algorithm). I'm learning a lot about the Python/C interface, which could be useful for the future as well. I will continue down this road and hopefully have some results soon, so we can get a sense of how much improvement to expect without spending too much time on a fully optimized implementation.
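As an aside, one lightweight way to pass values between Python and a compiled C function is ctypes; this sketch uses the system math library as a stand-in (the actual gillespie function's interface isn't shown in this thread), just to illustrate declaring a C signature so argument conversion happens automatically:

```python
import ctypes
import ctypes.util

# Load the C math library as a stand-in for a compiled gillespie
# shared library; a real wrapper would CDLL its own .so instead.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts Python floats to
# C doubles and back; the same declaration step would be needed
# for the gillespie entry point.
libm.exp.restype = ctypes.c_double
libm.exp.argtypes = [ctypes.c_double]
```

For array-heavy interfaces, the Python/C API (or Cython's own C-calling support) avoids the per-call conversion overhead that ctypes incurs.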

@jmason42 If you have experience with Cython, maybe you want to pick up the cython branch I started (or start a new one) and pull in our latest improvements? I think I am going to focus on the C. Maybe between the two of us we can get this thing to the performance we need : )

@jmason42
Contributor Author

Sure, I'll take a look at Cython.
