
Optimize GeneralBindingModel #90

Open
jchodera opened this issue Apr 23, 2017 · 19 comments

Comments

@jchodera
Member

From some profiling tests, it looks like root solving in the GeneralBindingModel is the dominant cost of the new autoprotocol-based assaytools changes, with roughly 76% of total sampling time spent in calls to scipy.optimize.root here.

I imagine we could greatly speed this up by rewriting the target function using some optimization scheme.

@jchodera
Member Author

@pgrinaway @gregoryross @maxentile : If you had any advice on how to optimize the target function with minimal effort, I'd be very appreciative of suggestions!

@pgrinaway
Member

Taking a look at this now.

@jchodera
Member Author

Thanks!

I realized that even just encoding the quantities as numpy arrays outside of the target function will likely lead to a big speedup, but I was wondering if there was some sort of numba or other thing that might trivially speed this up.
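As a rough illustration of that idea, here's a minimal sketch of hoisting the array conversion out of the closure so the root finder's target only touches preallocated numpy arrays. The stoichiometry matrix `S` and log equilibrium constants here are made up for the example, not the actual GeneralBindingModel internals:

```python
import numpy as np

def make_target(stoichiometry, log_K):
    # Convert inputs to numpy arrays ONCE, outside the target function.
    # stoichiometry: (n_reactions, n_species); log_K: (n_reactions,)
    S = np.asarray(stoichiometry, dtype=float)
    logK = np.asarray(log_K, dtype=float)

    def target(log_c):
        # Residual of the log-form equilibrium conditions: S @ log_c - log_K.
        # Only array ops here, so every root-finder call stays cheap.
        return S.dot(log_c) - logK

    return target

# Hypothetical two-reaction, two-species system:
target = make_target([[1.0, 1.0], [1.0, -1.0]], [0.5, -0.5])
residual = target(np.array([0.25, 0.25]))
```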

@pgrinaway
Member

Ok, the most obvious way of optimizing this would be to implement it in Cython, but there are a few tricky points:

  • The code appends to lists--we wouldn't want to do that in Cython, since that would require manipulating Python objects (as opposed to numpy arrays which are well-represented in C).

  • The code uses dictionaries heavily. This would also diminish performance under Cython, because again it would just be calling the Python interpreter.

  • That function would have to be removed as a closure, and accept all of the data that it needs (with types fully specified) as parameters. This is not very difficult.

It's not substantially easier with numba--with both numba and Cython you'll be able to take this code and directly compile it (I think--numba might not be happy with the closure, I've never tried it), but you probably won't see much performance increase.

So, it would require a bit of re-thinking how the function works (which might even in pure python/numpy make the code faster). I can take a stab at this, if you want.
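For instance (purely illustrative, not the assaytools code), replacing an append-in-a-loop pattern with a vectorized computation is exactly the kind of restructuring that pays off even in pure Python/numpy, before Cython enters the picture:

```python
import numpy as np

def residuals_append(values):
    # Append-to-list pattern: forces Python-object manipulation per element,
    # which Cython/numba cannot compile away.
    out = []
    for v in values:
        out.append(v * v - 1.0)
    return np.array(out)

def residuals_vectorized(values):
    # Same computation on a numpy array: one C-level loop, no Python objects.
    v = np.asarray(values, dtype=float)
    return v * v - 1.0
```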

@pgrinaway
Member

I don't think numba would be trivial (to get a speedup), but it's easy to try.

@jchodera
Member Author

I think the key to speeding things up is to speed up just the target function that gets called many times by the root finder. It accepts a numpy array and returns two numpy arrays. I should be able to move some logic outside the target function and do something to speed up the computation inside the target.

We can still have the binding model function accept and return dicts, but it can do some preprocessing so that the target function runs as fast as possible on numpy arrays.

@maxentile
Member

Also took a look!

Yeah, numba doesn't know what to do with dicts (code using dictionaries will run in "object mode" / as slow as interpreted code), so those will need to be flattened out into arrays beforehand for numba to have any effect...

@jchodera : Could you point to a few input instances to optimize for? On the input in the doctest, the PyCharm profiler says the most time-consuming functions are in BLAS, so I think loop-jitting might not produce a substantial speed-up on that instance. However, if there are many more reactions, species, or conservation_equations, the interpreter overhead would be more substantial.
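To make the dict issue concrete, here's a hypothetical flattening step (species names and stoichiometries invented for the example) that converts dict-based model input into the dense arrays a jitted kernel would need:

```python
import numpy as np

def flatten_model(concentrations, reactions):
    # Map {species_name: concentration} plus per-reaction stoichiometry
    # dicts into a concentration vector and a dense stoichiometry matrix.
    species = sorted(concentrations)                  # fixed species ordering
    index = {name: i for i, name in enumerate(species)}
    c = np.array([concentrations[s] for s in species])
    S = np.zeros((len(reactions), len(species)))
    for r, stoich in enumerate(reactions):
        for name, nu in stoich.items():
            S[r, index[name]] = nu
    return c, S

# Hypothetical one-reaction system P + L <=> PL:
c, S = flatten_model({'L': 1e-6, 'P': 1e-6, 'PL': 0.0},
                     [{'P': -1, 'L': -1, 'PL': +1}])
```

All the dict handling happens once up front; only `c` and `S` would be passed to the compiled target.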

@jchodera
Member Author

Here is the description of the calculation in equations, for reference:

http://assaytools.readthedocs.io/en/latest/theory.html#general-binding-model

@jchodera
Member Author

Whoops, let me post the branch name and info on how to run the example shortly. Sorry about that!

I do realize we don't want to have any dict processing inside the target function. I can do preprocessing outside the target and feed numpy arrays in. But is that the best I can do?

@pgrinaway
Member

> I think the key to speeding things up is to speed up just the target function that gets called many times by the root finder. It accepts a numpy array and returns two numpy arrays. I should be able to move some logic outside the target function and do something to speed up the computation inside the target.

Right, this is what I meant. If we deal with dicts and resizing lists inside the target function, it's going to be pretty difficult to optimize.

@pgrinaway
Member

> I do realize we don't want to have any dict processing inside the target function. I can do preprocessing outside the target and feed numpy arrays in. But is that the best I can do?

Once you've done that, you can give numba a try. But my experience with numba is that you end up having to write essentially Cythonic/C-style code to get the substantial speed-up: see what we ended up with in perses. All types are fully specified, and the numba compiler is even instructed to fail if it can't avoid using the interpreter. I spent a bit of time gradually optimizing those routines, and ended up with what you see in that link.

Same with Cython--you can see what I did in yank. I also gradually made the code more and more C-like to get maximum performance.
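To illustrate what that C-style endpoint looks like, here's a toy Newton kernel (not the assaytools target function): explicit loops over scalars, no dicts or list appends, so decorating it with numba's `@njit` in nopython mode (or adding Cython type declarations) would be the only remaining change.

```python
import numpy as np

def newton_sqrt(ks, tol=1e-12, maxiter=50):
    # Solve x*x = k for each k by Newton iteration. Scalar loops only:
    # this is the style numba's nopython mode (which refuses to fall back
    # to the interpreter) compiles to fast machine code.
    out = np.empty_like(ks)
    for i in range(ks.shape[0]):
        x = max(ks[i], 1.0)
        for _ in range(maxiter):
            dx = (x * x - ks[i]) / (2.0 * x)
            x -= dx
            if abs(dx) < tol:
                break
        out[i] = x
    return out
```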

@pgrinaway
Member

At first glance it might not work, but you might be able to use numexpr somewhere. I suspect that might take as much effort as refactoring for Cython, though.

@jchodera
Member Author

Thanks! This is super helpful!

@jchodera
Member Author

No need to spend time on this, but since someone asked: check out the optimizer-speedup branch (which prints some output about how much time is spent in the fzero() call) and run examples/autoprotocol/single-wavelength-probe-assay.py to try it out.

@maxentile may be totally right that the time-consuming part is not the target function computation. I hadn't even considered that!

I had tried adding an LRU cache to store the most recent 128 or 1024 evaluations, but that did not seem to improve performance.
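For reference, here's roughly what such a cache looks like (a generic sketch, not the exact code that was tried): since numpy arrays aren't hashable, the array has to be keyed by its bytes, and successive root-finder iterates almost never repeat bit-for-bit, which would explain why the cache didn't help.

```python
import numpy as np
from functools import lru_cache

def cached_target(f, maxsize=128):
    # Wrap a target function with an LRU cache keyed on the array's raw
    # bytes (arrays themselves are unhashable).
    @lru_cache(maxsize=maxsize)
    def _cached(key, shape):
        x = np.frombuffer(key, dtype=float).reshape(shape)
        return f(x)

    def wrapper(x):
        x = np.ascontiguousarray(x, dtype=float)
        return _cached(x.tobytes(), x.shape)

    return wrapper, _cached

f, raw = cached_target(lambda x: x.sum())
f(np.array([1.0, 2.0]))
f(np.array([1.0, 2.0]))  # identical input: served from the cache
```

The cache only fires on bit-identical inputs, so a root finder probing slightly different points each iteration gets essentially all misses.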

@pgrinaway
Member

Ok, I've profiled that script, and the top two functions that are causing slowness are:

  1. values from WeakValueDictionary

  2. ftarget (this is actually 47% of the total time, though its "own" time is 8.5%). One of the big consumers there is new_method, which is something deep in PyMC.

I will post the profiling results in full later, but this does suggest that ftarget could stand to be optimized.

@pgrinaway
Member

Ah, new_method is a function inside the function create_uni_method in CommonDeterministics. Not sure why it gets called every time ftarget does.

@pgrinaway
Member

Sorry for the spam, but it seems to me like it is bits of PyMC that are quite slow, and that the ftarget function itself might not be so bad.

@jchodera
Member Author

That's really odd. The ftarget function should be called many times per call to equilibrium_concentrations (which is called once inside a couple of deterministics). pymc may be doing some weird wrapping of things, or perhaps a weird unwrapping of things where pymc variables are being substituted for floats deep inside the ftarget function.

I wish I knew more about how pymc works, since this suggests we could speed things up by unwrapping the input pymc variables once and feeding them to a pre-constructed function that deals only in numpy floats, doing the optimization with pre-computed matrices.

In other news, trying to figure out how to recode this all for tensorflow or edward is also not proving easy, since tensorflow compute graphs can't have nodes that are iterative operations like function solvers.

@pgrinaway
Member

> The ftarget function should be called many times per call to equilibrium_concentrations

It is called several times more than equilibrium_concentrations, you're correct.

> pymc may be doing some weird wrapping of things, or perhaps a weird unwrapping of things where pymc variables are being substituted for floats deep inside the ftarget function.

I suspect this is the case. It seems like some sort of pymc variable is being instantiated each call, which, according to profiling, is extremely expensive.

> In other news, trying to figure out how to recode this all for tensorflow or edward is also not proving easy, since tensorflow compute graphs can't have nodes that are iterative operations like function solvers.

Right, that's unfortunate. What about just coding it up in regular Python with numpy? It would probably be relatively fast, pythonic, nicely extensible using just Python, etc., and if some further optimization is necessary, numpy plays very nicely with Cython.
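As a minimal sketch of that suggestion (using the simple two-component case P + L ⇌ PL rather than the general model), a plain-numpy Newton iteration is only a few lines and can be checked against the closed-form quadratic solution:

```python
import numpy as np

def complex_concentration(Ptot, Ltot, Kd, tol=1e-14):
    # Newton iteration for x = [PL] satisfying (Ptot - x)(Ltot - x) = Kd * x,
    # i.e. mass-action equilibrium for P + L <=> PL with totals conserved.
    x = 0.0
    for _ in range(100):
        f = (Ptot - x) * (Ltot - x) - Kd * x
        df = -(Ptot - x) - (Ltot - x) - Kd
        step = f / df
        x -= step
        if abs(step) < tol:
            break
    return x

def complex_closed_form(Ptot, Ltot, Kd):
    # Physically meaningful (smaller) root of the same quadratic.
    b = Ptot + Ltot + Kd
    return (b - np.sqrt(b * b - 4.0 * Ptot * Ltot)) / 2.0
```

For the general model the closed form disappears, but the Newton loop generalizes, and a pure-numpy version like this stays easy to hand off to Cython later.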
