
New class to help design experiments based on assaytools MCMC data #80

Open · wants to merge 13 commits into master from experiment_design
Conversation

gregoryross (Collaborator)

A class and function have been written to estimate the coefficient of variation, variance, and bias of fluorescence experiments at different assay parameters, such as the protein and ligand concentrations.

This PR includes a Jupyter notebook to demonstrate the functionality of these tools on assaytools PyMC data for the p38-Bosutinib complex. It has been opened so we can discuss what other functionality we need, and whether there are any glaring errors. The files currently live in a separate folder, but they can be moved to more appropriate locations before this PR is merged.

To do

  • Probably change the model for the detector noise (currently Gaussian noise)
  • Unit tests (contingent on having a small PyMC data file)
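For reference, the three summary statistics this class targets can be computed from bootstrap estimates of DeltaG in a few lines of numpy. This is an illustrative sketch, not the PR's actual API; `summarize_estimates` is a hypothetical helper:

```python
import numpy as np

def summarize_estimates(estimates, true_value):
    """Summarize bootstrap estimates of a parameter (e.g. DeltaG in kT).

    Returns the coefficient of variation (as a fraction), the variance,
    and the bias relative to the known true value.
    """
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean()
    cv = estimates.std(ddof=1) / abs(mean)   # coefficient of variation
    variance = estimates.var(ddof=1)
    bias = mean - true_value
    return cv, variance, bias

# Example: synthetic estimates scattered around a true DeltaG of -12 kT
rng = np.random.default_rng(0)
samples = -12.0 + rng.normal(0.0, 0.5, size=1000)
cv, variance, bias = summarize_estimates(samples, true_value=-12.0)
```

Note that the CV here is a fraction; multiplying by 100 to report percent should happen exactly once (relevant to the factor-of-100 issue discussed later in this thread).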

…nt of variation and total error of future experiments as a function of protein concentration.
… concentrations. Cleaned up functions in AssaySimulator.
…ance, and bias for different experimental parameters. The jupyter notebook was updated to reflect this. Two superfluous files were removed.
@gregoryross (Collaborator, Author)

Tagging @sonyahanson, @MehtapIsik, and @jchodera.

@sonyahanson (Contributor)

Thanks! Will take a look!

@jchodera (Member) commented Apr 7, 2017

Whoa, the CV of the binding free energy is 1000-9000%?! This sounds horribly wrong.

@jchodera (Member) commented Apr 7, 2017

And the relative error in the binding free energy varies from 80-180%?

This all sounds suspiciously wrong. Errors this high would mean that the approach has almost no value.

@jchodera (Member) left a review comment

I've gone through assay_simulator.py.

Currently, it seems that the only source of uncertainty in your bootstrap samples is the initial guess for DeltaG, and that the least-squares optimization results in something like a CV of 9000%, which suggests to me that the least-squares fit is so unstable as to be unusable.

There may be some other unit errors or bugs, though I didn't spot any in my pass through.

It would probably be useful to plot what is going on with the simulated fluorescence and resulting least-squares fit to see why it is getting stuck so far from the solution.

We should really be sure to incorporate the sources of uncertainty we think are important into this model. I was hoping that you could use the same pymc model to simulate the sources of uncertainty rather than have to recode everything---I think there's a way to do this---but if we do implement everything from scratch as is done here, we would want to incorporate:

  • protein concentration (something like ~10% uncertain)
  • ligand dispensed concentrations (~8% per well)
  • measurement noise and the noise floor (or background fluorescence), since this determines minimum detectable quantity of complex
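A rough sketch of how those three sources of uncertainty might enter a from-scratch simulator. All function names and magnitudes here are illustrative assumptions, not the PR's code; lognormal errors are used for the concentrations since they must remain positive:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_well_concentrations(p_stated, l_stated,
                                 p_rel_error=0.10, l_rel_error=0.08):
    """Draw 'true' concentrations given stated ones, with lognormal error.

    ~10% relative uncertainty in protein concentration, and ~8% per-well
    uncertainty in the dispensed ligand concentrations.
    """
    p_true = p_stated * rng.lognormal(mean=0.0, sigma=p_rel_error)
    l_true = l_stated * rng.lognormal(mean=0.0, sigma=l_rel_error,
                                      size=np.size(l_stated))
    return p_true, l_true

def add_measurement_noise(fluorescence, sigma=10.0, floor=50.0):
    """Add Gaussian detector noise, clamped at a background noise floor."""
    noisy = fluorescence + rng.normal(0.0, sigma, size=np.shape(fluorescence))
    return np.maximum(noisy, floor)

l_stated = np.logspace(-8, -4, 12)            # ligand titration, M
p_true, l_true = simulate_well_concentrations(0.5e-6, l_stated)
signal = add_measurement_noise(1e9 * l_true)  # toy fluorescence response
```

The noise floor is what sets the minimum detectable quantity of complex, so leaving it out would make low-concentration designs look better than they are.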

if noisy:
    Fmodel += np.random.normal(loc=0.0, scale=self.sigma, size=len(self.l_total))
else:
    Fmodel = self.F_PL * PL + self.F_L * L_free + self.F_P * P_free + self.F_buffer * self.path_length
@jchodera (Member):

Is this missing self.F_plate?

@gregoryross (Collaborator, Author):

It is, adding it in now.

"""
The sum of squares between model fluorescence and the target
"""
model = self.simulate_fluorescence(DeltaG, p_total, noisy=False)
@jchodera (Member):

Shouldn't we be adding measurement noise here if we want to quantify the CV?

@gregoryross (Collaborator, Author):

This is the loss function that is minimized during the fitting, so the prediction (called model here) must be a deterministic function of the parameters for gradient descent. The measurement noise is added a few lines above this:

target = self.simulate_fluorescence(p_total)

The target fluorescence has the noise added by default. As this is not clear, I'll change this to

target = self.simulate_fluorescence(p_total, noisy=True)

simulator = AssaySimulator(pymc_data=pymc_data, l_total=l_total, sample_index=ind, p_total=p_total[p], **kwargs)
# Draw fluorescence data with different values of random noise
for j in range(nsamples):
    fit = simulator.fit_deltaG()
@jchodera (Member):

Since fit_deltaG currently doesn't add any noise, won't this omit the uncertainty in the measurements?

return np.sum((model - target)**2)

# Start the initial guess within about 10% of the "true" value
guess = self.DeltaG + np.random.normal(loc=0, scale=0.1 * np.abs(self.DeltaG))
@jchodera (Member):

It looks like this is the only source of uncertainty currently being added to your model---the initial starting guess for the least-squares fit---since noisy=False in sum_of_squares above. Is that right?

@gregoryross (Collaborator, Author):

As described above, noise is added when the fluorescence data is simulated. https://github.com/choderalab/assaytools/blob/experiment_design/experimental_design/assay_simulator.py#L178-L179

As this isn't clear, I'll improve the documentation.

@jchodera (Member) commented Apr 9, 2017

I'm wondering if there might be a way to replace AssaySimulator with a much simpler class.

Consider the following scheme:

  • Generate a pymc model with pymcmodels.make_model
  • Run the normal MCMC sampling with pymcmodels.run_mcmc(pymc_model)
  • Replace any stochastic variables in the resulting model that we want to fix (such as the true thermodynamic parameters and fluorescence yields) with point estimates by setting their observed attribute to True. The simplest thing to do is to fix them at their final values by only changing observed (possibly after a final MLE estimate), but we could replace the values with averages over the MCMC run if we want.
  • Change the observed fluorescence random variables' observed attribute from True to False so that these will be sampled as random variables
  • Run the MCMC simulation again to generate realizations of the experiment using exactly the same model
  • Fit these realizations with MCMC

We can then change the relevant design parameters---the stated protein concentration, the assay volume/area, and the ligand concentrations---and see how this affects things.

Does that sound like it would be a viable approach? I haven't had a chance to play with this myself yet, but it seems like this kind of imputation is possible with pymc2.
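For intuition, here is a self-contained toy version of that simulate-then-refit loop, with a scipy least-squares refit standing in for the MCMC fit and a hand-written two-component binding model in place of the pymc model. Every parameter value below is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def complex_conc(p_tot, l_tot, delta_g):
    """Two-component binding: [PL] from totals and DeltaG (kT, 1 M standard state)."""
    kd = np.exp(delta_g)                       # Kd in M
    b = p_tot + l_tot + kd
    return 0.5 * (b - np.sqrt(b * b - 4.0 * p_tot * l_tot))

def fluorescence(p_tot, l_tot, delta_g, f_pl=1e9, f_l=1e5, background=100.0):
    """Toy fluorescence model: bright complex, dim free ligand, flat background."""
    pl = complex_conc(p_tot, l_tot, delta_g)
    return f_pl * pl + f_l * (l_tot - pl) + background

rng = np.random.default_rng(2)
l_tot = np.logspace(-9, -5, 16)                # ligand titration, M
p_tot, true_dg, sigma = 1e-6, np.log(1e-7), 5.0  # Kd = 100 nM, detector noise

fits = []
for _ in range(50):
    # One "realization" of the experiment: deterministic model + detector noise
    target = fluorescence(p_tot, l_tot, true_dg) + rng.normal(0.0, sigma, l_tot.size)
    loss = lambda dg: np.sum((fluorescence(p_tot, l_tot, dg) - target) ** 2)
    fits.append(minimize_scalar(loss, bounds=(-30.0, -5.0), method="bounded").x)

fits = np.array(fits)
cv = fits.std(ddof=1) / abs(fits.mean())
```

The spread of fits across realizations gives the CV of the fitted DeltaG at these design parameters; re-using the actual pymc model, as proposed above, would avoid re-coding the fluorescence model and keep the two in sync.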

-improving docstrings and a few comments
-removing extra factor of 100 from CV estimation
-added background fluorescence to data simulation when inner_filter=False
@gregoryross (Collaborator, Author)

Thanks a lot for going through this @jchodera.

The CV predictions had an extra factor of 100 applied to them when plotting (I'd recently changed where I converted the fractions to percent). This has now been removed such that the minimum CV has a value of ~30%.

With regards to some of your comments, currently the only source of noise comes from fluorescence noise parametrized by either sigma_top or sigma_bottom. Every time AssaySimulator.fit_deltaG() is called, a target fluorescence profile is generated, with added noise, and the free energy is estimated using least-squares minimization.

I like your suggestion of using pymc itself for generating the data. The biggest pro would be that if the fluorescence model is changed, the experimental-design part would not require any modifications. The reason why I opted to write a separate class to estimate errors is that one doesn't need to run any more MCMC after the assaytools parameters have been estimated: the class uses the samples that have already been generated. The class itself is very simple, and is really just an interface to read the parameters from pymc and perform least-squares regression.

As for your suggestion about allowing the protein and ligand concentrations to change for each fit, I can easily add that in now if you and @sonyahanson would like.

@sonyahanson (Contributor)

Looks good to me. It'd be nice to be able to get the actual delG predictions from predict_assay_error as well as the error, with the ultimate goal of making a figure like the one attached in the original comment.

gregoryross and others added 5 commits April 12, 2017 14:32
…tionality of AssaySimulator was slightly changed so that instead of fitting a single binding free energy, multiple estimates of the free energy are returned.
merging master into experiment_design
@sonyahanson (Contributor)

Added changes that allow this to work with the new logNormal wrapper. Also added a section to the notebook to see how the CV's dependence on [P] varies with deltaG.

@sonyahanson (Contributor)

Three things:

  • In the new notebook, the CV/error plots in cells 13 and 14 don't show the trend seen in the notebook below, where deltaG is fixed. @gregoryross mentioned this might be because the error estimate is not compatible with the new lognormal formatting.
  • Tried to run this on a different dataset with a higher-affinity ligand, making the last set of plots for DeltaG's of -18, -17, -16, -15, and -14. At the lower affinities this modeling finished in about two hours; for the higher affinities, I let it run for 24 hours and it never finished.
  • It would be good to be able to get out the predicted total fluorescence as a function of ligand for at least some of the simulated datasets. This is probably pretty simple, but noting it here.

@jchodera (Member)

@sonyahanson : I don't understand the description of the problems you list above, so come find me in meatspace if I can help.

…initial guess starts at the true value. In addition, the gradient tolerance of the BFGS optimizer was reduced for speed.
…tted to is now output. Notebook updated accordingly.
@sonyahanson (Contributor) commented Jun 5, 2017

Thanks, Greg! All three points addressed with the last two commits. Currently running a notebook to test on a higher affinity dataset.
