
Multiple improvements to pymcmodels #83

Merged
merged 24 commits into from
Apr 22, 2017

Conversation

jchodera
Member

@jchodera jchodera commented Apr 17, 2017

I didn't have time to update any examples to try specifying the F_PL and its uncertainty dF_PL as optional arguments to pymcmodels.make_model(), but feel free to add to this branch/PR directly as you test it out, @sonyahanson!
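A minimal sketch of how optional `F_PL`/`dF_PL` arguments could be threaded into a model factory. This is a pure-Python stand-in, not the actual assaytools implementation: the real `make_model` builds a PyMC model, and every name below other than `F_PL`/`dF_PL` is an assumption for illustration.

```python
# Hypothetical sketch: if the user supplies a measured complex fluorescence
# F_PL with uncertainty dF_PL, it becomes an informative prior; otherwise
# the model falls back to a broad, uninformed prior.

def make_model(Pstated, Lstated, F_PL=None, dF_PL=None):
    """Return a dict describing the prior placed on the complex fluorescence."""
    if F_PL is not None:
        if dF_PL is None:
            raise ValueError("dF_PL must accompany F_PL")
        # Informative normal prior centered on the measured value.
        prior = {"dist": "normal", "mu": F_PL, "sigma": dF_PL}
    else:
        # Weak prior spanning plausible fluorescence magnitudes.
        prior = {"dist": "lognormal", "mu": 0.0, "sigma": 10.0}
    return {"F_PL_prior": prior, "Pstated": Pstated, "Lstated": Lstated}
```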

@jchodera
Member Author

@sonyahanson : I'm having trouble figuring out how to test any of these improvements.

What can I run to actually test things? I can't find anything in examples/direct-fluorescence-assay that will actually generate some output to test, and the Jupyter notebooks I've found (like 3a Bayesian fit xml file - SrcGefitinib) don't seem to actually use make_model from pymcmodels.py.

@jchodera jchodera changed the title Add support for specified F_PL and its uncertainty dF_PL Multiple improvements to pymcmodels Apr 18, 2017
@jchodera
Member Author

@sonyahanson : Regarding [L]=0 support for +/- protein: Do we want to support only a single well for [L]=0 with protein and a single well without protein, or do you want to be able to add an arbitrary number of [L]=0 control wells with or without protein?

@sonyahanson
Contributor

You need to make sure the xml file path in inputs_p38_singlet actually points to your xml files. I can make the behavior here more intuitive (e.g., throw an error saying it didn't find the xml files).

We definitely need more/any tests, and better documentation on the examples. This is on my to do list to work on in the next week anyway. Let me know of suggestions.

Regarding [L]=0, whichever is easiest for now. A single well is perfectly fine for now, but maybe in the future we would want an arbitrary number, but maybe that future will see a pretty large overhaul of the API anyway...

@jchodera
Member Author

You need to make sure the xml file path in inputs_p38_singlet actually points to your xml files. I can make the behavior here more intuitive (e.g., throw an error saying it didn't find the xml files).

That would be super useful!

We definitely need more/any tests, and better documentation on the examples. This is on my to do list to work on in the next week anyway. Let me know of suggestions.

Priorities for tests are probably:

  • Does it run at all with python 2/3?
  • Does it produce correct results for synthetic data?
  • Can it process a variety of XML files (with short runs) without dying?

Regarding [L]=0, whichever is easiest for now. A single well is perfectly fine for now, but maybe in the future we would want an arbitrary number, but maybe that future will see a pretty large overhaul of the API anyway...

The difference in API between one and multiple wells is very little, but it does change the implementation under the hood.

@jchodera
Member Author

OK, things seem to work locally now, so I'm going to finish the [L]=0 case and then add a very rudimentary test just to make sure things run.

@jchodera
Member Author

I think I was going about the [L]=0 case the wrong way by trying to break it out as a separate set of five things to pass into the API. Instead, I think I can just detect which elements of Lstated are zero and handle those as special cases internally, which doesn't require any changes to the API at all. Trying this now.
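The autodetection idea can be sketched with NumPy boolean masks over the stated concentration arrays (the concentration values here are made up for illustration):

```python
import numpy as np

# Sketch: detect which wells are [L]=0 or [P]=0 controls directly from
# the stated concentration arrays, so the public API needs no extra
# arguments for control wells.
Lstated = np.array([0.0, 1e-8, 1e-7, 1e-6])  # stated ligand concentrations (M)
Pstated = np.array([1e-6, 1e-6, 0.0, 1e-6])  # stated protein concentrations (M)

ligand_free = (Lstated == 0.0)             # no ligand: fluorescence baseline wells
protein_free = (Pstated == 0.0)            # no protein: free-ligand signal wells
titration = ~ligand_free & ~protein_free   # ordinary titration wells
```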

@jchodera
Member Author

jchodera commented Apr 19, 2017

I've implemented the scheme I suggested above for [L]=0. We autodetect when either Lstated or Pstated have zero concentration entries and deal with those appropriately.

I've also added a quickmodel run as a travis test.

This PR should now be complete.

[image: delg_bosutinib-ab-2017-04-18 2342]

@jchodera
Member Author

@sonyahanson : Review and merge when ready!

.travis.yml Outdated
@@ -30,6 +30,8 @@ script:
- conda install --yes --quiet nose nose-timer
# Test the package
- cd devtools && nosetests $PACKAGENAME --nocapture --verbosity=2 --with-timer -a '!slow' && cd ..
# Run quickmodel
- pushd . && cd examples/direct-fluorescence-assay && env PYTHONPATH="./" quickmodel --inputs 'inputs_p38_singlet' && popd
Collaborator


It's great that quickmodel has been added to travis. However, quickmodel runs the default number of MCMC steps, currently set as 20000 PyMC moves, which may take a while on travis. How about we augment quickmodel's argparser so that we can specify far fewer moves on travis? Passing the number of MCMC moves to quickmodel will make it easier to use as well.
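The suggested argparse extension might look like this (the option names and defaults are assumptions for illustration, not the actual quickmodel interface):

```python
import argparse

# Sketch: expose the MCMC move counts on quickmodel's command line so
# that CI can request a short run.
parser = argparse.ArgumentParser(description="quickmodel-style CLI sketch")
parser.add_argument("--inputs", required=True, help="name of inputs module")
parser.add_argument("--niter", type=int, default=20000, help="number of MCMC moves")
parser.add_argument("--nburn", type=int, default=0, help="burn-in moves to discard")
parser.add_argument("--nthin", type=int, default=50, help="thinning interval")

# Simulate a short travis invocation.
args = parser.parse_args(["--inputs", "inputs_p38_singlet", "--niter", "500"])
print(args.niter)
```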

Collaborator

@gregoryross gregoryross left a comment


This all looks good to me! My comment on quickmodel is minor, and could be added in another PR if we want to use these changes asap.

@sonyahanson
Contributor

I'm going to work through this now. One thing that pops out immediately: why were the defaults for run_mcmc changed from nthin=50, nburn=500, niter=1000 to nthin=50, nburn=0, niter=20000?

I agree with Greg that travis should be run with fewer iterations.

@jchodera
Member Author

One thing that pops out immediately: why were the defaults for run_mcmc changed from nthin=50, nburn=500, niter=1000 to nthin=50, nburn=0, niter=20000?

The output of quickmodel was absolute garbage otherwise. Runtime was ~1 min/dataset on my machine, and total processing time for the quickmodel example was < 10 min, which seems reasonable for travis.
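For reference, under PyMC-style sampling the number of retained posterior samples is roughly (niter - nburn) / nthin, so the change in defaults is a roughly 40x increase in retained samples:

```python
# Retained posterior samples under PyMC-style sampling settings.
def retained_samples(niter, nburn, nthin):
    return (niter - nburn) // nthin

old = retained_samples(niter=1000, nburn=500, nthin=50)   # old defaults: 10 samples
new = retained_samples(niter=20000, nburn=0, nthin=50)    # new defaults: 400 samples
```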

I'm not certain why travis isn't running, however. We might have to fix that in a different branch.

I like the idea of adding the command-line option to quickmodel, but think the new defaults are sensible. I'll add the command-line argument idea as an issue.

@sonyahanson
Contributor

I will be careful to use the same nthin, niter, and nburn to compare the master branch to this updated 'improved' branch, just to make sure the comparisons are fair.

@jchodera
Member Author

Looks like you're discarding the initial non-equilibrated DeltaG values, but not discarding the initial non-equilibrated traces when plotting:
https://github.com/choderalab/assaytools/blob/master/scripts/quickmodel.py#L129-L134

Can I fix that too?
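A sketch of the suggested fix, discarding the initial non-equilibrated portion of each trace before plotting just as it is discarded before summarizing DeltaG (the synthetic trace, array name, and discard count here are all assumptions for illustration):

```python
import numpy as np

np.random.seed(0)

# Synthetic trace: a drifting equilibration phase followed by samples
# fluctuating around the equilibrated value.
DeltaG_trace = np.concatenate([
    np.linspace(5.0, -10.0, 200),           # fake non-equilibrated portion
    np.random.normal(-10.0, 0.5, 800),      # fake equilibrated samples
])
ndiscard = 200  # assumed number of initial samples to drop

plot_trace = DeltaG_trace[ndiscard:]  # plot only the equilibrated part
```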

@sonyahanson
Contributor

Yes, that's correct, feel free to fix.

@sonyahanson
Contributor

Maybe include but colored differently?

@jchodera
Member Author

I will be careful to use the same nthin, niter, and nburn to compare the master branch to this updated 'improved' branch, just to make sure the comparisons are fair.

I totally understand. How about this: The current settings are configured to do the same amount of sampling. Maybe run the data and compare it to what you have printed out?

@sonyahanson
Contributor

on it

@jchodera
Member Author

Here's the Bosutinib-CD example from above:
[image: delg_bosutinib isomer-cd-2017-04-19 1345]

@jchodera
Member Author

The pre-equilibration traces are still shown, just very lightly shaded. They're mostly lying underneath the darker traces, but you can see them a little bit in the salmon traces here:
[image: delg_erlotinib-ef-2017-04-19 1352]

@jchodera
Member Author

I'm moving on to other projects now, so I'll let you folks take the PR over from here.

I can tackle the outlier detection in a separate PR when I have time to cycle back.

@sonyahanson
Contributor

yeah, I think this is fine for now, but don't think we should merge quite yet.

@jchodera
Member Author

I've just fixed a bug in the code where the same ligand concentration was being used for both the +protein and -protein rows. This prevented the true ligand concentrations from adjusting to deviations from the expected binding curve even when dispensing errors occurred.

I've also temporarily disabled the Metropolis step methods, and am testing how things work without them.
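The concentration bug fix can be sketched as giving each row its own true-concentration nuisance variables, rather than reusing one array for both rows (the variable names and the 10% dispensing error are assumptions for illustration):

```python
import numpy as np

# Sketch: model the *true* ligand concentration in the +protein and
# -protein rows as independent unknowns, instead of sharing a single
# array, so each row can absorb its own dispensing error.
rng = np.random.default_rng(0)
Lstated = np.array([1e-8, 1e-7, 1e-6])  # stated concentrations (M)
dispense_cv = 0.10                      # assumed 10% dispensing error

# Independent lognormal dispensing-error draws per row.
Ltrue_P = Lstated * rng.lognormal(0.0, dispense_cv, Lstated.shape)    # +protein row
Ltrue_noP = Lstated * rng.lognormal(0.0, dispense_cv, Lstated.shape)  # -protein row
```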

@sonyahanson
Contributor

So it seems like the key change was allowing the Metropolis tuning throughout, now that we've taken out the burn-in step.

But making this change improved our overlap between repeats from this:
[image: compare_delg_jdc_improve]
To this:
[image: comparing_src_erl_3iter_small]

Merge away. We can address further problems in a future PR.

@sonyahanson
Contributor

This change fcde1cf doesn't quite match the commit message; what was the motivation? I'm curious whether this also affected the change in our results...

@jchodera
Member Author

Whoops, there were two changes in rapid succession:

  • Tune throughout (this was the major remedy)
  • Also adjust the scale of the AdaptiveMetropolis method that makes joint proposals between F_PL and DeltaG. This was not being tuned before, but once tuning was working, the exact choice became less relevant. I changed it to 0.1 to match all the other Metropolis step methods.

@jchodera
Member Author

Can we merge this and forge ahead?

@jchodera
Member Author

Merging this so I can forge ahead.

@jchodera jchodera merged commit eded3f8 into master Apr 22, 2017