How to use these datasets? #3
Comments
I guess the first thing we should do is figure out how to port the scripts to pymbar 2.0. The easiest way may be for me to write a pymbar 1.0 compatibility object that exactly reproduces the API of pymbar 1.0 but calls pymbar 2.0 code under the hood.
For example, there's the issue of U_kln versus U_kn. It would take considerable time to rewrite all the scripts here to put the data into the new format, so a compatibility layer might be key.
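To make the U_kln/U_kn issue concrete, here is a minimal sketch of the kind of conversion a compatibility layer would need to do. The function name `kln_to_kn` is our own, not a pymbar API; it assumes the pymbar 1.0 convention that `u_kln[k, l, n]` is the reduced potential of sample n (drawn from state k) evaluated at state l, and that `N_k[k]` counts the samples from state k.

```python
# Sketch (not pymbar API): flatten a 1.0-style u_kln array of shape
# (K, K, N_max) into the u_kn layout of shape (K, N_total).
import numpy as np

def kln_to_kn(u_kln, N_k):
    """u_kln[k, l, n]: reduced potential of sample n (from state k)
    evaluated at state l.  Returns u_kn[l, n_total]."""
    K = u_kln.shape[0]
    N_total = int(np.sum(N_k))
    u_kn = np.zeros((K, N_total))
    start = 0
    for k in range(K):
        n = int(N_k[k])
        # Samples drawn from state k occupy columns start:start+n;
        # row l holds their energies evaluated at state l.
        u_kn[:, start:start + n] = u_kln[k, :, :n]
        start += n
    return u_kn
```

A compatibility object could call a helper like this once on input and delegate everything else to the new code.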
I also think we might want to consider looking for more simple test cases where there are unambiguous right answers, either analytical or numerical.
I would prefer our approach to be:
As a minimal alternative, we can just make sure the code runs on these datasets, but that is a very low bar.
Is @mrshirts subscribed here?
Yes
I agree on the synthetic datasets. Honestly, I'm just overwhelmed by the idea of us maintaining thousands of lines of user-contributed code as part of our testing protocol.
There may still be a few large datasets that we would like the code to work on, or at least give consistent answers on, such as the large trypsin datasets that Michael has generated. But this seems like a low-priority goal compared to testing systems with analytical results. I still need to code up some analytically tractable systems for binding affinity calculations. Those could be included in our tests as well if we feel we need more diversity than just harmonic oscillators.
John
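For the harmonic-oscillator direction, here is a hedged sketch of what such an analytical test could look like. It uses two 1-D oscillators with reduced energies u_i(x) = K_i x^2 / 2, for which the exact reduced free energy difference is df = 0.5 ln(K_2/K_1). For brevity it checks a simple Zwanzig (exponential-averaging) estimate rather than MBAR, but the assertion pattern against an exact answer is the same; all names and constants here are illustrative.

```python
# Sketch: validate a free energy estimate against an analytical result
# for two 1-D harmonic oscillators (beta = 1).
import numpy as np

rng = np.random.default_rng(0)
K1, K2 = 1.0, 4.0
n = 200_000

# Sample x ~ exp(-K1 x^2 / 2), i.e. Gaussian with variance 1/K1.
x = rng.normal(0.0, 1.0 / np.sqrt(K1), size=n)
du = 0.5 * (K2 - K1) * x**2              # u_2(x) - u_1(x)
df_est = -np.log(np.mean(np.exp(-du)))   # Zwanzig estimator
df_exact = 0.5 * np.log(K2 / K1)         # analytical answer: ln 2

assert abs(df_est - df_exact) < 0.05
```

A real test would swap the Zwanzig line for an MBAR call and tighten the tolerance to the estimator's reported uncertainty.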
Hi, all- Busy all day with classes and meetings!
I'm adding these datasets because
In all cases, there is a script that is currently working that can be run
I don't think we want or need to maintain these things, other than perhaps
Going back to a question that Kyle asked earlier; I suspect that in the
A = \sum_n exp(log W_n + log A_n), where W_n is the mixture distribution weight of sample n.
This would incur the cost of exponentials each time, but it's not an
It's possible that one could have some way to test which version would be
Free energies of unsampled states would be f_new = -log \sum_n exp(log W_n - u_new,n).
Where A_n has been transformed to always be greater than 1.
Note that if we keep a legacy routine (of any flavor) that does everything
On Tue, Nov 26, 2013 at 11:39 AM, John Chodera notifications@github.com wrote:
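The two log-space formulas above can be sketched as follows. This is a minimal, illustrative implementation, not pymbar code: it assumes `log_w_n` holds the log mixture weights, `a_n` the observable values A_n (shifted to be > 1 before taking logs, as suggested), and `u_new_n` the reduced energies of each sample at the unsampled state, all evaluated with the log-sum-exp trick for numerical stability.

```python
# Sketch of the log-space accumulation discussed above:
#   A     =  sum_n exp(log W_n + log A_n)
#   f_new = -log sum_n exp(log W_n - u_new,n)
import numpy as np

def logsumexp(v):
    # Stable log(sum(exp(v))): subtract the max before exponentiating.
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def expectation(log_w_n, a_n):
    # Shift A_n so every value exceeds 1, making log A_n well defined;
    # undo the shift after accumulating in log space.
    shift = 1.0 - np.min(a_n) if np.min(a_n) <= 1.0 else 0.0
    log_a = np.log(a_n + shift)
    return np.exp(logsumexp(log_w_n + log_a)) - shift

def f_unsampled(log_w_n, u_new_n):
    return -logsumexp(log_w_n - u_new_n)
```

As noted above, this pays for the exponentials on every call, so a legacy routine could cache intermediate log weights if that cost matters.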
So it seems like for most of these datasets there's no "right" answer, at least compared to the analytical test cases. That brings up the question of how we can use these tests in an automated test framework.
The second issue I'm seeing is that these tests essentially involve running Python scripts comprising ~1000 lines of IO, preprocessing, analysis, and output. Such scripts will not be easy to integrate into an automated test framework.
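One possible middle ground, sketched below, is a regression test: when no analytical answer exists, record a value from a trusted earlier run and assert that reruns reproduce it within combined statistical uncertainty. `run_dataset_analysis` and both reference numbers here are placeholders, not real pymbar-examples code.

```python
# Sketch: regression-style test for a dataset with no analytical answer.
import numpy as np

REFERENCE_DF = 1.234   # free energy from a trusted earlier run (placeholder)
REFERENCE_DDF = 0.05   # its reported uncertainty (placeholder)

def run_dataset_analysis():
    # Stand-in for the ~1000-line IO/preprocessing/analysis script;
    # a real test would import and call the script's entry point.
    return 1.230, 0.04

def test_regression():
    df, ddf = run_dataset_analysis()
    # Require agreement within 3 sigma of combined uncertainty,
    # not exact equality, since the scripts are stochastic.
    tol = 3.0 * np.hypot(ddf, REFERENCE_DDF)
    assert abs(df - REFERENCE_DF) < tol
```

This only checks consistency, not correctness, but it would at least catch silent behavior changes between pymbar versions.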