Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should equilibration iterations and effective production iterations be integers? #715

Open
IAlibay opened this issue Sep 9, 2023 · 2 comments
Milestone

Comments

@IAlibay
Copy link
Contributor

IAlibay commented Sep 9, 2023

In multistatesamplersanalyzer, the return of _get_equilibration_data can sometimes include a non integer n_uncorrelated_iterations.

In practice you wouldn't operate on fraction of an iteration, would it make sense for this value to be rounded down to the nearest value?

P.S. Whether or not it should be floor or ceil probably should be based on what pymbar.subsample_correlated_data does, from a quick look it's the former? https://github.com/choderalab/pymbar/blob/e5b7f120f74442658a03c02b1a1898b5c2fa913a/pymbar/timeseries.py#L665

@jchodera
Copy link
Member

The _get_equilibration_data() method is here.

I'm unclear on why this is a private method that is never used internally---something must have gone wrong in our API definition process. I know the analysis API is highly underdeveloped---we would love some feedback on what would be most useful from a public API.

Since I can't tell where we actually use this in the public API, I can't quite tell what it should be doing.
@ijpulidos or @IAlibay : Can you point me to code that uses this method? A good public API should provide whatever makes the downstream code that uses it simplest.

@jchodera
Copy link
Member

The more I look at this method, the less I'm sure of what it does or why.
If the internal fields are set (how do they get set?), then they are returned.
Otherwise, a completely separate calculation is performed that uses multistate.utils.get_equilibration_data_per_sample rather than the pymbar.timeseries automated equilibration detection, with the cryptic title "Compute the correlation time and n_effective per sample with tuning to how you want your data formatted". I don't understand what the "per sample" means here.

For now, I'd say:

  • Yes, we should probably make n_uncorrelated_iterations return an int that reprsents the floor and update the documentation
  • We really need to deprecate this analysis API because it is extremely confusing and probably doesn't do what we want. If we can work toward a better public API that exposes what we need, that would be a great start.

@mikemhenry mikemhenry added this to the 0.23.2 milestone Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants