Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing Fire minimization with LocalEnergyMinimizer #672

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ijpulidos
Copy link
Contributor

Description

Replaces FireMinimizer with LocalEnergyMinimizer to do the replica minimization.

This should enhance the stability of minimization. Supersedes #557

Resolves #668

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

  • Ready to go

Changelog message

L-BFGS based `openmm.LocalEnergyMinimizer` is now used instead of `FireMinimizer` for energy minimization in replicas.

@ijpulidos
Copy link
Contributor Author

I'll be also running a pass of the perses tyk2 benchmark just to check this doesn't change the results from it.

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good except for the question of whether we should be del context at the end of this---see comment.

# TODO if energy > 0, use slower openmm minimizer

# Clean up the integrator
del context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijpulidos : I think we might need to remove the del context if we're using self.energy_context_cache.get_context() since this would delete a Context object under the management of the context cache. I can't recall what the correct behavior is here---I think it was to leave the Context under management.

This was perhaps different for the FIRE minimizer since a different integrator would lead to a different Context we would not re-use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point, I think we should be okay by removing the del context here.

@ijpulidos
Copy link
Contributor Author

I found out that with these changes the minimization seems to be significantly slower and is not using the 100% (nor close) of the GPU. For the tyk2 benchmark, the differences in times when minimizing complex phase replicas are as follows:

  • Using FIREMinimizationIntegrator: 32.290s
  • Using LocalEnergyMinimizer: 500.754s

I could reproduce this both locally with my workstation (that's where I can track the GPU usage) and on lilac (HPC). Is there something special in using our own custom integrators (such as the FireMinimizationIntegrator or the GradientDescentMinimizationIntegrator used in #557 ) that may explain these differences compared to just using openmm.LocalEnergyMinimizer?

@jchodera Maybe you have some suggestions on why this is the case. Thanks!

@ijpulidos
Copy link
Contributor Author

In case it helps, the LocalEnergyMinimizer ends up using the GeodesicBAOABIntegrator from OpenMM in the end.

@ijpulidos ijpulidos changed the title Replacing FireMinimizer with LocalEnergyMinimizer Replacing Fire minimization with LocalEnergyMinimizer Mar 29, 2023
@ijpulidos
Copy link
Contributor Author

FYI, I'm having a bit of trouble understanding the tolerance units here, we are passing Quantity(value=1.0, unit=kilojoule/(nanometer*mole)) as tolerance, but as far as I understand, tolerance for energy should just be kJ/mol, why does it have nanometers in the denominator? How does this translate between the FireMinimizationIntegrator and LocalEnergyMinimizer? Since we are passing the same value for both.

@jchodera
Copy link
Member

The tolerance for LocalEnergyMinimizer is a force tolerance, while it is an energy tolerance for FIRE. We probably need to relax the force tolerance considerably so that we don't max out the number of iterations. Can you try timing with something much less stringent, like 10 or 100 kJ/mol/nm?

@mikemhenry
Copy link
Contributor

Another thing to consider is, how often do we run this minimizer? It is just at the start of the simulation, right? So adding 9 minutes to a 12-hour-long simulation isn't too bad if this will be more correct and robust--but we should first explore if any of the slowdown is something we can improve (ex John's comment #672 (comment))

@jchodera
Copy link
Member

It's just at the beginning of the calculation, but (1) it makes for a terrible user experience if startup times are really long, (2) it makes testing really slow, and (3) it shouldn't take 9 minutes to get to the point where we can simulate something stably.

@ijpulidos
Copy link
Contributor Author

Timings are as follows (initial and final energies are just for the first replica):

Minimizer tolerance time minimizing all reps. initial E kT Final E. kT
FIRE 1 kJ/mol/nm 32.290s -306594.761 -309780.056
LocalEnergyMinimizer 1 kj/mol 489.617s -306554.708 -326469.138
LocalEnergyMinimizer 10 kj/mol 126.885s -313782.497 -330901.743
LocalEnergyMinimizer 100 kj/mol 3.570s -313902.645 -314011.075

@jchodera
Copy link
Member

jchodera commented Apr 3, 2023

Hm, this is more difficult since we're not starting from the same initial energy. Any idea what is causing the non-determinisim?

I think we want to compare the quick (<60 second) methods: FIRE and LocalEnergyMinimizer with a loose tolerance (e.g. 50 kJ/mol or so).

I'll look at whether I can fix the FIRE minimizer since it is really fast.

@jchodera
Copy link
Member

jchodera commented Apr 3, 2023

@ijpulidos : Were you able to sort out whether we should or should not delete the Context at the end of minimization?

@ijpulidos
Copy link
Contributor Author

@ijpulidos : Were you able to sort out whether we should or should not delete the Context at the end of minimization?

Deleting or not deleting does not seem to affect the outcome in terms of energies. And with regards to times I only see a small discrepancy. With deleting: 500s -- Without deleting: 520s. This could be just due to the load on my local workstation at the time, I don't think the difference is statistically significant.

@ijpulidos
Copy link
Contributor Author

Hm, this is more difficult since we're not starting from the same initial energy. Any idea what is causing the non-determinisim?

Yeah, i also noticed that, the only thing I can tell so far is that the first two (which start from similar energies) were run on the same day and around the same time of the day. Whereas the other two were run at a different day but around the same time. Maybe some correlation with the seed? (This shouldn't happen though, right?)

@jchodera
Copy link
Member

jchodera commented Apr 4, 2023

Deleting or not deleting does not seem to affect the outcome in terms of energies.

The deleting of Contexts is a different issue:

  • if we do not delete Context objects we will never re-use, we consume GPU resources and can end up with new simulations accidentally being run on the CPU
  • if we do delete a Context under the control of the context cache, we could potentially corrupt the ContextCache

We may want to make sure to create a new Context and clean it up, rather than use the ContextCache.

@ijpulidos ijpulidos modified the milestones: 0.22.0, 0.22.1 Apr 4, 2023
@jfennick
Copy link
Contributor

jfennick commented Aug 9, 2023

This PR only 'supercedes' my PR #557 coincidentally, because L-BFGS does not (currently) modify the unit cell vectors, even if pressure is enabled. I still maintain that the proper fix is to simply temporarily disable the pressure during initial minimization, as I have done here f8dd5c7 That said, it has been >1 year without any movement on this issue and I have moved on to other projects. If you want to merge any of my PRs, now is the time; I will be deleting my fork soon.

@IAlibay
Copy link
Contributor

IAlibay commented Nov 30, 2023

@ijpulidos I think this switch would still be really useful for us. Is there anything we can help out with to get this to a merged state?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use LocalEnergyMinimizer instead of FireMinimizer
5 participants