Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for fix_ipi #4157

Merged
merged 11 commits into from
May 28, 2024

Conversation

ceriottm
Copy link
Collaborator

@ceriottm ceriottm commented May 2, 2024

Summary

In the process of benchmarking and speeding up the i-pi engine, we noticed a couple of performance improvements that need to be applied to the LAMMPS client routines, that are in fix_ipi.
These improvements will also benefit other engines that use the same communication protocol, such as ASE.
The performance issues and the fixes are discussed in #4155, but in short they are due to quirks of TCP/IP that slow down communication of small messages, and to issues in ensuring consistent folding of the atom positions on the client/server sides, that lead to unnecessarily frequent updates of the neighbor list.

Related Issue(s)

This closes #4155, implementing the second solution proposed in there, which after some thinking seems to be the cleanest. The only change outside fix_ipi.cpp is adding a getter to access the positions at the last NL update.

Author(s)

Michele Ceriotti (EPFL) michele.ceriotti@gmail.com
Davide Tisi (EPFL) davide.tisi@epfl.ch

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

This should change nothing outside the i-PI communication. For i-PI simulations using LAMMPS as the backend, the fact this applies unwrapping to the trajectory introduces small numerical noise, but trajectories should be unchanged for a few hundred steps - and of course statistically equivalent throughout.

Implementation Notes

Every time new configurations are received from i-PI, they are folded back to the image that is closest to the positions at the time the neighbor list was updated. This effectively applies a minimum image convention to the calculation of the atom drift, and ensures that neighbor lists are updated as rarely as possible.

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Licensing information is complete
  • Corresponding author information is complete
  • The source code follows the LAMMPS formatting guidelines
  • Suitable new documentation files and/or updates to the existing docs are included
  • The added/updated documentation is integrated and tested with the documentation build system
  • The feature has been verified to work with the conventional build system
  • The feature has been verified to work with the CMake based build system
  • A package specific README file has been included or updated

Further Information, Files, and Links

See i-pi for a discussion of the background and what fix_ipi does.

@ceriottm ceriottm requested a review from sjplimp as a code owner May 2, 2024 06:36
@ceriottm
Copy link
Collaborator Author

ceriottm commented May 2, 2024

I don't know what's going on with the MacOS unit tests, looks like a CI issue perhaps?

@stanmoore1
Copy link
Contributor

I don't know what's going on with the MacOS unit tests, looks like a CI issue perhaps?

@ceriottm Yes, can you please pull the latest upstream develop branch here, which should fix the issue? I don't seem to have permissions to modify your branch.

@stanmoore1 stanmoore1 requested review from akohlmey, athomps and sjplimp and removed request for sjplimp May 13, 2024 18:14
@stanmoore1 stanmoore1 self-assigned this May 13, 2024
@stanmoore1
Copy link
Contributor

stanmoore1 commented May 13, 2024

@ceriottm there are also some trailing whitespaces in fix_ipi.cpp, you can either remove them manually, or use make fix-whitespace, see https://docs.lammps.org/Build_development.html.

@stanmoore1 stanmoore1 assigned ceriottm and unassigned stanmoore1 May 13, 2024
@akohlmey akohlmey requested review from jtclemm and removed request for akohlmey May 13, 2024 19:59
@ceriottm
Copy link
Collaborator Author

Thanks. I have rebased on master, and fixed (hopefully) the whitespace issues.

x[i][2] += xhold[i][2];
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just be missing something, but I don't understand how this scheme works for a general box. Say the box doesn't contain the origin $\vec{0}$. If a particle doesn't move, such that $\vec{x} = \vec{x}_{hold}$, then domain->pbc() would shift $(\vec{x} - \vec{x}_{hold}) = \vec{0}$ across periodic images back into the box. Adding $\vec{x}_{hold}$ again would then shift the particle back out of the box and trigger an unnecessary neighbor rebuild since now $\vec{x} \ne \vec{x}_{hold}$.

If I understand the intent correctly, what about alternatively shifting xhold to the nearest periodic image, e.g. something like

if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
if (domain->triclinic) domain->lamda2x(atom->nlocal);
if (neigh->ncalls != 0) {
  for (int i = 0; i < nlocal; i++) {
    delx = xhold[i][0] - x[i][0];
    ...
    domain->minimum_image(delx, dely, delz);
    xhold[i][0] = x[i][0] + delx;
    ...
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works because xhold is shifted back to the 000 replica when the neighbor list is updated. So this code makes sure the displacement is has a minimum image convention applied when it's computed in the neighbor list code, without touching that part of the code that is too core to be messed with, IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So what happens if $\vec{x} = \vec{x}_{hold}$ (both in the same image) but the box doesn't contain the origin? Does the neighbor list get rebuilt?

Since domain->pbc() acts on position vectors and its behavior depends on the location of the boxlo, I would not assume feeding it a displacement vector (invariant to origin translations) would produce the desired behavior for all box geometries and displacements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agree this is cleaner. I have reformulated the code in fix_ipi.cpp - only difference is I am modifying the coordinates rather than xhold, again following the principle of minimal messing with the core classes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, happy to help. Hopefully that works to improve performance!

@stanmoore1
Copy link
Contributor

Waiting for review by @sjplimp

@stanmoore1 stanmoore1 assigned sjplimp and unassigned ceriottm May 14, 2024
src/neighbor.cpp Outdated Show resolved Hide resolved
src/neighbor.cpp Outdated
{
// Returns the pointer containing the last positions stored by the NL builder,
// checking it has actually been initialized
if (maxhold == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on this error check. Does fix ipi know or check whether the neighbor class is using the xhold array? That is a neighbor setting which can be turned on/off.

Also, it might be possible, now or in the future, that xhold is NULL and maxhold is 0 if the proc has no atoms.

Seems like it might be better to just return xhold and have the caller (iPI) decide what to do with it ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that is fine with me. I wanted to have kind of a safety check given we're accessing an array that might or might not be initialized but I can def let the caller deal with the checking (and maybe just skip the unwrapping of the coordinates if xhold is not initialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify a bit more. If the user sets neigh_modify check no (which used to be the default but is not now), then LAMMPS uses other criteria to determine when to re-neighbor, and the xhold array is not allocated or used.

Also, if xhold is used, then on each reneighboring it is assigned the current atom coords, which will be guaranteed to be inside the simulation box, because PBC are applied first. On every subsequent step (until the next reneighboring), all atoms coords will be "close" to xhold, even if they go slightly outside the box, because PBC are not applied. So unless IPI is applying PBC on those in-between steps to its copy of the LAMMPS atom coords (e.g. to ensure its atom coords are inside the box), I'm not clear why you need to apply minimum-image adjustments to your del xyz.

Also xhold can be reallocated by any proc on any reneighboring step (but not otherwise during a run), so you should invoke get_xhold() at least that often. The public variable Neighbor::ago stores how many elapsed timesteps since the last reneighbor, so some classes use that to check if they need to do something. Note that reneighboring happens in between your fix ipi initial and final integrate methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see that you call domain->pbc() every timestep in fix ipi initial_integrate()
The comment says it is in case you also need to invoke irregular(), which is checked for later.

I don't understand the motivation here. However, if xhold does not exist, you cannot "undo" what
the pbc() method did, and I don't think you will successfully restore the atom coord values LAMMPS is expecting. E.g. if a pair style is used later in the same timestep, its pairwise atom distances will be wrong.

Also, if irregular is invoked then it seems like you should be forcing re-neighboring to occur on this timestep.
B/c current neighbor lists are now invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the detailed summary of the neighboring process. It's very useful because most of the use cases we're aware of might use a small slice of the functionalities of LAMMPS .
To give you a similarly concise view of what happens on the i-PI side, the tricky bit is that (1) there is no guarantee that i-PI will pass coordinates and cell that follow a continuous trajectory, e.g. there might be replica exchange going on and then LAMMPS will be called repeatedly with completely unrelated structures; (2) the data flow is minimal and e.g. i-PI has no way of knowing that LAMMPS folds coordinates when setting xhold.
So the problem we had was that i-PI passes un-folded coordinates. LAMMPS applies PBC and sets xhold. i-PI then runs MD and sends again un-folded coordinates; if we don't fold them, they'll be far off the xhold reference, and if we fold there's no guarantee they'll be folded in the same way. Either way, we essentially trigger a NL rebuild at each step. This PR is trying (and in all our tests succeeds) to avoid this rebuild.
In the current implementation, at each step when LAMMPS gets new positions, if xhold is not set the atoms will just be folded back into the cell at the beginning of the step, otherwise they'll be "unwrapped" relative to xhold. I understand there may be reneighboring settings where this might fail, but I think that also the previous implementation (which was just folding regardless) would fail; if such a case occurs, we may have to look deeper into this; I think the previous implementation (including the use if irregular() ) was done with a lot of help from @akohlmey so I trust it much more than if it were of my own making ^_^

this also allows it to fall-back on do-nothing rather than crash
@sjplimp
Copy link
Contributor

sjplimp commented May 27, 2024

@ceriottm Hi Michele - thanks for the additional context. If you are convinced this current PR is working as you expect for your typical i-PI use cases, then I think we should go ahead and merge it. So that it make it into the upcoming stable release.

One final suggestion - If I understand, you are now using this fix for cases beyond the PIMD usage described in the fix ipi doc page. So I suggest you broaden the doc page to explain it can be used for other kinds of i-PI-based models. If there are examples of this already in the LAMMPS distro (which includes tools/ipi) or on the i-PI doc pages, that would be useful for users to be aware of.

After the stable release, I suggest we re-visit how this fix ipi works. I think there are still a few issues that could be more robust with respect to LAMMPS options. Possibly that would enable other use cases of LAMMPS via i-PI. For example, if you just want LAMMPS to compute and return forces for a series of unrelated configurations, I don't think you want to be doing that via intitial_integrate() and final_integrate().

@ceriottm
Copy link
Collaborator Author

@ceriottm Hi Michele - thanks for the additional context. If you are convinced this current PR is working as you expect for your typical i-PI use cases, then I think we should go ahead and merge it. So that it make it into the upcoming stable release.

Thanks, I agree this is the best course of action for now - this fixes a clear problem, and doesn't affect any of the examples I've seen and tried - that are quite diverse.

One final suggestion - If I understand, you are now using this fix for cases beyond the PIMD usage described in the fix ipi doc page. So I suggest you broaden the doc page to explain it can be used for other kinds of i-PI-based models. If there are examples of this already in the LAMMPS distro (which includes tools/ipi) or on the i-PI doc pages, that would be useful for users to be aware of.

This is true. I will edit the doc page, and actually I suggest it might be a good idea to remove i-PI from tools - or at least update it: we are about to release 3.0 and I had completely forgotten we had embedded 1.0 with LAMMPS. It is now also possible to get i-PI from pip, so it doesn't seem a good idea to bundle it with LAMMPS. Let me know what you think and whether you'd like to do remove or update in a separate PR - seems cleaner.

After the stable release, I suggest we re-visit how this fix ipi works. I think there are still a few issues that could be more robust with respect to LAMMPS options. Possibly that would enable other use cases of LAMMPS via i-PI. For example, if you just want LAMMPS to compute and return forces for a series of unrelated configurations, I don't think you want to be doing that via intitial_integrate() and final_integrate().

I agree, absolutely. One of our plans for the future is to also include MPI-based communicators. LAMMPS is one of our main clients and it'd be great to coordinate.

@akohlmey
Copy link
Member

@ceriottm please go to src/ and type make fix-whitespace and commit the resulting changes.

I suggest it might be a good idea to remove i-PI from tools

A proposal for that is in PR 4176

@sjplimp
Copy link
Contributor

sjplimp commented May 28, 2024

Thanks @ceriottm - doc page changes look good

@stanmoore1 stanmoore1 assigned stanmoore1 and unassigned sjplimp May 28, 2024
@stanmoore1 stanmoore1 merged commit dedcfa1 into lammps:develop May 28, 2024
6 checks passed
@Luthaf Luthaf deleted the bugfix/ipi-neigh branch May 29, 2024 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Improve efficiency of fix_ipi
6 participants