DUCC is not parallelized #485
Comments
Unfortunately, the DUCC procedure was never parallelized. For various reasons, the initial implementation and testing took precedence over writing parallel code, so it should only be run on one thread. I just pushed an updated DUCC code that removes unnecessary operations to improve performance and also includes several alternative truncation schemes for the similarity-transformed Hamiltonian. Once it is installed, you will need to add two keywords to your input:
Thanks a lot for pushing the updated code and for the references.
On my test case, the new code is 2x faster (when using the same model, i.e. model 3).
Describe the bug
Hamiltonian downfolding in DUCC (nwchem/src/tce/ducc/ducc.F) runs on a single thread, regardless of the number of OMP threads or MPI processes. Calculation of the CCSD amplitudes is (obviously) parallelized, but once that is done (i.e. when the line ' From DUCC CCSD corr. ene.' is printed), the rest of the code is serial. As a consequence, downfolding larger-than-trivial Hamiltonians takes ages, even if the resulting Hamiltonian (to be run on a quantum device/emulator) is small.
Describe settings used
NWChem 7.0.2 from conda-forge, on macOS (Darwin, py39hd99b644_3) and also on a Linux Azure VM (py38h1985094_3).
To Reproduce
Run nwchem on the following input file with more than one OMP thread. Once the line 'From DUCC CCSD...' appears in the output, the code uses only one thread.
Example input file:
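The serial phase can be observed from the command line. The sketch below is an assumption-laden illustration, not part of the original report: it assumes an `nwchem` binary on PATH, uses `ducc_test.nw` as a placeholder name for the input file, and uses Linux batch-mode `top` to show per-thread CPU usage.

```shell
# Sketch for observing the serial downfolding phase (hypothetical file names).
export OMP_NUM_THREADS=4            # request several OpenMP threads
if command -v nwchem >/dev/null 2>&1; then
  nwchem ducc_test.nw > ducc_test.out 2>&1 &
  NWPID=$!
  # Wait until the CCSD step has finished and downfolding has started.
  until grep -q "From DUCC CCSD" ducc_test.out 2>/dev/null; do sleep 5; done
  # Batch-mode top with per-thread view (-H): only one thread stays busy.
  top -b -n 1 -H -p "$NWPID"
else
  echo "nwchem not on PATH (install it, e.g. from conda-forge, to reproduce)"
fi
```

On macOS, replace the `top` call with `top -pid "$NWPID"`; the symptom is the same, a single thread at roughly 100% CPU while the others idle.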