DUCC is not parallelized #485

Open
mkrompiec opened this issue Dec 8, 2021 · 3 comments

mkrompiec commented Dec 8, 2021

Describe the bug
Hamiltonian downfolding in DUCC (nwchem/src/tce/ducc/ducc.F) runs on one thread, regardless of the number of OMP threads or MPI processes. The calculation of the CCSD amplitudes is (obviously) parallelized, but once it finishes (i.e. when the line ' From DUCC CCSD corr. ene.' is printed), the rest of the code is serial. As a consequence, downfolding larger-than-trivial Hamiltonians takes ages, even if the resulting Hamiltonian (to be run on a quantum device/emulator) is small.

Describe settings used
NWChem 7.0.2 from conda-forge, on macOS Darwin (py39hd99b644_3) and also on a Linux Azure VM (py38h1985094_3).

To Reproduce
Run nwchem with the following input file with more than one OMP thread. Once the line 'From DUCC CCSD...' appears in the output, the code uses only one thread.

Example input file:

start mol
memory stack 2000 mb heap 8000 mb global 10000 mb verify
print high

geometry units angstroms
symmetry C1
  Ar 0.0 0.0 0.0
  Ar 3.1 0.0 0.0
end

basis spherical
 * library cc-pVDZ
end

scf
  singlet
  rhf
  thresh 1e-10
end

tce
  2eorb
  2emet 13
  ccsd
  thresh 1.0d-8
  maxiter 150
end

set tce:qducc T
set tce:nactv 4

task tce energy

npbauman (Collaborator) commented Dec 8, 2021

Unfortunately, the DUCC procedure was never parallelized. For various reasons, the initial implementation and testing took precedence over parallelizing the code, so it should only be run on 1 thread. I just pushed an updated DUCC code that removes unnecessary operations to improve performance and also includes various alternative truncation schemes for the similarity-transformed Hamiltonian. Once it is installed, you will need to add two keywords to your input:
set tce:nonhf F
set tce:ducc_model 3
The "nonhf" keyword should be "F" if an RHF reference is used and "T" if a CAS or DFT reference is used instead. Six models for the truncation of the similarity-transformed Hamiltonian are implemented, numbered 1-6. They correspond to approximations A(2)-A(7), respectively, which are detailed in https://arxiv.org/abs/2110.12077. Model 3 corresponds to the original/earliest implementation described in https://doi.org/10.1063/1.5094643 and https://doi.org/10.1063/1.5128103. Model 2 is an aggressive truncation and is not recommended for use.
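
To make the keyword rules above concrete, here is a minimal Python sketch that builds the two `set` lines and maps a model number to its approximation label. The helper names `ducc_approximation` and `ducc_directives` are hypothetical and not part of NWChem; the only facts taken from this thread are the two keyword names, the F/T rule for "nonhf", and the 1-6 to A(2)-A(7) correspondence.

```python
# Hypothetical helpers (not part of NWChem). They only encode the rules
# stated in this comment: nonhf is F for RHF and T for CAS/DFT references,
# and models 1-6 correspond to approximations A(2)-A(7).


def ducc_approximation(model):
    """Map a tce:ducc_model value (1-6) to its approximation label."""
    if not isinstance(model, int) or not 1 <= model <= 6:
        raise ValueError("ducc_model must be an integer from 1 to 6")
    return f"A({model + 1})"


def ducc_directives(model, rhf_reference=True):
    """Return the two input lines for the given model and reference type."""
    ducc_approximation(model)  # reuse the range check above
    nonhf = "F" if rhf_reference else "T"
    return [f"set tce:nonhf {nonhf}", f"set tce:ducc_model {model}"]


if __name__ == "__main__":
    # Lines to paste into the input deck for an RHF reference and model 3.
    print("\n".join(ducc_directives(3)))
    print("Approximation:", ducc_approximation(3))
```

In the example input from this issue, the generated lines would presumably sit next to the existing `set tce:qducc T` and `set tce:nactv 4` lines, before `task tce energy`; that placement is an assumption based on where the other `set` directives appear.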

mkrompiec (Author) commented

Thanks a lot for pushing the updated code and for the references.

mkrompiec (Author) commented

On my test case, the new code is 2x faster (when using the same model, i.e. model 3).
