DUCC is not parallelized #485
Comments
Unfortunately, the DUCC procedure was never parallelized. For various reasons, the initial implementation and testing took precedence over writing parallel code, so it should only be run on one thread. I just pushed an updated DUCC code that removes unnecessary operations to improve performance and also includes several alternative truncation schemes for the similarity-transformed Hamiltonian. Once it is installed, you will need to add two keywords to your input:
Thanks a lot for pushing the updated code and for the references.
On my test case, the new code is 2x faster (when using the same model, i.e. model 3).
Describe the bug
Hamiltonian downfolding in DUCC (nwchem/src/tce/ducc/ducc.F) runs on a single thread, regardless of the number of OMP threads or MPI processes. Calculation of the CCSD amplitudes is (obviously) parallelized, but once that is done (i.e. when the line ' From DUCC CCSD corr. ene.' is printed), the rest of the code is serial. As a consequence, downfolding larger-than-trivial Hamiltonians takes ages, even if the resulting Hamiltonian (to be run on a quantum device/emulator) is small.
Describe settings used
NWChem 7.0.2 from conda-forge, on macOS (Darwin, py39hd99b644_3) and also on a Linux Azure VM (py38h1985094_3).
To Reproduce
Run nwchem on the following input file with more than one OMP thread. Once the line 'From DUCC CCSD...' appears in the output, the code uses only one thread.
Example input file:
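The serial phase can be observed from the command line. The sketch below is an assumption-laden illustration, not part of the original report: it assumes an `nwchem` binary on PATH, uses `ducc_test.nw` as a placeholder name for the input file, and uses Linux batch-mode `top` to show per-thread CPU usage.

```shell
# Sketch for observing the serial downfolding phase (hypothetical file names).
export OMP_NUM_THREADS=4            # request several OpenMP threads
if command -v nwchem >/dev/null 2>&1; then
  nwchem ducc_test.nw > ducc_test.out 2>&1 &
  NWPID=$!
  # Wait until the CCSD step has finished and downfolding has started.
  until grep -q "From DUCC CCSD" ducc_test.out 2>/dev/null; do sleep 5; done
  # Batch-mode top with per-thread view (-H): only one thread stays busy.
  top -b -n 1 -H -p "$NWPID"
else
  echo "nwchem not on PATH (install it, e.g. from conda-forge, to reproduce)"
fi
```

On macOS, replace the `top` call with `top -pid "$NWPID"`; the symptom is the same, a single thread at roughly 100% CPU while the others idle.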