Frequency Calculations Are Slow or Time Consuming #3125

Open
Sulstice opened this issue Jan 23, 2024 · 4 comments

Comments

@Sulstice

Sulstice commented Jan 23, 2024

Hi,

Goal

My goal is to run thermodynamic analysis on large molecules with the psi4 module. On bigger molecules, however, the frequency calculations become exceedingly slow, and I wonder what the reason is and whether something is wrong with my setup.

Question 1

The molecule I am working with is private but I have provided a z-matrix of methanol as a test:

methanol = """\
O11
H11  O11  0.9316
C11  O11  1.4349  H11  107.5890
H12  C11  1.1029  O11  111.8699  H11    0.0000
H13  C11  1.1029  O11  111.8699  H11  122.9683
H14  C11  1.1029  O11  111.8699  H11 -118.5158
"""

Picture:

[Image: Screenshot 2024-01-23 at 4 48 12 PM]

Here's the full script:

import psi4
import numpy as np

psi4.set_options({
  'scf_type': 'df',
  'g_convergence': 'gau_loose',
  'freeze_core': 'true',
  'reference': 'rhf',
  'save_jk': True,
  'geom_maxiter': 50,
})

psi4.set_output_file('free_energy_run.out')
psi4.set_num_threads(8)
psi4.set_memory('8 GB')

universe = psi4.geometry(methanol)
universe.update_geometry()
universe.print_in_input_format()

energy, wave_function = psi4.freq(
        'hf/6-31G*',
        return_wfn=True,
        molecule=universe,
        dertype='gradient'
)

This calculation finishes quickly for methanol, but for a system of about 50 atoms, which needs 193 displacements, it takes more than 24 hours. I was wondering whether there is a convergence problem somewhere, or whether those calculations simply take a really long time.

Analysis

I looked at the timing of the gradient calculation:

Module time:
        user time   =       0.37 seconds =       0.01 minutes
        system time =       0.02 seconds =       0.00 minutes
        total time  =          0 seconds =       0.00 minutes

It's pretty fast for a small system. For my system, however, it's pretty slow per iteration:

Module time:
	user time   =     738.22 seconds =      12.30 minutes
	system time =      17.29 seconds =       0.29 minutes
	total time  =         99 seconds =       1.65 minutes

This is probably why it's taking so long. Any thoughts on the cause, other than the system being bigger?
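
As a rough sanity check on the timings quoted above, the sketch below multiplies the reported wall time of one gradient module (99 s) by the 193 displacements, and compares user time with wall time to gauge how busy the 8 threads are. It assumes every displacement costs about the same as the one quoted and ignores the SCF time spent before each gradient, so it is only a lower bound. The function names are illustrative and not part of Psi4.

```python
def estimate_total_hours(wall_seconds_per_step: float, n_displacements: int) -> float:
    """Wall-clock hours if every displacement costs the same as the sampled one."""
    return wall_seconds_per_step * n_displacements / 3600.0


def effective_threads(user_seconds: float, wall_seconds: float) -> float:
    """Ratio of CPU time to wall time, i.e. the average number of busy threads."""
    return user_seconds / wall_seconds


# Numbers copied from the "Module time" block of the large system.
hours = estimate_total_hours(wall_seconds_per_step=99, n_displacements=193)
busy = effective_threads(user_seconds=738.22, wall_seconds=99)

print(f"gradient modules alone: about {hours:.1f} h")
print(f"average busy threads:   about {busy:.1f} of 8")
```

Under these assumptions the gradient modules alone account for roughly 5 hours, and a user/wall ratio near 7.5 suggests the 8 threads are mostly busy. The remainder of a run longer than 24 hours would then have to come from the SCF steps or other overhead that this timing block does not show.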

@susilehtola
Member

Why dertype='gradient'? If I am not mistaken, Psi4 does have analytic Hessians for Hartree-Fock.

The cost of quantum calculations increases non-linearly with the size of the system. It may well be that you are hitting the asymptotic scaling wall.
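
To make the scaling point concrete, here is a back-of-the-envelope sketch. It assumes that the cost of one gradient grows as a power of the atom count (the exponent 3 is a placeholder, not a measured value for Psi4) and that the number of finite-difference displacements grows roughly linearly with the atom count. Both assumptions ignore symmetry, basis-set details, and integral screening, and the function name is hypothetical.

```python
def relative_findif_cost(n_small: int, n_large: int, gradient_exponent: float = 3.0) -> float:
    """Estimate how much more a finite-difference frequency job costs
    for n_large atoms than for n_small atoms.

    Assumed model: cost = (number of displacements) * (cost of one gradient),
    with displacements ~ N and one gradient ~ N**gradient_exponent.
    """
    ratio = n_large / n_small
    per_gradient = ratio ** gradient_exponent  # each gradient gets more expensive
    n_displacements = ratio                    # and more gradients are needed
    return per_gradient * n_displacements


# Methanol (6 atoms) versus a system of about 50 atoms.
factor = relative_findif_cost(6, 50)
print(f"estimated cost ratio: about {factor:.0f}x")
```

With these placeholder exponents the larger job comes out several thousand times more expensive than the methanol test, which is the kind of growth that turns seconds into many hours.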

@Sulstice
Author

Sulstice commented Jan 24, 2024

That was me playing around, trying to figure out when to use the different dertypes. The level of theory I will eventually use is probably wB97X-D DFT, but I started with Hartree-Fock to check that the code returns something.

I've always used the default when calculating single-point energy scans, so which dertype applies to which level of theory is a little lost on me.

dertype='energy'
dertype='gradient'

How would I get around this issue? The options I have in mind:

Option 1

Give it more juice (like CPU). If I have nodes on a cluster, how do I distribute the job between them?

Option 2

Play around with the option parameters and maybe reuse the orbitals from the previous geometry? Is that faster?

https://psicode.org/psi4manual/master/autodir_options_c/scf__guess.html

Should I be changing the guess parameter?

Update 1

I was playing around a bit more:

Module time:
	user time   =     221.38 seconds =       3.69 minutes
	system time =       8.88 seconds =       0.15 minutes
	total time  =         38 seconds =       0.63 minutes

The time decreased when I set 'g_convergence' to 'gau_loose' (I think it was gau_tight before). dertype is still gradient.

@loriab
Member

loriab commented Jan 25, 2024

Be aware that analytic Hessians are available only for Hartree–Fock (and a few DFT functionals that no one uses). So if wB97X-D is the target, it is probably best to prototype with freq(..., dertype='gradient') for consistency (as you were already doing).

For all those displacements, QCFractal is the proposed way to run through them in parallel. You can get an idea of how it works with a "snowflake" calc that just uses all the threads on a single node (and doesn't require database storage setup). Conda-wise, you'd need to conda install qcfractal postgresql -c conda-forge. An example is https://github.com/psi4/psi4/blob/master/tests/ddd-deriv/input.dat#L40-L47 .

Snowflake is a lightweight single-node route. The full QCFractal approach is backed by a database (yours; not MolSSI's) and handles distributing gradient jobs through your cluster's queue. It takes a little more setup.
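
A minimal sketch of the dertype advice in this comment, assuming a molecule object like the `universe` from the original script and a `wb97x-d/6-31G*` model chemistry chosen only for illustration. The `pick_dertype` helper is hypothetical (not a Psi4 function) and simplifies the rule to "analytic Hessian for HF, finite differences of gradients otherwise". Psi4 is imported lazily so the helper can be used on its own.

```python
def pick_dertype(model: str) -> str:
    """Choose a derivative level from a 'method/basis' string.

    Simplified assumption: only plain HF gets the analytic Hessian;
    every other method falls back to finite differences of gradients.
    """
    method = model.split("/")[0].lower()
    return "hessian" if method == "hf" else "gradient"


def run_frequencies(molecule, model: str = "wb97x-d/6-31G*"):
    """Run a frequency calculation with the derivative level chosen above."""
    import psi4  # imported here so pick_dertype works without Psi4 installed

    return psi4.frequency(
        model,
        molecule=molecule,
        dertype=pick_dertype(model),
        return_wfn=True,
    )


print(pick_dertype("hf/6-31G*"))       # analytic Hessian route
print(pick_dertype("wb97x-d/6-31G*"))  # finite differences of gradients
```

Prototyping with HF while forcing `dertype='gradient'`, as in the original script, keeps the workflow identical to what the DFT target will need.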

@Sulstice
Author

Thanks @loriab, I went back through the chart in the documentation as well. This is useful, and I think QCFractal is the way to go when we want to scale soon. We have our own graph database structure under the hood, which is different from relational databases, so it would be good to rope into that.

[Image: Screenshot 2024-01-25 at 8 53 14 AM]

The computation time is a lot faster and I obtain my thermodynamic analysis.
