Interface Psi4 to GauXC's sn-LinK #3150

Open
wants to merge 246 commits into
base: master

@davpoolechem davpoolechem commented Mar 22, 2024

Description

GauXC is a standalone library developed by @wavefunction91, among others, with the purpose of computing XC terms within Density Functional Theory (DFT) calculations in a massively parallel fashion, including utilization of multiple nodes via MPI, and GPU support. Of more interest to this PR, GauXC contains an implementation of the sn-LinK algorithm, a seminumerical method very similar to COSX for computing the exact exchange term (i.e., the K matrix). Like the XC components of GauXC, sn-LinK also has support for GPU execution. At PsiCon 2023, it was noted that GauXC's sn-LinK algorithm could be interfaced to Psi4 through the CompositeJK framework. This would give Psi4 its first-ever instance of noncommercial GPU support for the JK construction process. And that is the goal of this PR - interfacing Psi4 to GauXC's sn-LinK code via CompositeJK.

This interface is implemented primarily via a new SplitJK derived class, snLinK, which contains the guts of the interface to GauXC. The snLinK constructor parameterizes and constructs the GauXC::XCIntegrator object and all related objects (e.g., load balancer, molecular weights partitioner), and performs other auxiliary work such as defining the GauXC execution space. snLinK::build_G_component constructs the K matrix from the input density via the GauXC integrator's eval_exx function. build_G_component also handles the pre- and post-processing required for the matrices involved: Spherical Harmonic integral reordering permutations if psi4_SHGAUSS_ORDERING is set to the default option of gaussian, and Spherical-to-Cartesian transforms, if required, to enable GPU support (these can also be forced via the SNLINK_FORCE_CARTESIAN keyword). As a SplitJK instance, sn-LinK can be called within Psi4 via SCF_TYPE = J_ALGO+SNLINK, with J_ALGO being the J construction algorithm of choice (currently only DFDIRJ). If GPU support is enabled, the SNLINK_USE_GPU keyword can be turned on to run the sn-LinK algorithm using GPUs.
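As a rough illustration of the user-facing side described above, the sketch below assembles an options dictionary for an sn-LinK run. The keyword names (SCF_TYPE, SNLINK_USE_GPU, SNLINK_FORCE_CARTESIAN) are taken from this description; the helper `snlink_options` is hypothetical and not part of Psi4, and the restriction to DFDIRJ simply mirrors the "currently only DFDIRJ" remark.

```python
def snlink_options(j_algo="DFDIRJ", use_gpu=False, force_cartesian=False):
    """Build an options dict selecting sn-LinK for the K matrix.

    Hypothetical helper for illustration only; not part of Psi4.
    """
    # Per the PR description, DFDIRJ is currently the only J algorithm
    # that can be paired with SNLINK.
    supported_j = {"DFDIRJ"}
    j_algo = j_algo.upper()
    if j_algo not in supported_j:
        raise ValueError(f"J algorithm not supported in this sketch: {j_algo}")
    return {
        "scf_type": f"{j_algo}+SNLINK",           # SCF_TYPE = J_ALGO+SNLINK
        "snlink_use_gpu": use_gpu,                 # only meaningful with a GPU-enabled build
        "snlink_force_cartesian": force_cartesian,
    }


opts = snlink_options(use_gpu=True)
print(opts["scf_type"])  # prints DFDIRJ+SNLINK
```

In a Psi4 session, a dictionary like this would typically be handed to `psi4.set_options(opts)` before calling `psi4.energy(...)`.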

Construction of the GauXC interface is specified at compile-time, using the ENABLE_gauxc flag. Psi4 can either build an internal instance of GauXC, or hook up to an external GauXC install specified by gauxc_DIR. If ENABLE_gauxc is turned off, the snLinK class will instead be a stub which throws an exception upon construction. For GPU support, there is the gauxc_ENABLE_GPU keyword, which ensures that the Psi4/GauXC interface supports GPU execution. For testing, test_compositejk.py and test_comprehensive_jk_screening.py both now include sn-LinK tests, conditional on the sn-LinK interface being built.
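The build-time choices above can be summarized with a small sketch that assembles the corresponding CMake cache arguments. The flag names (ENABLE_gauxc, gauxc_ENABLE_GPU, gauxc_DIR) come from this description; the helper `gauxc_cmake_args` and the example path are hypothetical, and treating an omitted gauxc_DIR as a request for the internal build is an assumption.

```python
def gauxc_cmake_args(enable=True, gpu=False, gauxc_dir=None):
    """Return a list of -D arguments for configuring Psi4 with GauXC.

    Hypothetical helper for illustration; the flag names are from the PR text.
    """
    on_off = lambda flag: "ON" if flag else "OFF"
    args = [f"-DENABLE_gauxc={on_off(enable)}"]
    if not enable:
        # With ENABLE_gauxc off, snLinK is only a stub, so no other flags apply.
        return args
    args.append(f"-Dgauxc_ENABLE_GPU={on_off(gpu)}")
    if gauxc_dir is not None:
        # Assumption: leaving gauxc_DIR unset lets Psi4 build GauXC internally.
        args.append(f"-Dgauxc_DIR={gauxc_dir}")
    return args


# Example: GPU-enabled build against an external install (placeholder path).
print(" ".join(gauxc_cmake_args(gpu=True, gauxc_dir="/path/to/gauxc/cmake")))
```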

User API & Changelog headlines

  • Psi4 has been interfaced to the GauXC standalone library, specifically the sn-LinK exact exchange algorithm present within GauXC. Construction of the interface is enabled with the compile-time flag ENABLE_gauxc, and GPU support with the compile-time flag gauxc_ENABLE_GPU. Once installed, SCF_TYPE=J_ALGO+SNLINK can be used to call GauXC within Psi4. Several keywords have been added for controlling the behavior of SNLINK. For the GauXC grid, SNLINK_RADIAL_POINTS, SNLINK_SPHERICAL_POINTS, and SNLINK_RADIAL_SCHEME control the radial point count, spherical point count, and radial quadrature, respectively. SNLINK_USE_GPU controls GPU execution of GauXC. Finally, SNLINK_INTS_TOLERANCE controls the integral screening threshold used by GauXC's sn-LinK algorithm.

Dev notes & details

  • Adds a new Psi4 compile-time option, ENABLE_gauxc. When set to ON, ENABLE_gauxc will build Psi4 with support for GauXC. The Psi4 build system has been adjusted so that Psi4 can either build an internal instance of GauXC, or hook up to an external GauXC instance defined by gauxc_DIR. The gauxc_ENABLE_GPU option builds the Psi4/GauXC interface to support GPU execution. Additionally, for internally built GauXC instances, setting gauxc_ENABLE_GPU to ON will build the internal GauXC install with GPU support. For external GauXC installs, gauxc_ENABLE_GPU will ensure that the external GauXC install supports GPU execution.
  • Adds a new SplitJK derived class, snLinK. When Psi4 is built with GauXC support, snLinK implements the Psi4/GauXC interface and is responsible for calling GauXC within Psi4. When Psi4 is not built with GauXC support, snLinK is instead a stub class that throws an exception upon construction. The snLinK class supports both CPU and GPU execution of GauXC, controllable at runtime via the SNLINK_USE_GPU keyword. Additionally, the snLinK class operates correctly regardless of the value of psi4_SHGAUSS_ORDERING, as well as for both Spherical and Cartesian basis sets.
  • Updates the test_compositejk.py and test_comprehensive_jk_screening.py pytests to test sn-LinK functionalities, given that GauXC is installed.
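The stub behavior described in these notes can be pictured with the pattern below. It is a Python sketch for brevity: the PR's classes are C++ and the choice between the real class and the stub is made at compile time, whereas here a hypothetical `snlink_class` function picks one at runtime. All class bodies are placeholders.

```python
class SplitJK:
    """Minimal stand-in for the SplitJK base class."""

    def build_G_component(self, density):
        raise NotImplementedError


class SnLinK(SplitJK):
    """Placeholder for the real interface, which would set up and call GauXC."""

    def __init__(self, use_gpu=False):
        self.use_gpu = use_gpu


class SnLinKStub(SplitJK):
    """Used when GauXC support was not built: construction always fails."""

    def __init__(self, *args, **kwargs):
        raise RuntimeError(
            "sn-LinK was requested, but this build has no GauXC support "
            "(ENABLE_gauxc=OFF)"
        )


def snlink_class(gauxc_enabled):
    """Hypothetical selector mimicking the compile-time switch."""
    return SnLinK if gauxc_enabled else SnLinKStub


k_builder = snlink_class(gauxc_enabled=True)(use_gpu=False)
print(type(k_builder).__name__)  # prints SnLinK
```

Failing at construction, rather than at the first K build, means a misconfigured SCF_TYPE is reported before any SCF work starts.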

Notes

  • Unlike COSX, which has a two-grid scheme currently, sn-LinK uses a single-grid scheme, simply converging the SCF on a single grid and leaving it at that. A multi-grid sn-LinK scheme is a potential idea for the future, but it is likely to be added as a separate PR.
  • In terms of interfacing to GauXC, this PR only handles interfacing to the GauXC sn-LinK exact exchange algorithm through CompositeJK. GauXC, as I understand, also has capabilities for providing features such as standardized grids and functionals, but that is beyond the scope of this PR.
  • Currently, there is an issue wherein the code breaks when SNLINK_FORCE_CARTESIAN is turned on for calculations with symmetry enabled (i.e., non-C1 symmetry). For now, I simply have the code throw an exception for such cases, but it's worth noting.
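The guard mentioned in the last note amounts to a check along these lines. This is a Python sketch of the logic only; the function name `check_force_cartesian`, the point-group string argument, and the error message are hypothetical, and the actual check lives in the C++ snLinK code.

```python
def check_force_cartesian(force_cartesian, point_group):
    """Reject SNLINK_FORCE_CARTESIAN for calculations with symmetry enabled.

    Hypothetical helper; `point_group` is a label such as "c1" or "c2v".
    """
    if force_cartesian and point_group.strip().lower() != "c1":
        raise ValueError(
            "SNLINK_FORCE_CARTESIAN is currently only supported in C1 symmetry; "
            f"got point group {point_group!r}"
        )


check_force_cartesian(True, "C1")    # allowed
check_force_cartesian(False, "c2v")  # allowed: the transform is not forced
```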

Questions

  • N/A

Checklist

  • Tests added for any new features
  • All or relevant fraction of full tests run

Status

  • Ready for review
  • Ready for merge

@davpoolechem davpoolechem marked this pull request as ready for review April 3, 2024 20:13

davpoolechem commented Apr 3, 2024

Think I'll open this up for review now.

@davpoolechem
Contributor Author

Hi all! Just a heads-up: something I would like to add to this PR before it gets merged is a framework for generating GauXC HDF5 reference files via the Psi4/GauXC interface. I have a lot of the framework for this lying around separately, and I am working on porting it over to Psi4 proper, hooked up to the Pytest setup that Psi4 has.

psi4/src/psi4/libfock/snLinK.cc (outdated)
#ifdef GAUXC_HAS_DEVICE
if (use_gpu_) {
// 0.9 indicates to use maximum 90% of maximum GPU memory, I think?
rt = std::make_unique<GauXC::DeviceRuntimeEnvironment>( GAUXC_MPI_CODE(MPI_COMM_WORLD,) 0.9 );
Contributor

Perhaps making this 90% an expert option would be a good idea.

Yes, this is actually dangerous if there are other GPU libs in flight. The more sustainable solution would eventually be to allocate a (common) buffer and pass it in; this mode is supported to enable interop with other GPU libraries that support external allocations (to the largest extent possible). GPU data occupancy is assumed to be ephemeral; allocation is just amortized because it's expensive in critical paths. If it's desirable, we could also set up a mode where the GPU memory is reallocated on every call to eval_xyz, to allow for (sensible) interop with libraries that don't support external memory allocation (not sure what BrianQC does or does not support).

Contributor Author

Agreed on @TiborGY's suggestion; I did start implementing your idea.

I can definitely see the concerns that @wavefunction91 brings up - I presume there could be issues with different GPU libraries allocating GPU memory simultaneously. But I have to wonder if the common buffer idea falls within the scope of this PR, even though it is likely the long-term solution. If we do think it should be added to this PR, I can dig around and see if there are any other GPU-enabled Psi4 capabilities (e.g., perhaps the GPU CC plugins) and look at their use of external GPU memory allocation, or lack thereof. Thoughts?

If there isn't pervasive GPU code in Psi4, I probably wouldn't bog this down with it. Going with @TiborGY's suggestion may just be the way to go. IIRC the GPU CC modules allocate memory to themselves, so as long as the XC instance goes out of scope before a CC calculation starts, you're probably fine. I'm not sure if that module can use a preallocated buffer.

Contributor Author

So for now, I have implemented @TiborGY's suggestion. There is a new keyword for snLinK, SNLINK_GPU_MEM, which controls the % of GPU memory used by the GauXC instance (defaulting to 90, the previously-hardcoded behavior). I do want to explicitly test this out with the GPU CC plugin, however, which I have not yet done.
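A minimal sketch of how such a percentage keyword could map onto the fraction passed to the runtime environment (the 0.9 in the snippet under review). This is Python for illustration; the function name `gpu_memory_fraction` and the accepted range are assumptions, not the PR's actual validation.

```python
def gpu_memory_fraction(percent=90.0):
    """Convert an SNLINK_GPU_MEM-style percentage into a fraction in (0, 1].

    Hypothetical helper; the bounds check is an assumption.
    """
    if not 0.0 < percent <= 100.0:
        raise ValueError(f"GPU memory percentage must be in (0, 100], got {percent}")
    return percent / 100.0


print(gpu_memory_fraction())    # default of 90 gives 0.9
print(gpu_memory_fraction(50))  # request half of the GPU memory
```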

Contributor Author

So I dug around the GPU DFCC plugin a bit. It seems to be hardcoded to only work with a limited subset of SCF_TYPES (CD, DISK_DF), so you can't currently run GPU DFCC in conjunction with SNLINK anyway. I suppose because of that, there shouldn't be issues between those two in practice currently?

David Poole and others added 29 commits May 10, 2024 09:14
Co-authored-by: Lori A. Burns <lori.burns@gmail.com>
davpoolechem commented May 10, 2024

All right, I believe all outstanding issues up to this point have been resolved! As of recently, all checks and queries based on GauXC's AM (e.g., L2 cross-validation) are conducted at runtime instead of compile-time, GauXC GPU memory allocation is user-controllable, and the build system is set up to propagate CMAKE_CUDA_ARCHITECTURES to GauXC in a reasonable fashion.

Current issues in CI seem to revolve around problems with finding a suitable basis_set_exchange package to use for configuration.


5 participants