Interface Psi4 to GauXC's sn-LinK #3150
base: master
Conversation
(force-pushed 4753c8a to ab0e11c)
Think I'll open this up for review now.
(force-pushed d9d22d1 to 47aa36b)
Hi all! Just a heads-up: something I would like to add to this PR before it is merged is a framework for generating GauXC HDF5 reference files via the Psi4/GauXC interface. I have a lot of this framework lying around separately, and I am working on porting it over to Psi4 proper, hooked up to Psi4's Pytest setup.
(force-pushed 37f2ee3 to 2267a50)
(force-pushed 25ddccd to eac764b)
psi4/src/psi4/libfock/snLinK.cc
```cpp
#ifdef GAUXC_HAS_DEVICE
    if (use_gpu_) {
        // 0.9 indicates to use maximum 90% of maximum GPU memory, I think?
        rt = std::make_unique<GauXC::DeviceRuntimeEnvironment>( GAUXC_MPI_CODE(MPI_COMM_WORLD,) 0.9 );
```
Perhaps making this 90% an expert option would be a good idea.
Yes, this is actually dangerous if there are other GPU libs in flight - the more sustainable solution would eventually be to allocate a (common) buffer and pass it in. This mode is supported to enable interop with other GPU libraries that support external allocations (to the largest extent possible). GPU data occupancy is assumed to be ephemeral; allocation is just amortized, as it's expensive in critical paths. If it's desirable, we could also set up a mode where the GPU memory is reallocated on every invocation of eval_xyz to allow for (sensible) interop with libraries that don't support external memory allocation (not sure what BrianQC does or does not support).
Agreed on @TiborGY's suggestion; I did start implementing your idea.
I can definitely see the concerns that @wavefunction91 brings up - I presume there could be issues with different GPU libraries allocating GPU memory simultaneously. But I have to wonder whether the common-buffer idea falls within the scope of this PR, even though it is likely the long-term solution. If we do think it should be added in this PR, I can dig around and see if there are any other GPU-enabled Psi4 capabilities (e.g., perhaps the GPU CC plugins) and look at their use, or lack thereof, of external GPU memory allocation. Thoughts?
If there isn't pervasive GPU code in Psi4, I probably wouldn't bog this PR down with it. Going with @TiborGY's suggestion may just be the way to go. IIRC the GPU CC modules allocate memory to themselves, so as long as the XC instance goes out of scope before a CC calculation starts, you're probably fine. I'm not sure if that module can use a preallocated buffer.
So for now, I have implemented @TiborGY's suggestion. There is a new keyword for snLinK, `SNLINK_GPU_MEM`, which controls the percentage of GPU memory used by the GauXC instance (defaulting to 90, the previously hardcoded behavior). I do want to explicitly test this in conjunction with the GPU CC plugin, however, which I have not yet done.
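As an aside, here is a minimal sketch of how a percent-valued keyword like `SNLINK_GPU_MEM` might be validated and converted into the fraction the device runtime expects. The helper name and bounds are hypothetical illustrations, not the PR's actual code:

```python
def gpu_mem_fraction(percent: int = 90) -> float:
    """Convert a SNLINK_GPU_MEM-style percentage into a [0, 1] fraction.

    Hypothetical helper for illustration; the valid range [1, 100] is an
    assumption, not taken from the PR.
    """
    if not 1 <= percent <= 100:
        raise ValueError(f"GPU memory percentage must be in [1, 100], got {percent}")
    return percent / 100.0
```

With the default of 90, this reproduces the previously hardcoded `0.9`.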
So I dug around the GPU DFCC plugin a bit. It seems to be hardcoded to only work with a limited subset of `SCF_TYPE`s (`CD`, `DISK_DF`), so you can't currently run GPU DFCC in conjunction with `SNLINK` anyway. I suppose because of that, there shouldn't be issues between those two in practice currently?
Co-authored-by: Lori A. Burns <lori.burns@gmail.com>
All right, I believe all outstanding issues up to this point have been resolved! As of the most recent changes, all checks and queries based on GauXC's AM (e.g., L2 cross-validation) are now conducted at runtime instead of compile-time, GauXC GPU memory allocation is user-controllable, and the build system is set up to propagate […]. Current issues in CI seem to revolve around problems with finding a suitable […]
Description
GauXC is a standalone library developed by @wavefunction91, among others, with the purpose of computing XC terms within Density Functional Theory (DFT) calculations in a massively parallel fashion, including utilization of multiple nodes via MPI, and GPU support. Of more interest to this PR, GauXC contains an implementation of the sn-LinK algorithm, a seminumerical method very similar to COSX for computing the exact exchange term (i.e., the K matrix). Like the XC components of GauXC, sn-LinK also has support for GPU execution. At PsiCon 2023, it was noted that GauXC's sn-LinK algorithm could be interfaced to Psi4 through the `CompositeJK` framework. This would give Psi4 its first-ever instance of noncommercial GPU support for the JK construction process, and that is the goal of this PR: interfacing Psi4 to GauXC's sn-LinK code via `CompositeJK`.

This interface is implemented primarily via a new `SplitJK` derived class, `snLinK`, which contains the guts of the interface to GauXC. The `snLinK` constructor parameterizes and constructs the `GauXC::XCIntegrator` object and all related objects (e.g., load balancer, molecular weights partitioner), and performs other auxiliary work such as defining the GauXC execution space. `snLinK::build_G_component` constructs the K matrix from the input density via the GauXC integrator's `eval_exx` function. `build_G_component` also handles fundamental pre- and post-processing required for the involved matrices: spherical harmonic integral reordering permutations if `psi4_SHGAUSS_ORDERING` is set to the default option of `gaussian`, and spherical-to-Cartesian transforms, if required, to enable GPU support (also forcible via the `SNLINK_FORCE_CARTESIAN` keyword). As a `SplitJK` instance, sn-LinK can be called within Psi4 via `SCF_TYPE = J_ALGO+SNLINK`, with `J_ALGO` being the J construction algorithm of choice (currently only `DFDIRJ`). If GPU support is enabled, the `SNLINK_USE_GPU` keyword can be turned on to run the sn-LinK algorithm using GPUs.

Construction of the GauXC interface is specified at compile time, using the `ENABLE_gauxc` flag. Psi4 can either build an internal instance of GauXC or hook up to an external GauXC install specified by `gauxc_DIR`. If `ENABLE_gauxc` is turned off, the `snLinK` class will instead be a stub which throws an exception upon construction. For GPU support, there is the `gauxc_ENABLE_GPU` keyword, which ensures that the Psi4/GauXC interface supports GPU execution. For testing, `test_compositejk.py` and `test_comprehensive_jk_screening.py` both now include sn-LinK tests, conditional on the sn-LinK interface being built.

User API & Changelog headlines
- GauXC support is built with the compile-time flag `ENABLE_gauxc`, and GPU support is specified with the `gauxc_ENABLE_GPU` compile-time flag. Once installed, `SCF_TYPE = J_ALGO+SNLINK` can be used to call GauXC within Psi4.
- A large number of keywords have been added for controlling the behavior of `SNLINK`. For controlling the GauXC grid, `SNLINK_RADIAL_POINTS`, `SNLINK_SPHERICAL_POINTS`, and `SNLINK_RADIAL_SCHEME` control the GauXC radial point count, spherical point count, and radial quadrature, respectively. `SNLINK_USE_GPU` controls GPU execution of GauXC. Finally, `SNLINK_INTS_TOLERANCE` controls the integral screening threshold used by GauXC's sn-LinK algorithm.

Dev notes & details
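For reference, the compile-time flags discussed in this PR might be combined into a configure invocation along these lines (a sketch only; the out-of-source layout and the `gauxc_DIR` path are placeholders, not taken from this PR):

```shell
# Hypothetical configure line; adjust paths to your own GauXC install.
cmake -S. -Bbuild \
  -DENABLE_gauxc=ON \
  -Dgauxc_ENABLE_GPU=ON \
  -Dgauxc_DIR="$HOME/local/gauxc"
```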
- New compile-time flag, `ENABLE_gauxc`. When set to `ON`, `ENABLE_gauxc` will build Psi4 with support for GauXC. The Psi4 build system has been adjusted so that Psi4 can either build an internal instance of GauXC or hook up to an external GauXC instance defined by `gauxc_DIR`. The `gauxc_ENABLE_GPU` keyword builds the Psi4/GauXC interface to support GPU execution. Additionally, for internally-built GauXC instances, setting `gauxc_ENABLE_GPU` to `ON` will build the internal GauXC install with GPU support; for external GauXC installs, `gauxc_ENABLE_GPU` will ensure that the external GauXC install supports GPU execution.
- New `SplitJK` derived class, `snLinK`. When Psi4 is built with GauXC support, `snLinK` contains the implementation details of the Psi4/GauXC interface and is responsible for calling GauXC within Psi4. When Psi4 is not built with GauXC support, `snLinK` will instead throw an exception upon construction, as implemented in a stub class. The `snLinK` class supports both CPU and GPU execution of GauXC, controllable at runtime via the `SNLINK_USE_GPU` keyword. Additionally, the `snLinK` class operates correctly regardless of the value of `psi4_SHGAUSS_ORDERING`, as well as for both spherical and Cartesian basis sets.
- Extends the `test_compositejk.py` and `test_comprehensive_jk_screening.py` pytests to test sn-LinK functionality, given that GauXC is installed.

Notes
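As a toy illustration of the kind of per-shell reordering permutation involved in the spherical-harmonic handling above, the following sketch assumes a Gaussian-style m-ordering of 0, +1, -1, ..., +l, -l and a CCA-style ordering of -l, ..., +l; the exact conventions used by Psi4 and GauXC should be checked against their sources, and this helper is hypothetical, not the PR's code:

```python
def gaussian_to_cca_permutation(l: int) -> list[int]:
    """Return perm such that perm[i] is the index, within a
    Gaussian-ordered shell of angular momentum l, of the i-th element
    of the corresponding CCA-ordered shell. Illustration only."""
    # Assumed Gaussian-style m ordering: 0, +1, -1, +2, -2, ..., +l, -l
    gaussian_order = [0]
    for m in range(1, l + 1):
        gaussian_order += [m, -m]
    # Assumed CCA-style m ordering: -l, ..., 0, ..., +l
    cca_order = list(range(-l, l + 1))
    return [gaussian_order.index(m) for m in cca_order]
```

For p functions (l = 1) this yields the permutation [2, 0, 1]; applying such per-shell permutations (plus any spherical-to-Cartesian transform) is the flavor of pre- and post-processing `build_G_component` performs.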
- This PR interfaces only to GauXC's sn-LinK capabilities, through `CompositeJK`. GauXC, as I understand, also has capabilities for providing features such as standardized grids and functionals, but that is beyond the scope of this PR.
- An unresolved edge case arises when `SNLINK_FORCE_CARTESIAN` is turned on for calculations with symmetry enabled (i.e., non-C1 symmetry). For now, I simply have the code throw an exception for such cases, but it's worth noting.