New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
graph: allow access to native graph object #6904
base: develop
Are you sure you want to change the base?
Conversation
33c4eb8
to
c4a50e1
Compare
d378260
to
10c1f91
Compare
@dalg24 @masterleinad If you don't have any further comment, this one is good to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that demonstrates a use case?
Would something along these lines be OK for you ? //! @test This test ensures that the underlying @c CUDA objects can be retrieved from a @c Kokkos::Graph.
TEST_F(TEST_CATEGORY, cuda_graph) {
//! Build a non-empty graph. The content does not matter as it won't be executed.
auto graph =
Kokkos::Experimental::create_graph<TEST_EXECSPACE>([&](auto root) {
root.then_parallel_for(1, CountTestFunctor<TEST_EXECSPACE>{});
});
//! Retrieve the @c Kokkos::Impl::Graph.
auto root_node_ref = Kokkos::Impl::GraphAccess::create_root_ref(graph);
auto graph_ptr_impl = Kokkos::Impl::GraphAccess::get_graph_weak_ptr(root_node_ref);
//! Instantiate the graph manually.
cudaGraphInstantiate(
graph_ptr_impl->get_cuda_graph_exec(),
graph_ptr_impl->get_cuda_graph()
);
} (in |
Yeah, something like that, maybe also call |
10c1f91
to
3031aca
Compare
|
7820e8b
to
90245f2
Compare
There is an error with CUDA 11.0
There is also a problem with ROCM 5.2:
The graph library is bugged in ROCm 5.2, so we are not using it. |
cudaGraph_t& get_cuda_graph() { return m_graph; } | ||
cudaGraphExec_t& get_cuda_graph_exec() { return m_graph_exec; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change and now return a reference? AFAIK cudaGraph[Exec]_t
are handle types, some typedef for a pointer to an implementation-defined struct and they are intended to be passed by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said I was about to ask about the const-qualifier in the previous version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_cuda_graph
I think that if we want to allow interoperability in both directions, it has to be
cudaGraph_t& get_cuda_graph() { return m_graph; }
(i.e. by reference).
Because if I have an existing cudaGraph_t
, and want to "set it" to my Kokkos::Graph
, there is no other way I guess. Or maybe you'd want a set_cuda_graph
? Up to you 😄
get_cuda_graph_exec
For this one, one of the primary reasons to handle cudaGraphExec_t
"manually" would be when one wants to instantiate the graph with parameters (something the Kokkos
API does not allow us to do for now).
The signature of one of the overloads of cudaGraphInstantiate
from the Cuda
documentation:
__host__ cudaError_t cudaGraphInstantiateWithParams ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphInstantiateParams* instantiateParams );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so that was your way of addressing #6904 (review)
I am not a big fan of the getter returning a reference. I would have preferred if we could find a way to set the root node at construction and not allow anyone to change the graph from underneath Kokkos otherwise. This makes it possible to break invariances such as mismatch between the graph and the "executable".
Regarding the setter for the graph executable. Do you actually need it? Are you not able to set properties from the handle when it is returned by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also concerned about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graph handle
For the cudaGraph_t
object, I guess it's enough for now to be able to retrieve it by value. I suggest we use this PR to add just the regualr getter as follows:
cudaGraph_t get_cuda_graph() const { return m_graph; }
This would already enable interoperability "from Kokkos
to outside". And we could discuss in another issue how to best enable interoperability "from outside to Kokkos
", i.e. how to allow construction of a Kokkos::Graph
from an existing cudaGraph_t
handle.
Executable graph handle
The function used by Kokkos
with the error node and logs has been "removed" in CUDA 12
(https://developer.nvidia.com/blog/cuda-toolkit-12-0-released-for-general-availability/, still in https://docs.nvidia.com/cuda/archive/11.1.0/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH but not anymore in https://docs.nvidia.com/cuda/archive/12.4.0/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH).
I looked in my include files (I'm on Cuda 12.3.2
):
static __inline__ __host__ cudaError_t cudaGraphInstantiate( cudaGraphExec_t *pGraphExec, cudaGraph_t graph, cudaGraphNode_t *pErrorNode, char *pLogBuffer, size_t bufferSize){
(void)pErrorNode;
(void)pLogBuffer;
(void)bufferSize;
return ::cudaGraphInstantiate(pGraphExec, graph, 0);
}
In the new Cuda
versions, these are the API functions that can be used to instantiate the graph:
__host__ cudaError_t cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )
__host__ cudaError_t cudaGraphInstantiateWithFlags ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )
__host__ cudaError_t cudaGraphInstantiateWithParams ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphInstantiateParams* instantiateParams )
So we could now pass the parameters through Kokkos::Impl::instantiate
more easily. I pushed the changes just for you to see. Note that we might enter into some CUDA
API versioning issues (the CUDA
graph API changed in CUDA 12
).
auto root_node_ref = Kokkos::Impl::GraphAccess::create_root_ref(graph); | ||
auto graph_ptr_impl = | ||
Kokkos::Impl::GraphAccess::get_graph_weak_ptr(root_node_ref).lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that we are reaching into an Impl::
facility here so users won't technically be allowed to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I guess this convoluted way of accessing the backend specialization stems from the original implementation strategy which I think aimed at providing some sort of type erasure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not necessarily need to be addressed here, but so far we have only looked into interoperability in one direction Kokkos -> raw CUDA graph but we might want to think about ways to build a root node from and existing CUDA graph or whatnot.
6b04674
to
ca66f44
Compare
The native graph objects should be accessible for advanced users.
ca66f44
to
47e3b8d
Compare
Summary
As discussed with @dalg24 , the native graph objects should be accessible for advanced users.
Description
For
Cuda
andHIP
backends, theKokkos::Impl::Graph
can be viewed as a convenient wrapper aroundCuda
orHIP
objects (e.g.cudaGraph_t
).Kokkos
should not prevent advanced users to retrieve the native graph objects.A few use cases:
cudaGraphDebugDotPrint
Kokkos