Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph: allow access to native graph object #6904

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

romintomasetti
Copy link
Contributor

Summary

As discussed with @dalg24 , the native graph objects should be accessible for advanced users.

Description

For Cuda and HIP backends, the Kokkos::Impl::Graph can be viewed as a convenient wrapper around Cuda or HIP objects (e.g. cudaGraph_t).

Kokkos should not prevent advanced users to retrieve the native graph objects.

A few use cases:

  • cudaGraphDebugDotPrint
  • experimenting with graph features not supported (yet) by Kokkos

@romintomasetti romintomasetti force-pushed the kokkos-graph-get-native branch 2 times, most recently from d378260 to 10c1f91 Compare March 28, 2024 15:46
@romintomasetti
Copy link
Contributor Author

@dalg24 @masterleinad If you don't have any further comment, this one is good to go!

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that demonstrates a use case?

@romintomasetti
Copy link
Contributor Author

Can you add a test that demonstrates a use case?

Would something along these lines be OK for you ?

//! @test This test ensures that the underlying @c CUDA objects can be retrieved from a @c Kokkos::Graph.
TEST_F(TEST_CATEGORY, cuda_graph) {
  //! Build a non-empty graph. The content does not matter as it won't be executed.
  auto graph =
      Kokkos::Experimental::create_graph<TEST_EXECSPACE>([&](auto root) {
        root.then_parallel_for(1, CountTestFunctor<TEST_EXECSPACE>{});
      });

  //! Retrieve the @c Kokkos::Impl::Graph.
  auto root_node_ref  = Kokkos::Impl::GraphAccess::create_root_ref(graph);
  auto graph_ptr_impl = Kokkos::Impl::GraphAccess::get_graph_weak_ptr(root_node_ref);

  //! Instantiate the graph manually.
  cudaGraphInstantiate(
    graph_ptr_impl->get_cuda_graph_exec(),
    graph_ptr_impl->get_cuda_graph()
  );
}

(in TestCuda_Graph.cpp, and similarly in testHIP_Graph.cpp)

@masterleinad
Copy link
Contributor

Yeah, something like that, maybe also call cudaGraphDebugDotPrint?

@romintomasetti
Copy link
Contributor Author

  • I wrote a test for Cuda, and I basically copy-pasted it for HIP. Is the copy-paste the expected testing strategy ?
  • The test creates a graph through Kokkos API, then retrieves the underlying backend objects to instantiate them manually, and then uses the Kokkos API again to submit the graph. It seems this test is more of an "interoperability" test. Kokkos has some *InterOp* test files. Should I move my tests to some TestCuda_InterOp_Graph.cpp ?

@romintomasetti romintomasetti force-pushed the kokkos-graph-get-native branch 8 times, most recently from 7820e8b to 90245f2 Compare March 29, 2024 15:28
@Rombur
Copy link
Member

Rombur commented Apr 1, 2024

There is an error with CUDA 11.0

/var/jenkins/workspace/Kokkos_PR-6904/core/unit_test/cuda/TestCuda_Graph.cpp(79): error: identifier "cudaGraphDebugDotFlagsVerbose" is undefined
/var/jenkins/workspace/Kokkos_PR-6904/core/unit_test/cuda/TestCuda_Graph.cpp(78): error: identifier "cudaGraphDebugDotPrint" is undefined

There is also a problem with ROCM 5.2:

/var/jenkins/workspace/Kokkos_PR-6904/core/unit_test/hip/TestHIP_Graph.cpp:49:29: error: no member named 'get_hip_graph_exec' in 'Kokkos::Impl::GraphImpl<Kokkos::HIP>'
  ASSERT_EQ(graph_ptr_impl->get_hip_graph_exec(), nullptr);
            ~~~~~~~~~~~~~~~~^

The graph library is bugged in ROCm 5.2, so we are not using it. Kokkos_HIP_Graph_Impl.hpp is not included when you compile with ROCm 5.2, so you need to guard your test.

Comment on lines 204 to 205
cudaGraph_t& get_cuda_graph() { return m_graph; }
cudaGraphExec_t& get_cuda_graph_exec() { return m_graph_exec; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change and now return a reference? AFAIK cudaGraph[Exec]_t are handle types, some typedef for a pointer to an implementation-defined struct and they are intended to be passed by value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said I was about to ask about the const-qualifier in the previous version.

Copy link
Contributor Author

@romintomasetti romintomasetti Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_cuda_graph

I think that if we want to allow interoperability in both directions, it has to be

cudaGraph_t& get_cuda_graph() { return m_graph; }

(i.e. by reference).

Because if I have an existing cudaGraph_t, and want to "set it" to my Kokkos::Graph, there is no other way I guess. Or maybe you'd want a set_cuda_graph? Up to you 😄

get_cuda_graph_exec

For this one, one of the primary reasons to handle cudaGraphExec_t "manually" would be when one wants to instantiate the graph with parameters (something the Kokkos API does not allow us to do for now).

The signature of one of the overloads of cudaGraphInstantiate from the Cuda documentation:

__host__ ​cudaError_t cudaGraphInstantiateWithParams ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphInstantiateParams* instantiateParams );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so that was your way of addressing #6904 (review)

I am not a big fan of the getter returning a reference. I would have preferred if we could find a way to set the root node at construction and not allow anyone to change the graph from underneath Kokkos otherwise. This makes it possible to break invariances such as mismatch between the graph and the "executable".

Regarding the setter for the graph executable. Do you actually need it? Are you not able to set properties from the handle when it is returned by value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also concerned about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graph handle

For the cudaGraph_t object, I guess it's enough for now to be able to retrieve it by value. I suggest we use this PR to add just the regualr getter as follows:

cudaGraph_t get_cuda_graph() const { return m_graph; }

This would already enable interoperability "from Kokkos to outside". And we could discuss in another issue how to best enable interoperability "from outside to Kokkos", i.e. how to allow construction of a Kokkos::Graph from an existing cudaGraph_t handle.

Executable graph handle

The function used by Kokkos with the error node and logs has been "removed" in CUDA 12 (https://developer.nvidia.com/blog/cuda-toolkit-12-0-released-for-general-availability/, still in https://docs.nvidia.com/cuda/archive/11.1.0/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH but not anymore in https://docs.nvidia.com/cuda/archive/12.4.0/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH).

I looked in my include files (I'm on Cuda 12.3.2):

static __inline__ __host__ cudaError_t cudaGraphInstantiate( cudaGraphExec_t *pGraphExec,  cudaGraph_t graph,  cudaGraphNode_t *pErrorNode,  char *pLogBuffer,  size_t bufferSize){
  (void)pErrorNode;
  (void)pLogBuffer;
  (void)bufferSize;
  return ::cudaGraphInstantiate(pGraphExec, graph, 0);
}

In the new Cuda versions, these are the API functions that can be used to instantiate the graph:

__host__ ​cudaError_t cudaGraphInstantiate ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )

__host__ ​cudaError_t cudaGraphInstantiateWithFlags ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, unsigned long long flags = 0 )

__host__ ​cudaError_t cudaGraphInstantiateWithParams ( cudaGraphExec_t* pGraphExec, cudaGraph_t graph, cudaGraphInstantiateParams* instantiateParams ) 

So we could now pass the parameters through Kokkos::Impl::instantiate more easily. I pushed the changes just for you to see. Note that we might enter into some CUDA API versioning issues (the CUDA graph API changed in CUDA 12).

Comment on lines +43 to +45
auto root_node_ref = Kokkos::Impl::GraphAccess::create_root_ref(graph);
auto graph_ptr_impl =
Kokkos::Impl::GraphAccess::get_graph_weak_ptr(root_node_ref).lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that we are reaching into an Impl:: facility here so users won't technically be allowed to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I guess this convoluted way of accessing the backend specialization stems from the original implementation strategy which I think aimed at providing some sort of type erasure.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not necessarily need to be addressed here, but so far we have only looked into interoperability in one direction Kokkos -> raw CUDA graph but we might want to think about ways to build a root node from and existing CUDA graph or whatnot.

@romintomasetti romintomasetti force-pushed the kokkos-graph-get-native branch 2 times, most recently from 6b04674 to ca66f44 Compare April 12, 2024 09:28
The native graph objects should be accessible for advanced users.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants