kokkos_malloc should accept an execution space instance #6918
Comments
@dalg24 @masterleinad @crtrott What do you think?
Do you have a good motivation for using
Motivation - Runtime polymorphism on device
The motivation is mostly related to creating vtables on device to allow for dynamic polymorphism. Just a few references:
Basically, generic code that creates the vtable could look like the following (if you follow the presentation of @vbrunini):
#include <memory>
#include <Kokkos_Core.hpp>

/// A custom deleter that can be used in e.g. @c std::unique_ptr or @c std::shared_ptr.
/// Inspired by V. Brunini.
template <typename device_type>
struct DeviceDeleter
{
    template <typename T>
    void operator()(T* ptr) const
    {
        // Run the destructor on device, where the object was constructed.
        Kokkos::parallel_for(Kokkos::RangePolicy<typename device_type::execution_space>(0, 1),
                             KOKKOS_LAMBDA (const int /* */) {ptr->~T();});
        Kokkos::kokkos_free<typename device_type::memory_space>(ptr);
    }
};

/// Copy a host object to device with a placement new calling the copy constructor, thereby creating the @c vtable on device.
/// Inspired by V. Brunini.
template <
    typename Derived,
    typename device_type,
    typename smart_ptr_t = std::shared_ptr<Derived>
>
smart_ptr_t copy_to_device(const typename device_type::execution_space& space, const Derived& derived)
{
    // The allocation cannot be tied to 'space': kokkos_malloc takes no execution space instance.
    auto* ptr = static_cast<Derived*>(Kokkos::kokkos_malloc<typename device_type::memory_space>(sizeof(Derived)));
    Kokkos::parallel_for(Kokkos::RangePolicy<typename device_type::execution_space>(space, 0, 1),
                         KOKKOS_LAMBDA (const int /* */) {new (ptr) Derived(derived);});
    // The deleter is parameterized on the device type, not on an execution space.
    return smart_ptr_t(ptr, DeviceDeleter<device_type>());
}
If you use a rank-0 view as @masterleinad suggested, you might have code like this:
/// Same as @ref copy_to_device, but using a rank-0 @c Kokkos::View.
template <typename Derived, typename device_type, typename view_t = Kokkos::View<Derived, device_type>>
view_t copy_to_device_in_view(const typename device_type::execution_space& space, const Derived& derived)
{
    view_t copied(Kokkos::view_alloc(space, "label")); //, Kokkos::WithoutInitializing));
    Kokkos::parallel_for(Kokkos::RangePolicy<typename device_type::execution_space>(space, 0, 1),
                         KOKKOS_LAMBDA (const int /* */) {new (copied.data()) Derived(derived);});
    return copied;
}
It is indeed shorter to use a rank-0 view for that purpose. However, we could not use kokkos/core/src/impl/Kokkos_ViewMapping.hpp Lines 3038 to 3050 in 4b90930
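For readers who want to try the pattern without a device build, here is a minimal host-only sketch of the same idea: allocate raw storage, copy-construct into it with placement new, and let a custom deleter run the destructor before releasing the storage. It is only an analogue, not Kokkos code. `std::malloc` and `std::free` stand in for `Kokkos::kokkos_malloc` and `Kokkos::kokkos_free`, and the names `RawDeleter`, `Base`, `Derived` and `copy_to_raw_storage` are hypothetical, introduced for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical stand-in for DeviceDeleter: destroy the object in place,
// then release the raw storage (std::free here instead of kokkos_free).
struct RawDeleter {
    template <typename T>
    void operator()(T* ptr) const {
        ptr->~T();
        std::free(ptr);
    }
};

// Small polymorphic hierarchy, assumed for illustration only.
struct Base {
    virtual ~Base() = default;
    virtual int value() const = 0;
};

struct Derived : Base {
    int x;
    explicit Derived(int x_) : x(x_) {}
    int value() const override { return 2 * x; }
};

// Host-only analogue of copy_to_device: the copy constructor runs in the
// freshly allocated storage, so the vtable pointer is set up where the
// object will actually live.
template <typename T>
std::shared_ptr<T> copy_to_raw_storage(const T& object) {
    void* storage = std::malloc(sizeof(T));  // stands in for kokkos_malloc
    if (storage == nullptr) throw std::bad_alloc();
    T* ptr = new (storage) T(object);        // placement new + copy constructor
    return std::shared_ptr<T>(ptr, RawDeleter());
}
```

Calling `copy_to_raw_storage(Derived(21))` and converting the result to `std::shared_ptr<Base>` lets virtual calls dispatch through the copy. On device, the placement new and the destructor call have to run inside a kernel, which is why the helpers above wrap them in a `parallel_for`.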
Allocating on the right stream
I guess the motivation is that, if you can't pass an execution space instance to
Summary
Kokkos::kokkos_malloc can be used to allocate memory in a memory space, allowing the memory to be tracked by Kokkos (among other benefits, such as a label). However, it does not seem possible to pass an execution space instance, which would allow stream-ordered allocation.
Currently, the code looks like:
kokkos/core/src/Kokkos_Core.hpp
Lines 152 to 165 in a833fb0
Actions
Kokkos::kokkos_malloc to Kokkos::malloc?
Joint work with @maartenarnst while thinking about the requirements for a copy-to-device helper that helps users generate the vtable on device.