You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Suppose that I want to vectorize a 1-D array copy operation using mdspan and aligned_accessor (P2897). I have a copy_8_floats function that expresses a hardware instruction for copying a contiguous array of 8 floats, where the start of the array is aligned to 8 * sizeof(float) (32) bytes.
The natural next step would be to use copy_8_floats to implement copying 1-D float arrays by the usual "strip-mining" approach.
template<classElementType>
using array_view = mdspan<ElementType,
extents<int, dynamic_extent>, layout_right,
default_accessor<ElementType>>;
template<size_t Extent>
voidcopy(array_view<floatconst> source, array_view<float> destination)
{
constexprint vector_length = 8;
// Handle possibly unaligned prefix of less than vector_length elements. auto aligned_starting_index = [](auto* ptr) {
constexprauto v = static_cast<unsigned>(vector_length);
auto ptr_value = std::bit_cast<uintptr_t>(ptr_value);
autoremainder = ptr_value % v;
returnstatic_cast<int>(ptr_value + (v - remainder) % v);
};
int src_upper = aligned_starting_index(source.data());
int dst_upper = aligned_starting_index(destination.data());
int src_lower = 0;
int dst_lower = 0;
for (; src_lower < src_upper && dst_lower < dst_upper; ++src_lower, ++dst_lower) {
destination[dst_lower] = source[src_lower];
}
// Strip-mine the aligned vector_length segments of the array.constint src_upper = (source.size() / vector_length) * vector_length;
constint dst_upper = (destination.size() / vector_length) * vector_length;
for (; src_lower < src_upper && dst_lower < dst_upper;
src_lower += vector_length, dst_lower += vector_length)
{
constexprauto one = std::integral_constant<int, 1>{};
constexprauto vec_len = std::integral_constant<int, vector_length>{};
// Using strided_slice ensures that the extent is fully known at compile time.constexprauto vector_slice = strided_slice{.offset=dst_lower, .extent=vector_length, .stride=one};
// PROBLEM: Current wording from P2630 and P2642 makes this always layout_stride.
aligned_array_view source_slice = submdspan(source, vector_slice);
aligned_array_view destination_slice = submdspan(destination, vector_slice);
copy_8_floats(source_slice, destination_slice);
}
// Handle suffix of less than vector_length elements.for (; src_lower < source.size() && dst_lower < destination.size(); ++src_lower, ++dst_lower) {
destination[dst_lower] = source[src_lower];
}
}
One must use strided_slice so that the resulting extent is fully known at compile time. Using index-pair-like (e.g., tuple) would result in a run-time extent, because both elements of the pair would have run-time components.
There are two problems with the above code.
Current wording from P2630 and P2642 makes the above submdspan results always have layout_stride.
Accessor .offset only takes a size_t offset. Thus, it discards compile-time information, like knowing that the offset is divisible by some overalignment factor.
I suggest the following strategy to fix these issues.
We can fix (1) in P2642 for layout_left_padded and layout_right_padded.
We can fix (1) for layout_left and layout_right with a separate paper, because C++26 is still a working draft.
Fixing (2) is harder and would complicate the accessor policy, so we might want to consider not doing that.
auto sub_map_offset = submdspan_mapping(src.mapping(), slices...);
return mdspan(src.accessor().offset(src.data(), sub_map_offset.offset),
sub_map_offset.mapping,
AccessorPolicy::offset_policy(src.accessor()));
The problem is AccessorPolicy::offset_policy(src.accessor()). The offset_policy is just a type, and offset returns offset_policy::data_handle_type. If we want an offset with suitable compile-time alignment to have a different accessor type, then we would need two changes. First, offset would need an overload taking a type that expresses the product of a compile-time integer (of suitable alignment) and a run-time integer. Second, we would need some way to get the accessor type corresponding to that offset overload -- say, a new templated type alias whose template parameter is the type of the corresponding offset overload's parameter. Given that offset is already part of the accessor, this would make more sense than introducing a customization point just for the offset accessor policy.
The work-around for a known accessor type is to replace submdspan and handle the accessor conversion by hand. The source accessor can still have an offset overload that takes the offset as a type other than size_t.
Given that this work-around is easy to do, should only be needed for very special cases, and avoids a big design change to the accessor policy requirements, it should suffice.
The text was updated successfully, but these errors were encountered:
Suggested fix: Write a separate follow-on paper that addresses layout_left, layout_right, layout_left_padded, and layout_right_padded all at once.
For the padded layouts, if the leftmost resp. rightmost extent gets strided_slice and its .extent is a compile-time value, then preserve the layout (taking care to preserve the stride, which might entail making it a run-time value -- check all cases here).
Suppose that I want to vectorize a 1-D array copy operation using
mdspan
andaligned_accessor
(P2897). I have acopy_8_floats
function that expresses a hardware instruction for copying a contiguous array of 8float
s, where the start of the array is aligned to8 * sizeof(float)
(32) bytes.The natural next step would be to use
copy_8_floats
to implement copying 1-Dfloat
arrays by the usual "strip-mining" approach.One must use
strided_slice
so that the resulting extent is fully known at compile time. Usingindex-pair-like
(e.g.,tuple
) would result in a run-time extent, because both elements of the pair would have run-time components.There are two problems with the above code.
submdspan
results always havelayout_stride
..offset
only takes asize_t
offset. Thus, it discards compile-time information, like knowing that the offset is divisible by some overalignment factor.I suggest the following strategy to fix these issues.
layout_left_padded
andlayout_right_padded
.layout_left
andlayout_right
with a separate paper, because C++26 is still a working draft.Regarding fixing (2), ([mdspan.submdspan.submdspan] paragraph 6](https://eel.is/c++draft/views.multidim#mdspan.submdspan.submdspan-6) says that
submdspan(src, slices...)
has effects equivalent to the following.The problem is
AccessorPolicy::offset_policy(src.accessor())
. Theoffset_policy
is just a type, andoffset
returnsoffset_policy::data_handle_type
. If we want an offset with suitable compile-time alignment to have a different accessor type, then we would need two changes. First,offset
would need an overload taking a type that expresses the product of a compile-time integer (of suitable alignment) and a run-time integer. Second, we would need some way to get the accessor type corresponding to thatoffset
overload -- say, a new templated type alias whose template parameter is the type of the correspondingoffset
overload's parameter. Given thatoffset
is already part of the accessor, this would make more sense than introducing a customization point just for the offset accessor policy.The work-around for a known accessor type is to replace
submdspan
and handle the accessor conversion by hand. The source accessor can still have anoffset
overload that takes the offset as a type other thansize_t
.Given that this work-around is easy to do, should only be needed for very special cases, and avoids a big design change to the accessor policy requirements, it should suffice.
The text was updated successfully, but these errors were encountered: