C ABI records #2082

dcz-self · 2024-01-11T07:24:50Z

This is an enhancement proposal, aiming at better ergonomics and/or speed.

Passing arrays or records or tuples into entry points can currently be done efficiently by marshalling them transposed. To send an argument like

foo: [N](u8, u16)

the caller needs to call tuple_new() N times, potentially(?) allocating each time, and crossing the FFI barrier, making optimizations harder (if LTO can eliminate the calls; otherwise impossible).

To get back that speed, the arguments must be like this:

foo0: [N]u8
foo1: [N]u16

In this case, the array needs to be created only twice, but no extra allocations happen.

The issue with this approach is that this kind of output is rarely idiomatic in today's programming languages. To get this output, all data, if it exists in the form [N](u8, u16), must be processed again by the CPU (slow) in order to match this new shape.

Some languages without a stable internal data representation get around this problem by having another kind of record type. In Rust, a regular struct can be turned into one with a defined field order by specifying the C representation:

#[repr(C)]
struct Foo {
   a: u8,
   b: u16,
}

Such structs can be accessed on the C side as:

struct Foo {
  uint8_t a;
  uint16_t b;
}

and also as array members.

Futhark already supports primitive types laid out consecutively in memory. The purpose of the C ABI record would be to have a representation that lets it get packed into memory by the caller. Then a value like foo: [N](c_abi(u8, u16)) would be created from an array directly:

struct futhark_i32_1d *futhark_new_c_abi_tuple_1d(struct futhark_context *ctx, const c_abi_tuple *data, int64_t dim0);

As a result, the caller has to do nearly no extra work.

Details

This is faster than the old way only if element accesses on the futhark side are as fast as for normal records/tuples.

While the operations this gets rid of are all O(elements), this should have impact where bandwidth to the processing unit is low compared to the computations done by each entry call.

This scheme can be nested, but it makes sense to start with only primitive types at first.

The idea came to me when learning that small arrays are inefficient. I wanted an array of records, but records aren't easy to pass to Futhark. That pushed me to an array of small arrays, which turn out to be inefficient. Now I'm packing data into u32, which is both limited and annoying. It's still less annoying than splitting the data into multiple arrays.

The text was updated successfully, but these errors were encountered:

athas · 2024-01-11T09:44:48Z

Having a c_abi type in Futhark itself is not viable. It would significantly complicate the compiler, and for some of Futhark's compilation targets it is meaningless to talk about "the" C ABI (it's not necessarily the same on the GPU as on the host code). Further, that representation would often be very slow. At some point the copy needs to be done.

Taking a step back, the real question is just how to make it more ergonomic to create arrays of tuples through the C API. It would not be particularly difficult to make a zip function available, and it would be very fast (zero copy in practice). You would still need to create the scalar arrays. This can be automated in bridges (but at the cost of copying twice), but maybe we can even make it possible for the user to provide a stride in the source array when producing fresh arrays. Then it would be possible to efficiently go from a C-style array-of-structs to multiple Futhark arrays of primitives, at the cost of only a single copy per component. But there is no way to get around that final copy.

FluxusMagna · 2024-01-12T13:07:27Z

I think for arrays of tuples/records, reusing the same format of new and project functions as for the scalar tuples/records themselves would work well. The bridge should take care of the rest.

Separate arrays for each component is a nice format for any language that has constructs like zip and unzip, as they become zero cost essentially. It's used for unboxed arrays in Haskell, and will also be employed in the tiny array library I'm making for Futhask, once the required API-functions are in place(the normal unboxed arrays are sadly not suitable due to how memory is allocated and managed).

The aim for Futhask is generate Haskell types that mirror the Futhark types and can be easily and efficiently converted to and from, while being easy to construct and manipulate in Haskell. While creating the abstractions necessary to maintain ergonomics with this type of representation might be difficult in some languages, I think it'd be a shame if every conversion involved this type of packing/unpacking transformation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C ABI records #2082

C ABI records #2082

dcz-self commented Jan 11, 2024

athas commented Jan 11, 2024

FluxusMagna commented Jan 12, 2024

C ABI records #2082

C ABI records #2082

Comments

dcz-self commented Jan 11, 2024

Details

athas commented Jan 11, 2024

FluxusMagna commented Jan 12, 2024