Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1838 Use Python buffer protocol when converting Python objects to .NET arrays #2372

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

chickenservice
Copy link

What does this implement/fix? Explain your changes.

If a python object exposes the buffer protocol then copy the buffer directly to a .NET array. This works for (blittable) single- or multidimensional arrays. I added tests for numpy arrays, bytearrays and python arrays to check whether this change works across different buffer protocol implementations.

My implementation currently has the following caveats:

  • The Python object must implement the buffer protocol and be C-contiguous. This is only relevant for multidimensional arrays. I explicitly request a C-contiguous buffer because (multidimensional) .NET arrays are C-contiguous as well. This allows copying the whole chunk of memory at once. Supporting multidimensional Fortran-contiguous buffers would make the implementation more complicated though I have not thought too much about it.
  • PIL-style buffers that do not store the whole buffer contiguously but through pointers via suboffsets are not supported. This would also need special handling.
  • The .NET element type of the array must be blittable and not a struct. Structs can be blittable if all their members are blittable. I imagine this may make the implementation more complicated because as far as I know no there's no out-of-the-box way to check whether a type is blittable. I think this is better handled with custom Codecs?
  • Works for multidimensional arrays which ToArray previously didn't handle, so we weaken a precondition of the converter. However efficient copying of multidimensional arrays is highly desirable in my opinion.
  • I use Buffer.MemoryCopy which is marked as not CLS-compliant. If this is not acceptable I will of course find another way.
  • The implementation is done directly with the C-API. I can rewrite this with PyBuffer if you want.

Does this close any currently open issues?

1838 Use Python buffer protocol when converting Python objects to .NET arrays

Any other comments?

The test with bytearray to .NET byte array is currently failing for NET 6.0 due to the way the Py_Buffer struct is being marshalled. I opened a separate issue #2371 to address this.

Checklist

Check all those that are applicable and complete.

  • Make sure to include one or more tests for your change
  • If an enhancement PR, please create docs and at best an example
  • Ensure you have signed the .NET Foundation CLA
  • Add yourself to AUTHORS
  • Updated the CHANGELOG

If a python object is converted to a managed array and it exposes the buffer protocol the buffer can be copied directly to the managed array. For now this works only if the requested array element type is blittable.
Currently not handled by the buffer copying mechanism anyway
/// </summary>
private static bool ToArray(BorrowedReference value, Type obType, out object? result, bool setError)
{
Type elementType = obType.GetElementType();
result = null;

// structs can also be blittable but for the standard usecases this suffices
var isBlittable = elementType.IsPrimitive && elementType != typeof(char) && elementType != typeof(bool);
if (isBlittable && Runtime.PyObject_GetBuffer(value, out var view, (int)PyBUF.CONTIG_RO) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both directions need a check that source and target element types match.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to match the typecode of the .NET type to the corresponding format string of the buffer, i.e. roughly like so:

var compatible = false;
var format = buf.format;
switch (tc)
{
    case TypeCode.Boolean:
        compatible = format == "?";
        break;
    case TypeCode.Byte:
        compatible = format == "b";
        break;
        //etc...
    default:
        break;

This would require #2371 to be fixed so I can actually retrieve the format field.

/// </summary>
private static bool ToArray(BorrowedReference value, Type obType, out object? result, bool setError)
{
Type elementType = obType.GetElementType();
result = null;

// structs can also be blittable but for the standard usecases this suffices
var isBlittable = elementType.IsPrimitive && elementType != typeof(char) && elementType != typeof(bool);
if (isBlittable && Runtime.PyObject_GetBuffer(value, out var view, (int)PyBUF.CONTIG_RO) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it is safe to assume getting buffer has the same semantics as using iterator for an arbitrary type that supports GetBuffer.

Imho, this could be made the default behavior for converting a fixed set of standard types, but for the others it might be better to expose some kind of explicit call instead of conversion. E.g. pyObjectInst.ReadBuffer(destinationArray)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this caught by the primitive blitable check?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the whole thing to after the if (IterObject.IsNull()) check to make sure it's an iterable? Other than that as @filmor said I already allow only primitive blittable types as array element types.

Copy link
Member

@lostmsu lostmsu May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is not about element types, but about container types. Does Python guarantee, that GetBuffer()[0], GetBuffer()[1], etc is the same sequence as .iter[0], .iter[1], etc for every type that implements GetBuffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was the point of the buffer protocol :). I'll read up more on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chickenservice the concern is about breaking changes, especially within Python.NET 3.x, more especially within Python.NET 3.0.x.

If there's a difference in result, we should delay the change of behavior to 4.x, and in 3.x only add new functions.

IMHO, we can only immediately apply the speedups for the types that we know have the same behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lostmsu that makes sense. The tests cover bytearray, array.array and ndarray. From my experience these behave according to what you described and are the canonical implementations anyway. Any concerns? Are there other types you have in mind? Should we limit ndim == 1 for now?

Copy link
Member

@lostmsu lostmsu May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests do not cover multidim or the behavior in multidim is different, we should limit ndim to 1.

But again, I strongly suggest to add an explicit new method to PyObject that does the right thing with buffers, and only works with buffer protocol.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misunderstood your suggestion, to clarify:

For upcoming minor/patch release expose this as a dedicated method and remove it from ToArray completely as breaking changes cannot be ruled out completely.

For the next major release reevaluate whether this can/should be done implicitly in ToArray and accept breaking changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chickenservice yep, that's exactly right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants