I returned array data from my UDF, but I got an error saying that arrays must be 1-dimensional:
(ReadRange->MapBatches(f) pid=69903) Could not construct Arrow block from numpy array; encountered values of unsupported numpy type `17` in column named 'unsupported', which cannot be casted to an Arrow data type. Falling back to using pandas block type, which is slower and consumes more memory. For maximum performance, consider applying the following suggestions before ingesting into Ray Data in order to use native Arrow block types:
(ReadRange->MapBatches(f) pid=69903) - Expand out each key-value pair in the dict column into its own column
(ReadRange->MapBatches(f) pid=69903) - Replace `None` values with an Arrow supported data type
(ReadRange->MapBatches(f) pid=69903)
Running 0: 0%| | 0/20 [00:00<?, ?it/s]
2024-05-09 17:23:05,898 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "ReadRange->MapBatches(f)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
2024-05-09 17:23:05,916 ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
ray.data.exceptions.SystemException
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/balaji/Documents/GitHub/ray/1.py", line 30, in <module>
ray.data.range(100).map_batches(f).materialize()
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/dataset.py", line 4541, in materialize
copy._plan.execute(force_read=True)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/exceptions.py", line 86, in handle_trace
raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(ValueError): ray::ReadRange->MapBatches(f)() (pid=69903, ip=127.0.0.1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/arrow_block.py", line 210, in numpy_to_block
col = ArrowTensorArray.from_numpy(col, col_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/Documents/GitHub/ray/python/ray/air/util/tensor_extensions/arrow.py", line 376, in from_numpy
raise e
File "/Users/balaji/Documents/GitHub/ray/python/ray/air/util/tensor_extensions/arrow.py", line 336, in from_numpy
pa_dtype = pa.from_numpy_dtype(arr.dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/types.pxi", line 5164, in pyarrow.lib.from_numpy_dtype
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported numpy type 17
During handling of the above exception, another exception occurred:
ray::ReadRange->MapBatches(f)() (pid=69903, ip=127.0.0.1)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/execution/operators/map_operator.py", line 410, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/execution/operators/map_transformer.py", line 393, in __call__
add_fn(data)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/output_buffer.py", line 48, in add_batch
self._buffer.add_batch(batch)
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/_internal/delegating_block_builder.py", line 38, in add_batch
block = BlockAccessor.batch_to_block(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/Documents/GitHub/ray/python/ray/data/block.py", line 380, in batch_to_block
return pd.DataFrame(dict(batch))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pandas/core/frame.py", line 664, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 493, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 118, in arrays_to_mgr
index = _extract_index(arrays)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 653, in _extract_index
raise ValueError("Per-column arrays must each be 1-dimensional")
ValueError: Per-column arrays must each be 1-dimensional
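The final ValueError comes from the pandas fallback path, and it can be reproduced directly with pandas alone (a minimal sketch, assuming pandas and numpy are installed; the column name `unsupported` is taken from the log above):

```python
import numpy as np
import pandas as pd

# A multi-dimensional array passed as a single DataFrame column:
# pandas requires each per-column array to be 1-dimensional, so this
# raises the same ValueError that ends the traceback above.
try:
    pd.DataFrame({"unsupported": np.zeros((4, 2, 2))})
except ValueError as e:
    print(e)
```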
bveeramani added the bug (something that is supposed to be working, but isn't), P0 (issue that must be fixed in short order), and data (Ray Data-related issues) labels on May 10, 2024.
@982945902 Ray Data has a custom extension type for multi-dimensional array data. We should automatically use the extension type, but we aren't in this code path.
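Until the extension type is applied automatically on this code path, one possible interim workaround (a sketch, not Ray's API; it assumes your UDF controls the batch it returns) is to hand pandas a 1-D object array holding one row-array per element, rather than a single multi-dimensional array:

```python
import numpy as np
import pandas as pd

# A 2-D batch column: passing this directly would raise
# "Per-column arrays must each be 1-dimensional" in the pandas fallback.
batch_col = np.arange(8).reshape(4, 2)

# Wrap each row in a 1-D object array so pandas sees one
# array-valued cell per row instead of a 2-D block.
wrapped = np.empty(len(batch_col), dtype=object)
for i in range(len(batch_col)):
    wrapped[i] = batch_col[i]

df = pd.DataFrame({"data": wrapped})
print(df["data"].iloc[0])  # [0 1]
```

This trades the Arrow-backed fast path for correctness, so it is only a stopgap until the tensor extension type handles this case.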
Hi @bveeramani, thanks for the fix.
Does this fix #39559 too?
I had a unit test that checks the example there, and once I upgraded to 2.23, it no longer failed.
What happened + What you expected to happen
I returned array data from my UDF, but I got an error saying that arrays must be 1-dimensional:
Versions / Dependencies
6a266db
Reproduction script
Issue Severity
High: It blocks me from completing my task.