BUG: NotImplementedError: mod not implemented in pandas 2.2.2 with int64[pyarrow] #58723

Closed
2 of 3 tasks
seanslma opened this issue May 14, 2024 · 2 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@seanslma

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
d = pd.DataFrame({'x': [1, 2]}, dtype='int64[pyarrow]')
d['x'].mod(2)

Issue Description

This is essentially the same issue as #56693.

As I am not able to reopen that issue, I created this new one.

NotImplementedError                       Traceback (most recent call last)
Cell In[6], line 4
      2 print(f'pandas version: {pd.__version__}')
      3 d = pd.DataFrame({'x': [1, 2]}, dtype='int64[pyarrow]')
----> 4 d['x'].mod(2)
      6 # def pa_mod(ds, divisor):
      7 #   """
      8 #   Calculates remainder after division by any positive divisor for a pandas DataFrame column with int64[pyarrow] dtype.
   (...)
     34 # dd = pa_mod(d['x'], 3)
     35 # dd

File ~\conda-envs\test-env\lib\site-packages\pandas\core\series.py:6381, in Series.mod(self, other, level, fill_value, axis)
   6379 @Appender(ops.make_flex_doc("mod", "series"))
   6380 def mod(self, other, level=None, fill_value=None, axis: Axis = 0) -> Series:
-> 6381     return self._flex_method(
   6382         other, operator.mod, level=level, fill_value=fill_value, axis=axis
   6383     )

File ~\conda-envs\test-env\lib\site-packages\pandas\core\series.py:6260, in Series._flex_method(self, other, op, level, fill_value, axis)
   6257         return op(self, fill_value)
   6258     self = self.fillna(fill_value)
-> 6260 return op(self, other)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\ops\common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\arraylike.py:226, in OpsMixin.__mod__(self, other)
    224 @unpack_zerodim_and_defer("__mod__")
    225 def __mod__(self, other):
--> 226     return self._arith_method(other, operator.mod)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\series.py:6135, in Series._arith_method(self, other, op)
   6133 def _arith_method(self, other, op):
   6134     self, other = self._align_for_op(other)
-> 6135     return base.IndexOpsMixin._arith_method(self, other, op)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\base.py:1382, in IndexOpsMixin._arith_method(self, other, op)
   1379     rvalues = np.arange(rvalues.start, rvalues.stop, rvalues.step)
   1381 with np.errstate(all="ignore"):
-> 1382     result = ops.arithmetic_op(lvalues, rvalues, op)
   1384 return self._construct_result(result, name=res_name)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\ops\array_ops.py:273, in arithmetic_op(left, right, op)
    260 # NB: We assume that extract_array and ensure_wrapped_if_datetimelike
    261 #  have already been called on `left` and `right`,
    262 #  and `maybe_prepare_scalar_for_op` has already been called on `right`
    263 # We need to special-case datetime64/timedelta64 dtypes (e.g. because numpy
    264 # casts integer dtypes to timedelta64 when operating with timedelta64 - GH#22390)
    266 if (
    267     should_extension_dispatch(left, right)
    268     or isinstance(right, (Timedelta, BaseOffset, Timestamp))
   (...)
    271     # Timedelta/Timestamp and other custom scalars are included in the check
    272     # because numexpr will fail on it, see GH#31457
--> 273     res_values = op(left, right)
    274 else:
    275     # TODO we should handle EAs consistently and move this check before the if/else
    276     # (https://github.com/pandas-dev/pandas/issues/41165)
    277     # error: Argument 2 to "_bool_arith_check" has incompatible type
    278     # "Union[ExtensionArray, ndarray[Any, Any]]"; expected "ndarray[Any, Any]"
    279     _bool_arith_check(op, left, right)  # type: ignore[arg-type]

File ~\conda-envs\test-env\lib\site-packages\pandas\core\ops\common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\arraylike.py:226, in OpsMixin.__mod__(self, other)
    224 @unpack_zerodim_and_defer("__mod__")
    225 def __mod__(self, other):
--> 226     return self._arith_method(other, operator.mod)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\arrays\arrow\array.py:787, in ArrowExtensionArray._arith_method(self, other, op)
    786 def _arith_method(self, other, op):
--> 787     return self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS)

File ~\conda-envs\test-env\lib\site-packages\pandas\core\arrays\arrow\array.py:773, in ArrowExtensionArray._evaluate_op_method(self, other, op, arrow_funcs)
    771 pc_func = arrow_funcs[op.__name__]
    772 if pc_func is NotImplemented:
--> 773     raise NotImplementedError(f"{op.__name__} not implemented.")
    775 result = pc_func(self._pa_array, other)
    776 return type(self)(result)

NotImplementedError: mod not implemented.

Expected Behavior

Since #56693 was reported as fixed, the provided example should work in pandas 2.2.2.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.9.18.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252

pandas : 2.2.2
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.3.1
Cython : 3.0.5
pytest : 7.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : None
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.4
numba : 0.58.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 16.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : 2.0.25
tables : None
tabulate : None
xarray : 2023.10.1
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

@seanslma seanslma added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 14, 2024
@seanslma seanslma changed the title BUG: NotImplementedError: mod not implemented in pandas 2.2.2 BUG: NotImplementedError: mod not implemented in pandas 2.2.2 with int64[pyarrow] May 14, 2024
@mroeschke
Member

Thanks for the report. This is an upstream issue, though: pandas needs an Arrow compute kernel for mod that it can call, and none exists yet (see apache/arrow#28497).

Going to close as an upstream issue
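
For readers following the traceback, the failure pattern can be sketched in plain Python. This is a simplified model, not the pandas source: `TOY_ARITHMETIC_FUNCS` and `toy_evaluate_op` are hypothetical names that only mimic the lookup visible in the `_evaluate_op_method` frame above, where an operation whose table entry is `NotImplemented` raises `NotImplementedError`.

```python
import operator

# Hypothetical stand-in for a lookup table such as ARROW_ARITHMETIC_FUNCS:
# each operator name maps to a callable, or to NotImplemented when no
# backing compute function is available.
TOY_ARITHMETIC_FUNCS = {
    "add": operator.add,
    "sub": operator.sub,
    "mod": NotImplemented,  # no kernel to dispatch to
}


def toy_evaluate_op(left, right, op):
    """Look up op by name and fail loudly if it has no implementation."""
    func = TOY_ARITHMETIC_FUNCS[op.__name__]
    if func is NotImplemented:
        raise NotImplementedError(f"{op.__name__} not implemented.")
    return func(left, right)


print(toy_evaluate_op(1, 2, operator.add))
try:
    toy_evaluate_op(1, 2, operator.mod)
except NotImplementedError as exc:
    print(f"NotImplementedError: {exc}")
```

Under this model, the error goes away only once the "mod" entry points at a real function, which is why the fix depends on the upstream kernel.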

@seanslma
Author

seanslma commented May 15, 2024

Thanks. It seems the upstream issue has been open for a few years now.

Here is a workaround that may help anyone who needs mod in the meantime.

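def pa_mod(val, divisor):
    """Remainder of val divided by a positive integer divisor, without using %."""
    if divisor <= 0:
        raise ValueError('Divisor must be a positive integer')
    # The bitmask shortcut is only valid for integer data, so floats always
    # take the general path below.
    if (divisor & (divisor - 1)) == 0 and pd.api.types.is_integer_dtype(val):
        # Power-of-two divisor: the remainder is the low bits of val.
        remainder = val & (divisor - 1)
    else:
        # General case: val - floor(val / divisor) * divisor
        quotient = val // divisor
        remainder = val - (quotient * divisor)
    return remainder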

Test code:

import numpy as np
import pandas as pd

# Integer column, including negative values
d = pd.DataFrame({'x': np.arange(-10, 10)}, dtype='int64[pyarrow]')
d = d.assign(y=pa_mod(d['x'], 3))
print(d)

# Float column
d = pd.DataFrame({'x': np.arange(-10, 10)-0.5}, dtype='float64[pyarrow]')
d = d.assign(y=pa_mod(d['x'], 3))
print(d)
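
The workaround rests on two arithmetic identities, which can be checked with plain Python numbers independently of pandas or pyarrow. The sketch below assumes floor division rounds toward negative infinity, as Python's `//` does; the helper names `mod_by_floor_division` and `mod_by_bitmask` are made up for this illustration.

```python
def mod_by_floor_division(val, divisor):
    # General identity: val mod d == val - floor(val / d) * d
    return val - (val // divisor) * divisor


def mod_by_bitmask(val, divisor):
    # Integer-only shortcut: for a power-of-two divisor, the remainder
    # is the low bits of val (two's complement semantics).
    return val & (divisor - 1)


# Same ranges as the test code above: integers, then values offset by 0.5.
for x in range(-10, 10):
    assert mod_by_floor_division(x, 3) == x % 3
    assert mod_by_bitmask(x, 8) == x % 8
    assert mod_by_floor_division(x - 0.5, 3) == (x - 0.5) % 3

print("identities hold for the sampled values")
```

Both forms return a non-negative remainder for negative inputs when the divisor is positive, matching Python's `%`. The bitmask form has no meaning for floating-point data.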
