Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properties that default to pandas have unclear error messages #7233

Closed
noloerino opened this issue Apr 30, 2024 · 0 comments · Fixed by #7269
Closed

Properties that default to pandas have unclear error messages #7233

noloerino opened this issue Apr 30, 2024 · 0 comments · Fixed by #7269
Labels
Enable plugin Fixes needed to enable external plugins P2 Minor bugs or low-priority feature requests

Comments

@noloerino
Copy link
Collaborator

When a DataFrame/Series property defaults to pandas by registering an operator in modin.core.dataframe.algebra.default2pandas, the raised warning message does not properly render the name of the property. For example, commenting out the PandasQueryCompiler assignment of dt_date and instead using the BaseQueryCompiler implementation (which uses DateTimeDefault) yields the following:

>>> import modin.pandas as pd
>>> pd.Series([pd.Timestamp("2020-01-01 00:01:01")]).dt.date
UserWarning: Distributing <class 'list'> object. This may take some time.
UserWarning: <function Series.<property object at 0x11fcdf790>> is not currently supported by PandasOnRay, defaulting to pandas implementation.
Please refer to https://modin.readthedocs.io/en/stable/supported_apis/defaulting_to_pandas.html for explanation.
0    2020-01-01
dtype: object

This occurs because the method is registered with DateTimeDefault.register(pandas.Series.dt.date), and since dt.date is a property, it has no __name__ field. To resolve this, the below line of DefaultMethod.register should be changed to access func.fget.__name__ if func is a Python property object.

fn_name = getattr(func, "__name__", str(func)) if fn_name is None else fn_name

As far as I can tell, this does not affect any of the first-class Modin backends. Properties like DataFrame.attrs use BasePandasDataset._default_to_pandas, which explicitly requires a name to be passed in. dt/str accessor properties use StrDefault and DatetimeDefault in BaseQueryCompiler, but PandasQueryCompiler uses the Map operator to avoid this default. However, downstream libraries that implement a custom subclass of BaseQueryCompiler are affected.

@noloerino noloerino added P2 Minor bugs or low-priority feature requests Enable plugin Fixes needed to enable external plugins labels Apr 30, 2024
sfc-gh-joshi added a commit to snowflakedb/snowpark-python that referenced this issue May 1, 2024
#1454)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1347394

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes our vendored copy of the `BaseQueryCompiler` class,
inheriting the class from upstream Modin instead. Similarly, it removes
all the operator registration classes defined in
`snowflake.snowpark.modin.core.dataframe.algebra.default2pandas`, with
one exception. Upstream Modin does not properly render the names of
`property` objects (modin-project/modin#7233),
so we should override `DataFrameDefault.register` to fix this until this
issue is fixed upstream.

This PR incidentally removes `Series.dt.week` + `Series.dt.weekofyear`,
which were already removed in pandas 2.0.

---------

Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
noloerino added a commit to noloerino/modin that referenced this issue May 14, 2024
…ror messages

Signed-off-by: Jonathan Shi <jhshi07@gmail.com>
noloerino added a commit to noloerino/modin that referenced this issue May 14, 2024
…ror messages

Signed-off-by: Jonathan Shi <jhshi07@gmail.com>
YarShev pushed a commit that referenced this issue May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enable plugin Fixes needed to enable external plugins P2 Minor bugs or low-priority feature requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant