BUG: using docstring extensions causes several incorrect behaviors because DataFrame, Series, and BasePandasDataset cannot recognize each other #7138
Labels
bug 🦗: Something isn't working
Enable plugin: Fixes needed to enable external plugins
P3: Very minor bugs, or features we can hopefully add some day.
Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
Issue Description
The reproducible example is a modified version of `modin/config/test/test_envvars.py::test_doc_module` at modin commit a616e4c.

The cause is that we define `BasePandasDataset`, `Series`, and `DataFrame` twice each:

1. `BasePandasDataset`, then `Series`, then `DataFrame` when we first load `modin.pandas`.
2. `BasePandasDataset` when we do `importlib.reload(pd.base)` because we're adding the doc module.
3. `DataFrame` when we do `importlib.reload(pd.dataframe)` because we're adding the doc module.
4. `Series` when we do `importlib.reload(pd.series)` because we're adding the doc module.

When we execute `all`, the `BasePandasDataset` that it checks `result` against is the second `BasePandasDataset` class, but `result` is an instance of the first `BasePandasDataset` class because the `dataframe` module imported `Series` before the `series` module was reloaded. So `result` should be an instance of `BasePandasDataset`, but it's not, so we skip the extra reduction and end up with a `Series` result instead of a boolean.

There's no good order to do the reloads in because all three classes need to reference each other. We could do `import modin.pandas.base as modin_base` and always write `modin_base.BasePandasDataset` instead of `from modin.pandas.base import BasePandasDataset`, but that's hard to read.

Proposed solution: instead of doing `importlib.reload`, we should update docstrings in place when we do `DocModule.put()`.
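To illustrate the failure mode and the proposed direction without depending on Modin, here is a minimal sketch. The module `toy_base`, the class `Base`, and the helper `load_toy_module` are hypothetical stand-ins, not Modin code. The sketch only shows that `importlib.reload` produces a new class object that older instances do not belong to, while assigning to `__doc__` keeps class identity intact. How `DocModule.put()` would actually locate and apply the replacement docstrings is not covered here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical toy module standing in for something like modin.pandas.base.
TOY_SOURCE = '''
class Base:
    def all(self):
        """Original docstring."""
        return True
'''


def load_toy_module():
    """Write the toy module to a temporary directory and import it.

    The temporary directory is left in place for the life of the process.
    """
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, "toy_base.py").write_text(TOY_SOURCE)
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    return importlib.import_module("toy_base")


toy_base = load_toy_module()
OldBase = toy_base.Base  # what `from toy_base import Base` would have captured
obj = OldBase()

# Approach 1: reload the module. The module body runs again, so the name
# `Base` is rebound to a brand-new class object.
importlib.reload(toy_base)
stale = isinstance(obj, toy_base.Base)  # obj still belongs to the old class

# Approach 2: patch the docstring in place. No new class is created, so
# isinstance checks keep working.
CurrentBase = toy_base.Base
CurrentBase.all.__doc__ = "Replacement docstring."
fresh = CurrentBase()
still_instance = isinstance(fresh, toy_base.Base)

print(stale, still_instance, CurrentBase.all.__doc__)
# prints: False True Replacement docstring.
```

The first `isinstance` check is the toy analogue of `result` failing the `BasePandasDataset` check: any module that captured the class before the reload keeps producing instances of the old class.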
Expected Behavior
`all(axis=None)` should always return a boolean. In general, `BasePandasDataset`, `Series`, and `DataFrame` should all be able to reference each other.
Error Logs
Installed Versions
INSTALLED VERSIONS
commit : a616e4c
python : 3.9.18.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Wed Feb 21 21:45:49 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Modin dependencies
modin : 0.28.0+35.ga616e4cc.dirty
ray : 2.8.0
dask : 2024.3.1
distributed : 2024.3.1
hdk : None
pandas dependencies
pandas : 2.2.1
numpy : 1.26.1
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.3
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.2.0
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.22.0
pyarrow : 14.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.3.1
scipy : 1.11.3
sqlalchemy : 2.0.29
tables : 3.9.2
tabulate : None
xarray : 2024.2.0
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None