Releases: modin-project/modin
Modin 0.30.0
This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
more efficient implementations of internal operators, which give a performance boost to almost all distributed Modin functions,
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.
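The distributed right merge/join mentioned above is reached through the ordinary pandas-style call. The sketch below uses made-up data. As an assumption of the example, it falls back to plain pandas when Modin is not installed, so it only demonstrates the semantics Modin mirrors. It also sorts by key before inspecting the result instead of relying on row order.

```python
# Prefer Modin; fall back to pandas, whose API Modin mirrors.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# how="right" keeps every row of the right frame; keys absent from the
# left frame get NaN in the left-side columns.
merged = pd.merge(left, right, on="key", how="right").sort_values(
    "key", ignore_index=True
)
print(merged)

# join() is the index-based counterpart of merge().
joined = left.set_index("key").join(right.set_index("key"), how="right").sort_index()
print(joined)
```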
Key Features and Updates Since 0.29.0
- Stability and Bugfixes
- FIX-#0000: Fix badge in README.md (#7213)
- FIX-#0000: Make merge tests more stable by sorting results (#7266)
- FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
- FIX-#7093: Make sure idxmax and idxmin can work with string columns (#7193)
- FIX-#7102: Remove enable_api_only mode in modin logging (#7194)
- FIX-#7103: Move lower-level functionality logging to debug (#7184)
- FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
- FIX-#7185: Add extra check for some config classes (#7189)
- FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
- FIX-#7206: Make sure df.melt handles duplicate value_vars correctly (#7208)
- FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
- FIX-#7221: Don't use use_legacy_dataset=False for ParquetDataset (#7222)
- FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
- FIX-#7233: Display property name in default_to_pandas error messages (#7269)
- FIX-#7234: Deprecate HDK engine (#7235)
- FIX-#7238: Fix docstring inheritance for cached_property and use it (#7239)
- FIX-#7240: Allow doc_checker.py to work with functools.cached_property (#7241)
- FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
- FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types (#7237)
- FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
- Performance enhancements
- Refactor Codebase
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
- FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
- FEAT-#6492: Add from_map feature to create dataframe (#7215)
- FEAT-#6498: Make Fold operator more flexible (#7257)
- FEAT-#6808: Implement __arrow_array__ for Series (#7200)
- FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
- FEAT-#7139: Use ray-core instead of ray-default (#6955)
- FEAT-#7187: Change master branch to main (#7188)
- FEAT-#7202: Use custom resources for Ray (#7205)
- FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
- FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
- FEAT-#7252: Add type hints for base.py (#7253)
- FEAT-#7254: Support right merge/join (#7226)
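FIX-#7143 above aligns Modin with pandas: passing a Series whose name is a tuple to the DataFrame constructor should yield MultiIndex columns. A minimal sketch with illustrative data; as an assumption of the example, it falls back to plain pandas when Modin is not installed.

```python
# Prefer Modin; fall back to pandas, whose API Modin mirrors.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

s = pd.Series([1, 2, 3], name=("metrics", "count"))
df = pd.DataFrame(s)

# The tuple name becomes a two-level column label.
print(type(df.columns).__name__)  # MultiIndex
print(df.columns.tolist())        # [('metrics', 'count')]
```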
Contributors
@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew
Modin 0.29.0
This release introduces the modin.pandas.testing and modin.pandas.arrays modules, faster (range-partitioning) implementations of the
pivot_table, unique, drop_duplicates, nunique and df.resample functions, new functions to interact with Dask (to/from_dask),
a distributed implementation of Series.case_when, and an optimization of the astype function for a scalar dtype.
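Range-partitioning assigns each row to a partition according to the key range it falls into, so all rows with equal keys end up in the same partition and an operation such as unique can then run on each partition independently. The following is a toy, single-process illustration of that general technique in plain Python. The helper names (choose_boundaries, range_partition, distributed_unique) and the sample-based choice of boundaries are assumptions for illustration, not Modin's implementation.

```python
import bisect
import random


def choose_boundaries(values, num_parts, sample_size=100, seed=0):
    """Pick num_parts - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # Evenly spaced positions in the sorted sample serve as range boundaries.
    return [sample[len(sample) * i // num_parts] for i in range(1, num_parts)]


def range_partition(values, boundaries):
    """Route every value to the partition whose key range contains it."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        parts[bisect.bisect_right(boundaries, v)].append(v)
    return parts


def distributed_unique(values, num_parts=4):
    boundaries = choose_boundaries(values, num_parts)
    parts = range_partition(values, boundaries)
    # Equal keys always land in the same partition, so each partition is
    # de-duplicated on its own without looking at the others.
    per_part = [sorted(set(p)) for p in parts]
    return [v for part in per_part for v in part]


data = [5, 3, 9, 3, 1, 5, 7, 9, 2, 8, 1, 6]
print(distributed_unique(data))  # [1, 2, 3, 5, 6, 7, 8, 9]
```

Unlike pandas' unique(), which keeps the order of first appearance, this toy version returns keys in sorted order because the partitions themselves are ordered by key range.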
Key Features and Updates Since 0.28.0
- Stability and Bugfixes
- FIX-#6227: Make sure Series.unique() with pyarrow dtype returns ArrowExtensionArray (#7042)
- FIX-#6793: Use pandas_dtype instead of np.dtype for some more places in Modin code (#6794)
- FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
- FIX-#7051: Update exception message for astype function (#7052)
- FIX-#7054: Update exception message for shift function (#7055)
- FIX-#7056: Update exception message for iloc/loc functions (#7057)
- FIX-#7058: Update exception message for insert function (#7059)
- FIX-#7060: Fix pivot when index or columns are of Index type (#7061)
- FIX-#7062: Update exception message for aggregate function (#7063)
- FIX-#7072: Replace MaterializationHook with the materialized object on serialization (#7075)
- FIX-#7088: Make sure rank raises No axis named None... exception (#7089)
- FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
- FIX-#7135: Fix appending a new row (#7172)
- FIX-#7153: Fix Series.corr with method != pearson (#7158)
- FIX-#7157: Make sure quantile function works with numeric_only=True (#7160)
- FIX-#7170: Don't use MinPartitionSize configuration variable in remote context (#7177)
- Performance enhancements
- PERF-#5296: Partition parquet file if it has too few row groups (#7016)
- PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
- PERF-#7123: Preserve shape_hint for dropna (#7124)
- PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
- PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
- PERF-#7150: Reduce peak memory consumption (#7149)
- Refactor Codebase
- Update testing suite
- TEST-#3622: Centralize tests in Modin (#7137)
- TEST-#6016: Make sure eval_general doesn't expect exceptions by default (#6954)
- TEST-#7064: Explicitly check for exceptions in test_groupby.py (#7065)
- TEST-#7066: Explicitly check for exceptions in test_io.py (#7067)
- TEST-#7073: Explicitly check for exceptions in test_default.py (#7074)
- TEST-#7076: Explicitly check for exceptions in test_map_metadata.py (#7077)
- TEST-#7082: Explicitly check for exceptions in test_series.py (#7083)
- TEST-#7084: Explicitly check for exceptions in test_indexing.py (#7085)
- TEST-#7086: Explicitly check for exceptions in test_reduce.py (#7087)
- TEST-#7094: Rename raising_exceptions argument of eval_general testing function (#7095)
- TEST-#7125: Explicitly install modin in CI tests (#7126)
- TEST-#7165: Add codecov token to fix CI on master (#7175)
- TEST-#7166: Fix HDF tests in CI (#7167)
- TEST-#7173: Update github actions (#7168)
- Documentation improvements
- New Features
- FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
- FEAT-#6783: Implement modin.pandas.testing module (#7045)
- FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
- FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
- FEAT-#7021: Implement to/from_dask functions (#7022)
- FEAT-#7047: Add range-partitioning implementation for .pivot_table() (#7048)
- FEAT-#7070: Add modin.pandas.arrays module (#7071)
- FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
- FEAT-#7090: Add range-partitioning implementation for .unique() and .drop_duplicates() (#7091)
- FEAT-#7100: Add range-partitioning impl for nunique() (#7101)
- FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
- FEAT-#7111: Implemented @remote_function decorator with cache (#7112)
- FEAT-#7117: Support building range-partitioning from an index level (#7120)
- FEAT-#7118: Add range-partitioning impl for df.resample() (#7140)
- FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
- FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
- FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
- FEAT-#7156: Add type hints for Series (#7154)
- FEAT-#7178: Add type hints for DataFrame (#7179)
- FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)
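FEAT-#6783 adds a modin.pandas.testing module. Assuming it mirrors pandas.testing, as Modin's pandas-compatible design suggests, comparing two frames in a test could look like the sketch below. As an assumption of the example, it falls back to pandas and pandas.testing when Modin (or that module) is not available.

```python
# Prefer Modin's testing helpers; fall back to pandas.testing.
try:
    import modin.pandas as pd
    from modin.pandas.testing import assert_frame_equal
except ImportError:
    import pandas as pd
    from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
result = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# Passes silently when values, dtypes, index and columns all match.
assert_frame_equal(result, expected)

# An int64 vs float64 dtype mismatch raises AssertionError by default...
try:
    assert_frame_equal(result.astype({"a": "float64"}), expected)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# ...but is tolerated when dtype checking is switched off.
assert_frame_equal(result.astype({"a": "float64"}), expected, check_dtype=False)
print(mismatch_detected)  # True
```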
Contributors
@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha
Modin 0.28.2
This release reverts the pandas requirement from 2.2.1 to >=2.2,<2.3.
Key Features and Updates Since 0.28.1
Contributors
Modin 0.28.1
This release pins pandas to 2.2.1. This pin will be removed in a subsequent release.
Key Features and Updates Since 0.28.0
- New Features
- FEAT-#7162: Pin pandas to 2.2.1 (87d147f)
Contributors
@sfc-gh-dpetersohn
Modin 0.28.0
This release introduces the modin.pandas.api.extensions module, faster implementations of the merge and
groupby.rolling (by default) functions, and new functions to work with Ray Dataset: to/from_ray_dataset.
It also includes some other new features, performance optimizations and bug fixes.
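The faster groupby.rolling path mentioned above is reached through the usual pandas-style call chain. A minimal sketch with toy data; as an assumption of the example, it falls back to plain pandas when Modin is not installed, and it does not touch any Modin configuration.

```python
# Prefer Modin; fall back to pandas, whose API Modin mirrors.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

df = pd.DataFrame(
    {"sensor": ["x", "x", "x", "y", "y"], "value": [1.0, 2.0, 3.0, 10.0, 20.0]}
)

# Two-row rolling sum, computed separately within each sensor group.
# The result is indexed by (sensor, original row label).
rolled = df.groupby("sensor")["value"].rolling(2).sum()
print(rolled)
```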
Key Features and Updates Since 0.27.0
- Stability and Bugfixes
- FIX-#6935: Fix merge when right operand is an empty dataframe (#6941)
- FIX-#6936: Fix read_parquet when dataset is created with to_parquet and index=False (#6937)
- FIX-#6944: Apply isort formatting for scripts from tutorials (#6945)
- FIX-#6946: Remove needs: [lint-black-isort, ...] (#6947)
- FIX-#6948: Fix groupby when Modin dataframe has several column partitions (#6951)
- FIX-#6952: Use render_as_string to get sqlalchemy engine url (#6953)
- FIX-#6968: Align API with pandas (#6969)
- FIX-#6974: Always use actual pandas version in test_all_urls_exist (#6975)
- FIX-#6982: Updating data in notebooks from yellow taxi to green taxi dataset (#6993)
- FIX-#6984: Ensure the results of inplace operations materialize (for tests) (#6985)
- Performance enhancements
- Refactor Codebase
- REFACTOR-#6856: Rename read_pickle_distributed/to_pickle_distributed to read_pickle_glob/to_pickle_glob (#6957)
- REFACTOR-#6939: Make modin.pandas.DataFrame._to_pandas a public method (#6940)
- REFACTOR-#6958: Remove DataFrame.to_pickle_distributed in favour of DataFrame.modin.to_pickle_distributed (#6959)
- REFACTOR-#7002: Get more information about exceptions from eval_general utility (#7003)
- REFACTOR-#7008: Remove check_exception_type argument of eval_general function (#7009)
- REFACTOR-#7013: Move to_pandas and to_ray_dataset into modin namespace (#7014)
- REFACTOR-#7017: Align to_hdf and hist signatures to pandas (#7018)
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#3044: Create Extensions Module in Modin (#6961)
- FEAT-#4622: Unify data type of log_level in logging module (#6992)
- FEAT-#6913: Support sqlalchemy connectables in read_sql by getting connection url (#6956)
- FEAT-#6934: Support include_groups=False parameter in groupby.apply() (#6938)
- FEAT-#6942: Enable range-partitioning impl for groupby().rolling() by default (#6943)
- FEAT-#6965: Implement .merge() using range-partitioning implementation (#6966)
- FEAT-#6970: Implement to/from_ray_dataset functions (#6971)
- FEAT-#6983: Add Pluggable Documentation Module Support (#6986)
- FEAT-#7001: Do not force materialization in MetaList.__getitem__() (#7006)
Contributors
@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-dpetersohn
@tochigiv
Modin 0.27.0
This release updates pandas to 2.2, introduces a lazy execution mode on Ray and new functions that support glob
syntax, and speeds up several more groupby cases. It also includes some other new features, performance
optimizations and many bug fixes.
Key Features and Updates Since 0.26.0
- Stability and Bugfixes
- FIX-#2405: Make sure named aggregation works for Series objects (#6892)
- FIX-#5925: Put a sorting-hack into groupby tests to hide #6875 bug (#6896)
- FIX-#6830: Pass AWS related env vars to mpiexec (#6867)
- FIX-#6840: Call tolist function in DtypesDescriptor._merge_dtypes (#6844)
- FIX-#6855: Make sure read_parquet works with integer columns for pyarrow engine (#6874)
- FIX-#6879: Convert the right DF to single partition before broadcasting in query_compiler.merge (#6880)
- FIX-#6881: Make sure astype works correctly with int32 and float32 dtypes (#6884)
- FIX-#6897: Preprocess kernel function that aligns columns in groupby (#6898)
- FIX-#6897: Revert unidist specific fix for groupby (#6902)
- FIX-#6899: Avoid sending lazy categorical proxies to workers (#6900)
- FIX-#6904: Align levels of partially known dtypes with MultiIndex labels (#6905)
- FIX-#6911: Remove unidist specific workaround in .from_pandas() (#6912)
- FIX-#6916: Unpin pydantic dependency (#6917)
- FIX-#6924: HDK: Use JoinNode instead of MaskNode for non-range row_position (#6926)
- Performance enhancements
- Refactor Codebase
- REFACTOR-#6293: Corrected missmatch to mismatch in ErrorMessage.missmatch_with_pandas method (#6901)
- REFACTOR-#6812: Remove PyarrowOnRay execution in favour of pyarrow-backed pandas dataframes (#6848)
- REFACTOR-#6833: Remove SocksProxy, DoLogRpyc, DoTraceRpyc outdated classes (#6834)
- REFACTOR-#6845: Fix import issues found by CodeQL (#6837)
- REFACTOR-#6852: Remove OrderedDict in favor of builtin dict (#6853)
- REFACTOR-#6858: Rename _get_dimensions and change arguments (#6859)
- REFACTOR-#6889: Define __all__ in modin.config.__init__.py (#6886)
- REFACTOR-#6903: Remove duplicated definitions of create_test_series (#6910)
- REFACTOR-#6918: Docstring and type hints fixes (#6925)
- Update testing suite
- TEST-#6708: Create test files using tmp_path fixture (#6709)
- TEST-#6777: Make to_csv tests on Unidist more stable (for test-all-unidist CI job) (#6851)
- TEST-#6830: Use local s3 server instead of public s3 buckets (#6863)
- TEST-#6846: Skip unstable Unidist to_csv tests (#6847)
- TEST-#6868: Remove tests for gs remote protocol since we rely on fsspec (#6882)
- TEST-#6885: Switch to black>=24.1.0 (#6887)
- TEST-#6893: Added support for pytest 8.0.0 (#6894)
- TEST-#6920: Remove testing for Ray client (#6921)
- Documentation improvements
- New Features
- FEAT-#3450: Implement read_json_glob and to_json_glob (#6873)
- FEAT-#5809: New implementation of the Ray lazy execution queue (#6731)
- FEAT-#5925: Enable grouping on categoricals with range-partitioning impl (#6862)
- FEAT-#6382: Execute bitwise NOT (~) operations on HDK (#6383)
- FEAT-#6398: Improved performance of list-like objects insertion into HDK DataFrames (#6412)
- FEAT-#6830: Remove public s3 bucket reference (#6829)
- FEAT-#6831: Implement read_parquet_glob and to_parquet_glob (#6854)
- FEAT-#6832: Implement read_xml_glob, to_xml_glob (#6930)
- FEAT-#6835: Do not put binary functions to the Ray storage multiple times (#6836)
- FEAT-#6838: Prefer lazy execution for binary operations with scalar (#6839)
- FEAT-#6841: Fixing ray anti pattern with .length() and .width() being called in a loop (#6842)
- FEAT-#6849: Removing to_pandas call in merge and join functions (#6850)
- FEAT-#6883: Support grouping on a Series with range-partitioning impl (#6888)
- FEAT-#6906: Update to pandas 2.2.* (#6907)
- FEAT-#6908: Remove the warning regarding engine initialization (#6909)
- FEAT-#6914: Add a config for setting a number of threads per Dask worker (#6915)
- FEAT-#6918: Add auto mode to the lazy execution. (#6919)
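FEAT-#6883 extends the range-partitioning groupby to the case where the grouping key is a separate Series rather than a column name. The call itself is the ordinary pandas API. A toy sketch; as an assumption of the example, it falls back to plain pandas when Modin is not installed, and whether Modin takes the range-partitioning path depends on its configuration, which this snippet leaves untouched.

```python
# Prefer Modin; fall back to pandas, whose API Modin mirrors.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

df = pd.DataFrame({"amount": [10, 20, 30, 40]})
# The grouping key is a Series aligned on df's index, not a column of df.
region = pd.Series(["east", "west", "east", "west"], name="region")

totals = df.groupby(region).sum()
print(totals)
```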
Contributors
@AndreyPavlenko
@YarShev
@anmyachev
@arunjose696
@dchigarev
@leshikus
@vedant
Modin 0.26.1
This release includes a fix for the concat function.
Key Features and Updates Since 0.26.0
- Stability and Bugfixes
- Update testing suite
- New Features
Contributors
Modin 0.26.0
This release introduces a new, faster implementation of groupby.apply, as well as many performance fixes related to improving asynchronous execution, a new namespace for accessing experimental functions (for example, DataFrame.modin.to_pickle_distributed), a fix for a long-standing problem with using Modin objects inside UDFs passed to apply, and many other fixes.
Note: to get Modin on MPI through unidist (as of unidist 0.5.0) fully working when installing with pip, a working MPI implementation must be installed beforehand.
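The apply fix mentioned above (FIX-#6594) concerns UDFs that refer to another Modin object. A minimal sketch of that pattern with illustrative data; as an assumption of the example, it falls back to plain pandas when Modin is not installed, in which case it only shows the intended semantics.

```python
# Prefer Modin; fall back to pandas, whose API Modin mirrors.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

orders = pd.DataFrame({"item": ["apple", "pear", "plum"], "qty": [3, 1, 4]})
# A second Series that the UDF reads from.
unit_price = pd.Series({"apple": 0.5, "pear": 0.75, "plum": 0.25})


def line_total(row):
    # Look up the price of this row's item in the other Series.
    return row["qty"] * unit_price[row["item"]]


orders["total"] = orders.apply(line_total, axis=1)
print(orders)
```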
Key Features and Updates Since 0.25.0
- Stability and Bugfixes
- FIX-#4355: Fix rename algebraic operator to avoid copying (#4356)
- FIX-#6594: Fix usage of Modin objects inside UDFs for apply (#6673)
- FIX-#6664: Use @lazy_metadata_decorator for PandasDataFrame.finalize (#6720)
- FIX-#6684: Adapt to pandas 2.1.2 (#6685)
- FIX-#6687: Explicitly add users to CODEOWNERS (#6688)
- FIX-#6693: Revert creating an additional copy in astype op (#6692)
- FIX-#6703: Don't use set_index_name(None) (#6698)
- FIX-#6732: Fix inferring result dtypes for binary operations (#6737)
- FIX-#6745: Pin unidist <= 0.4.1 (#6746)
- FIX-#6752: Preserve dtypes cache on .insert() (#6757)
- FIX-#6768: Make sure to_numpy uses **kwargs after #6704 (#6769)
- FIX-#6771: Avoid ValueError: assignment destination is read-only for cumsum (#6772)
- FIX-#6773: Make sure _to_pandas returns mutable pandas objects (#6775)
- FIX-#6774: Modify conditions for loc to get similar behavior to pandas (#6798)
- FIX-#6778: Read parquet files without file extensions using fastparquet (#6790)
- FIX-#6779: Pass only one indexer into Series.__getitem__ (#6780)
- FIX-#6781: Use pandas.api.types.pandas_dtype to convert to valid numpy and pandas only dtypes (#6788)
- FIX-#6782: Filter pandas warnings when precomputing dtypes (#6811)
- FIX-#6786: Properly d2p for cross DataFrame.join (#6787)
- FIX-#6791: Pass additional environment variables to MPI workers (#6792)
- FIX-#6799: Allow creating incomplete ModinIndex objects (#6800)
- FIX-#6822: Do not propagate NotImplementedError to a user on a set_columns() with dupl labels (#6823)
- FIX-#6824: Invalidate ModinIndex._lengths_id on empty partitions filtering (#6825)
- Performance enhancements
- PERF-#4777: Don't use copy=True parameter for concat calls inside to_pandas (#4778)
- PERF-#4804: Preserve lengths/widths caches in broadcast_apply_full_axis (#6760)
- PERF-#6666: Avoid internal reset_index for left merge (#6665)
- PERF-#6668: Use copy=False for internal usage of set_axis (#6667)
- PERF-#6669: Avoid one extra copy() call for Series.reset_index (#6670)
- PERF-#6671: Don't iterate over the result of the Series.tolist function (#6672)
- PERF-#6690: Use sync_labels=False for rank function (#6689)
- PERF-#6694: Use lazy_map_partitions() for dtypes conversion (#6695)
- PERF-#6696: Use cached dtypes in fillna when possible. (#6697)
- PERF-#6701: Use get_axis internal function instead of axes property (#6700)
- PERF-#6702: Don't materialize axes when calling to_numpy (#6699)
- PERF-#6710: Don't materialize index in _groupby_shuffle internal function (#6707)
- PERF-#6712: Copy _shape_hint in query_compiler.copy function (#6713)
- PERF-#6714: Assign qc._shape_hint = column in columnarize function (#6715)
- PERF-#6716: Avoid materializing axes in _filter_empties (#6717)
- PERF-#6718: Use _get_axis_lengths function instead of _axes_lengths property (#6719)
- PERF-#6721: Use keep_partitioning=True for duplicated implementation (#6722)
- PERF-#6723: Use _shape_hint = "column" in DataFrame.squeeze (#6724)
- PERF-#6727: Remove remaining result.name = None in groupby code (#6726)
- PERF-#6728: In the case of narrow dataframes, it is cheaper to convert partitions to numpy in the main process. (#6704)
- PERF-#6747: Preserve columns/dtypes cache when merging on a single index level (#6748)
- PERF-#6749: Preserve partial dtype for the result of reset_index() (#6751)
- PERF-#6753: Preserve dtypes cache on .__setitem__() (#6758)
- PERF-#6754: Merge partial dtype caches on .concat(axis=0) (#6759)
- PERF-#6756: Don't materialize index when sorting (#6755)
- PERF-#6762: Carry dtypes information in lazy indices (#6763)
- Refactor Codebase
- REFACTOR-#0000: Cleanup one todo and flake8 issues in modin/utils.py (#6826)
- REFACTOR-#6739: Use execution_wrapper instead of directly addressing DaskWrapper (#6740)
- REFACTOR-#6805: Move all IO functions to modin.pandas.io module (#6806)
- REFACTOR-#6807: Rename experimental groupby and experimental numpy variables (#6809)
- REFACTOR-#6815: Move experimental parsers into modin.experimental folder (#6813)
- REFACTOR-#6818: Don't implicitly enable experimental mode (#6817)
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#5836: Introduce 'partial' dtypes cache (#6663)
- FEAT-#6735: Make Modin on MPI through unidist component more obvious (#6736)
- FEAT-#6767: Provide the ability to use experimental functionality when experimental mode is not enabled globally via an environment variable (#6764)
- FEAT-#6784: Add d2p implementations for DataFrame.__rdivmod__/__divmod__ (#6785)
- FEAT-#6801: Add modin.pandas.error module (#6802)
- FEAT-#6803: Enable range-partitioning impl for groupby.apply() by default (#6804)
- FEAT-#6820: Make sure IO functions work with path-like filenames (#6821)
Contributors
@AndreyPavlenko
@JignyasAnand
@RehanSD
@YarShev
@anmyachev
@devin-petersohn
@dchigarev
@mvashishtha
@seydar
Modin 0.24.1.post0
Hotfix for Unidist.
Key Features and Updates Since 0.24.1
- Stability and Bugfixes
Note: broken pip wheel, use https://github.com/modin-project/modin/releases/tag/0.24.1.post1 instead
Contributors
Modin 0.25.1
Hotfix for Unidist.
Key Features and Updates Since 0.25.0
- Stability and Bugfixes