Merge branch 'main' into fix/type_coercion_for_unobserved_categories
undermyumbrella1 committed Apr 16, 2024
2 parents b3520d1 + 888b6bc commit be71a4d
Showing 222 changed files with 3,117 additions and 2,376 deletions.
4 changes: 0 additions & 4 deletions .circleci/config.yml
@@ -72,10 +72,6 @@ jobs:
no_output_timeout: 30m # Sometimes the tests won't generate any output, make sure the job doesn't get killed by that
command: |
pip3 install cibuildwheel==2.15.0
# When this is a nightly wheel build, allow picking up NumPy 2.0 dev wheels:
if [[ "$IS_SCHEDULE_DISPATCH" == "true" || "$IS_PUSH" != 'true' ]]; then
export CIBW_ENVIRONMENT="PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
fi
cibuildwheel --prerelease-pythons --output-dir wheelhouse
environment:
2 changes: 1 addition & 1 deletion .github/workflows/code-checks.yml
@@ -85,7 +85,7 @@ jobs:
echo "PYTHONPATH=$PYTHONPATH" >> $GITHUB_ENV
if: ${{ steps.build.outcome == 'success' && always() }}

- name: Typing + pylint
- name: Typing
uses: pre-commit/action@v3.0.1
with:
extra_args: --verbose --hook-stage manual --all-files
15 changes: 1 addition & 14 deletions .github/workflows/wheels.yml
@@ -139,27 +139,14 @@ jobs:
shell: bash -el {0}
run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

- name: Build normal wheels
if: ${{ (env.IS_SCHEDULE_DISPATCH != 'true' || env.IS_PUSH == 'true') }}
- name: Build wheels
uses: pypa/cibuildwheel@v2.17.0
with:
package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
env:
CIBW_PRERELEASE_PYTHONS: True
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

- name: Build nightly wheels (with NumPy pre-release)
if: ${{ (env.IS_SCHEDULE_DISPATCH == 'true' && env.IS_PUSH != 'true') }}
uses: pypa/cibuildwheel@v2.17.0
with:
package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
env:
# The nightly wheels should be build witht he NumPy 2.0 pre-releases
# which requires the additional URL.
CIBW_ENVIRONMENT: PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
CIBW_PRERELEASE_PYTHONS: True
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

- name: Set up Python
uses: mamba-org/setup-micromamba@v1
with:
37 changes: 6 additions & 31 deletions .pre-commit-config.yaml
@@ -16,10 +16,10 @@ ci:
autofix_prs: false
autoupdate_schedule: monthly
# manual stage hooks
skip: [pylint, pyright, mypy]
skip: [pyright, mypy]
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.1
rev: v0.3.4
hooks:
- id: ruff
args: [--exit-non-zero-on-fix]
@@ -30,16 +30,10 @@ repos:
files: ^pandas
exclude: ^pandas/tests
args: [--select, "ANN001,ANN2", --fix-only, --exit-non-zero-on-fix]
- id: ruff
name: ruff-use-pd_array-in-core
alias: ruff-use-pd_array-in-core
files: ^pandas/core/
exclude: ^pandas/core/api\.py$
args: [--select, "ICN001", --exit-non-zero-on-fix]
- id: ruff-format
exclude: ^scripts
- repo: https://github.com/jendrikseipp/vulture
rev: 'v2.10'
rev: 'v2.11'
hooks:
- id: vulture
entry: python scripts/run_vulture.py
@@ -73,31 +67,12 @@ repos:
- id: fix-encoding-pragma
args: [--remove]
- id: trailing-whitespace
- repo: https://github.com/pylint-dev/pylint
rev: v3.0.1
hooks:
- id: pylint
stages: [manual]
args: [--load-plugins=pylint.extensions.redefined_loop_name, --fail-on=I0021]
- id: pylint
alias: redefined-outer-name
name: Redefining name from outer scope
files: ^pandas/
exclude: |
(?x)
^pandas/tests # keep excluded
|/_testing/ # keep excluded
|^pandas/util/_test_decorators\.py # keep excluded
|^pandas/_version\.py # keep excluded
|^pandas/conftest\.py # keep excluded
args: [--disable=all, --enable=redefined-outer-name]
stages: [manual]
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/asottile/pyupgrade
rev: v3.15.0
rev: v3.15.2
hooks:
- id: pyupgrade
args: [--py39-plus]
@@ -116,7 +91,7 @@ repos:
hooks:
- id: sphinx-lint
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v17.0.6
rev: v18.1.2
hooks:
- id: clang-format
files: ^pandas/_libs/src|^pandas/_libs/include
1 change: 1 addition & 0 deletions asv_bench/asv.conf.json
@@ -41,6 +41,7 @@
// pip (with all the conda available packages installed first,
// followed by the pip installed packages).
"matrix": {
"pip+build": [],
"Cython": ["3.0"],
"matplotlib": [],
"sqlalchemy": [],
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/categoricals.py
@@ -24,7 +24,7 @@ def setup(self):
self.codes = np.tile(range(len(self.categories)), N)

self.datetimes = pd.Series(
pd.date_range("1995-01-01 00:00:00", periods=N / 10, freq="s")
pd.date_range("1995-01-01 00:00:00", periods=N // 10, freq="s")
)
self.datetimes_with_nat = self.datetimes.copy()
self.datetimes_with_nat.iloc[-1] = pd.NaT
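For context on the `N / 10` → `N // 10` fixes here and in the `timeseries.py` benchmark below: in Python 3, `/` always returns a float, and recent pandas versions deprecate or reject a non-integer `periods` argument to `date_range`. A minimal sketch of the difference, assuming `N = 1000` (the exact error depends on the pandas version):

```python
import pandas as pd

N = 1000

# Floor division yields an int, which is what `periods` expects.
idx = pd.date_range("1995-01-01 00:00:00", periods=N // 10, freq="s")
print(len(idx))  # 100

# True division yields a float (100.0); newer pandas warns on or rejects this:
# pd.date_range("1995-01-01 00:00:00", periods=N / 10, freq="s")
```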
24 changes: 24 additions & 0 deletions asv_bench/benchmarks/frame_methods.py
@@ -862,4 +862,28 @@ def time_last_valid_index(self, dtype):
self.df.last_valid_index()


class Update:
def setup(self):
rng = np.random.default_rng()
self.df = DataFrame(rng.uniform(size=(1_000_000, 10)))

idx = rng.choice(range(1_000_000), size=1_000_000, replace=False)
self.df_random = DataFrame(self.df, index=idx)

idx = rng.choice(range(1_000_000), size=100_000, replace=False)
cols = rng.choice(range(10), size=2, replace=False)
self.df_sample = DataFrame(
rng.uniform(size=(100_000, 2)), index=idx, columns=cols
)

def time_to_update_big_frame_small_arg(self):
self.df.update(self.df_sample)

def time_to_update_random_indices(self):
self.df_random.update(self.df_sample)

def time_to_update_small_frame_big_arg(self):
self.df_sample.update(self.df)


from .pandas_vb_common import setup # noqa: F401 isort:skip
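For readers skimming the new `Update` benchmark: `DataFrame.update` modifies a frame in place with non-NA values from another frame, aligning on index and columns. A small illustration of the semantics being timed, using hypothetical frames far smaller than the benchmark's:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
other = pd.DataFrame({"a": [np.nan, 20.0]}, index=[1, 2])

# update() aligns `other` on index/columns and overwrites in place;
# NaN entries in `other` leave the original values untouched.
df.update(other)
print(df)
#       a    b
# 0   1.0  4.0
# 1   2.0  5.0
# 2  20.0  6.0
```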
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/timeseries.py
@@ -29,7 +29,7 @@ def setup(self, index_type):
"dst": date_range(
start="10/29/2000 1:00:00", end="10/29/2000 1:59:59", freq="s"
),
"repeated": date_range(start="2000", periods=N / 10, freq="s").repeat(10),
"repeated": date_range(start="2000", periods=N // 10, freq="s").repeat(10),
"tz_aware": date_range(start="2000", periods=N, freq="s", tz="US/Eastern"),
"tz_local": date_range(
start="2000", periods=N, freq="s", tz=dateutil.tz.tzlocal()
52 changes: 3 additions & 49 deletions ci/code_checks.sh
@@ -83,8 +83,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.DataFrame.__iter__ SA01" \
-i "pandas.DataFrame.assign SA01" \
-i "pandas.DataFrame.at_time PR01" \
-i "pandas.DataFrame.axes SA01" \
-i "pandas.DataFrame.backfill PR01,SA01" \
-i "pandas.DataFrame.bfill SA01" \
-i "pandas.DataFrame.columns SA01" \
-i "pandas.DataFrame.copy SA01" \
@@ -99,12 +97,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.DataFrame.kurt RT03,SA01" \
-i "pandas.DataFrame.kurtosis RT03,SA01" \
-i "pandas.DataFrame.last_valid_index SA01" \
-i "pandas.DataFrame.mask RT03" \
-i "pandas.DataFrame.max RT03" \
-i "pandas.DataFrame.mean RT03,SA01" \
-i "pandas.DataFrame.median RT03,SA01" \
-i "pandas.DataFrame.min RT03" \
-i "pandas.DataFrame.pad PR01,SA01" \
-i "pandas.DataFrame.plot PR02,SA01" \
-i "pandas.DataFrame.pop SA01" \
-i "pandas.DataFrame.prod RT03" \
@@ -119,19 +115,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.DataFrame.sparse.to_dense SA01" \
-i "pandas.DataFrame.std PR01,RT03,SA01" \
-i "pandas.DataFrame.sum RT03" \
-i "pandas.DataFrame.swapaxes PR01,SA01" \
-i "pandas.DataFrame.swaplevel SA01" \
-i "pandas.DataFrame.to_feather SA01" \
-i "pandas.DataFrame.to_markdown SA01" \
-i "pandas.DataFrame.to_parquet RT03" \
-i "pandas.DataFrame.to_period SA01" \
-i "pandas.DataFrame.to_timestamp SA01" \
-i "pandas.DataFrame.tz_convert SA01" \
-i "pandas.DataFrame.tz_localize SA01" \
-i "pandas.DataFrame.unstack RT03" \
-i "pandas.DataFrame.value_counts RT03" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
-i "pandas.DataFrame.where RT03" \
-i "pandas.DatetimeIndex.ceil SA01" \
-i "pandas.DatetimeIndex.date SA01" \
-i "pandas.DatetimeIndex.day SA01" \
@@ -165,11 +153,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.DatetimeTZDtype SA01" \
-i "pandas.DatetimeTZDtype.tz SA01" \
-i "pandas.DatetimeTZDtype.unit SA01" \
-i "pandas.ExcelFile PR01,SA01" \
-i "pandas.ExcelFile.parse PR01,SA01" \
-i "pandas.ExcelWriter SA01" \
-i "pandas.Float32Dtype SA01" \
-i "pandas.Float64Dtype SA01" \
-i "pandas.Grouper PR02,SA01" \
-i "pandas.HDFStore.append PR01,SA01" \
-i "pandas.HDFStore.get SA01" \
@@ -226,7 +209,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Index.to_list RT03" \
-i "pandas.Index.union PR07,RT03,SA01" \
-i "pandas.Index.unique RT03" \
-i "pandas.Index.value_counts RT03" \
-i "pandas.Index.view GL08" \
-i "pandas.Int16Dtype SA01" \
-i "pandas.Int32Dtype SA01" \
@@ -400,7 +382,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Series.list.flatten SA01" \
-i "pandas.Series.list.len SA01" \
-i "pandas.Series.lt PR07,SA01" \
-i "pandas.Series.mask RT03" \
-i "pandas.Series.max RT03" \
-i "pandas.Series.mean RT03,SA01" \
-i "pandas.Series.median RT03,SA01" \
@@ -477,17 +458,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Series.to_frame SA01" \
-i "pandas.Series.to_list RT03" \
-i "pandas.Series.to_markdown SA01" \
-i "pandas.Series.to_period SA01" \
-i "pandas.Series.to_string SA01" \
-i "pandas.Series.to_timestamp RT03,SA01" \
-i "pandas.Series.truediv PR07" \
-i "pandas.Series.tz_convert SA01" \
-i "pandas.Series.tz_localize SA01" \
-i "pandas.Series.unstack SA01" \
-i "pandas.Series.update PR07,SA01" \
-i "pandas.Series.value_counts RT03" \
-i "pandas.Series.var PR01,RT03,SA01" \
-i "pandas.Series.where RT03" \
-i "pandas.SparseDtype SA01" \
-i "pandas.Timedelta PR07,SA01" \
-i "pandas.Timedelta.as_unit SA01" \
@@ -681,60 +655,40 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.core.groupby.DataFrameGroupBy.__iter__ RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.agg RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.aggregate RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.apply RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.boxplot PR07,RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.cummax RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.cummin RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.cumprod RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.cumsum RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.filter RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.filter SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.get_group RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.groups SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.hist RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.indices SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.max SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.mean RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.median SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.min SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.nth PR02" \
-i "pandas.core.groupby.DataFrameGroupBy.nunique RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.nunique SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.ohlc SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.plot PR02,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.prod SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.rank RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.resample RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.sem SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.skew RT03" \
-i "pandas.core.groupby.DataFrameGroupBy.sum SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.transform RT03" \
-i "pandas.core.groupby.SeriesGroupBy.__iter__ RT03,SA01" \
-i "pandas.core.groupby.SeriesGroupBy.agg RT03" \
-i "pandas.core.groupby.SeriesGroupBy.aggregate RT03" \
-i "pandas.core.groupby.SeriesGroupBy.apply RT03" \
-i "pandas.core.groupby.SeriesGroupBy.cummax RT03" \
-i "pandas.core.groupby.SeriesGroupBy.cummin RT03" \
-i "pandas.core.groupby.SeriesGroupBy.cumprod RT03" \
-i "pandas.core.groupby.SeriesGroupBy.cumsum RT03" \
-i "pandas.core.groupby.SeriesGroupBy.filter PR01,RT03,SA01" \
-i "pandas.core.groupby.SeriesGroupBy.filter PR01,SA01" \
-i "pandas.core.groupby.SeriesGroupBy.get_group RT03,SA01" \
-i "pandas.core.groupby.SeriesGroupBy.groups SA01" \
-i "pandas.core.groupby.SeriesGroupBy.indices SA01" \
-i "pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing SA01" \
-i "pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing SA01" \
-i "pandas.core.groupby.SeriesGroupBy.max SA01" \
-i "pandas.core.groupby.SeriesGroupBy.mean RT03" \
-i "pandas.core.groupby.SeriesGroupBy.median SA01" \
-i "pandas.core.groupby.SeriesGroupBy.min SA01" \
-i "pandas.core.groupby.SeriesGroupBy.nth PR02" \
-i "pandas.core.groupby.SeriesGroupBy.ohlc SA01" \
-i "pandas.core.groupby.SeriesGroupBy.plot PR02,SA01" \
-i "pandas.core.groupby.SeriesGroupBy.prod SA01" \
-i "pandas.core.groupby.SeriesGroupBy.rank RT03" \
-i "pandas.core.groupby.SeriesGroupBy.resample RT03" \
-i "pandas.core.groupby.SeriesGroupBy.sem SA01" \
-i "pandas.core.groupby.SeriesGroupBy.skew RT03" \
-i "pandas.core.groupby.SeriesGroupBy.sum SA01" \
-i "pandas.core.groupby.SeriesGroupBy.transform RT03" \
-i "pandas.core.resample.Resampler.__iter__ RT03,SA01" \
-i "pandas.core.resample.Resampler.ffill RT03" \
-i "pandas.core.resample.Resampler.get_group RT03,SA01" \
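The long `-i` list above is an ignore-list of known docstring-validation failures, keyed by numpydoc error codes (for example `SA01` "See Also section not found", `RT03` "Return value has no description", `PR01`/`PR07` for undocumented or undescribed parameters, `GL08` "The object does not have a docstring"). Entries disappear from the list when the docstring has been fixed or the API itself has been removed. A hedged sketch of checking a single object with numpydoc's validator, assuming `numpydoc` is installed; pandas' own `scripts/validate_docstrings.py` wraps this machinery with extra checks:

```python
from numpydoc.validate import validate

# validate() takes the importable path of the object and returns a report
# dict whose "errors" entry lists (code, message) pairs.
report = validate("pandas.DataFrame.mean")
for code, message in report["errors"]:
    print(code, message)  # e.g. RT03 Return value has no description
```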
1 change: 0 additions & 1 deletion doc/redirects.csv
@@ -1422,7 +1422,6 @@ reference/api/pandas.Series.transpose,pandas.Series.T
reference/api/pandas.Index.transpose,pandas.Index.T
reference/api/pandas.Index.notnull,pandas.Index.notna
reference/api/pandas.Index.tolist,pandas.Index.to_list
reference/api/pandas.arrays.PandasArray,pandas.arrays.NumpyExtensionArray
reference/api/pandas.core.groupby.DataFrameGroupBy.backfill,pandas.core.groupby.DataFrameGroupBy.bfill
reference/api/pandas.core.groupby.GroupBy.backfill,pandas.core.groupby.DataFrameGroupBy.bfill
reference/api/pandas.core.resample.Resampler.backfill,pandas.core.resample.Resampler.bfill
2 changes: 2 additions & 0 deletions doc/source/getting_started/install.rst
@@ -269,6 +269,8 @@ SciPy 1.10.0 computation Miscellaneous stati
xarray 2022.12.0 computation pandas-like API for N-dimensional data
========================= ================== =============== =============================================================

.. _install.excel_dependencies:

Excel files
^^^^^^^^^^^

6 changes: 6 additions & 0 deletions doc/source/getting_started/intro_tutorials/02_read_write.rst
@@ -111,6 +111,12 @@ strings (``object``).

My colleague requested the Titanic data as a spreadsheet.

.. note::
If you want to use :func:`~pandas.to_excel` and :func:`~pandas.read_excel`,
you need to install an Excel reader as outlined in the
:ref:`Excel files <install.excel_dependencies>` section of the
installation documentation.

.. ipython:: python
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
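The new note points readers at the optional Excel dependencies. A minimal sketch of the round trip it enables, assuming an engine such as `openpyxl` is installed:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Allen", "Bonnell"], "age": [29, 58]})

# Both calls need an optional Excel engine (e.g. openpyxl) installed.
df.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
roundtrip = pd.read_excel("titanic.xlsx", sheet_name="passengers")
print(roundtrip.equals(df))  # True
```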
10 changes: 5 additions & 5 deletions doc/source/user_guide/basics.rst
@@ -476,15 +476,15 @@ For example:
.. ipython:: python
df
df.mean(0)
df.mean(1)
df.mean(axis=0)
df.mean(axis=1)
All such methods have a ``skipna`` option signaling whether to exclude missing
data (``True`` by default):

.. ipython:: python
df.sum(0, skipna=False)
df.sum(axis=0, skipna=False)
df.sum(axis=1, skipna=True)
Combined with the broadcasting / arithmetic behavior, one can describe various
@@ -495,8 +495,8 @@ standard deviation of 1), very concisely:
ts_stand = (df - df.mean()) / df.std()
ts_stand.std()
xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)
xs_stand.std(1)
xs_stand = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
xs_stand.std(axis=1)
Note that methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod`
preserve the location of ``NaN`` values. This is somewhat different from
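The doc change above swaps positional axis arguments (`df.mean(1)`) for explicit keywords (`df.mean(axis=1)`), which reads more clearly and matches pandas' general move toward keyword-only arguments. As a quick reminder of the convention:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

print(df.mean(axis=0))  # aggregate down the rows: one mean per column
print(df.mean(axis=1))  # aggregate across the columns: one mean per row
```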
