
Micro optimizations to improve indexing #9002

Open · wants to merge 2 commits into main

Conversation

@hmaarrfk (Contributor) commented May 5, 2024

These are some micro optimizations to improve indexing within xarray objects.

Generally speaking, the use of isinstance in Python is pretty slow. It's pretty hard to avoid given the amount of "niceness" that xarray provides.
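A quick way to see the cost in question (an illustrative micro-benchmark, not from this PR; absolute numbers vary by machine and Python version):

import timeit

# isinstance against a tuple of types vs. an exact type check
print(timeit.timeit("isinstance(x, (int, float, complex))", setup="x = 1.0"))
print(timeit.timeit("type(x) is float", setup="x = 1.0"))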

However, I feel like these changes are "ok", and they kick up the performance of
#9001

from

h5netcdf  lazy      : 100%|███████████| 100000/100000 [00:11<00:00, 8951.20it/s]

to

h5netcdf  lazy      : 100%|███████████| 100000/100000 [00:09<00:00, 10079.44it/s]

With my "controversial" fix for _apply_indexes_fast:

h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 14410.26it/s]

without it:

h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:07<00:00, 12895.45it/s]

A nice 40% boost!

Benchmarked on

model name      : AMD Ryzen Threadripper 2950X 16-Core Processor

It's not the best processor, but it doesn't throttle!

I'll mark this as not draft when it is ready for review. Done!

Concepts used (see the sketch after this list):

  • Fastpath added for BasicIndexer construction.
  • Tuple extension used instead of list building plus a tuple cast. I believe CPython improved tuple performance somewhere around the 3.9/3.10 era, so this is now "faster" than the old 3.6 tricks.
  • LazilyIndexedArray now stores its shape at creation time. Otherwise the shape was recomputed many times during indexing operations that use operators like len or shape to broadcast the indexing tuple.
  • isel now avoids _id_coord_names by avoiding xindexes.group_by_index(), giving a 20% speedup.
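A minimal sketch of the first three ideas (hypothetical names; not the exact xarray implementation):

import numpy as np

class SketchIndexer:
    __slots__ = ("_key",)

    def __init__(self, key, *, _fastpath=False):
        if _fastpath:
            # The caller promises `key` is already a validated tuple.
            self._key = key
            return
        new_key = ()
        for k in key:
            if isinstance(k, (int, np.integer)):
                k = int(k)
            elif not isinstance(k, slice):
                raise TypeError(f"unexpected indexer type: {type(k)}")
            new_key += (k,)  # tuple extension instead of list.append + tuple()
        self._key = new_key

class SketchLazyArray:
    def __init__(self, array, indexer):
        self.array = array
        self.indexer = indexer
        # Compute the shape once at construction instead of in a property
        # that re-derives it on every len()/shape access during indexing.
        self.shape = tuple(
            len(range(*k.indices(size)))
            for size, k in zip(array.shape, indexer._key)
            if isinstance(k, slice)
        )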

Findings

  • _id_coord_names has terrible performance... I had to avoid it to get isel to speed up. It is likely that other code paths can be sped up the same way, but I don't know how to exercise those code paths, so I'm hesitant to go down that rabbit hole myself.

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

xref: #2799
xref: #7045

@max-sixty (Collaborator) left a comment:

Looks great!

@@ -383,11 +383,14 @@ class BasicIndexer(ExplicitIndexer):
 
     __slots__ = ()
 
-    def __init__(self, key: tuple[int | np.integer | slice, ...]):
+    def __init__(self, key: tuple[int | np.integer | slice, ...], *, _fastpath=False):
Collaborator:

nit: I think we generally use fastpath rather than _fastpath, though no strong view. We could also have a different method from_key, given there isn't much overlap.
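A sketch of that from_key alternative (hypothetical; not what the PR ultimately implements):

class Indexer:
    __slots__ = ("_key",)

    def __init__(self, key):
        # Normal path: full validation/coercion happens here.
        self._key = tuple(key)

    @classmethod
    def from_key(cls, key: tuple) -> "Indexer":
        # Alternate constructor: the caller guarantees `key` is already a
        # validated tuple, so skip __init__ entirely.
        self = object.__new__(cls)
        self._key = key
        return self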

Contributor Author:

Sure, happy to change these variable names.

Contributor Author:

Yeah, I'm thinking through some of the implications here; let's review at the end of the day.

@hmaarrfk hmaarrfk force-pushed the fastestpath branch 4 times, most recently from d975fba to c2a065d Compare May 5, 2024 18:39
return self.__id_coord_names

def _create_id_coord_names(self) -> None:
Contributor Author:

It's strange that the function _create_id_coord_names gets hit all the time by the profiler. I would have thought that once the dataset is created, this function would only be called once.
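For context, the compute-once caching pattern the comment expects, as a minimal sketch (hypothetical names): the cache only pays off if the same Indexes object is reused rather than rebuilt on every operation.

class SketchIndexes:
    def __init__(self, indexes):
        self._indexes = indexes
        self.__id_coord_names = None

    def _create_id_coord_names(self):
        # Expensive: walk all indexes, grouping coordinate names by the
        # id() of their index object.
        names = {}
        for name, index in self._indexes.items():
            names.setdefault(id(index), set()).add(name)
        self.__id_coord_names = names

    @property
    def _id_coord_names(self):
        # Computed at most once per SketchIndexes instance...
        if self.__id_coord_names is None:
            self._create_id_coord_names()
        # ...but a fresh instance per isel() call recomputes it every time.
        return self.__id_coord_names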

@hmaarrfk hmaarrfk force-pushed the fastestpath branch 2 times, most recently from 8c49aba to 2b5e936 Compare May 5, 2024 19:27
@kmuehlbauer kmuehlbauer marked this pull request as ready for review May 5, 2024 19:49
@kmuehlbauer (Contributor):

@hmaarrfk Sorry Marc, moving this to ready for review was an accident. I still have trouble using the GitHub Android app properly.

@hmaarrfk hmaarrfk marked this pull request as draft May 5, 2024 19:52
@hmaarrfk (Contributor Author) commented May 5, 2024

"GitHub Android app."

The GitHub app is terrible... I keep making similar mistakes on my phone.

@hmaarrfk hmaarrfk force-pushed the fastestpath branch 8 times, most recently from 5d4e9b2 to f958953 Compare May 5, 2024 22:48
@hmaarrfk (Contributor Author) commented May 5, 2024

OK, enough for today.

But generally speaking, I think the main problem, broad compatibility with pandas, is a known issue.

However, I think that some things are simply strange, in the sense that certain isel operations just seem too slow for comfort. It may be that we can add enough fastpath variables to remove a lot of the expensive validation.

[profiler screenshot]

You can see some of the numbers that I use.

I use a 100_000 selection because it amplifies the problem and makes the profiler find the problematic hotspots.
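For reference, a minimal profiling harness in that spirit (an assumed setup, not the exact script from this thread):

import cProfile
import pstats

import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros(48_000), dims=("frame_number",))

# Repeat a tiny isel many times so per-call overhead dominates the profile.
profiler = cProfile.Profile()
profiler.enable()
for i in range(100_000):
    arr.isel(frame_number=i % arr.sizes["frame_number"])
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)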

The function _apply_indexes_fast might be controversial. I am happy to remove it and just document it in hmaarrfk#29 for when I have time again.

Removing it drops the performance on my Threadripper 2950X from 14k its/sec to 12k its/sec, still an improvement over the 9k its/sec where pull request #9001 leaves off.

@hmaarrfk hmaarrfk force-pushed the fastestpath branch 4 times, most recently from 2c202ff to 1a659bc Compare May 5, 2024 23:01
@hmaarrfk hmaarrfk marked this pull request as ready for review May 5, 2024 23:01
@hmaarrfk (Contributor Author) commented May 5, 2024

Feel free to push any cleanups.

@@ -1800,7 +1844,10 @@ def isel_indexes(
     indexes: Indexes[Index],
     indexers: Mapping[Any, Any],
 ) -> tuple[dict[Hashable, Index], dict[Hashable, Variable]]:
-    return _apply_indexes(indexes, indexers, "isel")
+    if any(isinstance(v, PandasMultiIndex) for v in indexes._indexes.values()):
Contributor Author:

FYI: this is the error that shows up without this if statement:

    def _assert_indexes_invariants_checks(
        indexes, possible_coord_variables, dims, check_default=True
    ):
        assert isinstance(indexes, dict), indexes
        assert all(isinstance(v, Index) for v in indexes.values()), {
            k: type(v) for k, v in indexes.items()
        }

        index_vars = {
            k for k, v in possible_coord_variables.items() if isinstance(v, IndexVariable)
        }
        assert indexes.keys() <= index_vars, (set(indexes), index_vars)

        # check pandas index wrappers vs. coordinate data adapters
        for k, index in indexes.items():
            if isinstance(index, PandasIndex):
                pd_index = index.index
                var = possible_coord_variables[k]
                assert (index.dim,) == var.dims, (pd_index, var)
                if k == index.dim:
                    # skip multi-index levels here (checked below)
                    assert index.coord_dtype == var.dtype, (index.coord_dtype, var.dtype)
                assert isinstance(var._data.array, pd.Index), var._data.array
                # TODO: check identity instead of equality?
                assert pd_index.equals(var._data.array), (pd_index, var)
            if isinstance(index, PandasMultiIndex):
                pd_index = index.index
                for name in index.index.names:
                    assert name in possible_coord_variables, (pd_index, index_vars)
                    var = possible_coord_variables[name]
                    assert (index.dim,) == var.dims, (pd_index, var)
                    assert index.level_coords_dtype[name] == var.dtype, (
                        index.level_coords_dtype[name],
                        var.dtype,
                    )
                    assert isinstance(var._data.array, pd.MultiIndex), var._data.array
                    assert pd_index.equals(var._data.array), (pd_index, var)
                    # check all all levels are in `indexes`
                    assert name in indexes, (name, set(indexes))
                    # index identity is used to find unique indexes in `indexes`
>                   assert index is indexes[name], (pd_index, indexes[name].index)
E                   AssertionError: (MultiIndex([('a', 1, -1),
E                               ('b', 2, -2)],
E                              name='x'), MultiIndex([('a', 1, -1),
E                               ('b', 2, -2)],
E                              name='x'))

/home/mark/git/xarray/xarray/testing/assertions.py:411: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/mark/git/xarray/xarray/testing/assertions.py(411)_assert_indexes_invariants_checks()
-> assert index is indexes[name], (pd_index, indexes[name].index)
(Pdb) c

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB continue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


======================================== short test summary info =========================================
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_selection_multiindex - AssertionError: (MultiIndex([('a', 1, -1),

Contributor:

Looks like a copy was made?

Contributor Author:

Yeah, I just don't know enough about the internal data structures of xarray and ran out of time hunting for it.

Maybe there is a different way of iterating through things.

hmaarrfk added a commit to hmaarrfk/xarray that referenced this pull request May 6, 2024
This targets optimization for datasets with many "scalar" variables
(that is, variables without any dimensions). This can happen in the
context where you have many pieces of small metadata that relate to
various facts about an experimental condition.

For example, we have about 80 of these in our datasets (and I want to
increase this number).

Our datasets are quite large (on the order of 1 TB uncompressed), so we
often have one dimension that is in the tens of thousands.

However, it has become quite slow to index into the dataset.

We therefore often "carefully slice out the metadata we need" prior to
doing anything with our dataset, but that isn't quite possible when you
want to orchestrate things with a parent application.

These optimizations are likely "minor", but considering the results of
the benchmark, I think they are quite worthwhile:

* main (as of pydata#9001) - 2.5k its/s
* With pydata#9002 - 4.2k its/s
* With this pull request (on top of pydata#9002) - 6.1k its/s

Thanks for considering.
@hmaarrfk (Contributor Author) commented May 6, 2024

Finally, on my machine this code shows a 30 µs/it -> 25 µs/it improvement on my little benchmark from 5 years ago (#2799 (comment)):

import xarray as xr
import numpy as np

n = np.zeros(shape=(1024, 1024))

x = xr.DataArray(n, dims=('y', 'x'))

the_slice = np.s_[256:512, 256:512]

%timeit n[the_slice]
%timeit x[the_slice]

index = safe_cast_to_index(array).copy()
def __init__(
self, array: Any, dim: Hashable, coord_dtype: Any = None, *, fastpath=False
):
Contributor:

I'm a bit wary of adding fastpath everywhere. It makes things a bit hard to reason about. Is this one that consequential? We do first check for isinstance(pd.Index) in safe_cast_to_index...

Contributor:

Anecdotally, I remember thinking that indexes are created and copied a few too many times. I believe this was done for correctness in a big refactor, but reducing any useless copies would be a big win, since we would call _replace less.

Contributor Author:

"Anecdotally, I remember thinking that indexes are created and copied a few too many times."

This, in general, is my feeling after my repeated deep dives into the codebase.

@@ -1772,6 +1785,37 @@ def check_variables():
     return not not_equal
 
 
+def _apply_indexes_fast(indexes: Indexes[Index], args: Mapping[Any, Any], func: str):
Contributor:

+10 for this kind of refactor ;)

Contributor Author:

Thank you! But what aspect? 🤔

Contributor:

Rewriting the core logic to be faster :) There's definitely a bunch of low-hanging fruit.

Comment on lines 463 to 464
# Variables can have undefined data type according to MyPy due to the fastpath
self._coords = cast(dict, coords)
@Illviljan (Contributor) commented May 17, 2024

I think mypy is correct to complain here. With fastpath=True, it becomes strange to not have the assert.

Remember that coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = None at this point, and self._coords must be a dict.

Contributor Author:

OK, I added back the asserts. They did not have a significant effect on the benchmark I was running.

Comment on lines 425 to 427
assert dims is None
assert attrs is None
assert indexes is not None
Contributor:

What happens when you set values other than None here now? Is there a test for this?

Contributor Author:

I understood that fastpath indicated:

  • I know what I'm doing, please avoid using expensive type checking.

Do we need more safety measures?

I feel like runtime checks should not need to be added to appease a linter. I understand there is an environment variable that makes assert into a no-op, but I feel like we can avoid these lines of code if we can.
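For reference, a small sketch of that behavior (running under python -O, or with PYTHONOPTIMIZE set, strips assert statements; the function name is hypothetical):

def _replace(coords, *, fastpath: bool = False):
    if not fastpath:
        # Runs by default, but is compiled away entirely under `python -O` /
        # PYTHONOPTIMIZE, so it costs nothing in optimized runs.
        assert isinstance(coords, dict)
    return coords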

Contributor Author:

OK, I added back the asserts. They did not have a significant effect on the benchmark I was running.

Contributor Author:

I ended up reverting the changes to this method. Thanks for the review.

Contributor Author:

(Sorry, I made the changes on a local branch and forgot to push.)

Contributor:

I agree with you on the performance cost of isinstance. But if you want fast and correct code, it's mypy that you should heavily rely on. Mypy won't be useful, however, if you use overrides like cast and type: ignore.

While reading up on the array API standard, I've come to prefer the approach where array classes require the exact type for their input arguments and use xp.asarray to handle all the other, wider types. This way we separate the concerns in a familiar way, and it significantly reduces the number of isinstance checks required internally, because we can trust that the arguments are always correct.

DataArray unfortunately took the other route back when these kinds of performance bottlenecks weren't as big of a priority, and now we have to deal with the backwards-compatibility concerns.
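A minimal sketch of that pattern (hypothetical names, with NumPy standing in for the array library): coercion happens once at the boundary, so internals never need isinstance.

import numpy as np

class StrictArray:
    """Internals assume `data` is exactly an ndarray; no isinstance checks."""

    def __init__(self, data: np.ndarray):
        self._data = data

    def take_row(self, i: int) -> np.ndarray:
        # Trusts the constructor contract; no type dispatch needed here.
        return self._data[i]

def asarray(obj) -> StrictArray:
    # The one place where wider inputs (lists, scalars, other wrappers)
    # get coerced; everything downstream sees the exact type.
    if isinstance(obj, StrictArray):
        return obj
    return StrictArray(np.asarray(obj))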

@hmaarrfk hmaarrfk marked this pull request as draft May 18, 2024 14:24
@hmaarrfk (Contributor Author):

I'm converting to draft since I need to re-assess where things land after the changes in https://github.com/pydata/xarray/pull/9009/files.

I think I incorporated many of the changes from xarray/core/indexing.py into that branch.

@hmaarrfk hmaarrfk force-pushed the fastestpath branch 2 times, most recently from 002c7e8 to 4081568 Compare May 19, 2024 00:58
@hmaarrfk (Contributor Author):

Timing:

$ python software_timestamp_benchmark.py
software_timestamp.nc
h5netcdf  lazy      : 100%|██████████████████████████| 100000/100000 [00:11<00:00, 9071.31it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14918.83it/s]
netcdf4   lazy      : 100%|██████████████████████████| 100000/100000 [00:11<00:00, 8945.04it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14714.97it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|██████████████████████████| 100000/100000 [00:10<00:00, 9435.68it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 15035.51it/s]
netcdf4   lazy      : 100%|██████████████████████████| 100000/100000 [00:10<00:00, 9386.32it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14706.41it/s]
$ python software_timestamp_benchmark.py
software_timestamp.nc
h5netcdf  lazy      : 100%|█████████████████████████| 100000/100000 [00:07<00:00, 12633.23it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14948.95it/s]
netcdf4   lazy      : 100%|█████████████████████████| 100000/100000 [00:08<00:00, 12484.01it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14861.10it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|█████████████████████████| 100000/100000 [00:08<00:00, 12478.23it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 15030.92it/s]
netcdf4   lazy      : 100%|█████████████████████████| 100000/100000 [00:08<00:00, 12408.24it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 14785.03it/s]
$ python software_timestamp_benchmark.py
software_timestamp.nc
h5netcdf  lazy      : 100%|█████████████████████████| 100000/100000 [00:07<00:00, 13533.83it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 16316.19it/s]
netcdf4   lazy      : 100%|█████████████████████████| 100000/100000 [00:07<00:00, 13321.37it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 16292.80it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|█████████████████████████| 100000/100000 [00:07<00:00, 13335.41it/s]
h5netcdf  computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 16412.66it/s]
netcdf4   lazy      : 100%|█████████████████████████| 100000/100000 [00:07<00:00, 13131.81it/s]
netcdf4   computed  : 100%|█████████████████████████| 100000/100000 [00:06<00:00, 16102.49it/s]
Benchmarking code
import xarray
from tqdm import tqdm

for filename in [
    'software_timestamp.nc',
    'software_timestamp_float64.nc',
]:
    print(filename)
    for engine in ['h5netcdf', 'netcdf4']:
        for computed in [False, True]:
            dataset = xarray.open_dataset(filename, engine=engine)
            N_frames = dataset.sizes['frame_number']
            N_iters = 100_000
            software_timestamp = dataset['software_timestamp']
            if computed:
                software_timestamp = software_timestamp.compute()

            if computed:
                computed_str = "computed"
            else:
                computed_str = "lazy"
            desc = f"{engine:10s}{computed_str:10s}"
            for i in tqdm(range(N_iters), desc=desc):
                software_timestamp.isel(frame_number=i % N_frames)

            dataset.close()
$ h5ls software_timestamp.nc
frame_number             Dataset {48000}
software_timestamp       Dataset {48000}

@hmaarrfk hmaarrfk marked this pull request as ready for review May 19, 2024 01:05
@hmaarrfk (Contributor Author):

If you all think that https://github.com/pydata/xarray/pull/9002/files#r1590443756 should hold up this PR, then I'm happy to review it, but I think that this is a step in the right direction.

    if isinstance(applied_indexer, slice) and applied_indexer == slice(None):
        # shortcut for the usual case
        return old_indexer
    if isinstance(old_indexer, slice):
        if isinstance(applied_indexer, slice):
            indexer = slice_slice(old_indexer, applied_indexer, size)
        elif isinstance(applied_indexer, integer_types):
            indexer = range(*old_indexer.indices(size))[applied_indexer]  # type: ignore[assignment]
Contributor:

These kinds of # type: ignore[assignment] errors usually mean indexer has changed type, which usually leads to requiring another costly isinstance somewhere downstream.

Contributor Author:

This one is due to the ambiguity between slice and int for these indexers. I can look into it if you want.

However, it starts to touch some tuple-to-list conversion code that was identified as something I shouldn't adjust too much, due to potential conflicts with a branch under active development.

Contributor Author:

Consider the following patch:
diff --git a/xarray/core/indexing.py b/xarray/core/indexing.py
index e29d77da..44560dbd 100644
--- a/xarray/core/indexing.py
+++ b/xarray/core/indexing.py
@@ -9,7 +9,7 @@ from contextlib import suppress
 from dataclasses import dataclass, field
 from datetime import timedelta
 from html import escape
-from typing import TYPE_CHECKING, Any, Callable, overload
+from typing import TYPE_CHECKING, Any, Callable, overload, List

 import numpy as np
 import pandas as pd
@@ -295,20 +295,24 @@ def slice_slice(old_slice: slice, applied_slice: slice, size: int) -> slice:
     return slice(start, stop, step)


-def _index_indexer_1d(old_indexer, applied_indexer, size: int):
+def _index_indexer_1d(
+    old_indexer: slice | np.ndarray[Any, np.dtype[np.generic]],
+    applied_indexer: slice | int | np.integer | np.ndarray[Any, np.dtype[np.generic]],
+    size: int
+) -> slice | int | np.integer | np.ndarray[Any, np.dtype[np.generic]]:
+    """Helper routine to slice non integers into an entry for OuterIndexer"""
     if isinstance(applied_indexer, slice) and applied_indexer == slice(None):
         # shortcut for the usual case
         return old_indexer
     if isinstance(old_indexer, slice):
         if isinstance(applied_indexer, slice):
-            indexer = slice_slice(old_indexer, applied_indexer, size)
+            return slice_slice(old_indexer, applied_indexer, size)
         elif isinstance(applied_indexer, integer_types):
-            indexer = range(*old_indexer.indices(size))[applied_indexer]  # type: ignore[assignment]
+            return range(*old_indexer.indices(size))[applied_indexer]
         else:
-            indexer = _expand_slice(old_indexer, size)[applied_indexer]
+            return _expand_slice(old_indexer, size)[applied_indexer]
     else:
-        indexer = old_indexer[applied_indexer]
-    return indexer
+        return old_indexer[applied_indexer]


 class ExplicitIndexer:
@@ -630,12 +634,14 @@ class LazilyIndexedArray(ExplicitlyIndexedNDArrayMixin):

     def _updated_key(self, new_key: ExplicitIndexer) -> BasicIndexer | OuterIndexer:
         iter_new_key = iter(expanded_indexer(new_key.tuple, self.ndim))
-        full_key = []
+        full_key : List[slice | int | np.integer | np.ndarray[Any, np.dtype[np.generic]]] = []
+        outer_indexer = True
         for size, k in zip(self.array.shape, self.key.tuple):
             if isinstance(k, integer_types):
                 full_key.append(k)
             else:
                 full_key.append(_index_indexer_1d(k, next(iter_new_key), size))
+
         full_key_tuple = tuple(full_key)

         if all(isinstance(k, integer_types + (slice,)) for k in full_key_tuple):

It would move the error to:

xarray/core/indexing.py:648: error: Argument 1 to "BasicIndexer" has incompatible type "tuple[slice | int | integer[Any] | ndarray[Any, dtype[generic]], ...]"; expected "tuple[int | integer[Any] | slice, ...]"  [arg-type]

While I don't disagree that:

"which usually leads to requiring another costly isinstance again somewhere downstream"

I would rather keep this for a future change, once the backend-indexing branch has been merged in.

@hmaarrfk hmaarrfk marked this pull request as draft May 19, 2024 15:28
@hmaarrfk (Contributor Author):

Sigh. As much as I hate giving myself more work, I want to run the xarray benchmarks before and after, to ensure that there is a measurable difference with the existing benchmarks. If there isn't, I would like to contribute a benchmark for this.

While I know that benchmarks aren't always blockers for future changes, I did come here because I felt that indexing in xarray had become too slow for our application, so I would like to build off your monitoring tools.

@hmaarrfk (Contributor Author):

I needed this change on the main branch to get asv to run:
diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json
index 9dc86df7..4c717b7d 100644
--- a/asv_bench/asv.conf.json
+++ b/asv_bench/asv.conf.json
@@ -29,7 +29,7 @@
     // If missing or the empty string, the tool will be automatically
     // determined by looking for tools on the PATH environment
     // variable.
-    "environment_type": "mamba",
+    "environment_type": "conda",
     "conda_channels": ["conda-forge"],

     // timeout in seconds for installing any dependencies in environment
| Change   | Before [3eb91fc6] <main>   | After [40815684] <fastestpath>   |   Ratio | Benchmark (Parameter)                                         |
|----------|----------------------------|----------------------------------|---------|---------------------------------------------------------------|
| -        | 184±0.9μs                  | 159±0.4μs                        |    0.87 | indexing.Indexing.time_indexing_basic('1slice-1scalar')       |
| -        | 170±0.7μs                  | 146±0.4μs                        |    0.86 | indexing.Assignment.time_assignment_basic('1scalar')          |
| -        | 167±1μs                    | 142±0.6μs                        |    0.85 | indexing.Assignment.time_assignment_basic('2slicess-1scalar') |
| -        | 152±0.8μs                  | 125±1μs                          |    0.82 | indexing.Indexing.time_indexing_basic('1slice')               |
| -        | 253±10μs                   | 203±0.9μs                        |    0.8  | indexing.Indexing.time_indexing_basic('2slicess-1scalar')     |
| -        | 166±0.3μs                  | 130±0.8μs                        |    0.78 | indexing.Assignment.time_assignment_basic('1slice')           |
| -        | 87.1±0.2μs                 | 63.8±0.5μs                       |    0.73 | indexing.HugeAxisSmallSliceIndexing.time_indexing             |
| -        | 617±3μs                    | 333±3μs                          |    0.54 | indexing.AssignmentOptimized.time_assign_identical_indexes    |

It seems that the existing benchmarks are adequate to generate the signal we need.

@hmaarrfk hmaarrfk marked this pull request as ready for review May 19, 2024 17:05