
[BE] Remove dependency on six and future #94709

Closed
wants to merge 8 commits into from

Conversation

XuehaiPan
Collaborator

@XuehaiPan XuehaiPan commented Feb 12, 2023

Remove the Python 2/3 compatibility libraries `six` and `future`, along with `torch._six`. We only support Python 3.8+ now; it's time to retire them.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

@pytorch-bot

pytorch-bot bot commented Feb 12, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94709

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 44ea65d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: releng release notes category labels Feb 12, 2023
@Skylion007 Skylion007 added better-engineering Relatively self-contained tasks for better engineering contributors ciflow/trunk Trigger trunk jobs on your pull request labels Feb 12, 2023
.gitmodules — outdated review comment, resolved
@albanD
Collaborator

albanD commented Mar 28, 2023

We should revisit all changes from `string_classes` to `str` to make sure we properly include `bytes` where relevant.
There are already two forward fixes for it: #97737 and #97789, but we should look through this diff and fix all the other ones preemptively!

@XuehaiPan would you have some time to look into that by any chance?

pytorchmergebot pushed a commit that referenced this pull request Mar 28, 2023
[DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737)

Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099

I was seeing some massive (~2x) slowdowns on a job after running it on PyTorch 2.0. From some profiling in `py-spy`, it looked like the pin_memory thread was doing a lot more work than before. Looking at a trace in `nsys`, I saw that the thread doing the forward pass had a bunch of `pthread_cond_timedwait` calls (GIL reacquisition) in its call stack; the forward-pass thread appeared to be blocked waiting for the GIL, which the pin_memory thread was holding.

After some debugging I found out the issue. If a `bytes` was passed into `pin_memory`, previously in 1.13 (before #94709) it would short-circuit and return here
https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55
since `bytes` was in `torch._six.string_classes`:
```
>>> from torch._six import string_classes
>>> string_classes
(<class 'str'>, <class 'bytes'>)
>>>
```

However after #94709, if a `bytes` was passed into `pin_memory` it would fall into here instead
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73
because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`!
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57

As a result, `pin_memory` gets called recursively for each element in the `bytes` leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing.

This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before #94709.

Pull Request resolved: #97737
Approved by: https://github.com/albanD, https://github.com/NivekT
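The blow-up described above can be reproduced with a toy dispatcher (a simplified sketch, not the actual `pin_memory` implementation; `count_calls` and `string_types` are illustrative names). With `bytes` missing from the string check, a `bytes` payload falls into the sequence branch, and iterating it yields one `int` per byte, so the function recurses once per byte:

```python
from collections.abc import Sequence

def count_calls(data, string_types):
    """Count the recursive visits a pin_memory-style dispatcher makes."""
    if isinstance(data, string_types):
        return 1  # short-circuit: string-like data is returned as-is
    if isinstance(data, Sequence):
        # recurse into each element, as the real dispatcher pins each one
        return 1 + sum(count_calls(x, string_types) for x in data)
    return 1  # leaf value

payload = b"x" * 1000
count_calls(payload, (str, bytes))  # fixed check: 1 call
count_calls(payload, str)           # buggy check: 1001 calls, one per byte
```

This is why the regression scaled with payload size: every byte of every `bytes` object in a batch triggered a Python-level call while holding the GIL.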
XuehaiPan added a commit to XuehaiPan/pytorch that referenced this pull request Mar 29, 2023
@XuehaiPan
Collaborator Author

@albanD I opened a PR to re-add relevant isinstance checks for bytes.

pytorchmergebot pushed a commit that referenced this pull request Mar 30, 2023
Revisit the removal of `torch._six.string_classes` (which was `(str, bytes)`), where `isinstance(obj, string_classes)` became `isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```
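For illustration, the guard behaves as follows (a hypothetical quick check, not part of the PR; it uses the same `is_seq` definition as above):

```python
from typing import Sequence

def is_seq(obj):
    # a "real" sequence container, excluding string-like types
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))

is_seq([1, 2, 3])   # True: an actual container
is_seq("abc")       # False: str is a Sequence but is excluded
is_seq(b"abc")      # False: only because bytes is back in the guard
is_seq(42)          # False: not a Sequence at all
```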

Ref:

- #94709 (comment)
- #97737
- #97789
Pull Request resolved: #97863
Approved by: https://github.com/Skylion007, https://github.com/albanD
XuehaiPan pushed a commit to XuehaiPan/pytorch that referenced this pull request Mar 31, 2023
XuehaiPan added a commit to XuehaiPan/pytorch that referenced this pull request Mar 31, 2023
miamannionx added a commit to miamannionx/taming-transformers that referenced this pull request Apr 4, 2023
Torch does not support `torch._six` anymore, so removed `from torch._six import string_classes` and replaced `string_classes` with `str`.

Reference: pytorch/pytorch#94709
NivekT pushed a commit that referenced this pull request Apr 4, 2023
atalman pushed a commit that referenced this pull request Apr 5, 2023
…97789, #97863) (#98055)

* [DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737)

* [DataLoader] Fix  collation logic (#97789)

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that.
Pull Request resolved: #97789
Approved by: https://github.com/albanD

* Revisit `torch._six.string_classes` removal (#94709) (#97863)

---------

Co-authored-by: Eric Zhang <ezhang887@gmail.com>
Co-authored-by: Kevin Tse <ktse@fb.com>
juanmiguelnc added a commit to juanmiguelnc/ULIP that referenced this pull request Apr 29, 2023
This commit fixes a deprecated import in the codebase that used the `string_classes` attribute from PyTorch. The deprecated import statement `from torch._six import string_classes` has been replaced with a simpler alternative, `string_classes = str`.

`string_classes` was previously used to denote string-like types, but it has been removed in recent versions of PyTorch. This commit provides a cleaner solution that works with newer versions of PyTorch.

See pytorch/pytorch#94709
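One way to keep such downstream code working across PyTorch versions is a small compatibility shim (a sketch, assuming `string_classes` is only used in `isinstance` checks; `is_string_like` is an illustrative name). Note that the original `torch._six.string_classes` was `(str, bytes)`, so replacing it with plain `str`, as the commit above does, silently drops `bytes`:

```python
try:
    from torch._six import string_classes  # PyTorch < 2.0
except ImportError:
    # Faithful replacement: the old definition was (str, bytes), not str.
    string_classes = (str, bytes)

def is_string_like(obj):
    return isinstance(obj, string_classes)
```

This keeps the `bytes` short-circuit behavior discussed in #97737 intact on both old and new PyTorch versions.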
fenglui added a commit to fenglui/FlagAI that referenced this pull request Sep 17, 2023
Torch does not support `torch._six` anymore and it has been removed. Refer: pytorch/pytorch#94709
alex391 added a commit to alex391/bindsnet that referenced this pull request Feb 29, 2024
alex391 added a commit to alex391/bindsnet that referenced this pull request Feb 29, 2024
alex391 added a commit to alex391/bindsnet that referenced this pull request Feb 29, 2024
Labels
better-engineering · ciflow/inductor · ciflow/mps · ciflow/trunk · Merged · module: dynamo · open source · release notes: releng