Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle retries using urllib3 Retry #44

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

danmichaelo
Copy link

@danmichaelo danmichaelo commented Jul 21, 2020

Per #43, this is my take at replacing the custom retry code with the tried and tested urllib3 Retry class. Benefits include that it also handles Timeout and ConnectionError, and that the sleep time increases for each retry. Tests pass, and I'm also testing the branch in a real world scenario now, where I need to harvest from a server which is quite unstable, and happens to produce both timeouts, connection errors and 500 errors during updates. It seems to work fine so far!

  • A slight inconvenience of retrying upon ConnectionError, is that it will also retry if you misspell the hostname. Personally, I think the benefits of handling ConnectionError outweighs that incovenience, but it could be something to discuss. If the rest of the URL contains a typo, so that the server returns a 404, it will of course still raise an error immediately.

  • Perhaps the major issue here is that the default_retry_after argument can no longer be supported. It has been replaced by a new argument retry_backoff_factor, to control the exponential backoff factor, but the replacement will be a breaking change since the old argument cannot be converted into the new one.

  • Also note that I extended the default retry_status_codes with more codes. Since this is "OAI for humans", and humans do not necessarily know which errors to expect up front, I think it makes sense to include some more common error codes by default. But I can leave this extension out of this PR if it's controversial.

  • Logging: If it's desirable to have logging, one possibility is to enable debug logging for the retry module:

logging.getLogger("urllib3.util.retry").setLevel(logging.DEBUG)

to get log messages like these:

2020-07-21 16:01:45,408 DEBUG retry Incremented Retry for (url='/'): Retry(total=9, connect=5, read=2, redirect=None, status=9)
2020-07-21 16:01:45,410 DEBUG retry Incremented Retry for (url='/'): Retry(total=8, connect=5, read=2, redirect=None, status=8)
2020-07-21 16:01:48,020 DEBUG retry Incremented Retry for (url='/'): Retry(total=7, connect=5, read=2, redirect=None, status=7)
2020-07-21 16:01:53,227 DEBUG retry Incremented Retry for (url='/'): Retry(total=6, connect=5, read=2, redirect=None, status=6)
2020-07-21 16:02:03,635 DEBUG retry Incremented Retry for (url='/'): Retry(total=5, connect=5, read=2, redirect=None, status=5)
2020-07-21 16:02:24,445 DEBUG retry Incremented Retry for (url='/'): Retry(total=4, connect=5, read=2, redirect=None, status=4)
2020-07-21 16:03:06,052 DEBUG retry Incremented Retry for (url='/'): Retry(total=3, connect=5, read=2, redirect=None, status=3)

I'm not sure if this should be the responsibility of sickle or the user, though.

Breaking changes:

- `default_retry_after` has been replaced by `retry_backoff_factor`
- Extended the default `retry_status_codes` with more codes
It stops at BACKOFF_MAX = 120, so we can let it increase a bit faster.
total=max_retries,
backoff_factor=retry_backoff_factor,
status_forcelist=retry_status_codes or [429, 500, 502, 503, 504],
method_whitelist=frozenset(['GET', 'POST'])
Copy link

@bertsky bertsky Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: method_whitelist has been deprecated in urllib3 v1.26 and removed in its v2. And since requests pins urllib3 to a range which includes the latter, IMHO the newer idiom must now be used here:

Suggested change
method_whitelist=frozenset(['GET', 'POST'])
allowed_methods=frozenset(['GET', 'POST'])

Otherwise, you'll get a

    retry_adapter = requests.adapters.HTTPAdapter(max_retries=Retry(
TypeError: __init__() got an unexpected keyword argument 'method_whitelist'

assert retries.total == 99
assert retries.backoff_factor == 1.1234
assert retries.status_forcelist == (418,)
assert retries.method_whitelist == frozenset(['POST', 'GET'])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert retries.method_whitelist == frozenset(['POST', 'GET'])
assert retries.allowed_methods == frozenset(['POST', 'GET'])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants