Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on timeouts and connection errors #43

Open
danmichaelo opened this issue Jul 21, 2020 · 1 comment
Open

Retry on timeouts and connection errors #43

danmichaelo opened this issue Jul 21, 2020 · 1 comment

Comments

@danmichaelo
Copy link

danmichaelo commented Jul 21, 2020

I'm harvesting from a server which frequently times out (requests.exceptions.Timeout). Then, the request is not retried even though I set max_retries, since the retry functionality only covers the case where you actually get a response from the server.

I would like to extend the retry functionality to also include timeouts, but rather than increasing the complexity of the _request method further, I think it's worth considering switching to the tested and tried Retry urllib3 class. For some background on the class, see https://kevin.burke.dev/kevin/urllib3-retries/

Retry also handles the Retry-After header, so it shouldn't be that different from the current behaviour. The main difference is that it uses a backoff factor instead of a fixed sleep time:

sleep_time = backoff_factor * (2**retry_number)

Since OAI-PMH servers can be quite slow, we could set the default backoff factor to something like 2, to make the sleep time increase quickly. It is capped to BACKOFF_MAX=120 seconds by default

>>> for x in range(2,10):
>>>     print('Retry %s : sleep time %.1f seconds' % (x, min(120, 2 * (2**x))))
Retry 2 : sleep time 8.0 seconds
Retry 3 : sleep time 16.0 seconds
Retry 4 : sleep time 32.0 seconds
Retry 5 : sleep time 64.0 seconds
Retry 6 : sleep time 120.0 seconds
Retry 7 : sleep time 120.0 seconds
Retry 8 : sleep time 120.0 seconds
Retry 9 : sleep time 120.0 seconds

Breaking change: This means that the default_retry_after argument would no longer be supported.

Let me know what you think, and whether there is a chance a PR for this would be accepted.

@danmichaelo
Copy link
Author

danmichaelo commented Jul 21, 2020

By the way, it looks like Retry also handles formatted dates, so I think it should take care of the issue described in #28, although I haven't tested it.

danmichaelo added a commit to danmichaelo/sickle that referenced this issue Jul 21, 2020
Breaking changes:

- `default_retry_after` has been replaced by `retry_backoff_factor`
danmichaelo added a commit to danmichaelo/sickle that referenced this issue Jul 21, 2020
Breaking changes:

- `default_retry_after` has been replaced by `retry_backoff_factor`
- Extended the default `retry_status_codes` with more codes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant