Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failures refreshing OAuth tokens #91

Open
paul121 opened this issue May 22, 2020 · 9 comments
Open

Failures refreshing OAuth tokens #91

paul121 opened this issue May 22, 2020 · 9 comments
Labels
2x farmOS 2x support bug Something isn't working

Comments

@paul121
Copy link
Member

paul121 commented May 22, 2020

Some attempts to refresh OAuth tokens on the farmOS server are failing, but only when there are two back-to-back attempts to refresh using the same refresh_token.

It's unclear how often this happens, but some of the time the second request fails (as expected) because the refresh_token has already been used. Other times both requests succeed, which shouldn't happen. This is likely due to how the farmOS server (and underlying oauth2_server module) handle the request to refresh tokens.

The larger issue, though, is that there are two attempts to refresh tokens. One thought is that when instantiating farmOS.py it makes multiple requests to the server (to get a CSRF token) and later process the request (and both of these might try to refresh a token before hand) but these are handled synchronously so I don't believe this is causing the problem.

I think it's likely that if multiple requests come to the aggregator requesting data from the same farmOS server, after an access token has expired (every hour under sustained use), then there could sometimes be multiple attempts to refresh the same token.

@paul121
Copy link
Member Author

paul121 commented May 22, 2020

I should note that even though there is a failure to refresh (and admins are notified of the failure), the farm does not end up unauthorized in the Aggregator. This is because in these cases there is at least one successful refresh that happens...which populates the aggregator DB with a valid token.

@paul121
Copy link
Member Author

paul121 commented May 27, 2020

I've implemented a lock with the Python threading.event object. This ensures that at a given time only one thread will be initializing the farmOS.py client for a given farm.id. Because the OAuth refresh flow takes place when initializing the client, this should prevent multiple refresh flows from being attempted.

Requests that are made while another thread is refreshing wait for the lock to be made available before loading the latest tokens and continuing on to creating their own client.

These logs describe what happens:

backend_1   | INFO:     Thread 5808:  creating farm client for farm ID: 12
backend_1   | INFO:     Thread 1216:  creating farm client for farm ID: 12
backend_1   | INFO:     Thread 1216: Another thread is making requests to this farm. Waiting.
backend_1   | INFO:     Thread 5808: Got new token!: c53ed132e239dc7c6ddf999123a11ebbb1f5667a
backend_1   | INFO:     Thread 5808: Notifying that I am done making requests to this farm.
backend_1   | INFO:     Thread 1216: The other thread is done.
backend_1   | INFO:     Thread 1216: Old token: 822d5beca135abe72c1cc48e476a78be6f5f343a
backend_1   | INFO:     Thread 1216: Reloading farm...
backend_1   | INFO:     Thread 1216:  New token: c53ed132e239dc7c6ddf999123a11ebbb1f5667a
backend_1   | INFO:     172.20.0.5:36608 - "GET /api/v1/farms/info/?use_cached=false&farm_id=12 HTTP/1.0" 200 OK
backend_1   | INFO:     Thread 1216: Notifying that I am done making requests to this farm.
backend_1   | INFO:     172.20.0.5:36616 - "GET /api/v1/farms/info/?use_cached=false&farm_id=12 HTTP/1.0" 200 OK

@mstenta
Copy link
Member

mstenta commented May 29, 2020

Ah great! That makes perfect sense @paul121 ! Great work!

I've deployed this change, so let's see if those errors clear up. I'll close this, but we can reopen if necessary (or if you think it needs to stay open for any reason).

@mstenta mstenta closed this as completed May 29, 2020
@mstenta
Copy link
Member

mstenta commented Jul 29, 2020

We've still been seeing these alert emails from time to time, so reopening this.

@mstenta mstenta reopened this Jul 29, 2020
@mstenta
Copy link
Member

mstenta commented Jul 30, 2020

@paul121 and I did some deep dives into the logs yesterday to try to find clues. Made some progress but still haven't been able to replicate the issue ourselves. Here's what we found:

In looking at the Aggregator's backend server logs, we can see that the issue is typically happening when two requests are made to the same farmOS site back-to-back (either in the same second, or within a second of each other).

For example:

Jul 29, 2020 @ 09:38:43.000INFO:root:send email result: <emails.backend.SMTPResponse status_code=250 status_text=b'Ok 010001739acbedc3-53c6d707-d599-4ef6-b226-ef895ad215c4-000000'>
Jul 29, 2020 @ 09:38:43.000ERROR:root:Cannot authenticate client with farmOS server id: 115 - InvalidGrantError('(invalid_grant) Invalid refresh token')(invalid_grant) Invalid refresh token
Jul 29, 2020 @ 09:38:43.000172.21.0.5:58172 - "GET /api/v1/farms/areas/?farm_url=YYYYY.farmos.net HTTP/1.0" 200
Jul 29, 2020 @ 09:38:42.000172.21.0.5:58174 - "GET /api/v1/farms/assets/?farm_url=YYYYY.farmos.net HTTP/1.0" 200
Jul 29, 2020 @ 09:38:42.000INFO:root:send email result: <emails.backend.SMTPResponse status_code=250 status_text=b'Ok 010001739acbebe8-990f374d-c05a-4934-9fa6-6e920df8ae8a-000000'>
Jul 29, 2020 @ 09:38:41.000172.21.0.5:58166 - "GET /api/v1/farms/areas/?farm_url=XXXXX.farmos.net HTTP/1.0" 200
Jul 29, 2020 @ 09:38:41.000172.21.0.5:58164 - "GET /api/v1/farms/assets/?farm_url=XXXXX.farmos.net HTTP/1.0" 200
Jul 29, 2020 @ 09:38:41.000INFO:root:Sending admin alert message: Cannot authenticate client with farmOS server id: 115 - InvalidGrantError('(invalid_grant) Invalid refresh token')(invalid_grant) Invalid refresh token

But, this does not happen consistently. Most of the time there is no issue. And we are still unable to replicate it consistently.

We tested sending batches of requests to the same endpoints via the browser console with the following code (courtesy of @paul121):

// Go to oursci.farmos.group/docs, login
// Copy the auth token that is returned and save in variable
const auth = 'Bearer head.params.sig';

// Multiple requests
Promise.all([
  fetch('/api/v1/farms/areas/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/assets/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/areas/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/assets/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/areas/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/assets/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/areas/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
  fetch('/api/v1/farms/assets/?farm_url=test.farmos.net', {headers: { Authorization: auth}}),
])

Notably... when we look at the farmOS server logs after that we always see 2 requests being made to /oauth2/token - when there should only be one (the first one). This is an important clue. It indicates that the locking mechanism in farmOS-aggregator might not be working as intended... as it should be preventing a second request from going through until the first one is done.

Here are the logs:

2020-07-30T10:35:33.256-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "POST /oauth2/token HTTP/1.0" 200 180
	
2020-07-30T10:35:33.261-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "POST /oauth2/token HTTP/1.0" 200 180
	
2020-07-30T10:35:33.404-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:33.407-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:33.579-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:33.582-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:33.852-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:33.853-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:33.860-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:33.862-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:34.117-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:34.119-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:34.121-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:34.123-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:34.368-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /taxonomy_term.json?bundle=farm_areas&page=0 HTTP/1.0" 200 43608
	
2020-07-30T10:35:34.749-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:33 +0000] "GET /farm_asset.json?page=0 HTTP/1.0" 200 39760
	
2020-07-30T10:35:34.794-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:34.878-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /taxonomy_term.json?bundle=farm_areas&page=0 HTTP/1.0" 200 43608
	
2020-07-30T10:35:34.882-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /taxonomy_term.json?bundle=farm_areas&page=0 HTTP/1.0" 200 43608
	
2020-07-30T10:35:35.006-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:35.116-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /restws/session/token HTTP/1.0" 200 43
	
2020-07-30T10:35:35.175-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /farm_asset.json?page=0 HTTP/1.0" 200 39760
	
2020-07-30T10:35:35.188-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:34 +0000] "GET /farm_asset.json?page=0 HTTP/1.0" 200 39760
	
2020-07-30T10:35:35.290-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:35 +0000] "GET /farm.json HTTP/1.0" 200 25765
	
2020-07-30T10:35:35.558-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:35 +0000] "GET /taxonomy_term.json?bundle=farm_areas&page=0 HTTP/1.0" 200 43608
	
2020-07-30T10:35:36.029-04:00
	
test.farmos.net 172.104.26.133 - - [30/Jul/2020:14:35:35 +0000] "GET /farm_asset.json?page=0 HTTP/1.0" 200 39760

As an aside, the logs show that farmOS.py is also making requests to farm.json and /restws/session/token with every request... which we should stop doing. I'll make a separate issue for that...

@mstenta
Copy link
Member

mstenta commented Jul 30, 2020

We also wondered if maybe this issue (which we ran into earlier) could be related: https://www.drupal.org/project/oauth2_server/issues/2859214

I don't think it is... but worth linking to it here in case we need to reassess it. My understanding is that the only issue with that is sometimes the last_updated timestamp for access tokens doesn't get updated. But that wouldn't cause what we're seeing here, AFAICT.

@paul121
Copy link
Member Author

paul121 commented Aug 5, 2020

Lets try this again...

With the above test sending multiple requests instantaneously using Promise.all([ request, request]) I was able to see multiple POST to oauth2/token on a local farmOS server. Still couldn't recreate the "Invalid refresh token" error, but observed multiple refresh flows which seems to be the core of the issue.

  • I made some improvements to the farmOS.py client and released v0.2.0. This changed a various things, but noteworthy here is that this should speed things up (eliminate unnecessary requests to farm.json and restws/session/token). 9861046

  • With these changes in place, it gives us a bit more control over when the farmOS.py client triggers a refresh. The underlying requests-oauthlib will not refresh until an authenticated request is attempted, and the access_token has already expired. Before v0.2.0, farmOS.py would trigger a refresh (if needed) while instantiating the client because it would make the above requests to farm.json and restws/session/token. With this commit 6a08d31 I've added some logic in the get_farm_client function to manually trigger a refresh before the thread releases its lock. In local testing this seems to be working.

After these changes I no longer see multiple POST to oauth2/token on the local server. With these changes I think we should deploy new backend images and see if the error goes away.

@paul121
Copy link
Member Author

paul121 commented Nov 17, 2020

Unfortunately we're still experiencing this issue after the latest changes, although it does seem to happen less frequently than before. Since this isn't breaking functionality with the 3rd party, we haven't considered this a critical issue. The client code could handle the error and make the request again.

I'm curious if this bug will be present once we start integrating farmOS 2x servers. Since we have a new OAuth backend, there is a decent chance this issue could go away... and/or create new issues :-) More info in #98

@mstenta mstenta added the 2x farmOS 2x support label Jul 20, 2022
@mstenta
Copy link
Member

mstenta commented Jul 20, 2022

Update: We are still seeing this issue occasionally with the 2.x branch of the Aggregator making requests to 2.x instances of farmOS.

That indicates that it's an issue with the Aggregator and not with the farmOS server (since the OAuth server code from farmOS 1.x to 2.x is completely different).

So we can put this wonder to rest:

We also wondered if maybe this issue (which we ran into earlier) could be related: https://www.drupal.org/project/oauth2_server/issues/2859214

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2x farmOS 2x support bug Something isn't working
Development

No branches or pull requests

2 participants