Lock expired error when multiple clients fetch a particular resource from the Community Solid Server. #1843
Comments
Some preliminary results. I ran some small tests, sending a lot of requests to a server on my own machine, with a single document and a single worker thread. All requests were sent at the same time, as in the for loop in the issue above, and I awaited all their results. 2000 requests were fine, but when I did a loop of 3000 they no longer got a response, even after waiting much longer than the lock expiration time, except for the very first request, which immediately returned a 401 (instead of a 200). In this case, the server also only looks up the ACL for the first request; for the other 2999 there is no log entry of trying to access the ACL. After stopping the client that is sending all the requests and starting a new one with only 1 request, there is still no result, so it seems that once the server gets stuck it stays stuck, or perhaps it takes longer to get rid of the original connections than I waited. The question then is where it gets stuck and why. I tried the same for loop but with the
Thanks for the reply @joachimvh, indeed it performs better when awaiting the results. However, using an await isn't applicable in a real-world scenario where multiple clients are requesting at the same time without any communication among themselves. Client 57 doesn't know that it has to wait until a previous client 12's GET promise is resolved.
I meant that I created 300 promises, each doing a fetch, and put them all in a
Do you mean 200 and 300 here, or 2k and 3k? Because I am getting the lock error with 300 clients.
2k and 3k. But the same machine was both client and server in this case, which helps with the results. The core point is that there is a number of requests at which the server seems to become unresponsive.
Indeed. Can you think of some ways in which we can improve the responsiveness of the server, or is there something in the architecture that is currently an obstacle?
To find the cause, more investigation would be necessary. It could have something to do with how the file system is used, but there is no way to really tell from this what exactly is causing it. Adding in-memory caching of resources could probably help if most requests are GETs, as could be seen from the better results of the memory backend. But then you would also need some way to invalidate the cache of other worker threads when you're using more than one, as sketched below.
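As an illustration of that idea (not something the server does today), a minimal sketch of a per-worker cache with cross-worker invalidation could look like this, assuming Node's cluster module is used for the workers; all names here are hypothetical and not part of the CSS codebase.

import cluster from 'node:cluster';

// Hypothetical per-worker in-memory cache of resource representations.
const cache = new Map<string, { body: string; contentType: string }>();

// Message shape used to tell other workers to drop a cached entry.
interface InvalidateMessage {
  type: 'invalidate';
  path: string;
}

// A worker calls this after writing a resource: drop its own entry and
// ask the primary process to notify all other workers.
export function invalidate(path: string): void {
  cache.delete(path);
  process.send?.({ type: 'invalidate', path });
}

if (cluster.isPrimary) {
  // The primary only relays invalidation messages to the other workers.
  cluster.on('message', (sender, message): void => {
    const msg = message as Partial<InvalidateMessage>;
    if (msg.type === 'invalidate') {
      for (const worker of Object.values(cluster.workers ?? {})) {
        if (worker && worker.id !== sender.id) {
          worker.send(message);
        }
      }
    }
  });
} else {
  // Each worker drops its cached copy when another worker changed the resource.
  process.on('message', (message): void => {
    const msg = message as Partial<InvalidateMessage>;
    if (msg.type === 'invalidate' && msg.path) {
      cache.delete(msg.path);
    }
  });
}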
GETs are only part of the experiment, when sensor data is read from the pod; for new data to arrive and be written to the Solid pod, I would expect more PATCH requests to be made.
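For context, such a write could for example be a PATCH with a SPARQL Update body, which CSS supports for RDF resources; a minimal sketch with a made-up resource URL and triple:

// Hypothetical PATCH appending one sensor reading; the resource URL and triple are examples only.
(async function() {
  const res = await fetch('http://localhost:3000/sensors/temperature', {
    method: 'PATCH',
    headers: {
      'content-type': 'application/sparql-update',
    },
    body: 'INSERT DATA { <#reading-1> <http://example.org/value> "21.5" . }',
  });
  console.log(res.status);
})();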
I spent some time running evaluations with different server configurations, trying to see if I could find the cause. Lots of text incoming before I reach my conclusion. I only tested a for loop with multiple connections; I did not look into all the LDES stuff. This was the test code:

const total = 10000;
// Only log the status of every 100th request.
const mod = Math.floor(total / 100);

(async function() {
  // Create the test document first.
  await fetch('http://localhost:3000/foo', {
    method: 'PUT',
    headers: {
      'content-type': 'text/plain',
    },
    body: 'hello',
  });
  console.log('starting runs');
  // Fire all GET requests concurrently and wait for all of them.
  const promises = [];
  for (let i = 0; i < total; ++i) {
    promises.push(doCall(i));
  }
  console.log(await Promise.all(promises));
})();

async function doCall(i) {
  const res = await fetch('http://localhost:3000/foo');
  if (i % mod === 0) {
    console.log(i, res.status);
  }
  return res.status;
}

This is a table with the maximum number of requests I could do on my machine before running into issues. All of these started from the
The results seemed to indicate that the problem, or at least one of the problems, is quite probably related to the locking system. After some more digging, it seemed that the issue was mostly caused by the lock created on the resource that keeps track of the number of open read requests on a resource, as done here: CommunitySolidServer/src/util/locking/BaseReadWriteLocker.ts, lines 71 to 77 in 5347025 (the pattern is roughly sketched below).
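To make the suspected bottleneck concrete, the pattern can be sketched roughly as follows; this is an illustrative simplification, not the actual BaseReadWriteLocker code, and it uses local stand-ins for the CSS interfaces. The key point is that every single read request first has to acquire a lock on a separate counter entry before touching the resource.

// Minimal local stand-ins for the CSS ResourceIdentifier/ResourceLocker types.
interface ResourceIdentifier { path: string }
interface ResourceLocker {
  acquire: (identifier: ResourceIdentifier) => Promise<void>;
  release: (identifier: ResourceIdentifier) => Promise<void>;
}

// Simplified, hypothetical sketch of a read lock that tracks the number of
// concurrent readers; names and structure do not match the real implementation.
class CountingReadLocker {
  private readonly readCounts = new Map<string, number>();

  public constructor(private readonly locker: ResourceLocker) {}

  public async withReadLock<T>(identifier: ResourceIdentifier, whileLocked: () => Promise<T>): Promise<T> {
    const countId = { path: `${identifier.path}.count` };

    // Every read request briefly takes the lock guarding the reader counter.
    await this.locker.acquire(countId);
    try {
      const count = (this.readCounts.get(identifier.path) ?? 0) + 1;
      this.readCounts.set(identifier.path, count);
      if (count === 1) {
        // The first reader also takes the resource lock itself, blocking writers.
        await this.locker.acquire(identifier);
      }
    } finally {
      await this.locker.release(countId);
    }

    try {
      return await whileLocked();
    } finally {
      // Decrement the counter again; the last reader releases the resource lock.
      await this.locker.acquire(countId);
      const count = this.readCounts.get(identifier.path)! - 1;
      if (count === 0) {
        this.readCounts.delete(identifier.path);
        await this.locker.release(identifier);
      } else {
        this.readCounts.set(identifier.path, count);
      }
      await this.locker.release(countId);
    }
  }
}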
For both the

// Imports assumed to come from the CSS source tree; exact paths may differ.
import type { ResourceIdentifier } from '../../http/representation/ResourceIdentifier';
import { InternalServerError } from '../errors/InternalServerError';
import type { ResourceLocker } from './ResourceLocker';

class SimpleLocker implements ResourceLocker {
  // For every locked path, the queue of resolve callbacks of waiting requests.
  protected locked: Record<string, (() => void)[] | undefined> = {};

  public async acquire(identifier: ResourceIdentifier): Promise<void> {
    const promises = this.locked[identifier.path];
    if (!promises) {
      // Nobody holds the lock yet: mark it as taken with an empty queue.
      this.locked[identifier.path] = [];
      return;
    }
    // The lock is taken: queue a resolver and wait until release() calls it.
    let resolve: () => void;
    const prom = new Promise<void>((res): void => {
      resolve = res;
    });
    promises.push(resolve!);
    await prom;
  }

  public async release(identifier: ResourceIdentifier): Promise<void> {
    // Unlock the next promise if there is one
    const promises = this.locked[identifier.path];
    if (!promises) {
      throw new InternalServerError(`Trying to unlock resource that is not locked: ${identifier.path}`);
    }
    if (promises.length === 0) {
      // No one is waiting: fully remove the lock entry.
      delete this.locked[identifier.path];
      return;
    }
    // Hand the lock to the first waiter in FIFO order.
    promises.splice(0, 1)[0]();
  }
}

Using this locker with the default

I'm not exactly sure how robust and correct that locking code is, so I also looked into a different locking library. I tried out the

So probably going to look into replacing the

Note that all of this is only relevant when using the file or memory locker. None of this is relevant when using the Redis locker. So if you also have problems when using that one, it would not be solved by this.
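A minimal, hypothetical harness to check that a locker like the SimpleLocker above really serializes its critical sections under many concurrent acquires could look like this:

// Hypothetical smoke test for the SimpleLocker above; not part of the issue.
async function smokeTest(): Promise<void> {
  const locker = new SimpleLocker();
  const identifier = { path: 'http://localhost:3000/foo' };
  let active = 0;

  const task = async(): Promise<void> => {
    await locker.acquire(identifier);
    try {
      // With a correct mutex, at most one task is ever inside this section.
      active += 1;
      if (active > 1) {
        throw new Error('Lock was not exclusive!');
      }
      // Yield to the event loop while still holding the lock.
      await new Promise<void>((resolve): void => {
        setImmediate(resolve);
      });
      active -= 1;
    } finally {
      await locker.release(identifier);
    }
  };

  await Promise.all(Array.from({ length: 1000 }, task));
  console.log('All lock/unlock cycles completed');
}

smokeTest().catch(console.error);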
I also ran some tests using the Redis locker just now. While it gave 401s for some requests due to the lock timing out, if you set the expiration high enough that these don't occur, the behaviour is similar to not having a locker at all. So while we should probably replace our memory locker, the Redis locker can already support situations with more requests. So if you also have issues using that locker, even after increasing the expiration, there is a different problem that can't be reproduced by just running a bunch of simultaneous requests.
Environment
Description
I have a Solid server located at http://n061-14a.wall2.ilabt.iminds.be:3000/ with 24 workers (please use an IDLab-IGent VPN).
When multiple clients fetch the same resource (i.e. multiple GET requests), the server throws an error:
This behaviour can be reproduced by
As the server implements a multiple-reader/single-writer lock, this is unexpected, since the number of GET requests is lower than the 789 (with 24 workers) demonstrated in graph 3 of the test here.
Moreover,
Using the LDES reader to read within a window, as specified in the code below, does 4 GET requests to the CSS.
Please install the versionawareldesinldp package before executing the code using:
However, if I simulate 25 clients (i.e. 25 * 4 = 100 GET requests) with the following code,
I get the following error on the server side,
and on the client side,
The server behaves unexpectedly when responding to this number of GET requests.