Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure thread-safety in cleaning up inactive BatchedCommand instances #7251

Open
christian-monch opened this issue Jan 12, 2023 · 2 comments · May be fixed by #7252
Open

Ensure thread-safety in cleaning up inactive BatchedCommand instances #7251

christian-monch opened this issue Jan 12, 2023 · 2 comments · May be fixed by #7252

Comments

@christian-monch
Copy link
Contributor

christian-monch commented Jan 12, 2023

This is a follow-up to issue #6179.

The current implementation of #6197 contains a race condition that could lead to deleting the runner from an active BatchedCommand-instance.

Also, from within BatchedCommand instance- and class-code, we can only clean up elements internal to BatchedCommand, i.e. we can remove runner that are not actively used, In 0.18.0, there might be a problem with identifying all instances because elements are removed from the instance dictionary.

I would like to question the necessity of this specific garbage collection. If we want to keep it, I think it has to be modified to ensure that no runner is removed from an active instance. I will open a PR linked to this issue that uses locking to prevent the race condition mentioned above.

@christian-monch christian-monch linked a pull request Jan 12, 2023 that will close this issue
1 task
@yarikoptic
Copy link
Member

yarikoptic commented Jan 24, 2023

The current implementation of #6197 contains a race condition that could lead to deleting the runner from an active BatchedCommand-instance.

race of what with what?

"active" how?

note : looking at the code -- the .runner (initialized within _initialize) belongs to that instance, and should not be provided from outside. So I think it would be ok for that class to stop/delete runner if it decides so in particular if it was not used for a while. Original code even has assumption that batched process can die and needs to be resurrected. The only race I might see is that something starts using runner again after awhile and cleanup triggers before timestamp is updated. May be making DATALAD_RUNTIME_MAX__INACTIVE__AGE some small value, thus encouraging often cleanups could somehow trigger such a use case where batched command would be queried at some times after some times close to that set age?

yet to analyze what PR suggest - I would expect some locking around operations which set/update _active_last and act on it since that is the only spots I think we could trigger some kind of a race condition that runner indeed somehow would be killed while made active again.

@christian-monch
Copy link
Contributor Author

yet to analyze what PR suggest - I would expect some locking around operations which set/update _active_last and act on it since that is the only spots I think we could trigger some kind of a race condition that runner indeed somehow would be killed while made active again.

That is actually done in the increased_active-context handler

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants