Ensure thread-safety in cleaning up inactive `BatchedCommand` instances #7251

christian-monch · 2023-01-12T12:24:31Z

This is a follow-up to issue #6179.

The current implementation of #6197 contains a race condition that could lead to deleting the runner from an active BatchedCommand-instance.

Also, from within BatchedCommand instance- and class-code, we can only clean up elements internal to BatchedCommand, i.e. we can remove runners that are not actively used. In 0.18.0, there might be a problem with identifying all instances because elements are removed from the instance dictionary.

I would like to question the necessity of this specific garbage collection. If we want to keep it, I think it has to be modified to ensure that no runner is removed from an active instance. I will open a PR linked to this issue that uses locking to prevent the race condition mentioned above.


yarikoptic · 2023-01-24T15:53:04Z

> The current implementation of #6197 contains a race condition that could lead to deleting the runner from an active BatchedCommand-instance.

race of what with what?

"active" how?

Note: looking at the code -- the .runner (initialized within _initialize) belongs to that instance and should not be provided from outside. So I think it would be OK for that class to stop/delete the runner if it decides so, in particular if it was not used for a while. The original code even assumes that the batched process can die and needs to be resurrected. The only race I can see is that something starts using the runner again after a while and cleanup triggers before the timestamp is updated. Maybe setting DATALAD_RUNTIME_MAX__INACTIVE__AGE to some small value, thus encouraging frequent cleanups, could trigger such a case, where the batched command is queried at intervals close to that age?

I have yet to analyze what the PR suggests. I would expect some locking around the operations that set/update _active_last and act on it, since those are the only spots where I think we could trigger a race in which the runner is killed while being made active again.
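To make the race described above concrete, here is a minimal sketch of the kind of locking meant, not DataLad's actual code. The class `TimestampedCommand`, its `clean_inactive` method, and the `starts` counter are hypothetical names used only for illustration; `_active_last`, `_initialize`, and `runner` are borrowed from the discussion. The point is that the timestamp update and the age check both happen under the same lock, so a cleanup cannot slip in between "runner is in use again" and "timestamp refreshed".

```python
import threading
import time


class TimestampedCommand:
    """Hypothetical stand-in for a batched command; not DataLad's class."""

    def __init__(self, max_inactive_age):
        self._lock = threading.Lock()
        self._max_inactive_age = max_inactive_age
        self._active_last = time.monotonic()
        self.runner = None
        self.starts = 0  # counts how often the runner had to be (re)created

    def _initialize(self):
        # Placeholder for starting the batched subprocess.
        self.runner = object()
        self.starts += 1

    def __call__(self, request):
        # Resurrecting the runner, refreshing the timestamp, and using the
        # runner all happen while holding the lock.
        with self._lock:
            if self.runner is None:
                self._initialize()
            self._active_last = time.monotonic()
            return f"handled {request}"

    def clean_inactive(self, now=None):
        # The age check and the removal are one atomic step under the lock.
        now = time.monotonic() if now is None else now
        with self._lock:
            too_old = now - self._active_last > self._max_inactive_age
            if self.runner is not None and too_old:
                self.runner = None
                return True
            return False
```

Holding the lock for the whole request serializes calls on one instance, which is a simplification chosen here for clarity; a real implementation might prefer a narrower critical section.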

christian-monch · 2023-01-31T14:22:36Z

> I have yet to analyze what the PR suggests. I would expect some locking around the operations that set/update _active_last and act on it, since those are the only spots where I think we could trigger a race in which the runner is killed while being made active again.

That is actually done in the `increased_active` context handler.
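For readers who have not opened the PR, the general idea of such a context handler can be sketched roughly as follows. This is an assumption about the pattern, not the code in #7252: only the name `increased_active` comes from this thread, while `CountedCommand`, `_active`, and `clean_if_idle` are hypothetical.

```python
import threading
from contextlib import contextmanager


class CountedCommand:
    """Hypothetical stand-in; only the name `increased_active` is from the thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0          # number of callers currently using the runner
        self.runner = object()    # placeholder for the batched subprocess

    @contextmanager
    def increased_active(self):
        # Mark the instance as in use for the duration of the with-block.
        with self._lock:
            self._active += 1
        try:
            yield self
        finally:
            with self._lock:
                self._active -= 1

    def clean_if_idle(self):
        # Cleanup only removes the runner when nobody is inside the with-block.
        with self._lock:
            if self._active == 0 and self.runner is not None:
                self.runner = None
                return True
            return False
```

A caller would wrap every use of the runner in `with cmd.increased_active():`, so a concurrent cleanup sees a non-zero counter and leaves the runner alone.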

christian-monch linked a pull request Jan 12, 2023 that will close this issue

Rf batched command #7252

Draft

1 task
