shell: input: stop writing stdin when reader is not ready #5970

Closed
grondo opened this issue May 15, 2024 · 2 comments

Comments

@grondo
Contributor

grondo commented May 15, 2024

The stdin shell plugin reads input data from the KVS and tries to write it to the destination task(s), even if the tasks aren't reading stdin. If a task is stopped (e.g. during a debugging session) or otherwise isn't reading stdin, the buffers fill and the job is killed with:

2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device

This is simple to reproduce given a large input file (here a 15MB file called 15M):

$ flux run --input=15M sleep inf
2.482s: job.exception type=exec severity=0 flux_subprocess_write: No space left on device
flux-job: task(s) exited with exit code 1
2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device
@grondo grondo added this to the flux-core-0.63.0 milestone May 15, 2024
@grondo
Contributor Author

grondo commented May 16, 2024

Related #2459 😞
There are some ideas for "solutions" in that issue.

I had to remind myself how this works. Even if stdin is set to a file, the file contents are read and streamed to a guest.input eventlog. Each task then separately watches the guest.input eventlog and sends the contents of each data event to the task. If the task subprocess internal buffer fills, then ENOSPC is returned and a fatal job exception is raised.
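
To make the failure mode concrete, here is a toy model in Python. It is only a sketch and does not use any flux-core API: `BoundedBuffer` and `stream_events` are hypothetical names standing in for the task's internal stdin buffer and the eventlog watcher, and the sizes are arbitrary. It shows that a producer which pushes every data event without checking for space hits ENOSPC as soon as a non-reading task's buffer is full.

```python
import errno


class BoundedBuffer:
    """Hypothetical stand-in for a task's fixed-size stdin buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def write(self, data):
        # A write that does not fit is rejected outright, as a full buffer would.
        if self.used + len(data) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += len(data)
        return len(data)

    def drain(self, n):
        # Called when the task reads its stdin; a stopped task never calls this.
        n = min(n, self.used)
        self.used -= n
        return n


def stream_events(events, buf):
    """Push every data event at the buffer with no flow control.

    Returns (number of events delivered, errno of the failure or 0).
    """
    for i, chunk in enumerate(events):
        try:
            buf.write(chunk)
        except OSError as exc:
            return i, exc.errno
    return len(events), 0


# 64 KiB of input split into 4 KiB data events (sizes are assumptions).
events = [b"x" * 4096 for _ in range(16)]
# The task never reads, so nothing ever drains this 16 KiB buffer.
buf = BoundedBuffer(capacity=16384)
delivered, err = stream_events(events, buf)
print(delivered, errno.errorcode.get(err))
```

In this model the only ways out are for the consumer to drain the buffer or for the producer to stop sending, which is the flow control question discussed here.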

It is going to be difficult to do flow control via an eventlog, though @chu11 presents some ideas in #2459. Maybe for a first cut, file input could read from the file per shell and write directly to each task, skipping the KVS (the rank 0 shell could put a redirect event in the eventlog). When the buffer fills, it is much easier to stop the fd watcher than a streaming RPC (I'm not sure there is a way to stop these?)
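
A rough Python sketch of the "stop reading when the buffer fills" idea, using a plain pipe rather than anything from flux-core. It is a sequential analogue of stopping an fd watcher, not the shell's actual implementation: `pump` and `slow_task` are hypothetical names, and the 1 MiB input size and 0.2 s delay are arbitrary. The source file is only read when the previous chunk has been fully written, and the loop sleeps in `select()` while the destination is not writable.

```python
import os
import selectors
import tempfile
import threading
import time


def pump(src_fd, dst_fd, chunk=65536):
    """Copy src_fd to dst_fd, pausing reads while dst_fd is not writable.

    Returns the number of times the copy had to pause.
    """
    os.set_blocking(dst_fd, False)
    sel = selectors.DefaultSelector()
    pending = b""
    pauses = 0
    eof = False
    while not (eof and not pending):
        if not pending:
            # Only read more input once the previous chunk is fully written.
            pending = os.read(src_fd, chunk)
            if not pending:
                eof = True
                continue
        try:
            n = os.write(dst_fd, pending)
            pending = pending[n:]
        except BlockingIOError:
            # Destination is full: stop reading and wait until it is writable.
            pauses += 1
            sel.register(dst_fd, selectors.EVENT_WRITE)
            sel.select()
            sel.unregister(dst_fd)
    sel.close()
    return pauses


def slow_task(fd, delay, out):
    """Simulated task that ignores stdin for a while, then reads to EOF."""
    time.sleep(delay)
    total = 0
    while True:
        data = os.read(fd, 65536)
        if not data:
            break
        total += len(data)
    out.append(total)


size = 1 << 20  # 1 MiB of input, larger than a typical pipe buffer
with tempfile.TemporaryFile() as f:
    f.write(b"x" * size)
    f.flush()
    os.lseek(f.fileno(), 0, os.SEEK_SET)
    rfd, wfd = os.pipe()
    received = []
    t = threading.Thread(target=slow_task, args=(rfd, 0.2, received))
    t.start()
    pauses = pump(f.fileno(), wfd)
    os.close(wfd)  # signal EOF to the task
    t.join()
    os.close(rfd)
```

Because the data comes from a file descriptor, pausing costs nothing: unread input simply stays in the file. A streaming eventlog watch has no equivalent place to leave unread events, which is why the eventlog case is harder.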

There would still be a problem with flow control when getting input from an eventlog though 🤔 so perhaps it would be better to figure out how to solve that problem anyway.

@grondo
Contributor Author

grondo commented May 23, 2024

I think we can close this one after #6005 since the file input method no longer goes through the KVS. We will keep #2459 open to track the lack of flow control in the "service" or interactive input implementation.

@grondo grondo closed this as completed May 23, 2024