shell: input: stop writing stdin when reader is not ready #5970

grondo · 2024-05-15T15:55:44Z

The stdin shell plugin reads input data from the KVS and tries to write it to the destination task(s), even if the tasks aren't reading stdin. If a task is stopped (for example, during a debugging session) or otherwise isn't reading stdin, the buffers fill and the job is killed with:

2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device

This is simple to reproduce given a large input file (here a 15MB file called 15M):

$ flux run --input=15M sleep inf
2.482s: job.exception type=exec severity=0 flux_subprocess_write: No space left on device
flux-job: task(s) exited with exit code 1
2.320s: flux-shell[0]: FATAL: input: flux_subprocess_write: No space left on device


grondo · 2024-05-16T02:55:19Z

Related #2459 😞
There are some ideas for "solutions" in that issue.

I had to remind myself how this works. Even if stdin is set to a file, the file contents are read and streamed to a guest.input eventlog. Each task then separately watches the guest.input eventlog and sends the contents of each data event to the task. If the task subprocess's internal buffer fills, flux_subprocess_write() returns ENOSPC and a fatal job exception is raised.
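To make the failure mode concrete, here is a minimal C model of the path described above, assuming the task's stdin is a fixed-capacity buffer (4096 bytes here, an arbitrary choice) and that every data event is forwarded as soon as it arrives. The names `task_buf`, `task_buf_write`, `task_buf_read` and `forward_events_no_flow_control` are hypothetical and are not the flux-core subprocess API; the model only mirrors the ENOSPC behavior reported in this issue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a task's stdin buffer (not the flux-core API). */
#define TASK_BUF_CAPACITY 4096

struct task_buf {
    char data[TASK_BUF_CAPACITY];
    size_t used;                /* bytes queued but not yet read by the task */
};

/* Queue len bytes for the task. Fails with ENOSPC when they do not fit,
 * mirroring the error reported for flux_subprocess_write(). */
int task_buf_write (struct task_buf *b, const char *src, size_t len)
{
    if (len > TASK_BUF_CAPACITY - b->used) {
        errno = ENOSPC;
        return -1;
    }
    memcpy (b->data + b->used, src, len);
    b->used += len;
    assert (b->used <= TASK_BUF_CAPACITY);
    return (int)len;
}

/* Task side: consume up to len bytes. A stopped task never calls this. */
size_t task_buf_read (struct task_buf *b, char *dst, size_t len)
{
    size_t n = len < b->used ? len : b->used;
    memcpy (dst, b->data, n);
    memmove (b->data, b->data + n, b->used - n);
    b->used -= n;
    return n;
}

/* Forward each "data event" with no flow control. The first failed write
 * is where the shell would raise a fatal job exception. Returns the number
 * of events delivered before the failure (nevents if all succeeded). */
int forward_events_no_flow_control (struct task_buf *b,
                                    const char *chunk, size_t chunk_len,
                                    int nevents)
{
    for (int i = 0; i < nevents; i++) {
        if (task_buf_write (b, chunk, chunk_len) < 0)
            return i;
    }
    return nevents;
}
```

With 1024-byte events and a task that never reads, the first four writes fill the buffer and the fifth fails with ENOSPC, which corresponds to the point where the shell gives up on the job.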

It is going to be difficult to do flow control via an eventlog, though @chu11 presents some ideas in #2459. Maybe for a first cut, file input could read from the file per shell and write directly to each task, skipping the KVS (the rank 0 shell could put a redirect event in the eventlog). When the buffer fills, it is much easier to stop the fd watcher than a streaming RPC (I'm not sure there is a way to stop these?).

There would still be a problem with flow control when getting input from an eventlog though 🤔 so perhaps it would be better to figure out how to solve that problem anyway.
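The fd-watcher approach suggested earlier can be sketched as follows, again as a hypothetical C model rather than flux-core code: `file_input`, `file_input_pump` and `file_input_task_read` are illustrative names, only byte counts are tracked, and the `watcher_active` flag stands in for stopping and restarting the real fd watcher on the input file. Instead of failing when the task's buffer is full, the reader pauses and resumes once the task consumes data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of file-backed stdin with simple flow control.
 * Only byte counts are tracked; no real I/O happens. */
struct file_input {
    size_t file_remaining;      /* bytes of the input file not yet read */
    size_t buf_used;            /* bytes queued for the task */
    size_t buf_capacity;        /* size of the task's stdin buffer */
    bool watcher_active;        /* stands in for the file's fd watcher */
};

/* Runs while the watcher is active: move data from the file into the
 * task buffer, never more than fits. When the buffer is full, stop the
 * watcher (back-pressure) instead of failing the write. */
void file_input_pump (struct file_input *in, size_t chunk)
{
    while (in->watcher_active && in->file_remaining > 0) {
        size_t space = in->buf_capacity - in->buf_used;
        if (space == 0) {
            in->watcher_active = false;     /* pause reading the file */
            return;
        }
        size_t n = chunk;
        if (n > space)
            n = space;
        if (n > in->file_remaining)
            n = in->file_remaining;
        in->file_remaining -= n;
        in->buf_used += n;
        assert (in->buf_used <= in->buf_capacity);
    }
    if (in->file_remaining == 0)
        in->watcher_active = false;         /* EOF: nothing left to watch */
}

/* Called when the task has consumed n bytes of stdin. If the watcher was
 * paused and there is now room, restart it and resume pumping. */
void file_input_task_read (struct file_input *in, size_t n, size_t chunk)
{
    if (n > in->buf_used)
        n = in->buf_used;
    in->buf_used -= n;
    if (!in->watcher_active && in->file_remaining > 0
        && in->buf_used < in->buf_capacity) {
        in->watcher_active = true;
        file_input_pump (in, chunk);
    }
}
```

With a 15000-byte file, a 4096-byte buffer and a stopped task, pumping stops with 4096 bytes queued and the watcher paused, and nothing fails; the rest of the file flows once the task starts reading. This only covers file input: input arriving via an eventlog would still need its own flow-control mechanism.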

grondo · 2024-05-23T22:55:42Z

I think we can close this one after #6005 since the file input method no longer goes through the KVS. We will keep #2459 open to track the lack of flow control in the "service" or interactive input implementation.

grondo added this to the flux-core-0.63.0 milestone May 15, 2024

grondo closed this as completed May 23, 2024
