Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsreplace1 should fallocate() #20352

Open
allisonkarlitskaya opened this issue Apr 23, 2024 · 5 comments
Open

fsreplace1 should fallocate() #20352

allisonkarlitskaya opened this issue Apr 23, 2024 · 5 comments

Comments

@allisonkarlitskaya
Copy link
Member

allisonkarlitskaya commented Apr 23, 2024

Until now we've been using fsreplace1 for very small files, but we're going to be using it to upload potentially large files in the future.

It occurred to me that — particularly on extent-based filesystems — best practice is to fallocate() the full size of the file in advance. This has two major advantages:

  • we can detect "not enough disk space" early: once the upload starts, it's basically guaranteed to complete. If we don't do this, we could end up uploading 90% of a giant file only to be told at the end "sorry, didn't fit!" (and fsreplace will delete it at this point, of course)
  • fragmentation: if we simply stream the file, the kernel has no way to know in advance how much space it needs to find, and will start by writing it into a small hole, then another, then another. This is even worse for uploads over a slow connection, because the kernel can play wait-and-see if it gets the data quick, but it's only willing to wait so long before it starts to commit (at which point, it's too late). If we tell the kernel at the start how many bytes we're going to need, it can find an appropriately-sized spot, reducing fragmentation.

Python has os.posix_fallocate() since at least 3.6: https://docs.python.org/3.6/library/os.html#os.posix_fallocate

Just as we recently added to fsread1, we could create a size-hint for fsreplace1 which results in the temporary file being created immediately and posix_fallocate() being called on it.

One consideration: the "hint" in the name in this case is potentially problematic, though, since this would be more than a hint: posix_fallocate() doesn't merely allocate the space: it also changes the length of the file. Also: the creation of the tempfile as soon as the channel opens sort of excludes the possibility of then deleting the file. For this reason, it's more than a "hint" and maybe should just be called size — unless we go out of our way to truncate the data back to the received size again in case we get less than we were expecting (which could be desirable if we ever wanted to support resuming uploads or something like that).

Another consideration: we implemented upload() in cockpit-upload-helper.ts as an operation on a ReadableStream, but this doesn't let us determine the overall size of the file. Maybe it would be better to change this back to being a Blob, or to give some other way to provide the size.

cc: @jelly

@allisonkarlitskaya
Copy link
Member Author

One consideration: the "hint" in the name in this case is potentially problematic, though, since this would be more than a hint: posix_fallocate() doesn't merely allocate the space: it also changes the length of the file. Also: the creation of the tempfile as soon as the channel opens sort of excludes the possibility of then deleting the file. For this reason, it's more than a "hint" and maybe should just be called size — unless we go out of our way to truncate the data back to the received size again in case we get less than we were expecting (which could be desirable if we ever wanted to support resuming uploads or something like that).

A quick look into implementing this shows that this isn't really much of a consideration. Two reasons:

  • the easiest way to implement this is to keep the current logic but add the fallocate() just after we create the temporary file, but only in the case that we do create it. That means the current semantics (no data packet = delete file) would be preserved.
  • on incomplete upload we delete the file anyway, so this "resume" idea is already a non-starter. The only place this could be relevant is if (for some reason) we send done before sending the total length... like if the file length changes on the client while it's in the middle of being uploaded?

@martinpitt
Copy link
Member

f (for some reason) we send done before sending the total length...

That'd just be a protocol/program bug. But I suppose you mean that at this point we send a wrong (outdated) size? I suppose writing more data than set with fallocate is relatively harmless -- the file would hopefully just grow (with a potential ENOSPC of course). If it shrinks, the channel could maybe do a final ftruncate()?

@allisonkarlitskaya
Copy link
Member Author

Ya. I think fruncate() might be worthwhile here. The bookkeeping gets a bit annoying, though. There's no variant of fruncate that accepts "to the current offset", so we'd either need to do a seek call to find out, or keep a counter while we're writing. Neither of those would be the end of the world...

@allisonkarlitskaya
Copy link
Member Author

allisonkarlitskaya commented Apr 23, 2024

the easiest way to implement this is to keep the current logic but add the fallocate() just after we create the temporary file, but only in the case that we do create it. That means the current semantics (no data packet = delete file) would be preserved.

This approach has one drawback: if we fallocate()ed the file immediately on open, then we could close (with problem) before ready. That might have some implications for better UI on the sender side. In fact, the way that the AsyncChannel is currently structured, the data is acknowledged as soon as the channel receives it — which means that you'd even get your first "ack" message (and progress callback) before we've successfully allocated the space. That's weird.

@jelly
Copy link
Member

jelly commented Apr 23, 2024

Ya. I think fruncate() might be worthwhile here. The bookkeeping gets a bit annoying, though. There's no variant of fruncate that accepts "to the current offset", so we'd either need to do a seek call to find out, or keep a counter while we're writing. Neither of those would be the end of the world...

Seek call seems nice. And I do like knowing up front that you have ENOSPACE, that is a great improvement thanks for opening this issue!

This approach has one drawback: if we fallocate()ed the file immediately on open, then we could close (with problem) before ready. That might have some implications for better UI on the sender side. In fact, the way that the AsyncChannel is currently structured, the data is acknowledged as soon as the channel receives it — which means that you'd even get your first "ack" message (and progress callback) before we've successfully allocated the space. That's weird.

That is weird, and this should be ordered no?

open
ready
close

I thought the protocol guaranteed this order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants