pkg/blobserver/fsbacked: add blobserver using existing local files as storage [WIP] #1282
base: master
Conversation
Re: the blob size limit, you could solve that by keeping track of offset & size info in the db as well, so that multiple blobs could be contained in the same file. Then the file gets split up when added as normal, but the store just points the different parts to different chunks of that same physical file. As far as the problem of including the data in Perkeep without storing it twice -- what I do here is to just dump the data into Perkeep and then access it via the FUSE filesystem. Something like this would have been nice to have when I was doing a mass import of my backups months ago; not having to actually copy 2 TiB of data might have sped things up a bit. I probably still would have had to write custom tooling though...
Thanks for that suggestion! It's implemented in 4d9ca00. That commit also adds a new type.
Just discovered there's an existing feature request for this: #1226.
This PR is a preliminary sketch for a new blobserver type that uses files uploaded to it as their own storage.
When you add a file to an `fsbacked.Storage` that's within the directory tree it controls, an entry is added to a database that maps between files and blobrefs; but the file's contents are not copied anywhere. When fetching the file's content blob later, the database directs the `Storage` to the right local file and the data is served from there. Adding files outside the directory tree, or adding any other kind of blob, fails over to another blobserver nested inside the `fsbacked.Storage`.

This solves the problem of wanting to add a tree of large files (e.g., videos of my kids growing up) to a local Perkeep instance without storing all the data twice. This should be used only on directory trees whose files do not change, lest the blobrefs in the database become mismatched to their corresponding files.
A number of other changes throughout Perkeep would be needed to make this truly useful. The `io.Reader` presented to a blobserver's `ReceiveBlob` method is usually (always?) some wrapper object (like `checkHashReader`) that conceals the underlying `*os.File`, without which `fsbacked.Storage` cannot detect that a file within its tree is being uploaded. And in any case, Perkeep imposes rather a low limit on blob sizes for this purpose.

Presented for further discussion.