pkg/blobserver/fsbacked: add blobserver using existing local files as storage [WIP] #1282

Open · wants to merge 12 commits into master
Conversation

@bobg bobg commented Nov 27, 2019

This PR is a preliminary sketch for a new blobserver type that uses files uploaded to it as their own storage.

When you add a file to an fsbacked.Storage that's within the directory tree it controls, an entry is added to a database that maps between files and blobrefs; but the file's contents are not copied anywhere. When fetching the file's content blob later, the database directs the Storage to the right local file and the data is served from there.

Adding files outside the directory tree, or adding any other kind of blob, fails over to another blobserver nested inside the fsbacked.Storage.
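
A minimal sketch of that receive path, assuming hypothetical names (Storage's fields, the simplified BlobReceiver interface, and the string-keyed meta map are placeholders, not the PR's actual identifiers; the real blobserver interface takes a blob.Ref and returns a blob.SizedRef):

```go
package fsbacked

import (
	"io"
	"os"
	"path/filepath"
	"strings"
)

// BlobReceiver is a simplified stand-in for Perkeep's blobserver.BlobReceiver.
type BlobReceiver interface {
	ReceiveBlob(ref string, source io.Reader) error
}

// Storage maps blobrefs to local files under root and delegates
// everything else to a nested blob store.
type Storage struct {
	root   string            // directory tree whose files serve as storage
	meta   map[string]string // hypothetical db: blobref -> file path
	nested BlobReceiver      // fallback for blobs not backed by local files
}

// ReceiveBlob records a file under root in the db instead of copying
// its bytes; anything else falls through to the nested store.
func (s *Storage) ReceiveBlob(ref string, source io.Reader) error {
	if f, ok := source.(*os.File); ok {
		abs, err := filepath.Abs(f.Name())
		if err == nil && strings.HasPrefix(abs, s.root+string(filepath.Separator)) {
			s.meta[ref] = abs // map the blobref to the existing file in place
			return nil
		}
	}
	return s.nested.ReceiveBlob(ref, source)
}
```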

This solves the problem of wanting to add a tree of large files (e.g., videos of my kids growing up) to a local Perkeep instance without storing all the data twice. This should be used only on directory trees whose files do not change, lest the blobrefs in the database become mismatched to their corresponding files.

A number of other changes throughout Perkeep would be needed to make this truly useful. The io.Reader presented to a blobserver's ReceiveBlob method is usually (always?) some wrapper object (like checkHashReader) that conceals the underlying *os.File, without which fsbacked.Storage cannot detect that a file within its tree is being uploaded. And in any case, Perkeep imposes rather a low limit on blob sizes for this purpose.
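
For concreteness, here is one hypothetical way the wrapper problem could be addressed (nothing like this exists in Perkeep today): wrappers such as checkHashReader could implement an unwrapping interface, and fsbacked could walk the chain looking for the *os.File:

```go
package fsbacked

import (
	"io"
	"os"
)

// sourceUnwrapper is a hypothetical interface a wrapping reader could
// implement to reveal the reader it wraps.
type sourceUnwrapper interface {
	UnwrapReader() io.Reader
}

// underlyingFile walks a chain of wrappers looking for an *os.File.
func underlyingFile(r io.Reader) (*os.File, bool) {
	for {
		if f, ok := r.(*os.File); ok {
			return f, true
		}
		u, ok := r.(sourceUnwrapper)
		if !ok {
			return nil, false
		}
		r = u.UnwrapReader()
	}
}
```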

Presented for further discussion.

@googlebot added the cla: yes (Author has submitted the Google CLA) label Nov 27, 2019
@zenhack zenhack commented Nov 27, 2019

Re: the blob size limit, you could solve that by keeping track of offset & size info in the db as well, so that multiple blobs could be contained in the same file. Then the file gets split up as usual when added, but the store just points the different parts at different chunks of that same physical file.
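
A sketch of that idea, with hypothetical names: each db record pins a blobref to a byte range of one physical file, so many blobs can share a file, and fetching serves only that range via the standard io.NewSectionReader:

```go
package fsbacked

import (
	"io"
	"os"
)

// fileRegion is a hypothetical db record: one blob's bytes within a file.
type fileRegion struct {
	path   string // local file containing the blob
	offset int64  // where the blob's bytes begin
	size   int64  // how many bytes the blob spans
}

// fetchRegion opens the file and returns a reader confined to the
// blob's byte range, plus a closer for the underlying file.
func fetchRegion(rgn fileRegion) (io.Reader, io.Closer, error) {
	f, err := os.Open(rgn.path)
	if err != nil {
		return nil, nil, err
	}
	return io.NewSectionReader(f, rgn.offset, rgn.size), f, nil
}
```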

As far as the problem of including the data in perkeep without storing it twice -- what I do here is to just dump the data into perkeep and then access it via the fuse filesystem.

Something like this would have been nice to have when I was doing a mass import of my backups months ago; not having to actually copy 2TiB of data might have sped things up a bit. I probably still would have had to write custom tooling though...

@bobg bobg commented Nov 29, 2019

> you could solve that by keeping track of offset & size info in the db as well, so that multiple blobs could be contained in the same file

Thanks for that suggestion! It's implemented in 4d9ca00. That commit also adds a new type, FileSectionReader, which, if used by schema.WriteFileChunks et al., would supply most of what this needs to be fully useful. That's a project for a future commit on this branch.
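
One plausible shape for such a type (the actual definition is in 4d9ca00; this sketch is an assumption, not the commit's code): an io.Reader over a slice of a file that also reveals which file and byte range it covers, so that chunking code like schema.WriteFileChunks could record the region instead of copying the bytes:

```go
package fsbacked

import (
	"io"
	"os"
)

// FileSectionReader reads one byte range of an underlying file while
// keeping the file and range inspectable.
type FileSectionReader struct {
	*io.SectionReader
	file         *os.File
	offset, size int64
}

// NewFileSectionReader confines reads of f to size bytes starting at offset.
func NewFileSectionReader(f *os.File, offset, size int64) *FileSectionReader {
	return &FileSectionReader{
		SectionReader: io.NewSectionReader(f, offset, size),
		file:          f,
		offset:        offset,
		size:          size,
	}
}

// Section reports the file and byte range this reader covers, letting a
// caller record (path, offset, size) instead of copying the data.
func (r *FileSectionReader) Section() (file *os.File, offset, size int64) {
	return r.file, r.offset, r.size
}
```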

@bobg bobg commented Dec 1, 2019

Just discovered there's an existing feature request for this: #1226.
