[feature] consolidate files into volumes to reduce number of files #427
CryFS allows you to change the block size when creating a file system. Feel free to experiment with 64MB blocks. It does have a significant downside, though: larger blocks slow down synchronization and increase file system latency.
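To make that tradeoff concrete, here is a minimal sketch (not CryFS code; the re-upload model is an assumption based on the explanation above) of how block size drives the cost of syncing a small edit:

```python
# Rough model: editing even one byte dirties the whole containing block,
# which then has to be re-encrypted and re-uploaded. Interior tree nodes
# would add a little more; they are ignored here. Numbers are illustrative.
for block_size in (32 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    print(f"block size {block_size:>11,} B -> ~{block_size:>11,} B re-uploaded per small edit")
```

With 64MB blocks, touching a few bytes in many files means re-uploading roughly 64MB per touched file, which is where the synchronization and latency penalty comes from.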
CryFS does not currently merge multiple files into one block. The reason for this is to minimize synchronization conflicts. If two different clients modify different files that happen to map to the same block, we don't want this to cause a synchronization conflict in your Dropbox.
SeaweedFS looks very interesting but has a very different use case. It is meant for distributed systems where you can run your own server cluster. I don't think you can use it for the use case CryFS is meant for, i.e. running on a local machine and possibly putting encrypted files in a Dropbox or other third-party cloud provider.
…On May 4, 2022 2:54:48 AM onlyjob ***@***.***> wrote:
CryFS has some good ideas and an interesting design but a terrible implementation...
I did some testing on CryFS-0.10.2 and rsync'ed 290329 files from my home folder into CryFS. This is how it looked in htop just before I interrupted rsync:
PID    USER PRI NI VIRT  RES   SHR  S CPU% MEM% TIME+    Command
153410 user 21  1  12.7G 11.1G 4708 S 0.7  17.4 23h40:33 cryfs . /tmp/qqq.cryfs
Note the massive memory use. But it gets worse.
CryFS created 4096 directories in its folder (000...FFF), with around 21500 files per directory: 4096*21500=88_064_000.
So it created 88 million files(!) -- about 300 times the number of files in the source.
This is incredibly inefficient. No underlying file system can deal with that many files without severe performance degradation. It took a week(!) to remove 88 million files...
Instead of multiplying the number of files by a factor of 300 (or so), CryFS should reduce the number of files by packing chunks (that are currently stored as files) into 64 MiB volume files.
SeaweedFS implements this concept very well and can handle millions of files very efficiently.
Thanks, but the problem is the number of files that CryFS produces - not their block size. Of course SeaweedFS is for a different use case. But it nicely implements the very concept I'm talking about: consolidation of small files into volumes. IMHO that idea is worth borrowing, or at least considering for a somewhat similar implementation.
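For illustration, a minimal sketch of that consolidation idea, loosely modeled on SeaweedFS volumes (this is neither CryFS nor SeaweedFS code; the 64 MiB limit and the chunk-ID scheme are assumptions):

```python
import os

VOLUME_LIMIT = 64 * 1024 * 1024  # 64 MiB per volume file, as proposed above

class Volume:
    """Append-only file holding many small chunks, plus an offset index."""

    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}  # chunk_id -> (offset, length)

    def has_room(self, data):
        return os.fstat(self.f.fileno()).st_size + len(data) <= VOLUME_LIMIT

    def append(self, chunk_id, data):
        self.f.seek(0, os.SEEK_END)   # always append at the end
        offset = self.f.tell()
        self.f.write(data)
        self.f.flush()
        self.index[chunk_id] = (offset, len(data))

    def read(self, chunk_id):
        offset, length = self.index[chunk_id]
        self.f.seek(offset)
        return self.f.read(length)
```

The hard parts are exactly what CryFS's one-file-per-block design sidesteps: deleting a chunk leaves a hole that needs compaction, and two sync clients appending to the same volume file would conflict on every write, which is the Dropbox-conflict scenario described above.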
@onlyjob the number of files produced depends on the block size. Increasing the block size will reduce the number of files produced, though you are right that there is always at least one block per file.
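A back-of-the-envelope estimate of that relationship (the file-size mix below is a made-up placeholder, not a measurement of @onlyjob's home folder, and tree metadata blocks are ignored):

```python
import math

def estimated_block_files(file_sizes, block_size):
    # At least one block per file, plus one leaf block per block_size
    # bytes of content for larger files.
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

# Hypothetical mix: many small files plus some large ones, ~290k files total.
sizes = [4 * 1024] * 250_000 + [100 * 1024 * 1024] * 40_000
for block_size in (32 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    print(f"{block_size:>10,} B blocks -> {estimated_block_files(sizes, block_size):>12,} block files")
```

Larger blocks collapse the count for big files, but the floor stays at one block file per source file.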
Can you clarify here? There are two major factors (ignoring IOPS for now) contributing to "speed": throughput and latency. My understanding is that how these affect CryFS depends on how file sizes compare to the block size.
So when you say larger blocks slow down synchronization, I think you are talking about files smaller than the block size. For files larger than or equal to the block size, larger blocks would speed up synchronization and reduce latency. Is this correct, or am I completely wrong? Block size also affects storage.
Unfortunately, files come in all sizes. For my use case, I don't think there is a block size that will be fast enough for dealing with large files while also not inflating small file sizes so much that my backup becomes unaffordably expensive. I think @onlyjob's idea could be implemented fairly easily using a filesystem on top of CryFS. It doesn't seem like a perfect solution, though. Do you think there is a way to have different buckets of block sizes?
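One way the bucket idea could look (a sketch only; the boundaries are arbitrary assumptions, and CryFS currently fixes a single block size per filesystem at creation time, so this would require a format change):

```python
# Pick a block size per file based on its size, so small files are not
# inflated and large files do not explode into huge numbers of blocks.
BUCKETS = [
    (1 * 1024 * 1024, 16 * 1024),         # files under 1 MiB   -> 16 KiB blocks
    (100 * 1024 * 1024, 1 * 1024 * 1024), # files under 100 MiB -> 1 MiB blocks
]
DEFAULT_BLOCK_SIZE = 16 * 1024 * 1024     # everything else     -> 16 MiB blocks

def block_size_for(file_size):
    for size_limit, block_size in BUCKETS:
        if file_size < size_limit:
            return block_size
    return DEFAULT_BLOCK_SIZE
```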
The current scheme means that I would either pay for a huge inflation of the FS by making the block 64MiB (same as the default storj block size), or pay for a huge inflation of the number of files for my 8TiB repo. Seems there is very little to choose from; perhaps the only solution is the one above, layering another filesystem on top of CryFS. Sounds like a big PITA, though.
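The inflation side of that tradeoff, under the assumption made in the comment above that every file consumes whole 64MiB blocks of storage:

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB, the storj-matching size mentioned above

def padded_size(file_size):
    # Each file occupies a whole number of blocks, so even a tiny file
    # consumes one full block of storage.
    return max(1, math.ceil(file_size / BLOCK_SIZE)) * BLOCK_SIZE

for file_size in (1 * 1024, 1 * 1024 * 1024, 8 * 1024 * 1024 * 1024):
    print(f"{file_size:>14,} B file -> {padded_size(file_size):>14,} B stored")
```

At 64 MiB blocks, a 1 KiB file takes 65536 times its size on disk, which is the "huge inflation of the FS" side of the dilemma.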