Alt deduplication mode to handle extremely large archives #180

Open
tasket opened this issue Feb 19, 2024 · 0 comments
Labels: enhancement, optimization

tasket commented Feb 19, 2024

Problem

Currently Wyng's deduplication code is RAM-bound (as are most deduplicators), which puts an effective limit on the size of an archive that can be deduplicated.

Possible solution

  1. Detect the large-archive condition and the available RAM resources.
  2. Move the lion's share of the dedup indexes out of RAM (and out of /tmp).

This would trade performance for the ability to perform the dedup at all.
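
To make the two steps above concrete, here is a minimal sketch and not Wyng's actual code. It assumes a SQLite file can stand in for the on-disk index; the names `available_ram_bytes`, `needs_disk_index` and `DiskDedupIndex`, the 100-bytes-per-entry estimate, and the 50% RAM threshold are all hypothetical choices made for illustration.

```python
import hashlib
import os
import sqlite3


def available_ram_bytes():
    """Best-effort estimate of free physical RAM; None if it cannot be determined."""
    try:
        # These sysconf names exist on Linux; other platforms may lack them.
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None


def needs_disk_index(chunk_count, bytes_per_entry=100, ram_fraction=0.5):
    """Hypothetical heuristic: use the disk index when an in-RAM index would
    need more than `ram_fraction` of the currently free RAM."""
    avail = available_ram_bytes()
    if avail is None:
        return True  # unknown RAM: take the slow but safe path
    return chunk_count * bytes_per_entry > avail * ram_fraction


class DiskDedupIndex:
    """Maps chunk hash -> first known location, stored in a SQLite file."""

    def __init__(self, path):
        # `path` should live on real storage, not on a RAM-backed /tmp.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "hash BLOB PRIMARY KEY, location TEXT NOT NULL) WITHOUT ROWID"
        )

    def add(self, data, location):
        """Return the location of an identical chunk seen earlier, or None if new."""
        digest = hashlib.sha256(data).digest()
        row = self.db.execute(
            "SELECT location FROM chunks WHERE hash = ?", (digest,)
        ).fetchone()
        if row is not None:
            return row[0]
        self.db.execute("INSERT INTO chunks VALUES (?, ?)", (digest, location))
        return None

    def close(self):
        self.db.commit()
        self.db.close()
```

With this shape, a caller would check `needs_disk_index(n)` once up front and then call `add()` for each chunk; every lookup becomes a disk-backed B-tree probe instead of a dict lookup, which is where the performance cost comes from.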

Alternate solution (workaround)

For unencrypted archives, users could have jdupes (or a similar utility) do a hardlink or reflink dedup on the archive dir. Otherwise, a dedup-capable filesystem like Btrfs or ZFS could be used. (These options would not work on encrypted archives unless Wyng started offering a deterministic encryption mode, since identical chunks must produce identical files on disk for file-level dedup to find them.)
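
As a rough illustration of what a hardlink dedup pass over an archive dir does, here is a small sketch. It is not jdupes' implementation: the function name `hardlink_duplicates` is hypothetical, it trusts SHA-256 plus file size rather than doing a byte-for-byte comparison, and it keeps its own table of seen files in RAM, so it only demonstrates the idea and does not solve the memory problem.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path, bufsize=1 << 20):
    """SHA-256 of a file, read in blocks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(bufsize):
            h.update(block)
    return h.digest()


def hardlink_duplicates(root):
    """Replace byte-identical regular files under `root` with hard links.

    Returns the number of files that were replaced by a link.
    """
    seen = {}  # (size, digest) -> first path found with that content
    replaced = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        key = (path.stat().st_size, file_digest(path))
        original = seen.setdefault(key, path)
        if original is path or os.path.samefile(original, path):
            continue  # first copy, or already linked to it
        # Link under a temporary name, then rename over the duplicate so
        # the path never goes missing if the process is interrupted.
        tmp = path.with_name(path.name + ".dedup-tmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        replaced += 1
    return replaced
```

Hard links only work within one filesystem, and linked files share their content, so this approach assumes nothing later modifies an archive file in place.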

tasket added the enhancement and optimization labels on Feb 19, 2024
tasket added this to the v1.0 milestone on Feb 19, 2024