Fix "test" error: extra hashes #4982
base: master
Conversation
This prevents duplicated blocks after a block was deleted and re-added (duplicati#4693). It also fixes RemoveMissingBlocks in LocalListBrokenFilesDatabase, which did not clear the DeletedBlock table.
The DeletedBlock table was not filled after a database recreate, which resulted in incorrect compact size calculations and possible duplicate blocks. To detect deleted blocks, the recreate now adds every block that is neither referenced in a blockset nor used as a blocklist hash.
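A rough sketch of that detection step, assuming the local database schema with Block, BlocksetEntry, BlocklistHash and DeletedBlock tables (this illustrates the idea rather than quoting the PR's actual query):

```sql
-- Register every block that is neither referenced by a blockset entry
-- nor used as a blocklist hash, so compact can account for it as waste.
-- Table and column names follow the schema assumed above.
INSERT INTO "DeletedBlock" ("Hash", "Size", "VolumeID")
SELECT "Hash", "Size", "VolumeID"
FROM "Block"
WHERE "ID" NOT IN (SELECT "BlockID" FROM "BlocksetEntry")
  AND "Hash" NOT IN (SELECT "Hash" FROM "BlocklistHash");
```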
Check that moved blocks are recorded for the volume that is to be deleted. If duplicate blocks exist and one is in the DeletedBlock table, the cleanup can otherwise erase a block entry on an unrelated volume (duplicati#4693).
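One way to make that check concrete is to restrict the cleanup to the volume actually being deleted. A minimal sketch, assuming DeletedBlock carries a VolumeID column and using illustrative parameter names:

```sql
-- Only remove the DeletedBlock entry that belongs to the volume being
-- deleted; a duplicate (Hash, Size) row on an unrelated volume stays.
DELETE FROM "DeletedBlock"
WHERE "Hash" = :hash
  AND "Size" = :size
  AND "VolumeID" = :volumeId;
```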
I meant to comment in this issue, where you wrote:
but you got the PR out first. I think that's all the flavors. A blockset might be data or metadata though (two aspects of a file).
Does the current brute-force Deleting Blocks SQL in the topic linked above reduce such needs?
Of course it has to run, but it likely will run eventually anyway. Regardless, are the goals equivalent?
If I understand it correctly, the suggested change runs the query when specific files or filesets are deleted, so blocks don't have to be looked up in the full table. In a database recreate, all of the blocks have to be examined to see if they are used, so I don't think the change can be applied here. The current queries are very similar, although I use a temporary table to avoid looking up the block IDs twice.
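To illustrate the temporary-table idea (table and column names here are illustrative, not the PR's actual code): collect the IDs of blocks that are still in use once, then reuse that set instead of repeating the lookup.

```sql
-- Gather the set of used block IDs a single time.
CREATE TEMPORARY TABLE "UsedBlock" AS
SELECT "BlockID" AS "ID" FROM "BlocksetEntry"
UNION
SELECT "Block"."ID" FROM "Block"
INNER JOIN "BlocklistHash" ON "Block"."Hash" = "BlocklistHash"."Hash";

-- Everything not in that set is considered deleted.
INSERT INTO "DeletedBlock" ("Hash", "Size", "VolumeID")
SELECT "Hash", "Size", "VolumeID" FROM "Block"
WHERE "ID" NOT IN (SELECT "ID" FROM "UsedBlock");

DROP TABLE "UsedBlock";
```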
The PoC change would probably not fit here, which is why I suggested the brute-force plan:
A better-formatted version of the cited query (courtesy of poorsql.com) suggests that it does look at all of the blocks.
Questions about DISTINCT, and about UNION ALL versus UNION, led to many query variations being benchmarked here:
These actually pair well together for one occasional use case: migrating from Linux to Windows. There might be others, e.g. if the database gets very broken. The idea is to reattach source file blocks rather than re-uploading them.
This pull request has been mentioned on Duplicati. There might be relevant details there: https://forum.duplicati.com/t/how-to-reuse-remote-data-when-changing-os/16717/4
This pull request has been mentioned on Duplicati. There might be relevant details there: https://forum.duplicati.com/t/database-recreation-not-really-starting/16948/87
This pull request has been mentioned on Duplicati. There might be relevant details there: https://forum.duplicati.com/t/how-to-fix-missing-volumes/17377/9
Closes #4693
This fixes multiple issues with deleted / duplicated blocks:
Steps to reproduce
Error with compact (comment)
Error with recreate (comment)
Duplicate blocks (reproduced with `--no-auto-compact` and `--no-encrypt`):

Expected result:
Backup 3 should not upload a block for A.txt (hash `VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=`), as it is already contained in backup 1.

Actual result:
The dblock file for backup 3 contains a block with hash `VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=`, as does the dblock file for backup 1.

TODO:
Performance impact
Check the performance impact of reusing existing deleted blocks. If necessary, add an index over (Hash, Size) to the DeletedBlock table.
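If the index turns out to be necessary, it would presumably be a plain two-column index; the name below is illustrative:

```sql
-- Speeds up the (Hash, Size) lookup used when checking whether an
-- incoming block can reuse an already-deleted one.
CREATE INDEX IF NOT EXISTS "DeletedBlockHashSize"
ON "DeletedBlock" ("Hash", "Size");
```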
A test of the before/after performance on a large backup with deleted blocks would be appreciated.
The worst-case performance could be tested as follows:
Validate database recreate change
The change in database recreate works in my tests, but I am not completely sure that the query catches all usages of blocks (I initially missed the BlocklistHash table, for example). It is possible that some blocks are moved to DeletedBlock incorrectly.
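A sanity check along these lines could help validate that (same schema assumptions as in the sketches above; any returned row would be a block marked deleted while still in use):

```sql
-- Blocks in DeletedBlock whose hash is still used as a blocklist hash,
-- or whose (Hash, Size) is still referenced through a blockset entry.
SELECT "Hash", "Size"
FROM "DeletedBlock"
WHERE "Hash" IN (SELECT "Hash" FROM "BlocklistHash")
   OR EXISTS (
        SELECT 1
        FROM "BlocksetEntry"
        INNER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID"
        WHERE "Block"."Hash" = "DeletedBlock"."Hash"
          AND "Block"."Size" = "DeletedBlock"."Size");
```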