Fix to #3465. Prevent resaving of duplicate images if overwrite not specified #3472
+25
−6
This is a fix to #3465 for the LoadImage node. The behavior also generalizes to preventing duplication of existing files.
Adds a function, compare_image_hash, that does a sha256 hash comparison between an uploaded image and existing images with matching filenames.
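As a rough illustration of that kind of comparison, here is a minimal sketch that hashes an existing file and an uploaded payload and compares the digests. The names `sha256_of_bytes`, `sha256_of_file`, and `is_same_file`, and the assumption that the upload is available as `bytes`, are hypothetical and are not the signature of the PR's `compare_image_hash`.

```python
import hashlib
import os


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of an in-memory payload."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_same_file(existing_path: str, uploaded: bytes) -> bool:
    """Hypothetical helper: True only if existing_path exists and its
    contents have the same SHA-256 digest as the uploaded payload."""
    if not os.path.isfile(existing_path):
        return False
    return sha256_of_file(existing_path) == sha256_of_bytes(uploaded)
```

Comparing digests rather than filenames is what lets two different images that are both named cat.jpg be told apart.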
This changes the behavior so that an image is saved to input with an incremented filename only when it shares a filename with an existing image but is actually a different image. An exact duplicate of an existing image is now opened instead of being saved again.
Currently, exact duplicates with the same filename are resaved with an incremented filename in the format:
filename (i).ext
with the code:
This commit changes this to include a sha256 hash comparison:
A check that image_is_duplicate is False is done before saving the file.
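The flow described above might look roughly like the following sketch. It is not the PR's code: `save_upload` and its parameters are hypothetical, and checking every existing candidate name (not only the first) is an assumption made for this illustration.

```python
import hashlib
import os


def save_upload(folder: str, filename: str, data: bytes, overwrite: bool = False) -> str:
    """Hypothetical helper: store data under folder and return the filename used.

    If a file with a candidate name already exists and is byte-identical
    (same SHA-256 digest), that file is reused instead of writing a new copy.
    """
    stem, ext = os.path.splitext(filename)
    new_digest = hashlib.sha256(data).digest()
    candidate = filename
    i = 1
    # With overwrite=True the loop is skipped and the original name is written.
    while not overwrite and os.path.exists(os.path.join(folder, candidate)):
        with open(os.path.join(folder, candidate), "rb") as f:
            image_is_duplicate = hashlib.sha256(f.read()).digest() == new_digest
        if image_is_duplicate:
            return candidate  # identical content already stored: do not save again
        # Same name but different content: try the next "name (i).ext"
        candidate = f"{stem} ({i}){ext}"
        i += 1
    with open(os.path.join(folder, candidate), "wb") as f:
        f.write(data)
    return candidate
```

In this sketch, saving the same bytes as cat.jpg twice returns cat.jpg both times and leaves a single file on disk, while different bytes under the same name are written as cat (1).jpg.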
Currently, if you load the same image of a cat named cat.jpg into the LoadImage node 3 times, you will get 3 new files in your input folder with incremented file names.
With this change, you will have only a single copy of cat.jpg, which is re-opened instead of being re-saved with an incremented filename.
However, if you load 3 different images of cats all named cat.jpg, you will get the expected behavior of having:
cat.jpg
cat (1).jpg
cat (2).jpg
This saves space and reduces clutter. After checking my own input folder, I found 800+ images that are duplicates resaved with incremented filenames, amounting to more than 5GB of duplicated data.
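For anyone who wants to run a similar check on their own input folder, here is a minimal sketch that groups files by SHA-256 digest and estimates the space taken by redundant copies. The function names are hypothetical, the scan is non-recursive, and it reads each file fully into memory, which is an assumption that suits typical image sizes.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def find_duplicate_files(folder: str) -> Dict[str, List[str]]:
    """Map each SHA-256 digest to the filenames sharing it, keeping only
    digests that occur more than once (top level of folder only)."""
    by_digest = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        by_digest[digest].append(name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


def wasted_bytes(folder: str, duplicates: Dict[str, List[str]]) -> int:
    """Bytes used by redundant copies: every copy beyond the first in each group."""
    total = 0
    for names in duplicates.values():
        size = os.path.getsize(os.path.join(folder, names[0]))
        total += size * (len(names) - 1)
    return total
```

Calling `find_duplicate_files` on a folder and passing the result to `wasted_bytes` gives a rough figure for how much data is duplicated; nothing is deleted by either function.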