
Fix to #3465. Prevent resaving of duplicate images if overwrite not specified #3472

Open
wants to merge 2 commits into master

Conversation

@shawnington (Contributor) commented May 13, 2024

This is a fix to #3465 for the LoadImage node. The behavior also generalizes to preventing duplication of existing files.

Adds a function, compare_image_hash, that does a SHA-256 hash comparison between an uploaded image and any existing images with matching filenames.

This changes the behavior so that an image with the same filename is only saved to input with an incremented filename when it is actually a different image; exact duplicates are opened instead of resaved.
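The PR description does not reproduce the helper's body, so the following is only a rough sketch of what such a comparison could look like; the parameter names filepath and image, and the use of a file-like image.file attribute, are assumptions based on the snippets below and may differ from the actual implementation:

```
import hashlib

def compare_image_hash(filepath, image):
    # Sketch only: hash the existing file on disk in chunks so large
    # images don't have to fit in memory all at once.
    disk_hash = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            disk_hash.update(chunk)

    # Assumption: the uploaded image exposes a file-like object; rewind it
    # afterwards so it can still be written out if it is not a duplicate.
    upload_hash = hashlib.sha256(image.file.read())
    image.file.seek(0)

    return disk_hash.hexdigest() == upload_hash.hexdigest()
```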

Currently, exact duplicates with the same filename are resaved with an incremented filename in the format:

filename (i).ext

with the code:

```
while os.path.exists(filepath):
    filename = f"{split[0]} ({i}){split[1]}"
    filepath = os.path.join(full_output_folder, filename)
    i += 1
```

This commit changes the loop to include a SHA-256 hash comparison:

```
while os.path.exists(filepath):
    if compare_image_hash(filepath, image):
        image_is_duplicate = True
        break
    filename = f"{split[0]} ({i}){split[1]}"
    filepath = os.path.join(full_output_folder, filename)
    i += 1
```

A check that image_is_duplicate is False is then done before saving the file.
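Putting the pieces together, a minimal sketch of the surrounding upload flow might look like the following; the setup of split and i, and the final write via image.file, are assumptions for illustration rather than the exact code in the PR:

```
import os

image_is_duplicate = False
split = os.path.splitext(filename)   # e.g. ("cat", ".jpg")
filepath = os.path.join(full_output_folder, filename)
i = 1

while os.path.exists(filepath):
    if compare_image_hash(filepath, image):
        image_is_duplicate = True
        break
    filename = f"{split[0]} ({i}){split[1]}"
    filepath = os.path.join(full_output_folder, filename)
    i += 1

# Only write the upload to disk if no byte-identical copy was found;
# otherwise the existing filepath is reused as-is.
if not image_is_duplicate:
    with open(filepath, "wb") as f:
        f.write(image.file.read())
```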

Currently, if you load the same image of a cat named cat.jpg into the LoadImage node 3 times, you will get 3 new files in your input folder with incremented file names.

With this change, you will have only the single copy of cat.jpg, which will be re-opened instead of re-saved with an incremented name.

However, if you load 3 different images of cats, all named cat.jpg, you will get the expected behavior of having:
cat.jpg
cat (1).jpg
cat (2).jpg

This saves space and reduces clutter. After checking my own input folder, I have 800+ duplicate images that were resaved with incremented filenames, amounting to more than 5 GB of duplicated data.
