Refactor: Modularize Inpaint Pipeline #6120
base: main
Conversation
IMO an extra VAE encode and some additional support code for inpainting is an acceptable tradeoff for eliminating the possibility of artifacts. I'm imagining a scenario where you inpaint and it's almost perfect, but there are some artifacts. So you inpaint the artifacts, but this creates more artifacts, and so on. We were OK with a whole extra denoising step before - reducing that overhead to a single VAE encode seems very reasonable.
I was hoping it could get to a perfect result, but I think it will have to live as a "compatibility fallback" procedure when no encoded image is provided. The issue is how to avoid the extra encode when we definitely don't need it: it is only necessary when a user is inpainting on canvas with an inpaint-trained model. I would like to avoid what we had before, where every canvas inpaint created the masked image and sent it through an extra VAE encode... and then, the overwhelming majority of the time, it went unused because normal models don't need it. We have a few options for that:
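The gating logic described above can be sketched as a small predicate. This is a hypothetical helper, not the PR's actual API; the names are illustrative:

```python
def needs_masked_latents_encode(is_inpaint_model: bool, masked_latents) -> bool:
    """Gate the extra VAE encode: only pay for it when an inpaint-trained
    model is in use AND the graph did not already supply masked_latents.
    (Hypothetical helper; names are illustrative, not the PR's API.)"""
    return is_inpaint_model and masked_latents is None
```

With this gate, a normal model on canvas never triggers the extra encode, and an inpaint model only triggers it when no pre-encoded masked image was supplied.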
Summary
Keeping this as Draft until tested with more models and settings.
This is a rewrite of the inpaint mask code in the diffusers pipeline. It centralizes all the mask logic into a single object that is called from multiple points, with each call keyed to its location in the denoising process, as a proof of concept for a more elaborate modular extension system for denoising. Baby steps first.
Our previous process had mask-based decisions and code in five locations, across multiple object levels, and the mask was inverted partway through. Dragging it all together was necessary for debugging.
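The shape of that centralized object might look something like the following sketch. The hook names and structure are illustrative assumptions, not the PR's actual class; the real pipeline operates on torch tensors, but NumPy is used here so the example is self-contained:

```python
import numpy as np

class InpaintMaskHandler:
    """Sketch of a single object owning all mask logic, called from several
    points in the denoise loop. Hook names are illustrative; the real
    pipeline operates on torch tensors rather than NumPy arrays."""

    def __init__(self, mask, orig_latents):
        # Convention: mask == 1.0 where we inpaint, 0.0 where the original
        # is kept. Holding one convention in one place avoids the old
        # mid-process mask inversion.
        self.mask = mask
        self.orig_latents = orig_latents

    def init_latents(self, noise):
        # Called once before denoising: noise only the masked region.
        return self.orig_latents * (1.0 - self.mask) + noise * self.mask

    def apply_after_step(self, latents, noised_orig):
        # Called after each scheduler step: re-impose the (appropriately
        # noised) original content outside the mask.
        return noised_orig * (1.0 - self.mask) + latents * self.mask
```

Because every call site goes through the same object, debugging a mask issue means inspecting one class instead of five scattered locations.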
Related Issues / Discussions
This (mostly) fixes the issues with inpaint models that were introduced with the gradient mask changes. If an inpaint model is used and the provided denoise mask lacks a masked_latents field (normally generated by passing the original image and VAE to the Create Denoise Mask node), the pipeline synthetically creates one by filling the mask area with predetermined values based on the model architecture. Doing so avoids a second VAE encode in all inpainting/canvas processes.
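The synthetic fill can be sketched roughly as below. The fill values here are placeholders, not the PR's tuned constants, and NumPy stands in for torch to keep the example self-contained:

```python
import numpy as np

# Illustrative per-architecture fill values; the PR's tuned constants differ.
SYNTHETIC_FILL_VALUE = {"sd-1": 0.0, "sdxl": 0.0}

def make_synthetic_masked_latents(latents, mask, arch="sd-1"):
    """Stand-in for the VAE-encoded masked image: fill the masked region
    (mask == 1.0) with a predetermined constant for the architecture.
    (Sketch only; not the PR's implementation.)"""
    fill = SYNTHETIC_FILL_VALUE[arch]
    return latents * (1.0 - mask) + fill * mask
```

The point of the per-architecture lookup is that different model families expect masked_latents in different ranges, which is presumably why the SD-2 case mentioned below still misbehaves.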
This is not one-to-one with the standard setup. The new method has been tuned to reduce artifacts as much as possible, but there can still be some unusual highlighting and occasionally edges of inpainted objects get chopped off.
The dog on the left was inpainted with a VAE-encoded masked_latents, and the dog on the right was from a synthetic fill. The mask edge of the synthetic dog has lost part of the hindquarters and added an errant sparkle between the bench slats.
Oddly, pure synthetic fill yielded a significantly higher success rate for inpainting objects from the prompt, nearing 100% before adjusting for artifacts. With the fixes in place to avoid edge artifacts, the synthetic success rate is much closer to VAE, but still slightly higher: at the current settings, VAE ignores the prompt and just fills in the bench a bit more often than synthetic does.
Code is included for synthetic fill of SD-2 models, but it does not work at this time; SD-2 appears to expect masked_latents in a different form.
QA Instructions
Since this affects all of the inpainting pipelines, we need to test a variety of settings on all model architectures to make sure there is nothing strange or unexpected happening.
Checklist