add PAG support #7944
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@asomoza can you test it out?
cc @HyoungwonCho for awareness
@yiyixuxu @asomoza Hello, I was impressed by the various experiments you conducted using PAG! Since the guidance framework of PAG itself is simple, it seems quite possible to use it in conjunction with other modules like the IP-Adapter you mentioned. However, we have not yet implemented and experimented with it directly, so we have not confirmed whether there is a significant performance improvement when used together. If possible, we will conduct additional experiments in the future. Thank you for your interest in our research.
Thank you for the great work! I ran into the following error:
  File ".../.env/lib/python3.11/site-packages/diffusers/models/controlnet.py", line 798, in forward
    sample = sample + controlnet_cond
             ~~~~~~~^~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
I solved it by adding a new parameter:
if do_classifier_free_guidance and do_perturbed_attention_guidance and not guess_mode:
    image = torch.cat([image] * 3)
elif do_classifier_free_guidance and not guess_mode:
    image = torch.cat([image] * 2)
elif do_perturbed_attention_guidance and not guess_mode:
    image = torch.cat([image] * 2)
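The branching in the fix above can be read as "one copy of the ControlNet conditioning image per branch of the batch". The sketch below restates that rule in plain Python so the arithmetic is easy to check. The helper names `control_image_copies` and `duplicate_condition` are hypothetical and not part of diffusers, and a Python list stands in for the tensor batch.

```python
def control_image_copies(do_classifier_free_guidance,
                         do_perturbed_attention_guidance,
                         guess_mode=False):
    """Number of copies of the conditioning image the batch needs.

    Hypothetical helper that mirrors the if/elif chain in the comment above.
    """
    if guess_mode:
        # In guess mode the snippet above never duplicates the image.
        return 1
    # One conditional branch, plus one extra branch per enabled guidance method.
    return 1 + int(do_classifier_free_guidance) + int(do_perturbed_attention_guidance)


def duplicate_condition(image_batch, copies):
    """Repeat a list-based batch, analogous to torch.cat([image] * copies)."""
    return image_batch * copies


# CFG + PAG gives a latent batch of 3, so the conditioning image needs 3 copies;
# duplicating it only twice reproduces the "3 vs 2" mismatch in the traceback.
copies = control_image_copies(True, True)
batch = duplicate_condition(["control_image"], copies)
```

With both guidance methods enabled the helper returns 3, which matches the first branch of the fix.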
@KKIEEK
Just leaving a brief report of my findings with PAG and Diffusers (I already had it integrated in my pipelines before this PR):
@jorgemcgomes thanks!
Hello. I'm an author of PAG. Thank you for your insightful opinions and cool implementation. Is there anything currently in progress?
We are excited to see that PAG is gaining popularity within the community and being utilized in various workflows. Especially in ComfyUI, PAG nodes are used in diverse workflows. (Some workflows using PAG in ComfyUI:
However, in Diffusers, it seems somewhat challenging to try creative combinations as the pipelines are separated. Therefore, the MixIn approach taken in this PR appears to be a very effective solution. However, it seems a bit awkward to call
Additionally, since there are many users who want compatibility with IP-Adapter, now I have time and would like to work on making it compatible with IP-Adapter. I'm curious if there's any related progress about component design or IP-Adapter compatibility. Thank you!
@sunovivid thanks for the message! for IP-adapter, it will be super cool if we can make it work! I'm not aware of any related progress so would really appreciate if you are able to find time to work on this! maybe we can just pick one of the pipelines from this PR (with the mixin) and make it work with
@sunovivid we will merge in and work on a new design for PAG once you upload the new change for ip-adapter :) for
Hi @yiyixuxu, Thank you for the feedback! I might have misunderstood something. Should I upload the new changes for the ip-adapter in this PR? How can I upload the changes? Should I attach files or use another approach? for
* fix compatability issue between PAG and IP-adapter
* fix compatibility issue between PAG and IP-adapter plus
@@ -508,6 +508,9 @@ def encode_image(self, image, device, num_images_per_prompt, output_hidden_state
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        image_embeds = []
I refactored this method a little bit; this test runs:
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

# Load SDXL and attach three IP-Adapters that share one image encoder.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter_sdxl_vit-h.safetensors",
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
    ],
    image_encoder_folder="models/image_encoder",
)
pipeline.set_ip_adapter_scale([0.1, 0.7, 0.3])
pipeline.to("cuda")

# One face image plus ten style images; the style images go to the second adapter as a list.
face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

prompt = "wonderwoman"
num_images_per_prompt = 1
guidance_scale = 7.5
do_classifier_free_guidance = guidance_scale > 1

# Run 1: pass the raw images and let the pipeline encode them.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipeline(
    prompt=prompt,
    ip_adapter_image=[face_image, style_images, face_image],
    negative_prompt="",
    guidance_scale=guidance_scale,
    num_images_per_prompt=num_images_per_prompt,
    generator=generator,
).images[0]
image.save("yiyi_test_12_out_imgs.png")

# Run 2: precompute the embeddings with the refactored method and pass them
# instead of the raw images, using the same seed for comparison with run 1.
with torch.no_grad():
    image_embeds = pipeline.prepare_ip_adapter_image_embeds(
        [face_image, style_images, face_image],
        None,
        "cuda",
        num_images_per_prompt,
        do_classifier_free_guidance,
    )

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipeline(
    prompt=prompt,
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="",
    guidance_scale=guidance_scale,
    num_images_per_prompt=num_images_per_prompt,
    generator=generator,
).images[0]
image.save("yiyi_test_12_out_img_embeds.png")
@HyoungwonCho @sunovivid
cc @apolinario and @vladmandic we plan to support more popular features like PAG in diffusers, so design-wise, this PR sets the example for future PRs. Would appreciate your input too :)
thanks @yiyixuxu
from a quick glance, new "magic" is mostly in
PAG itself is still a separate pipeline and can be used as a separate pipeline, it's just that autopipeline will do automatic switching if
i'm ok with that. one potential issue is propagation of future fixes - e.g. if a fix is created somewhere in StableDiffusionPipeline and autopipeline does a behind-the-scenes switch to StableDiffusionPAGPipeline, then we really need to ensure there are no regressions there, since the user is not even explicitly aware of that switch.
just not sure about the mappings using string replace - ok for PAG, but would this pattern apply universally?
Thanks for your hard work! In my opinion, it looks good.
One minor concern, similar to @vladmandic's opinion, is that propagating future changes and updates might be tedious work. It might be better to work like IP-Adapter, which is fully merged into the original pipeline. However, I also totally agree with your opinion that we should keep the codebase as compact as possible, since it is already very complex and supports many papers. Compared to IP-Adapter, which is a relatively simple add-on, supporting PAG requires a batch size of 3, which breaks the common presumption of using a batch size of 2 for CFG. So this is a tradeoff, and I support both opinions from the diffusers team.
A minor suggestion: in line 185 of
Thank you again for your hard work.
                )
            else:
                noise_pred_uncond, noise_pred_perturb = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + pag_scale * (noise_pred_uncond - noise_pred_perturb)
I think the variable noise_pred_uncond should be renamed to noise_pred_text because it is actually conditional. 😊 This was written this way in our original implementation, so sorry for the confusion.
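To make the suggested rename concrete, here is a minimal sketch of the guidance arithmetic with the variable called `noise_pred_text`. The PAG-only formula is the one in the diff above. The combined CFG + PAG formula is an assumption based on the batch-of-3 layout discussed in this thread (unconditional, text-conditional, perturbed), and the function names are hypothetical. Plain floats are used so it runs without torch; the same expressions apply elementwise to tensors.

```python
def apply_pag(noise_pred_text, noise_pred_perturb, pag_scale):
    """PAG without CFG: push the conditional prediction away from the perturbed one."""
    return noise_pred_text + pag_scale * (noise_pred_text - noise_pred_perturb)


def apply_cfg_and_pag(noise_pred_uncond, noise_pred_text, noise_pred_perturb,
                      guidance_scale, pag_scale):
    """CFG and PAG together (assumed form): both guidance terms are added."""
    return (
        noise_pred_uncond
        + guidance_scale * (noise_pred_text - noise_pred_uncond)
        + pag_scale * (noise_pred_text - noise_pred_perturb)
    )


# PAG only: the batch holds two predictions (text-conditional, perturbed).
pag_only = apply_pag(noise_pred_text=1.0, noise_pred_perturb=0.5, pag_scale=3.0)

# CFG + PAG: the batch holds three predictions.
combined = apply_cfg_and_pag(
    noise_pred_uncond=0.0, noise_pred_text=1.0, noise_pred_perturb=0.5,
    guidance_scale=7.5, pag_scale=3.0,
)
```

Writing it this way shows why `noise_pred_uncond` is a misleading name in the PAG-only branch: no unconditional prediction is computed there at all.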
@yiyixuxu Hello, the implementation of PAG seems flawless! Aside from the
I also share a similar opinion regarding the integration of the PAG pipeline with the basic Stable Diffusion pipeline. Since PAG can be widely used for sampling under various conditions and can be easily toggled on/off, it seems it would be useful if merged into the basic pipeline. However, when PAG and CFG are used together, the input batch size changes from the usual situation, which could make the implementation of additional papers and elements relatively more complex. As @sunovivid mentioned, it seems we need to balance the convenience of using PAG by adding it to the basic pipeline with the simplicity of the code. I will endorse the decision of the diffusers administrators on this matter.
Notes on implementation

separate pipeline class
Created a separate pipeline group for PAG so that we are able to support it (and many more such features in the future) while keeping our SD and SDXL pipelines lightweight for the research community.

PAGMixin
PAGMixin abstracts away all PAG-related logic so that we are able to keep the PAG pipeline structure consistent with the rest of the pipelines. It makes the pipelines easier to read, and also easier to integrate and maintain.

AutoPipeline API
Pass enable_pag=True to automatically create a pipeline with PAG enabled, based on the task you specified and the checkpoint you provided. Under the hood, it creates the corresponding PAG pipeline. The from_pipe API also works, and works intuitively (I hope).

pag_applied_layers
Set pag_applied_layers when you create the pipeline, or use set_pag_applied_layers to update these layers after the pipeline has been created. The argument to set_pag_applied_layers is either a single string or a list of strings. You can specify:
* a block type: "down", "mid", "up"
* a specific block: "down.block_0", "up.block_1"
* a specific attention layer: "down.block_0.attentions_0"
other notes:
I refactored prepare_ip_adapter_image_embeds a little bit so that we duplicate inputs for CFG only once at the end; that's why a lot of the files got changed. You only need to look at the pag folder and the auto_pipeline.py file under the pipelines folder when reviewing this PR.

Usage Examples
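As an illustration of the pag_applied_layers specifiers from the notes above (block type, specific block, specific attention layer), the sketch below shows one way such strings could be matched against attention module names. It is not the matching code from this PR: the helpers `parse_spec`, `parse_module_name` and `is_selected` are hypothetical, and the module-name layout (`down_blocks.0.attentions.1...`, `mid_block.attentions.0...`) is an assumption about how the UNet names its submodules.

```python
import re


def parse_spec(spec):
    """Parse "down", "down.block_0" or "down.block_0.attentions_0" into a tuple.

    Missing parts are returned as None and act as wildcards.
    """
    m = re.fullmatch(r"(down|mid|up)(?:\.block_(\d+))?(?:\.attentions_(\d+))?", spec)
    if m is None:
        raise ValueError(f"unrecognised PAG layer specifier: {spec!r}")
    block_type, block, attn = m.groups()
    return (
        block_type,
        int(block) if block is not None else None,
        int(attn) if attn is not None else None,
    )


def parse_module_name(name):
    """Extract (block_type, block_index, attention_index) from an assumed module name."""
    m = re.match(r"(down|up)_blocks\.(\d+)\.attentions\.(\d+)\.", name)
    if m:
        return m.group(1), int(m.group(2)), int(m.group(3))
    m = re.match(r"mid_block\.attentions\.(\d+)\.", name)
    if m:
        return "mid", 0, int(m.group(1))
    return None


def is_selected(name, specs):
    """True if the module falls under any specifier (a single string or a list)."""
    if isinstance(specs, str):
        specs = [specs]
    location = parse_module_name(name)
    if location is None:
        return False
    return any(
        all(want is None or want == got for want, got in zip(parse_spec(s), location))
        for s in specs
    )


layer = "down_blocks.0.attentions.1.transformer_blocks.0.attn1"
selected_by_type = is_selected(layer, "down")
selected_by_block = is_selected(layer, ["up", "down.block_0"])
selected_by_attention = is_selected(layer, "down.block_0.attentions_0")
```

A coarser specifier selects everything beneath it, so "down" covers the example layer while "down.block_0.attentions_0" does not, because the layer sits in attentions index 1.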
SDXL + PAG
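A minimal sketch of this combination, assuming the `enable_pag` and `pag_applied_layers` arguments described in the notes above and a `pag_scale` call argument (the name that appears in the diff under review). The checkpoint id is the one used in the test script earlier in this thread. The helpers `pag_pipeline_kwargs` and `generate` are hypothetical, and the values `pag_scale=3.0`, `guidance_scale=7.0` and `["mid"]` are arbitrary illustration choices, not recommendations from the PR.

```python
def pag_pipeline_kwargs(layers):
    """Keyword arguments that ask the auto pipeline for its PAG variant."""
    return {"enable_pag": True, "pag_applied_layers": list(layers)}


def generate(prompt, pag_scale=3.0, guidance_scale=7.0, seed=0):
    """Create an SDXL pipeline with PAG enabled and generate one image.

    Heavy imports stay inside the function so that defining it needs no GPU.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipeline = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        **pag_pipeline_kwargs(["mid"]),
    )
    pipeline.to("cuda")

    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipeline(
        prompt=prompt,
        guidance_scale=guidance_scale,  # CFG strength
        pag_scale=pag_scale,            # PAG strength
        generator=generator,
    ).images[0]
```

Calling `generate("an astronaut riding a horse").save("out.png")` on a CUDA machine would exercise the whole path, including the checkpoint download.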
SDXL + PAG + IP-Adapter
works with ip-adapter now thanks to @sunovivid
SDXL Inpainting + PAG
SDXL + ControlNet + PAG