Stable Cascade fix INT8 compression with NNCF #7987
Conversation
Thanks for this contribution and for bringing NNCF compression to our attention. I think it's a bit of an anti-pattern to our library's philosophy that we're changing the data type under the hood to something that wasn't requested by the user in the first place. Instead, can't we type-cast the modules explicitly after the pipeline is loaded and then do the changes you mentioned in the PR description? In that case, we can just have a very nice doc about it and users will be able to follow it. @yiyixuxu WDYT?
Dug into the NNCF code a bit and found a way to get the original dtype. This returns the original dtype and seems to work fine with both the Full Prior and the Lite Prior:

```python
self.prior.down_blocks[0][0].channelwise[0].pre_ops["0"].scale.dtype
```

Note: I am using NNCF 2.7.0 because 2.8.0 and newer require example inputs with the PyTorch backend.
Yeah, agree with @sayakpaul.
NNCF compression autocasts to the original dtype when running the model, so any dtype change we make here doesn't affect the compressed weights.
Thanks for explaining, but it really is quite antithetical to the library design. If there's a way to patch the call in some manner, I think that would still be acceptable to put in the docs, but otherwise this really seems difficult.
We can add something like this:

```python
if getattr(self, "_autocast_dtype", None) is not None:
    dtype = self._autocast_dtype
else:
    dtype = next(self.prior.parameters()).dtype
```
Actually, I forgot to check the simplest thing first. This also works:

```python
dtype = self.dtype
```

Edit: but it breaks again when model_cpu_offload is used.
Another workaround that doesn't require a code change on the diffusers side:

```python
import copy

# Back up the module, compress the prior, then restore the uncompressed module.
backup_clip_txt_pooled_mapper = copy.deepcopy(pipe.prior_pipe.prior.clip_txt_pooled_mapper)
pipe.prior_prior = pipe.prior_pipe.prior = nncf_compress_model(pipe.prior_pipe.prior)
pipe.prior_prior.clip_txt_pooled_mapper = pipe.prior_pipe.prior.clip_txt_pooled_mapper = backup_clip_txt_pooled_mapper
```
Some notes about NNCF:
- SDXL and SD 1.5 work out of the box.
- Stable Cascade needs this workaround (#7987 (comment)).
- nncf==2.7.0 is required if you don't want to provide example inputs.
This is how I implemented it in SDNext. I am treating it as if it were compiled because it breaks loading LoRAs, not because it actually is compiled. (It's not compiled.) This function applies NNCF or other functions to known module types: https://github.com/vladmandic/automatic/blob/master/modules/sd_models_compile.py#L30
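A loose sketch of that idea; the attribute names and the top-level function are assumptions for illustration, not the actual SDNext code:

```python
import nncf
import torch

def compress_known_modules(pipe):
    # Compress the weights of known large submodules of a diffusers
    # pipeline to INT8 with NNCF. The attribute list is illustrative.
    for name in ("prior", "unet", "decoder", "text_encoder"):
        module = getattr(pipe, name, None)
        if isinstance(module, torch.nn.Module):
            setattr(pipe, name, nncf.compress_weights(module))
    return pipe
```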
What does this PR do?
Fixes NNCF compression with Stable Cascade.
NNCF compresses model weights to 8 bits and roughly halves the model footprint.
The full Stable Cascade model can run on 6-8 GB GPUs using NNCF compression and model CPU offload.
Fixed Issues:
NNCF compression stores weights as uint8, and the Stable Cascade prior pipeline tries to infer its dtype from the model weights.
This makes the pipeline fail to run, since most operations don't support byte types.
This PR checks for uint8 and int8 weights and uses BF16 or FP32 instead, depending on GPU support.
Notes
CUDA uses torch.cuda.is_bf16_supported() to detect BF16 support.
IPEX (Intel ARC) only checks the device, since every XPU device supports BF16.
I don't know if there is a way to get the original dtype used before the NNCF compression step.
The current method I implemented ignores the user-provided torch dtype. A sketch of this selection logic follows below.
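A minimal sketch of that selection logic, assuming NNCF-compressed weights show up as uint8/int8 (this illustrates the described behavior; it is not the exact PR diff):

```python
import torch

def infer_compute_dtype(module: torch.nn.Module) -> torch.dtype:
    # Keep the weight dtype unless the weights are NNCF-compressed
    # ((u)int8); in that case fall back to BF16 where supported, else FP32.
    weight_dtype = next(module.parameters()).dtype
    if weight_dtype not in (torch.uint8, torch.int8):
        return weight_dtype
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    # Every Intel XPU (Arc) device supports BF16, so a device check suffices.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.bfloat16
    return torch.float32
```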
Example use of NNCF compression
BF16
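A minimal sketch of such an example, assuming the public stabilityai/stable-cascade-prior checkpoint and NNCF's compress_weights entry point with nncf==2.7.0 (the exact snippet used for this PR may differ):

```python
import torch
import nncf
from diffusers import StableCascadePriorPipeline

# Load the prior in BF16, then compress its weights to INT8 with NNCF.
pipe = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
)
pipe.prior = nncf.compress_weights(pipe.prior)
pipe.enable_model_cpu_offload()

prior_output = pipe(prompt="an image of a shiba inu", num_inference_steps=20)
```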
UINT8
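A sketch of inspecting the compressed weights; the module path comes from the comment earlier in this thread and should be treated as illustrative:

```python
# After nncf.compress_weights, the stored weights are 8-bit, while the
# NNCF pre-ops keep scales in the original dtype used for autocasting.
block = pipe.prior.down_blocks[0][0].channelwise[0]
print(block.weight.dtype)              # torch.uint8 after INT8 compression
print(block.pre_ops["0"].scale.dtype)  # original dtype, e.g. torch.bfloat16
```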
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.