I searched the entire repo and code for "Text2Video-Zero". The video weights have still not been released, and I don't see any code related to text-to-video yet. The dev said in another comment that it's just for comic generation for now. Not sure where you are seeing this?
Thank you for your attention. Both Consistent Self-Attention and Cross-Frame Attention make use of the key and value from self-attention, a mechanism also used in Imagen. However, the subjects and purposes of their self-attention operations differ. Cross-Frame Attention is applied to video generation models and uses the first frame as a reference image, while Consistent Self-Attention is built on image generation models: it samples tokens from the other character images in a batch so that character features can interact across images, thereby ensuring character consistency. We will update our paper to make this distinction clearer to readers.
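To make the described token-sampling idea concrete, here is a minimal NumPy sketch of Consistent Self-Attention as explained above: for each image in a batch of the same character, tokens sampled from the other images are appended to that image's keys and values before attention. The function name, shapes, and `sample_ratio` parameter are illustrative assumptions, not the repo's actual implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistent_self_attention(x, sample_ratio=0.5, rng=None):
    """Sketch of Consistent Self-Attention over a batch of character images.

    x: (B, N, D) self-attention tokens for B images of the same character.
    For each image, tokens sampled from the OTHER images are concatenated
    onto its keys/values, so attention lets character features interact
    across images (queries stay per-image).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    B, N, D = x.shape
    S = int(N * sample_ratio)                             # tokens sampled per image
    out = np.empty_like(x)
    for i in range(B):
        others = x[np.arange(B) != i].reshape(-1, D)      # tokens from other images
        idx = rng.choice(others.shape[0], size=S, replace=False)
        kv = np.concatenate([x[i], others[idx]], axis=0)  # (N + S, D) keys/values
        q = x[i]                                          # (N, D) queries
        attn = softmax(q @ kv.T / np.sqrt(D), axis=-1)    # (N, N + S) weights
        out[i] = attn @ kv
    return out
```

By contrast, Cross-Frame Attention would replace the sampled tokens with the keys/values of a fixed reference frame (the first frame), which is the distinction drawn above.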
They seem somewhat similar; could you please describe the difference between them? Thank you!