This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Why is there no square root on area_temperature? #1900

Open
jiminbot20 opened this issue Nov 4, 2021 · 0 comments

Comments

@jiminbot20

logits = logits / area_temperature

In typical dot-product attention, the logits (the input matrix to the softmax) are supposed to be divided by the square root of the temperature, as in the equation below.
[image: dot-product attention equation with the square-root scaling]

However, in this code the logits are simply divided by the temperature, with no square root. Is this correct? If it is, could you explain why the square root was left out?
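
To make the difference concrete, here is a minimal NumPy sketch. It is not the library's implementation: the helper names, the tensor shapes, and the value `area_temperature = 4.0` are all assumptions chosen for illustration. It compares softmax weights when the logits are divided by the temperature directly versus by its square root.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_weights(q, k, scale):
    # Dot-product logits divided by `scale`, then softmax over the keys.
    logits = q @ k.T
    return softmax(logits / scale)


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))  # 4 queries, depth 8 (assumed shapes)
k = rng.standard_normal((6, 8))  # 6 keys, same depth

area_temperature = 4.0  # hypothetical value, only for this comparison

# What the quoted line does: divide by the temperature itself.
w_plain = attention_weights(q, k, area_temperature)

# What the question expects: divide by the square root of the temperature.
w_sqrt = attention_weights(q, k, np.sqrt(area_temperature))

print("largest weight per query, / T      :", w_plain.max(axis=1))
print("largest weight per query, / sqrt(T):", w_sqrt.max(axis=1))
```

For a temperature above 1, dividing by the temperature directly gives a flatter distribution than dividing by its square root. Dividing by `T` is also the same as dividing by `sqrt(T**2)`, so if the temperature is a free hyperparameter the two forms differ only in how it is parameterized. Whether that is the intent of the code is exactly what I am asking.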
