
Some doubts about SublayerConnection #100

Open
watersounds opened this issue Sep 19, 2022 · 5 comments

Comments

@watersounds commented Sep 19, 2022

According to what you wrote:
“That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. We apply dropout (cite) to the output of each sub-layer, before it is added to the sub-layer input and normalized.”
I think the return value should be self.norm(x + self.dropout(sublayer(x))) rather than x + self.dropout(sublayer(self.norm(x))).
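For reference, here is a minimal sketch of the post-norm ordering described in the quoted passage, i.e. LayerNorm(x + Dropout(Sublayer(x))). It is illustrative only: it uses torch.nn.LayerNorm rather than the notebook's own LayerNorm class, and the class name and the feed-forward usage example are made up here, not taken from the notebook.

```python
import torch
import torch.nn as nn


class PostNormSublayerConnection(nn.Module):
    """Post-norm residual block: LayerNorm(x + Dropout(Sublayer(x)))."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Apply the sub-layer, dropout its output, add the residual,
        # then normalize the sum -- the order given in the quoted text.
        return self.norm(x + self.dropout(sublayer(x)))


# Usage with a position-wise feed-forward sub-layer (sizes are arbitrary).
d_model = 512
ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
block = PostNormSublayerConnection(d_model, dropout=0.1)
out = block(torch.randn(2, 10, d_model), ffn)  # -> shape (2, 10, 512)
```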

Look forward to your reply.

@StellaAthena

Where do we write x + self.dropout(sublayer(self.norm(x)))? That's not what the passage you quote says.

@Bruising6802 commented Sep 24, 2022

In the_annotated_transformer.py, on line 357. The function's docstring even says that the norm was moved.
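For anyone without the file open, the class in question amounts to something like the sketch below. Only the forward line is the one being discussed in this thread; the class name, the use of nn.LayerNorm in place of the notebook's own LayerNorm, and the __init__ details are stand-ins.

```python
import torch.nn as nn


class PreNormSublayerConnection(nn.Module):
    """Pre-norm residual block, mirroring the notebook's SublayerConnection:
    x + Dropout(Sublayer(LayerNorm(x))).
    The notebook's own docstring notes that the norm was moved to the front.
    """

    def __init__(self, size, dropout):
        super().__init__()
        # nn.LayerNorm stands in for the notebook's hand-written LayerNorm module.
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Pre-norm: normalize first, then apply the sub-layer, dropout, and the residual add.
        return x + self.dropout(sublayer(self.norm(x)))
```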

@lvXiangwei

I have the same question. The explanation can be found in #92.

@Bruising6802

Maybe it's best to mention this issue in the notebook, since it confuses many readers.

@watersounds (Author) commented Sep 26, 2022

> Where do we write x + self.dropout(sublayer(self.norm(x)))? That's not what the passage you quote says.

return x + self.dropout(sublayer(self.norm(x)))
