feat: adding GeLU algorithm in layers #17
base: main
Conversation
Amazing addition! I'll review and test it, then merge it straight away. Thanks for the contribution!
Hey, the GeLU was resulting in some NaNs, so I went to the paper you cited, and I have a question: wouldn't this be closer to what it describes?

```typescript
forward(z: Tensor): Tensor {
  // Forward pass for the GeLU nonlinearity goes here
  const erf_coeff = 2 / Math.PI;
  const exp_term = z.mul(z).mul(-1).exp();
  const erf_term = exp_term.mul(erf_coeff).sqrt().add(1);
  const result = z.mul(0.5).mul(erf_term);
  return result;
}
```

In your code, you added z to erf_term inside of the last parenthesis. What is your opinion?
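For reference, here is a hedged scalar sketch (plain numbers rather than the project's `Tensor` class) of the exact definition from the cited paper, GELU(z) = 0.5 · z · (1 + erf(z/√2)). Since JavaScript's `Math` has no `erf`, this uses the Abramowitz–Stegun polynomial approximation, which is my choice for illustration, not something from this PR:

```typescript
// erf approximated with Abramowitz-Stegun formula 7.1.26
// (max absolute error around 1.5e-7); JavaScript has no built-in erf.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Exact-form GeLU from the paper: 0.5 * z * (1 + erf(z / sqrt(2)))
function geluExact(z: number): number {
  return 0.5 * z * (1 + erf(z / Math.SQRT2));
}

console.log(geluExact(1));  // about 0.8413, since erf(1/sqrt(2)) is about 0.6827
console.log(geluExact(-1)); // about -0.1587: a small negative output, unlike ReLU
```

Values like these could serve as reference points when checking the Tensor-based version for NaNs.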
However, the explanation of GeLU is similar to ReLU, with the distinction that GeLU permits the output to include small negative values when the input is less than zero. As for erf_term, I think the idea is to calculate erf_term first, and after that multiply it by z and 0.5.
Just a quick thought, but is it worth adding some unit tests and considering a simple integration test? (You allude to it in the fact you got some NaNs.) Something to think about more generally for algorithm additions going forward.
@medic-code I tested it by adding GeLUs in the integration test.
Sounds like some unit tests are needed at least to interrogate it. Not necessarily to push all of them to the develop branch; I'm more in favour of keeping unit tests to smaller numbers where possible (just a preference).
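For what it's worth, here is a sketch of the kind of property-style unit tests being suggested. The `gelu` helper below is a hypothetical scalar stand-in (the tanh approximation from the cited paper), not the project's actual layer API; real tests would call the Tensor-based `forward()` instead:

```typescript
// Hypothetical scalar GeLU (tanh approximation from the cited paper),
// standing in for the layer under test.
function gelu(x: number): number {
  const c = Math.sqrt(2 / Math.PI);
  return 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3)));
}

// Property-style checks that would have caught the NaNs mentioned above.
function testGelu(): void {
  // 1. GeLU(0) = 0
  if (gelu(0) !== 0) throw new Error("gelu(0) must be 0");

  // 2. Output is finite (no NaN/Infinity) over a wide input range
  for (let x = -50; x <= 50; x += 0.5) {
    if (!Number.isFinite(gelu(x))) throw new Error(`non-finite at x=${x}`);
  }

  // 3. For large positive inputs, GeLU behaves like the identity
  if (Math.abs(gelu(10) - 10) > 1e-6) throw new Error("gelu(10) should be ~10");

  // 4. Small negative inputs give small negative outputs (unlike ReLU)
  if (!(gelu(-1) < 0 && gelu(-1) > -0.2)) throw new Error("gelu(-1) out of range");
}

testGelu();
console.log("all GeLU property checks passed");
```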
@kayabaakihiko13 could you perhaps support this by making some commits that add tests for your feature to your PR? If you need support with this, just reach out. Also, is there any chance you can narrow down the `any` types in the TypeScript you've committed? If you need support with this, we've been working on the migration to TS for the other parts of the code, which might be instructive.
@eduardoleao052 just a quick question, but why do we multiply the GeLU-applied tensor by the input tensor? Note I see we do this for ReLU too. I've just been getting my head around this feature and layers in general; isn't it the case that we apply GeLU to a tensor and return that modified tensor?
@medic-code It could just create an output tensor by directly modifying the negative numbers in the tensor to zero, but that would mess with the tensor's gradients. By multiplying by a
I suppose my question is: why do we multiply by the input tensor? Is it not that we apply ReLU and pass the modified tensor to the next layer in the forward pass? I apologise, I'm not an ML expert, so just curious. https://www.cs.cmu.edu/~./15780/notes/pytorch.html - the simple two-layer network's forward method seems to suggest what I'm saying.
That implementation is correct; however, it requires a differentiable (with backward pass) In my implementation, I multiply by a
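To illustrate the mask idea being discussed, here is a hedged scalar/array sketch (plain arrays, not the project's actual `Tensor` code): computing ReLU as the input times a 0/1 mask gives the same forward values as `max(0, x)`, while keeping the operation expressed as an elementwise multiplication whose gradient an autograd engine already knows how to handle.

```typescript
// Two ways to compute ReLU on an array of numbers.

// Direct form: overwrite negatives with zero.
function reluDirect(xs: number[]): number[] {
  return xs.map((x) => Math.max(0, x));
}

// Mask form: multiply the input by a 0/1 mask. In an autograd library,
// this keeps ReLU expressed as an elementwise product, so the backward
// pass falls out of the product rule: the gradient flows through exactly
// where the mask is 1 and is blocked where it is 0.
function reluMask(xs: number[]): number[] {
  const mask = xs.map((x) => (x > 0 ? 1 : 0));
  return xs.map((x, i) => x * mask[i]);
}

const xs = [-2, -0.5, 0, 1.5, 3];
console.log(reluDirect(xs)); // [0, 0, 0, 1.5, 3]
console.log(reluMask(xs));   // same forward values as the direct form
```

The same pattern generalises to GeLU: multiplying the input by a differentiable gate keeps the whole expression inside operations the autograd graph can differentiate.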
GeLU
The Gaussian Error Linear Unit (GeLU) stands out as a robust and high-performing neural network activation function. It draws inspiration from an amalgamation of characteristics found in dropout, zoneout, and Rectified Linear Unit (ReLU) functions. GeLU introduces a smooth and continuous non-linearity, which effectively addresses the vanishing gradient problem often encountered in deep neural networks.
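As a small numerical illustration of the smoothness claim above (a sketch using the tanh approximation of GeLU from the cited paper; the finite-difference helper is my own, not part of this PR): the one-sided slopes of ReLU disagree at 0, while GeLU's slopes from both sides agree at roughly 0.5.

```typescript
// GeLU via the tanh approximation from the paper.
const gelu = (x: number): number =>
  0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));

const relu = (x: number): number => Math.max(0, x);

// One-sided finite-difference slope of f at x, stepping in direction dir.
const h = 1e-6;
const slope = (f: (x: number) => number, x: number, dir: 1 | -1): number =>
  (f(x + dir * h) - f(x)) / (dir * h);

console.log(slope(relu, 0, 1), slope(relu, 0, -1)); // 1 on the right, 0 on the left: a kink
console.log(slope(gelu, 0, 1), slope(gelu, 0, -1)); // both roughly 0.5: smooth at the origin
```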
Reference
Hendrycks, D. and Gimpel, K., "Gaussian Error Linear Units (GELUs)"