
feat: adding GeLU algorithm in layers #17

Open
wants to merge 4 commits into main

Conversation

kayabaakihiko13

GeLU

The Gaussian Error Linear Unit (GeLU) stands out as a robust and high-performing neural network activation function. It draws inspiration from an amalgamation of characteristics found in dropout, zoneout, and Rectified Linear Unit (ReLU) functions. GeLU introduces a smooth and continuous non-linearity, which effectively addresses the vanishing gradient problem often encountered in deep neural networks.
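
For reference, the cited paper defines GeLU through the standard Gaussian CDF Φ and gives a tanh-based approximation (in LaTeX):

\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)
\approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)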

Reference

Gaussian Error Linear Units (GELUs) (Hendrycks & Gimpel, 2016)

@eduardoleao052
Owner

Amazing addition! I'll review and test it, then merge it straight away. Thanks for the contribution!

@eduardoleao052
Owner

Hey, the GeLU was resulting in some NaNs, so I went to the paper you cited and I have a question: wouldn't this be closer to what it describes?

forward(z: Tensor): Tensor {
    // Forward pass for the GeLU nonlinearity
    const erf_coeff = 2 / Math.PI;
    const exp_term = z.mul(z).mul(-1).exp();
    const erf_term = exp_term.mul(erf_coeff).sqrt().add(1);
    const result = z.mul(0.5).mul(erf_term);
    return result;
}

In your code, you added z to erf_term inside of the last parenthesis. What is your opinion?
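
For comparison, here is a minimal scalar sketch of the paper's tanh approximation, written against plain Math functions rather than this repo's Tensor API (the name geluApprox is only for illustration):

function geluApprox(x: number): number {
    // 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))), per the cited paper
    const c = Math.sqrt(2 / Math.PI);
    return 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3)));
}

Since tanh saturates, this form stays finite for large inputs, which may be relevant to the NaNs mentioned above.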

@kayabaakihiko13
Author

> Hey, the GeLU was resulting in some NaNs, so I went to the paper you cited and I have a question: wouldn't this be closer to what it describes?

However, the explanation regarding GeLU is similar to ReLU, with the distinction that GeLU permits the output to include small negative values when the input is less than zero.

For erf_term, I think we should calculate erf_term first, and after that multiply it by z and 0.5.

@medic-code
Contributor

medic-code commented Apr 3, 2024

Just a quick thought, but is it worth adding some unit tests and considering a simple integration test? (You allude to this with the NaNs you got.)

Something to think about more generally for algorithm additions going forward.
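
As a rough sketch of the kind of unit test meant here (Jest-style; geluApprox is a local scalar reference using the paper's tanh approximation, not the repo's actual layer API):

const geluApprox = (x: number): number =>
    0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));

test("GeLU reference values behave as expected", () => {
    expect(geluApprox(0)).toBeCloseTo(0, 5);
    expect(geluApprox(1)).toBeCloseTo(0.8412, 3);
    expect(geluApprox(-1)).toBeCloseTo(-0.1588, 3);
    // large-magnitude inputs should stay finite rather than producing NaN
    expect(Number.isNaN(geluApprox(-50))).toBe(false);
});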

@eduardoleao052
Owner

@medic-code I tested it by adding GeLUs to the integration test, but it does not seem to be working.

@medic-code
Contributor

medic-code commented Apr 5, 2024

Sounds like some unit tests are needed at least to interrogate it, though not necessarily pushing all of them to the develop branch. I'm more in favour of keeping the number of unit tests small where possible (just a preference).

@medic-code
Contributor

medic-code commented Apr 6, 2024

@kayabaakihiko13 could you perhaps add some commits to your PR that test your feature? If you need support with this, just reach out.

Also, is there any chance you can narrow down the any types in the TypeScript you've committed? If you need support with this, we've been working on migrating the other parts of the code to TS, which might be instructive.

@medic-code
Contributor

medic-code commented Apr 9, 2024

@eduardoleao052 just a quick question, but why do we multiply the GeLU-applied tensor by the input tensor? Note that I see we do this for ReLU too.

I've just been getting my head around this feature and layers in general; isn't it the case that we apply GeLU to a tensor and return that modified tensor?

@eduardoleao052
Owner

@medic-code
In the ReLU, what we're doing is creating a mask tensor, containing 0 where the input is negative, and 1 where the input is positive. Then we're multiplying it by the input tensor.

We could just create an output tensor by directly setting the negative entries of the input to zero, but that would mess with the tensor's gradients. By multiplying by a mask instead, the backward pass simply multiplies by that same mask.
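
As a rough illustration of the mask idea on plain number arrays (hypothetical helper names, not the library's actual Tensor methods):

function reluViaMask(input: number[]): { output: number[]; mask: number[] } {
    // mask holds 1 where the input is positive and 0 elsewhere
    const mask = input.map((v) => (v > 0 ? 1 : 0));
    // forward pass: output = input * mask (elementwise)
    const output = input.map((v, i) => v * mask[i]);
    return { output, mask };
}

// backward pass: the same elementwise multiplication routes the gradients,
// i.e. dL/d(input) = dL/d(output) * mask
function reluBackward(gradOutput: number[], mask: number[]): number[] {
    return gradOutput.map((g, i) => g * mask[i]);
}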

@medic-code
Contributor

medic-code commented Apr 10, 2024

I suppose my question is: why do we multiply by the input tensor? Is it not that we apply ReLU and pass the modified tensor to the next layer in the forward pass? I apologise, I'm not an ML expert, so I'm just curious.

https://www.cs.cmu.edu/~./15780/notes/pytorch.html - the simple two-layer network's forward method there seems to suggest what I'm saying.

class ReLU(Module):
    def forward(self, X):
        return torch.maximum(X, torch.tensor(0.))
        
class TwoLayerNN(Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.linear1 = Linear(in_dim, hidden_dim)
        self.linear2 = Linear(hidden_dim, out_dim, init_factor=1.0)
        self.relu = ReLU()

    def forward(self, X):
        return self.linear2(self.relu(self.linear1(X)))

@eduardoleao052
Owner

That implementation is correct; however, it requires a differentiable torch.maximum(a, b) operation (with a backward pass) that compares the input tensor a with a zeros tensor b, returning the elementwise maximum and thus applying ReLU.

In my implementation, I also multiply by a tensor b, which simply has zeros where the input tensor must become zero. I multiply it by the input tensor to make the operation differentiable. This way, when the gradients flow back from the output tensor, they reach the input tensor through a simple multiplication.
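
A quick sanity check, again with plain arrays and illustrative names, that the two formulations agree on the forward pass; the difference is only in how the backward pass falls out:

function reluViaMax(input: number[]): number[] {
    // differentiable-maximum formulation, analogous to torch.maximum(X, 0)
    return input.map((v) => Math.max(v, 0));
}

function reluViaMaskMultiply(input: number[]): number[] {
    // mask formulation: multiply the input by a 0/1 mask
    return input.map((v) => v * (v > 0 ? 1 : 0));
}

const sample = [-2, -0.5, 0, 1.5, 3];
const a = reluViaMax(sample);
const b = reluViaMaskMultiply(sample);
console.log(a.every((v, i) => v === b[i])); // true (note -0 === 0, so the sign of zero does not matter)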
