Issues w/ softmax procedure #554
Open · Niminem opened this issue Mar 7, 2022 · 11 comments

Niminem (Contributor) commented Mar 7, 2022

Firstly, I had to modify the arraymancer import to exclude the softmax procedure and then import it separately, like so:
import arraymancer except softmax
import arraymancer/nn/activation/softmax

If I don't do this, the only version the compiler resolves is this one:
nnp_softmax.softmax: proc (input: Tensor[softmax.T]): Tensor[softmax.T]

That throws an error, because it accepts Tensors whereas I need the version that accepts Variable[TT], defined here: https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/nn/activation/softmax.nim
(the Variable version is what my custom forward procedure needs)

Even with that change, I still get an error, although a different one:

/Users/salient/Desktop/CatsVDogs/main.nim(75, 26) template/generic instantiation of forward from here
/Users/salient/Desktop/CatsVDogs/main.nim(63, 9) template/generic instantiation of softmax from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(58, 11) template/generic instantiation of softmax_cache from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(42, 24) template/generic instantiation of softmax_backward_ag from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(23, 35) Error: type mismatch: got <type Gate> but expected 'SoftmaxActivation[Tensor[system.float32]]'

Any explanation for what this is and/or how to fix it?

I do see an open issue that looks somewhat relevant, but I'm not sure: #472

Vindaar (Collaborator) commented Mar 7, 2022

Could you provide a small reproducible example?

Niminem (Contributor, Author) commented Mar 7, 2022

import std/strformat
import arraymancer except softmax # by default, softmax resolves to proc (input: Tensor[softmax.T]): Tensor[softmax.T]
import arraymancer/nn/activation/softmax # this is the softmax we need: softmax*[TT](a: Variable[TT]): Variable[TT]

let (N, D_in, H, D_out) = (64, 1000, 100, 10)
let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32)

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2.softmax # softmax on a Variable: this is the call that triggers the error above

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)
    loss = y_pred.mse_loss(y)

  echo &"Epoch {t}: loss {loss.value[0]}"

  loss.backprop()
  optim.update()

Niminem (Contributor, Author) commented Mar 7, 2022

@Vindaar the above is the "simple 2 layer" example, modified only to add softmax in the forward pass. It produces the same error as above.

Vindaar (Collaborator) commented Mar 7, 2022

Thanks! I took the liberty of updating your comment and turning it into an actual code snippet.

Will check it out.

Niminem (Contributor, Author) commented Mar 7, 2022

Thanks Vindaar for the fast response and the comment edit, lol. I'm still getting used to GitHub markdown.

Niminem (Contributor, Author) commented Mar 7, 2022

I've modified the softmax_backward_ag[TT] procedure to convert the self parameter rather than the Gate type (see below).
reference: https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/nn/activation/softmax.nim

proc softmax_backward_ag[TT](self: Gate[TT], payload: Payload[TT]): SmallDiffs[TT] =
  let self = SoftmaxActivation[TT](self) # was: SoftmaxActivation[TT](Gate)
  let gradient = payload.variable.grad
  result = newDiffs[TT](1)
  result[0] = gradient.softmax_backward(self.cache)

This matches what I found when looking at how relu is implemented. It took care of the error that was raised:
type mismatch: got <type Gate> but expected 'SoftmaxActivation[Tensor[system.float32]]'

However, now I get this error:
attempting to call undeclared routine: 'softmax_backward'

After searching through the docs, I see we don't have a softmax_backward procedure, as mentioned in this issue:
#472
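
For reference, the missing softmax_backward would essentially be the Jacobian-vector product of the softmax. Below is a rough sketch of what such a proc could look like; this is not existing library code. It assumes the gate caches the softmax output (if it caches the input instead, the softmax would have to be recomputed first) and uses Arraymancer's broadcasted element-wise operators (*., -.) plus sum over axis 1:

import arraymancer

proc softmax_backward[T](gradient, cached_output: Tensor[T]): Tensor[T] =
  # dL/dx_i = s_i * (g_i - sum_j g_j * s_j), computed row by row,
  # where s is the cached softmax output and g the incoming gradient.
  let rowDot = sum(gradient *. cached_output, axis = 1) # row-wise <g, s>, shape [batch, 1]
  result = cached_output *. (gradient -. rowDot)        # broadcast back over the class axis

when isMainModule:
  let s = softmax(randomTensor[float32](4, 3, 1'f32))   # stand-in for the cached forward output
  let g = randomTensor[float32](4, 3, 1'f32)            # stand-in for the upstream gradient
  echo softmax_backward(g, s).shape                     # [4, 3]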

Niminem (Contributor, Author) commented Mar 25, 2022

@Vindaar please review when you can

Niminem (Contributor, Author) commented Aug 18, 2022

@Vindaar my good sir can we please get this implemented lol

Vindaar (Collaborator) commented Aug 18, 2022

Can you please ping me about this on Matrix/Discord over the weekend, if I haven't looked into it by then?

Vindaar (Collaborator) commented Aug 21, 2022

Ok, I just had a look at it.

As you've mentioned yourself, the practical problem is that the backward pass for softmax is not implemented. After looking into it now, I realize that the (likely) reason is that the backward pass of a pure softmax is rather ugly, because each softmax output depends on all inputs through the normalizing sum. The gradient therefore depends on the specific pair of indices (there is a δ_ij in the derivative):

∂sm(x_i) / ∂x_j = sm(x_j) · (δ_ij - sm(x_i))

(sorry for somewhat sloppy notation)

See for example:
https://en.wikipedia.org/wiki/Softmax_function

That's why one typically combines the softmax on the last layer directly with a cross-entropy loss, for which the combined gradient is easy to compute: for a one-hot target y, the δ_ij cancels and the gradient with respect to the logits is simply sm(x) - y.

I don't have the time & mental space atm to figure out how to implement this efficiently (if that's even possible). If someone is willing to do so, feel free. Otherwise I'd just recommend doing what one normally does, i.e. use softmax_cross_entropy.
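
For illustration, here is the reproducer from above rewritten along those lines. This is only a minimal sketch, not tested code: it assumes the Variable[TT] overload of softmax_cross_entropy takes the raw logits together with a target tensor of the same shape, and it keeps the random target purely as a smoke test, exactly as in the original snippet.

import std/strformat
import arraymancer

let (N, D_in, H, D_out) = (64, 1000, 100, 10)
let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32)

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2 # raw logits, no softmax in the forward pass

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)              # logits
    loss = y_pred.softmax_cross_entropy(y) # fused softmax + cross-entropy loss

  echo &"Epoch {t}: loss {loss.value[0]}"

  loss.backprop()
  optim.update()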

Niminem (Contributor, Author) commented Aug 21, 2022

Shit, thanks for looking into it Vindaar. I will take a look when I finally get the time and mental space as well lol
