# [TODO]: Metric Improvements #817
FWIW, the original goal was to make it such that a […]. In summary: I have a few ideas on this, but I'm curious to hear what others think.
---

I'd prefer to decouple `Metric` from `LossFunction`, as they serve different purposes, though I can see that we would like minimal duplication of code between the two. Flipping the inheritance so that `LossFunction` inherits from `Metric` would avoid code duplication, but it implies that all loss functions are metrics, and not all loss functions are suitable as metrics (e.g., `MVELoss`). We could use mixins to achieve code reuse while still avoiding inheritance between them.
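A minimal sketch of that mixin idea (class and method names here are hypothetical, not Chemprop's actual API): the shared computation lives in a mixin, and the loss and the metric each combine it with `nn.Module` instead of inheriting from one another.

```python
from torch import Tensor, nn

class SquaredErrorMixin:
    """Shared elementwise computation, reusable by both classes below."""

    def _calc_unreduced_loss(self, preds: Tensor, targets: Tensor) -> Tensor:
        return (preds - targets) ** 2

class MSELoss(SquaredErrorMixin, nn.Module):
    """Differentiable training loss: reduces with task weights."""

    def __init__(self, task_weights: Tensor):
        super().__init__()
        self.register_buffer("task_weights", task_weights)

    def forward(self, preds: Tensor, targets: Tensor) -> Tensor:
        return (self._calc_unreduced_loss(preds, targets) * self.task_weights).mean()

class RMSEMetric(SquaredErrorMixin, nn.Module):
    """Evaluation metric: no task weights anywhere in its interface."""

    def forward(self, preds: Tensor, targets: Tensor) -> Tensor:
        return self._calc_unreduced_loss(preds, targets).mean().sqrt()
```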
---

A couple of points: […]
---

I agree that there is a strong correlation between evaluation metrics and loss functions. I originally wanted to say that not all loss functions are suitable as metrics (e.g., […]).
---

I don't think this statement is correct. For example, if you were training an MVE model with early stopping, you wouldn't want to monitor just RMSE, because RMSE doesn't capture whether your model is starting to overfit. That is, your model could optimize towards the trivial solution (…).
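For concreteness, as I understand it the MVE loss is the Gaussian negative log-likelihood over a predicted mean and variance (the sketch below is illustrative, not a quote of Chemprop's implementation), so it penalizes a miscalibrated variance head that RMSE is completely blind to:

```python
import math
import torch
from torch import Tensor

def mve_nll(mean: Tensor, var: Tensor, target: Tensor) -> Tensor:
    # Gaussian negative log-likelihood used in mean-variance estimation
    return (torch.log(2 * math.pi * var) / 2 + (target - mean) ** 2 / (2 * var)).mean()

mean = torch.tensor([0.0, 1.0])
target = torch.tensor([0.1, 0.9])

# Identical means => identical RMSE, but a wildly overestimated variance
# blows up the NLL -- a failure mode RMSE-based early stopping cannot see.
print(mve_nll(mean, torch.tensor([0.01, 0.01]), target))    # ~ -0.88
print(mve_nll(mean, torch.tensor([100.0, 100.0]), target))  # ~  3.22
```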
---

I've been thinking about my original comment some more, mainly this point:

> […]

And I disagree with it now. Differentiable functions are indeed a subset of (not necessarily differentiable) functions in general, but in code this would reverse the inheritance structure. That is, if we have some class of functions, we would end up with:

```python
class LossFunction(nn.Module):
    ...

class MetricFunction(LossFunction):
    ...
```

To be clear, a user can use a non-differentiable loss function, but the gradients won't propagate to the final loss value (i.e., they'll stop right at the non-differentiable step). But we should decide on whether we even need a subclass […]
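A quick demonstration of the gradients stopping (`torch.round` here stands in for any non-differentiable step; autograd defines its gradient as zero, so nothing upstream of it receives a learning signal):

```python
import torch

preds = torch.tensor([0.2, 0.7, 0.9], requires_grad=True)
targets = torch.tensor([0.0, 1.0, 1.0])

# round() is non-differentiable; its "gradient" is zero everywhere,
# so backpropagation stops delivering signal right at this step.
loss = ((torch.round(preds) - targets) ** 2).mean()
loss.backward()
print(preds.grad)  # tensor([0., 0., 0.])
```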
---

Side note: also see #612 for a new loss function that could be added as we improve the metrics and loss functions.
---

Just my 2 cents: at this moment, Chemprop v2 […]
---

Hi Simon, thanks for the feedback. This isn't a bug, so we'd like to make it clearer in our documentation why we implemented it this way. Maybe we should call it […].

During training, a differentiable function (the loss function, or criterion) is used to calculate the deviation of the current model's predictions from the true values in the training batch. The gradient of this loss is used to update the model weights. Often a different, possibly non-differentiable, function will be used to evaluate the final model's performance. This test metric should also be used to calculate the performance of the model on the validation dataset, so that the best model weights are saved during training. Chemprop does this automatically in the CLI. Refer to the […]
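A minimal sketch of that pattern (illustrative names only, not Chemprop's internals): the differentiable criterion drives the weight updates, while a separate, possibly non-differentiable, metric on the validation set decides which weights to keep.

```python
import copy
import torch

def train(model, criterion, metric, train_loader, val_loader, epochs=30):
    opt = torch.optim.Adam(model.parameters())
    best_score, best_state = float("inf"), None

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            criterion(model(x), y).backward()  # gradient of the *loss* updates weights
            opt.step()

        model.eval()
        with torch.no_grad():  # the metric is only evaluated, never differentiated
            score = sum(metric(model(x), y).item() for x, y in val_loader)
        if score < best_score:  # checkpoint by the validation *metric*
            best_score = score
            best_state = copy.deepcopy(model.state_dict())

    return best_state
```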
---

I think we can decouple `Metric` from `LossFunction`. The forward method is already different, because it uses task weights to reduce the loss. `Metric` should also have its own init function, because metrics don't take task weights. At that point, there isn't anything that `Metric` needs to get from `LossFunction`.

We currently couple them because some metrics inherit `_calc_unreduced_loss` from their loss function counterpart, like `BinaryMCCMetric`. At the same time, though, I don't know if these metrics are actually calculated correctly: it looks like `_calc_unreduced_loss` in `BinaryMCCLoss` is already reduced, in which case that function should actually be `forward()`.

If it should be changed to `forward()`, there is also the question of whether `mask` needs to remain in the signature of `_calc_unreduced_loss`, since the forward method of `LossFunction` handles masking as part of the reduction.

All of this is to say: I think the more advanced metrics need some improvements, but that will take more time to discuss, so it isn't part of v2.0.
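To make the proposal concrete, here is a sketch of what the decoupled classes might look like (hypothetical signatures, not the current code): `LossFunction` takes the task weights at init and applies the mask during reduction in `forward`, so `_calc_unreduced_loss` no longer needs `mask` in its signature, and `Metric` stands entirely on its own.

```python
from torch import Tensor, nn

class LossFunction(nn.Module):
    """Training criterion: owns task weights; masking happens in the reduction."""

    def __init__(self, task_weights: Tensor):
        super().__init__()
        self.register_buffer("task_weights", task_weights)

    def _calc_unreduced_loss(self, preds: Tensor, targets: Tensor) -> Tensor:
        raise NotImplementedError  # per-element values; no mask needed here

    def forward(self, preds: Tensor, targets: Tensor, mask: Tensor) -> Tensor:
        unreduced = self._calc_unreduced_loss(preds, targets) * self.task_weights
        return (unreduced * mask).sum() / mask.sum()  # masked, weighted mean

class Metric(nn.Module):
    """Standalone evaluation metric: no task weights, no ties to LossFunction."""

    def forward(self, preds: Tensor, targets: Tensor, mask: Tensor) -> Tensor:
        raise NotImplementedError
```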