
[TODO/BUG?]: add a warning to docs about all predictions being 0 with inconsistent specification of GPU devices #830

Open
kevingreenman opened this issue Apr 23, 2024 · 1 comment

@kevingreenman
Member

Notes
Akshat found that one can end up with predictions of 0 for all inputs when loading a model from a best.pt file.

@hwpang found that this is related to using `--devices "1,"` vs. `--devices "1"`: if the former is used during training and the latter during prediction, the mean and scale matrices in the unscale transform end up on two different GPUs.

Model loaded onto GPU 1:

```
tensor([[-92.4739]], device='cuda:1')
tensor([[77.8905]], device='cuda:1')
```

GPU 0 used for predicting:

```
tensor([[0.]], device='cuda:0')
tensor([[0.]], device='cuda:0')
```
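For reference, here is a minimal plain-PyTorch check (not part of Chemprop; the helper name is hypothetical) that would surface this kind of mismatch after loading a checkpoint:

```python
import torch


def report_device_mismatches(model: torch.nn.Module, expected: torch.device) -> None:
    """Print any parameter or buffer of `model` that is not on `expected`.

    The all-zero predictions above came from exactly this situation: the
    unscale transform's mean/scale buffers lived on cuda:1 while inference
    ran on cuda:0.
    """
    for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
        if tensor.device != expected:
            print(f"{name}: on {tensor.device}, expected {expected}")


# e.g., after loading a checkpoint:
# report_device_mismatches(model, torch.device("cuda:0"))
```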

From @JacksonBurns:
This has to do with how Lightning loads models from checkpoints and the default behavior of `map_location`. This is technically intended behavior on their side; we just aren't providing a map location.
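For context, Lightning's `load_from_checkpoint` accepts a `map_location` argument. A minimal sketch of passing it explicitly (the checkpoint path is a placeholder, and the `MPNN` import path is an assumption about the Chemprop v2 model class):

```python
import torch

from chemprop.models import MPNN  # assumed import path for the Chemprop v2 LightningModule

# Map every tensor in the checkpoint onto the device that will be used for
# prediction, so the unscale transform's mean/scale land on the same GPU as
# the rest of the model.
model = MPNN.load_from_checkpoint("best.pt", map_location=torch.device("cuda:0"))
```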

If there's no foolproof way for us to prevent this from happening, we should add a warning to our documentation telling users to specify their device numbers consistently between training and prediction.

@kevingreenman kevingreenman added bug Something isn't working todo add an item to the to-do list labels Apr 23, 2024
@kevingreenman kevingreenman added this to the v2.0.1 milestone Apr 23, 2024
@davidegraff
Contributor

My vote is for a warning. The two inputs have different meanings, and it's not up to us to try to guess what a user actually meant to type. Guessing can lead to inconsistencies down the road, e.g., should we also assume the same behavior with chemprop train? IMO the answer is clearly no, and we want the argument to be treated the same across scripts.
