Checkpoint support for torch.package #2570

Open
dasturge opened this issue May 13, 2022 · 3 comments
@dasturge

🚀 Feature

Currently, checkpointing is centered on objects with state_dict and load_state_dict methods, but the new torch.package serialization option breaks with this pattern. It doesn't seem that I can simply plug in a custom save_handler to handle package import/export.

torch offers a new, interesting method for serializing models along with their code and dependencies (and it is not limited to pytorch/base Python types). It would be great to leverage this for checkpointing, so that models produced by the checkpointer are packaged and ready to go, helping to bridge the gap with deployment workflows.

Something along the lines of:

TorchPackageCheckpoint(package=my_package, internal_module="models"), which would accept an arbitrary object and work along with:

from torch.package import PackageExporter

with PackageExporter(f'{filepath}.pt') as pe:
    # (package name, resource name, object to pickle)
    pe.save_pickle(self.internal_module, self.internal_file, self.package)

Obviously, this would require some refactoring of private methods if it were to use the base Checkpoint class, since the responsibility for calling state_dict would need to be offloaded to the DiskSaver/save_handlers. I didn't see a clean way to simply extend the Checkpoint class.
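
A rough sketch of what such a handler could look like (entirely hypothetical: the TorchPackageCheckpoint class, its parameters, and the intern pattern below are assumptions, not an existing ignite or torch API):

from typing import Any

from torch.package import PackageExporter

class TorchPackageCheckpoint:
    """Hypothetical handler: exports `package` via torch.package
    instead of saving a state_dict."""

    def __init__(self, filepath: str, package: Any,
                 internal_module: str = "models",
                 internal_file: str = "model.pkl"):
        self.filepath = filepath
        self.package = package
        self.internal_module = internal_module
        self.internal_file = internal_file

    def __call__(self, engine) -> None:
        # Write a self-contained archive holding both code and the pickled object
        with PackageExporter(f"{self.filepath}.pt") as pe:
            pe.intern(f"{self.internal_module}.**")  # assumed pattern
            pe.save_pickle(self.internal_module, self.internal_file, self.package)

# Usage with an existing ignite trainer (assumed):
# trainer.add_event_handler(Events.COMPLETED, TorchPackageCheckpoint("ckpt", model))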

@sdesrozis
Contributor

@dasturge Thank you for highlighting this very interesting point.

IMO a package is very different from a checkpoint. I'm no expert on what was recently added, but it doesn't sound like a new way to checkpoint that replaces the current save/load of state dicts. I would say it should be most useful at the end of the training process, to help with deployment.

A specific handler could be an option, but at the moment I don't see how to automatically reuse the training code to produce an inference script. Maybe this is better addressed by a guideline for writing applications.

@sadra-barikbin
Collaborator

The raw way to do the job:

from ignite.engine import Events
from torch.package import PackageExporter

@trainer.on(Events.COMPLETED)
def package_model():
    with PackageExporter('package.pt') as pe:
        # Some action pattern settings, depending on what you're packaging
        pe.intern('models.**')  # example
        pe.extern('numpy.**')   # example

        pe.save_pickle('my_package', 'model.pkl', model)

As @dasturge said, we could have an API like:

TorchPackageCheckpoint(path: str, package_name: str, interns: List[str] = [], externs: List[str] = [], mocked: List[str] = [], to_save: Dict[str, Any])

to do the job, but since the user does not know the interns and externs beforehand, they would have to repeat the steps below until every dependency has an assigned action:

  1. Call pe.intern, pe.extern, or pe.mock with some patterns
  2. Hit a packaging error
  3. Refine the patterns and go back to step 1

If we have to write those pattern statements anyway, why not fall back to the raw way I described at first?
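
For illustration, the proposed API would be used roughly like this (hypothetical: the class and its parameters are the proposal above, not an existing ignite API). Note that the same intern/extern patterns still have to be discovered by trial and error:

handler = TorchPackageCheckpoint(
    path='package.pt',
    package_name='my_package',
    interns=['models.**'],   # still found by the trial-and-error loop above
    externs=['numpy.**'],
    to_save={'model.pkl': model},
)
trainer.add_event_handler(Events.COMPLETED, handler)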

@sdesrozis
Contributor

Having a new, specific handler for packaging would be interesting if we can make it genuinely helpful. Maybe we could have checkpoints during training and packaging at the end. However, packaging is more than checkpointing: it embeds what is needed for inference and is tied to deployment.

Let's think about it. It would be nice to have a package importer/exporter for training, and why not an inference engine based on that.
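
For reference, the loading side already works with the real torch.package importer; a minimal sketch, with the archive, package, and resource names taken from the example above, and assuming the pickled object is an nn.Module:

from torch.package import PackageImporter

# Load the pickled model from the archive; the packaged code travels with it,
# so the original source tree is not needed at inference time.
imp = PackageImporter('package.pt')
model = imp.load_pickle('my_package', 'model.pkl')
model.eval()  # assuming the object is an nn.Module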
