[Question] exported ONNX model does not result in same output as the original pytorch model #394

Open · VineetTambe opened this issue Jul 31, 2023 · 5 comments
Labels: more information needed, question

VineetTambe commented Jul 31, 2023

❓ Question

I am trying to export a trained PyTorch model to ONNX so that I can deploy it.
However, the output of the exported model does not match the output of the PyTorch model when I run an episode.
I have made sure to set the model to eval mode before exporting.
I heavily modified the enjoy.py script to export and run the models.

Exporting to ONNX:

    torch_model = ALGOS[algo].load(
        model_path, custom_objects=custom_objects, device=args.device, **kwargs
    )
    torch_model.policy.eval()

    obs = env.reset()

    obs_tensor = torch_model.policy.obs_to_tensor(obs)[0]

    # Export the model
    torch.onnx.export(
        torch_model.policy,  # model being run
        obs_tensor,  # model input (or a tuple for multiple inputs)
        output_model_name,  # where to save the model (can be a file or file-like object)
        export_params=True,  # store the trained parameter weights inside the model file
        opset_version=10,  # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=["input"],  # the model's input names
        output_names=["output"],  # the model's output names
        dynamic_axes={
            "input": {0: "batch_size"},  # variable length axes
            "output": {0: "batch_size"},
        },
    )
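
One quick way to sanity-check the export step itself (a minimal sketch reusing the names above; tolerances are arbitrary) is to compare the two outputs on the exact observation that was used for tracing:

    import numpy as np
    import onnxruntime

    # The exported graph should reproduce the PyTorch policy's greedy
    # output on the observation the export was traced with.
    with torch.no_grad():
        torch_out = torch_model.policy(obs_tensor)

    ort_session = onnxruntime.InferenceSession(output_model_name)
    onnx_out = ort_session.run(None, {"input": obs_tensor.cpu().numpy()})[0]
    np.testing.assert_allclose(torch_out.cpu().numpy(), onnx_out, rtol=1e-3, atol=1e-5)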

Running inference using the ONNX model:

    ort_session = onnxruntime.InferenceSession(onnx_model_path)
    ort_inputs = {ort_session.get_inputs()[0].name: obs}
    action = ort_session.run(None, ort_inputs)[0]
    obs, reward, done, infos = env.step(action)

The above are the only modifications made to enjoy.py in order to export and run the model. However, the results of the trained agent are not the same.
Am I missing something obvious here? Any help would be greatly appreciated!

araffin (Member) commented Jul 31, 2023

Hello,
could you be more specific about which algo/env you are using?

VineetTambe (Author) commented:

Hey,

I am using the QRDQN algorithm and a custom environment built on top of the MiniGrid env.

araffin (Member) commented Aug 1, 2023

> I am using the QRDQN algorithm and a custom environment built on top of the MiniGrid env.

Could you share the observation and action spaces?

You are probably missing pre-processing, see DLR-RM/stable-baselines3#1349 (comment)
(we welcome a PR that updates our doc).
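
The pattern in that linked comment is, roughly, a thin wrapper module around the policy, so that the exported graph contains the full forward pass. A minimal sketch (class name and comments are illustrative, not the exact code from the comment; in recent SB3 versions the policy forward already applies preprocess_obs internally):

    import torch as th

    class OnnxablePolicy(th.nn.Module):
        # Tracing the policy's own forward pass keeps SB3's observation
        # pre-processing inside the exported graph and selects the
        # greedy (deterministic) action.
        def __init__(self, policy):
            super().__init__()
            self.policy = policy

        def forward(self, observation: th.Tensor):
            return self.policy(observation, deterministic=True)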

VineetTambe (Author) commented Aug 1, 2023

> Could you share the observation and action spaces?

Observation space: Box(0, 255, (50,), uint8)
Action space: Discrete(4)

> You are probably missing pre-processing

I tried doing what is done in the linked comment, which is to create a new PyTorch model class that has the policy pre-processing step in the forward pass (please correct me if I am wrong here).

What exactly does the pre-processing entail? Is there anything more to it?
Even after doing the above step I get the same incorrect results.
Is there any post-processing step that I might be missing?
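
For reference, SB3's built-in pre-processing lives in stable_baselines3.common.preprocessing.preprocess_obs; a small hedged sketch of what it does for the space reported above (behaviour of the Gym-based SB3 releases used in this thread; worth verifying against the installed version):

    import numpy as np
    import torch as th
    from gym import spaces
    from stable_baselines3.common.preprocessing import is_image_space, preprocess_obs

    obs_space = spaces.Box(0, 255, (50,), dtype=np.uint8)

    # A flat (50,) uint8 space is not image-shaped, so no /255 scaling applies:
    print(is_image_space(obs_space))  # False

    obs = th.randint(0, 256, (1, 50), dtype=th.uint8)
    processed = preprocess_obs(obs, obs_space, normalize_images=True)
    print(processed.dtype)  # torch.float32 -- the uint8 -> float cast still matters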

araffin (Member) commented Aug 3, 2023

You are probably either missing image pre-processing (dividing by 255 before feeding the observation to the network) or not comparing against the greedy policy.

The following works and was tested by comparing the quantiles returned:

import numpy as np
import torch as th
from sb3_contrib import QRDQN


model = QRDQN("MlpPolicy", "LunarLander-v2")
model.policy.to("cpu")
# Note: by default model.policy.quantile_net.forward() returns quantiles
onnxable_model = model.policy
observation_size = model.observation_space.shape[0]

dummy_input = th.randn(1, observation_size)
onnx_path = "qrdqn_model.onnx"
th.onnx.export(
    onnxable_model,
    dummy_input,
    onnx_path,
    opset_version=17,
    input_names=["input"],
)

##### Load and test with onnx

import numpy as np
import onnx
import onnxruntime as ort

onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

# observation = np.zeros((1, observation_size)).astype(np.float32)
observation = dummy_input.cpu().numpy()
ort_sess = ort.InferenceSession(onnx_path)
action = ort_sess.run(None, {"input": observation})[0]

print(action)
print(model.predict(observation, deterministic=True)[0])
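
To extend the check to a short rollout (the setting where the original mismatch showed up), something along these lines can be appended; a hedged sketch that reuses model and ort_sess from above and assumes the old Gym step API used elsewhere in this thread:

    import gym

    env = gym.make("LunarLander-v2")
    obs = env.reset()
    for _ in range(200):
        onnx_action = ort_sess.run(
            None, {"input": obs.reshape(1, -1).astype(np.float32)}
        )[0]
        sb3_action, _ = model.predict(obs, deterministic=True)
        # Both should pick the same greedy action at every step
        assert onnx_action.item() == int(sb3_action), (onnx_action, sb3_action)
        obs, _, done, _ = env.step(int(sb3_action))
        if done:
            obs = env.reset()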
