ONNX converting issues #10

Open · NikitaKononov opened this issue Apr 10, 2023 · 12 comments · Fixed by #11
Labels: good first issue (Good for newcomers)

@NikitaKononov

Hello, I ran into these errors while converting to ONNX:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (discriminant >= 0).all()
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

What might be wrong? Thanks.

@lss233 (Contributor) commented Apr 11, 2023

They're just normal warnings; it's OK to ignore them.

sudoskys added the "good first issue" label on Apr 11, 2023
@NikitaKononov (Author)

> They're just normal warnings

Thank you for your answer. So it doesn't affect inference quality?

@lss233 (Contributor) commented Apr 11, 2023

Yes, none of these warnings affects quality.

The code flagged by the TracerWarning only checks that variables meet the requirements; it is not part of the inference path, so it can be ignored.

Constant folding is an optimization ONNX applies to the Slice operation. It simply is not applicable at the location where the warning occurred, and skipping it there is expected.

As for the last warning, I think it is a bug in PyTorch that prevents the exporter from recognizing the prim::Constant type. However, it won't have any effect as long as your model infers correctly.
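
For context, a minimal sketch (a toy module assumed for illustration, not the repository's model) of why an assert on tensor values triggers this TracerWarning during torch.onnx.export: the boolean result is evaluated once at trace time and baked into the graph as a constant, while the surrounding computation is exported normally.

import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        discriminant = x * x - 1.0
        # The comparison collapses to a Python bool at trace time,
        # which is exactly what the TracerWarning complains about.
        assert (discriminant >= 0).all()
        return torch.sqrt(discriminant)

# Dummy input chosen so the assert passes during tracing.
torch.onnx.export(Toy(), torch.full((4,), 2.0), "toy.onnx",
                  input_names=["x"], output_names=["y"], opset_version=16)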

sudoskys pinned this issue on Apr 16, 2023
@NikitaKononov (Author)

> won't have any effect if your model infers correctly

Hello!

The model converted to ONNX with your scripts performs very poorly in NVIDIA Triton Inference Server: inference is 2-3x slower than PyTorch inference. I've tried all the available options in the Triton configuration, but I can't reach a good inference speed.

Have you faced this problem? Thanks.

sudoskys reopened this on Apr 17, 2023
@sudoskys (Member) commented Apr 17, 2023

Very sorry! When writing the ONNX runtime wrapper, I specified CPU inference; please wait.

model = RunONNX(model=_vits_base, providers=['CPUExecutionProvider'])
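
If GPU inference is wanted from onnxruntime directly (RunONNX appears to wrap it), a minimal sketch under the assumption that the exported file is named model.onnx is to list the CUDA provider first and let it fall back to CPU:

import onnxruntime as ort

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]  # CUDA first, CPU as fallback
session = ort.InferenceSession("model.onnx", providers=providers)
print("Active providers:", session.get_providers())  # shows which provider was actually picked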

@sudoskys (Member)

954ceba

@NikitaKononov (Author) commented Apr 17, 2023

> I specified CPU inference

Thanks, I'll give it a try.
But as far as I can see in the code, RunONNX doesn't affect how the converted model is saved?

I use the converted model in NVIDIA Triton Inference Server. It utilizes the GPU, but has poor performance for some reason.

I'll benchmark pure PyTorch inference, pure ONNX inference, and Triton inference with both the PyTorch and the ONNX model, and will post the results.
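
A minimal sketch of the kind of timing loop used for such a comparison (the infer callable is hypothetical and stands in for whichever backend is being measured; the warm-up call is excluded from the average):

import time

def benchmark(infer, text, runs=50):
    infer(text)                                   # warm-up, not timed
    start = time.perf_counter()
    for _ in range(runs):
        infer(text)
    return (time.perf_counter() - start) / runs   # average seconds per run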

@sudoskys (Member) commented Apr 17, 2023

ok

@sudoskys (Member)

Please wait a while for the svc branch.

@NikitaKononov (Author)

ok

I have done 50 test inferences for each model with the same input text:
PyTorch avg: ~2.5 s
ONNX avg: ~2.7 s
Triton ONNX avg: ~4.1 s

For some reason onnxruntime inside Triton makes execution slower; I'm trying to find the bottleneck.

sudoskys linked a pull request on Apr 17, 2023 that will close this issue
sudoskys reopened this on Apr 19, 2023
@sudoskys (Member)

> ok
>
> I have done 50 test inferences for each model with the same input text:
> PyTorch avg: ~2.5 s
> ONNX avg: ~2.7 s
> Triton ONNX avg: ~4.1 s
>
> For some reason onnxruntime inside Triton makes execution slower; I'm trying to find the bottleneck.

The server converts the .pth to ONNX before loading, rather than using a pre-converted ONNX file.
The slowdown may be caused by the model structure or by some other configuration error.

@sudoskys (Member)

The ONNX model performs some initialization work during the first inference after the session is loaded, and this factor should also be taken into account.
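
A minimal sketch of excluding that first-inference cost when measuring with onnxruntime (the file name is an assumption, and the zero-filled dummy inputs are only placeholders; real inputs should match the model's actual shapes and dtypes):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Build zero-filled dummy inputs from the model's own metadata;
# dynamic dimensions (reported as strings or None) are replaced with 1.
dtype_map = {"tensor(float)": np.float32, "tensor(int64)": np.int64}
dummy = {
    inp.name: np.zeros(
        [d if isinstance(d, int) else 1 for d in inp.shape],
        dtype=dtype_map.get(inp.type, np.float32),
    )
    for inp in session.get_inputs()
}

session.run(None, dummy)  # warm-up: provider initialization happens on this first call
# Only time the runs that come after this point.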
