Recorded FPS vs actual FPS #56

Closed
sambo55 opened this issue Jun 23, 2020 · 15 comments

sambo55 commented Jun 23, 2020

I'm struggling to achieve the FPS reported in the command line.

For example, when I run inference on a 10-minute 30 fps video, the reported inference FPS is 300+.

I would expect the time taken to run inference on the entire video to be 30 fps × 60 s × 10 min = 18,000 frames, and 18,000 frames / 300 FPS = 60 s = 1 min.

Yet the code takes at least 3 minutes to run. Is there something wrong with my calculation? Why would the reported FPS not match the actual FPS?

ceccocats (Owner) commented Jun 23, 2020

Hi sambo,
"Inference time" measures only the inference itself; it does not include preprocessing, postprocessing, or visualisation.
Also, the demo is only an example of how to use tkDNN; it is not the most optimized solution in terms of preprocessing, postprocessing, and visualisation.
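To make the distinction concrete, here is a minimal standalone sketch (not tkDNN code; the sleeps are stand-ins for the real pipeline stages) showing why inference-only FPS can be far higher than end-to-end FPS:

```cpp
// Only the "inference" stage is timed for the reported FPS, while decoding,
// pre/postprocessing and visualisation still cost wall-clock time.
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    const int frames = 300;               // pretend we process 300 frames
    double inference_ms = 0.0;

    auto start = clock::now();
    for (int i = 0; i < frames; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(2)); // preprocessing
        auto t0 = clock::now();
        std::this_thread::sleep_for(std::chrono::milliseconds(3)); // inference (the only timed part)
        auto t1 = clock::now();
        inference_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::this_thread::sleep_for(std::chrono::milliseconds(5)); // postprocess + draw + imshow
    }
    double wall_s = std::chrono::duration<double>(clock::now() - start).count();

    std::printf("inference-only FPS: %.1f\n", frames * 1000.0 / inference_ms); // counts only the timed stage
    std::printf("end-to-end FPS:     %.1f\n", frames / wall_s);                // counts everything
}
```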

sambo55 (Author) commented Jun 24, 2020

Thanks. Any pointers on how to optimise those aspects?

@ceccocats (Owner)

OpenCV is convenient but slow, especially for visualization. Find an OpenGL viewer that fits your needs.
For preprocessing and postprocessing, make sure you are compiling OpenCV with CUDA and cudacodec.
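For reference, a minimal sketch of what GPU-side decoding looks like, assuming OpenCV 4.x built with CUDA plus the cudacodec and cudawarping contrib modules (the file name and network input size are placeholders):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudacodec.hpp>
#include <opencv2/cudawarping.hpp>

int main() {
    // Decode the video on the GPU; frames stay in device memory as GpuMat.
    cv::Ptr<cv::cudacodec::VideoReader> reader =
        cv::cudacodec::createVideoReader(cv::String("video.mp4"));

    cv::cuda::GpuMat frame, resized;
    while (reader->nextFrame(frame)) {
        // Preprocess on the GPU as well, e.g. resize to the network input size.
        cv::cuda::resize(frame, resized, cv::Size(416, 416));
        // ... hand the device buffer to the detector here ...
    }
    return 0;
}
```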

mive93 (Collaborator) commented Jun 29, 2020

Hi @sambo55,

If you have OpenCV (4.x) with contrib compiled for CUDA, you can uncomment line 17 here; the preprocessing will then be optimized on the GPU.
Another thing you could do is decouple inference and visualization by using different threads, roughly as in the sketch below.
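As an illustration of that decoupling (a sketch only, not the tkDNN demo code; the detector call and input file are placeholders), one thread runs inference while another handles imshow, so drawing no longer stalls the detector:

```cpp
#include <condition_variable>
#include <mutex>
#include <opencv2/opencv.hpp>
#include <queue>
#include <thread>

int main() {
    std::queue<cv::Mat> ready;           // frames with detections already drawn
    std::mutex m;
    std::condition_variable cv_ready;
    bool done = false;

    // Viewer thread: pops finished frames and displays them.
    std::thread viewer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv_ready.wait(lk, [&] { return !ready.empty() || done; });
            if (ready.empty() && done) break;
            cv::Mat frame = std::move(ready.front());
            ready.pop();
            lk.unlock();
            cv::imshow("detection", frame);
            cv::waitKey(1);
        }
    });

    cv::VideoCapture cap("video.mp4");   // placeholder input
    cv::Mat frame;
    while (cap.read(frame)) {
        // detector.update(...); detector.draw(frame);   // inference would go here
        {
            std::lock_guard<std::mutex> lk(m);
            ready.push(frame.clone());
        }
        cv_ready.notify_one();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv_ready.notify_one();
    viewer.join();
}
```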

@rod-hendricks

I tried running batch size 1 vs 4 and noticed that the inference speed is not as fast as I expected on an RTX 2070. For batch size 1 I get ~6.6 ms inference time, while for batch size 4 I get ~17.7 ms (2.6×+ slower). Are these numbers correct? I am only using ~1.6 GB of GPU memory and about ~40% processing power, even when running batch size 4.

Would you guys know of a way to optimize this by utilizing more of the GPU power?

On a side note, my pre- and post-processing times are awful with batch size 4, totalling ~9 ms. I will check OpenCV again and make sure it is compiled with CUDA to see if that significantly improves things.

@rod-hendricks

I built opencv-4.2.0 with CUDA and cuDNN enabled and found no substantial improvement in the pre- and post-processing portion of Yolo3Detection.cpp. In fact, on my end enabling OPENCV_CUDACONTRIB (uncommenting `#define OPENCV_CUDACONTRIB //if OPENCV has been compiled with CUDA and contrib.`) was slower than leaving it disabled. Am I doing this right?

Also, I guess the inference speed I'm seeing cannot be optimized any further on my hardware?

mive93 (Collaborator) commented Jul 15, 2020

Hi @rod-hendricks,
I checked the problem both on the Xavier and on the RTX 2080 Ti, and it is actually true that enabling that define is worse for the Yolo detectors (while it is better for others, like the MobileNet and CenterNet models). The problem with Yolo is that there are too many unnecessary transfers between host and device.

If you really want to improve pre- and post-processing, you could implement CUDA kernels for those phases and keep everything on the device: copy the frame once at the beginning and copy back only the bounding boxes at the end. We have not tried this solution yet, but it's on my list of improvements for those parts.
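For anyone attempting that, here is a minimal CUDA sketch (not tkDNN code; the interleaved BGR uint8 input layout and the [0,1] normalization are assumptions) of a preprocessing kernel that runs entirely on the device, so only the input frame and the final boxes would cross the host-device boundary:

```cuda
#include <cuda_runtime.h>

// Assumed layout: src is HxWx3 interleaved BGR uint8 already on the device,
// dst is 3xHxW planar float in [0,1] (what a Yolo engine typically expects).
__global__ void bgr_hwc_to_chw_norm(const unsigned char* src, float* dst,
                                    int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = (y * width + x) * 3;   // interleaved pixel offset
    int plane = width * height;
    dst[0 * plane + y * width + x] = src[idx + 0] / 255.0f;  // B plane
    dst[1 * plane + y * width + x] = src[idx + 1] / 255.0f;  // G plane
    dst[2 * plane + y * width + x] = src[idx + 2] / 255.0f;  // R plane
}

// Host-side launcher: both pointers are device buffers, nothing touches the host.
void preprocess_on_device(const unsigned char* d_src, float* d_dst,
                          int width, int height, cudaStream_t stream) {
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    bgr_hwc_to_chw_norm<<<grid, block, 0, stream>>>(d_src, d_dst, width, height);
}
```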

@rod-hendricks

Thanks for the response and advice, @mive93! I am not sure yet about the effort vs. gain of doing this, so I'll see if I can work on it later.

I would also like to ask about what you meant by Yolo having too many unnecessary transfers between host and device. Do you mean that within the Yolo inference task there are still data transfers between host and device before the network output is produced?

mive93 (Collaborator) commented Jul 16, 2020

@rod-hendricks if I have updates on any improvements, I will let you know.

No, I meant only in our preprocessing. The code should be cleaned up and fixed to remove an unnecessary transfer between host and device. When I have time I'll fix that :)

@rod-hendricks
Copy link

Thanks @mive93! Much appreciated. Good work on this repo, it's amazing!

mive93 closed this as completed Sep 11, 2020
@MohammadKassemZein

Hello @mive93, any updates on removing the unnecessary transfer between host and device?
Thanks.

mive93 (Collaborator) commented Aug 19, 2021

Hi @MohammadKassemZein,
no updates yet, I'm sorry.

mkzein commented Nov 9, 2021

@mive93 Sorry to bother :) any updates on this?

mive93 pinned this issue Jan 19, 2022
mive93 (Collaborator) commented Jan 19, 2022

Not yet, but maybe soon.
We already have the code in an internal project; we just need to merge it here.

mkzein commented Jan 20, 2022

Sounds great!
Thank you @mive93

mive93 added a commit that referenced this issue Jan 27, 2022