[Performance]: Latency is much longer than RealTime when using benchmark_app for profiling #24517

Open · 3 tasks done
cold-blue opened this issue May 15, 2024 · 5 comments
Labels: performance (Performance related topics), support_request

@cold-blue

OpenVINO Version

No response

Operating System

Windows System

Device used for inference

None

OpenVINO installation

PyPi

Programming Language

Python

Hardware Architecture

x86 (64 bits)

Model used

N/A

Model quantization

Yes

Target Platform

No response

Performance issue description

When using benchmark_app to profile a model, I find that the latency in report.csv is much longer than the total RealTime computed from detailed_counter_report.csv.
In report.csv:
[screenshot: reported latency]
In detailed_counter_report.csv:
[screenshot: per-layer RealTime counters]

Is this a normal phenomenon, and what is the reason for it?

Step-by-step reproduction

No response

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
@cold-blue cold-blue added performance Performance related topics support_request labels May 15, 2024
@dmitry-gorokhov
Contributor

dmitry-gorokhov commented May 17, 2024

Hi @cold-blue! Which device did you use for inference: CPU or GPU?

@Jessielovecodings

.take

Contributor

github-actions bot commented Jun 3, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@Jessielovecodings

Jessielovecodings commented Jun 3, 2024

Hi @cold-blue,
When you use benchmark_app to profile model performance and find that the latency reported in report.csv is much longer than the total RealTime computed from detailed_counter_report.csv, this can be due to the following reasons:

  1. Initial loading and initialization time:
    benchmark_app requires loading the model and performing some initialization operations before starting the evaluation. This process includes loading the model file, initializing the plugin, allocating memory, etc. These times are usually included in the total latency reported but are not included in the specific inference time statistics.

  2. Data preprocessing and postprocessing time:
    The total latency reported may include data preprocessing and postprocessing time, whereas detailed_counter_report.csv primarily records the actual execution time of the inference engine.

  3. Differences in measurement methods:
    Different timing methods can produce different numbers. For example, the latency in report.csv may be measured as wall-clock time around each inference request, while detailed_counter_report.csv may only record the time spent executing the layers inside the inference engine (a short sketch after this list illustrates the difference).

  4. Batch size and synchronous/asynchronous execution:
    If batch processing or asynchronous execution mode is used, the actual inference time statistics and the total latency time may differ because the asynchronous mode timing may include more synchronization operation time.
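
To make point 3 concrete, here is a minimal sketch of the difference (assuming the legacy openvino.inference_engine API used in the example further below, and hypothetical model.xml/model.bin files). It measures the wall-clock time of a single synchronous inference and compares it with the sum of the per-layer real_time counters, which are the values that feed detailed_counter_report.csv. The wall-clock number also contains request scheduling and synchronization overhead, so it is expected to be somewhat larger than the per-layer sum.

import time
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# Hypothetical model files -- replace with your own model.
net = ie.read_network(model="model.xml", weights="model.bin")
# PERF_COUNT enables the per-layer counters that benchmark_app writes to detailed_counter_report.csv.
exec_net = ie.load_network(network=net, device_name="CPU", config={"PERF_COUNT": "YES"})

input_name = next(iter(net.input_info))
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)

# Wall-clock latency of one synchronous inference request.
request = exec_net.requests[0]
start = time.time()
request.infer({input_name: dummy})
wall_clock_ms = (time.time() - start) * 1000

# Sum of the per-layer real_time counters (reported in microseconds) for the same request.
perf_counts = request.get_perf_counts()
layer_sum_ms = sum(counter["real_time"] for counter in perf_counts.values()) / 1000

print(f"Wall-clock latency:     {wall_clock_ms:.2f} ms")
print(f"Sum of layer real_time: {layer_sum_ms:.2f} ms")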

How to solve:

  1. Break down the time overhead:
    By adding logs or further analysis, break down the time overhead of each step, including model loading, initialization, data preprocessing, inference process, data postprocessing, etc. This will help identify which steps are more time-consuming.

  2. Compare execution time in different modes:
    Try running benchmark_app in both synchronous (-api sync) and asynchronous (-api async) modes and compare the execution times to analyze the reasons for the time differences (see the sketch after this list).

  3. Check and optimize the initialization process:
    If the initialization process is time-consuming, consider optimizing the model loading and initialization steps, such as reducing the model size, optimizing the model file, etc.

  4. Verify data processing steps:
    Ensure the efficiency of data preprocessing and postprocessing steps to avoid unnecessary time overhead. If the data processing steps are time-consuming, consider optimizing these steps.
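
To complement step 2, here is a minimal sketch (again assuming the legacy IECore API and hypothetical model files) that contrasts synchronous, one-at-a-time inference with asynchronous inference that keeps several requests in flight. With multiple parallel requests the overall throughput usually improves, but each individual request tends to take longer, which is one reason the latency reported in asynchronous (throughput) mode can look much larger than the per-layer RealTime sum.

import time
import numpy as np
from openvino.inference_engine import IECore

NUM_ITERS = 100
NUM_REQUESTS = 4  # number of parallel infer requests used in async mode

ie = IECore()
# Hypothetical model files -- replace with your own model.
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=NUM_REQUESTS)

input_name = next(iter(net.input_info))
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)

# Synchronous mode: one request at a time, average per-request latency.
request = exec_net.requests[0]
start = time.time()
for _ in range(NUM_ITERS):
    request.infer({input_name: dummy})
sync_avg_ms = (time.time() - start) / NUM_ITERS * 1000

# Asynchronous mode: keep NUM_REQUESTS requests in flight, measure total wall time.
start = time.time()
for i in range(NUM_ITERS):
    rid = i % NUM_REQUESTS
    if i >= NUM_REQUESTS:
        exec_net.requests[rid].wait()  # wait until this slot's previous job finishes
    exec_net.start_async(request_id=rid, inputs={input_name: dummy})
for r in exec_net.requests:
    r.wait()
async_wall_s = time.time() - start

print(f"Sync average latency: {sync_avg_ms:.2f} ms")
print(f"Async throughput:     {NUM_ITERS / async_wall_s:.2f} inferences/s over {async_wall_s:.2f} s")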

Here is example code (using the legacy openvino.inference_engine API) to break down the time overhead of each step:

import time
import numpy as np
from openvino.inference_engine import IECore

# Record start time
start_time = time.time()

# Create IE instance
ie = IECore()
ie_creation_time = time.time()

# Read model
net = ie.read_network(model="model.xml", weights="model.bin")
read_network_time = time.time()

# Load model to device
exec_net = ie.load_network(network=net, device_name="CPU")
load_network_time = time.time()

# Prepare input data (placeholder: a zero-filled tensor matching the first input's shape;
# replace with your actual data preparation steps)
input_blob_name = next(iter(net.input_info))
input_shape = net.input_info[input_blob_name].input_data.shape
input_data = np.zeros(input_shape, dtype=np.float32)
prepare_input_data_time = time.time()

# Perform inference
results = exec_net.infer(inputs={input_blob_name: input_data})
infer_time = time.time()

# Record total time
end_time = time.time()

# Output time statistics
print(f"Total time: {end_time - start_time:.2f} seconds")
print(f"IE creation time: {ie_creation_time - start_time:.2f} seconds")
print(f"Read network time: {read_network_time - ie_creation_time:.2f} seconds")
print(f"Load network time: {load_network_time - read_network_time:.2f} seconds")
print(f"Prepare input data time: {prepare_input_data_time - load_network_time:.2f} seconds")
print(f"Infer time: {infer_time - prepare_input_data_time:.2f} seconds")

By using this approach, you can get detailed insights into the time taken for each step and perform targeted optimizations. This will help you better understand and resolve the issue of long latency times.

@Jessielovecodings

Jessielovecodings commented Jun 3, 2024

@cold-blue
Summary

  1. Break down the latency into specific steps to identify where the most time is being spent.
  2. Compare synchronous and asynchronous execution modes to understand their impact on total latency.
  3. Optimize the initialization, data preprocessing, and postprocessing steps to reduce overhead.
  4. Ensure measurement consistency by understanding the differences between the total time and detailed per-inference time.
