[Performance]: Latency is much longer than RealTime when using benchmark_app for profiling #24517

Open · 3 tasks done
cold-blue opened this issue May 15, 2024 · 5 comments
Labels: performance (Performance related topics), support_request

@cold-blue

OpenVINO Version

No response

Operating System

Windows System

Device used for inference

None

OpenVINO installation

PyPi

Programming Language

Python

Hardware Architecture

x86 (64 bits)

Model used

N/A

Model quantization

Yes

Target Platform

No response

Performance issue description

When using benchmark_app to profile a model, I find that the latency in report.csv is much longer than the total RealTime computed from detailed_counter_report.csv.
In report.csv:
[screenshot: reported latency]
In detailed_counter_report.csv:
[screenshot: per-layer RealTime counters]

Is this a normal phenomenon, and what is the reason for it?

Step-by-step reproduction

No response

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
@cold-blue cold-blue added performance Performance related topics support_request labels May 15, 2024
@dmitry-gorokhov
Contributor

dmitry-gorokhov commented May 17, 2024

Hi @cold-blue! Which device did you use for inference: CPU or GPU?

@Jessielovecodings

.take

Contributor

github-actions bot commented Jun 3, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@Jessielovecodings

Jessielovecodings commented Jun 3, 2024

Hi @cold-blue,
When you use benchmark_app to profile model performance and find that the latency reported in report.csv is much longer than the total RealTime computed from detailed_counter_report.csv, this can be due to the following reasons:

  1. Initial loading and initialization time:
    benchmark_app requires loading the model and performing some initialization operations before starting the evaluation. This process includes loading the model file, initializing the plugin, allocating memory, etc. These times are usually included in the total latency reported but are not included in the specific inference time statistics.

  2. Data preprocessing and postprocessing time:
    The total latency reported may include data preprocessing and postprocessing time, whereas detailed_counter_report.csv primarily records the actual execution time of the inference engine.

  3. Differences in measurement methods:
    Different timing methods can produce different numbers. For example, the latency in report.csv may be measured as wall-clock time around each inference request, while detailed_counter_report.csv may only record the time spent executing the layers inside the inference engine (a short sketch after this list illustrates the difference).

  4. Batch size and synchronous/asynchronous execution:
    If batch processing or asynchronous execution mode is used, the actual inference time statistics and the total latency time may differ because the asynchronous mode timing may include more synchronization operation time.
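
To make point 3 concrete, here is a minimal sketch of the difference (assuming the legacy openvino.inference_engine API used in the example further below, and hypothetical model.xml/model.bin files). It measures the wall-clock time of a single synchronous inference and compares it with the sum of the per-layer real_time counters, which are the values that feed detailed_counter_report.csv. The wall-clock number also contains request scheduling and synchronization overhead, so it is expected to be somewhat larger than the per-layer sum.

import time
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# Hypothetical model files -- replace with your own model.
net = ie.read_network(model="model.xml", weights="model.bin")
# PERF_COUNT enables the per-layer counters that benchmark_app writes to detailed_counter_report.csv.
exec_net = ie.load_network(network=net, device_name="CPU", config={"PERF_COUNT": "YES"})

input_name = next(iter(net.input_info))
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)

# Wall-clock latency of one synchronous inference request.
request = exec_net.requests[0]
start = time.time()
request.infer({input_name: dummy})
wall_clock_ms = (time.time() - start) * 1000

# Sum of the per-layer real_time counters (reported in microseconds) for the same request.
perf_counts = request.get_perf_counts()
layer_sum_ms = sum(counter["real_time"] for counter in perf_counts.values()) / 1000

print(f"Wall-clock latency:     {wall_clock_ms:.2f} ms")
print(f"Sum of layer real_time: {layer_sum_ms:.2f} ms")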

How to solve:

  1. Break down the time overhead:
    By adding logs or further analysis, break down the time overhead of each step, including model loading, initialization, data preprocessing, inference process, data postprocessing, etc. This will help identify which steps are more time-consuming.

  2. Compare execution time in different modes:
    Try running benchmark_app in both synchronous (-api sync) and asynchronous (-api async) modes and compare the execution times to analyze the reasons for the time differences (see the sketch after this list).

  3. Check and optimize the initialization process:
    If the initialization process is time-consuming, consider optimizing the model loading and initialization steps, such as reducing the model size, optimizing the model file, etc.

  4. Verify data processing steps:
    Ensure the efficiency of data preprocessing and postprocessing steps to avoid unnecessary time overhead. If the data processing steps are time-consuming, consider optimizing these steps.
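
To complement step 2, here is a minimal sketch (again assuming the legacy IECore API and hypothetical model files) that contrasts synchronous, one-at-a-time inference with asynchronous inference that keeps several requests in flight. With multiple parallel requests the overall throughput usually improves, but each individual request tends to take longer, which is one reason the latency reported in asynchronous (throughput) mode can look much larger than the per-layer RealTime sum.

import time
import numpy as np
from openvino.inference_engine import IECore

NUM_ITERS = 100
NUM_REQUESTS = 4  # number of parallel infer requests used in async mode

ie = IECore()
# Hypothetical model files -- replace with your own model.
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=NUM_REQUESTS)

input_name = next(iter(net.input_info))
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)

# Synchronous mode: one request at a time, average per-request latency.
request = exec_net.requests[0]
start = time.time()
for _ in range(NUM_ITERS):
    request.infer({input_name: dummy})
sync_avg_ms = (time.time() - start) / NUM_ITERS * 1000

# Asynchronous mode: keep NUM_REQUESTS requests in flight, measure total wall time.
start = time.time()
for i in range(NUM_ITERS):
    rid = i % NUM_REQUESTS
    if i >= NUM_REQUESTS:
        exec_net.requests[rid].wait()  # wait until this slot's previous job finishes
    exec_net.start_async(request_id=rid, inputs={input_name: dummy})
for r in exec_net.requests:
    r.wait()
async_wall_s = time.time() - start

print(f"Sync average latency: {sync_avg_ms:.2f} ms")
print(f"Async throughput:     {NUM_ITERS / async_wall_s:.2f} inferences/s over {async_wall_s:.2f} s")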

Here is example code (using the legacy openvino.inference_engine API) to break down the time overhead of each step:

import time
import numpy as np
from openvino.inference_engine import IECore

# Record start time
start_time = time.time()

# Create IE instance
ie = IECore()
ie_creation_time = time.time()

# Read model
net = ie.read_network(model="model.xml", weights="model.bin")
read_network_time = time.time()

# Load model to device
exec_net = ie.load_network(network=net, device_name="CPU")
load_network_time = time.time()

# Prepare input data (placeholder: a zero-filled tensor matching the first input's shape;
# replace with your actual data preparation steps)
input_blob_name = next(iter(net.input_info))
input_shape = net.input_info[input_blob_name].input_data.shape
input_data = np.zeros(input_shape, dtype=np.float32)
prepare_input_data_time = time.time()

# Perform inference
results = exec_net.infer(inputs={input_blob_name: input_data})
infer_time = time.time()

# Record total time
end_time = time.time()

# Output time statistics
print(f"Total time: {end_time - start_time:.2f} seconds")
print(f"IE creation time: {ie_creation_time - start_time:.2f} seconds")
print(f"Read network time: {read_network_time - ie_creation_time:.2f} seconds")
print(f"Load network time: {load_network_time - read_network_time:.2f} seconds")
print(f"Prepare input data time: {prepare_input_data_time - load_network_time:.2f} seconds")
print(f"Infer time: {infer_time - prepare_input_data_time:.2f} seconds")

By using this approach, you can get detailed insights into the time taken for each step and perform targeted optimizations. This will help you better understand and resolve the issue of long latency times.

@Jessielovecodings

Jessielovecodings commented Jun 3, 2024

@cold-blue
Summary

  1. Break down the latency into specific steps to identify where the most time is being spent.
  2. Compare synchronous and asynchronous execution modes to understand their impact on total latency.
  3. Optimize the initialization, data preprocessing, and postprocessing steps to reduce overhead.
  4. Ensure measurement consistency by understanding the differences between the total time and detailed per-inference time.
