-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OMPT] codeptr_ra
is pointing to runtime for certain scenarios
#92423
Comments
@llvm/issue-subscribers-openmp Author: Jan André Reuter (Thyre)
## Issue description
The OpenMP Tools Interface includes several callbacks for the host, which include a value called > The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the callback invocation. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL. In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time. Those can be broken down into the following categories:
I will present one example for each of these down below. The full reproducer can be found at the end of this issue. Taskloop constructTaskloop constructs cause the One can reproduce it with this example: #include <unistd.h>
int main( void )
{
#pragma omp taskloop
for( int i = 0; i < 5; ++i )
{
usleep(10);
}
} Result: $ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[...]
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660002 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660002
[task_schedule_cb] tid = 1 | prior_task_data = 6660002 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660003 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660003
[task_schedule_cb] tid = 1 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660004 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660004
[task_schedule_cb] tid = 1 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660005 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660005
[task_schedule_cb] tid = 1 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660006 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660006
[task_schedule_cb] tid = 1 | prior_task_data = 6660006 | prior_status = complete | next_task_data = 6660001
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1 Looking at Cancelling parallel regionsWhen a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code: #include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_ITERATIONS 1000
void run_openmp_cancel_example(void) {
#pragma omp parallel default(none)
{
float random_number;
for (int i = 0; i < NUM_ITERATIONS; ++i) {
#pragma omp cancellation point parallel
random_number = rand() / RAND_MAX;
if (random_number < 0.20) {
#pragma omp cancel parallel
}
}
}
if (omp_get_cancellation()) {
printf("Parallel got cancelled!\n");
}
}
int main(void) {
run_openmp_cancel_example();
return 0;
} $ OMP_NUM_THREADS=2 OMP_CANCELLATION=true ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit
[cancel_cb] tid = 1 | task_data = 6660002 | flags = activated | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[thread_begin_cb] tid = 2 | type = worker
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit
[cancel_cb] tid = 2 | task_data = 6660003 | flags = detected | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit
[parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = ./a.out:(null)
Parallel got cancelled!
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit
[thread_end_cb] tid = 2
[my_finalize_tool] tid = 1 Here, the Target regionsBoth target regions and data transfers ( int main( void )
{
int a[100];
#pragma omp target data map(to: a[:100])
{
#pragma omp target
{
a[0] = 1;
}
}
return 0;
} $ clang -fopenmp --offload-arch=native ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[device_load_cb] tid = 1
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_submit_emi_cb] tid = 1 | endpoint = begin
[target_submit_emi_cb] tid = 1 | endpoint = end
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1 Helper threadsWhen helper threads are active, their ReproducerTo reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via To reproduce the issue:
$ clang -fPIC -shared -fopenmp ompt-tool.c -o libompt-tool.so
$ export OMP_TOOL_LIBRARIES=$(pwd)/libompt-tool.so
I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550. |
Issue description
The OpenMP Tools Interface includes several callbacks for the host, which include a value called
codeptr_ra
. In the specifications, it is described like this (for example forompt_callback_parallel_begin
):In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time.
Those can be broken down into the following categories:
taskloop
constructtarget
constructI will present one example for each of these down below. The full reproducer can be found at the end of this issue.
Taskloop construct
Taskloop constructs cause the
work
andtask_create
callbacks to returnlibomp.so:__kmpc_taskloop
One can reproduce it with this example:
Result:
Looking at
__kmpc_taskloop
, it seems like the call to the tool returns the method called directly before viaOMPT_GET_RETURN_ADRESS
and not the user one.Cancelling parallel regions
When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code:
Here, the
codeptr_ra
points to__kmpc_cancel_barrier
which calls the correct barrier.Target regions
Both target regions and data transfers (
ompt_target_emi
/ompt_target_data_emi
) seem to incorrectly returnlibomptarget.so
for theircodeptr_ra
Helper threads
When helper threads are active, their
codeptr_ra
partially seem to point to incorrect positions in the LLVM runtime. I consider this okay in this certain scenario, since the parallel and masked region is generated by the runtime and not by the user. Here, one can question if these callbacks should even by dispatched to the tool.Here's a code to reproduce the issue:
The barriers, parallel and masked callbacks point to the runtime.
Reproducer
To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via
dladdr
. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library thecodeptr_ra
is from. I originally encountered the issue on several different systems where thecodeptr_ra
were resolved using libbfd.To reproduce the issue:
I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550.
The text was updated successfully, but these errors were encountered: