Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OMPT] codeptr_ra is pointing to runtime for certain scenarios #92423

Open
Thyre opened this issue May 16, 2024 · 1 comment
Open

[OMPT] codeptr_ra is pointing to runtime for certain scenarios #92423

Thyre opened this issue May 16, 2024 · 1 comment
Labels

Comments

@Thyre
Copy link

Thyre commented May 16, 2024

Issue description

The OpenMP Tools Interface includes several callbacks for the host, which include a value called codeptr_ra. In the specifications, it is described like this (for example for ompt_callback_parallel_begin):

The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the callback invocation. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.

In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time.

Those can be broken down into the following categories:

  • The taskloop construct
  • Cancelling parallel regions
  • In some cases, the target construct
  • Helper threads in the LLVM runtime

I will present one example for each of these down below. The full reproducer can be found at the end of this issue.

Taskloop construct

Taskloop constructs cause the work andtask_create callbacks to return libomp.so:__kmpc_taskloop

One can reproduce it with this example:

#include <unistd.h>

int main( void )
{
    #pragma omp taskloop
    for( int i = 0; i < 5; ++i )
    {
	usleep(10);
    }
}

Result:

$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[...]
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660002 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660002
[task_schedule_cb] tid = 1 | prior_task_data = 6660002 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660003 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660003
[task_schedule_cb] tid = 1 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660004 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660004
[task_schedule_cb] tid = 1 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660005 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660005
[task_schedule_cb] tid = 1 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660006 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660006
[task_schedule_cb] tid = 1 | prior_task_data = 6660006 | prior_status = complete | next_task_data = 6660001
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Looking at __kmpc_taskloop, it seems like the call to the tool returns the method called directly before via OMPT_GET_RETURN_ADRESS and not the user one.

Cancelling parallel regions

When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_ITERATIONS 1000

void run_openmp_cancel_example(void) {

#pragma omp parallel default(none)
    {
        float random_number;
        for (int i = 0; i < NUM_ITERATIONS; ++i) {
#pragma omp cancellation point parallel
            random_number = rand() / RAND_MAX;
            if (random_number < 0.20) {
#pragma omp cancel parallel
            }
        }
    }

    if (omp_get_cancellation()) {
        printf("Parallel got cancelled!\n");
    }
}

int main(void) {
    run_openmp_cancel_example();
    return 0;
}
$ OMP_NUM_THREADS=2 OMP_CANCELLATION=true ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit
[cancel_cb] tid = 1 | task_data = 6660002 | flags = activated | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[thread_begin_cb] tid = 2 | type = worker
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit
[cancel_cb] tid = 2 | task_data = 6660003 | flags = detected | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit
[parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = ./a.out:(null)
Parallel got cancelled!
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit
[thread_end_cb] tid = 2
[my_finalize_tool] tid = 1

Here, the codeptr_ra points to __kmpc_cancel_barrier which calls the correct barrier.

Target regions

Both target regions and data transfers (ompt_target_emi / ompt_target_data_emi) seem to incorrectly return libomptarget.so for their codeptr_ra

int main( void )
{
    int a[100];
    #pragma omp target data map(to: a[:100])
    {
	#pragma omp target
	{
	    a[0] = 1;
	}
    }

    return 0;
}
$ clang -fopenmp --offload-arch=native ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[device_load_cb] tid = 1
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_submit_emi_cb] tid = 1 | endpoint = begin
[target_submit_emi_cb] tid = 1 | endpoint = end
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Helper threads

When helper threads are active, their codeptr_ra partially seem to point to incorrect positions in the LLVM runtime. I consider this okay in this certain scenario, since the parallel and masked region is generated by the runtime and not by the user. Here, one can question if these callbacks should even by dispatched to the tool.

Here's a code to reproduce the issue:

int main( void )
{
    int a;

    #pragma omp target nowait map(tofrom: a)
    {
        a = 0;
    }

    return a;
}
$ clang -fopenmp --offload-arch=native ./test.c
$ ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[parallel_begin_cb] tid = -1 | parallel_data = 7770001 | WARNING encountering_task_data = 0 | flags = invoker_runtime_team | requested_parallelism = 8 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[thread_begin_cb] tid = 2 | type = worker
[thread_begin_cb] tid = 3 | type = worker
[thread_begin_cb] tid = 4 | type = worker
[thread_begin_cb] tid = 5 | type = worker
[thread_begin_cb] tid = 6 | type = worker
[thread_begin_cb] tid = 7 | type = worker
[implicit_task_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 8 | index = 0 | flags = implicit
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 8 | index = 3 | flags = implicit
[implicit_task_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | actual_parallelism = 8 | index = 2 | flags = implicit
[implicit_task_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | actual_parallelism = 8 | index = 4 | flags = implicit
[implicit_task_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | actual_parallelism = 8 | index = 6 | flags = implicit
[implicit_task_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | actual_parallelism = 8 | index = 1 | flags = implicit
[thread_begin_cb] tid = 8 | type = worker
[implicit_task_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | actual_parallelism = 8 | index = 5 | flags = implicit
[implicit_task_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | actual_parallelism = 8 | index = 7 | flags = implicit
[masked_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660010 | flags = explicit_untied | has_dependences = 0 | codeptr_ra = ./a.out:(null)
[task_schedule_cb] tid = 2 | prior_task_data = 6660003 | prior_status = switch | next_task_data = 6660010
[sync_region_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[device_load_cb] tid = 2
[target_emi_cb] tid = 2 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660010 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_data_op_emi_cb] tid = 2 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffcc3c79518 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 4 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[sync_region_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[target_data_op_emi_cb] tid = 2 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffcc3c79518 | dest_addr = 0x7ee402600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 4 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[sync_region_wait_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)

The barriers, parallel and masked callbacks point to the runtime.

Reproducer

To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via dladdr. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library the codeptr_ra is from. I originally encountered the issue on several different systems where the codeptr_ra were resolved using libbfd.

To reproduce the issue:

  1. Build the OpenMP Tools Interface:
$ clang -fPIC -shared -fopenmp ompt-tool.c -o libompt-tool.so
$ export OMP_TOOL_LIBRARIES=$(pwd)/libompt-tool.so
  1. Compile each of the applications and run them

I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550.

@llvmbot
Copy link
Collaborator

llvmbot commented May 16, 2024

@llvm/issue-subscribers-openmp

Author: Jan André Reuter (Thyre)

## Issue description

The OpenMP Tools Interface includes several callbacks for the host, which include a value called codeptr_ra. In the specifications, it is described like this (for example for ompt_callback_parallel_begin):

> The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the callback invocation. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.

In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time.

Those can be broken down into the following categories:

  • The taskloop construct
  • Cancelling parallel regions
  • In some cases, the target construct
  • Helper threads in the LLVM runtime

I will present one example for each of these down below. The full reproducer can be found at the end of this issue.

Taskloop construct

Taskloop constructs cause the work andtask_create callbacks to return libomp.so:__kmpc_taskloop

One can reproduce it with this example:

#include &lt;unistd.h&gt;

int main( void )
{
    #pragma omp taskloop
    for( int i = 0; i &lt; 5; ++i )
    {
	usleep(10);
    }
}

Result:

$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[...]
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660002 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660002
[task_schedule_cb] tid = 1 | prior_task_data = 6660002 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660003 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660003
[task_schedule_cb] tid = 1 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660004 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660004
[task_schedule_cb] tid = 1 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660005 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660005
[task_schedule_cb] tid = 1 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660006 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660006
[task_schedule_cb] tid = 1 | prior_task_data = 6660006 | prior_status = complete | next_task_data = 6660001
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Looking at __kmpc_taskloop, it seems like the call to the tool returns the method called directly before via OMPT_GET_RETURN_ADRESS and not the user one.

Cancelling parallel regions

When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code:

#include &lt;omp.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

#define NUM_ITERATIONS 1000

void run_openmp_cancel_example(void) {

#pragma omp parallel default(none)
    {
        float random_number;
        for (int i = 0; i &lt; NUM_ITERATIONS; ++i) {
#pragma omp cancellation point parallel
            random_number = rand() / RAND_MAX;
            if (random_number &lt; 0.20) {
#pragma omp cancel parallel
            }
        }
    }

    if (omp_get_cancellation()) {
        printf("Parallel got cancelled!\n");
    }
}

int main(void) {
    run_openmp_cancel_example();
    return 0;
}
$ OMP_NUM_THREADS=2 OMP_CANCELLATION=true ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit
[cancel_cb] tid = 1 | task_data = 6660002 | flags = activated | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[thread_begin_cb] tid = 2 | type = worker
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit
[cancel_cb] tid = 2 | task_data = 6660003 | flags = detected | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit
[parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = ./a.out:(null)
Parallel got cancelled!
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit
[thread_end_cb] tid = 2
[my_finalize_tool] tid = 1

Here, the codeptr_ra points to __kmpc_cancel_barrier which calls the correct barrier.

Target regions

Both target regions and data transfers (ompt_target_emi / ompt_target_data_emi) seem to incorrectly return libomptarget.so for their codeptr_ra

int main( void )
{
    int a[100];
    #pragma omp target data map(to: a[:100])
    {
	#pragma omp target
	{
	    a[0] = 1;
	}
    }

    return 0;
}
$ clang -fopenmp --offload-arch=native ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[device_load_cb] tid = 1
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_submit_emi_cb] tid = 1 | endpoint = begin
[target_submit_emi_cb] tid = 1 | endpoint = end
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Helper threads

When helper threads are active, their codeptr_ra partially seem to point to incorrect positions in the LLVM runtime. I unfortunately don't have a reproducer ready for this one, but will update the issue.

Reproducer

To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via dladdr. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library the codeptr_ra is from. I originally encountered the issue on several different systems where the codeptr_ra were resolved using libbfd.

To reproduce the issue:

  1. Build the OpenMP Tools Interface:
$ clang -fPIC -shared -fopenmp ompt-tool.c -o libompt-tool.so
$ export OMP_TOOL_LIBRARIES=$(pwd)/libompt-tool.so
  1. Compile each of the applications and run them

I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants