Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tprof can crash with {bad_generator, undefined} #8483

Open
mkuratczyk opened this issue May 15, 2024 · 7 comments · May be fixed by #8547
Open

tprof can crash with {bad_generator, undefined} #8483

mkuratczyk opened this issue May 15, 2024 · 7 comments · May be fixed by #8547
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@mkuratczyk
Copy link
Contributor

mkuratczyk commented May 15, 2024

Describe the bug
tprof can crash with the following error:

=ERROR REPORT==== 15-May-2024::18:32:00.203068 ===                                                                                                                                                                                                               
** Generic server tprof terminating                                                                                                                                                                                                                              
** Last message in was collect                                                                                                                                                                                                                                   
** When Server state == {tprof_state,{#Ref<0.1502541203.1629093889.118795>,                                                                                                                                                                                      
                                      {tprof,15}},                                                                                                                                                                                                               
                                     call_memory,all,false,undefined}                                                                                                                                                                                            
** Reason for termination ==                                                                                                                                                                                                                                     
** {{bad_generator,undefined},                                                                                                                                                                                                                                   
    [{tprof,'-combine_trace/1-lc$^1/1-1-',1,[]},                                                                                                                                                                                                                 
     {tprof,combine_trace,1,[{file,"tprof.erl"},{line,1050}]},                                                                                                                                                                                                   
     {tprof,'-collect_trace/5-fun-0-',5,[{file,"tprof.erl"},{line,1017}]},                                                                                                                                                                                       
     {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},                                                                                                                                                                                                           
     {tprof,collect_trace,5,[{file,"tprof.erl"},{line,1015}]},                                                                                                                                                                                                   
     {maps,fold_1,4,[{file,"maps.erl"},{line,860}]},                                                                                                                                                                                                             
     {tprof,collect,3,[{file,"tprof.erl"},{line,1360}]},                                                                                                                                                                                                         
     {tprof,handle_call,3,[{file,"tprof.erl"},{line,976}]}]}                                                                                                                                                                                                     
** Client <0.251.0> stacktrace                                                                                                                                                                                                                                   
** [{gen,do_call,4,[{file,"gen.erl"},{line,241}]},                                                                                                                                                                                                               
    {gen_server,call,3,[{file,"gen_server.erl"},{line,1122}]},                                                                                                                                                                                                   
    {erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,904}]},                                                                                                                                                                                                    
    {erl_eval,expr,6,[{file,"erl_eval.erl"},{line,648}]},                                                                                                                                                                                                        
    {erl_eval,exprs,6,[{file,"erl_eval.erl"},{line,271}]},                                                                                                                                                                                                       
    {erl_eval,eval_lc1,7,[{file,"erl_eval.erl"},{line,929}]},                                                                                                                                                                                                    
    {erl_eval,eval_generate,8,[{file,"erl_eval.erl"},{line,1003}]},                                                                                                                                                                                              
    {erl_eval,eval_lc,7,[{file,"erl_eval.erl"},{line,917}]}]                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                 
=CRASH REPORT==== 15-May-2024::18:32:00.203723 ===                                                                                                                                                                                                               
  crasher:                                                                                                                                                                                                                                                       
    initial call: tprof:init/1                                                                                                                                                                                                                                   
    pid: <0.287.0>                                                                                                                                                                                                                                               
    registered_name: tprof                                                                                                                                                                                                                                       
    exception error: bad generator undefined                                                                                                                                                                                                                     
      in function  tprof:'-combine_trace/1-lc$^1/1-1-'/1                                                                                                                                                                                                         
      in call from tprof:combine_trace/1 (tprof.erl, line 1050)                                                                                                                                                                                                  
      in call from tprof:'-collect_trace/5-fun-0-'/5 (tprof.erl, line 1017)                                                                                                                                                                                      
      in call from lists:foldl/3 (lists.erl, line 2146)                                                                                                                                                                                                          
      in call from tprof:collect_trace/5 (tprof.erl, line 1015)                                                                                                                                                                                                  
      in call from maps:fold_1/4 (maps.erl, line 860)                                                                                                                                                                                                            
      in call from tprof:collect/3 (tprof.erl, line 1360)                                                                                                                                                                                                        
      in call from tprof:handle_call/3 (tprof.erl, line 976)     

To Reproduce
I don't have a deterministic way to reproduce. However, using the following loops I can trigger it within seconds.

I'm using https://github.com/max-au/erlperf just to easily generate some activity

  1. rebar3 shell
[erlperf:run({lists, seq, [1, 12]}, #{warmup => 1}, #{max => 50}) || _ <- lists:seq(1, 100)].
  1. In another shell:
[begin tprof:start(#{type => call_memory}), tprof:set_pattern('_', '_' , '_'), tprof:enable_trace({all_children, kernel_sup}), timer:sleep(floor(random:uniform() * 1000)), F = tprof:collect(), tprof:format(tprof:inspect(F)), tprof:stop(),
 f(F) end || _ <- lists:seq(1, 100)].

While this example is somewhat artificial, I suspect the problem is timing related and therefore short lived processes and short tprof captures make it easy to trigger it. Originally I hit this issue under more usual circumstances.

Affected versions
Current master (I'm running d15e18e, from May 9th exactly)

@mkuratczyk mkuratczyk added the bug Issue is reported as a bug label May 15, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label May 20, 2024
@garazdawi
Copy link
Contributor

I cannot reproduce it with your example using 27.0, can you check if you still can? What I did was:

$ git clone erlperf
$ cd erlperf
$ rebar3 shell
Erlang/OTP 27 [erts-15.0] [source-ef44167350] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]
1> [erlperf:run({lists, seq, [1, 12]}, #{warmup => 1}, #{max => 50}) || _ <- lists:seq(1, 100)].
Ctrl+G
User switch command (type h for help)
 --> s

 --> c

Eshell V15.0 (press Ctrl+G to abort, type help(). for help)
1> [begin tprof:start(#{type => call_memory}), tprof:set_pattern('_', '_' , '_'), tprof:enable_trace({all_children, kernel_sup}), timer:sleep(floor(random:uniform() * 1000)), F = tprof:collect(), tprof:format(tprof:inspect(F)), tprof:stop(), f(F) end || _ <- lists:seq(1, 100)].

@mkuratczyk
Copy link
Contributor Author

Interesting, I can very easily reproduce it on macOS using 27.0 but I can't on Linux.

on macOS it just took a second or so to get this:

=CRASH REPORT==== 3-Jun-2024::14:10:41.913107 ===
  crasher:
    initial call: tprof:init/1
    pid: <0.386.0>
    registered_name: tprof
    exception error: bad generator undefined
      in function  tprof:'-collect_trace/5-fun-0-'/5 (tprof.erl, line 1016)
      in call from tprof:combine_trace/1 (tprof.erl, line 1050)
      in call from tprof:'-collect_trace/5-fun-0-'/5 (tprof.erl, line 1017)
      in call from lists:foldl/3 (lists.erl, line 2146)
      in call from tprof:collect_trace/5 (tprof.erl, line 1015)
      in call from maps:fold_1/4 (maps.erl, line 860)
      in call from tprof:collect/3 (tprof.erl, line 1360)
      in call from tprof:handle_call/3 (tprof.erl, line 976)
    ancestors: [<0.379.0>,<0.378.0>,<0.376.0>,<0.65.0>,<0.69.0>,<0.64.0>,
                  kernel_sup,<0.47.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 46422
    stack_size: 29
    reductions: 253820
  neighbours:

** exception exit: {{{bad_generator,undefined},
                     [{tprof,'-collect_trace/5-fun-0-',5,
                             [{file,"tprof.erl"},{line,1016}]},
                      {tprof,combine_trace,1,[{file,"tprof.erl"},{line,1050}]},
                      {tprof,'-collect_trace/5-fun-0-',5,
                             [{file,"tprof.erl"},{line,1017}]},
                      {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
                      {tprof,collect_trace,5,[{file,"tprof.erl"},{line,1015}]},
                      {maps,fold_1,4,[{file,"maps.erl"},{line,860}]},
                      {tprof,collect,3,[{file,"tprof.erl"},{line,1360}]},
                      {tprof,handle_call,3,[{file,"tprof.erl"},{line,976}]}]},
                    {gen_server,call,[tprof,collect,infinity]}}
     in function  gen_server:call/3 (gen_server.erl, line 1126)

@garazdawi
Copy link
Contributor

How odd, guess I'll have to switch to a mac to debug further.

@garazdawi
Copy link
Contributor

Can't reproduce on my Intel mac.... I assume you are running an arm machine?

@mkuratczyk
Copy link
Contributor Author

Yes, M1Pro

@max-au
Copy link
Contributor

max-au commented Jun 4, 2024

This happens when trace:info returns undefined here: https://github.com/erlang/otp/blob/master/lib/tools/src/tprof.erl#L1017

It might be erlperf specific interaction. The way it works, erlperf generates source code for a new module. You can view the generated code of that module if you rin erlperf in the verbose mode - erlperf -v. Then it compiles the module, loads it, and enables call_count tracing on the function of that module: https://github.com/max-au/erlperf/blob/master/src/erlperf_job.erl#L383
Then, to take samples, it calls trace_info: https://github.com/max-au/erlperf/blob/master/src/erlperf_job.erl#L297

(note: since erlperf works on older OTP, starting with 22, it is not yet using multiple trace session feature of OTP 27, but I open a new GitHub issue to track and implement that)

Update: I managed to reproduce this on Linux.
Indeed, trace:info returns undefined for this call:

trace:info({#Ref<0.3620025655.329908228.236674>, {tprof,28}}, {job_149002_5,job_149002_5,3}, 3)

This can, for example, happen if code:all_loaded had job_149002_5 in the output, but module was unloaded before making module_info call. That said, I think I can make a PR fixing that:

iff --git a/lib/tools/src/tprof.erl b/lib/tools/src/tprof.erl
index 24cadf9699..a7aabc91a4 100644
--- a/lib/tools/src/tprof.erl
+++ b/lib/tools/src/tprof.erl
@@ -1037,6 +1037,7 @@ collect_trace(Session, Mod, FunList, Acc, Type) ->
 %% statistics won't be correct anyway. Hence the warning in the user
 %% guide, guarding against hot code reload while tracing.
 combine_trace({_, false}) -> fail;
+combine_trace({_, undefined}) -> skip;
 combine_trace({call_count, 0}) -> skip;
 combine_trace({call_count, Num}) -> [{all, Num, Num}];
 combine_trace({call_time, Times}) ->

I can make a PR, but let me think on a good test case causing that specific race (module unloaded in between of code:all_loaded and trace:info calls).

@garazdawi
Copy link
Contributor

aha, that would explain it. A PR would be most welcome even if you cannot come up with a good testcase.

@max-au max-au linked a pull request Jun 5, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants