Store traces in a tree form rather than a list so span can be calcula… #658

spall · 2019-03-19T00:32:25Z

I added the TTree, TForest, PtTree, PtForest data types to maintain the ordering and relationship of the traced commands to accurately calculate span. The original list is also maintained, although maybe calculating it when necessary is better.

Additionally to calculating span I think this could be useful for future build system analysis.

I look forward to your comment on if you think this is worthwhile or if there is a better way to implement it.

Note: this did pass the tests on my Ubuntu machine

…ted.

ndmitchell · 2019-03-19T09:52:02Z

Thanks for the PR @spall. What's the desire behind this PR? What is the advantage of storing traces in a tree? What can we do now?

spall · 2019-03-19T15:20:32Z

Hi Neil, I'm working on a project to understand and improve build system parallelism. While adding a calculation of the span (critical path) of a Shake build, I realized that Shake doesn't maintain enough information to calculate this accurately, in particular when commands in a rule are specified to run in parallel. By storing the commands in a tree rather than a list, we would have enough information to calculate this accurately. I have just seen your blog post from yesterday and it appears you are thinking about the same issues. I see that your profiler page in the blog post is stating a "precise critical path", how have you calculated this with the existing structure? Or was this done with code that isn't yet on the master branch? Sarah

…

On Tue, Mar 19, 2019 at 5:52 AM Neil Mitchell ***@***.***> wrote: Thanks for the PR @spall <https://github.com/spall>. What's the desire behind this PR? What is the advantage of storing traces in a tree? What can we do now? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#658 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABUkEua3O5DzWLvMnUY7tjxMwGCtCQ25ks5vYLNEgaJpZM4b7Cbr> .

ndmitchell · 2019-03-20T12:11:06Z

Cool project. Yes, I've been investigating things, writeup at https://neilmitchell.blogspot.com/2019/03/ghc-rebuild-times-shake-profiling.html, and everything that was used to do that is entirely public. The actual code that does it is in html/ts - in particular https://github.com/ndmitchell/shake/blob/master/html/ts/reports/summary.tsx#L44-L136.

My guess is that most build systems don't call parallel, e.g. it's entirely absent in Hadrian, so it won't have a significant effect. As a result, I'm not sure that storing the detail at that level of granularity helps. I also based my calculations on the depends field and the execution time - rather assuming (as a slight simplification) that the build proceeds a a series of need followed by the bulk of the execution. To get my precise calculation I did need to extend depends from being a set of dependencies to a list of sets of dependencies as it really is in the core of Shake. Previously profiling blurred that information away.

Is this project meant to be a significant part of your PhD? Or more just a side project? Is the Shake aspect an important one? If it's going to be a big part, it may be worth us having a video-chat to figure out what you're trying to do and seeing how it all fits together. I've just finished a big bit of work on profiling, but I'm not particularly planning to do much work on it for a while now.

spall · 2019-03-20T15:27:00Z

My guess is that most build systems don't call parallel, e.g. it's entirely absent in Hadrian, so it won't have a significant effect. As a result, I'm not sure that storing the detail at that level of granularity helps

Why don't build systems use parallel?

To get my precise calculation I did need to extend depends from being a set of dependencies to a list of sets of dependencies as it really is in the core of Shake. Previously profiling blurred that information away.

I noticed this; I had actually made this change myself but when I fetched your latest code to rebase mine on top of I saw that you had also changed this. Is this project meant to be a significant part of your PhD? Or more just a

side project? Is the Shake aspect an important one? If it's going to be a big part, it may be worth us having a video-chat to figure out what you're trying to do and seeing how it all fits together. I've just finished a big bit of work on profiling, but I'm not particularly planning to do much work on it for a while now.

I am going to send you an email about this. Sarah

…

On Wed, Mar 20, 2019 at 8:11 AM Neil Mitchell ***@***.***> wrote: Cool project. Yes, I've been investigating things, writeup at https://neilmitchell.blogspot.com/2019/03/ghc-rebuild-times-shake-profiling.html, and everything that was used to do that is entirely public. The actual code that does it is in html/ts - in particular https://github.com/ndmitchell/shake/blob/master/html/ts/reports/summary.tsx#L44-L136 . My guess is that most build systems don't call parallel, e.g. it's entirely absent in Hadrian, so it won't have a significant effect. As a result, I'm not sure that storing the detail at that level of granularity helps. I also based my calculations on the depends field and the execution time - rather assuming (as a slight simplification) that the build proceeds a a series of need followed by the bulk of the execution. To get my precise calculation I did need to extend depends from being a set of dependencies to a list of sets of dependencies as it really is in the core of Shake. Previously profiling blurred that information away. Is this project meant to be a significant part of your PhD? Or more just a side project? Is the Shake aspect an important one? If it's going to be a big part, it may be worth us having a video-chat to figure out what you're trying to do and seeing how it all fits together. I've just finished a big bit of work on profiling, but I'm not particularly planning to do much work on it for a while now. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#658 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABUkEpdjqX1iqYjmJfbU_a3sEo_zoAyDks5vYiVagaJpZM4b7Cbr> .

ndmitchell · 2019-03-27T07:29:53Z

Shake has need which demands a number of dependencies in parallel. It encodes that in the database with the dependencies being an ordered list of sets. As a result, very few people use explicit parallel.

Store traces in a tree form rather than a list so span can be calcula…

12d781c

…ted.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store traces in a tree form rather than a list so span can be calcula… #658

Store traces in a tree form rather than a list so span can be calcula… #658

spall commented Mar 19, 2019 •

edited

ndmitchell commented Mar 19, 2019

spall commented Mar 19, 2019 via email

ndmitchell commented Mar 20, 2019

spall commented Mar 20, 2019 via email

ndmitchell commented Mar 27, 2019

Store traces in a tree form rather than a list so span can be calcula… #658

Are you sure you want to change the base?

Store traces in a tree form rather than a list so span can be calcula… #658

Conversation

spall commented Mar 19, 2019 • edited

ndmitchell commented Mar 19, 2019

spall commented Mar 19, 2019 via email

ndmitchell commented Mar 20, 2019

spall commented Mar 20, 2019 via email

ndmitchell commented Mar 27, 2019

spall commented Mar 19, 2019 •

edited