Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progress tracking interface could be better? #771

Open
saurabhnanda opened this issue Aug 16, 2020 · 13 comments · May be fixed by #777
Open

Progress tracking interface could be better? #771

saurabhnanda opened this issue Aug 16, 2020 · 13 comments · May be fixed by #777

Comments

@saurabhnanda
Copy link

saurabhnanda commented Aug 16, 2020

I'm writing a small terminal UI on top of shake, and want to display a progress bar, as demonstrated in this screenshot given below [1] However, depending on the frequency at which the IO Progress action is polled, and exactly when the progress-tracking thread is killed, I get one of two results:

  • skipped: 178, built: 1 (see footnote 2)

    Screen Shot 2020-08-16 at 6 15 21 PM

  • unknown: 178, built: 1

    Screen Shot 2020-08-16 at 6 14 22 PM

How does one get predictably notified of the of end of a build, along with the final stats? Why leave the polling to the library's user? Why doesn't shake take a polling frequency and callback function itself? And ensure the callback function is called one final time right after the build ends?

This could be related to #361, perhaps.

[1] ignore the graphical bar, I'm trying to figure out the numerical stats right now.

[2] I don't completely understand why there are two rules being built in certain runs, and only one rule in certain runs. I'm not touching anything on the filesystem, just changing the polling frequency.

@saurabhnanda
Copy link
Author

Similarly, I am unable to get anything in isFailure field of progress because presumably the progress-tracking thread is killed before the build failure can be communicated.

@ndmitchell
Copy link
Owner

Thanks for the question, and sorry it's taken so long to reply - personal stuff cropped up.

The polling and progress stuff was tacked on after the end of Shake. I wouldn't necessarily describe it as fit for purpose or an ideal API, but there's nothing stopping us from creating an alternative API that we provide to people in addition. In practice most people do pass a function to Shake using progressDisplay, which seems like the API you are proposing.

It might be possible to hack the kind of API you are asking on top of the API we have now? You can catch a threadKilled, and at that point request a final progress value, do whatever you want, and abort. Note that's how some of the builtin Shake stuff works:

progressDisplay :: Double -> (String -> IO ()) -> IO Progress -> IO ()

@saurabhnanda
Copy link
Author

Thanks for the question, and sorry it's taken so long to reply - personal stuff cropped up.

No problem at all. Thank you for so patiently replying to my question (hopefully, they make sense!), here, as well as on StackOverflow.

I noticed the progressDisplay function earlier, but its usage is fairly restricted because it simply gives you a String -> IO() hook, whereas I'd like Progress -> IO () hook.

I even looked at its internals and found it weird that it depended upon ThreadKilled to display the last message. One related question here: inside the ThreadKilled handler, would it be appropriate to poll the IO Progress action one last time? What is the guarantee that IO Progress is still in good shape, and won't itself throw an exception when called from within a ThreadKilled exception handler?

If you feel that the following is an acceptable extension to the external API (using the same internals as progressDisplay) - I can probably try putting together a PR:

-- possible an additional record for calculated/predicted timing that 
-- is not there in the current `Progress` record. if we can extend the
-- `Progress` record itself, then `Timing` is not required.
data Timing = Timing {timingPredicted :: Double, ...} 

-- what's a good name: progressProgress, progressHook, progressCallback?
progressCallbak :: Double -> ((Timing, Progress) -> IO ()) -> IO Progress -> IO ()

Btw, would Running for X sec be almost equal to countBuilt? Or will it diverge if there is a lot of administrative work being done compared to actual build work?

I'd like to reiterate, that using ThreadKilled as a signal for completion doesn't seem right to me, but I do not know the internals of Shake to take a strong stance on this.

@ndmitchell
Copy link
Owner

The API of using ThreadKilled to mean something is awful - I hate it. But it's what we've got for now. The options are to create a new API alongside it (a bit painful), replace it entirely (requires an API break) or build API's closer to what we want on top of it (cheap). My usual approach when we've made a mistake is to try and wrap it, and hopefully the wrappings will give us a good sense what the right API should have been, and eventually we fold that knowledge in to a better underlying API.

Polling IO Progress one last time during ThreadKilled probably works. I wouldn't guarantee it, but if it works today, I think we could arrange to keep it working in future - at least until we have a new API that makes doing things in ThreadKilled unnecessary.

The Progress record is meant to be things that are unambiguously true - taken from the database. The timing information is things that some heuristic calculates based on that information. I kept hoping someone else would write a timing record, in a better way, and I could use that. As such timing isn't suitable to put in progress, but defining a separate record for it isn't unreasonable.

How do you mean "Running for X sec" equal to "countBuilt"? The countBuilt is the number of nodes that have successfully finished building, its just different.

The kind of API you describe seems a reasonable wrapping, and I'm not adverse to that.

@saurabhnanda
Copy link
Author

How do you mean "Running for X sec" equal to "countBuilt"? The countBuilt is the number of nodes that have successfully finished building, its just different.

Sorry, I meant timeBuilt

@saurabhnanda
Copy link
Author

The Progress record is meant to be things that are unambiguously true - taken from the database. The timing information is things that some heuristic calculates based on that information. I kept hoping someone else would write a timing record, in a better way, and I could use that. As such timing isn't suitable to put in progress, but defining a separate record for it isn't unreasonable.

Apart from what progressDisplay is doing internally (wrt Mealy machine), is there a better way to do deal with a Timing record? Actual vs predicted timing for each rule, perhaps?

The kind of API you describe seems a reasonable wrapping, and I'm not adverse to that.

Should I attempt a PR?

@ndmitchell
Copy link
Owner

timeBuilt is a bit different. If you have completed 5 rules, and each individually took 10s, then you get timeBuilt as 50s. They might have run in parallel, and thus be higher than the current time. We might have spent 100s of a 120s rule, in which case timeBuilt doesn't yet show that. Happy to take a PR or suggestion on how to change the docs to make that clear.

I have no idea what better ways to deal with progressDisplay. Nowadays throwing it through an ML model with historical values might be interesting.

Please do make a PR. For the incidental decisions (like naming) just do whatever you think best, and we can discuss the details in the context of specific code. Sometimes the name is only apparent after you have written the Haddock docs. Sometimes you need the full signature. Sometimes something generic like progressCallback is just fine.

@saurabhnanda
Copy link
Author

I'm working on this today and I'm wondering whether the the API need to poll at all? For a long-running rule, Shake doesn't know the percentage completion of the rule. It simply knows whether the rule is pending, skipped, or completed. Can't the progress callback be called each time the state of a rule changes?

Basically tracking progress is a discrete/step function, not a continuous time-varying function, right?

@saurabhnanda saurabhnanda linked a pull request Sep 29, 2020 that will close this issue
@saurabhnanda
Copy link
Author

I've got a basic progressTracker :: Double -> (Progress -> IO ()) -> IO Progress -> IO () working in #777 and the ThreadKilled hack is working fine. However, even after the build finishes, I'm getting the following numbers, which never amount to 100% completion: (fromIntegral $ countSkipped + countBuilt) / (fromIntegral $ countTodo + countUnknown + countSkipped + countBuilt)

Screen Shot 2020-09-29 at 3 46 30 PM

@saurabhnanda
Copy link
Author

@ndmitchell bump.

@ndmitchell
Copy link
Owner

There's no guarantee that after things complete you have built everything that was known about. For example, you might have built release/bin last time, and this time you build asking for debug/bin, which I think will leave countUnknown higher. If it's looking close though I'd write a hack and make it skip to 100% at the end.

@ndmitchell
Copy link
Owner

Basically tracking progress is a discrete/step function, not a continuous time-varying function, right?

Yes. But I was worried about the performance implication. It takes O(n) to scan the database, since it just walks the list of keys. If we did that on every change, we'd probably scale O(n^2). However, if we had built this in from the beginning we'd probably tie changing the progress structure to the updates, so each progress change would take O(1) and then streaming it immediately wouldn't be unreasonable. So I guess its mostly because progress was bolted on after, rather than designed in from the beginning.

@saurabhnanda
Copy link
Author

There's no guarantee that after things complete you have built everything that was known about. For example, you might have built release/bin last time, and this time you build asking for debug/bin, which I think will leave countUnknown higher. If it's looking close though I'd write a hack and make it skip to 100% at the end.

That sounds like skipped. Basically, shake knows what rules it has to, and doesn't have to, build in each run, right? What's unknown about it? I'm probably getting confused by the naming. Does this number change between the begining of the run and end of the run, or does it stay constant throughout?

Or are you saying that al rules starts
in the unknown bucket, and as the build progresses, each rule is re-bucketed into todo, skipped, and built? And due to the way it's written, a bunch of rules end-up hanging around in the unknown bucket even after the build is over?

So I guess its mostly because progress was bolted on after, rather than designed in from the beginning.

I haven't looked at the internals, but this shouldn't be that hard, right? There would be a few common functions that get called each time a rule is evaluated/started/finished. We would need to fire a registered callback at those places. Doesn't seem very different from shakeTrace. Should I attempt this? Are you open to such a change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants