Staged builtin rules #799

pepeiborra · 2021-03-13T16:26:05Z

This is an alternative solution to #751 implementing a no fork solution. Shake cannot possibly know whether a built-in rule is trivial or not, but the rule itself does!

This change exposes a richer type for BuiltinRun so that rule creators can decide whether to execute in-place or fork the work. The ability to run in place is only available when the runChanged is ChangedNothing, but this is a bit arbitrary and can be changed.

For example, in https://github.com/pepeiborra/ide/tree/staged-shake-rules we run Shake rules in place when the dependencies have not changed and we have a previous result at hand.

To make this work, I had to change the Lock implementation to allow for reentrancy.

This is mostly a RFC in order to get your thoughts. Is this safe? Can Action computations deadlock when executed in-place?

I will follow up with some numbers and tests soon

pepeiborra · 2021-03-13T21:54:57Z

I have further restricted the outer stage to be limited to IO instead of Action, to limit what can be done. The ghcide test suite passes all tests, but the benchmarks hang and I haven't investigated why. The Shake test suite fails in the "docs" example with a missing package error.

The improvement in sigma-ide is around 20% for the edit benchmark

ndmitchell · 2021-03-13T23:03:26Z

I suggest just commenting out the docs test while playing around. It's not very interesting for correctness.

It's an interesting idea, and I'll take a proper look tomorrow.

pepeiborra · 2021-03-14T08:09:43Z

I found the issue that was causing the ghcide benchmarks to hang: asyncrhonous exceptions. Raised basvandijk/concurrent-extra#20 and worked around it by defining RLock locally. Now ghcide no longer hangs

pepeiborra · 2021-03-14T09:09:53Z

The ghcide branch at https://github.com/pepeiborra/ide/tree/staged-shake-rules runs Shake rules in place when the dependencies have not changed and we have a previous result at hand.
The Cabal-3.0 example shows a 20%-26% decrease in allocations for some experiments (ignore the timings since experiments were done in a laptop with a throttling CPU):

version    name                             success   samples   startup              setup         userTime              delayedTime             totalTime             maxResidency   allocatedBytes
upstream   edit                             True      50        12.758429603000002   0.0           13.809277662000003    1.7339468e-2            13.838109315          214MB          36083MB
HEAD       edit                             True      50        11.655350757         0.0           11.87623591           1.7398344e-2            11.901152736          215MB          27979MB
upstream   hover                            True      50        11.047497201         0.0           0.11313416800000001   2.0759009e-2            0.14015725            194MB          5819MB
HEAD       hover                            True      50        14.597898737000001   0.0           0.122381316           2.3133136999999998e-2   0.152482855           194MB          5793MB
upstream   hover after edit                 True      50        13.608562172000001   0.0           0.6306581809999997    11.666261022            12.305062839000001    216MB          35509MB
HEAD       hover after edit                 True      50        13.418342930000001   0.0           0.608596368           11.811360499999997      12.427427936          216MB          27481MB
upstream   getDefinition                    True      50        12.765654765         0.0           0.15850191800000002   1.1992210000000003e-2   0.176914279           193MB          5732MB
HEAD       getDefinition                    True      50        12.907939635         0.0           0.15592997000000006   1.1432047999999998e-2   0.17337439300000002   192MB          5707MB
upstream   getDefinition after edit         True      50        12.620630689         0.0           0.5661279820000001    9.801615612             10.375834127000001    216MB          32860MB
HEAD       getDefinition after edit         True      50        13.764577946000001   0.0           0.829668213           11.537515337000004      12.374948022000002    216MB          24589MB
upstream   completions                      True      50        11.248367641000002   0.0           1.4015383150000005    1.7550087000000002e-2   1.4266624170000002    201MB          10240MB
HEAD       completions                      True      50        13.531454276000002   0.0           1.5824760530000002    1.9800839e-2            1.6127718210000002    202MB          10210MB
upstream   completions after edit           True      50        10.904532649         0.0           2.7622176930000006    8.120129725999998       10.893974960000001    231MB          38019MB
HEAD       completions after edit           True      50        12.935896892         0.0           2.424618349           9.411634159000004       11.844739245000001    231MB          29828MB
upstream   code actions                     True      50        10.940668545000001   0.249339101   0.3249442810000001    1.5338034999999996e-2   0.346122486           218MB          6602MB
HEAD       code actions                     True      50        13.258537115000001   0.242772479   0.4280841890000001    1.7638289e-2            0.45270955500000004   215MB          6422MB
upstream   code actions after edit          True      50        13.380240415000001   0.0           10.067724718000001    1.7597535000000008e-2   10.092605811          242MB          28646MB
HEAD       code actions after edit          True      50        13.470754748000001   0.0           10.576734214000002    1.6266468000000003e-2   10.599568782          241MB          21546MB

The RLock implementation in concurrent-extra is broken basvandijk/concurrent-extra#20

It is possible to return Now for rules that do not have a second stage

pepeiborra · 2021-03-16T10:39:42Z

One problem with this change is that we run the stage 1 with Locked and this hurts parallelism a bit.

pepeiborra · 2021-05-22T09:11:45Z

I have pushed a more efficient reentrant lock - the previous version had significant contention overheads.

pepeiborra added 11 commits March 14, 2021 10:53

staged builtin rules

0153b00

fix missing dependency

e538c9d

backwards compat. and tests

c605980

restrict the outer stage to IO

3bc6776

remove commented out code

15d164b

fix typo

0d7424e

undo redundant import

264b4fa

Define a reentrant lock

b5cc0d9

The RLock implementation in concurrent-extra is broken basvandijk/concurrent-extra#20

Fix not used warning

7dadd2b

Fix redundant imports

8fdbcfd

refactor runKey to return a Wait instead of a plain Capture

d50fa80

It is possible to return Now for rules that do not have a second stage

pepeiborra force-pushed the staged-builtin-rules branch from 66bd390 to 8fdbcfd Compare March 14, 2021 18:19

use pure

8e446f1

Much more efficient reentrant lock

a47a99d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Staged builtin rules #799

Staged builtin rules #799

pepeiborra commented Mar 13, 2021 •

edited

pepeiborra commented Mar 13, 2021 •

edited

ndmitchell commented Mar 13, 2021

pepeiborra commented Mar 14, 2021

pepeiborra commented Mar 14, 2021

pepeiborra commented Mar 16, 2021

pepeiborra commented May 22, 2021

Staged builtin rules #799

Are you sure you want to change the base?

Staged builtin rules #799

Conversation

pepeiborra commented Mar 13, 2021 • edited

pepeiborra commented Mar 13, 2021 • edited

ndmitchell commented Mar 13, 2021

pepeiborra commented Mar 14, 2021

pepeiborra commented Mar 14, 2021

pepeiborra commented Mar 16, 2021

pepeiborra commented May 22, 2021

pepeiborra commented Mar 13, 2021 •

edited

pepeiborra commented Mar 13, 2021 •

edited