Implementation of irregular flattening #1740

athas · 2022-10-16T19:24:48Z

This highly WIP PR contains an implementation of full flattening as a transformation that goes from the SOACS representation to GPU. It is the result of a few hours of hacking and can handle the following program:

def main = map (\n -> #[unsafe] i64.sum (iota n))

Flattened versions of all core Futhark constructs must be defined, and so far I have only done Iota and Reduce. The main design challenge was the representation of irregular arrays in the compiler, as well as the overall structure of the algorithm. It is currently based on (irregular) distribution, much like the moderate flattening algorithm. I think that is the best way to do it.
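To make the target of the transformation concrete, here is a minimal Python sketch of what a flattened version of the program above could compute. It is an illustration under assumptions, not the code the compiler generates: the helper names (`exclusive_scan`, `segment_ids`, `segmented_iota`, `segmented_sum`, `flat_main`) are hypothetical, and the sequential loops stand in for what would be parallel scans, scatters, and segmented reductions on the GPU. Sizes are assumed to be non-negative.

```python
from itertools import accumulate

def exclusive_scan(xs):
    """Exclusive prefix sum: element i is the start offset of segment i."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

def segment_ids(ns):
    """For each element of the flat data array, the index of its segment."""
    offsets = exclusive_scan(ns)
    total = sum(ns)
    # Count how many segments start at each flat position (a histogram),
    # then an inclusive scan turns the counts into segment indices.
    starts = [0] * total
    for off in offsets:
        if off < total:  # trailing empty segments start out of bounds
            starts[off] += 1
    return [c - 1 for c in accumulate(starts)], offsets

def segmented_iota(ns):
    """Flat data of [iota n for n in ns], plus the segment id of each element."""
    seg_ids, offsets = segment_ids(ns)
    elems = [i - offsets[s] for i, s in enumerate(seg_ids)]
    return elems, seg_ids

def segmented_sum(num_segments, seg_ids, elems):
    """One sum per segment; empty segments yield 0."""
    sums = [0] * num_segments
    for s, x in zip(seg_ids, elems):
        sums[s] += x
    return sums

def flat_main(ns):
    """Flat counterpart of: map (\\n -> i64.sum (iota n)) ns"""
    elems, seg_ids = segmented_iota(ns)
    return segmented_sum(len(ns), seg_ids, elems)
```

Calling `flat_main([3, 0, 4])` builds a single flat array for all three `iota` results and sums it per segment, so no nested array is ever materialised.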

The ultimate goals are:

Replace the current implementation of flattening with one that is more principled (the old one is really dirty in places because I had no idea what I was doing).
Maintain incremental flattening, just in a more principled framework. The "fully flattened" code version will only be used as a last resort.
Experiment with whether (possibly virtualised) full flattening can achieve acceptable performance in the intra-group case.
Support compiling any Futhark program to GPU code; not just the subset that uses only regular nested parallelism.
Bring back recursive functions.

An explicit non-goal is adding support for irregular arrays in the source language.

Munksgaard · 2022-10-17T11:23:53Z

Wow, this is exciting stuff! What are the prospects for success?

athas · 2022-10-17T11:56:53Z

There is no real risk of this not succeeding. Flattening a first-order monomorphic language is not particularly difficult, although implementing all the cases will be tedious. The main challenge is writing the code in a clean and maintainable way. Time is the main constraint.

That's only for naive flattening, though, which is notoriously inefficient in practice. It's a more open question how efficient we can make it subsequently. I'm fairly confident we can do a good job, though.

The only somewhat bothersome wrinkle is that we allow arbitrary (potentially irregular) parallelism in reduce/scan/histogram operators. Flattening doesn't have a solution for that. However, nontrivial parallelism in those operators is exceedingly rare, and in all cases these constructs can be turned into maps without any asymptotic overhead, so we can always do that for the really nasty cases. (Although so can the original programmer who writes the source code.)
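One possible reading of "turned into maps" (an interpretation, not something spelled out in this thread) is a tree reduction: a sequential loop of O(log n) rounds, where each round is a map that applies the operator to adjacent pairs. Total work stays O(n), and the operator now sits inside an ordinary map, where any parallelism it contains is just nested parallelism. The sketch below is in Python, and `tree_reduce` is a hypothetical name.

```python
def tree_reduce(op, ne, xs):
    """Reduce xs with an associative op (neutral element ne) using only
    rounds of pairwise maps. Element order is preserved, so op does not
    need to be commutative."""
    level = list(xs)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(ne)  # pad on the right with the neutral element
        # One round: a map over adjacent pairs.
        level = [op(level[2 * i], level[2 * i + 1])
                 for i in range(len(level) // 2)]
    return level[0] if level else ne

def vector_add(a, b):
    """An operator that is itself a (parallel) map."""
    return [x + y for x, y in zip(a, b)]
```

For example, `tree_reduce(vector_add, [0, 0], [[1, 2], [3, 4], [5, 6]])` reduces three vectors in two rounds, each of which is a map of `vector_add`.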

melsman · 2022-10-17T22:04:53Z

I like it!! Also the return of recursion... man.

FluxusMagna · 2022-10-28T14:50:29Z

What does the type of an irregular nested array look like? [n][?]a or similar? I guess existentials come in handy here somehow?

Will there be any limitations on recursion? For example, I'd assume we still wouldn't have recursive data types.

athas · 2022-10-28T15:05:16Z

This is not a source language extension, so there will be no (source) irregular arrays. In the core language, they are represented as flat arrays with flag vectors.

There will be no restrictions on recursive functions, but still no recursive data types (these would require a more substantial change to the internal value representation).
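As a rough illustration of such a representation, the Python sketch below stores an irregular two-dimensional array as one flat data array plus bookkeeping arrays. The component names (`segments`, `flags`, `offsets`, `elems`) mirror names that appear in the commit messages of this PR, but their exact meaning in the compiler is an assumption here, and `IrregularArray`, `from_nested`, and `row` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IrregularArray:
    segments: list  # length of each row
    flags: list     # True at each position in elems where a non-empty row starts
    offsets: list   # start index of each row in elems
    elems: list     # all row contents, concatenated

def from_nested(rows):
    """Build the flat representation from a list of rows of varying length."""
    segments = [len(r) for r in rows]
    offsets, acc = [], 0
    for n in segments:
        offsets.append(acc)
        acc += n
    elems = [x for r in rows for x in r]
    flags = [False] * len(elems)
    for off, n in zip(offsets, segments):
        if n > 0:  # empty rows occupy no position in elems
            flags[off] = True
    return IrregularArray(segments, flags, offsets, elems)

def row(arr, i):
    """Recover row i by slicing the flat data array."""
    start = arr.offsets[i]
    return arr.elems[start:start + arr.segments[i]]
```

Note that the flag vector alone cannot represent empty rows, which is one reason to carry the segment sizes and offsets alongside it.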

* Start function flattening
* `cmp-bench-json.py` rewritten in Haskell (Issue #748) (#1860)
* Note in CHANGELOG.
* Use new tool.
* Remove cmp-bench-json.py.
* Fix #1863. (#1864)
* This is 0.23.1.
* Onwards!
* Fix typo.
* Remove copyCopyToCopy rule. (#1866)
  This is a very old (5+ years) rule that is much too naive in its handling of memory. We have better optimisations now, that aren't buggy.
* Remove SrcLoc from ImportName.
  Syntactic information does not belong in semantic objects.
* Use ImportName consistently. (#1869)
  Previously some parts of the compiler would use FilePaths directly, and it is ambiguous whether those refer to canonical import names. Now it should be clearer.
* futhark-benchmarks: bump
* Workaround for tiny /tmp on these servers.
* futhark-benchmarks: bump
* futhark-benchmarks: bump
* futhark-benchmarks: bump
* Workaround for temporary ghcup breakage.
* Switch to GHC 9.4 in Cabal CI. (#1871)
  If this does not fix Windows, then I will remove it (again).
* Plain values should never be Unique.
* No need for this.
* Also no setUniqueness here.
* futhark-benchmarks: bump
* Fix #1874.
* Avoid spurious space.
* Make consumption an effect on functions, rather than types. (#1873)
  This is a breaking change, because until now we allowed functions like

      def f (a: *[]i32, b: []i32) = ...

  where we could then pass in a tuple where in an application `f (x,y)` the value `x` would be consumed, but not `y`. However, this became increasingly difficult to support as the language grew (and frankly, it was always buggy). With this commit, the syntax above is still permitted, but it is interpreted as

      def f ((a,b): *([]i32, []i32)) = ...

  i.e. the single tuple argument is consumed *as a whole*. Long term we can also consider amending the syntax or warning about cases where it is misleading, but that is less urgent. I've wanted to make this simplification for a long time, but I always hit various snags. Today I managed to make it work, and the next step will be cleaning up the notion of "uniqueness" in return types as well (it should be the more general notion of "aliases").
* Forgot a test for #1874.
* Avoid warnings about "potentially uninitialized" variables.
  C compilers are (understandably) not smart enough to see that these are never actually used uninitialised.
* Make source language Apply AST node multi-argument. (#1875)
  This is a deviation from the concrete syntax, but humans tend to think of function calls having multiple arguments. Also, the AST had to keep a lot of useless metadata around to express the results of the intermediate applications. And again, it is related to making #1872 more feasible.
* Better constant folding for CmpOp PrimExps.
  This mostly has the effect of making generated code a little neater.
* futhark-benchmarks: bump
* Add some comments.
* More explicit.
* Fix #1878.
* Forbid access to interpreter.
* Ensure no apply-of-apply.
  The symptom of this being wrong is that defunctionalisation would create duplicate functions. No more!
* Handle array results.
* Flattening of Copy.
* Use Hendrix for CI. (#1862)
* First experiment at using Hendrix for CI.
* Maybe like this.
* Import everything locally.
* Try this.
* More systems.
* Also OpenCL.
* Also depend on these.
* More readable when split.
* Import new CI actions.
* Testing with slurm.
* Forgot to specify hendrix and the partition flag might also be needed.
* The wrong composite actions was included
* Trying cuda and opencl on hendrix
* Trying to use the composite test action for benchmarks.
* Wrong amount of indentation
* Forgot to add a |.
* Some small changes that will most likely not change things.
* trying to use sbatch
* switching to titanrtx and used the p flag wrong.
* Trailing whitespace purge.
* Skip these on TITAN X.
* Any GPU will work for these.
* Trying to run benchmarks without slurmbench.py
* Syntax errors
* Accidentally used old keyword test.
* found another syntax error i think
* I think the equality sign broke it
* maybe this will work
* Used gres wrongly.
* Do not use old futhark-benchmarks.
* Trying to use srun and cleaned up composite actions.
* Add some comments.
* More explicit.
* Fix #1878.
* Forbid access to interpreter.
* Ensure no apply-of-apply.
  The symptom of this being wrong is that defunctionalisation would create duplicate functions. No more!
* Revert "Trying to use srun and cleaned up composite actions."
  This reverts commit 6c4111f.
* using srun and fixing commit history hopefully?
* Adding an 8 hour time limit.
* Missing -.
* Newer version og futhark-benchmarks
* Trying to use `${{ always() }}`.
* Revert "Newer version og futhark-benchmarks" because of `${{ always() }}`
  This reverts commit 965e788.
* Hopefully this is the correct version of the futhark-benchmarks
* Remove always()
  ---------
  Co-authored-by: due <williamhenrichdue@gmail.com>
* Do not use hendrix except where needed.
* Cleanup whitespace.
* Matplotlib is handy.
* Add job names.
* Avoid unnecessary deallocation.
* These seem broken.
* Style fixes.
* Bump GHC.
* Not needed anymore.
* Seems to fix the nontermination.
* Support rev AD of scanomaps and scatters with non-identity lambdas. (#1880)
* Fix #1883.
* Loop over all dimensions here.
* Precompute more chunk counts.
  This is mostly to track the change in the parallelisation of Replicate in the preceding commit.
* Allow arbitrary expressions in size expressions.
  We still only permit elaboration of expressions that correspond to variables or integer constants. This is a step on the path to realising #1659.
* Always forget about the unit tests.
* Avoid extra braces when printing.
* Oops; fix copy/paste error.
* These brackets are necessary.
* Fix typo.
* A few other wording fixes.
* A few more text improvements.
* Fix error in manifest schema discovered by @Erk-.
* Newer action.
* Fix invalid link
  Thanks to @lkuty for noticing.
* Use explicit entry.
* Fix #1885.
* Better style.
* Plotting tool. (#1877)
  Closes #1861.
* Make executable.
* Remove trailing whitespace.
* Final status message.
* Use GitHub machines for Python tests.
* Generate tuning param definitions in GenericC. (#1890)
  This is a step towards #1884. Now that GenericC is responsible for all the work (and has all the information), it can generate new API functions.
* Record which tuning params are relevant to which entry points. (#1891)
  This involves extending the manifest and server protocol, and modifying 'futhark autotune' to use this new information. The main advantage (apart from general cleanup) is that we can now tune threshold parameters used in non-inlined functions.
* This is 0.24.1.
* Onwards!
* Fix #1895.
* Do not use interpreter.
* Incomplete work on nested maps.
* More work on nested maps.
* Fix #1896.
* This goes in tests.
* Use Hendrix for A100 jobs. (#1898)
* Fail early.
* All these SegOps should be virtualised.
* Start function flattening
* Incomplete work on function lifting
* Very rudimentary lifted function results
  Currently only handles lifting of functions whose return types are scalar typed variables i.e. no constants or arrays.
* Work on lifted function results
* Further work on lifted function results
* Change way return types are lifted
* Correctly return constants from lifted functions
* Existential size return for lifted functions
  Merge building of body statements and results for lifted functions. Will probably need to filter out existential size quantifiers before lifting results.
* Filter existential sizes from lifted functions
  Remove existential quantifiers from the return type and result of a function before lifting as I believe their lifted version aren't needed.
* Revert "Filter existential sizes from lifted functions"
  This reverts commit d04ecc5. It might be useful later but for now it complicates things.
* Application of lifted functions
* Do not lift entry points.
* Work in progress match-expression flattening
* Fix bug in lifting function parameters
  Lifting irregular parameters was (wrongly) in the order `[offsets, flags, segments, elements]`. When calling, the arguments were (rightly) given in the order `[segments, flags, offsets, elements]`.
* Fix bug in lifting of if-then-else
  Wrote too many elements in the final scatters.
* Make lifted if-then-else a little nicer
* Handle irregular inputs to if-expressions
* Handle irregular results of if-expressions
* Handle general irregular match-expressions
* Irregular match-expr: handle empty arrays
* Better error messages
* Handle free variables in `liftArg`
  `inputReps` now also gives type information, which is used by `liftArg` to determine if free variables are regular or irregular.
* Flatten builtins scans over multi-dim arrays
  Let scan functions (genScanomap, genScan, genExScan, ...) in the flatten builtins module operate on multi-dimensional arrays. Of note is that `exScanAndSum`, when given a single-dimensional array, will return the # of segments and sum of segment sizes as scalar values and when given a multi-dimensional array will return them as arrays. Also move `segMap` from Flatten.hs to Flatten.Builtins.hs
* Make sure flag and elems array have same size
  When passing flag and elems array to a function, or returning them from a function, resize them to please the type checker.
* Replicate free vars in result of lifted functions
* Handle free variables in match-expressions
  Move the common "if a subexp is a constant or free variable, replicate it, and otherwise do a lookup in dist inputs and dist env" code to a function `liftSubExp`. This is used in `liftArg`, `liftResult` and lifting match-expressions.
* Add tests for lifting functions
* Add tests for flattening match-expressions

---------

Co-authored-by: Troels Henriksen <athas@sigkill.dk>
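The commit notes above mention a lifted if-then-else that ends in scatters. A common way to flatten a conditional inside a map, sketched here in Python under assumptions, is to split the indices by the condition, run each lifted branch only on its own elements, and scatter the results back to their original positions. The name `lifted_if` and its signature are hypothetical, the sketch covers only scalar (regular) results, and it is not taken from the PR's Haskell code.

```python
def lifted_if(conds, xs, then_fn, else_fn):
    """Flat counterpart of
        map (\\c x -> if c then f x else g x) conds xs
    where then_fn and else_fn are the lifted branches: each takes a list
    of inputs and returns a list of results of the same length."""
    n = len(conds)
    # Partition the indices by the condition (a filter in a parallel setting).
    then_idx = [i for i in range(n) if conds[i]]
    else_idx = [i for i in range(n) if not conds[i]]
    # Gather each branch's inputs and run the lifted branch once.
    then_res = then_fn([xs[i] for i in then_idx])
    else_res = else_fn([xs[i] for i in else_idx])
    # Scatter exactly len(then_idx) and len(else_idx) results back.
    out = [None] * n
    for i, v in zip(then_idx, then_res):
        out[i] = v
    for i, v in zip(else_idx, else_res):
        out[i] = v
    return out
```

Each scatter writes only as many elements as its branch produced, so every output position is written exactly once. Irregular branch results would additionally need their segment sizes merged before the data is scattered, which this sketch leaves out.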

wip

2576b2e

athas marked this pull request as draft October 16, 2022 20:02

Merge branch 'master' into flattening

ec42c3a

athas and others added 17 commits November 3, 2022 18:52

More half-baked work.

f871257

Merge branch 'master' into flattening

23269ab

Fix offset calculation.

8c87f3e

Fix some things.

eb10d91

Merge branch 'master' into flattening

7fe28ad

Use full flattening in GPU pipelines.

d58ba99

Irregular slice now works.

138ceca

This always works.

8e946a2

Reshape and further Index fixes.

fb21c05

Handle the most general case of Iota.

35669f9

Remove some warnings.

f2a79e4

More foldable.

1682d88

Hacky initial support for flattening nested maps.

ef35b71

Flatten redomaps.

d2d7c7f

Merge branch 'master' into flattening

e9f778e

Merge branch 'master' into flattening

2496205

starting out

4945cd1

athas mentioned this pull request Jan 10, 2023

Internal compiler error: Type error after pass 'expand allocations' #1837

Closed

cornelius-sevald and others added 4 commits January 11, 2023 15:03

[WIP] Update flattening case

1e801d9

Compiles but does not work.

[WIP] Further work on Update flattening

3fc3e0e

Fixing parts I think was wrong before.

test case

ff6ceb3

update1 test update

c05cc91

athas and others added 30 commits May 17, 2023 11:59

Merge branch 'master' into flattening

7f72edd

Merge branch 'master' into flattening

70b904d

Handle free irregular arrays in nested map.

024e6c7

Another test.

29dacc5

Handle free irregular arrays in nested maps.

2b5809b

Merge branch 'master' into flattening

e0dc387

Handle distribution of free and identity results.

9c8351b

Merge branch 'master' into flattening

c5d7c03

Merge branch 'master' into flattening

2e37824

Merge branch 'master' into flattening

03cb09e

Merge branch 'master' into flattening

cb277e0

Merge branch 'master' into flattening

76b90a8

Merge branch 'master' into flattening

d8ef766

Flattening rearranges.

acb7caf

Add failing test.

d14cf71

Merge branch 'master' into flattening

daaf815

Fix type annotation.

4bdf0e3

Better test data.

aae4bd1

Fix typo.

068a16e

This seems wrong.

647e4fe

New formatting.

8aff0d5

Document nomenclature.

9738c47

Use nomenclature.

db81070

Style fixes.

eb443bb

More nomenclature.

8c224e4

Revert "This seems wrong."

9871516

This reverts commit 647e4fe.

Maybe like this.

e7ea05c

Merge branch 'master' into flattening

533ada3

Merge branch 'master' into flattening

c8ba2a7
