Implementation of irregular flattening #1740

Draft
wants to merge 96 commits into
base: master

Conversation

athas
Member

@athas athas commented Oct 16, 2022

This highly WIP PR contains an implementation of full flattening as a transformation that goes from the SOACS representation to GPU. It is the result of a few hours of hacking and can handle the following program:

def main = map (\n -> #[unsafe] i64.sum (iota n))

Flattened versions of all core Futhark constructs must be defined, and so far I have only done Iota and Reduce. The main design challenge was the representation of irregular arrays in the compiler, as well as the overall structure of the algorithm. It is currently based on (irregular) distribution, much like the moderate flattening algorithm. I think that is the best way to do it.
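
To make the target of the transformation concrete, here is a minimal sketch of what a flattened version of the program above computes. It is written in Python rather than Futhark, and it is not the compiler's actual output. The helper names `exclusive_scan`, `segmented_iota` and `segmented_sum` are hypothetical, plain lists stand in for GPU arrays, and the sequential loops stand in for the scans and segmented reductions a real implementation would use.

```python
def exclusive_scan(xs):
    """Exclusive prefix sum; also returns the total."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out, acc


def segmented_iota(ns):
    """Flattened `map iota ns`: one flat array holding every iota back to back."""
    offsets, total = exclusive_scan(ns)  # where each segment starts
    elems = [0] * total
    for off, n in zip(offsets, ns):
        for i in range(n):
            elems[off + i] = i
    return offsets, elems


def segmented_sum(ns, offsets, elems):
    """Flattened `map i64.sum`: one sum per segment of the flat array."""
    return [sum(elems[off:off + n]) for off, n in zip(offsets, ns)]


def main_flat(ns):
    # No nested arrays exist at any point; only flat arrays plus segment metadata.
    offsets, elems = segmented_iota(ns)
    return segmented_sum(ns, offsets, elems)


def main_nested(ns):
    # Reference semantics of `map (\n -> i64.sum (iota n))`.
    return [sum(range(n)) for n in ns]
```

For the input `[3, 0, 4]` the flat element array is `[0, 1, 2, 0, 1, 2, 3]` with segment offsets `[0, 3, 3]`, and both versions produce the same per-segment sums.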

The ultimate goals are:

  1. Replace the current implementation of flattening with one that is more principled (the old one is really dirty in places because I had no idea what I was doing).
  2. Maintain incremental flattening, just in a more principled framework. The "fully flattened" code version will only be used as a last resort.
  3. Experiment with whether (possibly virtualised) full flattening may perhaps have acceptable performance in the intra-group case.
  4. Support compiling any Futhark program to GPU code; not just the subset that uses only regular nested parallelism.
  5. Bring back recursive functions.

An explicit non-goal is adding support for irregular arrays in the source language.

@athas athas marked this pull request as draft October 16, 2022 20:02
@Munksgaard
Collaborator

Wow, this is exciting stuff! What are the prospects for success?

@athas
Member Author

athas commented Oct 17, 2022

There is no real risk of this not succeeding. Flattening a first-order monomorphic language is not particularly difficult, although implementing all the cases will be tedious. The main challenge is writing the code in a clean and maintainable way. Time is the main constraint.

That's only for naive flattening, though, which is notoriously inefficient in practice. It's a more open question how efficient we can make it subsequently. I'm fairly confident we can do a good job, though.

The only somewhat bothersome wrinkle is that we allow arbitrary (potentially irregular) parallelism in reduce/scan/histogram operators. Flattening doesn't have a solution for that. However, nontrivial parallelism in those operators is exceedingly rare, and in all cases these constructs can be turned into maps without any asymptotic overhead, so we can always do that for the really nasty cases. (Although so can the original programmer who writes the source code.)
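
One way to read the claim that a reduction can be turned into maps is as a tree reduction: each round is a single map over adjacent pairs, and the rounds halve the array until one element is left. The comment does not spell out the rewrite, so this is an assumption about what is meant, and `reduce_by_maps` is a hypothetical name. The sketch is in Python and assumes `op` is associative with neutral element `ne`.

```python
def reduce_by_maps(op, ne, xs):
    """Reduce xs with op using only map-like rounds over adjacent pairs."""
    xs = list(xs)
    if not xs:
        return ne
    while len(xs) > 1:
        if len(xs) % 2 == 1:
            xs.append(ne)  # pad with the neutral element so everything pairs up
        # One round: a map over pairs. Each application of op is independent,
        # so any parallelism inside op is ordinary parallelism nested in a map.
        xs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(len(xs) // 2)]
    return xs[0]
```

The rounds apply `op` about n/2 + n/4 + ... times in total, which is linear in n and so matches a sequential reduce asymptotically. The order of the operands is preserved, so the operator does not need to be commutative.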

@melsman
Contributor

melsman commented Oct 17, 2022 via email

@FluxusMagna

FluxusMagna commented Oct 28, 2022

What does the type of an irregular nested array look like? [n][?]a or similar? I guess existentials come in handy here somehow?

Will there be any limitations on recursion? For example, I'd assume we still wouldn't have recursive data types.

@athas
Member Author

athas commented Oct 28, 2022

This is not a source language extension, so there will be no (source) irregular arrays. In the core language, they are represented as flat arrays with flag vectors.

There will be no restrictions on recursive functions, but still no recursive data types (these would require a more substantial change to the internal value representation).
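
As an illustration of such a flat representation, here is a small Python sketch. The four components are named after the `[segments, flags, offsets, elements]` order mentioned in a later commit message in this PR. The `Irregular` class and the `from_nested`/`to_nested` helpers are hypothetical, and the compiler's exact layout may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Irregular:
    segments: List[int]  # size of each inner array
    flags: List[bool]    # True at positions in elems where a segment starts
    offsets: List[int]   # start index of each segment in elems
    elems: List[int]     # all elements, concatenated


def from_nested(xss):
    segments = [len(xs) for xs in xss]
    offsets, acc = [], 0
    for n in segments:  # exclusive prefix sum of the segment sizes
        offsets.append(acc)
        acc += n
    elems = [x for xs in xss for x in xs]
    flags = [False] * len(elems)
    for off, n in zip(offsets, segments):
        if n > 0:  # an empty segment owns no position in elems
            flags[off] = True
    return Irregular(segments, flags, offsets, elems)


def to_nested(arr):
    return [arr.elems[off:off + n] for off, n in zip(arr.offsets, arr.segments)]
```

Note that the flag vector alone cannot record empty inner arrays, which is one reason to carry the segment sizes (or offsets) alongside it: `[[1, 2, 3], [], [4, 5]]` has flags `[True, False, False, True, False]` but segments `[3, 0, 2]`.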

athas and others added 30 commits May 17, 2023 11:59
* Start function flattening

* `cmp-bench-json.py` rewritten in Haskell (Issue #748) (#1860)

* Note in CHANGELOG.

* Use new tool.

* Remove cmp-bench-json.py.

* Fix #1863. (#1864)

* This is 0.23.1.

* Onwards!

* Fix typo.

* Remove copyCopyToCopy rule. (#1866)

This is a very old (5+ years) rule that is much too naive in its
handling of memory.  We have better optimisations now, that aren't
buggy.

* Remove SrcLoc from ImportName.

Syntactic information does not belong in semantic objects.

* Use ImportName consistently. (#1869)

Previously some parts of the compiler would use FilePaths directly,
and it is ambiguous whether those refer to canonical import names.
Now it should be clearer.

* futhark-benchmarks: bump

* Workaround for tiny /tmp on these servers.

* futhark-benchmarks: bump

* futhark-benchmarks: bump

* futhark-benchmarks: bump

* Workaround for temporary ghcup breakage.

* Switch to GHC 9.4 in Cabal CI. (#1871)

If this does not fix Windows, then I will remove it (again).

* Plain values should never be Unique.

* No need for this.

* Also no setUniqueness here.

* futhark-benchmarks: bump

* Fix #1874.

* Avoid spurious space.

* Make consumption an effect on functions, rather than types. (#1873)

This is a breaking change, because until now we allowed functions like

    def f (a: *[]i32, b: []i32) = ...

where we could then pass in a tuple where in an application `f (x,y)`
the value `x` would be consumed, but not `y`.  However, this became
increasingly difficult to support as the language grew (and frankly,
it was always buggy).  With this commit, the syntax above is still
permitted, but it is interpreted as

    def f ((a,b): *([]i32, []i32)) = ...

i.e. the single tuple argument is consumed *as a whole*.  Long term we
can also consider amending the syntax or warning about cases where it
is misleading, but that is less urgent.

I've wanted to make this simplification for a long time, but I always
hit various snags.  Today I managed to make it work, and the next step
will be cleaning up the notion of "uniqueness" in return types as well
(it should be the more general notion of "aliases").

* Forgot a test for #1874.

* Avoid warnings about "potentially uninitialized" variables.

C compilers are (understandably) not smart enough to see that these
are never actually used uninitialised.

* Make source language Apply AST node multi-argument. (#1875)

This is a deviation from the concrete syntax, but humans tend to think
of function calls having multiple arguments.  Also, the AST had to
keep a lot of useless metadata around to express the results of the
intermediate applications.

And again, it is related to making #1872 more feasible.

* Better constant folding for CmpOp PrimExps.

This mostly has the effect of making generated code a little neater.

* futhark-benchmarks: bump

* Add some comments.

* More explicit.

* Fix #1878.

* Forbid access to interpreter.

* Ensure no apply-of-apply.

The symptom of this being wrong is that defunctionalisation would
create duplicate functions.  No more!

* Handle array results.

* Flattening of Copy.

* Use Hendrix for CI. (#1862)

* First experiment at using Hendrix for CI.

* Maybe like this.

* Import everything locally.

* Try this.

* More systems.

* Also OpenCL.

* Also depend on these.

* More readable when split.

* Import new CI actions.

* Testing with slurm.

* Forgot to specify hendrix and the partition flag might also be needed.

* The wrong composite actions was included

* Trying cuda and opencl on hendrix

* Trying to use the composite test action for benchmarks.

* Wrong amount of indentation

* Forgot to add a |.

* Some small changes that will most likely not change things.

* trying to use sbatch

* switching to titanrtx and used the p flag wrong.

* Trailing whitespace purge.

* Skip these on TITAN X.

* Any GPU will work for these.

* Trying to run benchmarks without slurmbench.py

* Syntax errors

* Accidentally used old keyword test.

* found another syntax error i think

* I think the equality sign broke it

* maybe this will work

* Used gres wrongly.

* Do not use old futhark-benchmarks.

* Trying to use srun and cleaned up composite actions.

* Add some comments.

* More explicit.

* Fix #1878.

* Forbid access to interpreter.

* Ensure no apply-of-apply.

The symptom of this being wrong is that defunctionalisation would
create duplicate functions.  No more!

* Revert "Trying to use srun and cleaned up composite actions."

This reverts commit 6c4111f.

* using srun and fixing commit history hopefully?

* Adding an 8 hour time limit.

* Missing -.

* Newer version og futhark-benchmarks

* Trying to use `${{ always() }}`.

* Revert "Newer version og futhark-benchmarks" because of `${{ always() }}`

This reverts commit 965e788.

* Hopefully this is the correct version of the futhark-benchmarks

* Remove always()

---------

Co-authored-by: due <williamhenrichdue@gmail.com>

* Do not use hendrix except where needed.

* Cleanup whitespace.

* Matplotlib is handy.

* Add job names.

* Avoid unnecessary deallocation.

* These seem broken.

* Style fixes.

* Bump GHC.

* Not needed anymore.

* Seems to fix the nontermination.

* Support rev AD of scanomaps and scatters with non-identity lambdas. (#1880)

* Fix #1883.

* Loop over all dimensions here.

* Precompute more chunk counts.

This is mostly to track the change in the parallelisation of Replicate
in the preceding commit.

* Allow arbitrary expressions in size expressions.

We still only permit elaboration of expressions that correspond to
variables or integer constants.  This is a step on the path to
realising #1659.

* Always forget about the unit tests.

* Avoid extra braces when printing.

* Oops; fix copy/paste error.

* These brackets are necessary.

* Fix typo.

* A few other wording fixes.

* A few more text improvements.

* Fix error in manifest schema discovered by @Erk-.

* Newer action.

* Fix invalid link

Thanks to @lkuty for noticing.

* Use explicit entry.

* Fix #1885.

* Better style.

* Plotting tool. (#1877)

Closes #1861.

* Make executable.

* Remove trailing whitespace.

* Final status message.

* Use GitHub machines for Python tests.

* Generate tuning param definitions in GenericC. (#1890)

This is a step towards #1884.  Now that GenericC is responsible for
all the work (and has all the information), it can generate new API
functions.

* Record which tuning params are relevant to which entry points. (#1891)

This involves extending the manifest and server protocol, and
modifying 'futhark autotune' to use this new information.

The main advantage (apart from general cleanup) is that we can now
tune threshold parameters used in non-inlined functions.

* This is 0.24.1.

* Onwards!

* Fix #1895.

* Do not use interpreter.

* Incomplete work on nested maps.

* More work on nested maps.

* Fix #1896.

* This goes in tests.

* Use Hendrix for A100 jobs. (#1898)

* Fail early.

* All these SegOps should be virtualised.

* Start function flattening

* Incomplete work on function lifting

* Very rudimentary lifted function results

Currently only handles lifting of functions whose return types are
scalar typed variables i.e. no constants or arrays.

* Work on lifted function results

* Further work on lifted function results

* Change way return types are lifted

* Correctly return constants from lifted functions

* Existential size return for lifted functions

Merge building of body statements and results for lifted functions.
Will probably need to filter out existential size quantifiers before
lifting results.

* Filter existential sizes from lifted functions

Remove existential quantifiers from the return type and result of a
function before lifting, as I believe their lifted versions aren't needed.

* Revert "Filter existential sizes from lifted functions"

This reverts commit d04ecc5.

It might be useful later but for now it complicates things.

* Application of lifted functions

* Do not lift entry points.

* Work in progress match-expression flattening

* Fix bug in lifting function parameters

Lifting irregular parameters was (wrongly) in the order
`[offsets, flags, segments, elements]`.
When calling, the arguments were (rightly) given in the order
`[segments, flags, offsets, elements]`.

* Fix bug in lifting of if-then-else

Wrote too many elements in the final scatters.
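
For readers unfamiliar with how a conditional is lifted, here is a rough Python sketch of the usual approach: split the indices by the condition, run each branch only on its own elements, then scatter the results back. This is an assumption about the general technique, not a description of the code in this PR; `lifted_if` is a hypothetical name and only scalar branch results are handled.

```python
def lifted_if(conds, xs, then_f, else_f):
    """Lifted `if c then then_f x else else_f x` over parallel arrays conds, xs."""
    # Partition the indices by which branch each element takes.
    then_is = [i for i, c in enumerate(conds) if c]
    else_is = [i for i, c in enumerate(conds) if not c]

    # Each branch runs as a map over only the elements that take it.
    then_rs = [then_f(xs[i]) for i in then_is]
    else_rs = [else_f(xs[i]) for i in else_is]

    # Final scatters: each one writes exactly as many elements as its branch
    # produced, never the full length of the input.
    out = [None] * len(conds)
    for i, r in zip(then_is, then_rs):
        out[i] = r
    for i, r in zip(else_is, else_rs):
        out[i] = r
    return out
```

The two scatters together touch every output position exactly once, so writing more than `len(then_is)` or `len(else_is)` elements in either scatter would be a bug.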

* Make lifted if-then-else a little nicer

* Handle irregular inputs to if-expressions

* Handle irregular results of if-expressions

* Handle general irregular match-expressions

* Irregular match-expr: handle empty arrays

* Better error messages

* Handle free variables in `liftArg`

`inputReps` now also gives type information, which is used by `liftArg`
to determine if free variables are regular or irregular.

* Flatten builtins scans over multi-dim arrays

Let scan functions (genScanomap, genScan, genExScan, ...) in the flatten
builtins module operate on multi-dimensional arrays.

Of note is that `exScanAndSum`, when given a single-dimensional array,
will return the # of segments and sum of segment sizes as scalar values
and when given a multi-dimensional array will return them as arrays.

Also move `segMap` from Flatten.hs to Flatten.Builtins.hs

* Make sure flag and elems array have same size

When passing flag and elems array to a function, or returning them from
a function, resize them to please the type checker.

* Replicate free vars in result of lifted functions

* Handle free variables in match-expressions

Move the common "if a subexp is a constant or free variable, replicate
it, and otherwise do a lookup in dist inputs and dist env" code to a
function `liftSubExp`. This is used in `liftArg`, `liftResult` and
lifting match-expressions.
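
The rule described above can be sketched in Python as follows. This is a simplified model, not the Haskell code in the PR: subexpressions are modelled as tagged tuples, the separate lookups in dist inputs and dist env are collapsed into a single dictionary `lifted_env`, and `free_env` is a hypothetical dictionary holding the values of free variables.

```python
def lift_subexp(w, lifted_env, free_env, se):
    """Return the lifted (length-w) version of a subexpression.

    se is ('const', value) or ('var', name).
    """
    kind, payload = se
    if kind == "const":
        # A constant is the same in every iteration: replicate it.
        return [payload] * w
    if payload in lifted_env:
        # Bound inside the map nest: a lifted version already exists.
        return lifted_env[payload]
    # Free variable: invariant to the map nest, so replicate its value.
    return [free_env[payload]] * w
```

For example, with `w = 3`, `lifted_env = {"x": [1, 2, 3]}` and `free_env = {"k": 7}`, the constant `5` lifts to `[5, 5, 5]`, `x` lifts to `[1, 2, 3]`, and the free variable `k` lifts to `[7, 7, 7]`.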

* Add tests for lifting functions

* Add tests for flattening match-expressions

---------

Co-authored-by: Troels Henriksen <athas@sigkill.dk>
This reverts commit 647e4fe.
7 participants