New command: purs codegen #4092

Open: wants to merge 19 commits into master

colinwahl
Contributor

Description of the change

Implements a new command: purs codegen

purs codegen takes globs matching files that contain the JSON representation of a CoreFn Module (which purs compile can generate). It parses the core functional representation out of these files and passes the resulting modules into the standard codegen function.

This command allows for CoreFn transformations to be written outside of the compiler (even in PureScript!) without having to worry about using PureScript as a library.

Example usage of this would be:

$ purs compile glob/to/files.purs -g corefn,js
$ <execute pass over generated corefn.json files>
$ purs codegen glob/to/all/corefn.json
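To make the middle step more concrete, here is a minimal sketch of the kind of pass that could run between the two commands: tree shaking in the spirit of zephyr, keeping only the bindings reachable from a set of roots. It assumes a toy representation (`Bindings`, a map from each top-level name to the names it references) rather than the real corefn.json schema, and it leaves out reading and writing the JSON files; `eliminateDeadCode` is a hypothetical name.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Toy stand-in for a CoreFn module (an assumption, NOT the real JSON schema):
-- each top-level binding name maps to the names its body references.
type Bindings = Map.Map String [String]

-- Keep only the bindings reachable from the given roots (e.g. exports).
eliminateDeadCode :: [String] -> Bindings -> Bindings
eliminateDeadCode roots binds = Map.restrictKeys binds (reachable Set.empty roots)
  where
    -- Depth-first walk; `seen` guarantees termination on recursive bindings.
    reachable seen [] = seen
    reachable seen (name : rest)
      | name `Set.member` seen = reachable seen rest
      | otherwise =
          reachable (Set.insert name seen) (Map.findWithDefault [] name binds ++ rest)
```

The `seen` set makes the traversal terminate even when bindings refer to each other in a cycle; roots that are not defined in the map are simply ignored by `restrictKeys`.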

This intends to close #3339


Checklist:

  • Added the change to the changelog's "Unreleased" section with a reference to this PR (e.g. "- Made a change (#0000)")
  • Added myself to CONTRIBUTORS.md (if this is my first contribution)
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

<> Opts.showDefault
<> Opts.help "The output directory"

globWarningOnMisses :: (String -> IO ()) -> [FilePath] -> IO [FilePath]
Contributor Author

Command.Graph, Command.Compile, and Command.Codegen all contain this same definition, but I wasn't sure where best to move it so that they can share it.

If there is an appropriate place to move this then I will update all 3 of those modules.

Member

Sounds like a good idea. Create a Command.Common module?
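A minimal sketch of what a shared Command.Common helper could look like. To keep it runnable without extra dependencies, this hypothetical `globWarningOnMissesWith` takes the glob-expanding function as an argument and is generalized to any `Monad`; in the commands themselves it would run in `IO`, with something like `glob` from the Glob package passed in. The warning text is illustrative rather than the compiler's exact message, and `concatMapM` is defined locally even though, per the review comment about Protolude, an equivalent can be imported from there.

```haskell
import Control.Monad (when)

-- Hypothetical shared helper for a Command.Common module.
globWarningOnMissesWith
  :: Monad m
  => (FilePath -> m [FilePath])  -- expand one glob pattern
  -> (String -> m ())            -- report a pattern that matched nothing
  -> [FilePath]                  -- glob patterns from the command line
  -> m [FilePath]
globWarningOnMissesWith glob warn = concatMapM expand
  where
    expand pattern' = do
      paths <- glob pattern'
      when (null paths) $
        warn ("No files found using pattern: " ++ pattern')
      pure paths

-- Local definition; Protolude exports an equivalent concatMapM.
concatMapM :: Monad m => (a -> m [b]) -> [a] -> m [b]
concatMapM f = fmap concat . mapM f
```

Passing the glob function in also makes the helper easy to exercise with a fake that never touches the file system.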

@@ -97,7 +97,7 @@ data MakeActions m = MakeActions
, readExterns :: ModuleName -> m (FilePath, Maybe ExternsFile)
-- ^ Read the externs file for a module as a string and also return the actual
-- path for the file.
- , codegen :: CF.Module CF.Ann -> Docs.Module -> ExternsFile -> SupplyT m ()
+ , codegen :: CF.Module CF.Ann -> Docs.Module -> Maybe ExternsFile -> SupplyT m ()
Contributor Author

When performing codegen via purs codegen, we can create a stub ExternsFile from the CoreFn.Module, but it isn't actually the ExternsFile we want to write. I modified this so that we don't write a bogus ExternsFile during purs codegen.

Member

Maybe it would be better to factor the codegen function into one function per output (including the externs file), and use only the JS-outputting function for purs codegen.

Contributor Author

Good idea - I can go ahead and do that now
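The refactoring agreed on here can be sketched as follows, using toy stand-ins (`CoreFnModule`, `Externs`) and hypothetical names (`OutputActions`, `writeJS`, `writeExterns`, `compileOutputs`, `codegenOutputs`) rather than the compiler's actual `MakeActions` fields. Each output gets its own action, so `purs compile` runs all of them while `purs codegen` runs only the JavaScript one and never needs a stub externs file.

```haskell
-- Toy stand-ins for the compiler's CoreFn module and externs file.
newtype CoreFnModule = CoreFnModule String
newtype Externs = Externs String

-- Hypothetical split of a single `codegen` action into one action per output.
data OutputActions m = OutputActions
  { writeJS      :: CoreFnModule -> m ()
  , writeExterns :: Externs -> m ()
  }

-- What `purs compile` would do: write every output.
compileOutputs :: Monad m => OutputActions m -> CoreFnModule -> Externs -> m ()
compileOutputs acts m ext = do
  writeJS acts m
  writeExterns acts ext

-- What `purs codegen` would do: it only has CoreFn, so it writes JS alone.
codegenOutputs :: OutputActions m -> CoreFnModule -> m ()
codegenOutputs = writeJS
```

Compared with threading a `Maybe ExternsFile` through a single `codegen` action, the split keeps the decision about which outputs to write at the call site instead of inside the action.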

CHANGELOG.md (outdated, resolved)

foreigns <- P.inferForeignModules filePathMap
(makeResult, makeWarnings) <-
liftIO
Member

Is liftIO needed here?

return paths

concatMapM :: (a -> IO [b]) -> [a] -> IO [b]
concatMapM f = fmap concat . mapM f
Member

While you're messing around with this, you can import concatMapM from Protolude. Don't know why we aren't already doing that.

-- | Arguments: use JSON, warnings, errors
printWarningsAndErrors :: Bool -> P.MultipleErrors -> Either P.MultipleErrors a -> IO ()
Member

This is just Command.Compile.printWarningsAndErrors True, right? Looks like another candidate for Command.Common.

Contributor Author

Ah, it is, great call

@TheMatten mentioned this pull request on Jun 1, 2021
@colinwahl
Contributor Author

Sorry, things got busy around here; I'm going to pick this back up soon to address the feedback.

@colinwahl
Contributor Author

@rhendric I've addressed your initial feedback.

While doing the codegen refactoring, I noticed that purs codegen doesn't let users opt in to generating source maps. I should probably add an option to the command for that.

Member

@rhendric left a comment

Looking good! I have one outstanding question about the runSupplyT 0 here, and I suggest rebasing on master to clean up some conflicts and HLint nits, but I think this is in great shape already.

foreigns <- P.inferForeignModules filePathMap
(makeResult, makeWarnings) <-
P.runMake purescriptOptions
$ runSupplyT 0
Member

In the normal compilation path, the codegen supply monad is initialized with the next unused number from previous parts of the compilation. Using 0 here raises the question of whether this reuse is necessary. If so, using 0 here might cause problems. If not (I suspect not), we should probably be consistent so that the produced code isn't different when generated just by purs compile versus purs compile; purs codegen.

So assuming it's safe to do so, I think we should remove the SupplyT from the signatures in MakeActions and push that detail into their implementations. But now would be a really good time for someone else to share why that wouldn't be safe!

Contributor Author

This is a great point. I tried reading through the usages of the supply monad in the codegen code and it looks like it's just for generating fresh variable names - it doesn't seem to me that it'd require we start off from where we left off in say typechecking - but I don't have enough experience to say for sure.

At work we've been using zephyr for quite a while, which also starts from 0 for codegen, so I'd be really surprised if it causes errors.

If it is the case that it doesn't matter, then I'll remove the SupplyT requirement and start it from zero within codegenJS.

Contributor Author

I spent some time going through the codegenJS implementation, and I don't think it is dangerous to always initialize that supply with 0. The fact that zephyr was doing that for so long also makes me pretty confident, based on my personal experience.

I went ahead and made the change you suggested, and all tests are passing.

If anyone knows more than I do and thinks that we should undo the change, I can do that too!
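The concern about starting at 0 can be illustrated with a toy fresh-name supply. This sketch assumes generated names of the form `$n` and models the supply as a plain counter; it is not the compiler's `SupplyT`, and `fresh`, `freshNames`, and `clashes` are hypothetical. Starting from 0 makes the output independent of earlier phases, but if earlier phases had left a generated name such as `$0` in the CoreFn, codegen could produce a clashing name, whereas continuing from the next unused counter (as the normal compile path does) avoids that.

```haskell
-- One step of a toy supply: a generated name plus the next unused counter.
fresh :: Int -> (String, Int)
fresh n = ('$' : show n, n + 1)

-- Draw `count` names starting at `start`, returning them together with the
-- next unused counter (the value the normal compile path threads forward).
freshNames :: Int -> Int -> ([String], Int)
freshNames start count
  | count <= 0 = ([], start)
  | otherwise =
      let (name, next) = fresh start
          (rest, final) = freshNames next (count - 1)
      in (name : rest, final)

-- Generated names that collide with names already present in the module.
clashes :: [String] -> Int -> Int -> [String]
clashes existing start count =
  filter (`elem` existing) (fst (freshNames start count))
```

Whether the real CoreFn can ever contain such names when it reaches codegen is exactly the open question in this thread; the sketch only shows why it matters.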

M.fromList $ map ((\m -> (CoreFn.moduleName m, Right $ CoreFn.modulePath m)) . snd) $ rights mods

unless (null (lefts mods)) $ do
_ <- traverse (hPutStr stderr . formatParseError) $ lefts mods
Member

hlint will yell at you for this when you rebase on master. Use traverse_.

Comment on lines 73 to 74
runCodegen foreigns filePathMap m =
P.codegenJS (makeActions foreigns filePathMap) False m
Member

hlint will yell at you for this too, but actually I think you should probably just inline this whole definition.

app/Command/Codegen.hs (outdated, resolved)
@colinwahl
Contributor Author

I'll spend some time thinking about how we could add a meaningful test for this soon.

Other than that, there is the outstanding question of initializing the codegen supply with 0, which I've gone ahead and done. Then this should be ready for a final review!

M.fromList $ map ((\m -> (CoreFn.moduleName m, Right $ CoreFn.modulePath m)) . snd) $ rights mods

unless (null (lefts mods)) $ do
traverse_ (hPutStr stderr . formatParseError) $ lefts mods
Contributor

Since lefts mods is used twice, perhaps it should be bound with a let above?

let errList = lefts mods

Also, since filePathMap isn't used until after the unless block, perhaps it should go below this block but above the foreigns <- P.inferForeignModules filePathMap line?

(makeResult, makeWarnings) <-
P.runMake purescriptOptions
$ traverse (P.codegenJS (makeActions foreigns filePathMap) codegenSourceMaps . snd)
$ rights mods
Contributor

Here's a second rights mods. Perhaps that should also be moved to a let binding so more things can reuse it?
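Putting the review suggestions together (bind the failures and successes once, report errors with `traverse_`, and build `filePathMap` only from the successfully parsed modules), the parsing step might look roughly like this. `ParsedModule` and the `String` error type are toy stand-ins for `CoreFn.Module` and the parse error type, `splitParsed` and `reportErrors` are hypothetical names, and the map stores plain paths where the original wraps them in `Right`. `partitionEithers` performs both the `lefts` and the `rights` split in a single call.

```haskell
import Data.Either (partitionEithers)
import Data.Foldable (traverse_)
import qualified Data.Map.Strict as M
import System.IO (hPutStrLn, stderr)

-- Toy stand-in for a parsed CoreFn module.
data ParsedModule = ParsedModule
  { moduleName :: String
  , modulePath :: FilePath
  } deriving (Eq, Show)

-- Split the parse results once, then build the name-to-path map from the
-- successes only.
splitParsed
  :: [Either String (FilePath, ParsedModule)]
  -> ([String], M.Map String FilePath, [ParsedModule])
splitParsed mods =
  let (errs, parsed) = partitionEithers mods
      modules = map snd parsed
      filePathMap = M.fromList [(moduleName m, modulePath m) | m <- modules]
  in (errs, filePathMap, modules)

-- Print every parse error to stderr; True means it is safe to continue.
reportErrors :: [String] -> IO Bool
reportErrors errs = do
  traverse_ (hPutStrLn stderr) errs
  pure (null errs)
```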

@MaybeJustJames

@colinwahl do you have bandwidth to finish this off? Can I help?

@MaybeJustJames

I've rebased on current master here.

@colinwahl
Contributor Author

@MaybeJustJames would you like to take this over from me? My bandwidth for compiler work is pretty low these days.

The big open question is that I'm still not sure if #4092 (comment) could lead to any problems.

@rhendric
Member

As the question is over a year old, I'm inclined to say let's ship it and find out.

@MaybeJustJames

Happy to take over. How would you like to do it?

@colinwahl
Contributor Author

> Happy to take over. How would you like to do it?

However you'd like to do it is fine with me - you could continue off this PR, or make a new branch and cherry-pick my changes, or just close it and start again from scratch. Let me know what you decide and if I should close this PR!

@MaybeJustJames

> However you'd like to do it is fine with me - you could continue off this PR

IMHO it would be a shame to lose the context here. Could I get write permission to your branch so I can pick up from here?

@f-f
Member

f-f commented Sep 28, 2022

I was wondering - is this work necessary at all now that we have the backend optimizer?

@colinwahl
Contributor Author

> I was wondering - is this work necessary at all now that we have the backend optimizer?

Supporting an optimizer was certainly my main goal for this - now that we've got purescript-backend-optimizer, I don't think I'd use the command (at least, I don't have anything in mind ATM). However, maybe someone's got other ideas :)

@rhendric
Member

It does strike me as a loss if the best general-purpose JavaScript backend for PureScript remains in a third-party project in the long term. I'm not sure how exactly this happened—I suspect the friction to contributing to PureScript is just too high for this level of innovation—but with enough time I would hope it can be mostly unforked. At that point, exposing the backend used by purs becomes a feature of interest again, unless the unforking includes some other mechanism for making the CoreFn-handling pipeline extensible.

@f-f
Member

f-f commented Sep 28, 2022

@rhendric I agree with you - my point here is that the new backend has shuffled the landscape quite a bit: it shows not only that it's possible to aggressively optimise the CoreFn, but also that it's possible to emit more performant JS outside of the compiler, and all of this while the implementation is in PureScript.
Given this new perspective, I am suggesting that we reconsider the premises for this command to exist at all - with the baseline being that everything exposed by the compiler is a public API that we can't deprecate easily, e.g. see how long it took to remove bundle - and whether the new project offers a better way to achieve the goal of this work (that is: generate better JS).
I am sure that this work will be unforked in the long term, hopefully while lowering the barrier for contribution, for example by showing that we can implement chunks of the compiler (or even all of it) in PureScript itself.

@rhendric
Member

Okay yeah, I agree with looking at bundle as an example. We got rid of bundle when the ecosystem around ES modules matured enough and we did enough work on our codegen that we could recommend another no-regrets tool to replace it; waiting for those things to happen was what made deprecating bundle take so long, as far as I know.

Is purs-backend-es already that no-regrets tool for codegen? It's very impressive but also very young and possibly more aggressive than some of our users want. If it becomes that tool in the future, I don't see a significant barrier to ripping codegen back out, along with all the codegen internals. Just like with bundle, we'll paper over the switch in spago and basic users won't need to be aware of it.

In the meantime, as long as there's some value in having a built-in JS backend (regardless of the language in which the backend is written or the repo in which it lives), I think there's still a case for exposing it, so users can benefit from custom optimizations and rewrites without needing to also use a third-party backend.

@natefaubion
Contributor

natefaubion commented Sep 29, 2022

> Is purs-backend-es already that no-regrets tool for codegen?

purs-backend-es does not subsume compiler functionality.

  • It is not incremental, so right now it's largely targeted at production builds. If we want to separate out the backend from the core compiler, then the compiler must be able to inform backends of incremental status; otherwise all backends have to duplicate the work the compiler has already done to sort out what needs to be built, which is a complete waste of non-trivial work and resources. I would like to make it incremental in the interim by just depending on cache-db.json, whether or not the compiler considers it a stable target, because it's the only realistic way to get that information.
  • It does not emit source maps. I personally have no intention of ever implementing this without near-unanimous support from the community that it's something people use, since I consider the power-to-weight ratio to be extremely poor.

So, I do not see any near-term future where the current JS backend is rendered obsolete, though I would like a near-term future where something like purs-backend-es can be used in a first-class way. That being said, if there are currently no pending users of this feature, I'm not sure what the point is. I think the fresh name issue seems like it can clearly cause a problem, however unlikely, and I'm not sure how you'd fix it. I don't know how I feel about merging a feature with uncertain prospects and potentially buggy behavior.

I'm happy to talk about purs-backend-es background/motivation in general, but I don't think this is the place. If you have any thoughts or questions, I'd love to hear from you on discourse!

@MaybeJustJames

> I think there's still a case for exposing it, so users can benefit from custom optimizations and rewrites without needing to also use a third-party backend.

I agree. I wanted to get this through for zephyr specifically. There is potential for other optimization tools to make use of this interface even if an improved backend is eventually merged.

@JordanMartinez
Contributor

I think this PR can be closed, right?

@MaybeJustJames

I would still vote to merge for the zephyr | ${other_optimizer} use case.

@MaybeJustJames

Is the vision for purescript to have multiple codegen backends? If non-JavaScript backends are always going to be separate projects, then maybe it makes sense for JavaScript codegen to be separate too, in which case this PR should be closed. If the vision for the compiler is to include multiple backends, then I think a codegen command will remain useful.


Successfully merging this pull request may close these issues.

Consider purs codegen command
7 participants