Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow most leading and trailing delimiters #4345

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rhendric
Copy link
Member

Description of the change

Inspired by this Discourse post, I made a small patch to the CST to support optional leading and trailing delimiters in almost every location in the grammar where I didn't think it would cause problems.

The one place I was less than maximally liberal was in multi-headed case expressions. When I tried to make extra delimiters optional in patterns, something prevented it from working in my tests—probably related to the fancy ad-hoc monadic parsing that is used in that part of the grammar.

Example of what I'm talking about:

case
  , expr1
  , expr2
of
  , BigPattern1
  , BigPattern2
    -> ...
  , BigPattern3 -- this is where the issue was
  , BigPattern4
    -> ...

Rather than try to fix it, I chose not to enable extra delimiters in that part of the grammar; and then, for consistency, also removed support for extra delimiters from the head of the case as well (even though those worked fine). While this shortcoming is regrettable, I think it'd be quite low down on the list of places in the grammar where users would want to use extra delimiters.

Also note that using leading and trailing delimiters together, a la [,1,], is not supported anywhere. It would have been trivial to do so but I have standards, sir, standards!

Despite submitting this PR and the voiced support for at least some of the syntax changes included here, I have two misgivings. The first is that I don't know if any of these changes break any assumptions made during the layout phase of lexing, which does seem to be sensitive to where commas appear. I haven't noticed any issues but I don't have a thorough understanding of that algorithm. The second is that, by cravenly refraining from picking a side between the delimiters-first versus delimiters-last styles of code formatting, I fear I may stoke the flames of utterly unproductive format wars. There is definitely an argument for having one clear way to do simple things in a language and I don't know how strongly that argument should apply here.

Regardless of the above reservations, this PR is ready for review and discussion, though I don't consider it all that urgent.


Checklist:

  • Added a file to CHANGELOG.d for this PR (see CHANGELOG.d/README.md)
  • Added myself to CONTRIBUTORS.md (if this is my first contribution)
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

@toastal
Copy link

toastal commented May 30, 2022

I fear I may stoke the flames of utterly unproductive format wars

This is why I initially brought up specifically ADTs to narrow the scope 😅

@JordanMartinez
Copy link
Contributor

This PR can be broken down into smaller changes and their corresponding discussions. I'll provide an example of the change and name it:

-- Optional leading pipe character for first data constructor
data Foo =
 | Bar
 | Baz

I think I'm good with making this change because

  1. it allows one to swap the order of data constructors (sometimes useful for a compiler-derived Ord instance) using the ALT+↓ or ALT+↑ keyboard shortcut.
  2. it allows one to comment/uncomment data constructors while prototyping.
  3. developers hit annoyances with current syntax limitations more frequently than other syntax (e.g. module import, export, etc.)
  4. it's a small change to the language, so it doesn't feel that different and limits the fallout of a potential formatting war.
-- Optional trailing pipe character for last data constructor
data Foo =
  Bar |
  Baz |

This change feels a bit weird to me because the pipe character is no longer in front of the data constructor. So, this change feels like a "bigger" one than the one above. Granted, I know that Foo = Bar | Baz is valid and so adding another | to the end (e.g. Foo = Bar | Baz |) isn't that different. However, putting these on multiple lines is where things started to feel weird.
I think I'd prefer the first change and not this change if any change is made at all because I don't see what value this change would provide that isn't already supported by the first change.

The only argument I can think of against this is that one could copy-paste the above constructors into a file and then replace | with -> to immediately get all data constructors for a case _ of expression. Doing the same with the first version requires deleting | and then appending ->, which is a bit more work. However, I don't think this "problem" should be solved by supporting this syntax in the language. Rather, the IDE should support such functionality (if it doesn't already). To my knowledge, it doesn't currently support this.

As for module exports, below are a few examples of how this could be formatted:

-- Current syntax. Uses 4 lines
module Module1
  ( (<>)
  , b
  ) where

-- Optional leading comma character for first export
-- uses 5 lines
module Module2
  (
  , (<>)
  , b
  ) where
  
-- Same as Module2 but formatted differently to use 4 lines
module Module3 (
  , (<>)
  , b
  ) where

-- Optional trailing comma character for last export
-- uses 4 lines
module Module4 (
  (<>),
  b,
  ) where

Let me briefly comment on these examples. The order of exports determines their order in the resulting documentation. Module1 can't re-order the (<>) export using the ALT+↓ keyboard shortcut. Similarly, Module1 can't comment/uncomment a line to add/remove the export. Module2 can for both but, unlike the data constructor example above, it adds an unnecessary line. Module3 fixes Module2's issues, but I don't have a good argument for why Module4 shouldn't be accepted since it also enables both.

That being said, I don't know if I care about the module export syntax. The only time someone touches export lists manually is when they want to prevent some things from being exported. Since this happens infrequently, I think that's an argument against making this change: this small and infrequent annoyance due to a syntax limitation is less of an issue than formatting wars.
If this change was made, I'd be ok with Module3, but I think I'd prefer Module4 because it just feels cleaner than Module3. That being said, I wouldn't want Module4 unless purs-tidy also works with it.

(Side note... While @rhendric added a test for Maybe(Just, Nothing), I don't think this isn't needed. A change was made sometime ago in the language that requires either all or none of a type's data constructors to be imported/exported, but I don't think the parser was updated to account for that.)

As for module imports, I don't know if this issue matters that much either. Neither the order of the modules, nor their import lists matter, so long as they are imported. Moreover, the IDE can add imports via autocompletion, remove unused imports by applying the suggestions inferred after analyzing import-related compiler warnings, and sort them if one is already using a formatter like purs-tidy. So the ALT+↓ concern is less of an issue. Similarly, if one is already using purs-tidy, most imported identifiers appear on the same line, so the comment/uncomment workflow isn't as relevant. Lastly, since most of this is handled by the IDE, developers rarely encounter issues with this that require manual intervention.
All that to say I don't think there's a strong argument for changing the import syntax at all.

As for type class constraints in type class definitions and instances, things are a bit different than module import syntax. Constraint order does not matter, so no argument here due to the ALT+↓ concern. However, constraints must be added manually (i.e. no IDE autocompletion). Another difference is that sometimes the constraints appear on one line and other times appear on multiple lines. So, the commenting/uncommenting workflow can but does not always apply here:

-- single line
class (Foo a) => Bar a

-- multi-line
instance
  ( Foo a
  , Foo b
  ) => Foo (Tuple a b)

So, I think the comment/uncomment workflow argument I made for module export syntax can apply here, but the ordering one and the infrequency arguments don't apply. Given this tradeoff, I think avoiding a formatter war is more important than removing this syntactic limitation.

As for the leading/trailing comma for array and record syntax, I am sympathetic to the idea of supporting the trailing comma because one could copy-paste JS arrays/records into PureScript and more cases would compile despite the trailing comma. (This "problem" could also be resolved by making purs-tidy smart enough to still parse the code, drop the trailing comma, and format it. But then purs-tidy supports parsing PureScript with extensions, and that seems like a bad precedent to start.) In my experience, copy-pasting JS array/record values into PS code happens infrequently and generally requires updating parts of the JS value anyway, so I don't think this is a strong argument.

However, the current array/record syntax do not support the ALT+↓ keyboard shortcut or the comment/uncomment feature. And unlike the module syntax example, the lines of code between the leading and trailing versions aren't different and this syntax limitation would be something a developer experiences frequently:

-- leading
[
, a
, b
]

-- vs

-- trailing
[
  a,
  b,
]

If we did support a change, I would prefer supporting a leading delimiter for array/record syntax and not a trailing delimiter because it's a "smaller" change. However, I think such a change should only be made if it works in all contexts, including single or multi case _ of statements. Otherwise, we'd have to tell beginners, "You can use this array/record syntax in context A, but not B", making the language a bit harder to learn.

Phew! All that being said, I think this is where I stand:

  • I support leading | in data constructors.
  • I support leading , in array/record syntax if the below conditions are satisfied:
    • single/multi-head case _ of syntax works
    • purs-tidy works on it and Nate is in agreement with this change (out of respect for Nate since he wrote both the current parser and purs-tidy, both of which took a lot of time and effort)
  • I don't support changing the module import, module export, or type class constraint (definition or instance) syntax.

@rhendric
Copy link
Member Author

rhendric commented May 30, 2022

First, from a purely what-does-this-enable-for-users perspective, let me note that there is absolutely no difference between leading and trailing delimiters. In particular, all of your numbered arguments for | Bar apply without change to Bar |. (I disagree with your assessment that trailing delimiters is a ‘bigger’ change in this case than leading delimiters—if it seems that way, surely it's just because the formatting style you're most used to approximates leading delimiters in this case?) Both are equivalent implementations of what I'll call the ‘Uniform Line Syntax’ principle: the idea that if a sequence of syntax elements can be laid out one per line, any additions, deletions, or permutations of the sequence should be possible to achieve with corresponding additions, deletions, or permutations of lines, one for one, with no special consideration needed for first or last lines.

So the way I see this proposal is this:

  1. Is the ULS principle worth implementing anywhere in our grammar?
  2. If so, where?
  3. In those positions, should ULS be implemented with leading delimiters, trailing delimiters, or either? And to what degree should this question be informed by:
    • Trends in the formatting of current PureScript, or other programming languages that don't support ULS
    • The design of other programming languages that do support ULS
    • The answer to this question in other positions

I think 1 is a clear ‘yes’. I think the answer to 2 is most positive for sum type definitions and array/object literals and patterns, followed closely by object update expressions and row types; I also think a decent case can be made for import/export lists. I'm neutral on constructor lists within import lists, fundeps, compound constraints, and multi-head case expressions—I don't expect those to excite anyone, until the one user three years from now who'll say, ‘Hey, I'm designing a class with seventeen possible functional dependencies that I want to toggle and I don't know why I can't use (leading/trailing) commas there just like I use them with big arrays.’

3 contains the details harboring the devil. Here's what I think I know:

  • For sum types, there is a clear desire for leading |, supported by the syntax of ML and by the dominant formatting style in Haskell and PureScript to date.
  • For array/object literals, I expect support for leading versus trailing , to be split. The leading side is supported by a general ‘symbols first’ trend in the formatting of PureScript and Haskell. The trailing side is supported by the fact that it's the style of ULS much more commonly found in the equivalent of array/object literals in other programming languages: JavaScript, Python, Ruby, Perl, PHP, Lua, C, C#, Java, Kotlin, Scala, Rust... and those are just the ones I know off the top of my head. I can think of several programming languages besides PureScript and Haskell that don't support leading or trailing commas in such expressions (ML, F#, Groovy, SQL, Prolog), but I personally can't think of a single language, popular or esoteric, that supports leading commas and not trailing commas. Especially given the syntactic synergy that our literals have with JavaScript—we support almost all of the same literals with the same syntax and the same semantics—I think it would be foolish to implement ULS anywhere without specifically allowing trailing commas in array and record literals.

As for the rest... I could see a mostly conservative proposal of only leading | for sum types, only trailing , for array-ish and object-ish stuff (encompassing literals, patterns, updates, and row types), and no ULS anywhere else. I could see a consistent-by-character proposal of leading | and trailing , wherever applicable. I could see a leading-delimiters-allowed-generally with a specific carve-out for also allowing trailing , for array-ish and object-ish stuff. And I can see the mostly liberal proposal of allowing both types of ULS generally. I think I'd have a tough time supporting more complicated ways of slicing this design space; I'd rather not make it difficult to remember where leading and trailing delimiters are allowed by making those choices in too ad-hoc a way.

To some of your specific points:

While @rhendric added a test for Maybe(Just, Nothing), I don't think this isn't needed. A change was made sometime ago in the language that requires either all or none of a type's data constructors to be imported/exported, but I don't think the parser was updated to account for that.

Exported, but not imported. It's still legal, and sometimes useful, to import some but not all of a type's constructors; and so it's still possible to want to individually add and remove from that list, which means that ULS can still add (an admittedly small amount of) value here.

  • I support leading , in array/record syntax if the below conditions are satisfied:
    • single/multi-head case _ of syntax works

Just to clarify, the multi-head case issue has nothing to do with array/record patterns. These commas:

case
  , expr1
  , expr2
of
  , BigPattern1
  , BigPattern2
    -> ...
  , BigPattern3 -- this is where the issue was
  , BigPattern4
    -> ...

are not the same as these commas:

case someArrayValuedExpression of
  [
  , BigPattern1
  , BigPattern2
  ] -> ...

@toastal
Copy link

toastal commented May 31, 2022

I just saw some Haskell code where the imports had trailing commas. It felt a little weird and I agree this is mostly a bias on how ML-code should look like rather than something objective. This could be enforced with a formatter ({ trailingCommas: Always }) but it does add complexity and fork in styles. I believe it would be less like a "war" than a bias to the familiar (do you come from JavaScript or Haskell?). I think both sides are correct and there is subjectivity at play for liberal usage for maximal expressiveness + principally odd to choose only leading instead of both vs. conserving and keeping harmony in the feel/aesthetic throughout the language.

(but +1 leading pipe on ADTs specifically 😉)

@JordanMartinez
Copy link
Contributor

if it seems that way, surely it's just because the formatting style you're most used to approximates leading delimiters in this case?

That is definitely the case.

Exported, but not imported. It's still legal, and sometimes useful, to import some but not all of a type's constructors; and so it's still possible to want to individually add and remove from that list, which means that ULS can still add (an admittedly small amount of) value here.

I tried import Data.Maybe (Maybe(Just)) on Try PureScript and you're right. Thanks, I didn't know that.

@purefunctor
Copy link
Member

I would say that one thing to consider is to determine whether or not these changes would end up becoming the "recommended" formatting to use for writing PureScript. A few years down the line, would we see new documentation making use of and potentially recommending these optional formatting styles? I've no strong arguments about allowing trailing | and , from an aesthetic and functional perspective though. OTOH, I think allowing ULS for import and export lists should be a separate PR unless it warrants support from users.

@natefaubion
Copy link
Contributor

natefaubion commented Jun 1, 2022

I am moderately 👎 on this, mainly because maximal syntactic expressiveness is not something I'm really interested in. Regarding what @toastal wrote:

I think both sides are correct and there is subjectivity at play for liberal usage for maximal expressiveness + principally odd to choose only leading instead of both vs. conserving and keeping harmony the feel/aesthetic throughout the language.

I'm much more inclined to land an overall harmony. I'm least 👎 and perhaps somewhat 👍 on leading pipes, but I just don't really agree regarding commas. Trailing commas just don't scale well overall with our syntax.

example = [
  case foo of
    ...
    Foo ->
      ... some big expression, -- <- so easy to miss
]

The way to make that stand out, and not be so hard to miss would be to put a newline before the trailing comma, which I think is an even more bizarre style. I personally really don't want to see trailing commas be a thing. It would be a hard pass for me to support it in tidy, since I would be supporting a style that has zero usage in the wild. I think users by far would prefer no commas at all, so for me this is solving the wrong problem.

@ursi
Copy link

ursi commented Jun 1, 2022

I think less syntax is better. Less variation to parse mentally, less things to think about when deciding how to format your own code, less things for tooling to have to deal with, so I'm not a fan of this change. That being said, I do think good multi-line syntax is important, and I think maybe the one case above where something that doesn't currently have a good multi-line syntax now, would, is case statements with leading commas. That being said. I'm not sure how often I use multi-line case statements in my code (potentially never), so I'm not sure I feel it's worth the extra syntax.

@wclr
Copy link
Contributor

wclr commented Jun 10, 2022

I don't think this is a good idea, this adds even more inconsistency of code styles and makes things look drifting to JS style excessive flexibility.

@davezuch
Copy link

It seems like leading pipes are at least much less controversial than the rest of this PR. Maybe those can be extracted out into their own PR and discussed? Personally, I would really like to have leading pipes, but I don't have strong feelings about the rest.

@toastal
Copy link

toastal commented Jun 16, 2022

@natefaubion you bring up a good point with leading commas. It seems there's a stylistic reason to try to keep a consistent left-hand character which would match leading pipe.

type Result e a =
  | Success s  …
  | Failure e 

I guess by that argument I could see an case for stylistic leading characters too like:

burgerToppings  Array String
burgerToppings =
  [
  , "bacon"
  , "peanut butter"
  , "fried banana"
  ]

or

burgerToppings  Array String
burgerToppings = [
  , "bacon"
  , "peanut butter"
  , "fried banana"
]

This makes it easier to move/modify the first element, though it looks a little weird.

@i-am-the-slime
Copy link
Contributor

An interesting tidbit, I find, is something I've liked in zig which supports trailing commas.
The (language enforced) formatter will format things without trailing commas in them to be on a single line and when there's a trailing comma the code will be formatted to be on multiple lines.

@toastal
Copy link

toastal commented Dec 5, 2022

I may have changed my stance about the comma situation. I'm definitely still pro-leading pipe above anything, but someone pointed out to me that leading commas doesn't play the best with tabs (as opposed to spaces). There's an open issue about tab support for accessibility, and I'm inclined to support it (because my eyes have had trouble reading 2-space (and 1-space) indentation for the last 2 years). The last thing you want to do is start mixing the two though.

For

[ a
, { b:
    { c: unit
    , d: unit
    }
  }
}

you could put tabs after the commas/opening symbols like

[	a
,	{	b:
			{	c: unit
			,	d: unit
			}
	}
}

or maybe you embrace the trailing commas syntax (easiest to read IMO)

[
	a,
	{
		b:	{
			c: unit,	
			d: unit,
		},
	},
]

The examples I last posted are I believe harder to read in this case.

[
	, a
	, {
		, b: {
			, c: unit
			, d: unit
		}
	}
}

(if this looks bad and too spaced out in your browser, it's because because browsers don't make it easy to configure your preferred tab width, but a userStyle can cover it; I did raise as issue with Mozilla)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants