Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure analysis server tolerates non-whitespace edits from the formatter when minimizing edits #53561

Closed
DanTup opened this issue Sep 19, 2023 · 5 comments
Labels
analyzer-lsp Issues with analysis server's support of Language Server Protocol area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P2 A bug or feature request we're likely to work on

Comments

@DanTup
Copy link
Collaborator

DanTup commented Sep 19, 2023

Currently the formatter API takes Dart code as string input and outputs formatted code also as a string. In editors, just replacing the entire string can lead to markers in the code (such as breakpoints) losing their positions. Instead, only the modified parts of the document should be changed.

The server handles this (source) by parsing the before/after code and comparing the token streams to find which whitespace changed. This is much simpler than a real diff but only currently handles whitespace changes.

There are future plans for the formatter to handle adding/removing commas in some situations so this code will need updating to avoid falling back to a full-document-edit.

Some possible fixes:

  1. Update the code to tolerate the insertion/removal of comma tokens
  2. Change the code to do a real diff so it supports any future changes from the formatter
  3. Have the formatter able to provide the ranges/replacements it makes in a structured way that we can map directly

The first option is likely the simplest, but also only addresses commas. The second option requires finding or writing a good diff (and will likely be more expensive to run on large files). The third option would perhaps by the ideal solution, but depending on how the formatter works might not be trivial.

@munificent I guess some questions for you:

  1. Are the changes isolated only to inserting/deleting commas (in addition to whitespace), or might there be more?
  2. Does the formatter work in a way that the individual ranges/replacement texts could be exposed? Would it make sense to try and do this?
@DanTup DanTup added area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. analyzer-lsp Issues with analysis server's support of Language Server Protocol labels Sep 19, 2023
@munificent
Copy link
Member

  1. Are the changes isolated only to inserting/deleting commas (in addition to whitespace), or might there be more?

The current plan is yes. It may also move commas relative to comments as well, as in:

// Before:
function(argument // Comment.
,);

// After:
function(
  argument, // Comment.
);
  1. Does the formatter work in a way that the individual ranges/replacement texts could be exposed?

No, it doesn't. It basically tokenizes and parses the input then builds its own separate output string from scratch based on that. It doesn't think in terms of modifications, it just builds the output from the intermediate representation.

Would it make sense to try and do this?

Probably not. Having the locations of all of the tokens (or contiguous unsplit series of them) would be pretty hard and probably slow things down a lot.

However, the formatter can preserve a selection. So it understands that you can give it two offsets into the input code and it will you the offsets that they end up at in the resulting code. It would be fairly easy to generalize that to an open-ended number of offsets.

Given that, we could just track the offsets of, like, all the tokens and figure out the moves based on that. But a more efficient thing to do would probably be to pass in the offsets of the breakpoints, and then the formatter will tell you where the breakpoints ended up. That way, the only offsets it needs to track are the ones that actually matter.

@DanTup
Copy link
Collaborator Author

DanTup commented Sep 20, 2023

It may also move commas relative to comments as well

Ah, that's good to know, thanks!

No, it doesn't. It basically tokenizes and parses the input then builds its own separate output string

I had a feeling that was the case, thanks for confirming.

But a more efficient thing to do would probably be to pass in the offsets of the breakpoints, and then the formatter will tell you where the breakpoints ended up. That way, the only offsets it needs to track are the ones that actually matter.

Unfortunately it's not possible to do it like this - we don't have access to this kind of context and we don't get to run any code here - we just have to implement an API that can provide edits that format the document. It's also not restricted to breakpoints, even things like the syntax highlighting are preserved (until the next request completes) by applying small edits instead of replacing the doc.

For now, I think I'll try adding comma (+attached comment) support to the existing code. If it turns out to add significant complexity we can re-evaluate.

(cc @bwilkerson in case he has other suggestions/opinions)

@munificent
Copy link
Member

munificent commented Sep 20, 2023

Unfortunately it's not possible to do it like this - we don't have access to this kind of context and we don't get to run any code here - we just have to implement an API that can provide edits that format the document. It's also not restricted to breakpoints, even things like the syntax highlighting are preserved (until the next request completes) by applying small edits instead of replacing the doc.

Ah, that makes sense.

It might be worth considering just running a diff algorithm on the before and after text like you suggested. That won't be particularly fast, but if it only applies to the current file at the point that the user is doing an already slow-ish operation like saving to the filesystem, it could be acceptable.

I'll think some more about how the formatter could track this while formatting. But my hunch is that doing so in general would add a lot of overhead and make all formatting slower, which I wouldn't want to do in the general case since it would punish batch formatter for no benefit since the tracked moves wouldn't be used.

But maybe there's a way to make the tracking optional.

For now, I think I'll try adding comma (+attached comment) support to the existing code.

If it makes things easier, I could also not move commas relative to comments. I'm working on that right now because I think it leads to a generally better user experience, but it might also be reasonable to expect users to put comments in the right place. But, if it's doable, I do think it would be nice if the formatter could swap the order of commas and comments in places where that makes sense. Ibe

@DanTup
Copy link
Collaborator Author

DanTup commented Sep 20, 2023

It might be worth considering just running a diff algorithm on the before and after text like you suggested. That won't be particularly fast, but if it only applies to the current file at the point that the user is doing an already slow-ish operation like saving to the filesystem, it could be acceptable.

There are options for things like "format on type" where we auto-format when you type some specific characters (like ; or }). I have this turned on and quite like it - but I don't know if running a real diff across a large document would be reasonable in that case.

If it makes things easier, I could also not move commas relative to comments.
...
But, if it's doable, I do think it would be nice if the formatter could swap the order of commas and comments in places where that makes sense.

Agreed, it'd be nice to not ditch this - I'll have a go at it and if I don't think it can be made to work with the current mechanism we can review the other options (including full diff, etc.). Thanks!

@munificent
Copy link
Member

but I don't know if running a real diff across a large document would be reasonable in that case.

It's been a while since I dug into diff algorithms, but I believe most will be O(n) in the length of the document and fast in practice if the actual differences are small and confined to a small region. If most of the file has already been formatted and just a couple of lines in the middle have changed, I would hope (but don't hold me to this) that running a diff would be pretty quick.

@pq pq added the P2 A bug or feature request we're likely to work on label Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer-lsp Issues with analysis server's support of Language Server Protocol area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P2 A bug or feature request we're likely to work on
Projects
None yet
Development

No branches or pull requests

3 participants