-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add comments to format output #4397
base: main
Are you sure you want to change the base?
Conversation
One basic problem here is that we're trying to use PL. But that doesn't actually fit with our approach for using comments. For example, if a binary expression has a comment within it: from foo
derive bar = x + # comment
\ y ...it can't fit into our current approach, because the code for writing the binary expr doesn't allow writing any comments in the middle of that binary expression: prql/prqlc/prqlc/src/codegen/ast.rs Lines 109 to 121 in c7f1235
So we may need to rethink the current approach — for example something that iterates through the lexer output, and looks up their meaning in the PL. |
I don't see a problem here. The comment would be emitted by the |
But how would the example above be written? Currently |
I was thinking something like this:
|
My question is — what if the comment is in the middle of these? For example: r += opt.consume(" ")?; // writes " "
r += opt.consume(&op.to_string())?; // writes "+"
// What if there's a comment here??
r += opt.consume(" ")?; // writes " " |
Oh I see. A way back when we talked about emitting comments, I imagined to have a wrapper around expr codegen that would:
Your case would then look like:
// writing out BinOp(x, +, y)
let mut r = String::new();
let left = write_within(left.as_ref(), self, opt.clone())?; // this wrapper would see the next comment, but determine that is not ready to be written
r += opt.consume(&left)?; // writes "x"
r += opt.consume(" ")?; // writes " "
r += opt.consume(&op.to_string())?; // writes "+"
r += opt.consume(" ")?; // writes " "
r += &write_within(right.as_ref(), self, opt)?; // this wrapper would see the next comment and see that it appears before `y`, so it should be written before `y`
Some(r) |
from foo
derive bar = x + # comment
\ y
from foo
# comment
derive bar = x + y What do you think the best path forward is? This seems like a tractable change which could let us land Whereas rewriting this to iterate through tokens — which I think is the closest approach which honors exact comment / line-wrap placement — might be a big rewrite... |
I don't think you understood me: approach I've described would not move comments "trough" expressions. In your example it would keep the comment before It would however sometimes move it "trough" tokens that are not full expressions ( |
My general approach would be to make something, even if we later decide that it needs to be rewritten for whatever reason. But this quite a hard thing to do in general, so if it not something you would enjoy doing, there are things that would be more beneficial to the project. At least that's my perception of importance. |
Sorry, you're right, my example wasn't good. A better one is from prql/prqlc/prqlc/src/codegen/ast.rs Line 372 in c7f1235
...but this is very esoteric. OK great, thanks, let's go ahead with this! I will work on it |
f7157c5
to
73c0504
Compare
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
I think this mostly "works", but is extremely bad code, some of the worst I've written :). I simplified the before / after comment functions, but this means that I need to do a quick change to allow comments that appear before any tokens to work. One major issue is that in: from employees # comment ... I'm not sure the best way to do that — I currently have some really hacky flag in the It's also really inefficient, which isn't super important at this stage, but as an FYI. I'm still thinking that iterating through the tokens and deciding where to put them based on looking them up in the PL would be a better model. But as above — that probably requires a big rewrite, and I'd be quite keen to land something. |
// FIXME: why isn't this working? | ||
// .position(|t| t.1.start == span.start && t.1.end == span.end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if expression 1 + x
has span=0..5
, the tokens will have spans 1: 0..1 +: 2..3 x: 4..5
.unwrap_or_else(|| panic!("{:?}, {:?}", &tokens, &span)); | ||
|
||
let mut out = vec![]; | ||
for token in tokens.0.iter().skip(index + 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want this here:
tokens.0.iter().skip_while(|t| t.span.start < span.end)
... skipping all tokens that were before or within the current expression.
match token.kind { | ||
TokenKind::NewLine | TokenKind::Comment(_) => out.push(token.clone()), | ||
_ => break, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, and this is:
.take_while(|token| matches!(token.kind, TokenKind::NewLine | TokenKind::Comment(_))
assert_snapshot!(format_prql( r#" | ||
from employees | ||
# test comment | ||
select {name} | ||
"# | ||
).unwrap(), @r###" | ||
from employees | ||
# test comment | ||
|
||
select {name} | ||
|
||
"###); | ||
|
||
assert_snapshot!(format_prql( r#" | ||
# test comment | ||
from employees # inline comment | ||
# another test comment | ||
select {name}"# | ||
).unwrap(), @r###" | ||
from employees # inline comment | ||
# another test comment | ||
|
||
select {name} | ||
"###); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is got me excited about this feature!
/// The lexer tokens that were used to produce this source; used for | ||
/// comments. | ||
pub tokens: TokenVec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question if we could prematurely optimize this:
Do we need all of the tokens? Wouldn't just comments (and new lines maybe) be enough to get the same functionality?
A heads up: the It was supposed to describe in what conditions something should be written in: A few examples:
An important notion here is that when you enter an context of, for example, parenthesis, the new indentation is present only within the parenthesis. When that call ends, the indentation needs to be reset. That's why it is convenient that these I assume you wanted to have global place for the |
It wasn't for the perf! It was because I think need a way to hold state, to pass down whether we should be grabbing comments from before the expr. We only want to do that for the first expr of the statement, and need to tell the other exprs that they shouldn't look before themselves, only after1.
Footnotes
|
8263423
to
4e47f09
Compare
Yup, iterating trough the tokens sounds like a good idea. It would benefit the perf and would also take care of the problem you describe where child exprs find comments already consumed by parent expr. |
OK so I did some light research and one approach seems close to our current one, and will avoid the current issues in the PR: Iterate over the tokens, and add each token to a node in the PL AST. That way, we don't have the ambiguity of whether the comment in It does mean that we either need:
The HashMap seems like the easiest to do; though also the least inspectable and the least easy to move to the full "iterate over both the PL and the tokens". I'll probably do that.... |
This needs some work, as it's really hacky. But wanted to get back into making progress. Will refine before merging.
Any feedback welcome; comments are inline describing a couple of the issues...