
Switch to New Parser #55
Open · wants to merge 43 commits into main


iwillspeak (Owner):

Bringing the new parser up to speed. Replacing all uses of the old parser with the new one, and
binning the old parser.

TODO:

  • Binder
  • Macros
  • Compiler
  • tests

@@ -1234,7 +1234,7 @@ module Compilation =

let ast, diagnostics =
let nodes, diagnostics =
List.map Parse.parseFile sources
List.map LegacyParse.parseFile sources
|> List.fold (fun (nodes, diags) (n, d) -> (List.append nodes [ n ], List.append d diags)) ([], [])

{ Location = TextLocation.Missing

iwillspeak (Owner, Author):

We probably want to come up with a better way of modelling this. The previous parser was run on each input, and then we pasted the results together into a single big SEQ. Instead, I guess we should have a wrapper Compilation type which contains a set of BoundCompilationUnits. We can bind them in sequence, and propagate some state (lib definitions etc.). The key is we don't want public items from one unit to just become 'magically' available in another. This will probably need to be different for script compilations, however. In that case we'd want each compilation pass to share definitions from the previous one.

We may want to handle this by just having the last compilation unit in the compilation be treated specially and allow definitions from that to leak into the root binder scope. That way inherited compilations from scripting would have access to the previous definitions.
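A minimal F# sketch of that wrapper idea. Everything here is hypothetical: `CompilationUnit`, `CompilationResult`, and `bindCompilation` are illustrative names rather than Feersum types, and "binding" is reduced to checking which names each unit can see. Units are bound in order; library names propagate forward, top-level globals stay private to their unit, and only in script mode do the last unit's globals leak into the root scope for the next pass.

```fsharp
// Hypothetical model only: not Feersum's real binder types.
type CompilationUnit =
    { Name: string
      Globals: string list    // top-level definitions, private to this unit
      Libraries: string list  // library names this unit exports
      Uses: string list }     // names this unit refers to

type CompilationResult =
    { Unresolved: (string * string) list  // (unit name, missing name)
      RootScope: Set<string> }            // scope a follow-on script pass starts from

let bindCompilation (isScript: bool) (rootScope: Set<string>) (units: CompilationUnit list) =
    let step (libs: Set<string>, unresolved) (cu: CompilationUnit) =
        // A unit sees the root scope, libraries exported by earlier units,
        // and its own globals, but never another unit's globals.
        let visible = Set.unionMany [ rootScope; libs; Set.ofList cu.Globals ]
        let missing =
            cu.Uses
            |> List.filter (fun name -> not (Set.contains name visible))
            |> List.map (fun name -> cu.Name, name)
        Set.union libs (Set.ofList cu.Libraries), unresolved @ missing

    let _, unresolved = List.fold step (Set.empty, []) units

    // Only script compilations let the last unit's globals leak into the root scope.
    let nextRoot =
        match isScript, List.tryLast units with
        | true, Some last -> Set.union rootScope (Set.ofList last.Globals)
        | _ -> rootScope

    { Unresolved = unresolved; RootScope = nextRoot }
```

A later script pass would then start from the previous result, e.g. `bindCompilation true previous.RootScope nextUnits`, which is how inherited compilations could see earlier definitions without ordinary units leaking into each other.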

src/Feersum.CompilerServices/Syntax/Parse.fs (outdated)
|> Seq.choose (NodeOrToken.asToken)
|> Seq.tryFind (tokenOfKind AstKind.OPEN_PAREN)

member public _.Body = red.Children()

iwillspeak (Owner, Author):

Need to do something about form bodies


member _.Text =
red
|> NodeOrToken.consolidate (fun n -> n.Range) (fun t -> t.Range)

iwillspeak (Owner, Author):

Needs to actually get the text, not the range.


static member TryCast(red: SyntaxNode) =
if red.Kind = (AstKind.CONSTANT |> astToGreen) then
new Constant(red) |> Some
else
None

type Expression =
| Form of Form

iwillspeak (Owner, Author):

Inheritance might be better than this, along with active patterns.

ReadLine.Read("[]> ")
|> Parse.readProgram "repl.scm"

let private print (result: ParseResult<Program>) =

iwillspeak (Owner, Author):

Some kind of AST explorer, which lets you `cd` and `ls` around the tree, might be nice.

@@ -16,16 +22,9 @@ let private getValue token =

[<Fact>]
let ``Empty input text always returns end of file`` () =

iwillspeak (Owner, Author):

Test name needs updating to match expectations

@@ -6,6 +6,14 @@ open Feersum.CompilerServices.Diagnostics
open Feersum.CompilerServices.Syntax
open Feersum.CompilerServices.Utils

module private BinderDiagnostics =

// TODO: remove this and replace with better binder errors

iwillspeak (Owner, Author):

TODO: Distinct binder diagnostics.

ctx.Diagnostics.Emit
BinderDiagnostics.bindError
(List.last formals).Location
"Saw dot but no ID in formals"

iwillspeak (Owner, Author):

Dots should be handled by the parser, not the binder.

match node.Kind with
| AstNodeKind.Ident i -> BoundDatum.Ident i
| AstNodeKind.Constant c -> BoundDatum.SelfEval(BoundLiteral.FromConstant c)
| AstNodeKind.Dot -> BoundDatum.Ident "." // FIXME: This is definitely not right

iwillspeak (Owner, Author):

Fix me. Datum dots.

inherit Expression(red)

member public _.OpeningParen =
red.ChildrenWithTokens()

iwillspeak (Owner, Author):

Should (or could) lazy values work here?

if s.Line = e.Line then
sprintf
"%s(%d,%d-%d): %s: %s"
(s.Source |> normaliseName)

iwillspeak (Owner, Author):

This breaks click-through and problem matching in VS Code. Maybe VS Code needs a PR?


This new REPL mode allows debugging how the parser produces syntax
trees.
Recognise simple forms. No support yet for dotted forms.
Start moving the new parser infra into primary place.
Introduce a collection of types to model out the node kinds in our tree.

Fixup our parser to return a typed result instead of plain `SyntaxNode`
from Firethorn.
Properly handle warning diagnostics in parse results.
Introduce base type for syntax items.
Update diagnostics to store a kind. This can then contain information
shared amongst all instances of a given diagnostic. For now we use it to
store our error code.
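As a rough illustration of that split, with entirely hypothetical types and values (`DiagnosticKind`, `undefinedSymbol`, and the numeric code are made up for this sketch): the kind holds the data shared by every instance, such as the level and error code, while each emitted diagnostic only carries what varies, here its message.

```fsharp
// Hypothetical types; Feersum's real diagnostic types may differ.
type DiagnosticLevel =
    | Warning
    | Error

/// Information shared by every diagnostic of a given kind.
type DiagnosticKind =
    { Level: DiagnosticLevel
      Code: int
      Title: string }

/// A single emitted diagnostic only carries what varies per instance.
type Diagnostic =
    { Kind: DiagnosticKind
      Message: string }

// One value per kind; the code number is invented for this example.
let undefinedSymbol =
    { Level = DiagnosticLevel.Error; Code = 1; Title = "Undefined symbol" }

let emit (kind: DiagnosticKind) (message: string) =
    { Kind = kind; Message = message }
```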
Ensure that each stage of the compiler is grouped in the `.fsproj`. This
should also help ensure that each stage of the compilation pipeline
only depends on those before it.
Simplify the API Layout of the Lex module. No need for a nested module
to store the token sets any longer.
Move the parser from an object to a collection of functions. Each parser
function takes a `state` as the final parameter, and returns a new state
to allow for modifications.

This is the same approach taken with `Firethorn` recently in the
`Predciated` project.
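A minimal sketch of that style, assuming a toy state that is just a list of token strings plus the tokens consumed so far. `ParserState`, `bump`, `parseAtoms`, and `parseForm` are hypothetical names, not the real Feersum or Firethorn API, and the toy only handles flat forms. Because the state is always the last parameter, parser steps chain with `|>`.

```fsharp
// Hypothetical toy state; the real parser builds a Firethorn tree instead.
type ParserState =
    { Tokens: string list   // remaining input
      Output: string list } // tokens consumed so far

/// Consume one token (if any) and return the new state.
let bump (state: ParserState) =
    match state.Tokens with
    | tok :: rest -> { Tokens = rest; Output = state.Output @ [ tok ] }
    | [] -> state

/// Consume atoms until the input ends or a closing paren is reached.
let rec parseAtoms (state: ParserState) =
    match state.Tokens with
    | [] | ")" :: _ -> state
    | _ -> state |> bump |> parseAtoms

/// A flat form: opening paren, atoms, closing paren.
let parseForm (state: ParserState) =
    state
    |> bump       // opening paren
    |> parseAtoms // body
    |> bump       // closing paren
```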
Update the parser to allow `bump` to _optionally_ return a token. If
no token is returned then we just emit an empty token as the error
rather than throwing. This fixes the test case which fails to provide a
closing `)`, causing `expect` to be called when no tokens remain.
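A sketch of that behaviour under simplifying assumptions: token kinds are plain strings rather than Feersum's `AstKind`, and `Token`, `ParserState`, `bump`, and `expect` are hypothetical. `bump` returns `Token option`, and `expect` turns a missing or mismatched token into an empty placeholder plus a recorded error instead of an exception.

```fsharp
// Hypothetical types for illustration only.
type Token = { Kind: string; Text: string }

type ParserState =
    { Tokens: Token list
      Output: Token list
      Errors: string list }

/// Optionally consume the next token, returning it with the new state.
let bump (state: ParserState) : Token option * ParserState =
    match state.Tokens with
    | tok :: rest -> Some tok, { state with Tokens = rest }
    | [] -> None, state

/// Consume a token of `kind`. If input has run out, or the next token is
/// the wrong kind, emit an empty token of that kind and record an error.
let expect (kind: string) (state: ParserState) =
    match bump state with
    | Some tok, next when tok.Kind = kind ->
        { next with Output = next.Output @ [ tok ] }
    | _ ->
        { state with
            Output = state.Output @ [ { Kind = kind; Text = "" } ]
            Errors = state.Errors @ [ sprintf "expected %s" kind ] }
```

With this shape, a form missing its closing `)` produces an empty `CLOSE_PAREN` token and a diagnostic rather than crashing the parse.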
We don't need this any more.
We don't actually need a stopwatch for this, instead just use
`Stopwatch.GetTimestamp`.
It seems that with the current debugger, and our current output format,
we no longer need to be quite so aggressive with our debug settings. Roll
back to the defaults in some places.
Update the VS Code configuration's indentation a little.
Add better names to most option values. This improves the help text a
little. It would be _super nice_ to be able to use proper sub-commands
to properly parse the disjoint operations supported by the compiler.
It doesn't seem like that is quite possible right now with the default
behaviour, as we can't use `MainCommand`.
The old bound expressions used to emit AST nodes in quotes. Fix that
by introducing a new type for bound datums.
Tree becomes a namespace that directly contains the types used in the
syntax tree. There are still types missing from this.
This is less F#, but more uniform in the end.
Update the error output to emit a more compact representation for the
position if both the start and end lie on the same line. Add some docs
as a placeholder for a full error index.
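A sketch of the compact form. The same-line format string `"%s(%d,%d-%d): %s: %s"` matches the one in the review snippet above; the multi-line fallback `file(line,col,line,col)` and the `Position`/`formatDiagnostic` names are assumptions for illustration.

```fsharp
// Hypothetical position type: 1-based line and column.
type Position = { Line: int; Col: int }

/// Use the compact `file(line,startCol-endCol)` form when start and end
/// share a line; otherwise fall back to an assumed four-number form.
let formatDiagnostic (source: string) (s: Position) (e: Position) (code: string) (message: string) =
    if s.Line = e.Line then
        sprintf "%s(%d,%d-%d): %s: %s" source s.Line s.Col e.Col code message
    else
        sprintf "%s(%d,%d,%d,%d): %s: %s" source s.Line s.Col e.Line e.Col code message
```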
This layer should allow switching parts of the compiler over to the new
syntax tree in a piecewise manner. The idea is we can start accepting the
new tree types, and filter them through the LegacySyntax shim when the
old syntax is needed.
Replace a timeout in the tests with a bail count. This should help
prevent the tests being susceptible to execution speed, as well as
simplifying the test overall.
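The affected test isn't shown here, so this is only a sketch of the general technique with a hypothetical `runWithBailCount` helper: count iterations and fail after a fixed number, rather than racing a wall-clock timer, so the result no longer depends on machine speed.

```fsharp
/// Hypothetical test helper: apply `step` until it reports completion,
/// failing after `bailCount` steps instead of after a timeout.
let runWithBailCount (bailCount: int) (step: 'State -> 'State * bool) (initial: 'State) =
    let rec loop remaining state =
        if remaining = 0 then
            failwithf "bailed out after %d steps" bailCount
        else
            match step state with
            | next, true -> next                       // finished
            | next, false -> loop (remaining - 1) next // keep going
    loop bailCount initial
```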
Remove the generic binder error and replace it with a set of specific
diagnostics for each case.
When the location of a point is missing we can use a hidden sequence
point rather than omitting the sequence point entirely. This ensures
that the debugger knows the sequence point exists but that the source
location is hidden from it.

Given that missing locations usually come from fabricated syntax as part
of macro expansion this seems like the best solution.
Fix up some of the syntax tests by implementing cooking for identifier
string values. This involves walking through the characters in the
identifier's token and replacing any escape sequences with the
appropriate values.
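A sketch of identifier cooking, assuming R7RS-style `|...|` identifiers with the mnemonic escapes `\a`, `\b`, `\t`, `\n`, `\r`, the `\|` escape, and `\x<hex>;` inline hex escapes. Treating `\\` as an escaped backslash is an extra assumption, and `cookIdentifier` is a hypothetical name operating on raw token text, not Feersum's implementation. It also assumes hex escapes are well formed.

```fsharp
open System
open System.Text

/// Strip surrounding bars and replace escape sequences in an identifier token.
let cookIdentifier (tokenText: string) =
    let inner =
        if tokenText.Length >= 2
           && tokenText.StartsWith("|", StringComparison.Ordinal)
           && tokenText.EndsWith("|", StringComparison.Ordinal) then
            tokenText.Substring(1, tokenText.Length - 2)
        else
            tokenText
    let sb = StringBuilder()
    let rec walk i =
        if i >= inner.Length then
            ()
        elif inner.[i] = '\\' && i + 1 < inner.Length then
            let next = inner.[i + 1]
            let simple =
                match next with
                | 'a' -> Some(char 7) // alarm
                | 'b' -> Some(char 8) // backspace
                | 't' -> Some '\t'
                | 'n' -> Some '\n'
                | 'r' -> Some '\r'
                | '|' | '\\' -> Some next
                | _ -> None
            match simple, next with
            | Some c, _ ->
                sb.Append(c) |> ignore
                walk (i + 2)
            | None, 'x' when inner.IndexOf(';', i + 2) > i + 2 ->
                // Inline hex escape: \x<hex digits>;
                let semi = inner.IndexOf(';', i + 2)
                let code = Convert.ToInt32(inner.Substring(i + 2, semi - i - 2), 16)
                sb.Append(Char.ConvertFromUtf32(code)) |> ignore
                walk (semi + 1)
            | _ ->
                // Unrecognised escape: keep the backslash as-is.
                sb.Append(inner.[i]) |> ignore
                walk (i + 1)
        else
            sb.Append(inner.[i]) |> ignore
            walk (i + 1)
    walk 0
    sb.ToString()
```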
Re-enable the remaining new parser tests and properly handle the different
escape sequences.
Update the parser specs to use the new parser. Change the
serialisation for the tests to make things _a little_ more compact.
Add support for parsing dotted forms. The dotted tail is a node in the
tree under the form. Add support for this in our syntax patterns.
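One way a dotted tail could surface in syntax patterns, sketched with hypothetical types (`Datum`, the active pattern cases, and `describe` are illustrative, not Feersum's tree types): a form carries its body plus an optional tail, and an F# active pattern lets callers match proper and dotted lists directly.

```fsharp
// Hypothetical simplified tree: a form has a body and an optional dotted tail.
type Datum =
    | Atom of string
    | Form of body: Datum list * dottedTail: Datum option

/// Classify a datum as a proper list, a dotted list, or not a form at all.
let (|ProperList|DottedList|NotAForm|) (d: Datum) =
    match d with
    | Form(body, None) -> ProperList body
    | Form(body, Some tail) -> DottedList(body, tail)
    | Atom _ -> NotAForm

let describe (d: Datum) =
    match d with
    | ProperList items -> sprintf "proper list of %d items" (List.length items)
    | DottedList(items, _) -> sprintf "dotted list with %d items before the dot" (List.length items)
    | NotAForm -> "atom"
```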
TCE means that we can run programs like this that recurse forever
without having to worry about stack consumption. This program runs using
all of a single core indefinitely without any memory use change.
Rather than fabricating a full location for each token, we can instead
just store the offset into the document. The `TextDocument` API allows
this to be converted into a line and column as and when required.
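A minimal sketch of offset-based locations, using a hypothetical `TextDocument` stand-in rather than Feersum's real type. It records where each line starts and maps an offset to a line and column on demand. Making every line, including the first, 1-based is an assumption here, since which base to use is left open in the text.

```fsharp
/// Hypothetical stand-in for `TextDocument`: tokens store only an offset,
/// and the document converts it to a line and column when needed.
type TextDocument(text: string) =
    // Offsets at which each line starts; the first line starts at 0.
    let lineStarts =
        [| yield 0
           for i in 0 .. text.Length - 1 do
               if text.[i] = '\n' then
                   yield i + 1 |]

    /// Map a 0-based offset to a 1-based (line, column) pair, using the
    /// same base for every line.
    member _.OffsetToLineCol(offset: int) =
        // Last line starting at or before the offset (a linear scan for
        // brevity; a binary search would suit large documents better).
        let line = lineStarts |> Array.findIndexBack (fun start -> start <= offset)
        line + 1, offset - lineStarts.[line] + 1
```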
Some work to improve the handling of text locations by the
`TextDocument`. This makes sure that locations are always at a
consistent line and column offset. Previously the first line was 0-based and
the remaining lines 1-based.

Still not totally sure if we want the position objects to be 0-based
like Firethorn or 1-based ready for output to the screen.