Switch to New Parser #55

iwillspeak · 2022-06-12T15:31:21Z

Bringing the new parser up to speed. Replacing all uses of the old parser with the new one, and
binning the old parser.

TODO:

Binder
Macros
Compiler
tests

iwillspeak · 2022-06-12T15:40:16Z

src/Feersum.CompilerServices/Compile/Compiler.fs

@@ -1234,7 +1234,7 @@ module Compilation =

        let ast, diagnostics =
            let nodes, diagnostics =
-                List.map Parse.parseFile sources
+                List.map LegacyParse.parseFile sources
                |> List.fold (fun (nodes, diags) (n, d) -> (List.append nodes [ n ], List.append d diags)) ([], [])

            { Location = TextLocation.Missing


We probably want to come up with a better way of modelling this. The previous parser was run on each input, and then we pasted them together into a single big SEQ. Instead I guess we should have a wrapper Compilation type, which contains a set of BoundCompilationUnits. We can bind them in sequence, and propagate some state (lib definitions etc.). The key is we don't want to have public items from one unit just become 'magically' available in another. This will probably want to be different for script compilations however. In that case we'd want multiple passes at compilation to share definitions from the previous.

We may want to handle this by just having the last compilation unit in the compilation be treated specially and allow definitions from that to leak into the root binder scope. That way inherited compilations from scripting would have access to the previous definitions.
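
A rough sketch of that shape, to make the idea concrete. Every type and function name here (`Definition`, `BoundCompilationUnit`, `Compilation`, `bindAll`, `fakeBind`) is hypothetical and only illustrates the proposal; none of it is the existing binder API.

```fsharp
// Hypothetical shapes: none of these exist in Feersum under these names.
type Definition = { Name: string; IsPublic: bool }

type BoundCompilationUnit =
    { Source: string
      Definitions: Definition list }

type Compilation =
    { Units: BoundCompilationUnit list
      // Definitions an inherited (script) compilation would be allowed to see.
      RootScope: Map<string, Definition> }

/// Binds each source in order, threading shared state (library definitions
/// and so on) through `bindUnit`. Only the last unit leaks into the root scope.
let bindAll
    (bindUnit: 'State -> string -> BoundCompilationUnit * 'State)
    (initial: 'State)
    (sources: string list)
    =
    let units, finalState = List.mapFold bindUnit initial sources

    let rootScope =
        match List.tryLast units with
        | Some last ->
            last.Definitions
            |> List.map (fun d -> (d.Name, d))
            |> Map.ofList
        | None -> Map.empty

    { Units = units; RootScope = rootScope }, finalState

// Stand-in binder: defines one item per source and counts the units bound.
let fakeBind (count: int) (source: string) =
    let bound =
        { Source = source
          Definitions = [ { Name = source + "-main"; IsPublic = true } ] }

    bound, count + 1

let compilation, unitsBound = bindAll fakeBind 0 [ "a.scm"; "b.scm" ]
```

With this, `compilation.RootScope` holds only the definitions from `b.scm`, so nothing from `a.scm` becomes visible to a later compilation by accident.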

src/Feersum.CompilerServices/Syntax/Parse.fs

iwillspeak · 2022-06-19T16:58:58Z

src/Feersum.CompilerServices/Syntax/Tree.fs

+            |> Seq.choose (NodeOrToken.asToken)
+            |> Seq.tryFind (tokenOfKind AstKind.OPEN_PAREN)
+
+        member public _.Body = red.Children()


Need to do something about form bodies

iwillspeak · 2022-06-19T16:59:31Z

src/Feersum.CompilerServices/Syntax/Tree.fs

+
+        member _.Text =
+            red
+            |> NodeOrToken.consolidate (fun n -> n.Range) (fun t -> t.Range)


Needs to actually get the text, not range.

iwillspeak · 2022-06-19T17:00:28Z

src/Feersum.CompilerServices/Syntax/Tree.fs


        static member TryCast(red: SyntaxNode) =
            if red.Kind = (AstKind.CONSTANT |> astToGreen) then
                new Constant(red) |> Some
            else
                None
+
+    type Expression =
+        | Form of Form


Inheritance might be better than this, along with active patterns.
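
Something like the following is what I have in mind. The classes are simplified stand-ins (they wrap a string rather than a Firethorn `SyntaxNode`), and the pattern names `FormNode`, `ConstantNode` and `OtherNode` are made up for the example.

```fsharp
// Simplified stand-ins for the syntax tree classes.
[<AbstractClass>]
type Expression(text: string) =
    member _.Text = text

type Form(text: string) =
    inherit Expression(text)

type Constant(text: string) =
    inherit Expression(text)

type Symbol(text: string) =
    inherit Expression(text)

// The active pattern gives back union-style matching over the hierarchy.
let (|FormNode|ConstantNode|OtherNode|) (expr: Expression) =
    match expr with
    | :? Form as f -> FormNode f
    | :? Constant as c -> ConstantNode c
    | _ -> OtherNode

let describe (expr: Expression) =
    match expr with
    | FormNode f -> sprintf "form %s" f.Text
    | ConstantNode c -> sprintf "constant %s" c.Text
    | OtherNode -> "other"
```

Callers keep the `match` ergonomics of a union, while the tree types stay ordinary classes that can share members through the base type.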

iwillspeak · 2022-06-19T17:02:08Z

src/Feersum/ParseRepl.fs

+    ReadLine.Read("[]> ")
+    |> Parse.readProgram "repl.scm"
+
+let private print (result: ParseResult<Program>) =


Some kind of ast explorer which allows you to cd and ls around the tree might be nice.

iwillspeak · 2022-06-19T17:03:04Z

test/Feersum.Tests/LexTests.fs

@@ -16,16 +22,9 @@ let private getValue token =

 [<Fact>]
 let ``Empty input text always returns end of file`` () =


Test name needs updating to match expectations

test/Feersum.Tests/SpecTests.fs

iwillspeak · 2022-06-23T09:28:25Z

https://gist.github.com/iwillspeak/6d87958184ea0062b0409f31f36d7131

iwillspeak · 2022-07-03T11:34:31Z

src/Feersum.CompilerServices/Binding/Binder.fs

@@ -6,6 +6,14 @@ open Feersum.CompilerServices.Diagnostics
 open Feersum.CompilerServices.Syntax
 open Feersum.CompilerServices.Utils

+module private BinderDiagnostics =
+
+    // TODO: remove this and replace with better binder errors


TODO: Distinct binder diagnostics.

iwillspeak · 2022-07-03T16:24:36Z

src/Feersum.CompilerServices/Binding/Binder.fs

+                ctx.Diagnostics.Emit
+                    BinderDiagnostics.bindError
+                    (List.last formals).Location
+                    "Saw dot but no ID in formals"


Dots should be handled by the parse, not the bind

iwillspeak · 2022-07-03T16:25:02Z

src/Feersum.CompilerServices/Binding/Binder.fs

+        match node.Kind with
+        | AstNodeKind.Ident i -> BoundDatum.Ident i
+        | AstNodeKind.Constant c -> BoundDatum.SelfEval(BoundLiteral.FromConstant c)
+        | AstNodeKind.Dot -> BoundDatum.Ident "." // FIXME: This is definitely not right


Fix me. Datum dots.

iwillspeak · 2022-07-03T16:39:06Z

src/Feersum.CompilerServices/Syntax/Tree.fs

+    inherit Expression(red)
+
+    member public _.OpeningParen =
+        red.ChildrenWithTokens()


Should / could lazies work here?

iwillspeak · 2022-07-03T22:00:20Z

src/Feersum.CompilerServices/Diagnostics.fs

+            if s.Line = e.Line then
+                sprintf
+                    "%s(%d,%d-%d): %s: %s"
+                    (s.Source |> normaliseName)


This breaks click through and problem matching in VS Code. Maybe Code needs a PR?

microsoft/vscode#154054
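
For reference, a minimal sketch of the two position layouts being discussed. The `Point` record and `formatDiagnostic` are hypothetical, and the multi-line `(line,col,line,col)` layout is my assumption based on the MSBuild-style format; only the single-line format string comes from the diff above.

```fsharp
// Hypothetical position type; the real location types differ.
type Point = { Source: string; Line: int; Col: int }

let formatDiagnostic (s: Point) (e: Point) (level: string) (message: string) =
    if s.Line = e.Line then
        // Compact form: one line number, then a column range.
        sprintf "%s(%d,%d-%d): %s: %s" s.Source s.Line s.Col e.Col level message
    else
        // Assumed full form when the span crosses lines.
        sprintf "%s(%d,%d,%d,%d): %s: %s" s.Source s.Line s.Col e.Line e.Col level message

let sameLine =
    formatDiagnostic
        { Source = "test.scm"; Line = 3; Col = 5 }
        { Source = "test.scm"; Line = 3; Col = 9 }
        "error"
        "Saw dot but no ID in formals"
```

Here `sameLine` is `test.scm(3,5-9): error: Saw dot but no ID in formals`, which is the shape that VS Code's problem matching currently trips over.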

Update the parser to allow `bump` to _optionally_ return a token. If no token is returned then we just emit an empty token as the error rather than throwing. This fixes the test case which fails to provide a closing `)`, causing `expect` to be called when no tokens remain.
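
A minimal sketch of the behaviour described, assuming a simple list-backed state. The `Token` and `ParserState` shapes and the `"ERROR"` kind are invented for the example and are not the real parser types.

```fsharp
// Hypothetical token and state shapes, for illustration only.
type Token = { Kind: string; Text: string }

type ParserState =
    { Remaining: Token list
      Emitted: Token list
      Errors: string list }

/// Takes the next token if there is one; `None` once input is exhausted.
let bump (state: ParserState) =
    match state.Remaining with
    | tok :: rest -> Some tok, { state with Remaining = rest }
    | [] -> None, state

/// Expects a token of `kind`. At end of input an empty error token is
/// emitted in place of the missing one instead of throwing.
let expect (kind: string) (state: ParserState) =
    match bump state with
    | Some tok, next when tok.Kind = kind ->
        { next with Emitted = tok :: next.Emitted }
    | Some tok, next ->
        { next with
            Emitted = { tok with Kind = "ERROR" } :: next.Emitted
            Errors = sprintf "Expected %s but found %s" kind tok.Kind :: next.Errors }
    | None, next ->
        { next with
            Emitted = { Kind = "ERROR"; Text = "" } :: next.Emitted
            Errors = sprintf "Expected %s but reached end of input" kind :: next.Errors }

// `(foo` with the closing paren missing: nothing is left when `expect` runs.
let afterMissingClose =
    expect "CLOSE_PAREN" { Remaining = []; Emitted = []; Errors = [] }
```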

Add better names to most option values. This improves the help text a little. It would be _super nice_ to be able to use proper sub-commands to properly parse the disjoint operations supported by the compiler. It doesn't seem like that is quite possible right now with the default behaviour as we can't use `MainCommand`.

When the location of a point is missing we can use a hidden sequence point rather than omitting the sequence point entirely. This ensures that the debugger knows the sequence point exists but that the source location is hidden from it. Given that missing locations usually come from fabricated syntax as part of macro expansion this seems like the best solution.

Some work to improve the handling of text locations by the `TextDocument`. This makes sure that locations are always at a consistent line and column offset. Previously the first line was 0-based and the remaining lines 1-based. Still not totally sure if we want the position objects to be 0-based like Firethorn or 1-based ready for output to the screen.

iwillspeak force-pushed the feature/newparse-full branch from 2ccc457 to 309f86c Compare September 2, 2022 07:20

iwillspeak force-pushed the feature/newparse-full branch 3 times, most recently from f49cb8a to de9feb9 Compare June 17, 2023 17:56

iwillspeak force-pushed the feature/newparse-full branch from de9feb9 to 7c274b4 Compare June 18, 2023 09:44

iwillspeak added 12 commits June 20, 2023 07:42

Initial Parser REPL

854d30a

This new REPL mode allows debugging how the parser produces syntax trees.

Initial Parsing of Forms

a2d1ea3

Recognise simple forms. No support yet for dotted forms.

Quote Support in New Parser

3a77b9c

Rename Things

e165d90

Start moving the new parser infra into primary place.

Typed Parsing

9d87a13

Introduce a collection of types to model out the node kinds in our tree. Fixup our parser to return a typed result instead of plain `SyntaxNode` from Firethorn.

Formatting

4a2c177

Check Diagnostics for Errors Correctly

80737bc

Properly handle warning diagnostics in parse results.

Initial Inheritance in AST

90ce858

Introduce base type for syntax items.

Parser Program / Script Mode

2c0081c

Diagnostic Kinds

1419db1

Update diagnostics to store a kind. This can then contain information shared amongst all instances of a given diagnostic. For now we use it to store our error code.

Fixup Test Snapshots

baec296

Reorganise into Stages

4cd9c97

Ensure that each stage of the compiler is grouped in the `.fsproj`. This should also help ensure that each stage of the compilation pipeline only depends on those before it.

iwillspeak added 27 commits June 20, 2023 07:42

Reformat Things

4eeafc3

Simplify the API Layout of the Lex module. No need for a nested module to store the token sets any longer.

Functional Parser

0695fe2

Move the parser from an object to a collection of functions. Each parser function takes a `state` as the final parameter, and returns a new state to allow for modifications. This is the same approach taken with `Firethorn` recently in the `Predciated` project.
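
To illustrate why the `state` goes last: parser functions then compose with `|>`. This is a toy sketch over string tokens; `State`, `emit`, `advance`, `parseForm` and `parseItems` are made-up names and not the real parser.

```fsharp
// Toy parser state: remaining tokens plus the events produced so far.
type State = { Input: string list; Events: string list }

let emit (event: string) (state: State) =
    { state with Events = event :: state.Events }

let advance (state: State) =
    match state.Input with
    | _ :: rest -> { state with Input = rest }
    | [] -> state

// Each function takes the state last and returns the new state, so a
// production is just a pipeline of smaller parser functions.
let rec parseForm (state: State) =
    state |> emit "START_FORM" |> advance |> parseItems |> emit "END_FORM"

and parseItems (state: State) =
    match state.Input with
    | [] -> state
    | ")" :: _ -> advance state
    | "(" :: _ -> state |> parseForm |> parseItems
    | atom :: _ -> state |> emit ("ATOM " + atom) |> advance |> parseItems

let parseTokens (tokens: string list) =
    let final = parseItems { Input = tokens; Events = [] }
    List.rev final.Events

let events = parseTokens [ "("; "+"; "1"; ")" ]
```

Because nothing is mutated, a caller can keep an earlier `State` value around and retry from it, which is harder to do with an object-based parser.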

Stop Passing References in SpecTests

cdf2d7b

We don't need this any more.

Avoid Static Stopwatch for Jiffies

2a7ad50

We don't actually need a stopwatch for this, instead just use `Stopwatch.GetTimestamp`.
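
Roughly, the idea looks like this. The function names are illustrative rather than the names used in the runtime library; `Stopwatch.GetTimestamp` and `Stopwatch.Frequency` are the standard static members on `System.Diagnostics.Stopwatch`.

```fsharp
open System.Diagnostics

/// Current jiffy count: the raw high-resolution timestamp, no instance needed.
let currentJiffy () : int64 = Stopwatch.GetTimestamp()

/// Number of jiffies in one second, as reported by the timer itself.
let jiffiesPerSecond () : int64 = Stopwatch.Frequency

/// Converts a pair of jiffy readings into elapsed seconds.
let elapsedSeconds (startJiffy: int64) (endJiffy: int64) : float =
    float (endJiffy - startJiffy) / float (jiffiesPerSecond ())

let started = currentJiffy ()
let finished = currentJiffy ()
let elapsed = elapsedSeconds started finished
```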

Update Test Debug Launch Settings

5906ec6

It seems that with the current debugger, and our current output format, we no longer need to be quite so aggressive with our debug settings. Roll back to the defaults in some places.

Reformat JSON Config

da8078d

Update the VS Code configuration's indentation a little.

Vector and Bytevector Datums

0668578

Separate Bound Expressions from Old AST

4628129

The old bound expressions used to emit AST nodes in quotes. Fix that by introducing a new type for bound datums.

Reorganise Tree

72597fc

Tree becomes a namespace that directly contains the types used in the syntax tree. There are still types missing from this.

Use Inheritance not Unions for Alternations

1cad4d2

This is less F#, but more uniform in the end.

Add Overview of Compiler Error Format

9f96c6a

Update the error output to emit a more compact representation for the position if both the start and end lie on the same line. Add some docs as a placeholder for a full error index.

Initial LegacySyntax Shim

ad80d54

This layer should allow switching parts of the compiler over to the new syntax tree in a piecewise manner. The idea is we can start accepting the new tree types, and filter them through the LegacySyntax shim when the old syntax is needed.

Use Deterministic Bail in Parse Spec

27736aa

Replace a timeout in the tests with a bail count. This should help prevent the tests being susceptible to execution speed, as well as simplifying the test overall.
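
The general shape of a bail count, as a sketch only. `runWithBail` and its `step` callback are hypothetical; the actual spec test may count something different, such as parser events.

```fsharp
/// Runs `step` until it reports completion (`Choice1Of2`) or `maxSteps`
/// iterations have been used up, in which case the last state is returned
/// as an error. The outcome depends only on the step count, never on time.
let runWithBail (maxSteps: int) (step: 'S -> Choice<'R, 'S>) (initial: 'S) : Result<'R, 'S> =
    let rec loop remaining state =
        if remaining <= 0 then
            Error state
        else
            match step state with
            | Choice1Of2 result -> Ok result
            | Choice2Of2 next -> loop (remaining - 1) next

    loop maxSteps initial

// A step that finishes once the counter reaches 3.
let countToThree n =
    if n >= 3 then Choice1Of2 "done" else Choice2Of2(n + 1)

// A step that never finishes, standing in for a parser stuck in a loop.
let spin (n: int) : Choice<string, int> = Choice2Of2(n + 1)

let finishes = runWithBail 10 countToThree 0
let bails = runWithBail 10 spin 0
```

A slow machine and a fast machine both give `Ok "done"` for the first run and `Error 10` for the second.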

Add Named Diagnostics for Binder Errors

bc66fec

Remove the generic binder error and replace with a set of specific diagnostics for each case.

Cooking of Identifier Strings

f745c02

Fixup some of the syntax tests by implementing cooking for identifier string values. This involves walking through the characters in the identifier's token and replacing any escape sequences with the appropriate values.
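
A minimal sketch of what cooking means here, assuming R7RS-style `|...|` identifiers with `\x<hex>;` and single-character escapes. `cookIdentifier` is an illustrative name, the set of escapes handled is a guess at the useful subset, and the hex digits are assumed to be well formed (malformed hex would throw).

```fsharp
open System
open System.Text

let cookIdentifier (raw: string) : string =
    // Strip the surrounding pipes of a `|...|` identifier, if present.
    let body =
        if raw.Length >= 2 && raw.[0] = '|' && raw.[raw.Length - 1] = '|' then
            raw.Substring(1, raw.Length - 2)
        else
            raw

    let sb = StringBuilder()

    let rec go i =
        if i < body.Length then
            if body.[i] = '\\' && i + 1 < body.Length then
                match body.[i + 1] with
                | 'x'
                | 'X' ->
                    // `\x<hex>;` names a code point; assumes valid hex digits.
                    let close = body.IndexOf(';', i + 2)

                    if close > i + 2 then
                        let hex = body.Substring(i + 2, close - i - 2)
                        sb.Append(Char.ConvertFromUtf32(Convert.ToInt32(hex, 16))) |> ignore
                        go (close + 1)
                    else
                        // Not a complete escape: keep the backslash as written.
                        sb.Append('\\') |> ignore
                        go (i + 1)
                | 'a' -> sb.Append(char 7) |> ignore; go (i + 2)
                | 'b' -> sb.Append(char 8) |> ignore; go (i + 2)
                | 't' -> sb.Append('\t') |> ignore; go (i + 2)
                | 'n' -> sb.Append('\n') |> ignore; go (i + 2)
                | 'r' -> sb.Append('\r') |> ignore; go (i + 2)
                | other ->
                    // Covers `\|` and `\\`: the escaped character stands for itself.
                    sb.Append(other) |> ignore
                    go (i + 2)
            else
                sb.Append(body.[i]) |> ignore
                go (i + 1)

    go 0
    sb.ToString()

let cooked = cookIdentifier "|hello\\x20;world|"
```

Here `cooked` is `hello world`, while a plain identifier such as `foo` passes through unchanged.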

Bump Fantomas and Reformat

d574883

Properly Cook Strings and Characters

1dc600a

Re-enable the remaining new parser tests and properly handle the different escape sequences.

Switch to New Parser for Spec Test

a72fa99

Update the parser specs to use the new parser. Change the serialisation for the tests to make things _a little_ more compact.

Bump Dependencies

02236dc

Rename Framework Options

ee42ec0

Parsing of Dotted Forms

d4704fa

Add support for parsing dotted forms. The dotted tail is a node in the tree under the form. Add support for this in our syntax patterns.

Add Spec for Infinite Tail Recursion

93f6f6e

TCE means that we can run programs like this that recurse forever without having to worry about stack consumption. This program runs using all of a single core indefinitely without any memory use change.

Lazy Token Locations

9c38d7e

Rather than fabricating a full location for each token instead we can just store the offset into the document. The `TextDocument` API allows this to be converted into a line-col as and when required.
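
The conversion can be sketched as below. `DocSketch`, `documentFromText` and `offsetToPoint` are hypothetical names and not the actual `TextDocument` API, and the 1-based result is a choice made for the example. Negative offsets are not handled.

```fsharp
// Hypothetical stand-in for a text document: just the line start offsets.
type DocSketch = { Path: string; LineStarts: int[] }

let documentFromText (path: string) (text: string) : DocSketch =
    let starts =
        text
        |> Seq.mapi (fun i c -> (i, c))
        |> Seq.filter (fun (_, c) -> c = '\n')
        |> Seq.map (fun (i, _) -> i + 1)
        |> Seq.toArray

    { Path = path
      LineStarts = Array.append [| 0 |] starts }

/// Converts an offset into a (line, column) pair, both 1-based here.
let offsetToPoint (doc: DocSketch) (offset: int) : int * int =
    // Index of the last line that starts at or before the offset.
    let line = doc.LineStarts |> Array.findIndexBack (fun start -> start <= offset)
    (line + 1, offset - doc.LineStarts.[line] + 1)

let doc = documentFromText "test.scm" "ab\ncd"
let point = offsetToPoint doc 4
```

Tokens then only need to carry an `int`, and `point` (here `(2, 2)`, the `d` on the second line) is computed when a diagnostic actually has to be printed.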

iwillspeak force-pushed the feature/newparse-full branch from 7c274b4 to 9d1b6f8 Compare July 29, 2023 13:17

Switch to New Parser in Compile

9df82a1

iwillspeak force-pushed the feature/newparse-full branch from 9d1b6f8 to 9df82a1 Compare July 29, 2023 13:49
