Continue parsing after finding error #106

Open
NoahTheDuke opened this issue Apr 30, 2024 · 2 comments

@NoahTheDuke
Contributor

Is your feature request related to a problem? Please describe.
It's possible to recover from encountering mismatched brackets using :edamame/expected-delimiter but not for other kinds of errors. I would like to be able to recover from other kinds of errors to continue parsing. I encountered this with non-octal numbers padded with zeros (08 and 09), but it would be helpful in all other contexts as well (keywords or symbols with multiple /, maps with uneven number of entries, etc).

Describe the solution you'd like
Some mechanism to provide a "fix" and continue parsing. This could be a flag set in the parser when it's first called, an alternate code flow, a side-effecting top-level function that alters the state of all parsers (like (set! *warn-on-reflection* true)), or the problems could be accrued in some sort of "broken state" map and returned alongside the correct code. Here are a couple of ideas for how to solve this, after thinking about it for 5 seconds:

Idea: Errors could be thrown with the correctly parsed parts and the broken parts attached in some fashion. For example, uneven maps could be {:edamame/uneven-map {:map {:a 1 :b 2} :leftovers :c}} and duplicates could be {:edamame/duplicate-map-entry {:map {:a 1 :b 2} :duplicates [{:a 3}]}}. This would allow for granularity in how each is tackled.
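The first idea can be sketched with plain `ex-info` and `ex-data`, which is how Clojure code usually attaches data to an exception. This is only an illustration of the shape of the data, not how edamame works today: `uneven-map-error` is a hypothetical helper, and the `:edamame/uneven-map` key is taken from the example above.

```clojure
;; Hypothetical helper: builds (but does not throw) an exception that carries
;; the parsed pairs and the leftover form, as proposed above.
(defn uneven-map-error [forms]
  (let [pairs    (partition 2 forms)                     ; complete key/value pairs only
        leftover (when (odd? (count forms)) (last forms))]
    (ex-info "Map literal contains odd forms."
             {:edamame/uneven-map {:map       (into {} (map vec pairs))
                                   :leftovers leftover}})))

;; A caller could catch the exception and decide what to do with the parts.
(defn recover [forms]
  (try
    (throw (uneven-map-error forms))
    (catch clojure.lang.ExceptionInfo e
      (get-in (ex-data e) [:edamame/uneven-map :map]))))
```

Here `recover` simply keeps the complete pairs and drops the leftover, but a linter could just as well report the leftover and carry on.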

Idea: Errors in code could be replaced with gensym-like keywords so they can be replaced as desired. For example, (parse-string-all "(list 1 2 08 {:a 1 :b 2 :c})" {:gather-errors true}) would return [[(list 1 2 :edamame/error-1 :edamame/error-2)] {:edamame/error-1 {:type :edamame/incorrect-number :string "08"} :edamame/error-2 {:type :edamame/uneven-map :string "{:a 1 :b 2 :c}"}}].
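A rough sketch of the bookkeeping behind the placeholder idea, assuming the parser has already classified each item as either a good form or an error. The `{:ok ...}` / `{:error ...}` input shape and the `gather-errors` function are made up for illustration; nothing here calls edamame.

```clojure
;; Hypothetical: replace each failed item with a numbered placeholder keyword
;; and collect the error details in a map keyed by that placeholder.
(defn gather-errors [items]
  (reduce (fn [[forms errors] item]
            (if-let [err (:error item)]
              (let [k (keyword "edamame" (str "error-" (inc (count errors))))]
                [(conj forms k) (assoc errors k err)])
              [(conj forms (:ok item)) errors]))
          [[] {}]
          items))

(gather-errors
 [{:ok 'list} {:ok 1} {:ok 2}
  {:error {:type :edamame/incorrect-number :string "08"}}
  {:error {:type :edamame/uneven-map :string "{:a 1 :b 2 :c}"}}])
```

The result is a pair of the forms (with placeholders) and the error map, so a caller can look up each placeholder and decide how to replace it.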

Idea: Error-fixing functions could be included in the parser options, with the parser throwing if a function returns nil: (parse-string-all "(list 1 2 08 {:a 1 :b 2 :c})" {:incorrect-number (fn [s] (when (str/starts-with? s "0") (subs s 1))) :uneven-map (fn [entries] (conj entries :splint/missing-value))}) would return [(list 1 2 8 {:a 1 :b 2 :c :splint/missing-value})].
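To make the "throw if the fixer returns nil" contract concrete, here is a minimal sketch in plain Clojure. The option keys come from the example above; `apply-fixer` is a hypothetical stand-in for whatever the parser would do internally. Note that the number fixer returns the string "8", which a real parser would presumably re-read as a number; that step is left out.

```clojure
(require '[clojure.string :as str])

;; Fixers keyed by error kind, as in the proposed parser options.
(def fixers
  {:incorrect-number (fn [s]
                       ;; strip one leading zero: "08" becomes "8"
                       (when (str/starts-with? s "0")
                         (subs s 1)))
   :uneven-map       (fn [entries]
                       (conj entries :splint/missing-value))})

;; Hypothetical helper: call the fixer for `kind`, and throw when there is
;; no fixer or the fixer declines by returning nil.
(defn apply-fixer [fixers kind input]
  (let [fix    (get fixers kind)
        result (when fix (fix input))]
    (if (some? result)
      result
      (throw (ex-info "No fix available" {:kind kind :input input})))))

(apply-fixer fixers :incorrect-number "08")
(apply-fixer fixers :uneven-map [:a 1 :b 2 :c])
```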

Describe alternatives you've considered

  1. Do nothing. Can't know what was intended, so must exit immediately.
  2. Don't fix the parsing state, just delete the offending token and move on.

Additional context
The goal is to be able to analyze a whole file and provide feedback even when it's not exactly correct, because it's still worthwhile to check the rest of the file. It's annoying to only see one broken piece of code at a time, instead of being able to review/fix them all at once.

@borkdude
Owner

borkdude commented May 6, 2024

Lots of ideas and possibilities here. It would help (and save time) if you could make a table of anything that could go wrong during parsing (e.g. unbalanced parens, uneven amount of key/vals in map, duplicate set elements) and how this would be solved on a case by case basis (and/or by configuration).

@NoahTheDuke
Contributor Author

Excluding all of the feature throws ("Syntax quote not allowed." etc) and unmatched delimiters (those are already handled):

| fn | msg | kind |
|---|---|---|
| read-num | Invalid number | :invalid-char |
| parse-string | EOF while reading, expected X to match | :eof |
| parse-to-delimiter | EOF while reading, expected X to match | :eof |
| read-regex-pattern | Error while parsing regex | :eof |
| parse-set | X literal contains duplicate key | :duplicate |
| parse-first-matching-condition | Feature should be a keyword | :invalid-type |
| parse-first-matching-condition | EOF while reading, expected X | :eof |
| read-symbol | Invalid symbol | :invalid-char |
| parse-namespaced-map | namespaced map must specify a namespace | :syntax |
| parse-sharp | Unexpected EOF | :eof |
| parse-sharp | EOF while reading | :eof |
| parse-map | Map literal contains odd forms. | :uneven-pairs |
| parse-keyword | Invalid token | :invalid-char |
| dispatch | EOF while reading | :eof |

  • :eof is hard to know how to handle. Maybe punt for now for simplicity (shouldn't happen during most usages).
  • :invalid-char is my catch-all for "used a wrong character for one of the literals": disallowed letters in a number, disallowed characters in a keyword or symbol, etc. Maybe the string so far and the type ({:type :symbol :string "cool-symbol:"}) could be passed to a provided function and the valid type must be returned.
  • :duplicate is pretty easy: if the fixer fn is provided, pass the vector of forms to the function and let it create a valid object of the required type.
  • :syntax felt less specific than :namespaced-map-error, but it's the only one lol. I don't know how to fix this except to pass the string plus following map to the function and let it fix it. Maybe punt? Most people don't use namespaced maps.
  • :uneven-pairs can be fixed in the same way as :duplicate: pass the vector of forms to the provided function and then assert it returns a valid map.
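The :duplicate and :uneven-pairs cases above share one pattern: hand the raw vector of forms to a user-supplied function, then assert that what comes back has the right type. A small sketch of that check follows, with hypothetical names (`valid-result?`, `checked-fix`) and the assumption that :duplicate is fixing a set literal:

```clojure
;; Which predicate the fixer's return value must satisfy, per error kind.
;; Assumes :duplicate refers to a set literal here.
(def valid-result?
  {:duplicate    set?
   :uneven-pairs map?})

;; Hypothetical: run the fixer on the raw forms and validate its result.
(defn checked-fix [kind fixer forms]
  (let [result (fixer forms)
        valid? (get valid-result? kind)]
    (if (valid? result)
      result
      (throw (ex-info "Fixer returned an invalid value"
                      {:kind kind :forms forms :result result})))))

;; Example fixers: drop the dangling key, or collapse duplicates.
(checked-fix :uneven-pairs (fn [forms] (apply hash-map (butlast forms))) [:a 1 :b 2 :c])
(checked-fix :duplicate set [1 2 2 3])
```

Keeping the validation separate from the fixer means the parser can trust the value it splices back in, whatever policy the user chose.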

What do you think?
