Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load edn as a Python dict with string keys #76

Open
piotr-yuxuan opened this issue Jul 8, 2020 · 8 comments
Open

Load edn as a Python dict with string keys #76

piotr-yuxuan opened this issue Jul 8, 2020 · 8 comments

Comments

@piotr-yuxuan
Copy link

Would you consider a PR to load edn form just as standard Python dict with string keys?

@bfontaine
Copy link
Collaborator

bfontaine commented Jul 8, 2020

I’ll let @swaroopch weight in but personnally I’d be interested in some option to edn_format.loads such as string_keys=True or string_keyword_keys=True that would parse all keyword keys as strings.

@piotr-yuxuan
Copy link
Author

Indeed yes that sounds reasonable.

@swaroopch
Copy link
Owner

Agree with @bfontaine ! Thanks @piotr-yuxuan :-)

@piotr-yuxuan
Copy link
Author

piotr-yuxuan commented Jul 28, 2020

I see two issues here:

  • Currently Keyword('a/b').name => 'a/b'. I'm sure it's kind of a Chersterton's fence and I'm not yet aware of why it has been built that way, but in Clojure (name :a/b) => "b". This precludes us to use a 'stringify' strategy which would return only simple names.
  • The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys. It's entirely possible that I'm wrong, but perhaps another way is possible by maintaining the map namespace in the lexer state (cf. here). This would seem more general to me, and less hackish. However, as I don't have yet the bigger picture, it's entirely possible that I miss a more important point.

These two points are very opinionated, and perhaps you will disagree with me. I would prefer to know the opinion of the maintainers about them before refactoring everything and making breaking changes ;-) If we were to move forward, I would see four possible different ways to parse an edn keyword, and two possible ways to parse a symbol:

  • Keyword
    • Default: Keyword object
    • String, :a/b -> a/b (clj namespace + clj name)
    • String, :a/b -> b (only the name, with possible shadowing)
    • String, :a/b -> :a/b (no-op)
  • Symbol
    • Default: Symbol object
    • String, '(fn [a] a) -> "(fn [a] a)"

Methinks a naive implementation for Keyword which store the parsing strategy as a lexer state would be:

@ply.lex.TOKEN(KEYWORD)
def t_KEYWORD(t):
    if t.lexer.keyword_parsing == KeywordParsing.keyword_object:
        t.value = Keyword(t.value[1:])
    elif t.lexer.keyword_parsing == KeywordParsing.name_string:
        t.value = Keyword(t.value[1:]).name
    elif t.lexer.keyword_parsing == KeywordParsing.ns_slash_name_string:
        t.value = t.value[1:]
    elif t.lexer.keyword_parsing == KeywordParsing.leading_colon_string:
        pass
    return t

…which leads to the remarks above.

@piotr-yuxuan
Copy link
Author

piotr-yuxuan commented Jul 28, 2020

For the impatient, here how I would iterate through what is returned by edn_format:

keyword_name = lambda keyword: \
    keyword.name.split("/", 1)[1] if "/" in keyword.name else keyword.name
strinfigy_keyword = lambda x: \
    keyword_name(x) if isinstance(x, edn_format.Keyword) else x


def process_edn(x):
    if isinstance(x, edn_format.ImmutableDict):
        return {strinfigy_keyword(k): process_edn(v) for k, v in x.items()}
    elif isinstance(x, edn_format.ImmutableList):
        return [process_edn(v) for v in x]
    else:
        return strinfigy_keyword(x)

Obviously keyword_name can be changed for any other callable.

@bfontaine
Copy link
Collaborator

Hello, sorry for the late response.

That’s interesting; I was thinking of post-processing the result of the parsing like you did in your second comment; not of doing that during the lexing/parsing.

Currently Keyword('a/b').name => 'a/b'

Yes, it has bit me before. I don’t know the reason behind this. Changing this would be a breaking change, though.

The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys.

I don’t understand how you would do that in the lexer; its job is to tokenize the input, not to make sense of those tokens. The lexer doesn’t know what a map looks like: in #:foo {:a 1}, #foo, {, :a, 1, and } are different tokens.

@mohkale
Copy link

mohkale commented Aug 24, 2020

Just thought it was worth mentioning that you can create a tag with @piotr-yuxuan solution. Which you can add at the top of your edn file if you don't want to actively call process_edn after parsing. Something like:

edn.add_tag('python', process_edn)

and then you can:

#python
{:tags
 [{:name "foo"},
  {:name "bar"} ]}

@tzzh
Copy link

tzzh commented Dec 22, 2020

Starting from @piotr-yuxuan 's suggestion I used

def edn_to_map(x):
    if isinstance(x, edn_format.ImmutableDict):
        return {edn_to_map(k): edn_to_map(v) for k, v in x.items()}
    elif isinstance(x, edn_format.ImmutableList):
        return [edn_to_map(v) for v in x]
    elif isinstance(x, edn_format.Keyword):
        return x.name
    else:
        return x

it's incomplete but it seems ok so far in my case (parsing simple edns to python dicts without keywords), so posting it here in case that's helpful for someone else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants