Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior for multiline-strings #1112

Open
4 tasks done
MatteoVoges opened this issue Aug 8, 2023 · 16 comments · May be fixed by #1113
Open
4 tasks done

Unexpected behavior for multiline-strings #1112

MatteoVoges opened this issue Aug 8, 2023 · 16 comments · May be fixed by #1113
Labels
bug Something isn't working

Comments

@MatteoVoges
Copy link
Contributor

Describe the bug
Hey, I'm trying to have a config, where some values are in an escaped interpolation, but in a multiline-string. I think this is a really specific use case, but I think that this applies to all escaped characters.

To Reproduce

from omegaconf import OmegaConf as oc, Node

correct = {
    "config": {
        "subconfig": "MULTILINE \n    ${tag:ENV}",
    },
    "values": "${config}",
}

wrong = {
    "values": "${config}",
    "config": {
        "subconfig": "MULTILINE \n    ${tag:ENV}",
    },
}


def escape_tag(input: str, _node_: Node):
    """resolver function, that returns an escaped tag with the input"""
    if "\n" in str(_node_):  # multiline-strings
        escape = "\\\\\\"  # \\\
    else:
        escape = "\\"  # \

    return escape + "${" + input + "}"


oc.register_new_resolver("tag", escape_tag)


cfg_correct = oc.create(correct)
cfg_wrong = oc.create(wrong)

oc.resolve(cfg_correct)
oc.resolve(cfg_wrong)

print(cfg_correct["values"])
# >>> {'values': {'subconfig': 'MULTILINE \n    \\${ENV}'}}
print(cfg_wrong["values"])
# >>> {'values': {'subconfig': 'MULTILINE \n    \\\\\\${ENV}'}

### Another bug ?
print(cfg_correct["values"]["subconfig"])
# >>> [...] ${ENV}
print(cfg_wrong["values"]["subconfig"])
# >>> [...] \${ENV}

Explanation
Here I have two configs correct and wrong and the only difference between them is the key order in the python file.
Then I have a custom resolver that produces me an escaped interpolation. In multiline-strings \ has to be escaped and because of the double escapes (#695 (comment)) we need 3 \. Now the diff is that in the wrong config, each \ is escaped again, and it produces 6 \.

Another bug (?) is, that when I print the actual strings, that the string gets dumped as multiline and not as quoted, but this could be intended.

Expected behavior
I expect, that the order in the config doesn't matter and both configs produce \${ENV}.

If there is any kind of workaround like sorting the keys in the config, I would be happy already.

Additional context

  • OmegaConf version: latest / master / 2.4.0.dev0
  • Python version: 3.10.12
  • Operating system: Linux Ubuntu
  • Please provide a minimal repro
@MatteoVoges MatteoVoges added the bug Something isn't working label Aug 8, 2023
@odelalleau
Copy link
Collaborator

odelalleau commented Aug 8, 2023

Interesting -- I believe this is the same root issue as in #1081 (comment)

I'll have a quick look to check if option (2) mentioned in that post would be easy to implement.

@MatteoVoges
Copy link
Contributor Author

There are definitely some similarities. Is there a general problem with producing (escaped) interpolations, because I use them very often.
But this time I couldn't believe that the key order matters...

odelalleau added a commit to odelalleau/omegaconf that referenced this issue Aug 8, 2023
@odelalleau
Copy link
Collaborator

Is there a general problem with producing (escaped) interpolations

Yes (IMO).

I have a tentative fix in #1113. I can't guarantee if/when a new version will be released with this fix, so you might need to use a custom omegaconf version for now.

Note that this may break some of your existing code if you are relying on the current buggy behavior (for instance I think you'll need to change your tag resolver to achieve what you want with this PR).

@MatteoVoges
Copy link
Contributor Author

Thanks for the quick fix. I will try it in a minute. This is just a fix for the escaping, right? Because I still wonder what causes the different behavior regarding the order of the keys in my example. Will it fix that too?

odelalleau added a commit to odelalleau/omegaconf that referenced this issue Aug 8, 2023
@odelalleau
Copy link
Collaborator

I still wonder what causes the different behavior regarding the order of the keys in my example. Will it fix that too?

Yes I expect so.
(just be careful that my first PR was bugged, I just pushed a fixed version)

@MatteoVoges
Copy link
Contributor Author

MatteoVoges commented Aug 9, 2023

I don't know if I'm doing something wrong, but if I run your code, my example above is still incorrect. Even if I need to change the tag resolver, then the output for both configs should be equal.
I will try again later today, if I have more time

Works now, I edited the wrong file in my env

@MatteoVoges
Copy link
Contributor Author

Hey, I was able to work with the fix a bit and I noticed some breaking changes and some use cases that don't work anymore.
Maybe we could achieve, that this is not the case and we're just fixing the bug in another way.

The bug / breaking change happens when creating an interpolation with a resolver:

from omegaconf import OmegaConf as oc

def produce_interpolation(content: str):
    return "${" + content + "}"

test = {
    "value": "abc",
    "interpolation": "${produce_interpolation:value}",
}

oc.register_new_resolver("produce_interpolation", produce_interpolation)

config = oc.create(test)
print(config)
# >>> {'value': 'abc', 'interpolation': '${produce_interpolation:value}'}

oc.resolve(config)
print(config)
# >>> {'value': 'abc', 'interpolation': '\\${value}'}

oc.resolve(config)
print(config)
# >>> {'value': 'abc', 'interpolation': '\\${value}'}

Before the fix, the resolver just produces ${value} (unescaped) and in the second resolve step it resolves to abc. This would also be my favorite behavior for this.

@odelalleau
Copy link
Collaborator

Hey, I was able to work with the fix a bit and I noticed some breaking changes and some use cases that don't work anymore.

Yes, this is what I meant in #1113 when I wrote "Note that this is a breaking change".

The current behavior isn't actually intended (there are no tests covering this use case), and can cause issues as we've seen (it's also potentially confusing that a config doesn't behave the same way after it's been resolved).

Can you explain your use case? There may be another way to achieve what you want, for instance maybe you can directly resolve the interpolation you are creating (this is an example that requires direct access to the internal API so it's not great, but it might be possible to expose some of it if needed):

from omegaconf.omegaconf import _node_wrap
from omegaconf._impl import _resolve


def produce_interpolation(content: str, _parent_):
    interpolation = "${" + content + "}"
    return resolve_interpolation(interpolation, parent=_parent_)


def resolve_interpolation(interpolation, parent):
    node = _node_wrap(
        value=interpolation,
        is_optional=False,
        key="__tmp__",
        parent=parent,
    )
    _resolve(node)
    return node._value()

We may also consider adding a flag to resolve() to indicate whether we want the new or old behavior (but fixing the issue where the resolution order matters). This could actually be useful for to_yaml() and to_container(), which also require looking at.

@MatteoVoges
Copy link
Contributor Author

I will test your resolver in a minute, thanks for this!

My usecase is large templating of several configs with default values. This is a simplified example of it:
We have a default "class" with a default structure:

default:
  valueA: aaa
  valueB: bbb
  api:
    deep:
      nested:
        valueA: ${relative_path:default.valueA}
        valueB: ${relative_path:default.valueB}

Then we have special objects that "inherit" from this base class

special: ${merge:${default},${special-config}}

special-config:
  valueA: zzz

Here we have two resolvers relative_path, that converts an absolute interpolation path to a relative one and important: delays one resolving stage, and merge, that takes several configs and merges them using OmegaConf.merge()

If we resolve this in a two step-system the interpolations get resolved after merging and we get the final config:

default: [object is resolved, but unimportant]
special-config: [unimportant]

# important
special:
  valueA: zzz
  valueB: bbb
  api:
    deep:
      nested:
        valueA: zzz # overridden value
        valueB: bbb # default value

So TLDR: We are able to produce reusable objects templated with default values and can have many copies of them where some special values are set. This hierarchy of inheritence can of course go further like a specialA and specialB object, that get merged with the special object.

We at kapitan need this, because many of the values are redundantly defined and we need a simple way to template these efficiently. Then we can search for the key api and can copy these config values to helm for example.

I know that this may seem way too complicated and there are certainly 100 methods to do it a lot easier...

@MatteoVoges
Copy link
Contributor Author

MatteoVoges commented Aug 15, 2023

Wouldn't be a solution, that the return value of a resolver is always directly the value and there is no escaping at all?

Some solution that goes in the same direction is the modification of the _unescape function in the grammar_visitor. This would allow us, to have the behavior with resolvers and interpolations as it was before and we treat escaped interpolations differently. My idea would be to keep escaped interpolations escaped, until we want to have the real value of the config, so accessing, to_object() and to_yaml(). WDYT?

UPDATE: I tried with modifying and came up with removing the +1 in grammar_visitor.py line 359:

- text = s.text[-(len(s.text) // 2 + 1) :]
+ text = s.text[-(len(s.text) // 2 ) :]

and this seems working. I will try to run the tests with it soon...

@MatteoVoges
Copy link
Contributor Author

Your resolver works fine for some cases like the simplified repro from #1112 (comment) but for my most recent use case it isn't suitable because I need the actual resolving happen after the merge.

We may also consider adding a flag to resolve() to indicate whether we want the new or old behavior (but fixing the issue where the resolution order matters). This could actually be useful for to_yaml() and to_container(), which also require looking at.

I think this would be the best solution, because I might not be the only one, that uses this in their resolvers.

@odelalleau
Copy link
Collaborator

I'll need to spend a bit more time understanding your use case and be able to tell whether there's a better way to achieve what you want, but in the meantime I updated #1113 so that resolve() now takes a flag to control whether or not to escape interpolation strings.
I reverted the default behavior to the old one for now, and hopefully the problem you originally mentioned in this issue should be fixed.

@odelalleau
Copy link
Collaborator

Wouldn't be a solution, that the return value of a resolver is always directly the value and there is no escaping at all?

That was the previous behavior. The main issue with it is that cfg does not behave the same before / after calling resolve(cfg), which I'd argue is bad in most use cases.

Some solution that goes in the same direction is the modification of the _unescape function in the grammar_visitor. This would allow us, to have the behavior with resolvers and interpolations as it was before and we treat escaped interpolations differently. My idea would be to keep escaped interpolations escaped, until we want to have the real value of the config, so accessing, to_object() and to_yaml(). WDYT?

UPDATE: I tried with modifying and came up with removing the +1 in grammar_visitor.py line 359:

- text = s.text[-(len(s.text) // 2 + 1) :]
+ text = s.text[-(len(s.text) // 2 ) :]

and this seems working. I will try to run the tests with it soon...

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

@MatteoVoges
Copy link
Contributor Author

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

You tried already, or an assumption?

@odelalleau
Copy link
Collaborator

I'm not sure I entirely follow what you're trying to do here, but this breaks a bunch of tests.

You tried already, or an assumption?

I tried it.

@MatteoVoges
Copy link
Contributor Author

but in the meantime I updated #1113 so that resolve() now takes a flag to control whether or not to escape interpolation strings.

Sounds great. Thank you really much!

I tried it.

Alright. Then I just got lucky with my testscript...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants