
Lexical scoping for augmentations #3738

Open
lrhn opened this issue Apr 30, 2024 · 4 comments
Labels
request (Requests to resolve a particular developer problem), static-metaprogramming (Issues related to static metaprogramming)

Comments

@lrhn
Member

lrhn commented Apr 30, 2024

The augmentation specification contains the following paragraph:

The static and instance member namespaces for an augmented type are shared across the declaration of the type in the augmented library and all augmentations of that type. Identifiers in the bodies of members (both implicit ones and explicit uses like this. or TypeName.) are resolved against that complete merged namespace.

I think the "implicit ones" part is a mistake. It changes a lexical scope into an implicit context scope.

Currently, an unqualified identifier is resolved in the lexical scope, which consists of the textually surrounding declarations, bottoming out at the library's declaration scope and import scope.
If an identifier id is not declared in those scopes, it becomes an implicit this.id, which is then an error if that would not be valid.
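
For example, a minimal sketch of the current rule (these classes are invented for illustration and are not part of the example below):

int tally = 0; // top-level declaration, found via the library scope

class Base {
  int count = 0;
}

class Derived extends Base {
  void bump() {
    tally += 1; // resolved lexically, bottoming out at the library scope
    count += 1; // not declared in any enclosing lexical scope (it lives in Base),
                // so it is treated as the implicit this.count
  }
}

void main() {
  Derived()..bump()..bump();
}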

As currently specified in the augmentation spec, the name of a static or instance member can come from a different augmentation, which is not in the textual scope, but is still introduced into the scope that identifiers are resolved in.

It's not that it's technically a problem to define it that way, and it has the "advantage" that textually merging augmentations into the base declaration preserves resolution. At least it does if names introduced by later augmentations are also in scope, which they are.

What it loses is readability. Consider:

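// "some_lib.dart":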
library some_lib;
part 'some_augment.dart';

bool _isOdd(int n) {
  print("Global is Odd!");
  return n.isOdd;
}

class C {
  bool isEven(int n) {
    if (n == 0) return true;
    return _isOdd(n - 1);
  }
}

class D {
  bool isEven(int n) {
    if (n == 0) return true;
    return _isOdd(n - 1);
  }
}

and

// "some_augment.dart":
part of 'some_lib.dart';

augment class C {
  bool _isOdd(int n) {
    print("Augment is Odd!");
    return n.isOdd;
  }
}
augment class D with Oddity {}
mixin Oddity {
  bool _isOdd(int n) {
    print("Mixin is Odd!");
    return n.isOdd;
  }
}

These two classes, C and D, look identical. They both refer to _isOdd.
To the casual reader, it looks like they are both referencing the global _isOdd.

However, with the currently specified behavior, the C _isOdd reference actually refers to the instance method added in augment class C, because that declaration is added to the lexical scope of every class C declaration. It becomes lexically in scope, even if it's not textually declared in the surrounding code.

The D declaration keeps referencing the global _isOdd, because while the augment class D adds an implementation of _isOdd, it doesn't add a declaration in the same namespace.
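
In other words, with the specified behavior the two bodies resolve like this (an annotated copy of the declarations above; the comments are mine):

class C {
  bool isEven(int n) {
    if (n == 0) return true;
    return _isOdd(n - 1); // resolves to C._isOdd from the augment class C declaration
  }
}

class D {
  bool isEven(int n) {
    if (n == 0) return true;
    return _isOdd(n - 1); // still resolves to the top-level _isOdd;
                          // Oddity._isOdd is only inherited, not declared on D
  }
}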

Having two textually identical declarations that mean different things in seemingly the same textual scope is bad for readability.

I recommend that we do not combine the lexical scopes of declarations and augments.
That would mean that:

  • Both _isOdd references in C and D are resolved the same way, to the global _isOdd, which is the only one in the textual scope.
  • If they want to call this._isOdd(n - 1), they have to write it out, just as they would to reference a static member (see the sketch after this list). (We should still allow static and instance members with the same name, even in the same textual scope, but especially with augmentations, where they can then be in different scopes.)
  • Lexical scoping doesn't change. It means exactly what it has always meant: Textual scope until reaching the library declaration scope.
  • But you can't just squash together multiple augmentations into one declaration and assume that name resolution keeps working. That's hardly surprising: we'd already have to change import references to be prefixed when augments can come from different parts with different imports. We just also have to change other unqualified references, to this.id or Type.id, or possibly to something else if they refer to a further-out declaration that is now shadowed.
    • I don't really care about that, because we shouldn't define augmentations in terms of desugaring. We should assign semantics to the syntax the author actually wrote. Moving code around can be an implementation strategy, but should not be a specification method.
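
A sketch of what the C declaration would have to write under this proposal if it wants the augmentation's instance method (same declarations as in the example above):

class C {
  bool isEven(int n) {
    if (n == 0) return true;
    return this._isOdd(n - 1); // explicit: the instance member added by augment class C
    // return _isOdd(n - 1);   // unqualified: the top-level _isOdd, same as in D
  }
}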
@lrhn added the request and static-metaprogramming labels on Apr 30, 2024
@eernstg
Member

eernstg commented May 1, 2024

We have talked about "re-binding errors" for a while (and I just added a description of this kind of error to the sketch of a proposal in #3741). I think this is exactly what we need in order to address the concerns that you're expressing.

In the example you're describing we would incur a re-binding error because the reference to _isOdd in C.isEven is proto-bound to the global declaration of _isOdd in some_lib, but it is bound to the instance member C._isOdd if we add the augmentation of C in some_augment.dart.
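
Concretely, the original example would then be flagged roughly like this (the error message wording is just a sketch, not a proposal for the actual diagnostic):

class C {
  bool isEven(int n) {
    if (n == 0) return true;
    return _isOdd(n - 1);
    //     ^^^^^^
    // Re-binding error: _isOdd proto-binds to the top-level _isOdd in
    // 'some_lib.dart', but binds to C._isOdd from 'some_augment.dart'
    // in the merged library.
  }
}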

I think it's a fair claim that re-binding errors will filter out the programs whose bindings are confusing in this sense. It would be very interesting to scrutinize the concept of proto bindings and see if there are any compelling examples where it does not work in this manner. One thing to consider is, of course, whether the re-binding errors are too constraining, e.g., in the sense that they make it impossibly hard to write certain macros that ought to be within reach.

@lrhn
Member Author

lrhn commented May 1, 2024

I worry a lot about the phrase "re-binding errors" because there should never be any re-binding.

A program is a static entity. It means what it means; each identifier refers to what it resolves to at the point where it is resolved.
A "re-binding" suggests that there is a program change, but there shouldn't be any program changes; the program we compile is the program the user gave us. There are no "proto bindings": a name refers to what it refers to, no more and no less.

If we need the phrase "re-binding" to describe our semantics, we have a problem. And possibly not a well-defined semantics.

What can be confusing is if a name doesn't refer to what it looks like it refers to. That's a readability problem (Readability == Code does what it looks like it's doing).
We can decide to have behavior that is hard to read, but we shouldn't have behavior that is defined in terms of a different program than the one the author provided.

If we want to make a program transformation as part of compilation, it's up to us to ensure that it's a semantics preserving transformation.
If we define the semantics of something through desugaring (which I really think we shouldn't, it's always caused problems), then we need to figure out whether we do identifier resolution before or after the desugaring, but we will not do it twice.

(Any desugaring which moves code into a new context has to be extremely clear about when it happens in the compilation pipeline. If it happens before type inference, it cannot use static types. If it happens after type inference, we must specify the static type of every expression (usually by making every type explicit, and by not using any syntax that would normally trigger type inference).
If the desugaring happens during type inference, where all the fun things happen anyway, we have to be clear about which sub-expressions have already had their types inferred, and which have not.
Our problems so far have included desugaring bindings into function expressions, which broke when we added async functions, and desugaring for/in into a statement sequence that is seemingly untyped, possibly because the desugaring predates Dart 2.0, so supposedly it happens pre-type-inference. But that desugaring moves the iterable expression out of the context type of Iterable<E> that it gets from E e in, so it's not well defined what the typing really is, and implementations have varied. And since it desugars into statements, it's not a valid desugaring of an element for-in. The more we desugar, the less well-defined our semantics are, and the more fragile they are against later language changes.)
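
For concreteness, here is a minimal sketch of the for/in context-type problem (the helper is invented for illustration):

Iterable<T> emptyOf<T>() => <T>[];

void main() {
  // Original form: emptyOf() is inferred with the context type
  // Iterable<int> coming from "int x in", so T is int.
  for (int x in emptyOf()) {
    print(x);
  }

  // Naive statement-level desugaring: the context type is gone, so
  // emptyOf() is inferred as Iterable<dynamic>, iterator.current is
  // dynamic, and the assignment to int x becomes an implicit downcast
  // that is only checked at run time.
  var iterator = emptyOf().iterator;
  while (iterator.moveNext()) {
    int x = iterator.current;
    print(x);
  }
}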

There not being any changes is not true for macros, but that is fundamentally different, because the pre-macro input is not a Dart program. Macros start with something which is (possibly, going on likely) not a Dart program, because there are unresolved identifiers or incomplete class declarations. We then provide some macro-specific introspection on that partial program, and allow it to add further syntactic declarations to the program in specific ways, and afterwards the result should be a Dart program using completely normal Dart semantics. We can talk about proto-bindings there, but those are different from the semantics of a Dart program; they are just our "best guess" about what the binding will be when the program is complete. And if we guess incorrectly, and the macro-generated code invalidates those guesses, then ... well, we get to decide what to do. The macro tool does whatever the macro tool chooses to do, and the only requirement is that the resulting output is valid Dart. Everything else is a choice we can make, because we never promised correct Dart semantics for a non-Dart program.

@eernstg
Member

eernstg commented May 1, 2024

I worry a lot about the phrase "re-binding errors" because there should never be any re-binding.

The notion of "re-binding errors" that I describe in #3741 does not involve two equally valid results from name resolution; it is built on the notion of proto bindings (which is a new concept that applies to incomplete Dart source code artifacts) and normal Dart name binding.

So we're not "deciding what a name means" and then later "changing our minds" (I agree that we should never do that).

It is the normal Dart name binding step that provides the binding of a name that contributes so greatly to the determination of the dynamic semantics of the program. This step is performed on the final, merged library.

The proto bindings are a well-defined property of a special kind of incomplete software artifact, namely a library that has parts that have augmenting declarations, along with its tree of parts, or a part that has augmenting declarations, along with its parents and its children.

The proto bindings serve as a sanity check on the merged library; they are only intended to improve the comprehensibility of the code for a human reader, and they are never taken into account when it comes to the final meaning of the program.

The point is that a human reader who is looking at a library with augmentations or a part with augmentations can use all the normal habits of reading the code and understanding what it does, disregarding the declarations that may be added to any scope during merging, and conclude that a certain identifier refers to a certain declaration (and, hence, that this identifier has a specific meaning which is associated with that declaration). In the case where a name appears to be undefined (because it is introduced into some scope during merging), the human reader can simply skip the parts that rely on the meaning of this name, still concluding a large number of things about the code, ignoring augmentations.

The re-binding errors occur in the case where this approach is inconsistent with the result of merging. That is, when looking at a part (as if augmentations do not exist) we conclude one thing about a name, but the actual meaning of the name after merging is different.

I believe the re-binding errors are helpful for a human reader because they justify the kind of proto-binding based human reasoning which is described above.

A counterpoint could be that re-binding errors are too constraining, because they turn too many useful programs into compile-time errors. This is something that we'll need to explore in practice, and we would also need to think about useful idioms or programming patterns that will work when we encounter that kind of failure.

I think the alternative is impractical: There is no way we can modify the static analysis of Dart such that each library with parts with augmentations and each part whose library has augmentations can have their names resolved without taking any augmentations into account.

@lrhn
Member Author

lrhn commented May 1, 2024

the notion of proto bindings (which is a new concept that applies to incomplete Dart source code artifacts) and normal Dart name binding.

That sounds more complicated than necessary, and more error-prone. And it still has the underlying assumption that something changes relative to the source code that the user provided.
Otherwise we could just define the final binding directly.

If the proto-binding is more like a guess at "what a reasonable reader would assume this variable means from reading just this file", and we make it a problem if that doesn't match the actual binding, then I can see a value.

But I'd also suggest that we just make the code mean what the reasonable reader would assume, instead of shooting them in the foot, and then tell them why we shot them in the foot afterwards.

The re-binding errors occur in the case where this approach is inconsistent with the result of merging.

Introducing a "proto semantics" which is intended to match user expectations, then doing a transformation that isn't proto-semantics-preserving, and making it an error if the transformation didn't preserve the proto-semantics for this concrete program, ... prompts the question: why did we not just preserve the user's expectations instead?

The proto bindings are a well-defined property of a special kind of incomplete software artifact, namely a library that has parts that have augmenting declarations, along with its tree of parts, or a part that has augmenting declarations, along with its parents and its children.

I don't want to consider such a program as incomplete. It's the complete source that the user has provided. The totality of the files defines the contents of the library. It's all there. The word "incomplete" is inaccurate.
For us to choose to look at individual files first, ignoring the rest of the world, and then complain when that didn't give the correct result, seems like a problem that is solved by looking at the entire world.

If we have a problem specifying a "merging" which desugars augmentations into non-augmentations, and parts with imports into parts without imports, maybe the solution is not to add more complexity on top, but to not try to do that merging to begin with. Because it isn't necessary, and it seems to just be adding complications.
