An outline of part based augmentations #3741

Closed
eernstg opened this issue Apr 30, 2024 · 5 comments
@eernstg
Member

eernstg commented Apr 30, 2024

We have considered changing library augmentations such that they would become parts. In other words, we would reuse the existing part syntax ('part' <uri> ';' and 'part' 'of' <uri> ';') and allow each part to have its own imports, and allow augmenting declarations to occur anywhere (in parts or libraries). Here is an outline of how that could work.

Let's use the word module to denote an entity which is either a library or a part.

Example

graph BT;
  B1["part1.dart"] <--> A["main.dart"]
  B2["part2.dart"] <--> A
  C11["partpart1.dart"] <--> B1
  C12["partpart2.dart"] <--> B1
  C2["partpart3.dart"] <--> B2 

This example is mentioned a few times below, in order to make some rules or considerations concrete.

Preliminaries

We would presumably have to preserve the existing semantics of parts because anything else would be a massively breaking change, for any manually written code using parts, but also for various code generators.

We could make a distinction between a part of a library and a part of a part, but I'll assume here that we try to treat part-parts the same as parts.

This implies that every name declared in the top-level scope of any of these modules is in the library scope of the library at the top. This implies in turn that the set of modules that constitute a tree with a library at the top and some parts below it must declare distinct sets of top-level names. If there is a name clash between any two top-level declarations in any two nodes in the tree then it will also be a name clash in the library, and hence there will be a compile-time error.

Each node in the tree will have a top-level scope where all names in the top-level scope of all parents (including imported names) are available, plus all names in the top-level scope of all nodes in the tree under this node.

For example, the top-level scope of part1.dart would contain all names declared at the top level of main.dart including imported names and import prefixes, plus all names declared at the top level of part1.dart including names imported by part1.dart, plus all names declared at the top level of partpart1.dart and partpart2.dart (but nothing from the imports of these child modules).
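The part1.dart example above can be modeled with a small Python sketch. This is only an illustration, not part of the proposal: the `Module` class, the helper names, and the declared and imported names are all hypothetical. It follows the example literally, so declarations of sibling modules such as part2.dart are not modeled here.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Module:
    name: str
    declared: set = field(default_factory=set)  # names declared at top level
    imported: set = field(default_factory=set)  # imported names and import prefixes
    parent: Module | None = None
    parts: list = field(default_factory=list)

    def add_part(self, part: Module) -> Module:
        part.parent = self
        self.parts.append(part)
        return part


def declared_below(module: Module) -> set:
    """Top-level declarations of every module strictly below `module`."""
    names = set()
    for part in module.parts:
        names |= part.declared | declared_below(part)
    return names


def top_level_scope(module: Module) -> set:
    """Own declarations and imports, declarations below, declarations and imports above."""
    names = module.declared | module.imported | declared_below(module)
    ancestor = module.parent
    while ancestor is not None:
        names |= ancestor.declared | ancestor.imported
        ancestor = ancestor.parent
    return names


# Hypothetical contents for part of the example tree.
main = Module("main.dart", {"mainDecl"}, {"mainImport"})
part1 = main.add_part(Module("part1.dart", {"part1Decl"}, {"part1Import"}))
part2 = main.add_part(Module("part2.dart", {"part2Decl"}, {"part2Import"}))
partpart1 = part1.add_part(Module("partpart1.dart", {"pp1Decl"}, {"pp1Import"}))
partpart2 = part1.add_part(Module("partpart2.dart", {"pp2Decl"}, {"pp2Import"}))

scope = top_level_scope(part1)
```

In this model the imports of partpart1.dart and partpart2.dart stay out of the scope of part1.dart, while their top-level declarations are included.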

In order to preserve readability, each augmenting declaration must augment a declaration (original or augmenting) of the same name which occurs in the same module, textually earlier, or in a parent (direct or indirect) in the module tree.

This implies, for example, that it is a compile-time error if an augmenting top-level declaration named n occurs in both part1.dart and part2.dart. If the original declaration occurs in a module M (which can be the library or a part) then every augmenting top-level declaration with the same name must occur on a path from M downwards in the tree.
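As a rough illustration of this path restriction, here is a Python sketch that checks whether the modules containing augmentations of a name all lie on one downward path from the module with the original declaration. The function names and the `parent` map are assumptions made for the sketch. The textual-order requirement within a single module is not modeled.

```python
def ancestors_or_self(module, parent):
    """Return [module, its parent, ..., the library]."""
    chain = [module]
    while module in parent:
        module = parent[module]
        chain.append(module)
    return chain


def satisfies_path_restriction(original, augmenting, parent):
    """True if every augmenting module lies on a single path downwards from `original`."""
    # Visit modules from shallowest to deepest; each must sit below the previous one.
    by_depth = sorted(set(augmenting), key=lambda m: len(ancestors_or_self(m, parent)))
    previous = original
    for module in by_depth:
        if previous not in ancestors_or_self(module, parent):
            return False
        previous = module
    return True


# Child-to-parent map for the example tree.
parent = {
    "part1.dart": "main.dart",
    "part2.dart": "main.dart",
    "partpart1.dart": "part1.dart",
    "partpart2.dart": "part1.dart",
    "partpart3.dart": "part2.dart",
}

ok = satisfies_path_restriction("main.dart", ["part1.dart", "partpart2.dart"], parent)
clash = satisfies_path_restriction("main.dart", ["part1.dart", "part2.dart"], parent)
```

Here `clash` corresponds to the error case mentioned above: augmentations of the same name in both part1.dart and part2.dart.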

Note that this path restriction implies that the ordering of augmentations is independent of the tree traversal ordering, as long as it is a pre-order traversal. As an aside, this means that we can sort part directives without disrupting the semantics.

Merging of augmentations

With these preliminaries in place we can discuss the merging step itself.

The augmentation feature specification mentions a merging process which will produce a single library from a module tree as described above. It has been discussed, e.g., in #3643. In any case, some details are still unresolved.

In order to simplify the following, we introduce a constraint on import prefixes: It is a compile-time error if a module contains an import with a prefix p, and p is also the name of a top-level declaration in the module tree.

This may be helpful during implementation, but the crucial point is that this eliminates a source of ambiguity for the human reader of the code. The assumption is that it is a good trade-off to simplify code comprehension slightly by having this constraint, in return for the inconvenience of having to choose unique names also for import prefixes.

The merging step then proceeds as follows:

In each module in the tree, each identifier expression is marked as originating in that module. The module tree is then flattened: For each module Mj in the tree, depth-first, append the top-level declarations from each immediate part to Mj in the textual order of the part directives. This yields a single library containing all the code from the entire module tree. Call it M0.
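A minimal Python sketch of the flattening step, assuming it amounts to a pre-order walk of the module tree. The `flatten` function and the declaration names `A` to `F` are hypothetical. Each declaration is paired with the module it originated in, mirroring the origin marking described above.

```python
def flatten(module, parts, declarations):
    """Return (origin, name) pairs for `module` followed by its parts, in directive order."""
    merged = [(module, name) for name in declarations.get(module, [])]
    for part in parts.get(module, []):
        merged.extend(flatten(part, parts, declarations))
    return merged


# Part directives of the example tree, in textual order.
parts = {
    "main.dart": ["part1.dart", "part2.dart"],
    "part1.dart": ["partpart1.dart", "partpart2.dart"],
    "part2.dart": ["partpart3.dart"],
}
# Hypothetical top-level declarations per module.
declarations = {
    "main.dart": ["A"],
    "part1.dart": ["B"],
    "part2.dart": ["C"],
    "partpart1.dart": ["D"],
    "partpart2.dart": ["E"],
    "partpart3.dart": ["F"],
}

m0 = flatten("main.dart", parts, declarations)
```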

Add the import and export directives from each part to M0. Each import directive is modified to have a fresh name as prefix, unless it already has a prefix.

Augmenting declarations of type-introducing declarations (classes, mixins, etc.) are eliminated by appending each augmenting declaration to the original declaration, in source order. This step is repeated recursively for members of each declaration that has members with augmenting declarations.

For function augmentations, the last augmenting declaration retains its name. The previous declaration is renamed to a fresh private name, and augmented is replaced by that name. Similarly for variable declarations where augmented can occur in an augmenting declaration's initializing expression. There will be more details about this.

At this time, M0 contains pre-augmentation code: All augmenting declarations are gone.

Next, name resolution occurs, following the rules of current Dart insofar as the given identifier expression denotes a declaration in M0 (at the top level, or in some nested scope).

If this is not the case then the name is imported, or undefined. Next, let Mj be the module that the given identifier expression id originated from. Then transform id to freshName.id if id is in the imported namespace of the import with prefix freshName and that import originated from Mj and id is imported via that prefix. Otherwise repeat the lookup in the same way for the parent of Mj, recursively, until the library is reached.

This implies that every name that originated from Mj is resolved according to the standard Dart scope rules in the merged library M0, but if it is imported then it is taken from the imports into Mj and its parents, recursively, in that order.

For example, if main.dart and part1.dart both import a declaration named foo and partpart2.dart contains foo as an identifier expression then it will be resolved to the declaration which was imported into part1.dart. An import into partpart1.dart with the same name would be ignored, and so would an import of the same name into part2.dart or partpart3.dart.
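The foo example can be traced with a small Python model of the lookup. Everything here is an assumption for illustration: the `resolve` function, the library names `lib_a` to `lib_d`, and the shape of the `imports` map. The rewriting of id to freshName.id is left out, so only the choice of import is shown.

```python
def resolve(name, origin, library_declarations, imports, parent):
    """Resolve `name` as seen from module `origin` after merging."""
    # A declaration in the merged library wins over any import.
    if name in library_declarations:
        return ("declared", name)
    # Otherwise try the imports of the origin module, then those of each parent.
    module = origin
    while module is not None:
        if name in imports.get(module, {}):
            return ("imported", module, imports[module][name])
        module = parent.get(module)
    return None  # undefined


parent = {
    "part1.dart": "main.dart",
    "part2.dart": "main.dart",
    "partpart1.dart": "part1.dart",
    "partpart2.dart": "part1.dart",
    "partpart3.dart": "part2.dart",
}
# module -> {imported name: hypothetical library providing it}
imports = {
    "main.dart": {"foo": "lib_a"},
    "part1.dart": {"foo": "lib_b"},
    "partpart1.dart": {"foo": "lib_c"},
    "part2.dart": {"foo": "lib_d"},
}

result = resolve("foo", "partpart2.dart", set(), imports, parent)
```

The imports into partpart1.dart and part2.dart are never consulted, because those modules are not on the path from partpart2.dart up to the library.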

Finally, we introduce some compile-time errors associated with augmentation merging, to improve the comprehensibility of the resulting code: It is a compile-time error if an identifier expression id in a module M is bound to a declaration D1 using the merging process described above, but it is proto-bound to a different declaration D2 when viewed in M in the context of the module tree, not in the context of the final, merged library.

Note that proto binding is a new concept. We cannot just rely on regular Dart name resolution because a module can contain identifiers whose declaration is provided by a different module which is neither a child nor a parent.

A proto binding is the result of a variant of lexical lookup whose outcome can be a declaration or nothing. In the case where the outcome is nothing, the given identifier is considered unresolved (which is not an error, and also does not imply that the identifier id is transformed into this.id). It is applied to identifier expressions in modules that take part in an augmentation merging process. For a given module Mj, the proto-binding of an identifier is performed with respect to the lexically enclosing scopes (where an augment class declaration is treated the same as a class declaration, and similarly for other declaration kinds), where the top-level scope contains all declarations from all parents of Mj in the module tree as well as all children, recursively.

Proto bindings can be computed for selectors that are identifiers or operators as well (so we can proto-bind y in x.y and + in x + y), in the case where the receiver has been proto-bound to a declaration.

In summary, the augmentation processing step consists of a proto-binding step where identifiers (identifier expressions as well as selectors, including operators) are resolved "as far as possible", followed by a merging step, followed by a check that the final name resolution does not give rise to bindings that are different from the ones that were produced by proto binding (no error occurs when the proto binding is unresolved, no matter how the name is resolved after merging).
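The final check in that summary could be sketched as follows in Python. The representation is an assumption: each identifier occurrence is a (module, identifier) pair, each binding is a string naming a declaration, and `None` stands for an unresolved proto binding.

```python
def rebinding_errors(proto_bindings, final_bindings):
    """Return occurrences whose final binding differs from a resolved proto binding."""
    errors = []
    for occurrence, proto in proto_bindings.items():
        if proto is None:
            continue  # unresolved before merging: never an error
        if final_bindings.get(occurrence) != proto:
            errors.append(occurrence)
    return errors


# Hypothetical occurrences in part1.dart.
proto_bindings = {
    ("part1.dart", "foo"): "top-level foo",  # what the reader sees locally
    ("part1.dart", "bar"): None,             # unresolved before merging
    ("part1.dart", "baz"): "top-level baz",
}
final_bindings = {
    ("part1.dart", "foo"): "C.foo",          # a member merged in from elsewhere
    ("part1.dart", "bar"): "C.bar",
    ("part1.dart", "baz"): "top-level baz",
}

errors = rebinding_errors(proto_bindings, final_bindings)
```

Only foo is reported: it appeared to resolve to one declaration before merging and resolves to a different one afterwards.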

The point is that the tree of modules is more comprehensible if it is possible to trust the lookups that we can see before merging. The bad case that we're avoiding is when a name seems, locally in a module, to resolve to one specific declaration, but it actually resolves to a completely different declaration after merging.

[Edit May 1st: Added headers, clarified the structure, and added a few paragraphs about re-binding errors.]

@lrhn
Member

lrhn commented Apr 30, 2024

I still maintain that we don't need a flattening, and can (and should!) define the semantics on the actual syntax that the user provides, without rewriting it first.

Flattening can be an implementation choice. It's OK to make sure that it's possible, but flattening at the kernel level or below shouldn't need to worry about name resolution.

@eernstg
Member Author

eernstg commented May 1, 2024

[..we..] can (and should!) define the semantics on the actual syntax that the user provides, without rewriting it first.

That's a noble goal, but I do not think it's realistic. It's simply not manageable if we do not allow ourselves to say that "a class has a declaration", and instead insist that we must say "assume that the class has the declaration D and augmentations A1 .. Ak" and similarly for every instance member of that class (oh, and static members, too, by the way, and constructors).

It seems obvious to me that we must talk about the result of merging all augmentations, yielding a library in Dart-without-augmentations. You may insist that this is a semantic property, and we're never talking about syntax that differs from the syntax that the developers wrote, but the outcome is the same: We must eliminate augmentation in an early phase of the specification (and, presumably, implementation) of the language, such that we can proceed to do "normal Dart stuff", because we already have a pretty good idea about how to do that. I don't think it's going to help anybody to stick to the raw syntax of the augmentations for any longer than we absolutely must.

@lrhn
Member

lrhn commented May 1, 2024

I think it is possible to define our way out of the "declaration = stack of declarations" ambiguity.
It may take some work, but I don't think it requires a complete rewrite.

We will have to distinguish two concepts:

  • Syntactic declaration: A source clause, as it occurs in the input.
  • Semantic declaration: The combined meaning of a stack of syntactic declarations for the same semantic entity.

In the current specification, those two are the same. When we ask "does the declaration of C have a declared superclass", we look at the syntactic declaration and check whether it has an extends clause.

In the new distinguishing approach, a name does not denote a syntactic declaration, but all the syntactic declarations with that name (which must be one non-augmentation declaration and a number of augmentation declarations, in augmentation application order, which have been checked to be compatible augmentations of the same kind of declaration, otherwise we'd have had an error earlier).

Then we have to define, for every query we make today against a declaration, if it's a query on the semantic declaration, how the result is derived from the stack of syntactic declarations.

For example:

  • A stack of syntactic class declarations D has a declared (semantic) superclass C if
    • The stack has the form top::rest, and either:
      • The syntactic top declaration has an extends clause with a type clause that denotes a (semantic) class declaration C, or
      • The syntactic top declaration has no extends clause, rest is not empty, and rest has a declared superclass C.
  • A stack of syntactic class declarations has the following sequence of declared interfaces (semantic class declarations denoted by implements clause entries):
    • If the stack is empty, then the empty sequence of class declarations.
    • If the stack is top::rest, then the result is the declared interfaces of rest followed by:
      • No further declarations, if top has no implements clause.
      • The declarations denoted by each type clause, in source order, of top's implements clause, if it has one.

Generally, define a property on a stack of syntactic declarations, usually inductively.
We can then use that property directly on a semantic declaration, because a semantic declaration is a stack of syntactic declarations.
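The two inductive definitions above translate fairly directly into code. This Python sketch is only a model: `SyntacticClassDecl` and its fields are hypothetical, class names stand in for semantic class declarations, and a stack is a list whose first element is the top (the most recently applied augmentation).

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntacticClassDecl:
    name: str
    extends: str | None = None  # class named in the extends clause, if any
    implements: tuple = ()      # classes named in the implements clause


def declared_superclass(stack):
    """Superclass from the topmost declaration that has an extends clause."""
    if not stack:
        return None
    top, rest = stack[0], stack[1:]
    if top.extends is not None:
        return top.extends
    return declared_superclass(rest)


def declared_interfaces(stack):
    """Interfaces of the rest of the stack, followed by those of the top."""
    if not stack:
        return []
    top, rest = stack[0], stack[1:]
    return declared_interfaces(rest) + list(top.implements)


# Top of the stack first: two augmentations on top of the original declaration.
stack = [
    SyntacticClassDecl("C", implements=("B",)),
    SyntacticClassDecl("C"),
    SyntacticClassDecl("C", extends="S", implements=("A",)),
]

superclass = declared_superclass(stack)
interfaces = declared_interfaces(stack)
```

The interfaces come out in augmentation application order, with those of the original declaration first.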

We use this to define the necessary properties of a semantic declaration, just like we do today, and then define the overlying semantics in terms of those properties, rather than direct syntactic declaration inspection.
Just like we do today. (Or where we don't, it shouldn't be a big change to make, and it will help the semantic definitions to raise their abstraction level above the physical syntax).

We still have to define the rules for which stacks of syntactic declarations are allowed, those that we can give a consistent meaning to, and we likely have to do multiple validation passes to ensure that something that seems to be provisionally valid before we even have a type hierarchy, is also valid when we have a type hierarchy, and types, and type inference.

@lrhn
Member

lrhn commented May 2, 2024

I generally agree except for the merging, and some of the scoping.

The scoping I suggest is:

  • A library is defined by all its part files.
  • The top-level syntactic declarations of every part file become the declaration scope of the library, which occurs in each file's scope chain, below the imports.
    • That is, all library member declarations are global to the library.
    • There should be one declaration-scope entry per non-augment declaration, with augment declarations being stacked on top of those in augment application order.
    • It's a compile-time error to have multiple (non-augment) declarations with the same name (or same base name, unless one is a getter and the other a setter, with mutable variable declarations counting as both a getter and a setter). The usual.
    • It's a compile-time error to have an incompatible augment declaration on top of a prior declaration stack, or an augment declaration with no corresponding non-augment declaration.
    • And let's keep the "path restriction" on augment placement too, so an augment declaration must augment declarations that are all either prior in the same file, or in transitive parent files.
  • Each Dart file (library or part) has a combined import scope, which is a scope chain defined as:
    • Start with the "inherited" combined import scope of the parent file, if any (that is, if not the library file).
    • Extend that with the un-prefixed import scope defined by the un-prefixed imports of the current file.
      • So all the declarations in the export scopes of the imported libraries of unprefixed import declarations, which are not hidden by show/hide clauses. Conflicting names are conflicted as usual, so it's an error to refer to them.
      • These names shadow any inherited names, as usual for extending a scope.
    • Extend with the prefix-scope, which contains names for all prefixes of import directives in the current file.
      • For each import prefix, start with the import prefix scope of the most recent transitive parent file which
        declared a prefix with that name, if any, or no/an empty scope if not.
      • Extend that scope with the import scope of the imports of the current file that use this prefix.
        (Again, the declarations of the export scopes of the import directives with that prefix, which are not
        hidden by a hide or show clause, and with conflicts being conflicted names.)
  • This combined import scope is the top of the lexical scope of the file.
  • Extend that with a library declaration scope, which holds all the declarations of the current library.
    • It's a compile-time error if an import prefix has the same base name as a declaration in the library declaration scope.
    • We're not putting import prefixes and top-level declarations into the same scope, because we want part files
      to be able to inherit and shadow import prefixes with their own imports or import prefixes,
      but they cannot shadow library member declarations. We make it an error to have the same name,
      because that makes the prefix unreachable, its name is shadowed by the library member in every available
      scope.
  • The result of that is the top-level scope of the library, which declarations are declared in. It contains all inherited imports up to the library file, with part file imports possibly shadowing inherited imports, and then all the library declarations.

Further, for nested scopes, I suggested in #3738 that the lexical scope for a syntactic "scope-bearing declaration" (a syntactic class, mixin, enum, extension, or extension-type declaration, with or without augment) should not contain the names of members of that declaration which were not declared in that syntactic declaration.
The augmentation specification says that all members declared in all the syntactic declarations with the same name, are in the lexical scope inside the declaration. See #7378 for why I think that's too confusing.

If we use only the textual scope here, then the names in the lexical scope of a name to resolve are, in order:

  • The ones declared textually in the same method or member, its parameters and its type parameters.
  • The members declared textually in the same surrounding scope-bearing declaration, and its type parameters.
  • All library member declarations of the current library.
  • Any import prefix of the current file.
  • Any name imported by an unprefixed import in the current file.
  • Then recurse the previous two items for each parent file up to the library file.

A name lookup will search those scopes until it finds the name (or same base name), then decide that that is what the name refers to.
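To make the search order concrete, here is a Python sketch of the lookup under the textual-scope rule. It is a simplification with hypothetical names and data: each scope is just a set of names, conflicting imports are not modeled, and a prefix does not inherit members from a same-named prefix in a parent file.

```python
def lookup(name, local_names, member_names, library_declarations, file, files):
    """Return a label for the first scope in the chain that contains `name`."""
    scopes = [
        ("local", local_names),
        ("member", member_names),
        ("library", library_declarations),
    ]
    # For the current file and then each parent file: prefixes, then unprefixed imports.
    current = file
    while current is not None:
        info = files[current]
        scopes.append(("prefix in " + current, info["prefixes"]))
        scopes.append(("import in " + current, info["imports"]))
        current = info["parent"]
    for label, names in scopes:
        if name in names:
            return label
    return None


# Hypothetical library with one part file.
files = {
    "main.dart": {"parent": None, "prefixes": {"math"}, "imports": {"max", "Random"}},
    "part1.dart": {"parent": "main.dart", "prefixes": set(), "imports": {"max"}},
}
library_declarations = {"Foo"}

shadowed = lookup("max", set(), set(), library_declarations, "part1.dart", files)
inherited = lookup("Random", set(), set(), library_declarations, "part1.dart", files)
```

In this model `max` is found in the import scope of part1.dart, which shadows the one inherited from main.dart, while `Random` is only found after recursing to the parent file.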

This design allows

  • A part file to inherit and use all imports and import prefixes of the parent file.
  • A part file to shadow all inherited imports and import prefixes with its own imports.
    • So if a part file makes sure to import all its dependencies, then it can ignore the inherited imports from the parent file. The inherited names cannot cause a conflict or problem.
  • Library member declarations are still global.
  • All existing libraries and parts should keep working.

There is no "merging strategy" here, because I don't think we should have one.
We should define the semantics of the program that the user supplied without any rewriting.
An identifier is resolved in the file where it's declared, in the lexical scope of the position it occurs.
We may say, informally, that a language feature works "as if" it was desugared to a simpler language in some way, but that should be a derived property of the specification, not the specification itself. (It can even be a goal for the specification to allow a desugaring, but we should not let that force us into designing a worse feature than otherwise possible.)

@eernstg
Member Author

eernstg commented May 10, 2024

Closing: The language team prefers to talk about the semantics of augmentation in terms of a different model that does not rely on any operation which can be considered to be "moving code around". I like that perspective better, too, by the way.

Note also that the don't-move-any-code approach has the same behavior as the model described here, if the model described here is modified slightly: It should then not reduce a sequence of one original declaration and one or more augmentations to a single declaration. They just stay separate. This yields a slightly different binding environment for each identifier in an augmented declaration, because names declared in distinct enclosing declarations (augmenting or not) are not in the scope of each other. For example, we'll need to use this.foo() rather than foo() in order to call an instance method which is declared in a different augmenting declaration, and similarly for C.foo() vs. foo() when foo is a static method and the enclosing declaration has the name C.

@eernstg eernstg closed this as completed May 10, 2024