Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow recursive typedef #3714

Open
mmcdon20 opened this issue Apr 19, 2024 · 5 comments
Open

Allow recursive typedef #3714

mmcdon20 opened this issue Apr 19, 2024 · 5 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@mmcdon20
Copy link

Apologies if this is a duplicate, but I don't think there is an existing issue specifically on this topic.

Occasionally I run into situations where it would be nice if a typedef could refer to itself by name in its definition.

Often this takes the form of a simple data structure like one of the following:

typedef LinkedList<T> = (T head, LinkedList<T>? tail);
typedef BinaryTree<T> = ({BinaryTree<T>? left, BinaryTree<T>? right, T value});

The alternative is to write a class which is significantly more verbose:

class LinkedList<T> {
  final T head;
  final LinkedList<T>? tail;
  const LinkedList(this.head, this.tail);

  @override
  bool operator ==(Object other) =>
      other is LinkedList<T> && other.head == head && other.tail == tail;

  @override
  int get hashCode => Object.hash(head, tail);

  @override
  String toString() => 'LinkedList<$T>($head, $tail)';
}

class BinaryTree<T> {
  final BinaryTree<T>? left;
  final BinaryTree<T>? right;
  final T value;
  const BinaryTree({this.left, this.right, required this.value});

  @override
  bool operator ==(Object other) =>
      other is BinaryTree<T> &&
      other.left == left &&
      other.right == right &&
      other.value == value;

  @override
  int get hashCode => Object.hash(left, right, value);

  @override
  String toString() =>
      'BinaryTree<$T>(left: $left, right: $right, value: $value)';
}

Another example, I came across this Rosetta Code problem called Church numerals.

I attempted to translate the java solution to dart as follows:

typedef ChurchNum = dynamic Function(Function);

ChurchNum get zero => (f) => (x) => x;
ChurchNum next(ChurchNum n) => (f) => (x) => f(n(f)(x));
ChurchNum plus(ChurchNum a) => (b) => (f) => (x) => b(f)(a(f)(x));
ChurchNum pow(ChurchNum m) => (n) => m(n);
ChurchNum mult(ChurchNum a) => (b) => (f) => (x) => b(a(f))(x);
ChurchNum toChurchNum(int n) => n <= 0 ? zero : next(toChurchNum(n - 1));

int toInt(ChurchNum c) {
  int counter = 0;
  ChurchNum funCounter(ChurchNum f) {
    counter++;
    return f;
  }

  plus(zero)(c)(funCounter)((x) => x);
  return counter;
}

void main() {
  ChurchNum three = next(next(next(zero)));
  ChurchNum four = next(next(next(next(zero))));

  print('3+4=${toInt(plus(three)(four))}'); // prints 7
  print('4+3=${toInt(plus(four)(three))}'); // prints 7

  print('3*4=${toInt(mult(three)(four))}'); // prints 12
  print('4*3=${toInt(mult(four)(three))}'); // prints 12

  // exponentiation.  note the reversed order!
  print('3^4=${toInt(pow(four)(three))}'); // prints 81
  print('4^3=${toInt(pow(three)(four))}'); // prints 64

  print('  8=${toInt(toChurchNum(8))}'); // prints 8
}

The above solution works, but is not particularly type safe. It would be better if we could define ChurchNum as follows:

typedef ChurchNum = ChurchNum Function(ChurchNum);

You can of course implement a type safe solution by defining ChurchNum as a class that accepts a ChurchNum Function(ChurchNum) in the constructor, but then you have unnecessary wrapper objects, and a more verbose solution.


A third situation where this could be useful is if union types were also added to the language.

With both features you could define a Json for example:

typedef Json = Map<String, Json> | List<Json> | String | num | bool | Null;

That said, I think this feature would be useful even on its own.

@mmcdon20 mmcdon20 added the feature Proposed language feature that solves one or more problems label Apr 19, 2024
@lrhn
Copy link
Member

lrhn commented Apr 23, 2024

The reason that this is not allowed is that type aliases are not nominative, but structural.
They do not exist as types, only as ways to express types.

An alias like

typedef LinkedList<T> = (T head, LinkedList<T>? tail);

doesn't introduce a new type, only a name for an existing type.
So what is the type of the second field of that record type?
Which actual argument can I call a ChurchNum with?

We don't know. This definition doesn't define a type, but a recursive constraint on a type. We have to define how to solve that equation.

The obvious answer is to give the fixed point. The only problem is that that is an infinite type, and we need to represent it in finite memory, and time, so we can't just unfold it.

The other option is to introduce fixed point operators in the type model.
That means introducing a structural type construction like:

𝕗𝕚𝕩 T = (T head, LinkedList<T>? tail);

That is defined to be the (least?) fixed point of that equation. Which is an infinite, but regularly recurring, type.

We have to derive that type from a declaration like:

typedef F<X, Y> = Map<X, F<String, List<Y>>>;

by ... probably being clever and seeing which parameters occur recursively.
This one could be recognized as equvalent to a directly specified fixed-point:

typedef F<X, Y> = Map<X, 𝕗𝕚𝕩 $F<Y> = Map<String, List<$F<List<Y>>>>>;

Maybe.
(Brings up the point that there are multiple ways to define the same type. Fx.
𝕗𝕚𝕩 F = List<F> and List<𝕗𝕚𝕩 F = List<F>> are equivalent, which can be seen by one unfolding.
Are:

typedef B = List<C>;
typedef C = List<B>;

and equivalent to

typedef L = List<L>;

? Sure. But how easy is it do detect it?)

The defintion would have to not be directly recursive. A 𝕗𝕚𝕩 T<X> = T<List<X>> is not allowed, since the fixed point of that is empty. However 𝕗𝕚𝕩 T<X> = List<T<X>> would be fine, it adds an outer type construct, then only recurses inside, something nominative types can sometimes do as well (abstract class C<T> implements List<C<T>> {} is allowed).

Then we need to define what every function we have on types means for this type: subtyping, normalization, least upper bound, and assig an interface to the type (which the "no direct recursion" helps with, since that will likely give an outermost type).

And it's a real type, which can exist at runtime as the reified your argument of, say, a list, so we need to do subtyping tests at runtime too.

And we have to figure out what it means if someone combines more than one of those with each other.

That's a possible approach. It might be undecidable. I'm not enough of a type theorist to know, but it looks too much like lambda calculus to not make me fear that it's Turing complete.
It's definitely complicated at a very high level.

@ykmnkmi
Copy link

ykmnkmi commented Apr 23, 2024

My case:

typedef State = State? Function(Parser parser);

State state = fragment;

while (index < length) {
  state = state(parser) ?? fragment;
}

@lrhn
Copy link
Member

lrhn commented Apr 24, 2024

The recursive types that actually work are usually the ones including functions, nullability or collections.
Functions because parameter types are contravariant, and you don't have to call it, and it doesn't have to return its return type (it can throw).
Nullability because for general union types, it works when one of the parts is a non-recursive type. Nullable types is that kind of union type (and FutureOr isn't).
Collections because they can be empty.

That's all examples of not having to actually produce a value of a component type of the recursive type, so the recursion can end for an actual value.

Saying typedef TurtlesAllTheWayDown = List<TurtlesAllTheWayDown>; isn't particularly useful, the only solutions to that equation are Never, List<Never>, List<List<Never>>, etc., so you never get to actually have a useful value in there. But it's not an empty type because <Never>[] works.

Saying typedef Void = ({value: Void}); just defines an empty type (no values have the type). The RHS has an unconditional covariant occurrence of the recursive type, so creating an instance of the type requires having an instance of the type.
The definition typedef Void = Void; is just as empty, if we allow that. If we see a type definition as a set of acceptable types, we'd create that set inductively starting with just {Never}, and then the solution here is just Never.

So, there are some uses of recursive types that make sense.
Not as many as one would hope, especially without general union types, but thaving one that involves a function isn't surprising.

You can have your recursive type today, if you're willing to put a class in the middle.
Say:

class State {
  final _State? Function(Parser) _callback;
  State(this._callback);
  State? call(Parser parser) => _callback(parser);
}
State fragment = State(_fragmentCallback);
// ...
  State state = fragment;
  while (index < length) {
     state = state(parser) ?? fragment;
  }

Nominative types (class types, really, because extension types have problems with self-references too) is the officially recommended way to have self-referential types.

It works because the compiler doesn't need to unfold the recursion until it reaches an object of that type, and the type system ensures that the value will satisfy the type. The object itself knows its precise type (instance of that class, type arguments are this and that), which tells everything that is needed about the type. There is no need to check component values, it's entirely a type comparison operation.

For structural types, there is no inherent type to compare against. Well, not for records at least, functions do have an inherent function type. And union types have as much type as the actual value. So it's records.
Still, the compiler needs to be able to check component values against the unfolded type, so it needs to unfold the recursive type. Then the problem becomes stopping again. And if any of the components have a recursive type, we need to match them up, somehow.
(At least instances of structural types are usually immutable, so we can't create structurally typed object cycles, which would require a recursive structural type to represent the actual runtime type.)

Recursive structural types are not necessarily impossible. Heck, the Trine language (the programming language used in CS 101 when I studied) had recusive structural typing, and was able to recognize that List<𝕗𝕚𝕩 T=List<T>> and 𝕗𝕚𝕩 T=List<T> was the same type, and the same as 𝕗𝕚𝕩 T = List<List<T>>, after doing some unwinding and cycle normalization. (And, in some cases, an exponentially large error message when two types weren't equal, which suggests that it wasn't always an effective algorithm. Or maybe that was the generic modules, it's been a while.)

Subtyping of recursive types is well studied.
It's also very complicated for the user to reason about, and one can think of object oriented type systems as a way to dodge the issue by encapsulating the recursive references. Adding proper recursive types back into an object oriented language will raise the complexity again. So not something that is a slam dunk, even if possible and expressive, it might just be too complicated for the common user, and the workarounds are "good enough".

(We did manage to allow class C<T extends C<T>> {} which has an infinitely unfolding bound. And our instantiate-to-bounds algorithm gives up precisely for that kind of F-bounded polymorphism, because we don't have recursive types. With them, it could solve the instantiate-to-bounds as C<𝕗𝕚𝕩 T = C<T>>. But we can also just keep doing what we're doing today, and use a super-bounded type as solution. It's not precise, but it's also not wrong.)

@mmcdon20
Copy link
Author

mmcdon20 commented May 7, 2024

I was curious if there were any other languages that support a recursive typedef like in this proposal.

I found that most languages which support a typedef do not support exactly these recursive definitions (although in some cases the language may have some alternative solution).

Examples that don't support recursive typedef

Scala

// illegal cyclic type reference: alias ... of type LinkedList refers back to the type itself
type LinkedList[T] = (T, Option[LinkedList[T]])
// illegal cyclic type reference: alias ... of type Church refers back to the type itself
type Church = Church => Church

Kotlin

// Recursive type alias in expansion: LinkedList
typealias LinkedList<T> = Pair<T, LinkedList<T>?>
// Recursive type alias in expansion: Church
typealias Church = (Church) -> Church

Haskell

-- Cycle in type synonym declarations:
-- Main.hs:8:1-45: type LinkedList a = (a, Maybe (LinkedList a))
type LinkedList a = (a, Maybe (LinkedList a))
-- Cycle in type synonym declarations:
-- Main.hs:7:1-30: type Church = Church -> Church
type Church = Church -> Church

Swift

// type alias 'LinkedList' references itself
typealias LinkedList<T> = (T, LinkedList<T>?)
// type alias 'Church' references itself
typealias Church = (Church) -> Church

Go

// generic type cannot be alias
// invalid recursive type: LinkedList refers to itself
type LinkedList[T interface{}] = struct {
	head T
	tail *LinkedList[T]
}
// invalid recursive type: Church refers to itself
type Church = func(Church) Church

F#

// This type definition involves an immediate cyclic reference through an abbreviation
type LinkedList<'t> = 't * Option<LinkedList<'t>>
// This type definition involves an immediate cyclic reference through an abbreviation
type Church = Church -> Church

Rust

// error[E0391]: cycle detected when expanding type alias `LinkedList`
type LinkedList<T> = (T, Option<LinkedList<T>>);
// error[E0391]: cycle detected when expanding type alias `Church`
type Church = fn(Church) -> Church;

C++

// error: use of undeclared identifier 'LinkedList'
template<class T>
using LinkedList = std::tuple<T, LinkedList<T>>;
// error: unknown type name 'Church'
using Church = Church(*)(Church);

OCaml

(* Error: The type abbreviation linkedlist is cyclic *)
type 'a linkedlist = 'a * 'a linkedlist option;;
(* Error: The type abbreviation church is cyclic *)
type church = church -> church;;

Elm

-- This type alias is recursive, forming an infinite type!
type alias LinkedList a = (a, Maybe (LinkedList a))
-- This type alias is recursive, forming an infinite type!
type alias Church = Church -> Church

Examples that do support recursive typedef

Typescript

However, I found that Typescript DOES allow exactly this sort of recursive typedef.

The following is a valid Typescript program:

type LinkedList<T> = [T, LinkedList<T>?];
type Church = (c: Church) => Church;

The following definitions are also allowed in Typescript:

type TurtlesAllTheWayDown = Array<TurtlesAllTheWayDown>;
type Void = {value: Void};

However this definition is not:

type Void = Void; // Type alias 'Void' circularly references itself.

Haxe

The Haxe language appears to also support the same rules as Typescript for it's typedef feature.

typedef LinkedList<T> = {head: T, tail: Null<LinkedList<T>>};
typedef Church = Church -> Church;

These definitions are also allowed:

typedef TurtlesAllTheWayDown = Array<TurtlesAllTheWayDown>;
typedef Void_ = {value: Void_};

However this definition is not:

typedef Void_ = Void_; // Recursive typedef is not allowed

Odin

The following definitions are allowed in Odin. Although I will point out that the Church definition is not very useful in Odin due to the fact that proc expressions do not close over any values that are in scope, so with a nested proc you cannot refer to an argument in an outer proc from an inner proc, but the type checking does work.

LinkedList :: struct($T: typeid) { head: T, tail: ^LinkedList(int) }
Church :: proc(Church) -> Church

For TurtlesAllTheWayDown, technically the type alias appears to be allowed by the compiler, however when you try to create an instance of TurtlesAllTheWayDown you get a Segmentation fault.

TurtlesAllTheWayDown :: []TurtlesAllTheWayDown 

turtles : TurtlesAllTheWayDown = {} // Segmentation fault (core dumped) (exit status 139)

The following is allowed:

Void :: struct {value: ^Void}

But these definitions are not:

Void :: struct {value: Void} // Illegal declaration cycle of `Void`
Void :: Void // Invalid declaration value 'Void'

Puppet

This language is not free or open source, so I didn't actually try this one out, but it's clear from the documentation that recursive typedef is allowed in this language: https://www.puppet.com/docs/puppet/5.5/lang_type_aliases.html#creating-recursive-types


Given that there does seem to be a handful of languages that support this feature, it seems plausible that it may at least be technically possible for dart to support it.

If anyone is interested in experimenting with recursive typedef I think Haxe and Typescript are probably the best languages to try.

@lrhn
Copy link
Member

lrhn commented May 9, 2024

Typescript is probably being clever enough to recognize that the recursion goes through a nullable type or a function type. The reason that matters is that you can create a value of the recursive type without already having a value of the type. (Or it's being naive enough to not recognize that there is a problem, it just never tries to unfold the type until it needs to. It's easier in a language without type soundness.)

type LinkedList<T> = [T, LinkedList<T>?];                // [1, null] is a LinkedList<int>
type Church = (c: Church) => Church;                     // function(_) { throw "Nope"; } is a Church
type TurtlesAllTheWayDown = Array<TurtlesAllTheWayDown>; // [] is a TurtlesAllTheWayDown
type Void = {value: Void};                               // {get value() { throw "nope"; }} is a Void

A type like type Moar<T> = (T, Moar<T>); requires you to have a value of the type before you can create a value of the type, which means the least fixed point of the type is the empty type.

Doing typedef Moar<T> = (T, Moar<T>?); means that you can create an object of the type just by having a null value, and for typedef RecMoar<T> = RecMoar<T> Function(RecMoar<T>);, a value of (Object? _) => throw "nope" is an instance of the type, because you can use Never as the concrete expression of the type without needing to evaluate that expression.

The distinction between nominal type and structural type can also be seen in Haskell.
The type Void = Void is not allowed, but data Void = VOID(Void) is. It's a named type with a constructor, like a class in Dart. It's still empty because it's impossible to call the constructor without an existing value of the type it constructs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Proposed language feature that solves one or more problems
Projects
None yet
Development

No branches or pull requests

3 participants